Research Cluster Applications

Research Computing offers a wide range of applications on the HPC Research Cluster. Below is the list of applications and versions installed on our Red Hat Enterprise Linux (RHEL) 9.2 compute nodes. Cluster users may request that additional applications be installed and supported by Research Computing staff. Research groups are also provided with shared storage suitable for installing and maintaining their own discipline-specific applications.

Applications
Development
Libraries
MPI (Message Passing Interface)

Applications

abaqus
ABAQUS is used for modeling mechanical components and assemblies (pre-processing), analyzing them with the finite element method, and visualizing the analysis results (post-processing).
versions available: 2021, 2022, 2024

abyss
ABySS, Assembly by Short Sequences, is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
versions available: 2.2.5, 2.3.7

admixture
ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
versions available: 1.3.0

alphafastppi
AlphaFastPPi is a Python package designed to streamline large-scale protein-protein interaction analysis using AlphaFold-Multimer. For each protein combination tested, AlphaFastPPi will return a single model.
versions available: 0.1

alphafold
AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.
versions available: 2.3.1, 3.0.0

anvio
Anvi’o is an open-source, community-driven analysis and visualization platform for ‘omics data. It brings together many aspects of today’s cutting-edge genomic, metagenomic, metatranscriptomic, pangenomic, and phylogenomic analysis practices to address a wide array of needs.
versions available: 8

aria
ARIA (Ambiguous Restraints for Iterative Assignment) is software for automated NOE assignment and NMR structure calculation. It speeds up and automates the assignment process through an iterative structure calculation scheme. In addition, refinement in explicit water improves the quality of the calculated structures, validation tests help spectroscopists judge the quality of the final structures, and support for the CCPN data model simplifies exchanging information with other NMR software packages.
versions available: 2.3.2

artic
ARTIC is a pipeline and set of accompanying tools for working with viral nanopore sequencing data generated from tiling amplicon schemes. It is designed to help run the ARTIC bioinformatics protocols, for example the SARS-CoV-2 coronavirus protocol. The pipeline includes two workflows: one that uses signal data (via nanopolish) and one that does not (via medaka).
versions available: 1.2.4

augustus
AUGUSTUS is a gene prediction program for eukaryotes. It can be used as an ab initio program, which means it bases its prediction purely on the sequence.
versions available: 3.4.0, 3.5.0

aws-cli
The AWS Command Line Interface (AWS CLI) is an open-source tool for interacting with AWS services using commands in your command-line shell. With minimal configuration, the AWS CLI lets you run commands from your terminal that provide functionality equivalent to the browser-based AWS Management Console.
versions available: 2.15.34

bamtools
BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
versions available: 2.4.1, 2.5.2

bayesase
BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating Allelic Imbalance (AI) and formally comparing levels of AI between conditions. AI indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions.
versions available: 21.1.13

beagle
Beagle is a software package for phasing genotypes and imputing ungenotyped markers. Beagle has improved memory and computational efficiency when analyzing large sequence data sets.
versions available: 5.4

bedtools2
Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome, such as intersecting, merging, and subtracting genomic intervals.
versions available: 2.29.0, 2.31.1
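To illustrate what "genome arithmetic" means, the sketch below intersects two sets of BED-style intervals (0-based, half-open) in plain Python. It is not how bedtools is implemented; it roughly mirrors the overlapping regions that `bedtools intersect` reports by default, and it assumes each input list has no overlapping intervals within itself (for example, after `bedtools merge`). The function and type names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: (chrom, start, end) in BED convention, 0-based and half-open.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portions of intervals in a and b.

    Assumes intervals within each list do not overlap one another.
    """
    a_sorted = sorted(a)  # sorts by chrom, then start, then end
    b_sorted = sorted(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a_sorted) and j < len(b_sorted):
        ca, sa, ea = a_sorted[i]
        cb, sb, eb = b_sorted[j]
        # Different chromosomes: advance the list that is "behind".
        if ca < cb:
            i += 1
            continue
        if cb < ca:
            j += 1
            continue
        # Same chromosome: the overlap is [max(starts), min(ends)).
        start, end = max(sa, sb), min(ea, eb)
        if start < end:  # half-open, so touching intervals do not overlap
            out.append((ca, start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
    b = [("chr1", 150, 350), ("chr2", 60, 70)]
    print(intersect(a, b))
```

For example, intersecting chr1:100-200 and chr1:300-400 with chr1:150-350 yields chr1:150-200 and chr1:300-350, while the chr2 intervals do not overlap. After sorting, the two-pointer sweep visits each interval once.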

blast
NCBI BLAST (Basic Local Alignment Search Tool) is a suite of programs for aligning query sequences against those present in a selected target database.
versions available: 2.11.0+, 2.15.0+

blatsuite
BLAT produces two major classes of alignments: 1) at the DNA level, between two sequences of 95% or greater identity that may include large inserts; and 2) at the protein or translated DNA level, between sequences of 80% or greater identity that may also include large inserts. (v37 / 64-bit)
versions available: 36, 37, 38

bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
versions available: 2.5.1, 2.5.3

bracken
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken uses the taxonomy labels assigned by Kraken, a highly accurate metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample.
versions available: 2.9

braker
BRAKER is a pipeline for fully automated prediction of protein-coding gene structures in eukaryotic genomes. It trains and combines GeneMark and AUGUSTUS using extrinsic evidence such as RNA-Seq alignments and/or protein sequences.
versions available: 3.0.7

busco
BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.
versions available: 3.0.2, 4.1.4, 5.7.1

bwa
BWA (Burrows-Wheeler Aligner) is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
versions available: 0.7.17

bwa-mem2
bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to those of bwa and is about 1.3 to 3.1 times faster, depending on the use case, dataset, and machine.
versions available: 2.2.1

canu
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing, such as the PacBio RS II/Sequel or Oxford Nanopore MinION.
versions available: 2.1.1, 2.2

cd-hit
CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. It is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort of many sequence analysis tasks and helps in understanding the structure of a dataset and correcting bias within it.
versions available: 4.8.1

checkm
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. CheckM also provides tools for identifying genome bins that are likely candidates for merging based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
versions available: 1.2.2

clustal-omega
Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. In addition, the quality of alignments is superior to previous versions.
versions available: 1.2.4

comsol
COMSOL Multiphysics is a finite element analysis, solver, and simulation software package for various physics and engineering applications, especially coupled phenomena and multiphysics. The software facilitates conventional physics-based user interfaces and coupled systems of partial differential equations (PDEs).
versions available: 6.2

cufflinks
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
versions available: 2.2.1

ddocent
dDocent is a simple bash wrapper to QC, assemble, map, and call SNPs from almost any kind of RAD sequencing data. If you already have a reference, dDocent can be used to call SNPs from almost any type of NGS dataset.
versions available: 2.9.8

decona
Decona handles Nanopore amplicon data from demultiplexing through consensus and can process multiple samples in one line of code. It supports mixed samples containing multiple species (from bulk and eDNA), mixed amplicons in one barcode, multiplexed barcodes, and multiple samples in one run, and it outputs Medaka-polished consensus sequences.
versions available: 1.3.1

degenprime
DeGenPrime is a console-based, high-quality PCR primer design tool that can use MSA formats and degenerate bases to expand the target range of a single primer set. It evaluates any proposed primer using thermodynamic properties, filtration metrics, penalty scoring, and conserved-region finding, and it includes filters for degeneracy, repeated k-mers, relative GC content, and temperature range. Minimal penalty scoring is applied according to secondary-structure self-dimerization metrics, GC clamping, tri- and tetra-loop hairpins, and internal repetition. DeGenPrime provides a universal and scalable primer design tool for the entire tree of life.
versions available: 0.1.2

diamond
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. Key features include: pairwise alignment of proteins and translated DNA at 100x to 10,000x the speed of BLAST; frameshift alignments for long-read analysis; low resource requirements, making it suitable for standard desktops or laptops; and various output formats, including BLAST pairwise, tabular, and XML, as well as taxonomic classification.
versions available: 2.0.9, 2.1.9

dm_control
dm_control is DeepMind’s software stack for physics-based simulation and reinforcement learning environments, using MuJoCo physics.
versions available: 1.0.16

dram
DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database as well as custom user databases.
versions available: 1.5.0

drep
dRep is a Python program for rapidly comparing large numbers of genomes. dRep can also ‘de-replicate’ a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each group.
versions available: 3.5.0

emboss
EMBOSS (European Molecular Biology Open Software Suite) is a software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.
versions available: 6.6.0

examl
ExaML (Exascale Maximum Likelihood) implements the popular RAxML search algorithm for maximum likelihood inference of phylogenetic trees. It uses a new MPI parallelization approach that improves parallel efficiency, particularly on partitioned multi-gene or whole-genome datasets.
versions available: 3.0.21, 3.0.22

exonerate
Exonerate is a generic tool for sequence alignment.
versions available: 2.4.0

f5c
f5c is an optimised re-implementation of the index, call-methylation, and eventalign modules in Nanopolish. Given a set of basecalled Nanopore reads and their raw signals, f5c call-methylation detects methylated cytosines, and f5c eventalign aligns raw nanopore signals (events) to reference k-mers.
versions available: 1.3, 1.4

famsa
FAMSA is Fast and Accurate Multiple Sequence Alignment of large protein families. It first determines the longest common subsequences and has a unique way to compute gap costs. It proceeds progressively to add sequences into the alignments using a novel iterative approach.
versions available: 2.2.2

fastqc
FastQC is a quality control tool for high throughput sequence data. It takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report. FastQC can be run either as an interactive GUI app, or in a non-interactive way (say as part of a pipeline) which will generate an HTML report for each file you process.
versions available: 0.11.9, 0.12.1

ffmpeg
FFmpeg is a leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter, and play pretty much anything that humans and machines have created, from the most obscure legacy formats to the cutting edge. It contains the libraries libavcodec, libavutil, libavformat, libavfilter, libavdevice, libswscale, and libswresample for use by applications, as well as the ffmpeg, ffplay, and ffprobe tools that end users can use for transcoding, streaming, and playing media.
versions available: 4.2.1, 6.1.1, 7.0.1

fluent
Ansys Fluent is general-purpose computational fluid dynamics (CFD) software used to model fluid flow, heat and mass transfer, chemical reactions, and more. It is known for efficient HPC scaling, so large models can be solved on multiple processors on either CPU or GPU.
versions available: 2022

flye
Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PacBio / ONT reads as input and outputs polished contigs. Flye also has a special mode for metagenome assembly.
versions available: 2.9.3

freyja
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational ‘barcodes’ derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
versions available: 1.5.1

funannotate
Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotes, ~30 Mb genomes), but has evolved over time to accommodate larger genomes. The impetus for this package was to accurately and easily annotate a genome for submission to NCBI GenBank.
versions available: 1.8.17

gatk
GATK (Genome Analysis Toolkit) offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. GATK4 aims to bring together well-established tools from the GATK and Picard codebases under a streamlined framework, and to enable selected tools to be run in a massively parallel way on local clusters or in the cloud using Apache Spark. It also contains many newly developed tools not present in earlier releases of the toolkit.
versions available: 4.2.0.0, 4.6.0.0

genemark
GeneMark, developed in 1993, was the first gene-finding method recognized as an efficient and accurate tool for genome projects. It was used to annotate the first completely sequenced bacterium, Haemophilus influenzae, and the first completely sequenced archaeon, Methanococcus jannaschii.
versions available: 4.72

getorganelle
GetOrganelle is a toolkit that assembles organelle genomes from genome skimming data.
versions available: 1.7.7

gridlab-d
GridLAB-D is a new power distribution system simulation and analysis tool that provides valuable information to users who design and operate distribution systems, and to utilities that wish to take advantage of the latest energy technologies. It incorporates the most advanced modeling techniques, with high-performance algorithms to deliver the best in end-use modeling.
versions available: 5.2

gridpack
GridPACK is an open-source high-performance computing (HPC) package for simulating large-scale electrical grids. Powered by distributed (parallel) computing and high-performance numerical solvers, GridPACK offers several applications for fast simulation of electrical transmission systems.
versions available: 3.4

gromacs
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions, many groups are also using it for research on non-biological systems, e.g. polymers.
versions available: 2023.4, 2023.4-cuda, 2023.4-mpi, 2023.4-mpi-cuda, 2024.2, 2024.2-cuda, 2024.2-mpi, 2024.2-mpi-cuda

gtdbtk
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.
versions available: 2.4.0

guppy
Guppy is a data processing toolkit that contains the Oxford Nanopore Technologies’ basecalling algorithms, and several bioinformatic post-processing features. Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy.
versions available: 6.0.6, 6.0.6-cuda11.8, 6.3.4, 6.3.4-cuda11.8, 6.5.7, 6.5.7-cuda11.8

gurobi
Gurobi is a state-of-the-art optimization tool designed from the ground up to exploit modern architectures and multi-core processors, using the most advanced implementations of the latest optimization algorithms so you can solve your models faster and more reliably.
versions available: 10.0.3, 11.0.3

helics
HELICS is a multi-language, cross-platform library that enables different simulators to easily exchange data and stay synchronized in time. It scales from two simulators on a laptop to more than 100,000 running on supercomputers, in the cloud, or across a mix of these platforms.
versions available: 3.5.3

hhsuite
The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs). It contains HHsearch and HHblits among other programs and utilities. HHsearch takes as input a multiple sequence alignment (MSA) or profile HMM and searches a database of HMMs (e.g. PDB, Pfam, or InterPro) for homologous proteins.
versions available: 3.3.0

hicexplorer
HiCExplorer facilitates the creation of contact matrices, correction of contacts, TAD detection, A/B compartments, merging, reordering of chromosomes, conversion between different formats (including cooler), and detection of long-range contacts. It also allows visualization of multiple contact matrices along with other types of data such as genes, compartments, ChIP-seq coverage tracks (and, in general, any type of genomic score), long-range contacts, and viewpoints.
versions available: 3.7.3

hisat2
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population as well as against a single reference genome. Building on GCSA (an extension of BWT for graphs), HISAT2's developers designed and implemented a graph FM index (GFM), which to the best of their knowledge is an original approach and its first implementation.
versions available: 2.2.0, 2.2.1

hmmer
HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
versions available: 3.3.2, 3.4

homopolish
Homopolish is a genome polisher originally developed for Nanopore and later extended to PacBio CLR. It generates high-quality genomes (>Q50) for viruses, bacteria, and fungi. Nanopore/PacBio systematic errors are corrected by retrieving homologs from closely related genomes and polishing with a support vector machine (SVM).
versions available: 0.4.1

humann
HUMAnN is the HMP Unified Metabolic Analysis Network. HUMAnN is a method for efficiently and accurately profiling the abundance of microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic sequencing data.
versions available: 3.8

hyphy
HyPhy (Hypothesis Testing using Phylogenies) is an open-source software package for the analysis of genetic sequences (in particular the inference of natural selection) using techniques in phylogenetics, molecular evolution, and machine learning. It features a rich scripting language for limitless customization of analyses.
versions available: 2.5.67

igdetective
IgDetective is a tool for annotating variable (V), diversity (D), and joining (J) immunoglobulin (IG) genes in genomes. IgDetective takes a genome in FASTA format as input and operates in three stages: 1) identifying contigs containing IG gene matches (performed using minimap2, this step usually takes several minutes); 2) detecting putative gene candidates using RSS matches; and 3) refining gene candidates using the iterative mode.
versions available: 1.1.0

imagej
ImageJ is a public domain Java image processing program inspired by NIH Image for the Macintosh. ImageJ can display, edit, analyze, process, save and print 8-bit, 16-bit and 32-bit images. ImageJ can read many image formats including TIFF, GIF, JPEG, BMP, DICOM, FITS and ‘raw’, and supports ‘stacks’ (a series of images that share a single window). ImageJ is multithreaded, so time-consuming operations such as image file reading can be performed in parallel with other operations.
versions available: 1.54

imagemagick
ImageMagick is a free, open-source software suite, used for editing and manipulating digital images. It can be used to create, edit, compose, or convert bitmap images, and supports a wide range of file formats, including JPEG, PNG, GIF, TIFF, and Ultra HDR.
versions available: 7.1.0

interproscan
InterProScan scans protein sequences against the signatures of InterPro, a database that integrates predictive information about protein function from a number of partner resources, giving an overview of the families a protein belongs to and the domains and sites it contains.
versions available: 5.60-92.0, 5.67-99.0, 5.72-103.0

iphop
iPHoP stands for integrated Phage Host Prediction. It is an automated command-line pipeline for predicting host genus of novel bacteriophages and archaeoviruses based on their genome sequences.
versions available: 1.3.3

iqtree
IQ-TREE implements a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihoods, with similar computing time.
versions available: 1.6.12, 2.1.2, 2.3.4

i-tasser
I-TASSER is an integrated package for protein structure and function predictions. For a given sequence, I-TASSER first identifies template proteins from the Protein Data Bank (PDB) by multiple threading techniques (LOMETS).
versions available: 5.1, 5.2

jags
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind: 1) To have a cross-platform engine for the BUGS language, 2) To be extensible, allowing users to write their own functions, distributions and samplers, and 3) To be a platform for experimentation with ideas in Bayesian modelling.
versions available: 4.3.1, 4.3.2

jellyfish
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the ‘compare-and-swap’ CPU instruction to increase parallelism.
versions available: 2.3.0, 2.3.1
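As a rough illustration of what k-mer counting means, the Python sketch below counts every length-k substring of a DNA sequence with a dictionary-based counter, optionally merging each k-mer with its reverse complement (canonical k-mers, which JELLYFISH can count with its `-C` option). It does not reproduce JELLYFISH's compare-and-swap hash table or its memory efficiency; the function name is hypothetical.

```python
from collections import Counter

VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_kmers(seq: str, k: int, canonical: bool = False) -> Counter:
    """Count all length-k substrings of seq (hypothetical helper).

    k-mers containing anything other than A/C/G/T (e.g. N) are skipped.
    If canonical is True, a k-mer and its reverse complement are counted
    together under whichever of the two sorts first lexicographically.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - VALID:  # skip ambiguous bases
            continue
        if canonical:
            revcomp = kmer.translate(COMPLEMENT)[::-1]
            kmer = min(kmer, revcomp)
        counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 3))
    print(count_kmers("ACGTACGT", 3, canonical=True))
```

For example, "ACGTACGT" with k = 3 yields ACG and CGT twice each and GTA and TAC once each. With canonical counting, ACG and CGT (reverse complements of each other) collapse to ACG with a count of 4, and GTA and TAC collapse to GTA with a count of 2.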

kraken2
Kraken 2 is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
versions available: 2.1.2, 2.1.3

lammps
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Packages built: ASPHERE ATC AWPMD BOCS BODY CLASS2 COLLOID COLVARS COMPRESS CORESHELL DIFFRACTION DIPOLE DRUDE EFF EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX FEP GRANULAR H5MD KIM KSPACE MACHDYN MANIFOLD MANYBODY MC MEAM MGPT MISC MOFFF MOLECULE MOLFILE MPIIO OPT PERI PHONON POEMS PTM PYTHON QEQ QTB REAXFF REPLICA RIGID SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI
versions available: 02Aug23-atomistica, 02Aug23-cuda, 02Aug23-mpi, 02Aug23-plumed, 23Jun22-cuda, 23Jun22-mpi

lastz
LASTZ: A tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically.
versions available: 1.04.22

libradtran
libRadtran – library for radiative transfer – is a collection of C and Fortran functions and programs for calculation of solar and thermal radiation in the Earth’s atmosphere.
versions available: 2.0.6

liggghts
LIGGGHTS(R)-PUBLIC is open-source discrete element method (DEM) particle simulation software based on LAMMPS. LIGGGHTS(R) stands for "LAMMPS improved for general granular and granular heat transfer simulations" and aims to extend the capabilities of LAMMPS for industrial applications.
versions available: 3.8.0

longstitch
LongStitch is a genome assembly correction and scaffolding pipeline using long reads. It consists of up to three steps: 1) Tigmint cuts the draft assembly at potentially misassembled regions; 2) ntLink scaffolds the corrected assembly; and 3) optionally, ARKS performs further scaffolding.
versions available: 1.0.5

ls-dyna
LS-DYNA is a general-purpose finite element program capable of simulating complex real world problems. The code’s origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.
versions available: 12.0.0, 13.0.0, 15.0.2

mafft
MAFFT is a multiple sequence alignment program for amino acid or nucleotide sequences. It offers a range of alignment methods, such as L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼30,000 sequences).
versions available: 7.487woe, 7.525woe

maker
MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.
versions available: 3.01.03

masurca
The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit consists of the MaSuRCA genome assembler, the QuORUM error corrector for Illumina data, POLCA genome polishing software, a chromosome scaffolder, the jellyfish mer counter, and the MUMmer aligner. The MaSuRCA assembler combines the benefits of de Bruijn graph and Overlap-Layout-Consensus assembly approaches, and it supports hybrid assembly with short Illumina reads and long, high-error PacBio/MinION data.
versions available: 4.1.0, 4.1.1

mathematica
Mathematica is a technical computing system for symbolic and numerical computation, visualization, and programming. It is well suited to communicating scientific ideas, whether visualizing a concept in an introductory course or building a simulation of a new research idea.
versions available: 13.3.0, 14.1.0

matlab
MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming.
versions available: R2022b, R2023b, R2024b

mcr
The MATLAB Compiler Runtime is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed. When used together, MATLAB, MATLAB Compiler, and the MATLAB Runtime enable you to create and distribute numerical applications or software components quickly and securely.
versions available: R2018a, R2018b, R2019b

mega
The objective of the MEGA (Molecular Evolutionary Genetics Analysis) software has been to provide tools for exploring, discovering, and analyzing DNA and protein sequences from an evolutionary perspective. MEGA is designed to facilitate extensive sequence data analysis within a single program package. Early versions of MEGA consciously avoided overlap with other evolutionary analysis programs, excluding the maximum likelihood method (PHYLIP) and extensive options for the maximum parsimony method (PAUP and MacClade); current versions also include maximum likelihood methods.
versions available: 10.2.6, 11.0.13

megadock
MEGADOCK is ultra-high-performance, FFT-grid-based protein-protein docking software for heterogeneous supercomputers that takes advantage of the massively parallel CUDA architecture of NVIDIA GPUs and multiple computation nodes.
versions available: 4.1.1, 4.1.1-mpi

megahit
MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly.
versions available: 1.2.9

megalodon
Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.
versions available: 2.5.0

meme
MEME (Multiple Em for Motif Elicitation) discovers novel, ungapped motifs (recurring, fixed-length patterns) in your sequences. MEME splits variable-length patterns into two or more separate motifs.
versions available: 5.4.1, 5.5.6

mercat2
MerCat2 (‘Mer—Catenate2’) is a versatile, parallel, scalable and modular property software package for robustly analyzing features in omics data. Using massively parallel sequencing raw reads, assembled contigs, and protein sequences from any platform as input, MerCat2 performs k-mer counting of any length k, resulting in feature abundance counts tables, quality control reports, protein feature metrics, and graphical representation (i.e. principal component analysis (PCA)).
versions available: 1.4.1

merqury
Merqury evaluates genome assemblies using k-mers and more. Genome assembly projects often have Illumina whole-genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used to evaluate assembly quality independently, without the need for a high-quality reference. Merqury provides a set of tools for this purpose.
versions available: 1.3
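As a worked illustration of reference-free quality estimation from k-mers, the sketch below computes a consensus quality value (QV) from the count of assembly k-mers that are absent from the reads, using the relationship described in the Merqury publication: the per-base error rate is E = 1 - (1 - K_asm/K_total)^(1/k), and QV = -10 log10(E). Treat it as an assumption-labeled sketch rather than Merqury's code; `kmer_qv` is a hypothetical name, and Merqury itself obtains these counts from Meryl k-mer databases.

```python
import math


def kmer_qv(asm_only_kmers, total_asm_kmers, k):
    """Consensus quality value (QV) from assembly k-mers absent from the reads.

    asm_only_kmers: number of assembly k-mers not found in the read set.
    total_asm_kmers: total number of k-mers in the assembly.
    """
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("require total_asm_kmers > 0 and 0 <= asm_only_kmers <= total")
    if asm_only_kmers == 0:
        return math.inf  # no error k-mers observed, so no finite estimate
    # Probability that a base is correct, treating each of the k bases in a
    # k-mer as an independent chance to introduce an error.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)
```

For example, 1,000 assembly-only 21-mers out of 10^9 assembly k-mers gives a QV of roughly 73.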

metabat
MetaBAT: A robust statistical framework for reconstructing genomes from metagenomic data
versions available: 2.15, 2.17

metabolic
METABOLIC enables the prediction of metabolic and biogeochemical functional trait profiles for any given genome dataset. METABOLIC has two main implementations, METABOLIC-G and METABOLIC-C. METABOLIC-G.pl generates metabolic profiles and biogeochemical cycling diagrams of input genomes and does not require sequencing reads as input. METABOLIC-C.pl generates the same output as METABOLIC-G.pl, but because it also accepts metagenomic read data, it additionally generates information about community metabolism.
versions available: 4.0

metacerberus
MetaCerberus is a massively parallel, fast, low-memory, scalable annotation tool for inferring gene function across genomes to metagenomes and metatranscriptomes. It provides HMM/HMMER-based annotation at a rapid scale and with lower memory use than DRAM, eggNOG-mapper, and MicrobeAnnotator. It offers scalable gene elucidation against major public databases, including KEGG (KO), COGs, CAZy, FOAM, TIGRfam, Pfam, NFixDB, and specific databases for viruses (e.g., VOGs and PHROGs), with custom database options, from single genomes to metacommunities.
versions available: 1.4.0

metaphlan
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e., not 16S) at species-level resolution. With the StrainPhlAn module, it is also possible to perform accurate strain-level microbial profiling.
versions available: 4.1.0

metawrap
MetaWRAP aims to be an easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish: read quality control, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation.
versions available: 1.3.2

microbeannotator
MicrobeAnnotator uses an iterative approach to annotate microbial genomes (Bacteria, Archaea and Viruses) starting from proteins predicted using your favorite ORF prediction tool, e.g. Prodigal. The iterative approach is composed of three or five main steps, depending on the flavor of MicrobeAnnotator you run.
versions available: 2.0.5

minialign

versions available: 0.6.0

miniasm
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings, typically produced by minimap, as input and outputs an assembly graph in the GFA format.
versions available: 0.3

minimap2
Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.
versions available: 2.18, 2.28

mira
MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent, and PacBio data (for the latter, currently only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly, developed and used over the past 16 years to get assembly jobs done efficiently and, especially, accurately.
versions available: 5rc2

mitobim
The MITObim procedure (mitochondrial baiting and iterative mapping) is a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from NGS reads derived from total genomic DNA. Labor-intensive long-range PCR steps prior to sequencing are no longer required.
versions available: 1.9.1

mpp-dyna
LS-DYNA is a general-purpose finite element program capable of simulating complex real-world problems. The code’s origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.
versions available: 12.0.0, 12.0.0-avx512, 13.0.0, 13.0.0-avx512, 15.0.2, 15.0.2-avx512

mrbayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
versions available: 3.2.7
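MrBayes samples jointly over tree topologies, branch lengths, and substitution-model parameters, which is far beyond a short example. The minimal sketch below shows only the core Metropolis acceptance rule of MCMC on a one-parameter toy model (the heads probability of a coin under a uniform prior), chosen because its exact posterior, Beta(h + 1, n - h + 1), lets you check the sampler. The function names are hypothetical and nothing here mirrors MrBayes internals.

```python
import math
import random


def log_posterior(p, heads, tosses):
    """Unnormalized log posterior of a coin's heads probability p (uniform prior)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return heads * math.log(p) + (tosses - heads) * math.log(1.0 - p)


def metropolis(heads, tosses, n_samples=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler; returns n_samples draws of p after burn-in."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, heads, tosses)
    burn_in = n_samples // 10
    samples = []
    for i in range(burn_in + n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior ratio); the symmetric
        # Gaussian proposal means no Hastings correction is needed.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


# Usage: 7 heads in 10 tosses; the exact posterior is Beta(8, 4) with mean 8/12.
# draws = metropolis(7, 10)
# print(sum(draws) / len(draws))
```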

msprime
msprime is a population genetics simulator of ancestry and DNA sequence evolution based on tskit. msprime can simulate ancestral histories for a sample of individuals, consistent with a given demography under a range of different models and evolutionary processes. It can also simulate mutations on a given ancestral history (which can be produced by msprime ancestry simulations or other programs supporting tskit) under a variety of different models of genome sequence evolution.
versions available: 1.3.1

multiqc
MultiQC is a tool to create a single report with interactive plots that aggregates results from multiple bioinformatics analyses across many samples. MultiQC searches a given directory for analysis logs and compiles them into an HTML report. It’s a general-use tool, perfect for summarising the output from numerous bioinformatics tools.
versions available: 1.21

mummer
MUMmer is a system for rapidly aligning entire genomes. The current version (release 4.x) can find all 20-base-pair maximal exact matches between two bacterial genomes of ~5 million base pairs each in 20 seconds. MUMmer can also align incomplete genomes; it handles the hundreds or thousands of contigs from a shotgun sequencing project with ease and will align them to another set of contigs or a genome using the nucmer utility included with the system.
versions available: 4.0.1

namd
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations.
versions available: 2.13-mcore, 2.13-mcore-cuda, 2.13-mpi, 2.14-mcore, 2.14-mcore-cuda, 2.14-mpi, 3.0b6-mcore, 3.0b6-mcore-cuda, 3.0b6-mpi

nanophase
nanophase is an easy-to-use pipeline to generate reference-quality MAGs using only Nanopore long reads (long-read-only strategy) or both Nanopore long and Illumina short reads (hybrid strategy) from complex metagenomes. Since nanophase v0.2.0, it also supports generating reference-quality genomes from bacterial/archaeal isolates (long-read-only or hybrid strategy). If nanophase is interrupted, it resumes from the last completed stage.
versions available: 0.2.3

nanopolish
Software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
versions available: 0.14.0

netlogo
NetLogo is a programmable modeling environment for simulating natural and social phenomena. NetLogo is particularly well suited for modeling complex systems developing over time.
versions available: 6.2.2, 6.4.0

nextstrain
Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. The project provides a continually updated view of publicly available data alongside powerful analytic and visualization tools for use by the community. Its goal is to aid epidemiological understanding and improve outbreak response.
versions available: 8.2.0

nf-core
nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow, an incredibly powerful and flexible workflow language. Nextflow lets you run nf-core pipelines on virtually any computing environment. nf-core pipelines adhere to strict guidelines, so if one works, they all will.
versions available: 2.13.1, 3.3.2

node.js
Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. As an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications.
versions available: 18.20.2, 20.12.2

openbabel
Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
versions available: 3.1.1

openfoam
OpenFOAM is the free, open source CFD software released and developed primarily by the OpenFOAM Foundation. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to acoustics, solid mechanics and electromagnetics. We offer versions from both OpenCFD Ltd and the OpenFOAM Foundation.
versions available: 11, 12

openmm
OpenMM is a toolkit for molecular simulation. It can be used either as a stand-alone application for running simulations, or as a library you call from your own code. It provides a combination of extreme flexibility (through custom forces and integrators), openness, and high performance (especially on recent GPUs) that make it truly unique among simulation codes.
versions available: 8.1.2-cuda11.8

orp
The Oyster River Protocol for (eukaryotic) transcriptome assembly is an actively developed, evidence-based method for optimizing transcriptome assembly. The protocol assembles the transcriptome using a multi-kmer, multi-assembler approach, then merges those assemblies into a single final assembly. Version 2.3.3u1 is an update to ORP 2.3.3 based on Anaconda3-2023.03-1, with the following updated components: trinity 2.15.1, salmon 1.10.1, spades 3.15.5, busco 5.1.3, rcorrector 1.0.5, samtools 1.17, cd-hit 4.8.1, diamond 2.1.6.
versions available: 2.3.3u1

orthofinder
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics. It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups, and identifies all of the gene duplication events in those gene trees.
versions available: 2.4.0, 2.5.5

pacbio
The PacBio tools distributed via Bioconda are pre-release versions, not necessarily ISO compliant, intended for Research Use Only and not for use in diagnostic procedures. This module includes several of the PacBio open source tools, including blasr, python-consensuscore, genomicconsensus, bam2fastx, bax2bam, isoseq, jasmine, recalladapters, trgt, isoseq3, lima, perl-yaml, pbmm2, pbpigeon, pbskera, pbsv, pbtk, pb-falcon, pb-dazzler, pb-assembly, pbccs, pbcore, pbcommand, pbcoretools, pbalign, pbbam, pbaa, pbcopper, pbfusion, pbmarkdup
versions available: 2024.05

packmol
PACKMOL creates an initial point for molecular dynamics simulations by packing molecules in defined regions of space. The packing guarantees that short range repulsive interactions do not disrupt the simulations. The great variety of types of spatial constraints that can be attributed to the molecules, or atoms within the molecules, makes it easy to create ordered systems, such as lamellar, spherical or tubular lipid layers.
versions available: 20.15.1

pairtools
pairtools is a simple and fast command-line framework to process sequencing data from a Hi-C experiment. pairtools processes paired-end sequence alignments and performs the following operations: detecting ligation junctions (a.k.a. Hi-C pairs) in aligned paired-end sequences of Hi-C DNA molecules; sorting .pairs files for downstream analyses; detecting, tagging, and removing PCR/optical duplicates; generating extensive statistics of Hi-C datasets; selecting Hi-C pairs by flexibly defined criteria; and restoring .sam alignments from Hi-C pairs.
versions available: 1.0.3

paml
PAML (Phylogenetic Analysis by Maximum Likelihood) is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. The programs are written in ANSI C.
versions available: 4.10.7

pangolin
Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences.
versions available: 4

parallel
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input.
versions available: 20240422

paraview
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques.
versions available: 5.12.1-mpi, 5.12.1-osmesa-mpi

parcels
The OceanParcels project develops Parcels (Probably A Really Computationally Efficient Lagrangian Simulator), a set of Python classes and methods to create customisable particle tracking simulations using output from Ocean Circulation models. Parcels can be used to track passive and active particulates such as water, plankton, plastic and fish.
versions available: 2.1.5, 3.0.3

pbsuite
Software for Long-Read Sequencing Data from PacBio. PBSuite is made up of two tools: PBJelly and PBHoney. PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp).
versions available: 15.8.24

pcangsd
PCAngsd is a framework for analyzing low-depth next-generation sequencing (NGS) data in heterogeneous/structured populations using principal component analysis (PCA). Population structure is inferred by estimating individual allele frequencies in an iterative approach using a truncated SVD model. The covariance matrix is estimated using the estimated individual allele frequencies as prior information for the unobserved genotypes in low-depth NGS data.
versions available: 1.2

phyg
Phylogenetic Graph (PhyG) is a multi-platform program designed to produce phylogenetic graphs from input data and graphs via heuristic searching of general phylogenetic graph space. The bio-informatics framework libraries of the broader Phylogenetic Haskell Analytic Network Engine (PHANE) project are the foundation upon which PhyG is constructed. PhyG offers vast functionality, including the optimization of unaligned sequences, and the ability to implement search strategies such as random addition sequence, swapping, and tree fusing.
versions available: 1.3.0

phylophlan
PhyloPhlAn is an integrated pipeline for large-scale phylogenetic profiling of genomes and metagenomes. PhyloPhlAn is an accurate, rapid, and easy-to-use method for large-scale microbial genome characterization and phylogenetic analysis at multiple levels of resolution. PhyloPhlAn can assign both genomes and metagenome-assembled genomes (MAGs) to species-level genome bins (SGBs). PhyloPhlAn can reconstruct strain-level phylogenies using clade-specific maximally informative phylogenetic markers, and can also scale to very-large phylogenies comprising >17,000 microbial species.
versions available: 3.1.1

picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library to support accessing file formats that are commonly used for high-throughput sequencing data, such as SAM and VCF.
versions available: 2.27.5, 3.2.0

plink
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
versions available: 1.90, 2.00

plumed
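PLUMED is an open-source, community-developed library that provides enhanced-sampling algorithms, free-energy methods, and analysis tools for molecular dynamics simulations, and it works with many MD engines. It can also be used as a command-line tool to perform analysis on trajectories saved in most existing formats.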
versions available: 2.8.4, 2.9.2

poy
POY is a phylogenetic analysis program that supports multiple kinds of data (e.g., morphology, nucleotides, genes and gene regions, chromosomes, whole genomes). POY is distinctive in that it can perform true alignment and phylogeny inference (i.e., input sequences need not be prealigned).
versions available: 5.1.2

prokka
Prokka performs rapid prokaryotic genome annotation. Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.
versions available: 1.14.6

prosplign
This module includes Splign. ProSplign is a global alignment tool developed by Dr. Boris Kiryutin. It produces accurate spliced alignments and computes alignments of distantly related proteins with low similarity. Extra effort is taken to locate frameshift positions. Splign is a utility for computing cDNA-to-genomic, or spliced, sequence alignments; it uses a compartmentization algorithm to identify possible gene duplications and a refined alignment algorithm that recognizes introns and splice signals.
versions available: 2.0.0

psmc
Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model
versions available: 0.6.5

purge_haplotigs
A simple pipeline for reassigning primary contigs that should be labeled as haplotigs. Purge Haplotigs helps with curating heterozygous diploid genome assemblies from third-gen long-read sequencing.
versions available: 1.1.3

pytorch
PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autodiff system. The 2.3.0 build uses CUDA Toolkit 12.1 and the 1.13.1 build uses CUDA 11.7.
versions available: 1.13.1-cuda11.7, 2.3.0-cuda12.1

pytransaln
Pytransaln can perform simple translation-guided nucleotide (codon) alignments and screen for pseudogenes with frameshift indels or nonsense substitutions. Pytransaln can be used to perform alignment or simply report sequence statistics and flag potential pseudogenes. The intended use case is to screen and align collections of PCR-amplified coding sequences used for metabarcoding, e.g. the mitochondrial cytochrome c oxidase subunit I (mtCOI) gene fragment.
versions available: 0.2.1

qe
Quantum Espresso (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
versions available: 7.1-intel-mpi, 7.3-intel-mpi

qiime2
QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
versions available: amplicon-2024.10, amplicon-2024.2, metagenome-2024.10, pathogenome-2024.10, tiny-2024.10

quast
QUAST (Quality Assessment Tool for Genome Assemblies) evaluates genome/metagenome assemblies by computing various metrics. The current QUAST toolkit includes the general QUAST tool for genome assemblies; MetaQUAST, the extension for metagenomic datasets; QUAST-LG, the extension for large genomes (e.g., mammalian genomes); and Icarus, the interactive visualizer for these tools.
versions available: 5.0.2

R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Labs, by John Chambers and colleagues. R can be considered as a different implementation of S.
versions available: 4.2.2, 4.2.2-mpi, 4.3.3, 4.3.3-mpi

racon
Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods which do not include a consensus step. The goal of Racon is to generate genomic consensus which is of similar or better quality compared to the output generated by assembly methods which employ both error correction and consensus steps, while providing a speedup of several times compared to those methods.
versions available: 1.4.21, 1.5.0

ragtag
RagTag, the successor to RaGOO, is a command line tool for reference-guided genome assembly improvement. Currently, the two main features are misassembly correction and scaffolding. After correction and/or scaffolding, RagTag also provides utilities to update annotations or work with AGP files.
versions available: 1.1.1, 2.1.0

raxml
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
versions available: 8.2.13, 8.2.13-mpi, 8.2.4, 8.2.4-mpi

reduce
Reduce is a tool for adding and correcting hydrogens in PDB files.
versions available: 4.14

repdenovo
REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities and can generate much longer repeats than existing tools.
versions available: 0.1.0

repeatmasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).
versions available: 4.1.2, 4.1.6
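The masking step described above (replacing annotated repeats with Ns by default) can be sketched as follows. This is a hypothetical helper, not part of RepeatMasker; it assumes 0-based, half-open intervals, so if your annotation uses 1-based, inclusive coordinates, subtract 1 from each start first.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of sequence with each [start, end) interval replaced by mask_char.

    Intervals use 0-based, half-open coordinates and may overlap.
    """
    masked = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"interval ({start}, {end}) is out of range")
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Usage: mask_intervals("ACGTACGTAC", [(2, 5), (4, 7)]) masks positions 2 through 6.
```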

repeatmodeler
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
versions available: 2.0.2, 2.0.5

repeatscout
The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase update) are not available.
versions available: 1.0.6

rmblast
RMBlast is a RepeatMasker compatible version of the standard NCBI blastn program. The primary difference between this distribution and the NCBI distribution is the addition of a new program ‘rmblastn’ for use with RepeatMasker and RepeatModeler.
versions available: 2.11.0, 2.14.1

rnaquast
rnaQUAST is a tool for evaluating RNA-Seq assemblies using reference genome and gene database. In addition, rnaQUAST is also capable of estimating gene database coverage by raw reads and de novo quality assessment using third-party software.
versions available: 2.3.0

rsem
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads, and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.
versions available: 1.3.3
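To illustrate how an EM algorithm apportions multi-mapping reads, here is a deliberately simplified sketch: each read is represented only by the set of transcripts it aligns to, and a read is assumed equally likely to start anywhere along its source transcript. RSEM's actual model is richer (fragment-length distributions, quality scores, RSPD, and more), and `em_read_fractions` is a hypothetical name.

```python
def em_read_fractions(reads, lengths, n_iter=200):
    """EM estimate of the fraction of reads that originate from each transcript.

    reads: non-empty list of sets; each set names the transcripts a read aligns to.
    lengths: dict mapping each transcript name to its effective length.
    """
    if not reads:
        raise ValueError("need at least one read")
    transcripts = sorted(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: share each read among its compatible transcripts in
        # proportion to theta_t / length_t (a read start is equally likely
        # anywhere along the transcript it came from).
        for compatible in reads:
            weights = {t: theta[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: updated read fractions are the normalized expected counts.
        theta = {t: expected[t] / len(reads) for t in transcripts}
    return theta
```

With three reads unique to transcript A, one unique to B, and four shared between them (equal lengths), the fixed point satisfies theta_A = (3 + 4 theta_A) / 8, so A receives a read fraction of 3/4.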

rseqc
The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias, while RNA-seq-specific modules evaluate sequencing saturation, mapped read distribution, coverage uniformity, strand specificity, transcript-level RNA integrity, and more.
versions available: 5.0.3

rstudio
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
versions available: 2024.04

samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data, allowing you to read/write/edit/index/view SAM/BAM/CRAM format. This module includes BCFtools, which is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.
versions available: 1.11, 1.19

sas
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics.
versions available: 9.4

selscan
Selscan is a program to calculate EHH-based scans for positive selection in genomes. selscan currently implements EHH, iHS, XP-EHH, nSL, XP-nSL and iHH12. It should be run separately for each chromosome and population (or population pair for XP-EHH). selscan is ‘dumb’ with respect to ancestral/derived coding and simply expects haplotype data to be coded 0/1. Unstandardized iHS/nSL scores are thus reported as log(iHH1/iHH0) based on the coding you have provided.
versions available: 1.3.0, 2.0.0
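The chain from EHH to iHH to an unstandardized iHS score can be sketched in a few lines. This is a simplified illustration under stated assumptions, not selscan's implementation: haplotypes are strings of '0'/'1' alleles; EHH for a core allele is the probability that two random carriers are identical across the span; iHH is the trapezoidal integral of EHH outward in both directions until EHH falls below a cutoff (0.05 is assumed here); and the score is the natural log of iHH1/iHH0. Function names are hypothetical, the score is undefined if either iHH is zero, and selscan's handling of cutoffs, map distances, and missing data differs.

```python
import math
from collections import Counter


def ehh(haps, allele, core, end):
    """EHH for carriers of `allele` at site `core`, over sites core..end inclusive."""
    carriers = [h for h in haps if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the allele")
    lo, hi = min(core, end), max(core, end)
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two randomly chosen carriers are identical over the span.
    return sum(c * (c - 1) / 2 for c in groups.values()) / (n * (n - 1) / 2)


def ihh(haps, positions, allele, core, cutoff=0.05):
    """Integrate EHH with the trapezoid rule away from the core in both directions."""
    total = 0.0
    for step in (1, -1):
        prev_ehh, j = 1.0, core  # EHH is 1 at the core site itself
        while 0 <= j + step < len(positions):
            nxt = j + step
            cur_ehh = ehh(haps, allele, core, nxt)
            if cur_ehh < cutoff:
                break
            total += (prev_ehh + cur_ehh) / 2 * abs(positions[nxt] - positions[j])
            prev_ehh, j = cur_ehh, nxt
    return total


def unstandardized_ihs(haps, positions, core):
    """Natural log of iHH1 / iHH0 for alleles coded '1' and '0'."""
    return math.log(ihh(haps, positions, "1", core) / ihh(haps, positions, "0", core))
```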

seqkit
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
versions available: 0.16.1, 2.9.0

seqtk
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
versions available: 1.3, 1.4
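The kind of parsing seqtk performs, accepting FASTA or FASTQ and transparently handling gzip compression, can be sketched in Python as below. This is not seqtk's code or interface; `open_maybe_gzip` and `parse_records` are hypothetical helpers, and the FASTQ branch assumes the common four-line record layout with non-empty sequences.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing on the fly if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_records(handle):
    """Yield (header, sequence, quality) tuples from FASTA or FASTQ lines.

    `handle` is any iterable of text lines (an open file or a list of strings).
    quality is None for FASTA records.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    lines = (line for line in lines if line)  # ignore blank lines
    first = next(lines, None)
    if first is None:
        return  # empty input
    if first.startswith(">"):
        # FASTA: the sequence may wrap over several lines until the next '>' header.
        header, chunks = first[1:], []
        for line in lines:
            if line.startswith(">"):
                yield header, "".join(chunks), None
                header, chunks = line[1:], []
            else:
                chunks.append(line)
        yield header, "".join(chunks), None
    elif first.startswith("@"):
        # FASTQ: header, sequence, '+' separator, quality. Records are read by
        # position because quality strings may themselves start with '@'.
        header = first
        while header is not None:
            if not header.startswith("@"):
                raise ValueError(f"expected a FASTQ header, got {header!r}")
            seq = next(lines, None)
            plus = next(lines, None)
            qual = next(lines, None)
            if qual is None or not plus.startswith("+") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record {header!r}")
            yield header[1:], seq, qual
            header = next(lines, None)
    else:
        raise ValueError("input does not look like FASTA or FASTQ")


# Example (hypothetical file name; plain or gzip-compressed input both work):
# with open_maybe_gzip("reads.fq.gz") as fh:
#     for header, seq, qual in parse_records(fh):
#         print(header, len(seq))
```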

shortbred
ShortBRED (Short, Better Representative Extract Dataset) is a pipeline to take a set of protein sequences, reduce them to a set of unique identifying strings (‘markers’), and then search for these markers in metagenomic data and determine the presence and abundance of the protein families of interest.
versions available: 0.9.5

siesta
SIESTA, a first-principles materials simulation code using DFT, is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.
versions available: 4.0.2, 4.1.5

singularity
Singularity is a free, cross-platform and open-source computer program that performs operating-system-level virtualization also known as containerization. One of the main uses of Singularity is to bring containers and reproducibility to scientific computing and the high-performance computing (HPC) world. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.
versions available: 4.1.4

slim
SLiM is an evolutionary simulation framework that combines a powerful engine for population genetic simulations with the capability of modeling arbitrarily complex evolutionary scenarios. Simulations are configured via the integrated Eidos scripting language, which allows interactive control over practically every aspect of the simulated evolutionary scenarios.
versions available: 3.7, 4.2.2

snakemake
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
versions available: 7.20.0, 8.5.3

snp-pipeline
The CFSAN SNP Pipeline is a Python-based system for the production of SNP matrices from sequence data used in the phylogenetic analysis of pathogenic organisms sequenced from samples of interest to food safety. The SNP Pipeline was developed by the United States Food and Drug Administration, Center for Food Safety and Applied Nutrition.
versions available: 2.2.1

spades
SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. One can also provide additional contigs that will be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads.
versions available: 3.15.5, 4.0.0

sqanti3
SQANTI3 (Structural and Quality Annotation of Novel Transcript Isoforms) is a tool for the quality control, curation, and annotation of long-read transcriptomes. It compares transcript models with a reference annotation, classifies them into structural categories, and computes quality descriptors that help distinguish genuine isoforms from artifacts.
versions available: 5.2.2

sra-tools
The Sequence Read Archive (SRA) stores raw sequence data from ‘next-generation’ sequencing technologies including Illumina, 454, IonTorrent, Complete Genomics, PacBio and Oxford Nanopore. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Includes NCBI VDB and NGS SDK.
versions available: 2.11.0, 3.1.0

stacks
Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
versions available: 2.59, 2.68

star
Spliced Transcripts Alignment to a Reference (STAR) is an ultrafast universal RNA-seq aligner, which was developed to align a large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset.
versions available: 2.7.11b, 2.7.9a

staramr
staramr (*AMR) scans bacterial genome contigs against the ResFinder, PointFinder, and PlasmidFinder databases (used by the ResFinder webservice and other webservices offered by the Center for Genomic Epidemiology) and compiles a summary report of detected antimicrobial resistance genes. The star (*) in staramr indicates that it can handle all of the ResFinder, PointFinder, and PlasmidFinder databases.
versions available: 0.11.0

starccm
STAR-CCM+ is much more than just a CFD solver; it is an entire engineering process for solving problems involving flow (of fluids or solids), heat transfer, and stress.
versions available: 2022.1, 2310, 2406

star-fusion
STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
versions available: 1.11.1, 1.13.0

syri
Synteny and Rearrangement Identifier (SyRI). SyRI is a comprehensive tool for predicting genomic differences between related genomes using whole-genome assemblies (WGA). The assemblies are aligned using whole-genome alignment tools, and these alignments are then used as input to SyRI.
versions available: 1.6.3

tensorflow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
versions available: 2.11.1-cuda11.2, 2.16.1-cuda12.5

tophat
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
versions available: 2.1.1

trand
TranD is a collection of tools to facilitate metrics of structural variation for whole genome transcript annotation files (GTF) that pinpoint structural variation to the nucleotide level. TranD (Transcript Distances) can be used to calculate metrics of structural variation within and between annotation files (GTF).
versions available: 23.5.30

transdecoder
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.
versions available: 5.7.1

treetime
TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.
versions available: 0.11.2
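A common first check for a molecular-clock signal is a root-to-tip regression. Under a strict clock, each tip's distance from the root grows linearly with its sampling time. The slope then estimates the substitution rate, and the date where the fitted line reaches zero distance estimates the date of the root. The sketch below is a plain least-squares version of that idea, assuming distances in substitutions per site and dates in years. It is not TreeTime's implementation, which goes further (for example maximum-likelihood dating and rerooting). The function name and the data are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope estimates the substitution rate, and the
    date at which the fitted line reaches zero distance estimates the root's date.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    var = sum((t - mean_t) ** 2 for t in dates)
    rate = cov / var
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Hypothetical tips sampled 2000-2015 with root-to-tip distances in substitutions per site.
rate, root_date = root_to_tip_regression([2000, 2005, 2010, 2015], [0.01, 0.02, 0.03, 0.04])
print(f"rate={rate:.4f} root_date={root_date:.0f}")  # rate=0.0020 root_date=1995
```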

trimmomatic
Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application.
versions available: 0.38, 0.39
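Quality trimming in Trimmomatic is typically done with its sliding-window step (SLIDINGWINDOW), which scans a read and cuts it once the average quality within a window falls below a threshold. The sketch below is a simplified Python version of that idea, assuming Phred+33 quality encoding. It cuts at the start of the first failing window. Trimmomatic's exact cut-point rule and its command-line usage are not reproduced here, and the function name is hypothetical.

```python
from typing import Tuple


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_avg: float = 20.0, offset: int = 33) -> Tuple[str, str]:
    """Cut a read at the start of the first window whose mean Phred quality is below `min_avg`.

    `qual` uses Phred+33 encoding by default (quality = ord(char) - 33).
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual  # every window passed (or the read is shorter than one window)


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```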

trinity
Trinity assembles transcript sequences from Illumina RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
versions available: 2.14.0, 2.15.1

trycycler
Trycycler is a tool that takes as input multiple separate long-read assemblies of the same genome (e.g. from different assemblers or different read subsets) and produces a consensus long-read assembly.
versions available: 0.5.4

usearch
USEARCH is a sequence analysis tool with thousands of users worldwide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST, and it combines many different algorithms into a single package.
versions available: 10.0.240, 11.0.667

vamb
Vamb is a metagenomic binner that feeds sequence composition information from a FASTA file of contigs, and abundance information from, e.g., BAM files, into a variational autoencoder and then clusters the latent representation. It performs excellently with multiple samples and reasonably well on single-sample data.
versions available: 4.1.3
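Metagenomic binners commonly summarize a contig's sequence composition as tetranucleotide frequencies (TNF), the relative abundance of each 4-mer. The sketch below computes such a vector for one contig as an illustration of what "sequence composition" means in the description above. It is an assumption-level sketch, not Vamb's exact feature pipeline (which may, for example, merge reverse-complement k-mers or normalize differently), and the function name is hypothetical.

```python
from itertools import product
from typing import Dict


def tetranucleotide_frequencies(contig: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers (ACGT only) in `contig`."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other ambiguity codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


freqs = tetranucleotide_frequencies("ACGTACGT")
print(freqs["ACGT"], freqs["CGTA"])  # 0.4 0.2
```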

vcftools
VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
versions available: 0.1.16
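For readers new to the format VCFtools works on: each VCF data line has eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample columns. The minimal sketch below parses only the fixed columns of a single line and turns the semicolon-separated INFO field into a dictionary. It ignores header lines and sample columns, and the function name is hypothetical. For real analyses, VCFtools itself or a dedicated VCF library is the appropriate tool. The example record is modeled on the example in the VCF specification.

```python
from typing import Any, Dict


def parse_vcf_record(line: str) -> Dict[str, Any]:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_fields: Dict[str, Any] = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, sep, value = entry.partition("=")
        info_fields[key] = value if sep else True  # flag entries have no "=value"
    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based position
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),  # multiple alternate alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt,
        "INFO": info_fields,
    }


record = parse_vcf_record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2")
print(record["POS"], record["ALT"], record["INFO"]["DP"], record["INFO"]["DB"])
# 14370 ['A'] 14 True
```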

vdb
The vdb program is designed to query the SARS-CoV-2 mutational landscape. It runs as a command shell in a terminal and allows customized searches for mutation patterns over the entire SARS-CoV-2 genome dataset or subsets thereof. These pattern searches can target spike protein mutations or nucleotide mutations across the whole genome.
versions available: 2.7, 3.5

velvet
Velvet is a sequence assembler for very short reads.
versions available: 1.2.10

viennarna
The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
versions available: 2.6.4

visit
VisIt is an open source, interactive, scalable visualization, animation, and analysis tool. Users can interactively visualize and analyze data ranging in scale from small (<10 core) desktop-sized projects to large (>10,000 core) leadership-class computing facility simulation campaigns. Users can quickly generate visualizations, animate them through time, manipulate them with a variety of operators and mathematical expressions, and save the resulting images and animations for presentations.
versions available: 3.2.0, 3.4.1, 3.4.2

vsearch
VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, re-replication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
versions available: 2.29.2
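Two of the operations listed above, full-length dereplication and reverse complementation, are simple to state precisely. Dereplication collapses identical sequences into one representative with an abundance count. Optionally, it can treat a sequence and its reverse complement as the same. The pure-Python sketch below shows that logic. It is not VSEARCH's implementation or command-line interface, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A<->T, C<->G; N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def dereplicate(seqs: Iterable[str], both_strands: bool = False) -> List[Tuple[str, int]]:
    """Collapse identical full-length sequences into (sequence, abundance) pairs,
    most abundant first."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        if both_strands:
            # Use a canonical orientation so a read and its reverse complement collapse together.
            seq = min(seq, reverse_complement(seq))
        counts[seq] += 1
    return counts.most_common()


print(dereplicate(["AACC", "GGTT", "ACGA"], both_strands=True))  # [('AACC', 2), ('ACGA', 1)]
```

Picking the lexicographically smaller orientation is just one convenient canonical form. The representative reported may therefore be the reverse complement of the sequence actually observed.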

vtk
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, modeling, image processing, volume rendering, scientific visualization, and information visualization.
versions available: 9.3.0, 9.3.0-mpi

wengan
Wengan is a new, accurate, and ultra-fast genome assembler that, unlike most current long-read assemblers, entirely avoids all-vs-all read comparison. The key idea behind Wengan is that long-read alignments can be inferred by building paths on a sequence graph.
versions available: 0.2

wrf
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs.
versions available: 4.3.3, 4.3.3-mpi, 4.6.0, 4.6.0-mpi

wtdbg
wtdbg uses a fuzzy Bruijn graph (FBG) approach to assemble long noisy reads. It is designed to assemble huge genomes in very limited time, but it requires a powerful machine with multiple cores and a very large amount of RAM (1 TB+). wtdbg can assemble a 100X human PacBio dataset within one day.
versions available: 2.3, 2.5

Development

anaconda3
Anaconda is a distribution of Python for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. Package versions in Anaconda are managed by the package management system conda, which was spun out as a separate open-source package because it proved useful on its own and for things other than Python.
versions available: 2023.09

autoconf
Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features that the package can use, in the form of M4 macro calls.
versions available: 2.72

bazel
Bazel is Google’s own build tool. Bazel has built-in support for building both client and server software, and also provides an extensible framework that you can use to develop your own build rules.
versions available: 6.5.0, 7.1.1

cmake
CMake is a cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.
versions available: 3.29.0

cuda
The NVIDIA CUDA Toolkit provides a development environment for creating high performance GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers.
versions available: 11.8, 12.1, 12.4

dmd
DMD is the reference compiler for the D programming language. The D programming language has been said to be ‘what C++ wanted to be,’ which is a better C. D is developed with system level programming in mind, but brings to the table modern language design with a simple C-like syntax. For these reasons D makes for a good language choice for both performance code and application development.
versions available: 2.103.1, 2.108.0

gcc
The GNU Compiler Collection includes front ends for C, C++, Objective-C, and Fortran, as well as libraries for these languages (libstdc++, libgfortran, …).
versions available: 12.3.0, 13.2.0

ghc
The Glasgow Haskell Compiler (GHC) is a compiler for Haskell, a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic IO. The language is named after logician Haskell Curry.
versions available: 9.8.2

versions available: 1.21.8, 1.22.1

hpc-sdk
The NVIDIA HPC Software Development Kit (SDK) includes the proven compilers, libraries and software tools essential to maximizing developer productivity and the performance and portability of HPC applications. The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and CUDA. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming.
versions available: 21.3, 24.1

intel
Intel® oneAPI DPC++/C++ Compiler (module version: intel/2024): the Intel® oneAPI C/C++ and SYCL code compiler for CPUs, GPUs and FPGAs. Dependencies: tbb, compiler-rt, oclfpga. URL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html
versions available: compiler-rt, mkl, oclfpga, tbb, 2020, 2024

julia
Julia is a high-level, high-performance dynamic programming language for numerical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.
versions available: 1.10.2, 1.11.2

lua
Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.
versions available: 5.4.6

mambaforge
Mamba is a reimplementation of the conda package manager in C++, which uses libsolv for much faster dependency solving and allows parallel downloading of repository data and package files using multi-threading. Mamba utilizes the same command line parser, package installation and deinstallation code and transaction verification routines as conda to stay as compatible as possible.
versions available: 23.11

miniforge
Miniforge is a community-led alternative to the data science platforms Anaconda and Miniconda. Packages in the base environment are obtained from the conda-forge channel, which is the default. The term ‘conda’, which initially referred to a multi-platform package management tool, has since evolved to encompass an entire open-source packaging ecosystem and philosophy. This ecosystem is supported by many organizations who all share the common goal of providing easier access to programming tools and libraries.
versions available: 24.3

nasm
The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed for portability and modularity. It supports a range of object file formats, including Linux and *BSD a.out, ELF, COFF, Mach-O, 16-bit and 32-bit OBJ (OMF), Win32, and Win64.
versions available: 2.16

netbeans
NetBeans is a free, open source IDE that allows you to quickly and easily develop desktop, mobile and web applications with Java, HTML5, PHP, C/C++ and more.
versions available: 12.2

openjdk
OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GNU General Public License (GNU GPL) version 2.
versions available: 21, 22

powershell
PowerShell is a cross-platform task automation solution made up of a command-line shell, a scripting language, and a configuration management framework. PowerShell is a modern command shell that includes the best features of other popular shells. Unlike most shells that only accept and return text, PowerShell accepts and returns .NET objects.
versions available: 7.3.1

python
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together.
versions available: 2.7.18, 3.12.3

scala
The name Scala is short for ‘Scalable Language’. Scala is a pure-bred object-oriented language: conceptually, every value is an object and every operation is a method call. The language supports advanced component architectures through classes and traits.
versions available: 2.13.13, 3.4.1

swift
Swift is a general-purpose programming language built using a modern approach to safety, performance, and software design patterns. The goal of the Swift project is to create the best available language for uses ranging from systems programming, to mobile and desktop apps, scaling up to cloud services.
versions available: 5.10

upcxx
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming, and is designed to interoperate smoothly and efficiently with MPI, OpenMP, C++/POSIX threads, CUDA, ROCm/HIP, oneAPI and other HPC frameworks. It leverages GASNet-EX to deliver low-overhead, fine-grained communication, including Remote Memory Access (RMA) and Remote Procedure Call (RPC).
versions available: 2023.9.0

yasm
YASM, an assembler and disassembler for the Intel x86 architecture, is a complete rewrite of the NASM assembler. YASM currently supports the x86 and AMD64 instruction sets, accepts NASM and GAS assembler syntaxes, outputs binary, ELF32, ELF64, 32 and 64-bit Mach-O, RDOFF2, COFF, Win32, and Win64 object formats, and generates source debugging information in STABS, DWARF 2, and CodeView 8 formats.
versions available: 1.3.0

Libraries

boost
Boost is a set of libraries for the C++ programming language that provide support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing.
versions available: 1.84.0, 1.84.0-mpi

cudnn
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
versions available: 8.9.7-cuda11, 8.9.7-cuda12, 9.0.0-cuda11, 9.0.0-cuda12
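To clarify what the forward-convolution and pooling routines mentioned above compute, here are slow, single-channel reference loops in pure Python. This is an illustration of the operations only, not cuDNN's API. In deep learning libraries, "convolution" is conventionally cross-correlation (the kernel is not flipped). The sketch assumes stride 1 and no padding for the convolution, and non-overlapping windows for pooling. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_forward(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D 'convolution' as deep learning libraries define it:
    cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling over size x size windows (leftover edge rows/columns dropped)."""
    return [[max(x[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


print(conv2d_forward([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
print(max_pool2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))  # [[6, 8], [14, 16]]
```

cuDNN's value lies in performing these same computations over batches and channels with highly tuned GPU kernels; the loops here only pin down the arithmetic.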

hdf5
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
versions available: 1.10.7, 1.10.7-mpi, 1.12.3, 1.12.3-intel-mpi, 1.12.3-mpi, 1.14.3, 1.14.3-intel-mpi, 1.14.3-mpi

netcdf
NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
versions available: 4.8.1, 4.8.1-intel-mpi, 4.8.1-mpi, 4.9.2, 4.9.2-intel-mpi, 4.9.2-mpi

openblas
OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library based on the GotoBLAS2 1.13 BSD version.
versions available: 0.3.27
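OpenBLAS implements the BLAS (Basic Linear Algebra Subprograms) routines. The most heavily used is the level-3 matrix multiply GEMM, which computes C = alpha * A * B + beta * C. The pure-Python sketch below only pins down those semantics for the no-transpose case. It is not how OpenBLAS computes the result, since OpenBLAS relies on cache blocking and hand-tuned kernels. In practice you would call an optimized routine such as `cblas_dgemm` from C, or use NumPy's `@` operator on float arrays, which typically dispatches to whatever BLAS NumPy was built against. The function name `gemm` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Reference semantics of BLAS xGEMM without transposes: alpha * (A @ B) + beta * C.

    A is m x k, B is k x n, C is m x n. Unlike BLAS, this returns a new matrix
    instead of overwriting C, and it makes no attempt at blocking or vectorization.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("incompatible matrix shapes")
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]


print(gemm(2.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 1.0, [[1, 1], [1, 1]]))
# [[39.0, 45.0], [87.0, 101.0]]
```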

MPI (Message Passing Interface)

intel-mpi
Intel® MPI Library (module version: modules/2021.11; dependencies: none). URL: https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html
versions available: 2021.11

mpich
MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard, covering MPI-1, MPI-2 and MPI-3; the 4.x series also implements MPI-4.
versions available: 4.2.2-mpirun, 4.2.2-mpirun-intel, 4.2.2-srun, 4.2.2-srun-intel

openmpi
The Open MPI Project is an open source implementation of the Message Passing Interface (MPI) that is developed and maintained by a consortium of academic, research, and industry partners.
versions available: 4.1.6, 4.1.6-intel, 5.0.2, 5.0.2-intel