Research Cluster Applications

Research Computing offers a wide range of applications on the HPC Research Cluster. Below is the list of applications and their versions installed on our Red Hat Enterprise Linux (RHEL) 8.3 compute nodes. Cluster users may request additional applications to be installed and supported by the Research Computing staff, and research groups are also provided with shared storage suitable for installing and maintaining their own discipline-specific applications.

URC software

Applications

abaqus
ABAQUS is used for both the modeling and analysis of mechanical components and assemblies (pre-processing) and visualizing the finite element analysis results.
versions available: 2021, 2022

abyss
ABySS, Assembly by Short Sequences, is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
versions available: 2.1.5, 2.2.5

admixture
ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
versions available: 1.3.0

agwg-merge
AGWG-merge is a version of the 3D-DNA pipeline (Dudchenko et al., Science, 2017) that was used to help generate the AaegL5 genome assembly for the mosquito Aedes aegypti.
versions available: 180806

alphafold
An implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature.
versions available: 2.1.1, 2.3.1

ambertools
AmberTools consists of several independently developed packages that work well by themselves, and with Amber itself. The suite can also be used to carry out complete molecular dynamics simulations, with either explicit water or generalized Born solvent models. AmberTools20 consists of the following major codes: NAB/sff, antechamber and MCPB, tleap and parmed, sqm, pbsa, 3D-RISM, sander, mdgx, cpptraj and pytraj, MMPBSA.py and amberlite.
versions available: 20, 20-mpi

anvio
Anvi’o is an open-source, community-driven analysis and visualization platform for ‘omics data. It brings together many aspects of today’s cutting-edge genomic, metagenomic, metatranscriptomic, pangenomic, and phylogenomic analysis practices to address a wide array of needs.
versions available: 7

apricot
Apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
versions available: 0.6.1

aria
ARIA (Ambiguous Restraints for Iterative Assignment) is software for automated NOE assignment and NMR structure calculation. It speeds up and automates the assignment process through the use of an iterative structure calculation scheme. Additionally, a refinement in explicit water improves the quality of the calculated structures, validation tests help spectroscopists to judge the quality of the final structures, and the support of the CCPN data model simplifies the exchange of information with other NMR software packages.
versions available: 2.3.2

artic
ARTIC is a pipeline and set of accompanying tools for working with viral nanopore sequencing data, generated from tiling amplicon schemes. It is designed to help run the artic bioinformatics protocols; for example the SARS-CoV-2 coronavirus protocol. There are 2 workflows baked into this pipeline, one which uses signal data (via nanopolish) and one that does not (via medaka).
versions available: 1.2.1

atsas
ATSAS is a software suite for the analysis of small-angle scattering data from biological macromolecules. Included in the ATSAS suite: BUNCH, CHROMIXS, CORAL, CRYSOL, CRYSON, DAMAVER, DAMMIF, DAMMIN, DATtools, EOM, GASBOR, GNOM, MONSA, OLIGOMER, PRIMUS, SASFLOW, SASREF, SREFLEX, SUPCOMB.
versions available: 3.0.3

augustus
AUGUSTUS is a gene prediction program for eukaryotes. It can be used as an ab initio program, which means it bases its prediction purely on the sequence.
versions available: 3.3.3, 3.4.0

aws-cli
The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. With minimal configuration, the AWS CLI enables you to start running commands that implement functionality equivalent to that provided by the browser-based AWS Management Console from the command prompt in your terminal program.
versions available: 2.11.2

bamtools
BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
versions available: 2.4.1, 2.5.1

bayesase
BayesASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating Allelic Imbalance (AI) and formally comparing levels of AI between conditions. AI indicates the presence of functional variation in cis regulatory regions. Detecting cis regulatory differences using AI is widespread, yet there is no formal statistical methodology that tests whether AI differs between conditions.
versions available: 21.1.13

beagle
Beagle is a software package for phasing genotypes and imputing ungenotyped markers. Beagle has improved memory and computational efficiency when analyzing large sequence data sets.
versions available: 5.4

beast
BEAST 2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology.
versions available: 2.6.3

bedtools2
Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome.
versions available: 2.26.0, 2.29.0
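
To make "set theory on the genome" concrete, here is a minimal Python sketch of an interval intersection. It is an illustration of the idea only, not how bedtools is implemented; the function name `intersect`, the tuple layout, and the sample intervals are all assumptions made for this example. Coordinates are treated as 0-based and half-open, as in BED files.

```python
from typing import List, Tuple

# Hypothetical interval layout for this sketch: (chromosome, start, end),
# 0-based and half-open, as in BED files.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # a non-empty overlap exists
                out.append((chrom_a, start, end))
    return sorted(out)


# Made-up sample data.
genes = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = [("chr1", 150, 250), ("chr2", 100, 200)]
print(intersect(genes, peaks))  # [('chr1', 150, 200)]
```

The pairwise loop is quadratic and only suitable for small inputs; it is kept simple here to show the set operation itself.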

bismark
Bismark is a set of tools for the time-efficient analysis of Bisulfite-Seq (BS-Seq) data. Bismark performs alignments of bisulfite-treated reads to a reference genome and cytosine methylation calls at the same time. (Requires Bowtie or Bowtie2)
versions available: 0.22.3, 0.24.0

blast
NCBI BLAST (Basic Local Alignment Search Tool) is a suite of programs for aligning query sequences against those present in a selected target database.
versions available: 2.3.0+, 2.9.0+, 2.11.0+, 2.15.0+

blatsuite
Blat produces two major classes of alignments: 1) at the DNA level between two sequences that are of 95% or greater identity, but which may include large inserts, 2) at the protein or translated DNA level between sequences that are of 80% or greater identity and may also include large inserts. (v36 / 64-bit)
versions available: 36

bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
versions available: 2.2.9, 2.4.1, 2.5.1

bracken
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken uses the taxonomy labels assigned by Kraken, a highly accurate metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample.
versions available: 2.9

braker
BRAKER2 is a pipeline for unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS.
versions available: 2.1.2, 2.1.5, 3.0.7

busco
BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.
versions available: 3.0.2, 4.0.6, 5.1.3, 5.4.7

bwa
BWA (Burrows-Wheeler Aligner) is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
versions available: 0.7.12, 0.7.17

canu
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing, such as the PacBio RS II/Sequel or Oxford Nanopore MinION.
versions available: 1.8, 2.1.1

cd-hit
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT is very fast and can handle extremely large databases. CD-HIT helps to significantly reduce the computational and manual efforts in many sequence analysis tasks and aids in understanding the data structure and correcting the bias within a dataset.
versions available: 4.8.1

cfm-id
CFM-ID provides a method for accurately and efficiently identifying metabolites in spectra generated by electrospray tandem mass spectrometry (ESI-MS/MS). The program uses Competitive Fragmentation Modeling to produce a probabilistic generative model for the MS/MS fragmentation process and machine learning techniques to adapt the model parameters from data.
versions available: 2.4.3, 4.4.7

checkm
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. CheckM also provides tools for identifying genome bins that are likely candidates for merging based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
versions available: 1.2.2

clustal-omega
Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. In addition, the quality of alignments is superior to previous versions.
versions available: 1.2.4

cmake
CMake is a cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.
versions available: 3.19.7, 3.25.0

comsol
COMSOL Multiphysics is a finite element analysis, solver, and simulation software package for various physics and engineering applications, especially coupled phenomena and multiphysics. The software facilitates conventional physics-based user interfaces and coupled systems of partial differential equations (PDEs).
versions available: 6.2

converge
As a leading computational fluid dynamics (CFD) software for simulating three-dimensional fluid flow, CONVERGE is designed to facilitate your innovation process. CONVERGE features truly autonomous meshing, state-of-the-art physical models, a robust chemistry solver, and the ability to easily accommodate complex moving geometries, so you can take on the hard CFD problems.
versions available: 3.0.23, 3.1.2

cufflinks
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
versions available: 2.2.1

ddocent
dDocent is a simple bash wrapper to QC, assemble, map, and call SNPs from almost any kind of RAD sequencing. If you have a reference already, dDocent can be used to call SNPs from almost any type of NGS data set.
versions available: 2.8.13

decona
Decona takes Nanopore amplicon data from demultiplexing to consensus and can process multiple samples in one line of code. It handles mixed samples containing multiple species from bulk and eDNA, mixed amplicons in one barcode, multiplexed barcodes, and multiple samples in one run, and it outputs Medaka-polished consensus sequences.
versions available: 1.3.1

diamond
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. The key features are: Pairwise alignment of proteins and translated DNA at 100x-10,000x speed of BLAST; Frameshift alignments for long read analysis; Low resource requirements and suitable for running on standard desktops or laptops; Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
versions available: 2.0.9

dm_control
DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.
versions available: 0.0.408

dram
DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database as well as custom user databases.
versions available: 1.4.6

drap
DRAP is a De novo RNA-Seq Assembly Pipeline which wraps two assemblers, Trinity and Oases, in order to improve their results.
versions available: 1.92

eagle
The Eagle software estimates haplotype phase either within a genotyped cohort or using a phased reference panel. Eagle2 uses a new, very fast HMM-based algorithm that improves speed and accuracy over existing methods via two key ideas: a new data structure based on the positional Burrows-Wheeler transform and a rapid search algorithm that explores only the most relevant paths through the HMM.
versions available: 2.4.1

eclipse
Eclipse provides IDEs and platforms for nearly every language and architecture, including Java, C/C++, JavaScript and PHP.
versions available: 2020-12

ed2
The Ecosystem Demography Biosphere Model (ED2) is an integrated terrestrial biosphere model incorporating hydrology, land-surface biophysics, vegetation dynamics, and soil carbon and nitrogen biogeochemistry. Like its predecessor, ED, ED2 uses a set of size- and age-structured partial differential equations that track the changing structure and composition of the plant canopy.
versions available: 2.2-intel, 2.2-mpi

emboss
EMBOSS (European Molecular Biology Open Software Suite) is a software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.
versions available: 6.6.0

entap
The Eukaryotic Non-Model Transcriptome Annotation Pipeline (EnTAP) is designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the translated proteins.
versions available: 0.10.8

examl
ExaML implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees. It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.
versions available: 3.0.17, 3.0.21

exonerate
Exonerate is a generic tool for sequence alignment.
versions available: 2.4.0

famsa
FAMSA is Fast and Accurate Multiple Sequence Alignment of large protein families. It first determines the longest common subsequences and has a unique way to compute gap costs. It proceeds progressively to add sequences into the alignments using a novel iterative approach.
versions available: 2.2.2

fastqc
FastQC is a quality control tool for high throughput sequence data. It takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report. FastQC can be run either as an interactive GUI app, or in a non-interactive way (say as part of a pipeline) which will generate an HTML report for each file you process.
versions available: 0.11.9

ffmpeg
FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. It contains libavcodec, libavutil, libavformat, libavfilter, libavdevice, libswscale and libswresample, which can be used by applications, as well as ffmpeg, ffserver, ffplay and ffprobe, which can be used by end users for transcoding, streaming and playing.
versions available: 4.2.1, 4.2.1-cuda11.2

fluent
Ansys Fluent is a general-purpose computational fluid dynamics (CFD) software used to model fluid flow, heat and mass transfer, chemical reactions, and more. Fluent is also known for its efficient HPC scaling: large models can be solved on multiple processors on either CPU or GPU.
versions available: 2022

flye
Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PacBio / ONT reads as input and outputs polished contigs. Flye also has a special mode for metagenome assembly.
versions available: 2.9.1

freyja
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational ‘barcodes’ derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
versions available: 1.3

gatk
GATK (Genome Analysis Toolkit) offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
versions available: 3.8.1, 4.1.6, 4.2.0

genemark
GeneMark, developed in 1993, was the first gene finding method recognized as an efficient and accurate tool for genome projects. GeneMark was used for annotation of the first completely sequenced bacterium, Haemophilus influenzae, and the first completely sequenced archaeon, Methanococcus jannaschii.
versions available: 4.71, 4.72

genesis
GENESIS (GENeralized-Ensemble SImulation System) has been developed mainly by the Sugita group at RIKEN. Using GENESIS, molecular dynamics simulation and modeling of various biomolecular systems are possible with high performance. Multi-scale simulations with atomistic, coarse-grained, and QM/MM models are available together with enhanced sampling methods and other advanced simulation techniques.
versions available: 2.1.2-mpi

gridlab-d
GridLAB-D is a new power distribution system simulation and analysis tool that provides valuable information to users who design and operate distribution systems, and to utilities that wish to take advantage of the latest energy technologies. It incorporates the most advanced modeling techniques, with high-performance algorithms to deliver the best in end-use modeling.
versions available: 5.1

gridpack
GridPACK is an open-source high-performance computing (HPC) package for simulation of large-scale electrical grids. Powered by distributed (parallel) computing and high-performance numerical solvers, GridPACK offers several applications for fast simulation of electrical transmission systems.
versions available: 3.4

gromacs
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions, many groups are also using it for research on non-biological systems, e.g. polymers.
versions available: 2020.6, 2020.6-cuda, 2020.6-mpi, 2020.6-mpi-cuda, 2021.1, 2021.1-cuda, 2021.1-mpi, 2021.1-mpi-cuda

gtdbtk
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.
versions available: 2.3.2

guppy
Guppy is a data processing toolkit that contains the Oxford Nanopore Technologies’ basecalling algorithms, and several bioinformatic post-processing features. Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy.
versions available: 6.0.6, 6.0.6-cuda11.2, 6.3.4, 6.3.4-cuda11.2, 6.5.7, 6.5.7-cuda11.4

gurobi
Gurobi is a state-of-the-art optimization tool designed from the ground up to exploit modern architectures and multi-core processors, using the most advanced implementations of the latest optimization algorithms so you can solve your models faster and more reliably.
versions available: 9.1.1, 10.0.1, 10.0.3

hecaton
Hecaton is a framework specifically designed for plant genomes that detects copy number variants (CNVs) using short paired-end Illumina reads. CNVs are called by integrating existing structural variant callers through a machine-learning model and several custom post-processing scripts.
versions available: 0.4.0, 0.5.0

helics
HELICS is a multi-language, cross-platform library that enables different simulators to easily exchange data and stay synchronized in time. It is scalable from two simulators on a laptop to 100,000+ running on supercomputers, the cloud, or a mix of these platforms.
versions available: 3.4.0

hhsuite
The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs). It contains HHsearch and HHblits among other programs and utilities. HHsearch takes as input a multiple sequence alignment (MSA) or profile HMM and searches a database of HMMs (e.g. PDB, Pfam, or InterPro) for homologous proteins.
versions available: 3.3.0

hicexplorer
HiCExplorer facilitates the creation of contact matrices, correction of contacts, TAD detection, A/B compartments, merging, reordering of chromosomes, conversion from different formats including cooler and detection of long-range contacts. Moreover, it allows the visualization of multiple contact matrices along with other types of data like genes, compartments, ChIP-seq coverage tracks (and in general any type of genomic scores), long range contacts and the visualization of viewpoints.
versions available: 3.6

hic-pro
HiC-Pro is an optimized and flexible pipeline for Hi-C data processing. HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to the normalized contact maps.
versions available: 2.11.1, 3.0.0

hisat2
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). Based on GCSA (an extension of BWT for a graph), its authors designed and implemented a graph FM index (GFM), which they describe as an original approach and, to the best of their knowledge, its first implementation.
versions available: 2.2.0, 2.2.1

hmmer
HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
versions available: 3.3.2

homopolish
Homopolish is a genome polisher originally developed for Nanopore and subsequently extended for PacBio CLR. It generates a high-quality genome (>Q50) for virus, bacteria, and fungus. Nanopore/PacBio systematic errors are corrected by retrieving homologs from closely related genomes and polished by an SVM.
versions available: 0.4

htseq
HTSeq is a tool for the analysis of high-throughput sequencing data. It processes reads aligned with HISAT or STAR and assigns expression value counts. HTSeq is also suitable for the quantification of single-cell RNA-seq data (scRNA-seq). The package also includes an htseq-count tool for pre-processing RNA-seq reads before differential expression analysis and an htseq-qa tool that assesses the read quality.
versions available: 2.0.5

humann
HUMAnN is the HMP Unified Metabolic Analysis Network. HUMAnN is a method for efficiently and accurately profiling the abundance of microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic sequencing data.
versions available: 3.8

humann2
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
versions available: 0.11.2

hyperworks
Altair HyperWorks is the most comprehensive open-architecture simulation platform, offering best-in-class technologies to design and optimize high performance, efficient and innovative products. The HyperWorks Physics solvers are installed for use on the cluster, which includes the Mechanical solvers (Optistruct, Radioss and Motionsolve), Feko + winprop, Acusolve, and Flux.
versions available: 2020

interproscan
InterPro is a database which integrates together predictive information about proteins’ function from a number of partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains.
versions available: 5.55-88.0, 5.60-92.0

iphop
iPHoP stands for integrated Phage Host Prediction. It is an automated command-line pipeline for predicting host genus of novel bacteriophages and archaeoviruses based on their genome sequences.
versions available: 1.3.0

iqtree
IQ-TREE is a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihoods with similar computing time.
versions available: 1.6.12, 2.1.2

i-tasser
I-TASSER is an integrated package for protein structure and function predictions. For a given sequence, I-TASSER first identifies template proteins from the Protein Data Bank (PDB) by multiple threading techniques (LOMETS).
versions available: 5.1

jags
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind: 1) To have a cross-platform engine for the BUGS language, 2) To be extensible, allowing users to write their own functions, distributions and samplers, and 3) To be a platform for experimentation with ideas in Bayesian modelling.
versions available: 4.3.0, 4.3.1

jellyfish
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the ‘compare-and-swap’ CPU instruction to increase parallelism.
versions available: 2.2.6, 2.3.0
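
As a rough illustration of what k-mer counting means, the Python sketch below tallies every length-k substring of a sequence with a plain dictionary-based counter. It is not Jellyfish's method (it uses none of the compact hash-table encoding or lock-free updates described above), and the function name `count_kmers` and the sample sequence are assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    seq = sequence.upper()
    # A sequence of length n has n - k + 1 overlapping k-mers;
    # the range is empty when k exceeds the sequence length.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up sample sequence.
counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For the sample above, the 3-mers ACG and CGT each occur twice, and GTA and TAC once. This sketch counts k-mers on one strand only and does not merge a k-mer with its reverse complement.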

kneaddata
KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
versions available: 0.7.2

kraken
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
versions available: 1.1.1

kraken2
Kraken 2 is the successor to Kraken, a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
versions available: 2.1.2, 2.1.3

lammps
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Packages built: ASPHERE ATC AWPMD BOCS BODY CLASS2 COLLOID COLVARS COMPRESS CORESHELL DIFFRACTION DIPOLE DRUDE EFF FEP GRANULAR H5MD KIM KSPACE LATTE MANIFOLD MANYBODY MC MGPT MISC MOFFF MOLECULE MOLFILE MPIIO OPT PERI PHONON POEMS PTM PYTHON QEQ QTB REAXFF REPLICA RIGID SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI
versions available: 02Aug23-cuda, 02Aug23-mpi, 10Mar21-cuda, 10Mar21-mpi, 23Jun22-cuda, 23Jun22-mpi

lastz
LASTZ: A tool for (1) aligning two DNA sequences, and (2) inferring appropriate scoring parameters automatically.
versions available: 1.04.03

liggghts
LIGGGHTS(R)-PUBLIC is an Open Source Discrete Element Method Particle Simulation Software based on LAMMPS. LIGGGHTS(R) stands for LAMMPS improved for general granular and granular heat transfer simulations. LIGGGHTS(R) aims to improve the capabilities of LAMMPS with the goal of applying it to industrial applications.
versions available: 3.8.0

links
LINKS is a genomics application for scaffolding genome assemblies with long reads, such as those produced by Oxford Nanopore Technologies Ltd. It can be used to scaffold high-quality draft genome assemblies with any long sequences (e.g., ONT reads, PacBio reads, other draft genomes, etc.). It is also used to scaffold contig pairs linked by ARCS/ARKS.
versions available: 1.8.7

longstitch
LongStitch is a genome assembly correction and scaffolding pipeline using long reads, consisting of up to three steps: 1) Tigmint cuts the draft assembly at potentially misassembled regions, 2) ntLink is then used to scaffold the corrected assembly, and 3) ARKS is optionally run as an extra step for further scaffolding.
versions available: 1.0.2

lordec
LoRDEC (built with GATB v1.4.1) is a program to correct sequencing errors in long reads from 3rd generation sequencing with high error rate, and is especially intended for PacBio reads. It uses a hybrid strategy, meaning that it uses two sets of reads: the reference read set, whose error rate is assumed to be small, and the PacBio read set, which is then corrected using the reference set. Typically, the reference set contains Illumina reads.
versions available: 0.9

ls-dyna
LS-DYNA is a general-purpose finite element program capable of simulating complex real world problems. The code’s origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.
versions available: 11.2.1, 12.0.0, 13.0.0

mafft
MAFFT is a multiple alignment program for amino acid or nucleotide sequences. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
versions available: 7.055woe, 7.273woe, 7.487woe

maker
MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.
versions available: 2.31, 2.31-mpi, 3.01, 3.01-mpi

masurca
The MaSuRCA (Maryland Super Read Cabog Assembler) assembler combines the benefits of de Bruijn graph and Overlap-Layout-Consensus assembly approaches. MaSuRCA supports hybrid assembly with short Illumina reads and long, high-error PacBio/MinION data.
versions available: 3.3.4, 4.0.1, 4.0.9

mathematica
Mathematica is a software package which is ideal for communicating scientific ideas, whether this is visualization of a concept in an intro-level course, or creating a simulation of a new idea related to research.
versions available: 11.3.0, 12.3.1, 13.3.0

matlab
MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming.
versions available: R2020b, R2022b, R2023b

maxssmap
MaxSSmap is a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence. MaxSSmap aims to achieve comparable accuracy to Smith-Waterman but with faster runtimes. Similar to most programs MaxSSmap identifies a local region of the genome followed by exact alignment.
versions available: 1.0
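
The "maximum scoring subsequence" idea named in the maxssmap entry can be illustrated with a short Python sketch. This is not MaxSSmap's GPU implementation; the function names, the +1/-1 match/mismatch scoring, and the example strings are assumptions made only for illustration.

```python
def match_scores(read, window, match=1, mismatch=-1):
    """Score each aligned position of a read against a genome window.

    The +1/-1 scoring is a hypothetical choice for this sketch.
    """
    return [match if a == b else mismatch for a, b in zip(read, window)]


def max_scoring_subsequence(scores):
    """Return (best_score, start, end) of the contiguous run with the highest sum.

    `end` is exclusive. If every score is negative, the empty run (0, 0, 0)
    is returned.
    """
    best_score, best_start, best_end = 0, 0, 0
    run_score, run_start = 0, 0
    for i, s in enumerate(scores):
        if run_score <= 0:
            # A non-positive prefix can never help, so start a new run here.
            run_score, run_start = 0, i
        run_score += s
        if run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_score, best_start, best_end


if __name__ == "__main__":
    scores = match_scores("ACGTACGT", "TCGAACGA")
    print(max_scoring_subsequence(scores))
```

The scan is linear in the read length, which is what makes this kind of score attractive as a quick filter before a more expensive exact alignment.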

mcr
The MATLAB Compiler Runtime is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed. When used together, MATLAB, MATLAB Compiler, and the MATLAB Runtime enable you to create and distribute numerical applications or software components quickly and securely.
versions available: R2018a, R2018b, R2019b

megadock
MEGADOCK is an ultra-high-performance FFT-grid-based protein-protein docking program for heterogeneous supercomputers that takes advantage of the massively parallel CUDA architecture of NVIDIA GPUs and multiple computation nodes.
versions available: 4.1.1, 4.1.1-mpi

megalodon
Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.
versions available: 2.4.1, 2.5.0

megax
The objective of the MEGA (Molecular Evolutionary Genetics Analysis) software has been to provide tools for exploring, discovering, and analyzing DNA and protein sequences from an evolutionary perspective. MEGA is designed to facilitate extensive sequence data analysis from an evolutionary perspective using a single program package. At the same time, the overlap between the methods implemented in MEGA and those in other existing evolutionary analysis programs has been consciously avoided. This is reflected in the exclusion of the maximum likelihood method (PHYLIP) and in the absence of extensive options for the maximum parsimony method (PAUP and MacClade).
versions available: 10.2.6

meme
Multiple Em for Motif Elicitation. MEME discovers novel, ungapped motifs (recurring, fixed-length patterns) in your sequences. MEME splits variable-length patterns into two or more separate motifs.
versions available: 5.4.1

merqury
Evaluate genome assemblies with k-mers and more. Often, genome assembly projects have Illumina whole genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used for independently evaluating assembly quality without the need of a high-quality reference. Merqury provides a set of tools for this purpose.
versions available: 1.3
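
As a rough illustration of the k-mer spectrum idea in the merqury entry, the Python sketch below counts k-mers in a read set, builds the multiplicity histogram, and lists assembly k-mers that never occur in the reads. It is not Merqury's implementation: the function names and toy sequences are hypothetical, and unlike real k-mer counters it does not merge a k-mer with its reverse complement.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer (forward strand only) across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def assembly_kmers_missing_from_reads(assembly, read_counts, k):
    """List assembly k-mers with no support in the reads (possible errors)."""
    assembly_counts = count_kmers([assembly], k)
    return sorted(kmer for kmer in assembly_counts if kmer not in read_counts)


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    read_counts = count_kmers(reads, k=3)
    print(dict(kmer_spectrum(read_counts)))
    print(assembly_kmers_missing_from_reads("ACGTTT", read_counts, k=3))
```

Assembly k-mers that are absent from the reads are the kind of signal that allows quality to be judged without a reference genome.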

metabat
MetaBAT: A robust statistical framework for reconstructing genomes from metagenomic data
versions available: 2.13, 2.15

metabolic
METABOLIC enables the prediction of metabolic and biogeochemical functional trait profiles to any given genome datasets. METABOLIC has two main implementations, which are METABOLIC-G and METABOLIC-C. METABOLIC-G.pl allows for generation of metabolic profiles and biogeochemical cycling diagrams of input genomes and does not require input of sequencing reads. METABOLIC-C.pl generates the same output as METABOLIC-G.pl, but as it allows for the input of metagenomic read data, it will generate information pertaining to community metabolism.
versions available: 4.0

metaphlan
MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
versions available: 2.8.1, 3.0.7, 4.0.3

microbeannotator
MicrobeAnnotator uses an iterative approach to annotate microbial genomes (Bacteria, Archaea and Virus) starting from proteins predicted using your favorite ORF prediction tool, e.g. Prodigal. The iterative approach is composed of three or five main steps, depending on the flavor of MicrobeAnnotator you run.
versions available: 2.0.5

minialign
Minialign is a little bit fast and moderately accurate nucleotide sequence alignment tool designed for PacBio and Nanopore long reads. It is built on three key algorithms, minimizer-based index of the minimap overlapper, array-based seed chaining, and SIMD-parallel Smith-Waterman-Gotoh extension.
versions available: 0.4.4, 0.6.0
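
The minimizer-based indexing mentioned in the minialign entry can be sketched in a few lines of Python. This is a simplified illustration, not the minialign or minimap code: the function name is hypothetical, k-mers are ordered lexicographically (real tools typically hash k-mers and consider both strands), and ties are broken by taking the leftmost k-mer.

```python
def minimizers(seq, k, w):
    """Return [(position, kmer)] of the minimizer of every window of w k-mers.

    A window's minimizer is its smallest k-mer (leftmost on ties). Neighbouring
    windows usually share a minimizer, so far fewer k-mers are kept than exist.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost on ties.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Two sequences that share a long enough exact match select the same minimizers inside it, so indexing only the minimizers is enough to find candidate seed hits.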

miniasm
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings, typically produced by minimap, as input and outputs an assembly graph in the GFA format.
versions available: 0.3

minimap2
Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.
versions available: 2.17, 2.18

mira
MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly developed and used in the past 16 years to get assembly jobs done efficiently – and especially accurately.
versions available: 4.0.2

mirdeep2
miRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples. Last, a new module for preprocessing of raw Illumina sequencing data produces files for downstream analysis with the miRDeep2 or quantifier module. Colorspace sequencing data is currently not supported by the preprocessing module, but support is planned.
versions available: 0.1.2

mirdp2
miRDeep-P2 (miRDP2) was developed to analyze microRNA (miRNA) transcriptomes in plants quickly and accurately. It is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm. miRDP2 has been tested on miRNA transcriptomes in plants of increasing genome size, such as Arabidopsis, rice, tomato, maize and wheat.
versions available: 1.1.4

mir-prefer
microRNA PREdiction From small RNAseq data (miR-PREFeR) uses expression patterns of miRNA and follows the criteria for plant microRNA annotation to accurately predict plant miRNAs from one or more small RNA-Seq data samples of the same species. We tested miR-PREFeR on several plant species. The results show that miR-PREFeR is sensitive, accurate, fast, and has a low memory footprint.
versions available: 0.24

mitobim
The MITObim procedure (mitochondrial baiting and iterative mapping) represents a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from total genomic DNA derived NGS reads. Labor intensive long-range PCR steps prior to sequencing are no longer required.
versions available: 1.9.1

mpp-dyna
LS-DYNA is a general-purpose finite element program capable of simulating complex real world problems. The code’s origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.
versions available: 11.2.1, 11.2.1-avx512, 12.0.0, 12.0.0-avx512, 13.0.0, 13.0.0-avx512

mrbayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
versions available: 3.2.2, 3.2.7

ms-finder
MS-FINDER was launched as a universal program for compound ‘annotation’ that supports EI-MS (GC/MS) and MS/MS spectral mining. MS-FINDER aims to provide solutions for 1) formula predictions, 2) fragment annotations, and 3) structure elucidations by means of unknown spectra. In addition, the program can annotate your unknowns by the public spectral databases such as MassBank, LipidBlast, and GNPS.
versions available: 3.52

msprime
msprime is a population genetics simulator of ancestry and DNA sequence evolution based on tskit. msprime can simulate ancestral histories for a sample of individuals, consistent with a given demography under a range of different models and evolutionary processes. It can also simulate mutations on a given ancestral history (which can be produced by msprime ancestry simulations or other programs supporting tskit) under a variety of different models of genome sequence evolution.
versions available: 1.1.1

multiqc
MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples. MultiQC searches a given directory for analysis logs and compiles an HTML report. It’s a general use tool, perfect for summarising the output from numerous bioinformatics tools.
versions available: 1.11

mummer
MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. This package provides an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system.
versions available: 3.23

muscle
MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds.
versions available: 3.8.1551, 3.8.31

namd
NAMD (2.14 x86_64 mpi) is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations.
versions available: 2.13-mcore, 2.13-mcore-cuda, 2.13-mpi, 2.14-mcore, 2.14-mcore-cuda, 2.14-mpi

nanopolish
Software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
versions available: 0.14.0

netbeans
NetBeans is a free, open source IDE that allows you to quickly and easily develop desktop, mobile and web applications with Java, HTML5, PHP, C/C++ and more.
versions available: 12.2

netlogo
NetLogo is a programmable modeling environment for simulating natural and social phenomena. NetLogo is particularly well suited for modeling complex systems developing over time.
versions available: 6.2.0

nextstrain
Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. We provide a continually-updated view of publicly available data alongside powerful analytic and visualization tools for use by the community. Our goal is to aid epidemiological understanding and improve outbreak response.
versions available: 3.0.3

nf-core
Nextflow is an incredibly powerful and flexible workflow language. Nextflow lets you run nf-core pipelines on virtually any computing environment. nf-core pipelines adhere to strict guidelines – if one works, they all will.
versions available: 2.1, 2.12.1

node.js
Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. As an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications.
versions available: 14.17.3

openbabel
Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
versions available: 3.1.1

openfoam
OpenFOAM is the free, open source CFD software released and developed primarily by OpenCFD Ltd since 2004. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to acoustics, solid mechanics and electromagnetics. We offer versions from both OpenCFD Ltd and the OpenFOAM Foundation.
versions available: 9, 11, 2012, 2306

orca
ORCA is an ab initio, DFT, and semi-empirical SCF-MO package. The ORCA Input Library contains a collection of ORCA inputs that show you how to easily do various tasks using the many methods and approximations in the ORCA quantum chemistry code.
versions available: 4.2.1

orp
The Oyster River Protocol for (eukaryotic) transcriptome assembly is an actively developed, evidence-based method for optimizing transcriptome assembly. The protocol assembles the transcriptome using a multi-kmer multi-assembler approach, then merges those assemblies into one final assembly. Version 2.3.3u1 is an update to ORP 2.3.3 based on Anaconda3-2023.03-1, along with the following updated components: trinity 2.15.1, salmon 1.10.1, spades 3.15.5, busco 5.1.3, rcorrector 1.0.5, samtools 1.17, cd-hit 4.8.1, diamond 2.1.6.
versions available: 2.3.3, 2.3.3u1

orthofinder
OrthoFinder is a fast, accurate and comprehensive platform for comparative genomics. It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees.
versions available: 2.2.7, 2.4.0, 2.5.4

pacbio
PacBio develops comprehensive solutions for scientists that propel the field of genomics, improve science and research, and create positive impact globally. This module includes several of the PacBio open source tools, including BLASR, CCS, ConsensusCore, GenomicConsensus, IsoSeq3, Lima, pbalign, pbcommand, pbcore, pbcoretools, pbbam, bam2fastx, pb-dazzler, PB Assembly, and FALCON.
versions available: 2021.4

packmol
PACKMOL creates an initial point for molecular dynamics simulations by packing molecules in defined regions of space. The packing guarantees that short range repulsive interactions do not disrupt the simulations. The great variety of types of spatial constraints that can be attributed to the molecules, or atoms within the molecules, makes it easy to create ordered systems, such as lamellar, spherical or tubular lipid layers.
versions available: 20.14.3

pairtools
pairtools is a simple and fast command-line framework to process sequencing data from a Hi-C experiment. pairtools processes paired-end sequence alignments and performs the following operations: detect ligation junctions (a.k.a. Hi-C pairs) in aligned paired-end sequences of Hi-C DNA molecules; sort .pairs files for downstream analyses; detect, tag and remove PCR/optical duplicates; generate extensive statistics of Hi-C datasets; select Hi-C pairs given flexibly defined criteria; and restore .sam alignments from Hi-C pairs.
versions available: 0.3.0

pangolin
Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences.
versions available: 4

parallel
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input.
versions available: 20210222

paraview
ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques.
versions available: 5.11.0, 5.11.0-mpi, 5.7.1, 5.7.1-mpi

parcels
The OceanParcels project develops Parcels (Probably A Really Computationally Efficient Lagrangian Simulator), a set of Python classes and methods to create customisable particle tracking simulations using output from Ocean Circulation models. Parcels can be used to track passive and active particulates such as water, plankton, plastic and fish.
versions available: 2.1.5

pbsuite
Software for Long-Read Sequencing Data from PacBio. PBSuite is made up of 2 tools: PBJelly and PBHoney. PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp).
versions available: 15.8.24

pcangsd
PCAngsd is a framework for analyzing low-depth next-generation sequencing (NGS) data in heterogeneous/structured populations using principal component analysis (PCA). Population structure is inferred by estimating individual allele frequencies in an iterative approach using a truncated SVD model. The covariance matrix is estimated using the estimated individual allele frequencies as prior information for the unobserved genotypes in low-depth NGS data.
versions available: 1.2

picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library to support accessing file formats that are commonly used for high-throughput sequencing data such as SAM and VCF.
versions available: 2.18.29, 2.25.4, 2.9.2

plink
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
versions available: 1.90, 2.00

poy
POY is a phylogenetic analysis program that supports multiple kinds of data (e.g. morphology, nucleotides, genes and gene regions, chromosomes, whole genomes, etc). POY is distinctive in that it can perform true alignment and phylogeny inference (i.e. input sequences need not be prealigned).
versions available: 5.1.2

prokka
Prokka performs rapid prokaryotic genome annotation. Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.
versions available: 1.14.6

prosplign
This module includes Splign. ProSplign is a global alignment tool developed by Dr. Boris Kiryutin. It produces accurate spliced alignments and computes alignments of distantly related proteins with low similarity. Extra effort is taken to locate frameshift positions. Splign is a utility for computing cDNA-to-Genomic, or spliced sequence alignments, which uses a compartmentization algorithm to identify possible gene duplications, and a refined alignment algorithm recognizing introns and splice signals.
versions available: 2.0.0

psmc
Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model
versions available: 0.6.5

purge_haplotigs
A simple pipeline for reassigning primary contigs that should be labeled as haplotigs. Purge Haplotigs helps with curating heterozygous diploid genome assemblies from third-gen long-read sequencing.
versions available: 1.1.2

pytorch
PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autodiff system. Built with CUDA Toolkit 11.3, for GPUs with Compute Capabilities of 3.7, 6.1, 7.0, 7.5, 8.0, 8.6.
versions available: 1.10.2-cuda11.3, 1.12.0-cuda11.3, 2.3.0-cuda12.1

qe
Quantum Espresso (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
versions available: 7.1-intel-mpi

qiime2
QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
versions available: 2022.11, 2024.2

quast
QUAST (Quality Assessment Tool for Genome Assemblies) evaluates genome/metagenome assemblies by computing various metrics. The current QUAST toolkit includes the general QUAST tool for genome assemblies, MetaQUAST, the extension for metagenomic datasets, QUAST-LG, the extension for large genomes (e.g., mammalians), and Icarus, the interactive visualizer for these tools.
versions available: 5.0.2

R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Labs, by John Chambers and colleagues. R can be considered as a different implementation of S.
versions available: 4.1.3, 4.1.3-mpi, 4.2.2, 4.2.2-mpi, 4.3.3, 4.3.3-mpi

racon
Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods which do not include a consensus step. The goal of Racon is to generate genomic consensus which is of similar or better quality compared to the output generated by assembly methods which employ both error correction and consensus steps, while providing a speedup of several times compared to those methods.
versions available: 1.4.21

ragtag
RagTag, the successor to RaGOO, is a command line tool for reference-guided genome assembly improvement. Currently, the two main features are misassembly correction and scaffolding. After correction and/or scaffolding, RagTag also provides utilities to update annotations or work with AGP files.
versions available: 1.1.1, 2.0.0

raxml
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees.
versions available: 7.4.2, 7.4.2-mpi, 8.2.12, 8.2.12-mpi, 8.2.4, 8.2.4-mpi

repdenovo
REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities, and can generate much longer repeats than existing tools.
versions available: 0.0, 0.1.0

repeatmasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).
versions available: 4.0.8, 4.1.2
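
The default masking behaviour described in the repeatmasker entry (annotated repeats replaced by Ns) can be pictured with a small Python sketch. It does not call RepeatMasker or parse its output files; the function name, the 0-based half-open interval convention, and the example coordinates are all assumptions for illustration.

```python
def mask_repeats(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every annotated interval masked.

    `intervals` holds hypothetical (start, end) pairs, 0-based with `end`
    exclusive. Coordinates outside the sequence are clipped.
    """
    masked = list(sequence)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    print(mask_repeats("ACGTACGTAC", [(2, 5), (8, 12)]))
```

Hard-masking like this keeps sequence coordinates unchanged, so downstream annotations still line up with the original query sequence.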

repeatmodeler
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
versions available: 1.0.11, 2.0.2

repeatscout
The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase update) are not available.
versions available: 1.0.5, 1.0.6

rmblast
RMBlast is a RepeatMasker compatible version of the standard NCBI blastn program. The primary difference between this distribution and the NCBI distribution is the addition of a new program ‘rmblastn’ for use with RepeatMasker and RepeatModeler.
versions available: 2.11.0, 2.6.0

rsem
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface, supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.
versions available: 1.3.3

rseqc
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity etc.
versions available: 4.0.0

rstudio
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
versions available: 1.3, 1.4

salmon
Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data. Salmon achieves its accuracy and speed via a number of different innovations, including the use of selective-alignment (accurate but fast-to-compute proxies for traditional read alignments), and massively-parallel stochastic collapsed variational inference. The result is a versatile tool that fits nicely into many different pipelines.
versions available: 1.10.0

samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data, allowing you to read/write/edit/index/view SAM/BAM/CRAM format. This module includes BCFtools, which is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.
versions available: 1.10, 1.11, 1.3.1, 1.9

sas
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics.
versions available: 9.4

selscan
Selscan is a program to calculate EHH-based scans for positive selection in genomes. selscan currently implements EHH, iHS, XP-EHH, nSL, XP-nSL and iHH12. It should be run separately for each chromosome and population (or population pair for XP-EHH). selscan is ‘dumb’ with respect to ancestral/derived coding and simply expects haplotype data to be coded 0/1. Unstandardized iHS/nSL scores are thus reported as log(iHH1/iHH0) based on the coding you have provided.
versions available: 1.3.0, 2.0.0
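
To make the log(iHH1/iHH0) convention in the selscan entry concrete, here is a minimal Python sketch. It does not reproduce selscan's computation of the integrated haplotype homozygosity values themselves; the function name and example numbers are hypothetical, and the use of the natural logarithm is an assumption.

```python
import math


def unstandardized_ihs(ihh1, ihh0):
    """Return log(iHH1 / iHH0) for one site (natural log assumed).

    ihh1 and ihh0 are the integrated haplotype homozygosity values for the
    alleles coded 1 and 0 in the input data.
    """
    if ihh1 <= 0 or ihh0 <= 0:
        raise ValueError("iHH values must be positive")
    return math.log(ihh1 / ihh0)


if __name__ == "__main__":
    score = unstandardized_ihs(ihh1=2.0, ihh0=1.0)
    flipped = unstandardized_ihs(ihh1=1.0, ihh0=2.0)
    # Swapping the 0/1 coding of a site only flips the sign of the score.
    print(score, flipped)
```

Because the program takes the 0/1 coding at face value, recoding which allele is "1" changes the sign of the score but not its magnitude.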

seqkit
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
versions available: 0.11.0, 0.16.1

seqtk
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
versions available: 1.2, 1.3

shortbred
ShortBRED (Short, Better Representative Extract Dataset) is a pipeline to take a set of protein sequences, reduce them to a set of unique identifying strings (‘markers’), and then search for these markers in metagenomic data and determine the presence and abundance of the protein families of interest.
versions available: 0.9.5

siesta
SIESTA, a first-principles materials simulation code using DFT, is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.
versions available: 4.0.2, 4.1.5

sift4g
The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes.
versions available: 2017, 2017-cuda

singularity
Singularity is a free, cross-platform and open-source computer program that performs operating-system-level virtualization also known as containerization. One of the main uses of Singularity is to bring containers and reproducibility to scientific computing and the high-performance computing (HPC) world. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.
versions available: 3.8.1

sirius
SIRIUS is a java-based software framework for the analysis of LC-MS/MS data of metabolites and other ‘small molecules of biological interest’.
versions available: 4.0.1

slim
SLiM is an evolutionary simulation framework that combines a powerful engine for population genetic simulations with the capability of modeling arbitrarily complex evolutionary scenarios. Simulations are configured via the integrated Eidos scripting language that allows interactive control over practically every aspect of the simulated evolutionary scenarios.
versions available: 3.7

snakemake
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
versions available: 7.20.0

snp-pipeline
The CFSAN SNP Pipeline is a Python-based system for the production of SNP matrices from sequence data used in the phylogenetic analysis of pathogenic organisms sequenced from samples of interest to food safety. The SNP Pipeline was developed by the United States Food and Drug Administration, Center for Food Safety and Applied Nutrition.
versions available: 2.2.1

spades
SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. One can also provide additional contigs that will be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads.
versions available: 3.14.1, 3.15.2, 3.15.5

spark
Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. These two qualities are key to the worlds of big data and machine learning, which require the marshalling of massive computing power to crunch through large data stores.
versions available: 3.1.3, 3.4.1

sqanti3
SQANTI3 is the newest version of the SQANTI tool that merges features from SQANTI and SQANTI2, together with new additions. SQANTI3 will continue as an integrated development aiming to provide you with the best characterization possible for your new long read-defined transcriptome.
versions available: 1.6, 4.0

sra-tools
The Sequence Read Archive (SRA) stores raw sequence data from ‘next-generation’ sequencing technologies including Illumina, 454, IonTorrent, Complete Genomics, PacBio and Oxford Nanopore. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Includes NCBI VDB and NGS SDK.
versions available: 2.10.5, 2.11.0, 3.1.0

stacks
Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
versions available: 2.59

star
Spliced Transcripts Alignment to a Reference (STAR) is an ultrafast universal RNA-seq aligner, which was developed to align a large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset.
versions available: 2.7.0c, 2.7.9a

starccm
STAR-CCM+ is much more than just a CFD solver; it is an entire engineering process for solving problems involving flow (of fluids or solids), heat transfer and stress.
versions available: 2021.3, 2022.1, 2023.10

star-fusion
STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
versions available: 1.11.1

stata
Stata provides an integrated statistics, graphics, and data-management solution for anyone who analyzes data.
versions available: 11

structRNAfinder
StructRNAfinder is an automated pipeline that predicts and annotates RNA families in transcript or genome sequences. It not only displays the sequence/structural consensus alignments for each RNA family according to the Rfam database, but also provides a taxonomic overview for each assigned functional RNA.
versions available: 17.03.29

syri
Synteny and Rearrangement Identifier (SyRI). SyRI is a comprehensive tool for predicting genomic differences between related genomes using whole-genome assemblies (WGA). The assemblies are aligned using whole-genome alignment tools, and these alignments are then used as input to SyRI.
versions available: 1.4, 1.6.3

tensorflow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
versions available: 2.9.1-cuda11.2, 2.11.1-cuda11.7, 2.16.1-cuda12.5

tensorrt
NVIDIA TensorRT is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
versions available: 7.2.2.3-cuda11.2, 8.2.5.1-cuda11.4

tophat
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
versions available: 2.1.1

transdecoder
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using TopHat and Cufflinks.
versions available: 5.5.0

treetime
TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.
versions available: 0.9.4

trimal
trimAl is a tool for automated alignment trimming. trimAl reads and renders protein or nucleotide alignments in several Multiple Sequence Alignment formats, including Phylip, Fasta, Clustal, NBRF/Pir, Mega and Nexus. The program automatically detects the input format and generates the output file in the same format.
versions available: 1.4.1

trimmomatic
Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application.
versions available: 0.38, 0.39

trinity
Trinity assembles transcript sequences from Illumina RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
versions available: 2.8.5, 2.13.0, 2.14.0

trinotate
Trinotate is a suite for the functional annotation of transcriptomes, particularly de novo assembled transcriptomes. It uses a number of different well-referenced methods for functional annotation, including homology search against sequence databases (BLAST+/SwissProt), protein domain identification (HMMER/PFAM), and comparison to currently curated annotation databases (such as eggNOG and Gene Ontology terms).
versions available: 3.2.2

trycycler
Trycycler is a tool that takes as input multiple separate long-read assemblies of the same genome (e.g. from different assemblers or different read subsets) and produces a consensus long-read assembly.
versions available: 0.5.4

usearch
USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST. USEARCH combines many different algorithms into a single package.
versions available: 10.0.240, 11.0.667

vcftools
VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
versions available: 0.1.16

vdb
The vdb program is designed to query the SARS-CoV-2 mutational landscape. It runs as a command shell in a terminal, and it allows customized searches for mutation patterns over the entire SARS-CoV-2 genome dataset or subsets thereof. These pattern searches can be for spike protein mutations or nucleotide mutations over the whole genome.
versions available: 2.7

velvet
Velvet is a sequence assembler for very short reads.
versions available: 1.2.10

viennarna
The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
versions available: 2.4.13

visit
VisIt is an Open Source, interactive, scalable, visualization, animation and analysis tool. Users can interactively visualize and analyze data ranging in scale from small (<10 core) desktop-sized projects to large (>10,000 core) leadership-class computing facility simulation campaigns.
versions available: 3.2.0

vmd
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. (1.9.3 x86_64 64-bit, CUDA 8.0, SSE and AVX2, OpenGL)
versions available: 1.9.3-cuda8-opengl, 1.9.3-text

vtk
The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, modeling, image processing, volume rendering, scientific visualization, and information visualization.
versions available: 8.2.0, 8.2.0-mpi

wengan
Wengan is a new, accurate, and ultra-fast genome assembler that, unlike most of the current long-reads assemblers, avoids entirely the all-vs-all read comparison. The key idea behind Wengan is that long-read alignments can be inferred by building paths on a sequence graph.
versions available: 0.2

wise2
Wise2 is a package focused on comparisons of biopolymers, commonly DNA sequence and protein sequence. Wise2 is now a rather stately bioinformatics package that has been around for a while. Its key programs are genewise, a program for aligning proteins or protein HMMs to DNA, and dynamite, a rather cranky ‘macro language’ which automates the production of dynamic programming.
versions available: 2.4.1

wrf
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs.
versions available: 4.0.1-intel, 4.0.1-intel-mpi, 4.3, 4.3-mpi

wtdbg
wtdbg is a fuzzy Bruijn graph (FBG) approach to long noisy reads assembly. wtdbg is designed to assemble huge genomes in very limited time; it requires a powerful machine with multiple cores and very large RAM (1 TB+). wtdbg can assemble a 100X human PacBio dataset within one day.
versions available: 2.3, 2.5

yade
Yade is an extensible open-source framework for discrete numerical models, focused on the Discrete Element Method. The computation parts are written in C++ using a flexible object model, allowing independent implementation of new algorithms and interfaces. Python is used for rapid and concise scene construction, simulation control, postprocessing and debugging.
versions available: 2020.01a, 2021.01a

Compilers / Interpreters

anaconda2
Anaconda (python 2.7-based) is the world’s most popular Python data science platform. Anaconda, Inc. continues to lead open source projects like Anaconda, NumPy and SciPy that form the foundation of modern data science. Load this module for CPU ONLY (NON-GPU) compute jobs.
versions available: 2019.10

anaconda3
Anaconda (python 3.8-based) is the world’s most popular Python data science platform. Anaconda, Inc. continues to lead open source projects like Anaconda, NumPy and SciPy that form the foundation of modern data science. Load this module for CPU ONLY (NON-GPU) compute jobs.
versions available: 2020.11, 2022.10

bazel
Bazel is Google’s own build tool. Bazel has built-in support for building both client and server software, and also provides an extensible framework that you can use to develop your own build rules.
versions available: 3.1.0, 4.2.1, 5.0.0

cuda
The NVIDIA CUDA Toolkit provides a development environment for creating high performance GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers.
versions available: 11.2, 11.4, 12.0

dmd
DMD is the reference compiler for the D programming language. The D programming language has been said to be ‘what C++ wanted to be,’ which is a better C. D is developed with system level programming in mind, but brings to the table modern language design with a simple C-like syntax. For these reasons D makes for a good language choice for both performance code and application development.
versions available: 2.103.1

f5c
An optimised re-implementation of the index, call-methylation and eventalign modules in Nanopolish. Given a set of basecalled Nanopore reads and the raw signals, f5c call-methylation detects the methylated cytosine and f5c eventalign aligns raw nanopore signals (events) to the reference k-mers.
versions available: 1.3

gcc
The GNU Compiler Collection includes front ends for C, C++, Objective-C, and Fortran, as well as libraries for these languages (libstdc++, libgcj,…).
versions available: 10.3.0, 11.2.0

go
Go is expressive, concise, clean, and efficient. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code yet has the convenience of garbage collection and the power of run-time reflection. It’s a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.
versions available: 1.14.2, 1.16.4

haskell
Haskell is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic IO. Haskell’s main implementation is the Glasgow Haskell Compiler (GHC). It is named after logician Haskell Curry.
versions available: 9.2.7, 9.6.1

hpc-sdk
The NVIDIA HPC Software Development Kit (SDK) includes the proven compilers, libraries and software tools essential to maximizing developer productivity and the performance and portability of HPC applications. The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and CUDA. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming.
versions available: 21.3, 21.3-mpi

intel
Intel’s suite of compilers facilitates native code development in C++/C and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.
versions available: compiler-rt, mkl, oclfpga, tbb, 2019, 2020, 2024

julia
Julia is a high-level, high-performance dynamic programming language for numerical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.
versions available: 1.6.0

lua
Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.
versions available: 5.3.5, 5.4.2

mambaforge
Mamba is a reimplementation of the conda package manager in C++, which uses libsolv for much faster dependency solving and allows parallel downloading of repository data and package files using multi-threading. Mamba utilizes the same command line parser, package installation and deinstallation code and transaction verification routines as conda to stay as compatible as possible.
versions available: 4.14

nasm
The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed for portability and modularity. It supports a range of object file formats, including Linux and *BSD a.out, ELF, COFF, Mach-O, 16-bit and 32-bit OBJ (OMF) format, Win32 and Win64.
versions available: 2.14

openjdk
OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GNU General Public License (GNU GPL) version 2.
versions available: 11, 13, 15

powershell
PowerShell is a cross-platform task automation solution made up of a command-line shell, a scripting language, and a configuration management framework. PowerShell is a modern command shell that includes the best features of other popular shells. Unlike most shells that only accept and return text, PowerShell accepts and returns .NET objects.
versions available: 7.3.1

scala
Scala is short for ‘Scalable Language’. Scala is a pure-bred object-oriented language. Conceptually, every value is an object and every operation is a method-call. The language supports advanced component architectures through classes and traits.
versions available: 2.12.13, 2.13.5

swift
Swift is a general-purpose programming language built using a modern approach to safety, performance, and software design patterns. The goal of the Swift project is to create the best available language for uses ranging from systems programming, to mobile and desktop apps, scaling up to cloud services.
versions available: 5.6.1

upcxx
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming, and is designed to interoperate smoothly and efficiently with MPI, OpenMP, C++/POSIX threads, CUDA, ROCm/HIP, oneAPI and other HPC frameworks. It leverages GASNet-EX to deliver low-overhead, fine-grained communication, including Remote Memory Access (RMA) and Remote Procedure Call (RPC).
versions available: 2023.9.0

yasm
YASM, an assembler and disassembler for the Intel x86 architecture, is a complete rewrite of the NASM assembler. YASM currently supports the x86 and AMD64 instruction sets, accepts NASM and GAS assembler syntaxes, outputs binary, ELF32, ELF64, 32 and 64-bit Mach-O, RDOFF2, COFF, Win32, and Win64 object formats, and generates source debugging information in STABS, DWARF 2, and CodeView 8 formats.
versions available: 1.3.0

Libraries

boost
Boost is a set of libraries for the C++ programming language that provide support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing.
versions available: 1.66.0, 1.66.0-mpi, 1.76.0, 1.76.0-mpi

cdo
Climate Data Operators (CDO) is a collection of command line Operators to manipulate and analyse Climate and NWP model Data. Supported data formats are GRIB 1/2, netCDF 3/4, SERVICE, EXTRA and IEG. There are more than 600 operators available.
versions available: 2.0.5, 2.0.5-intel

cudnn
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
versions available: 8.1.0-cuda11.2, 8.2.4-cuda11.4

eigen
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
versions available: 3.4.0

fftw
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
versions available: 3.3.10, 3.3.10-mpi

google-code
Various Google codes, including Gflags v2.2.2 (Google’s commandline flags library); Glog v0.4.0 (C++ implementation of the Google logging module); LevelDB v1.23 (a fast key-value storage library); Protocol Buffers v3.15.8 (Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data).
versions available: 2021

hdf5
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
versions available: 1.10.5, 1.10.5-intel, 1.10.5-mpi, 1.10.7, 1.10.7-intel, 1.10.7-intel-mpi, 1.10.7-mpi

intel-rtl
Intel runtime libraries
versions available: 2019, 2020

netcdf
NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
versions available: 4.6.3, 4.6.3-mpi, 4.7.2, 4.7.2-mpi, 4.8.1, 4.8.1-intel, 4.8.1-intel-mpi, 4.8.1-mpi

openblas
OpenBLAS is an optimized, open-source implementation of the Basic Linear Algebra Subprograms (BLAS) library, based on the GotoBLAS2 1.13 BSD version.
versions available: 0.3.13

root
A modular scientific software framework. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage.
versions available: 6.12.04, 6.22.08

suitesparse
SuiteSparse is a suite of sparse matrix algorithms, including GraphBLAS, Mongoose, ssget, UMFPACK, CHOLMOD, SPQR, KLU and BTF, CSparse and CXSparse, spqr_rank, Factorize, SSMULT, SFMULT, and ordering methods (AMD, CAMD, COLAMD, and CCOLAMD); AMD and COLAMD appear in MATLAB.
versions available: 5.7.2, 5.9.0

MPI (Message Passing Interface)

mpich
MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (MPI-1, MPI-2 and MPI-3).
versions available: 3.2.1, 3.3.1, 3.4.1

openmpi
The Open MPI Project is an open source Message Passing Interface (MPI) implementation that is developed and maintained by a consortium of academic, research, and industry partners.
versions available: 3.1.6, 3.1.6-intel, 4.0.3, 4.0.3-intel, 4.1.0, 4.1.0-intel