.. _eMicroApps:

Available Environmental Microbiology ("eMicro") Apps and Tools
==============================================================

Below is a list and description of the apps available to anyone on OSC. Please keep in mind that this list is not 100% comprehensive and *does not* detail the methods underlying each tool. Where possible, citations have been included so users can read the original source's documentation and theory.

**Always check for the latest versions of Singularity containers and modules!**

Some of this documentation is lifted from the `iVirus project `_ to avoid reinventing the wheel. Every effort is being made to ensure that **both** locations are up to date with the latest tools and literature.

**One last thing to note**: all of the eMicro Singularity images are located at ``/users/PAS1117/osu9664/eMicro-Apps/``. Additionally, Microbial Informatics students can find further images at ``/fs/project/PAS1573/sif/``.

You must provide the full path to each image/container, or link it (see :ref:`UNIX_LINUX`).

**Example**:

.. code-block:: bash

    $ module load singularity/current
    $ singularity run /users/PAS1117/osu9664/eMicro-Apps/Prokka-1.12.0.img -h
    # Alternatively
    $ singularity run /fs/project/PAS1573/sif/fastqc_0.11.9--hdfd78af_1.sif

Alternatively, if you do not want to type out the full path each time you run a container, add the container location to your PATH:

.. code-block:: bash

    $ export PATH=/users/PAS1117/osu9664/eMicro-Apps/:$PATH
    $ Prokka-1.12.0.img -h
    # OR, alternatively
    $ export PATH=/fs/project/PAS1573/sif/:$PATH
    $ fastqc_0.11.9--hdfd78af_1.sif

**Keep in mind that NONE of these apps/tools should be run on the login nodes. Please create a job script and submit it or incur OSC's wrath!**

Also to note: there are several cases where these tools have been used in the `CyVerse cyberinfrastructure `_. For these, there is a `protocols.io `_ link.
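As a minimal illustration of the job-script requirement above, here is a sketch of a batch script, assuming OSC's Slurm scheduler; the job name, account, resource requests, and input file are all placeholders to adapt to your own project:

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=fastqc_example      # placeholder job name
    #SBATCH --account=PAS1117              # replace with your OSC project
    #SBATCH --time=01:00:00                # walltime request
    #SBATCH --nodes=1 --ntasks-per-node=4  # cores for the tool's threads

    module load singularity/current

    # Run the container by its full path, exactly as in the examples above
    # (reads.fastq.gz is a placeholder input file)
    singularity run /fs/project/PAS1573/sif/fastqc_0.11.9--hdfd78af_1.sif \
        --threads 4 reads.fastq.gz

Save it (e.g. as ``job.sh``) and submit with ``sbatch job.sh``; check OSC's own batch documentation for the directives your cluster expects.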
We're continually developing these protocols and trying to keep them up to date (though if a protocol still works with current versions, it is unlikely to change), so always make sure you have the latest version.

**For Sullivan lab members, these tools are also available through the OSC module system:**

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles

    # Load the Sullivan lab's modules
    module load Prokka/1.13
    # OR
    module load Prokka/1.14.6

    prokka -h

Quality Control [of Reads] and Read Mapping
--------------------------------------------

Generally speaking, quality control (QC) is a technique applied [most commonly] to raw read data. This ensures that the data going into the assembly (a common next step) is of high quality. Poor read quality can result in misassembled sequences. Most frequently, read-data QC involves trimming reads according to their quality scores. Although some assemblers do not require QC'd reads, we highly recommend it!

BBTools
~~~~~~~

**Reference**: http://sourceforge.net/projects/bbmap/

**Reference** (BBMerge): Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge – Accurate paired shotgun read merging via overlap. PLOS ONE, 12(10), e0185056. https://doi.org/10.1371/journal.pone.0185056

**Short description**: BBTools is a suite of fast, multi-threaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, and fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.

**Note**: This is SEVERAL tools; BBDuk (discussed below) is just one of them. We'll be working on detailing this here, but in the meantime, any tool available on https://jgi.doe.gov/data-and-tools/bbtools/ is available through this image.

**Singularity use**:
.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/BBTools-38.97.sif

    # For PAS1117
    module use /fs/project/PAS1117/modulefiles
    module load singularityImages
    BBTools-38.97.sif

BBDuk (in the BBTools package)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Website**: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/

**Short description**: "Duk" stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass.

**Singularity use**:

.. code-block:: bash

    module load singularity/current

    # Just adapter trimming
    singularity run /users/PAS1117/osu9664/eMicro-Apps/BBTools-38.69.sif bbduk.sh in1=<reads_1.fastq> in2=<reads_2.fastq> out1=<trimmed_1.fastq> out2=<trimmed_2.fastq> ref=/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

    # Just quality filtering
    singularity run /users/PAS1117/osu9664/eMicro-Apps/BBTools-38.69.sif bbduk.sh in1=<reads_1.fastq> in2=<reads_2.fastq> qtrim=rl trimq=10 out1=<trimmed_1.fastq> out2=<trimmed_2.fastq>

Alternatively, run them both at the same time!

.. code-block:: bash

    # Adapter and quality filtering *at the same time*
    singularity run /users/PAS1117/osu9664/eMicro-Apps/BBTools-38.69.sif bbduk.sh in1=<reads_1.fastq> in2=<reads_2.fastq> out1=<trimmed_1.fastq> out2=<trimmed_2.fastq> ref=/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo trimq=10 qtrim=rl minlength=35

BWA
~~~

**Website**: https://github.com/lh3/bwa

**Reference**: Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–60 (2009).

**Short description**: BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
It consists of three algorithms: BWA-backtrack, BWA-SW, and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features such as support for long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load bwa/0.7.17-r1198

FastQC
~~~~~~~

**Website**: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

**Short description**: FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/FastQC-0.11.8.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load fastqc/0.11.5

**Module use (directly from OSC)**:

.. code-block:: bash

    module load fastqc/0.11.8

Kraken2
~~~~~~~

**Website**: https://github.com/DerrickWood/kraken2

**Website**: https://ccb.jhu.edu/software/kraken2/

**Manual**: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown

**Reference**: Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0

**Short description**: Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds.
This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing that k-mer. The k-mer assignments inform the classification algorithm.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Kraken-2.1.2.sif

    # To run against the standard database
    # For PAS1573
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Kraken-2.1.2.sif --db /fs/project/PAS1573/modules/sequence_dbs/kraken2_dbs/standard --gzip-compressed --paired --classified-out Reads_R#.fastq.gz Reads_1.fastq.gz Reads_2.fastq.gz > kraken2_results

    # To run against the standard database
    # For PAS1117
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Kraken-2.1.2.sif --db /fs/project/PAS1117/modules/sequence_dbs/kraken2_dbs/standard --gzip-compressed --paired --classified-out Reads_R#.fastq.gz Reads_1.fastq.gz Reads_2.fastq.gz > kraken2_results

**Note**: Please check the kraken2_dbs folder for additional databases!

MultiQC
~~~~~~~

**Website**: https://multiqc.info/

**Reference**: Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354

**Short description**: MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MultiQC-1.7.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MultiQC

NanoFilt
~~~~~~~~

**Website**: https://github.com/wdecoster/nanofilt

**Short description**: Filtering and trimming of long-read sequencing data.

**Reference**: De Coster, W., D'Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C.
NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018). https://doi.org/10.1093/bioinformatics/bty149

**Singularity use**: Forthcoming...

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load Nanofilt/2.8.0

QUAST/MetaQUAST
~~~~~~~~~~~~~~~

**Website**: http://quast.sourceforge.net/

**Manual**: http://cab.cc.spbu.ru/quast/manual.html

**Short description**: The project aim is to create easy-to-use tools for evaluating and comparing genome assemblies.

**Reference**: Gurevich, A., Saveliev, V., Vyahhi, N., & Tesler, G. (2013). QUAST: Quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072–1075. https://doi.org/10.1093/bioinformatics/btt086

**Reference (using v4.x)**: Mikheenko, A., Valin, G., Prjibelski, A., Saveliev, V., & Gurevich, A. (2016). Icarus: Visualizer for de novo assembly evaluation. Bioinformatics, 32(21), 3321–3323. https://doi.org/10.1093/bioinformatics/btw379

**Reference (using v5.x)**: Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D., & Gurevich, A. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142–i150. https://doi.org/10.1093/bioinformatics/bty266

**Singularity use**:

.. code-block:: bash

    export SIF=/fs/project/PAS1573/sif

    # QUAST
    $SIF/quast.py contigs_1.fasta contigs_2.fasta --threads 48

    # MetaQUAST
    $SIF/metaquast.py contigs_1.fasta contigs_2.fasta ... --threads 48

    # MetaQUAST can optionally be run with a list of reference genomes
    $SIF/metaquast.py contigs_1.fasta contigs_2.fasta ... -r reference_1,reference_2,reference_3,... --threads 48

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load quast/4.5

Samtools
~~~~~~~~

**Website**: http://www.htslib.org/

**Reference**: Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, 1–4 (2021).
**Short description**: Samtools is a suite of programs for interacting with high-throughput sequencing data.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load samtools/1.10

SAMBAMBA
~~~~~~~~

**Website**: https://github.com/lomereiter/sambamba

**Reference**:

**Short description**: Sambamba is a high-performance, highly parallel, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, Sambamba is an important workhorse running in many sequencing centres around the world today.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load SAMBAMBA/0.7.1

Trimmomatic
~~~~~~~~~~~

**Reference**: Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.

**Short description**: Identifies adapter sequences in raw sequencing reads and performs quality filtering.

**Protocols.io**: `Trimmomatic on CyVerse `_

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Trimmomatic-0.36.0.img PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

**Module use**:
.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load trimmomatic/0.36-sulli
    trimmomatic PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

**Notes**: Trimmomatic is a Java jar file and *normally* needs to be executed with "java -jar trimmomatic.jar [commands]", but a tiny bash script has been written to automate this, which is why you can call "trimmomatic" without the Java component.

Assembly
--------

gsAssembler (aka Newbler)
~~~~~~~~~~~~~~~~~~~~~~~~~

**Reference**: Genivaldo G. Z. Silva, Bas E. Dutilh, David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A. Dinsdale, Robert A. Edwards. "Combining de novo and reference-guided assembly with scaffold_builder". Source Code for Biology and Medicine 8(23). doi:10.1186/1751-0473-8-23.

**Short description**: De novo assembly based on overlap-layout-consensus.

**Notes on use**: 454 Life Sciences was purchased by Roche in 2007 and shut down in 2013. There haven't been **any** updates to the software since then, making it an increasingly aging tool.

**Singularity use**: We provide several versions of the tool on OSC, but please use the latest version unless you have a good reason otherwise (i.e. reproducing previous results). These are 2.3 and 2.5.

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Newbler-2.9.img -o output_dir /path/to/sff/file

The singularity container *does contain* the mapper, but for all intents and purposes, the tool uses runAssembly.

SPAdes
~~~~~~

**Reference**: Bankevich A., Nurk S., Antipov D., Gurevich A., Dvorkin M., Kulikov A. S., Lesin V., Nikolenko S., Pham S., Prjibelski A., Pyshkin A., Sirotkin A., Vyahhi N., Tesler G., Alekseyev M. A., Pevzner P. A.
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology, 2012.

**Short description**: SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.

**Protocols.io**: `Running SPAdes on CyVerse `_

**Notes on use**: SPAdes, as with many de Bruijn assemblers, can consume incredible amounts of memory. In the context of viral metagenomics, it's been known to use 2-3 TB, and upwards of 6 TB of memory (and more if you give it more data!). There are multiple implementations on OSC using different runtimes and memory allocations.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/SPAdes-3.15.5.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load spades/3.15.2

IDBA-UD
~~~~~~~

**Reference**: Peng, Y., et al. (2010) IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.

**Reference**: Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28, 1420-1428.

**Short description**: IDBA-UD is an iterative de Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth. It is an extension of the IDBA algorithm.

**Long description**: IDBA-UD is an iterative de Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth. It is an extension of the IDBA algorithm. IDBA-UD also iterates from small k to a large k. In each iteration, short and low-depth contigs are removed iteratively with a cutoff threshold from low to high to reduce the errors in low-depth and high-depth regions. Paired-end reads are aligned to contigs and assembled locally to generate some missing k-mers in low-depth regions.
With these technologies, IDBA-UD can iterate the k value of the de Bruijn graph to a very large value with fewer gaps and fewer branches to form long contigs in both low-depth and high-depth regions. (taken from the website)

**Singularity use**:

.. code-block:: bash

    singularity run /users/PAS1117/osu9664/eMicro-Apps/IDBA-UD-1.1.3.sif --num_threads <threads> -r <reads.fasta> -o <output_dir>

Trinity
~~~~~~~

**Reference**: Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol. 2011 May 15;29(7):644-52. doi: 10.1038/nbt.1883. PubMed PMID: 21572440.

**Short description**: Trinity assembles transcript sequences from Illumina RNA-Seq data.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Trinity-2.9.0.sif

MEGAHIT
~~~~~~~

**Reference**: Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2014). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), 1674–1676. https://doi.org/10.1093/bioinformatics/btv033

**Short description**: MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single-genome assembly (small or mammalian size) and single-cell assembly.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MEGAHIT/1.2.9

**Singularity use**:
.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MEGAHIT-1.2.8.sif --k-list 21,41,61,81,99 -t <threads> -m 0.9 -1 <reads_1.fastq> -2 <reads_2.fastq> -o <output_dir> --presets meta-sensitive

Use ``--presets meta-large`` instead for complex metagenomes like soils or oceans.

Binning
-------

MetaBAT2
~~~~~~~~

**Website**: https://bitbucket.org/berkeleylab/metabat

**Reference**: Kang, D. D., Froula, J., Egan, R., & Wang, Z. (2015). MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ, 3(8), e1165. https://doi.org/10.7717/peerj.1165

**Short description**: A robust statistical framework for reconstructing genomes from metagenomic data.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MetaBAT2-2.14.sif

    # Download test data (instructions from https://bitbucket.org/berkeleylab/metabat/wiki/Best%20Binning%20Practices)
    wget https://portal.nersc.gov/dna/RD/Metagenome_RD/MetaBAT/Files/BestPractices/V2/CASE1/assembly.fa.gz
    wget https://portal.nersc.gov/dna/RD/Metagenome_RD/MetaBAT/Files/BestPractices/V2/CASE1/depth.txt

    # Run MetaBAT2
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MetaBAT2-2.14.sif -i assembly.fa.gz -a depth.txt -o resA1/bin -v

MaxBin2
~~~~~~~

**Website**: https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html

**Website (alt)**: https://sourceforge.net/projects/maxbin/

**Reference** (MaxBin1): Wu, Y.-W., Tang, Y.-H., Tringe, S. G., Simmons, B. A., & Singer, S. W. (2014). MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome, 2(1), 26. https://doi.org/10.1186/2049-2618-2-26

**Reference** (MaxBin2): Yu-Wei Wu, Blake A. Simmons, Steven W.
Singer, MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets, Bioinformatics, Volume 32, Issue 4, 15 February 2016, Pages 605–607, https://doi.org/10.1093/bioinformatics/btv638

**Short description**: MaxBin2 is the next generation of MaxBin, supporting multiple samples at the same time. MaxBin is software for binning assembled metagenomic sequences based on an expectation-maximization algorithm. Users can recover the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and the read coverage information or sequencing reads. For users' convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content, and genome size, in the binning summary page. Users can run MEGAN or similar software on MaxBin bins to find out the taxonomy of each bin after the binning process is finished.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run MaxBin2-2.2.6.sif

    # Download test data
    wget -O 20x.scaffold https://downloads.jbei.org/data/microbial_communities/MaxBin/getfile.php?20x.scaffold
    wget -O 20x.abund https://downloads.jbei.org/data/microbial_communities/MaxBin/getfile.php?20x.abund

    # Run MaxBin2
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MaxBin2-2.2.6.sif -contig 20x.scaffold -abund 20x.abund -out 20x.out -thread 4

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MaxBin/2.2.6

CONCOCT
~~~~~~~

**Website**: https://concoct.readthedocs.io/en/latest/

**Reference**: Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., … Quince, C. (2013). CONCOCT: Clustering cONtigs on COverage and ComposiTion, 1–28. Retrieved from http://arxiv.org/abs/1312.4038

**Short description**: CONCOCT "bins" metagenomic contigs.
Metagenomic binning is the process of clustering sequences into clusters corresponding to operational taxonomic units of some level.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/CONCOCT-1.1.0.sif

See :ref:`processing_microbe` for a more detailed explanation of usage.

MetaWRAP
~~~~~~~~

**Website**: https://github.com/bxlab/metaWRAP

**Reference**: Uritskiy, G. V., DiRuggiero, J., & Taylor, J. (2018). MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome, 6(1), 158. https://doi.org/10.1186/s40168-018-0541-1

**Short description**: MetaWRAP aims to be an easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish: read quality control, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation. Additionally, metaWRAP takes bin extraction and analysis to the next level (see module overview below). While there is no single best approach for processing metagenomic data, metaWRAP is meant to be a fast and simple approach to use before you delve deeper into parameterization of your analysis. MetaWRAP can be applied to a variety of environments, including gut, water, and soil microbiomes (see the metaWRAP paper for benchmarks). Each individual module of metaWRAP is a standalone program, which means you can use only the modules you are interested in for your data.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load metaWRAP

DAS_Tool
~~~~~~~~

**Website**: https://github.com/cmks/DAS_Tool

**Reference**: Sieber, C. M. K., Probst, A. J., Sharrar, A., Thomas, B. C., Hess, M., Tringe, S. G., & Banfield, J. F. (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology, 3(7), 836–843.
https://doi.org/10.1038/s41564-018-0171-1

**Short description**: DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/DAS_Tool-1.1.1.sif

    # You can test the installation (if you've git cloned the repository!)
    git clone https://github.com/cmks/DAS_Tool.git
    singularity run /users/PAS1117/osu9664/eMicro-Apps/DAS_Tool-1.1.1.sif -i DAS_Tool/sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,DAS_Tool/sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,DAS_Tool/sample_data/sample.human.gut_metabat_scaffolds2bin.tsv,DAS_Tool/sample_data/sample.human.gut_tetraESOM_scaffolds2bin.tsv -l concoct,maxbin,metabat,tetraESOM -c DAS_Tool/sample_data/sample.human.gut_contigs.fa --search_engine diamond -o DASToolTestRun --write_bins

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load DAS_Tool

UniteM
~~~~~~

**Website**: https://github.com/dparks1134/UniteM

**Reference**: https://github.com/dparks1134/UniteM (cite the repository)

**Short description**: UniteM is a software toolkit implementing different ensemble binning strategies for producing a non-redundant set of bins from the output of multiple binning methods.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load uniteM

Gene Callers
------------

FragGeneScan
~~~~~~~~~~~~

**Reference**: Mina Rho, Haixu Tang, and Yuzhen Ye. FragGeneScan: Predicting Genes in Short and Error-prone Reads. Nucl. Acids Res., 2010. doi: 10.1093/nar/gkq747

**Short description**: FragGeneScan is an application for finding (fragmented) genes in short reads.

**Singularity use**:
.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/FragGeneScan-1.30.0.img

Prodigal
~~~~~~~~

**Reference**: Hyatt, D. Prodigal (2.6.3) [Software]. Available at https://github.com/hyattpd/Prodigal

**Short description**: Fast, reliable protein-coding gene prediction for prokaryotic genomes.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Prodigal-2.6.3.img -i metagenome.fna -o coords.gbk -a proteins.faa -p anon

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load prodigal/2.6.3
    prodigal -i metagenome.fna -o coords.gbk -a proteins.faa -p anon

MetaGeneAnnotator ("MGA")
~~~~~~~~~~~~~~~~~~~~~~~~~

**Reference**: Noguchi, H., Taniguchi, T., & Itoh, T. (2008). MetaGeneAnnotator: Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene Prediction in Anonymous Prokaryotic and Phage Genomes. DNA Research, 15(6), 387–396. https://doi.org/10.1093/dnares/dsn027

**Short description**: MetaGeneAnnotator is a gene-finding program for prokaryotes and phage.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/MetaGeneAnnotator-1.1.0.img

MetaGeneMark
~~~~~~~~~~~~

**Website**:

**Reference**: Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, 1–15 (2010).

**Short description**: ORF prediction...

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MetaGeneMark/3.38
    gmhmmp

Annotation and Analyses
-----------------------

This is a catch-all category for tools that don't fit in the other sections.
CAT
~~~

**Reference**: https://github.com/dutilh/CAT

**Short description**: Contig Annotation Tool (CAT) is a pipeline for the taxonomic classification of long DNA sequences and metagenome-assembled genomes (MAGs/bins) of both known and (highly) unknown microorganisms, as generated by contemporary metagenomics studies.

**Other notes**: There are two versions of CAT: a pre-4.x version ("1.0.0") and a post-4.x version ("4.3.3"). The new one is superior in all aspects except database setup. The paths provided will not work unless you have the appropriate databases installed.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/CAT-4.3.3.simg contigs -c {contigs fasta} -d 2019-03-31_CAT_database -t 2019-03-31_taxonomy

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load CAT/4.3.3

Centrifuge
~~~~~~~~~~

**Website**: http://www.ccb.jhu.edu/software/centrifuge

**Reference**: Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Research, 26(12), 1721–1729. https://doi.org/10.1101/gr.210641.116

**Short description**: [Centrifuge] is a novel microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.7 GB for all complete bacterial and viral genomes plus the human genome) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes.
Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Centrifuge-X.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load centrifuge/1.0.3

CheckM
~~~~~~

**Website**: https://github.com/Ecogenomics/CheckM

**Reference**: Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043–1055.

**Short description**: CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/CheckM-1.0.18.sif

Diamond
~~~~~~~

**Reference**: B. Buchfink, C. Xie, D. Huson, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015).

**Short description**: DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Diamond-0.9.26.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load diamond/0.9.24
    # OR
    module load diamond/2.0.5

Prokka
~~~~~~

**Reference**: Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9.
PMID:24642063

**Short description**: Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Prokka-1.12.0.img

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load Prokka/1.13

InterProScan
~~~~~~~~~~~~

**Website**: https://github.com/ebi-pf-team/interproscan

**Reference**: Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).

**Short description**: InterPro is a database which integrates predictive information about proteins' function from a number of partner resources, giving an overview of the families that a protein belongs to and the domains and sites it contains. Users who have novel nucleotide or protein sequences that they wish to functionally characterise can use the software package InterProScan to run the scanning algorithms from the InterPro database in an integrated way. Sequences are submitted in FASTA format. Matches are then calculated against all of the required member databases' signatures and the results are then output in a variety of formats.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load InterProScan/5.36-75.0
    interproscan.sh

SortMeRNA
~~~~~~~~~

**Website**: https://github.com/biocore/sortmerna

**Reference**: Kopylova, E., Noé, L. & Touzet, H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).

**Short description**: SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering. The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data.
SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. Additional applications include clustering and taxonomy assignation, available through QIIME v1.9.1. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load SortMeRNA/4.2.0
    sortmerna -h

PROSITE
~~~~~~~

**Website**:

**Reference**:

**Short description**:

**Module use**:

.. code-block:: bash

    module load PROSITE/1.86
    ps_scan.pl

HH-Suite
~~~~~~~~

**Website**: https://github.com/soedinglab/hh-suite

**Reference**: Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20, 473 (2019).

**Short description**: The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load hhsuite/3.2.0

MINCED
~~~~~~

**Website**: https://github.com/ctSkennerton/minced

**Reference**: Bland, C. et al. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007).

**Short description**: MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as assembled contigs from metagenomes. If you want to identify CRISPRs in raw short read data in the 100–200 bp size range, try using Crass (https://github.com/ctskennerton/Crass). MinCED runs from the command line and was derived from CRT (http://www.room220.com/crt/).

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load minced/1.0.0

Clust
~~~~~

**Website**: https://github.com/baselabujamous/clust

**Reference**: Abu-Jamous, B., & Kelly, S. (2018). Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data. Genome Biology, 19(1), 172. https://doi.org/10.1186/s13059-018-1536-8

**Short description**: Clust is a fully automated method for identification of clusters (groups) of genes that are consistently co-expressed (well-correlated) in one or more heterogeneous datasets from one or multiple species.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/clust-1.8.9.img data_path -o output_directory [...]

Please do read the extensive documentation on the Clust GitHub page.

BamM
~~~~

**Website**: http://ecogenomics.github.io/BamM/

**Short description**: Metagenomics-focused BAM file manipulation.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/BamM-1.7.0.sif

**Note**: BamM is no longer actively maintained. CoverM is a direct replacement.

CoverM
~~~~~~

**Website**: https://github.com/wwood/CoverM

**Short description**: CoverM aims to be a configurable, easy-to-use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications. CoverM calculates coverage of genomes/MAGs (``coverm genome``) or individual contigs (``coverm contig``). Calculating coverage by read mapping, its input can be either BAM files sorted by reference, or raw reads and reference FASTA sequences.

**Singularity use**: For a directory of genome bins (each fasta file is a bin, all files having the "fna" extension) and the original fastq files used in the assembly...

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/CoverM-0.6.1.sif genome --genome-fasta-directory -x fna --coupled --output-format sparse --min-read-percent-identity .95 --min-read-aligned-percent .75 --min-covered-fraction .75 > coverage_table.csv

GraftM
~~~~~~~

**Website**: https://github.com/geronimp/graftM

**Reference**: Boyd, J. A., Woodcroft, B. J., & Tyson, G. W. (2018). GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes. Nucleic Acids Research, 46(10), e59. https://doi.org/10.1093/nar/gky174

**Short description**: GraftM is a tool for finding genes of interest in metagenomes, metatranscriptomes, and whole genomes. Using modular gene packages, GraftM will search the provided sequences using hmmsearch (HMMER) and place the identified sequences into a pre-constructed phylogenetic tree. This provides fast, phylogenetically informed community profiles and genome annotations.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/GraftM-0.10.1.img

The latest version is 0.13.1. This will be updated.

Read2RefMapper
~~~~~~~~~~~~~~

**Website**: https://bitbucket.org/bolduc/docker-read2refmapper

**Protocols.io**: `Read2Ref on CyVerse `_

**CyVerse App**: https://de.cyverse.org/de/?type=apps&app-id=Read2RefMapper-1.1.0u3&system-id=agave

**Short description**: Read2RefMapper is a Python wrapper for a number of scripts and tools that allow for filtering coverage of BAM files against a reference dataset. It filters reads matching reference sequences for those references that are not covered over a specified threshold length, as well as by alignment identity and alignment coverage. It is designed to be used in conjunction with Docker-BatchBowtie.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/Read2RefMapper-1.1.1.simg --dir ${readsDir} --metagenome-sizes reads2refmapper_mysample.csv --num-threads 40 --coverages coverage_table.csv --cov_filter 70 --percent-id 0.95 --percent-aln 0.75 --coverage-mode tpmean --output-fmt png --dpi 300 --log read2refmapper.log

ClusterGenomes
~~~~~~~~~~~~~~

**Website**: https://bitbucket.org/MAVERICLab/stampede-clustergenomes/

**Short description**: ClusterGenomes is a nucmer-based tool designed to cluster viral genomes. It can handle circular and short sequences with high accuracy.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    # Dereplicate
    singularity run /users/PAS1117/osu9664/eMicro-Apps/ClusterGenomes-1.1.3.img -f -c -i -o

Note: Both coverage and identity are 0–100, *not* 0.0–1.0.

DRAM
~~~~

**Website**: https://github.com/shafferm/DRAM

**Short description**: DRAM (Distilled and Refined Annotation of MAGs [Metagenome Assembled Genomes]) is a tool for annotating metagenome-assembled genomes and VirSorter-identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM (https://pfam.xfam.org/), dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database, as well as custom user databases. DRAM is run in two stages. Additionally, viral contigs are further analyzed to identify potential AMGs. This is done by assigning an auxiliary score and flags representing the likelihood that a gene is metabolic and viral. The auxiliary score represents the confidence that a gene is viral in origin based on surrounding genes.

**Module use**: (This is always the most up-to-date version, barring the Wrighton lab's constant updates!)

.. code-block:: bash

    # For PAS1117
    module use /fs/project/PAS1117/modulefiles
    module load DRAM
    DRAM.py annotate -i '/*.fa' -o annotation
    DRAM.py distill -i annotation/annotations.tsv -o distill --trna_path annotation/trnas.tsv --rrna_path annotation/rrnas.tsv

    # For PAS1573
    export PATH=/fs/ess/PAS1573/modules/DRAM-1.4.0/bin:$PATH
    DRAM.py annotate -i '/*.fa' -o annotation
    DRAM.py distill -i annotation/annotations.tsv -o distill --trna_path annotation/trnas.tsv --rrna_path annotation/rrnas.tsv

**Singularity use**: Unfortunately, due to the size of the database, this is not currently possible. While we work on a solution, please use the module version!

dRep
~~~~

**Website**: https://github.com/MrOlm/drep

**Website**: https://drep.readthedocs.io/en/master/

**Short description**: dRep is a Python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/dRep-2.3.2.sif

    # You can test the installation
    singularity run /users/PAS1117/osu9664/eMicro-Apps/dRep-2.3.2.sif bonus testDir --check_dependencies

    # More rigorously check
    git clone https://github.com/MrOlm/drep.git
    cd drep/tests
    singularity run /users/PAS1117/osu9664/eMicro-Apps/dRep-2.3.2.sif dereplicate output_dir -g genomes/*

    # For genome de-replication
    dRep.sif dereplicate output_directory -g path/to/genomes/*.fasta

    # To compare genomes
    dRep.sif compare output_directory -g path/to/genomes/*.fasta

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load dRep/2.4.2

NanoStat
~~~~~~~~

**Website**: https://github.com/wdecoster/nanostat

**Short description**: Calculate various statistics from a long-read sequencing dataset in fastq, bam or albacore sequencing summary format.
**Reference**: De Coster, W., D'Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018). https://doi.org/10.1093/bioinformatics/bty149

**Singularity use**: Forthcoming...

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load Nanostat/1.6.0

ViennaRNA
~~~~~~~~~

**Website**: https://www.tbi.univie.ac.at/RNA/index.html

**Reference**: Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

**Short description**:

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load ViennaRNA/2.4.14

MetaPop
~~~~~~~

**Website**: https://github.com/metaGmetapop/metapop/

**Reference**: Coming soon!

**Short description**: MetaPop is a pipeline designed to facilitate the processing of sets of short-read data mapped to reference genomes, with the twin aims of calculating sample-level diversity metrics such as abundance, population diversity, and similarity across multiple samples, and assessing within-species diversity through the assessment of nucleotide polymorphisms and amino acid substitutions. To further facilitate understanding, the pipeline also produces graphical summaries of its results.

**Singularity use**:

.. code-block:: bash

    # Load singularity
    module load singularity

    # Set variables
    threads=40

    # Inputs
    input_contigs=data_dir/individual_fasta_dir/
    input_coverage=data_dir/counts.txt
    bam_dir=data_dir/BAMs

    singularity run /users/PAS1117/osu9664/eMicro-Apps/MetaPop-0.35.sif -i $bam_dir -r $input_contigs --threads $threads -o $out_dir -n $input_coverage

MetaPop requires:

* input_contigs: a directory of fasta files representing the contigs/genomes - EACH genome must be its own FASTA file
* bam_dir: a directory containing BAM alignment files of reads against the contigs/genomes
* input_coverage: a tab-delimited file with the BAM filename (*without* the .bam extension) and the bp of that dataset
* out_dir: where to place the output files

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MetaPop/latest
    python $(which metapop_main.py) -i $bam_dir -r $input_contigs --threads $threads -o $out_dir -n $input_coverage

MetaCHIP
~~~~~~~~

**Website**: https://github.com/songweizhi/MetaCHIP

**Reference**: Song, W., Wemheuer, B., Zhang, S., Steensen, K. & Thomas, T. MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches. Microbiome 7, 36 (2019).

**Short description**: MetaCHIP is a pipeline for reference-independent HGT identification at the community level.

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load MetaCHIP

SingleM
~~~~~~~

**Website**: https://github.com/wwood/singlem

**Short description**: SingleM is a tool to find the abundances of discrete operational taxonomic units (OTUs) directly from shotgun metagenome data, without heavy reliance on reference sequence databases. It is able to differentiate closely related species even if those species are from lineages new to science.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/SingleM-0.13.2.sif

    # Generate OTU table from RAW metagenomic data
    singularity run /users/PAS1117/osu9664/eMicro-Apps/SingleM-0.13.2.sif pipe --sequences my_sequences.fastq.gz --otu_table otu_table.csv --threads

    # Summarize OTU table in Krona plot
    singularity run /users/PAS1117/osu9664/eMicro-Apps/SingleM-0.13.2.sif summarise --input_otu_tables otu_table.csv --krona krona_plot.html

There are many more options and customizations than are presented here; check the documentation for more information. Remember, anything after "singlem" in a command can be copy-and-pasted after the "SingleM-0.13.2.sif" in the above examples.

VSEARCH
~~~~~~~

**Website**: https://github.com/torognes/vsearch

**Reference**: Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4(10), e2584. https://doi.org/10.7717/peerj.2584

**Short description**: VSEARCH is a fast, accurate and full-fledged alternative to USEARCH. It's free and isn't limited to 32-bit, but is only for nucleotide, not protein, work. VSEARCH is "more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication." (Rognes et al., 2016, PeerJ) Long story short: it's a free alternative to USEARCH's 64-bit version. USEARCH does have a free 32-bit version, but that limits the available system memory to 4 GB, hardly sufficient for large-scale metagenomic analyses.

**Singularity use**:

.. code-block:: bash

    module load singularity/current
    singularity run /users/PAS1117/osu9664/eMicro-Apps/VSEARCH-2.14.1.sif

**Module use**:

.. code-block:: bash

    module use /fs/project/PAS1117/modulefiles
    module load vsearch/2.6.0

**Note**: VSEARCH has **a lot** of options. So. Many.

Virus Analyses
--------------

"Consider something viral in your research" - Forest Rohwer

Cenote-Taker2
~~~~~~~~~~~~~

**Website**: https://github.com/mtisza1/Cenote-Taker2

**Reference**: Tisza, M. J., Belford, A. K., Domínguez-Huerta, G., Bolduc, B. & Buck, C. B. Cenote-Taker 2 democratizes virus discovery and sequence annotation. Virus Evol. 7, 1–12 (2021). doi:10.1093/ve/veaa100

**Short description**: Cenote-Taker 2 is a dual-function bioinformatics tool. First, Cenote-Taker 2 discovers/predicts virus sequences from any kind of genome or metagenomic assembly. Second, virus sequences/genomes are annotated with a variety of sequence features, genes, and taxonomy. Either the discovery or the annotation module can be used independently.

**Singularity use**:

.. code-block:: bash

    module load singularity/current

    # For PAS1117 users
    module use /fs/project/PAS1117/modulefiles
    module load singularityImages
    Cenote-Taker2-2.1.3_osc.sif --contigs --run_title --template_file
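As stressed at the top of this page, none of these containers should be run on a login node. A minimal sketch of wrapping any of the commands above in a batch job, assuming OSC's Slurm scheduler: the filename ``my_job.sh``, the walltime, the CPU count, and the Prokka invocation (including the placeholder ``contigs.fasta``) are illustrative only and should be adapted to your own project and data.

```bash
# Sketch: generate a batch job script wrapping a container run.
# All resource values and filenames below are placeholders -- adjust to your data.
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=PAS1117
#SBATCH --job-name=prokka_annotate
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4

module load singularity/current
# Full path to the image, as described at the top of this page
singularity run /users/PAS1117/osu9664/eMicro-Apps/Prokka-1.12.0.img --cpus 4 --outdir prokka_out contigs.fasta
EOF

# Submit from a login node; the work itself runs on a compute node:
# sbatch my_job.sh
```

The same pattern applies to every image and module on this page: swap the ``singularity run`` (or ``module load``) line for the tool you need and request resources appropriate to its memory and thread requirements.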