Workflow: Metagenomics workflow

Fetched 2023-01-09 19:36:06 GMT

Workflow for metagenomics, from raw reads to annotated bins.

Steps:
  - workflow_quality.cwl:
      - FastQC (control)
      - fastp (quality trimming)
      - bbmap contamination filter
  - SPAdes (Assembly)
  - QUAST (Assembly quality report)
  - BBmap (Read mapping to assembly)
  - MetaBat2 (binning)
  - CheckM (bin completeness and contamination)
  - GTDB-Tk (bin taxonomic classification)

Inputs

ID Type Title Doc
memory Integer (Optional) memory usage (MB)

maximum memory usage in megabytes

threads Integer (Optional) number of threads

number of threads to use for computational processes

identifier String identifier used

Identifier for this dataset used in this workflow

run_gtdbtk Boolean Run GTDB-Tk

Run GTDB-Tk taxonomic bin classification when true

pacbio_reads File[] (Optional) pacbio reads

Local file(s) with PacBio reads

forward_reads File[] forward reads

Local forward sequence file(s)

reverse_reads File[] reverse reads

Local reverse sequence file(s)

bbmap_reference String contamination reference file

bbmap reference fasta file for contamination filtering
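
The inputs listed above map onto a CWL job file. Because JSON is valid YAML, such a file can be generated with the Python standard library. The following is a minimal sketch, not part of the workflow repository: the helper names (cwl_file, build_job), the sample file names, and the thread and memory values are hypothetical placeholders, and the one-to-one pairing check on forward and reverse reads is an assumption.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(identifier, forward, reverse, bbmap_reference,
              run_gtdbtk=False, threads=None, memory=None, pacbio=None):
    """Assemble a job dictionary keyed by the input IDs listed above."""
    # Assumption: forward and reverse files are paired one-to-one.
    if len(forward) != len(reverse):
        raise ValueError("forward_reads and reverse_reads must pair up")
    job = {
        "identifier": identifier,
        "run_gtdbtk": run_gtdbtk,
        "forward_reads": [cwl_file(p) for p in forward],
        "reverse_reads": [cwl_file(p) for p in reverse],
        # Listed with type String, so it is passed as a plain string.
        "bbmap_reference": bbmap_reference,
    }
    # Optional inputs are left out unless a value is given.
    if threads is not None:
        job["threads"] = threads
    if memory is not None:
        job["memory"] = memory  # megabytes
    if pacbio:
        job["pacbio_reads"] = [cwl_file(p) for p in pacbio]
    return job


# Hypothetical values, for illustration only.
job = build_job(
    identifier="sample1",
    forward=["sample1_R1.fastq.gz"],
    reverse=["sample1_R2.fastq.gz"],
    bbmap_reference="/refs/contamination.fasta",
    threads=8,
    memory=16000,
)
job_json = json.dumps(job, indent=2)
```

Writing job_json to a file gives a job document that a CWL runner can take as its second argument, for example `cwltool workflow_metagenomics.cwl job.json`.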

Steps

ID Runs Label Doc
workflow_bbmap
../bbmap/bbmap.cwl (CommandLineTool)
BBMap

Read mapping using BBMap

workflow_quast
../quast/quast.cwl (CommandLineTool)
Quality Assessment Tool for Genome Assemblies

Runs the Quality Assessment Tool for Genome Assemblies application

compress_spades
../bash/pigz.cwl (CommandLineTool)
compress a file multithreaded with pigz
workflow_checkm
../metagenomics/checkm/checkm_lineagewf.cwl (CommandLineTool)
CheckM

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes

workflow_gtdbtk
../metagenomics/gtdbtk/gtdbtk_classify_wf.cwl (CommandLineTool)
GTDBTK Classify Workflow

Taxonomic genome classification workflow with GTDB-Tk. Warning: can use up to ~220 GB RAM.

workflow_spades
../assembly/spades.cwl (CommandLineTool)
spades genomic assembler

Runs the spades assembler using a dataset file

workflow_quality
Read quality control, trimming and contamination filter.

Workflow for (paired) read quality control, trimming and contamination filtering. When multiple datasets are used, it outputs a merged set of read pairs.

Steps:
  - FastQC (read quality control)
  - fastp (read quality trimming)
  - bbduk (rRNA filtering)
  - bbmap (contamination filter)

workflow_metabat2
../metagenomics/metabat2/metabat2.cwl (CommandLineTool)
MetaBat2

Metagenome Binning based on Abundance and Tetranucleotide frequency (MetaBat2)

workflow_bins_stats
../metagenomics/bin_assembly_stats.cwl (CommandLineTool)
Bin assembly stats

Table of all bins and their assembly statistics (size, N50, etc.), plus a table of the bins and their respective assembly contig names.
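
For reference, N50 is the contig length L such that contigs of length L or longer together cover at least half of the total size. The sketch below shows how the per-bin statistics named above could be computed from a list of contig lengths. It is an illustration only, not the code in bin_assembly_stats.cwl; the function names and the choice of columns are assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def bin_stats(contig_lengths):
    """Summarise one bin (hypothetical column selection)."""
    return {
        "contigs": len(contig_lengths),
        "size": sum(contig_lengths),
        "largest": max(contig_lengths),
        "N50": n50(contig_lengths),
    }


# Hypothetical contig lengths for a single bin.
example = bin_stats([100, 200, 300, 400, 500])
```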

workflow_getunbinned
../metagenomics/get_unbinned_contigs.cwl (CommandLineTool)
Unbinned contigs

Get unbinned contigs of the assembly from a set of binned fasta files in fasta format (compressed).
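
Conceptually, the unbinned contigs are the assembly contigs whose names do not occur in any bin. A minimal sketch of that set difference follows, assuming contig names are the first word of each FASTA header. It is not the script behind get_unbinned_contigs.cwl, and the function names are hypothetical. It works on any iterable of lines, so compressed files could be opened with gzip.open(path, "rt").

```python
def fasta_names(lines):
    """Yield contig names (first word of each '>' header line)."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def unbinned_contigs(assembly_lines, bins):
    """Return assembly contig names absent from every bin, in assembly order."""
    binned = set()
    for bin_lines in bins:
        binned.update(fasta_names(bin_lines))
    return [name for name in fasta_names(assembly_lines) if name not in binned]


# Hypothetical in-memory data standing in for FASTA files.
assembly = [">c1 len=5", "ACGTA", ">c2", "GG", ">c3", "TT"]
bins = [[">c1", "ACGTA"], [">c3", "TT"]]
leftover = unbinned_contigs(assembly, bins)
```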

quast_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

checkm_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

gtdbtk_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

spades_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

workflow_bin_readstats
../metagenomics/assembly_bins_readstats.cwl (CommandLineTool)
Bin read mapping stats

Table of general read mapping statistics of the bins and assembly

metabat_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

workflow_aggregate_bins
../metagenomics/metabat2/aggregateBinDepths.cwl (CommandLineTool)
aggregateBinDepths

Aggregate bin depths with the MetaBat2 script aggregateBinDepths.pl

workflow_compress_gtdbtk
../bash/compress_directory.cwl (CommandLineTool)
Compress a directory (tar)
sorted_bam_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the specified directory

workflow_sam_to_sorted_bam
../samtools/sam_to_sorted-bam.cwl (CommandLineTool)
sam to sorted bam

samtools view -@ $2 -hu $1 | samtools sort -@ $2 -o $3.bam
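
Reading the command above, $1 appears to be the input SAM file, $2 the thread count, and $3 the output prefix; this mapping is inferred from the flags and is not stated on the page. Under that assumption, the same pipeline can be sketched in Python without a shell. The function names are hypothetical, and run_pipeline requires samtools on the PATH.

```python
import subprocess


def sam_to_sorted_bam_commands(sam_path, threads, out_prefix):
    """Build the two argv lists of the view | sort pipeline."""
    # -h keeps the header, -u writes uncompressed BAM to stdout.
    view = ["samtools", "view", "-@", str(threads), "-hu", sam_path]
    # With no input file given, samtools sort reads from stdin.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_prefix + ".bam"]
    return view, sort


def run_pipeline(view, sort):
    """Run view | sort and raise if either command fails."""
    producer = subprocess.Popen(view, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(sort, stdin=producer.stdout)
    # Close our copy so view sees a broken pipe if sort exits early.
    producer.stdout.close()
    sort_rc = consumer.wait()
    view_rc = producer.wait()
    if view_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"pipeline failed: view={view_rc}, sort={sort_rc}")


# Hypothetical arguments; nothing is executed here.
view_cmd, sort_cmd = sam_to_sorted_bam_commands("reads.sam", 4, "sample1")
```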

workflow_contig_read_counts
../samtools/samtools_idxstats.cwl (CommandLineTool)
samtools idxstats

samtools idxstats – reports alignment summary statistics
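
The output of samtools idxstats is tab-delimited with four columns per line: reference sequence name, sequence length, number of mapped read segments, and number of unmapped read segments. A final line named "*" collects reads with no reference. The sketch below parses that format into per-contig read counts; the record type, function names, and sample text are hypothetical.

```python
from typing import NamedTuple


class ContigCount(NamedTuple):
    name: str
    length: int
    mapped: int
    unmapped: int


def parse_idxstats(text):
    """Parse idxstats output into a list of ContigCount records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        records.append(ContigCount(name, int(length), int(mapped), int(unmapped)))
    return records


def mapped_per_contig(records):
    """Map contig name to mapped read count, skipping the '*' summary line."""
    return {r.name: r.mapped for r in records if r.name != "*"}


# Hypothetical idxstats output for two contigs.
sample = "contig_1\t1000\t250\t3\ncontig_2\t500\t40\t0\n*\t0\t0\t17\n"
counts = mapped_per_contig(parse_idxstats(sample))
```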

workflow_metabat2_contig_depths
../metagenomics/metabat2/metabatContigDepths.cwl (CommandLineTool)
jgi_summarize_bam_contig_depths

Summarize contig read depth from bam file for metabat2 binning.

Outputs

ID Type Label Doc
bam_output Directory BAM files

Mapping results in indexed BAM format

quast_output Directory QUAST

Quast analysis output folder

checkm_output Directory CheckM

CheckM output directory

gtdbtk_output Directory GTDB-Tk

GTDB-Tk output directory

spades_output Directory SPAdes

Metagenome assembly output by SPAdes

filtered_stats Directory Filtered statistics

Statistics on quality and preprocessing of the reads

metabat2_output Directory MetaBat2

MetaBat2 output directory

Permalink: https://w3id.org/cwl/view/git/0dd868de067a386be8ec6b147df007e213c7275a/cwl/workflows/workflow_metagenomics.cwl