Workflow: Metagenomics workflow
Workflow for Metagenomics from raw reads to annotated bins.
Steps:
- workflow_quality.cwl:
  - FastQC (control)
  - fastp (quality trimming)
  - bbmap contamination filter
- SPAdes (Assembly)
- QUAST (Assembly quality report)
- BBmap (Read mapping to assembly)
- MetaBat2 (binning)
- CheckM (bin completeness and contamination)
- GTDB-Tk (bin taxonomic classification)
Inputs
ID | Type | Title | Doc |
---|---|---|---|
memory | Integer (Optional) | memory usage (mb) | maximum memory usage in megabytes |
threads | Integer (Optional) | number of threads | number of threads to use for computational processes |
identifier | String | identifier used | Identifier for this dataset used in this workflow |
run_gtdbtk | Boolean | Run GTDB-Tk | Run GTDB-Tk taxonomic bin classification when true |
pacbio_reads | File[] (Optional) | pacbio reads | local file(s) with PacBio reads |
forward_reads | File[] | forward reads | local forward sequence file(s) |
reverse_reads | File[] | reverse reads | local reverse sequence file(s) |
bbmap_reference | String | contamination reference file | bbmap reference fasta file for contamination filtering |
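The inputs above map directly onto a CWL job file. The sketch below is a minimal example under stated assumptions: the input IDs and types come from the table, while every value (the identifier, the read paths, the reference path, the thread and memory figures) is a placeholder and not taken from the workflow.

```yaml
# job.yml: example values only; every path and the identifier are placeholders.
identifier: sample01          # hypothetical dataset identifier
threads: 8                    # optional
memory: 64000                 # optional, in megabytes
run_gtdbtk: false             # set to true to run GTDB-Tk classification
forward_reads:                # File[]: one entry per forward read file
  - class: File
    path: reads/sample01_R1.fastq.gz
reverse_reads:                # File[]: one entry per reverse read file
  - class: File
    path: reads/sample01_R2.fastq.gz
# Declared as a String in the table, so it is given as plain text, not as a File object.
bbmap_reference: /path/to/contamination_reference.fasta
# pacbio_reads is optional and omitted here.
```

With a CWL runner such as cwltool, a job file like this is passed after the workflow document, for example: cwltool workflow_metagenomics.cwl job.yml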
Steps
ID | Runs | Label | Doc |
---|---|---|---|
workflow_bbmap | ../bbmap/bbmap.cwl (CommandLineTool) | BBMap | Read mapping using BBMap |
workflow_quast | ../quast/quast.cwl (CommandLineTool) | Quality Assessment Tool for Genome Assemblies | Runs the Quality Assessment Tool for Genome Assemblies application |
compress_spades | ../bash/pigz.cwl (CommandLineTool) | compress a file multithreaded with pigz | |
workflow_checkm | ../metagenomics/checkm/checkm_lineagewf.cwl (CommandLineTool) | CheckM | CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes |
workflow_gtdbtk | ../metagenomics/gtdbtk/gtdbtk_classify_wf.cwl (CommandLineTool) | GTDBTK Classify Workflow | Taxonomic genome classification workflow with GTDBTK. Warning: can use up to ~220 GB of RAM. |
workflow_spades | ../assembly/spades.cwl (CommandLineTool) | spades genomic assembler | Runs the spades assembler using a dataset file |
workflow_quality | workflow_quality.cwl (Workflow) | Read quality control, trimming and contamination filter. | Workflow for (paired) read quality control, trimming and contamination filtering. Outputs a merged set of read pairs when multiple datasets are used. Steps: FastQC (read quality control); fastp (read quality trimming); bbduk (rRNA filtering); bbmap (contamination filtering). |
workflow_metabat2 | ../metagenomics/metabat2/metabat2.cwl (CommandLineTool) | MetaBat2 | Metagenome Binning based on Abundance and Tetranucleotide frequency (MetaBat2) |
workflow_bins_stats | ../metagenomics/bin_assembly_stats.cwl (CommandLineTool) | Bin assembly stats | Table of all bins and their assembly statistics (size, N50, etc.), plus a table of the bins and their respective assembly contig names. |
workflow_getunbinned | ../metagenomics/get_unbinned_contigs.cwl (CommandLineTool) | Unbinned contigs | Get unbinned contigs of the assembly from a set of binned fasta files in fasta format (compressed). |
quast_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
checkm_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
gtdbtk_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
spades_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
workflow_bin_readstats | ../metagenomics/assembly_bins_readstats.cwl (CommandLineTool) | Bin read mapping stats | Table of general read mapping statistics of the bins and assembly |
metabat_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
workflow_aggregate_bins | ../metagenomics/metabat2/aggregateBinDepths.cwl (CommandLineTool) | aggregateBinDepths | Aggregate bin depths using the MetaBat2 script aggregateBinDepths.pl |
workflow_compress_gtdbtk | ../bash/compress_directory.cwl (CommandLineTool) | Compress a directory (tar) | |
sorted_bam_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Transforms the input files to a mentioned directory | |
workflow_sam_to_sorted_bam | ../samtools/sam_to_sorted-bam.cwl (CommandLineTool) | sam to sorted bam | Runs samtools view -@ $2 -hu $1, piped into samtools sort -@ $2 -o $3.bam |
workflow_contig_read_counts | ../samtools/samtools_idxstats.cwl (CommandLineTool) | samtools idxstats | samtools idxstats – reports alignment summary statistics |
workflow_metabat2_contig_depths | ../metagenomics/metabat2/metabatContigDepths.cwl (CommandLineTool) | jgi_summarize_bam_contig_depths | Summarize contig read depth from bam file for metabat2 binning. |
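The run_gtdbtk input implies that the workflow_gtdbtk step is optional. The fragment below sketches one way such a switch can be written with the CWL v1.2 `when` field. It is not copied from the workflow itself: the step ID and `run` path come from the table, but the port names `bin_dir` and `gtdbtk_out_folder` and the source `workflow_metabat2/bin_dir` are hypothetical.

```yaml
# Fragment of a `steps:` section (CWL v1.2); a sketch, not the workflow's actual definition.
steps:
  workflow_gtdbtk:
    label: GTDBTK Classify Workflow
    run: ../metagenomics/gtdbtk/gtdbtk_classify_wf.cwl
    # The step is skipped, and its outputs are null, when this evaluates to false.
    when: $(inputs.run_gtdbtk)
    in:
      # `when` only sees values listed under `in`, so the switch is passed along.
      run_gtdbtk: run_gtdbtk
      bin_dir: workflow_metabat2/bin_dir   # hypothetical port names
    out: [gtdbtk_out_folder]               # hypothetical output name
```

Because a skipped step yields null, anything that consumes its outputs needs either an optional type or a `pickValue` setting.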
Outputs
ID | Type | Label | Doc |
---|---|---|---|
bam_output | Directory | BAM files | Mapping results in indexed BAM format |
quast_output | Directory | QUAST | Quast analysis output folder |
checkm_output | Directory | CheckM | CheckM output directory |
gtdbtk_output | Directory | GTDB-Tk | GTDB-Tk output directory |
spades_output | Directory | SPADES | Metagenome assembly output by SPADES |
filtered_stats | Directory | Filtered statistics | Statistics on quality and preprocessing of the reads |
metabat2_output | Directory | MetaBat2 | MetaBat2 output directory |
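Each Directory output is presumably fed by one of the files_to_folder steps listed under Steps. The fragment below sketches how three of these outputs could be declared. The IDs, types, labels, and docs come from the table, while the pairing of each output with a step and the port name `results` are assumptions.

```yaml
# Fragment of an `outputs:` section; a sketch, with a hypothetical `results` port name.
outputs:
  quast_output:
    type: Directory
    label: QUAST
    doc: Quast analysis output folder
    outputSource: quast_files_to_folder/results    # assumed source
  checkm_output:
    type: Directory
    label: CheckM
    doc: CheckM output directory
    outputSource: checkm_files_to_folder/results   # assumed source
  spades_output:
    type: Directory
    label: SPADES
    doc: Metagenome assembly output by SPADES
    outputSource: spades_files_to_folder/results   # assumed source
```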
https://w3id.org/cwl/view/git/0dd868de067a386be8ec6b147df007e213c7275a/cwl/workflows/workflow_metagenomics.cwl