Workflow: Metagenomic Binning from Assembly
Workflow for metagenomics, from raw reads to annotated bins.

Summary:
- MetaBAT2 (binning)
- CheckM (bin completeness and contamination)
- GTDB-Tk (bin taxonomic classification)
- BUSCO (bin completeness)

**All tool CWL files and other workflows can be found here:**
- Tools: https://git.wur.nl/unlock/cwl/-/tree/master/cwl
- Workflows: https://git.wur.nl/unlock/cwl/-/tree/master/cwl/workflows

The dependencies are accessible from https://unlock-icat.irods.surfsara.nl (login: anonymous / anonymous) and/or through the conda/pip environments set up in https://git.wur.nl/unlock/docker/-/blob/master/kubernetes/scripts/setup.sh
Inputs
ID | Type | Title | Doc |
---|---|---|---|
step | Integer (Optional) | CWL base step number | Step number used to order the steps |
memory | Integer (Optional) | Memory usage (MB) | Maximum memory usage in megabytes |
threads | Integer (Optional) | Number of threads | Number of threads to use for computational processes |
assembly | File | Assembly FASTA | Assembly in FASTA format |
bam_file | File | BAM file | Mapping file in sorted BAM format containing the reads mapped to the assembly |
identifier | String | Identifier used | Identifier for this dataset, used throughout this workflow |
run_gtdbtk | Boolean | Run GTDB-Tk | Run GTDB-Tk taxonomic bin classification when true |
destination | String (Optional) | Output destination | Optional output destination used for cwl-prov reporting |
busco_dataset | String | BUSCO dataset | Path to the BUSCO dataset download location |
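As a minimal sketch, the inputs above can be supplied to a CWL runner as a JSON job (input object) file. The helper `make_binning_job` below is hypothetical, as are the file names and paths in the example. It assumes the standard CWL convention of passing `File` inputs as objects with `"class": "File"` and a `path`, and it omits the optional inputs (`step`, `destination`, and `threads`/`memory` unless given) so that the workflow's own defaults, if any, apply.

```python
import json
from typing import Optional


def make_binning_job(
    assembly: str,
    bam_file: str,
    identifier: str,
    busco_dataset: str,
    run_gtdbtk: bool = False,
    threads: Optional[int] = None,
    memory: Optional[int] = None,
) -> dict:
    """Build a CWL input object for the binning workflow (hypothetical helper)."""
    job = {
        # File inputs are passed as CWL File objects.
        "assembly": {"class": "File", "path": assembly},
        "bam_file": {"class": "File", "path": bam_file},  # must be a sorted BAM
        "identifier": identifier,
        "run_gtdbtk": run_gtdbtk,  # GTDB-Tk can use up to ~220 GB RAM
        "busco_dataset": busco_dataset,
    }
    # Optional inputs are written only when set; otherwise the workflow's defaults (if any) apply.
    if threads is not None:
        job["threads"] = threads
    if memory is not None:
        job["memory"] = memory  # megabytes
    return job


if __name__ == "__main__":
    # Hypothetical file names and paths, for illustration only.
    job = make_binning_job(
        assembly="sample1_assembly.fasta",
        bam_file="sample1_sorted.bam",
        identifier="sample1",
        busco_dataset="/data/busco_downloads",
        threads=8,
        memory=32000,
    )
    print(json.dumps(job, indent=2))
```

Saved to a file such as `binning_job.json`, the object could then be passed to a runner like cwltool (`cwltool workflow_metagenomics_binning.cwl binning_job.json`), assuming the workflow and its tool files have been checked out locally.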
Steps
ID | Runs | Label | Doc |
---|---|---|---|
busco | ../busco/busco.cwl (CommandLineTool) | BUSCO | Based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs, the BUSCO metric is complementary to technical metrics such as N50. |
checkm | ../metagenomics/checkm/checkm_lineagewf.cwl (CommandLineTool) | CheckM | CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. |
gtdbtk | ../gtdbtk/gtdbtk_classify_wf.cwl (CommandLineTool) | GTDB-Tk Classify Workflow | Taxonomic genome classification workflow with GTDB-Tk. **Can use up to ~220 GB of RAM.** |
metabat2 | ../metagenomics/metabat2/metabat2.cwl (CommandLineTool) | MetaBAT2 binning | Metagenome binning based on abundance and tetranucleotide frequency (MetaBAT2) |
bins_summary | ../metagenomics/bins_summary.cwl (CommandLineTool) | Bin assembly stats | Table of all bins and their assembly statistics (size, N50, etc.), plus a table of the bins and the names of their assembly contigs |
bin_readstats | ../metagenomics/assembly_bins_readstats.cwl (CommandLineTool) | Bin read mapping stats | Table of general read mapping statistics for the bins and the assembly |
compress_gtdbtk | ../bash/compress_directory.cwl (CommandLineTool) | Compress a directory (tar) | |
contig_read_counts | ../samtools/samtools_idxstats.cwl (CommandLineTool) | samtools idxstats | samtools idxstats: reports alignment summary statistics per reference sequence |
aggregate_bin_depths | ../metagenomics/metabat2/aggregateBinDepths.cwl (CommandLineTool) | aggregateBinDepths | Aggregates bin depths with the MetaBAT2 script aggregateBinDepths.pl |
assembly_read_counts | ../samtools/samtools_flagstat.cwl (CommandLineTool) | samtools flagstat | samtools flagstat: reports general alignment summary statistics |
busco_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Collects the input files into a named directory | |
merge_busco_summaries | ../expressions/merge_file_arrays.cwl (ExpressionTool) | Merge file arrays | Merges an array of file arrays into a single array of files |
checkm_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Collects the input files into a named directory | |
gtdbtk_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Collects the input files into a named directory | |
metabat2_contig_depths | ../metagenomics/metabat2/metabatContigDepths.cwl (CommandLineTool) | jgi_summarize_bam_contig_depths | Summarizes contig read depths from the BAM file for MetaBAT2 binning |
metabat_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | Collects the input files into a named directory | |
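To illustrate the kind of data that contig_read_counts passes on to bin_readstats, here is a minimal Python sketch that parses `samtools idxstats` output and sums mapped reads per bin. Each idxstats line has four tab-separated columns (reference name, sequence length, mapped read-segments, unmapped read-segments), and a final `*` line counts unplaced unmapped reads. This is not the implementation of assembly_bins_readstats.cwl: the contig-to-bin mapping is assumed to be a plain dictionary (for example, derived from the contig table produced by bins_summary), and all function, contig, and bin names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_idxstats(lines: Iterable[str]) -> Dict[str, int]:
    """Return mapped read-segment counts per contig from `samtools idxstats` output."""
    mapped = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        if name == "*":  # unplaced unmapped reads, not tied to any contig
            continue
        mapped[name] = int(n_mapped)
    return mapped


def reads_per_bin(mapped: Dict[str, int], contig_to_bin: Dict[str, str]) -> Dict[str, int]:
    """Sum mapped reads over the contigs of each bin; contigs without a bin go to 'unbinned'."""
    totals: Dict[str, int] = defaultdict(int)
    for contig, count in mapped.items():
        totals[contig_to_bin.get(contig, "unbinned")] += count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical idxstats lines and bin assignment, for illustration only.
    idxstats = [
        "contig_1\t5000\t120\t0\n",
        "contig_2\t3000\t80\t0\n",
        "contig_3\t1000\t10\t0\n",
        "*\t0\t0\t25\n",
    ]
    per_bin = reads_per_bin(parse_idxstats(idxstats), {"contig_1": "bin.1", "contig_2": "bin.1"})
    binned = sum(n for b, n in per_bin.items() if b != "unbinned")
    print(per_bin, "fraction binned:", binned / sum(per_bin.values()))
```

Keeping unbinned contigs as a separate bucket makes it easy to report what fraction of mapped reads ended up in bins, which is one common way to summarize binning success.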
Outputs
ID | Type | Label | Doc |
---|---|---|---|
bins_summary | File | Bins summary | Summary of information about the bins |
busco_output | Directory | BUSCO | BUSCO output directory |
checkm_output | Directory | CheckM | CheckM output directory |
gtdbtk_output | Directory (Optional) | GTDB-Tk | GTDB-Tk output directory |
metabat2_output | Directory | MetaBAT2 | MetaBAT2 output directory |
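The bins_summary output reports per-bin statistics such as size and N50. A minimal sketch of how these two numbers can be computed from the bin FASTA files in metabat2_output is shown below. It assumes one FASTA file per bin with a `.fa` suffix; `fasta_lengths`, `n50`, and `summarize_bins` are hypothetical helpers, not part of bins_summary.cwl. N50 here is the length L such that contigs of length at least L together cover at least half of the bin's total size.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Shortest contig among the longest contigs that together cover >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize_bins(bin_dir: str, suffix: str = ".fa") -> Dict[str, Dict[str, int]]:
    """Contig count, total size, and N50 for each bin FASTA file in bin_dir."""
    stats = {}
    for fasta in sorted(Path(bin_dir).glob("*" + suffix)):
        with open(fasta) as fh:
            lengths = fasta_lengths(fh)
        stats[fasta.name] = {"contigs": len(lengths), "size": sum(lengths), "n50": n50(lengths)}
    return stats
```

For example, `summarize_bins("metabat2_output")` (a hypothetical local copy of the output directory) would return one entry per bin file. Contig lengths of 2 to 10 bp sum to 54 bp, and the three longest contigs (10 + 9 + 8 = 27 bp) reach half of that, so their N50 is 8.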
https://w3id.org/cwl/view/git/b9097b82e6ab6f2c9496013ce4dd6877092956a0/cwl/workflows/workflow_metagenomics_binning.cwl