Workflow: Metagenomic Binning from Assembly

Fetched 2024-04-19 19:37:58 GMT

Workflow for Metagenomics from raw reads to annotated bins.

Summary
- MetaBAT2 (binning)
- CheckM (bin completeness and contamination)
- GTDB-Tk (bin taxonomic classification)
- BUSCO (bin completeness)

All tool CWL files and other workflows can be found here:
Tools: https://git.wur.nl/unlock/cwl/-/tree/master/cwl
Workflows: https://git.wur.nl/unlock/cwl/-/tree/master/cwl/workflows

The dependencies are accessible from https://unlock-icat.irods.surfsara.nl (user: anonymous, password: anonymous) and/or through the conda/pip environments shown in https://git.wur.nl/unlock/docker/-/blob/master/kubernetes/scripts/setup.sh


Inputs

ID Type Title Doc
step Integer (Optional) CWL base step number

Step number used to order the steps

memory Integer (Optional) memory usage (MB)

Maximum memory usage in megabytes

threads Integer (Optional) number of threads

Number of threads to use for computational processes

assembly File Assembly fasta

Assembly in FASTA format

bam_file File Bam file

Mapping file in sorted BAM format containing reads mapped to the assembly

identifier String Identifier used

Identifier for this dataset used in this workflow

run_gtdbtk Boolean Run GTDB-Tk

Run GTDB-Tk taxonomic bin classification when true

destination String (Optional) Output Destination

Optional output destination used for cwl-prov reporting.

busco_dataset String BUSCO dataset

Path to the BUSCO dataset download location
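
As a rough illustration of how the inputs above fit together, the sketch below builds a CWL job (input object) for this workflow in Python. The input IDs come from the table above; the helper name `build_job`, the file names and the BUSCO dataset path are hypothetical placeholders, not values from the workflow.

```python
import json


def build_job(assembly, bam_file, identifier, busco_dataset,
              run_gtdbtk=False, threads=None, memory=None):
    """Hypothetical helper: build a CWL input object for the binning workflow."""
    job = {
        # CWL File inputs are objects with class "File" and a path (or location)
        "assembly": {"class": "File", "path": assembly},
        "bam_file": {"class": "File", "path": bam_file},
        "identifier": identifier,
        "run_gtdbtk": run_gtdbtk,
        "busco_dataset": busco_dataset,
    }
    # Optional inputs are only written when set, so workflow defaults apply otherwise
    if threads is not None:
        job["threads"] = threads
    if memory is not None:
        job["memory"] = memory  # megabytes
    return job


# Placeholder paths, for illustration only
job = build_job("sample1_assembly.fasta", "sample1_sorted.bam", "sample1",
                "/data/busco_downloads/bacteria_odb10", threads=8, memory=64000)
print(json.dumps(job, indent=2))
```

Saved as, say, `binning_job.json`, such a job file could then be passed to a CWL runner together with the workflow, for example `cwltool workflow_metagenomics_binning.cwl binning_job.json`. Optional inputs are left out when unset so that any defaults defined in the workflow apply.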

Steps

ID Runs Label Doc
busco
../busco/busco.cwl (CommandLineTool)
BUSCO

Based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs, the BUSCO metric is complementary to technical metrics like N50.
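
BUSCO summarizes each bin as one line of percentages: complete (C, split into single-copy S and duplicated D), fragmented (F) and missing (M) BUSCOs out of n searched orthologs. A minimal Python sketch for pulling those numbers out of a summary line, assuming the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout; the function name `parse_busco_summary` is hypothetical.

```python
import re

# Assumed layout of the one-line BUSCO summary, e.g.
# "C:98.5%[S:97.0%,D:1.5%],F:0.5%,M:1.0%,n:255"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Hypothetical helper: return the C/S/D/F/M percentages and n from a summary line."""
    m = BUSCO_RE.search(line)
    if m is None:
        raise ValueError("no BUSCO summary found in: " + line)
    result = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    result["n"] = int(m.group("n"))  # number of BUSCOs searched
    return result
```

Because `search` is used rather than a full match, leading whitespace or extra trailing fields on the line do not prevent parsing.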

checkm
../metagenomics/checkm/checkm_lineagewf.cwl (CommandLineTool)
CheckM

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
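
CheckM's lineage workflow estimates completeness and contamination for each bin. One common way to turn those two numbers into a quality label is the MIMAG draft-genome thresholds; this is not part of the workflow itself. The sketch below applies only the completeness/contamination criteria (the rRNA and tRNA requirements for high-quality drafts are deliberately left out), and the function name is hypothetical.

```python
def mimag_tier(completeness, contamination):
    """Hypothetical helper: simplified MIMAG-style tier from CheckM percentages.

    Only completeness and contamination are checked; the full MIMAG
    high-quality standard also requires rRNA genes and tRNAs.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    # 10% contamination or more falls outside the MIMAG draft tiers
    return "unclassified"
```

Applied to each row of the CheckM output table, this gives a quick overview of how many bins reach each tier.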

gtdbtk
../gtdbtk/gtdbtk_classify_wf.cwl (CommandLineTool)
GTDB-Tk Classify Workflow

Taxonomic genome classification workflow with GTDB-Tk. Note: can use up to ~220 GB of RAM.
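
GTDB-Tk reports each genome's taxonomy in the `classification` column of its summary table, as a semicolon-separated string with rank prefixes (`d__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__`); an empty name means the genome could not be placed at that rank. The sketch below, with hypothetical helper names, splits such a string and finds the most specific named rank. Strings without rank prefixes (such as an "Unclassified" entry) yield an empty result.

```python
# GTDB rank prefixes, ordered from broadest to most specific
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_gtdb_taxonomy(classification):
    """Hypothetical helper: split a GTDB string into {rank: name}; empty names become None."""
    taxa = {}
    for field in classification.strip().split(";"):
        prefix, sep, name = field.partition("__")
        if sep and prefix in RANKS:  # ignore fields without a rank prefix
            taxa[RANKS[prefix]] = name or None
    return taxa


def lowest_rank(taxa):
    """Hypothetical helper: most specific rank that has a name, or None."""
    named = [rank for rank in RANKS.values() if taxa.get(rank)]
    return named[-1] if named else None
```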

metabat2
../metagenomics/metabat2/metabat2.cwl (CommandLineTool)
MetaBAT2 binning

Metagenome Binning based on Abundance and Tetranucleotide frequency (MetaBAT2)
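
MetaBAT2 combines two signals per contig: abundance (read depth, computed here by the metabat2_contig_depths step) and tetranucleotide frequency (TNF). The sketch below shows only the TNF part, as a vector of canonical 4-mer frequencies in which each 4-mer is counted together with its reverse complement. It illustrates the feature only; it is not MetaBAT2's implementation or distance model, and the names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


# 256 tetramers collapse to 136 canonical ones (k-mer and reverse complement counted together)
CANONICAL = sorted({min(k, revcomp(k))
                    for k in ("".join(p) for p in product("ACGT", repeat=4))})


def tetranucleotide_frequencies(seq):
    """Hypothetical helper: normalised canonical tetranucleotide frequencies of one contig."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL]
```

Merging reverse complements makes the vector independent of which strand a contig was assembled on.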

bins_summary
../metagenomics/bins_summary.cwl (CommandLineTool)
Bin assembly stats

Table of all bins and their assembly statistics (size, N50, etc.), plus a table mapping each bin to the names of its assembly contigs.
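
N50 is the length of the shortest contig such that contigs of that length or longer together contain at least half of the bases in the bin. A minimal Python sketch of per-bin statistics of that kind, assuming only a list of contig lengths per bin; the exact columns the bins_summary step reports may differ, and the helper names are hypothetical.

```python
def n50(lengths):
    """Smallest length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty bin


def bin_stats(lengths):
    """Hypothetical helper: basic assembly statistics for one bin."""
    return {
        "contigs": len(lengths),
        "size": sum(lengths),
        "largest": max(lengths, default=0),
        "N50": n50(lengths),
    }
```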

bin_readstats
../metagenomics/assembly_bins_readstats.cwl (CommandLineTool)
Bin read mapping stats

Table of general read mapping statistics of the bins and assembly
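
The doc does not list which mapping statistics this step reports. One plausible example is the share of mapped reads that lands in each bin, which can be derived from per-contig mapped read counts (as produced by samtools idxstats) and a contig-to-bin assignment. The sketch below assumes both are already available as dictionaries; the function name is hypothetical.

```python
from collections import defaultdict


def reads_per_bin(mapped_by_contig, bin_of_contig):
    """Hypothetical helper: mapped reads per bin and their fraction of all mapped reads.

    Contigs missing from bin_of_contig are counted under "unbinned".
    """
    totals = defaultdict(int)
    for contig, mapped in mapped_by_contig.items():
        totals[bin_of_contig.get(contig, "unbinned")] += mapped
    all_mapped = sum(totals.values())
    return {b: (n, n / all_mapped if all_mapped else 0.0) for b, n in totals.items()}
```

Keeping an explicit "unbinned" entry shows how much of the mapped data the bins do not capture.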

compress_gtdbtk
../bash/compress_directory.cwl (CommandLineTool)
Compress a directory (tar)
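
No doc is given for this step; judging by its name it packs the GTDB-Tk output directory into a tar archive. A rough Python equivalent using the standard `tarfile` module is sketched below. The gzip compression, the `<name>.tar.gz` naming and the function name are assumptions, not taken from compress_directory.cwl.

```python
import tarfile
from pathlib import Path


def compress_directory(directory):
    """Hypothetical helper: pack a directory into <name>.tar.gz next to it (gzip is assumed)."""
    src = Path(directory)
    archive = src.with_name(src.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)  # keep the directory name as the archive root
    return archive
```
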
contig_read_counts
../samtools/samtools_idxstats.cwl (CommandLineTool)
samtools idxstats

samtools idxstats - reports alignment summary statistics
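
`samtools idxstats` prints one tab-separated line per reference sequence: name, sequence length, number of mapped reads and number of unmapped reads, with a final `*` line for unmapped reads that have no assigned reference. A small parser sketch (hypothetical function name):

```python
def parse_idxstats(text):
    """Hypothetical helper: parse `samtools idxstats` output into {name: counts}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = {"length": int(length), "mapped": int(mapped), "unmapped": int(unmapped)}
    return stats
```

The text itself could come from, for example, `subprocess.run(["samtools", "idxstats", "sample.bam"], capture_output=True, text=True).stdout`, where `sample.bam` is a placeholder.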

aggregate_bin_depths
../metagenomics/metabat2/aggregateBinDepths.cwl (CommandLineTool)
aggregateBinDepths

Aggregate per-bin depths with the MetaBAT2 script aggregateBinDepths.pl
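
How aggregateBinDepths.pl combines contig depths is not described here. As an assumption, a natural per-bin summary is the length-weighted mean of contig depths, so that long contigs count more than short ones. A minimal sketch (hypothetical function name, input as (length, depth) pairs):

```python
def bin_mean_depth(contigs):
    """Hypothetical helper: length-weighted mean depth of one bin.

    contigs: iterable of (contig_length, average_depth) pairs.
    """
    contigs = list(contigs)
    total_len = sum(length for length, _ in contigs)
    if total_len == 0:
        return 0.0
    return sum(length * depth for length, depth in contigs) / total_len
```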

assembly_read_counts
../samtools/samtools_flagstat.cwl (CommandLineTool)
samtools flagstat

samtools flagstat - reports general alignment summary statistics
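
`samtools flagstat` prints counts as `QC-passed + QC-failed` followed by a category. The sketch below extracts the `mapped` line; it assumes the text layout of recent samtools releases (e.g. `95000 + 0 mapped (95.00% : N/A)`) and skips the `primary mapped` line that newer versions also print. The function name is hypothetical.

```python
import re

# Matches e.g. "95000 + 0 mapped (95.00% : N/A)" but not "95000 + 0 primary mapped (...)"
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def mapped_reads(flagstat_text):
    """Hypothetical helper: (QC-passed, QC-failed) mapped read counts from flagstat text."""
    for line in flagstat_text.splitlines():
        m = MAPPED_RE.match(line)
        if m:
            return int(m.group(1)), int(m.group(2))
    raise ValueError("no 'mapped' line found")
```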

busco_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a named directory
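
In CWL, a set of files can be returned as a Directory literal: an object with `class: Directory`, a `basename` and a `listing` of File objects. The actual expression in files_to_folder.cwl is not shown here; the Python sketch below only mirrors the shape of such an object, with hypothetical names and paths.

```python
def files_to_folder(files, folder_name):
    """Hypothetical helper: model of a CWL Directory literal grouping File objects."""
    return {
        "class": "Directory",
        "basename": folder_name,  # name the directory gets in the output
        "listing": list(files),   # File objects placed inside it
    }
```

The same step type appears several times in this workflow (for the BUSCO, CheckM, GTDB-Tk and MetaBAT2 outputs), which is how each tool ends up with a single output directory.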

merge_busco_summaries
../expressions/merge_file_arrays.cwl (ExpressionTool)
Merge file arrays

Flattens an array of file arrays into a single array of files
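
Merging here amounts to flattening, for example when each BUSCO run yields its own array of summary files and one array is needed downstream (an assumption about how merge_busco_summaries is used). A minimal sketch with a hypothetical function name:

```python
from itertools import chain


def merge_file_arrays(arrays):
    """Hypothetical helper: flatten an array of file arrays into one array, keeping order."""
    return list(chain.from_iterable(arrays))
```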

checkm_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a named directory

gtdbtk_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a named directory

metabat2_contig_depths
../metagenomics/metabat2/metabatContigDepths.cwl (CommandLineTool)
jgi_summarize_bam_contig_depths

Summarize contig read depth from the BAM file for MetaBAT2 binning.
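
jgi_summarize_bam_contig_depths writes a tab-separated table with the columns `contigName`, `contigLen` and `totalAvgDepth`, followed by a mean-depth and a variance column for each BAM file. A sketch that reads the first three columns (hypothetical function name; the sample column names in the example are placeholders):

```python
import csv
import io


def read_contig_depths(text):
    """Hypothetical helper: {contig: (length, totalAvgDepth)} from a contig depth table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        # float() first tolerates lengths written as "5000" or "5000.0"
        row["contigName"]: (int(float(row["contigLen"])), float(row["totalAvgDepth"]))
        for row in reader
    }
```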

metabat_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a named directory

Outputs

ID Type Label Doc
bins_summary File Bins summary

Summary information about the bins

busco_output Directory BUSCO

BUSCO output directory

checkm_output Directory CheckM

CheckM output directory

gtdbtk_output Directory (Optional) GTDB-Tk

GTDB-Tk output directory

metabat2_output Directory MetaBAT2

MetaBAT2 output directory

Permalink: https://w3id.org/cwl/view/git/b9097b82e6ab6f2c9496013ce4dd6877092956a0/cwl/workflows/workflow_metagenomics_binning.cwl