Workflow: Metagenomics workflow

Fetched 2024-05-19 04:47:28 GMT

Workflow for metagenomics from raw reads to annotated bins. Steps:
- workflow_illumina_quality.cwl:
  - FastQC (quality control)
  - fastp (quality trimming)
  - Kraken2 (taxonomic classification)
  - BBMap (contamination filter)
- SPAdes (assembly)
- QUAST (assembly quality report)
- BBMap (read mapping to assembly)
- Contig binning (optional)

Inputs

| ID | Type | Title | Doc |
|---|---|---|---|
| memory | Integer (Optional) | Memory usage (MB) | Maximum memory usage in megabytes |
| binning | Boolean (Optional) | Run binning workflow | Run the contig binning workflow |
| threads | Integer (Optional) | Number of threads | Number of threads to use for computational processes |
| identifier | String | Identifier used | Identifier for this dataset, used throughout the workflow |
| run_gtdbtk | Boolean | Run GTDB-Tk | Run GTDB-Tk taxonomic classification of bins when true |
| deduplicate | Boolean (Optional) | Deduplicate reads | Remove exact duplicate reads with fastp |
| destination | String (Optional) | Output destination | Optional output destination used for cwl-prov reporting |
| pacbio_reads | File[] (Optional) | PacBio reads | Local file(s) with PacBio reads |
| nanopore_reads | File[] (Optional) | Nanopore reads | Local file(s) with Nanopore reads |
| kraken_database | String | Kraken2 database | Absolute path to the Kraken2 database location |
| filter_references | String[] | Contamination reference file | BBMap reference FASTA file paths for contamination filtering |
| illumina_forward_reads | String[] | Forward reads | Forward sequence file paths |
| illumina_reverse_reads | String[] | Reverse reads | Reverse sequence file paths |
| use_reference_mapped_reads | Boolean | Keep mapped reads | Continue with the reads that map to the given reference |
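A minimal sketch for preparing an input object (job file) for this workflow, assuming the input IDs and types listed above. The helper name build_job, the script name and all paths are hypothetical. Inputs typed String or String[] take plain paths, while the File[] inputs (pacbio_reads, nanopore_reads) need CWL File objects; pairing forward and reverse reads by list position is an assumption.

```python
import json
from typing import Any, Dict, Sequence

def build_job(identifier: str,
              forward_reads: Sequence[str],
              reverse_reads: Sequence[str],
              kraken_database: str,
              filter_references: Sequence[str],
              run_gtdbtk: bool = False,
              use_reference_mapped_reads: bool = False,
              **optional: Any) -> Dict[str, Any]:
    """Return a CWL input object for workflow_metagenomics_assembly.cwl (sketch).

    Non-optional inputs are explicit arguments; optional ones (memory, threads,
    binning, deduplicate, destination, pacbio_reads, nanopore_reads) are passed
    as keyword arguments.
    """
    # Assumption: forward and reverse lists are paired by position.
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("forward and reverse read lists must have equal length")
    job: Dict[str, Any] = {
        "identifier": identifier,
        "illumina_forward_reads": list(forward_reads),  # String[]: plain paths
        "illumina_reverse_reads": list(reverse_reads),
        "kraken_database": kraken_database,              # absolute path
        "filter_references": list(filter_references),    # FASTA paths
        "run_gtdbtk": run_gtdbtk,
        "use_reference_mapped_reads": use_reference_mapped_reads,
    }
    # File[] inputs must be CWL File objects rather than plain strings.
    for key in ("pacbio_reads", "nanopore_reads"):
        if key in optional:
            optional[key] = [{"class": "File", "path": p} for p in optional[key]]
    job.update(optional)
    return job

if __name__ == "__main__":
    # Hypothetical example values; replace with real paths.
    job = build_job(
        identifier="sample01",
        forward_reads=["/data/sample01_R1.fastq.gz"],
        reverse_reads=["/data/sample01_R2.fastq.gz"],
        kraken_database="/db/kraken2/k2_pluspf",
        filter_references=["/db/refs/human.fasta"],
        threads=8,
        memory=32000,
        binning=True,
    )
    print(json.dumps(job, indent=2))
```

Redirecting the printed JSON to job.json allows a run such as `cwltool workflow_metagenomics_assembly.cwl job.json`.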

Steps

ID Runs Label Doc
bbmap
../bbmap/bbmap.cwl (CommandLineTool)
BBMap

Read filtering using BBMap against a (contamination) reference genome

quast
../quast/quast.cwl (CommandLineTool)
QUAST: Quality Assessment Tool for Genome Assemblies

Runs the Quality Assessment Tool for Genome Assemblies application

When running Python 3.8 or later with any QUAST version other than the 5.1.0rc1 pre-release, a known issue applies: cgi.escape was removed in Python 3.8. The workaround (https://github.com/ablab/quast/issues/157) is to replace "cgi.escape" with "html.escape" and "import cgi" with "import html" in the file "jsontemplate.py" in the installation folder: "path/to/quast-5.0.2/quast_libs/site_packages/jsontemplate/jsontemplate.py"
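The workaround above can be scripted. This is a minimal sketch, assuming the jsontemplate.py location given above (adjust to your installation); the script name patch_quast.py and the function names are hypothetical. It writes a .bak backup before patching. Note that html.escape escapes quotes by default (quote=True) whereas cgi.escape did not (quote=False), so escaped output can differ slightly.

```python
import re
import sys
from pathlib import Path

def patch_jsontemplate_source(source: str) -> str:
    """Apply the cgi -> html workaround from ablab/quast issue #157."""
    # \b keeps unrelated names such as "import cgitb" untouched.
    source = re.sub(r"\bimport cgi\b", "import html", source)
    source = re.sub(r"\bcgi\.escape\b", "html.escape", source)
    return source

def patch_file(path: Path) -> None:
    """Back up the file (e.g. jsontemplate.py.bak), then patch it in place."""
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(patch_jsontemplate_source(original), encoding="utf-8")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python patch_quast.py path/to/quast-5.0.2/quast_libs/site_packages/jsontemplate/jsontemplate.py
    patch_file(Path(sys.argv[1]))
```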

spades
../assembly/spades.cwl (CommandLineTool)
SPAdes genome assembler

Runs the SPAdes assembler on a dataset file

kraken2
../kraken2/kraken2.cwl (CommandLineTool)
Kraken2 metagenomics read classification

Kraken2 metagenomics read classification.

Updated databases are available at https://benlangmead.github.io/aws-indexes/k2 (e.g. PlusPF-8). Original databases: https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads

kraken2_krona
../krona/krona.cwl (CommandLineTool)
Krona

Visualization of Kraken2 report results with Krona, using the command ktImportText -o $1 $2 ($1: output HTML file, $2: input file).
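ktImportText expects tab-separated lines of a count followed by the taxonomic lineage, so a Kraken2 report has to be converted before it can be passed in; whether krona.cwl does this internally is not stated here. A minimal sketch of such a conversion, assuming the standard Kraken2 report layout (scientific name in the last column, indented two spaces per level; directly assigned reads in the third column). The function name is hypothetical.

```python
def kraken2_report_to_krona_text(report: str) -> str:
    """Convert a Kraken2 report into ktImportText input (count<TAB>lineage...)."""
    out_lines = []
    lineage = []  # names from the top level down to the current taxon
    for line in report.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        direct_reads = int(cols[2])  # reads assigned directly to this taxon
        name_field = cols[-1]        # name, indented two spaces per level
        depth = (len(name_field) - len(name_field.lstrip(" "))) // 2
        lineage = lineage[:depth] + [name_field.strip()]
        # Krona sums counts up the tree, so emit only direct assignments.
        if direct_reads > 0:
            out_lines.append("\t".join([str(direct_reads)] + lineage))
    return "".join(entry + "\n" for entry in out_lines)
```

Writing the result to a file such as krona.txt allows `ktImportText -o krona.html krona.txt`. Using direct rather than clade counts avoids double counting, because Krona adds child counts into their parents.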

compress_spades
../bash/pigz.cwl (CommandLineTool)
Compress a file with pigz (multithreaded)

kraken2_compress
../bash/pigz.cwl (CommandLineTool)
Compress a file with pigz (multithreaded)
workflow_binning
Metagenomic Binning from Assembly

Workflow for metagenomics from raw reads to annotated bins. Summary:
- MetaBAT2 (binning)
- CheckM (bin completeness and contamination)
- GTDB-Tk (bin taxonomic classification)
- BUSCO (bin completeness)

**All tool CWL files and other workflows can be found here:**
Tools: https://git.wur.nl/unlock/cwl/-/tree/master/cwl
Workflows: https://git.wur.nl/unlock/cwl/-/tree/master/cwl/workflows

The dependencies are available from https://unlock-icat.irods.surfsara.nl (username and password: anonymous) and/or through the conda/pip environments shown in https://git.wur.nl/unlock/docker/-/blob/master/kubernetes/scripts/setup.sh

workflow_quality
Illumina read quality control, trimming and contamination filter.

**Workflow for Illumina paired read quality control, trimming and filtering.**
Multiple paired datasets are merged into a single paired dataset.
Summary:
- FastQC on raw data files
- fastp for read quality trimming
- BBDuk for phiX and (optional) rRNA filtering
- Kraken2 for taxonomic classification of reads (optional)
- BBMap for (contamination) filtering using given references (optional)
- FastQC on filtered (merged) data

WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default
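The quality workflow merges multiple paired datasets into one pair but does not say how. A minimal sketch of one common approach, assuming byte-level concatenation in the same order for the forward and reverse files; merge_paired_datasets is a hypothetical helper, not part of the workflow. This works for gzip-compressed FASTQ too, since concatenated gzip members form a valid gzip stream, provided all inputs use the same compression.

```python
import shutil
from typing import Sequence

def merge_paired_datasets(forward: Sequence[str], reverse: Sequence[str],
                          out_forward: str, out_reverse: str) -> None:
    """Concatenate several paired FASTQ datasets into one pair of files."""
    if len(forward) != len(reverse):
        raise ValueError("every forward file needs a matching reverse file")
    for sources, target in ((forward, out_forward), (reverse, out_reverse)):
        with open(target, "wb") as out:
            # Same dataset order on both sides keeps read i of R1 paired
            # with read i of R2.
            for src in sources:
                with open(src, "rb") as handle:
                    shutil.copyfileobj(handle, out)
```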

sam_to_sorted_bam
../samtools/sam_to_sorted-bam.cwl (CommandLineTool)
SAM to sorted BAM

samtools view -@ $2 -hu $1 | samtools sort -@ $2 -o $3.bam
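In the command above, $1 appears to be the input SAM file, $2 the thread count and $3 the output prefix (inferred from the command line). A minimal sketch of the same pipe driven from Python without a shell; the function names are hypothetical, and running it requires samtools on PATH.

```python
import subprocess
from typing import List, Tuple

def sam_to_sorted_bam_commands(sam: str, threads: int,
                               prefix: str) -> Tuple[List[str], List[str]]:
    """Split `samtools view -@ T -hu SAM | samtools sort -@ T -o PREFIX.bam`."""
    # -@: additional threads; -h keeps the header; -u writes uncompressed BAM.
    view = ["samtools", "view", "-@", str(threads), "-hu", sam]
    # "-" makes samtools sort read the piped BAM from stdin.
    sort = ["samtools", "sort", "-@", str(threads), "-o", f"{prefix}.bam", "-"]
    return view, sort

def run_pipeline(view: List[str], sort: List[str]) -> None:
    """Run view | sort and raise if either side fails."""
    producer = subprocess.Popen(view, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(sort, stdin=producer.stdout)
    producer.stdout.close()  # lets view receive SIGPIPE if sort exits early
    consumer.wait()
    producer.wait()
    for proc in (producer, consumer):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

For example, `run_pipeline(*sam_to_sorted_bam_commands("sample01.sam", 8, "sample01"))` would write sample01.bam.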

contig_read_counts
../samtools/samtools_idxstats.cwl (CommandLineTool)
samtools idxstats

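samtools idxstats: reports alignment summary statistics per reference sequence (sequence name, sequence length, number of mapped reads, number of unmapped reads)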
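A minimal sketch for reading the contig read counts back in, assuming the standard four-column, tab-separated idxstats output; ContigStats and parse_idxstats are hypothetical names. The final line named * counts unmapped reads that have no assigned contig.

```python
from typing import Dict, NamedTuple

class ContigStats(NamedTuple):
    length: int
    mapped: int
    unmapped: int

def parse_idxstats(text: str) -> Dict[str, ContigStats]:
    """Parse samtools idxstats output: name, length, mapped, unmapped."""
    stats: Dict[str, ContigStats] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = ContigStats(int(length), int(mapped), int(unmapped))
    return stats
```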

quast_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a directory with the given name

spades_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a directory with the given name

binning_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a directory with the given name

kraken2_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a directory with the given name

sorted_bam_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Collects the input files into a directory with the given name
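The five *_files_to_folder steps reuse one ExpressionTool whose source is not shown here. In CWL, an ExpressionTool (written in JavaScript) can return a Directory literal with class, basename and listing fields, which the runner then stages as a folder. The Python sketch below mirrors that output object, assuming this is what files_to_folder.cwl does; the names file_object, files_to_folder, files and destination are hypothetical.

```python
import os
from typing import Any, Dict, List

def file_object(path: str) -> Dict[str, Any]:
    """Describe a local path as a CWL File object."""
    return {"class": "File", "path": path, "basename": os.path.basename(path)}

def files_to_folder(files: List[Dict[str, Any]], destination: str) -> Dict[str, Any]:
    """Return a CWL Directory literal named `destination` listing `files`.

    A CWL runner stages each listed entry inside the new directory, so the
    step output appears as one folder (e.g. the quast_output Directory).
    """
    return {"class": "Directory", "basename": destination, "listing": list(files)}
```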

Outputs

| ID | Type | Label | Doc |
|---|---|---|---|
| bam_output | Directory (Optional) | BAM files | Mapping results in indexed BAM format |
| quast_output | Directory | QUAST | QUAST analysis output folder |
| spades_output | Directory | SPAdes | Metagenome assembly output by SPAdes |
| binning_output | Directory (Optional) | Binning output | Binning output folders |
| filtered_stats | Directory | Filtered statistics | Statistics on read quality and preprocessing |
| kraken2_output | Directory | Kraken2 reports | Kraken2 taxonomic classification reports |

Permalink: https://w3id.org/cwl/view/git/b9097b82e6ab6f2c9496013ce4dd6877092956a0/cwl/workflows/workflow_metagenomics_assembly.cwl