Workflow: Illumina read quality control, trimming and contamination filter.

Fetched 2024-04-24 01:34:15 GMT

**Workflow for Illumina paired read quality control, trimming and filtering.**
Multiple paired datasets will be merged into a single paired dataset.

Summary:
- FastQC on raw data files
- fastp for read quality trimming
- BBDuk for phiX and (optional) rRNA filtering
- Kraken2 for taxonomic classification of reads (optional)
- BBMap for (contamination) filtering using given references (optional)
- FastQC on filtered (merged) data

**All tool CWL files and other workflows can be found here:**
Tools: https://git.wur.nl/unlock/cwl/-/tree/master/cwl
Workflows: https://git.wur.nl/unlock/cwl/-/tree/master/cwl/workflows
WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

Inputs

| ID | Type | Title | Doc |
|---|---|---|---|
| step | Integer (Optional) | Output Step number | Step number for output folder numbering |
| memory | Integer (Optional) | Maximum memory in MB | Maximum memory usage in megabytes |
| threads | Integer (Optional) | Number of threads | Number of threads to use for computational processes |
| identifier | String | identifier used | Identifier for this dataset used in this workflow |
| deduplicate | Boolean (Optional) | Deduplicate reads | Remove exact duplicate reads with fastp |
| filter_rrna | Boolean | filter rRNA | Optionally remove rRNA sequences from the reads |
| forward_reads | String[] | Forward reads | Local forward-read FASTQ file(s) |
| reverse_reads | String[] | Reverse reads | Local reverse-read FASTQ file(s) |
| kraken_database | String[] (Optional) | Kraken2 database | Kraken2 database location |
| filter_references | String[] (Optional) | Filter reference file(s) | Reference FASTA file(s) for filtering |
| keep_reference_mapped_reads | Boolean | Keep mapped reads | Keep the reads mapped to the given reference |
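
The inputs listed above can be collected into a job object and written out as JSON, which CWL runners accept as a job file. The following is a minimal sketch under stated assumptions: the helper name `build_job`, the sample identifier, the file paths, the `False` defaults for the two required booleans, and the one-to-one pairing check are all illustrative and not taken from the workflow itself. Only the input IDs and their types come from this page.

```python
import json


def build_job(identifier, forward_reads, reverse_reads, *,
              filter_rrna=False, keep_reference_mapped_reads=False,
              step=None, memory=None, threads=None, deduplicate=None,
              kraken_database=None, filter_references=None):
    """Assemble a job object keyed by the workflow's input IDs.

    Hypothetical helper: the defaults and the pairing check are assumptions.
    """
    # Assumption: paired data means one reverse file per forward file.
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("forward_reads and reverse_reads must pair up one-to-one")

    job = {
        "identifier": identifier,
        "forward_reads": list(forward_reads),   # String[]: plain path strings
        "reverse_reads": list(reverse_reads),   # String[]: plain path strings
        "filter_rrna": filter_rrna,
        "keep_reference_mapped_reads": keep_reference_mapped_reads,
    }

    # Optional inputs are only written when a value was supplied.
    optional = {
        "step": step,
        "memory": memory,                        # megabytes
        "threads": threads,
        "deduplicate": deduplicate,
        "kraken_database": kraken_database,      # String[] (Optional)
        "filter_references": filter_references,  # String[] (Optional)
    }
    job.update({key: value for key, value in optional.items() if value is not None})
    return job


# Hypothetical sample name and paths, for illustration only.
job = build_job(
    "sample1",
    ["/data/sample1_R1.fastq.gz"],
    ["/data/sample1_R2.fastq.gz"],
    threads=4,
    memory=8000,
)
job_json = json.dumps(job, indent=2)
print(job_json)
```

The printed JSON can be saved to a file and passed to a CWL runner together with the workflow file given in the permalink at the end of this page.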

Steps

| ID | Runs | Label | Doc |
|---|---|---|---|
| fastp | ../fastp/fastp.cwl (CommandLineTool) | | Modified from https://github.com/ambarishK/bio-cwl-tools/blob/release/fastp/fastp.cwl |
| phix_filter | ../bbmap/bbduk_filter.cwl (CommandLineTool) | Filter from reads | Filter reads using BBMap's BBDuk tool (paired-end only) |
| rrna_filter | ../bbmap/bbduk_filter.cwl (CommandLineTool) | Filter from reads | Filter reads using BBMap's BBDuk tool (paired-end only) |
| fastq_merge_fwd | ../bash/concatenate.cwl (CommandLineTool) | Concatenate multiple files | |
| fastq_merge_rev | ../bash/concatenate.cwl (CommandLineTool) | Concatenate multiple files | |
| combine_references | ../bash/concatenate.cwl (CommandLineTool) | Concatenate multiple files | |
| fastqc_illumina_after | ../fastqc/fastqc.cwl (CommandLineTool) | FASTQC | Performs quality control on FASTQ files |
| fastqc_illumina_before | ../fastqc/fastqc.cwl (CommandLineTool) | FASTQC | Performs quality control on FASTQ files |
| reports_files_to_folder | ../expressions/files_to_folder.cwl (ExpressionTool) | | Collects the input files into the specified directory |
| illumina_quality_kraken2 | ../kraken2/kraken2.cwl (CommandLineTool) | Kraken2 metagenomics read classification | Kraken2 metagenomics read classification. Updated databases available at: https://benlangmead.github.io/aws-indexes/k2 (e.g. PlusPF-8). Original db: https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads |
| reference_filter_illumina | ../bbmap/bbmap_filter-reads.cwl (CommandLineTool) | BBMap | Read filtering using BBMap against a (contamination) reference genome |
| illumina_quality_kraken2_krona | ../krona/krona.cwl (CommandLineTool) | Krona | Visualization of Kraken2 report results. ktImportText -o $1 $2 |
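
To make the optional parts of the step list easier to follow, the sketch below maps a job object to the step IDs that would be expected to run. The gating rules are assumptions inferred from the "(optional)" notes in the summary: rRNA filtering when `filter_rrna` is true, Kraken2 and Krona when `kraken_database` is set, and reference concatenation plus BBMap filtering when `filter_references` is set. The actual conditions and wiring live in the workflow file and may differ. The function name `planned_steps` is hypothetical, and the order of the returned list is only a reading aid, not the workflow's execution order.

```python
def planned_steps(job):
    """Return the step IDs expected to run for a job object (assumed gating)."""
    # Steps assumed to run for every dataset.
    steps = [
        "fastqc_illumina_before",
        "fastq_merge_fwd",
        "fastq_merge_rev",
        "fastp",
        "phix_filter",
    ]

    # Assumption: rRNA filtering is switched on by the filter_rrna input.
    if job.get("filter_rrna"):
        steps.append("rrna_filter")

    # Assumption: classification and its Krona plot need a Kraken2 database.
    if job.get("kraken_database"):
        steps += ["illumina_quality_kraken2", "illumina_quality_kraken2_krona"]

    # Assumption: reference filtering runs only when references are given.
    if job.get("filter_references"):
        steps += ["combine_references", "reference_filter_illumina"]

    steps += ["fastqc_illumina_after", "reports_files_to_folder"]
    return steps


# Hypothetical job: rRNA filtering on, one contamination reference, no Kraken2.
example_plan = planned_steps({
    "filter_rrna": True,
    "filter_references": ["/refs/host.fasta"],
})
print(example_plan)
```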

Outputs

| ID | Type | Label | Doc |
|---|---|---|---|
| destination | String (Optional) | Output Destination | Optional output destination used for cwl-prov reporting |
| reports_folder | Directory | Filtering reports folder | Folder containing all reports of filtering and quality control |
| QC_forward_reads | File | Filtered forward read | Filtered forward reads |
| QC_reverse_reads | File | Filtered reverse read | Filtered reverse reads |
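
CWL runners report a finished run as an output object keyed by the output IDs above, with `File` and `Directory` values carrying a `location` URI (and, in some runners, a `path`). The sketch below shows one way to pull local paths out of such an object. The helper name `local_path` and every path and file name in the example JSON are hypothetical; the real names depend on the run.

```python
import json
from urllib.parse import unquote, urlparse


def local_path(obj):
    """Return a filesystem path for a CWL File or Directory object."""
    # Prefer an explicit path when the runner provides one.
    if "path" in obj:
        return obj["path"]
    # Otherwise derive it from the file:// location URI.
    return unquote(urlparse(obj["location"]).path)


# Hypothetical output object, shaped like the outputs listed above.
example_output = json.loads("""
{
  "destination": null,
  "reports_folder": {
    "class": "Directory",
    "location": "file:///tmp/out/reports"
  },
  "QC_forward_reads": {
    "class": "File",
    "location": "file:///tmp/out/sample1_1.fq.gz"
  },
  "QC_reverse_reads": {
    "class": "File",
    "location": "file:///tmp/out/sample1_2.fq.gz",
    "path": "/tmp/out/sample1_2.fq.gz"
  }
}
""")

forward = local_path(example_output["QC_forward_reads"])
reverse = local_path(example_output["QC_reverse_reads"])
reports = local_path(example_output["reports_folder"])
print(forward, reverse, reports)
```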

Permalink: https://w3id.org/cwl/view/git/b9097b82e6ab6f2c9496013ce4dd6877092956a0/cwl/workflows/workflow_illumina_quality.cwl