Workflow: Whole Genome Sequence processing workflow scattered over samples

Fetched 2024-11-24 10:24:47 GMT

This is a “real-world” workflow example for processing Next Generation Sequencing (NGS) Whole Genome Sequence (WGS) data.

You can learn more and run this workflow yourself by going through the "Processing Whole Genome Sequences" walkthrough in the Arvados user guide: https://doc.arvados.org/main/user/tutorials/wgs-tutorial.html

The steps of this workflow include:

  1. Check of FASTQ quality using FastQC
  2. Local alignment using BWA-MEM
  3. Variant calling in parallel using GATK HaplotypeCaller
  4. Generation of an HTML report comparing variants against the ClinVar archive

The primary input parameter is the Directory of paired FASTQ files, which should contain the paired FASTQ files (suffixed with _1 and _2) to be processed. The workflow scatters over the samples to process them in parallel.

The remaining parameters are reference data used by various tools in the pipeline.

Inputs

ID               | Type      | Title                                   | Doc
fastqdir         | Directory | Directory of paired FASTQ files         |
headhtml         | File      | [HTML] Header for HTML report           |
tailhtml         | File      | [HTML] Footer for HTML report           |
reference        | File      | [FASTA] Reference genome                |
clinvarvcf       | File      | [VCF] Reference VCF for ClinVar         |
reportfunc       | File      | Function used to create HTML report     |
knownsites1      | File      | [VCF] VCF of known SNP sites for BQSR   |
knownsites2      | File      | [VCF] VCF of known indel sites for BQSR |
scattercount     | String    | Desired split for variant calling       |
fullintervallist | File      |                                         |
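
The scattercount input describes a "desired split for variant calling", but this page does not say how the split is performed. As a rough illustration only, the Python sketch below divides a list of intervals into near-equal contiguous chunks by count. The function name split_intervals and the chromosome labels are hypothetical, and the actual workflow may split by genomic length or with a dedicated tool instead.

```python
def split_intervals(intervals, scattercount):
    """Split intervals into at most `scattercount` contiguous, near-equal chunks.

    Assumption: chunks are balanced by number of intervals, not by base pairs.
    """
    n = int(scattercount)  # the workflow input is typed String, so convert it
    if n < 1:
        raise ValueError("scattercount must be at least 1")
    # Never create more chunks than there are intervals (but always at least one).
    n = min(n, len(intervals)) or 1
    base, extra = divmod(len(intervals), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks each take one additional interval.
        size = base + (1 if i < extra else 0)
        chunks.append(intervals[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical interval labels, for illustration only.
    for chunk in split_intervals(["chr1", "chr2", "chr3", "chr4", "chr5"], "2"):
        print(chunk)
```

Each chunk could then be handed to a separate variant-calling job, which is the general idea behind scattering a single sample over intervals.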

Steps

ID                 | Runs                                 | Label                                     | Doc
getfastq           | helper/getfastq.cwl (ExpressionTool) | Find matching FASTQ pairs                 |
bwamem-gatk-report |                                      | WGS processing workflow for single sample |
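
The getfastq step finds matching FASTQ pairs so that the per-sample workflow can be scattered over them. The real step is a CWL ExpressionTool whose code is not shown on this page; the Python sketch below only illustrates the pairing idea. The function name pair_fastqs, the accepted file extensions, and the example file names are assumptions, not taken from the workflow.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_1.fastq[.gz] and <sample>_2.fastq[.gz]
# (".fq" is accepted as well). The real step may use different rules.
_PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastqs(filenames):
    """Return {sample: (read1, read2)} for every sample that has both mates."""
    found = defaultdict(dict)
    for name in filenames:
        match = _PAIR_RE.match(name)
        if match:
            found[match.group("sample")][match.group("read")] = name
    # Keep only complete pairs; samples missing a mate are skipped.
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(found.items())
        if "1" in reads and "2" in reads
    }


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    listing = ["A_1.fastq.gz", "A_2.fastq.gz", "B_1.fastq", "readme.txt"]
    for sample, (read1, read2) in pair_fastqs(listing).items():
        print(sample, read1, read2)
```

Each resulting pair corresponds to one sample, so a scatter over the pairs runs one instance of the single-sample workflow per sample.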

Outputs

ID       | Type                               | Label                                          | Doc
gvcf     | File[]                             | [VCF] GVCFs generated from GATK                |
report   | File[]                             | [HTML] ClinVar variant reports                 |
qcreport | 7bba7041921724e763c720ff91d73e46[] | [HTML] FASTQ quality reports produced by fastqc |

Permalink: https://w3id.org/cwl/view/git/e4d896f5f94a9cf7b157cf87d5042e416649d87b/WGS-processing/cwl/wgs-processing-wf.cwl