Workflow: Whole Genome Sequence processing workflow scattered over samples
<p>This is a “real-world” workflow example for processing Next Generation Sequencing (NGS) Whole Genome Sequence (WGS) data.</p>
<p>You can learn more and run this workflow yourself by going through the <a href="https://doc.arvados.org/main/user/tutorials/wgs-tutorial.html">Processing Whole Genome Sequences</a> walkthrough in the Arvados user guide.</p>
<p>The steps of this workflow include:</p>
<ol>
<li>Quality check of the FASTQ files using FastQC</li>
<li>Local alignment using BWA-MEM</li>
<li>Variant calling in parallel using GATK HaplotypeCaller</li>
<li>Generation of an HTML report comparing variants against the ClinVar archive</li>
</ol>
<p>The primary input parameter is the <b>Directory of paired FASTQ files</b>, which should contain the paired FASTQ files (suffixed with _1 and _2) to be processed. The workflow scatters over the samples to process them in parallel.</p>
<p>The remaining parameters are reference data used by various tools in the pipeline.</p>
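The `_1`/`_2` suffix convention described above can be illustrated with a minimal Python sketch. It is not the workflow's own `getfastq.cwl` logic. The accepted extensions (`.fastq`, `.fq`, optionally `.gz`), the function name `pair_fastqs`, and the choice to skip incomplete pairs are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.<ext> and <sample>_2.<ext>,
# where <ext> is fastq or fq, optionally gzip-compressed.
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastqs(filenames):
    """Group file names into {sample: (read1, read2)}.

    Files that do not match the naming convention, and samples that
    lack one of the two mates, are skipped (an illustrative choice).
    """
    reads = defaultdict(dict)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match:
            reads[match.group("sample")][match.group("read")] = name
    return {
        sample: (mates["1"], mates["2"])
        for sample, mates in sorted(reads.items())
        if "1" in mates and "2" in mates
    }


if __name__ == "__main__":
    files = ["A_1.fastq.gz", "A_2.fastq.gz", "B_1.fastq.gz", "README.txt"]
    # Sample B has no _2 mate, so only sample A forms a pair.
    print(pair_fastqs(files))
```

Each resulting pair corresponds to one sample that the workflow would then process independently.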
Inputs
| ID | Type | Title | Doc |
|---|---|---|---|
| fastqdir | Directory | Directory of paired FASTQ files | |
| headhtml | File [HTML] | Header for HTML report | |
| tailhtml | File [HTML] | Footer for HTML report | |
| reference | File [FASTA] | Reference genome | |
| clinvarvcf | File [VCF] | Reference VCF for ClinVar | |
| reportfunc | File | Function used to create HTML report | |
| knownsites1 | File [VCF] | VCF of known SNP sites for BQSR | |
| knownsites2 | File [VCF] | VCF of known indel sites for BQSR | |
| scattercount | String | Desired split for variant calling | |
| fullintervallist | File | | |
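The source does not say how `scattercount` and `fullintervallist` are combined, so the following is only a sketch of one plausible reading: the interval list is divided into `scattercount` shards so that variant calling can run on each shard in parallel. The round-robin strategy and the function name `split_intervals` are assumptions. The real pipeline may split differently, for example by balancing shard sizes in base pairs.

```python
def split_intervals(intervals, scattercount):
    """Distribute intervals round-robin into int(scattercount) shards.

    scattercount is accepted as a string because the workflow input
    is typed as String; empty shards are dropped.
    """
    n = int(scattercount)
    if n < 1:
        raise ValueError("scattercount must be at least 1")
    shards = [[] for _ in range(n)]
    for index, interval in enumerate(intervals):
        shards[index % n].append(interval)
    return [shard for shard in shards if shard]


if __name__ == "__main__":
    # Hypothetical interval names, used only for illustration.
    for shard in split_intervals(["chr1", "chr2", "chr3", "chr4", "chr5"], "2"):
        print(shard)
```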
Steps
| ID | Runs | Label | Doc |
|---|---|---|---|
| getfastq | helper/getfastq.cwl (ExpressionTool) | Find matching FASTQ pairs | |
| bwamem-gatk-report | helper/bwamem-gatk-report-wf.cwl (Workflow) | WGS processing workflow for single sample | |
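The two steps above amount to a scatter/gather pattern: find the sample pairs, run the single-sample workflow once per sample, and collect the per-sample results into arrays. The Python sketch below mimics that pattern with a thread pool. `process_sample` is a hypothetical stand-in for the single-sample sub-workflow, and the output file names it returns are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample, read1, read2):
    """Hypothetical placeholder for the single-sample sub-workflow."""
    # A real run would align read1/read2, call variants and build a report.
    return {
        "sample": sample,
        "gvcf": f"{sample}.g.vcf.gz",
        "report": f"{sample}.html",
    }


def scatter_over_samples(pairs, max_workers=4):
    """Run process_sample once per sample and gather results in sample order."""
    samples = sorted(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with samples.
        return list(pool.map(lambda s: process_sample(s, *pairs[s]), samples))


if __name__ == "__main__":
    pairs = {
        "B": ("B_1.fastq", "B_2.fastq"),
        "A": ("A_1.fastq", "A_2.fastq"),
    }
    for result in scatter_over_samples(pairs):
        print(result)
```

Because one result is produced per sample, the gathered outputs naturally form arrays, one element per sample.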
Outputs
| ID | Type | Label | Doc |
|---|---|---|---|
| gvcf | File[] [VCF] | GVCFs generated from GATK | |
| report | File[] [HTML] | ClinVar variant reports | |
| qcreport | d4f267523493cbb7c5ce0df6368a06b2[] [HTML] | FASTQ quality reports produced by FastQC | |
https://w3id.org/cwl/view/git/e4d896f5f94a9cf7b157cf87d5042e416649d87b/WGS-processing/cwl/wgs-processing-wf.cwl
