Workflow: Whole Genome Sequence processing workflow scattered over samples
<p>This is a “real-world” example workflow for processing Next Generation Sequencing (NGS) Whole Genome Sequence (WGS) data.</p>
<p>You can learn more and run this workflow yourself by following the <a href="https://doc.arvados.org/main/user/tutorials/wgs-tutorial.html">Processing Whole Genome Sequences</a> walkthrough in the Arvados user guide.</p>
<p>The workflow performs the following steps:</p>
<ol> <li>Quality check of the FASTQ files using FastQC</li> <li>Local alignment using BWA-MEM</li> <li>Variant calling in parallel using GATK HaplotypeCaller</li> <li>Generation of an HTML report comparing the variants against the ClinVar archive</li> </ol>
<p>The primary input parameter is the <b>Directory of paired FASTQ files</b>, which should contain the paired FASTQ files (suffixed with _1 and _2) to be processed. The workflow scatters over the samples to process them in parallel.</p>
<p>The remaining parameters are reference data used by various tools in the pipeline.</p>
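The _1/_2 pairing convention described above can be illustrated with a small Python sketch. This is not the workflow's own `getfastq.cwl` logic. The accepted extensions (`.fastq`, `.fq`, optionally `.gz`), the function name `pair_fastqs`, and the sample file names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.<ext> and <sample>_2.<ext>,
# where <ext> is fastq or fq, optionally gzipped.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastqs(filenames):
    """Group FASTQ file names into (read 1, read 2) pairs keyed by sample name."""
    reads = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            reads[match["sample"]][match["read"]] = name
    # Keep only samples for which both mates are present.
    return {
        sample: (mates["1"], mates["2"])
        for sample, mates in sorted(reads.items())
        if "1" in mates and "2" in mates
    }


if __name__ == "__main__":
    # Hypothetical directory listing.
    listing = [
        "sampleA_1.fastq.gz",
        "sampleA_2.fastq.gz",
        "sampleB_1.fastq.gz",  # mate 2 is missing, so sampleB is skipped
        "README.txt",
    ]
    print(pair_fastqs(listing))
```

Samples with only one mate are dropped here rather than reported. A production helper would probably raise an error instead.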
Inputs
ID | Type | Title | Doc |
---|---|---|---|
fastqdir | Directory | Directory of paired FASTQ files | |
headhtml | File [HTML] | Header for HTML report | |
tailhtml | File [HTML] | Footer for HTML report | |
reference | File [FASTA] | Reference genome | |
clinvarvcf | File [VCF] | Reference VCF for ClinVar | |
reportfunc | File | Function used to create HTML report | |
knownsites1 | File [VCF] | VCF of known SNP sites for BQSR | |
knownsites2 | File [VCF] | VCF of known indel sites for BQSR | |
scattercount | String | Desired split for variant calling | |
fullintervallist | File | | |
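A CWL runner takes these parameters as an input object in YAML or JSON. The Python sketch below builds such an object for the ten input IDs in the table and prints it as JSON. Every path is a placeholder, the `scattercount` value of "4" is arbitrary, and any secondary files the tools may require (for example reference indexes) are omitted. Check the workflow definition before relying on it.

```python
import json


def cwl_file(location):
    """Return a CWL File object for the given location."""
    return {"class": "File", "location": location}


def cwl_directory(location):
    """Return a CWL Directory object for the given location."""
    return {"class": "Directory", "location": location}


def build_inputs():
    """Build an input object keyed by the workflow's input IDs.

    All locations are hypothetical placeholders, not real reference data.
    """
    return {
        "fastqdir": cwl_directory("data/fastq"),
        "headhtml": cwl_file("report/head.html"),
        "tailhtml": cwl_file("report/tail.html"),
        "reference": cwl_file("ref/genome.fa"),
        "clinvarvcf": cwl_file("ref/clinvar.vcf.gz"),
        "reportfunc": cwl_file("report/generatereport.py"),
        "knownsites1": cwl_file("ref/known_snps.vcf.gz"),
        "knownsites2": cwl_file("ref/known_indels.vcf.gz"),
        # The table lists scattercount as a String, so it is quoted here.
        "scattercount": "4",
        "fullintervallist": cwl_file("ref/intervals.interval_list"),
    }


if __name__ == "__main__":
    print(json.dumps(build_inputs(), indent=2))
```

The printed JSON can be saved to a file and passed to a CWL runner together with the workflow file.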
Steps
ID | Runs | Label | Doc |
---|---|---|---|
getfastq | helper/getfastq.cwl (ExpressionTool) | Find matching FASTQ pairs | |
bwamem-gatk-report | helper/bwamem-gatk-report-wf.cwl (Workflow) | WGS processing workflow for single sample | |
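Conceptually, the second step is scattered over the pairs found by the first, and each per-sample output is gathered into an array. The Python sketch below is only an analogy for that scatter/gather behaviour. `process_sample` is a hypothetical stand-in for the sub-workflow, and the output file names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(pair):
    """Stand-in for the per-sample sub-workflow; returns hypothetical output names."""
    sample, (read1, read2) = pair
    return {"gvcf": f"{sample}.g.vcf.gz", "report": f"{sample}.html"}


def scatter(pairs):
    """Run process_sample once per sample and gather each output into a list."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so outputs line up with samples.
        results = list(pool.map(process_sample, pairs.items()))
    return {
        "gvcf": [result["gvcf"] for result in results],
        "report": [result["report"] for result in results],
    }


if __name__ == "__main__":
    # Hypothetical FASTQ pairs keyed by sample name.
    pairs = {
        "sampleA": ("sampleA_1.fastq.gz", "sampleA_2.fastq.gz"),
        "sampleB": ("sampleB_1.fastq.gz", "sampleB_2.fastq.gz"),
    }
    print(scatter(pairs))
```

This gathering is why the workflow's outputs are array types such as `File[]`: each element corresponds to one scattered sample.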
Outputs
ID | Type | Label | Doc |
---|---|---|---|
gvcf | File[] [VCF] | GVCFs generated from GATK | |
report | File[] [HTML] | ClinVar variant reports | |
qcreport | 7bba7041921724e763c720ff91d73e46[] [HTML] | FASTQ quality reports produced by FastQC | |
https://w3id.org/cwl/view/git/e4d896f5f94a9cf7b157cf87d5042e416649d87b/WGS-processing/cwl/wgs-processing-wf.cwl