Workflow: exomeseq-01-preprocessing.cwl

Fetched 2024-05-02 17:31:33 GMT

Inputs

ID                    Type
GATKJar               File
library               String
threads               Integer
platform              String
intervals             File[] (Optional)
read_pair             FASTQReadPairType (https://w3id.org/cwl/view/git/bbe24d8d7fde2e918583b96805909a2867b749d6/types/FASTQReadPairType.yml#FASTQReadPairType)
knownSites            File[]
resource_dbsnp        File
interval_padding      Integer (Optional)
reference_genome      File
bait_interval_list    File
target_interval_list  File
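The interval_padding input suggests that each target interval is widened before the GATK steps use it. The sketch below shows that idea under stated assumptions: intervals are 1-based and inclusive, padding is clamped at position 1 but not at the contig end, and overlapping or adjacent padded intervals are merged. The function name and the merging behaviour are choices made for this illustration, not taken from the workflow.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based and inclusive


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Widen each interval by `padding` bases on both sides, then merge overlaps."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Clamp the start at 1; this sketch does not know contig lengths, so ends are not clamped.
    padded = sorted(
        (contig, max(1, start - padding), end + padding)
        for contig, start, end in intervals
    )
    merged: List[Interval] = []
    for contig, start, end in padded:
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            # Overlapping or adjacent on the same contig: extend the previous interval.
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged
```

For example, padding `("chr1", 100, 200)` and `("chr1", 260, 300)` by 50 yields one merged interval, `("chr1", 50, 350)`.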

Steps

ID Runs Label Doc
qc
../tools/fastqc.cwl (CommandLineTool)
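The qc step runs FastQC on the reads. One of FastQC's best-known reports is per-base sequence quality. The sketch below computes a much simpler version of that summary: the mean Phred score at each read position. It assumes Phred+33 encoding. FastQC itself reports more than the mean, so this is not its implementation.

```python
from typing import List


def per_position_mean_quality(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position, decoding ASCII with the given offset."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

Reads of different lengths are handled. Each position is averaged only over the reads that reach it.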
map
../tools/bwa-mem-samtools.cwl (CommandLineTool)

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:

   -w INT        band width for banded alignment [100]
   -d INT        off-diagonal X-dropoff [100]
   -r FLOAT      look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
   -y INT        seed occurrence for the 3rd round seeding [20]
   -c INT        skip seeds with more than INT occurrences [500]
   -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
   -W INT        discard a chain if seeded bases shorter than INT [0]
   -m INT        perform at most INT rounds of mate rescues for each read [50]
   -S            skip mate rescue
   -P            skip pairing; mate rescue performed unless -S also in use
   -e            discard full-length exact matches
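Two of these algorithm options are simple threshold rules: -c (seed occurrence cap) and -r (re-seeding trigger). The sketch below restates them as predicates over a hypothetical `(length, occurrences)` seed tuple. It is only a restatement of the help text, not bwa's seeding code. The default minimum seed length of 19 is bwa mem's -k default, which is not listed in this excerpt.

```python
from typing import List, Tuple

Seed = Tuple[int, int]  # hypothetical representation: (seed length, occurrences in the reference)


def filter_repetitive_seeds(seeds: List[Seed], max_occ: int = 500) -> List[Seed]:
    """The -c rule: skip seeds that occur more than max_occ times."""
    return [seed for seed in seeds if seed[1] <= max_occ]


def needs_reseeding(seed_len: int, min_seed_len: int = 19, split_factor: float = 1.5) -> bool:
    """The -r rule: look for internal seeds inside seeds longer than {-k} * FLOAT."""
    return seed_len > min_seed_len * split_factor
```

With the defaults the re-seeding threshold is 19 * 1.5 = 28.5, so a 28-base seed is left alone and a 29-base seed is re-seeded.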

Scoring options:

   -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
   -B INT        penalty for a mismatch [4]
   -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
   -E INT[,INT]  gap extension penalty; a gap of size k costs '{-O} + {-E}*k' [1,1]
   -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
   -U INT        penalty for an unpaired read pair [17]

   -x STR        read type. Setting -x changes multiple parameters unless overridden [null]
                 pacbio:   -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
                 ont2d:    -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0  (Oxford Nanopore 2D-reads to ref)
                 intractg: -B9 -O16 -L5  (intra-species contigs to ref)
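The scoring options combine in two ways that are worth making concrete. First, the affine gap cost '{-O} + {-E}*k', where each of -O and -E accepts one INT or an INT,INT pair for deletions and insertions. Second, the "-x preset, then explicit overrides" precedence. The sketch below transcribes only the preset values listed in the help text. The function names are illustrative, and this is not bwa's option parser.

```python
from typing import Dict, Optional, Tuple

# Preset expansions transcribed from the -x help text (only the options each preset changes).
PRESETS: Dict[str, Dict[str, int]] = {
    "pacbio": {"k": 17, "W": 40, "r": 10, "A": 1, "B": 1, "O": 1, "E": 1, "L": 0},
    "ont2d": {"k": 14, "W": 20, "r": 10, "A": 1, "B": 1, "O": 1, "E": 1, "L": 0},
    "intractg": {"B": 9, "O": 16, "L": 5},
}


def resolve_options(preset: Optional[str], explicit: Dict[str, int]) -> Dict[str, int]:
    """Start from a -x preset (if any); explicitly given options override it."""
    resolved: Dict[str, int] = {}
    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown -x preset: {preset!r}")
        resolved.update(PRESETS[preset])
    resolved.update(explicit)
    return resolved


def parse_int_pair(value: str) -> Tuple[int, int]:
    """Parse an INT[,INT] value such as "6" or "6,8" into (deletion, insertion)."""
    parts = [int(p) for p in value.split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected INT or INT,INT, got {value!r}")


def gap_cost(length: int, open_penalty: int = 6, extend_penalty: int = 1) -> int:
    """Affine gap cost from the -E help text: {-O} + {-E} * k for a gap of size k."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return open_penalty + extend_penalty * length
```

With the defaults, a 3-base deletion costs 6 + 1 * 3 = 9. That is cheaper than three mismatches at -B 4, which cost 12. Under the pacbio preset (-O1 -E1) the same gap costs only 4.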

Input/output options:

   -p            smart pairing (ignoring in2.fq)
   -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
   -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
   -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)

   -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
   -T INT        minimum score to output [30]
   -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
   -a            output all alignments for SE or unpaired PE
   -C            append FASTA/FASTQ comment to SAM output
   -V            output the reference FASTA header in the XR tag
   -Y            use soft clipping for supplementary alignments
   -M            mark shorter split hits as secondary

   -I FLOAT[,FLOAT[,INT[,INT]]]
                 specify the mean, standard deviation (10% of the mean if absent), max
                 (4 sigma from the mean if absent) and min of the insert size distribution.
                 FR orientation only. [inferred]
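The -I value packs up to four numbers, and two of them have documented defaults. The sketch below parses the string and fills those defaults: standard deviation = mean / 10, and max = mean + 4 * sd. The help text gives no default for min, so the sketch leaves it as `None`. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InsertSizeSpec:
    mean: float
    std: float
    max_size: float
    min_size: Optional[float]  # no documented default; None means "not given"


def parse_insert_size(value: str) -> InsertSizeSpec:
    """Parse FLOAT[,FLOAT[,INT[,INT]]] and fill in the defaults stated in the help text."""
    parts = value.split(",")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 comma-separated values, got {value!r}")
    mean = float(parts[0])
    std = float(parts[1]) if len(parts) > 1 else mean / 10  # 10% of the mean if absent
    max_size = float(int(parts[2])) if len(parts) > 2 else mean + 4 * std  # 4 sigma from the mean
    min_size = float(int(parts[3])) if len(parts) > 3 else None
    return InsertSizeSpec(mean, std, max_size, min_size)
```

For `-I 300` this gives sd = 30 and max = 420. For `-I 300,50,600,100` all four values are taken as given.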

Note: Please read the man page for detailed description of the command line and options.
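The workflow's library, platform and threads inputs are the kinds of values usually passed to bwa mem through -R (read group) and -t (threads). The -t option is bwa mem's thread count, although it is not listed in this excerpt. How bwa-mem-samtools.cwl actually assembles its command is not shown on this page. The sketch below is therefore hypothetical: using the sample name for both ID and SM, and the function names, are assumptions. It writes literal `\t` escapes, matching the '@RG\tID:foo\tSM:bar' example in the help text.

```python
from typing import List


def read_group_line(sample: str, library: str, platform: str) -> str:
    """Build an @RG line for bwa mem -R with literal backslash-t separators (hypothetical field choice)."""
    for value in (sample, library, platform):
        if "\t" in value or "\\t" in value:
            raise ValueError("read-group values must not contain tabs")
    fields = ["@RG", f"ID:{sample}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    return "\\t".join(fields)


def bwa_mem_command(idxbase: str, fq1: str, fq2: str, threads: int, rg: str) -> List[str]:
    """Argument vector for a paired-end bwa mem run; SAM is written to stdout."""
    return ["bwa", "mem", "-t", str(threads), "-R", rg, idxbase, fq1, fq2]
```

The command is built as a list, for example for `subprocess.run`. No shell is involved, so the backslash escapes reach bwa unchanged.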

sort
../tools/picard-SortSam.cwl (CommandLineTool)
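The sort step's sort order is not shown on this page. A coordinate sort is assumed here because the later Picard and GATK steps usually expect one. The sketch below shows a simplified coordinate-order key: reference index in header (@SQ) order, then leftmost position, with unmapped records last. Ties keep their input order because Python's sort is stable. Picard's own comparator applies further tie-breakers that this sketch omits.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    name: str
    ref_index: Optional[int]  # index of the contig in the header's @SQ order; None if unmapped
    pos: int  # 1-based leftmost mapping position (0 for unmapped)


def coordinate_key(rec: Record) -> Tuple[bool, int, int]:
    # False sorts before True, so unmapped records end up after all mapped ones.
    unmapped = rec.ref_index is None
    return (unmapped, rec.ref_index if rec.ref_index is not None else 0, rec.pos)


def coordinate_sort(records: List[Record]) -> List[Record]:
    return sorted(records, key=coordinate_key)
```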
trim
../tools/trim_galore.cwl (CommandLineTool)
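Trim Galore delegates trimming to Cutadapt, and its default 3' quality cutoff is Phred 20. As far as we understand, Cutadapt uses a BWA-style rule for quality trimming. Subtract each quality from the cutoff, accumulate from the 3' end, and cut where the running sum peaks. Stop scanning once the sum drops below zero. The sketch below implements that rule on a list of integer qualities. Adapter trimming, which Trim Galore also performs, is not shown.

```python
from typing import List


def quality_trim_index(quals: List[int], cutoff: int = 20) -> int:
    """Return how many 5' bases to keep after BWA-style 3' quality trimming."""
    running = 0
    best = 0
    stop = len(quals)
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]
        if running < 0:
            break  # the remaining 5' part is good enough; stop scanning
        if running > best:
            best = running
            stop = i
    return stop
```

For qualities `[30, 30, 30, 10, 5]` the running sum peaks at index 3. The read is therefore cut to its first three bases, while a read of all-30 qualities is left untouched.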
combine_reads
../tools/concat-gz-files.cwl (CommandLineTool)
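A read pair may contain several FASTQ files per read end, for example one per sequencing lane. Presumably the combine_reads step merges these into a single file. Gzip files can be concatenated byte-for-byte, and the result is still a valid multi-member gzip stream. The sketch below relies on this and assumes the tool behaves like `cat a.gz b.gz > out.gz`. The function name is illustrative.

```python
import shutil
from typing import Iterable


def concat_gz_files(inputs: Iterable[str], output: str) -> None:
    """Concatenate gzip files byte-for-byte; readers see one stream of the combined contents."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # No decompression needed: each input becomes one member of the output.
                shutil.copyfileobj(src, out)
```

Python's `gzip.open` reads all members of the output in sequence, and so do standard tools such as `zcat`.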
mark_duplicates
../tools/picard-MarkDuplicates.cwl (CommandLineTool)
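Picard MarkDuplicates flags reads that appear to come from the same original fragment. By default it chooses which copy to keep using a sum-of-base-qualities score. The sketch below is a deliberately simplified, single-end view of the same idea, not Picard's algorithm, which also considers mate positions, library and optical duplicates. Reads are grouped by contig, unclipped 5' coordinate and strand. The highest-scoring read in each group is kept and the rest are flagged. The `Read` fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    name: str
    ref: str
    five_prime: int  # unclipped 5' coordinate of the alignment
    reverse: bool  # strand
    qualities: Tuple[int, ...]


def find_duplicates(reads: List[Read]) -> Set[str]:
    """Return names of reads to flag as duplicates (simplified single-end model)."""
    groups: Dict[Tuple[str, int, bool], List[Read]] = defaultdict(list)
    for read in reads:
        groups[(read.ref, read.five_prime, read.reverse)].append(read)
    duplicates: Set[str] = set()
    for group in groups.values():
        # Keep the read with the highest sum of base qualities; flag every other read.
        best = max(group, key=lambda r: sum(r.qualities))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```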
variant_calling
../tools/GATK-HaplotypeCaller.cwl (CommandLineTool)

GATK-HaplotypeCaller.cwl was developed for the CWL consortium. It calls germline SNPs and indels via local re-assembly of haplotypes.

file_pair_details
../tools/extract-named-file-pair-details.cwl (ExpressionTool)
Given a FASTQReadPairType, returns a 2D array of the files it contains.
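The definition of FASTQReadPairType is not reproduced on this page. The sketch below therefore assumes a record with a name and one list of files per read end, and the field names `read1_files` and `read2_files` are hypothetical. Under that assumption, the "2D array" is one inner list per read end.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FASTQReadPair:
    # Hypothetical fields; the real FASTQReadPairType.yml may name or shape these differently.
    name: str
    read1_files: List[str] = field(default_factory=list)
    read2_files: List[str] = field(default_factory=list)


def extract_file_pair_details(pair: FASTQReadPair) -> List[List[str]]:
    """Return the files as a 2D array: [read-1 files, read-2 files]."""
    return [list(pair.read1_files), list(pair.read2_files)]
```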
collect_hs_metrics
../tools/picard-CollectHsMetrics.cwl (CommandLineTool)
recalibrate_02_apply
../tools/GATK-PrintReads.cwl (CommandLineTool)

GATK-PrintReads.cwl was developed for the CWL consortium. It prints all reads that have a mapping quality above zero.

Usage: java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fasta -I input1.bam -I input2.bam -o output.bam --read_filter MappingQualityZero
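In the usage line, the --read_filter MappingQualityZero filter has this effect: reads with MAPQ 0, which are typically reads that map equally well to several places, are dropped. The sketch below reproduces only that filtering effect on a hypothetical `Alignment` tuple. It is not GATK code.

```python
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    name: str
    mapq: int  # mapping quality from the BAM record


def drop_mapq_zero(alignments: Iterable[Alignment]) -> Iterator[Alignment]:
    """Keep only reads whose mapping quality is above zero."""
    return (a for a in alignments if a.mapq > 0)
```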

recalibrate_01_analyze
../tools/GATK-BaseRecalibrator.cwl (CommandLineTool)

GATK-BaseRecalibrator.cwl was developed for the CWL consortium. It generates a base recalibration table to compensate for systematic errors in base-calling confidence scores.

Usage: java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R reference.fasta -I my_reads.bam -knownSites latest_dbsnp.vcf -o recal_data.table
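The core idea of the recalibration table is to compare each reported base quality with the error rate actually observed. Mismatches at known variant sites (the knownSites input) are not counted as errors. The sketch below tabulates an empirical Phred quality per reported quality only, using -10 * log10((mismatches + 1) / (observations + 2)). The +1/+2 smoothing is an assumption made for this sketch. GATK's BaseRecalibrator also models covariates such as read group, machine cycle and sequence context, which are omitted here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class BaseObservation(NamedTuple):
    contig: str
    position: int
    reported_quality: int
    mismatch: bool  # the base disagrees with the reference


def empirical_qualities(observations: Iterable[BaseObservation],
                        known_sites: Set[Tuple[str, int]]) -> Dict[int, float]:
    """Map each reported quality to an empirical Phred quality (smoothed, single covariate)."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # Q -> [observations, mismatches]
    for obs in observations:
        if (obs.contig, obs.position) in known_sites:
            continue  # known variation is not evidence of a sequencing error
        bucket = counts[obs.reported_quality]
        bucket[0] += 1
        bucket[1] += int(obs.mismatch)
    return {q: -10 * math.log10((mm + 1) / (n + 2)) for q, (n, mm) in counts.items()}
```

For example, 9,998 bases reported at Q30 with 9 mismatches give (9 + 1) / (9,998 + 2) = 0.001, so their empirical quality is about 30.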

generate_sample_filenames
../tools/generate-sample-filenames.cwl (ExpressionTool)
Generates a set of file names for preprocessing steps based on an input sample name

Outputs

ID                   Type    Doc
hs_metrics           File
raw_variants         File    VCF file from per-sample variant calling
trim_reports         File[]
fastqc_reports       File[]
haplotypes_bam       File    BAM file containing assembled haplotypes and locally realigned reads
markduplicates_bam   File
recalibrated_reads   File
recalibration_table  File

Permalink: https://w3id.org/cwl/view/git/bbe24d8d7fde2e918583b96805909a2867b749d6/subworkflows/exomeseq-01-preprocessing.cwl