Workflow: gathered exome alignment and somatic variant detection

Fetched 2023-01-09 16:41:43 GMT

Inputs

ID Type Title Doc
docm_vcf File

Common mutations in cancer that will be genotyped and passed through into the merged VCF if they have even low-level evidence of a mutation (by default, marked with filter DOCM_ONLY)

omni_vcf File
vep_pick
reference File
output_dir String
somalier_vcf File
scatter_count Integer

Scatters each supported variant detector (varscan, pindel, mutect) into this many parallel jobs.
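As a conceptual illustration of what scattering into `scatter_count` jobs means, the sketch below splits a list of intervals into that many contiguous shards. It is only an assumption about the general idea: the workflow performs its own splitting with its own tools, and the function name `scatter_intervals` and the balancing by interval count are hypothetical.

```python
from typing import List, Tuple

# (contig, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def scatter_intervals(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Split intervals into at most scatter_count contiguous, similarly sized shards."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be >= 1")
    base, extra = divmod(len(intervals), scatter_count)
    shards: List[List[Interval]] = []
    start = 0
    for i in range(scatter_count):
        # The first `extra` shards each take one additional interval.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer intervals than jobs: skip empty shards
        shards.append(intervals[start:start + size])
        start += size
    return shards
```

Each shard would then be handed to one parallel caller job, and the per-shard results merged afterwards.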

synonyms_file File (Optional)
vep_cache_dir Directory
bait_intervals File
bqsr_intervals String[]
cle_vcf_filter Boolean
tumor_sequence https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/sequence_data.yml#sequence_data[] tumor_sequence: MT sequencing data and readgroup information

tumor_sequence represents the sequencing data for the MT (mutant, i.e. tumor) sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM tags are required in the @RG field.

hgvs_annotation Boolean (Optional)
normal_sequence https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/sequence_data.yml#sequence_data[] normal_sequence: WT sequencing data and readgroup information

normal_sequence represents the sequencing data for the WT (wildtype, i.e. normal) sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM tags are required in the @RG field.
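The read group requirement for tumor_sequence and normal_sequence can be checked before a run is submitted. The sketch below is a minimal check, assuming the read group is supplied as a tab-separated SAM-style `@RG` line; the function name `parse_read_group` is hypothetical and not part of the workflow.

```python
from typing import Dict


def parse_read_group(rg_line: str) -> Dict[str, str]:
    """Parse a tab-separated @RG line and require the ID and SM tags."""
    fields = rg_line.strip().split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with @RG")
    tags: Dict[str, str] = {}
    for field in fields[1:]:
        # Each field looks like TAG:value; split on the first colon only.
        key, sep, value = field.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed read group field: {field!r}")
        tags[key] = value
    missing = [tag for tag in ("ID", "SM") if tag not in tags]
    if missing:
        raise ValueError("missing required @RG tags: " + ", ".join(missing))
    return tags
```

If the input file stores the read group with literal `\t` escape sequences rather than real tab characters, those would need to be unescaped before calling the function.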

tumor_cram_name String (Optional)
varscan_p_value Float (Optional)
bqsr_known_sites File[]

One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.

normal_cram_name String (Optional)
target_intervals File target_intervals: interval_list file of targets used in the sequencing experiment

target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for an exome capture reagent, bait_intervals and target_intervals are the same.
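The main thing the BED to interval_list conversion changes is the coordinate system: BED is 0-based and half-open, while interval_list is 1-based and closed. The sketch below illustrates that shift for a single record. It is not a replacement for Picard BedToIntervalList: it omits the sequence dictionary header that a real interval_list needs, and the function name and the defaults for a missing name (`.`) or strand (`+`) are assumptions.

```python
def bed_to_interval_row(bed_line: str) -> str:
    """Convert one BED record to one interval_list body row (no header)."""
    cols = bed_line.rstrip("\n").split("\t")
    if len(cols) < 3:
        raise ValueError("BED record needs at least chrom, start, end")
    chrom = cols[0]
    start0 = int(cols[1])  # 0-based, inclusive
    end = int(cols[2])     # 0-based, exclusive == 1-based, inclusive
    name = cols[3] if len(cols) > 3 and cols[3] else "."
    strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "+"
    # interval_list columns: contig, start, end, strand, name
    return "\t".join([chrom, str(start0 + 1), str(end), strand, name])
```

Only the start coordinate moves, because the exclusive 0-based end and the inclusive 1-based end are the same number.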

summary_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
tumor_sample_name String
normal_sample_name String
per_base_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
pindel_insert_size Integer
validated_variants File (Optional)

An optional VCF with variants that will be flagged as 'VALIDATED' if found in this pipeline's main output VCF

vep_ensembl_species String

Ensembl species. Must be present in the cache directory. Examples: homo_sapiens or mus_musculus

vep_ensembl_version String

Ensembl version. Must be present in the cache directory. Example: 95

vep_to_table_fields String[]
annotate_coding_only Boolean (Optional)
filter_docm_variants Boolean (Optional)

Determines whether variants found only via genotyping of DOCM sites will be filtered (as DOCM_ONLY) or passed through as variant calls

per_target_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
strelka_cpu_reserved Integer (Optional)
varscan_min_coverage Integer (Optional)
varscan_min_var_freq Float (Optional)
vep_ensembl_assembly String

Genome assembly to use in VEP. Examples: GRCh38 or GRCm38

varscan_strand_filter Integer (Optional)
vep_custom_annotations https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/vep_custom_annotation.yml#vep_custom_annotation[]

Custom type; see the types directory (definitions/types/vep_custom_annotation.yml) for the input format.

qc_minimum_base_quality Integer (Optional)
target_interval_padding Integer target_interval_padding

The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions.
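The effect of target_interval_padding on a single target can be sketched as follows. This is an illustration only, assuming 1-based closed coordinates as in an interval_list; the function name `pad_interval` and the optional `contig_length` clamp are hypothetical, and the workflow applies padding with its own tools.

```python
from typing import Optional, Tuple


def pad_interval(start: int, end: int, padding: int,
                 contig_length: Optional[int] = None) -> Tuple[int, int]:
    """Extend a 1-based closed interval by `padding` bases on each side."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    padded_start = max(1, start - padding)  # do not run off the contig start
    padded_end = end + padding
    if contig_length is not None:
        padded_end = min(contig_length, padded_end)  # nor off the contig end
    return padded_start, padded_end
```

For example, a target at 1000-2000 with a padding of 100 would allow calls anywhere in 900-2100.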

varscan_max_normal_freq Float (Optional)
variants_to_table_fields String[]
qc_minimum_mapping_quality Integer (Optional)
filter_somatic_llr_threshold Float

Sets the stringency (log-likelihood ratio) used to filter out non-somatic variants. Typical values are 10=high stringency, 5=normal, 3=low stringency. Low stringency may be desirable when read depths are low (as in WGS) or when tumor samples are impure.

mutect_artifact_detection_mode Boolean
filter_somatic_llr_tumor_purity Float

Sets the tumor purity used by the somatic LLR filter, which removes non-somatic variants. Probably only needs to be adjusted for low-purity tumors (< 50%). Range is 0 to 1.

picard_metric_accumulation_level String
variants_to_table_genotype_fields String[]
mutect_max_alt_alleles_in_normal_count Integer (Optional)
mutect_max_alt_allele_in_normal_fraction Float (Optional)
filter_somatic_llr_normal_contamination_rate Float

Sets the fraction of tumor present in the normal sample, used by the somatic LLR filter. Useful for heavily contaminated adjacent normals. Range is 0 to 1.
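Both fractional somatic LLR filter inputs are documented as ranging from 0 to 1, which makes them easy to sanity-check before launching a run. The sketch below assumes the inputs have already been loaded into a plain dictionary keyed by input ID; the function name `check_fraction_inputs` is hypothetical.

```python
from typing import Any, Dict, List

# Input IDs documented with a 0 to 1 range.
FRACTION_INPUTS = (
    "filter_somatic_llr_tumor_purity",
    "filter_somatic_llr_normal_contamination_rate",
)


def check_fraction_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the fractional LLR filter inputs."""
    problems: List[str] = []
    for key in FRACTION_INPUTS:
        if key not in inputs:
            problems.append(f"{key} is missing")
            continue
        value = inputs[key]
        # Reject booleans explicitly: bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number, got {value!r}")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be between 0 and 1, got {value}")
    return problems
```

An empty list means both values are present and within range.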

Steps

ID Runs Label Doc
gatherer
../tools/gatherer.cwl (CommandLineTool)
somatic_exome
somatic_exome.cwl (Workflow)
somatic_exome: exome alignment and somatic variant detection

somatic_exome is designed to process mutant/wildtype H. sapiens exome sequencing data. It features BQSR-corrected alignments, four-caller variant detection, and VEP-style annotations. Structural variants are detected via manta and cnvkit. In addition, QC metrics are run, including somalier concordance metrics.

example input file = analysis_workflows/example_data/somatic_exome.yaml
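CWL runners accept the input object as YAML or JSON, so a job file can also be generated programmatically. The sketch below builds a deliberately incomplete input object using a few of the input IDs listed above and serializes it as JSON. All paths and sample names are hypothetical placeholders, the values for scatter_count and target_interval_padding are arbitrary, and many required inputs (for example tumor_sequence and normal_sequence) are omitted, so this fragment would not run as-is.

```python
import json

# Hypothetical placeholder values; replace with real paths and names.
job = {
    "reference": {"class": "File", "path": "/path/to/reference.fa"},
    "vep_cache_dir": {"class": "Directory", "path": "/path/to/vep_cache"},
    "tumor_sample_name": "tumor_sample",
    "normal_sample_name": "normal_sample",
    "scatter_count": 4,                   # arbitrary illustrative value
    "target_interval_padding": 100,       # arbitrary illustrative value
    "filter_somatic_llr_threshold": 5.0,  # "normal" stringency per the input doc
    "vep_ensembl_assembly": "GRCh38",
    "vep_ensembl_species": "homo_sapiens",
    "vep_ensembl_version": "95",
}

# Serialize the partial input object; a complete one could be passed to a CWL runner.
job_json = json.dumps(job, indent=2, sort_keys=True)
print(job_json)
```

File and Directory inputs are written as objects with a `class` and a `path`, while String, Integer, and Float inputs are plain scalars.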

Outputs

ID Type Label Doc
final_outputs String[]
Permalink: https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/pipelines/somatic_exome_gathered.cwl