Workflow: wgs alignment and somatic variant detection

Fetched 2024-11-26 21:13:37 GMT

Inputs

ID Type Title Doc
trimming https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/trimming_options.yml#trimming_options (Optional)
vep_pick
reference File
tumor_name String (Optional) tumor_name: String specifying the name of the MT sample

tumor_name provides a string for what the MT sample will be referred to in the various outputs, for example the VCF files.

normal_name String (Optional) normal_name: String specifying the name of the WT sample

normal_name provides a string for what the WT sample will be referred to in the various outputs, for example the VCF files.

manta_non_wgs Boolean (Optional)
scatter_count Integer

Scatters each supported variant detector (VarScan, MuTect) into this many parallel jobs.

synonyms_file File (Optional)
vep_cache_dir Directory
cle_vcf_filter Boolean
tumor_sequence https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/sequence_data.yml#sequence_data[] tumor_sequence: MT sequencing data and readgroup information

tumor_sequence represents the sequencing data for the MT sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM tags are required in the @RG field.

normal_sequence https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/sequence_data.yml#sequence_data[] normal_sequence: WT sequencing data and readgroup information

normal_sequence represents the sequencing data for the WT sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM tags are required in the @RG field.

varscan_p_value Float (Optional)
target_intervals File
summary_intervals https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/labelled_file.yml#labelled_file[]
tumor_sample_name String
normal_sample_name String
per_base_intervals https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/labelled_file.yml#labelled_file[]
vep_ensembl_species String

Ensembl species. Must be present in the cache directory. Examples: homo_sapiens or mus_musculus

vep_ensembl_version String

Ensembl version. Must be present in the cache directory. Example: 95

vep_to_table_fields String[]
annotate_coding_only Boolean (Optional)
manta_output_contigs Boolean (Optional)
per_target_intervals https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/types/labelled_file.yml#labelled_file[]
strelka_cpu_reserved Integer (Optional)
varscan_min_coverage Integer (Optional)
varscan_min_var_freq Float (Optional)
vep_ensembl_assembly String

Genome assembly to use in VEP. Examples: GRCh38 or GRCm38

varscan_strand_filter Integer (Optional)
qc_minimum_base_quality Integer (Optional)
varscan_max_normal_freq Float (Optional)
variants_to_table_fields String[]
cnvkit_target_average_size Integer (Optional)

Approximate size of split target bins for CNVkit. If not set, CNVkit chooses a suitable window size automatically.

qc_minimum_mapping_quality Integer (Optional)
filter_somatic_llr_threshold Float

Sets the stringency (log-likelihood ratio, LLR) used to filter out non-somatic variants. Typical values are 10 = high stringency, 5 = normal, 3 = low stringency. Low stringency may be desirable when read depths are low (as in WGS).

filter_somatic_llr_tumor_purity Float

Sets the tumor purity used in the somatic LLR filter, which removes non-somatic variants. Probably only needs to be adjusted for low-purity samples (< 50%). Range is 0 to 1.

picard_metric_accumulation_level String
variants_to_table_genotype_fields String[]
filter_somatic_llr_normal_contamination_rate Float

Sets the fraction of tumor present in the normal sample, used in the somatic LLR filter. Useful for heavily contaminated adjacent normals. Range is 0 to 1.
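
Several of the inputs above carry constraints that can be checked before a job is submitted: the @RG readgroup must contain ID and SM, the purity and contamination values must lie between 0 and 1, and the VEP species, version and assembly must exist in the cache directory. The following is a minimal sketch of such checks, not part of the workflow itself. The function names are hypothetical, the readgroup is assumed to be supplied as a single tab-delimited @RG string, and the cache path assumes the usual VEP layout of `<cache_dir>/<species>/<version>_<assembly>`.

```python
import re
from pathlib import PurePosixPath

# Tags the input documentation says must be present in the @RG field.
REQUIRED_RG_TAGS = ("ID", "SM")


def missing_readgroup_tags(readgroup):
    """Return the required @RG tags that are absent from a readgroup string.

    Assumption: fields are separated by real tabs or by the literal
    two-character sequence backslash-t, as often written in job files.
    """
    fields = re.split(r"\t|\\t", readgroup.strip())
    present = {field.split(":", 1)[0] for field in fields if ":" in field}
    return [tag for tag in REQUIRED_RG_TAGS if tag not in present]


def check_fraction(name, value):
    """Raise ValueError unless value lies in the documented 0 to 1 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def expected_vep_cache_path(cache_dir, species, version, assembly):
    """Build the sub-directory VEP is assumed to look for inside the cache."""
    return PurePosixPath(cache_dir) / species / f"{version}_{assembly}"


if __name__ == "__main__":
    print(missing_readgroup_tags("@RG\tID:1\tLB:lib1"))
    check_fraction("filter_somatic_llr_tumor_purity", 0.4)
    print(expected_vep_cache_path("/cache", "mus_musculus", "95", "GRCm38"))
```

Running such checks locally is cheaper than discovering a missing SM tag or an absent cache sub-directory partway through alignment.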

Steps

ID Runs Label Doc
manta
../tools/manta_somatic.cwl (CommandLineTool)
Set up and execute manta
cnvkit
../tools/cnvkit_batch.cwl (CommandLineTool)

Note: cnvkit batch is a complex command that is capable of running all or part of the cnvkit internal pipeline, depending on the combination of inputs provided to it. In order to take advantage of this, most inputs to this cwl are optional, so that different workflows can use different forms of the command while still using a single cwl file. For further reading, see the relevant cnvkit docs at
https://cnvkit.readthedocs.io/en/stable/quickstart.html#build-a-reference-from-normal-samples-and-infer-tumor-copy-ratios
https://cnvkit.readthedocs.io/en/stable/pipeline.html#batch

In our pipelines, the command form is mainly determined by the components of the reference input. The somatic_exome cwl pipeline provides a fasta file and a normal bam, which causes the batch pipeline to construct a copy number reference (.cnn file) based on the normal bam.

The germline_wgs cwl pipeline does not provide a normal bam; instead it passes a cnn reference file as an optional input. This file is intended to be manually generated from a reference normal sample for use in the pipeline. If it is not provided, cnvkit will automatically generate a flat reference file.

detect_variants Detect Variants workflow for nonhuman WGS pipeline
tumor_index_cram
../tools/index_cram.cwl (CommandLineTool)
samtools index cram
normal_index_cram
../tools/index_cram.cwl (CommandLineTool)
samtools index cram
tumor_bam_to_cram
../tools/bam_to_cram.cwl (CommandLineTool)
BAM to CRAM conversion
normal_bam_to_cram
../tools/bam_to_cram.cwl (CommandLineTool)
BAM to CRAM conversion
tumor_alignment_and_qc alignment for nonhuman with qc
normal_alignment_and_qc alignment for nonhuman with qc
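
The cnvkit step note above describes three forms of the `cnvkit.py batch` command, selected by what is supplied as the reference. The sketch below shows how those forms could be assembled as argument lists. It is an illustration only: the function name and default output directory are hypothetical, and the exact arguments that cnvkit_batch.cwl passes may differ. The flags used (`--reference`, `--fasta`, `--targets`, `--normal`, `--output-dir`) are standard `cnvkit.py batch` options.

```python
def cnvkit_batch_command(tumor_bam, targets=None, fasta=None,
                         normal_bam=None, reference_cnn=None,
                         output_dir="cnvkit_out"):
    """Assemble a `cnvkit.py batch` argument list for one tumor BAM.

    Hypothetical helper; it only builds the command, it does not run it.
    """
    cmd = ["cnvkit.py", "batch", tumor_bam, "--output-dir", output_dir]

    if reference_cnn is not None:
        # Form 1: reuse a previously built copy number reference (.cnn).
        cmd += ["--reference", reference_cnn]
        return cmd

    # Forms 2 and 3 build a new reference, which needs the genome FASTA
    # and the target intervals.
    if fasta is None or targets is None:
        raise ValueError("fasta and targets are required to build a reference")
    cmd += ["--fasta", fasta, "--targets", targets]

    if normal_bam is not None:
        # Form 2: build the reference from the normal BAM.
        cmd += ["--normal", normal_bam]
    else:
        # Form 3: `--normal` with no samples asks cnvkit for a flat reference.
        cmd += ["--normal"]
    return cmd


if __name__ == "__main__":
    print(" ".join(cnvkit_batch_command(
        "tumor.bam", targets="targets.bed", fasta="ref.fa",
        normal_bam="normal.bam")))
```

Keeping the three forms in one function mirrors the design choice described in the note: a single tool definition with mostly optional inputs, rather than one definition per command form.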

Outputs

ID Type Label Doc
final_tsv File
final_vcf File
tumor_cram File
normal_cram File
vep_summary File
all_candidates File
tumor_flagstats File
diploid_variants File (Optional)
normal_flagstats File
small_candidates File
somatic_variants File (Optional)
cnvkit_cn_diagram File
final_filtered_vcf File
mutect_filtered_vcf File
tumor_only_variants File (Optional)
strelka_filtered_vcf File
varscan_filtered_vcf File
mutect_unfiltered_vcf File
cnvkit_cn_scatter_plot File
strelka_unfiltered_vcf File
varscan_unfiltered_vcf File
cnvkit_intervals_target File
tumor_summary_hs_metrics File[]
cnvkit_reference_coverage File
normal_summary_hs_metrics File[]
tumor_insert_size_metrics File
tumor_per_base_hs_metrics File[]
normal_insert_size_metrics File
normal_per_base_hs_metrics File[]
cnvkit_intervals_antitarget File
tumor_per_target_hs_metrics File[]
tumor_snv_bam_readcount_tsv File
cnvkit_tumor_target_coverage File
normal_per_target_hs_metrics File[]
normal_snv_bam_readcount_tsv File
cnvkit_normal_target_coverage File
cnvkit_tumor_bin_level_ratios File
cnvkit_tumor_segmented_ratios File
tumor_indel_bam_readcount_tsv File
tumor_mark_duplicates_metrics File
normal_indel_bam_readcount_tsv File
normal_mark_duplicates_metrics File
tumor_alignment_summary_metrics File
tumor_per_base_coverage_metrics File[]
cnvkit_tumor_antitarget_coverage File
normal_alignment_summary_metrics File
normal_per_base_coverage_metrics File[]
cnvkit_normal_antitarget_coverage File
tumor_per_target_coverage_metrics File[]
normal_per_target_coverage_metrics File[]
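
The tumor_cram and normal_cram outputs come from the BAM to CRAM conversion and samtools index steps listed under Steps. As a rough sketch of what those two steps amount to, the helper below builds the standard samtools commands for converting a BAM to CRAM against a reference and then indexing the result. The function name and file names are hypothetical, and the flags actually used by bam_to_cram.cwl and index_cram.cwl are an assumption here.

```python
def bam_to_cram_commands(bam, reference, cram=None):
    """Return the two samtools argument lists: convert to CRAM, then index.

    Hypothetical helper; it only builds the commands, it does not run them.
    """
    if cram is None:
        # Derive the CRAM name from the BAM name when none is given.
        cram = bam[:-4] + ".cram" if bam.endswith(".bam") else bam + ".cram"

    convert = ["samtools", "view", "-C",  # -C writes CRAM output
               "-T", reference,           # reference FASTA used for encoding
               "-o", cram, bam]
    index = ["samtools", "index", cram]
    return [convert, index]


if __name__ == "__main__":
    for command in bam_to_cram_commands("tumor.bam", "reference.fa"):
        print(" ".join(command))
```

The same reference FASTA is needed later to decode the CRAM, so it should be kept alongside the workflow outputs.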
Permalink: https://w3id.org/cwl/view/git/bfcb5ffbea3d00a38cc03595d41e53ea976d599d/definitions/pipelines/somatic_wgs_nonhuman.cwl