Workflow: exome alignment and germline variant detection

Fetched 2023-01-09 13:46:05 GMT

Inputs

ID Type Title Doc
ploidy Integer (Optional)
omni_vcf File
sequence https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/sequence_data.yml#sequence_data[] sequence: sequencing data and readgroup information

sequence represents the sequencing data as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM tags are required in the @RG field.
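The ID and SM requirement can be checked before submitting a job. The following is a minimal sketch, assuming the readgroup is supplied as a single tab-separated `@RG` string (with either real tabs or the two-character sequence `\t`); the function names are hypothetical and are not part of the workflow.

```python
def parse_read_group(rg_line: str) -> dict:
    """Split an @RG header line into a {tag: value} dict."""
    # Assumption: readgroup strings may carry a literal backslash-t
    # instead of a real tab, so normalise that first.
    fields = rg_line.strip().replace("\\t", "\t").split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with '@RG'")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only; values such as dates may contain colons.
        tag, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"malformed read group field: {field!r}")
        tags[tag] = value
    return tags


def missing_required_tags(rg_line: str, required=("ID", "SM")) -> list:
    """Return the required tags that are absent or empty in the @RG line."""
    tags = parse_read_group(rg_line)
    return [tag for tag in required if not tags.get(tag)]
```

An empty list from `missing_required_tags` means the readgroup carries both required tags.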

trimming https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/trimming_options.yml#trimming_options (Optional)
intervals 46343471451cac800357bfb708bdec26[]
reference File
vep_plugins String[] (Optional)

Array of plugins to use when running VEP.

gvcf_gq_bands String[]
synonyms_file File (Optional)
vep_cache_dir Directory
bait_intervals File bait_intervals: interval_list file of baits used in the sequencing experiment

bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially the coordinates of the regions for which probes could be designed in the reagent. Typically the reagent provider has this information available in BED format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data
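The main point to watch in a BED to interval_list conversion is the coordinate system: BED starts are 0-based with an exclusive end, while interval_list records are 1-based and inclusive. The sketch below only illustrates that shift for a single record; it is not a substitute for Picard BedToIntervalList, which also writes the SAM-style sequence dictionary header that an interval_list requires. The function name and the `.` placeholder for a missing name are assumptions made for illustration.

```python
def bed_to_interval_line(bed_line: str) -> str:
    """Convert one tab-separated BED record to an interval_list body line."""
    cols = bed_line.rstrip("\n").split("\t")
    chrom, start, end = cols[0], int(cols[1]), int(cols[2])
    # Placeholder name when the BED record has no name column (assumption).
    name = cols[3] if len(cols) > 3 else "."
    # Use the BED strand column when present and valid, otherwise '+'.
    strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "+"
    # 0-based half-open [start, end) becomes 1-based closed [start + 1, end].
    return "\t".join([chrom, str(start + 1), str(end), strand, name])
```

For example, the BED record `chr1 99 200 target_1` describes the same bases as the interval_list record `chr1 100 200 + target_1`.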

bqsr_intervals String[] (Optional)
bqsr_known_sites File[]

One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.

target_intervals File target_intervals: interval_list file of targets used in the sequencing experiment

target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for an exome reagent, bait_intervals and target_intervals are the same.

summary_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
per_base_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
vep_ensembl_species String

Ensembl species. Must be present in the cache directory. Examples: homo_sapiens or mus_musculus

vep_ensembl_version String

Ensembl version. Must be present in the cache directory. Example: 95

vep_to_table_fields String[] (Optional)
annotate_coding_only Boolean (Optional)
per_target_intervals https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/labelled_file.yml#labelled_file[]
vep_ensembl_assembly String

Genome assembly to use in VEP. Examples: GRCh38 or GRCm38
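Because the species, version, and assembly must all match what is in the cache directory, it can help to verify the cache before launching the workflow. The sketch below assumes the conventional VEP cache layout of `<cache_dir>/<species>/<version>_<assembly>` (for example `homo_sapiens/95_GRCh38`); that layout and the function names are assumptions and are not taken from the workflow definition.

```python
from pathlib import Path


def expected_cache_subdir(cache_dir: str, species: str, version: str, assembly: str) -> Path:
    """Build the cache subdirectory path under the assumed VEP layout."""
    return Path(cache_dir) / species / f"{version}_{assembly}"


def cache_is_present(cache_dir: str, species: str, version: str, assembly: str) -> bool:
    """Return True if the expected cache subdirectory exists on disk."""
    return expected_cache_subdir(cache_dir, species, version, assembly).is_dir()
```

Calling `cache_is_present(vep_cache_dir, vep_ensembl_species, vep_ensembl_version, vep_ensembl_assembly)` with the job's input values gives a quick check that the three settings agree with the cache.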

vep_custom_annotations https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/types/vep_custom_annotation.yml#vep_custom_annotation[]

Custom type; see the types directory for the input format.

qc_minimum_base_quality Integer (Optional)
target_interval_padding Integer target_interval_padding: number of bp flanking each target region in which to allow variant calls

The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions.
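The effect of the padding on one target region can be sketched as follows. This is only an illustration of the arithmetic on 1-based inclusive coordinates, not the implementation used by the pad_target_intervals step; the function name and the optional `contig_length` clamp are assumptions, and intervals that overlap after padding are not merged here.

```python
def pad_interval(start: int, end: int, padding: int, contig_length=None) -> tuple:
    """Extend a 1-based inclusive interval by `padding` bp on each side."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Never extend before the first base of the contig.
    new_start = max(1, start - padding)
    new_end = end + padding
    # Optionally clamp to the contig end when its length is known.
    if contig_length is not None:
        new_end = min(contig_length, new_end)
    return new_start, new_end
```

With a padding of 100, a target at 1000-1200 allows variant calls anywhere in 900-1300.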

variants_to_table_fields String[] (Optional)
qc_minimum_mapping_quality Integer (Optional)
picard_metric_accumulation_level String
variants_to_table_genotype_fields String[] (Optional)

Steps

ID Runs Label Doc
index_cram ../tools/index_cram.cwl (CommandLineTool) samtools index cram
bam_to_cram ../tools/bam_to_cram.cwl (CommandLineTool) BAM to CRAM conversion
detect_variants exome alignment and germline variant detection
extract_freemix germline_exome.cwl#extract_freemix/2a8e5b6b-e2fc-4219-8c58-06b98344971a (ExpressionTool)
alignment_and_qc alignment_exome.cwl (Workflow) exome alignment with qc
pad_target_intervals ../tools/interval_list_expand.cwl (CommandLineTool) expand interval list regions by a given number of basepairs

Outputs

ID Type Label Doc
cram File
raw_vcf File
final_tsv File
final_vcf File
flagstats File
hs_metrics File
vep_summary File
filtered_tsv File
filtered_vcf File
summary_hs_metrics File[]
insert_size_metrics File
per_base_hs_metrics File[]
verify_bam_id_depth File
insert_size_histogram File
per_target_hs_metrics File[]
verify_bam_id_metrics File
mark_duplicates_metrics File
alignment_summary_metrics File
per_base_coverage_metrics File[]
per_target_coverage_metrics File[]
Permalink: https://w3id.org/cwl/view/git/25eab0390f6866ce491b44c89d9e0435d228ab6f/definitions/pipelines/germline_exome.cwl