Workflow: Immunotherapy Workflow

Fetched 2023-01-10 09:32:19 GMT

Inputs

ID Type Title Doc
ploidy Integer (Optional)
strand
refFlat File
docm_vcf File

Common mutations in cancer that will be genotyped and passed through into the merged VCF if they have even low-level evidence of a mutation (by default, marked with filter DOCM_ONLY)

expn_val Float (Optional)
omni_vcf File
rna_bams File[]
tdna_cov Integer (Optional)
tdna_vaf Float (Optional)
trna_cov Integer (Optional)
trna_vaf Float (Optional)
vep_pick
reference File reference: Reference fasta file for a desired assembly

reference contains the nucleotide sequence for a given assembly (hg37, hg38, etc.) in FASTA format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on Ensembl at https://ensembl.org/info/data/ftp/index.html. When providing the reference, the secondary files corresponding to the reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and picard CreateSequenceDictionary.
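
The requirement that index files sit next to the reference can be checked before launching a run. The following is a minimal sketch, not part of the workflow itself. The suffix list is an assumption based on the usual outputs of the three indexing tools named above (.fai from samtools, .amb/.ann/.bwt/.pac/.sa from bwa index, .dict from Picard); the exact set a given workflow version expects may differ, and the helper name is hypothetical.

```python
from pathlib import Path

# Assumed suffixes appended to the full FASTA file name.
APPENDED_SUFFIXES = (".fai", ".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_indices(reference, appended=APPENDED_SUFFIXES):
    """Return the names of expected index files that are absent.

    All files are looked up in the same directory as the reference.
    """
    ref = Path(reference)
    expected = [ref.with_name(ref.name + suffix) for suffix in appended]
    # Assumption: the sequence dictionary replaces the FASTA extension
    # (ref.fa -> ref.dict) rather than being appended to it.
    expected.append(ref.with_suffix(".dict"))
    return [path.name for path in expected if not path.exists()]
```

Calling `missing_reference_indices("/data/ref.fa")` returns an empty list when every assumed index is present, and otherwise the file names that still need to be generated.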

fasta_size Integer (Optional)
normal_cov Integer (Optional)
normal_vaf Float (Optional)
tumor_name String (Optional) tumor_name: String specifying the name of the MT sample

tumor_name provides a string for what the MT sample will be referred to in the various outputs, for example the VCF files.

exclude_nas Boolean (Optional)
netmhc_stab Boolean (Optional) netmhc_stab: sets an option whether to run NetMHCStabPan or not

netmhc_stab controls whether NetMHCStabPan is run after all filtering to add stability predictions to the predicted epitopes.

normal_name String (Optional) normal_name: String specifying the name of the WT sample

normal_name provides a string for what the WT sample will be referred to in the various outputs, for example the VCF files.

sample_name String
somalier_vcf File
gvcf_gq_bands String[]
manta_non_wgs Boolean (Optional)
optitype_name String (Optional)
scatter_count Integer

scatters each supported variant detector (varscan, pindel, mutect) into this many parallel jobs
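
To illustrate what scattering into a fixed number of jobs means, here is a small sketch that distributes a list of work items (for example, interval names) across scatter_count groups. It is only an illustration of the idea; the pipeline's own interval splitting may balance the groups differently, and the function name is hypothetical.

```python
def scatter(items, scatter_count):
    """Distribute items round-robin into at most scatter_count groups."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    groups = [items[i::scatter_count] for i in range(scatter_count)]
    # Drop empty groups when there are fewer items than requested jobs.
    return [group for group in groups if group]
```

Each returned group would correspond to one parallel job of a variant detector.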

synonyms_file File (Optional)
vep_cache_dir Directory
bait_intervals File bait_intervals: interval_list file of baits used in the sequencing experiment

bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially coordinates for regions you were able to design probes for in the reagent. Typically the reagent provider has this information available in BED format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data

bqsr_intervals String[] bqsr_intervals: Array of strings specifying regions for base quality score recalibration

bqsr_intervals provides an array of genomic intervals for which to apply GATK base quality score recalibration. Typically, intervals are given as entire chromosomes (chr1, chr2, etc.). These names should match the contig names used in the reference file.
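
A mismatch between interval names and reference contig names (for example "1" versus "chr1") is easy to catch up front. The sketch below reads contig names from FASTA header lines and reports intervals that do not match any of them. Both function names are hypothetical, and the optional "contig:start-end" form is an assumption about how intervals might be written.

```python
def fasta_contig_names(fasta_lines):
    """Collect contig names from FASTA header lines ('>name description')."""
    names = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].split():
            names.append(line[1:].split()[0])
    return names


def unmatched_intervals(intervals, contigs):
    """Return intervals whose contig is not present in the reference."""
    known = set(contigs)
    unmatched = []
    for interval in intervals:
        # Accept a bare contig name or an assumed 'contig:start-end' form.
        contig = interval.rsplit(":", 1)[0]
        if interval not in known and contig not in known:
            unmatched.append(interval)
    return unmatched
```

In practice the header lines would come from iterating over the opened reference file; an empty result means every interval refers to a contig in the reference.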

cle_vcf_filter Boolean
kallisto_index File
reference_dict File
rna_readgroups String[]
tumor_sequence https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/sequence_data.yml#sequence_data[] tumor_sequence: MT sequencing data and readgroup information

tumor_sequence represents the sequencing data for the MT sample as either FASTQs or BAMs with accompanying readgroup information. Note that in the @RG field ID and SM are required.
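
The ID and SM requirement can be verified on a readgroup string before submitting a run. This is a minimal sketch that assumes the readgroup is supplied as a single SAM-style @RG line with tab-separated TAG:value fields, written either with real tabs or with a literal backslash-t; the function name is hypothetical.

```python
def missing_rg_tags(readgroup, required=("ID", "SM")):
    """Return the required @RG tags that are absent from a readgroup string."""
    # Accept both real tab characters and a literal backslash-t separator.
    fields = readgroup.replace("\\t", "\t").split("\t")
    tags = {
        field.split(":", 1)[0]
        for field in fields
        if field != "@RG" and ":" in field
    }
    return [tag for tag in required if tag not in tags]
```

An empty list means both required tags are present; the same check applies to the readgroups supplied with normal_sequence.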

net_chop_method net_chop_method: NetChop prediction method to use ('cterm' for C term 3.0, '20s' for 20S 3.0)

net_chop_method is used to specify which NetChop prediction method to use ("cterm" for C term 3.0, "20s" for 20S 3.0). C term 3.0 is trained with publicly available MHC class I ligands, and the authors believe that it performs best in predicting the boundaries of CTL epitopes. 20S is trained with in vitro degradation data.

normal_sequence https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/sequence_data.yml#sequence_data[] normal_sequence: WT sequencing data and readgroup information

normal_sequence represents the sequencing data for the WT sample as either FASTQs or BAMs with accompanying readgroup information. Note that in the @RG field ID and SM are required.

pvacseq_threads Integer (Optional) pvacseq_threads: Number of threads to use for parallelizing pvacseq prediction

pvacseq_threads specifies the number of threads to use for parallelizing peptide-MHC binding prediction calls.

reference_index File
varscan_p_value Float (Optional)
bqsr_known_sites File[] bqsr_known_sites: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.

Known polymorphic indels recommended by GATK for a variety of tools including the BaseRecalibrator. This is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213. Files should be in VCF format and tabix indexed.

target_intervals File target_intervals: interval_list file of targets used in the sequencing experiment

target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for a WES reagent, bait_intervals and target_intervals are the same.

top_score_metric
binding_threshold Integer (Optional)
read_group_fields 79197c9bb94b421e695b2f46c674b149[]
summary_intervals https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/labelled_file.yml#labelled_file[]
trimming_adapters File
tumor_sample_name String tumor_sample_name: Name of the tumor sample

tumor_sample_name is the name of the tumor sample being processed. When processing a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM header line.
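
The sample IDs available in a VCF can be read from its #CHROM header line, where they follow the nine fixed columns (CHROM through FORMAT). The following sketch extracts them so a sample name can be checked against the list; the function name is hypothetical and the input is assumed to be the lines of an uncompressed VCF.

```python
def vcf_sample_ids(vcf_lines):
    """Return the sample IDs listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Columns 1-9 are fixed (CHROM..FORMAT); samples follow.
            return columns[9:]
    return []
```

A name passed as tumor_sample_name would then be expected to satisfy `name in vcf_sample_ids(lines)`.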

manta_call_regions File (Optional)
net_chop_threshold Float (Optional) net_chop_threshold: NetChop prediction threshold

net_chop_threshold specifies the threshold to use for NetChop prediction; increasing the threshold results in better specificity, but worse sensitivity.

normal_sample_name String normal_sample_name: Name of the normal sample

normal_sample_name is the name of the normal sample to use for phasing of germline variants.

per_base_intervals https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/labelled_file.yml#labelled_file[]
pindel_insert_size Integer
validated_variants File (Optional)

An optional VCF with variants that will be flagged as 'VALIDATED' if found in this pipeline's main output VCF

minimum_fold_change Float (Optional)
ribosomal_intervals File (Optional)
vep_ensembl_species String

ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus

vep_ensembl_version String

ensembl version - Must be present in the cache directory. Example: 95

vep_to_table_fields String[]
annotate_coding_only Boolean (Optional)
filter_docm_variants Boolean (Optional)

Determines whether variants found only via genotyping of DOCM sites will be filtered (as DOCM_ONLY) or passed through as variant calls

manta_output_contigs Boolean (Optional)
per_target_intervals https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/labelled_file.yml#labelled_file[]
percentile_threshold Integer (Optional)
reference_annotation File
strelka_cpu_reserved Integer (Optional)
varscan_min_coverage Integer (Optional)
varscan_min_var_freq Float (Optional)
vep_ensembl_assembly String

genome assembly to use in vep. Examples: GRCh38 or GRCm38
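
The species, version, and assembly values together identify one release inside the cache directory. The sketch below assumes the conventional VEP cache layout of cache_dir/species/version_assembly (for example homo_sapiens/95_GRCh38); that layout and both function names are assumptions for illustration, so verify them against the actual contents of your cache.

```python
from pathlib import Path


def vep_cache_subdir(cache_dir, species, version, assembly):
    """Build the assumed cache path: <cache_dir>/<species>/<version>_<assembly>."""
    return Path(cache_dir) / species / f"{version}_{assembly}"


def cache_has_release(cache_dir, species, version, assembly):
    """Report whether the assumed release directory exists in the cache."""
    return vep_cache_subdir(cache_dir, species, version, assembly).is_dir()
```

For instance, `cache_has_release(vep_cache_dir, "homo_sapiens", "95", "GRCh38")` should be true before the three vep_ensembl_* inputs are set to those values.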

prediction_algorithms String[]
trimming_max_uncalled Integer
varscan_strand_filter Integer (Optional)
vep_custom_annotations https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/types/vep_custom_annotation.yml#vep_custom_annotation[]

custom type, check types directory for input format

epitope_lengths_class_i Integer[] (Optional)
qc_minimum_base_quality Integer (Optional)
target_interval_padding Integer target_interval_padding: number of bp flanking each target region in which to allow variant calls

The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions.
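
As a worked example of the padding arithmetic, the sketch below extends one target region by a given number of base pairs on each side. It assumes 1-based inclusive coordinates, as used in interval_list files, and clamps the result to the contig bounds; the function name and the optional contig_length argument are hypothetical.

```python
def pad_interval(start, end, padding, contig_length=None):
    """Extend a 1-based inclusive interval by `padding` bp on each side."""
    padded_start = max(1, start - padding)
    padded_end = end + padding
    if contig_length is not None:
        # Do not run past the end of the contig when its length is known.
        padded_end = min(contig_length, padded_end)
    return padded_start, padded_end
```

With a padding of 100, a target at 500-600 becomes 400-700, so variants up to 100 bp outside the target can still be called.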

trimming_min_readlength Integer
varscan_max_normal_freq Float (Optional)
epitope_lengths_class_ii Integer[] (Optional)
variants_to_table_fields String[]
additional_report_columns
trimming_adapter_trim_end String
downstream_sequence_length String (Optional)
qc_minimum_mapping_quality Integer (Optional)
clinical_mhc_classI_alleles String[] (Optional) Clinical HLA typing results, limited to MHC Class I alleles; element format: HLA-X*01:02[/HLA-X...]

used to provide clinical HLA typing results in the format HLA-X*01:02[/HLA-X...] when available.
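
The element format HLA-X*01:02[/HLA-X...] can be checked with a regular expression. This is a sketch under assumptions: the pattern accepts a gene name of capital letters (optionally followed by digits), then an asterisk and two or more colon-separated numeric fields, with alternatives separated by "/". The pipeline's own parsing may be stricter or looser, and the function name is hypothetical.

```python
import re

# Assumed shape of one allele: HLA-<GENE>*<digits>:<digits>[:<digits>...]
ALLELE_PATTERN = re.compile(r"HLA-[A-Z]+[0-9]*\*\d+(:\d+)+")


def is_valid_hla_entry(entry):
    """Check one array element, which may hold '/'-separated alternatives."""
    return all(
        ALLELE_PATTERN.fullmatch(part) is not None
        for part in entry.split("/")
    )
```

An element such as "HLA-A*01:02/HLA-A*01:03" passes, while "A*01:02" fails because it lacks the HLA- prefix.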

clinical_mhc_classII_alleles String[] (Optional) Clinical HLA typing results, limited to MHC Class II alleles

used to provide clinical HLA typing results; separated from class I due to nomenclature inconsistencies

gene_transcript_lookup_table File
phased_proximal_variants_vcf File (Optional)
trimming_adapter_min_overlap Integer
gatk_haplotypecaller_intervals 9801093b78bcfba821b1f77698a60df2[]
mutect_artifact_detection_mode Boolean
readcount_minimum_base_quality Integer (Optional)
maximum_transcript_support_level
picard_metric_accumulation_level String
readcount_minimum_mapping_quality Integer (Optional)
run_reference_proteome_similarity Boolean (Optional) run_reference_proteome_similarity: sets an option whether to run reference proteome similarity or not

run_reference_proteome_similarity controls whether the reference proteome similarity step is run after all filtering. When enabled, peptide sequences are searched with BLAST against the reference proteome to see if they appear elsewhere in the proteome.

variants_to_table_genotype_fields String[]
allele_specific_binding_thresholds Boolean (Optional)
mutect_max_alt_alleles_in_normal_count Integer (Optional)
mutect_max_alt_allele_in_normal_fraction Float (Optional)

Steps

ID Runs Label Doc
rnaseq rnaseq.cwl (Workflow) RNA-Seq alignment and transcript/gene abundance workflow
pvacseq Workflow to run pVACseq from detect_variants and rnaseq pipeline outputs
somatic somatic_exome.cwl (Workflow) somatic_exome: exome alignment and somatic variant detection

somatic_exome is designed to perform processing of mutant/wildtype H.sapiens exome sequencing data. It features BQSR corrected alignments, 4 caller variant detection, and vep style annotations. Structural variants are detected via manta and cnvkit. In addition QC metrics are run, including somalier concordance metrics.

example input file = analysis_workflows/example_data/somatic_exome.yaml

germline exome alignment and germline variant detection, with optitype for HLA typing
phase_vcf phase VCF
hla_consensus ../tools/hla_consensus.cwl (CommandLineTool) Script to create consensus from optitype and clinical HLA typing
extract_alleles ../tools/extract_hla_alleles.cwl (CommandLineTool)
intersect_passing_variants ../tools/intersect_known_variants.cwl (CommandLineTool) Intersect passing validated variants and passing pipeline variants for use in pvacseq

Outputs

ID Type Label Doc
cram File
chart File (Optional) Plot for RNA-seq diagnosis/quality metrics

PDF plot of RNA sequencing coverage by normalized position across transcripts, used as an RNA-seq diagnosis/quality metric and created by the picard CollectRnaSeqMetrics tool

metrics File RNA-seq Diagnosis/quality metrics from tumor RNA

RNA-seq Diagnosis/quality metrics showing the distribution of the bases within the transcripts, created by picard CollectRnaSeqMetrics tool

final_bam File Sorted BAM from tumor RNA

Sorted BAM file of sequencing read alignments by HISAT2 with duplicate reads tagged

final_tsv File
flagstats File
cn_diagram File (Optional)
hs_metrics File
phased_vcf File
tumor_cram File Sorted CRAM from tumor DNA

Sorted CRAM file of sequencing read alignments by bwa-mem from a tumor DNA sample with duplicate reads tagged

normal_cram File Sorted CRAM from normal DNA

Sorted CRAM file of sequencing read alignments by bwa-mem from a normal DNA sample with duplicate reads tagged

final_bigwig File
optitype_tsv File
allele_string String[]
annotated_tsv File
annotated_vcf File
optitype_plot File
all_candidates File
gene_abundance File Gene-level abundance output by tximport with kallisto output

Tab-delimited file containing the abundance estimates summarized in the gene level with kallisto output by Bioconductor tximport tool

hla_call_files Directory
cn_scatter_plot File (Optional)
tumor_flagstats File Sequencing count metrics based on SAM FLAG field from tumor sample

Summary with the count numbers of alignments for each FLAG type from a tumor DNA sample, including 13 categories based on the bit flags in the FLAG field

diploid_variants File (Optional)
germline_raw_vcf File
intervals_target File (Optional)
normal_flagstats File Sequencing count metrics based on SAM FLAG field from normal sample

Summary with the count numbers of alignments for each FLAG type from a normal DNA sample, including 13 categories based on the bit flags in the FLAG field

small_candidates File
somatic_variants File (Optional)
tumor_hs_metrics File Sequencing coverage summary of target intervals from tumor DNA

Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a tumor DNA sample, for example to assess target coverage of WES

consensus_alleles String[]
docm_filtered_vcf File
normal_hs_metrics File Sequencing coverage summary of target intervals from normal DNA

Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a normal DNA sample, for example to assess target coverage

somatic_final_vcf File
final_filtered_vcf File
germline_final_vcf File
reference_coverage File (Optional)
summary_hs_metrics File[]
insert_size_metrics File
mutect_filtered_vcf File
per_base_hs_metrics File[]
pindel_filtered_vcf File
pvacseq_predictions Directory
somatic_vep_summary File
tumor_only_variants File (Optional)
verify_bam_id_depth File
germline_vep_summary File
intervals_antitarget File (Optional)
strelka_filtered_vcf File
varscan_filtered_vcf File
germline_filtered_vcf File
insert_size_histogram File
mutect_unfiltered_vcf File
per_target_hs_metrics File[]
pindel_unfiltered_vcf File
tumor_target_coverage File
verify_bam_id_metrics File
normal_target_coverage File
strelka_unfiltered_vcf File
tumor_bin_level_ratios File
tumor_segmented_ratios File
varscan_unfiltered_vcf File
mark_duplicates_metrics File
transcript_abundance_h5 File Transcript-level abundance table in HDF5 format by kallisto

HDF5 binary file containing transcript-level abundance estimates, bootstrap estimates, and so on, created by kallisto

stringtie_transcript_gtf File Transcript GTF assembled from tumor RNA by StringTie

GTF file containing the transcripts assembled from the tumor RNA sample, created by StringTie

transcript_abundance_tsv File Transcript-level abundance table by kallisto

Tab-delimited file containing transcript-level abundance estimates in TPM, created by kallisto

tumor_summary_hs_metrics File[]
alignment_summary_metrics File
normal_summary_hs_metrics File[]
per_base_coverage_metrics File[]
tumor_antitarget_coverage File
tumor_insert_size_metrics File Paired-end sequencing diagnosis/quality metrics from tumor DNA

Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a tumor DNA sample

tumor_per_base_hs_metrics File[] Sequencing coverage summary at target sites from tumor DNA

Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample

tumor_verify_bam_id_depth File Sequencing quality assessment metric for tumor sample genotyping

verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a tumor DNA sample, across all readGroups and per readGroup separately

normal_antitarget_coverage File
normal_insert_size_metrics File Paired-end sequencing diagnosis/quality metrics from normal DNA

Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a normal DNA sample

normal_per_base_hs_metrics File[] Sequencing coverage summary at target sites from normal DNA

Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample

normal_verify_bam_id_depth File Sequencing quality assessment metric for normal sample genotyping

verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a normal DNA sample, across all readGroups and per readGroup separately

per_target_coverage_metrics File[]
tumor_per_target_hs_metrics File[] Sequencing coverage summary of target intervals from tumor DNA

Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample

tumor_snv_bam_readcount_tsv File
tumor_verify_bam_id_metrics File Sequencing quality assessment metric for tumor sample contamination

verifyBamID output files containing the contamination estimate in a tumor DNA sample, across all readGroups and per readGroup separately

normal_per_target_hs_metrics File[] Sequencing coverage summary of target intervals from normal DNA

Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample

normal_snv_bam_readcount_tsv File
normal_verify_bam_id_metrics File Sequencing quality assessment metric for normal sample contamination

verifyBamID output files containing the contamination estimate in a normal DNA sample, across all readGroups and per readGroup separately

somalier_concordance_metrics File
stringtie_gene_expression_tsv File Gene abundance table from tumor RNA by StringTie

Tab-delimited file containing gene abundances in FPKM and TPM, created by StringTie

tumor_indel_bam_readcount_tsv File
tumor_mark_duplicates_metrics File Sequencing duplicate metrics from tumor DNA

Duplication metrics on duplicate sequencing reads from a tumor DNA sample, identified by picard MarkDuplicates tool

normal_indel_bam_readcount_tsv File
normal_mark_duplicates_metrics File Sequencing duplicate metrics from normal DNA

Duplication metrics on duplicate sequencing reads from a normal DNA sample, identified by picard MarkDuplicates tool

somalier_concordance_statistics File
tumor_alignment_summary_metrics File Sequencing alignment summary from tumor DNA

Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a tumor DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool

tumor_per_base_coverage_metrics File[] Sequencing per-base coverage summary at target sites from tumor DNA

Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample

normal_alignment_summary_metrics File Sequencing alignment summary from normal DNA

Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a normal DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool

normal_per_base_coverage_metrics File[] Sequencing per-base coverage summary at target sites from normal DNA

Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample

tumor_per_target_coverage_metrics File[] Sequencing per-target coverage summary of target intervals from tumor DNA

Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample

normal_per_target_coverage_metrics File[] Sequencing per-target coverage summary of target intervals from normal DNA

Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample

Permalink: https://w3id.org/cwl/view/git/1750cd5cc653f058f521b6195e3bec1e7df1a086/definitions/pipelines/immuno.cwl