CWL Workflow: Immunotherapy Workflow

Fetched 2023-01-10 12:27:15 GMT

Verified with cwltool version 3.1.20221201130942

This workflow is Open Source and may be reused according to the terms of: MIT License

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
mills	File	mills: File specifying common polymorphic indels from Mills et al.	mills provides known polymorphic indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels originally discovered by Mills et al. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557762/ The file should be in VCF format and tabix indexed.
ploidy	Integer (Optional)
strand
refFlat	File
docm_vcf	File
expn_val	Float (Optional)
omni_vcf	File
rna_bams	File[]
tdna_cov	Integer (Optional)
tdna_vaf	Float (Optional)
trna_cov	Integer (Optional)
trna_vaf	Float (Optional)
vep_pick
dbsnp_vcf	File	dbsnp_vcf: File specifying common polymorphic indels from dbSNP	dbsnp_vcf provides known indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels from dbSNP. The file should be in VCF format and tabix indexed.
reference	File	reference: Reference fasta file for a desired assembly	reference contains the nucleotide sequence for a given assembly (hg37, hg38, etc.) in fasta format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on ensembl at https://ensembl.org/info/data/ftp/index.html When providing the reference secondary files corresponding to reference indices must be located in the same directory as the reference itself. These files can be created with samtools index, bwa index, and picard CreateSequenceDictionary.
cosmic_vcf	File (Optional)
fasta_size	Integer (Optional)
normal_cov	Integer (Optional)
normal_vaf	Float (Optional)
tumor_name	String (Optional)	tumor_name: String specifying the name of the MT sample	tumor_name provides the string used to refer to the MT sample in the various outputs, for example the VCF files.
exclude_nas	Boolean (Optional)
netmhc_stab	Boolean (Optional)	netmhc_stab: sets an option whether to run NetMHCStabPan or not	netmhc_stab sets an option that decides whether it will run NetMHCStabPan after all filtering and add stability predictions to predicted epitopes.
normal_name	String (Optional)	normal_name: String specifying the name of the WT sample	normal_name provides the string used to refer to the WT sample in the various outputs, for example the VCF files.
sample_name	String
known_indels	File	known_indels: File specifying common polymorphic indels from 1000G	known_indels provides known indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels from 1000 Genomes Phase I indel calls. The file should be in VCF format and tabix indexed.
somalier_vcf	File
gvcf_gq_bands	String[]
manta_non_wgs	Boolean (Optional)
optitype_name	String (Optional)
synonyms_file	File (Optional)
vep_cache_dir	Directory
bait_intervals	File	bait_intervals: interval_list file of baits used in the sequencing experiment	bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially the coordinates of the regions for which probes could be designed in the reagent. Typically the reagent provider makes this information available in BED format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data
bqsr_intervals	String[]	bqsr_intervals: Array of strings specifying regions for base quality score recalibration	bqsr_intervals provides an array of genomic intervals to which GATK base quality score recalibration is applied. Typically intervals are given as entire chromosomes (e.g. chr1, chr2, etc.); these names should match the format used in the reference file.
cle_vcf_filter	Boolean
kallisto_index	File
known_variants	File (Optional)		Previously discovered variants to be flagged in this pipeline's output VCF
reference_dict	File
rna_readgroups	String[]
tumor_sequence	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/sequence_data.yml#sequence_data[]	tumor_sequence: file specifying the location of MT sequencing data	tumor_sequence is a data structure described in sequence_data.yml used to pass information regarding sequencing data for a single sample (i.e. fastq files). If more than one fastq file exists for a sample, as in the case of data from multiple instruments, the sequence tag is simply repeated with the additional data (see example input file). Note that the ID and SM tags are required in the @RG field.
epitope_lengths	Integer[] (Optional)
net_chop_method		net_chop_method: NetChop prediction method to use ('cterm' for C term 3.0, '20s' for 20S 3.0)	net_chop_method is used to specify which NetChop prediction method to use ("cterm" for C term 3.0, "20s" for 20S 3.0). C-term 3.0 is trained with publicly available MHC class I ligands, and the authors believe that it performs best in predicting the boundaries of CTL epitopes. 20S is trained with in vitro degradation data.
normal_sequence	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/sequence_data.yml#sequence_data[]	normal_sequence: file specifying the location of WT sequencing data	normal_sequence is a data structure described in sequence_data.yml used to pass information regarding sequencing data for a single sample (i.e. fastq files). If more than one fastq file exists for a sample, as in the case of data from multiple instruments, the sequence tag is simply repeated with the additional data (see example input file). Note that the ID and SM tags are required in the @RG field.
pvacseq_threads	Integer (Optional)	pvacseq_threads: Number of threads to use for parallelizing pvacseq prediction	pvacseq_threads specifies the number of threads to use for parallelizing peptide-MHC binding prediction calls.
reference_index	File
varscan_p_value	Float (Optional)
target_intervals	File	target_intervals: interval_list file of targets used in the sequencing experiment	target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for a WES reagent, bait_intervals and target_intervals are the same.
top_score_metric
binding_threshold	Integer (Optional)
read_group_fields	9e1f9ee45a365577d99e22dc6cd8acb8[]
summary_intervals	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/labelled_file.yml#labelled_file[]
trimming_adapters	File
tumor_sample_name	String	tumor_sample_name: Name of the tumor sample	tumor_sample_name is the name of the tumor sample being processed. When processing a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM header line.
manta_call_regions	File (Optional)
net_chop_threshold	Float (Optional)	net_chop_threshold: NetChop prediction threshold	net_chop_threshold specifies the threshold to use for NetChop prediction; increasing the threshold results in better specificity, but worse sensitivity.
normal_sample_name	String	normal_sample_name: Name of the normal sample	normal_sample_name is the name of the normal sample to use for phasing of germline variants.
per_base_intervals	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/labelled_file.yml#labelled_file[]
pindel_insert_size	Integer
minimum_fold_change	Float (Optional)
ribosomal_intervals	File (Optional)
vep_ensembl_species	String		ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus
vep_ensembl_version	String		ensembl version - Must be present in the cache directory. Example: 95
vep_to_table_fields	String[]
annotate_coding_only	Boolean (Optional)
filter_docm_variants	Boolean (Optional)
manta_output_contigs	Boolean (Optional)
mutect_scatter_count	Integer
panel_of_normals_vcf	File (Optional)
per_target_intervals	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/labelled_file.yml#labelled_file[]
reference_annotation	File
strelka_cpu_reserved	Integer (Optional)
varscan_min_coverage	Integer (Optional)
varscan_min_var_freq	Float (Optional)
vep_ensembl_assembly	String		genome assembly to use in vep. Examples: GRCh38 or GRCm38
prediction_algorithms	String[]
trimming_max_uncalled	Integer
varscan_strand_filter	Integer (Optional)
vep_custom_annotations	https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/types/vep_custom_annotation.yml#vep_custom_annotation[]		custom type, check types directory for input format
peptide_sequence_length	Integer (Optional)
qc_minimum_base_quality	Integer (Optional)
target_interval_padding	Integer	target_interval_padding: number of bp flanking each target region in which to allow variant calls	The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions.
trimming_min_readlength	Integer
varscan_max_normal_freq	Float (Optional)
variants_to_table_fields	String[]
additional_report_columns
emit_reference_confidence
trimming_adapter_trim_end	String
downstream_sequence_length	String (Optional)
qc_minimum_mapping_quality	Integer (Optional)
clinical_mhc_classI_alleles	String[] (Optional)	Clinical HLA typing results, limited to MHC Class I alleles; element format: HLA-X*01:02[/HLA-X...]	used to provide clinical HLA typing results in the format HLA-X*01:02[/HLA-X...] when available.
clinical_mhc_classII_alleles	String[] (Optional)	Clinical HLA typing results, limited to MHC Class II alleles	used to provide clinical HLA typing results; separated from class I due to nomenclature inconsistencies
gene_transcript_lookup_table	File
phased_proximal_variants_vcf	File (Optional)
trimming_adapter_min_overlap	Integer
gatk_haplotypecaller_intervals	f170caffb40a8ff38b5af51cc579cdbc[]
mutect_artifact_detection_mode	Boolean
readcount_minimum_base_quality	Integer (Optional)
maximum_transcript_support_level
picard_metric_accumulation_level	String
readcount_minimum_mapping_quality	Integer (Optional)
variants_to_table_genotype_fields	String[]
allele_specific_binding_thresholds	Boolean (Optional)
mutect_max_alt_alleles_in_normal_count	Integer (Optional)
mutect_max_alt_allele_in_normal_fraction	Float (Optional)
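
The tumor_sequence and normal_sequence entries above note that the ID and SM tags are required in the @RG field. A minimal Python sketch of a pre-flight check for that requirement follows. The function name `missing_rg_tags` is hypothetical and is not part of the workflow, and the sketch assumes the read group string has already been loaded with real tab characters between its fields.

```python
def missing_rg_tags(readgroup, required=("ID", "SM")):
    """Return the required tags that are absent from an @RG header line.

    Assumes `readgroup` is a tab-separated SAM header line such as
    "@RG\tID:1\tSM:tumor". This helper is illustrative only.
    """
    fields = readgroup.strip().split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with '@RG'")
    # Each remaining field has the form TAG:value; keep only the tag names.
    present = {field.split(":", 1)[0] for field in fields[1:] if ":" in field}
    return [tag for tag in required if tag not in present]


if __name__ == "__main__":
    print(missing_rg_tags("@RG\tID:1\tSM:tumor\tLB:lib1"))
    print(missing_rg_tags("@RG\tID:1\tPL:ILLUMINA"))
```

Running a check like this on every read group before submitting the job may catch a missing SM tag earlier than the alignment step would.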

Steps

ID	Runs	Label	Doc
rnaseq	rnaseq.cwl (Workflow)	RNA-Seq alignment and transcript/gene abundance workflow
pvacseq	../subworkflows/pvacseq.cwl (Workflow)	Workflow to run pVACseq from detect_variants and rnaseq pipeline outputs
somatic	somatic_exome.cwl (Workflow)	somatic_exome: exome alignment and somatic variant detection	somatic_exome is designed to perform processing of mutant/wildtype H.sapiens exome sequencing data. It features BQSR corrected alignments, 4 caller variant detection, and vep style annotations. Structural variants are detected via manta and cnvkit. In addition QC metrics are run, including somalier concordance metrics. example input file = analysis_workflows/example_data/somatic_exome.yaml
germline	germline_exome_hla_typing.cwl (Workflow)	exome alignment and germline variant detection, with optitype for HLA typing
phase_vcf	../subworkflows/phase_vcf.cwl (Workflow)	phase VCF
hla_consensus	../tools/hla_consensus.cwl (CommandLineTool)	Script to create consensus from optitype and clinical HLA typing
extract_alleles	../tools/extract_hla_alleles.cwl (CommandLineTool)
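
The hla_consensus step combines OptiType calls with clinical HLA typing, and the clinical_mhc_classI_alleles input documents its element format as HLA-X*01:02[/HLA-X...]. The Python sketch below only illustrates how one such element could be split into its alternative alleles; it is not the logic of hla_consensus.cwl. The function name and the regular expression are assumptions made for illustration, and the pattern may be stricter or looser than what the real script accepts.

```python
import re

# Assumed pattern: "HLA-", a gene name, "*", then colon-separated numeric fields.
ALLELE_RE = re.compile(r"^HLA-[A-Z]+\*\d+(?::\d+)+$")


def split_clinical_alleles(element):
    """Split one element like 'HLA-A*01:02/HLA-A*01:03' into its alleles.

    Raises ValueError if any part does not match the assumed pattern.
    """
    alleles = element.split("/")
    bad = [allele for allele in alleles if not ALLELE_RE.match(allele)]
    if bad:
        raise ValueError(f"unexpected allele format: {bad}")
    return alleles


if __name__ == "__main__":
    print(split_clinical_alleles("HLA-A*01:02/HLA-A*01:03"))
```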

Outputs

ID	Type	Label	Doc
cram	File
gvcf	File[]
chart	File (Optional)	Plot for RNA-seq diagnosis/quality metrics	PDF file for the plot of RNA sequencing coverage at the normalized position across transcript as RNA-seq diagnosis/quality metrics, created by picard CollectRnaSeqMetrics tool
metrics	File	RNA-seq Diagnosis/quality metrics from tumor RNA	RNA-seq Diagnosis/quality metrics showing the distribution of the bases within the transcripts, created by picard CollectRnaSeqMetrics tool
final_bam	File	Sorted BAM from tumor RNA	Sorted BAM file of sequencing read alignments by HISAT2 with duplicate reads tagged
final_tsv	File
flagstats	File
cn_diagram	File (Optional)
hs_metrics	File
phased_vcf	File
tumor_cram	File	Sorted CRAM from tumor DNA	Sorted CRAM file of sequencing read alignments by bwa-mem from a tumor DNA sample with duplicate reads tagged
normal_cram	File	Sorted CRAM from normal DNA	Sorted CRAM file of sequencing read alignments by bwa-mem from a normal DNA sample with duplicate reads tagged
optitype_tsv	File
allele_string	String[]
annotated_tsv	File
annotated_vcf	File
optitype_plot	File
all_candidates	File
gene_abundance	File	Gene-level abundance output by tximport with kallisto output	Tab-delimited file containing the abundance estimates summarized in the gene level with kallisto output by Bioconductor tximport tool
hla_call_files	Directory
cn_scatter_plot	File (Optional)
tumor_flagstats	File	Sequencing count metrics based on SAM FLAG field from tumor sample	Summary with the count numbers of alignments for each FLAG type from a tumor DNA sample, including 13 categories based on the bit flags in the FLAG field
diploid_variants	File (Optional)
germline_raw_vcf	File
intervals_target	File (Optional)
normal_flagstats	File	Sequencing count metrics based on SAM FLAG field from normal sample	Summary with the count numbers of alignments for each FLAG type from a normal DNA sample, including 13 categories based on the bit flags in the FLAG field
small_candidates	File
somatic_variants	File (Optional)
tumor_hs_metrics	File	Sequencing coverage summary of target intervals from tumor DNA	Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a tumor DNA sample, for example to assess target coverage of WES
consensus_alleles	String[]
docm_filtered_vcf	File
normal_hs_metrics	File	Sequencing coverage summary of target intervals from normal DNA	Diagnosis/quality metrics specific for sequencing data generated through hybrid-selection (e.g. whole exome) from a normal DNA sample, for example to assess target coverage
somatic_final_vcf	File
final_filtered_vcf	File
germline_final_vcf	File
reference_coverage	File (Optional)
summary_hs_metrics	File[]
insert_size_metrics	File
mutect_filtered_vcf	File
per_base_hs_metrics	File[]
pindel_filtered_vcf	File
pvacseq_predictions	Directory
somatic_vep_summary	File
tumor_only_variants	File (Optional)
verify_bam_id_depth	File
germline_vep_summary	File
intervals_antitarget	File (Optional)
strelka_filtered_vcf	File
varscan_filtered_vcf	File
germline_filtered_vcf	File
insert_size_histogram	File
mutect_unfiltered_vcf	File
per_target_hs_metrics	File[]
pindel_unfiltered_vcf	File
tumor_target_coverage	File
verify_bam_id_metrics	File
normal_target_coverage	File
strelka_unfiltered_vcf	File
tumor_bin_level_ratios	File
tumor_segmented_ratios	File
varscan_unfiltered_vcf	File
mark_duplicates_metrics	File
transcript_abundance_h5	File	Transcript-level abundance table in HDF5 format by kallisto	HDF5 binary file containing transcript-level abundance estimates, bootstrap estimates, and so on, created by kallisto
stringtie_transcript_gtf	File	Transcript GTF assembled from tumor RNA by StringTie	GTF file containing the transcripts assembled from the tumor RNA sample, created by StringTie
transcript_abundance_tsv	File	Transcript-level abundance table by kallisto	Tab-delimited file containing transcript-level abundance estimates in TPM, created by kallisto
tumor_summary_hs_metrics	File[]
alignment_summary_metrics	File
normal_summary_hs_metrics	File[]
per_base_coverage_metrics	File[]
tumor_antitarget_coverage	File
tumor_insert_size_metrics	File	Paired-end sequencing diagnosis/quality metrics from tumor DNA	Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a tumor DNA sample
tumor_per_base_hs_metrics	File[]	Sequencing coverage summary at target sites from tumor DNA	Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample
tumor_verify_bam_id_depth	File	Sequencing quality assessment metric for tumor sample genotyping	verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a tumor DNA sample, across all readGroups and per readGroup separately
normal_antitarget_coverage	File
normal_insert_size_metrics	File	Paired-end sequencing diagnosis/quality metrics from normal DNA	Diagnosis/quality metrics including the insert size distribution and read orientation of the paired-end libraries from a normal DNA sample
normal_per_base_hs_metrics	File[]	Sequencing coverage summary at target sites from normal DNA	Diagnosis/quality metrics for sequencing coverage at target sites (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample
normal_verify_bam_id_depth	File	Sequencing quality assessment metric for normal sample genotyping	verifyBamID output files showing the sequencing depth distribution at the marker positions from Omni genotype data with a normal DNA sample, across all readGroups and per readGroup separately
per_target_coverage_metrics	File[]
tumor_per_target_hs_metrics	File[]	Sequencing coverage summary of target intervals from tumor DNA	Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample
tumor_snv_bam_readcount_tsv	File
tumor_verify_bam_id_metrics	File	Sequencing quality assessment metric for tumor sample contamination	verifyBamID output files containing the contamination estimate in a tumor DNA sample, across all readGroups and per readGroup separately
normal_per_target_hs_metrics	File[]	Sequencing coverage summary of target intervals from normal DNA	Diagnosis/quality metrics for sequencing coverage for target intervals (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample
normal_snv_bam_readcount_tsv	File
normal_verify_bam_id_metrics	File	Sequencing quality assessment metric for normal sample contamination	verifyBamID output files containing the contamination estimate in a normal DNA sample, across all readGroups and per readGroup separately
somalier_concordance_metrics	File
stringtie_gene_expression_tsv	File	Gene abundance table from tumor RNA by StringTie	Tab-delimited file containing gene abundances in FPKM and TPM, created by StringTie
tumor_indel_bam_readcount_tsv	File
tumor_mark_duplicates_metrics	File	Sequencing duplicate metrics from tumor DNA	Duplication metrics on duplicate sequencing reads from a tumor DNA sample, identified by picard MarkDuplicates tool
normal_indel_bam_readcount_tsv	File
normal_mark_duplicates_metrics	File	Sequencing duplicate metrics from normal DNA	Duplication metrics on duplicate sequencing reads from a normal DNA sample, identified by picard MarkDuplicates tool
somalier_concordance_statistics	File
tumor_alignment_summary_metrics	File	Sequencing alignment summary from tumor DNA	Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a tumor DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool
tumor_per_base_coverage_metrics	File[]	Sequencing per-base coverage summary at target sites from tumor DNA	Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a tumor DNA sample
normal_alignment_summary_metrics	File	Sequencing alignment summary from normal DNA	Diagnosis/quality metrics summarizing the quality of sequencing read alignments from a normal DNA sample, reported by the picard CollectAlignmentSummaryMetrics tool
normal_per_base_coverage_metrics	File[]	Sequencing per-base coverage summary at target sites from normal DNA	Diagnosis/quality metrics showing detailed sequencing coverage per target site (optional, known variant sites of clinical significance from ClinVar for example) from a normal DNA sample
tumor_per_target_coverage_metrics	File[]	Sequencing per-target coverage summary of target intervals from tumor DNA	Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a tumor DNA sample
normal_per_target_coverage_metrics	File[]	Sequencing per-target coverage summary of target intervals from normal DNA	Diagnosis/quality metrics showing detailed sequencing coverage per target interval (optional, 59 genes recommended by ACMG for clinical exome and genome sequencing for example) from a normal DNA sample
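
The outputs above mix File, File[], Directory and String[] types, and the optional ones may be absent. A CWL runner reports results as a JSON output object keyed by output ID, so a small Python sketch can gather the file paths from it. The function name `file_paths` is hypothetical, the sketch assumes each File entry carries a "path" or "location" field, and the sample values shown are made up.

```python
def file_paths(outputs):
    """Map each output ID to the paths of its File entries.

    Handles both File and File[] outputs; null (optional) outputs and
    non-File values such as strings or Directory objects are skipped.
    """
    found = {}
    for name, value in outputs.items():
        entries = value if isinstance(value, list) else [value]
        paths = [
            entry.get("path") or entry.get("location")
            for entry in entries
            if isinstance(entry, dict) and entry.get("class") == "File"
        ]
        paths = [p for p in paths if p]  # drop entries with neither field
        if paths:
            found[name] = paths
    return found


if __name__ == "__main__":
    # Made-up example values, shaped like a CWL output object.
    example = {
        "final_tsv": {"class": "File", "path": "/out/final.tsv"},
        "gvcf": [{"class": "File", "location": "file:///out/a.g.vcf.gz"}],
        "cn_diagram": None,
        "allele_string": ["HLA-A*02:01"],
        "pvacseq_predictions": {"class": "Directory", "path": "/out/pvacseq"},
    }
    print(file_paths(example))
```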

Permalink: https://w3id.org/cwl/view/git/04d21c33a5f2950e86db285fa0a32a6659198d8a/definitions/pipelines/immuno.cwl