DESeq - differential gene expression analysis - Common Workflow Language Viewer

Workflow: DESeq - differential gene expression analysis

Fetched 2023-01-03 19:15:29 GMT

Verified with cwltool version 3.1.20221201130942

Differential gene expression analysis
=====================================

Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

DESeq1
------

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://bioconductor.org/packages/release/bioc/html/DESeq.html), as an R/Bioconductor package.

DESeq2
------

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
alias	String	Experiment short name/Alias
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multithreading
group_by		Group by	Grouping method for features: isoforms, genes or common tss
batch_file	File (Optional) [Textual format]	Headerless TSV/CSV file for multi-factor analysis. First column - experiments' names from condition 1 and 2, second column - batch name	Metadata file for multi-factor analysis. Headerless TSV/CSV file. First column - names from --ua and --ta, second column - batch name. Default: None
rpkm_cutoff	Float (Optional)	Minimum RPKM cutoff. Applied before running DESeq	Minimum threshold for RPKM filtering. Default: 5
alias_cond_1	String (Optional)	Alias for condition 1, aka 'untreated' (letters and numbers only)	Name to be displayed for condition 1, aka 'untreated' (letters and numbers only)
alias_cond_2	String (Optional)	Alias for condition 2, aka 'treated' (letters and numbers only)	Name to be displayed for condition 2, aka 'treated' (letters and numbers only)
rpkm_genes_cond_1	File[] (Optional) [CSV]	RNA-Seq experiments (condition 1, aka 'untreated')	CSV/TSV input files grouped by genes (condition 1, aka 'untreated')
rpkm_genes_cond_2	File[] (Optional) [CSV]	RNA-Seq experiments (condition 2, aka 'treated')	CSV/TSV input files grouped by genes (condition 2, aka 'treated')
sample_names_cond_1	String[] (Optional)	Sample names for RNA-Seq experiments (condition 1, aka 'untreated')	Aliases for RNA-Seq experiments (condition 1, aka 'untreated') to make the legend for generated plots. Order corresponds to the rpkm_isoforms_cond_1
sample_names_cond_2	String[] (Optional)	Sample names for RNA-Seq experiments (condition 2, aka 'treated')	Aliases for RNA-Seq experiments (condition 2, aka 'treated') to make the legend for generated plots. Order corresponds to the rpkm_isoforms_cond_2
rpkm_isoforms_cond_1	File[] (Optional) [CSV]	RNA-Seq experiments (condition 1, aka 'untreated')	CSV/TSV input files grouped by isoforms (condition 1, aka 'untreated')
rpkm_isoforms_cond_2	File[] (Optional) [CSV]	RNA-Seq experiments (condition 2, aka 'treated')	CSV/TSV input files grouped by isoforms (condition 2, aka 'treated')
rpkm_common_tss_cond_1	File[] (Optional) [CSV]	RNA-Seq experiments (condition 1, aka 'untreated')	CSV/TSV input files grouped by common TSS (condition 1, aka 'untreated')
rpkm_common_tss_cond_2	File[] (Optional) [CSV]	RNA-Seq experiments (condition 2, aka 'treated')	CSV/TSV input files grouped by common TSS (condition 2, aka 'treated')
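
The inputs above can be supplied to a CWL runner as an input object, which cwltool accepts as a JSON or YAML file. The following is a minimal sketch, not the workflow's official job template: the helper names `file_obj` and `build_job`, the file paths, and the pairing of each `group_by` value with a particular `rpkm_*` input are assumptions inferred from the table, and the exact symbols accepted by `group_by` should be checked against the workflow source.

```python
import json


def file_obj(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(alias, untreated, treated, group_by="genes",
              rpkm_cutoff=None, threads=None):
    """Assemble an input object keyed by the input IDs listed in the table.

    `untreated` and `treated` map sample names to file paths (hypothetical).
    The mapping from `group_by` to an input ID suffix is an assumption.
    """
    suffix = {
        "isoforms": "isoforms",
        "genes": "genes",
        "common tss": "common_tss",
    }[group_by]
    job = {
        "alias": alias,
        "group_by": group_by,
        f"rpkm_{suffix}_cond_1": [file_obj(p) for p in untreated.values()],
        f"rpkm_{suffix}_cond_2": [file_obj(p) for p in treated.values()],
        "sample_names_cond_1": list(untreated),
        "sample_names_cond_2": list(treated),
    }
    # Optional inputs are only written when set, so the workflow defaults apply.
    if rpkm_cutoff is not None:
        job["rpkm_cutoff"] = rpkm_cutoff
    if threads is not None:
        job["threads"] = threads
    return job


if __name__ == "__main__":
    job = build_job(
        "demo",
        {"ctrl_1": "ctrl_1.genes.csv", "ctrl_2": "ctrl_2.genes.csv"},
        {"trt_1": "trt_1.genes.csv", "trt_2": "trt_2.genes.csv"},
        rpkm_cutoff=5,
    )
    print(json.dumps(job, indent=2))
```

Saving the printed JSON to a file (for example `job.json`, a hypothetical name) and passing it after the workflow path on the cwltool command line is the usual way to run with these values.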

Steps

ID	Runs	Label	Doc
deseq	../tools/deseq-advanced.cwl (CommandLineTool)		Runs a DESeq/DESeq2 script similar to the original one from BioWardrobe. The untreated_files and treated_files inputs must have the following case-sensitive header: <RefseqId,GeneId,Chrom,TxStart,TxEnd,Strand,TotalReads,Rpkm> for CSV, or <RefseqId\tGeneId\tChrom\tTxStart\tTxEnd\tStrand\tTotalReads\tRpkm> for TSV. The input format is identified by file extension: .csv is read as CSV and .tsv as TSV; any other extension is read as CSV by default. The row order of the output file matches the row order of the first CSV/TSV file in the untreated group. Output is always saved in TSV format and includes only rows present in all input files, intersected by RefseqId, GeneId, Chrom, TxStart, TxEnd, Strand. DESeq/DESeq2 always compares untreated_vs_treated groups. Normalized read counts and the phenotype table are exported as GCT and CLS files for downstream GSEA analysis.
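
The input rules described for the deseq step (format chosen by extension, a fixed case-sensitive header, and rows intersected on six key columns) can be sketched in Python as a pre-flight check. This is an illustration only and not the tool's actual script: the function names are hypothetical, the exact-match handling of extensions is an assumption, and file contents are passed in as text to keep the example self-contained.

```python
import csv
import io
from pathlib import Path

EXPECTED_HEADER = ["RefseqId", "GeneId", "Chrom", "TxStart",
                   "TxEnd", "Strand", "TotalReads", "Rpkm"]
# Columns the step description lists for intersecting rows across files.
KEY_COLUMNS = EXPECTED_HEADER[:6]


def delimiter_for(filename):
    """Pick the delimiter from the extension: .tsv is tab, anything else is CSV."""
    return "\t" if Path(filename).suffix == ".tsv" else ","


def read_rows(filename, text):
    """Parse one input file and verify its header (case-sensitive)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter_for(filename))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"{filename}: unexpected header {reader.fieldnames}")
    return list(reader)


def row_key(row):
    """Build the tuple of key column values that identifies a row."""
    return tuple(row[col] for col in KEY_COLUMNS)


def intersect_keys(tables):
    """Return keys present in every table, ordered as in the first table."""
    shared = set.intersection(*({row_key(r) for r in t} for t in tables))
    return [row_key(r) for r in tables[0] if row_key(r) in shared]
```

Running such a check on the files before submitting them would show in advance how many rows survive the intersection, since rows missing from any single input are dropped from the output.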

Outputs

ID	Type	Label	Doc
plot_pca	File (Optional) [PNG]	PCA plot for variance stabilized count data	PCA plot for variance stabilized count data. Values are now approximately homoskedastic (have constant variance along the range of mean values)
plot_pca_pdf	File (Optional) [PDF]	PCA plot for variance stabilized count data	PCA plot for variance stabilized count data. Values are now approximately homoskedastic (have constant variance along the range of mean values)
diff_expr_file	File [TSV]	Differentially expressed features grouped by isoforms, genes or common TSS	DESeq generated file of differentially expressed features grouped by isoforms, genes or common TSS in TSV format
phenotypes_file	File [Textual format]	Phenotype data file in CLS format. Compatible with GSEA	DESeq generated file with phenotypes in CLS format. Compatible with GSEA
deseq_stderr_log	File [Textual format]	DESeq stderr log	DESeq stderr log
deseq_stdout_log	File [Textual format]	DESeq stdout log	DESeq stdout log
plot_lfc_vs_mean	File (Optional) [PNG]	Plot of normalised mean versus log2 fold change	Plot of the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples
read_counts_file	File [GCT/Res format]	Normalized read counts in GCT format. Compatible with GSEA	DESeq generated file with normalized read counts in GCT format. Compatible with GSEA
gene_expr_heatmap	File (Optional) [PNG]	Heatmap of the 30 most highly expressed features	Heatmap showing the expression data of the 30 most highly expressed features grouped by isoforms, genes or common TSS, based on the variance stabilisation transformed data
plot_lfc_vs_mean_pdf	File (Optional) [PDF]	Plot of normalised mean versus log2 fold change	Plot of the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples
gene_expr_heatmap_pdf	File (Optional) [PDF]	Heatmap of the 30 most highly expressed features	Heatmap showing the expression data of the 30 most highly expressed features grouped by isoforms, genes or common TSS, based on the variance stabilisation transformed data
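
For downstream checks, the phenotypes_file and read_counts_file outputs can be read back with a few lines of Python. The sketch below assumes the standard GSEA layouts (a categorical CLS file with three lines, and a GCT file starting with `#1.2`); whether this workflow's files match those layouts in every detail has not been verified here, and the function names are hypothetical.

```python
def parse_cls(text):
    """Parse a categorical CLS file into (class_names, per_sample_labels)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Line 1: <number of samples> <number of classes> 1
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    # Line 2: "#" followed by the class names
    class_names = lines[1].lstrip("#").split()
    # Line 3: one class label per sample
    labels = lines[2].split()
    if len(class_names) != n_classes or len(labels) != n_samples:
        raise ValueError("CLS header counts do not match the file contents")
    return class_names, labels


def parse_gct(text):
    """Parse a GCT v1.2 file into (sample_names, {feature_name: values})."""
    lines = text.splitlines()
    if lines[0].strip() != "#1.2":
        raise ValueError("not a GCT v1.2 file")
    # Line 2: <number of data rows> <number of samples>
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    # Line 3: NAME, Description, then one column per sample
    samples = lines[2].split("\t")[2:]
    data = {}
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        data[fields[0]] = [float(x) for x in fields[2:]]
    if len(samples) != n_cols or len(data) != n_rows:
        raise ValueError("GCT dimensions do not match the file contents")
    return samples, data
```

A useful consistency check is that the number of labels returned by `parse_cls` equals the number of sample columns returned by `parse_gct`, since GSEA pairs the two files by sample order.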

Permalink: https://w3id.org/cwl/view/git/4360fb2e778ecee42e5f78f83b78c65ab3a2b1df/workflows/deseq.cwl