Workflow: DESeq - differential gene expression analysis

Differential gene expression analysis
=====================================

Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

DESeq1
------

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://bioconductor.org/packages/release/bioc/html/DESeq.html), as an R/Bioconductor package.

DESeq2
------

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve the stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
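For reference, the count models summarized above can be written out as follows. This is a sketch in the notation of the DESeq and DESeq2 publications, not something extracted from this workflow's tool: $K_{ij}$ is the count for feature $i$ in sample $j$, $s_j$ is the sample's size factor, $\rho(j)$ is the condition of sample $j$, $v_\rho$ is the smooth variance function fitted by local regression, $\alpha_i$ is the per-feature dispersion, and $x_{jr}$, $\beta_{ir}$ are the design-matrix entries and coefficients (log2 fold changes) of the DESeq2 GLM.

```latex
% DESeq: negative binomial with a condition-specific mean and a fitted variance function
K_{ij} \sim \mathrm{NB}\left(\mu_{ij}, \sigma^2_{ij}\right), \qquad
\mu_{ij} = q_{i,\rho(j)}\, s_j, \qquad
\sigma^2_{ij} = \mu_{ij} + s_j^2\, v_{\rho(j)}\!\left(q_{i,\rho(j)}\right)

% Size factors by the median-of-ratios method, m = number of samples
\hat{s}_j = \operatorname*{median}_i \frac{K_{ij}}{\left(\prod_{v=1}^{m} K_{iv}\right)^{1/m}}

% DESeq2: GLM with a per-feature dispersion
K_{ij} \sim \mathrm{NB}\left(\mu_{ij}, \alpha_i\right), \qquad
\mu_{ij} = s_j\, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr}\, \beta_{ir}, \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i\, \mu_{ij}^2
```

In DESeq2, shrinkage pulls each $\alpha_i$ toward a fitted dispersion-mean trend and shrinks the $\beta_{ir}$ toward zero, which stabilizes estimates for features with low counts or high dispersion. In practice, features with a zero count in any sample are left out of the size-factor median.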
Inputs
ID | Type | Title | Doc |
---|---|---|---|
alias | String | Experiment short name/Alias | |
threads | Integer (Optional) | Number of threads | Number of threads for those steps that support multithreading |
group_by | | Group by | Grouping method for features: isoforms, genes or common TSS |
batch_file | File (Optional) [Textual format] | Headerless TSV/CSV file for multi-factor analysis. First column: experiment names from conditions 1 and 2; second column: batch name | Metadata file for multi-factor analysis. Headerless TSV/CSV file. First column: names from --ua and --ta; second column: batch name. Default: None |
rpkm_cutoff | Float (Optional) | Minimum RPKM cutoff. Applied before running DESeq | Minimum threshold for RPKM filtering. Default: 5 |
alias_cond_1 | String (Optional) | Alias for condition 1, aka 'untreated' (letters and numbers only) | Name to be displayed for condition 1, aka 'untreated' (letters and numbers only) |
alias_cond_2 | String (Optional) | Alias for condition 2, aka 'treated' (letters and numbers only) | Name to be displayed for condition 2, aka 'treated' (letters and numbers only) |
rpkm_genes_cond_1 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 1, aka 'untreated') | CSV/TSV input files grouped by genes (condition 1, aka 'untreated') |
rpkm_genes_cond_2 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 2, aka 'treated') | CSV/TSV input files grouped by genes (condition 2, aka 'treated') |
sample_names_cond_1 | String[] (Optional) | Sample names for RNA-Seq experiments (condition 1, aka 'untreated') | Aliases for RNA-Seq experiments (condition 1, aka 'untreated') used in the legends of generated plots. Order corresponds to rpkm_isoforms_cond_1 |
sample_names_cond_2 | String[] (Optional) | Sample names for RNA-Seq experiments (condition 2, aka 'treated') | Aliases for RNA-Seq experiments (condition 2, aka 'treated') used in the legends of generated plots. Order corresponds to rpkm_isoforms_cond_2 |
rpkm_isoforms_cond_1 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 1, aka 'untreated') | CSV/TSV input files grouped by isoforms (condition 1, aka 'untreated') |
rpkm_isoforms_cond_2 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 2, aka 'treated') | CSV/TSV input files grouped by isoforms (condition 2, aka 'treated') |
rpkm_common_tss_cond_1 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 1, aka 'untreated') | CSV/TSV input files grouped by common TSS (condition 1, aka 'untreated') |
rpkm_common_tss_cond_2 | File[] (Optional) [CSV] | RNA-Seq experiments (condition 2, aka 'treated') | CSV/TSV input files grouped by common TSS (condition 2, aka 'treated') |
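One way to supply these inputs is a CWL job file. The sketch below is a minimal example for a gene-level comparison, under several assumptions: every path, sample name and batch label is a hypothetical placeholder, the `group_by` value `"genes"` is assumed to match the workflow's symbol spelling, and the batch file's first column is assumed to hold the same names as `sample_names_cond_1`/`sample_names_cond_2`. The helper names (`cwl_file`, `format_batch_file`, `build_job`) are illustrative only.

```python
"""Sketch: build a CWL job file (and batch file) for deseq.cwl.

All paths, sample names and batch labels are hypothetical placeholders.
"""
import json
from pathlib import Path


def cwl_file(path: str) -> dict:
    """Wrap a path as a CWL File object."""
    return {"class": "File", "location": path}


def format_batch_file(batches: dict) -> str:
    """Headerless TSV: experiment name, tab, batch name; one line per experiment."""
    return "".join(f"{experiment}\t{batch}\n" for experiment, batch in batches.items())


def build_job(untreated: list, treated: list, batch_path: str) -> dict:
    """Assemble workflow inputs for a gene-level run (group_by assumed to be 'genes')."""
    return {
        "alias": "example_deseq",
        "threads": 4,
        "group_by": "genes",
        "rpkm_cutoff": 5.0,  # same as the documented default
        "alias_cond_1": "untreated",
        "alias_cond_2": "treated",
        "rpkm_genes_cond_1": [cwl_file(p) for p in untreated],
        "rpkm_genes_cond_2": [cwl_file(p) for p in treated],
        # Sample names derived from file names (a hypothetical convention).
        "sample_names_cond_1": [Path(p).stem for p in untreated],
        "sample_names_cond_2": [Path(p).stem for p in treated],
        "batch_file": cwl_file(batch_path),
    }


if __name__ == "__main__":
    untreated = ["ctrl_rep1.csv", "ctrl_rep2.csv"]
    treated = ["drug_rep1.csv", "drug_rep2.csv"]
    Path("batches.tsv").write_text(format_batch_file({
        "ctrl_rep1": "batch_a", "ctrl_rep2": "batch_b",
        "drug_rep1": "batch_a", "drug_rep2": "batch_b",
    }))
    job = build_job(untreated, treated, "batches.tsv")
    Path("deseq_job.json").write_text(json.dumps(job, indent=2))
```

Running the script writes `batches.tsv` and `deseq_job.json`; the job file could then be passed to a CWL runner, for example `cwltool deseq.cwl deseq_job.json`. Relative `location` values are resolved against the job file's directory.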
Steps
ID | Runs | Label | Doc |
---|---|---|---|
deseq | ../tools/deseq-advanced.cwl (CommandLineTool) | | Tool runs a DESeq/DESeq2 script similar to the original one from BioWardrobe. The untreated_files and treated_files input files should have the following case-sensitive header: `RefseqId,GeneId,Chrom,TxStart,TxEnd,Strand,TotalReads,Rpkm` for CSV, or `RefseqId\tGeneId\tChrom\tTxStart\tTxEnd\tStrand\tTotalReads\tRpkm` for TSV |
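Because the header check is case-sensitive, a quick pre-flight check can catch mismatches before the workflow runs. This is a minimal sketch, assuming that files ending in `.tsv` are tab-separated and everything else is comma-separated (a hypothetical convention, not one the tool documents); `read_header` and `check_header` are illustrative names.

```python
"""Sketch: verify that input tables carry the header the deseq step expects."""
import csv
import sys
from pathlib import Path

# Column names and order as documented for untreated_files / treated_files.
EXPECTED_HEADER = ["RefseqId", "GeneId", "Chrom", "TxStart", "TxEnd",
                   "Strand", "TotalReads", "Rpkm"]


def read_header(path: Path) -> list:
    """Return the first row; tab-delimited for .tsv files, comma-delimited otherwise."""
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def check_header(header: list) -> list:
    """Return a list of problems; an empty list means the header matches exactly."""
    problems = []
    if len(header) != len(EXPECTED_HEADER):
        problems.append(f"expected {len(EXPECTED_HEADER)} columns, found {len(header)}")
    for position, (found, expected) in enumerate(zip(header, EXPECTED_HEADER)):
        if found != expected:  # exact, case-sensitive comparison
            problems.append(f"column {position + 1}: expected {expected!r}, found {found!r}")
    return problems


if __name__ == "__main__":
    for name in sys.argv[1:]:
        issues = check_header(read_header(Path(name)))
        print(name, "OK" if not issues else "; ".join(issues))
```

For example, `python check_headers.py ctrl_rep1.csv drug_rep1.tsv` would print one status line per file; a header spelled `refseqid` is reported because the comparison does not ignore case.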
Outputs
ID | Type | Label | Doc |
---|---|---|---|
plot_pca | File (Optional) [PNG] | PCA plot for variance-stabilized count data | PCA plot for variance-stabilized count data. After variance stabilization, values are approximately homoskedastic (have constant variance across the range of mean values) |
plot_pca_pdf | File (Optional) [PDF] | PCA plot for variance-stabilized count data | PCA plot for variance-stabilized count data. After variance stabilization, values are approximately homoskedastic (have constant variance across the range of mean values) |
diff_expr_file | File [TSV] | Differentially expressed features grouped by isoforms, genes or common TSS | DESeq-generated file of differentially expressed features grouped by isoforms, genes or common TSS, in TSV format |
phenotypes_file | File [Textual format] | Phenotype data file in CLS format. Compatible with GSEA | DESeq-generated file with phenotypes in CLS format. Compatible with GSEA |
deseq_stderr_log | File [Textual format] | DESeq stderr log | DESeq stderr log |
deseq_stdout_log | File [Textual format] | DESeq stdout log | DESeq stdout log |
plot_lfc_vs_mean | File (Optional) [PNG] | Plot of normalized mean versus log2 fold change | Plot of the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples |
read_counts_file | File [GCT/Res format] | Normalized read counts in GCT format. Compatible with GSEA | DESeq-generated file with normalized read counts in GCT format. Compatible with GSEA |
gene_expr_heatmap | File (Optional) [PNG] | Heatmap of the 30 most highly expressed features | Heatmap showing the expression of the 30 most highly expressed features grouped by isoforms, genes or common TSS, based on variance-stabilized data |
plot_lfc_vs_mean_pdf | File (Optional) [PDF] | Plot of normalized mean versus log2 fold change | Plot of the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples |
gene_expr_heatmap_pdf | File (Optional) [PDF] | Heatmap of the 30 most highly expressed features | Heatmap showing the expression of the 30 most highly expressed features grouped by isoforms, genes or common TSS, based on variance-stabilized data |
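`phenotypes_file` and `read_counts_file` use the GSEA CLS and GCT text formats. The sketch below shows the general layout of each format (categorical CLS and GCT version 1.2) so the outputs can be inspected or reproduced by hand; it is not this workflow's writer, and the sample names, class labels and `"na"` description are hypothetical placeholders.

```python
"""Sketch: GSEA-style CLS (phenotypes) and GCT 1.2 (expression) text layouts."""


def format_cls(labels: list) -> str:
    """Categorical CLS: '<samples> <classes> 1', '# class names', one label per sample."""
    classes = list(dict.fromkeys(labels))  # unique class names, first-seen order
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )


def format_gct(samples: list, rows: dict) -> str:
    """GCT 1.2: version line, '<rows>\\t<columns>' line, header, one row per feature.

    `rows` maps a feature name to (description, [one value per sample]).
    """
    lines = ["#1.2", f"{len(rows)}\t{len(samples)}",
             "\t".join(["NAME", "Description", *samples])]
    for name, (description, values) in rows.items():
        if len(values) != len(samples):
            raise ValueError(f"{name}: expected {len(samples)} values, got {len(values)}")
        lines.append("\t".join([name, description, *(f"{v:g}" for v in values)]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    samples = ["ctrl_rep1", "ctrl_rep2", "drug_rep1", "drug_rep2"]
    print(format_cls(["untreated", "untreated", "treated", "treated"]), end="")
    print(format_gct(samples, {"GeneA": ("na", [10.5, 12.0, 30.25, 28.0])}), end="")
```

The sample order in the CLS label line must match the column order in the GCT file, since GSEA pairs them by position.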
https://w3id.org/cwl/view/git/4360fb2e778ecee42e5f78f83b78c65ab3a2b1df/workflows/deseq.cwl