DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test

Workflow: DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test

Fetched 2023-01-09 03:55:35 GMT

Verified with cwltool version 3.1.20221201130942

Runs DESeq2 using LRT (Likelihood Ratio Test)
=============================================

The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero.

The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model.

When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero.

The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). In other words, the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels.

**Technical notes**

1. At least two biological replicates are required for every compared category.
2. The metadata file describes relations between the compared experiments, for example

   ```
   ,time,condition
   DH1,day5,WT
   DH2,day5,KO
   DH3,day7,WT
   DH4,day7,KO
   DH5,day7,KO
   ```

   where `time, condition, day5, day7, WT, KO` should each be a single word (without spaces) and `DH1, DH2, DH3, DH4, DH5` correspond to the experiment aliases set in the **RNA-Seq experiments** input.
3. Design and reduced formulas should start with **~** and include categories or, optionally, their interactions from the metadata file header. See details in the DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test).
4. Contrast should be set based on your metadata file header and available categories in the form `Factor Numerator Denominator`, where `Factor` is a column name from the metadata file, `Numerator` is the category from the metadata file to be used as the numerator in the fold change calculation, and `Denominator` is the category from the metadata file to be used as the denominator in the fold change calculation. For example `condition WT KO`.
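
The metadata and contrast rules in the technical notes can be checked before submitting a run. The following is a minimal Python sketch, not part of the workflow itself: the function names (`read_metadata`, `parse_contrast`, `check_contrast`) are hypothetical, and it assumes that "every compared category" means each of the two levels named in the contrast.

```python
import csv
import io
from collections import Counter

# Example metadata taken from the workflow description.
METADATA = """\
,time,condition
DH1,day5,WT
DH2,day5,KO
DH3,day7,WT
DH4,day7,KO
DH5,day7,KO
"""


def read_metadata(text):
    """Return {alias: {factor: level}} from the text of a CSV metadata file."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    factors = rows[0][1:]  # the first header cell is empty
    return {row[0]: dict(zip(factors, row[1:])) for row in rows[1:]}


def parse_contrast(contrast):
    """Split 'Factor Numerator Denominator' into its three parts."""
    parts = contrast.split()
    if len(parts) != 3:
        raise ValueError("contrast must be 'Factor Numerator Denominator'")
    return tuple(parts)


def check_contrast(metadata, contrast, min_replicates=2):
    """Count replicates for both contrast levels; raise if either is too low.

    Assumption: the replicate rule applies per level of the contrast factor.
    """
    factor, numerator, denominator = parse_contrast(contrast)
    if any(factor not in levels for levels in metadata.values()):
        raise ValueError(f"unknown factor: {factor}")
    counts = Counter(levels[factor] for levels in metadata.values())
    for level in (numerator, denominator):
        if counts[level] < min_replicates:
            raise ValueError(f"{level}: fewer than {min_replicates} replicates")
    return {numerator: counts[numerator], denominator: counts[denominator]}


metadata = read_metadata(METADATA)
print(check_contrast(metadata, "condition WT KO"))
```

With the example metadata, `WT` is carried by `DH1` and `DH3` and `KO` by `DH2`, `DH4` and `DH5`, so the contrast `condition WT KO` passes the check.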


This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
alias	String	Experiment short name/Alias
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multithreading
contrast	String	Contrast to be applied for output, formatted as Factor Numerator Denominator	Contrast to be applied for output, formatted as Factor Numerator Denominator or 'Factor Numerator Denominator'
group_by		Group by	Grouping method for features: isoforms, genes or common tss
metadata_file	File [Textual format]	Metadata file to describe categories. See workflow description for details	Metadata file to describe relation between samples, formatted as CSV/TSV
design_formula	String	Design formula. See workflow description for details	Design formula. Should start with ~. See DESeq2 manual for details
reduced_formula	String	Reduced formula to compare against with the term(s) of interest removed. See workflow description for details	Reduced formula to compare against with the term(s) of interest removed. Should start with ~. See DESeq2 manual for details
expression_files	File[] [CSV]	RNA-Seq experiments	CSV/TSV input files grouped by isoforms
expression_file_names	String[]	RNA-Seq experiments	Aliases for RNA-Seq experiments. The same aliases should be used in metadata file
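
To illustrate how the inputs above fit together, here is a small Python sketch that assembles an input object and prints it as JSON, a format CWL runners accept for job files. Everything specific in it is an assumption made for illustration: the helper `build_job`, the file paths, the alias, the chosen formulas, and the `group_by` value `genes` (the exact accepted symbols are not listed on this page).

```python
import json


def build_job(alias, names, design_formula, reduced_formula, contrast,
              metadata_path, group_by="genes", threads=1):
    """Assemble a job input object using the input IDs from the table above."""
    for formula in (design_formula, reduced_formula):
        if not formula.startswith("~"):
            raise ValueError(f"formula should start with ~: {formula!r}")
    if len(contrast.split()) != 3:
        raise ValueError("contrast must be 'Factor Numerator Denominator'")
    return {
        "alias": alias,
        "threads": threads,
        "contrast": contrast,
        "group_by": group_by,  # assumed value, check the workflow's enum
        "metadata_file": {"class": "File", "path": metadata_path},
        "design_formula": design_formula,
        "reduced_formula": reduced_formula,
        # Hypothetical paths: one expression file per alias, in the same order.
        "expression_files": [
            {"class": "File", "path": f"{name}.csv"} for name in names
        ],
        "expression_file_names": list(names),
    }


job = build_job(
    alias="lrt_condition_effect",          # hypothetical experiment alias
    names=["DH1", "DH2", "DH3", "DH4", "DH5"],
    design_formula="~ time + condition",   # full model
    reduced_formula="~ time",              # condition term removed
    contrast="condition WT KO",
    metadata_path="metadata.csv",          # hypothetical path
)
print(json.dumps(job, indent=2))
```

In this sketch the reduced formula drops `condition`, so the likelihood ratio test asks whether `condition` explains the counts beyond `time`, while the contrast selects which log fold change is reported.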

Steps

ID	Runs	Label	Doc
deseq	../tools/deseq-lrt.cwl (CommandLineTool)	DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test	Runs DESeq2 using LRT (Likelihood Ratio Test) The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. Note: at least two biological replicates are required for every compared category. 
All input CSV/TSV files should have the following header (case-sensitive): <RefseqId,GeneId,Chrom,TxStart,TxEnd,Strand,TotalReads,Rpkm> for CSV, <RefseqId\tGeneId\tChrom\tTxStart\tTxEnd\tStrand\tTotalReads\tRpkm> for TSV. The format of each input file is identified from its extension: .csv - CSV, .tsv - TSV; otherwise CSV is used by default. The row order of the output file corresponds to the row order of the first CSV/TSV file. The output file is always saved in TSV format. The output file includes only the rows shared by all input files, intersected by RefseqId, GeneId, Chrom, TxStart, TxEnd, Strand. Additionally, -LOG10(pval) and -LOG10(padj) are calculated. Example of a CSV metadata file set with --meta: ,time,condition DH1,day5,WT DH2,day5,KO DH3,day7,WT DH4,day7,KO DH5,day7,KO where time, condition, day5, day7, WT, KO should each be a single word (without spaces) and DH1, DH2, DH3, DH4, DH5 correspond to --names (spaces are allowed). --contrast should be set based on your metadata file in the form Factor Numerator Denominator, where Factor is a column name from the metadata file, Numerator is the category from the metadata file to be used as the numerator in the fold change calculation, and Denominator is the category from the metadata file to be used as the denominator in the fold change calculation, for example condition WT KO. If --contrast is set as a single string \"condition WT KO\", it will be split by spaces.
group_isoforms	../tools/group-isoforms-batch.cwl (Workflow)		Workflow runs group-isoforms.cwl tool using scatter for isoforms_file input. genes_filename and common_tss_filename inputs are ignored.
make_volcano_plot	../tools/custom-bash.cwl (CommandLineTool)		Tool to run custom script set as `script` input with arguments from `param`. Default script runs sed command over the input file and exports results to the file with the same name as input's basename
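
The deseq step description mentions two data-handling rules: only rows shared by all input files are kept (in the order of the first file), and -LOG10 values are added for pval and padj. The Python sketch below illustrates both under stated assumptions; it is not the tool's code. The helper names and the sample rows are hypothetical, and returning infinity for a p value of zero is a choice made here, not documented behavior.

```python
import csv
import io
import math

KEY_COLUMNS = ("RefseqId", "GeneId", "Chrom", "TxStart", "TxEnd", "Strand")
HEADER = "RefseqId,GeneId,Chrom,TxStart,TxEnd,Strand,TotalReads,Rpkm\n"

# Hypothetical sample inputs; NM_3 is missing from the second file.
FILE_A = HEADER + (
    "NM_1,GeneA,chr1,100,900,+,50,1.5\n"
    "NM_2,GeneB,chr2,200,800,-,10,0.3\n"
    "NM_3,GeneC,chr3,300,700,+,70,2.1\n"
)
FILE_B = HEADER + (
    "NM_2,GeneB,chr2,200,800,-,12,0.4\n"
    "NM_1,GeneA,chr1,100,900,+,45,1.2\n"
)


def delimiter_for(path):
    """Pick the delimiter from the extension: .tsv is tab, anything else CSV."""
    return "\t" if path.lower().endswith(".tsv") else ","


def read_table(text, delimiter=","):
    """Parse delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def row_key(row):
    """Identify a feature by the six key columns."""
    return tuple(row[column] for column in KEY_COLUMNS)


def intersect_keys(tables):
    """Keys present in every table, ordered as in the first table."""
    shared = {row_key(row) for row in tables[0]}
    for table in tables[1:]:
        shared &= {row_key(row) for row in table}
    return [row_key(row) for row in tables[0] if row_key(row) in shared]


def neg_log10(p):
    """-log10(p); infinity for p == 0 (an assumption of this sketch)."""
    return math.inf if p == 0 else -math.log10(p)


tables = [read_table(FILE_A), read_table(FILE_B)]
kept = intersect_keys(tables)
print([key[0] for key in kept])
print(neg_log10(0.001))
```

Here `NM_1` and `NM_2` survive the intersection in the order of the first file, and a p value of 0.001 maps to a value of about 3 on the volcano plot's vertical axis.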

Outputs

ID	Type	Label	Doc
ma_plot	File [PNG]	Plot of normalised mean versus log2 fold change	Plot of the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples
volcano_plot	File [TSV]	Volcano plot	TSV file with input data to build volcano plot - log2FoldChange vs -LOG10(padj)
diff_expr_file	File [TSV]	Differentially expressed features grouped by isoforms, genes or common TSS	DESeq2 generated file of differentially expressed features grouped by isoforms, genes or common TSS in TSV format
deseq_stderr_log	File [Textual format]	DESeq2 stderr log	DESeq2 stderr log
deseq_stdout_log	File [Textual format]	DESeq2 stdout log	DESeq2 stdout log

Permalink: https://w3id.org/cwl/view/git/581156366f91861bd4dbb5bcb59f67d468b32af3/workflows/deseq-lrt.cwl