GSEApy - Gene Set Enrichment Analysis in Python - Common Workflow Language Viewer

Workflow: GSEApy - Gene Set Enrichment Analysis in Python

Fetched 2023-08-11 17:44:03 GMT

Verified with cwltool version 3.1.20230201224320

GSEAPY: Gene Set Enrichment Analysis in Python
==============================================

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

GSEA requires as input an expression dataset that contains expression profiles for multiple samples. Although the software supports several input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of a GCT file contains feature identifiers (gene IDs or symbols, in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; GSEA ignores this column, so it may be filled with “NA”s. Each subsequent column holds the expression values of one sample, one value per feature.

There are no hard and fast rules for how a GCT file's expression values are derived. What matters is that they are comparable across features within a sample and comparable across samples. Tools such as DESeq2 can produce properly normalized data (normalized counts) that are compatible with GSEA.

Documents
==============================================

- GSEA Home Page: https://www.gsea-msigdb.org/gsea/index.jsp
- Results Interpretation: https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Interpreting_GSEA_Results
- GSEApy FAQ: https://gseapy.readthedocs.io/en/latest/faq.html
- GSEAPY Docs: https://gseapy.readthedocs.io/en/latest/introduction.html

References
==============================================

- Subramanian, Tamayo, et al. (2005). PNAS. https://www.pnas.org/content/102/43/15545
- Mootha, Lindgren, et al. (2003). Nature Genetics. http://www.nature.com/ng/journal/v34/n3/abs/ng1180.html
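The GCT layout described above can be illustrated with a small standard-library parser. This is a sketch, not GSEApy's own reader: `parse_gct` is a hypothetical helper, and it assumes the GCT 1.2 layout, in which the table is preceded by a `#1.2` version line and a line giving the number of feature rows and sample columns. Plain tab-delimited txt files, which the workflow also accepts, lack those two lines and would need the first two checks removed.

```python
import csv


def parse_gct(text):
    """Parse a GCT 1.2 string into (sample_names, {feature_id: [values]}).

    Hypothetical helper for illustration; assumes the '#1.2' version line
    and the 'rows<TAB>columns' dimension line precede the table.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#1.2"):
        raise ValueError("expected a '#1.2' GCT version line")
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])

    reader = csv.reader(lines[2:], delimiter="\t")
    header = next(reader)
    samples = header[2:]  # columns 1 and 2 are the feature ID and Description

    data = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        # row[1] is the Description column, which GSEA ignores
        data[row[0]] = [float(v) for v in row[2:]]

    if len(samples) != n_cols or len(data) != n_rows:
        raise ValueError("dimension line does not match the table contents")
    return samples, data
```

The Description column is read but discarded, matching the statement that GSEA ignores it; only the identifiers and per-sample values are kept.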


This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
seed	Integer (Optional)	Random seed. Default: None	Random seed. Default: None
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multithreading
alias_name	String	Experiment short name/Alias
graphs_count	Integer (Optional)	Number of top graphs produced	Number of top graphs produced. Default: 20
phenotypes_file	File [Textual format]	DESeq experiment	Input class vector (phenotype) file in CLS format, as used by GSEA
ranking_metrics	https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#ranking_metrics/rankingmetrics (Optional)	Method used to calculate the ranking metric	Method used to calculate the ranking metric (correlation of each gene with the phenotype). Default: log2_ratio_of_classes
permutation_type	https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#permutation_type/permutationtype (Optional)	Permutation type	Permutation type. Default: gene_set
read_counts_file	File [GCT/Res format]	DESeq experiment	Input gene expression dataset file in txt or GCT format, as used by GSEA
gene_set_database	https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#gene_set_database/genesetdatabase (Optional)	Gene set database. Ignored if a GMT file is provided	Gene set database. Ignored if gene_set_database_file is provided
max_gene_set_size	Integer (Optional)	Maximum gene set size	Maximum number of genes a gene set may contain, counting only genes present in the input data. Default: 500
min_gene_set_size	Integer (Optional)	Minimum gene set size	Minimum number of genes a gene set must contain, counting only genes present in the input data. Default: 15
permutation_count	Integer (Optional)	Number of random permutations	Number of random permutations used to build the null distribution of enrichment scores (esnulls). Default: 1000
ascending_rank_sorting	Boolean (Optional)	Ascending rank metric sorting order	Sort genes by the rank metric in ascending order. Default: False
gene_set_database_file	File (Optional) [Textual format]	Gene set database file in GMT format	Gene set database file in GMT (Gene Matrix Transposed) format
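To make several of the inputs above concrete, the following standard-library sketch shows what they control: the `log2_ratio_of_classes` ranking metric, `ascending_rank_sorting`, the `min_gene_set_size`/`max_gene_set_size` filter, and a `gene_set` permutation test whose size is set by `permutation_count` and whose randomness is fixed by `seed`. All function names are hypothetical and this is not GSEApy's implementation. The enrichment score follows the weighted running-sum statistic of Subramanian et al. (2005) with weight 1; the nominal p-value is simplified, and the normalization to NES and the FDR estimation that GSEApy reports are omitted.

```python
import math
import random


def log2_ratio_of_classes(values, labels, pos, neg):
    """log2(mean of `pos` samples / mean of `neg` samples) for one gene.

    Assumes strictly positive values (e.g. normalized counts plus a pseudocount).
    """
    a = [v for v, c in zip(values, labels) if c == pos]
    b = [v for v, c in zip(values, labels) if c == neg]
    return math.log2((sum(a) / len(a)) / (sum(b) / len(b)))


def rank_genes(metrics, ascending=False):
    """Sort {gene: metric} into [(gene, metric)]; descending unless ascending=True."""
    return sorted(metrics.items(), key=lambda kv: kv[1], reverse=not ascending)


def filter_gene_sets(gene_sets, data_genes, min_size=15, max_size=500):
    """Keep gene sets whose overlap with the data has min_size..max_size genes."""
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & set(data_genes)
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score over a ranked [(gene, metric)] list.

    Assumes the set contains at least one, but not all, of the ranked genes.
    """
    hits = [g in gene_set for g, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(m) ** weight for (_, m), h in zip(ranked, hits) if h)
    running, es = 0.0, 0.0
    for (_, m), h in zip(ranked, hits):
        # Step up by the gene's weighted metric on a hit, down by 1/n_miss otherwise
        running += abs(m) ** weight / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # ES is the signed maximum deviation from zero
    return es


def gene_set_permutation(ranked, gene_set, n_perm=1000, seed=None):
    """Observed ES and a simplified nominal p-value from a null distribution
    ("esnulls") of ES values for random gene sets of the same size."""
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    size = len(set(gene_set) & set(genes))
    observed = enrichment_score(ranked, gene_set)
    esnulls = [enrichment_score(ranked, set(rng.sample(genes, size)))
               for _ in range(n_perm)]
    # Compare only against null scores with the same sign as the observed ES
    same_sign = [e for e in esnulls if (e >= 0) == (observed >= 0)]
    extreme = sum(abs(e) >= abs(observed) for e in same_sign)
    return observed, extreme / max(len(same_sign), 1)
```

For example, with five genes ranked A (3.0), B (2.0), C (1.0), D (-1.0), E (-2.0), the set {A, B} scores an ES of 1.0 because both of its genes sit at the top of the list, while {D, E} scores -1.0.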

Steps

ID	Runs	Label	Doc
run_gseapy	../tools/gseapy.cwl (CommandLineTool)		GSEAPY: Gene Set Enrichment Analysis using Python
convert_to_tsv	../tools/custom-bash.cwl (CommandLineTool)		Runs the custom script given as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file with the same name as the input's basename
report_summary	../tools/gseapy-reportsummary.cwl (CommandLineTool)		Runs a custom Bash command that processes the GSEAPY report and its associated input files (gene_counts and phenotypes) to produce a summary report of GSEA results, similar to the one at http://diverge.hunter.cuny.edu/~weigang/silac-chromatin-gsea/
rename_enrichment_plots	../tools/rename.cwl (CommandLineTool)		Renames `source_file` to `target_filename`. `target_filename` must be set as a string; if it is a full path, only its basename is used. If a BAI file is present, it is renamed too
compress_enrichment_plots	../tools/tar-compress.cwl (CommandLineTool)	TAR compress	TAR compress: creates a compressed TAR file from a folder
rename_enrichment_heatmaps	../tools/rename.cwl (CommandLineTool)		Renames `source_file` to `target_filename`. `target_filename` must be set as a string; if it is a full path, only its basename is used. If a BAI file is present, it is renamed too
compress_enrichment_heatmaps	../tools/tar-compress.cwl (CommandLineTool)	TAR compress	TAR compress: creates a compressed TAR file from a folder
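The post-processing steps listed above (convert_to_tsv, then renaming and TAR-compressing the plot and heatmap folders) can be approximated in Python as follows. This is a sketch under stated assumptions, not the workflow's actual scripts: the default sed script is not reproduced in this page, so `csv_to_tsv` assumes the GSEApy report is comma-separated (as the step name suggests), gzip compression is assumed for the TAR, and all function and folder names are hypothetical.

```python
import csv
import os
import tarfile


def csv_to_tsv(src, dst):
    """Rewrite a comma-separated file as tab-separated, respecting CSV quoting."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in csv.reader(fin):
            writer.writerow(row)
    return dst


def rename_and_compress(source_dir, target_filename, archive_path):
    """Rename `source_dir` to the basename of `target_filename` (a full path is
    reduced to its basename), then pack it into a gzip-compressed TAR."""
    new_name = os.path.basename(os.path.normpath(target_filename))
    renamed = os.path.join(os.path.dirname(os.path.abspath(source_dir)), new_name)
    os.rename(source_dir, renamed)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(renamed, arcname=new_name)  # archive members start with new_name/
    return archive_path
```

Using the csv module rather than a character-level substitution keeps quoted fields that themselves contain commas intact in the TSV output.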

Outputs

ID	Type	Label	Doc
summary_report	File [TIDE TXT]	Enrichment report	Enrichment report
gseapy_stderr_log	File [Textual format]	GSEApy stderr log	GSEApy stderr log
gseapy_stdout_log	File [Textual format]	GSEApy stdout log	GSEApy stdout log
summary_stderr_log	File [Textual format]	stderr log	stderr log
summary_stdout_log	File [Textual format]	stdout log	stdout log
gseapy_enrichment_plots	File	Compressed TAR with enrichment plots	Compressed TAR with enrichment plots
gseapy_enrichment_report	File [TSV]	Enrichment report	Enrichment report
gseapy_enrichment_heatmaps	File	Compressed TAR with enrichment heatmaps	Compressed TAR with enrichment heatmaps

Permalink: https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl