Workflow: GSEApy - Gene Set Enrichment Analysis in Python

Fetched 2023-08-11 17:44:03 GMT

GSEAPY: Gene Set Enrichment Analysis in Python
==============================================

Gene Set Enrichment Analysis is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column. It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA.

Documents
==============================================

- GSEA Home Page: https://www.gsea-msigdb.org/gsea/index.jsp
- Results Interpretation: https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Interpreting_GSEA_Results
- GSEA User Guide: https://gseapy.readthedocs.io/en/latest/faq.html
- GSEAPY Docs: https://gseapy.readthedocs.io/en/latest/introduction.html

References
==============================================

- Subramanian, Tamayo, et al. (2005, PNAS), https://www.pnas.org/content/102/43/15545
- Mootha, Lindgren, et al. (2003, Nature Genetics), http://www.nature.com/ng/journal/v34/n3/abs/ng1180.html
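The GCT layout described in the overview (identifier column, ignored description column, one column per sample) can be illustrated with a short Python sketch. This is not part of the workflow: the helper names `write_gct` and `read_gct` and the sample data are hypothetical, and the sketch assumes the GCT version 1.2 header (a `#1.2` line followed by the row and sample counts).

```python
import csv
import io


def write_gct(genes, samples, values, handle):
    """Write expression values in a GCT v1.2 layout (illustrative helper).

    genes   -- feature identifiers, used for the first column
    samples -- sample names, one expression column each
    values  -- one row per gene, each holding len(samples) numbers
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                      # format version line
    writer.writerow([len(genes), len(samples)])    # number of rows and samples
    writer.writerow(["NAME", "Description"] + list(samples))
    for gene, row in zip(genes, values):
        # The description column is ignored by GSEA, so "NA" is sufficient.
        writer.writerow([gene, "NA"] + list(row))


def read_gct(handle):
    """Return (samples, {gene: [float, ...]}) from a GCT v1.2 stream."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)                                   # skip the "#1.2" line
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    header = next(reader)
    samples = header[2:2 + n_cols]
    table = {}
    for record in reader:
        if record:                                 # ignore blank lines
            table[record[0]] = [float(x) for x in record[2:2 + n_cols]]
    if len(table) != n_rows:
        raise ValueError("row count does not match the GCT header")
    return samples, table


# Hypothetical two-gene, two-sample dataset used only for demonstration.
buffer = io.StringIO()
write_gct(["TP53", "MYC"], ["ctrl_1", "treat_1"],
          [[10.5, 20.0], [3.0, 1.5]], buffer)
gct_text = buffer.getvalue()
samples, table = read_gct(io.StringIO(gct_text))
```

Reading the file back with `read_gct` is a cheap way to confirm that the header counts agree with the body before the file is handed to the workflow as `read_counts_file`.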


Inputs

ID Type Title Doc
seed Integer (Optional) Random seed. Default: None

Random seed. Default: None

threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

alias_name String Experiment short name/Alias
graphs_count Integer (Optional) Number of top graphs produced

Number of top graphs produced. Default: 20

phenotypes_file File [Textual format] DESeq experiment

Input class vector (phenotype) file in CLS format, the same format that GSEA uses

ranking_metrics https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#ranking_metrics/rankingmetrics (Optional) Methods to calculate correlations of ranking metrics

Methods to calculate correlations of ranking metrics. Default: log2_ratio_of_classes

permutation_type https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#permutation_type/permutationtype (Optional) Permutation type

Permutation type. Default: gene_set

read_counts_file File [GCT/Res format] DESeq experiment

Input gene expression dataset file in txt or gct format, the same formats that GSEA uses

gene_set_database https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl#gene_set_database/genesetdatabase (Optional) Gene set database. Ignored if a GMT file is provided

Gene set database

max_gene_set_size Integer (Optional) Maximum number of input genes present in a gene set

Maximum number of input genes present in a gene set. Default: 500

min_gene_set_size Integer (Optional) Minimum number of input genes present in a gene set

Minimum number of input genes present in a gene set. Default: 15

permutation_count Integer (Optional) Number of random permutations

Number of random permutations used to calculate the null enrichment scores (esnulls). Default: 1000

ascending_rank_sorting Boolean (Optional) Ascending rank metric sorting order

Ascending rank metric sorting order. Default: False

gene_set_database_file File (Optional) [Textual format] Gene set database file in GMT format

Gene set database file in GMT (Gene Matrix Transposed) format
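How several of the inputs above fit together can be sketched in plain Python: the CLS file assigns samples to classes, `ranking_metrics` and `ascending_rank_sorting` determine the gene order, and the size limits are applied to the genes that a GMT gene set shares with the dataset. This is a minimal illustration under stated assumptions, not GSEApy's implementation. All function names and data are hypothetical, the third CLS line is assumed to spell out the class names, expression values are assumed to be positive, and `log2_ratio_of_classes` is assumed to mean the log2 of the ratio of class means.

```python
import math


def parse_cls(text):
    """Parse a categorical CLS file into (class_names, per-sample labels)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    class_names = lines[1].lstrip("#").split()
    labels = lines[2].split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header does not match its contents")
    return class_names, labels


def log2_ratio_ranking(expression, labels, pos, neg, ascending=False):
    """Rank genes by log2(mean of class `pos` / mean of class `neg`)."""
    scores = {}
    for gene, values in expression.items():
        pos_vals = [v for v, lab in zip(values, labels) if lab == pos]
        neg_vals = [v for v, lab in zip(values, labels) if lab == neg]
        mean_pos = sum(pos_vals) / len(pos_vals)
        mean_neg = sum(neg_vals) / len(neg_vals)
        scores[gene] = math.log2(mean_pos / mean_neg)
    # Descending order unless ascending sorting is requested.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=not ascending)


def parse_gmt(text):
    """Parse GMT text: set name, description, then one gene per tab field."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def filter_by_size(gene_sets, dataset_genes, min_size=15, max_size=500):
    """Keep sets whose overlap with the dataset lies within the size limits."""
    dataset_genes = set(dataset_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = [g for g in genes if g in dataset_genes]
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Hypothetical demonstration data.
cls_text = "4 2 1\n# treated control\ntreated treated control control\n"
expression = {
    "GENE_A": [8.0, 8.0, 2.0, 2.0],
    "GENE_B": [1.0, 1.0, 4.0, 4.0],
    "GENE_C": [3.0, 3.0, 3.0, 3.0],
}
gmt_text = "SET_1\tdemo set\tGENE_A\tGENE_B\tGENE_X\nSET_2\tdemo set\tGENE_C\n"

class_names, labels = parse_cls(cls_text)
ranking = log2_ratio_ranking(expression, labels, pos="treated", neg="control")
gene_sets = parse_gmt(gmt_text)
kept = filter_by_size(gene_sets, expression, min_size=2, max_size=500)
```

In this toy run `SET_2` is dropped because only one of its genes occurs in the dataset, which is below the lowered `min_size` of 2. With the workflow defaults of 15 and 500, both demonstration sets would be filtered out.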

Steps

ID Runs Label Doc
run_gseapy
../tools/gseapy.cwl (CommandLineTool)

GSEAPY: Gene Set Enrichment Analysis using Python
==============================================


convert_to_tsv
../tools/custom-bash.cwl (CommandLineTool)

Runs the custom script given in the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename

report_summary
../tools/gseapy-reportsummary.cwl (CommandLineTool)

Runs a custom Bash command that processes the GSEAPY report and its associated input files (gene_counts and phenotypes) to produce a summary report of GSEA results, similar to the one at http://diverge.hunter.cuny.edu/~weigang/silac-chromatin-gsea/

rename_enrichment_plots
../tools/rename.cwl (CommandLineTool)

Renames `source_file` to `target_filename`. The `target_filename` input must be a string. If it is a full path, only the basename is used. If a BAI file is present, it is renamed too

compress_enrichment_plots
../tools/tar-compress.cwl (CommandLineTool)
TAR compress

TAR compress
=========================================

Creates compressed TAR file from a folder

rename_enrichment_heatmaps
../tools/rename.cwl (CommandLineTool)

Renames `source_file` to `target_filename`. The `target_filename` input must be a string. If it is a full path, only the basename is used. If a BAI file is present, it is renamed too

compress_enrichment_heatmaps
../tools/tar-compress.cwl (CommandLineTool)
TAR compress

TAR compress
=========================================

Creates compressed TAR file from a folder
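The rename and compress steps listed above package a folder of plots or heatmaps into a single archive. A rough Python equivalent, using only the standard library, might look like the sketch below. The function name `compress_folder`, the file names, and the choice of gzip compression are assumptions made for illustration; the workflow's own tools are CWL command-line wrappers whose exact commands are not shown here.

```python
import os
import tarfile
import tempfile


def compress_folder(folder, archive_path):
    """Pack `folder` into a gzip-compressed TAR and return the archive path."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store entries under the folder's basename rather than its full path.
        archive.add(folder, arcname=os.path.basename(os.path.normpath(folder)))
    return archive_path


# Demonstration with a throwaway directory and a placeholder plot file.
with tempfile.TemporaryDirectory() as workdir:
    plots_dir = os.path.join(workdir, "enrichment_plots")
    os.makedirs(plots_dir)
    with open(os.path.join(plots_dir, "SET_1.png"), "wb") as fh:
        fh.write(b"placeholder")

    archive = compress_folder(
        plots_dir, os.path.join(workdir, "enrichment_plots.tar.gz")
    )
    with tarfile.open(archive, "r:gz") as check:
        archived_names = sorted(check.getnames())
```

Using the folder's basename as `arcname` mirrors the rename step's behaviour of keeping only the basename, so the archive unpacks to a single predictably named directory.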

Outputs

ID Type Label Doc
summary_report File [TIDE TXT] Enrichment report

Enrichment report

gseapy_stderr_log File [Textual format] GSEApy stderr log

GSEApy stderr log

gseapy_stdout_log File [Textual format] GSEApy stdout log

GSEApy stdout log

summary_stderr_log File [Textual format] stderr log

stderr log

summary_stdout_log File [Textual format] stdout log

stdout log

gseapy_enrichment_plots File Compressed TAR with enrichment plots

Compressed TAR with enrichment plots

gseapy_enrichment_report File [TSV] Enrichment report

Enrichment report

gseapy_enrichment_heatmaps File Compressed TAR with enrichment heatmaps

Compressed TAR with enrichment heatmaps

Permalink: https://w3id.org/cwl/view/git/27bee2c853c98af5ce8ace0585b74658adc2e955/workflows/gseapy.cwl