Workflow: GSEApy - Gene Set Enrichment Analysis in Python

Fetched 2023-01-10 22:59:09 GMT

GSEAPY: Gene Set Enrichment Analysis in Python

Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

GSEA requires as input an expression dataset that contains expression profiles for multiple samples. The software supports several input file formats for these datasets, but the tab-delimited GCT format is the most common. The first column of a GCT file contains feature identifiers (gene IDs or symbols, for data derived from RNA-Seq experiments). The second column contains a description of the feature; GSEA ignores this column, so it may be filled with “NA”s. Each subsequent column holds one sample's expression values, one value per feature.

There are no hard and fast rules for how a GCT file's expression values are derived. What matters is that they are comparable across features within a sample and across samples. Tools such as DESeq2 can produce properly normalized data (normalized counts) that are compatible with GSEA.
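A minimal sketch of writing such a file from normalized counts. Besides the data columns described above, a GCT version 1.2 file starts with a `#1.2` version line, a line giving the number of features and samples, and a header row; the sketch writes all three. The function name `to_gct` and the gene and sample names are hypothetical, and the values are formatted with Python's general `g` format.

```python
from typing import Dict, List, Sequence


def to_gct(samples: Sequence[str], expression: Dict[str, Sequence[float]]) -> str:
    """Render an expression matrix as GCT 1.2 text.

    `expression` maps each feature identifier (gene ID or symbol) to its
    values, one per sample, in the same order as `samples`.
    """
    lines: List[str] = [
        "#1.2",                                # GCT version line
        f"{len(expression)}\t{len(samples)}",  # number of features, number of samples
        "\t".join(["NAME", "Description", *samples]),
    ]
    for feature, values in expression.items():
        if len(values) != len(samples):
            raise ValueError(
                f"{feature}: expected {len(samples)} values, got {len(values)}"
            )
        # Column 2 is ignored by GSEA, so it is filled with "NA".
        lines.append("\t".join([feature, "NA", *(f"{v:g}" for v in values)]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical normalized counts (e.g. exported from DESeq2).
    normalized_counts = {
        "GENE_A": [120.5, 98.0, 15.2, 12.9],
        "GENE_B": [8.1, 9.4, 60.3, 71.0],
    }
    print(to_gct(["ctrl_1", "ctrl_2", "treat_1", "treat_2"], normalized_counts), end="")
```

Checking the row length per feature catches the most common GCT mistake, a matrix whose rows do not match the declared sample columns.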

Inputs

• seed (Integer, optional): Random seed. Default: None
• alias (String): Experiment short name (alias)
• threads (Integer, optional): Number of threads for the steps that support multithreading
• graphs_count (Integer, optional): Number of top graphs to produce. Default: 20
• phenotypes_file (File, textual format; labeled "DESeq experiment"): Input class vector (phenotype) file in CLS format, the same format GSEA uses
• ranking_metrics (optional; type https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/gseapy.cwl#ranking_metrics/rankingmetrics): Method used to calculate the gene ranking metric. Default: log2_ratio_of_classes
• permutation_type (optional; type https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/gseapy.cwl#permutation_type/permutationtype): Permutation type. Default: gene_set
• read_counts_file (File, GCT/Res format; labeled "DESeq experiment"): Input gene expression dataset file in TXT or GCT format, the same format GSEA uses
• gene_set_database (type https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/gseapy.cwl#gene_set_database/genesetdatabase): Gene set database
• max_gene_set_size (Integer, optional): Maximum number of input genes a gene set may contain; larger sets are excluded. Default: 500
• min_gene_set_size (Integer, optional): Minimum number of input genes a gene set must contain; smaller sets are excluded. Default: 15
• permutation_count (Integer, optional): Number of random permutations used to compute the null distribution of enrichment scores (esnulls). Default: 1000
• ascending_rank_sorting (Boolean, optional): Sort the rank metric in ascending order. Default: False
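The phenotype file, ranking metric, sort order and gene set size bounds work together before any enrichment score is computed. The sketch below is a simplified, hypothetical illustration, not GSEApy's code. It reads a categorical CLS file (line 1: sample count, class count, 1; line 2: `#` followed by class names; line 3: one label per sample), scores each gene as log2(mean of the positive class / mean of the negative class), sorts descending unless ascending sorting is requested, and keeps only gene sets whose overlap with the dataset lies within the size bounds. The function names and the explicit choice of which class is "positive" are assumptions, and it assumes strictly positive expression values.

```python
import math
from typing import Dict, List, Sequence, Set


def parse_cls(text: str) -> List[str]:
    """Return one class label per sample from a categorical CLS file."""
    rows = [line.split() for line in text.strip().splitlines()]
    n_samples, n_classes = int(rows[0][0]), int(rows[0][1])
    if not rows[1] or not rows[1][0].startswith("#"):
        raise ValueError("line 2 of a CLS file should start with '#'")
    labels = rows[2]
    if len(labels) != n_samples or len(set(labels)) != n_classes:
        raise ValueError("CLS header does not match the label line")
    return labels


def log2_ratio_of_classes(
    expression: Dict[str, Sequence[float]], labels: Sequence[str], pos: str, neg: str
) -> Dict[str, float]:
    """Score each gene as log2(mean of `pos` samples / mean of `neg` samples)."""
    scores: Dict[str, float] = {}
    for gene, values in expression.items():
        pos_vals = [v for v, lab in zip(values, labels) if lab == pos]
        neg_vals = [v for v, lab in zip(values, labels) if lab == neg]
        # Assumes strictly positive values; a zero mean makes the ratio undefined.
        ratio = (sum(pos_vals) / len(pos_vals)) / (sum(neg_vals) / len(neg_vals))
        scores[gene] = math.log2(ratio)
    return scores


def rank_genes(scores: Dict[str, float], ascending: bool = False) -> List[str]:
    """Order genes by score; descending unless `ascending` is True."""
    return sorted(scores, key=scores.__getitem__, reverse=not ascending)


def filter_gene_sets(
    gene_sets: Dict[str, Set[str]],
    ranked_genes: Sequence[str],
    min_size: int = 15,
    max_size: int = 500,
) -> Dict[str, Set[str]]:
    """Keep sets whose overlap with the dataset lies within [min_size, max_size]."""
    present = set(ranked_genes)
    kept: Dict[str, Set[str]] = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & present
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


if __name__ == "__main__":
    labels = parse_cls("4 2 1\n# ctrl treat\nctrl ctrl treat treat\n")
    expression = {
        "GENE_A": [100.0, 100.0, 25.0, 25.0],
        "GENE_B": [10.0, 10.0, 40.0, 40.0],
        "GENE_C": [5.0, 5.0, 5.0, 5.0],
    }
    scores = log2_ratio_of_classes(expression, labels, pos="treat", neg="ctrl")
    ranked = rank_genes(scores)
    print(ranked)  # ['GENE_B', 'GENE_C', 'GENE_A']
    sets = {"SET_1": {"GENE_A", "GENE_B"}, "SET_2": {"GENE_C", "GENE_X"}}
    print(sorted(filter_gene_sets(sets, ranked, min_size=2)))  # ['SET_1']
```

Filtering on the overlap rather than the raw set size mirrors the wording of the size inputs, which count input genes present in each gene set.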

Steps

• run_gseapy: runs ../tools/gseapy.cwl (CommandLineTool).
• convert_to_tsv: runs ../tools/custom-bash.cwl (CommandLineTool). Runs the custom script supplied as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename.
• rename_enrichment_plots: runs ../tools/rename.cwl (CommandLineTool). Renames `source_file` to `target_filename`. `target_filename` must be a string; if it is a full path, only its basename is used. If a BAI file is present, it is renamed too.
• compress_enrichment_plots: runs ../tools/tar-compress.cwl (CommandLineTool). Compresses the input directory into a tar.gz archive.
• rename_enrichment_heatmaps: runs ../tools/rename.cwl (CommandLineTool). Renames `source_file` to `target_filename`. `target_filename` must be a string; if it is a full path, only its basename is used. If a BAI file is present, it is renamed too.
• compress_enrichment_heatmaps: runs ../tools/tar-compress.cwl (CommandLineTool). Compresses the input directory into a tar.gz archive.
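The rename and compress steps amount to ordinary file operations. A hypothetical Python equivalent, assuming the renamed item stays in its original directory (the CWL tools actually stage outputs in their own working directories) and that the archive is rooted at the directory name; the BAI handling is omitted because it does not apply to plot or heatmap directories. The function names are illustrative only.

```python
import os
import tarfile
from pathlib import Path


def rename(source_file: str, target_filename: str) -> Path:
    """Rename `source_file` in place; only the basename of `target_filename` is used."""
    src = Path(source_file)
    target = src.with_name(os.path.basename(target_filename))
    src.rename(target)
    return target


def compress_directory(directory: str) -> Path:
    """Pack `directory` into a sibling <name>.tar.gz archive rooted at the directory name."""
    src = Path(directory)
    archive = src.with_name(src.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)  # adds the directory recursively
    return archive


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        plots = Path(tmp, "gseapy_plots")  # hypothetical GSEApy output directory
        plots.mkdir()
        (plots / "SET_1.pdf").write_bytes(b"placeholder")
        renamed = rename(str(plots), "/ignored/dir/experiment_enrichment_plots")
        archive = compress_directory(str(renamed))
        with tarfile.open(str(archive)) as tar:
            print(sorted(tar.getnames()))
```

Renaming before compressing means the archive's top-level directory already carries the final name, so unpacking it does not produce a generic GSEApy folder name.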

Outputs

• gseapy_stderr_log (File, textual format): GSEApy stderr log
• gseapy_stdout_log (File, textual format): GSEApy stdout log
• gseapy_enrichment_plots (File): Compressed TAR with enrichment plots
• gseapy_enrichment_report (File, TSV): Enrichment report
• gseapy_enrichment_heatmaps (File): Compressed TAR with enrichment heatmaps

Permalink: https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/gseapy.cwl