Explore Workflows

Graph Name Retrieved From View
workflow graph PCA - Principal Component Analysis

Principal Component Analysis
--------------

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by an eigendecomposition of the covariance matrix. This is generally the preferred method for numerical accuracy.
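The SVD-based calculation described above can be sketched in a few lines. This is a minimal illustration assuming NumPy and a samples-by-variables matrix; the function name `pca_svd` and the toy data are hypothetical and are not taken from `pca.cwl`.

```python
import numpy as np


def pca_svd(data, scale=False):
    """PCA of a (samples x variables) matrix via SVD of the centered data.

    Hypothetical helper for illustration only, not the workflow's own code.
    """
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                 # center every variable
    if scale:
        x = x / x.std(axis=0, ddof=1)      # optional scaling to unit variance
    # x = U * diag(s) * Vt; the rows of Vt are the principal axes
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s                         # sample coordinates on the components
    explained_variance = s ** 2 / (x.shape[0] - 1)
    return scores, vt, explained_variance


# Toy data (made up): the second variable is strongly correlated with the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
data = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])
scores, components, variance = pca_svd(data)
```

The squared singular values divided by `n - 1` match the eigenvalues of the sample covariance matrix, so both routes describe the same components; the SVD route simply avoids forming the covariance matrix explicitly.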

https://github.com/datirium/workflows.git

Path: workflows/pca.cwl

Branch/Commit ID: 12c29f88855329192bfff977f046990031f04931

workflow graph GSEApy - Gene Set Enrichment Analysis in Python

GSEAPY: Gene Set Enrichment Analysis in Python
==============================================

Gene Set Enrichment Analysis is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.

It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA.
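The column layout described above can be illustrated with a small round trip through a GCT file. This sketch assumes the GCT 1.2 layout (a `#1.2` version line, a line with the row and column counts, then a header row); the helper names `write_gct` and `read_gct`, the gene names, and the sample names are hypothetical, and the code is not part of `gseapy.cwl`.

```python
import csv
import io


def write_gct(features, samples, out):
    """Write (identifier, description, values) rows as a tab-delimited GCT file."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                        # format version line
    writer.writerow([len(features), len(samples)])   # number of features and samples
    writer.writerow(["NAME", "Description", *samples])
    for identifier, description, values in features:
        if len(values) != len(samples):
            raise ValueError(f"{identifier}: expected {len(samples)} values")
        writer.writerow([identifier, description, *values])


def read_gct(handle):
    """Return (samples, {identifier: [values]}), ignoring the description column."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)                                     # skip the version line
    n_rows, n_cols = (int(v) for v in next(reader)[:2])
    samples = next(reader)[2:]
    expression = {row[0]: [float(v) for v in row[2:]] for row in reader if row}
    if len(expression) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions line does not match the table")
    return samples, expression


# Made-up normalized counts for two features and two samples.
features = [("GeneA", "NA", [10.5, 12.0]), ("GeneB", "NA", [0.0, 3.25])]
buffer = io.StringIO()
write_gct(features, ["S1", "S2"], buffer)
samples, expression = read_gct(io.StringIO(buffer.getvalue()))
```

The description column is written as `NA` and dropped on reading, which mirrors the statement that GSEA ignores it.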

https://github.com/datirium/workflows.git

Path: workflows/gseapy.cwl

Branch/Commit ID: 4ab9399a4777610a579ea2c259b9356f27641dcc

workflow graph Bacterial Annotation, pass 2, blastp-based functional annotation (first pass)

https://github.com/ncbi/pgap.git

Path: bacterial_annot/wf_bacterial_annot_pass2.cwl

Branch/Commit ID: da35c7b700912dd3643e3dd2c5c96b7be3a4edad

workflow graph Single-Cell Preprocessing Cell Ranger Pipeline

Devel version of Single-Cell Preprocessing Cell Ranger Pipeline
===============================================================

https://github.com/datirium/workflows.git

Path: workflows/single-cell-preprocess-cellranger.cwl

Branch/Commit ID: 09267e79fd867aa68a219c69e6db7d8e2e877be2

workflow graph conflict-wf.cwl#collision

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/conflict-wf.cwl

Branch/Commit ID: 047e69bb169e79fad6a7285ee798c4ecec3b218b

Packed ID: collision

workflow graph tt_blastn_wnode

https://github.com/ncbi/pgap.git

Path: task_types/tt_blastn_wnode.cwl

Branch/Commit ID: 7f857f7f2d7c080d27c775b67a6d6f7d94bce31f

workflow graph SPRM pipeline

https://github.com/hubmapconsortium/sprm.git

Path: pipeline.cwl

Branch/Commit ID: b465b8a40344de8d8f6211b1bec4f6d356e1442f

workflow graph scatter-valuefrom-wf1.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/scatter-valuefrom-wf1.cwl

Branch/Commit ID: 047e69bb169e79fad6a7285ee798c4ecec3b218b

workflow graph extract_gencoll_ids

https://github.com/ncbi/pgap.git

Path: task_types/tt_extract_gencoll_ids.cwl

Branch/Commit ID: 803f6367d1b279a7b6dc1a4e8ae43f1bbec9f760

workflow graph DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test

Runs DESeq2 using LRT (Likelihood Ratio Test)
=============================================

The LRT examines two models for the counts: a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables.

The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model.

When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables.

However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). In other words, the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels.

**Technical notes**

1. At least two biological replicates are required for every compared category.
2. The metadata file describes relations between compared experiments, for example

   ```
   ,time,condition
   DH1,day5,WT
   DH2,day5,KO
   DH3,day7,WT
   DH4,day7,KO
   DH5,day7,KO
   ```

   where `time`, `condition`, `day5`, `day7`, `WT`, `KO` should be single words (without spaces) and `DH1`, `DH2`, `DH3`, `DH4`, `DH5` correspond to the experiment aliases set in the **RNA-Seq experiments** input.
3. Design and reduced formulas should start with **~** and include categories or, optionally, their interactions from the metadata file header. See details in the DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test).
4. The contrast should be set based on your metadata file header and available categories in the form `Factor Numerator Denominator`, where `Factor` is a column name from the metadata file, `Numerator` is the category from the metadata file to be used as the numerator in the fold change calculation, and `Denominator` is the category to be used as the denominator. For example, `condition WT KO`.
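The technical notes above can be turned into a small pre-flight check on the inputs. This is a hedged sketch that uses only the Python standard library; the helper names (`load_metadata`, `formula_terms`, `check_inputs`), the example formulas `~time+condition` and `~time`, and the per-level reading of "every compared category" are assumptions for illustration, not the validation performed by `deseq-lrt.cwl`.

```python
import csv
import io
from collections import Counter

METADATA = """,time,condition
DH1,day5,WT
DH2,day5,KO
DH3,day7,WT
DH4,day7,KO
DH5,day7,KO
"""


def load_metadata(text):
    """Map each experiment alias to its {factor: category} assignments."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    factors = rows[0][1:]                  # header row; the first cell is empty
    return {row[0]: dict(zip(factors, row[1:])) for row in rows[1:]}


def formula_terms(formula):
    """Return the factor names used in a formula such as '~time+condition'."""
    if not formula.startswith("~"):
        raise ValueError(f"formula must start with '~': {formula!r}")
    names = set()
    for term in formula[1:].split("+"):
        for name in term.replace("*", ":").split(":"):
            name = name.strip()
            if name and name != "1":       # '~1' is an intercept-only model
                names.add(name)
    return names


def check_inputs(metadata, design, reduced, contrast):
    """Validate formulas and a 'Factor Numerator Denominator' contrast."""
    factors = set(next(iter(metadata.values())))
    for formula in (design, reduced):
        unknown = formula_terms(formula) - factors
        if unknown:
            raise ValueError(f"unknown factors in {formula!r}: {sorted(unknown)}")
    factor, numerator, denominator = contrast.split()
    if factor not in factors:
        raise ValueError(f"contrast factor {factor!r} is not a metadata column")
    # Assumption: 'category' means a level of the contrast factor.
    counts = Counter(row[factor] for row in metadata.values())
    for level in (numerator, denominator):
        if counts[level] < 2:
            raise ValueError(f"{level!r} needs at least two replicates")
    return factor, numerator, denominator


metadata = load_metadata(METADATA)
checked = check_inputs(metadata, "~time+condition", "~time", "condition WT KO")
```

With the example metadata, the full design `~time+condition` against the reduced design `~time` would test the effect of `condition`, and the contrast `condition WT KO` selects which single log fold change is reported.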

https://github.com/datirium/workflows.git

Path: workflows/deseq-lrt.cwl

Branch/Commit ID: b957a4f681bf0ca8ebba4e0d0ec3936bf79620c5