Explore Workflows

Graph Name Retrieved From View
workflow graph Filter single sample sv vcf from depth callers(cnvkit/cnvnator)

https://github.com/apaul7/cancer-genomics-workflow.git

Path: definitions/subworkflows/sv_depth_caller_filter.cwl

Branch/Commit ID: bfcb5ffbea3d00a38cc03595d41e53ea976d599d

workflow graph mut.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/mut.cwl

Branch/Commit ID: 639229b1159cf484e70e52da10194561b3fad719

workflow graph Filter single sample sv vcf from paired read callers(Manta/Smoove)

https://github.com/apaul7/cancer-genomics-workflow.git

Path: definitions/subworkflows/sv_paired_read_caller_filter.cwl

Branch/Commit ID: bfcb5ffbea3d00a38cc03595d41e53ea976d599d

workflow graph revsort.cwl

Reverse the lines in a document, then sort those lines.

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/revsort.cwl

Branch/Commit ID: 4df56e95e6fceab69e677b539f3532cbf5946197
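
The one-line description of revsort.cwl above can be illustrated with a minimal Python sketch. It assumes that "reverse" means reversing the characters within each line (as the Unix `rev` utility does) and that the sort is descending by default. Both are assumptions about the workflow, not details confirmed by this listing, and the `reverse_sort` parameter name is hypothetical.

```python
def revsort(text, reverse_sort=True):
    """Reverse the characters of every line, then sort the resulting lines.

    Assumption: mimics `rev` followed by `sort`; `reverse_sort` is an
    illustrative parameter, not necessarily the workflow's own input name.
    """
    # Step 1: reverse each line character by character (like Unix `rev`).
    reversed_lines = [line[::-1] for line in text.splitlines()]
    # Step 2: sort the reversed lines (Python sorts by code point,
    # whereas Unix `sort` is locale-dependent).
    return sorted(reversed_lines, reverse=reverse_sort)


if __name__ == "__main__":
    for line in revsort("abc\nxyz\nfoo"):
        print(line)
```

Python's code-point ordering may differ from a locale-aware `sort`, so the output order of a real run could differ for non-ASCII or mixed-case input.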

workflow graph DESeq - differential gene expression analysis

# Differential gene expression analysis

This differential gene expression (DGE) analysis takes as input samples from two experimental conditions that have been processed with an RNA-Seq workflow (see list of "Upstream workflows" below). DESeq estimates variance-mean dependence in count data from high-throughput sequencing assays, then tests for DGE based on a model which assumes a negative binomial distribution of gene expression (aligned read count per gene).

### Experimental Setup and Results Interpretation

The workflow design uses as its fold change (FC) calculation: condition 1 (c1, e.g. treatment) over condition 2 (c2, e.g. control). In other words: `FC == (c1/c2)`

Therefore:

- if FC<1 the log2(FC) is <0 (negative), meaning expression in condition1<condition2 (gene is downregulated in c1)
- if FC>1 the log2(FC) is >0 (positive), meaning expression in condition1>condition2 (gene is upregulated in c1)

For example, if you input TREATMENT samples as condition 1 and CONTROL samples as condition 2, a positive L2FC for a gene indicates that expression of the gene in TREATMENT is greater (upregulated) compared to CONTROL. Next, threshold the p-adjusted values with your FDR (false discovery rate) cutoff to determine whether the change may be considered significant.

It is important to note when DESeq1 or DESeq2 is used in our DGE analysis workflow. If a user inputs only a single sample per condition, DESeq1 is used for calculating DGE. In this experimental setup there are no repeated measurements per gene per condition, so biological variability in each condition cannot be captured and the output p-values are assumed to be purely "technical". On the other hand, if more than one sample is input per condition, DESeq2 is used. In this case, biological variability per gene within each condition is available to be incorporated into the model, and the resulting p-values are assumed to be "biological".

Additionally, DESeq2 fold change is "shrunk" to account for sample variability, and as Michael Love (DESeq maintainer) puts it, "it looks at the largest fold changes that are not due to low counts and uses these to inform a prior distribution. So the large fold changes from genes with lots of statistical information are not shrunk, while the imprecise fold changes are shrunk. This allows you to compare all estimated LFC across experiments, for example, which is not really feasible without the use of a prior".

In either case, the null hypothesis (H0) tested for each gene is that it is not differentially expressed between conditions. A smaller p-value means the observed difference would be less likely if H0 were true, so below a chosen threshold (traditionally <0.05) H0 is rejected. Additionally, due to the many thousands of independent hypotheses being tested (each gene representing an independent test), the p-values attained by the Wald test are adjusted using the Benjamini and Hochberg method by default. These "padj" values should be used for determination of significance (a reasonable value here would be <0.10, i.e. below a 10% FDR).

Further Analysis: Output from the DESeq workflow may be used as input to the GSEA (Gene Set Enrichment Analysis) workflow for identifying enriched marker gene sets between conditions.

### DESeq1

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://www.bioconductor.org/packages/3.8/bioc/html/DESeq.html), as an R/Bioconductor package.

### DESeq2

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve the stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

### __References__

- Anders S, Huber W (2010). “Differential expression analysis for sequence count data.” Genome Biology, 11, R106. doi: 10.1186/gb-2010-11-10-r106, http://genomebiology.com/2010/11/10/R106/.
- Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550. doi: 10.1186/s13059-014-0550-8.
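
To make the fold-change convention and the multiple-testing adjustment described above concrete, here is a minimal Python sketch. It is not DESeq's implementation: it computes a plain log2(c1/c2) without shrinkage and applies the textbook Benjamini-Hochberg procedure without DESeq2's independent filtering. The function names and the example numbers are illustrative assumptions.

```python
import math


def log2_fold_change(c1_mean, c2_mean):
    """Plain log2(c1/c2); positive means higher expression in condition 1.

    Assumption: inputs are positive mean normalized counts (no shrinkage).
    """
    return math.log2(c1_mean / c2_mean)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene raw p-values, for illustration only.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, padj in zip(raw, benjamini_hochberg(raw)):
        # 0.10 corresponds to the 10% FDR cutoff suggested in the text.
        print(p, padj, "significant" if padj < 0.10 else "not significant")
    print(log2_fold_change(8.0, 2.0))
```

With treatment as condition 1 and control as condition 2, a positive return value from `log2_fold_change` corresponds to upregulation in treatment, matching the convention stated in the description.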

https://github.com/datirium/workflows.git

Path: workflows/deseq.cwl

Branch/Commit ID: ebbf23764ede324cabc064bd50647c1f643726fa

workflow graph protein_extract

https://github.com/ncbi/pgap.git

Path: progs/protein_extract.cwl

Branch/Commit ID: 8af4e2aabf43d5e3c7162efae4ad4649df5601e2

workflow graph WGS QC workflow

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/qc_wgs.cwl

Branch/Commit ID: 8cee1920920ed73384fb3ab74272da9c92a20cf2

workflow graph Generate ATDP heatmap using Homer

Generates an ATDP heatmap centered on TSS from an array of input BAM files and a genelist TSV file. Returns an array of heatmap JSON files whose names share the basenames of the input BAM files but use the .json extension.

https://github.com/datirium/workflows.git

Path: workflows/heatmap.cwl

Branch/Commit ID: 730b40bc403263b724399a952c0f3e2d28f13519

workflow graph io-int-wf.cwl

https://github.com/common-workflow-language/cwl-v1.1.git

Path: tests/io-int-wf.cwl

Branch/Commit ID: 664835e83eb5e57eee18a04ce7b05fb9d70d77b7

workflow graph DiffBind Multi-factor Analysis

DiffBind Multi-factor Analysis
------------------------------

DiffBind processes ChIP-Seq data enriched for genomic loci where specific protein/DNA binding occurs, including peak sets identified by ChIP-Seq peak callers and aligned sequence read datasets. It is designed to work with multiple peak sets simultaneously, representing different ChIP experiments (antibodies, transcription factor and/or histone marks, experimental conditions, replicates) as well as managing the results of multiple peak callers.

For more information please refer to:
-------------------------------------

Ross-Innes CS, Stark R, Teschendorff AE, Holmes KA, Ali HR, Dunning MJ, Brown GD, Gojis O, Ellis IO, Green AR, Ali S, Chin S, Palmieri C, Caldas C, Carroll JS (2012). “Differential oestrogen receptor binding is associated with clinical outcome in breast cancer.” Nature, 481, -4.

https://github.com/datirium/workflows.git

Path: workflows/diffbind-multi-factor.cwl

Branch/Commit ID: 261c0232a7a40880f2480b811ed2d7e89c463869