Explore Workflows

Browse workflows that have already been parsed, or add your own.

Graph Name Retrieved From View
workflow graph consensus_maf.cwl

Workflow to merge a large number of maf files into a single consensus maf file for use with GetBaseCountsMultiSample

https://github.com/mskcc/pluto-cwl.git

Path: cwl/consensus_maf.cwl

Branch/Commit ID: 462f6015c9268a4205b6e81de018a470b8a4a153

workflow graph mutect panel-of-normals workflow

https://github.com/genome/analysis-workflows.git

Path: definitions/pipelines/panel_of_normals.cwl

Branch/Commit ID: 3034168d652bfa930ba09af20e473a4564a8010d

workflow graph Filter DiffBind results for deepTools heatmap analysis

Filter differentially bound sites from DiffBind analysis to be used with deepTools heatmap analysis.

https://github.com/datirium/workflows.git

Path: workflows/filter-diffbind-for-heatmap.cwl

Branch/Commit ID: 12e5256de1b680c551c87fd5db6f3bc65428af67

workflow graph count-lines8-wf-noET.cwl

https://github.com/common-workflow-language/cwl-v1.2.git

Path: tests/count-lines8-wf-noET.cwl

Branch/Commit ID: a0f2d38e37ff51721fdeaf993bb2ab474b17246b

workflow graph tt_kmer_compare_wnode

https://github.com/ncbi/pgap.git

Path: task_types/tt_kmer_compare_wnode.cwl

Branch/Commit ID: f390475a4e0898d4933f0a28dae278aa35803eb1

workflow graph Single-cell Format Transform

Transforms single-cell sequencing data formats into Cell Ranger-like output.

https://github.com/datirium/workflows.git

Path: workflows/sc-format-transform.cwl

Branch/Commit ID: 549fac35bf6b8b1c25af0f4f6c3f162c40dc130e

workflow graph DESeq - differential gene expression analysis for spike-in normalized RNA-Seq

# Differential gene expression analysis

This differential gene expression (DGE) analysis takes as input samples from two experimental conditions that have been processed with a spike-in normalized RNA-Seq workflow (see list of "Upstream workflows" at top of file). Size factor estimation and its application for normalization are disabled in this version of the DESeq workflow; all other aspects are the same.

DESeq estimates variance-mean dependence in count data from high-throughput sequencing assays, then tests for DGE based on a model which assumes a negative binomial distribution of gene expression (aligned read count per gene).

### Experimental Setup and Results Interpretation

The workflow design uses as its fold change (FC) calculation: condition 1 (c1, e.g. treatment) over condition 2 (c2, e.g. control). In other words: `FC == (c1/c2)`

Therefore:

- if FC<1 the log2(FC) is <0 (negative), meaning expression in condition1<condition2 (gene is downregulated in c1)
- if FC>1 the log2(FC) is >0 (positive), meaning expression in condition1>condition2 (gene is upregulated in c1)

For example, if you input TREATMENT samples as condition 1 and CONTROL samples as condition 2, a positive L2FC for a gene indicates that expression of the gene in TREATMENT is greater (upregulated) compared to CONTROL. Next, threshold the p-adjusted values with your FDR (false discovery rate) cutoff to determine whether the change may be considered significant.

It is important to note when DESeq1 or DESeq2 is used in our DGE analysis workflow. If a user inputs only a single sample per condition, DESeq1 is used for calculating DGE. In this experimental setup there are no repeated measurements per gene per condition, so biological variability in each condition cannot be captured and the output p-values are assumed to be purely "technical". On the other hand, if more than one sample is input per condition, DESeq2 is used. In this case, biological variability per gene within each condition is available to be incorporated into the model, and the resulting p-values are assumed to be "biological". Additionally, DESeq2 fold change is "shrunk" to account for sample variability, and as Michael Love (DESeq maintainer) puts it, "it looks at the largest fold changes that are not due to low counts and uses these to inform a prior distribution. So the large fold changes from genes with lots of statistical information are not shrunk, while the imprecise fold changes are shrunk. This allows you to compare all estimated LFC across experiments, for example, which is not really feasible without the use of a prior".

In either case, the null hypothesis (H0) tested for each gene is that it is not differentially expressed between conditions. A smaller p-value means the observed difference would be less likely if H0 were true, so below a chosen threshold (traditionally <0.05) H0 should be rejected. Additionally, due to the many thousands of independent hypotheses being tested (each gene representing an independent test), the p-values attained by the Wald test are adjusted using the Benjamini and Hochberg method by default. These "padj" values should be used for determination of significance (a reasonable value here would be <0.10, i.e. below a 10% FDR).

Further Analysis: Output from the DESeq workflow may be used as input to the GSEA (Gene Set Enrichment Analysis) workflow for identifying enriched marker gene sets between conditions.

### DESeq1

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://www.bioconductor.org/packages/3.8/bioc/html/DESeq.html), as an R/Bioconductor package.

### DESeq2

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

### __References__

- Anders S, Huber W (2010). “Differential expression analysis for sequence count data.” Genome Biology, 11, R106. doi: 10.1186/gb-2010-11-10-r106, http://genomebiology.com/2010/11/10/R106/.
- Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550. doi: 10.1186/s13059-014-0550-8.
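
The fold-change sign convention and the padj thresholding described under "Experimental Setup and Results Interpretation" can be illustrated with a small Python sketch. This is not the workflow's code and not DESeq's implementation (DESeq2 fits a negative binomial model and shrinks fold changes); it only shows how `FC == (c1/c2)` maps to the sign of log2(FC) and how a Benjamini and Hochberg adjustment is computed. The function names and the example numbers are hypothetical.

```python
import math


def log2_fold_change(c1_mean, c2_mean):
    """log2(c1/c2): positive means higher expression in condition 1."""
    return math.log2(c1_mean / c2_mean)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_direction(l2fc, padj, fdr=0.10):
    """Label a gene using the c1-over-c2 convention and an FDR cutoff."""
    if padj >= fdr:
        return "not significant"
    if l2fc > 0:
        return "upregulated in c1"
    if l2fc < 0:
        return "downregulated in c1"
    return "no change"


if __name__ == "__main__":
    # Hypothetical mean counts: treatment (c1) = 8, control (c2) = 2.
    l2fc = log2_fold_change(8, 2)
    padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print(l2fc, padj, call_direction(l2fc, padj[0]))
```

With TREATMENT as condition 1 and CONTROL as condition 2, a gene labeled "upregulated in c1" here corresponds to a positive L2FC with padj below the chosen FDR cutoff.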

https://github.com/datirium/workflows.git

Path: workflows/deseq-for-spikein.cwl

Branch/Commit ID: fa4f172486288a1a9d23864f1d6962d85a453e16

workflow graph advanced-header.cwl

https://github.com/datirium/workflows.git

Path: metadata/advanced-header.cwl

Branch/Commit ID: 4a80f5b8f86c83af39494ecc309b789aeda77964

workflow graph Generate genome indices for STAR & bowtie

Creates indices for:

* [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886)
* [bowtie](http://bowtie-bio.sourceforge.net/tutorial.shtml) v1.2.0 (12/30/2016)

It performs the following steps:

1. `STAR --runMode genomeGenerate` to generate indices, based on [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files, returns results as an array of files
2. Outputs indices as [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type
3. Separates *chrNameLength.txt* file from Directory output
4. `bowtie-build` to generate indices requires genome [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) file as input, returns results as a group of main and secondary files
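
As a rough illustration of the two indexing commands named in the steps above, the following Python sketch only assembles the command lines; it does not run them and is not taken from the workflow itself. The helper names, file paths, and thread count are hypothetical, and the workflow may pass additional options that are not shown here.

```python
import shlex
from pathlib import PurePosixPath


def star_genome_generate_cmd(genome_dir, fasta, gtf, threads=1):
    """Build the STAR argument list for index generation (assumed options)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--runThreadN", str(threads),
    ]


def bowtie_build_cmd(fasta, index_prefix):
    """Build the bowtie-build argument list: reference FASTA, then index prefix."""
    return ["bowtie-build", fasta, index_prefix]


def chr_name_length_path(genome_dir):
    """Location of chrNameLength.txt inside the STAR index directory."""
    return str(PurePosixPath(genome_dir) / "chrNameLength.txt")


if __name__ == "__main__":
    # Hypothetical input paths, for illustration only.
    star = star_genome_generate_cmd("star_index", "genome.fa", "genes.gtf", threads=4)
    bowtie = bowtie_build_cmd("genome.fa", "bowtie_index/genome")
    print(shlex.join(star))
    print(shlex.join(bowtie))
    print(chr_name_length_path("star_index"))
```

Building the arguments as lists keeps each option and its value separate, which avoids shell-quoting problems if the lists are later handed to `subprocess.run`.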

https://github.com/datirium/workflows.git

Path: workflows/genome-indices.cwl

Branch/Commit ID: ee66d03be8a7fd61367db40c37a973ff55ece4da

workflow graph trim-chipseq-pe.cwl

Runs ChIP-Seq BioWardrobe basic analysis with paired-end input data files.

https://github.com/Barski-lab/workflows.git

Path: workflows/trim-chipseq-pe.cwl

Branch/Commit ID: 687116aeadebda243e8616e0eda2df4c9466c0bf