Explore Workflows

View already parsed workflows here or click here to add your own

Graph Name Retrieved From View
workflow graph scatter-wf1.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/scatter-wf1.cwl

Branch/Commit ID: 3e9bca4e006eae7e9febd76eb9b8292702eba2cb

workflow graph Generate genome indices for STAR & bowtie

Creates indices for: * [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886) * [bowtie](http://bowtie-bio.sourceforge.net/tutorial.shtml) v1.2.0 (12/30/2016) It performs the following steps: 1. `STAR --runMode genomeGenerate` to generate indices, based on [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files, returns results as an array of files 2. Outputs indices as [Direcotry](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type 3. Separates *chrNameLength.txt* file from Directory output 4. `bowtie-build` to generate indices requires genome [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) file as input, returns results as a group of main and secondary files

https://github.com/datirium/workflows.git

Path: workflows/genome-indices.cwl

Branch/Commit ID: 2c486543c335bb99b245dfe7e2f033f535efb9cf

workflow graph mut2.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/mut2.cwl

Branch/Commit ID: 2710cfe731374cf7244116dd7186fc2b6e4af344

workflow graph process VCF workflow

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/strelka_process_vcf.cwl

Branch/Commit ID: a7838a5ca72b25db5c2af20a15f34303a839980e

workflow graph Unaligned BAM to BQSR

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/bam_to_bqsr.cwl

Branch/Commit ID: a7838a5ca72b25db5c2af20a15f34303a839980e

workflow graph DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test

Runs DESeq2 using LRT (Likelihood Ratio Test) ============================================= The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. **Technical notes** 1. At least two biological replicates are required for every compared category 2. Metadata file describes relations between compared experiments, for example ``` ,time,condition DH1,day5,WT DH2,day5,KO DH3,day7,WT DH4,day7,KO DH5,day7,KO ``` where `time, condition, day5, day7, WT, KO` should be a single words (without spaces) and `DH1, DH2, DH3, DH4, DH5` correspond to the experiment aliases set in **RNA-Seq experiments** input. 3. Design and reduced formulas should start with **~** and include categories or, optionally, their interactions from the metadata file header. See details in DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test) 4. Contrast should be set based on your metadata file header and available categories in a form of `Factor Numerator Denominator`, where `Factor` - column name from metadata file, `Numerator` - category from metadata file to be used as numerator in fold change calculation, `Denominator` - category from metadata file to be used as denominator in fold change calculation. For example `condition WT KO`.

https://github.com/datirium/workflows.git

Path: workflows/deseq-lrt.cwl

Branch/Commit ID: 2c486543c335bb99b245dfe7e2f033f535efb9cf

workflow graph QuantSeq 3' mRNA-Seq single-read

### Pipeline for Lexogen's QuantSeq 3' mRNA-Seq Library Prep Kit FWD for Illumina [Lexogen original documentation](https://www.lexogen.com/quantseq-3mrna-sequencing/) * Cost-saving and streamlined globin mRNA depletion during QuantSeq library preparation * Genome-wide analysis of gene expression * Cost-efficient alternative to microarrays and standard RNA-Seq * Down to 100 pg total RNA input * Applicable for low quality and FFPE samples * Single-read sequencing of up to 9,216 samples/lane * Dual indexing and Unique Molecular Identifiers (UMIs) are available ### QuantSeq 3’ mRNA-Seq Library Prep Kit FWD for Illumina The QuantSeq FWD Kit is a library preparation protocol designed to generate Illumina compatible libraries of sequences close to the 3’ end of polyadenylated RNA. QuantSeq FWD contains the Illumina Read 1 linker sequence in the second strand synthesis primer, hence NGS reads are generated towards the poly(A) tail, directly reflecting the mRNA sequence (see workflow). This version is the recommended standard for gene expression analysis. Lexogen furthermore provides a high-throughput version with optional dual indexing (i5 and i7 indices) allowing up to 9,216 samples to be multiplexed in one lane. #### Analysis of Low Input and Low Quality Samples The required input amount of total RNA is as low as 100 pg. QuantSeq is suitable to reproducibly generate libraries from low quality RNA, including FFPE samples. See Fig.1 and 2 for a comparison of two different RNA qualities (FFPE and fresh frozen cryo-block) of the same sample. ![Fig 1](https://www.lexogen.com/wp-content/uploads/2017/02/Correlation_Samples.jpg) Figure 1 | Correlation of gene counts of FFPE and cryo samples. ![Fig 2](https://www.lexogen.com/wp-content/uploads/2017/02/Venn_diagrams.jpg) Figure 2 | Venn diagrams of genes detected by QuantSeq at a uniform read depth of 2.5 M reads in FFPE and cryo samples with 1, 5, and 10 reads/gene thresholds. #### Mapping of Transcript End Sites By using longer reads QuantSeq FWD allows to exactly pinpoint the 3’ end of poly(A) RNA (see Fig. 3) and therefore obtain accurate information about the 3’ UTR. ![Figure 3](https://www.lexogen.com/wp-content/uploads/2017/02/Read_Coverage.jpg) Figure 3 | QuantSeq read coverage versus normalized transcript length of NGS libraries derived from FFPE-RNA (blue) and cryo-preserved RNA (red). ### Current workflow should be used only with the single-end RNA-Seq data. It performs the following steps: 1. Separates UMIes and trims adapters from input FASTQ file 2. Uses ```STAR``` to align reads from input FASTQ file according to the predefined reference indices; generates unsorted BAM file and alignment statistics file 3. Uses ```fastx_quality_stats``` to analyze input FASTQ file and generates quality statistics file 4. Uses ```samtools sort``` and generates coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained on the step 2 (after running STAR) 5. Uses ```umi_tools dedup``` and generates final filtered sorted BAM(+BAI) file pair 6. Generates BigWig file on the base of sorted BAM file 7. Maps input FASTQ file to predefined rRNA reference indices using ```bowtie``` to define the level of rRNA contamination; exports resulted statistics to file 8. Calculates isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; exports results to file

https://github.com/datirium/workflows.git

Path: workflows/trim-quantseq-mrnaseq-se.cwl

Branch/Commit ID: e45ab1b9ac5c9b99fdf7b3b1be396dc42c2c9620

workflow graph xenbase-sra-to-fastq-se.cwl

https://github.com/datirium/workflows.git

Path: subworkflows/xenbase-sra-to-fastq-se.cwl

Branch/Commit ID: 2b8146f76595f0c4d8bf692de78b21280162b1d0

workflow graph mpi_simple_wf.cwl

Simple 2 step workflow to check that workflow steps are independently picking up on the number of processes. First run the parallel get PIDs step (on the input num procs) then run (on a single proc) the line count. This should equal the input.

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/mpi_simple_wf.cwl

Branch/Commit ID: 2710cfe731374cf7244116dd7186fc2b6e4af344

workflow graph scatter GATK HaplotypeCaller over intervals

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/gatk_haplotypecaller_iterator.cwl

Branch/Commit ID: a7838a5ca72b25db5c2af20a15f34303a839980e