Explore Workflows

View already parsed workflows here or click here to add your own

Graph Name Retrieved From View
workflow graph Motif Finding with HOMER with random background regions

Motif Finding with HOMER with random background regions --------------------------------------------------- HOMER contains a novel motif discovery algorithm that was designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm, which means that it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in on set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with the hypergeometric enrichment calculations (or binomial) to determine motif enrichment. HOMER also tries its best to account for sequenced bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to pretty much any nucleic acids motif finding problem. Here is how we generate background for Motifs Analysis ------------------------------------- 1. Take input file with regions in a form of “chr\" “start\" “end\" 2. Sort and remove duplicates from this regions file 3. Extend each region in 20Kb into both directions 4. Merge all overlapped extended regions 5. Subtract not extended regions from the extended ones 6. Randomly distribute not extended regions within the regions that we got as a result of the previous step 7. Get fasta file from these randomly distributed regions (from the previous step). Use it as background For more information please refer to: ------------------------------------- [Official documentation](http://homer.ucsd.edu/homer/motif/)

https://github.com/datirium/workflows.git

Path: workflows/homer-motif-analysis.cwl

Branch/Commit ID: 46d3d403ddb240d5a8f4f31ab992b6d6a2686745

workflow graph workflow_input_format_expr_v1_2.cwl

https://github.com/common-workflow-language/cwl-utils.git

Path: testdata/workflow_input_format_expr_v1_2.cwl

Branch/Commit ID: 0ad6983898f0d9001fe0f416f97c4d8b940e384a

workflow graph realignment.cwl

https://github.com/mskcc/roslin-variant.git

Path: setup/cwl/modules/realignment.cwl

Branch/Commit ID: bf7303dd44d7f0ec3d3cd2e0829e28a78bd941e2

workflow graph vecscreen.cwl

https://github.com/ncbi/pgap.git

Path: vecscreen/vecscreen.cwl

Branch/Commit ID: 2afb5ebafd1353ba063cc74ee9a7eaf347afce5c

workflow graph Trim Galore RNA-Seq pipeline paired-end

The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **RNA-Seq** basic analysis for a **pair-end** experiment. A corresponded input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. Current workflow must be used with paired-end RNA-Seq data. It performs the following steps: 1. Trim adapters from input FASTQ files 2. Use STAR to align reads from input FASTQ files according to the predefined reference indices; generate unsorted BAM file and alignment statistics file 3. Use fastx_quality_stats to analyze input FASTQ files and generate quality statistics files 4. Use samtools sort to generate coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained on the step 2 (after running STAR) 5. Generate BigWig file using sorted BAM file 6. Map input FASTQ files to predefined rRNA reference indices using Bowtie to define the level of rRNA contamination; export resulted statistics to file 7. Calculate isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; export results to file

https://github.com/datirium/workflows.git

Path: workflows/trim-rnaseq-pe.cwl

Branch/Commit ID: 46d3d403ddb240d5a8f4f31ab992b6d6a2686745

workflow graph Detect Variants workflow

https://github.com/genome/analysis-workflows.git

Path: definitions/pipelines/detect_variants_mouse.cwl

Branch/Commit ID: 8438316338e66823e1c9aca9f675b2bf33f2aa59

workflow graph bam_readcount workflow

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/bam_readcount.cwl

Branch/Commit ID: a7838a5ca72b25db5c2af20a15f34303a839980e

workflow graph workflow_input_sf_expr.cwl

https://github.com/common-workflow-language/cwl-utils.git

Path: testdata/workflow_input_sf_expr.cwl

Branch/Commit ID: 124a08ce3389eb49066c34a4163cbbed210a0355

workflow graph DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test

Runs DESeq2 using LRT (Likelihood Ratio Test) ============================================= The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. **Technical notes** 1. At least two biological replicates are required for every compared category 2. Metadata file describes relations between compared experiments, for example ``` ,time,condition DH1,day5,WT DH2,day5,KO DH3,day7,WT DH4,day7,KO DH5,day7,KO ``` where `time, condition, day5, day7, WT, KO` should be a single words (without spaces) and `DH1, DH2, DH3, DH4, DH5` correspond to the experiment aliases set in **RNA-Seq experiments** input. 3. Design and reduced formulas should start with **~** and include categories or, optionally, their interactions from the metadata file header. See details in DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test) 4. Contrast should be set based on your metadata file header and available categories in a form of `Factor Numerator Denominator`, where `Factor` - column name from metadata file, `Numerator` - category from metadata file to be used as numerator in fold change calculation, `Denominator` - category from metadata file to be used as denominator in fold change calculation. For example `condition WT KO`.

https://github.com/datirium/workflows.git

Path: workflows/deseq-lrt.cwl

Branch/Commit ID: 7eef0294395d83ff0765fce61726a59d71126422

workflow graph default-wf5.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/default-wf5.cwl

Branch/Commit ID: f94719e862f86cc88600caf3628faba6c0d05042