Explore Workflows

View already parsed workflows here or click here to add your own

Graph	Name	Retrieved From	View
	RNA-Seq pipeline single-read The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a single-read experiment. A corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. This workflow should be used only with single-read RNA-Seq data. It performs the following steps: 1. Use STAR to align reads from the input FASTQ file against the predefined reference indices; generate an unsorted BAM file and an alignment statistics file 2. Use fastx_quality_stats to analyze the input FASTQ file and generate a quality statistics file 3. Use samtools sort to generate a coordinate-sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 1 (after running STAR) 4. Generate a BigWig file from the sorted BAM file 5. Map the input FASTQ file to the predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to a file 6. Calculate isoform expression levels for the sorted BAM file and GTF/TAB annotation file using the GEEP reads-counting utility; export the results to a file	https://github.com/datirium/workflows.git Path: workflows/rnaseq-se.cwl Branch/Commit ID: 5e7385b8cfa4ddae822fff37b6bd22eb0370b389
	MAnorm PE - quantitative comparison of ChIP-Seq paired-end data What is MAnorm? -------------- MAnorm is a robust model for quantitative comparison of ChIP-Seq data sets of TFs (transcription factors) or epigenetic modifications and you can use it for: * Normalization of two ChIP-seq samples * Quantitative comparison (differential analysis) of two ChIP-seq samples * Evaluating the overlap enrichment of the protein binding sites (peaks) * Elucidating underlying mechanisms of cell-type specific gene regulation How does MAnorm work? ---------------- MAnorm uses common peaks of two samples as a reference to build the rescaling model for normalization, which is based on the empirical assumption that if a chromatin-associated protein has a substantial number of peaks shared in two conditions, the binding at these common regions will tend to be determined by similar mechanisms, and thus should exhibit similar global binding intensities across samples. The observed differences on common peaks are presumed to reflect the scaling relationship of ChIP-Seq signals between two samples, which can be applied to all peaks. What do the inputs mean? ---------------- ### General Experiment short name/Alias * short name for your experiment to identify it among the others ChIP-Seq PE sample 1 * previously analyzed ChIP-Seq paired-end experiment to be used as Sample 1 ChIP-Seq PE sample 2 * previously analyzed ChIP-Seq paired-end experiment to be used as Sample 2 Genome * Reference genome to be used for gene assignment ### Advanced Reads shift size for sample 1 * This value is used to shift reads towards 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100 Reads shift size for sample 2 * This value is used to shift reads towards 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100 M-value (log2-ratio) cutoff * Absolute M-value (log2-ratio) cutoff to define biased (differential binding) peaks. Default: 1.0 P-value cutoff * P-value cutoff to define biased peaks. Default: 0.01 Window size * Window size to count reads and calculate read densities. 2000 is recommended for sharp histone marks like H3K4me3 and H3K27ac, and 1000 for TFs or DNase-seq. Default: 2000	https://github.com/datirium/workflows.git Path: workflows/manorm-pe.cwl Branch/Commit ID: 4106b7dc96e968db291b7a61ecd1641aa3b3dd6d
	Trim Galore RNA-Seq pipeline single-read The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a single-end experiment. A corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. This workflow should be used only with single-end RNA-Seq data. It performs the following steps: 1. Trim adapters from the input FASTQ file 2. Use STAR to align reads from the input FASTQ file against the predefined reference indices; generate an unsorted BAM file and an alignment statistics file 3. Use fastx_quality_stats to analyze the input FASTQ file and generate a quality statistics file 4. Use samtools sort to generate a coordinate-sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 2 (after running STAR) 5. Generate a BigWig file from the sorted BAM file 6. Map the input FASTQ file to the predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to a file 7. Calculate isoform expression levels for the sorted BAM file and GTF/TAB annotation file using the GEEP reads-counting utility; export the results to a file	https://github.com/datirium/workflows.git Path: workflows/trim-rnaseq-se.cwl Branch/Commit ID: 46a077b51619c6a14f85e0aa5260ae8a04426fab
	alignment for mouse with qc	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/alignment_wgs_mouse.cwl Branch/Commit ID: 3bebaf9b70331de9f4845e2223c55082f5a812fb
	Gathered Downsample and HaplotypeCaller	https://github.com/tmooney/cancer-genomics-workflow.git Path: definitions/pipelines/gathered_downsample_and_recall.cwl Branch/Commit ID: 0db1a5f1ceedd4416ac550787c27b99c87dbe985
	Trim Galore RNA-Seq pipeline single-read strand specific Note: should be updated The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a single-end experiment. A corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. This workflow should be used only with single-end RNA-Seq data. It performs the following steps: 1. Trim adapters from the input FASTQ file 2. Use STAR to align reads from the input FASTQ file against the predefined reference indices; generate an unsorted BAM file and an alignment statistics file 3. Use fastx_quality_stats to analyze the input FASTQ file and generate a quality statistics file 4. Use samtools sort to generate a coordinate-sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 2 (after running STAR) 5. Generate a BigWig file from the sorted BAM file 6. Map the input FASTQ file to the predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to a file 7. Calculate isoform expression levels for the sorted BAM file and GTF/TAB annotation file using the GEEP reads-counting utility; export the results to a file	https://github.com/datirium/workflows.git Path: workflows/trim-rnaseq-se-dutp.cwl Branch/Commit ID: 46a077b51619c6a14f85e0aa5260ae8a04426fab
	downsample unaligned BAM and align	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/downsampled_alignment.cwl Branch/Commit ID: 0c4f4e59c265eb22aed3d2d37b173cb5430773d2
	DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test Runs DESeq2 using LRT (Likelihood Ratio Test) ============================================= The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. Technical notes 1. At least two biological replicates are required for every compared category 2. The metadata file describes relations between the compared experiments, for example ``` ,time,condition DH1,day5,WT DH2,day5,KO DH3,day7,WT DH4,day7,KO DH5,day7,KO ``` where `time, condition, day5, day7, WT, KO` should be single words (without spaces) and `DH1, DH2, DH3, DH4, DH5` correspond to the experiment aliases set in the RNA-Seq experiments input. 3. Design and reduced formulas should start with ~ and include categories or, optionally, their interactions from the metadata file header. See details in the DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test) 4. The contrast should be set based on your metadata file header and available categories in the form `Factor Numerator Denominator`, where `Factor` is a column name from the metadata file, `Numerator` is the category from the metadata file to be used as the numerator in the fold change calculation, and `Denominator` is the category from the metadata file to be used as the denominator in the fold change calculation. For example `condition WT KO`.	https://github.com/datirium/workflows.git Path: workflows/deseq-lrt.cwl Branch/Commit ID: 36fd18f11e939d3908b1eca8d2939402f7a99b0f
	gathered exome alignment and somatic variant detection for cle purpose	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/somatic_exome_cle_gathered.cwl Branch/Commit ID: cc3e7f1ccfdc7101c22bf88792608504eea7d53a
	count-lines13-wf.cwl	https://github.com/common-workflow-language/cwltool.git Path: cwltool/schemas/v1.0/v1.0/count-lines13-wf.cwl Branch/Commit ID: bbe20f54deea92d9c9cd38cb1f23c4423133d3de
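
The technical notes in the DESeq2 (LRT) entry above describe a metadata file layout and a `Factor Numerator Denominator` contrast string. The following is a minimal Python sketch of how those conventions could be checked before submitting a run. The function names (`parse_metadata`, `parse_contrast`, `replicate_counts`) are hypothetical and are not part of the workflow; counting replicates per factor level is one possible reading of "every compared category".

```python
import csv
import io
from collections import Counter

# Example metadata copied from the workflow description.
METADATA = """,time,condition
DH1,day5,WT
DH2,day5,KO
DH3,day7,WT
DH4,day7,KO
DH5,day7,KO
"""


def parse_metadata(text):
    """Return (factors, samples) where samples maps alias -> {factor: level}."""
    rows = [row for row in csv.reader(io.StringIO(text.strip())) if row]
    factors = rows[0][1:]  # the first header cell (alias column) is empty
    samples = {}
    for row in rows[1:]:
        alias, levels = row[0], row[1:]
        if len(levels) != len(factors):
            raise ValueError(f"row for {alias!r} does not match the header")
        samples[alias] = dict(zip(factors, levels))
    # Factor names and levels must be single words (no whitespace).
    words = factors + [lvl for s in samples.values() for lvl in s.values()]
    for word in words:
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"{word!r} is not a single word")
    return factors, samples


def parse_contrast(contrast, factors, samples):
    """Split 'Factor Numerator Denominator' and check it against the metadata."""
    parts = contrast.split()
    if len(parts) != 3:
        raise ValueError("contrast must be 'Factor Numerator Denominator'")
    factor, numerator, denominator = parts
    if factor not in factors:
        raise ValueError(f"unknown factor {factor!r}")
    levels = {s[factor] for s in samples.values()}
    for level in (numerator, denominator):
        if level not in levels:
            raise ValueError(f"{level!r} is not a level of {factor!r}")
    return factor, numerator, denominator


def replicate_counts(samples):
    """Count experiments per (factor, level) pair (assumed meaning of 'category')."""
    return Counter(
        (factor, level) for s in samples.values() for factor, level in s.items()
    )
```

With the example metadata, `parse_contrast("condition WT KO", *parse_metadata(METADATA))` accepts the contrast from the description, while a contrast naming a column or level absent from the file raises `ValueError`.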
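
The MAnorm PE entry above describes fitting a rescaling model on common peaks and then flagging biased peaks with the M-value and P-value cutoffs. The sketch below illustrates that idea in Python under simplifying assumptions: it uses an ordinary least-squares line as a stand-in for MAnorm's robust regression, takes read densities and p-values as given inputs, and treats the cutoffs as inclusive. All function names are hypothetical and this is not MAnorm's implementation.

```python
import math


def ma_values(density1, density2):
    """Return (M, A): the log2 ratio and the mean log2 read density of a peak."""
    log1, log2 = math.log2(density1), math.log2(density2)
    return log1 - log2, 0.5 * (log1 + log2)


def fit_line(points):
    """Least-squares fit M = slope * A + intercept over (M, A) pairs.

    Assumption: plain OLS replaces the robust regression MAnorm uses.
    """
    n = len(points)
    mean_m = sum(m for m, _ in points) / n
    mean_a = sum(a for _, a in points) / n
    sxx = sum((a - mean_a) ** 2 for _, a in points)
    sxy = sum((a - mean_a) * (m - mean_m) for m, a in points)
    slope = sxy / sxx
    return slope, mean_m - slope * mean_a


def normalized_m(m, a, slope, intercept):
    """Remove the trend learned on common peaks from a peak's M-value."""
    return m - (slope * a + intercept)


def is_biased(m, p_value, m_cutoff=1.0, p_cutoff=0.01):
    """Apply the documented default cutoffs (inclusive bounds are an assumption)."""
    return abs(m) >= m_cutoff and p_value <= p_cutoff
```

As a usage example, if every common peak has twice the read density in sample 1 as in sample 2, the fitted line is flat at M = 1, so the normalized M-values of those peaks are 0 and none of them would be called biased regardless of p-value.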