Explore Workflows


Graph Name Retrieved From View
workflow graph MAnorm PE - quantitative comparison of ChIP-Seq paired-end data

What is MAnorm?
---------------

MAnorm is a robust model for quantitative comparison of ChIP-Seq data sets of TFs (transcription factors) or epigenetic modifications. You can use it for:

* Normalization of two ChIP-seq samples
* Quantitative comparison (differential analysis) of two ChIP-seq samples
* Evaluating the overlap enrichment of the protein binding sites (peaks)
* Elucidating underlying mechanisms of cell-type specific gene regulation

How does MAnorm work?
---------------------

MAnorm uses the common peaks of two samples as a reference to build the rescaling model for normalization. This rests on the empirical assumption that if a chromatin-associated protein has a substantial number of peaks shared between two conditions, the binding at these common regions tends to be determined by similar mechanisms and should therefore exhibit similar global binding intensities across samples. The observed differences on common peaks are presumed to reflect the scaling relationship of ChIP-Seq signals between the two samples, and this relationship can then be applied to all peaks.

What do the inputs mean?
------------------------

### General

**Experiment short name/Alias**

* Short name for your experiment, used to identify it among the others

**ChIP-Seq PE sample 1**

* Previously analyzed ChIP-Seq paired-end experiment to be used as Sample 1

**ChIP-Seq PE sample 2**

* Previously analyzed ChIP-Seq paired-end experiment to be used as Sample 2

**Genome**

* Reference genome to be used for gene assignment

### Advanced

**Reads shift size for sample 1**

* This value is used to shift reads towards the 3' direction to determine the precise binding site. Set it to half of the fragment length. Default: 100

**Reads shift size for sample 2**

* This value is used to shift reads towards the 3' direction to determine the precise binding site. Set it to half of the fragment length. Default: 100

**M-value (log2-ratio) cutoff**

* Absolute M-value (log2-ratio) cutoff to define biased (differential binding) peaks. Default: 1.0

**P-value cutoff**

* P-value cutoff to define biased peaks. Default: 0.01

**Window size**

* Window size used to count reads and calculate read densities. 2000 is recommended for sharp histone marks like H3K4me3 and H3K27ac, and 1000 for TFs or DNase-seq. Default: 2000
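The rescaling idea and the two cutoffs described above can be illustrated with a minimal Python sketch. It is not MAnorm's own code: the function names, the pseudocount, and the use of ordinary least squares (`numpy.polyfit`) in place of a robust regression are assumptions made for illustration. M is taken as the log2 ratio of the two read densities and A as their average log2 intensity.

```python
import numpy as np


def ma_values(reads1, reads2, pseudocount=1.0):
    """Return (M, A) per peak from read counts or densities of two samples.

    The pseudocount is an assumption that avoids log2(0).
    """
    r1 = np.asarray(reads1, dtype=float) + pseudocount
    r2 = np.asarray(reads2, dtype=float) + pseudocount
    m = np.log2(r1 / r2)         # log2 ratio
    a = 0.5 * np.log2(r1 * r2)   # average log2 intensity
    return m, a


def fit_rescaling_model(m_common, a_common):
    """Fit M = intercept + slope * A on the common peaks only.

    Ordinary least squares is used here as a simple stand-in
    for a robust linear fit.
    """
    slope, intercept = np.polyfit(a_common, m_common, deg=1)
    return slope, intercept


def normalize_m(m, a, slope, intercept):
    """Apply the model fitted on common peaks to any set of peaks."""
    return np.asarray(m) - (intercept + slope * np.asarray(a))


def is_biased(m_norm, p_values, m_cutoff=1.0, p_cutoff=0.01):
    """Flag peaks that pass both the |M| cutoff and the P-value cutoff."""
    m_norm = np.asarray(m_norm, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    return (np.abs(m_norm) >= m_cutoff) & (p_values <= p_cutoff)
```

In this sketch the model is fitted on common peaks only, then `normalize_m` is applied to every peak, and `is_biased` uses the workflow's default cutoffs (1.0 and 0.01) unless others are passed.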

https://github.com/datirium/workflows.git

Path: workflows/manorm-pe.cwl

Branch/Commit ID: 44214a9d02e6d85b03eb708552ed812ae3d4a733

workflow graph kmer_top_n_extract

https://github.com/ncbi/pgap.git

Path: task_types/tt_kmer_top_n_extract.cwl

Branch/Commit ID: 449f87c8365637e803ba66f83367e96f98c88f5c

workflow graph THOR - differential peak calling of ChIP-seq signals with replicates

What is THOR?
-------------

THOR is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates. THOR performs genomic signal processing, peak calling and p-value calculation in an integrated framework.

For more information please refer to:
-------------------------------------

Allhoff, M., Sere, K., Freitas, J., Zenke, M., Costa, I.G. (2016), Differential Peak Calling of ChIP-seq Signals with Replicates with THOR, Nucleic Acids Research, epub gkw680.

https://github.com/datirium/workflows.git

Path: workflows/rgt-thor.cwl

Branch/Commit ID: 1131f82a53315cca217a6c84b3bd272aa62e4bca

workflow graph mut.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/mut.cwl

Branch/Commit ID: 2710cfe731374cf7244116dd7186fc2b6e4af344

workflow graph count-lines12-wf.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/count-lines12-wf.cwl

Branch/Commit ID: 047e69bb169e79fad6a7285ee798c4ecec3b218b

workflow graph sec-wf-out.cwl

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/sec-wf-out.cwl

Branch/Commit ID: fec7a10466a26e376b14181a88734983cfb1b8cb

workflow graph DESeq - differential gene expression analysis

Differential gene expression analysis
=====================================

Differential gene expression analysis based on the negative binomial distribution.

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

DESeq1
------

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://bioconductor.org/packages/release/bioc/html/DESeq.html), as an R/Bioconductor package.

DESeq2
------

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
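To make the variance-mean dependence concrete, here is a small Python sketch of a negative binomial variance function. It assumes the common parameterization in which the variance equals the mean plus a dispersion term times the squared mean. The function names and the method-of-moments estimator are illustrative assumptions, not the DESeq or DESeq2 implementation, which estimate dispersions by local regression or shrinkage as described above.

```python
import numpy as np


def nb_variance(mean, dispersion):
    """Negative binomial variance under the assumed parameterization
    var = mean + dispersion * mean**2 (dispersion = 0 gives Poisson)."""
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Crude per-gene method-of-moments dispersion estimate.

    Illustrative only: real tools share information across genes
    instead of relying on a handful of replicates per gene.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        return 0.0
    var = counts.var(ddof=1)  # unbiased sample variance
    return max((var - mean) / mean ** 2, 0.0)
```

With few replicates, an estimate like `moment_dispersion` is very noisy, which is the motivation the text gives for linking variance to the mean across genes (DESeq) or shrinking dispersion estimates (DESeq2).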

https://github.com/datirium/workflows.git

Path: workflows/deseq.cwl

Branch/Commit ID: 8049a781ac4aae579fbd3036fa0bf654532f15be

workflow graph kmer_cache_store

https://github.com/ncbi/pgap.git

Path: task_types/tt_kmer_cache_store.cwl

Branch/Commit ID: 7f857f7f2d7c080d27c775b67a6d6f7d94bce31f

workflow graph Unaligned BAM to BQSR and VCF

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/bam_to_bqsr_no_dup_marking.cwl

Branch/Commit ID: ae75b938e6e8ae777a55686bbacad824b3c6788c

workflow graph Trim Galore ChIP-Seq pipeline single-read

The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **ChIP-Seq** basic analysis workflow for a **single-read** experiment with Trim Galore.

_Trim Galore_ is a wrapper around [Cutadapt](https://github.com/marcelm/cutadapt) and [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.

The workflow returns a coordinate-sorted BAM file together with its BAI index file, quality statistics of the input FASTQ file, read coverage in the form of a BigWig file, peak calling data in the form of narrowPeak or broadPeak files, islands with the assigned nearest genes and region type, and data for an average tag density plot (based on the BAM file).

The workflow starts with step *fastx\_quality\_stats* from FASTX-Toolkit to calculate quality statistics for the input FASTQ file. At the same time, `bowtie` is used to align reads from the input FASTQ file to the reference genome (*bowtie\_aligner*). The output of this step is an unsorted SAM file, which is sorted and indexed by `samtools sort` and `samtools index` (*samtools\_sort\_index*).

Depending on the workflow's input parameters, the indexed and sorted BAM file can be processed by `samtools rmdup` (*samtools\_rmdup*) to remove duplicated reads. If removing duplicates is not required, the original input BAM and BAI files are returned. Otherwise, step *samtools\_sort\_index\_after\_rmdup* repeats `samtools sort` and `samtools index` on the BAM and BAI files.

Right after that, `macs2 callpeak` performs peak calling (*macs2\_callpeak*). Based on the returned outputs, the next step (*macs2\_island\_count*) calculates the number of islands and the estimated fragment size. If the estimated fragment size is less than 80 bp (hardcoded in the workflow), `macs2 callpeak` is rerun with a forced fixed fragment size value (*macs2\_callpeak\_forced*). If the workflow input parameters already requested peak calling with a fixed fragment size, this step is skipped and the original peak calling results are saved.

In the next step, the workflow again calculates the number of islands and estimates the fragment size (*macs2\_island\_count\_forced*) for the data obtained from the *macs2\_callpeak\_forced* step. If that step was skipped, the results from *macs2\_island\_count\_forced* are equal to the ones obtained from *macs2\_island\_count*.

The next step (*macs2\_stat*) defines which island count and estimated fragment size should be used in the workflow output: either those from *macs2\_island\_count* or those from *macs2\_island\_count\_forced*. If the input trigger of this step is set to True, the *macs2\_callpeak\_forced* step was run and returned results different from those of *macs2\_callpeak*, so *macs2\_stat* returns [fragments\_new, fragments\_old, islands\_new]. If the trigger is False, the step returns [fragments\_old, fragments\_old, islands\_old]. Here the suffix "old" denotes results obtained from the *macs2\_island\_count* step and the suffix "new" denotes results from the *macs2\_island\_count\_forced* step.

The following two steps (*bamtools\_stats* and *bam\_to\_bigwig*) calculate coverage based on the input BAM file and save it in BigWig format. For that purpose, bamtools stats returns the number of mapped reads, which is then used as a scaling factor by bedtools genomecov when it calculates coverage and saves it in BED format. That file is then sorted and converted to BigWig format by the bedGraphToBigWig tool from UCSC utilities.

Step *get\_stat* returns a text file with statistics in the form of [TOTAL, ALIGNED, SUPRESSED, USED] read counts. Step *island\_intersect* assigns genes and regions to the islands obtained from *macs2\_callpeak\_forced*. Step *average\_tag\_density* calculates data for the average tag density plot based on the BAM file.
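The rerun decision and the *macs2\_stat* selection described above can be summarized in a short Python sketch. It is not taken from the CWL file: the function and parameter names are hypothetical, and only the 80 bp threshold and the two returned lists come from the description.

```python
# Threshold in bp; the description states it is hardcoded in the workflow.
MIN_FRAGMENT_SIZE = 80


def needs_forced_rerun(estimated_fragment, fixed_size_requested):
    """Decide whether peak calling is rerun with a forced fragment size.

    The rerun is skipped when a fixed fragment size was already
    requested in the workflow inputs.
    """
    return (not fixed_size_requested) and estimated_fragment < MIN_FRAGMENT_SIZE


def macs2_stat(trigger, fragments_old, islands_old, fragments_new, islands_new):
    """Select the values reported in the workflow output.

    "old" values come from macs2_island_count,
    "new" values from macs2_island_count_forced.
    """
    if trigger:
        return [fragments_new, fragments_old, islands_new]
    return [fragments_old, fragments_old, islands_old]
```

For example, under these assumptions `macs2_stat(True, 60, 1200, 150, 1500)` yields `[150, 60, 1500]`, while a False trigger repeats the original estimate in both fragment slots.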

https://github.com/datirium/workflows.git

Path: workflows/trim-chipseq-se.cwl

Branch/Commit ID: 12c29f88855329192bfff977f046990031f04931