Explore Workflows
View already parsed workflows here or click here to add your own
| Graph | Name | Retrieved From | View |
|---|---|---|---|
|
|
conflict-wf.cwl#collision
|
Path: cwltool/schemas/v1.0/v1.0/conflict-wf.cwl Branch/Commit ID: e59538cd9899a88d7e31e0f259bc56734f604383 Packed ID: collision |
|
|
|
pipeline.cwl
|
Path: pipeline.cwl Branch/Commit ID: master |
|
|
|
qc_workflow_wo_waltz.cwl
This workflow is intended to be used to test the QC module, without having to run the long waltz step |
Path: workflows/QC/qc_workflow_wo_waltz.cwl Branch/Commit ID: e7754fb4d3cd14068eef0c498cc5c6fcf53dc31f |
|
|
|
abra_workflow.cwl
|
Path: workflows/ABRA/abra_workflow.cwl Branch/Commit ID: 9e6eae9eb8448e68d509397a46303551a93a164d |
|
|
|
Trim Galore ChIP-Seq pipeline paired-end
The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **ChIP-Seq** basic analysis workflow for a **paired-end** experiment with Trim Galore. _Trim Galore_ is a wrapper around [Cutadapt](https://github.com/marcelm/cutadapt) and [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data. A [FASTQ](http://maq.sourceforge.net/fastq.shtml) input file has to be provided. In outputs it returns coordinate sorted BAM file alongside with index BAI file, quality statistics for both the input FASTQ files, reads coverage in a form of BigWig file, peaks calling data in a form of narrowPeak or broadPeak files, islands with the assigned nearest genes and region type, data for average tag density plot (on the base of BAM file). Workflow starts with running fastx_quality_stats (steps fastx_quality_stats_upstream and fastx_quality_stats_downstream) from FASTX-Toolkit to calculate quality statistics for both upstream and downstream input FASTQ files. At the same time Bowtie is used to align reads from input FASTQ files to reference genome (Step bowtie_aligner). The output of this step is unsorted SAM file which is being sorted and indexed by samtools sort and samtools index (Step samtools_sort_index). Depending on workflow’s input parameters indexed and sorted BAM file could be processed by samtools rmdup (Step samtools_rmdup) to remove all possible read duplicates. In a case when removing duplicates is not necessary the step returns original input BAM and BAI files without any processing. If the duplicates were removed the following step (Step samtools_sort_index_after_rmdup) reruns samtools sort and samtools index with BAM and BAI files, if not - the step returns original unchanged input files. Right after that macs2 callpeak performs peak calling (Step macs2_callpeak). On the base of returned outputs the next step (Step macs2_island_count) calculates the number of islands and estimated fragment size. If the last one is less that 80 (hardcoded in a workflow) macs2 callpeak is rerun again with forced fixed fragment size value (Step macs2_callpeak_forced). If at the very beginning it was set in workflow input parameters to force run peak calling with fixed fragment size, this step is skipped and the original peak calling results are saved. In the next step workflow again calculates the number of islands and estimated fragment size (Step macs2_island_count_forced) for the data obtained from macs2_callpeak_forced step. If the last one was skipped the results from macs2_island_count_forced step are equal to the ones obtained from macs2_island_count step. Next step (Step macs2_stat) is used to define which of the islands and estimated fragment size should be used in workflow output: either from macs2_island_count step or from macs2_island_count_forced step. If input trigger of this step is set to True it means that macs2_callpeak_forced step was run and it returned different from macs2_callpeak step results, so macs2_stat step should return [fragments_new, fragments_old, islands_new], if trigger is False the step returns [fragments_old, fragments_old, islands_old], where sufix \"old\" defines results obtained from macs2_island_count step and sufix \"new\" - from macs2_island_count_forced step. The following two steps (Step bamtools_stats and bam_to_bigwig) are used to calculate coverage on the base of input BAM file and save it in BigWig format. For that purpose bamtools stats returns the number of mapped reads number which is then used as scaling factor by bedtools genomecov when it performs coverage calculation and saves it in BED format. The last one is then being sorted and converted to BigWig format by bedGraphToBigWig tool from UCSC utilities. Step get_stat is used to return a text file with statistics in a form of [TOTAL, ALIGNED, SUPRESSED, USED] reads count. Step island_intersect assigns genes and regions to the islands obtained from macs2_callpeak_forced. Step average_tag_density is used to calculate data for average tag density plot on the base of BAM file. |
Path: workflows/trim-chipseq-pe.cwl Branch/Commit ID: ad948b2691ef7f0f34de38f0102c3cd6f5182b29 |
|
|
|
Trim Galore RNA-Seq pipeline single-read
The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **RNA-Seq** basic analysis for a **single-end** experiment. A corresponded input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. Current workflow should be used only with the single-end RNA-Seq data. It performs the following steps: 1. Trim adapters from input FASTQ file 2. Use STAR to align reads from input FASTQ file according to the predefined reference indices; generate unsorted BAM file and alignment statistics file 3. Use fastx_quality_stats to analyze input FASTQ file and generate quality statistics file 4. Use samtools sort to generate coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained on the step 1 (after running STAR) 5. Generate BigWig file on the base of sorted BAM file 6. Map input FASTQ file to predefined rRNA reference indices using Bowtie to define the level of rRNA contamination; export resulted statistics to file 7. Calculate isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; export results to file |
Path: workflows/trim-rnaseq-se.cwl Branch/Commit ID: 5e7385b8cfa4ddae822fff37b6bd22eb0370b389 |
|
|
|
ChIP-Seq pipeline single-read
# ChIP-Seq basic analysis workflow for single-read data Reads are aligned to the reference genome with [Bowtie](http://bowtie-bio.sourceforge.net/index.shtml). Results are saved as coordinate sorted [BAM](http://samtools.github.io/hts-specs/SAMv1.pdf) alignment and index BAI files. Optionally, PCR duplicates can be removed. To obtain coverage in [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format, average fragment length is calculated by [MACS2](https://github.com/taoliu/MACS), and individual reads are extended to this length in the 3’ direction. Areas of enrichment identified by MACS2 are saved in ENCODE [narrow peak](http://genome.ucsc.edu/FAQ/FAQformat.html#format12) or [broad peak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) formats. Called peaks together with the nearest genes are saved in TSV format. In addition to basic statistics (number of total/mapped/multi-mapped/unmapped/duplicate reads), pipeline generates several quality control measures. Base frequency plots are used to estimate adapter contamination, a frequent occurrence in low-input ChIP-Seq experiments. Expected distinct reads count from [Preseq](http://smithlabresearch.org/software/preseq/) can be used to estimate read redundancy for a given sequencing depth. Average tag density profiles can be used to estimate ChIP enrichment for promoter proximal histone modifications. Use of different parameters for different antibodies (calling broad or narrow peaks) is possible. Additionally, users can elect to use BAM file from another experiment as control for MACS2 peak calling. ## Cite as *Kartashov AV, Barski A. BioWardrobe: an integrated platform for analysis of epigenomics and transcriptomics data. Genome Biol. 2015;16(1):158. Published 2015 Aug 7. [doi:10.1186/s13059-015-0720-3](https://www.ncbi.nlm.nih.gov/pubmed/26248465)* ## Software versions - Bowtie 1.2.0 - Samtools 1.4 - Preseq 2.0 - MACS2 2.1.1.20160309 - Bedtools 2.26.0 - UCSC userApps v358 ## Inputs | ID | Label | Description | Required | Default | Upstream analyses | | ------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------: | ------- | ------------------------------- | | **fastq\_file** | FASTQ file | Single-read sequencing data in FASTQ format (fastq, fq, bzip2, gzip, zip) | + | | | | **indices\_folder** | Genome indices | Directory with the genome indices generated by Bowtie | + | | genome\_indices/bowtie\_indices | | **annotation\_file** | Genome annotation file | Genome annotation file in TSV format | + | | genome\_indices/annotation | | **genome\_size** | Effective genome size | The length of the mappable genome (hs, mm, ce, dm or number, for example 2.7e9) | + | | genome\_indices/genome\_size | | **chrom\_length** | Chromosome lengths file | Chromosome lengths file in TSV format | + | | genome\_indices/chrom\_length | | **broad\_peak** | Call broad peaks | Make MACS2 call broad peaks by linking nearby highly enriched regions | + | | | | **control\_file** | Control ChIP-Seq single-read experiment | Indexed BAM file from the ChIP-Seq single-read experiment to be used as a control for MACS2 peak calling | | Null | control\_file/bambai\_pair | | **exp\_fragment\_size** | Expected fragment size | Expected fragment size for read extenstion towards 3' end if *force\_fragment\_size* was set to True or if calculated by MACS2 fragment size was less that 80 bp | | 150 | | | **force\_fragment\_size** | Force peak calling with expected fragment size | Make MACS2 don't build the shifting model and use expected fragment size for read extenstion towards 3' end | | False | | | **clip\_3p\_end** | Clip from 3' end | Number of base pairs to clip from 3' end | | 0 | | | **clip\_5p\_end** | Clip from 5' end | Number of base pairs to clip from 5' end | | 0 | | | **remove\_duplicates** | Remove PCR duplicates | Remove PCR duplicates from sorted BAM file | | False | | | **threads** | Number of threads | Number of threads for those steps that support multithreading | | 2 | | ## Outputs | ID | Label | Description | Required | Visualization | | ------------------------ | ---------------------------------- | ------------------------------------------------------------------------------------ | :------: | ------------------------------------------------------------------ | | **fastx\_statistics** | FASTQ quality statistics | FASTQ quality statistics in TSV format | + | *Base Frequency* and *Quality Control* plots in *QC Plots* tab | | **bambai\_pair** | Aligned reads | Coordinate sorted BAM alignment and index BAI files | + | *Nucleotide Sequence Alignments* track in *IGV Genome Browser* tab | | **bigwig** | Genome coverage | Genome coverage in bigWig format | + | *Genome Coverage* track in *IGV Genome Browser* tab | | **iaintersect\_result** | Gene annotated peaks | MACS2 peak file annotated with nearby genes | + | *Peak Coordinates* table in *Peak Calling* tab | | **atdp\_result** | Average Tag Density Plot | Average Tag Density Plot file in TSV format | + | *Average Tag Density Plot* in *QC Plots* tab | | **macs2\_called\_peaks** | Called peaks | Called peaks file with 1-based coordinates in XLS format | + | | | **macs2\_narrow\_peaks** | Narrow peaks | Called peaks file in ENCODE narrow peak format | | *Narrow peaks* track in *IGV Genome Browser* tab | | **macs2\_broad\_peaks** | Broad peaks | Called peaks file in ENCODE broad peak format | | *Broad peaks* track in *IGV Genome Browser* tab | | **preseq\_estimates** | Expected Distinct Reads Count Plot | Expected distinct reads count file from Preseq in TSV format | | *Expected Distinct Reads Count Plot* in *QC Plots* tab | | **workflow\_statistics** | Workflow execution statistics | Overall workflow execution statistics from bowtie\_aligner and samtools\_rmdup steps | + | *Overview* tab and experiment's preview | | **bowtie\_log** | Read alignment log | Read alignment log file from Bowtie | + | | |
Path: workflows/chipseq-se.cwl Branch/Commit ID: 4360fb2e778ecee42e5f78f83b78c65ab3a2b1df |
|
|
|
DESeq2 (LRT) - differential gene expression analysis using likelihood ratio test
Runs DESeq2 using LRT (Likelihood Ratio Test) ============================================= The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. **Technical notes** 1. At least two biological replicates are required for every compared category 2. Metadata file describes relations between compared experiments, for example ``` ,time,condition DH1,day5,WT DH2,day5,KO DH3,day7,WT DH4,day7,KO DH5,day7,KO ``` where `time, condition, day5, day7, WT, KO` should be a single words (without spaces) and `DH1, DH2, DH3, DH4, DH5` correspond to the experiment aliases set in **RNA-Seq experiments** input. 3. Design and reduced formulas should start with **~** and include categories or, optionally, their interactions from the metadata file header. See details in DESeq2 manual [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions) and [here](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test) 4. Contrast should be set based on your metadata file header and available categories in a form of `Factor Numerator Denominator`, where `Factor` - column name from metadata file, `Numerator` - category from metadata file to be used as numerator in fold change calculation, `Denominator` - category from metadata file to be used as denominator in fold change calculation. For example `condition WT KO`. |
Path: workflows/deseq-lrt.cwl Branch/Commit ID: ad948b2691ef7f0f34de38f0102c3cd6f5182b29 |
|
|
|
Trim Galore RNA-Seq pipeline paired-end strand specific
Modified original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) **RNA-Seq** basic analysis for a **pair-end** experiment. A corresponded input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. Current workflow should be used only with the single-end RNA-Seq data. It performs the following steps: 1. Trim adapters from input FASTQ files 2. Use STAR to align reads from input FASTQ files according to the predefined reference indices; generate unsorted BAM file and alignment statistics file 3. Use fastx_quality_stats to analyze input FASTQ files and generate quality statistics files 4. Use samtools sort to generate coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained on the step 1 (after running STAR) 5. Generate BigWig file on the base of sorted BAM file 6. Map input FASTQ files to predefined rRNA reference indices using Bowtie to define the level of rRNA contamination; export resulted statistics to file 7. Calculate isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; export results to file |
Path: workflows/trim-rnaseq-pe-dutp.cwl Branch/Commit ID: 2c486543c335bb99b245dfe7e2f033f535efb9cf |
|
|
|
mut.cwl
|
Path: tests/wf/mut.cwl Branch/Commit ID: 65aedc5e7e1f3ccace7f9022f8a54b3f0d5c9a8c |
