Explore Workflows

Graph Name Retrieved From View
workflow graph exomeseq-03-organizedirectories.cwl

https://github.com/Duke-GCB/bespin-cwl.git

Path: subworkflows/exomeseq-03-organizedirectories.cwl

Branch/Commit ID: 216ff9bf78130add564f7bcfba6385d5dab4c77d

workflow graph CLIP-Seq pipeline for single-read experiment NNNNG

Cross-Linking ImmunoPrecipitation
=================================

`CLIP` (`cross-linking immunoprecipitation`) is a method used in molecular biology that combines UV cross-linking with immunoprecipitation in order to analyse protein interactions with RNA or to precisely locate RNA modifications (e.g. m6A). (Uhl|Houwaart|Corrado|Wright|Backofen|2017)(Ule|Jensen|Ruggiu|Mele|2003)(Sugimoto|König|Hussain|Zupan|2012)(Zhang|Darnell|2011) (Ke| Alemu| Mertens| Gantman|2015)

CLIP-based techniques can be used to map RNA binding protein binding sites or RNA modification sites (Ke| Alemu| Mertens| Gantman|2015)(Ke| Pandya-Jones| Saito| Fak|2017) of interest on a genome-wide scale, thereby increasing the understanding of post-transcriptional regulatory networks.

The identification of sites where RNA-binding proteins (RNABPs) interact with target RNAs opens the door to understanding the vast complexity of RNA regulation. UV cross-linking and immunoprecipitation (CLIP) is a transformative technology in which RNAs purified from _in vivo_ cross-linked RNA-protein complexes are sequenced to reveal footprints of RNABP:RNA contacts. CLIP combined with high-throughput sequencing (HITS-CLIP) is a generalizable strategy to produce transcriptome-wide maps of RNA binding with higher accuracy and resolution than standard RNA immunoprecipitation (RIP) profiling or purely computational approaches.

The application of CLIP to Argonaute proteins has expanded the utility of this approach to mapping binding sites for microRNAs and other small regulatory RNAs. Finally, recent advances in data analysis take advantage of cross-link–induced mutation sites (CIMS) to refine RNA-binding maps to single-nucleotide resolution. Once IP conditions are established, HITS-CLIP takes ~8 d to prepare RNA for sequencing. Established pipelines for data analysis, including those for CIMS, take 3–4 d.

Workflow
--------

CLIP begins with the in-vivo cross-linking of RNA-protein complexes using ultraviolet light (UV). Upon UV exposure, covalent bonds are formed between proteins and nucleic acids that are in close proximity. (Darnell|2012)

The cross-linked cells are then lysed, and the protein of interest is isolated via immunoprecipitation. In order to allow for sequence specific priming of reverse transcription, RNA adapters are ligated to the 3' ends, while radiolabeled phosphates are transferred to the 5' ends of the RNA fragments. The RNA-protein complexes are then separated from free RNA using gel electrophoresis and membrane transfer.

Proteinase K digestion is then performed in order to remove protein from the RNA-protein complexes. This step leaves a peptide at the cross-link site, allowing for the identification of the cross-linked nucleotide. (König| McGlincy| Ule|2012)

After ligating RNA linkers to the RNA 5' ends, cDNA is synthesized via RT-PCR. High-throughput sequencing is then used to generate reads containing distinct barcodes that identify the last cDNA nucleotide. Interaction sites can be identified by mapping the reads back to the transcriptome.
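The final mapping step can be illustrated with a minimal Python sketch. It is not taken from `clipseq-se.cwl`. It assumes the truncation-based convention in which cDNA synthesis stops at the peptide left on the cross-linked nucleotide, so the cross-link site is the position immediately upstream of each mapped read's 5' end. The `AlignedRead` type and both function names are hypothetical, and coordinates are assumed to be 0-based and half-open, as in BED.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """A mapped read in 0-based, half-open coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def crosslink_position(read: AlignedRead) -> int:
    """Return the 0-based position immediately upstream of the read's 5' end.

    Assumption: reverse transcription truncated at the cross-link site, so the
    cross-linked nucleotide sits one base before the first sequenced base.
    """
    if read.strand == "+":
        return read.start - 1
    if read.strand == "-":
        # On the minus strand the 5' end is at the highest coordinate, and
        # "upstream" means one base further to the right.
        return read.end
    raise ValueError(f"unexpected strand: {read.strand!r}")


def count_crosslinks(reads: Iterable[AlignedRead]) -> Counter:
    """Count how many reads support each (chrom, position, strand) site."""
    counts: Counter = Counter()
    for read in reads:
        site: Tuple[str, int, str] = (read.chrom, crosslink_position(read), read.strand)
        counts[site] += 1
    return counts
```

Sites supported by many independent reads are candidate interaction sites. A real analysis would also collapse PCR duplicates using the barcodes before counting.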

https://github.com/datirium/workflows.git

Path: workflows/clipseq-se.cwl

Branch/Commit ID: 9ee330737f4603e4e959ffe786fbb2046db70a00

workflow graph CUT&RUN/TAG MACS2 pipeline paired-end

A basic analysis workflow for paired-read CUT&RUN and CUT&TAG sequencing experiments. These sequencing library prep methods are ultra-sensitive chromatin mapping technologies compared to the ChIP-Seq methodology. Their primary benefits include 1) length filtering, 2) a higher signal-to-noise ratio, and 3) built-in normalization for between-sample comparisons.

This workflow utilizes the tool MACS2, which calls enriched regions in the target sequence data by identifying the top regions by area under a Poisson distribution (of the alignment pileup). This workflow is loosely based on the [CUT-RUNTools-2.0 pipeline](https://github.com/fl-yu/CUT-RUNTools-2.0), and the ChIP-Seq pipeline from [BioWardrobe](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) was used as a CWL template.

### __Inputs__

*General Info (required\*):*

- Experiment short name/Alias* - a unique name for the sample (e.g. what was used on tubes while processing it)
- Cells* - sample cell type or organism name
- Conditions* - experimental condition name
- Catalog # - catalog number for cells from vendor/supplier
- Primary [genome index](https://scidap.com/tutorials/basic/genome-indices) for peak calling* - preprocessed genome index of sample organism for primary alignment and peak calling
- Secondary [genome index](https://scidap.com/tutorials/basic/genome-indices) for spike-in normalization* - preprocessed genome index of spike-in organism for secondary alignment (of unaligned reads from primary alignment) and spike-in normalization, default should be E. coli K-12
- FASTQ file for R1* - read 1 file of a paired-end library
- FASTQ file for R2* - read 2 file of a paired-end library

*Advanced:*

- Number of bases to clip from the 3p end - used by bowtie aligner to trim `<int>` bases from 3' (right) end of reads
- Number of bases to clip from the 5p end - used by bowtie aligner to trim `<int>` bases from 5' (left) end of reads
- Call samtools rmdup to remove duplicates from sorted BAM file? - toggle on/off to remove duplicate reads from analysis
- Fragment Length Filter will retain fragments between set base pair (bp) ranges for peak analysis - drop down menu
  - `default_below_1000` retains fragments <1000 bp
  - `histones_130_to_300` retains fragments between 130-300 bp
  - `TF_below_130` retains fragments <130 bp
- Max distance (bp) from gene TSS (in both directions) overlapping which the peak will be assigned to the promoter region - default set to `1000`
- Max distance (bp) from the promoter (only in upstream directions) overlapping which the peak will be assigned to the upstream region - default set to `20000`
- Number of threads for steps that support multithreading - default set to `2`

### __Outputs__

Intermediate and final downloadable outputs include:

- IGV with gene, BigWig (raw and normalized), and stringent peak tracks
- quality statistics and visualizations for both R1/R2 input FASTQ files
- coordinate sorted BAM file with associated BAI file for primary alignment
- read pileup/coverage in BigWig format (raw and normalized)
- cleaned bed files (containing fragment coordinates), and spike-in normalized peak-called BED files (also includes "narrow" and "broad" peaks)
- stringent peak call bed file with nearest gene annotations per peak

### __Data Analysis Steps__

1. Trimming the adapters with TrimGalore.
    - This step is particularly important when the reads are long and the fragments are short, resulting in sequencing adapters at the ends of reads. If the adapter is not removed, the read will not map. TrimGalore can recognize standard adapters, such as Illumina or Nextera/Tn5 adapters.
2. Generate quality control statistics of trimmed, unmapped sequence data.
3. (Optional) Clipping of 5' and/or 3' end by the specified number of bases.
4. Mapping reads to primary genome index with Bowtie.
    - Only uniquely mapped reads with fewer than 3 mismatches are used in the downstream analysis. Results are then sorted and indexed. Final outputs are in bam/bai format, which are also used to extrapolate effects of additional sequencing based on library complexity.
5. (Optional) Removal of duplicates (reads/pairs of reads mapping to exactly the same location).
    - This step is used to remove reads overamplified during amplification of the library. Unfortunately, it may also remove "good" reads. We usually do not remove duplicates unless the library is heavily duplicated.
6. Mapping unaligned reads from primary alignment to secondary genome index with Bowtie.
    - This step is used to obtain the number of reads for normalization, used to scale the pileups from the primary alignment. After normalization, sample pileups/peaks may then be appropriately compared to one another, assuming an equal use of spike-in material during library preparation. Note the default genome index for this step should be *E. coli* K-12 if no spike-in material was called out in the library protocol. Refer to [Step 16](https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-e6nvw93x7gmk/v1?step=16#step-4A3D8C70DC3011EABA5FF3676F0827C5) of the "CUT&Tag Data Processing and Analysis Tutorial" by Zheng Y et al (2020). protocols.io.
7. Formatting alignment file to account for fragments based on paired-end BAM.
    - Generates a filtered and normalized bed file to be used as input for peak calling.
8. Call enriched regions using MACS2.
    - This step calls peaks (broad and narrow) using the MACS2 tool with default parameters and no normalization to a control sample.
9. Generation and formatting of output files.
    - This step collects read, alignment, and peak statistics, and generates BigWig coverage/pileup files for display on the browser using IGV. The coverage shows the number of fragments that cover each base in the genome, both normalized and unnormalized to the calculated spike-in scaling factor.

### __References__

- Meers MP, Tenenbaum D, Henikoff S. (2019). Peak calling by Sparse Enrichment Analysis for CUT&RUN chromatin profiling. Epigenetics and Chromatin 12(1):42.
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
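The fragment length filter and the spike-in normalization described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used by `cutandrun-macs2-pe.cwl`. The function names and the `FRAGMENT_FILTERS` table are hypothetical, the inclusivity of the 130-300 bp bounds is a guess, and the scale factor follows the "constant divided by spike-in read count" form used in the Zheng et al. tutorial, with an arbitrary constant of 10000.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Fragment = Tuple[str, int, int]  # chrom, start, end (0-based, half-open)

# (min_length, max_length) in bp, both inclusive; None means unbounded.
# Assumption: the source only states "<1000", "130-300" and "<130".
FRAGMENT_FILTERS: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "default_below_1000": (None, 999),
    "histones_130_to_300": (130, 300),
    "TF_below_130": (None, 129),
}


def filter_fragments(
    fragments: Iterable[Fragment], mode: str = "default_below_1000"
) -> Iterator[Fragment]:
    """Yield only the fragments whose length falls inside the chosen range."""
    low, high = FRAGMENT_FILTERS[mode]
    for chrom, start, end in fragments:
        length = end - start
        if low is not None and length < low:
            continue
        if high is not None and length > high:
            continue
        yield (chrom, start, end)


def spike_in_scale_factor(spike_in_reads: int, constant: float = 10000.0) -> float:
    """Return constant / spike-in read count (assumed normalization form).

    Samples with more spike-in reads get a smaller factor, so their primary
    pileups are scaled down relative to samples with fewer spike-in reads.
    """
    if spike_in_reads <= 0:
        raise ValueError("spike-in read count must be positive")
    return constant / spike_in_reads
```

Multiplying each sample's raw coverage by its factor puts samples on a common scale, provided equal amounts of spike-in material were used during library preparation.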

https://github.com/datirium/workflows.git

Path: workflows/cutandrun-macs2-pe.cwl

Branch/Commit ID: 22880e0f41d0420a17d643e8a6e8ee18165bbfbf

workflow graph deeptools - Tag enrichment heatmap and density profile for filtered regions

Generates tag density heatmap and histogram for the list of features in a headerless regions file. Inputs used are the bigWig file(s) of one or more ChIP/ATAC/C&R samples, and one or more filtered feature file(s) from the filtering and/or set operation workflows. The latter format contains `chrom start end name score strand`; only the first 3 columns are used in the deeptools computeMatrix tool. The matrix is then used as input to plotHeatmap to generate the tag density plot and tag enrichment heatmap.

computeMatrix parameters:

- `--regionsFileName, -R`: File name, in BED format, containing the regions to plot. If multiple bed files are given, each one is considered a group that can be plotted separately. Also, adding a “#” symbol in the bed file causes all the regions until the previous “#” to be considered one group.
- `--scoreFileName, -S`: bigWig file(s) containing the scores to be plotted. BigWig files can be obtained by using the bamCoverage or bamCompare tools. More information about the bigWig file format can be found at http://genome.ucsc.edu/goldenPath/help/bigWig.html
- `--outFileName, -o`: File name to save the gzipped matrix file needed by the “plotHeatmap” and “plotProfile” tools.
- `--beforeRegionStartLength=0, -b=0, --upstream=0`: Distance upstream of the start site of the regions defined in the region file. If the regions are genes, this would be the distance upstream of the transcription start site.
- `--regionBodyLength=1000, -m=1000`: Distance in bases to which all regions will be fit.
- `--afterRegionStartLength=0, -a=0, --downstream=0`: Distance downstream of the end site of the given regions. If the regions are genes, this would be the distance downstream of the transcription end site.
- `--numberOfProcessors=max/2, -p=max/2`: Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.

plotHeatmap parameters:

- `--matrixFile, -m`: Matrix file from the computeMatrix tool.
- `--outFileName, -out`: File name to save the image to. The file ending will be used to determine the image format. The available options are: “png”, “eps”, “pdf” and “svg”, e.g., MyHeatmap.png.
- `--sortRegions=descend`: Whether the heatmap should present the regions sorted. The default is to sort in descending order based on the mean value per region. Possible choices: descend, ascend, no
- `--sortUsing=mean`: Indicate which method should be used for sorting. For each row the method is computed. Possible choices: mean, median, max, min, sum, region_length
- `--colorMap=RdYlBu`: Color map to use for the heatmap. Available values can be seen here: http://matplotlib.org/users/colormaps.html The available options are: ‘Spectral’, ‘summer’, ‘coolwarm’, ‘Set1’, ‘Set2’, ‘Set3’, ‘Dark2’, ‘hot’, ‘RdPu’, ‘YlGnBu’, ‘RdYlBu’, ‘gist_stern’, ‘cool’, ‘gray’, ‘GnBu’, ‘gist_ncar’, ‘gist_rainbow’, ‘CMRmap’, ‘bone’, ‘RdYlGn’, ‘spring’, ‘terrain’, ‘PuBu’, ‘spectral’, ‘gist_yarg’, ‘BuGn’, ‘bwr’, ‘cubehelix’, ‘YlOrRd’, ‘Greens’, ‘PRGn’, ‘gist_heat’, ‘Paired’, ‘hsv’, ‘Pastel2’, ‘Pastel1’, ‘BuPu’, ‘copper’, ‘OrRd’, ‘brg’, ‘gnuplot2’, ‘jet’, ‘gist_earth’, ‘Oranges’, ‘PiYG’, ‘YlGn’, ‘Accent’, ‘gist_gray’, ‘flag’, ‘BrBG’, ‘Reds’, ‘RdGy’, ‘PuRd’, ‘Blues’, ‘Greys’, ‘autumn’, ‘pink’, ‘binary’, ‘winter’, ‘gnuplot’, ‘RdBu’, ‘prism’, ‘YlOrBr’, ‘rainbow’, ‘seismic’, ‘Purples’, ‘ocean’, ‘PuOr’, ‘PuBuGn’, ‘nipy_spectral’, ‘afmhot’
- `--kmeans`: Number of clusters to compute. When this option is set, the matrix is split into clusters using the k-means algorithm. Only works for data that is not grouped, otherwise only the first group will be clustered. If more specific clustering methods are required, then save the underlying matrix and run the clustering using other software. The plotting of the clustering may fail with an error if a cluster has very few members compared to the total number of regions.
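As a rough illustration of how the two tools chain together, the following Python sketch assembles the corresponding command lines from the parameters listed above. It only builds argument lists and does not run anything. The helper names are hypothetical, and the `scale-regions` mode of computeMatrix is an assumption inferred from the use of `--regionBodyLength`. The CWL workflow itself may pass the arguments differently.

```python
from typing import List, Optional, Sequence


def compute_matrix_cmd(
    regions: Sequence[str],
    bigwigs: Sequence[str],
    matrix_out: str,
    upstream: int = 0,
    body_length: int = 1000,
    downstream: int = 0,
    processors: str = "max/2",
) -> List[str]:
    """Build a computeMatrix argument list (scale-regions mode is assumed)."""
    return [
        "computeMatrix", "scale-regions",
        "--regionsFileName", *regions,   # each BED file is treated as one group
        "--scoreFileName", *bigwigs,     # one or more bigWig score files
        "--outFileName", matrix_out,     # gzipped matrix consumed by plotHeatmap
        "--beforeRegionStartLength", str(upstream),
        "--regionBodyLength", str(body_length),
        "--afterRegionStartLength", str(downstream),
        "--numberOfProcessors", processors,
    ]


def plot_heatmap_cmd(
    matrix_file: str,
    image_out: str,
    sort_regions: str = "descend",
    sort_using: str = "mean",
    color_map: str = "RdYlBu",
    kmeans: Optional[int] = None,
) -> List[str]:
    """Build a plotHeatmap argument list; --kmeans is added only when set."""
    cmd = [
        "plotHeatmap",
        "--matrixFile", matrix_file,
        "--outFileName", image_out,      # file extension selects the image format
        "--sortRegions", sort_regions,
        "--sortUsing", sort_using,
        "--colorMap", color_map,
    ]
    if kmeans is not None:
        cmd += ["--kmeans", str(kmeans)]
    return cmd
```

With deeptools installed, each list could be passed to `subprocess.run(cmd, check=True)`, running computeMatrix first so that its matrix file exists before plotHeatmap reads it.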

https://github.com/datirium/workflows.git

Path: workflows/deeptools.cwl

Branch/Commit ID: fa4f172486288a1a9d23864f1d6962d85a453e16

workflow graph kmer_seq_entry_extract_wnode

https://github.com/ncbi/pgap.git

Path: task_types/tt_kmer_seq_entry_extract_wnode.cwl

Branch/Commit ID: a2d6cd4c53bf3501f6bd79edebb7ca30bba8456f

workflow graph kf_alignment_fq_input_wf.cwl

https://github.com/kids-first/kf-alignment-workflow.git

Path: workflows/kf_alignment_fq_input_wf.cwl

Branch/Commit ID: 55315b6abb488f1f25fe725407814e8d4c23ba81

workflow graph scatter-valuefrom-wf2.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/scatter-valuefrom-wf2.cwl

Branch/Commit ID: 2ae8117360a3cd4909d9d3f2b35c30bfffb25d0a

workflow graph env-wf3.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/env-wf3.cwl

Branch/Commit ID: 6003cbb94f16103241b562f2133e7c4acac6c621

workflow graph gcaccess_from_list

https://github.com/ncbi/pgap.git

Path: task_types/tt_gcaccess_from_list.cwl

Branch/Commit ID: 7f9cfcbda5998b164bd1d8f1f6006aefda0f47f3

workflow graph cond-wf-004.1.cwl

https://github.com/common-workflow-language/cwl-utils.git

Path: testdata/cond-wf-004.1.cwl

Branch/Commit ID: e949503ac0dd7e22ba9b04ac51926d13780f9cee