Explore Workflows

View already parsed workflows here or click here to add your own

Graph	Name	Retrieved From	View
	MAnorm PE - quantitative comparison of ChIP-Seq paired-end data What is MAnorm? -------------- MAnorm is a robust model for quantitative comparison of ChIP-Seq data sets of TFs (transcription factors) or epigenetic modifications and you can use it for: * Normalization of two ChIP-seq samples * Quantitative comparison (differential analysis) of two ChIP-seq samples * Evaluating the overlap enrichment of the protein binding sites (peaks) * Elucidating underlying mechanisms of cell-type specific gene regulation How does MAnorm work? ---------------- MAnorm uses common peaks of two samples as a reference to build the rescaling model for normalization, which is based on the empirical assumption that if a chromatin-associated protein has a substantial number of peaks shared in two conditions, the binding at these common regions will tend to be determined by similar mechanisms, and thus should exhibit similar global binding intensities across samples. The observed differences on common peaks are presumed to reflect the scaling relationship of ChIP-Seq signals between two samples, which can be applied to all peaks. What do the inputs mean? ---------------- ### General Experiment short name/Alias * short name for your experiment to identify it among the others ChIP-Seq PE sample 1 * previously analyzed ChIP-Seq paired-end experiment to be used as Sample 1 ChIP-Seq PE sample 2 * previously analyzed ChIP-Seq paired-end experiment to be used as Sample 2 Genome * Reference genome to be used for gene assignment ### Advanced Reads shift size for sample 1 * This value is used to shift reads towards the 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100 Reads shift size for sample 2 * This value is used to shift reads towards the 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100 M-value (log2-ratio) cutoff * Absolute M-value (log2-ratio) cutoff to define biased (differential binding) peaks. 
Default: 1.0 P-value cutoff * P-value cutoff to define biased peaks. Default: 0.01 Window size * Window size to count reads and calculate read densities. 2000 is recommended for sharp histone marks like H3K4me3 and H3K27ac, and 1000 for TFs or DNase-seq. Default: 2000	https://github.com/datirium/workflows.git Path: workflows/manorm-pe.cwl Branch/Commit ID: 36fd18f11e939d3908b1eca8d2939402f7a99b0f
	mut3.cwl	https://github.com/common-workflow-language/cwltool.git Path: tests/wf/mut3.cwl Branch/Commit ID: 6cfef62c21330672538fd5e9b45ec888569c0a6f
	tt_blastn_wnode	https://github.com/ncbi/pgap.git Path: task_types/tt_blastn_wnode.cwl Branch/Commit ID: 7b5130d2408bce82ee15c666b37d931ef6f452e3
	gcaccess_from_list	https://github.com/ncbi/pgap.git Path: task_types/tt_gcaccess_from_list.cwl Branch/Commit ID: f390475a4e0898d4933f0a28dae278aa35803eb1
	Salmon quantification, FASTQ -> H5AD count matrix	https://github.com/hubmapconsortium/salmon-rnaseq.git Path: steps/salmon-quantification.cwl Branch/Commit ID: 893136839e67fba983f2d22be100fd2db0adc9d9
	cache_asnb_entries	https://github.com/ncbi/pgap.git Path: task_types/tt_cache_asnb_entries.cwl Branch/Commit ID: f390475a4e0898d4933f0a28dae278aa35803eb1
	CLIP-Seq pipeline for single-read experiment NNNNG Cross-Linking ImmunoPrecipitation ================================= `CLIP` (`cross-linking immunoprecipitation`) is a method used in molecular biology that combines UV cross-linking with immunoprecipitation in order to analyse protein interactions with RNA or to precisely locate RNA modifications (e.g. m6A). (Uhl, Houwaart, Corrado, Wright, Backofen, 2017) (Ule, Jensen, Ruggiu, Mele, 2003) (Sugimoto, König, Hussain, Zupan, 2012) (Zhang, Darnell, 2011) (Ke, Alemu, Mertens, Gantman, 2015) CLIP-based techniques can be used to map RNA binding protein binding sites or RNA modification sites (Ke, Alemu, Mertens, Gantman, 2015) (Ke, Pandya-Jones, Saito, Fak, 2017) of interest on a genome-wide scale, thereby increasing the understanding of post-transcriptional regulatory networks. The identification of sites where RNA-binding proteins (RNABPs) interact with target RNAs opens the door to understanding the vast complexity of RNA regulation. UV cross-linking and immunoprecipitation (CLIP) is a transformative technology in which RNAs purified from _in vivo_ cross-linked RNA-protein complexes are sequenced to reveal footprints of RNABP:RNA contacts. CLIP combined with high-throughput sequencing (HITS-CLIP) is a generalizable strategy to produce transcriptome-wide maps of RNA binding with higher accuracy and resolution than standard RNA immunoprecipitation (RIP) profiling or purely computational approaches. The application of CLIP to Argonaute proteins has expanded the utility of this approach to mapping binding sites for microRNAs and other small regulatory RNAs. Finally, recent advances in data analysis take advantage of cross-link–induced mutation sites (CIMS) to refine RNA-binding maps to single-nucleotide resolution. Once IP conditions are established, HITS-CLIP takes ~8 d to prepare RNA for sequencing. Established pipelines for data analysis, including those for CIMS, take 3–4 d. 
Workflow -------- CLIP begins with the in vivo cross-linking of RNA-protein complexes using ultraviolet light (UV). Upon UV exposure, covalent bonds are formed between proteins and nucleic acids that are in close proximity. (Darnell, 2012) The cross-linked cells are then lysed, and the protein of interest is isolated via immunoprecipitation. In order to allow for sequence-specific priming of reverse transcription, RNA adapters are ligated to the 3' ends, while radiolabeled phosphates are transferred to the 5' ends of the RNA fragments. The RNA-protein complexes are then separated from free RNA using gel electrophoresis and membrane transfer. Proteinase K digestion is then performed in order to remove protein from the RNA-protein complexes. This step leaves a peptide at the cross-link site, allowing for the identification of the cross-linked nucleotide. (König, McGlincy, Ule, 2012) After ligating RNA linkers to the RNA 5' ends, cDNA is synthesized via RT-PCR. High-throughput sequencing is then used to generate reads containing distinct barcodes that identify the last cDNA nucleotide. Interaction sites can be identified by mapping the reads back to the transcriptome.	https://github.com/datirium/workflows.git Path: workflows/clipseq-se.cwl Branch/Commit ID: 564156a9e1cc7c3679a926c479ba3ae133b1bfd4
	Genelists heatmap - RNA-seq expression data visualized # Genelists heatmap - RNA-seq expression data visualized This visualization workflow takes as input 1 or more genelists derived from the DESeq and/or diffbind workflows along with user-selected samples and visualizes RNA-Seq expression data in a single morpheus heatmap. ### __References__ - Morpheus, https://software.broadinstitute.org/morpheus	https://github.com/datirium/workflows.git Path: workflows/genelists-deseq-only.cwl Branch/Commit ID: d76110e0bfc40c874f82e37cef6451d74df4f908
	gp_makeblastdb	https://github.com/ncbi/pgap.git Path: progs/gp_makeblastdb.cwl Branch/Commit ID: e668f9c4047f1971ae53040a5af3eccc4bfc3c53
	Kraken2 Metagenomic pipeline paired-end This workflow taxonomically classifies paired-end sequencing reads in FASTQ format, that have been optionally adapter trimmed with trimgalore, using Kraken2 and a user-selected pre-built database from a list of [genomic index files](https://benlangmead.github.io/aws-indexes/k2). ### __Inputs__ Kraken2 database for taxonomic classification: - [Viral (0.5 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20221209.tar.gz), all refseq viral genomes - [MinusB (8.7 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_minusb_20221209.tar.gz), standard minus bacteria (archaea, viral, plasmid, human1, UniVec_Core) - [PlusPFP-16 (15.0 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_16gb_20221209.tar.gz), standard (archaea, bacteria, viral, plasmid, human1, UniVec_Core) + (protozoa, fungi & plant) capped at 16 GB (shrunk via random kmer downselect) - [EuPathDB46 (34.1 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_eupathdb48_20201113.tar.gz), eukaryotic pathogen genomes with contaminants removed (https://veupathdb.org/veupathdb/app) - [16S_gg_13_5 (73 MB)](https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz), Greengenes 16S rRNA database ([release 13.5](https://greengenes.secondgenome.com/?prefix=downloads/greengenes_database/gg_13_5/), 20200326) - [16S_silva_138 (112 MB)](https://genome-idx.s3.amazonaws.com/kraken/16S_Silva138_20200326.tgz), SILVA 16S rRNA database ([release 138.1](https://www.arb-silva.de/documentation/release-1381/), 20200827) Read 1 file: - FASTA/Q input R1 from a paired-end library Read 2 file: - FASTA/Q input R2 from a paired-end library Number of threads for steps that support multithreading: - Number of threads for steps that support multithreading - default set to `4` Advanced Inputs Tab (Optional): - Number of bases to clip from the 3p end - Number of bases to clip from the 5p end ### __Outputs__ - k2db, an upstream database used by kraken2 classifier ### __Data 
Analysis Steps__ 1. Trimming the adapters with TrimGalore. - This step is particularly important when the reads are long and the fragments are short, resulting in sequencing adapters at the ends of reads. If the adapter is not removed, the read will not map. TrimGalore can recognize standard adapters, such as Illumina or Nextera/Tn5 adapters. 2. Generate quality control statistics of trimmed, unmapped sequence data 3. (Optional) Clipping of 5' and/or 3' end by the specified number of bases. 4. Mapping reads to primary genome index with Bowtie. ### __References__ - Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0	https://github.com/datirium/workflows.git Path: workflows/kraken2-classify-pe.cwl Branch/Commit ID: 36fd18f11e939d3908b1eca8d2939402f7a99b0f
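
The MAnorm PE entry in the table above describes building a rescaling model from peaks common to both samples and then applying it to all peaks, with an absolute M-value (log2-ratio) cutoff to flag biased peaks. The following is a minimal sketch of that idea, not the MAnorm implementation: the function names, the pseudocount of 1, and the use of an ordinary least-squares line (in place of whatever robust fit the tool itself uses) are all assumptions made for illustration. The P-value cutoff is left out.

```python
import math


def ma_values(count1, count2, pseudocount=1.0):
    """Return (M, A) for one peak: M is the log2 ratio, A the mean log2 signal.

    The pseudocount is an assumption of this sketch; it avoids log2(0).
    """
    x = math.log2(count1 + pseudocount)
    y = math.log2(count2 + pseudocount)
    return x - y, 0.5 * (x + y)


def fit_line(points):
    """Ordinary least-squares fit of M = intercept + slope * A.

    `points` is a list of (M, A) pairs taken from the common peaks only.
    """
    n = len(points)
    mean_a = sum(a for _, a in points) / n
    mean_m = sum(m for m, _ in points) / n
    sxx = sum((a - mean_a) ** 2 for _, a in points)
    sxy = sum((a - mean_a) * (m - mean_m) for m, a in points)
    slope = sxy / sxx
    return mean_m - slope * mean_a, slope


def normalized_m(peaks, common_peaks):
    """Rescale M for every peak using the model fitted on the common peaks."""
    intercept, slope = fit_line([ma_values(c1, c2) for c1, c2 in common_peaks])
    result = []
    for c1, c2 in peaks:
        m, a = ma_values(c1, c2)
        result.append(m - (intercept + slope * a))
    return result


def is_biased(m_norm, m_cutoff=1.0):
    """Flag a peak whose absolute normalized M reaches the cutoff (default 1.0)."""
    return abs(m_norm) >= m_cutoff


# Hypothetical read counts (sample 1, sample 2); sample 1 is sequenced about 2x deeper.
common = [(1, 0), (3, 1), (7, 3), (15, 7)]
candidates = [(31, 3), (15, 7)]
m_norm = normalized_m(candidates, common)
flags = [is_biased(m) for m in m_norm]
```

In this toy input the common peaks all show the same twofold offset, so the fitted model removes it: the second candidate ends up with a normalized M of zero, while the first keeps a fourfold difference and passes the default cutoff of 1.0.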