Explore Workflows


Graph Name Retrieved From View
workflow graph revsort.cwl

Reverse the lines in a document, then sort those lines.

https://github.com/common-workflow-language/cwltool.git

Path: tests/wf/revsort.cwl

Branch/Commit ID: a8d8d00fd1e4274e1bc16001937db5aae46b0b0d

workflow graph phase VCF

https://github.com/genome/analysis-workflows.git

Path: definitions/subworkflows/phase_vcf.cwl

Branch/Commit ID: c6bbd4cdd612b3b5cc6e9000df4800c21e192bf5

workflow graph count-lines3-wf.cwl

https://github.com/common-workflow-language/cwltool.git

Path: cwltool/schemas/v1.0/v1.0/count-lines3-wf.cwl

Branch/Commit ID: e8b3565a008d95859fc44227987a54e6a53a8c29

workflow graph etl_http.cwl

https://github.com/nci-gdc/gdc-dnaseq-cwl.git

Path: workflows/dnaseq/etl_http.cwl

Branch/Commit ID: f34d3963b33e0a379338cb3cb75b0016f012bf2c

workflow graph MAnorm PE - quantitative comparison of ChIP-Seq paired-end data

What is MAnorm?
--------------

MAnorm is a robust model for quantitative comparison of ChIP-Seq data sets of TFs (transcription factors) or epigenetic modifications, and you can use it for:

* Normalization of two ChIP-seq samples
* Quantitative comparison (differential analysis) of two ChIP-seq samples
* Evaluating the overlap enrichment of the protein binding sites (peaks)
* Elucidating underlying mechanisms of cell-type specific gene regulation

How does MAnorm work?
----------------

MAnorm uses common peaks of two samples as a reference to build the rescaling model for normalization. This is based on the empirical assumption that if a chromatin-associated protein has a substantial number of peaks shared in two conditions, the binding at these common regions will tend to be determined by similar mechanisms, and thus should exhibit similar global binding intensities across samples. The observed differences on common peaks are presumed to reflect the scaling relationship of ChIP-Seq signals between two samples, which can be applied to all peaks.

What do the inputs mean?
----------------

### General

**Experiment short name/Alias**

* Short name for your experiment to identify it among the others

**ChIP-Seq PE sample 1**

* Previously analyzed ChIP-Seq paired-end experiment to be used as Sample 1

**ChIP-Seq PE sample 2**

* Previously analyzed ChIP-Seq paired-end experiment to be used as Sample 2

**Genome**

* Reference genome to be used for gene assigning

### Advanced

**Reads shift size for sample 1**

* This value is used to shift reads towards the 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100

**Reads shift size for sample 2**

* This value is used to shift reads towards the 3' direction to determine the precise binding site. Set as half of the fragment length. Default: 100

**M-value (log2-ratio) cutoff**

* Absolute M-value (log2-ratio) cutoff to define biased (differential binding) peaks. Default: 1.0

**P-value cutoff**

* P-value cutoff to define biased peaks. Default: 0.01

**Window size**

* Window size to count reads and calculate read densities. 2000 is recommended for sharp histone marks like H3K4me3 and H3K27ac, and 1000 for TFs or DNase-seq. Default: 2000
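
The rescaling idea described above can be illustrated with a minimal Python sketch. It is not the MAnorm implementation: the function names, the A-value definition (mean log2 intensity), and the ordinary least-squares fit are assumptions chosen for illustration, and the actual tool may fit the model differently (for example, with a robust regression). Read counts are assumed to be positive, so a caller would add a pseudocount first.

```python
import math


def ma_values(reads1, reads2):
    """Return per-peak (M, A) lists from positive read counts.

    M is the log2 ratio between the two samples; A is the mean log2
    intensity (an assumed definition, used here for illustration).
    """
    m = [math.log2(r1 / r2) for r1, r2 in zip(reads1, reads2)]
    a = [0.5 * math.log2(r1 * r2) for r1, r2 in zip(reads1, reads2)]
    return m, a


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalized_m(reads1, reads2, slope, intercept):
    """Subtract the trend learned on common peaks from each peak's M-value."""
    m, a = ma_values(reads1, reads2)
    return [mi - (slope * ai + intercept) for mi, ai in zip(m, a)]


# Hypothetical read counts on peaks shared by both samples.
common_s1 = [20, 40, 80, 160]
common_s2 = [10, 20, 40, 80]
m_common, a_common = ma_values(common_s1, common_s2)
slope, intercept = fit_line(a_common, m_common)

# Apply the model to every peak, then flag those above the M-value cutoff.
all_m = normalized_m([80, 40], [10, 20], slope, intercept)
biased = [abs(m) >= 1.0 for m in all_m]
```

In this toy data sample 1 is sequenced twice as deeply on every common peak, so the fitted model removes that global factor before the M-value cutoff is applied.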

https://github.com/datirium/workflows.git

Path: workflows/manorm-pe.cwl

Branch/Commit ID: b5e16e359007150647b14dc6e038f4eb8dccda79

workflow graph ani_top_n

https://github.com/ncbi/pgap.git

Path: task_types/tt_ani_top_n.cwl

Branch/Commit ID: ac387721a55fd91df3dcdf16e199354618b136d1

workflow graph Create tagAlign file

This workflow creates a tagAlign file.

https://github.com/ncbi/cwl-ngs-workflows-cbb.git

Path: workflows/File-formats/create-tagAlign.cwl

Branch/Commit ID: ebf1dd3c243c08634b0b3d9766c0a354903920ee

workflow graph 16S metagenomic paired-end QIIME2 Analysis (differential abundance)

A workflow for processing multiple 16S samples from within the SciDAP platform, via a QIIME2 pipeline.

## __Outputs__

#### Output files:

Primary output files:

- overview.md, list of inputs
- demux.qzv, summary visualizations of imported data
- alpha-rarefaction.qzv, plot of OTU rarefaction
- taxa-bar-plots.qzv, relative frequency of taxonomies barplot
- table.qza, table containing how many sequences are associated with each sample and with each feature (OTU)

Optional output files:

- pcoa-unweighted-unifrac-emperor.qzv, PCoA using unweighted unifrac method
- pcoa-bray-curtis-emperor.qzv, PCoA using bray curtis method
- heatmap.qzv, output from gneiss differential abundance analysis using unsupervised correlation-clustering method (this will define the partitions of microbes that commonly co-occur with each other using Ward hierarchical clustering)
- ancom-$LEVEL.qzv, output from ANCOM differential abundance analysis at family, genus, and species taxonomic levels (includes volcano plot)

## __Inputs__

#### General Info

- Sample short name/Alias: Used for the sample name in downstream analyses. Ensure this is the same name used in the metadata samplesheet.
- metadata_file: Path to the TSV file containing experiment metadata. The first column must have the header "sample-id" with sample names exactly as they have been input into your SciDAP project. The remaining column headers are experiment-specific. NOTE: Custom Label parameter metadata must be INT data type.
- Metadata header name for PCoA axis label: Must be identical to one of the headers of the metadata file. Values under this metadata header must be INT. Required for PCoA analysis.
- Rarefaction normalization sampling depth: Required for differential abundance analyses (along with group and taxonomic level). This step will subsample the counts in each sample without replacement so that each sample in the resulting table has a total count of INT. If the total count for any sample is smaller than this value, that sample will be dropped from further analysis. We recommend making your choice by reviewing the rarefaction plot. Choose a value that is as high as possible (so you retain more sequences per sample) while excluding as few samples as possible.
- Metadata header name for differential abundance analyses: Required for differential abundance analyses (along with sampling depth and taxonomic level). Group/experimental condition column name from sample metadata file. Must be identical to one of the headers of the sample-metadata file. The corresponding column should only have two groups/conditions.
- Taxonomic level for differential abundance analysis: Required for differential abundance analyses (along with sampling depth and group). Collapses the OTU table at the taxonomic level of interest for differential abundance analysis with ANCOM. Default: Genus
- 16S samples for combined analysis: Upstream 16S samples for combined analysis. R1 and R2 fastq are used for generating the manifest file for data import to qiime2.
- Trim 5' of R1: Recommended if adapters are still on the input sequences. Trims the first J bases from the 5' end of each forward read.
- Trim 5' of R2: Recommended if adapters are still on the input sequences. Trims the first K bases from the 5' end of each reverse read.
- Truncate 3' of R1: Recommended if quality drops off along the length of the read. Clips the forward read starting M bases from the 5' end (before trimming).
- Truncate 3' of R2: Recommended if quality drops off along the length of the read. Clips the reverse read starting N bases from the 5' end (before trimming).
- Threads: Number of threads to use for steps that support multithreading.

### __Data Analysis Steps__

1. Import all sample read data, make a qiime artifact (demux.qza), and summary visualization.
2. Denoising will detect and correct (where possible) errors in Illumina amplicon sequence data. This process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences.
3. Generate a phylogenetic tree for diversity analyses and rarefaction processing and plotting.
4. Taxonomy classification of amplicons. Performed using a Naive Bayes classifier trained on the Greengenes2 database "gg_2022_10_backbone_full_length.nb.qza".
5. If "Metadata header name for PCoA axis label" is provided, principal coordinates analysis (PCoA) will be performed using the unweighted unifrac and bray curtis methods. 3D plots are produced with PCo1, PCo2, and the provided axis label on the x, y, and z axes.
6. If the sampling depth and metadata header for differential analysis are provided, differential abundance analysis will be performed using Gneiss and ANCOM methods at the family, genus, and species taxonomic levels. An unsupervised hierarchical clustering heatmap (Gneiss) and volcano plot (ANCOM) are produced at the taxonomic level between the specified groups.

### __References__

1. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9
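
The rarefaction step described under the sampling depth input (subsample each sample without replacement to a fixed total, and drop samples whose total is below that depth) can be sketched in plain Python. This is an illustration only, not the QIIME2 implementation; the `rarefy` function, the dictionary-based table layout, and the sample names are all hypothetical.

```python
import random


def rarefy(table, depth, seed=0):
    """Subsample each sample's feature counts to exactly `depth` reads.

    `table` maps a sample name to a list of per-feature counts.
    Samples whose total count is below `depth` are dropped.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        if sum(counts) < depth:
            continue  # too shallow: excluded from further analysis
        # Expand counts into one entry per read, labelled by feature index.
        pool = [feature for feature, n in enumerate(counts) for _ in range(n)]
        # random.sample draws without replacement.
        picked = rng.sample(pool, depth)
        new_counts = [0] * len(counts)
        for feature in picked:
            new_counts[feature] += 1
        rarefied[sample] = new_counts
    return rarefied


# Hypothetical feature table: three samples, three features each.
feature_table = {
    "sample_a": [50, 30, 20],
    "sample_b": [5, 3, 1],
    "sample_c": [10, 0, 10],
}
result = rarefy(feature_table, depth=20)
```

Here `sample_b` has only 9 reads in total, so it is dropped, which is the trade-off the description mentions: a higher depth keeps more sequences per sample but excludes more samples.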

https://github.com/datirium/workflows.git

Path: workflows/qiime2-aggregate.cwl

Branch/Commit ID: d76110e0bfc40c874f82e37cef6451d74df4f908

workflow graph Build STAR indices

Workflow runs [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886) to build indices for a reference genome provided in a single FASTA file as the fasta_file input and a GTF annotation file from the annotation_gtf_file input. Generated indices are saved in a folder whose name corresponds to the input genome.

https://github.com/datirium/workflows.git

Path: workflows/star-index.cwl

Branch/Commit ID: 57863b6131d8262c5ce864adaf8e4038401e71a2

workflow graph Seed Search Compartments

https://github.com/ncbi/pgap.git

Path: protein_alignment/wf_seed.cwl

Branch/Commit ID: 369e2b6c7f4db75099d258729dec1326f55d2cc5