Explore Workflows

View already parsed workflows here or click here to add your own

Graph	Name	Retrieved From	View
	Bisulfite alignment and QC	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/bisulfite.cwl Branch/Commit ID: 5cb188131f786ed33156e2f0e3dd63ab9c04245d
	taxonomy_check_16S	https://github.com/ncbi/pgap.git Path: task_types/tt_taxonomy_check_16S.cwl Branch/Commit ID: 252e7214ac64cb1128881e76743013e61bc7ec38
	scatter-valuefrom-wf6.cwl	https://github.com/common-workflow-language/cwltool.git Path: cwltool/schemas/v1.0/v1.0/scatter-valuefrom-wf6.cwl Branch/Commit ID: 7c7615c44b80f8e76e659433f8c7875603ae0b25
	count-lines3-wf.cwl	https://github.com/common-workflow-language/cwltool.git Path: cwltool/schemas/v1.0/v1.0/count-lines3-wf.cwl Branch/Commit ID: 7dec97bb8f0bc2d9e9eb710faf41f2e98cc7cdda
	16S metagenomic paired-end QIIME2 Analysis (differential abundance) A workflow for processing multiple 16S samples from within the SciDAP platform, via a QIIME2 pipeline. ## __Outputs__ #### Output files: Primary output files: - overview.md, list of inputs - demux.qzv, summary visualizations of imported data - alpha-rarefaction.qzv, plot of OTU rarefaction - taxa-bar-plots.qzv, bar plot of the relative frequency of taxonomies - table.qza, table containing how many sequences are associated with each sample and with each feature (OTU) Optional output files: - pcoa-unweighted-unifrac-emperor.qzv, PCoA using unweighted unifrac method - pcoa-bray-curtis-emperor.qzv, PCoA using bray curtis method - heatmap.qzv, output from gneiss differential abundance analysis using unsupervised correlation-clustering method (this will define the partitions of microbes that commonly co-occur with each other using Ward hierarchical clustering) - ancom-$LEVEL.qzv, output from ANCOM differential abundance analysis at family, genus, and species taxonomic levels (includes volcano plot) ## __Inputs__ #### General Info - Sample short name/Alias: Used for samplename in downstream analyses. Ensure this is the same name used in the metadata samplesheet. - metadata_file: Path to the TSV file containing experiment metadata. The first column must have the header "sample-id" with sample names exactly as they have been input into your SciDAP project. The remaining column headers are experiment-specific. NOTE: Custom Label parameter metadata must be INT data type. - Metadata header name for PCoA axis label: Must be identical to one of the headers of the metadata file. Values under this metadata header must be INT. Required for PCoA analysis. - Rarefaction normalization sampling depth: Required for differential abundance analyses (along with group and taxonomic level). 
This step will subsample the counts in each sample without replacement so that each sample in the resulting table has a total count of INT. If the total count for a sample is smaller than this value, that sample will be dropped from further analysis. It is recommended to make your choice by reviewing the rarefaction plot. Choose a value that is as high as possible (so you retain more sequences per sample) while excluding as few samples as possible. - Metadata header name for differential abundance analyses: Required for differential abundance analyses (along with sampling depth and taxonomic level). Group/experimental condition column name from sample metadata file. Must be identical to one of the headers of the sample-metadata file. The corresponding column should only have two groups/conditions. - Taxonomic level for differential abundance analysis: Required for differential abundance analyses (along with sampling depth and group). Collapses the OTU table at the taxonomic level of interest for differential abundance analysis with ANCOM. Default: Genus - 16S samples for combined analysis: Upstream 16S samples for combined analysis. R1 and R2 fastq are used for generating the manifest file for data import to qiime2. - Trim 5' of R1: Recommended if adapters are still on the input sequences. Trims the first J bases from the 5' end of each forward read. - Trim 5' of R2: Recommended if adapters are still on the input sequences. Trims the first K bases from the 5' end of each reverse read. - Truncate 3' of R1: Recommended if quality drops off along the length of the read. Clips the forward read starting M bases from the 5' end (before trimming). - Truncate 3' of R2: Recommended if quality drops off along the length of the read. Clips the reverse read starting N bases from the 5' end (before trimming). - Threads: Number of threads to use for steps that support multithreading. ### __Data Analysis Steps__ 1. 
Import all sample read data, make a qiime artifact (demux.qza), and summary visualization 2. Denoising will detect and correct (where possible) Illumina amplicon sequence data. This process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and will filter chimeric sequences. 3. Generate a phylogenetic tree for diversity analyses and rarefaction processing and plotting. 4. Taxonomy classification of amplicons. Performed using a Naive Bayes classifier trained on the Greengenes2 database "gg_2022_10_backbone_full_length.nb.qza". 5. If "Metadata header name for PCoA axis label" is provided, principal coordinates analysis (PCoA) will be performed using the unweighted unifrac and bray curtis methods. 3D plots are produced with PCo1, PCo2, and the provided axis label on the x, y, and z axes. 6. If the sampling depth and metadata header for differential analysis are provided, differential abundance analysis will be performed using Gneiss and ANCOM methods at the family, genus, and species taxonomic levels. An unsupervised hierarchical clustering heatmap (Gneiss) and volcano plot (ANCOM) are produced at the taxonomic level between the specified groups. ### __References__ 1. 
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9	https://github.com/datirium/workflows.git Path: workflows/qiime2-aggregate.cwl Branch/Commit ID: 261c0232a7a40880f2480b811ed2d7e89c463869
	SoupX Estimate	https://github.com/datirium/workflows.git Path: workflows/soupx.cwl Branch/Commit ID: c6bfa0de917efb536dd385624fc7702e6748e61d
	Detect Variants workflow	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/detect_variants_mouse.cwl Branch/Commit ID: 735be84cdea041fcc8bd8cbe5728b29ca3586a21
	Trim Galore ChIP-Seq pipeline single-read The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) ChIP-Seq basic analysis workflow for a single-read experiment with Trim Galore. _Trim Galore_ is a wrapper around [Cutadapt](https://github.com/marcelm/cutadapt) and [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data. As outputs it returns a coordinate-sorted BAM file along with its BAI index file, quality statistics of the input FASTQ file, read coverage in the form of a BigWig file, peak calling data in the form of narrowPeak or broadPeak files, islands with the assigned nearest genes and region type, and data for an average tag density plot (based on the BAM file). The workflow starts with step fastx\_quality\_stats from FASTX-Toolkit to calculate quality statistics for the input FASTQ file. At the same time `bowtie` is used to align reads from the input FASTQ file to the reference genome (bowtie\_aligner). The output of this step is an unsorted SAM file, which is sorted and indexed by `samtools sort` and `samtools index` (samtools\_sort\_index). Depending on the workflow's input parameters, the indexed and sorted BAM file can be processed by `samtools rmdup` (samtools\_rmdup) to get rid of duplicated reads. If removing duplicates is not required, the original input BAM and BAI files are returned. Otherwise step samtools\_sort\_index\_after\_rmdup repeats `samtools sort` and `samtools index` on the BAM and BAI files. Right after that `macs2 callpeak` performs peak calling (macs2\_callpeak). Based on the returned outputs, the next step macs2\_island\_count calculates the number of islands and the estimated fragment size. If the latter is less than 80 bp (hardcoded in the workflow), `macs2 callpeak` is rerun with a forced fixed fragment size value (macs2\_callpeak\_forced). 
If the workflow input parameters were set at the very beginning to force peak calling with a fixed fragment size, this step is skipped and the original peak calling results are saved. In the next step the workflow again calculates the number of islands and estimates the fragment size (macs2\_island\_count\_forced) for the data obtained from the macs2\_callpeak\_forced step. If that step was skipped, the results from the macs2\_island\_count\_forced step are equal to the ones obtained from the macs2\_island\_count step. The next step (macs2\_stat) is used to define which islands and estimated fragment size should be used in the workflow output: either those from the macs2\_island\_count step or those from the macs2\_island\_count\_forced step. If the input trigger of this step is set to True, it means that the macs2\_callpeak\_forced step was run and returned results different from those of the macs2\_callpeak step, so the macs2\_stat step should return [fragments\_new, fragments\_old, islands\_new]; if the trigger is False, the step returns [fragments\_old, fragments\_old, islands\_old], where the suffix "old" denotes results obtained from the macs2\_island\_count step and the suffix "new" those from the macs2\_island\_count\_forced step. The following two steps (bamtools\_stats and bam\_to\_bigwig) are used to calculate coverage based on the input BAM file and save it in BigWig format. For that purpose bamtools stats returns the number of mapped reads, which is then used as a scaling factor by bedtools genomecov when it performs the coverage calculation and saves it in BED format. The result is then sorted and converted to BigWig format by the bedGraphToBigWig tool from UCSC utilities. Step get\_stat is used to return a text file with statistics in the form of [TOTAL, ALIGNED, SUPRESSED, USED] read counts. Step island\_intersect assigns genes and regions to the islands obtained from macs2\_callpeak\_forced. Step average\_tag\_density is used to calculate data for the average tag density plot based on the BAM file.	
https://github.com/datirium/workflows.git Path: workflows/trim-chipseq-se.cwl Branch/Commit ID: 282762f8bbaea57dd488115745ef798e128bade1
	align_sort_sa	https://github.com/ncbi/pgap.git Path: task_types/tt_align_sort_sa.cwl Branch/Commit ID: cabb1a9a95244e93294727be8cf5816c38992cb0
	HBA_target.cwl	https://git.astron.nl/RD/LINC.git Path: workflows/HBA_target.cwl Branch/Commit ID: 8a697be0fa85795f7822146015edf963a5681ca7
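
The rarefaction step described in the 16S QIIME2 workflow entry above (subsample each sample without replacement to a fixed depth, and drop any sample whose total count is below that depth) can be illustrated with a minimal Python sketch. This is not QIIME 2's implementation; the function names `rarefy_sample` and `rarefy_table`, the dict-of-dicts table layout, and the `seed` parameter are assumptions made only for illustration.

```python
import random
from typing import Dict, Optional

# Hypothetical layout: {sample_id: {feature_id: count}}
Table = Dict[str, Dict[str, int]]


def rarefy_sample(
    counts: Dict[str, int], depth: int, rng: random.Random
) -> Optional[Dict[str, int]]:
    """Subsample one sample to exactly `depth` counts without replacement.

    Returns None when the sample has fewer than `depth` total counts,
    which mirrors the "sample is dropped" rule in the description.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per observed sequence, then draw
    # `depth` of them without replacement.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied: Dict[str, int] = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def rarefy_table(table: Table, depth: int, seed: int = 0) -> Table:
    """Rarefy every sample in the table, dropping samples that are too shallow."""
    rng = random.Random(seed)
    result: Table = {}
    for sample_id, counts in table.items():
        rarefied = rarefy_sample(counts, depth, rng)
        if rarefied is not None:
            result[sample_id] = rarefied
    return result


if __name__ == "__main__":
    example = {
        "sampleA": {"otu1": 5, "otu2": 5},
        "sampleB": {"otu1": 2},  # total below depth, so it is dropped
    }
    print(rarefy_table(example, depth=6))
```

Expanding counts into a pool is simple but memory-hungry for deep samples; it is used here only because it makes the "without replacement" property easy to see. Choosing a higher depth keeps more sequences per retained sample but drops more samples, which is the trade-off the workflow description asks users to weigh against the rarefaction plot.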