Explore Workflows

Graph	Name	Retrieved From	View
	16S metagenomic paired-end QIIME2 Analysis (differential abundance)

A workflow for processing multiple 16S samples from within the SciDAP platform via a QIIME2 pipeline.

## __Outputs__

#### Output files:

Primary output files:
- overview.md, list of inputs
- demux.qzv, summary visualizations of imported data
- alpha-rarefaction.qzv, plot of OTU rarefaction
- taxa-bar-plots.qzv, bar plot of the relative frequency of taxonomies
- table.qza, table containing how many sequences are associated with each sample and with each feature (OTU)

Optional output files:
- pcoa-unweighted-unifrac-emperor.qzv, PCoA using the unweighted UniFrac method
- pcoa-bray-curtis-emperor.qzv, PCoA using the Bray-Curtis method
- heatmap.qzv, output from Gneiss differential abundance analysis using the unsupervised correlation-clustering method (this defines the partitions of microbes that commonly co-occur with each other using Ward hierarchical clustering)
- ancom-$LEVEL.qzv, output from ANCOM differential abundance analysis at the family, genus, and species taxonomic levels (includes volcano plot)

## __Inputs__

#### General Info

- Sample short name/Alias: Used as the sample name in downstream analyses. Ensure this is the same name used in the metadata samplesheet.
- metadata_file: Path to the TSV file containing experiment metadata. The first column must have the header "sample-id" with sample names exactly as they have been input into your SciDAP project. The remaining column headers are experiment-specific. NOTE: Custom Label parameter metadata must be INT data type.
- Metadata header name for PCoA axis label: Must be identical to one of the headers of the metadata file. Values under this metadata header must be INT. Required for PCoA analysis.
- Rarefaction normalization sampling depth: Required for differential abundance analyses (along with group and taxonomic level). This step subsamples the counts in each sample without replacement so that each sample in the resulting table has a total count of INT. If the total count for a sample is smaller than this value, that sample is dropped from further analysis. It is recommended to make your choice by reviewing the rarefaction plot. Choose a value that is as high as possible (so you retain more sequences per sample) while excluding as few samples as possible.
- Metadata header name for differential abundance analyses: Required for differential abundance analyses (along with sampling depth and taxonomic level). Group/experimental condition column name from the sample metadata file. Must be identical to one of the headers of the sample metadata file. The corresponding column should have only two groups/conditions.
- Taxonomic level for differential abundance analysis: Required for differential abundance analyses (along with sampling depth and group). Collapses the OTU table at the taxonomic level of interest for differential abundance analysis with ANCOM. Default: Genus
- 16S samples for combined analysis: Upstream 16S samples for combined analysis. R1 and R2 FASTQ files are used to generate the manifest file for data import into QIIME2.
- Trim 5' of R1: Recommended if adapters are still on the input sequences. Trims the first J bases from the 5' end of each forward read.
- Trim 5' of R2: Recommended if adapters are still on the input sequences. Trims the first K bases from the 5' end of each reverse read.
- Truncate 3' of R1: Recommended if quality drops off along the length of the read. Clips the forward read starting M bases from the 5' end (before trimming).
- Truncate 3' of R2: Recommended if quality drops off along the length of the read. Clips the reverse read starting N bases from the 5' end (before trimming).
- Threads: Number of threads to use for steps that support multithreading.

### __Data Analysis Steps__

1. Import all sample read data, make a QIIME artifact (demux.qza), and generate a summary visualization.
2. Denoising detects and corrects (where possible) errors in Illumina amplicon sequence data. This process additionally filters any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, and filters chimeric sequences.
3. Generate a phylogenetic tree for diversity analyses and for rarefaction processing and plotting.
4. Taxonomy classification of amplicons. Performed using a Naive Bayes classifier trained on the Greengenes2 database "gg_2022_10_backbone_full_length.nb.qza".
5. If "Metadata header name for PCoA axis label" is provided, principal coordinates analysis (PCoA) is performed using the unweighted UniFrac and Bray-Curtis methods. 3D plots are produced with PCo1, PCo2, and the provided axis label on the x, y, and z axes.
6. If the sampling depth and the metadata header for differential analysis are provided, differential abundance analysis is performed using the Gneiss and ANCOM methods at the family, genus, and species taxonomic levels. An unsupervised hierarchical clustering heatmap (Gneiss) and a volcano plot (ANCOM) are produced at the taxonomic level between the specified groups.

### __References__

1. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9	https://github.com/datirium/workflows.git Path: workflows/qiime2-aggregate.cwl Branch/Commit ID: 93b844a80f4008cc973ea9b5efedaff32a343895
	count-lines11-null-step-wf-noET.cwl	https://github.com/common-workflow-language/cwl-v1.2.git Path: tests/count-lines11-null-step-wf-noET.cwl Branch/Commit ID: c7c97715b400ff2194aa29fc211d3401cea3a9bf
	Cell Ranger ARC Count Gene Expression + ATAC	https://github.com/datirium/workflows.git Path: workflows/cellranger-arc-count.cwl Branch/Commit ID: c6bfa0de917efb536dd385624fc7702e6748e61d
	directory.cwl Inspect provided directory and return filenames. Generate a new directory and return it (including content).	https://github.com/common-workflow-language/cwltool.git Path: tests/wf/directory.cwl Branch/Commit ID: 3ed10d0ea7ac57550433a89a92bdbe756bdb0e40
	extract_readgroup_fastq_se_http.cwl	https://github.com/nci-gdc/gdc-dnaseq-cwl.git Path: workflows/bamfastq_align/extract_readgroup_fastq_se_http.cwl Branch/Commit ID: 3cb464a3a5c39cc060cd23d9c60918bc9ffb169b
	Filter single sample sv vcf from depth callers(cnvkit/cnvnator)	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/sv_depth_caller_filter.cwl Branch/Commit ID: e59c77629936fad069007ba642cad49fef7ad29f
	kmer_seq_entry_extract_wnode	https://github.com/ncbi/pgap.git Path: task_types/tt_kmer_seq_entry_extract_wnode.cwl Branch/Commit ID: 0d9e6bb52eac0c209af3977aa779e39aaa432458
	rnaseq-se.cwl Runs RNA-Seq BioWardrobe basic analysis with single-end data file.	https://github.com/Barski-lab/workflows.git Path: workflows/rnaseq-se.cwl Branch/Commit ID: b4b7b2e7e508be5eac639f9e323d141daf714c0d
	import_schema-def.cwl	https://github.com/common-workflow-language/cwl-v1.2.git Path: tests/import_schema-def.cwl Branch/Commit ID: 1f3ef888d9ef2306c828065c460c1800604f0de4
	Trim Galore SMARTer RNA-Seq pipeline paired-end strand specific

https://chipster.csc.fi/manual/library-type-summary.html

Modified original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a paired-end experiment. The corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) files have to be provided. This workflow should be used only with paired-end RNA-Seq data. It performs the following steps:

1. Trim adapters from the input FASTQ files.
2. Use STAR to align reads from the input FASTQ files according to the predefined reference indices; generate an unsorted BAM file and an alignment statistics file.
3. Use fastx_quality_stats to analyze the input FASTQ files and generate quality statistics files.
4. Use samtools sort to generate a coordinate-sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 2 (after running STAR).
5. Generate a BigWig file from the sorted BAM file.
6. Map the input FASTQ files to predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to a file.
7. Calculate isoform expression levels for the sorted BAM file and GTF/TAB annotation file using the GEEP reads-counting utility; export the results to a file.	https://github.com/datirium/workflows.git Path: workflows/trim-rnaseq-pe-smarter-dutp.cwl Branch/Commit ID: cc6fa135d04737fdde3b4414d6e214cf8c812f6e
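
The 16S QIIME2 workflow listed above describes its rarefaction input as subsampling each sample's counts without replacement to a fixed depth and dropping any sample whose total falls below that depth. The following is a minimal Python sketch of that idea only. It is not the QIIME 2 implementation, and the function name `rarefy`, the nested-dictionary table layout, and the example counts are all hypothetical.

```python
import random
from collections import Counter


def rarefy(table, depth, seed=0):
    """Subsample each sample's feature counts without replacement to `depth`.

    `table` maps a sample ID to a {feature ID: count} dictionary
    (a hypothetical layout chosen for this sketch). Samples whose total
    count is smaller than `depth` are dropped from the result.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rarefied = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        if total < depth:
            continue  # too few sequences: exclude this sample
        # Expand counts into one entry per observed sequence, then draw
        # `depth` of them without replacement.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        drawn = rng.sample(pool, depth)
        rarefied[sample_id] = dict(Counter(drawn))
    return rarefied


# Hypothetical example data: sample-B has only 8 sequences in total.
table = {
    "sample-A": {"otu1": 50, "otu2": 30, "otu3": 20},
    "sample-B": {"otu1": 5, "otu2": 3},
}
result = rarefy(table, depth=40)
# sample-A is kept with exactly 40 sequences; sample-B is dropped.
```

The sketch also shows the trade-off the workflow description mentions: raising `depth` keeps more sequences per retained sample but excludes more samples, which is why the rarefaction plot is worth reviewing before choosing a value.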