Explore Workflows
Graph | Name | Retrieved From | View |
---|---|---|---|
|
WGS Metagenomic pipeline paired-end
This workflow taxonomically classifies paired-end sequencing reads in FASTQ format for a SINGLE sample. Reads are first adapter-trimmed with trimgalore and filtered using kneaddata with a bmtagger database. The resulting cleaned reads are classified using Kraken2 and a user-selected pre-built database from a list of [genomic index files](https://benlangmead.github.io/aws-indexes/k2). Unaligned reads are then classified using metaphlan4 with the mpa_vJan21_CHOCOPhlAnSGB_202103 database. The kraken2 report is used to generate a krona plot visualization of the abundance profile. Cleaned reads are also run through HUMANN3 using the uniref90 diamond database to produce a gene abundance report and metabolic pathway file. The latter is used for abundance coverage and functional assignment.

### __Inputs__

Kraken2 database for taxonomic classification:
- Standard is recommended

Read 1 file:
- FASTA/Q input R1 from a paired end library

Read 2 file:
- FASTA/Q input R2 from a paired end library

Number of threads for steps that support multithreading:
- default set to `4`

Advanced Inputs Tab (Optional):
- Number of bases to clip from the 3p end
- Number of bases to clip from the 5p end

### __Outputs__

- kraken2 report (abundance profile)
- krona plot (hierarchical visualization of taxonomic classifications)
- various log files
- metabolic pathway file
- functional assignment

### __Data Analysis Steps__

1. QC raw FASTQ files with fastQC and trimmomatic
   - OUTPUT1: trimmed FASTQ files
2. Filter human reads out of OUTPUT1 with the KneadData tool
   - OUTPUT2: filtered FASTQ files
3. Classify OUTPUT2 with kraken2 using the “Standard” database (RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core)
   - OUTPUT3: taxonomic abundance profile
   - OUTPUT4: FASTQ files of unclassified reads
   - VISUALIZATION1: krakenreport to kronaplot
4. Attempt to classify OUTPUT4 with MetaPhlAn using the “latest” database
   - OUTPUT5: taxonomic abundance profile of unclassified kraken2 reads
5. Classify OUTPUT2 with MetaPhlAn using the “latest” database
   - OUTPUT6: final computed taxon abundances (listed one clade per line, tab-separated from the clade's relative abundance in percent)
   - format: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-Workshop-on-Genomics-2023#13-metaphlan-output-files
   - used in the multi-sample workflow (https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-Workshop-on-Genomics-2023#15-analyzing-multiple-samples)
6. Use OUTPUT2 for metagenome functional profiling/assignment with HUMAnN using the “uniref : uniref90_diamond” database
   - database link: http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
   - OUTPUT7: *_genefamilies.tsv, contains the abundances of each gene family in the community in reads per kilobase (RPK) units
   - OUTPUT8: *_pathabundance.tsv, lists the abundances of each pathway in the community, also in RPK units as described for gene families
   - OUTPUT9: normalized_genefamilies-cpm.tsv, contains the normalized abundances of each gene family in counts per million (CPM) units
   - OUTPUT10: rxn-cpm.tsv, contains the CPM-normalized gene family abundances regrouped to MetaCyc reaction (RXN) abundances
   - table manipulation reference: https://github.com/biobakery/MetaPhlAn/wiki/HUMAnN-Workshop-on-Genomics-2023#3-manipulating-humann-output-tables

### __References__

- McIver LJ, Abu-Ali G, Franzosa EA, Schwager R, Morgan XC, Waldron L, Segata N, Huttenhower C. bioBakery: a meta'omic analysis environment. Bioinformatics. 2018 Apr 1;34(7):1235-1237. PMID: 29194469
- Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999; 27(2):573–580. doi:10.1093/nar/27.2.573
- [**Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4.**](https://doi.org/10.1038/s41587-023-01688-w) Aitor Blanco-Miguez, Francesco Beghini, Fabio Cumbo, Lauren J. McIver, Kelsey N. Thompson, Moreno Zolfo, Paolo Manghi, Leonard Dubois, Kun D. Huang, Andrew Maltez Thomas, Gianmarco Piccinno, Elisa Piperni, Michal Punčochář, Mireia Valles-Colomer, Adrian Tett, Francesca Giordano, Richard Davies, Jonathan Wolf, Sarah E. Berry, Tim D. Spector, Eric A. Franzosa, Edoardo Pasolli, Francesco Asnicar, Curtis Huttenhower, Nicola Segata. Nature Biotechnology (2023) |
Path: workflows/wgs-metagenomics-pe.cwl Branch/Commit ID: 93b844a80f4008cc973ea9b5efedaff32a343895 |
|
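The CPM normalization mentioned for the normalized gene family table of the metagenomic pipeline can be illustrated with a small sketch. It assumes the simplest reading of counts per million: each RPK value is divided by the sample total and scaled to one million. The function name `rpk_to_cpm` and the example values are hypothetical, and the sketch ignores details such as stratified or unmapped rows that a real HUMAnN table contains.

```python
def rpk_to_cpm(rpk_by_feature):
    """Scale RPK abundances so that they sum to one million (CPM).

    rpk_by_feature: dict mapping a gene family name to its RPK value.
    This is an illustrative simplification, not HUMAnN's own code.
    """
    total = sum(rpk_by_feature.values())
    if total == 0:
        # Nothing to normalize; avoid dividing by zero.
        return {name: 0.0 for name in rpk_by_feature}
    return {name: value / total * 1_000_000
            for name, value in rpk_by_feature.items()}


# Hypothetical gene families and RPK values, for illustration only.
example_rpk = {"UniRef90_A": 30.0, "UniRef90_B": 10.0}
example_cpm = rpk_to_cpm(example_rpk)
```

After normalization the values of a sample always sum to one million, which makes gene family abundances comparable between samples sequenced at different depths.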
|
GSEApy - Gene Set Enrichment Analysis in Python
GSEAPY: Gene Set Enrichment Analysis in Python
==============================================

Gene Set Enrichment Analysis is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

GSEA requires as input an expression dataset, which contains expression profiles for multiple samples. While the software supports multiple input file formats for these datasets, the tab-delimited GCT format is the most common. The first column of the GCT file contains feature identifiers (gene ids or symbols in the case of data derived from RNA-Seq experiments). The second column contains a description of the feature; this column is ignored by GSEA and may be filled with “NA”s. Subsequent columns contain the expression values for each feature, with one sample's expression value per column.

It is important to note that there are no hard and fast rules regarding how a GCT file's expression values are derived. The important point is that they are comparable to one another across features within a sample and comparable to one another across samples. Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA. |
Path: workflows/gseapy.cwl Branch/Commit ID: 7ced5a5259dbd8b3fc64456beaeffd44f4a24081 |
|
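The GCT column layout described for GSEApy can be read with a few lines of code. The sketch below is only an illustration: it assumes the usual GCT v1.2 preamble (a `#1.2` version line, then a line with the row and column counts, then a header row), which the description itself does not spell out, and the function name `parse_gct` and the sample data are hypothetical.

```python
import csv
import io


def parse_gct(text):
    """Parse GCT text into (sample_names, {feature_id: [values]})."""
    lines = text.splitlines()
    # Assumed preamble: line 1 is a version tag, line 2 holds the counts.
    if len(lines) < 3 or not lines[0].startswith("#1."):
        raise ValueError("not a GCT file: missing '#1.x' version line")
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])

    reader = csv.reader(io.StringIO("\n".join(lines[2:])), delimiter="\t")
    header = next(reader)
    samples = header[2:]  # columns 3+ hold one sample each

    expression = {}
    for row in reader:
        if not row:
            continue
        feature_id = row[0]  # column 1: gene id or symbol
        # Column 2 (description) is skipped, as GSEA ignores it.
        expression[feature_id] = [float(v) for v in row[2:]]

    if len(expression) != n_rows or len(samples) != n_cols:
        raise ValueError("row/column counts do not match the GCT header")
    return samples, expression


# Hypothetical two-gene, three-sample dataset.
example_gct = (
    "#1.2\n"
    "2\t3\n"
    "NAME\tDescription\tS1\tS2\tS3\n"
    "TP53\tNA\t10.5\t11.0\t9.8\n"
    "BRCA1\tNA\t3.2\t2.9\t3.5\n"
)
gct_samples, gct_expression = parse_gct(example_gct)
```

Checking the parsed dimensions against the declared counts catches truncated files early, before they reach an enrichment run.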
|
DESeq - differential gene expression analysis
Differential gene expression analysis
=====================================

Differential gene expression analysis based on the negative binomial distribution.

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

DESeq1
------

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. Simon Anders and Wolfgang Huber propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, [DESeq](http://bioconductor.org/packages/release/bioc/html/DESeq.html), as an R/Bioconductor package.

DESeq2
------

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. [DESeq2](http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. |
Path: workflows/deseq.cwl Branch/Commit ID: c5bae2ca862c764911b83d1f15ff6af4e2a0db28 |
|
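The variance-mean dependence that the DESeq description refers to can be made concrete with a toy calculation. The sketch below assumes the common negative binomial parameterization, variance = mean + dispersion × mean², and estimates the dispersion with a plain method-of-moments formula. This is a deliberate simplification for illustration: it is not the local regression used by DESeq nor the shrinkage estimator used by DESeq2, and both function names are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance of a negative binomial with mean mu (assumed parameterization)."""
    return mu + dispersion * mu ** 2


def nb_dispersion_moments(counts):
    """Method-of-moments dispersion estimate for one gene's replicate counts.

    Solves variance = mean + dispersion * mean**2 for the dispersion and
    clips at zero, where the model reduces to a Poisson distribution.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance, needs at least two replicates
    return max((v - m) / (m * m), 0.0)


# Hypothetical replicate counts for a single gene.
example_counts = [10, 20, 30]
example_dispersion = nb_dispersion_moments(example_counts)
```

With only a handful of replicates such per-gene estimates are very noisy, which is the motivation the description gives for sharing information across genes.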
|
count-lines5-wf.cwl
|
Path: tests/count-lines5-wf.cwl Branch/Commit ID: c7c97715b400ff2194aa29fc211d3401cea3a9bf |
|
|
cache_asnb_entries
|
Path: task_types/tt_cache_asnb_entries.cwl Branch/Commit ID: ca75d68eb74c93b35b404ec7908dc5b260e16466 |
|
|
schemadef_types_with_import-wf.cwl
|
Path: tests/schemadef_types_with_import-wf.cwl Branch/Commit ID: c7c97715b400ff2194aa29fc211d3401cea3a9bf |
|
|
revsort.cwl
Reverse the lines in a document, then sort those lines. |
Path: cwltool/schemas/v1.0/v1.0/revsort.cwl Branch/Commit ID: 46b7f9766d1bc8a4871474eee25ec730b4e173da |
|
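The two steps of the revsort workflow can be mimicked in a few lines. The sketch assumes that "reverse the lines" means reversing the characters within each line, as the Unix `rev` utility does, and that the sort is a plain ascending sort; neither detail is stated in the description, and the function name `revsort` is only borrowed from the file name for illustration.

```python
def revsort(text):
    """Reverse the characters of every line, then sort the resulting lines.

    Assumption: per-line character reversal (like `rev`), ascending sort.
    """
    reversed_lines = [line[::-1] for line in text.splitlines()]
    return "\n".join(sorted(reversed_lines))


# Hypothetical three-line document.
example_document = "abc\nxyz\ncba"
example_result = revsort(example_document)
```

Reading "reverse" as reversing the order of the lines would make the first step pointless, because the subsequent sort would discard that order, which is why the sketch uses per-line reversal.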
|
cache_asnb_entries
|
Path: task_types/tt_cache_asnb_entries.cwl Branch/Commit ID: 9e43bc5cff985574e1f8135d4c50b5a347517c9e |
|
|
THOR - differential peak calling of ChIP-seq signals with replicates
What is THOR?
--------------

THOR is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates. THOR performs genomic signal processing, peak calling and p-value calculation in an integrated framework.

For more information please refer to:
-------------------------------------

Allhoff, M., Sere, K., Freitas, J., Zenke, M., Costa, I.G. (2016), Differential Peak Calling of ChIP-seq Signals with Replicates with THOR, Nucleic Acids Research, epub gkw680. |
Path: workflows/rgt-thor.cwl Branch/Commit ID: 4f48ee6f8665a34cdf96e89c012ee807f80c7a3d |
|
|
scatter-wf1.cwl
|
Path: tests/scatter-wf1.cwl Branch/Commit ID: c7c97715b400ff2194aa29fc211d3401cea3a9bf |