Explore Workflows
Graph | Name | Retrieved From | View |
---|---|---|---|
|
advanced-header.cwl
|
Path: metadata/advanced-header.cwl Branch/Commit ID: 3c11de851cdc030ef50ba795e7a2ecd957a69007 |
|
|
Generate genome indices for STAR & bowtie
Creates indices for:
* [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886)
* [bowtie](http://bowtie-bio.sourceforge.net/tutorial.shtml) v1.2.0 (12/30/2016)

It performs the following steps:
1. Runs `STAR --runMode genomeGenerate` to generate indices from [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files; returns the results as an array of files
2. Outputs the indices as a [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type
3. Separates the *chrNameLength.txt* file from the Directory output
4. Runs `bowtie-build` to generate indices from the genome [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) file; returns the results as a group of main and secondary files |
Path: workflows/genome-indices.cwl Branch/Commit ID: e238d1756f1db35571e84d72e1699e5d1540f10c |
|
|
mutect parallel workflow
|
Path: definitions/subworkflows/mutect.cwl Branch/Commit ID: e59c77629936fad069007ba642cad49fef7ad29f |
|
|
timelimit-wf.cwl
|
Path: tests/timelimit-wf.cwl Branch/Commit ID: ea9f8634e41824ac3f81c3dde698d5f0eef54f1b |
|
|
wgs alignment with qc
|
Path: definitions/pipelines/wgs_alignment.cwl Branch/Commit ID: 735be84cdea041fcc8bd8cbe5728b29ca3586a21 |
|
|
iwdr_with_nested_dirs.cwl
|
Path: cwltool/schemas/v1.0/v1.0/iwdr_with_nested_dirs.cwl Branch/Commit ID: cd779a90a4336563dcf13795111f502372c6af83 |
|
|
Bismark Methylation - pipeline for BS-Seq data analysis
Sequence reads are first cleaned of adapters and transformed into fully bisulfite-converted forward (C->T) and reverse read (G->A conversion of the forward strand) versions, before they are aligned to similarly converted versions of the genome (also C->T and G->A converted). Sequence reads that produce a unique best alignment from the four alignment processes against the bisulfite genomes (which run in parallel) are then compared to the normal genomic sequence, and the methylation state of all cytosine positions in the read is inferred. A read is considered to align uniquely if one alignment has a unique best alignment score (as reported by the AS:i field). If a read produces several alignments with the same number of mismatches or with the same alignment score (AS:i field), the read (or read pair) is discarded altogether. In the next step, the methylation call for every single C analysed is extracted. The position of every single C is written to a new output file, depending on its context (CpG, CHG or CHH), whereby methylated Cs are labelled as forward reads (+) and non-methylated Cs as reverse reads (-). The output of the methylation extractor is then transformed into a bedGraph and a coverage file. The bedGraph counts output is then used to generate a genome-wide cytosine report, which reports the counts for every single CpG (optionally every single cytosine) in the genome, irrespective of whether it was covered by any reads or not. As this type of report is informative for cytosines on both strands, the output may be fairly large (~46 million CpG positions or >1.2 billion total cytosine positions in the human genome). |
Path: workflows/bismark-methylation-se.cwl Branch/Commit ID: 4f48ee6f8665a34cdf96e89c012ee807f80c7a3d |
|
|
scatter-valuefrom-wf3.cwl#main
|
Path: tests/scatter-valuefrom-wf3.cwl Branch/Commit ID: ea9f8634e41824ac3f81c3dde698d5f0eef54f1b Packed ID: main |
|
|
exome alignment and somatic variant detection for cle purpose
|
Path: definitions/pipelines/cle_somatic_exome.cwl Branch/Commit ID: aba52e94b6d7470132d3c092c26d67e29d615300 |
|
|
WGS Metagenomic pipeline paired-end
This workflow taxonomically classifies paired-end sequencing reads in FASTQ format for a SINGLE sample. Reads are first adapter trimmed with trimgalore and filtered using kneaddata with a bmtagger database. The resulting cleaned reads are classified using Kraken2 and a user-selected pre-built database from a list of [genomic index files](https://benlangmead.github.io/aws-indexes/k2). Unaligned reads are then classified using metaphlan4 with the mpa_vJan21_CHOCOPhlAnSGB_202103 database. The kraken2 report is used to generate a krona plot visualization of the abundance profile. Cleaned reads are also run through HUMAnN 3 using the uniref90 diamond database to produce a gene abundance report and metabolic pathway file. The latter is used for abundance coverage and functional assignment.

### __Inputs__
Kraken2 database for taxonomic classification:
- Standard is recommended

Read 1 file:
- FASTA/Q input R1 from a paired end library

Read 2 file:
- FASTA/Q input R2 from a paired end library

Number of threads for steps that support multithreading:
- default set to `4`

Advanced Inputs Tab (Optional):
- Number of bases to clip from the 3p end
- Number of bases to clip from the 5p end

### __Outputs__
- kraken2 report (abundance profile)
- krona plot (hierarchical visualization of taxonomic classifications)
- various log files
- metabolic pathway file
- functional assignment

### __Data Analysis Steps__
1. QC raw FASTQ files with fastQC and trimmomatic
   - OUTPUT1: trimmed FASTQ files
2. Filter human reads out of OUTPUT1 with the KneadData tool
   - OUTPUT2: filtered FASTQ files
3. Classify OUTPUT2 with kraken2 using “Standard” database (RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core)
   - OUTPUT3: taxonomic abundance profile
   - OUTPUT4: FASTQ files of unclassified reads
   - VISUALIZATION1: krakenreport to kronaplot
4. Attempt to classify OUTPUT4 with MetaPhlAn using “latest” database
   - OUTPUT5: taxonomic abundance profile of unclassified kraken2 reads
5. Classify OUTPUT2 with MetaPhlAn using “latest” database
   - OUTPUT6: final computed taxon abundances (listed one clade per line, tab-separated from the clade's relative abundance in percent)
   - format: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-Workshop-on-Genomics-2023#13-metaphlan-output-files
   - used in the multi-sample workflow (https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-Workshop-on-Genomics-2023#15-analyzing-multiple-samples)
6. Use OUTPUT2 in Metagenome functional profiling/assignment with HUMAnN using “uniref : uniref90_diamond” database
   - database link: http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
   - OUTPUT7: *_genefamilies.tsv, contains the abundances of each gene family in the community in reads per kilobase (RPK) units
   - OUTPUT8: *_pathabundance.tsv, lists the abundances of each pathway in the community, also in RPK units as described for gene families
   - OUTPUT9: normalized_genefamilies-cpm.tsv, contains the normalized abundances of each gene family in counts per million (CPM) units
   - OUTPUT10: rxn-cpm.tsv, regroups the CPM-normalized gene family abundance values to MetaCyc reaction (RXN) abundances
   - https://github.com/biobakery/MetaPhlAn/wiki/HUMAnN-Workshop-on-Genomics-2023#3-manipulating-humann-output-tables

### __References__
- McIver LJ, Abu-Ali G, Franzosa EA, Schwager R, Morgan XC, Waldron L, Segata N, Huttenhower C. bioBakery: a meta'omic analysis environment. Bioinformatics. 2018 Apr 1;34(7):1235-1237. PMID: 29194469
- Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999; 27(2):573–580. doi:10.1093/nar/27.2.573
- [**Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4.**](https://doi.org/10.1038/s41587-023-01688-w) Aitor Blanco-Miguez, Francesco Beghini, Fabio Cumbo, Lauren J. McIver, Kelsey N. Thompson, Moreno Zolfo, Paolo Manghi, Leonard Dubois, Kun D. Huang, Andrew Maltez Thomas, Gianmarco Piccinno, Elisa Piperni, Michal Punčochář, Mireia Valles-Colomer, Adrian Tett, Francesca Giordano, Richard Davies, Jonathan Wolf, Sarah E. Berry, Tim D. Spector, Eric A. Franzosa, Edoardo Pasolli, Francesco Asnicar, Curtis Huttenhower, Nicola Segata. Nature Biotechnology (2023) |
Path: workflows/wgs-metagenomics-pe.cwl Branch/Commit ID: 93b844a80f4008cc973ea9b5efedaff32a343895 |