Explore Workflows

View already parsed workflows here or click here to add your own

Graph	Name	Retrieved From	View
	Kraken2 Metagenomic pipeline paired-end

This workflow taxonomically classifies paired-end sequencing reads in FASTQ format, optionally adapter-trimmed with TrimGalore, using Kraken2 and a user-selected pre-built database from a list of [genomic index files](https://benlangmead.github.io/aws-indexes/k2).

### __Inputs__

Kraken2 database for taxonomic classification:
- [Viral (0.5 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20221209.tar.gz), all refseq viral genomes
- [MinusB (8.7 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_minusb_20221209.tar.gz), standard minus bacteria (archaea, viral, plasmid, human1, UniVec_Core)
- [PlusPFP-16 (15.0 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_16gb_20221209.tar.gz), standard (archaea, bacteria, viral, plasmid, human1, UniVec_Core) + (protozoa, fungi & plant) capped at 16 GB (shrunk via random kmer downselect)
- [EuPathDB46 (34.1 GB)](https://genome-idx.s3.amazonaws.com/kraken/k2_eupathdb48_20201113.tar.gz), eukaryotic pathogen genomes with contaminants removed (https://veupathdb.org/veupathdb/app)
- [16S_gg_13_5 (73 MB)](https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz), Greengenes 16S rRNA database ([release 13.5](https://greengenes.secondgenome.com/?prefix=downloads/greengenes_database/gg_13_5/), 20200326)
- [16S_silva_138 (112 MB)](https://genome-idx.s3.amazonaws.com/kraken/16S_Silva138_20200326.tgz), SILVA 16S rRNA database ([release 138.1](https://www.arb-silva.de/documentation/release-1381/), 20200827)

Read 1 file:
- FASTA/Q input R1 from a paired-end library

Read 2 file:
- FASTA/Q input R2 from a paired-end library

Number of threads for steps that support multithreading:
- Number of threads for steps that support multithreading
- Default set to `4`

Advanced Inputs Tab (Optional):
- Number of bases to clip from the 3p end
- Number of bases to clip from the 5p end

### __Outputs__

- k2db, an upstream database used by the kraken2 classifier

### __Data Analysis Steps__

1. Trimming the adapters with TrimGalore.
   - This step is particularly important when the reads are long and the fragments are short, which leaves sequencing adapters at the ends of reads. If the adapter is not removed, the read will not map. TrimGalore can recognize standard adapters, such as Illumina or Nextera/Tn5 adapters.
2. Generate quality control statistics of trimmed, unmapped sequence data.
3. (Optional) Clipping of 5' and/or 3' end by the specified number of bases.
4. Mapping reads to primary genome index with Bowtie.

### __References__

- Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0

	https://github.com/datirium/workflows.git Path: workflows/kraken2-classify-pe.cwl Branch/Commit ID: 22880e0f41d0420a17d643e8a6e8ee18165bbfbf
	cmsearch-multimodel.cwl	https://github.com/proteinswebteam/ebi-metagenomics-cwl.git Path: workflows/cmsearch-multimodel.cwl Branch/Commit ID: 5e8217435bcdd597b2ad236f3e847d13d4c21824
	taxonomy_check_16S	https://github.com/ncbi/pgap.git Path: task_types/tt_taxonomy_check_16S.cwl Branch/Commit ID: 7b5130d2408bce82ee15c666b37d931ef6f452e3
	Tumor-Only Detect Variants workflow	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/tumor_only_detect_variants.cwl Branch/Commit ID: 9143dc4ebacb9e1df36a712b0be6fa5d982b0c4f
	kfdrc_bwamem_subwf.cwl	https://github.com/kids-first/kf-alignment-workflow.git Path: dev/ultra-opt/kfdrc_bwamem_subwf.cwl Branch/Commit ID: e75f0c96153a484db1f882f6ff2a9764519a3179
	Vcf concordance evaluation workflow	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/vcf_eval_concordance.cwl Branch/Commit ID: 4aba7c6591c2f1ebd827a36d325a58738c429bea
	Per-chromosome pindel	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/pindel_cat.cwl Branch/Commit ID: ddb49a0951d9ad537269d7db3fe8f904495a8bf4
	process VCF workflow	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/strelka_process_vcf.cwl Branch/Commit ID: ddb49a0951d9ad537269d7db3fe8f904495a8bf4
	varscan somatic workflow	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/varscan.cwl Branch/Commit ID: ddb49a0951d9ad537269d7db3fe8f904495a8bf4
	bgzip and index VCF	https://github.com/genome/analysis-workflows.git Path: definitions/subworkflows/bgzip_and_index.cwl Branch/Commit ID: ddb49a0951d9ad537269d7db3fe8f904495a8bf4
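
The optional clipping step in the Kraken2 paired-end workflow description above (removing a fixed number of bases from the 5' and/or 3' end of each read) can be illustrated with a minimal Python sketch. This is not the workflow's own implementation, which is not shown on this page; the function name `clip_read` and the parameters `clip_5p` and `clip_3p` are hypothetical names chosen for the illustration.

```python
from typing import Tuple


def clip_read(sequence: str, quality: str,
              clip_5p: int = 0, clip_3p: int = 0) -> Tuple[str, str]:
    """Drop clip_5p bases from the start and clip_3p bases from the end.

    Hypothetical helper for illustration only; the sequence and its
    per-base quality string are clipped together so they stay aligned.
    """
    if clip_5p < 0 or clip_3p < 0:
        raise ValueError("clip lengths must be non-negative")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")

    end = len(sequence) - clip_3p
    # If the two clips cover the whole read, return an empty read
    # instead of letting the slice indices cross.
    if end <= clip_5p:
        return "", ""
    return sequence[clip_5p:end], quality[clip_5p:end]


if __name__ == "__main__":
    # Clip 2 bases from the 5' end and 1 base from the 3' end.
    print(clip_read("ACGTACGT", "IIIIHHHH", clip_5p=2, clip_3p=1))
```

For paired-end data the same clipping would be applied to each mate independently, so R1 and R2 records stay in the same order in both files.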