Explore Workflows

View already parsed workflows below, or add your own.

Graph	Name	Retrieved From	View
	Bacterial Annotation, pass 4, blastp-based functional annotation (second pass)	https://github.com/ncbi/pgap.git Path: bacterial_annot/wf_bacterial_annot_pass4.cwl Branch/Commit ID: 041a234a935c7af7d3db95353ef80c61c88fc010
	Trim Galore RNA-Seq pipeline paired-end strand specific A modified version of the original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a paired-end experiment. Corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) files have to be provided. This workflow should be used only with paired-end RNA-Seq data. It performs the following steps: 1. Trim adapters from input FASTQ files 2. Use STAR to align reads from input FASTQ files according to the predefined reference indices; generate unsorted BAM file and alignment statistics file 3. Use fastx_quality_stats to analyze input FASTQ files and generate quality statistics files 4. Use samtools sort to generate coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 2 (after running STAR) 5. Generate BigWig file based on the sorted BAM file 6. Map input FASTQ files to predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to file 7. Calculate isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; export results to file	https://github.com/datirium/workflows.git Path: workflows/trim-rnaseq-pe-dutp.cwl Branch/Commit ID: 69643d8c15f5357a320aa7e2f6adb2e71302fd20
	env-wf3.cwl	https://github.com/common-workflow-language/cwl-v1.2.git Path: tests/env-wf3.cwl Branch/Commit ID: 31ec48a8d81ef7c1b2c5e9c0a19e7623efe4a1e2
	RNA-Seq pipeline single-read The original [BioWardrobe's](https://biowardrobe.com) [PubMed ID:26248465](https://www.ncbi.nlm.nih.gov/pubmed/26248465) RNA-Seq basic analysis for a single-read experiment. A corresponding input [FASTQ](http://maq.sourceforge.net/fastq.shtml) file has to be provided. This workflow should be used only with single-read RNA-Seq data. It performs the following steps: 1. Use STAR to align reads from input FASTQ file according to the predefined reference indices; generate unsorted BAM file and alignment statistics file 2. Use fastx_quality_stats to analyze input FASTQ file and generate quality statistics file 3. Use samtools sort to generate coordinate sorted BAM(+BAI) file pair from the unsorted BAM file obtained in step 1 (after running STAR) 4. Generate BigWig file based on the sorted BAM file 5. Map input FASTQ file to predefined rRNA reference indices using Bowtie to determine the level of rRNA contamination; export the resulting statistics to file 6. Calculate isoform expression level for the sorted BAM file and GTF/TAB annotation file using GEEP reads-counting utility; export results to file	https://github.com/datirium/workflows.git Path: workflows/rnaseq-se.cwl Branch/Commit ID: 5561f7ee11dd74848680351411a19aa87b13d27b
	sum-wf.cwl	https://github.com/common-workflow-language/cwltool.git Path: cwltool/schemas/v1.0/v1.0/sum-wf.cwl Branch/Commit ID: 7ec307b01442936fad9b1149f4500496557505ff
	Hello World Outputs a message using echo	https://github.com/common-workflow-language/cwltool.git Path: tests/wf/hello-workflow.cwl Branch/Commit ID: a8d8d00fd1e4274e1bc16001937db5aae46b0b0d
	exome alignment and germline variant detection, with optitype for HLA typing	https://github.com/genome/analysis-workflows.git Path: definitions/pipelines/germline_exome_hla_typing.cwl Branch/Commit ID: 22fce2dbdada0c4135b6f0677f78535cf980cb07
	revsort.cwl Reverse the lines in a document, then sort those lines.	https://github.com/common-workflow-language/cwltool.git Path: cwltool/schemas/v1.0/v1.0/revsort.cwl Branch/Commit ID: aaaece1c097c3f06afa21f7ecddcc85519e2bb2b
	tt_blastn_wnode	https://github.com/ncbi/pgap.git Path: task_types/tt_blastn_wnode.cwl Branch/Commit ID: a7fced3ed8c839272c8f3a8db9da7bc8cd50271f
	miRNA-Seq miRDeep2 pipeline A CWL workflow for discovering known or novel miRNAs from deep sequencing data using the miRDeep2 tool. The ExoCarta exosome database is also used for identifying exosome-related miRNAs, and TargetScan's organism-specific databases are used for identifying miRNA gene targets. ## __Outputs__ #### Primary Output files: - mirs_known.tsv, detected known mature miRNAs, "Known miRNAs" tab - mirs_novel.tsv, detected novel mature miRNAs, "Novel miRNAs" tab #### Secondary Output files: - mirs_known_exocarta_deepmirs.tsv, list of detected miRNA also in ExoCarta's exosome database, "Detected Exosome miRNAs" tab - mirs_known_gene_targets.tsv, pre-computed gene targets of known mature mirs, downloadable - known_mirs_mature.fa, known mature mir sequences, downloadable - known_mirs_precursor.fa, known precursor mir sequences, downloadable - novel_mirs_mature.fa, novel mature mir sequences, downloadable - novel_mirs_precursor.fa, novel precursor mir sequences, downloadable #### Reports: - overview.md (input list, alignment & mir metrics), "Overview" tab - mirdeep2_result.html, summary of mirdeep2 results, "miRDeep2 Results" tab ## __Inputs__ #### General Info - Sample short name/Alias: unique name for sample - Experimental condition: condition, variable, etc. name (e.g. "control" or "20C 60min") - Cells: name of cells used for the sample - Catalog No.: vendor catalog number if available - Bowtie2 index: Bowtie2 index directory of the reference genome. - Reference Genome FASTA: Reference genome FASTA file to be used for alignment. - Genome short name: Name used for setting organism name, genus, species, and tax ID. - Input FASTQ file: FASTQ file from a single-end miRNA sequencing run. #### Advanced - Adapter: Adapter sequence to be trimmed from miRNA sequence reads. (Default: TCGTAT) - Threads: Number of threads to use for steps that support multithreading (Default: 4). 
## Hints & Tips: #### For the identification of novel miRNA candidates, the following may be used as a filtering guideline: 1. miRDeep score > 4 (some authors use 1) 2. no match with Rfam 3. a significant RNAfold ("yes") 4. a number of mature reads > 10 5. if applicable, the novel mir must be expressed in multiple samples #### For filtering mirbase by organism. | genome | organism | division | name | tree | NCBI-taxid | | ---- | --- | --- | ----------- | ----------- | ----------- | | hg19 | hsa | HSA | Homo sapiens | Metazoa;Bilateria;Deuterostoma;Chordata;Vertebrata;Mammalia;Primates;Hominidae | 9606 | | hg38 | hsa | HSA | Homo sapiens | Metazoa;Bilateria;Deuterostoma;Chordata;Vertebrata;Mammalia;Primates;Hominidae | 9606 | | mm10 | mmu | MMU | Mus musculus | Metazoa;Bilateria;Deuterostoma;Chordata;Vertebrata;Mammalia;Rodentia | 10090 | | rn7 | rno | RNO | Rattus norvegicus | Metazoa;Bilateria;Deuterostoma;Chordata;Vertebrata;Mammalia;Rodentia | 10116 | | dm3 | dme | DME | Drosophila melanogaster | Metazoa;Bilateria;Ecdysozoa;Arthropoda;Hexapoda | 7227 | ## __Data Analysis Steps__ 1. The miRDeep2 Mapper module processes Illumina FASTQ output and maps it to the reference genome. 2. The miRDeep2 miRDeep2 module identifies known and novel (mature and precursor) miRNAs. 3. The ExoCarta database of miRNA found in exosomes is then used to find overlap between mirs_known.tsv and exosome associated miRNAs. 4. Finally, TargetScan organism-specific miRNA gene target database is used to find overlap between mirs_known.tsv and gene targets. ## __References__ 1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245920 2. https://github.com/rajewsky-lab/mirdeep2 3. https://biocontainers.pro/tools/mirdeep2 4. https://www.mirbase.org/ 5. http://exocarta.org/index.html 6. https://www.targetscan.org/vert_80/	https://github.com/datirium/workflows.git Path: workflows/mirna-mirdeep2-se.cwl Branch/Commit ID: 7030da528559c7106d156284e50ff0ecedab0c4e
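
The novel-miRNA filtering guideline quoted in the miRNA-Seq miRDeep2 pipeline description (score above 4, no Rfam match, a significant fold, more than 10 mature reads, expression in multiple samples where applicable) could be applied roughly as in the Python sketch below. This is only an illustration: the `NovelCandidate` record, its field names, and the helper functions are hypothetical and do not correspond to the actual columns of `mirs_novel.tsv` or to any miRDeep2 API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NovelCandidate:
    """Hypothetical record for one novel miRNA candidate (field names are assumptions)."""
    name: str
    mirdeep_score: float
    rfam_match: bool        # True if the candidate matches an Rfam entry
    significant_fold: bool  # True if the fold-significance field reads "yes"
    mature_reads: int
    samples_expressed: int = 1


def passes_guideline(
    c: NovelCandidate,
    min_score: float = 4.0,      # some authors use 1
    min_mature_reads: int = 10,
    min_samples: int = 1,        # raise above 1 when several samples are available
) -> bool:
    """Return True if the candidate satisfies every criterion of the guideline."""
    return (
        c.mirdeep_score > min_score
        and not c.rfam_match
        and c.significant_fold
        and c.mature_reads > min_mature_reads
        and c.samples_expressed >= min_samples
    )


def filter_candidates(candidates: Iterable[NovelCandidate], **thresholds) -> List[NovelCandidate]:
    """Keep only the candidates that pass the guideline with the given thresholds."""
    return [c for c in candidates if passes_guideline(c, **thresholds)]


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    demo = [
        NovelCandidate("cand_1", 5.2, False, True, 25, 2),
        NovelCandidate("cand_2", 3.0, False, True, 25, 2),
    ]
    print([c.name for c in filter_candidates(demo, min_samples=2)])
```

The thresholds are keyword arguments so that the more permissive score cutoff of 1 mentioned in the guideline can be tried without changing the function.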