Workflow: EMG assembly for paired end Illumina

Fetched 2024-04-25 12:06:25 GMT

Inputs

ID                  Type                     Title  Doc
mapseq_ref          File [FASTA]
ncRNA_models        File[]
forward_reads       File (Optional) [FASTQ]
reverse_reads       File (Optional) [FASTQ]
unpaired_reads      File (Optional) [FASTQ]
mapseq_taxonomy     File
ncRNA_model_clans   File
sequencing_run_id   String
assembly_mem_limit  Integer                         in Gb
fraggenescan_model  https://w3id.org/cwl/view/git/56dafa4dab5892c5afa35713563dddc78ec5a00d/tools/FragGeneScan-model.yaml#model
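
The inputs above map onto the input object a CWL runner reads. The following is a minimal Python sketch, not part of the workflow itself, that builds such an object and prints it as JSON. The helper names (`file_obj`, `build_job`), all file paths, the memory value, and the model value "illumina_10" are assumptions for illustration; the values accepted for fraggenescan_model are defined in the linked FragGeneScan-model.yaml and should be checked there. The [FASTA] and [FASTQ] annotations suggest a runner may also expect a `format` field on those File objects; it is omitted here because the exact format identifiers are not shown on this page.

```python
import json


def file_obj(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(run_id, forward, reverse, refs, mem_limit_gb, fgs_model,
              unpaired=None):
    """Build an input object keyed by the input IDs in the table above.

    `refs` is a hypothetical dict holding the reference file paths.
    """
    job = {
        "sequencing_run_id": run_id,
        "forward_reads": file_obj(forward),
        "reverse_reads": file_obj(reverse),
        "mapseq_ref": file_obj(refs["mapseq_ref"]),
        "mapseq_taxonomy": file_obj(refs["mapseq_taxonomy"]),
        # ncRNA_models is File[], so it becomes a list of File objects.
        "ncRNA_models": [file_obj(p) for p in refs["ncRNA_models"]],
        "ncRNA_model_clans": file_obj(refs["ncRNA_model_clans"]),
        # assembly_mem_limit is an Integer, documented as "in Gb".
        "assembly_mem_limit": int(mem_limit_gb),
        "fraggenescan_model": fgs_model,
    }
    # unpaired_reads is optional, so it is only set when provided.
    if unpaired is not None:
        job["unpaired_reads"] = file_obj(unpaired)
    return job


# All paths and values below are placeholders, not taken from the workflow.
example_job = build_job(
    run_id="RUN0001",
    forward="reads_1.fastq",
    reverse="reads_2.fastq",
    refs={
        "mapseq_ref": "ref.fasta",
        "mapseq_taxonomy": "ref.tax",
        "ncRNA_models": ["model_a.cm", "model_b.cm"],
        "ncRNA_model_clans": "models.claninfo",
    },
    mem_limit_gb=16,
    fgs_model="illumina_10",  # assumption: check FragGeneScan-model.yaml
)

print(json.dumps(example_job, indent=2))
```

The printed JSON can be saved to a file and passed to a CWL runner together with the workflow; CWL input objects may be written as either JSON or YAML.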

Steps

ID Runs Label Doc
assembly
../tools/metaspades.cwl (CommandLineTool)
metaSPAdes: de novo metagenomics assembler

https://arxiv.org/abs/1604.03071 http://cab.spbu.ru/files/release3.10.1/manual.html#meta

find_ncRNAs

extract_SSUs
../tools/esl-sfetch-manyseqs.cwl (CommandLineTool)
extract by names from an indexed sequence file

https://github.com/EddyRivasLab/easel

fraggenescan
../tools/FragGeneScan1_20.cwl (CommandLineTool)
FragGeneScan: find (fragmented) genes in short reads

FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

FragGeneScan was first released through the omics website (http://omics.informatics.indiana.edu/FragGeneScan/) in March 2010, where its old releases can still be found. FragGeneScan migrated to SourceForge in October 2013 (https://sourceforge.net/projects/fraggenescan/).

Version 1.20 can be downloaded here: https://sourceforge.net/projects/fraggenescan/files/

interproscan
../tools/InterProScan5.21-60.cwl (CommandLineTool)
InterProScan: protein sequence classifier

Version 5.21-60 can be downloaded here: https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload

Documentation on how to run InterProScan 5 can be found here: https://github.com/ebi-pf-team/interproscan/wiki/HowToRun

classify_SSUs
../tools/mapseq.cwl (CommandLineTool)
MAPseq

A set of sequence read classification tools designed to assign taxonomy and OTU classifications to ribosomal RNA sequences. http://meringlab.org/software/mapseq/

get_SSU_coords

index_scaffolds
../tools/esl-sfetch-index.cwl (CommandLineTool)
index a sequence file for use by esl-sfetch

https://github.com/EddyRivasLab/easel

visualize_otu_counts
../tools/krona.cwl (CommandLineTool)
visualize using krona

discard_short_scaffolds
../tools/discard_short_seqs.cwl (CommandLineTool)
drop short seqs

convert_otu_counts_to_hdf5
../tools/biom-convert.cwl (CommandLineTool)

convert_otu_counts_to_json
../tools/biom-convert.cwl (CommandLineTool)

remove_asterisks_and_reformat
../tools/esl-reformat.cwl (CommandLineTool)
normalize to fasta

Normalizes input sequences to FASTA with a fixed number of sequence characters per line, using esl-reformat from https://github.com/EddyRivasLab/easel

convert_classifications_to_otu_counts
../tools/mapseq2biom.cwl (CommandLineTool)
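
Two of the steps above, discard_short_scaffolds ("drop short seqs") and remove_asterisks_and_reformat ("normalize to fasta"), describe simple FASTA transformations. The Python sketch below illustrates the general idea only; it is not the implementation of discard_short_seqs.cwl or esl-reformat, and it does not reflect the order in which the workflow wires these steps. The function names, the `min_length` threshold, the line `width`, and the choice to simply delete "*" characters are all assumptions made for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_asterisks(records):
    """Remove '*' characters from every sequence (assumed behaviour)."""
    return [(h, s.replace("*", "")) for h, s in records]


def drop_short(records, min_length):
    """Keep only records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def write_fasta(records, width=60):
    """Serialize records with a fixed number of characters per sequence line."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "".join(line + "\n" for line in lines)


# Hypothetical input: two short records, one ending in '*'.
raw = ">a\nMKT*\n>b\nMKTAYIAKQR*\n"
cleaned = drop_short(strip_asterisks(read_fasta(raw)), min_length=5)
normalized = write_fasta(cleaned, width=4)
print(normalized)
```

With the placeholder input, record "a" is dropped because it has only three characters once the asterisk is removed, and record "b" is written back in lines of four characters.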

Outputs

ID Type Label Doc
SSUs File
pCDS File
scaffolds File
annotations File
classifications File
otu_counts_hdf5 File
otu_counts_json File
otu_visualization File

Permalink: https://w3id.org/cwl/view/git/56dafa4dab5892c5afa35713563dddc78ec5a00d/workflows/emg-assembly.cwl