Workflow: rRNA annotation workflow with scatter processing

Fetched 2025-10-23 12:39:24 GMT

This workflow annotates rRNA against multiple index files using scatter. For each rRNA index file it runs four processes: makeblastdb, blastn alignment, filtering, and rRNA removal.
Related CWL files: ./Tools/09_makeblastdb_rRNA.cwl ./Tools/10_blastn_rRNA_alignment.cwl ./Tools/10_blastn_rRNA_filter1.cwl ./Tools/10_blastn_rRNA_filter2.cwl ./Tools/10_blastn_rRNA_filter3.cwl

Inputs

ID Type Title Doc
EVALUE Float evalue

E-value threshold for the BLASTN search

THREADS Integer threads

number of threads to use

MAX_TARGET_SEQS Integer max target seqs

Number of annotation hits reported for each DNA sequence. More than one hit may sometimes be reported even when this is set to 1.

INPUT_FASTA_FILE File input fasta file (nucleotide sequence generated by prodigal process)

Nucleotide sequences of the protein-coding genes predicted by the Prodigal process

OUTPUT_FILE_NAME1 String output file name

text file of annotation information for contaminating rRNA

OUTPUT_FILE_NAME2 String output file name

text file of annotation information for contaminating rRNA

BLASTN_rRNA_FASTA_FILE1 File SILVA_138.1_LSUParc_tax_silva

rRNA FASTA file for SILVA_138.1_LSUParc_tax_silva. You must download this file in advance from https://ftp.arb-silva.de/release_138.1/Exports/SILVA_138.1_LSUParc_tax_silva.fasta.gz

BLASTN_rRNA_FASTA_FILE2 File SILVA_138.1_SSUParc_tax_silva

rRNA FASTA file for SILVA_138.1_SSUParc_tax_silva. You must download this file in advance from https://ftp.arb-silva.de/release_138.1/Exports/SILVA_138.1_SSUParc_tax_silva.fasta.gz

BLASTN_rRNA_INDEX_DIR_NAME1 String SILVA_138.1_LSUParc_tax_silva (directory name)

rRNA index directory name for SILVA_138.1_LSUParc_tax_silva

BLASTN_rRNA_INDEX_DIR_NAME2 String SILVA_138.1_SSUParc_tax_silva (directory name)

rRNA index directory name for SILVA_138.1_SSUParc_tax_silva

PRODIGAL_RESULT_PROTEIN_FASTA_FILE File prodigal result protein fasta file

predicted protein sequences of Prodigal output
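
A job file is one way to supply these inputs. The sketch below writes one from the shell and is illustrative only. The input IDs come from the table above, and EVALUE and MAX_TARGET_SEQS reuse the values from the original blastn command. Every file name, directory name, and the THREADS value are placeholders. The sketch also assumes the SILVA archives have already been decompressed, which the workflow documentation does not state.

```shell
# Write a hypothetical job file; all paths and names below are placeholders.
cat > rRNA_job.yml <<'EOF'
# Search parameters
EVALUE: 0.1
THREADS: 4
MAX_TARGET_SEQS: 1
# Prodigal outputs (nucleotide and protein FASTA)
INPUT_FASTA_FILE:
  class: File
  path: sample.fna
PRODIGAL_RESULT_PROTEIN_FASTA_FILE:
  class: File
  path: sample.faa
# Output names
OUTPUT_FILE_NAME1: sample_rRNAlist.txt
OUTPUT_FILE_NAME2: sample_rRNA_toplist.txt
# SILVA reference files and index directory names
BLASTN_rRNA_FASTA_FILE1:
  class: File
  path: SILVA_138.1_LSUParc_tax_silva.fasta
BLASTN_rRNA_FASTA_FILE2:
  class: File
  path: SILVA_138.1_SSUParc_tax_silva.fasta
BLASTN_rRNA_INDEX_DIR_NAME1: SILVA_138.1_LSUParc_tax_silva
BLASTN_rRNA_INDEX_DIR_NAME2: SILVA_138.1_SSUParc_tax_silva
EOF
```

Assuming cwltool is installed and the repository is checked out locally, the workflow could then be started with `cwltool Workflow/blastn_rRNA_ssw.cwl rRNA_job.yml`.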

Steps

ID Runs Label Doc
BLASTN_rRNA_FILTER1
../Tools/10_blastn_rRNA_filter1.cwl (CommandLineTool)
blastn result file filter

This tool filters the blastn result. The BLASTN result text file contains rRNA annotations. Sometimes more than one rRNA is annotated to a single query sequence, which must be resolved before the gft file is produced. A later process also needs the list of predicted coding sequences annotated as rRNA.
Original script: scripts/07_annotation_modified.sh
Original command 1: cat ${f}_*.txt | awk '!x[$1]++' > ${f}_rRNAlist.txt
Original command 2: cut -f1 ${f}_rRNAlist.txt | sort > ${f}_rRNA_toplist.txt

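The two filter commands of BLASTN_rRNA_FILTER1 can be tried on toy data. This is a minimal sketch, assuming tab-separated BLAST output with the query ID in column 1. The function name `rrna_filter1`, the file names, and the sample rows are hypothetical and not taken from the CWL tool.

```shell
# Hypothetical helper mirroring the two original commands.
# $1: output prefix; remaining arguments: BLASTN result files (tab-separated).
rrna_filter1() {
  prefix=$1
  shift
  # Keep only the first line seen for each query ID (column 1).
  cat "$@" | awk '!x[$1]++' > "${prefix}_rRNAlist.txt"
  # Extract the query IDs and sort them for the later removal step.
  cut -f1 "${prefix}_rRNAlist.txt" | sort > "${prefix}_rRNA_toplist.txt"
}

# Toy data: gene_2 is hit in both databases, gene_1 only in the SSU one.
workdir=$(mktemp -d)
printf 'gene_2\tLSU_a\ttitle a\t1e-50\n' > "$workdir/sample_LSU.txt"
printf 'gene_2\tSSU_b\ttitle b\t1e-20\ngene_1\tSSU_c\ttitle c\t1e-30\n' > "$workdir/sample_SSU.txt"

rrna_filter1 "$workdir/sample" "$workdir/sample_LSU.txt" "$workdir/sample_SSU.txt"
cat "$workdir/sample_rRNA_toplist.txt"
```

Note that `awk '!x[$1]++'` keeps whichever hit comes first in the concatenated input, so the surviving annotation depends on file order rather than on the E-value.
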
BLASTN_rRNA_FILTER2
../Tools/10_blastn_rRNA_filter2.cwl (CommandLineTool)
blastn result file filter

This tool applies the blastn filter result: it removes the sequences listed in the rRNA top list from the protein FASTA file.
Original script: scripts/07_annotation_modified.sh
Original command: seqkit grep -v -f ${f}_rRNA_toplist.txt ${f}.faa > ${f}-rRNA.faa

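The removal step of BLASTN_rRNA_FILTER2 drops every FASTA record whose ID appears in the top list. The sketch below shows the same idea with awk as a stand-in for `seqkit grep -v -f`, so it runs without SeqKit installed; it is not the tool's implementation. The function name `remove_listed_records` and the toy records are hypothetical, and the record ID is assumed to be the first whitespace-delimited token after `>`.

```shell
# Hypothetical awk stand-in for: seqkit grep -v -f ids.txt in.faa > out.faa
# $1: file with one sequence ID per line; $2: FASTA file. Writes to stdout.
remove_listed_records() {
  awk '
    FILENAME == ARGV[1] { drop[$1] = 1; next }  # ID list: remember IDs to drop
    /^>/ {                                      # header line of a FASTA record
      id = substr($1, 2)                        # first token without ">"
      keep = !(id in drop)
    }
    keep                                        # print lines of kept records only
  ' "$1" "$2"
}

# Toy data: three records, gene_2 is listed as an rRNA hit.
workdir=$(mktemp -d)
printf '>gene_1 # 1 # 300\nMKV\n>gene_2 # 400 # 900\nMLL\nQQ\n>gene_3 # 950 # 1200\nMAA\n' > "$workdir/sample.faa"
printf 'gene_2\n' > "$workdir/sample_rRNA_toplist.txt"

remove_listed_records "$workdir/sample_rRNA_toplist.txt" "$workdir/sample.faa" > "$workdir/sample-rRNA.faa"
grep -c '^>' "$workdir/sample-rRNA.faa"
```

Testing `FILENAME` instead of the common `NR == FNR` idiom keeps the function correct when the ID list is empty, that is, when no rRNA hits were found.
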
MAKEBLASTDB_SILVA_138.1_LSUParc_tax_silva
../Tools/09_makeblastdb_rRNA.cwl (CommandLineTool)
makeblastdb command for rRNA database creation

This tool creates a BLAST database from a FASTA file.

MAKEBLASTDB_SILVA_138.1_SSUParc_tax_silva
../Tools/09_makeblastdb_rRNA.cwl (CommandLineTool)
makeblastdb command for rRNA database creation

This tool creates a BLAST database from a FASTA file.

BLASTN_rRNA_alignment_silva_138.1_LSUParc_tax_silva
../Tools/10_blastn_rRNA_alignment.cwl (CommandLineTool)
blastn command for rRNA alignment

This tool runs the blastn process.
Original script: scripts/07_annotation_modified.sh
Original command: blastn -num_threads ${threads} -db ${db}/${rrna} -query ${f}.fna -out ${f}_${rrna}.txt -outfmt "6 qseqid sseqid stitle evalue" -max_target_seqs 1 -evalue 0.1

BLASTN_rRNA_alignment_silva_138.1_SSUParc_tax_silva
../Tools/10_blastn_rRNA_alignment.cwl (CommandLineTool)
blastn command for rRNA alignment

This tool runs the blastn process.
Original script: scripts/07_annotation_modified.sh
Original command: blastn -num_threads ${threads} -db ${db}/${rrna} -query ${f}.fna -out ${f}_${rrna}.txt -outfmt "6 qseqid sseqid stitle evalue" -max_target_seqs 1 -evalue 0.1

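The two alignment steps run the same blastn command, once per SILVA database. The dry-run sketch below prints the command line for each database instead of executing it, so it works without BLAST+ installed. The function name `print_blastn_cmd`, the `sample` prefix, the `blastdb` directory, and the thread count are hypothetical. Taking the E-value and target limit as arguments reflects the EVALUE and MAX_TARGET_SEQS workflow inputs, whereas the original script hardcodes them.

```shell
# Hypothetical dry-run helper: prints the blastn command line for one database.
# $1: query prefix (f), $2: index directory (db), $3: database name (rrna),
# $4: threads, $5: max target seqs, $6: E-value
print_blastn_cmd() {
  printf 'blastn -num_threads %s -db %s -query %s -out %s -outfmt "%s" -max_target_seqs %s -evalue %s\n' \
    "$4" "$2/$3" "$1.fna" "$1_$3.txt" \
    "6 qseqid sseqid stitle evalue" "$5" "$6"
}

# One command per database, matching the two alignment steps.
for rrna in SILVA_138.1_LSUParc_tax_silva SILVA_138.1_SSUParc_tax_silva; do
  print_blastn_cmd sample blastdb "$rrna" 4 1 0.1
done
```

The output format `6 qseqid sseqid stitle evalue` yields four tab-separated columns with the query ID first, which is the column the later filter step keys on.
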
Outputs

ID Type Label Doc
BLASTN_rRNA_concat_file File blastn result file

blastn result file

FILTERED_rRNA_PROTEIN_FASTA_FILE File filtered rRNA protein fasta file

filtered rRNA protein fasta file

Permalink: https://w3id.org/cwl/view/git/1838569c1d6d3c15f58c254667d4c6258e67e5a6/Workflow/blastn_rRNA_ssw.cwl