Workflow: rRNA annotation workflow with scatter processing
This workflow performs rRNA annotation processing for multiple index files using scatter. For each rRNA index file it executes 4 processes: makeblastdb, blastn alignment, filtering, and rRNA removal. Related CWL files: ./Tools/09_makeblastdb_rRNA.cwl, ./Tools/10_blastn_rRNA_alignment.cwl, ./Tools/10_blastn_rRNA_filter1.cwl, ./Tools/10_blastn_rRNA_filter2.cwl, ./Tools/10_blastn_rRNA_filter3.cwl
Inputs
ID | Type | Title | Doc |
---|---|---|---|
EVALUE | Float | evalue | E-value threshold of the BLASTN search |
THREADS | Integer | threads | Number of threads to use |
MAX_TARGET_SEQS | Integer | max target seqs | Number of annotation outputs for each DNA sequence; the output sometimes contains more than one annotation per sequence even when this is set to 1 |
INPUT_FASTA_FILE | File | input fasta file (nucleotide sequence generated by prodigal process) | Predicted protein coding sequences produced by the Prodigal process |
OUTPUT_FILE_NAME1 | String | output file name | Text file of annotation information of contaminating rRNA |
OUTPUT_FILE_NAME2 | String | output file name | Text file of annotation information of contaminating rRNA |
BLASTN_rRNA_FASTA_FILE1 | File | SILVA_138.1_LSUParc_tax_silva | rRNA file for SILVA_138.1_LSUParc_tax_silva. You must obtain the file in advance from the following link: https://ftp.arb-silva.de/release_138.1/Exports/SILVA_138.1_LSUParc_tax_silva.fasta.gz |
BLASTN_rRNA_FASTA_FILE2 | File | SILVA_138.1_SSUParc_tax_silva | rRNA file for SILVA_138.1_SSUParc_tax_silva. You must obtain the file in advance from the following link: https://ftp.arb-silva.de/release_138.1/Exports/SILVA_138.1_SSUParc_tax_silva.fasta.gz |
BLASTN_rRNA_INDEX_DIR_NAME1 | String | SILVA_138.1_LSUParc_tax_silva (directory name) | rRNA index directory name for SILVA_138.1_LSUParc_tax_silva |
BLASTN_rRNA_INDEX_DIR_NAME2 | String | SILVA_138.1_SSUParc_tax_silva (directory name) | rRNA index directory name for SILVA_138.1_SSUParc_tax_silva |
PRODIGAL_RESULT_PROTEIN_FASTA_FILE | File | prodigal result protein fasta file | Predicted protein sequences of the Prodigal output |
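As a rough illustration of how the EVALUE, THREADS, and MAX_TARGET_SEQS inputs could map onto the blastn call, the sketch below assembles the command line quoted as the original command in the alignment step documentation. The function name `build_blastn_cmd` and the sample file names are hypothetical. The function only prints the command and does not run BLAST, and the actual CWL tool may wire its inputs differently.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a blastn command line built from the
# workflow inputs. The flags are those of the original command quoted in the
# step documentation; the argument order of this function is an assumption.
build_blastn_cmd() {
  db=$1               # BLAST database path (assumed: index directory/name)
  query=$2            # nucleotide FASTA, cf. INPUT_FASTA_FILE
  out=$3              # result file, cf. OUTPUT_FILE_NAME1 / OUTPUT_FILE_NAME2
  threads=$4          # cf. THREADS
  max_target_seqs=$5  # cf. MAX_TARGET_SEQS
  evalue=$6           # cf. EVALUE
  printf 'blastn -num_threads %s -db %s -query %s -out %s -outfmt "6 qseqid sseqid stitle evalue" -max_target_seqs %s -evalue %s\n' \
    "$threads" "$db" "$query" "$out" "$max_target_seqs" "$evalue"
}

# Example with made-up file names and the values used in the original command.
build_blastn_cmd SILVA_138.1_LSUParc_tax_silva sample.fna sample_LSU.txt 4 1 0.1
```

Printing the command instead of executing it makes it easy to check the parameter mapping before running BLAST against the SILVA databases.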
Steps
ID | Runs | Label | Doc |
---|---|---|---|
BLASTN_rRNA_FILTER1 | ../Tools/10_blastn_rRNA_filter1.cwl (CommandLineTool) | blastn result file filter | This tool filters the blastn result. The BLASTN result text file contains the rRNA annotations. Sometimes more than one rRNA is annotated to a single query sequence, which must be fixed for gft file production. A later process also needs the list of predicted coding sequences annotated as rRNA. Original script: scripts/07_annotation_modified.sh. Original command 1: `cat ${f}_*.txt \| awk '!x[$1]++' > ${f}_rRNAlist.txt`. Original command 2: `cut -f1 ${f}_rRNAlist.txt \| sort > ${f}_rRNA_toplist.txt` |
BLASTN_rRNA_FILTER2 | ../Tools/10_blastn_rRNA_filter2.cwl (CommandLineTool) | blastn result file filter | This tool filters the blastn result. Original script: scripts/07_annotation_modified.sh. Original command: `seqkit grep -v -f ${f}_rRNA_toplist.txt ${f}.faa > ${f}-rRNA.faa` |
MAKEBLASTDB_SILVA_138.1_LSUParc_tax_silva | ../Tools/09_makeblastdb_rRNA.cwl (CommandLineTool) | makeblastdb command for rRNA database creation | This tool creates a BLAST database from a FASTA file. |
MAKEBLASTDB_SILVA_138.1_SSUParc_tax_silva | ../Tools/09_makeblastdb_rRNA.cwl (CommandLineTool) | makeblastdb command for rRNA database creation | This tool creates a BLAST database from a FASTA file. |
BLASTN_rRNA_alignment_silva_138.1_LSUParc_tax_silva | ../Tools/10_blastn_rRNA_alignment.cwl (CommandLineTool) | blastn command for rRNA database creation | This tool executes the blastn process. Original script: scripts/07_annotation_modified.sh. Original command: `blastn -num_threads ${threads} -db ${db}/${rrna} -query ${f}.fna -out ${f}_${rrna}.txt -outfmt "6 qseqid sseqid stitle evalue" -max_target_seqs 1 -evalue 0.1` |
BLASTN_rRNA_alignment_silva_138.1_SSUParc_tax_silva | ../Tools/10_blastn_rRNA_alignment.cwl (CommandLineTool) | blastn command for rRNA database creation | This tool executes the blastn process. Original script: scripts/07_annotation_modified.sh. Original command: `blastn -num_threads ${threads} -db ${db}/${rrna} -query ${f}.fna -out ${f}_${rrna}.txt -outfmt "6 qseqid sseqid stitle evalue" -max_target_seqs 1 -evalue 0.1` |
Outputs
ID | Type | Label | Doc |
---|---|---|---|
BLASTN_rRNA_concat_file | File | blastn result file | blastn result file |
FILTERED_rRNA_PROTEIN_FASTA_FILE | File | filtered rRNA protein fasta file | filtered rRNA protein fasta file |
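The two original commands quoted for the BLASTN_rRNA_FILTER1 step can be tried on toy data. The sketch below keeps only the first hit per query (`awk '!x[$1]++'` prints a line only the first time its first field is seen) and then extracts the sorted list of query IDs. The wrapper names `keep_first_hit` and `rrna_id_list` and the sample rows are invented for illustration and are not part of the workflow.

```shell
#!/bin/sh
# Keep only the first line seen for each query ID (field 1).
keep_first_hit() {
  awk '!x[$1]++'
}

# Extract the query IDs (tab-separated column 1) and sort them.
rrna_id_list() {
  cut -f1 | sort
}

# Toy rows in the "6 qseqid sseqid stitle evalue" layout; the data is made up.
tmp=$(mktemp)
printf 'seq2\tLSU_A\ttitle a\t1e-30\n'  > "$tmp"
printf 'seq1\tSSU_B\ttitle b\t1e-20\n' >> "$tmp"
printf 'seq2\tSSU_C\ttitle c\t1e-10\n' >> "$tmp"

# seq2 has two hits; only its first row survives, then the IDs are sorted.
keep_first_hit < "$tmp" | rrna_id_list
rm -f "$tmp"
```

With this toy input the pipeline prints `seq1` and `seq2`, one per line. In the original script, an ID list of this kind is what the BLASTN_rRNA_FILTER2 command passes to `seqkit grep -v -f` to drop the matching sequences from the protein FASTA file.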
https://w3id.org/cwl/view/git/1838569c1d6d3c15f58c254667d4c6258e67e5a6/Workflow/blastn_rRNA_ssw.cwl