Workflow: rna_prediction-sub-wf.cwl

Fetched 2024-05-20 20:30:11 GMT

Inputs

ID Type Title Doc
type String
pattern_5S String
pattern_LSU String
pattern_SSU String
pattern_5.8S String
silva_lsu_otus String
silva_ssu_otus String
input_sequences File
silva_lsu_database File
silva_lsu_taxonomy String
silva_ssu_database File
silva_ssu_taxonomy String
ncRNA_ribosomal_models String[]
ncRNA_ribosomal_model_clans String

Steps

ID Runs Label Doc
gzip_files
../../utils/pigz/gzip.cwl (CommandLineTool)
index_reads
../../tools/RNA_prediction/easel/esl-sfetch-index.cwl (CommandLineTool)
index a sequence file for use by esl-sfetch

https://github.com/EddyRivasLab/easel
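An index lets a fetch tool jump straight to a named record instead of scanning the whole file. The minimal Python sketch below stores the byte offset of each FASTA header and then fetches one record with a single seek. It is only an illustration under stated assumptions: esl-sfetch writes its own SSI index format, which this sketch does not reproduce, and the function names `build_offset_index` and `fetch_record` are hypothetical.

```python
import io


def build_offset_index(handle):
    """Map each FASTA record name to the byte offset of its header line.

    Hypothetical illustration of indexing; esl-sfetch uses its own SSI
    format, which this does not reproduce.
    """
    index = {}
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            # The record name is the first whitespace-delimited token after '>'.
            name = line[1:].split()[0].decode()
            index[name] = offset
        offset += len(line)
    return index


def fetch_record(handle, index, name):
    """Seek directly to a record and read it up to the next header line."""
    handle.seek(index[name])
    lines = [handle.readline()]  # the header line itself
    for line in handle:
        if line.startswith(b">"):
            break  # reached the next record
        lines.append(line)
    return b"".join(lines).decode()


if __name__ == "__main__":
    fasta = io.BytesIO(b">seq1 desc\nACGT\nGG\n>seq2\nTTTT\n")
    idx = build_offset_index(fasta)
    print(idx)
    print(fetch_record(fasta, idx, "seq2"), end="")
```

The index is built once and can be reused for any number of lookups, which is the point of running a separate indexing step before extraction.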

classify_LSUs Run taxonomic classification, create OTU table and Krona visualisation
classify_SSUs Run taxonomic classification, create OTU table and Krona visualisation
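The "create OTU table" part of these classification steps amounts to counting how many reads were assigned to each taxon. The Python sketch below shows that counting under assumptions: the input is a hypothetical list of `(read_id, taxonomy)` pairs rather than the real classifier output, and the `otu_ids` dictionary (taxonomy string to OTU ID) stands in for a reference mapping such as the `silva_lsu_otus`/`silva_ssu_otus` inputs appear to provide. The Krona visualisation is not covered.

```python
from collections import Counter


def build_otu_table(assignments, otu_ids):
    """Count reads per taxonomy and attach an OTU ID to each taxonomy.

    assignments: iterable of (read_id, taxonomy) pairs (hypothetical shape);
                 reads with an empty taxonomy string are skipped.
    otu_ids:     dict mapping taxonomy string -> OTU ID (hypothetical).
    Returns rows of (otu_id, read_count, taxonomy), sorted by taxonomy.
    """
    counts = Counter(tax for _, tax in assignments if tax)
    return [(otu_ids[tax], n, tax) for tax, n in sorted(counts.items())]


if __name__ == "__main__":
    reads = [
        ("r1", "sk__Bacteria;p__Firmicutes"),
        ("r2", "sk__Bacteria;p__Firmicutes"),
        ("r3", "sk__Archaea"),
        ("r4", ""),  # unclassified read, ignored
    ]
    ids = {"sk__Archaea": 7, "sk__Bacteria;p__Firmicutes": 3}
    for row in build_otu_table(reads, ids):
        print(*row, sep="\t")
```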
extract_coords
../../tools/RNA_prediction/extract-coords/extract-coords_awk.cwl (CommandLineTool)

The awk script takes the output of Infernal's cmsearch in its so-called fmt=1 mode and makes it suitable for use by esl-sfetch, a sequence selector.

Reading the user's guide for Infernal (Version 1.1.2, July 2016; http://eddylab.org/infernal/Userguide.pdf#page=60), we see that the relevant fields in the cmsearch output are (column number: explanation):
1: The name of the target sequence or profile
3: The name of the query sequence or profile
8: The start of the alignment of this hit with respect to the sequence, numbered 1..L for a sequence of L residues
9: The end of the alignment of this hit with respect to the sequence, numbered 1..L for a sequence of L residues

Likewise, the format esl-sfetch expects is: <newname> <from> <to> <source seqname>

Putting it all together, we see that the newname (the name esl-sfetch will give each extracted sequence in its output) is a concatenation of the original name, the sequence number, and the coordinates.
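The same column mapping can be sketched in Python to make the transformation concrete: skip the `#` comment lines, take columns 1, 3, 8 and 9 of each cmsearch row, and emit one `<newname> <from> <to> <source seqname>` line per hit. The exact naming scheme used here, `<target>-<query>/<from>-<to>`, is a hypothetical choice for illustration; the pipeline's awk script may join the parts differently. Coordinates are passed through unchanged.

```python
def tblout_to_sfetch_coords(lines):
    """Turn cmsearch tabular (fmt=1) rows into esl-sfetch coordinate lines.

    Output per hit: "<newname> <from> <to> <source seqname>".
    The newname scheme "<target>-<query>/<from>-<to>" is hypothetical.
    """
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query = fields[0], fields[2]     # columns 1 and 3
        seq_from, seq_to = fields[7], fields[8]  # columns 8 and 9
        newname = f"{target}-{query}/{seq_from}-{seq_to}"
        out.append(f"{newname} {seq_from} {seq_to} {target}")
    return out


if __name__ == "__main__":
    rows = [
        "# target name  accession  query name ...",
        "seq1 - LSU_rRNA_bacteria RF02541 cm 1 100 5 300 + no 1 0.55 0.0 200.0 1e-50 ! -",
    ]
    print("\n".join(tblout_to_sfetch_coords(rows)))
```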

count_lsu_fasta
../../utils/count_fasta.cwl (CommandLineTool)
count_ssu_fasta
../../utils/count_fasta.cwl (CommandLineTool)
extract_subunits
../../tools/RNA_prediction/get_subunits_fasta/get_subunits.cwl (CommandLineTool)
extract_sequences
../../tools/RNA_prediction/easel/esl-sfetch-manyseqs.cwl (CommandLineTool)
extract by names from an indexed sequence file

https://github.com/EddyRivasLab/easel
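Taking the coordinate lines produced by extract_coords as input, the extraction step's effect can be approximated in memory. The Python sketch below reads `<newname> <from> <to> <source seqname>` lines with 1-based, inclusive coordinates and returns the renamed subsequences. Treating `from > to` as a request for the reverse complement is an assumption of this sketch, as is the in-memory dictionary of sequences; the real step works on the indexed file via esl-sfetch.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (ACGTN, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def fetch_subsequences(seqs, coord_lines):
    """Apply esl-sfetch-style coordinate lines to an in-memory sequence dict.

    seqs:        dict mapping source sequence name -> sequence string.
    coord_lines: lines of "<newname> <from> <to> <source seqname>",
                 1-based inclusive; from > to is treated (by assumption)
                 as a reverse-strand request.
    Returns a list of (newname, subsequence) pairs.
    """
    out = []
    for line in coord_lines:
        newname, start, end, source = line.split()
        start, end = int(start), int(end)
        seq = seqs[source]
        if start <= end:
            sub = seq[start - 1:end]
        else:
            sub = reverse_complement(seq[end - 1:start])
        out.append((newname, sub))
    return out


if __name__ == "__main__":
    for name, sub in fetch_subsequences({"s1": "AACCGGTT"}, ["a 1 4 s1", "b 4 1 s1"]):
        print(f">{name}\n{sub}")
```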

find_ribosomal_ncRNAs Identifies non-coding RNAs using Rfam's covariance models
extract_subunits_coords
../../tools/RNA_prediction/get_subunits_coords/get_subunits_coords.cwl (CommandLineTool)

Outputs

ID Type Label Doc
ncRNA File
LSU_fasta File (Optional)
SSU_fasta File (Optional)
LSU_coords File
LSU_folder Directory (Optional)
SSU_coords File
SSU_folder Directory (Optional)
LSU-SSU-count File
cmsearch_result File
compressed_rnas File[]
number_LSU_mapseq Integer
number_SSU_mapseq Integer
Permalink: https://w3id.org/cwl/view/git/a83ee883bb3c7480010fa952939fac771491ddf4/workflows/subworkflows/rna_prediction-sub-wf.cwl