Workflow: EMG pipeline v3.0 (single end version)

Fetched 2024-04-19 09:41:42 GMT

Inputs

ID Type Title Doc
reads File [FASTQ]
run_id String
5S_model File [HMMER format]
16S_model File [HMMER format]
23S_model File [HMMER format]
tRNA_model File [HMMER format]
go_summary_config File
fraggenescan_model https://w3id.org/cwl/view/git/7bb76f33bf40b5cd2604001cac46f967a209c47f/tools/FragGeneScan-model.yaml#model
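
The inputs above can be supplied to a CWL runner as a job file. The following is a minimal sketch of building one in Python and writing it as JSON. All file names and the run ID are hypothetical placeholders, `build_job` is an illustrative helper rather than part of the pipeline, and the `fraggenescan_model` value is left as a stand-in because its schema is defined in FragGeneScan-model.yaml, which is not reproduced here.

```python
import json


def file_obj(path):
    """Wrap a path in the CWL File object structure used in job files."""
    return {"class": "File", "path": path}


def build_job(reads, run_id, models, go_summary_config, fraggenescan_model):
    """Return a job object (dict) keyed by the workflow's input IDs."""
    job = {
        "reads": file_obj(reads),
        "run_id": run_id,
        "go_summary_config": file_obj(go_summary_config),
        # Passed through unchanged: the valid values are defined by the
        # FragGeneScan-model.yaml type, which this sketch does not know.
        "fraggenescan_model": fraggenescan_model,
    }
    # The four RNA model inputs are all files in HMMER format.
    for input_id in ("5S_model", "16S_model", "23S_model", "tRNA_model"):
        job[input_id] = file_obj(models[input_id])
    return job


# Hypothetical example values; replace with real paths and settings.
example_job = build_job(
    reads="reads.fastq",
    run_id="EXAMPLE_RUN",
    models={
        "5S_model": "5S.hmm",
        "16S_model": "16S.hmm",
        "23S_model": "23S.hmm",
        "tRNA_model": "tRNA.hmm",
    },
    go_summary_config="go_summary.config",
    fraggenescan_model="REPLACE_WITH_VALID_MODEL",
)

job_json = json.dumps(example_job, indent=2)
print(job_json)
```

Because JSON is a subset of YAML, the printed text can be saved to a file and passed to a CWL runner alongside the workflow document.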

Steps

ID Runs Label Doc
ipr_stats
../tools/ipr_stats.cwl (CommandLineTool)
gather stats from InterProScan
orf_stats
../tools/orf_stats.cwl (CommandLineTool)
gather stats from ORF caller
divide_faa
../tools/faselector.cwl (CommandLineTool)
divide_ffn
../tools/faselector.cwl (CommandLineTool)
count_reads
../tools/count_fastq.cwl (CommandLineTool)
ORF_prediction
orf_prediction.cwl (Workflow)
Find reads with predicted coding sequences above 60 AA in length
categorisation
../tools/create_categorisations.cwl (CommandLineTool)
categorise sequences
sequence_stats
../tools/qc-stats.cwl (CommandLineTool)
Post QC-ed input analysis of sequence file
generate_summary
../tools/summary.cwl (CommandLineTool)
gather stats from InterProScan
count_masked_reads
../tools/count_fasta.cwl (CommandLineTool)
find_SSUs_and_mask
rna-selector.cwl (Workflow)
RNASelector as a CWL workflow

https://doi.org/10.1007/s12275-011-1213-z

clean_fasta_headers
../tools/clean_fasta_headers.cwl (CommandLineTool)
replace problem characters from FASTA headers with dashes
functional_analysis
functional analysis prediction with InterProScan
generate_ipr_summary
../tools/write_ipr_summary.cwl (CommandLineTool)
gather stats from InterProScan
trim_quality_control
../tools/trimmomatic.cwl (CommandLineTool)

Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application. There are two major modes of the program: Paired end mode and Single end mode. The paired end mode will maintain correspondence of read pairs and also use the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process. Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used).

count_processed_reads
../tools/count_fasta.cwl (CommandLineTool)
16S_taxonomic_analysis
Functional analysis of sequences that match the 16S SSU
extract_iprscan_coords
../tools/extract_sig_coords.cwl (CommandLineTool)
relabel_annotated_cds_aa_seqs
../tools/map_fa_headers.cwl (CommandLineTool)
convert_trimmed-reads_to_fasta
../tools/fastq_to_fasta.cwl (CommandLineTool)
relabel_annotated_cds_nuc_seqs
../tools/map_fa_headers.cwl (CommandLineTool)
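
The trim_quality_control description above mentions FASTQ quality scores encoded as phred+33 or phred+64. As a rough illustration of what that offset means (this is not Trimmomatic's own code, and the function names are hypothetical), each quality character maps to a score by subtracting the offset from its ASCII code, and a score Q corresponds to an error probability of 10^(-Q/10):

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of integer Phred scores.

    offset is 33 or 64, depending on the encoding the reads were written in.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    return [ord(ch) - offset for ch in quality_string]


def error_probability(score):
    """Return the base-call error probability for a Phred score."""
    return 10 ** (-score / 10)


# The same score of 40 is written as 'I' in phred+33 and 'h' in phred+64.
scores_33 = decode_phred("II5!", offset=33)
scores_64 = decode_phred("hh", offset=64)
print(scores_33)  # [40, 40, 20, 0]
print(scores_64)  # [40, 40]
print(error_probability(20))
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding has to match the Illumina pipeline version that produced the reads.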

Outputs

ID Type Label Doc
tree File
summary File
biom_tsv File
biom_hdf5 File
biom_json File
ipr_reads File
pCDS_seqs File
5S_matches File [FASTA]
go_summary File
16S_matches File [FASTA]
23S_matches File [FASTA]
ipr_summary File
krona_input File
qc_stats_gc File
tRNA_matches File [FASTA]
actual_run_id String
post_qc_reads File
kingdom_counts File
post_sequences File
go_summary_slim File
ipr_match_count Integer
qc_stats_gc_bin File
annotated_CDS_aa File
predicted_CDS_aa File
qc_stats_seq_len File
qc_stats_summary File
annotated_CDS_nuc File
no_functions_seqs File
otu_table_summary File
otu_visualization File
qc_stats_gc_pcbin File
qc_stats_nuc_dist File
post_qc_read_count Integer
unannotated_CDS_aa File
processed_sequences File [FASTA]
unannotated_CDS_nuc File
interproscan_matches File
qc_stats_seq_len_bin File
functional_annotations File
qc_stats_seq_len_pbcbin File
qiime_assigned_taxonomy File
ipr_CDS_with_match_count Integer
ipr_reads_with_match_count Integer
qiime_sequences-filtered_otus File
qiime_sequences-filtered_clusters File
Permalink: https://w3id.org/cwl/view/git/7bb76f33bf40b5cd2604001cac46f967a209c47f/workflows/emg-pipeline-v3.cwl