Workflow: Runs InterProScan on batches of sequences to retrieve functional annotations.

Fetched 2021-05-15 10:14:28 GMT

Inputs

ID Type Title Doc
inputFile File [FASTA] Input file path

Optional, path to the FASTA file that should be loaded on Master startup. Alternatively, in CONVERT mode, the InterProScan 5 XML file to convert.

databases Directory
chunk_size Integer (Optional)
disableResidueAnnotation Boolean (Optional) Disables residue annotation

Optional, excludes sites from the XML and JSON output.

catOutputFileName String
seqtype https://w3id.org/cwl/view/git/b1649c916f1ee31b62c1eba254e05d1a8c50c901/workflows/InterProScan-v5-chunked-wf.cwl#seqtype/seqtype (Optional) Sequence type

Optional, the type of the input sequences (dna/rna (n) or protein (p)). The default sequence type is protein.

outputFormat https://w3id.org/cwl/view/git/b1649c916f1ee31b62c1eba254e05d1a8c50c901/tools/InterProScan/InterProScan-protein_formats.yaml#protein_formats[] (Optional) output format

Optional, case-insensitive, comma-separated list of output formats. Supported formats are TSV, XML, JSON, GFF3, HTML and SVG. The defaults are TSV, XML and GFF3 for protein sequences, and GFF3 and XML for nucleotide sequences.

applications https://w3id.org/cwl/view/git/b1649c916f1ee31b62c1eba254e05d1a8c50c901/tools/InterProScan/InterProScan-apps.yaml#apps[] (Optional) Analysis

Optional, comma separated list of analyses. If this option is not set, ALL analyses will be run.
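
As a rough illustration of how the inputs listed above might be supplied, the sketch below builds a job object in Python and prints it as JSON, a format CWL runners accept for input files. The input IDs come from the table. Every value (file paths, chunk size, output file name) is hypothetical, and the symbols used for the `seqtype` and `outputFormat` enums (`p`, upper-case format names) are assumptions based on the descriptions above, not on the referenced type definitions.

```python
import json

# Formats named in the outputFormat description above.
SUPPORTED_FORMATS = {"TSV", "XML", "JSON", "GFF3", "HTML", "SVG"}


def normalise_formats(spec):
    """Turn a case-insensitive, comma-separated format list into upper-case names."""
    formats = [item.strip().upper() for item in spec.split(",") if item.strip()]
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    return formats


def build_job(fasta_path, databases_path, out_name, formats="tsv", chunk_size=None):
    """Build a job object for the workflow; all argument values are illustrative."""
    job = {
        "inputFile": {"class": "File", "path": fasta_path},
        "databases": {"class": "Directory", "path": databases_path},
        "catOutputFileName": out_name,
        "seqtype": "p",  # assumed enum symbol for protein sequences
        "outputFormat": normalise_formats(formats),
    }
    if chunk_size is not None:
        job["chunk_size"] = chunk_size
    # "applications" is left unset, so all analyses would be run.
    return job


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    job = build_job("proteins.fasta", "interproscan_data", "annotations.tsv",
                    formats="tsv", chunk_size=1000)
    print(json.dumps(job, indent=2))
```

The optional inputs are simply omitted from the object when they are not needed, which leaves the workflow to apply its own defaults.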

Steps

ID Runs Label Doc
split_seqs
../utils/fasta_chunker.cwl (CommandLineTool)
split FASTA by number of records

based upon code by developers from EMBL-EBI

run_interproscan
../tools/InterProScan/InterProScan-v5.cwl (CommandLineTool)
InterProScan: protein sequence classifier

InterProScan is the software package that allows sequences (protein and nucleic acid) to be scanned against InterPro's signatures. Signatures are predictive models provided by the several different databases that make up the InterPro consortium.

This tool description uses a Docker container tagged as version v5.30-69.0.

Documentation on how to run InterProScan 5 can be found here: https://github.com/ebi-pf-team/interproscan/wiki/HowToRun

combine_interproscan_results
../utils/concatenate.cwl (CommandLineTool)
Redirecting the Contents of Multiple Files into a Single File

The cat (short for “concatenate”) command is one of the most frequently used commands on Linux and Unix-like operating systems. It can create one or more files, display the contents of a file, concatenate files, and redirect output to the terminal or to a file.
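
Taken together, the three steps amount to a split, process and merge pattern. The sketch below illustrates it in plain Python under stated assumptions: `split_fasta` and `concatenate` are hypothetical stand-ins, not the code behind `fasta_chunker.cwl` or `concatenate.cwl`, and the InterProScan call that would presumably be applied to each chunk is left out.

```python
import os
import tempfile


def split_fasta(path, chunk_size, out_dir):
    """Write consecutive groups of `chunk_size` FASTA records to numbered files."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk_paths = []
    out = None
    records_in_chunk = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header line starts a new record; open a new chunk when
                # there is none yet or the current one is full.
                if out is None or records_in_chunk == chunk_size:
                    if out is not None:
                        out.close()
                    name = f"chunk_{len(chunk_paths) + 1}.fasta"
                    chunk_path = os.path.join(out_dir, name)
                    out = open(chunk_path, "w")
                    chunk_paths.append(chunk_path)
                    records_in_chunk = 0
                records_in_chunk += 1
            # Lines before the first header are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def concatenate(paths, out_path):
    """Write the contents of each file to `out_path`, in order, like `cat a b > out`."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    out.write(line)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "input.fasta")
        with open(fasta, "w") as handle:
            handle.write(">s1\nMKV\n>s2\nGGA\nTTL\n>s3\nPLK\n")
        chunks = split_fasta(fasta, chunk_size=2, out_dir=tmp)
        # In the workflow each chunk would be annotated at this point; this
        # sketch simply merges the chunks back together.
        merged = concatenate(chunks, os.path.join(tmp, "merged.fasta"))
        print(len(chunks), "chunks merged into", os.path.basename(merged))
```

Splitting on record boundaries rather than on line or byte counts keeps every sequence intact within a single chunk, so the per-chunk results can be merged by simple concatenation for line-oriented formats such as TSV.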

Outputs

ID Type Label Doc
i5Annotations File
Permalink: https://w3id.org/cwl/view/git/b1649c916f1ee31b62c1eba254e05d1a8c50c901/workflows/InterProScan-v5-chunked-wf.cwl