Workflow: ensembl_genomes_to_variation_graph_with_uniprot_annotation_rdf.cwl

Fetched 2024-04-19 01:30:01 GMT

Inputs

ID Type Title Doc
my_baseuri String
my_ncbiTaxid String

Steps

ID Runs Label Doc
mod
vg_mod_with_a_gam.cwl (CommandLineTool)
Augment (mod) a vg graph with a GAM

Adds the paths contained in the GAM to the vg graph

annotate
annotate_a_vg_with_a_bed.cwl (CommandLineTool)
Annotate a vg graph with a BED file

Includes all genome paths

fetch_fasta
retrieve_genomic_fasta_from_ensembl_ftp.cwl (CommandLineTool)

Retrieve the chromosomal FASTA files for the assemblies that are present in UniProtKB
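The fetch_fasta step produces FASTA files whose record IDs name the chromosomes. Below is a minimal sketch of listing those IDs. It assumes the usual FASTA convention that the ID is the first whitespace-delimited token after ">". The helper fasta_ids is hypothetical and not part of the workflow.

```python
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the ID of every FASTA record header ('>' line), in file order.

    Assumption: the ID is the first whitespace-delimited token after '>';
    anything after it is treated as a free-text description.
    """
    ids: List[str] = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # An empty header yields an empty ID rather than an exception.
            ids.append(header.split()[0] if header else "")
    return ids


if __name__ == "__main__":
    example = [
        ">chromosome_1 hypothetical description",
        "ACGTACGT",
        ">plasmid_1",
        "GGCC",
    ]
    print(fasta_ids(example))  # ['chromosome_1', 'plasmid_1']
```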

vg_to_turtle
vg_to_turtle.cwl (CommandLineTool)
Export a vg graph to Turtle RDF

This allows it to be loaded into a triple store
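The vg_to_turtle step could be driven from Python roughly as follows. This is a sketch, not the content of vg_to_turtle.cwl. It assumes vg is on PATH and that "vg view" accepts -t (Turtle output) and -r (RDF base URI), as in vg releases of that period; check "vg view --help" for your version. Feeding the base URI from the workflow's my_baseuri input is also an assumption. The helper names are hypothetical.

```python
import subprocess
from typing import List


def vg_to_turtle_cmd(vg_path: str, base_uri: str) -> List[str]:
    """Build the argv that converts vg_path to Turtle under base_uri.

    Assumed vg flags: -t selects Turtle output, -r sets the RDF base URI.
    """
    return ["vg", "view", "-t", "-r", base_uri, vg_path]


def vg_to_turtle(vg_path: str, base_uri: str, ttl_path: str) -> None:
    """Run vg and write its Turtle output (stdout) to ttl_path."""
    with open(ttl_path, "wb") as out:
        # check=True raises CalledProcessError if vg exits non-zero.
        subprocess.run(vg_to_turtle_cmd(vg_path, base_uri), stdout=out, check=True)
```

The command builder is kept separate from the runner so the argument list can be inspected or logged without vg being installed.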

fetch_uniprot
retrieve_rdf_from_uniprot.cwl (CommandLineTool)

Retrieve the UniProt RDF for the proteins in the proteomes
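Below is a sketch of how a per-proteome RDF request could be formed against UniProt's current REST API (rest.uniprot.org). It assumes that API's uniprotkb stream endpoint, the proteome: query field and format=rdf. The original retrieve_rdf_from_uniprot.cwl tool may use a different endpoint or query. The helper uniprot_rdf_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current UniProt REST API (not taken from the workflow).
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_rdf_url(proteome_id: str) -> str:
    """Return a URL requesting RDF for all UniProtKB entries in one proteome."""
    params = {"query": f"proteome:{proteome_id}", "format": "rdf"}
    # urlencode percent-encodes reserved characters such as ':' (-> %3A).
    return f"{UNIPROT_STREAM}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client. Streaming the response to disk avoids holding a large proteome's RDF in memory.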

msga_the_fasta
fasta_vg_msga_into_graph.cwl (CommandLineTool)
Construct a genome graph

Includes all genome paths

get_ensembl_bed
retrieve_bed_files_from_ensembl.cwl (CommandLineTool)

Retrieve the chromosomal BED files for the assemblies that are present in UniProtKB
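The get_ensembl_bed step runs retrieve_bed_files_from_ensembl.cwl, so its output is presumably standard BED. BED is a tab-separated format with chrom, chromStart (0-based) and chromEnd (exclusive) in the first three columns and an optional name in the fourth. Below is a minimal parser sketch. Which columns the annotate step actually consumes is not stated, and parse_bed and BedRecord are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # empty if the optional 4th column is absent


def parse_bed(lines: Iterable[str]) -> List[BedRecord]:
    """Parse BED lines, skipping blanks, comments and track/browser lines."""
    records: List[BedRecord] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records
```

Because intervals are half-open, a record's length is simply end - start.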

xg_index_the_vg
xg_index_vg.cwl (CommandLineTool)
Index a vg with xg

The xg index allows faster access to the vg graph
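The xg_index_the_vg step could be reproduced with a call like the one sketched below. It assumes vg is on PATH and that "vg index -x FILE" writes an xg index, as in vg releases of that period; check "vg index --help" for your version. The helper names are hypothetical and not taken from xg_index_vg.cwl.

```python
import subprocess
from typing import List


def xg_index_cmd(vg_path: str, xg_path: str) -> List[str]:
    """Build the argv that writes an xg index of vg_path to xg_path.

    Assumed vg flag: -x names the output xg index file.
    """
    return ["vg", "index", "-x", xg_path, vg_path]


def xg_index(vg_path: str, xg_path: str) -> None:
    """Run vg index; raises CalledProcessError if vg exits non-zero."""
    subprocess.run(xg_index_cmd(vg_path, xg_path), check=True)
```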

fetch_ensembl_ttl
retrieve_turtle_from_ensembl.cwl (CommandLineTool)

Retrieve Turtle (TTL) from Ensembl Genomes for the assemblies that are present in UniProtKB

fetch_assembly_ids
retrieve_assembly_identifiers_for_proteomes_from_uniprot.cwl (CommandLineTool)
Get assembly IDs for the proteomes of a taxid from UniProtKB

Retrieve assembly IDs for E. coli proteomes that are non-redundant in UniProtKB

fix_ensembl_turtle
fix_iris_in_turtle_from_ensembl.cwl (CommandLineTool)

Fix the IRIs in the Ensembl RDF, which Ensembl does not correct upstream
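The source does not say which Ensembl IRIs are wrong or how fix_iris_in_turtle_from_ensembl.cwl repairs them. Below is a generic sketch of one common approach: rewriting IRI prefixes inside angle brackets. The prefix mapping and the helper fix_iris are hypothetical. The regex is deliberately naive: it ignores prefixed names (ex:foo) and would also touch "<...>" text inside string literals.

```python
import re
from typing import Dict

# Matches a full IRI reference such as <http://example.org/x>.
IRI = re.compile(r"<([^<>\s]*)>")


def fix_iris(turtle: str, prefix_map: Dict[str, str]) -> str:
    """Replace the first matching old prefix of each <IRI> with its new prefix."""

    def repl(match: "re.Match[str]") -> str:
        iri = match.group(1)
        for old, new in prefix_map.items():
            if iri.startswith(old):
                return "<" + new + iri[len(old):] + ">"
        return match.group(0)  # leave IRIs without a mapped prefix unchanged

    return IRI.sub(repl, turtle)
```

For large files, a streaming RDF parser would be safer than text substitution.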

fetch_ensembl_metadata
retrieve_metadata_from_ensembl_by_ncbi_taxid.cwl (CommandLineTool)

Retrieve Ensembl Genomes metadata for an NCBI taxid
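Below is a sketch of selecting metadata rows for the workflow's my_ncbiTaxid input. It assumes the metadata arrives as a tab-separated table with a header row and a hypothetical taxonomy_id column. The real layout used by retrieve_metadata_from_ensembl_by_ncbi_taxid.cwl is not given. The match is exact, so the code does not walk the NCBI taxonomy to include descendant strains.

```python
import csv
import io
from typing import Dict, List


def filter_by_taxid(tsv_text: str, taxid: str) -> List[Dict[str, str]]:
    """Return rows whose (hypothetical) taxonomy_id column equals taxid exactly."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if (row.get("taxonomy_id") or "").strip() == taxid]
```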

filter_ensembl_metadata
filter_ensembl_records_by_assembly_id.cwl (CommandLineTool)

Generate a four-column CSV file with the name in the first column, the assembly ID in the second, the database subsection name in the third, and the species name escaped as used in URLs in the last.
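The four-column layout described above can be sketched in Python as follows. Several parts are assumptions: the input record keys (name, assembly_id, db_subsection, species) are hypothetical, and the escaping rule in url_escape_species (lower-case, runs of non-alphanumerics collapsed to "_") only illustrates a URL-safe species name. It is not necessarily Ensembl's exact rule.

```python
import csv
import io
import re
from typing import Dict, Iterable, Set


def url_escape_species(species: str) -> str:
    """Hypothetical escaping: lower-case, collapse non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", species.lower()).strip("_")


def filter_to_csv(records: Iterable[Dict[str, str]], wanted: Set[str]) -> str:
    """Write name, assembly ID, DB subsection and escaped species as CSV.

    Only records whose assembly_id is in `wanted` are kept.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rec in records:
        if rec["assembly_id"] in wanted:
            writer.writerow([
                rec["name"],
                rec["assembly_id"],
                rec["db_subsection"],
                url_escape_species(rec["species"]),
            ])
    return buf.getvalue()
```

Using the csv module rather than joining strings means names containing commas are quoted correctly.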

convertAssemblyIdsFromUniProtIntoRegex
convertAssemblyIdsFromUniProtIntoRegex.cwl (CommandLineTool)

Join the proteome and genome lists so that only Ensembl Bacteria genomes with a non-redundant UniProtKB proteome remain. At the time of writing, this gives 200 genomes.
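The step name convertAssemblyIdsFromUniProtIntoRegex suggests that the join is done by turning the UniProt assembly IDs into a regular expression and filtering the genome list with it. Below is a sketch under that assumption. The helper names are hypothetical. Word boundaries stop an ID from matching a longer ID that merely extends it with more digits, such as a different version number.

```python
import re
from typing import Iterable, List, Pattern


def assembly_ids_to_regex(ids: Iterable[str]) -> Pattern[str]:
    """Compile one alternation matching any of the given assembly IDs."""
    unique = sorted({i.strip() for i in ids if i.strip()})
    if not unique:
        raise ValueError("no assembly IDs given")
    # re.escape neutralises '.' and other metacharacters in versioned IDs.
    return re.compile(r"\b(?:" + "|".join(map(re.escape, unique)) + r")\b")


def filter_lines(lines: Iterable[str], pattern: Pattern[str]) -> List[str]:
    """Keep only the lines that mention one of the assembly IDs."""
    return [line for line in lines if pattern.search(line)]
```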

Outputs

ID Type Label Doc
vg File
ensembl File
uniprot File
Permalink: https://w3id.org/cwl/view/git/fab1465733a2251129742bc678150a95b6ecd9a4/ensemblBacteriaUniProtVgExample/ensembl_genomes_to_variation_graph_with_uniprot_annotation_rdf.cwl