Workflow: ensembl_genomes_to_variation_graph_with_uniprot_annotation_rdf.cwl
Inputs
ID | Type | Title | Doc |
---|---|---|---|
my_baseuri | String | | |
my_ncbiTaxid | String | | |
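
The two inputs above can be supplied through a CWL job file. What follows is a minimal sketch, not taken from the workflow's repository: the helper names `make_job` and `to_json`, the base URI `http://example.org/graphs/`, and the taxid `562` (Escherichia coli, chosen only because the step docs mention E. coli) are all assumptions for illustration. It writes the job as JSON, which CWL runners generally accept because JSON is valid YAML.

```python
import json


def make_job(base_uri, ncbi_taxid):
    """Build a job object for the two workflow inputs listed above.

    Both inputs are typed String, so the taxid is converted to a string.
    """
    return {"my_baseuri": str(base_uri), "my_ncbiTaxid": str(ncbi_taxid)}


def to_json(job):
    """Serialize the job object; JSON is also valid YAML."""
    return json.dumps(job, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values (assumptions, not taken from the source).
    print(to_json(make_job("http://example.org/graphs/", 562)))
```

Assuming `cwltool` is the runner, the printed JSON could be saved as `job.json` and passed as the second argument after the workflow file.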
Steps
ID | Runs | Label | Doc |
---|---|---|---|
mod | vg_mod_with_a_gam.cwl (CommandLineTool) | Augment/mod a vg with a GAM | Adds the paths contained in the GAM to the vg |
annotate | annotate_a_vg_with_a_bed.cwl (CommandLineTool) | Annotate a vg graph with a BED file | Includes all genome paths |
fetch_fasta | retrieve_genomic_fasta_from_ensembl_ftp.cwl (CommandLineTool) | Retrieve chromosomal FASTA IDs for the assemblies that are present in UniProtKB | |
vg_to_turtle | vg_to_turtle.cwl (CommandLineTool) | Export a vg to Turtle RDF | This allows it to be loaded into a triple store |
fetch_uniprot | retrieve_rdf_from_uniprot.cwl (CommandLineTool) | Retrieve RDF from UniProt for the proteins in the proteomes | |
msga_the_fasta | fasta_vg_msga_into_graph.cwl (CommandLineTool) | Construct a genome graph | Includes all genome paths |
get_ensembl_bed | retrieve_bed_files_from_ensembl.cwl (CommandLineTool) | Retrieve chromosomal FASTA IDs for the assemblies that are present in UniProtKB | |
xg_index_the_vg | xg_index_vg.cwl (CommandLineTool) | Index a vg with xg | The XG index allows faster access to the vg |
fetch_ensembl_ttl | retrieve_turtle_from_ensembl.cwl (CommandLineTool) | Retrieve TTL from Ensembl Genomes for the assemblies that are present in UniProtKB | |
fetch_assembly_ids | retrieve_assembly_identifiers_for_proteomes_from_uniprot.cwl (CommandLineTool) | Get assembly IDs for proteomes per taxid from UniProtKB | Retrieve assembly IDs for E. coli proteomes that are non-redundant in UniProtKB |
fix_ensembl_turtle | fix_iris_in_turtle_from_ensembl.cwl (CommandLineTool) | Fix IRIs in Ensembl RDF because Ensembl can't be bothered | |
fetch_ensembl_metadata | retrieve_metadata_from_ensembl_by_ncbi_taxid.cwl (CommandLineTool) | Retrieve metadata for Ensembl Genomes data by taxid | |
filter_ensembl_metadata | filter_ensembl_records_by_assembly_id.cwl (CommandLineTool) | Generate a four-column CSV file: the name in the first column, the assembly ID in the second, the database subsection name in the third, and the escaped species name as used in URLs in the fourth. | |
convertAssemblyIdsFromUniProtIntoRegex | convertAssemblyIdsFromUniProtIntoRegex.cwl (CommandLineTool) | Join the proteome and genome lists so that only Ensembl Bacteria genomes with a non-redundant UniProtKB proteome remain, which at the time of writing gives 200 genomes. | |
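
The `convertAssemblyIdsFromUniProtIntoRegex` and `filter_ensembl_metadata` steps describe turning a list of assembly IDs into a regular expression and using it to reduce the Ensembl metadata to a four-column CSV. The tools' actual implementations are not shown here, so the following is only a sketch of that idea under stated assumptions: the function names, the record field names (`name`, `assembly_id`, `division`, `url_name`), and the assembly IDs in the example are all made up for illustration.

```python
import csv
import io
import re


def assembly_ids_to_regex(assembly_ids):
    """Compile one alternation pattern matching any of the given IDs.

    IDs are escaped so that the '.' in a version suffix is literal.
    """
    escaped = [re.escape(i) for i in sorted(set(assembly_ids))]
    return re.compile("|".join(escaped))


def filter_records(records, pattern):
    """Keep only records whose assembly ID matches the pattern exactly."""
    return [r for r in records if pattern.fullmatch(r["assembly_id"])]


def to_four_column_csv(records):
    """Write name, assembly ID, database subsection, and URL species name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for r in records:
        writer.writerow(
            [r["name"], r["assembly_id"], r["division"], r["url_name"]]
        )
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical inputs: placeholder IDs, not real assemblies.
    uniprot_ids = ["GCA_000000001.1"]
    ensembl_records = [
        {"name": "Example strain A", "assembly_id": "GCA_000000001.1",
         "division": "bacteria_0_collection", "url_name": "example_strain_a"},
        {"name": "Example strain B", "assembly_id": "GCA_000000002.1",
         "division": "bacteria_1_collection", "url_name": "example_strain_b"},
    ]
    pattern = assembly_ids_to_regex(uniprot_ids)
    print(to_four_column_csv(filter_records(ensembl_records, pattern)), end="")
```

Using `fullmatch` with escaped IDs avoids accidental hits such as an ID that is a prefix of another one. A real tool might instead apply the pattern line by line to the metadata file.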
Outputs
ID | Type | Label | Doc |
---|---|---|---|
vg | File | | |
ensembl | File | | |
uniprot | File | | |
https://w3id.org/cwl/view/git/fab1465733a2251129742bc678150a95b6ecd9a4/ensemblBacteriaUniProtVgExample/ensembl_genomes_to_variation_graph_with_uniprot_annotation_rdf.cwl