Workflow: Generate genome indices for STAR & bowtie

Fetched 2023-02-15 20:43:04 GMT

Creates indices for:

* [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886)
* [bowtie](http://bowtie-bio.sourceforge.net/tutorial.shtml) v1.2.0 (12/30/2016)

It performs the following steps:

1. `STAR --runMode genomeGenerate` generates indices from [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files and returns the results as an array of files
2. Outputs the indices as a [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type
3. Separates the *chrNameLength.txt* file from the Directory output
4. `bowtie-build` generates indices from the genome [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) file and returns the results as a group of main and secondary files
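The two indexing commands named in the steps above can be sketched as argument lists. This is a minimal illustration, not the content of the workflow's tool wrappers: the function names and the example file names are hypothetical, and only the widely documented flags `--runMode`, `--genomeDir`, `--genomeFastaFiles`, `--sjdbGTFfile` and `--runThreadN` (STAR) and the positional `<reference_in> <ebwt_base>` form (bowtie-build) are used.

```python
import shlex
from typing import List, Optional


def star_generate_cmd(fasta: str, gtf: str, out_dir: str,
                      threads: Optional[int] = None) -> List[str]:
    """Build a STAR genomeGenerate command line (hypothetical helper)."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", out_dir,        # directory that will hold the index
        "--genomeFastaFiles", fasta,   # reference genome FASTA
        "--sjdbGTFfile", gtf,          # annotation GTF
    ]
    if threads is not None:
        cmd += ["--runThreadN", str(threads)]
    return cmd


def bowtie_build_cmd(fasta: str, index_base: str) -> List[str]:
    """Build a bowtie-build command line (hypothetical helper)."""
    return ["bowtie-build", fasta, index_base]


# Example file names below are placeholders, not values from the workflow.
print(shlex.join(star_generate_cmd("genome.fa", "genes.gtf", "star_index", threads=4)))
print(shlex.join(bowtie_build_cmd("genome.fa", "genome")))
```

The lists can be passed to `subprocess.run` unchanged; building them separately from execution keeps the commands easy to inspect and log.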

Inputs

ID Type Title Doc
fasta File [FASTA] Genome FASTA file

Reference genome FASTA file

genome String Genome

Output files base string

threads Integer (Optional) Number of threads to run tools

Number of threads for those steps that support multithreading

genome_label String (Optional) Genome label

Genome label is used by web-ui to show label

annotation_tab File [TSV] Annotation file

Tab-separated annotation file

genome_details String (Optional) Genome details

Genome details

fasta_ribosomal File (Optional) [FASTA] Ribosomal DNA sequence FASTA file

Ribosomal DNA sequence FASTA file

genome_description String (Optional) Genome description

Genome description is used by web-ui to show description

genome_sa_sparse_d Integer (Optional) Genome SA sparse (Use 2 to decrease RAM usage)

default: 1

int>0: suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of reduced mapping speed

fasta_mitochondrial File (Optional) [FASTA] Mitochondrial chromosome sequence FASTA file

Mitochondrial chromosome sequence FASTA file

input_annotation_gtf File [GTF] GTF input file

Annotation input file

effective_genome_size String Effective genome size

MACS2 effective genome size: hs, mm, ce, dm or number, for example 2.7e9
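Because this input is a string that may hold either a shortcut or a number, a caller may want to normalize it before use. The sketch below is one way to do that; the function name is hypothetical, and the numeric values attached to the shortcuts are the ones given in the MACS2 documentation, which should be checked against the installed MACS2 version.

```python
import math

# Shortcut values as documented for MACS2 (assumption: verify for your version).
MACS2_SHORTCUTS = {"hs": 2.7e9, "mm": 1.87e9, "ce": 9e7, "dm": 1.2e8}


def resolve_effective_genome_size(value: str) -> float:
    """Return the effective genome size as a number (hypothetical helper)."""
    key = value.strip().lower()
    if key in MACS2_SHORTCUTS:
        return MACS2_SHORTCUTS[key]
    size = float(key)  # raises ValueError for non-numeric strings
    if not (math.isfinite(size) and size > 0):
        raise ValueError(f"effective genome size must be positive: {value!r}")
    return size
```

For example, `resolve_effective_genome_size("hs")` and `resolve_effective_genome_size("2.7e9")` both return `2700000000.0`.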

genome_chr_bin_n_bits Integer (Optional) Genome Chr Bin NBits

If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to lower RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]). For example, for a 3 gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.

default: 18

int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]).
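The scaling rule above can be evaluated directly. In this sketch the function name and the default read length of 100 are assumptions, and rounding to the nearest integer is chosen because it reproduces the worked example in the text (3 gigaBases over 100,000 references gives 15).

```python
import math


def genome_chr_bin_n_bits(genome_length: int, n_references: int,
                          read_length: int = 100) -> int:
    """min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength))).

    read_length=100 is an illustrative default, not a value from the workflow.
    """
    per_reference = genome_length / n_references
    return min(18, round(math.log2(max(per_reference, read_length))))


print(genome_chr_bin_n_bits(3_000_000_000, 100_000))  # 15, as in the example above
```

For a genome with few, long chromosomes the formula saturates at the default of 18, so the parameter only needs attention for highly fragmented assemblies.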

genome_sa_index_n_bases Integer (Optional) length of the SA pre-indexing string

For small genomes, the parameter --genomeSAindexNbases must be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megaBase genome this is equal to 9, and for a 100 kiloBase genome it is equal to 7.

default: 14

int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
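A short sketch of the small-genome rule, assuming rounding to the nearest integer (this matches both examples in the text: 9 for 1 megaBase and 7 for 100 kiloBases). The function name is hypothetical.

```python
import math


def genome_sa_index_n_bases(genome_length: int) -> int:
    """min(14, log2(GenomeLength) / 2 - 1), rounded to the nearest integer."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


print(genome_sa_index_n_bases(1_000_000))  # 9
print(genome_sa_index_n_bases(100_000))    # 7
```

For genomes of a few gigabases the expression exceeds 14, so the default of 14 applies unchanged.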

limit_genome_generate_ram Long (Optional) Genome Generate RAM (31G default)

default: 31000000000

int>0: maximum available RAM (bytes) for genome generation
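The optional tuning inputs correspond to STAR genome-generation options. The sketch below shows one plausible mapping from the workflow's input IDs to STAR flags; the mapping table and function name are assumptions for illustration, and the actual wiring is defined in `../tools/star-genomegenerate.cwl`.

```python
from typing import Dict, List

# Assumed mapping from workflow input IDs to STAR command-line flags.
STAR_TUNING_FLAGS = {
    "genome_sa_sparse_d": "--genomeSAsparseD",
    "genome_chr_bin_n_bits": "--genomeChrBinNbits",
    "genome_sa_index_n_bases": "--genomeSAindexNbases",
    "limit_genome_generate_ram": "--limitGenomeGenerateRAM",
}


def star_tuning_args(inputs: Dict[str, int]) -> List[str]:
    """Turn the optional inputs that are set into STAR arguments."""
    args: List[str] = []
    for input_id, flag in STAR_TUNING_FLAGS.items():
        value = inputs.get(input_id)
        if value is not None:  # unset inputs fall back to STAR's defaults
            args += [flag, str(value)]
    return args


print(star_tuning_args({"genome_sa_sparse_d": 2,
                        "limit_genome_generate_ram": 31_000_000_000}))
```

Leaving an input unset omits the flag entirely, so STAR applies its own default instead of a value hard-coded by the caller.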

genome_sa_index_n_bases_mitochondrial Integer (Optional) length (mitochondrial) of the SA pre-indexing string

For small genomes, the parameter --genomeSAindexNbases must be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megaBase genome this is equal to 9, and for a 100 kiloBase genome it is equal to 7.

default: 14

int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
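The inputs listed above can be collected into a CWL job object. The sketch below builds one as JSON (CWL runners accept JSON job files as well as YAML). The input IDs come from the table above; the helper names, file paths and example values are placeholders, not values from the workflow.

```python
import json
from typing import Any, Dict

# Inputs not marked "(Optional)" in the table above.
REQUIRED_INPUTS = ("fasta", "genome", "annotation_tab",
                   "input_annotation_gtf", "effective_genome_size")


def cwl_file(path: str) -> Dict[str, str]:
    """CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(**inputs: Any) -> Dict[str, Any]:
    """Check that required inputs are present and return the job object."""
    missing = [name for name in REQUIRED_INPUTS if name not in inputs]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    return dict(inputs)


job = build_job(
    fasta=cwl_file("genome.fa"),                 # placeholder path
    genome="example_genome",                     # placeholder base string
    annotation_tab=cwl_file("annotation.tsv"),   # placeholder path
    input_annotation_gtf=cwl_file("genes.gtf"),  # placeholder path
    effective_genome_size="2.7e9",
    threads=4,
    genome_sa_sparse_d=2,
)
print(json.dumps(job, indent=2))
```

Writing the printed JSON to a file such as `job.json` gives an input object that a CWL runner can consume together with `genome-indices.cwl`.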

Steps

ID Runs Label Doc
star_generate_indices
../tools/star-genomegenerate.cwl (CommandLineTool)

Runs STAR in genomeGenerate mode. Returns a directory with the index

bowtie_generate_indices
../tools/bowtie-build.cwl (CommandLineTool)

Tool runs bowtie-build. Unsupported parameters: -c (reference sequences given on the command line as <seq_in>)

ribosomal_generate_indices
../tools/bowtie-build.cwl (CommandLineTool)

Tool runs bowtie-build. Unsupported parameters: -c (reference sequences given on the command line as <seq_in>)

mitochondrial_generate_indices
../tools/star-genomegenerate.cwl (CommandLineTool)

Runs STAR in genomeGenerate mode. Returns a directory with the index

Outputs

ID Type Label Doc
annotation File [TSV] Annotation file

Tab-separated annotation file

genome_size String Effective genome size

MACS2 effective genome size: hs, mm, ce, dm or number, for example 2.7e9

chrom_length File [Textual format] Chromosome length file

Chromosome length file
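The chromosome length file is the *chrNameLength.txt* that STAR writes into its index directory, with one tab-separated `name<TAB>length` pair per line. A minimal sketch of locating and reading it follows; the function name is hypothetical and the index directory path is supplied by the caller.

```python
from pathlib import Path
from typing import Dict, Union


def read_chrom_lengths(star_index_dir: Union[str, Path]) -> Dict[str, int]:
    """Parse chrNameLength.txt from a STAR index directory into {name: length}."""
    path = Path(star_index_dir) / "chrNameLength.txt"
    lengths: Dict[str, int] = {}
    with path.open() as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate a trailing blank line
            name, length = line.rsplit("\t", 1)
            lengths[name] = int(length)
    return lengths
```

Summing the returned values gives the total genome length, which is the `GenomeLength` term used in the scaling formulas for the index parameters.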

star_indices Directory STAR indices folder

Folder containing all STAR-generated indices

annotation_gtf File [GTF] GTF input file

Annotation input file

bowtie_indices Directory Bowtie indices folder

Folder containing all Bowtie-generated indices

ribosomal_indices Directory Ribosomal DNA indices folder

Ribosomal DNA Bowtie generated indices folder

mitochondrial_indices Directory Mitochondrial chromosome index folder

Mitochondrial chromosome index folder

Permalink: https://w3id.org/cwl/view/git/c602e3cdd72ff904dd54d46ba2b5146eb1c57022/workflows/genome-indices.cwl