Workflow: Generate genome index STAR RNA

Fetched 2023-08-06 20:11:50 GMT

This workflow builds genome indices for [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017), PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886). It performs the following steps:

1. Runs `STAR --runMode genomeGenerate` to generate indices from [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files and returns the results as an array of files.
2. Transforms the array of files into the [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type.
3. Separates the *chrNameLength.txt* file as an output.

Inputs

ID Type Title Doc
fasta File [FASTA] FASTA input file

Reference genome input FASTA file

threads Integer (Optional) Number of threads to run tools

Number of threads for those steps that support multithreading

genome_label String (Optional) Genome label

Genome label is used by web-ui to show label

annotation_gtf File [GTF] GTF input file

Annotation input file

annotation_tab File [TSV] Annotation file

Tab-separated annotation file

genome_details String (Optional) Genome details

Genome details

genome_description String (Optional) Genome description

Genome description is used by web-ui to show description

genome_sa_sparse_d Integer (Optional) Use 2 to decrease needed RAM for STAR

int>0: suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction

genome_chr_bin_n_bits Integer (Optional) Genome Chr Bin NBits

If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]). For example, for a 3 gigabase genome with 100,000 chromosomes/scaffolds, this is equal to 15.

default: 18

int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]).
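The scaling rule above can be sketched in a few lines of Python. This is an illustration, not part of the workflow: the function name `genome_chr_bin_n_bits` and the default read length of 100 are assumptions, and rounding to the nearest integer is chosen only because it reproduces the example value of 15 given in the text.

```python
import math


def genome_chr_bin_n_bits(genome_length, number_of_references, read_length=100):
    """Suggest a --genomeChrBinNbits value from the documented scaling rule.

    Hypothetical helper: rounding to the nearest integer is an assumption
    made to match the worked example in the parameter description.
    """
    if genome_length <= 0 or number_of_references <= 0 or read_length <= 0:
        raise ValueError("all arguments must be positive")
    # Average reference length, but never smaller than one read.
    bin_size = max(genome_length / number_of_references, read_length)
    return min(18, round(math.log2(bin_size)))


# 3 gigabase genome split into 100,000 scaffolds (the example from the text).
print(genome_chr_bin_n_bits(3_000_000_000, 100_000))  # 15
# The same genome with only 25 references stays at the default cap of 18.
print(genome_chr_bin_n_bits(3_000_000_000, 25))  # 18
```

The result would be passed to the workflow as the `genome_chr_bin_n_bits` input.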

genome_sa_index_n_bases Integer (Optional) length of the SA pre-indexing string

For small genomes, the parameter --genomeSAindexNbases must be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megabase genome, this is equal to 9; for a 100 kilobase genome, this is equal to 7.

default: 14

int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
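As a quick check of the small-genome rule, the formula can be evaluated in Python. The helper name `genome_sa_index_n_bases` is hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the values 9 and 7 quoted in the description; a more conservative choice would be to round down.

```python
import math


def genome_sa_index_n_bases(genome_length):
    """Suggest a --genomeSAindexNbases value: min(14, log2(GenomeLength)/2 - 1).

    Hypothetical helper; nearest-integer rounding is an assumption that
    matches the worked examples in the parameter description.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(14, round(math.log2(genome_length) / 2 - 1))


print(genome_sa_index_n_bases(1_000_000))      # 9  (1 megabase genome)
print(genome_sa_index_n_bases(100_000))        # 7  (100 kilobase genome)
print(genome_sa_index_n_bases(3_000_000_000))  # 14 (large genomes keep the default)
```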

limit_genome_generate_ram Long (Optional)

default: 31000000000

int>0: maximum available RAM (bytes) for genome generation
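To show how these inputs could translate into a `STAR --runMode genomeGenerate` call, here is a minimal Python sketch that assembles the argument list. The mapping from workflow input names to STAR options is inferred from the names and descriptions above; the wrapped tool definition is not shown on this page, so the real command line may differ. The function name `build_star_index_command` and the `genome_dir` argument are hypothetical.

```python
def build_star_index_command(fasta, annotation_gtf, genome_dir, threads=None,
                             genome_sa_sparse_d=None, genome_chr_bin_n_bits=None,
                             genome_sa_index_n_bases=None,
                             limit_genome_generate_ram=None):
    """Assemble a STAR genomeGenerate argument list (illustrative only).

    Assumption: each optional workflow input maps onto the STAR option
    with the matching name; unset inputs are left to STAR's own defaults.
    """
    cmd = [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", annotation_gtf,
    ]
    optional = {
        "--runThreadN": threads,
        "--genomeSAsparseD": genome_sa_sparse_d,
        "--genomeChrBinNbits": genome_chr_bin_n_bits,
        "--genomeSAindexNbases": genome_sa_index_n_bases,
        "--limitGenomeGenerateRAM": limit_genome_generate_ram,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Example: a small genome indexed with 4 threads and a reduced SA pre-index.
command = build_star_index_command(
    "genome.fa", "genes.gtf", "star_index",
    threads=4, genome_sa_index_n_bases=9,
)
print(" ".join(command))
```

The list form can be handed to `subprocess.run` directly, which avoids shell quoting problems with file paths.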

Steps

ID Runs Label Doc
star_generate_indices
../tools/star-genomegenerate.cwl (CommandLineTool)

Runs STAR genomeGenerate. Returns a directory with the index.

Outputs

ID Type Label Doc
annotation File [TSV] Annotation file

Tab-separated annotation file

chrom_length File [Textual format] Chromosome length file

Chromosome length file

star_indices Directory STAR indices folder

Folder which includes all STAR generated indices files
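The `chrom_length` output can be read back to obtain the GenomeLength and NumberOfReferences values used in the scaling formulas under Inputs. The sketch below assumes the file has two tab-separated columns per line (reference name, then length in bases); the function name `read_chrom_lengths` and the sample lines are made up for illustration.

```python
def read_chrom_lengths(lines):
    """Parse chrNameLength.txt-style lines into a {name: length} dict.

    Assumption: each non-empty line is "<name>\t<length>".
    """
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the length is always the final field.
        name, length = line.rsplit("\t", 1)
        lengths[name] = int(length)
    return lengths


# In practice the lines would come from open("chrNameLength.txt").
sample = ["chr1\t1000\n", "chr2\t2500\n", "chrM\t500\n"]
chrom_lengths = read_chrom_lengths(sample)
genome_length = sum(chrom_lengths.values())
number_of_references = len(chrom_lengths)
print(genome_length, number_of_references)  # 4000 3
```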

Permalink: https://w3id.org/cwl/view/git/bfa3843bcf36125ff258d6314f64b41336f06e6b/workflows/star-index.cwl