Workflow: Build STAR indices

Fetched 2023-01-08 22:12:45 GMT

This workflow runs [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017), PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886), to build indices for a reference genome supplied as a single FASTA file through the fasta_file input, together with an optional GTF annotation file supplied through the annotation_gtf_file input. The generated indices are saved in a folder named after the input genome.

Inputs

ID Type Title Doc
genome String Genome type

Genome type, such as mm10, hg19, hg38, etc

threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

fasta_file File [FASTA] Reference genome FASTA file

Reference genome FASTA file. Includes all chromosomes

genome_sa_sparse_d Integer (Optional) Suffix array sparsity (use 2 to decrease needed RAM)

Suffix array sparsity, i.e. the distance between indices. Use larger values to decrease the RAM needed, at the cost of reduced mapping speed.

annotation_gtf_file File (Optional) [GTF] GTF annotation file

GTF annotation file

genome_chr_bin_n_bits Integer (Optional) Number of bins allocated for each chromosome

If you are using a genome with a large number (>5,000) of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to lower RAM consumption. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]). Default: 18
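
The scaling rule above can be sketched in Python as follows. This is a minimal illustration, not part of the workflow: the function name `genome_chr_bin_n_bits` is hypothetical, and rounding the logarithm to the nearest integer is an assumption, since the formula itself yields a real number while the input expects an integer.

```python
import math


def genome_chr_bin_n_bits(genome_length: int, n_references: int, read_length: int) -> int:
    """Suggest a value for the genome_chr_bin_n_bits input.

    Implements min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength))).
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if genome_length <= 0 or n_references <= 0 or read_length <= 0:
        raise ValueError("all arguments must be positive")
    # Average reference length, but never smaller than the read length.
    scale = max(genome_length / n_references, read_length)
    return min(18, round(math.log2(scale)))


# A 3 Gb assembly split into 100,000 scaffolds, with 100 bp reads.
print(genome_chr_bin_n_bits(3_000_000_000, 100_000, 100))  # 15
# The same assembly with only 25 chromosomes stays at the default cap.
print(genome_chr_bin_n_bits(3_000_000_000, 25, 100))  # 18
```

The result would be passed as the genome_chr_bin_n_bits input; for genomes with few references the cap of 18 applies and the input can simply be left unset.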

genome_sa_index_n_bases Integer (Optional) Length of SA pre-indexing string

Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings use much more memory but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megabase genome this is equal to 9, and for a 100 kilobase genome it is equal to 7. Default: 14
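
As a rough sketch of the small-genome rule above, the calculation can be written in Python. The function name `genome_sa_index_n_bases` is hypothetical, and rounding to the nearest integer is an assumption, chosen because it reproduces the two examples given in the description (9 for 1 megabase, 7 for 100 kilobases).

```python
import math


def genome_sa_index_n_bases(genome_length: int) -> int:
    """Suggest a value for the genome_sa_index_n_bases input.

    Implements min(14, log2(GenomeLength) / 2 - 1), rounded to the
    nearest integer (the rounding is an assumption of this sketch).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return min(14, round(math.log2(genome_length) / 2 - 1))


print(genome_sa_index_n_bases(1_000_000))      # 9
print(genome_sa_index_n_bases(100_000))        # 7
print(genome_sa_index_n_bases(3_100_000_000))  # 14 (capped at the default)
```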

limit_genome_generate_ram Long (Optional) Limit maximum available RAM (bytes) for genome generation

Maximum available RAM (bytes) for genome generation. Default: 31000000000
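
To show how the inputs listed above fit together, here is a minimal Python sketch that assembles a job (input object) file as JSON. The helper name `build_job` and the file names `mm10.fa` and `mm10.gtf` are hypothetical placeholders; the input IDs are the ones from the table above, and File inputs use the standard CWL `class`/`path` form.

```python
import json

# Optional inputs of the workflow, as listed in the Inputs table.
OPTIONAL_INPUTS = {
    "threads",
    "genome_sa_sparse_d",
    "genome_chr_bin_n_bits",
    "genome_sa_index_n_bases",
    "limit_genome_generate_ram",
}


def build_job(genome, fasta_path, gtf_path=None, **optional):
    """Return a dict usable as a CWL input object for this workflow."""
    unknown = set(optional) - OPTIONAL_INPUTS
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    job = {
        "genome": genome,
        "fasta_file": {"class": "File", "path": fasta_path},
    }
    if gtf_path is not None:
        job["annotation_gtf_file"] = {"class": "File", "path": gtf_path}
    # Leave unset options out so the workflow defaults apply.
    job.update({k: v for k, v in optional.items() if v is not None})
    return job


# Hypothetical file names, for illustration only.
job = build_job("mm10", "mm10.fa", gtf_path="mm10.gtf", threads=4)
print(json.dumps(job, indent=2))
```

Saved as, say, `job.json`, such a file could be handed to a CWL runner alongside the workflow, for example `cwltool star-index.cwl job.json`, assuming cwltool is the runner in use.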

Steps

ID Runs Label Doc
star_generate_indices
../tools/star-genomegenerate.cwl (CommandLineTool)

The tool returns a directory with the indices generated by STAR. If the genome_dir input is not provided, the default output directory name star_indices is used. The chr_name_length output should not be moved outside the indices folder.

Outputs

ID Type Label Doc
stderr_log File STAR stderr log

STAR generated stderr log

stdout_log File STAR stdout log

STAR generated stdout log

indices_folder Directory STAR indices

STAR generated indices folder

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

Permalink: https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/star-index.cwl