Workflow: Build STAR indices
This workflow runs [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017), PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886), to build indices for a reference genome provided as a single FASTA file in the fasta_file input, together with an optional GTF annotation file from the annotation_gtf_file input. The generated indices are saved in a folder whose name corresponds to the input genome.
Inputs
ID | Type | Title | Doc |
---|---|---|---|
genome | String | Genome type | Genome type, such as mm10, hg19, hg38, etc. |
threads | Integer (Optional) | Number of threads | Number of threads for the steps that support multithreading |
fasta_file | File [FASTA] | Reference genome FASTA file | Reference genome FASTA file. Includes all chromosomes |
genome_sa_sparse_d | Integer (Optional) | Suffix array sparsity (use 2 to decrease needed RAM) | Suffix array sparsity, i.e. the distance between indices: use bigger numbers to decrease the needed RAM at the cost of reduced mapping speed |
annotation_gtf_file | File (Optional) [GTF] | GTF annotation file | GTF annotation file |
genome_chr_bin_n_bits | Integer (Optional) | Number of bins allocated for each chromosome | If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). Default: 18 |
genome_sa_index_n_bases | Integer (Optional) | Length of SA pre-indexing string | Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megabase genome this is equal to 9, and for a 100 kilobase genome it is equal to 7. Default: 14 |
limit_genome_generate_ram | Long (Optional) | Limit maximum available RAM (bytes) for genome generation | Maximum available RAM (bytes) for genome generation. Default: 31000000000 |
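
The two scaling formulas quoted in the Inputs table can be sketched in Python as follows. The function names are hypothetical, and rounding to the nearest integer is an assumption: the formulas do not state a rounding rule, but nearest-integer rounding reproduces the quoted examples (9 for a 1 megabase genome, 7 for a 100 kilobase genome).

```python
import math


def suggest_sa_index_n_bases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer (assumed)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def suggest_chr_bin_n_bits(genome_length: int, n_references: int, read_length: int) -> int:
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), rounded (assumed)."""
    per_reference = max(genome_length / n_references, read_length)
    return min(18, round(math.log2(per_reference)))


if __name__ == "__main__":
    print(suggest_sa_index_n_bases(1_000_000))  # 9, matches the 1 megabase example
    print(suggest_sa_index_n_bases(100_000))    # 7, matches the 100 kilobase example
```

The results would be passed as genome_sa_index_n_bases and genome_chr_bin_n_bits. For large genomes with few references both functions return the defaults (14 and 18).
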
Steps
ID | Runs | Label | Doc |
---|---|---|---|
star_generate_indices | ../tools/star-genomegenerate.cwl (CommandLineTool) | | Tool returns a directory with the indices generated by STAR. If the genome_dir input is not provided, the default output directory name star_indices is used. The chr_name_length output should not be moved outside the indices folder. |
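
As a rough illustration of what this step amounts to, the sketch below assembles a STAR genomeGenerate command line from the workflow inputs. The mapping of each input to a STAR flag is an assumption inferred from the input names and docs; the actual bindings live in ../tools/star-genomegenerate.cwl and may differ. The function name is hypothetical.

```python
from typing import List, Optional


def build_star_index_command(
    genome_dir: str,
    fasta_file: str,
    annotation_gtf_file: Optional[str] = None,
    threads: Optional[int] = None,
    genome_sa_sparse_d: Optional[int] = None,
    genome_chr_bin_n_bits: Optional[int] = None,
    genome_sa_index_n_bases: Optional[int] = None,
    limit_genome_generate_ram: Optional[int] = None,
) -> List[str]:
    """Return an argument list for STAR index generation (assumed flag mapping)."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta_file,
    ]
    # Optional inputs are appended only when a value was supplied,
    # so STAR falls back to its own defaults otherwise.
    optional = [
        ("--sjdbGTFfile", annotation_gtf_file),
        ("--runThreadN", threads),
        ("--genomeSAsparseD", genome_sa_sparse_d),
        ("--genomeChrBinNbits", genome_chr_bin_n_bits),
        ("--genomeSAindexNbases", genome_sa_index_n_bases),
        ("--limitGenomeGenerateRAM", limit_genome_generate_ram),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # File names here are placeholders, not files shipped with the workflow.
    print(" ".join(build_star_index_command("mm10", "mm10.fa", threads=4)))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.
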
Outputs
ID | Type | Label | Doc |
---|---|---|---|
stderr_log | File | STAR stderr log | STAR generated stderr log |
stdout_log | File | STAR stdout log | STAR generated stdout log |
indices_folder | Directory | STAR indices | STAR generated indices folder |
chrom_length_file | File [Textual format] | Chromosome length file | Chromosome length file |
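
A minimal sketch for reading the chrom_length_file output, assuming it follows the layout of STAR's chrNameLength.txt: one reference per line, with the name and the length separated by a tab. The function name is hypothetical, and the lengths in the usage example are illustrative values only.

```python
from typing import Dict


def read_chrom_lengths(text: str) -> Dict[str, int]:
    """Parse 'name<TAB>length' lines into a dict (assumed file layout)."""
    lengths: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


if __name__ == "__main__":
    example = "chr1\t195471971\nchr2\t182113224\n"
    lengths = read_chrom_lengths(example)
    print(len(lengths), sum(lengths.values()))
```

The summed lengths and the number of entries correspond to GenomeLength and NumberOfReferences in the scaling formulas given in the Inputs table.
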
https://w3id.org/cwl/view/git/c9e7f3de7f6ba38ee663bd3f9649e8d7dbac0c86/workflows/star-index.cwl