CWL Workflow: Build STAR indices

Fetched 2023-01-08 22:12:45 GMT

Verified with cwltool version 3.1.20221201130942

This workflow runs [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017), PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886), to build indices for a reference genome supplied as a single FASTA file through the fasta_file input, together with an optional GTF annotation file supplied through the annotation_gtf_file input. The generated indices are saved in a folder named after the input genome.
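
As a rough illustration of how these inputs fit together, the sketch below assembles a job object in Python and prints it as JSON, a format that CWL runners such as cwltool accept for input files. Only the input IDs (genome, fasta_file, annotation_gtf_file, threads) come from this workflow; the helper name build_job and the file paths are hypothetical.

```python
import json


def build_job(genome, fasta_path, gtf_path=None, threads=None):
    """Assemble an input object for the workflow (illustrative helper)."""
    job = {
        "genome": genome,
        # CWL File inputs are objects with a class and a path.
        "fasta_file": {"class": "File", "path": fasta_path},
    }
    # Optional inputs are simply left out when not provided.
    if gtf_path is not None:
        job["annotation_gtf_file"] = {"class": "File", "path": gtf_path}
    if threads is not None:
        job["threads"] = threads
    return job


# Hypothetical paths, for illustration only.
job = build_job("mm10", "/data/mm10.fa", "/data/mm10.gtf", threads=4)
print(json.dumps(job, indent=2))
```

Saved as, say, job.json, the result could be passed to a runner along the lines of `cwltool star-index.cwl job.json`. Because fasta_file and annotation_gtf_file declare file formats, a runner may also expect a format field on each File object; that detail is left out of this sketch.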

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
genome	String	Genome type	Genome type, such as mm10, hg19, hg38, etc
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multithreading
fasta_file	File [FASTA]	Reference genome FASTA file	Reference genome FASTA file. Includes all chromosomes
genome_sa_sparse_d	Integer (Optional)	Suffix array sparsity (use 2 to decrease needed RAM)	Suffix array sparsity, i.e. the distance between indices: use bigger numbers to decrease the needed RAM at the cost of reduced mapping speed
annotation_gtf_file	File (Optional) [GTF]	GTF annotation file	GTF annotation file
genome_chr_bin_n_bits	Integer (Optional)	Number of bins allocated for each chromosome	If you are using a genome with a large number (>5,000) of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to lower RAM consumption. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). Default: 18
genome_sa_index_n_bases	Integer (Optional)	Length of SA pre-indexing string	Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megabase genome this is equal to 9, and for a 100 kilobase genome it is equal to 7. Default: 14
limit_genome_generate_ram	Long (Optional)	Limit maximum available RAM (bytes) for genome generation	Maximum available RAM (bytes) for genome generation. Default: 31000000000
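
The two scaling rules quoted for genome_sa_index_n_bases and genome_chr_bin_n_bits can be sketched in Python as follows. The function names are hypothetical, and the rounding is an assumption: the table does not say how to turn the result into an integer, so rounding to nearest is used for the SA index length (it reproduces the quoted values of 9 and 7), and truncation is used for the bin bits.

```python
import math


def sa_index_n_bases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1); rounding to nearest is an assumption."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def chr_bin_n_bits(genome_length, n_references, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))).

    Truncating to an integer is an assumption, not stated in the source.
    """
    per_reference = max(genome_length / n_references, read_length)
    return min(18, int(math.log2(per_reference)))


print(sa_index_n_bases(1_000_000))  # 1 megabase genome -> 9
print(sa_index_n_bases(100_000))    # 100 kilobase genome -> 7
# Hypothetical assembly: 3 Gb spread over 100,000 scaffolds, 100-base reads.
print(chr_bin_n_bits(3_000_000_000, 100_000, 100))  # -> 14
```

For a large genome with few references both functions return the defaults (14 and 18), which matches the idea that these inputs only need to be set for small or highly fragmented genomes.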

Steps

ID	Runs	Label	Doc
star_generate_indices	../tools/star-genomegenerate.cwl (CommandLineTool)		The tool returns a directory with the indices generated by STAR. If the genome_dir input is not provided, the default output directory name star_indices is used. The chr_name_length output should not be moved outside the indices folder.

Outputs

ID	Type	Label	Doc
stderr_log	File	STAR stderr log	STAR generated stderr log
stdout_log	File	STAR stdout log	STAR generated stdout log
indices_folder	Directory	STAR indices	STAR generated indices folder
chrom_length_file	File [Textual format]	Chromosome length file	Chromosome length file

Permalink: https://w3id.org/cwl/view/git/799575ce58746813f066a665adeacdda252d8cab/workflows/star-index.cwl