CWL Workflow: Generate genome index STAR RNA

Fetched 2023-08-06 20:11:50 GMT

Verified with cwltool version 3.1.20230201224320

This workflow builds genome indices for [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017), PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886). It performs the following steps:

1. Runs `STAR --runMode genomeGenerate` to generate indices from [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files, and returns the results as an array of files.
2. Transforms the array of files into the [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type.
3. Separates the *chrNameLength.txt* file as an output.

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
fasta	File [FASTA]	FASTA input file	Reference genome input FASTA file
threads	Integer (Optional)	Number of threads to run tools	Number of threads for those steps that support multithreading
genome_label	String (Optional)	Genome label	Genome label is used by web-ui to show label
annotation_gtf	File [GTF]	GTF input file	Annotation input file
annotation_tab	File [TSV]	Annotation file	Tab-separated annotation file
genome_details	String (Optional)	Genome details	Genome details
genome_description	String (Optional)	Genome description	Genome description is used by web-ui to show description
genome_sa_sparse_d	Integer (Optional)	Use 2 to decrease needed RAM for STAR	int>0: suffix array sparsity, i.e. the distance between indices. Use bigger numbers to decrease needed RAM at the cost of reduced mapping speed
genome_chr_bin_n_bits	Integer (Optional)	Genome Chr Bin NBits	int: =log2(chrBin), where chrBin is the size of the bins for genome storage; each chromosome occupies an integer number of bins. Default: 18. For a genome with a large number (>5,000) of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. The recommended scaling is --genomeChrBinNbits = min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]). For example, for a 3 gigabase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
genome_sa_index_n_bases	Integer (Optional)	Length of the SA pre-indexing string	int: length (bases) of the SA pre-indexing string, typically between 10 and 15. Default: 14. Longer strings use much more memory but allow faster searches. For small genomes, --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). For example, this is equal to 9 for a 1 megabase genome and 7 for a 100 kilobase genome.
limit_genome_generate_ram	Long (Optional)		int>0: maximum available RAM (bytes) for genome generation. Default: 31000000000
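
The two scaling rules quoted in the Doc column for `genome_chr_bin_n_bits` and `genome_sa_index_n_bases` can be evaluated with a short Python sketch. The function names below are hypothetical helpers, not part of the workflow or of STAR. Rounding to the nearest integer is an assumption, chosen because it reproduces the worked examples in the table (15, 9 and 7).

```python
import math


def genome_chr_bin_n_bits(genome_length, n_references, read_length):
    """min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]).

    Rounding to the nearest integer is an assumption made to match
    the worked example in the input documentation.
    """
    per_reference = max(genome_length / n_references, read_length)
    return min(18, round(math.log2(per_reference)))


def genome_sa_index_n_bases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer (assumption)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


# Worked examples taken from the input documentation
# (a read length of 100 is an illustrative assumption).
print(genome_chr_bin_n_bits(3_000_000_000, 100_000, 100))
print(genome_sa_index_n_bases(1_000_000))
print(genome_sa_index_n_bases(100_000))
```

The resulting integers are candidate values for the optional `genome_chr_bin_n_bits` and `genome_sa_index_n_bases` inputs; for large genomes with few references the caps of 18 and 14 apply.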

Steps

ID	Runs	Label	Doc
star_generate_indices	../tools/star-genomegenerate.cwl (CommandLineTool)		Runs STAR in genomeGenerate mode. Returns a directory with the index

Outputs

ID	Type	Label	Doc
annotation	File [TSV]	Annotation file	Tab-separated annotation file
chrom_length	File [Textual format]	Chromosome length file	Chromosome length file
star_indices	Directory	STAR indices folder	Folder which includes all STAR generated indices files
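
To produce these outputs, the workflow needs a job file that supplies the inputs from the Inputs table. The Python sketch below is one way to assemble such a file as JSON. The input IDs come from the table, while the helper names, the file paths (`genome.fa`, `genes.gtf`, `genes.tsv`) and the chosen parameter values are illustrative assumptions.

```python
import json


def file_input(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(fasta, annotation_gtf, annotation_tab, **optional):
    """Assemble a job dictionary keyed by the workflow's input IDs.

    Optional inputs (threads, genome_label, ...) are included only
    when a value is given, so the workflow defaults apply otherwise.
    """
    job = {
        "fasta": file_input(fasta),
        "annotation_gtf": file_input(annotation_gtf),
        "annotation_tab": file_input(annotation_tab),
    }
    job.update({key: value for key, value in optional.items() if value is not None})
    return job


# Hypothetical paths and values, for illustration only.
job = build_job(
    "genome.fa",
    "genes.gtf",
    "genes.tsv",
    threads=4,
    genome_sa_index_n_bases=9,
    genome_label=None,
)
print(json.dumps(job, indent=2))
```

Saved to a file such as `job.json` (a hypothetical name), this output could be passed to a CWL runner together with the workflow file, for example `cwltool star-index.cwl job.json`, assuming the workflow and its tool definitions have been checked out locally.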

Permalink: https://w3id.org/cwl/view/git/bfa3843bcf36125ff258d6314f64b41336f06e6b/workflows/star-index.cwl