CWL Workflow: Generate genome indices for STAR & bowtie

Fetched 2023-02-15 20:43:04 GMT

Verified with cwltool version 3.1.20230201224320

Creates indices for:
* [STAR](https://github.com/alexdobin/STAR) v2.5.3a (03/17/2017) PMID: [23104886](https://www.ncbi.nlm.nih.gov/pubmed/23104886)
* [bowtie](http://bowtie-bio.sourceforge.net/tutorial.shtml) v1.2.0 (12/30/2016)

It performs the following steps:
1. `STAR --runMode genomeGenerate` generates indices from the [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) and [GTF](http://mblab.wustl.edu/GTF2.html) input files and returns the results as an array of files.
2. The indices are output as a [Directory](http://www.commonwl.org/v1.0/CommandLineTool.html#Directory) data type.
3. The *chrNameLength.txt* file is separated from the Directory output.
4. `bowtie-build` generates indices from the genome [FASTA](http://zhanglab.ccmb.med.umich.edu/FASTA/) file and returns the results as a group of main and secondary files.
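The steps above boil down to a few command lines. The following is a minimal, hypothetical sketch: the helper names (`star_genome_generate_cmd`, `bowtie_build_cmd`, `chr_name_length`) are illustrative, and the exact options set by `../tools/star-genomegenerate.cwl` and `../tools/bowtie-build.cwl` are not shown on this page, so only standard STAR 2.5 and bowtie 1 options are assumed. The GTF argument is optional because the mitochondrial STAR step may not use an annotation file.

```python
from pathlib import Path
from typing import List, Optional


def star_genome_generate_cmd(genome_dir: str, fasta: str, gtf: Optional[str] = None,
                             threads: int = 1,
                             sa_sparse_d: Optional[int] = None,
                             chr_bin_n_bits: Optional[int] = None,
                             sa_index_n_bases: Optional[int] = None,
                             limit_ram: Optional[int] = None) -> List[str]:
    """Step 1 (hypothetical helper): build a `STAR --runMode genomeGenerate` argument list."""
    cmd = ["STAR", "--runMode", "genomeGenerate",
           "--genomeDir", genome_dir,
           "--genomeFastaFiles", fasta,
           "--runThreadN", str(threads)]
    if gtf is not None:
        cmd += ["--sjdbGTFfile", gtf]
    # Tuning parameters are emitted only when set, so STAR otherwise keeps its own defaults
    optional = {
        "--genomeSAsparseD": sa_sparse_d,
        "--genomeChrBinNbits": chr_bin_n_bits,
        "--genomeSAindexNbases": sa_index_n_bases,
        "--limitGenomeGenerateRAM": limit_ram,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def bowtie_build_cmd(fasta: str, ebwt_base: str) -> List[str]:
    """Step 4 (hypothetical helper): `bowtie-build <reference_in> <ebwt_base>`.

    The reference is passed as a FASTA file; the -c mode (sequences on the
    command line) is not used, matching the tool's unsupported-parameter note.
    """
    return ["bowtie-build", fasta, ebwt_base]


def chr_name_length(star_index_dir: str) -> Path:
    """Step 3 (hypothetical helper): locate chrNameLength.txt inside a STAR index directory."""
    path = Path(star_index_dir) / "chrNameLength.txt"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

Each list can be handed to `subprocess.run(cmd, check=True)`; building argument lists instead of shell strings avoids quoting problems with paths that contain spaces.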

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
fasta	File [FASTA]	Genome FASTA file	Reference genome FASTA file
genome	String	Genome	Output files base string
threads	Integer (Optional)	Number of threads to run tools	Number of threads for those steps that support multithreading
genome_label	String (Optional)	Genome label	Genome label shown by the web UI
annotation_tab	File [TSV]	Annotation file	Tab-separated annotation file
genome_details	String (Optional)	Genome details	Genome details
fasta_ribosomal	File (Optional) [FASTA]	Ribosomal DNA sequence FASTA file	Ribosomal DNA sequence FASTA file
genome_description	String (Optional)	Genome description	Genome description shown by the web UI
genome_sa_sparse_d	Integer (Optional)	Genome SA sparse (use 2 to decrease RAM usage)	Default: 1. int>0: suffix array sparsity, i.e. distance between indices. Use larger numbers to decrease the needed RAM at the cost of slower mapping.
fasta_mitochondrial	File (Optional) [FASTA]	Mitochondrial chromosome sequence FASTA file	Mitochondrial chromosome sequence FASTA file
input_annotation_gtf	File [GTF]	GTF input file	Annotation input file
effective_genome_size	String	Effective genome size	MACS2 effective genome size: hs, mm, ce, dm or number, for example 2.7e9
genome_chr_bin_n_bits	Integer (Optional)	Genome Chr Bin NBits	If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. The recommended scaling is --genomeChrBinNbits = min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]). For example, for a 3-gigabase genome with 100,000 chromosomes/scaffolds, this is equal to 15. Default: 18. int: log2(chrBin), where chrBin is the size of the bins for genome storage; each chromosome will occupy an integer number of bins.
genome_sa_index_n_bases	Integer (Optional)	Length of the SA pre-indexing string	Length (bases) of the SA pre-indexing string, typically between 10 and 15. Longer strings use much more memory but allow faster searches. For small genomes, --genomeSAindexNbases must be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1-megabase genome this is equal to 9, and for a 100-kilobase genome it is equal to 7. Default: 14.
limit_genome_generate_ram	Long (Optional)	Genome Generate RAM (31G default)	Default: 31000000000. int>0: maximum available RAM (bytes) for genome generation
genome_sa_index_n_bases_mitochondrial	Integer (Optional)	Length of the SA pre-indexing string (mitochondrial)	Length (bases) of the SA pre-indexing string, typically between 10 and 15. Longer strings use much more memory but allow faster searches. For small genomes, --genomeSAindexNbases must be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1-megabase genome this is equal to 9, and for a 100-kilobase genome it is equal to 7. Default: 14.
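The two STAR scaling rules quoted in the table can be computed directly. This is a minimal sketch with illustrative function names; it assumes rounding to the nearest integer (the STAR documentation does not say how to round, but rounding reproduces its examples of 15, 9 and 7) and a hypothetical default read length of 100. The last check applies the rule to a 16,569 bp sequence, the length of the human mitochondrial chromosome, which is presumably the kind of input `genome_sa_index_n_bases_mitochondrial` is meant for.

```python
import math


def genome_chr_bin_n_bits(genome_length: int, n_references: int, read_length: int = 100) -> int:
    """min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength))).

    Rounding to the nearest integer is an assumption; read_length=100 is a hypothetical default.
    """
    value = math.log2(max(genome_length / n_references, read_length))
    return min(18, round(value))


def genome_sa_index_n_bases(genome_length: int) -> int:
    """min(14, log2(GenomeLength) / 2 - 1), rounded to the nearest integer (assumed)."""
    value = math.log2(genome_length) / 2 - 1
    return min(14, round(value))


if __name__ == "__main__":
    print(genome_chr_bin_n_bits(3_000_000_000, 100_000))  # 15
    print(genome_sa_index_n_bases(1_000_000))             # 9
    print(genome_sa_index_n_bases(100_000))               # 7
    print(genome_sa_index_n_bases(16_569))                # 6
```

For a full-size mammalian genome the SA index formula exceeds 14, so the default of 14 applies; only small references such as rDNA or chrM need the scaled-down value.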

Steps

ID	Runs	Doc
star_generate_indices	../tools/star-genomegenerate.cwl (CommandLineTool)	Runs STAR genomeGenerate. Returns a directory with the index
bowtie_generate_indices	../tools/bowtie-build.cwl (CommandLineTool)	Runs bowtie-build. Unsupported parameter: -c (reference sequences given on the command line as <seq_in>)
ribosomal_generate_indices	../tools/bowtie-build.cwl (CommandLineTool)	Runs bowtie-build. Unsupported parameter: -c (reference sequences given on the command line as <seq_in>)
mitochondrial_generate_indices	../tools/star-genomegenerate.cwl (CommandLineTool)	Runs STAR genomeGenerate. Returns a directory with the index

Outputs

ID	Type	Label	Doc
annotation	File [TSV]	Annotation file	Tab-separated annotation file
genome_size	String	Effective genome size	MACS2 effective genome size: hs, mm, ce, dm or number, for example 2.7e9
chrom_length	File [Textual format]	Chromosome length file	Chromosome length file
star_indices	Directory	STAR indices folder	Folder containing all STAR-generated indices
annotation_gtf	File [GTF]	GTF input file	Annotation input file
bowtie_indices	Directory	Bowtie indices folder	Folder containing all Bowtie-generated indices
ribosomal_indices	Directory	Ribosomal DNA indices folder	Folder containing Bowtie-generated ribosomal DNA indices
mitochondrial_indices	Directory	Mitochondrial chromosome index folder	Mitochondrial chromosome index folder
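One way to sanity-check the Directory outputs is to look for the files each indexer is known to write. This is a minimal sketch with hypothetical helper names: it checks only the core STAR and bowtie 1 index files (annotation-derived STAR files such as `sjdbList.out.tab` are skipped), and it assumes the Bowtie index base name equals the `genome` input string, which this page does not confirm.

```python
from pathlib import Path
from typing import List

# Core files STAR writes into --genomeDir (annotation-derived files are not checked)
STAR_CORE_FILES = ["Genome", "SA", "SAindex", "chrName.txt", "chrLength.txt",
                   "chrStart.txt", "chrNameLength.txt", "genomeParameters.txt"]

# bowtie 1 writes <base>.<suffix>.ebwt, or .ebwtl for large indices
BOWTIE_SUFFIXES = ["1", "2", "3", "4", "rev.1", "rev.2"]


def missing_star_files(index_dir: str) -> List[str]:
    """Return the core STAR index files absent from index_dir."""
    d = Path(index_dir)
    return [name for name in STAR_CORE_FILES if not (d / name).is_file()]


def missing_bowtie_files(index_dir: str, base: str) -> List[str]:
    """Return the bowtie 1 index files absent from index_dir (small or large variant accepted)."""
    d = Path(index_dir)
    missing = []
    for suffix in BOWTIE_SUFFIXES:
        small = d / f"{base}.{suffix}.ebwt"
        large = d / f"{base}.{suffix}.ebwtl"
        if not (small.is_file() or large.is_file()):
            missing.append(small.name)
    return missing
```

An empty list from both functions means the core files are present; it does not prove the indices are valid, which only a test alignment can show.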

Permalink: https://w3id.org/cwl/view/git/c602e3cdd72ff904dd54d46ba2b5146eb1c57022/workflows/genome-indices.cwl