QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data

Fetched 2023-01-03 19:52:52 GMT

Verified with cwltool version 3.1.20221201130942

### Devel version of QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data

This workflow is Open Source and may be reused according to the terms of the Apache License 2.0.

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
threads	Integer (Optional)	Number of threads	Number of threads for those steps that support multi-threading
use_umi	Boolean (Optional)	Use UMIs	Use UMIs (for FWD-UMI libraries)
fastq_file	File [FASTQ]	FASTQ input file	Reads data in a FASTQ format
clip_3p_end	Integer (Optional)	Clip from 3p end	Number of bases to clip from the 3p end
clip_5p_end	Integer (Optional)	Clip from 5p end	Number of bases to clip from the 5p end
minimum_rpkm	Float (Optional)	Minimum RPKM for Gene Body Average Tag Density Plot	Minimum RPKM for Gene Body Average Tag Density Plot
annotation_file	File [GTF]	Annotation file	GTF or TAB-separated annotation file
chrom_length_file	File [Textual format]	Chromosome length file	Chromosome length file
strand_specificity		Strand specificity. 'Yes' for FWD or FWD-UMI analyses, 'Reverse' for REV, 'No' to disable	Whether the data is from a strand-specific assay. For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
annotation_gtf_file	File [GTF]	GTF annotation file	GTF annotation file
star_indices_folder	Directory	STAR indices folder	Path to STAR generated indices
bowtie_indices_folder	Directory	Bowtie Ribosomal Indices	Path to Bowtie-generated ribosomal indices
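
The counting rules described for `strand_specificity` can be written as a small predicate. This is a minimal Python sketch, assuming strands are given as `+`/`-` strings and `mate` is `None` for single-end reads and 1 or 2 for paired-end mates; the function name `counts_toward_feature` is hypothetical and is not part of HTSeq or of this workflow.

```python
from typing import Optional


def counts_toward_feature(stranded: str, read_strand: str, feature_strand: str,
                          mate: Optional[int] = None) -> bool:
    """Return True if a read may be counted for an overlapping feature under the
    strand_specificity rules: 'no', 'yes' or 'reverse' (case-insensitive)."""
    stranded = stranded.lower()
    if stranded == "no":
        # Unstranded: the overlap counts regardless of strand.
        return True
    # 'yes': single-end reads and first mates must be on the feature strand,
    # second mates of a pair must be on the opposite strand.
    expect_same = mate != 2
    if stranded == "reverse":
        # 'reverse' flips every rule used for 'yes'.
        expect_same = not expect_same
    elif stranded != "yes":
        raise ValueError(f"unknown strand_specificity: {stranded!r}")
    return (read_strand == feature_strand) == expect_same


# QuantSeq FWD/FWD-UMI -> 'Yes', REV -> 'Reverse'
print(counts_toward_feature("Yes", "+", "+"))      # True
print(counts_toward_feature("Reverse", "+", "+"))  # False
```

For single-read QuantSeq data only the `mate=None` path applies, so FWD libraries keep same-strand reads and REV libraries keep opposite-strand reads.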

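`minimum_rpkm` and the GEEP step both rely on RPKM (reads per kilobase of feature per million mapped reads): for a feature with C assigned reads, length L bp and N total mapped reads, RPKM = C * 10^9 / (N * L). The Python sketch below computes it and selects isoforms the way the gene body plot outputs describe them (isoforms longer than 1000 bp). Treating `minimum_rpkm` as an inclusive lower bound, and the helper names `rpkm` and `select_isoforms`, are assumptions for illustration, not the actual logic of `plugin-plot-rna.cwl`.

```python
from typing import Dict, List, Tuple


def rpkm(reads: int, total_mapped_reads: int, length_bp: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if total_mapped_reads <= 0 or length_bp <= 0:
        raise ValueError("total_mapped_reads and length_bp must be positive")
    return reads * 1e9 / (total_mapped_reads * length_bp)


def select_isoforms(isoforms: Dict[str, Tuple[int, int]], total_mapped_reads: int,
                    minimum_rpkm: float, min_length_bp: int = 1000) -> List[str]:
    """Keep isoforms longer than min_length_bp whose RPKM is at least minimum_rpkm.

    `isoforms` maps a (hypothetical) isoform ID to (assigned reads, length in bp).
    """
    return [
        name
        for name, (reads, length) in isoforms.items()
        if length > min_length_bp
        and rpkm(reads, total_mapped_reads, length) >= minimum_rpkm
    ]


isoforms = {"tx1": (500, 2000), "tx2": (10, 2000), "tx3": (500, 800)}
print(select_isoforms(isoforms, total_mapped_reads=10_000_000, minimum_rpkm=1.0))  # ['tx1']
```
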
Steps

ID	Runs	Label	Doc
get_stat	../tools/collect-statistics-rna-quantseq.cwl (CommandLineTool)		Tool processes and combines log files generated by Trim Galore, Bowtie, Samtools and MACS2. The `get_output_prefix` function returns an output file prefix equal to `output_prefix` + `_collected_statistics_report` (if this input is provided), or one generated from the Bowtie log basename with the `_collected_statistics_report` suffix.
star_aligner	../tools/star-alignreads.cwl (CommandLineTool)		Tool runs STAR alignReads. The `default_output_name_prefix` function returns the output file prefix if `outFileNamePrefix` is not set. By default, the prefix is equal to the basename of `readFilesIn`.
extract_fastq	../tools/extract-fastq.cwl (CommandLineTool)		Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the same order as they appear in the input. Bash script logic: (1) disable case-sensitive glob matching; (2) check whether the root name of the input file already includes a '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'; (3) check the file type and decompress if needed; (4) return 1 if the file type is not recognized. The script also works if the input file has no extension at all.
get_gene_body	../tools/plugin-plot-rna.cwl (CommandLineTool)	Gene body average tag density plot and RPKM distribution histogram	Runs an R script to produce a gene body average tag density plot and an RPKM distribution histogram. Does not fail even if no plots could be produced.
trim_adapters	trim-quantseq-mrnaseq-se-strand-specific.cwl#trim_adapters/a431e959-2427-41a1-81f8-39387f0b51a2 (CommandLineTool)
bowtie_aligner	../tools/bowtie-alignreads.cwl (CommandLineTool)		Tool maps input raw reads files to the reference genome using Bowtie. The `default_output_filename` function returns the default name for the SAM output and log files. When neither the `sam` nor the `output_filename` input is set, the default filename has a `.sam` extension, but the format may not conform to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or on the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if an array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, that value is returned. For single-end input data, either the `upstream_filelist` or the `downstream_filelist` input can be used. The log filename (`log_file` output) is generated by the `default_output_filename` function with ex='.bw'. `indices_folder` defines the folder containing the Bowtie indices. The Bowtie index prefix is returned from the input's `valueFrom` field, based on the first file found with a `rev.1.ebwt` or `rev.1.ebwtl` extension.
umi_tools_dedup	../tools/umi-tools-dedup.cwl (CommandLineTool)		Deduplicate BAM files based on the first mapping coordinate and the UMI attached to the read. Only the -I, --paired and -S parameters are implemented.
get_bam_statistics	../tools/samtools-stats.cwl (CommandLineTool)		Generates statistics for the input BAM file.
fastx_quality_stats	../tools/fastx-quality-stats.cwl (CommandLineTool)		Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return a default output filename generated as the `input_file` basename + `.fastxstat` extension.
move_umi_to_read_name	../tools/custom-bash.cwl (CommandLineTool)		Tool to run a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and exports the results to a file with the same name as the input's basename.
bam_to_bigwig_upstream	../tools/bam-bedgraph-bigwig.cwl (Workflow)		Workflow converts an input BAM file into bigWig and bedGraph files. The input BAM file should be sorted by coordinates (required by the `bam_to_bedgraph` step). If the `split` input is not provided, `true` is used by default. The default logic is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step to avoid a possible bug in cwltool with setting default values for workflow inputs. `scale` has higher priority than `mapped_reads_number`; the latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All of this logic is implemented inside `bedtools-genomecov.cwl`. `bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can influence the generated bigWig filename when `bigwig_filename` is not provided. None of the workflow inputs and outputs have a `format` field, to avoid format incompatibility errors when the workflow is used as a subworkflow.
bam_to_bigwig_downstream	../tools/bam-bedgraph-bigwig.cwl (Workflow)		Workflow converts an input BAM file into bigWig and bedGraph files. The input BAM file should be sorted by coordinates (required by the `bam_to_bedgraph` step). If the `split` input is not provided, `true` is used by default. The default logic is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step to avoid a possible bug in cwltool with setting default values for workflow inputs. `scale` has higher priority than `mapped_reads_number`; the latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All of this logic is implemented inside `bedtools-genomecov.cwl`. `bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can influence the generated bigWig filename when `bigwig_filename` is not provided. None of the workflow inputs and outputs have a `format` field, to avoid format incompatibility errors when the workflow is used as a subworkflow.
feature_expression_merge	../tools/feature-merge.cwl (CommandLineTool)	Feature merge - merges feature files based on the specified columns	Tool merges input feature files based on the columns provided in the --mergeby input. All input feature CSV/TSV files should have a header (case-sensitive). The format of each input file is identified from its extension (.csv - CSV, .tsv - TSV); otherwise CSV is used by default. The row order of the output file corresponds to the row order of the first CSV/TSV feature file. Output is always saved in TSV format. The output file includes only the rows shared by all input files (intersection on the --mergeby columns), and only the columns set in the --mergeby and --report parameters. The column set in the --report parameter is renamed based on the --aliases or the basenames of the --features files.
group_transcript_expression	trim-quantseq-mrnaseq-se-strand-specific.cwl#group_transcript_expression/b625ab9a-4bc5-4aa1-96d9-d8830551dffa (CommandLineTool)
samtools_sort_index_after_dedup	../tools/samtools-sort-index.cwl (CommandLineTool)		Tool to sort and index input BAM/SAM/CRAM. If the `trigger` input is set to `true` or is not set at all (`true` is used by default), run `samtools sort` and `samtools index` and return the sorted BAM and BAI/CSI index files. If `trigger` is set to `false`, return the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files. The trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. In both of them, if the first argument $0 (the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If $0 is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided) are returned. The `trigger` input is Boolean, but it is passed as a String because of the `valueFrom` field. The `valueFrom` field is used because, if `trigger` is false, cwl-runner does not append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic. If `sort_output_filename` is used, the output file extension should be `.bam`, because `samtools sort` determines the output file format from the file extension. If a `.sam` output filename is set, the output cannot be usefully indexed by `samtools index`. The `default_bam` function generates the output filename for `samtools sort` when the `sort_output_filename` input is not set, or when `trigger` is false and the `sort_input` and `secondaryFiles` (if provided) files need to be returned. By default, the output filename is generated from the `sort_input` basename with a `.bam` extension. The `ext` function returns the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic: `csi` && `bai` => BAI; !`csi` && !`bai` => BAI; `csi` && !`bai` => CSI.
geep_count_transcript_expression	../tools/geep.cwl (CommandLineTool)	geep	Tool calculates RPKM values grouped by isoforms or genes. `default_output_prefix` function returns default prefix based on `bam_file` basename, if `output_prefix` is not provided.
group_geep_transcript_expression	trim-quantseq-mrnaseq-se-strand-specific.cwl#group_geep_transcript_expression/0bcba6ff-25bd-4e95-9948-4f1c129dd853 (CommandLineTool)
samtools_sort_index_before_dedup	../tools/samtools-sort-index.cwl (CommandLineTool)		Tool to sort and index input BAM/SAM/CRAM. If the `trigger` input is set to `true` or is not set at all (`true` is used by default), run `samtools sort` and `samtools index` and return the sorted BAM and BAI/CSI index files. If `trigger` is set to `false`, return the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files. The trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. In both of them, if the first argument $0 (the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If $0 is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided) are returned. The `trigger` input is Boolean, but it is passed as a String because of the `valueFrom` field. The `valueFrom` field is used because, if `trigger` is false, cwl-runner does not append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic. If `sort_output_filename` is used, the output file extension should be `.bam`, because `samtools sort` determines the output file format from the file extension. If a `.sam` output filename is set, the output cannot be usefully indexed by `samtools index`. The `default_bam` function generates the output filename for `samtools sort` when the `sort_output_filename` input is not set, or when `trigger` is false and the `sort_input` and `secondaryFiles` (if provided) files need to be returned. By default, the output filename is generated from the `sort_input` basename with a `.bam` extension. The `ext` function returns the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic: `csi` && `bai` => BAI; !`csi` && !`bai` => BAI; `csi` && !`bai` => CSI.
htseq_count_transcript_expression	../tools/htseq-count.cwl (CommandLineTool)	HTSeq: Analysing high-throughput sequencing data	For convenience of use in workflows that sort and index BAM files by coordinate, this tool expects a coordinate-sorted and indexed BAM file as input. For single-read data this has no effect; for paired-end data, more memory will be used to keep reads while looking for their proper pairs (see the --max-reads-in-buffer parameter). Current limitations: only one `--additional-attr` is supported; the `--nprocesses` parameter is skipped, as it is not helpful when only one input BAM file is used.
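
The trigger, `default_bam` and `ext` logic of the two `samtools-sort-index.cwl` steps can be sketched as a function that only builds the command lines. This is a minimal Python sketch, not the tool's bash scripts: the helper names are hypothetical, `default_bam` is assumed to take the root of the input basename plus `.bam`, the undocumented `!csi && bai` case is assumed to give BAI, and the pass-through branch ignores `secondaryFiles` for brevity.

```python
import os
from typing import List, Optional, Tuple


def index_ext(csi: bool, bai: bool) -> str:
    """Index extension per the documented `ext` logic; the undocumented
    case (!csi && bai) is assumed to give BAI as well."""
    return ".csi" if csi and not bai else ".bai"


def default_bam(sort_input: str) -> str:
    # Assumption: root of the input basename plus a '.bam' extension.
    root, _ = os.path.splitext(os.path.basename(sort_input))
    return root + ".bam"


def plan_sort_index(sort_input: str, trigger: bool = True, csi: bool = False,
                    bai: bool = False, sort_output_filename: Optional[str] = None
                    ) -> Tuple[List[List[str]], str, Optional[str]]:
    """Return (commands, bam_path, index_path) without running anything."""
    if not trigger:
        # trigger == false: nothing is run and the input is passed through.
        return [], sort_input, None
    bam = sort_output_filename or default_bam(sort_input)
    if not bam.endswith(".bam"):
        # samtools sort picks the output format from the file extension.
        raise ValueError("sort output filename should end with .bam")
    ext = index_ext(csi, bai)
    sort_cmd = ["samtools", "sort", "-o", bam, sort_input]
    # samtools index: -b writes a BAI index, -c writes a CSI index.
    index_cmd = ["samtools", "index", "-c" if ext == ".csi" else "-b", bam]
    return [sort_cmd, index_cmd], bam, bam + ext


cmds, bam, idx = plan_sort_index("/data/sample.sam")
print(bam, idx)  # sample.bam sample.bam.bai
```

Keeping the plan separate from execution mirrors the reason the tool uses a string-valued `trigger`: the decision to skip sorting happens before any `samtools` call.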

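The `bowtie_aligner` step derives the Bowtie index prefix from the first file in `indices_folder` ending in `rev.1.ebwt` or `rev.1.ebwtl`. A minimal Python sketch of that lookup follows; in the tool it lives in a JavaScript `valueFrom` expression, so the function name `bowtie_index_prefix` and passing the folder listing as a plain list of filenames are assumptions for illustration.

```python
import os
from typing import Iterable, Optional

# Bowtie 1 reverse-index suffixes (.ebwtl is used for large indices).
REV_SUFFIXES = (".rev.1.ebwt", ".rev.1.ebwtl")


def bowtie_index_prefix(folder: str, filenames: Iterable[str]) -> Optional[str]:
    """Return folder/<prefix> for the first file with a rev.1.ebwt(l) suffix."""
    for name in filenames:
        for suffix in REV_SUFFIXES:
            if name.endswith(suffix):
                return os.path.join(folder, name[: -len(suffix)])
    return None


print(bowtie_index_prefix("/idx", ["rRNA.1.ebwt", "rRNA.rev.1.ebwt"]))  # /idx/rRNA
```

Because the result depends on which matching file is found first, a folder should hold a single Bowtie index; with a real directory, `sorted(os.listdir(folder))` makes the choice deterministic.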
Outputs

ID	Type	Label	Doc
bowtie_log	File [Textual format]	Bowtie alignment log	Bowtie alignment log file
bambai_pair	File [BAM]	Coordinate sorted BAM alignment file (+index BAI)	Coordinate sorted BAM file and BAI index file
star_sj_log	File (Optional) [Textual format]	STAR sj log	STAR SJ.out.tab
get_stat_log	File (Optional) [YAML]	YAML formatted combined log	YAML formatted combined log
star_out_log	File (Optional) [Textual format]	STAR log out	STAR Log.out
star_final_log	File [Textual format]	STAR final log	STAR Log.final.out
bigwig_upstream	File [bigWig]	Upstream bigWig file	bigWig file from the 5' - 3' strand
star_stdout_log	File (Optional) [Textual format]	STAR stdout log	STAR Log.std.out
fastx_statistics	File [Textual format]	FASTQ statistics	FASTQ file quality statistics generated by fastx_quality_stats
gene_body_report	File (Optional) [TSV]	Gene body average tag density plot for all isoforms longer than 1000 bp	Gene body average tag density plot for all isoforms longer than 1000 bp in TSV format
bigwig_downstream	File [bigWig]	Downstream bigWig file	bigWig file from the 3' - 5' strand
get_stat_markdown	File (Optional) [TIDE TXT]	Markdown formatted combined log	Markdown formatted combined log
star_progress_log	File (Optional) [Textual format]	STAR progress log	STAR Log.progress.out
gene_body_plot_pdf	File (Optional) [PDF]	Gene body average tag density plot for all isoforms longer than 1000 bp	Gene body average tag density plot for all isoforms longer than 1000 bp in PDF format
get_formatted_stats	File (Optional) [Textual format]	Bowtie, STAR and GEEP mapping stats	Processed and combined Bowtie & STAR aligner and GEEP logs
gene_expression_file	File [TSV]	Gene expression	Gene expression
bam_statistics_report	File [Textual format]	BAM statistics report	BAM statistics report (after deduplication step)
umi_tools_dedup_stats	File[] (Optional)	umi_tools dedup statistics	umi_tools dedup statistics
htseq_count_stderr_log	File [Textual format]	HTSeq: stderr log	HTSeq: stderr log
htseq_count_stdout_log	File [Textual format]	HTSeq: stdout log	HTSeq: stdout log
trim_adapters_stderr_log	File	cutadapt: stderr log	cutadapt: stderr log
trim_adapters_stdout_log	File	cutadapt: stdout log	cutadapt: stdout log
geep_gene_expression_file	File [TSV]	GEEP: expression grouped by gene name	GEEP: expression grouped by gene name
rpkm_distribution_plot_pdf	File (Optional) [PDF]	RPKM distribution plot for isoforms	RPKM distribution plot for isoforms in PDF format
umi_tools_dedup_stderr_log	File	umi_tools dedup: stderr log	umi_tools dedup: stderr log
umi_tools_dedup_stdout_log	File	umi_tools dedup: stdout log	umi_tools dedup: stdout log
combined_gene_expression_file	File [TSV]	HTSeq vs GEEP gene expression comparison	Gene expression files merged by GeneId, Chrom, TxStart, TxEnd and Strand, with the TotalReads columns reported and renamed.
feature_expression_merge_stderr_log	File [Textual format]	HTSeq vs GEEP gene expression comparison stderr log	HTSeq vs GEEP gene expression comparison stderr log
feature_expression_merge_stdout_log	File [Textual format]	HTSeq vs GEEP gene expression comparison stdout log	HTSeq vs GEEP gene expression comparison stdout log
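
`combined_gene_expression_file` is produced by the `feature_expression_merge` step, whose documented rules are: format chosen by extension with CSV as the default, row order taken from the first file, only rows present in every file, and output columns limited to the merge-by columns plus one renamed report column per file. A minimal Python sketch of those rules using the standard `csv` module follows; it is not the code of `feature-merge.cwl`, the function names are hypothetical, and using the basename without its extension as the default alias is an assumption.

```python
import csv
import io
import os
from typing import Dict, List, Optional, Sequence, Tuple


def delimiter_for(path: str) -> str:
    # .tsv -> TSV; .csv or any other extension -> CSV (the default).
    return "\t" if os.path.splitext(path)[1].lower() == ".tsv" else ","


def merge_features(files: Sequence[Tuple[str, str]], mergeby: Sequence[str],
                   report: str, aliases: Optional[Sequence[str]] = None) -> str:
    """Inner-join feature tables on `mergeby` and return TSV text.

    `files` holds (filename, file contents) pairs; header names are case-sensitive.
    """
    if aliases is None:
        # Assumption: the basename without its extension is used as the alias.
        aliases = [os.path.splitext(os.path.basename(path))[0] for path, _ in files]
    tables: List[Dict[tuple, str]] = []
    for path, text in files:
        table: Dict[tuple, str] = {}
        for row in csv.DictReader(io.StringIO(text), delimiter=delimiter_for(path)):
            # dicts keep insertion order, so tables[0] follows the first file's rows
            table.setdefault(tuple(row[c] for c in mergeby), row[report])
        tables.append(table)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(mergeby) + list(aliases))
    for key in tables[0]:
        if all(key in t for t in tables[1:]):  # keep only rows shared by all files
            writer.writerow(list(key) + [t[key] for t in tables])
    return out.getvalue()


htseq = "GeneId\tChrom\tTotalReads\nA\tchr1\t10\nB\tchr2\t5\nC\tchr3\t7\n"
geep = "GeneId,Chrom,TotalReads\nC,chr3,8\nA,chr1,12\n"
merged = merge_features([("htseq.tsv", htseq), ("geep.csv", geep)],
                        mergeby=["GeneId", "Chrom"], report="TotalReads")
print(merged)
```

Gene B is dropped because it appears only in the first table, and A precedes C because that is their order in the first file.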

Permalink:

https://w3id.org/cwl/view/git/581156366f91861bd4dbb5bcb59f67d468b32af3/workflows/trim-quantseq-mrnaseq-se-strand-specific.cwl