Workflow: QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data

Fetched 2023-01-03 19:52:52 GMT

### Devel version of QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data

Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multi-threading

use_umi Boolean (Optional) Use UMIs

Use UMIs (for FWD-UMI libraries)

fastq_file File [FASTQ] FASTQ input file

Reads data in a FASTQ format

clip_3p_end Integer (Optional) Clip from 3p end

Number of bases to clip from the 3p end

clip_5p_end Integer (Optional) Clip from 5p end

Number of bases to clip from the 5p end

minimum_rpkm Float (Optional) Minimum RPKM for Gene Body Average Tag Density Plot

Minimum RPKM for Gene Body Average Tag Density Plot

annotation_file File [GTF] Annotation file

GTF or TAB-separated annotation file

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

strand_specificity Strand specificity. 'Yes' for FWD or FWD-UMI analyses, 'Reverse' for REV, 'No' to disable

Whether the data is from a strand-specific assay. For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
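
The single-end rule above can be sketched as a small predicate. This is an illustration only: the function name is hypothetical, the read is assumed to already overlap the feature, and the paired-end case is left out.

```python
def read_counts_for_feature(stranded, read_strand, feature_strand):
    """Decide whether an overlapping single-end read may be assigned to a feature.

    `stranded` is one of 'yes', 'no' or 'reverse' (case-insensitive here, which
    is an assumption made for convenience); strands are '+' or '-'.
    """
    mode = stranded.lower()
    if mode == "no":
        # Strand is ignored entirely.
        return True
    if mode == "yes":
        # Read must map to the same strand as the feature.
        return read_strand == feature_strand
    if mode == "reverse":
        # Rule is inverted: read must map to the opposite strand.
        return read_strand != feature_strand
    raise ValueError(f"unknown strand specificity: {stranded!r}")
```

With this sketch, a '+' read over a '-' feature is counted for 'No' and 'Reverse' but not for 'Yes'.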

annotation_gtf_file File [GTF] GTF annotation file

GTF annotation file

star_indices_folder Directory STAR indices folder

Path to STAR generated indices

bowtie_indices_folder Directory BowTie Ribosomal Indices

Path to Bowtie generated indices

Steps

ID Runs Label Doc
get_stat
../tools/collect-statistics-rna-quantseq.cwl (CommandLineTool)

Tool processes and combines log files generated by Trimgalore, Bowtie, Samtools and MACS2.

`get_output_prefix` function returns the output file prefix: `output_prefix` + `_collected_statistics_report` if the `output_prefix` input is provided; otherwise the prefix is generated from the Bowtie log basename with `_collected_statistics_report` appended.
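
A minimal sketch of that fallback. The real function lives inside the CWL tool; stripping the log file's own extension before appending the suffix is an assumption that the description does not confirm.

```python
import os

SUFFIX = "_collected_statistics_report"


def get_output_prefix(bowtie_log, output_prefix=None):
    # An explicit prefix wins when it is provided.
    if output_prefix:
        return output_prefix + SUFFIX
    # Otherwise fall back to the Bowtie log basename.
    # Assumption: the log's last extension is dropped first.
    root = os.path.splitext(os.path.basename(bowtie_log))[0]
    return root + SUFFIX
```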

star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Tool runs STAR alignReads.

`default_output_name_prefix` function returns the output file prefix if `outFileNamePrefix` is not set. By default, the prefix is equal to the basename of `readFilesIn`.

extract_fastq
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they are concatenated in the order in which they appear in the input.

Bash script's logic:
  • disable case-sensitive glob check
  • check whether the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to "", otherwise use '.fastq'
  • check the file type, decompress if needed
  • return 1 if the file type is not recognized

This script also works if the input file doesn't have any extension at all.
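
The script's logic can be approximated in Python as follows. This is a sketch, not the tool's bash script: the function names are hypothetical, "includes" is interpreted as "ends with", and the recognized file types (gzip, bzip2, plain FASTQ) are assumptions, since the description does not list them.

```python
import bz2
import gzip
import os


def default_ext(path):
    # Root name = basename without its last extension, compared case-insensitively.
    root = os.path.splitext(os.path.basename(path))[0].lower()
    return "" if root.endswith((".fastq", ".fq")) else ".fastq"


def output_name(path):
    # Root name plus DEFAULT_EXT; also works for names with no extension at all.
    root = os.path.splitext(os.path.basename(path))[0]
    return root + default_ext(path)


def open_reads(path):
    """Open a FASTQ file for reading bytes, decompressing if needed."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":      # gzip signature
        return gzip.open(path, "rb")
    if magic == b"BZh":               # bzip2 signature
        return bz2.open(path, "rb")
    if magic[:1] == b"@":             # uncompressed FASTQ starts with '@'
        return open(path, "rb")
    # The bash script returns exit code 1 in this situation.
    raise ValueError(f"unrecognized file type: {path}")
```

For example, `reads.fastq.gz` and `reads.gz` both map to `reads.fastq`, and so does a bare `reads`.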

get_gene_body
../tools/plugin-plot-rna.cwl (CommandLineTool)
Gene body average tag density plot and RPKM distribution histogram

Runs an R script to produce the gene body average tag density plot and RPKM distribution histogram. Doesn't fail even when no plots can be produced.

trim_adapters
trim-quantseq-mrnaseq-se-strand-specific.cwl#trim_adapters/a431e959-2427-41a1-81f8-39387f0b51a2 (CommandLineTool)
bowtie_aligner
../tools/bowtie-alignreads.cwl (CommandLineTool)

Tool maps input raw reads files to reference genome using Bowtie.

`default_output_filename` function returns the default name for the SAM output and log files. When the `sam` and `output_filename` inputs are not set, the default filename will have the `.sam` extension, but the format may not conform to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or on the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if an array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, that value is returned from the function.

For single-end input data, either the `upstream_filelist` or the `downstream_filelist` input can be used.

Log filename (`log_file` output) is generated by `default_output_filename` function with ex='.bw'

`indices_folder` defines the folder that contains Bowtie indices. The Bowtie index prefix is derived in the input's `valueFrom` field from the first file found with the `rev.1.ebwt` or `rev.1.ebwtl` extension.
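
The index prefix lookup can be sketched like this. The function name is hypothetical, and scanning the listing in sorted order is an assumption: the description only says "first found".

```python
import os

# Suffixes of the reverse index files for small and large Bowtie indices.
INDEX_SUFFIXES = (".rev.1.ebwt", ".rev.1.ebwtl")


def bowtie_index_prefix(indices_folder, filenames):
    """Return the Bowtie index prefix for a folder, given its file listing."""
    for name in sorted(filenames):
        for suffix in INDEX_SUFFIXES:
            if name.endswith(suffix):
                # Strip the suffix to recover the prefix that Bowtie expects.
                return os.path.join(indices_folder, name[: -len(suffix)])
    return None  # no index files found
```

Given a folder containing `ribo.1.ebwt` and `ribo.rev.1.ebwt`, the sketch returns the folder path joined with `ribo`.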

umi_tools_dedup
../tools/umi-tools-dedup.cwl (CommandLineTool)

Deduplicates BAM files based on the first mapping coordinate and the UMI attached to the read. Only the -I, --paired and -S parameters are implemented.
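
The idea behind position-plus-UMI deduplication can be illustrated with a deliberately simplified sketch. It is not the umi_tools algorithm: umi_tools can also group similar UMIs to absorb sequencing errors and chooses a representative read, whereas this sketch keeps the first read per exact key. Reads are modelled as hypothetical `(name, chrom, pos, strand)` tuples, with the UMI taken from the read name after the last underscore.

```python
def dedup_by_position_and_umi(reads):
    """Keep the first read for each (chrom, pos, strand, UMI) combination."""
    seen = set()
    kept = []
    for name, chrom, pos, strand in reads:
        umi = name.rsplit("_", 1)[-1]  # UMI is the last '_'-separated field of the name
        key = (chrom, pos, strand, umi)
        if key in seen:
            continue  # same position and same UMI: treated as a PCR duplicate
        seen.add(key)
        kept.append((name, chrom, pos, strand))
    return kept
```

Two reads at the same position with different UMIs are both kept, since they are taken to come from distinct molecules.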

get_bam_statistics
../tools/samtools-stats.cwl (CommandLineTool)

Generates statistics for the input BAM file.

fastx_quality_stats
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output filename, generated as the `input_file` basename + `.fastxstat` extension.

move_umi_to_read_name
../tools/custom-bash.cwl (CommandLineTool)

Tool to run a custom script set as the `script` input with arguments from `param`. The default script runs a sed command over the input file and exports the results to a file with the same name as the input's basename.

bam_to_bigwig_upstream

Workflow converts input BAM file into bigWig and bedGraph files.

Input BAM file should be sorted by coordinates (required by `bam_to_bedgraph` step).

If the `split` input is not provided, true is used by default. The default logic is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step to avoid a possible bug in cwltool with setting default values for workflow inputs.

`scale` takes priority over `mapped_reads_number`. The latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All logic is implemented inside `bedtools-genomecov.cwl`.

`bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can influence the generated bigWig filename when `bigwig_filename` is not provided.

Workflow inputs and outputs have no `format` field, to avoid format incompatibility errors when the workflow is used as a subworkflow.
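
The priority of `scale` over `mapped_reads_number` can be sketched as below. The per-million formula is an assumption: the description says only that `mapped_reads_number` is used to calculate `-scale`, not how. The function name is hypothetical.

```python
def genomecov_scale(scale=None, mapped_reads_number=None):
    """Pick the value passed to `bedtools genomecov -scale`, or None to omit it."""
    if scale is not None:
        # An explicit scale always wins.
        return scale
    if mapped_reads_number:
        # Assumption: normalize coverage to reads per million mapped reads.
        return 1_000_000 / mapped_reads_number
    return None
```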

bam_to_bigwig_downstream

Workflow converts input BAM file into bigWig and bedGraph files.

Input BAM file should be sorted by coordinates (required by `bam_to_bedgraph` step).

If the `split` input is not provided, true is used by default. The default logic is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step to avoid a possible bug in cwltool with setting default values for workflow inputs.

`scale` takes priority over `mapped_reads_number`. The latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All logic is implemented inside `bedtools-genomecov.cwl`.

`bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can influence the generated bigWig filename when `bigwig_filename` is not provided.

Workflow inputs and outputs have no `format` field, to avoid format incompatibility errors when the workflow is used as a subworkflow.

feature_expression_merge
../tools/feature-merge.cwl (CommandLineTool)
Feature merge - merges feature files based on the specified columns

Tool merges input feature files based on the columns provided in the --mergeby input. All input feature CSV/TSV files should have a header (case-sensitive). The format of the input files is identified based on the file's extension:
  • *.csv - CSV
  • *.tsv - TSV
  • otherwise CSV is used by default

The row order of the output file corresponds to the row order of the first CSV/TSV feature file. Output is always saved in TSV format.

The output file includes only rows whose values in the --mergeby columns are present in all input files. The output file includes only the columns set in the --mergeby and --report parameters. The column set in the --report parameter is renamed based on the --aliases or the basenames of the --features files.
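
The merge behaviour described above amounts to an inner join on the --mergeby columns. A standard-library sketch follows; the function names and the `<report>_<alias>` column naming are hypothetical, since the description does not give the exact renaming scheme.

```python
import csv
import io


def delimiter_for(filename):
    # *.tsv is read as TSV; *.csv and anything else fall back to CSV.
    return "\t" if filename.endswith(".tsv") else ","


def merge_features(tables, aliases, mergeby, report):
    """Inner-join lists of row dicts on `mergeby`, keeping one `report` column per table."""
    # Index every table except the first by its merge key.
    lookups = [
        {tuple(row[col] for col in mergeby): row for row in rows}
        for rows in tables[1:]
    ]
    merged = []
    for row in tables[0]:  # the first table dictates the output row order
        key = tuple(row[col] for col in mergeby)
        if not all(key in lookup for lookup in lookups):
            continue  # key missing from some input: the row is dropped
        out = dict(zip(mergeby, key))
        out[f"{report}_{aliases[0]}"] = row[report]  # hypothetical naming scheme
        for alias, lookup in zip(aliases[1:], lookups):
            out[f"{report}_{alias}"] = lookup[key][report]
        merged.append(out)
    return merged


def to_tsv(rows):
    # Output is always TSV, whatever the input formats were.
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Merging an HTSeq table and a GEEP table on `GeneId` with `TotalReads` as the report column yields one row per gene present in both, with columns `GeneId`, `TotalReads_HTSeq` and `TotalReads_GEEP`.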

group_transcript_expression
trim-quantseq-mrnaseq-se-strand-specific.cwl#group_transcript_expression/b625ab9a-4bc5-4aa1-96d9-d8830551dffa (CommandLineTool)
samtools_sort_index_after_dedup
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files.

Trigger logic is implemented in two bash scripts set by default as `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is the `trigger` input) is true, run `samtools sort/index` with the rest of the arguments. If $0 is not true, skip `samtools sort/index` and return `sort_input` and `secondaryFiles` (if provided).

Input `trigger` is Boolean, but it is passed as a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would change the script logic.

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` defines the output file format based on the file extension. If `*.sam` is used as the output filename, it cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set, or when `trigger` is false and we need to return the `sort_input` and `secondaryFiles` (if provided) files. By default, the output filename is generated from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic:
  • `csi` && `bai` => BAI
  • !`csi` && !`bai` => BAI
  • `csi` && !`bai` => CSI
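
The `ext` and `default_bam` rules can be written compactly. In this sketch the function names mirror the description but the bodies are illustrative: the unlisted fourth case (!`csi` && `bai`) is assumed to give BAI, and `default_bam` is assumed to replace only the last extension of the input name.

```python
import os


def ext(csi=False, bai=False):
    # CSI is chosen only when it is requested and BAI is not; every other case is BAI.
    return ".csi" if csi and not bai else ".bai"


def default_bam(sort_input):
    # Assumption: swap the last extension of the input basename for '.bam'.
    root = os.path.splitext(os.path.basename(sort_input))[0]
    return root + ".bam"
```

Under these assumptions, an input named `sample.sam` sorted with `csi` set and `bai` unset would produce `sample.bam` with a `.csi` index.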

geep_count_transcript_expression
../tools/geep.cwl (CommandLineTool)
geep

Tool calculates RPKM values grouped by isoforms or genes.
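
For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) is shown below. How GEEP assigns reads to isoforms and groups them by gene is not described here, so this sketch covers only the final normalization and its function name is hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)
```

For example, 100 reads on a 2000 bp feature in a library of 10 million mapped reads gives an RPKM of 5.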

`default_output_prefix` function returns default prefix based on `bam_file` basename, if `output_prefix` is not provided.

group_geep_transcript_expression
trim-quantseq-mrnaseq-se-strand-specific.cwl#group_geep_transcript_expression/0bcba6ff-25bd-4e95-9948-4f1c129dd853 (CommandLineTool)
samtools_sort_index_before_dedup
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files.

Trigger logic is implemented in two bash scripts set by default as `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is the `trigger` input) is true, run `samtools sort/index` with the rest of the arguments. If $0 is not true, skip `samtools sort/index` and return `sort_input` and `secondaryFiles` (if provided).

Input `trigger` is Boolean, but it is passed as a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would change the script logic.

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` defines the output file format based on the file extension. If `*.sam` is used as the output filename, it cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set, or when `trigger` is false and we need to return the `sort_input` and `secondaryFiles` (if provided) files. By default, the output filename is generated from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic:
  • `csi` && `bai` => BAI
  • !`csi` && !`bai` => BAI
  • `csi` && !`bai` => CSI

htseq_count_transcript_expression
../tools/htseq-count.cwl (CommandLineTool)
HTSeq: Analysing high-throughput sequencing data

For convenience in workflows that sort and index BAM files by coordinate, this tool expects a coordinate-sorted and indexed BAM file as input. For single-read data this has no effect; for paired-end data more memory will be used to keep reads while looking for their proper pairs (see the --max-reads-in-buffer parameter).

Current limitations:
  • only one `--additional-attr` is supported
  • the `--nprocesses` parameter is skipped, as it's not helpful when only one input BAM file is used

Outputs

ID Type Label Doc
bowtie_log File [Textual format] Bowtie alignment log

Bowtie alignment log file

bambai_pair File [BAM] Coordinate sorted BAM alignment file (+index BAI)

Coordinate sorted BAM file and BAI index file

star_sj_log File (Optional) [Textual format] STAR sj log

STAR SJ.out.tab

get_stat_log File (Optional) [YAML] YAML formatted combined log

YAML formatted combined log

star_out_log File (Optional) [Textual format] STAR log out

STAR Log.out

star_final_log File [Textual format] STAR final log

STAR Log.final.out

bigwig_upstream File [bigWig] Upstream bigWig file

bigWig file from the 5' - 3' strand

star_stdout_log File (Optional) [Textual format] STAR stdout log

STAR Log.std.out

fastx_statistics File [Textual format] FASTQ statistics

fastx_quality_stats generated FASTQ file quality statistics file

gene_body_report File (Optional) [TSV] Gene body average tag density plot for all isoforms longer than 1000 bp

Gene body average tag density plot for all isoforms longer than 1000 bp in TSV format

bigwig_downstream File [bigWig] Downstream bigWig file

bigWig file from the 3' - 5' strand

get_stat_markdown File (Optional) [TIDE TXT] Markdown formatted combined log

Markdown formatted combined log

star_progress_log File (Optional) [Textual format] STAR progress log

STAR Log.progress.out

gene_body_plot_pdf File (Optional) [PDF] Gene body average tag density plot for all isoforms longer than 1000 bp

Gene body average tag density plot for all isoforms longer than 1000 bp in PDF format

get_formatted_stats File (Optional) [Textual format] Bowtie, STAR and GEEP mapping stats

Processed and combined Bowtie & STAR aligner and GEEP logs

gene_expression_file File [TSV] Gene expression

Gene expression

bam_statistics_report File [Textual format] BAM statistics report

BAM statistics report (after deduplication step)

umi_tools_dedup_stats File[] (Optional) umi_tools dedup statistics

umi_tools dedup statistics

htseq_count_stderr_log File [Textual format] HTSeq: stderr log

HTSeq: stderr log

htseq_count_stdout_log File [Textual format] HTSeq: stdout log

HTSeq: stdout log

trim_adapters_stderr_log File cutadapt: stderr log

cutadapt: stderr log

trim_adapters_stdout_log File cutadapt: stdout log

cutadapt: stdout log

geep_gene_expression_file File [TSV] GEEP: expression grouped by gene name

GEEP: expression grouped by gene name

rpkm_distribution_plot_pdf File (Optional) [PDF] RPKM distribution plot for isoforms

RPKM distribution plot for isoforms in PDF format

umi_tools_dedup_stderr_log File umi_tools dedup: stderr log

umi_tools dedup: stderr log

umi_tools_dedup_stdout_log File umi_tools dedup: stdout log

umi_tools dedup: stdout log

combined_gene_expression_file File [TSV] HTSeq vs GEEP gene expression comparison

Gene expression files merged by GeneId, Chrom, TxStart, TxEnd and Strand, with the TotalReads columns reported and renamed.

feature_expression_merge_stderr_log File [Textual format] HTSeq vs GEEP gene expression comparison stderr log

HTSeq vs GEEP gene expression comparison stderr log

feature_expression_merge_stdout_log File [Textual format] HTSeq vs GEEP gene expression comparison stdout log

HTSeq vs GEEP gene expression comparison stdout log

Permalink: https://w3id.org/cwl/view/git/581156366f91861bd4dbb5bcb59f67d468b32af3/workflows/trim-quantseq-mrnaseq-se-strand-specific.cwl