Workflow: QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data

Fetched 2023-08-06 19:59:08 GMT

### Devel version of QuantSeq 3' FWD, FWD-UMI or REV for single-read mRNA-Seq data


Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multi-threading

use_umi Boolean (Optional) Use UMIs

Use UMIs (for FWD-UMI libraries)

fastq_file File [FASTQ] FASTQ input file

Reads data in a FASTQ format

min_length Integer (Optional) Set minimum length for trimmed reads when running FWD/REV pipeline. Shorter reads get discarded. Set 0 to disable

Set minimum length for trimmed reads when running FWD/REV (not UMI) pipeline. Shorter reads get discarded. Applied only when running trim_fastq step. For FWD-UMI pipeline we use cutadapt instead of TrimGalore, so this input is not used

clip_3p_end Integer (Optional) Clip N bp from 3p end

Number of bp to clip from the 3p end

clip_5p_end Integer (Optional) Clip N bp from 5p end

Number of bp to clip from the 5p end

exclude_chr String (Optional) Comma-separated list of chromosomes to be excluded from gene expression calculation

Comma-separated list of chromosomes to be excluded from gene expression calculation

annotation_file File [GTF] Annotation file

GTF or TAB-separated annotation file

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

annotation_gtf_file File [GTF] GTF annotation file

GTF annotation file

star_indices_folder Directory STAR indices folder

Path to STAR generated indices

bowtie_indices_folder Directory BowTie Ribosomal Indices

Path to Bowtie generated indices

Steps

ID Runs Label Doc
rename
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as string. If it's a full path, only basename will be used. If BAI file is present, it will be renamed too

get_stat
../tools/collect-statistics-rna-quantseq.cwl (CommandLineTool)

Tool processes and combines log files generated by Trimgalore, Bowtie, Samtools and MACS2.

`get_output_prefix` function returns the output file prefix. If `output_prefix` is provided, the prefix is `output_prefix` + `_collected_statistics_report`; otherwise it is generated from the Bowtie log basename with `_collected_statistics_report` appended.
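The prefix rule above can be sketched in Python. This is an illustrative sketch, not the tool's actual JavaScript expression; in particular, stripping the log file's extension before appending the suffix is an assumption, since the description only says the prefix is based on the Bowtie log basename.

```python
import os

SUFFIX = "_collected_statistics_report"


def get_output_prefix(bowtie_log, output_prefix=None):
    """Return the report prefix as described for the get_stat step."""
    if output_prefix:
        # An explicit prefix always wins.
        return output_prefix + SUFFIX
    # Assumption: the log extension (e.g. ".bw") is dropped from the basename.
    root = os.path.splitext(os.path.basename(bowtie_log))[0]
    return root + SUFFIX
```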

trim_fastq
../tools/trimgalore.cwl (CommandLineTool)

Tool runs Trimgalore - the wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files.

`default_log_name` function returns names for generated log files (for both paired-end and single-end cases). `trim_galore` itself doesn't support setting custom names for output files.

For paired-end data processing both `input_file_pair` and `paired` should be set. If either of them is not set, the other one becomes unset automatically.

If the input trigger is set to false, Trimgalore is skipped and the input files are returned unchanged.

bypass_trim
../tools/bypass-trimgalore-se.cwl (CommandLineTool)

If the number of reads in the trimmed_fastq_file is less than min_reads_count, the tool returns original_fastq_file and null as selected_report_file. Otherwise, it returns the trimmed_fastq_file and trimming_report_file. This is useful when Trimgalore has removed all reads from the original_fastq_file.
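The selection rule of `bypass_trim` can be sketched as follows. The function names `count_fastq_reads` and `select_fastq` are hypothetical, and the read count assumes an uncompressed FASTQ file with exactly four lines per record; the real tool may count reads differently.

```python
def count_fastq_reads(path):
    """Count reads, assuming an uncompressed FASTQ with 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def select_fastq(original_fastq, trimmed_fastq, trimming_report, min_reads_count):
    """Return (selected FASTQ, selected report) following the bypass rule."""
    if count_fastq_reads(trimmed_fastq) < min_reads_count:
        # Too few reads survived trimming: fall back to the original file
        # and report no trimming report (None stands in for CWL null).
        return original_fastq, None
    return trimmed_fastq, trimming_report
```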

star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Tool runs STAR alignReads.

`default_output_name_prefix` function returns output files prefix if `outFileNamePrefix` is not set. By default prefix is equal to basename of `readFilesIn`.

bam_to_bigwig

Workflow converts input BAM file into bigWig and bedGraph files.

Input BAM file should be sorted by coordinates (required by `bam_to_bedgraph` step).

If `split` input is not provided use true by default. Default logic is implemented in `valueFrom` field of `split` input inside `bam_to_bedgraph` step to avoid possible bug in cwltool with setting default values for workflow inputs.

`scale` has higher priority over the `mapped_reads_number`. The last one is used to calculate `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only in a case when input `scale` is not provided. All logic is implemented inside `bedtools-genomecov.cwl`.

`bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can also influence the generated bigWig filename when `bigwig_filename` is not provided.

All workflow inputs and outputs don't have `format` field to avoid format incompatibility errors when workflow is used as subworkflow.
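The precedence of `scale` over `mapped_reads_number` described above can be illustrated with a small sketch. The function name `genomecov_scale` is hypothetical, and the per-million normalization (1,000,000 divided by the number of mapped reads) is an assumption; the description only states that `mapped_reads_number` is used to calculate `-scale` when `scale` is not provided.

```python
def genomecov_scale(scale=None, mapped_reads_number=None):
    """Pick the value passed to `bedtools genomecov -scale`, or None to omit it."""
    if scale is not None:
        # An explicit scale has priority over mapped_reads_number.
        return scale
    if mapped_reads_number:
        # Assumption: normalize coverage to reads per million mapped reads.
        return 1_000_000 / mapped_reads_number
    return None
```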

extract_fastq
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file(s). If several FASTQ files are provided, they will be concatenated in the order that corresponds to files in input. Bash script's logic:
- disable case sensitive glob check
- check if root name of input file already includes '.fastq' or '.fq' extension. If yes, set DEFAULT_EXT to \"\", otherwise use '.fastq'
- check file type, decompress if needed
- return 1, if file type is not recognized

This script also works if the input file doesn't have any extension at all.
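The extension rule from the script's logic can be sketched in Python. Only the naming step is shown; `default_ext` and `output_name` are hypothetical names, and treating "includes" as "ends with" is an assumption about how the Bash script matches the root name.

```python
import os


def default_ext(filename):
    """Return "" if the root name already ends in .fastq/.fq, else ".fastq"."""
    root, _ = os.path.splitext(os.path.basename(filename))
    # Case-insensitive match, mirroring the disabled case-sensitive glob check.
    if root.lower().endswith((".fastq", ".fq")):
        return ""
    return ".fastq"


def output_name(filename):
    """Name of the decompressed file: root name plus the default extension."""
    root, _ = os.path.splitext(os.path.basename(filename))
    return root + default_ext(filename)
```

Under these assumptions, `reads.fastq.gz` and `reads.gz` both map to `reads.fastq`, and a file named `reads` with no extension also maps to `reads.fastq`.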

bowtie_aligner
../tools/bowtie-alignreads.cwl (CommandLineTool)

Tool maps input raw reads files to reference genome using Bowtie.

`default_output_filename` function returns the default name for SAM output and log files. When `sam` and `output_filename` inputs are not set, the default filename will have a `.sam` extension, but the format may not correspond to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, it will be returned from the function.

For single-end input data any of the `upstream_filelist` or `downstream_filelist` inputs can be used.

Log filename (`log_file` output) is generated by `default_output_filename` function with ex='.bw'

`indices_folder` defines folder to contain Bowtie indices. Based on the first found file with `rev.1.ebwt` or `rev.1.ebwtl` extension, bowtie index prefix is returned from input's `valueFrom` field.
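A minimal sketch of how an index prefix could be derived from `indices_folder`. The function name `bowtie_index_prefix` is hypothetical, and scanning the folder in sorted order is an assumption; the description only says the first found file with a `rev.1.ebwt` or `rev.1.ebwtl` extension is used.

```python
import os


def bowtie_index_prefix(indices_folder):
    """Return folder/prefix for the first *.rev.1.ebwt(l) file, or None."""
    for name in sorted(os.listdir(indices_folder)):  # sorted order is assumed
        for suffix in (".rev.1.ebwt", ".rev.1.ebwtl"):
            if name.endswith(suffix):
                # Strip the suffix to get the prefix Bowtie expects.
                return os.path.join(indices_folder, name[: -len(suffix)])
    return None
```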

group_isoforms
eaabf197567324eaa50bb94b668b85e7 (CommandLineTool)
umi_tools_dedup
../tools/umi-tools-dedup.cwl (CommandLineTool)

Deduplicate BAM files based on the first mapping coordinate and the UMI attached to the read. Only -I, --paired and -S parameters are implemented.

umisep_cutadapt
7b8b94eca4a55ab3e92ac3248fa19497 (CommandLineTool)
get_bam_statistics
../tools/samtools-stats.cwl (CommandLineTool)

Generates statistics for the input BAM file.

fastx_quality_stats
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output file name, generated as `input_file` basename + `.fastxstat` extension.

calculate_expression
../tools/geep.cwl (CommandLineTool)
geep

Tool calculates RPKM values grouped by isoforms or genes.

`default_output_prefix` function returns default prefix based on `bam_file` basename, if `output_prefix` is not provided.

Before running `baseCommand`, `bam_file` is staged into the output directory with write permissions (`\"writable\": true`). This allows the index file to be generated automatically in the same directory as the input `bam_file`. When the index file is provided in `secondaryFiles` of `bam_file`, it is not generated twice.
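For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) can be written as a short function. This is the textbook formula only; how `geep` counts reads and measures isoform or gene length is not described here and may differ, and the function name `rpkm` is illustrative.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)
```

For example, 100 reads on a 2,000 bp feature in a library of 1,000,000 mapped reads gives an RPKM of 50.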

samtools_sort_index_1
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as docker parameter `--workdir` (tool's output directory), using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount them to the docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory will be mounted to the docker container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and in case of `samtools sort` even the input file can be overwritten), we don't need to rename any of the staged files.

Trigger logic is implemented in two bash scripts set by default as `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is `trigger` input) is true, run `samtools sort/index` with the rest of the arguments. If $0 is not true, skip `samtools sort/index` and return `sort_input` and `secondaryFiles` (if provided) staged into output directory.

Input `trigger` is Boolean, but returns String, because of the `valueFrom` field. The `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument at all to the `baseCommand` - new feature of CWL v1.0.2. Alternatively, the `prefix` field could be used, but it would require changes in the script logic.

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is used as the output filename, it cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set or when `trigger` is false and we need to return `sort_input` and `secondaryFiles` (if provided) files staged into output directory. By default, the output filename is generated from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on `csi` and `bai` inputs according to the following logic:
- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI
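The `ext` decision table can be sketched as a single expression. The function name `index_ext` and the returned strings `.bai` and `.csi` are illustrative. The case !`csi` && `bai` is not listed in the description; returning BAI for it is an assumption that follows from the other three cases.

```python
def index_ext(csi=False, bai=False):
    """Return the index extension: CSI only when csi is set and bai is not."""
    # csi && bai   -> BAI
    # !csi && !bai -> BAI
    # csi && !bai  -> CSI
    # !csi && bai  -> BAI (assumed; not listed in the description)
    return ".csi" if csi and not bai else ".bai"
```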

samtools_sort_index_2
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as docker parameter `--workdir` (tool's output directory), using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount them to the docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory will be mounted to the docker container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and in case of `samtools sort` even the input file can be overwritten), we don't need to rename any of the staged files.

Trigger logic is implemented in two bash scripts set by default as `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is `trigger` input) is true, run `samtools sort/index` with the rest of the arguments. If $0 is not true, skip `samtools sort/index` and return `sort_input` and `secondaryFiles` (if provided) staged into output directory.

Input `trigger` is Boolean, but returns String, because of the `valueFrom` field. The `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument at all to the `baseCommand` - new feature of CWL v1.0.2. Alternatively, the `prefix` field could be used, but it would require changes in the script logic.

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is used as the output filename, it cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set or when `trigger` is false and we need to return `sort_input` and `secondaryFiles` (if provided) files staged into output directory. By default, the output filename is generated from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on `csi` and `bai` inputs according to the following logic:
- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI

htseq_calculate_expression
../tools/htseq-count.cwl (CommandLineTool)
HTSeq: Analysing high-throughput sequencing data

Uses a minimum number of parameters. Hardcoded to return gene expression (gene_id) information from a coordinate-sorted and indexed BAM file. Not strand specific.

Outputs

ID Type Label Doc
bigwig File [bigWig] BigWig file

Generated BigWig file

bowtie_log File [Textual format] Bowtie alignment log

Bowtie alignment log file

rpkm_genes File [TSV] raw reads grouped by gene name

raw reads grouped by gene name

bambai_pair File [BAM] Coordinate sorted BAM alignment file (+index BAI)

Coordinate sorted BAM file and BAI index file

star_sj_log File (Optional) [Textual format] STAR sj log

STAR SJ.out.tab

get_stat_log File (Optional) [YAML] YAML formatted combined log

YAML formatted combined log

star_out_log File (Optional) [Textual format] STAR log out

STAR Log.out

star_final_log File [Textual format] STAR final log

STAR Log.final.out

cutadapt_report File (Optional) [Textual format] Adapter trimming report from Cutadapt

Adapter trimming report from Cutadapt

rpkm_common_tss File [TSV] raw reads grouped by common TSS

raw reads grouped by common TSS

star_stdout_log File (Optional) [Textual format] STAR stdout log

STAR Log.std.out

fastx_statistics File [Textual format] FASTQ statistics

fastx_quality_stats generated FASTQ file quality statistics file

get_stat_markdown File (Optional) [TIDE TXT] Markdown formatted combined log

Markdown formatted combined log

star_progress_log File (Optional) [Textual format] STAR progress log

STAR Log.progress.out

trimgalore_report File (Optional) [Textual format] Adapter trimming report from TrimGalore, even if trimming was eventually bypassed

Adapter trimming report from TrimGalore, even if trimming was eventually bypassed

get_formatted_stats File (Optional) [Textual format] Bowtie, STAR and GEEP mapping stats

Processed and combined Bowtie & STAR aligner and GEEP logs

bam_statistics_report File [Textual format] BAM statistics report

BAM statistics report (right after alignment and sorting)

umi_tools_dedup_stats File[] (Optional) umi_tools dedup stats

umi_tools dedup stats

umi_tools_dedup_stderr File (Optional) [Textual format] umi_tools dedup stderr log

umi_tools dedup stderr log

umi_tools_dedup_stdout File (Optional) [Textual format] umi_tools dedup stdout log

umi_tools dedup stdout log

reads_per_gene_htseq_count File [TSV] Gene expression from htseq-count (reads per gene)

Gene expression from htseq-count (reads per gene)

Permalink: https://w3id.org/cwl/view/git/a409db2289b86779897ff19003bd351701a81c50/workflows/trim-quantseq-mrnaseq-se-strand-specific.cwl