Workflow: trim-rnaseq-pe.cwl

Fetched 2023-07-20 01:20:31 GMT

Runs the basic BioWardrobe RNA-Seq analysis on paired-end data files.

Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

clip_3p_end Integer (Optional) Clip from 3p end

Number of bases to clip from the 3p end

clip_5p_end Integer (Optional) Clip from 5p end

Number of bases to clip from the 5p end

exclude_chr String (Optional) Chromosome to be excluded in rpkm calculation

Chromosome to be excluded in rpkm calculation

annotation_file File [GTF] Annotation file

GTF or TAB-separated annotation file

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

fastq_file_upstream File [FASTQ] FASTQ upstream input file

Upstream reads data in a FASTQ format, received after paired end sequencing

star_indices_folder Directory STAR indices folder

Path to STAR generated indices

bowtie_indices_folder Directory BowTie Ribosomal Indices

Path to Bowtie generated indices

fastq_file_downstream File [FASTQ] FASTQ downstream input file

Downstream reads data in a FASTQ format, received after paired end sequencing

Steps

ID Runs Label Doc
get_stat
../tools/python-get-stat-rnaseq.cwl (CommandLineTool)

Tool processes and combines log files generated by STAR/Bowtie aligners and GEEP rpkm results file.

`get_output_filename` function returns `output_filename` if that input is provided; otherwise it returns a name derived from the STAR log basename with the `.stat` extension.
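The fallback described above can be sketched in Python as follows. This is only an illustration, not the tool's own expression: the parameter name `star_log_path` is hypothetical, and stripping just the last extension of the STAR log basename is an assumption.

```python
import os


def get_output_filename(star_log_path, output_filename=None):
    """Return output_filename if provided, else '<STAR log root>.stat'."""
    if output_filename:
        return output_filename
    # Assumption: only the last extension of the log basename is replaced.
    root, _ = os.path.splitext(os.path.basename(star_log_path))
    return root + ".stat"
```

For example, under these assumptions `get_output_filename("/tmp/sample.Log.final.out")` yields `sample.Log.final.stat`, while an explicit `output_filename` is returned unchanged.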

trim_fastq
../tools/trimgalore.cwl (CommandLineTool)

Tool runs Trimgalore, a wrapper around Cutadapt and FastQC that consistently applies adapter and quality trimming to FastQ files.

`default_log_name` function returns names for generated log files (for both paired-end and single-end cases). `trim_galore` itself doesn't support setting custom names for output files.

For paired-end data processing both `input_file_pair` and `paired` should be set. If either of them is not set, the other one becomes unset automatically.

star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Tool runs STAR alignReads.

`default_output_name_prefix` function returns the output file prefix if `outFileNamePrefix` is not set. By default the prefix is equal to the basename of `readFilesIn`.

bam_to_bigwig

Workflow converts input BAM file into bigWig and bedGraph files

bowtie_aligner
../tools/bowtie-alignreads.cwl (CommandLineTool)

Tool maps input raw reads files to reference genome using Bowtie.

`default_output_filename` function returns the default name for the SAM output and log files. When the `sam` and `output_filename` inputs are not set, the default filename has a `.sam` extension, but the format may not conform to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or on the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if an array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, that value is returned.

For single-end input data any of the `upstream_filelist` or `downstream_filelist` inputs can be used.

Log filename (`log_file` output) is generated by `default_output_filename` function with ex='.bw'
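A rough Python sketch of the naming rules described for `default_output_filename` is given below. It is not the tool's actual expression. The `inputs` dictionary, the order in which the three file lists are checked, and the removal of the last extension before appending `ext` are all assumptions made for illustration.

```python
import os


def default_output_filename(inputs, ext=None):
    """Sketch of the default SAM/log naming; `inputs` is a hypothetical dict."""
    output_filename = inputs.get("output_filename")
    # Called without arguments and output_filename is set: return it as is.
    if ext is None and output_filename:
        return output_filename
    source = output_filename
    if not source:
        # Assumption: the file lists are checked in this order.
        for key in ("upstream_filelist", "downstream_filelist", "crossbow_filelist"):
            value = inputs.get(key)
            if value:
                # If the input is an array, the first file is taken.
                source = value[0] if isinstance(value, list) else value
                break
    if not source:
        raise ValueError("no input file to derive the output name from")
    root = os.path.splitext(os.path.basename(source))[0]
    return root + (ext if ext is not None else ".sam")
```

With `ext=".bw"` the same function would produce the log filename, for example `reads_1.bw` for an upstream file named `reads_1.fastq`.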

`indices_folder` specifies the folder that contains the Bowtie indices. The Bowtie index prefix is returned from the input's `valueFrom` field, based on the first file found with the `rev.1.ebwt` or `rev.1.ebwtl` extension.
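The prefix lookup can be illustrated with a short Python sketch. The function name `bowtie_index_prefix` is hypothetical, and returning only the file-name part (without the folder path) is an assumption; the actual `valueFrom` expression may differ.

```python
def bowtie_index_prefix(filenames):
    """Return the index prefix from the first *.rev.1.ebwt(l) file, else None."""
    suffixes = (".rev.1.ebwtl", ".rev.1.ebwt")
    for name in filenames:
        for suffix in suffixes:
            if name.endswith(suffix):
                # Strip the suffix to recover the prefix Bowtie expects.
                return name[: -len(suffix)]
    return None
```

Given a folder listing such as `["ribo.1.ebwt", "ribo.rev.1.ebwt", "ribo.rev.2.ebwt"]`, the sketch returns `ribo`.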

group_isoforms
../tools/group-isoforms.cwl (CommandLineTool)

Tool runs get_gene_n_tss.R script to group isoforms by gene and common TSS

rename_upstream
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as a string. If it is a full path, only the basename is used. If a BAI file is present, it is renamed too.
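As a hedged illustration of these rules, the Python sketch below only computes which renames would happen; it does not touch the file system. The function name `plan_rename`, the `has_bai` flag, and the `<file>.bai` index naming are assumptions, not details taken from the tool.

```python
import os


def plan_rename(source_file, target_filename, has_bai=False):
    """Return a list of (old, new) name pairs for the rename step."""
    # Only the basename of target_filename is used, even for a full path.
    target = os.path.basename(target_filename)
    moves = [(source_file, target)]
    if has_bai:
        # Assumption: the index sits next to the BAM as '<file>.bai'.
        moves.append((source_file + ".bai", target + ".bai"))
    return moves
```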

rpkm_calculation
../tools/geep.cwl (CommandLineTool)

Tool calculates RPKM values grouped by isoforms or genes.

`default_output_prefix` function returns default prefix based on `bam_file` basename, if `output_prefix` is not provided.

Before running `baseCommand`, `bam_file` is staged into the output directory with write permissions (`"writable": true`). This allows the index file to be generated automatically in the same directory as the input `bam_file`. If an index file is already provided in `secondaryFiles` of `bam_file`, it is not generated again.

rename_downstream
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as a string. If it is a full path, only the basename is used. If a BAI file is present, it is renamed too.

samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as the docker parameter `--workdir` (the tool's output directory), using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount them into the docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory are mounted into the docker container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and `samtools sort` can even overwrite its input file), we don't need to rename any of the staged files.

Trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If $0 is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, are returned.

Input `trigger` is Boolean but is passed as a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is set as the output filename, the result cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set, or when `trigger` is false and we need to return the `sort_input` and `secondaryFiles` (if provided) files staged into the output directory. By default the output filename is derived from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on `csi` and `bai` inputs according to the following logic:
- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI
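The decision logic for `ext` can be written compactly, as in the Python sketch below. The returned strings `.bai` and `.csi` are illustrative, and the fourth combination (`bai` set without `csi`), which the description does not list, is assumed here to yield BAI.

```python
def ext(csi, bai):
    """Pick the index extension: CSI only when csi is set and bai is not."""
    # Assumption: !csi && bai (not listed in the description) also gives BAI.
    return ".csi" if csi and not bai else ".bai"
```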

extract_fastq_upstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file. Bash script's logic:
- disable case sensitive glob check
- check if root name of input file already includes '.fastq' or '.fq' extension. If yes, set DEFAULT_EXT to ""
- check file type, decompress if needed
- return 1, if file type is not recognized
This script also works if input file doesn't have any extension at all.
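The naming part of this logic can be sketched in Python as shown below; decompression and file type detection are left out. The function name `extracted_name` is hypothetical, and using `.fastq` as the non-empty value of DEFAULT_EXT is an assumption, since the description only says when it is set to an empty string.

```python
import os


def extracted_name(filename):
    """Derive the decompressed file name from the input file name."""
    # Root name: the basename without its last extension (e.g. '.gz').
    root, _ = os.path.splitext(os.path.basename(filename))
    # Case-insensitive check for an existing FASTQ extension in the root.
    has_fastq_ext = root.lower().endswith((".fastq", ".fq"))
    # Assumption: DEFAULT_EXT is '.fastq' unless the root already has one.
    default_ext = "" if has_fastq_ext else ".fastq"
    return root + default_ext
```

Under these assumptions a file with no extension at all, such as `reads`, still maps to `reads.fastq`.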

extract_fastq_downstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file. Bash script's logic:
- disable case sensitive glob check
- check if root name of input file already includes '.fastq' or '.fq' extension. If yes, set DEFAULT_EXT to ""
- check file type, decompress if needed
- return 1, if file type is not recognized
This script also works if input file doesn't have any extension at all.

fastx_quality_stats_upstream
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output file name, generated as the `input_file` basename + `.fastxstat` extension.
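A minimal Python sketch of this default is shown below. Keeping the full basename, including its existing extension, before appending `.fastxstat` is an assumption about how "basename" is meant here.

```python
import os


def default_output_filename(input_file):
    """Return '<input_file basename>.fastxstat'."""
    # Assumption: the original extension is kept, e.g. reads.fastq.fastxstat.
    return os.path.basename(input_file) + ".fastxstat"
```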

fastx_quality_stats_downstream
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output file name, generated as the `input_file` basename + `.fastxstat` extension.

Outputs

ID Type Label Doc
bigwig File [bigWig] BigWig file

Generated BigWig file

bowtie_log File [Textual format] Bowtie alignment log

Bowtie alignment log file

rpkm_genes File [TSV] RPKM, grouped by gene name

Calculated rpkm values, grouped by gene name

bambai_pair File [BAM] Coordinate sorted BAM alignment file (+index BAI)

Coordinate sorted BAM file and BAI index file

star_sj_log File (Optional) [Textual format] STAR sj log

STAR SJ.out.tab

get_stat_log File (Optional) [Textual format] Bowtie, STAR and GEEP combined log

Processed and combined Bowtie & STAR aligner and GEEP logs

star_out_log File (Optional) [Textual format] STAR log out

STAR Log.out

rpkm_isoforms File [CSV] RPKM, grouped by isoforms

Calculated rpkm values, grouped by isoforms

star_final_log File [Textual format] STAR final log

STAR Log.final.out

rpkm_common_tss File [TSV] RPKM, grouped by common TSS

Calculated rpkm values, grouped by common TSS

star_stdout_log File (Optional) [Textual format] STAR stdout log

STAR Log.std.out

star_progress_log File (Optional) [Textual format] STAR progress log

STAR Log.progress.out

trim_report_upstream File TrimGalore report Upstream

TrimGalore generated log for upstream FASTQ

trim_report_downstream File TrimGalore report Downstream

TrimGalore generated log for downstream FASTQ

fastx_statistics_upstream File [Textual format] FASTQ upstream statistics

fastx_quality_stats generated upstream FASTQ quality statistics file

fastx_statistics_downstream File [Textual format] FASTQ downstream statistics

fastx_quality_stats generated downstream FASTQ quality statistics file

Permalink: https://w3id.org/cwl/view/git/e284e3f6dff25037b209895c52f2abd37a1ce1bf/workflows/trim-rnaseq-pe.cwl