Workflow: xenbase-rnaseq-pe.cwl

Fetched 2023-01-12 05:46:33 GMT

XenBase workflow for analysing RNA-Seq paired-end data


Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

fasta_file_adapters File [FASTA] Adapters FASTA file

Adapters FASTA file to be used by Trimmomatic

fastq_file_upstream File [FASTQ] FASTQ upstream input file

Upstream reads in FASTQ format, obtained from paired-end sequencing

rsem_indices_folder Directory RSEM indices folder

Path to RSEM indices generated with BowTie2

bowtie_indices_folder Directory BowTie Ribosomal Indices

Path to the Bowtie-generated indices for the ribosomal FASTA

fastq_file_downstream File [FASTQ] FASTQ downstream input file

Downstream reads in FASTQ format, obtained from paired-end sequencing
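
The inputs listed above map directly onto a CWL job object. The following is a minimal sketch, in Python, of assembling such an object and serializing it as JSON. The helper names (`file_entry`, `directory_entry`, `build_job`) and all paths are hypothetical placeholders, not part of the workflow.

```python
import json


def file_entry(path):
    # CWL represents a file input as an object with class "File".
    return {"class": "File", "path": path}


def directory_entry(path):
    # CWL represents a directory input as an object with class "Directory".
    return {"class": "Directory", "path": path}


def build_job(upstream, downstream, adapters, rsem_dir, bowtie_dir, threads=None):
    """Assemble a job object using the input IDs from the Inputs table."""
    job = {
        "fastq_file_upstream": file_entry(upstream),
        "fastq_file_downstream": file_entry(downstream),
        "fasta_file_adapters": file_entry(adapters),
        "rsem_indices_folder": directory_entry(rsem_dir),
        "bowtie_indices_folder": directory_entry(bowtie_dir),
    }
    # `threads` is optional, so it is only added when a value is given.
    if threads is not None:
        job["threads"] = threads
    return job


# Hypothetical paths, for illustration only.
job = build_job(
    "reads_R1.fastq",
    "reads_R2.fastq",
    "adapters.fa",
    "rsem_indices",
    "ribo_bowtie_indices",
    threads=4,
)
job_json = json.dumps(job, indent=2)
```

The resulting JSON can be saved to a file and passed to a CWL runner together with the workflow file.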

Steps

ID Runs Label Doc
get_stat
../tools/custom-bash.cwl (CommandLineTool)

Tool to run a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and writes the results to a file named after the input's basename.

bam_to_bigwig

Workflow converts input BAM file into bigWig and bedGraph files

trim_adapters
../tools/trimmomatic.cwl (CommandLineTool)

Tool runs trimmomatic with ILLUMINACLIP step by default.

`-basein` and `-baseout` inputs are skipped.

If `lib_type` is set to `PE`, both `fastq_file_upstream` and `fastq_file_downstream` should be provided.

If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `trimmomatic` and return FASTQ file[s] with trimmed adapters, along with the unpaired reads FASTQ files (if `lib_type` is set to `PE` and such files are present after running `trimmomatic`). If input `trigger` is set to `false`, return `fastq_file_upstream` and `fastq_file_downstream` unchanged, as previously staged into the output directory.

Before `baseCommand` is executed, `fastq_file_upstream` and `fastq_file_downstream` (if provided) are staged into the directory set as the docker parameter `--workdir` (the tool's output directory), using `InitialWorkDirRequirement`. They are mounted into the docker container in `ro` mode as part of `--workdir`; this works because all generated files have the `trimmed` suffix in their names, so the staged files are never overwritten.

Trigger logic is implemented in a bash script set by default as the `bash_script` input. If the first argument $0 (which is the `trigger` input) is true, run `trimmomatic` with the rest of the arguments. If $0 is not true, skip `trimmomatic` and return `fastq_file_upstream` and `fastq_file_downstream` (if provided) as staged into the output directory.
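
The trigger logic described above can be sketched in Python as a pure decision function. This is an illustration only, not the actual bash script: `select_action` is a hypothetical name, the argument values are placeholders, and the sketch assumes the trigger arrives as the string `true` or `false`.

```python
def select_action(trigger, trimmomatic_args, staged_inputs):
    """Decide what the wrapper does, given the trigger passed as its first argument.

    trigger          -- the string "true" or "false" (assumed form)
    trimmomatic_args -- the remaining arguments, forwarded to trimmomatic
    staged_inputs    -- FASTQ files already staged into the output directory
    """
    if trigger == "true":
        # Run trimmomatic with the rest of the arguments.
        return ("run", ["trimmomatic"] + list(trimmomatic_args))
    # Skip trimmomatic and hand back the staged inputs unchanged.
    return ("passthrough", list(staged_inputs))


# Hypothetical argument values, for illustration only.
staged = ["reads_R1.fastq", "reads_R2.fastq"]
run_action = select_action("true", ["PE"] + staged, staged)
skip_action = select_action("false", ["PE"] + staged, staged)
```

Keeping the decision separate from execution makes the two branches easy to compare: one forwards every remaining argument, the other ignores them entirely.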

Input `trigger` is Boolean, but returns String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

`default_output_name` function is used for generating output filename based on `input_file.basename` and provided extension.

get_annotation_file
../expressiontools/get-file-by-name.cwl (ExpressionTool)

Returns the first file from the input File[] that matches the input regular expression
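
As a rough illustration of this selection rule, the sketch below picks the first entry of a file list whose name matches a regular expression. It assumes the match is made against each file's `basename` and that no match yields `None`; the function name and the file names are hypothetical.

```python
import re


def get_file_by_name(files, pattern):
    """Return the first file object whose basename matches `pattern`, else None."""
    regex = re.compile(pattern)
    for file_obj in files:
        # Assumption: the regex is applied to the basename, not the full path.
        if regex.search(file_obj["basename"]):
            return file_obj
    return None


# Hypothetical folder listing, for illustration only.
files = [
    {"class": "File", "basename": "ref.grp"},
    {"class": "File", "basename": "annotation.gtf"},
    {"class": "File", "basename": "annotation_backup.gtf"},
]
annotation = get_file_by_name(files, r"\.gtf$")
```

Because only the first match is returned, the order of the input array matters when several files satisfy the expression.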

get_chr_length_file
../expressiontools/get-file-by-name.cwl (ExpressionTool)

Returns the first file from the input File[] that matches the input regular expression

ribo_bowtie_aligner
../tools/bowtie-alignreads.cwl (CommandLineTool)

Tool maps input raw reads files to reference genome using Bowtie.

The `default_output_filename` function returns the default name for the SAM output and log files. When the `sam` and `output_filename` inputs are not set, the default filename will have the `.sam` extension, but the format may not conform to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if an array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, that value is returned.

For single-end input data, either the `upstream_filelist` or the `downstream_filelist` input can be used.

Log filename (`log_file` output) is generated by `default_output_filename` function with ex='.bw'

`indices_folder` specifies the folder containing the Bowtie indices. The Bowtie index prefix is derived in the input's `valueFrom` field from the first file found with the `rev.1.ebwt` or `rev.1.ebwtl` extension.
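
The index prefix lookup can be sketched as follows. This is a Python approximation of what the `valueFrom` expression is described as doing, not the expression itself; `bowtie_index_prefix` is a hypothetical name, and the sketch assumes the prefix is the folder path joined with the file name minus the `.rev.1.ebwt` or `.rev.1.ebwtl` suffix.

```python
import posixpath

# Suffixes that mark a Bowtie index file usable for prefix detection.
INDEX_SUFFIXES = (".rev.1.ebwt", ".rev.1.ebwtl")


def bowtie_index_prefix(folder_path, filenames):
    """Return the index prefix from the first matching file, or None if none match."""
    for name in filenames:
        for suffix in INDEX_SUFFIXES:
            if name.endswith(suffix):
                # Strip the suffix and prepend the folder to form the prefix.
                return posixpath.join(folder_path, name[: -len(suffix)])
    return None


# Hypothetical folder contents, for illustration only.
prefix = bowtie_index_prefix(
    "/indices/ribo",
    ["ribo.1.ebwt", "ribo.2.ebwt", "ribo.rev.1.ebwt", "ribo.rev.2.ebwt"],
)
```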

fastqc_stats_upstream
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

extract_fastq_upstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file.

Bash script's logic:
- disable case-sensitive glob check
- check if the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to ""
- check the file type, decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file doesn't have any extension at all.
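
The script logic described for this tool can be approximated in Python as below. This is a sketch under several assumptions, not a port of the bash script: the function names are hypothetical, the default extension is assumed to be `.fastq`, only gzip and bzip2 are treated as compressed types, plain FASTQ is recognized by a leading `@`, and an exception stands in for the script's return code 1.

```python
import bz2
import gzip
import os


def output_name(basename):
    """Build the output filename, adding '.fastq' only when the root name lacks it."""
    root, _ext = os.path.splitext(basename)
    # Case-insensitive check, mirroring the disabled case-sensitive glob.
    has_fastq_ext = root.lower().endswith((".fastq", ".fq"))
    default_ext = "" if has_fastq_ext else ".fastq"
    return root + default_ext


def decompress(data):
    """Return FASTQ bytes, decompressing when the content is gzip or bzip2."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(data)
    if data[:3] == b"BZh":  # bzip2 magic number
        return bz2.decompress(data)
    if data[:1] == b"@":  # assumed marker of an uncompressed FASTQ record
        return data
    # Stands in for the script returning 1 on an unrecognized file type.
    raise ValueError("unrecognized file type")


record = b"@read1\nACGT\n+\nIIII\n"
restored = decompress(gzip.compress(record))
```

Detecting the type from the content rather than the extension is what lets this approach handle input files that have no extension at all.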

rename_rsem_genes_file
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as string. If it's a full path, only basename will be used. If BAI file is present, it will be renamed too
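
A minimal sketch of the renaming rule follows. It only computes the planned renames rather than touching the filesystem; `rename_plan` is a hypothetical name, and the sketch assumes the BAI index sits next to the source file with `.bai` appended to its name.

```python
import posixpath


def rename_plan(source_file, target_filename, has_bai=False):
    """Return (old, new) name pairs for the file and, if present, its BAI index."""
    # Only the basename of the target is used, even if a full path is given.
    target = posixpath.basename(target_filename)
    plan = [(source_file, target)]
    if has_bai:
        # Assumption: the index is named '<file>.bai' and keeps that pattern.
        plan.append((source_file + ".bai", target + ".bai"))
    return plan


# Hypothetical names, for illustration only.
plan = rename_plan("aligned.bam", "/some/dir/sample.bam", has_bai=True)
```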

fastqc_stats_downstream
../tools/fastqc.cwl (CommandLineTool)

Tool runs FastQC from Babraham Bioinformatics

rename_rsem_bambai_pair
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as string. If it's a full path, only basename will be used. If BAI file is present, it will be renamed too

extract_fastq_downstream
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file.

Bash script's logic:
- disable case-sensitive glob check
- check if the root name of the input file already includes the '.fastq' or '.fq' extension; if yes, set DEFAULT_EXT to ""
- check the file type, decompress if needed
- return 1 if the file type is not recognized

This script also works if the input file doesn't have any extension at all.

make_biowardrobe_isoforms
../tools/python-make-biowardrobe-isoforms.cwl (CommandLineTool)

Tool to generate a BioWardrobe-compatible isoforms file from RSEM outputs. `rsem_annotation_file` and `rsem_isoforms_file` are expected to have the same order and number of isoforms. The FPKM value is used instead of RPKM.

rename_rsem_isoforms_file
../tools/rename.cwl (CommandLineTool)

Tool renames `source_file` to `target_filename`. Input `target_filename` should be set as string. If it's a full path, only basename will be used. If BAI file is present, it will be renamed too

rsem_calculate_expression
../tools/rsem-calculate-expression.cwl (CommandLineTool)

Tool runs rsem-calculate-expression.

The `reference_name` parameter for RSEM is resolved from the `indices_folder` input. If the `paired_end` input is not set but both `upstream_read_file` and `downstream_read_file` are present, `paired_end` is set automatically.

The `default_output_filename` function returns the prefix for output files generated by RSEM, based on the `upstream_read_file` or `downstream_read_file` basename, if the `output_filename` input is not provided.
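
The two defaults described for this tool can be sketched in Python as below. Both function names are hypothetical, and the prefix rule assumes the last extension is stripped from the read file's basename; the tool itself may derive the prefix differently.

```python
import posixpath


def resolve_paired_end(paired_end, upstream_read_file, downstream_read_file):
    """Keep an explicit setting; otherwise infer paired-end from the inputs."""
    if paired_end is not None:
        return paired_end
    # Not set: paired-end only when both read files are present.
    return upstream_read_file is not None and downstream_read_file is not None


def default_output_prefix(output_filename, upstream_read_file, downstream_read_file):
    """Return `output_filename` if given, else a prefix from a read file's basename."""
    if output_filename:
        return output_filename
    source = upstream_read_file or downstream_read_file
    # Assumption: the prefix is the basename without its last extension.
    root, _ext = posixpath.splitext(posixpath.basename(source))
    return root


# Hypothetical paths, for illustration only.
is_paired = resolve_paired_end(None, "/data/sample_R1.fastq", "/data/sample_R2.fastq")
prefix = default_output_prefix(None, "/data/sample_R1.fastq", "/data/sample_R2.fastq")
```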

fastx_quality_stats_upstream
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output filename, generated as the `input_file` basename + `.fastxstat` extension.

fastx_quality_stats_downstream
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output filename, generated as the `input_file` basename + `.fastxstat` extension.

Outputs

ID Type Label Doc
bowtie_log File [Textual format] Ribo Bowtie alignment log

Ribo Bowtie alignment log file. Mostly for debug purposes

bambai_pair File [BAM] Coordinate sorted BAM alignment file (+index BAI)

Coordinate sorted BAM file and BAI index file

bigwig_file File [bigWig] BigWig file

Generated BigWig file

get_stat_log File [Textual format] RSEM & Bowtie combined log

Mapping statistics from RSEM & Bowtie logs

rsem_genes_file File [TSV] RSEM genes expression file

RSEM genes expression file

rsem_stat_folder Directory RSEM alignment statistics

RSEM generated statistics folder. Mostly for debug purposes

rsem_isoforms_file File [TSV] RSEM isoforms expression file

RSEM isoforms expression file

biowardrobe_isoforms_file File [CSV] Biowardrobe compatible isoforms expression file

Biowardrobe compatible isoforms expression file

fastx_statistics_upstream File [Textual format] FASTQ upstream statistics

fastx_quality_stats generated upstream FASTQ quality statistics file

fastx_statistics_downstream File [Textual format] FASTQ downstream statistics

fastx_quality_stats generated downstream FASTQ quality statistics file

Permalink: https://w3id.org/cwl/view/git/cf107bc24a37883ef01b959fd89c19456aaecc02/workflows/xenbase-rnaseq-pe.cwl