Workflow: rnaseq-se-dutp-mitochondrial.cwl

Fetched 2023-08-06 16:14:18 GMT

Strand-specific RNA-Seq mitochondrial workflow for single-read experiments, based on BioWardrobe's basic analysis.


Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

fastq_file File [FASTQ] FASTQ input file

Reads data in a FASTQ format

clip_3p_end Integer (Optional) Clip from 3p end

Number of bases to clip from the 3p end

clip_5p_end Integer (Optional) Clip from 5p end

Number of bases to clip from the 5p end
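As an illustration of what the two clipping inputs mean, here is a minimal sketch that trims a single read. The function name `clip_read` is hypothetical, and how the aligners in this workflow actually apply the values is not described on this page.

```python
def clip_read(sequence, quality, clip_5p_end=0, clip_3p_end=0):
    """Remove bases from both ends of a read, keeping the quality string in sync.

    Hypothetical helper for illustration only; it is not part of the workflow.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    if clip_5p_end < 0 or clip_3p_end < 0:
        raise ValueError("clip values must be non-negative")
    end = len(sequence) - clip_3p_end
    # If clipping would remove the whole read, return an empty read.
    if end <= clip_5p_end:
        return "", ""
    return sequence[clip_5p_end:end], quality[clip_5p_end:end]
```

For example, clipping 2 bases from the 5p end and 3 from the 3p end of an 8-base read leaves the 3 bases in between.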

exclude_chr String (Optional) Chromosome to be excluded from RPKM calculation

Chromosome to be excluded from RPKM calculation

annotation_file File [GTF] Annotation file

GTF or TAB-separated annotation file

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

star_indices_folder Directory STAR indices folder

Path to STAR generated indices

bowtie_indices_folder Directory BowTie Ribosomal Indices

Path to Bowtie generated indices

star_indices_folder_mitochondrial Directory STAR indices mitochondrial folder

Path to STAR generated indices for mitochondrial DNA

Steps

ID Runs Label Doc
get_stat
../tools/python-get-stat-rnaseq.cwl (CommandLineTool)
python-get-stat-rnaseq

Tool processes and combines the log files generated by the STAR and Bowtie aligners with the GEEP RPKM results file.

`get_output_filename` function returns an output filename equal to `output_filename` (if that input is provided) or one generated from the STAR log basename with the `.stat` extension.
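The filename rule above can be sketched as follows. This is a Python stand-in for the tool's own expression, and it assumes that "with `.stat` extension" means the last extension of the STAR log basename is replaced; the page does not say whether it is replaced or appended.

```python
import os


def get_output_filename(star_log_path, output_filename=None):
    """Return `output_filename` if given, else a name derived from the STAR log."""
    if output_filename:
        return output_filename
    # Assumption: the last extension of the log basename is swapped for `.stat`.
    root, _ = os.path.splitext(os.path.basename(star_log_path))
    return root + ".stat"
```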

star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Tool runs STAR alignReads.

`default_output_name_prefix` function returns the output file prefix if `outFileNamePrefix` is not set. By default, the prefix is equal to the basename of `readFilesIn`.

extract_fastq
../tools/extract-fastq.cwl (CommandLineTool)

Tool to decompress input FASTQ file.

Bash script's logic:
- disable case sensitive glob check
- check if root name of input file already includes '.fastq' or '.fq' extension. If yes, set DEFAULT_EXT to ""
- check file type, decompress if needed
- return 1 if file type is not recognized

This script also works if the input file doesn't have any extension at all.
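The script logic listed above could look roughly like this in Python. It is a sketch, not the tool's bash script: file-type detection by magic bytes, support for only gzip and bzip2, and the names `output_name` and `extract_fastq` are all assumptions made for illustration.

```python
import bz2
import gzip
import os
import shutil


def output_name(input_path, default_ext=".fastq"):
    """Derive the output name; skip the default extension if one is present."""
    name = os.path.basename(input_path)
    root, ext = os.path.splitext(name)
    # Assumption: only these compression extensions are stripped.
    if ext.lower() in (".gz", ".bz2"):
        name = root
    # Case-insensitive check, mirroring the disabled case-sensitive glob.
    if name.lower().endswith((".fastq", ".fq")):
        return name
    return name + default_ext


def extract_fastq(input_path, output_dir="."):
    """Decompress (or copy) a FASTQ file; raise if the type is not recognized."""
    with open(input_path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":      # gzip signature
        opener = gzip.open
    elif magic == b"BZh":             # bzip2 signature
        opener = bz2.open
    elif magic[:1] == b"@":           # plain FASTQ starts with a read header
        opener = open
    else:
        raise ValueError("unrecognized file type: " + input_path)
    target = os.path.join(output_dir, output_name(input_path))
    # Never open the input for writing: that would truncate it.
    if os.path.abspath(target) == os.path.abspath(input_path):
        return target
    with opener(input_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Raising an exception plays the role of the script's `return 1` for unrecognized file types.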

bowtie_aligner
../tools/bowtie-alignreads.cwl (CommandLineTool)

Tool maps input raw reads files to the reference genome using Bowtie.

`default_output_filename` function returns the default name for SAM output and log files. When the `sam` and `output_filename` inputs are not set, the default filename has the `.sam` extension, but the format may not correspond to the SAM specification. To set the output filename manually, use the `output_filename` input. The default output filename is based on `output_filename` or on the basename of the `upstream_filelist`, `downstream_filelist` or `crossbow_filelist` file (if an array, the first file in the array is taken). If the function is called without arguments and the `output_filename` input is set, that value is returned.

For single-end input data, either the `upstream_filelist` or the `downstream_filelist` input can be used.

Log filename (`log_file` output) is generated by `default_output_filename` function with ex='.bw'

`indices_folder` specifies the folder that contains the Bowtie indices. The Bowtie index prefix is derived from the first file found with the `rev.1.ebwt` or `rev.1.ebwtl` extension and is returned from the input's `valueFrom` field.
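The index prefix lookup can be sketched as below. The tool does this in a CWL `valueFrom` expression, so this Python version is only an approximation; the function name `bowtie_index_prefix` and the alphabetical scan order are assumptions.

```python
import os


def bowtie_index_prefix(indices_folder):
    """Return the Bowtie index prefix found in a folder, or None if absent."""
    # Assumption: "first found" is taken to mean first in sorted order.
    for name in sorted(os.listdir(indices_folder)):
        for suffix in (".rev.1.ebwt", ".rev.1.ebwtl"):
            if name.endswith(suffix):
                # Strip the suffix so Bowtie can locate all index parts.
                return os.path.join(indices_folder, name[: -len(suffix)])
    return None
```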

group_isoforms
../tools/group-isoforms.cwl (CommandLineTool)

Tool runs the get_gene_n_tss.R script to group isoforms by gene and by common TSS.
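To make the two groupings concrete, here is a small sketch. The logic of get_gene_n_tss.R is not shown on this page, so everything here is an assumption: the record fields (`id`, `gene`, `chrom`, `strand`, `tx_start`, `tx_end`) are hypothetical, and "common TSS" is taken to mean isoforms that share chromosome, strand and transcription start site.

```python
from collections import defaultdict


def tss(isoform):
    """Transcription start site: tx_start on '+', tx_end on '-' (assumption)."""
    return isoform["tx_start"] if isoform["strand"] == "+" else isoform["tx_end"]


def group_isoforms(isoforms):
    """Group isoform ids by gene name and by (chrom, strand, TSS)."""
    by_gene = defaultdict(list)
    by_tss = defaultdict(list)
    for iso in isoforms:
        by_gene[iso["gene"]].append(iso["id"])
        by_tss[(iso["chrom"], iso["strand"], tss(iso))].append(iso["id"])
    return dict(by_gene), dict(by_tss)
```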

rpkm_calculation
../tools/geep.cwl (CommandLineTool)
geep

Tool calculates RPKM values grouped by isoforms or genes.

`default_output_prefix` function returns the default prefix based on the `bam_file` basename if `output_prefix` is not provided.

Before running `baseCommand`, `bam_file` is staged into the output directory with write permissions (`"writable": true`). This allows the index file to be generated automatically in the same directory as the input `bam_file`. If the index file is provided in `secondaryFiles` of `bam_file`, it is not generated again.
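For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) can be written as a short sketch. The helper names are hypothetical, and the way `exclude_chr` enters the total read count is an assumption about GEEP, not something this page states.

```python
def total_mapped_reads(reads_per_chrom, exclude_chr=None):
    """Sum mapped reads, optionally leaving out one chromosome."""
    # Assumption: the excluded chromosome is dropped from the normalization total.
    return sum(n for chrom, n in reads_per_chrom.items() if chrom != exclude_chr)


def rpkm(read_count, feature_length_bp, total_reads):
    """RPKM = reads * 1e9 / (feature length in bp * total mapped reads)."""
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("feature length and total reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_reads)
```

A feature of 2,000 bp with 500 reads in a library of 10 million mapped reads has an RPKM of 25.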

fastx_quality_stats
../tools/fastx-quality-stats.cwl (CommandLineTool)

Tool calculates statistics based on FASTQ file quality scores. If `output_filename` is not provided, the `default_output_filename` function is called to return the default output filename, generated as the `input_file` basename plus the `.fastxstat` extension.

samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index an input BAM/SAM/CRAM file. If input `trigger` is `true` or is not set (`true` is the default), the tool runs `samtools sort` and `samtools index` and returns the sorted BAM and its BAI/CSI index file. If `trigger` is `false`, it returns the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into the output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as the Docker parameter `--workdir` (the tool's output directory) using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount the copies into the Docker container in `rw` mode as part of `--workdir` (if set to `false`, the files staged into the output directory are mounted into the container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (`samtools sort` can even overwrite its input file), none of the staged files need to be renamed.

Trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. In both of them, if the first argument `$0` (the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If `$0` is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, are returned.
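The trigger decision can be sketched as a function that only builds the command lines and does not run them. This is a Python stand-in for the two bash scripts, whose contents are not shown here; the function name and the choice of options (`-@` for threads, `-o` for the output file) are assumptions for illustration.

```python
def samtools_commands(trigger, sort_input, sort_output, threads=1):
    """Return the samtools command lines to run, or an empty list if skipped."""
    if not trigger:
        # Nothing runs; the staged input files are returned unchanged.
        return []
    return [
        ["samtools", "sort", "-@", str(threads), "-o", sort_output, sort_input],
        ["samtools", "index", sort_output],
    ]
```

Each inner list can be passed to `subprocess.run` where samtools is installed.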

Input `trigger` is Boolean but is passed as a String because of its `valueFrom` field. `valueFrom` is used because, when `trigger` is false, cwl-runner does not append the argument to `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

If `sort_output_filename` is used, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is set as the output filename, the file cannot be usefully indexed by `samtools index`.

`default_bam` function generates the output filename for `samtools sort` when input `sort_output_filename` is not set, or when `trigger` is false and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, must be returned. By default, the output filename is generated from the `sort_input` basename with the `.bam` extension.

`ext` function returns the index file extension (BAI/CSI) based on the `csi` and `bai` inputs, according to the following logic:
- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI
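The `ext` rules above reduce to a single condition, sketched here in Python. The fourth combination (`bai` set, `csi` not set) is not listed on this page; returning BAI for it is an assumption.

```python
def index_extension(csi=False, bai=False):
    """Pick the index extension from the `csi` and `bai` flags."""
    # CSI is chosen only when it is requested and BAI is not.
    if csi and not bai:
        return ".csi"
    # All listed remaining cases give BAI; (not csi, bai) is assumed to as well.
    return ".bai"
```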

bam_to_bigwig_upstream

Workflow converts input BAM file into bigWig and bedGraph files

bam_to_bigwig_downstream

Workflow converts input BAM file into bigWig and bedGraph files

star_aligner_mitochondrial
../tools/star-alignreads.cwl (CommandLineTool)

Tool runs STAR alignReads.

`default_output_name_prefix` function returns the output file prefix if `outFileNamePrefix` is not set. By default, the prefix is equal to the basename of `readFilesIn`.

merge_original_and_mitochondrial
../tools/samtools-merge.cwl (CommandLineTool)

samtools-merge.cwl is developed for CWL consortium.

Usage: samtools merge [-nurlf] [-h inh.sam] [-b <bamlist.fofn>] <out.bam> <in1.bam> [<in2.bam> ... <inN.bam>]

Options:
  -n                  Input files are sorted by read name
  -r                  Attach RG tag (inferred from file names)
  -u                  Uncompressed BAM output
  -f                  Overwrite the output BAM if exist
  -1                  Compress level 1
  -l INT              Compression level, from 0 to 9 [-1]
  -R STR              Merge file in the specified region STR [all]
  -h FILE             Copy the header in FILE to <out.bam> [in1.bam]
  -c                  Combine @RG headers with colliding IDs [alter IDs to be distinct]
  -p                  Combine @PG headers with colliding IDs [alter IDs to be distinct]
  -s VALUE            Override random seed
  -b FILE             List of input BAM filenames, one per line [null]
  -@, --threads INT   Number of BAM/CRAM compression threads [0]
  --input-fmt-option OPT[=VAL]
                      Specify a single input file format option in the form of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
                      Specify output format (SAM, BAM, CRAM)
  --output-fmt-option OPT[=VAL]
                      Specify a single output file format option in the form of OPTION or OPTION=VALUE
  --reference FILE    Reference sequence FASTA FILE [null]
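Following the usage line above, a merge command line might be assembled as in this sketch. The function name and its parameters are hypothetical, only a few of the listed options are covered, and the command is built but not executed.

```python
def merge_command(output_bam, input_bams, threads=None, overwrite=False,
                  name_sorted=False):
    """Build an argument list for `samtools merge` from a few listed options."""
    if not input_bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["samtools", "merge"]
    if name_sorted:
        cmd.append("-n")                  # inputs are sorted by read name
    if overwrite:
        cmd.append("-f")                  # overwrite the output BAM if it exists
    if threads is not None:
        cmd += ["-@", str(threads)]       # compression threads
    cmd.append(output_bam)                # output comes before the inputs
    cmd.extend(input_bams)
    return cmd
```

In this workflow the inputs would presumably be the original and the mitochondrial alignments, though the exact wiring is not shown on this page.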

samtools_sort_index_mitochondrial
../tools/samtools-sort-index.cwl (CommandLineTool)

Same tool and documentation as the `samtools_sort_index` step above.

merge_original_and_mitochondrial_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Same tool and documentation as the `samtools_sort_index` step above.

Outputs

ID Type Label Doc
bowtie_log File [Textual format] Bowtie alignment log

Bowtie alignment log file

rpkm_genes File [TSV] RPKM, grouped by gene name

Calculated RPKM values, grouped by gene name

star_sj_log File (Optional) [Textual format] STAR sj log

STAR SJ.out.tab

get_stat_log File (Optional) [Textual format] Bowtie, STAR and GEEP combined log

Processed and combined Bowtie & STAR aligner and GEEP logs

star_out_log File (Optional) [Textual format] STAR log out

STAR Log.out

rpkm_isoforms File [CSV] RPKM, grouped by isoforms

Calculated RPKM values, grouped by isoforms

star_final_log File [Textual format] STAR final log

STAR Log.final.out

bigwig_upstream File [bigWig] BigWig file

Generated upstream BigWig file

rpkm_common_tss File [TSV] RPKM, grouped by common TSS

Calculated RPKM values, grouped by common TSS

star_stdout_log File (Optional) [Textual format] STAR stdout log

STAR Log.std.out

bam_merged_index File [BAM] Coordinate sorted BAM alignment file (+index BAI)

Coordinate sorted BAM file and BAI index file

fastx_statistics File [Textual format] FASTQ statistics

FASTQ quality statistics file generated by fastx_quality_stats

bigwig_downstream File [bigWig] BigWig file

Generated downstream BigWig file

star_progress_log File (Optional) [Textual format] STAR progress log

STAR Log.progress.out

Permalink: https://w3id.org/cwl/view/git/62323c137c0ce9b3f843df0dfbda28dafa7c90cf/workflows/rnaseq-se-dutp-mitochondrial.cwl