Workflow: allele-vcf-alignreads-se-pe.cwl

Fetched 2023-01-04 15:34:57 GMT

The workflow maps the FASTQ files from the `fastq_files` input to the reference genome (`reference_star_indices_folder`) and to the in silico generated genome (`insilico_star_indices_folder`), a concatenated genome of both the `strain1` and `strain2` strains. For both genomes, STAR is run with the `outFilterMultimapNmax` parameter set to 1 to discard all multimapped reads. For the insilico genome, STAR produces a SAM file, which is split into two SAM files based on the strain names; each file is then coordinate sorted into BAM format. For the reference genome, the BAM file produced by STAR alignment is also coordinate sorted. The strain-specific alignments are projected to the reference genome with CrossMap using the strain chain files, and bigWig files are generated for both strains and for the reference genome.
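The split of the insilico SAM file by strain can be pictured with the minimal Python sketch below. It assumes, hypothetically, that every chromosome in the concatenated genome is named `<strain>_<chrom>`. The workflow itself does this in the `strain1_sam_filter` and `strain2_sam_filter` steps, whose actual filter expressions are not shown on this page, and `strain_of` and `split_sam_by_strain` are illustrative names, not part of the workflow.

```python
from typing import Dict, Iterable, List, Optional


def strain_of(ref_name: str, strains: List[str]) -> Optional[str]:
    """Return the strain whose "<strain>_" prefix matches ref_name (hypothetical naming)."""
    for strain in strains:
        if ref_name.startswith(strain + "_"):
            return strain
    return None


def split_sam_by_strain(lines: Iterable[str], strains: List[str]) -> Dict[str, List[str]]:
    """Route SAM lines into one list per strain."""
    out: Dict[str, List[str]] = {s: [] for s in strains}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@SQ"):
            # @SQ header: route by its SN:<name> tag.
            name = next(f[3:] for f in fields if f.startswith("SN:"))
            target = strain_of(name, strains)
            if target is not None:
                out[target].append(line)
        elif line.startswith("@"):
            # Other header lines (@HD, @PG, ...) are copied to every strain.
            for s in strains:
                out[s].append(line)
        else:
            # Alignment record: column 3 (RNAME) names the reference sequence.
            # Unmapped records (RNAME "*") match no strain and are dropped.
            target = strain_of(fields[2], strains)
            if target is not None:
                out[target].append(line)
    return out
```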

Inputs

ID | Type | Title | Doc
strain1 | String | I strain name | First strain name
strain2 | String | II strain name | Second strain name
threads | Integer (optional) | Number of threads | Number of threads for the steps that support multithreading
fastq_files | File[] | Input FASTQ file(s) | Input FASTQ file or array of files
strain1_chain_file | File | I strain chain file | Chain file to project strain I to the reference genome
strain2_chain_file | File | II strain chain file | Chain file to project strain II to the reference genome
reference_chrom_length_file | File | Chromosome length file for reference genome | Chromosome length file for the reference genome
insilico_star_indices_folder | Directory | STAR indices folder for insilico genome | Path to the STAR-generated indices folder for the insilico genome
reference_star_indices_folder | Directory | STAR indices folder for reference genome | Path to the STAR-generated indices folder for the reference genome

Steps

ID Runs Label Doc
strain1_project
../tools/crossmap.cwl (CommandLineTool)

Runs the CrossMap.py script to project an input BAM, BED, or bigWig file based on the input chain file. Unsupported input file types: SAM, GFF, VCF, WIG.

If `output_basename` is not set, the get_output_filename() and get_log_filename() functions are called to generate the default output and log filenames. The `output_basename` input should not include an extension.
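As a rough illustration of how this step picks a CrossMap.py sub-command and a default output name, here is a Python sketch. The argument layout (`CrossMap.py <bam|bed|bigwig> <chain> <input> <output>`) follows CrossMap's command-line usage; the `_projected` default suffix is a hypothetical stand-in for whatever get_output_filename() actually returns, and `crossmap_argv` is an illustrative name.

```python
import os
from typing import List, Optional

# Extension -> CrossMap.py sub-command for the input types this step accepts.
SUBCOMMANDS = {".bam": "bam", ".bed": "bed", ".bw": "bigwig", ".bigwig": "bigwig"}


def crossmap_argv(chain_file: str, input_file: str,
                  output_basename: Optional[str] = None) -> List[str]:
    """Build a CrossMap.py command line for one input file."""
    root, ext = os.path.splitext(os.path.basename(input_file))
    sub = SUBCOMMANDS.get(ext.lower())
    if sub is None:
        # SAM, GFF, VCF, WIG (and anything else) are rejected.
        raise ValueError(f"unsupported input type: {ext or input_file}")
    if output_basename is None:
        # Hypothetical default; the real get_output_filename() may differ.
        output_basename = root + "_projected"
    # output_basename is passed without an extension.
    return ["CrossMap.py", sub, chain_file, input_file, output_basename]
```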

strain2_project
../tools/crossmap.cwl (CommandLineTool)

Runs the CrossMap.py script to project an input BAM, BED, or bigWig file based on the input chain file. Unsupported input file types: SAM, GFF, VCF, WIG.

If `output_basename` is not set, the get_output_filename() and get_log_filename() functions are called to generate the default output and log filenames. The `output_basename` input should not include an extension.

strain1_sam_filter
../tools/custom-bash.cwl (CommandLineTool)

Tool that runs the custom script set in the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file with the same name as the input's basename.
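A minimal Python analogue of the default script, assuming it applies a single global substitution (like `sed 's/pattern/replacement/g'`) and writes the result to a file with the input's basename in the output directory. The pattern used in the test below (stripping a hypothetical `A_` strain prefix from chromosome names) is illustrative only; the real sed expression is supplied through `param` and is not shown on this page. `run_default_script` is an illustrative name.

```python
import os
import re


def run_default_script(input_path: str, out_dir: str, pattern: str, replacement: str) -> str:
    """Rough analogue of: sed 's/pattern/replacement/g' input > out_dir/<basename>."""
    out_path = os.path.join(out_dir, os.path.basename(input_path))
    if os.path.abspath(out_path) == os.path.abspath(input_path):
        # Reading and writing the same file at once would truncate it.
        raise ValueError("output would overwrite the input")
    regex = re.compile(pattern)
    with open(input_path) as src, open(out_path, "w") as dst:
        for line in src:
            # re.sub replaces every match on the line, like sed's g flag.
            dst.write(regex.sub(replacement, line))
    return out_path
```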

strain2_sam_filter
../tools/custom-bash.cwl (CommandLineTool)

Tool that runs the custom script set in the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file with the same name as the input's basename.

insilico_star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Runs STAR in alignReads mode.

If `outFileNamePrefix` is not set, the `default_output_name_prefix` function returns the output file prefix. By default, the prefix equals the basename of `readFilesIn`.
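The prefix fallback can be sketched in Python as follows. Two assumptions are labeled in the code: the first file of `readFilesIn` supplies the basename, and the prefix is concatenated directly with STAR's fixed log file names (`Log.out`, `Log.final.out`, `Log.progress.out`), which correspond to the log outputs listed under Outputs. Both function names are illustrative.

```python
import os
from typing import Dict, List, Optional


def default_output_name_prefix(read_files_in: List[str],
                               out_file_name_prefix: Optional[str] = None) -> str:
    """Return outFileNamePrefix if set, else a basename taken from readFilesIn."""
    if out_file_name_prefix:
        return out_file_name_prefix
    # Assumption: the first file of readFilesIn provides the basename.
    return os.path.basename(read_files_in[0])


def star_log_paths(prefix: str) -> Dict[str, str]:
    """STAR appends fixed names to the prefix (assumed: no extra separator)."""
    return {
        "out_log": prefix + "Log.out",
        "final_log": prefix + "Log.final.out",
        "progress_log": prefix + "Log.progress.out",
    }
```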

strain1_bam_to_bigwig

Workflow that converts an input BAM file into bigWig and bedGraph files.

strain2_bam_to_bigwig

Workflow that converts an input BAM file into bigWig and bedGraph files.

reference_star_aligner
../tools/star-alignreads.cwl (CommandLineTool)

Runs STAR in alignReads mode.

If `outFileNamePrefix` is not set, the `default_output_name_prefix` function returns the output file prefix. By default, the prefix equals the basename of `readFilesIn`.

reference_bam_to_bigwig

Workflow that converts an input BAM file into bigWig and bedGraph files.

reference_samtools_sort
../tools/samtools-sort.cwl (CommandLineTool)

Tool to sort a BAM/SAM file (set as the `sort_input` input). If the `trigger` input is set to `true` or is not set at all (`true` is the default), it runs `samtools sort` and returns the newly generated sorted BAM/SAM/CRAM file (the actual output format depends on the `out_format` value and on the extension of the filename set in `sort_output_filename`). If `trigger` is set to `false`, it returns the unchanged BAM/SAM file previously staged into the output directory.

Before `baseCommand` is executed, the input BAM/SAM file is staged into the output directory (docker parameter `--workdir`) using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy the input BAM/SAM file and mount it into the docker container in `rw` mode as part of `--workdir` (if set to false, the file staged into the output directory is mounted into the docker container separately in `ro` mode). Because `samtools sort` can overwrite the input BAM/SAM file and save the output under the same name, the file does not need to be renamed (as was done for samtools rmdup).

The trigger logic is implemented in the bash script set by default in the `bash_script` input. If the first argument, $0 (the `trigger` input), is true, `samtools sort` runs with the rest of the arguments. Otherwise, `samtools sort` is skipped and the input BAM/SAM file previously staged into the output directory is returned.
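The branching in the default `bash_script` can be modeled in Python as below. The sketch assumes the script is invoked as `bash -c "<script>" <trigger> <samtools args...>`, which is why bash binds the trigger to $0 rather than $1, and that the comparison is against the literal string `true`. `plan_sort` is an illustrative name; it returns the command instead of running it.

```python
from typing import List, Optional


def plan_sort(argv: List[str]) -> Optional[List[str]]:
    """Mimic the trigger check: argv[0] plays the role of bash's $0.

    With `bash -c "<script>" true -o out.bam in.bam`, bash binds $0="true",
    $1="-o", $2="out.bam", $3="in.bam".
    """
    trigger, rest = argv[0], argv[1:]
    if trigger == "true":
        # Sort with the remaining arguments passed through unchanged.
        return ["samtools", "sort"] + rest
    # Any other value: skip sorting; the staged input is returned as-is.
    return None
```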

The `trigger` input is Boolean but is passed on as a String because of its `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner does not append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the logic of the bash script stored in the `bash_script` input.
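A simplified Python model of the CWL command-line binding rules that motivate `valueFrom` here: a plain Boolean contributes only its `prefix` when true and nothing when false, whereas a `valueFrom` that yields a string is always appended. This is a sketch of the rules, not a CWL runner, and `bind_boolean` is a hypothetical helper.

```python
from typing import Callable, List, Optional


def bind_boolean(value: bool, prefix: Optional[str] = None,
                 value_from: Optional[Callable[[bool], str]] = None) -> List[str]:
    """Simplified model of how one Boolean input becomes command-line items."""
    if value_from is not None:
        # valueFrom turns the value into a string, which is always appended.
        text = value_from(value)
        return [text] if prefix is None else [prefix, text]
    if value:
        # Plain Boolean true: only the prefix (if any) is added.
        return [prefix] if prefix else []
    # Plain Boolean false: nothing is added, so positional arguments would shift.
    return []
```

With `valueFrom`, the script always receives the trigger as its first positional argument, so $0 keeps its meaning whether `trigger` is true or false.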

The `default_output_name` function generates the output filename if the `sort_output_filename` input is not set, or when `trigger` is false and the original BAM/SAM file staged into the output directory must be returned.

The `ext` function returns the output filename extension based on the `out_format` value. If `out_format` is not set, `.bam` is used by default. If `trigger` is false, `out_format` is ignored.

If `trigger` is true (explicitly or by default) and neither `sort_output_filename` nor `out_format` is set, the output file format is BAM.

If `trigger` is true (explicitly or by default) and `sort_output_filename` is set but `out_format` is not, the output format is determined by the extension of the filename in `sort_output_filename` (if the filename has no extension, BAM is used). When `out_format` is also set, it overrides the extension derived from `sort_output_filename`.
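The filename rules from the preceding paragraphs can be combined in one Python sketch. The `out_format` to extension mapping (SAM, BAM, CRAM, the formats `samtools sort` can write) and the default name built from the input basename are assumptions about how `default_output_name` and `ext` behave, not their actual code.

```python
import os
from typing import Optional

# Assumed mapping from out_format values to filename extensions.
FORMAT_EXT = {"sam": ".sam", "bam": ".bam", "cram": ".cram"}


def ext(out_format: Optional[str]) -> str:
    """Extension for out_format; .bam when out_format is not set."""
    return FORMAT_EXT[out_format.lower()] if out_format else ".bam"


def sort_output_name(sort_input: str, trigger: bool = True,
                     sort_output_filename: Optional[str] = None,
                     out_format: Optional[str] = None) -> str:
    if not trigger:
        # Nothing is sorted: return the staged input's own name; out_format is ignored.
        return os.path.basename(sort_input)
    if sort_output_filename is None:
        # Default name: input basename with the extension chosen by ext().
        root = os.path.splitext(os.path.basename(sort_input))[0]
        return root + ext(out_format)
    root, current = os.path.splitext(sort_output_filename)
    if out_format:
        # out_format overrides the extension given in sort_output_filename.
        return root + ext(out_format)
    # Keep the user's extension; BAM if the filename has none.
    return sort_output_filename if current else root + ".bam"
```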

strain1_samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index an input BAM/SAM/CRAM file. If the `trigger` input is set to `true` or is not set at all (`true` is the default), it runs `samtools sort` and `samtools index` and returns the sorted BAM file and its BAI/CSI index. If `trigger` is set to `false`, it returns the unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files previously staged into the output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as the docker parameter `--workdir` (the tool's output directory) using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount them into the docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory are mounted into the docker container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and `samtools sort` can even overwrite its input file), none of the staged files need to be renamed.

The trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument, $0 (the `trigger` input), is true, `samtools sort` or `samtools index` runs with the rest of the arguments. Otherwise, `samtools sort` or `samtools index` is skipped and `sort_input` and `secondaryFiles` (if provided) staged into the output directory are returned.

The `trigger` input is Boolean but is passed on as a String because of its `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner does not append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

When `sort_output_filename` is used, its extension should be `.bam`, because `samtools sort` determines the output file format from the file extension. If a `*.sam` filename is used as the output filename, the result cannot be usefully indexed by `samtools index`.

The `default_bam` function generates the output filename for `samtools sort` if the `sort_output_filename` input is not set, or when `trigger` is false and the `sort_input` and `secondaryFiles` (if provided) files staged into the output directory must be returned. By default, the output filename is the `sort_input` basename with a `.bam` extension.

The `ext` function returns the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic:

`csi` && `bai` => BAI
!`csi` && !`bai` => BAI
`csi` && !`bai` => CSI
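The naming done by `default_bam` and `ext` can be sketched in Python. The code follows the three `csi`/`bai` cases listed above and assumes BAI for the unlisted `!csi && bai` case. It also assumes the index is named `<bam>.bai` or `<bam>.csi`, which matches the default naming of `samtools index` (with `-c` for CSI). The function names are illustrative.

```python
import os
from typing import Optional, Tuple


def default_bam(sort_input: str) -> str:
    """sort_input basename with its extension replaced by .bam."""
    return os.path.splitext(os.path.basename(sort_input))[0] + ".bam"


def index_ext(csi: bool = False, bai: bool = False) -> str:
    """Index extension following the csi/bai cases listed above."""
    # Only csi && !bai selects CSI; the other listed cases give BAI.
    # The unlisted !csi && bai case is assumed to give BAI as well.
    return ".csi" if csi and not bai else ".bai"


def sorted_and_index_names(sort_input: str, sort_output_filename: Optional[str] = None,
                           csi: bool = False, bai: bool = False) -> Tuple[str, str]:
    """Names of the sorted BAM and its index (assumed <bam>.bai / <bam>.csi)."""
    bam = sort_output_filename or default_bam(sort_input)
    return bam, bam + index_ext(csi, bai)
```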

strain2_samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Same tool and documentation as `strain1_samtools_sort_index`.

Outputs

ID | Type | Label | Doc
strain1_bigwig | File | I strain bigWig file | Generated bigWig file for the first strain, projected to the reference genome
strain2_bigwig | File | II strain bigWig file | Generated bigWig file for the second strain, projected to the reference genome
reference_bigwig | File | Reference bigWig file | Generated bigWig file for the reference genome
strain1_bambai_pair | File | I strain output BAM | Coordinate-sorted BAM file mapped to the first strain genome, projected to the reference genome
strain2_bambai_pair | File | II strain output BAM | Coordinate-sorted BAM file mapped to the second strain genome, projected to the reference genome
insilico_star_out_log | File (optional) | STAR log out for insilico genome | STAR Log.out for the insilico genome
reference_bambai_pair | File | Reference output BAM | Coordinate-sorted BAM file mapped to the reference genome
reference_star_out_log | File (optional) | STAR log out for reference genome | STAR Log.out for the reference genome
insilico_star_final_log | File | STAR final log for insilico genome | STAR Log.final.out for the insilico genome
insilico_star_stdout_log | File (optional) | STAR stdout log for insilico genome | STAR Log.std.out for the insilico genome
reference_star_final_log | File | STAR final log for reference genome | STAR Log.final.out for the reference genome
reference_star_stdout_log | File (optional) | STAR stdout log for reference genome | STAR Log.std.out for the reference genome
insilico_star_progress_log | File (optional) | STAR progress log for insilico genome | STAR Log.progress.out for the insilico genome
reference_star_progress_log | File (optional) | STAR progress log for reference genome | STAR Log.progress.out for the reference genome

Permalink: https://w3id.org/cwl/view/git/9bf0aa495735f8081bb5870cb32fc898b9e6eb22/subworkflows/allele-vcf-alignreads-se-pe.cwl