Workflow: allele-process-strain.cwl

Fetched 2023-01-08 23:13:11 GMT

Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

hal_file File HAL file

HAL file that includes current and reference strain information

sam_file File SAM file

SAM file with reads from both strains mapped to concatenated genome

chrom_length_file File Chromosome length file for reference genome

Chromosome length file for reference genome

output_file_prefix String Prefix for all generated output files

Corresponds to UID

current_strain_name String Current strain name

Current strain name

mapped_reads_number Integer Number of reads mapped to the concatenated genome

Number of reads mapped to the concatenated genome, used to calculate the scaling factor

reference_strain_name String Reference strain name

Reference strain name to be projected to

Steps

ID Runs Label Doc
filter_sam
../tools/custom-bash.cwl (CommandLineTool)

Tool runs a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename.

sort_bedgraph
../tools/linux-sort.cwl (CommandLineTool)

Tool sorts data from `unsorted_file` by key

`default_output_filename` function returns file name identical to `unsorted_file`, if `output_filename` is not provided.

bam_to_bedgraph
../tools/bedtools-genomecov.cwl (CommandLineTool)

Tool calculates genome coverage from input bam/bed/gff/vcf using `bedtools genomecov`

The input flag depends on the `input_file` extension: `-ibam` is used for `*.bam`, otherwise `-i`.

`scale` and `mapped_reads_number` inputs result in the same parameter `-scale`. If `scale` is not provided, check if `mapped_reads_number` is not null and calculate `-scale` as `1000000/mapped_reads_number`. If both inputs are null, `bedtools genomecov` will use its default scaling value.
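The flag selection and scaling rules above can be sketched in Python. This is a minimal illustration, not the tool's actual expression code: the function name `genomecov_args` is hypothetical, `mapped_reads_number` is assumed to be positive, and other required arguments (such as the genome file for non-BAM input) are omitted.

```python
from typing import List, Optional


def genomecov_args(input_file: str,
                   scale: Optional[float] = None,
                   mapped_reads_number: Optional[int] = None) -> List[str]:
    """Assemble a partial `bedtools genomecov` argument list (hypothetical helper)."""
    # `*.bam` inputs use -ibam; everything else uses -i
    input_flag = "-ibam" if input_file.endswith(".bam") else "-i"
    args = ["bedtools", "genomecov", input_flag, input_file]

    # An explicit `scale` wins; otherwise derive it from the mapped read count
    if scale is None and mapped_reads_number is not None:
        scale = 1000000 / mapped_reads_number

    # If both inputs are null, -scale is omitted and bedtools uses its default
    if scale is not None:
        args += ["-scale", str(scale)]
    return args
```

For example, two million mapped reads and no explicit `scale` would give `-scale 0.5` under this reading.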

`default_output_filename` function returns the default output filename and is used when `output_filename` is not provided. The default output file extension is `.tab`. If a bedGraph should be generated (check the flags in `inputs.depth`), the extension is changed to `.bedGraph`. The default basename of the output file is derived from the `input_file` basename.
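A rough Python equivalent of this naming rule is shown below. It assumes that `depth` carries the `bedtools genomecov` reporting flag as a string and that `-bg` and `-bga` are the values that request bedGraph output; the tool's real check may differ.

```python
import os
from typing import Optional


def default_output_filename(input_file: str, depth: Optional[str] = None) -> str:
    """Derive an output name from the input basename (illustrative sketch)."""
    root, _ = os.path.splitext(os.path.basename(input_file))
    # Assumption: -bg / -bga are the flags that produce bedGraph output
    ext = ".bedGraph" if depth in ("-bg", "-bga") else ".tab"
    return root + ext
```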

remove_overlaps
../tools/custom-bedops.cwl (CommandLineTool)

Tool runs a custom script, set as the `script` input, with arguments from `param`. Based on the bedops Dockerfile. The default script runs a sed command over the input file and writes the result to a file named after the input's basename.

A temporary solution until bedmap.cwl is created.

project_bedgraph
../tools/halliftover.cwl (CommandLineTool)

Runs halLiftover to project the input BED file from the source genome to the target genome. `source_genome_name` and `target_genome_name` should correspond to genome names in `hal_file`.

If `output_filename` is not set, call `default_output_filename` function.

The following parameters are not yet supported: --outPSL --outPSLWithName

The halLiftover manual does not say whether `input_bed_file` must be sorted.
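As an illustration of how these inputs might map onto the command line, the sketch below builds an argument list following halLiftover's positional usage (HAL file, source genome, source BED, target genome, target BED). The helper name `halliftover_args` and the example values are hypothetical; the wrapper's actual argument handling is not shown on this page.

```python
from typing import List


def halliftover_args(hal_file: str,
                     source_genome_name: str,
                     input_bed_file: str,
                     target_genome_name: str,
                     output_filename: str) -> List[str]:
    """Build a halLiftover argument list in positional order (illustrative)."""
    return [
        "halLiftover",
        hal_file,
        source_genome_name,   # genome the BED coordinates currently refer to
        input_bed_file,
        target_genome_name,   # genome to project onto
        output_filename,
    ]
```

In this workflow the source would presumably be `current_strain_name` and the target `reference_strain_name`.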

bedgraph_to_bigwig
../tools/ucsc-bedgraphtobigwig.cwl (CommandLineTool)

Tool converts bedGraph to bigWig file.

`default_output_filename` function returns the filename for the generated bigWig if `output_filename` is not provided. The default filename is derived from the `bedgraph_file` basename with the extension changed to `.bigWig`.
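A small sketch of the conversion call with the default name applied is given below. `bigwig_command` is a hypothetical helper; the argument order follows the usual `bedGraphToBigWig in.bedGraph chrom.sizes out.bw` usage, and the wrapper itself may pass additional options.

```python
import os
from typing import List, Optional


def bigwig_command(bedgraph_file: str,
                   chrom_length_file: str,
                   output_filename: Optional[str] = None) -> List[str]:
    """Build a bedGraphToBigWig argument list (illustrative sketch)."""
    if output_filename is None:
        # Default: same basename as the bedGraph, with a .bigWig extension
        root, _ = os.path.splitext(os.path.basename(bedgraph_file))
        output_filename = root + ".bigWig"
    return ["bedGraphToBigWig", bedgraph_file, chrom_length_file, output_filename]
```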

samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into output directory.

Before `baseCommand` is executed, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as the docker parameter `--workdir` (the tool's output directory), using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount them into the docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory are mounted into the docker container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and in the case of `samtools sort` even the input file can be overwritten), we don't need to rename any of the staged files.

Trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument $0 (which is the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If $0 is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, are returned.
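The branching performed by those scripts can be modelled as follows. This is a Python stand-in for the bash logic, not the scripts themselves: the function name `sort_command` is hypothetical, and `trigger` is treated as the string `"true"` or `"false"` because the tool passes it as a String.

```python
from typing import List, Optional, Sequence


def sort_command(trigger: str, sort_args: Sequence[str]) -> Optional[List[str]]:
    """Return the samtools command to run, or None when sorting is skipped."""
    if trigger == "true":
        # Run samtools sort with the remaining arguments
        return ["samtools", "sort"] + list(sort_args)
    # Not true: nothing is run, the staged input files are returned as they are
    return None
```

The indexing script would follow the same pattern with `samtools index`.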

Input `trigger` is Boolean but is passed as a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner does not append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

If `sort_output_filename` is used, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is set as the output filename, the result cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if input `sort_output_filename` is not set, or when `trigger` is false and we need to return the `sort_input` and `secondaryFiles` (if provided) files staged into the output directory. By default, the output filename is derived from the `sort_input` basename with a `.bam` extension.
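One possible reading of this naming rule is sketched below. It assumes that when `trigger` is false the staged file keeps its original name (consistent with returning `sort_input` unchanged), and that an explicit `sort_output_filename` takes precedence otherwise; the real `default_bam` expression may be organised differently.

```python
import os
from typing import Optional


def default_bam(sort_input: str,
                sort_output_filename: Optional[str] = None,
                trigger: bool = True) -> str:
    """Pick the output filename for the sort step (illustrative sketch)."""
    basename = os.path.basename(sort_input)
    if not trigger:
        # Assumption: nothing is sorted, so the staged input keeps its name
        return basename
    if sort_output_filename:
        return sort_output_filename
    # Default: input basename with the extension replaced by .bam
    root, _ = os.path.splitext(basename)
    return root + ".bam"
```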

`ext` function is used to return the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic:
  `csi` && `bai` => BAI
  !`csi` && !`bai` => BAI
  `csi` && !`bai` => CSI
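The listed cases reduce to a single condition: CSI is chosen only when `csi` is set and `bai` is not. The Python sketch below encodes that. The fourth combination (`bai` set, `csi` not set) is not listed above and is assumed here to yield BAI; the returned strings with a leading dot are also an assumption.

```python
def ext(csi: bool, bai: bool) -> str:
    """Return the index file extension for the given flags (illustrative)."""
    # CSI only when explicitly requested without BAI; BAI in every other case
    # (the !csi && bai case is an assumption, it is not listed in the doc)
    return ".csi" if csi and not bai else ".bai"
```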

sort_filtered_bedgraph
../tools/custom-bedops.cwl (CommandLineTool)

Tool runs a custom script, set as the `script` input, with arguments from `param`. Based on the bedops Dockerfile. The default script runs a sed command over the input file and writes the result to a file named after the input's basename.

A temporary solution until bedmap.cwl is created.

filter_projected_bedgraph
../tools/custom-bash.cwl (CommandLineTool)

Tool runs a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename.

Outputs

ID Type Label Doc
bambai_pair File BAM mapped to strain genome, not projected to reference genome

Coordinate sorted BAM file mapped to the current strain genome, not projected to reference genome

bigwig_file File Strain specific bigWig file

Generated bigWig file for the current strain, projected to reference genome

Permalink: https://w3id.org/cwl/view/git/fb355eda4555a7e7182a91ce045212b0a087d73f/subworkflows/allele-process-strain.cwl