Workflow: Cut-n-Run pipeline paired-end

Fetched 2023-01-10 17:00:47 GMT

Experimental pipeline for Cut-n-Run analysis. Uses mapping results from the following experiment types:

- `chipseq-pe.cwl`
- `trim-chipseq-pe.cwl`
- `trim-atacseq-pe.cwl`

Note: the upstream analyses should not have duplicates removed.

Inputs

| ID | Type | Title | Doc |
|---|---|---|---|
| threads | Integer (Optional) | Number of threads | Number of threads for those steps that support multithreading |
| rmdup_log | File [Textual format] | Remove duplicates log | Remove duplicates log file from Samtools |
| broad_peak | Boolean | Call broad peaks | Make MACS2 call broad peaks by linking nearby highly enriched regions |
| bambai_pair | File [BAM] | ChIP-Seq paired-end experiment | Coordinate sorted filtered BAM alignment and index BAI files |
| genome_size | String | Effective genome size | The length of the mappable genome (hs, mm, ce, dm or number, for example 2.7e9) |
| chrom_length | File [Textual format] | Chromosome lengths file | Chromosome lengths file in TSV format |
| control_file | File (Optional) [BAM] | Control ChIP-Seq paired-end experiment | Indexed BAM file from the ChIP-Seq paired-end experiment to be used as a control for MACS2 peak calling |
| alignment_log | File [Textual format] | Read alignment log | Read alignment log file from Bowtie |
| annotation_file | File [TSV] | Genome annotation file | Genome annotation file in TSV format |
| max_fragment_size | Integer | Maximum fragment size | The maximum fragment size needed for read/pair inclusion |
| min_fragment_size | Integer | Minimum fragment size | The minimum fragment size needed for read/pair inclusion |

Steps

ID Runs Label Doc
filter_bam
../tools/deeptools-alignmentsieve.cwl (CommandLineTool)
AlignmentSieve - utility from deepTools for BAM/CRAM file filtering

For BAM files only. Only selected parameters are implemented.
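
As an illustration of what fragment-size filtering amounts to, the sketch below keeps only pairs whose fragment length falls between the workflow's `min_fragment_size` and `max_fragment_size` inputs. It is not alignmentSieve's implementation: the `(name, template_length)` record shape, the function name `filter_by_fragment_size`, and the inclusive bounds are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (read_name, template_length).
# Template length may be negative for the reverse mate, so abs() is used.
Pair = Tuple[str, int]


def filter_by_fragment_size(pairs: Iterable[Pair], min_size: int, max_size: int) -> List[Pair]:
    """Keep pairs whose absolute template length lies in [min_size, max_size].

    Inclusive bounds are an assumption of this sketch.
    """
    return [(name, tlen) for name, tlen in pairs if min_size <= abs(tlen) <= max_size]


pairs = [("r1", 95), ("r2", -150), ("r3", 480), ("r4", 20)]
kept = filter_by_fragment_size(pairs, min_size=30, max_size=200)
print(kept)
```

Here `r3` is dropped as too long and `r4` as too short.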

bam_to_bigwig

Workflow converts input BAM file into bigWig and bedGraph files.

Input BAM file should be sorted by coordinates (required by `bam_to_bedgraph` step).

If the `split` input is not provided, true is used by default. The default logic is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step to avoid a possible bug in cwltool with setting default values for workflow inputs.

`scale` takes priority over `mapped_reads_number`. The latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All of this logic is implemented inside `bedtools-genomecov.cwl`.
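
The priority rule above can be sketched as a small function. This is only an illustration: the function name `resolve_scale` is hypothetical, and the per-million normalization (1,000,000 divided by the mapped read count) is an assumption, since the text only states that the actual calculation lives in `bedtools-genomecov.cwl`.

```python
from typing import Optional


def resolve_scale(scale: Optional[float], mapped_reads_number: Optional[int]) -> Optional[float]:
    """Pick the value to pass as the scale factor.

    An explicit `scale` always wins; `mapped_reads_number` is consulted
    only when `scale` is missing.
    """
    if scale is not None:
        return scale
    if mapped_reads_number:
        # Assumed formula for this sketch: normalize coverage per million mapped reads.
        return 1_000_000 / mapped_reads_number
    # Neither input provided: no scaling is requested.
    return None


print(resolve_scale(2.0, 5_000_000))
print(resolve_scale(None, 4_000_000))
```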

`bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can also affect the generated bigWig filename when `bigwig_filename` is not provided.

None of the workflow inputs or outputs has a `format` field, which avoids format incompatibility errors when the workflow is used as a subworkflow.

get_statistics
../tools/python-get-stat-chipseq.cwl (CommandLineTool)

Tool processes and combines log files generated by Bowtie aligner and samtools rmdup.

The `get_output_filename` function returns `output_filename` if that input is provided; otherwise it returns a filename derived from the Bowtie log basename with the `.stat` extension.

The `get_formatted_output_filename` function returns `formatted_output_filename` if that input is provided; otherwise it returns a filename derived from the Bowtie log basename with the `_stats.tsv` extension.
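
The fallback naming described for these two functions might look like the following sketch. The real functions are JavaScript expressions inside the CWL tool; this Python version, its parameter names, and the way the existing log extension is stripped are assumptions made for illustration.

```python
import os
from typing import Optional


def derive_output_filename(log_path: str, explicit_name: Optional[str], suffix: str) -> str:
    """Return `explicit_name` if given, else the log basename plus `suffix`.

    Stripping the last extension of the log file is an assumption of this sketch.
    """
    if explicit_name:
        return explicit_name
    root, _ = os.path.splitext(os.path.basename(log_path))
    return root + suffix


# Hypothetical log path used only for the example.
print(derive_output_filename("/data/logs/sample.bw", None, ".stat"))
print(derive_output_filename("/data/logs/sample.bw", None, "_stats.tsv"))
print(derive_output_filename("/data/logs/sample.bw", "custom.stat", ".stat"))
```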

macs2_callpeak
../tools/macs2-callpeak.cwl (CommandLineTool)

Tool performs peak calling using MACS2. The input Trigger (default: true) makes it possible to skip all calculations and return all input files unchanged. To set the files to be returned when Trigger == false, use the following inputs:

- `peak_xls_file_staged`
- `narrow_peak_file_staged`
- `broad_peak_file_staged`
- `gapped_peak_file_staged`
- `peak_summits_file_staged`
- `moder_r_file_staged`
- `treat_pileup_bdg_file_staged`
- `control_lambda_bdg_file_staged`

island_intersect
../tools/iaintersect.cwl (CommandLineTool)

Tool assigns each peak obtained from MACS2 to a gene and region (upstream, promoter, exon, intron, intergenic)

The `default_output_filename` function returns an output filename with the suffix given by the `ext` argument. It is called when either the `output_filename` or the `log_filename` input is not provided.

average_tag_density
../tools/atdp.cwl (CommandLineTool)

Tool calculates average tag density profile around all annotated TSS.

The `default_output_filename` function returns an output filename with the suffix given by the `ext` argument. It is called when either the `output_filename` or the `log_filename` input is not provided.

Before running `baseCommand`, the annotation file `annotation_filename` is staged into the output directory (Docker's `--workdir`) with `"writable": true`, which allows `refgene-sort` to overwrite it.

To run `atdp`, an index file must be provided, either in `secondaryFiles` of `input_file` or as the separate input `index_file`.

`baseCommand` runs the bash script from the `script` input. The script runs `refgene-sort` to sort the annotation file and then runs `atdp`. `refgene-sort` is a utility for sorting refgene annotation files using MySQL syntax.

Optionally (with cwltool==1.0.20171107133715), the script can be simplified to

    #!/bin/bash
    set -- "$0" "$@"
    refgene-sort -i "${2:4}" -o "${2:4}" -s "ORDER BY chrom, strand, CASE strand WHEN '+' THEN txStart WHEN '-' THEN txEnd END"
    atdp "$@"

because `set -- "$1" --a=$(basename "${2:4}") "${@:3}"` is not needed anymore.
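
To make the `ORDER BY` clause passed to `refgene-sort` easier to read, here is a rough Python equivalent of the ordering it requests: by chromosome, then strand, then `txStart` for `+` strand records and `txEnd` for `-` strand records. The `Transcript` record type and the sample records are hypothetical, and plain Python string comparison is assumed to stand in for MySQL's collation.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal refgene record for this sketch."""
    chrom: str
    strand: str
    tx_start: int
    tx_end: int


def refgene_sort_key(t: Transcript) -> Tuple[str, str, int]:
    # Mirrors: ORDER BY chrom, strand,
    #          CASE strand WHEN '+' THEN txStart WHEN '-' THEN txEnd END
    if t.strand == "+":
        position = t.tx_start
    elif t.strand == "-":
        position = t.tx_end
    else:
        # The sketch assumes every record is on the '+' or '-' strand.
        raise ValueError(f"unexpected strand: {t.strand!r}")
    return (t.chrom, t.strand, position)


records: List[Transcript] = [
    Transcript("chr2", "+", 500, 900),
    Transcript("chr1", "-", 100, 800),
    Transcript("chr1", "+", 300, 700),
    Transcript("chr1", "-", 200, 400),
    Transcript("chr1", "+", 50, 600),
]
ordered = sorted(records, key=refgene_sort_key)
for t in ordered:
    print(t)
```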

samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files, previously staged into output directory.

Before executing `baseCommand`, `sort_input` and `secondaryFiles` (if provided) are staged into the directory set as the Docker parameter `--workdir` (the tool's output directory) using `InitialWorkDirRequirement`. Setting `writable: true` makes cwl-runner copy `sort_input` and `secondaryFiles` (if provided) and mount the copies into the Docker container in `rw` mode as part of `--workdir` (if set to false, the files staged into the output directory are mounted into the container separately in `ro` mode). Because both `samtools sort` and `samtools index` can overwrite files with the same names (and `samtools sort` can even overwrite its input file), we don't need to rename any of the staged files.

Trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument `$0` (which is the `trigger` input) is true, `samtools sort/index` is run with the rest of the arguments. If `$0` is not true, `samtools sort/index` is skipped and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, are returned.
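
The trigger check can be sketched as follows. The real logic lives in the bash scripts named above; this Python function, its name `plan_samtools_call`, and the sample arguments are hypothetical and only show the decision being made. It builds the command without executing it.

```python
from typing import List, Optional


def plan_samtools_call(trigger: str, subcommand: str, args: List[str]) -> Optional[List[str]]:
    """Return the samtools command to run, or None when the step is skipped.

    `trigger` arrives as a string (the first script argument), so it is
    compared to the literal "true" rather than tested as a boolean.
    """
    if trigger == "true":
        return ["samtools", subcommand, *args]
    # Skipped: the staged input files are returned unchanged.
    return None


print(plan_samtools_call("true", "sort", ["input.bam", "-o", "sorted.bam"]))
print(plan_samtools_call("false", "index", ["sorted.bam"]))
```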

Input `trigger` is Boolean but returns a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.

If `sort_output_filename` is used, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is set as the output filename, the result cannot be usefully indexed by `samtools index`.

The `default_bam` function generates the output filename for `samtools sort` when the `sort_output_filename` input is not set, or when `trigger` is false and `sort_input` and `secondaryFiles` (if provided), staged into the output directory, need to be returned. By default, the output filename is derived from the `sort_input` basename with the `.bam` extension.

The `ext` function returns the index file extension (BAI/CSI) based on the `csi` and `bai` inputs according to the following logic:

- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI
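
The three listed cases can be captured in a single condition, as in this sketch. The function name `index_extension` is hypothetical. The text does not state the fourth combination (`bai` set without `csi`); returning BAI for it is an assumption here.

```python
def index_extension(csi: bool, bai: bool) -> str:
    """Choose the index format from the `csi` and `bai` flags.

    CSI is selected only when `csi` is set and `bai` is not; every other
    combination falls back to BAI. The (csi=False, bai=True) case is not
    listed in the source and is assumed to yield BAI.
    """
    return "CSI" if csi and not bai else "BAI"


for csi in (True, False):
    for bai in (True, False):
        print(f"csi={csi}, bai={bai} -> {index_extension(csi, bai)}")
```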

Outputs

| ID | Type | Label | Doc |
|---|---|---|---|
| bigwig | File [bigWig] | Genome coverage | Genome coverage in bigWig format |
| atdp_result | File [TSV] | Average Tag Density Plot | Average Tag Density Plot file in TSV format |
| bambai_pair | File [BAM] | ChIP-Seq paired-end experiment | Coordinate sorted filtered BAM alignment and index BAI files |
| macs2_broad_peaks | File (Optional) [ENCODE broad peak format] | Broad peaks | Called peaks file in ENCODE broad peak format |
| alignmentsieve_log | File [Textual format] | Alignment filtering log | Alignment filtering log from deepTools' alignmentSieve |
| iaintersect_result | File [TSV] | Gene annotated peaks | MACS2 peak file annotated with nearby genes |
| macs2_called_peaks | File [xls] | Called peaks | Called peaks file with 1-based coordinates in XLS format |
| macs2_narrow_peaks | File (Optional) [ENCODE narrow peak format] | Narrow peaks | Called peaks file in ENCODE narrow peak format |

Permalink: https://w3id.org/cwl/view/git/ad948b2691ef7f0f34de38f0102c3cd6f5182b29/workflows/trim-chipseq-pe-cut-n-run.cwl