Workflow: Cut-n-Run pipeline paired-end

Fetched 2023-01-09 20:02:19 GMT

Experimental pipeline for Cut-n-Run analysis. Uses mapping results from the following experiment types:

- `chipseq-pe.cwl`
- `trim-chipseq-pe.cwl`
- `trim-atacseq-pe.cwl`

Note: the upstream analyses should not have duplicates removed.

Inputs

ID Type Title Doc
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

rmdup_log File [Textual format] Remove duplicates log

Remove duplicates log file from Samtools

broad_peak Boolean (Optional) Call broad peaks

Make MACS2 call broad peaks by linking nearby highly enriched regions

bambai_pair File [BAM] ChIP-Seq paired-end experiment

Coordinate sorted filtered BAM alignment and index BAI files

genome_size String Effective genome size

The length of the mappable genome (hs, mm, ce, dm or number, for example 2.7e9)

chrom_length File [Textual format] Chromosome lengths file

Chromosome lengths file in TSV format

control_file File (Optional) [BAM] Control ChIP-Seq paired-end experiment

Indexed BAM file from the ChIP-Seq paired-end experiment to be used as a control for MACS2 peak calling

alignment_log File [Textual format] Read alignment log

Read alignment log file from Bowtie

promoter_dist Integer (Optional) Max distance from gene TSS (in both directions) within which an overlapping peak is assigned to the promoter region

Maximum distance from a gene TSS (in both directions); a peak overlapping this window is assigned to the promoter region

upstream_dist Integer (Optional) Max distance from the promoter (upstream direction only) within which an overlapping peak is assigned to the upstream region

Maximum distance from the promoter (upstream direction only); a peak overlapping this window is assigned to the upstream region

annotation_file File [TSV] Genome annotation file

Genome annotation file in TSV format

max_fragment_size Integer Maximum fragment size

The maximum fragment size needed for read/pair inclusion

min_fragment_size Integer Minimum fragment size

The minimum fragment size needed for read/pair inclusion
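
To make the value semantics of `genome_size`, `min_fragment_size` and `max_fragment_size` concrete, here is a minimal Python sketch. It is not part of the workflow: the helper names `resolve_genome_size` and `keep_fragment` are hypothetical, the shortcut values are the ones listed in the MACS2 documentation for `-g`/`--gsize`, and treating both fragment bounds as inclusive is an assumption.

```python
# Shortcut values as listed in the MACS2 documentation for -g/--gsize.
GENOME_SIZE_SHORTCUTS = {
    "hs": 2.7e9,   # human
    "mm": 1.87e9,  # mouse
    "ce": 9e7,     # C. elegans
    "dm": 1.2e8,   # fruit fly
}


def resolve_genome_size(genome_size: str) -> float:
    """Turn a genome_size string (shortcut or number) into a number.

    Hypothetical helper: the workflow itself just passes the string on.
    """
    key = genome_size.strip().lower()
    if key in GENOME_SIZE_SHORTCUTS:
        return GENOME_SIZE_SHORTCUTS[key]
    return float(key)  # raises ValueError for anything that is not a number


def keep_fragment(template_length: int,
                  min_fragment_size: int,
                  max_fragment_size: int) -> bool:
    """Decide whether a read pair passes the fragment size filter.

    Assumption: the fragment size is the absolute template length and
    both bounds are inclusive.
    """
    fragment = abs(template_length)
    return min_fragment_size <= fragment <= max_fragment_size
```

For example, `resolve_genome_size("2.7e9")` and `resolve_genome_size("hs")` give the same number, and `keep_fragment(-150, 0, 120)` rejects a 150 bp fragment.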

Steps

ID Runs Label Doc
filter_bam
../tools/deeptools-alignmentsieve.cwl (CommandLineTool)
AlignmentSieve - utility from deepTools for BAM/CRAM file filtering

For BAM files only. Only selected parameters are implemented.

bam_to_bigwig

Workflow converts input BAM file into bigWig and bedGraph files.

Input BAM file should be sorted by coordinates (required by `bam_to_bedgraph` step).

If the `split` input is not provided, `true` is used by default. This default is implemented in the `valueFrom` field of the `split` input inside the `bam_to_bedgraph` step, to avoid a possible bug in cwltool when setting default values for workflow inputs.

`scale` takes priority over `mapped_reads_number`. The latter is used to calculate the `-scale` parameter for `bedtools genomecov` (step `bam_to_bedgraph`) only when the `scale` input is not provided. All of this logic is implemented inside `bedtools-genomecov.cwl`.

`bigwig_filename` defines the output name only for the generated bigWig file. `bedgraph_filename` defines the output name for the generated bedGraph file and can also influence the generated bigWig filename when `bigwig_filename` is not provided.

None of the workflow inputs and outputs has a `format` field, to avoid format incompatibility errors when the workflow is used as a subworkflow.
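
The precedence between `scale` and `mapped_reads_number` described above can be sketched as follows. This is an illustration only: the function name `genomecov_scale` is hypothetical, and the reads-per-million formula (`1000000 / mapped_reads_number`) is an assumption about what `bedtools-genomecov.cwl` computes, not something stated on this page.

```python
from typing import Optional


def genomecov_scale(scale: Optional[float] = None,
                    mapped_reads_number: Optional[int] = None) -> Optional[float]:
    """Pick the value to pass as `-scale` to `bedtools genomecov`.

    An explicit `scale` always wins; `mapped_reads_number` is only a fallback.
    """
    if scale is not None:
        return scale
    if mapped_reads_number:
        # Assumption: normalize coverage to reads per million mapped reads.
        return 1_000_000 / mapped_reads_number
    return None  # neither input given: do not pass -scale at all
```

With 2,000,000 mapped reads and no explicit `scale`, this sketch yields a factor of 0.5.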

get_statistics
../tools/python-get-stat-chipseq.cwl (CommandLineTool)

Tool processes and combines log files generated by Bowtie aligner and samtools rmdup.

`get_output_filename` function returns an output filename equal to `output_filename` (if this input is provided) or derived from the Bowtie log basename with the `.stat` extension.

`get_formatted_output_filename` function returns an output filename equal to `formatted_output_filename` (if this input is provided) or derived from the STAR log basename with the `_stats.tsv` extension.
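
A rough Python equivalent of the two filename functions is shown below. In the tool they are expressions inside the CWL file; this sketch assumes that "basename" means the file name with its last extension removed, which may differ from the actual implementation. The shared helper `_log_root` is hypothetical.

```python
import os
from typing import Optional


def _log_root(log_path: str) -> str:
    # Assumption: strip the directory and only the last extension.
    return os.path.splitext(os.path.basename(log_path))[0]


def get_output_filename(log_path: str,
                        output_filename: Optional[str] = None) -> str:
    """Use the explicit name if given, else `<log root>.stat`."""
    return output_filename or _log_root(log_path) + ".stat"


def get_formatted_output_filename(log_path: str,
                                  formatted_output_filename: Optional[str] = None) -> str:
    """Use the explicit name if given, else `<log root>_stats.tsv`."""
    return formatted_output_filename or _log_root(log_path) + "_stats.tsv"
```

Under these assumptions, a log at `/data/sample.bw` would produce `sample.stat` and `sample_stats.tsv` when no explicit names are supplied.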

macs2_callpeak
../tools/macs2-callpeak.cwl (CommandLineTool)

Tool performs peak calling using MACS2. Input Trigger (default: true) allows skipping all calculations and returning all input files unchanged. To set the files to be returned when Trigger == false, use the following inputs:

- peak_xls_file_staged
- narrow_peak_file_staged
- broad_peak_file_staged
- gapped_peak_file_staged
- peak_summits_file_staged
- moder_r_file_staged
- treat_pileup_bdg_file_staged
- control_lambda_bdg_file_staged

island_intersect
../tools/iaintersect.cwl (CommandLineTool)

Tool assigns each peak obtained from MACS2 to a gene and region (upstream, promoter, exon, intron, intergenic)

`default_output_filename` function returns the output filename with the suffix set by the `ext` argument. The function is called when either the `output_filename` or `log_filename` input is not provided.
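
One possible reading of how the `promoter_dist` and `upstream_dist` inputs feed into the promoter and upstream assignment of this step is sketched below. The real `iaintersect` logic is not described on this page, so the window definitions, the inclusive overlap test and the function names are all assumptions; exon, intron and intergenic assignment is left out.

```python
from typing import Optional


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Inclusive interval overlap test."""
    return a_start <= b_end and b_start <= a_end


def classify_peak(peak_start: int, peak_end: int, tss: int, strand: str,
                  promoter_dist: int, upstream_dist: int) -> Optional[str]:
    """Return "promoter", "upstream" or None for a peak relative to one TSS.

    Assumed windows:
      promoter: TSS +/- promoter_dist (both directions)
      upstream: upstream_dist bases beyond the promoter, on the upstream
                side only (lower coordinates for "+", higher for "-")
    """
    prom_start, prom_end = tss - promoter_dist, tss + promoter_dist
    if overlaps(peak_start, peak_end, prom_start, prom_end):
        return "promoter"
    if strand == "+":
        up_start, up_end = prom_start - upstream_dist, prom_start - 1
    else:
        up_start, up_end = prom_end + 1, prom_end + upstream_dist
    if overlaps(peak_start, peak_end, up_start, up_end):
        return "upstream"
    return None  # exon / intron / intergenic are not modelled here
```

In this sketch the promoter window is checked first, so a peak that touches both windows is reported as a promoter peak.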

average_tag_density
../tools/atdp.cwl (CommandLineTool)

Tool calculates average tag density profile around all annotated TSS.

`default_output_filename` function returns the output filename with the suffix set by the `ext` argument. The function is called when either the `output_filename` or `log_filename` input is not provided.

Before running `baseCommand`, the annotation file `annotation_filename` is staged into the output directory (Docker's `--workdir`) with `"writable": true` (to allow `refgene-sort` to overwrite it).

To run `atdp`, an index file must be provided (either in the `secondaryFiles` of `input_file` or as the separate input `index_file`).

`baseCommand` runs the bash script from the `script` input. The script runs `refgene-sort` to sort the annotation file and then runs `atdp`. `refgene-sort` is a utility for sorting refgene annotation files using MySQL syntax.

Optionally (with cwltool==1.0.20171107133715), the script can be simplified to

    #!/bin/bash
    set -- "$0" "$@"
    refgene-sort -i "${2:4}" -o "${2:4}" -s "ORDER BY chrom, strand, CASE strand WHEN '+' THEN txStart WHEN '-' THEN txEnd END"
    atdp "$@"

because `set -- "$1" --a=$(basename "${2:4}") "${@:3}"` is not needed anymore.
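
The `ORDER BY` clause passed to `refgene-sort` orders records by chromosome, then strand, then by `txStart` for `+` genes and `txEnd` for `-` genes. The Python sketch below reproduces that ordering on in-memory records for illustration. The record layout (dictionaries with `chrom`, `strand`, `txStart`, `txEnd` keys) is an assumption, and plain string comparison may not match the collation used by the real tool.

```python
def refgene_sort_key(record: dict) -> tuple:
    """Sort key mirroring: ORDER BY chrom, strand,
    CASE strand WHEN '+' THEN txStart WHEN '-' THEN txEnd END.

    Assumption: every record has strand '+' or '-'.
    """
    anchor = record["txStart"] if record["strand"] == "+" else record["txEnd"]
    return (record["chrom"], record["strand"], anchor)


records = [
    {"name": "a", "chrom": "chr2", "strand": "+", "txStart": 100, "txEnd": 500},
    {"name": "b", "chrom": "chr1", "strand": "-", "txStart": 50, "txEnd": 900},
    {"name": "c", "chrom": "chr1", "strand": "+", "txStart": 300, "txEnd": 400},
    {"name": "d", "chrom": "chr1", "strand": "-", "txStart": 200, "txEnd": 700},
    {"name": "e", "chrom": "chr1", "strand": "+", "txStart": 100, "txEnd": 800},
]

sorted_names = [r["name"] for r in sorted(records, key=refgene_sort_key)]
print(sorted_names)  # ['e', 'c', 'd', 'b', 'a']
```

Note that the two `-` strand records are ordered by `txEnd` (700 before 900), even though their `txStart` values are in the opposite order.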

samtools_sort_index
../tools/samtools-sort-index.cwl (CommandLineTool)

Tool to sort and index input BAM/SAM/CRAM. If input `trigger` is set to `true` or isn't set at all (`true` is used by default), run `samtools sort` and `samtools index`, return sorted BAM and BAI/CSI index file. If input `trigger` is set to `false`, return unchanged `sort_input` (BAM/SAM/CRAM) and index (BAI/CSI, if provided in `secondaryFiles`) files.

Trigger logic is implemented in two bash scripts set by default as the `bash_script_sort` and `bash_script_index` inputs. For both of them, if the first argument `$0` (which is the `trigger` input) is true, run `samtools sort/index` with the rest of the arguments. If `$0` is not true, skip `samtools sort/index` and return `sort_input` and `secondaryFiles` (if provided).

Input `trigger` is Boolean, but returns a String because of the `valueFrom` field. `valueFrom` is used because, if `trigger` is false, cwl-runner doesn't append this argument to the `baseCommand` at all (a new feature of CWL v1.0.2). Alternatively, the `prefix` field could be used, but that would require changing the script logic.
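
The trigger handling described in the last two paragraphs can be pictured with the following Python sketch. It models only the decision, not the actual bash scripts: the function names are hypothetical, and the exact `samtools` arguments the tool passes are an assumption (only the basic `samtools sort -o` and `samtools index` forms are used here).

```python
from typing import List, Optional


def trigger_argument(trigger: Optional[bool] = None) -> str:
    """Render the Boolean trigger as the string the script receives.

    An unset trigger counts as true, matching the documented default.
    """
    return "false" if trigger is False else "true"


def plan_commands(trigger_arg: str, sort_input: str,
                  sort_output: str) -> List[List[str]]:
    """Return the samtools commands to run, or none when skipping."""
    if trigger_arg == "true":
        return [
            ["samtools", "sort", "-o", sort_output, sort_input],
            ["samtools", "index", sort_output],
        ]
    # Trigger is not true: nothing is run and the input is returned as is.
    return []
```

Always passing an explicit `"true"` or `"false"` string keeps the script's positional arguments in the same places whether or not sorting is requested.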

If using `sort_output_filename`, the output file extension should be `*.bam`, because `samtools sort` determines the output file format from the file extension. If `*.sam` is set as the output filename, it cannot be usefully indexed by `samtools index`.

`default_bam` function is used to generate the output filename for `samtools sort` if the input `sort_output_filename` is not set, or when `trigger` is false and the `sort_input` and `secondaryFiles` (if provided) files need to be returned. By default, the output filename is derived from the `sort_input` basename with the `.bam` extension.

`ext` function is used to return the index file extension (BAI/CSI) based on the `csi` and `bai` inputs, according to the following logic:

- `csi` && `bai` => BAI
- !`csi` && !`bai` => BAI
- `csi` && !`bai` => CSI
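
The three listed cases reduce to a single condition, sketched here in Python. The page does not state the remaining case (`bai` set without `csi`); returning BAI for it is an assumption, as are the function name `index_extension` and the exact extension strings.

```python
def index_extension(csi: bool = False, bai: bool = False) -> str:
    """Choose the index extension from the `csi` and `bai` flags.

    CSI is used only when `csi` is set and `bai` is not; every other
    combination falls back to BAI.
    """
    return ".csi" if csi and not bai else ".bai"
```

In other words, `bai` overrides `csi`, and BAI is the default when neither flag is given.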

Outputs

ID Type Label Doc
bigwig File [bigWig] Genome coverage

Genome coverage in bigWig format

atdp_result File [TSV] Average Tag Density Plot

Average Tag Density Plot file in TSV format

bambai_pair File [BAM] ChIP-Seq paired-end experiment

Coordinate sorted filtered BAM alignment and index BAI files

macs2_broad_peaks File (Optional) [ENCODE broad peak format] Broad peaks

Called peaks file in ENCODE broad peak format

alignmentsieve_log File [Textual format] Alignment filtering log

Alignment filtering log from deepTools' alignmentSieve

iaintersect_result File [TSV] Gene annotated peaks

MACS2 peak file annotated with nearby genes

macs2_called_peaks File [xls] Called peaks

Called peaks file with 1-based coordinates in XLS format

macs2_narrow_peaks File (Optional) [ENCODE narrow peak format] Narrow peaks

Called peaks file in ENCODE narrow peak format

Permalink: https://w3id.org/cwl/view/git/9e3c3e65c19873cd1ed3cf7cc3b94ebc75ae0cc5/workflows/trim-chipseq-pe-cut-n-run.cwl