CWL Workflow: 03-map-pe-blacklist-removal.cwl

Fetched 2024-11-28 11:29:28 GMT

Verified with cwltool version 3.1.20221201130942

ATAC-seq 03 mapping - reads: PE

This workflow is Open Source and may be reused according to the terms of: MIT License

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Doc
nthreads	Integer
picard_jar_path	String	Picard Java jar file
picard_java_opts	String (Optional)	JVM arguments should be a quoted, space-separated list (e.g. "-Xms128m -Xmx512m")
genome_sizes_file	File	Genome sizes tab-delimited file (used in samtools)
input_fastq_read1_files	File[]	Input fastq files for paired_read1
input_fastq_read2_files	File[]	Input fastq files for paired_read2
ENCODE_blacklist_bedfile	File	Bedfile containing ENCODE consensus blacklist regions to be excluded.
genome_ref_first_index_file	File	Bowtie first index files for reference genome (e.g. *1.ebwt). The rest of the files should be in the same folder.
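
The inputs above can be supplied to a CWL runner as a job file. The following is a minimal sketch in Python that builds such a job object and prints it as JSON, assuming local files. Every path shown is a hypothetical placeholder, the `build_job` and `file_obj` helpers are illustrative names, and the check that read1 and read2 lists have equal length is an assumption about how the pairs are matched, not something stated by the workflow.

```python
import json


def file_obj(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(read1, read2, bowtie_index, genome_sizes, blacklist,
              picard_jar, nthreads=4, picard_java_opts=None):
    """Build a job object keyed by the workflow's input IDs."""
    # Assumption: read1 and read2 files are paired by position.
    if len(read1) != len(read2):
        raise ValueError("read1 and read2 lists must have the same length")
    job = {
        "nthreads": nthreads,
        "picard_jar_path": picard_jar,
        "genome_sizes_file": file_obj(genome_sizes),
        "input_fastq_read1_files": [file_obj(p) for p in read1],
        "input_fastq_read2_files": [file_obj(p) for p in read2],
        "ENCODE_blacklist_bedfile": file_obj(blacklist),
        "genome_ref_first_index_file": file_obj(bowtie_index),
    }
    # picard_java_opts is optional, so it is only set when given.
    if picard_java_opts is not None:
        job["picard_java_opts"] = picard_java_opts
    return job


# All paths below are hypothetical placeholders.
job = build_job(
    read1=["sample1_R1.fastq"],
    read2=["sample1_R2.fastq"],
    bowtie_index="index/genome.1.ebwt",
    genome_sizes="genome.chrom.sizes",
    blacklist="blacklist.bed",
    picard_jar="/opt/picard/picard.jar",
)
print(json.dumps(job, indent=2))
```

Saving the printed JSON to a file (for example `job.json`) and running `cwltool 03-map-pe-blacklist-removal.cwl job.json` is one way to invoke the workflow.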

Steps

ID	Runs	Doc
sam2bam	../map/samtools2bam.cwl (CommandLineTool)
bowtie-pe	../map/bowtie-pe.cwl (CommandLineTool)
sort_bams	../map/samtools-sort.cwl (CommandLineTool)
preseq-c-curve	../map/preseq-c_curve.cwl (CommandLineTool)	Usage: c_curve [OPTIONS] <sorted-bed-file> Options: -o, -output yield output file (default: stdout) -s, -step step size in extrapolations (default: 1e+06) -v, -verbose print more information -P, -pe input is paired end read file -H, -hist input is a text file containing the observed histogram -V, -vals input is a text file containing only the observed counts -B, -bam input is in BAM format -l, -seg_len maximum segment length when merging paired end bam reads (default: 5000) Help options: -?, -help print this help message -about print about message
filter-unmapped	../map/samtools-filter-unmapped.cwl (CommandLineTool)
filtered2sorted	../map/samtools-sort.cwl (CommandLineTool)
sort_dedup_bams	../map/samtools-sort.cwl (CommandLineTool)
index_dedup_bams	../map/samtools-index.cwl (CommandLineTool)
remove_duplicates	../map/picard-MarkDuplicates.cwl (CommandLineTool)
dedup_bam_idxstats	../map/samtools-idxstats.cwl (CommandLineTool)
extract_basename_1	../utils/extract-basename.cwl (CommandLineTool)	Extracts the base name of a file
extract_basename_2	../utils/remove-extension.cwl (CommandLineTool)	Extracts the base name of a file
mapped_reads_count	../map/bowtie-log-read-count.cwl (CommandLineTool)	Get number of processed reads from Bowtie log.
percent_uniq_reads	../map/preseq-percent-uniq-reads.cwl (CommandLineTool)	Get number of processed reads from Bowtie log.
mapped_file_basename	../utils/extract-basename.cwl (CommandLineTool)	Extracts the base name of a file
remove_encode_blacklist	../map/bedtools-intersect.cwl (CommandLineTool)	Tool: bedtools intersect (aka intersectBed) Version: v2.25.0 Summary: Report overlaps between two feature files. Usage: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam> Note: -b may be followed with multiple databases and/or wildcard (*) character(s). Options: -wa Write the original entry in A for each overlap. -wb Write the original entry in B for each overlap. - Useful for knowing _what_ A overlaps. Restricted by -f and -r. -loj Perform a "left outer join". That is, for each feature in A report each overlap with B. If no overlaps are found, report a NULL feature for B. -wo Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlaps restricted by -f and -r. Only A features with overlap are reported. -wao Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlapping features restricted by -f and -r. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. -u Write the original A entry _once_ if _any_ overlaps found in B. - In other words, just report the fact >=1 hit was found. - Overlaps restricted by -f and -r. -c For each entry in A, report the number of overlaps with B. - Reports 0 for A entries that have no overlap with B. - Overlaps restricted by -f and -r. -v Only report those entries in A that have _no overlaps_ with B. - Similar to "grep -v" (an homage). -ubam Write uncompressed BAM output. Default writes compressed BAM. -s Require same strandedness. That is, only report hits in B that overlap A on the _same_ strand. - By default, overlaps are reported without respect to strand. -S Require different strandedness. That is, only report hits in B that overlap A on the _opposite_ strand. - By default, overlaps are reported without respect to strand. -f Minimum overlap required as a fraction of A. - Default is 1E-9 (i.e., 1bp). 
- FLOAT (e.g. 0.50) -F Minimum overlap required as a fraction of B. - Default is 1E-9 (i.e., 1bp). - FLOAT (e.g. 0.50) -r Require that the fraction overlap be reciprocal for A AND B. - In other words, if -f is 0.90 and -r is used, this requires that B overlap 90 percent of A and A _also_ overlaps 90 percent of B. -e Require that the minimum fraction be satisfied for A OR B. - In other words, if -e is used with -f 0.90 and -F 0.10 this requires that either 90 percent of A is covered OR 10 percent of B is covered. Without -e, both fractions would have to be satisfied. -split Treat "split" BAM or BED12 entries as distinct BED intervals. -g Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option. -nonamecheck For sorted data, don't throw an error if the file has different naming conventions for the same chromosome. ex. "chr1" vs "chr01". -sorted Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input. -names When using multiple databases, provide an alias for each that will appear instead of a fileId when also printing the DB record. -filenames When using multiple databases, show each complete filename instead of a fileId when also printing the DB record. -sortout When using multiple databases, sort the output DB hits for each record. -bed If using BAM input, write output as BED. -header Print the header from the A file prior to results. -nobuf Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time. -iobuf Specify amount of memory to use for input buffer. Takes an integer argument. Optional suffixes K/M/G supported. Note: currently has no effect with compressed files. 
Notes: (1) When a BAM file is used for the A file, the alignment is retained if overlaps exist, and excluded if an overlap cannot be found. If multiple overlaps exist, they are not reported, as we are only testing for one or more overlaps.
execute_pcr_bottleneck_coef	../map/pcr-bottleneck-coef.cwl (Workflow)	ChIP-seq - map - PCR Bottleneck Coefficients
mapped_filtered_reads_count	../peak_calling/samtools-extract-number-mapped-reads.cwl (CommandLineTool)	Extract mapped reads from BAM file using Samtools flagstat command
percent_mitochondrial_reads	../utils/idxstats-percentage-of-reads-in-chrom.cwl (ExpressionTool)
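
The dedup_bam_idxstats and percent_mitochondrial_reads steps above suggest a percentage computed from `samtools idxstats` output. The sketch below shows one plausible way to do that calculation in Python. It is not the code of idxstats-percentage-of-reads-in-chrom.cwl, whose internals are not shown here; the function name, the `chrM` chromosome label, and the choice to divide by all mapped reads are assumptions. The only format relied on is the tab-delimited idxstats layout: reference name, sequence length, mapped reads, unmapped reads.

```python
def percent_reads_in_chrom(idxstats_text, chrom="chrM"):
    """Percentage of mapped reads assigned to one chromosome.

    idxstats_text is the raw output of `samtools idxstats`:
    name, length, mapped reads, unmapped reads (tab-delimited).
    """
    total_mapped = 0
    chrom_mapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        mapped = int(mapped)
        total_mapped += mapped
        if name == chrom:
            chrom_mapped += mapped
    if total_mapped == 0:
        return 0.0  # avoid dividing by zero on an empty BAM
    return 100.0 * chrom_mapped / total_mapped


# Made-up example data: 900 reads on chr1, 100 on chrM.
example = "chr1\t248956422\t900\t0\nchrM\t16569\t100\t0\n*\t0\t0\t50\n"
print(percent_reads_in_chrom(example))  # 10.0
```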

Outputs

ID	Type	Doc
output_pbc_files	File[]	PCR Bottleneck Coefficient files.
output_bowtie_log	File[]	Bowtie log file.
output_read_count_mapped	File[]	Read counts of the mapped BAM files
output_preseq_c_curve_files	File[]	Preseq c_curve output files.
output_percentage_uniq_reads	File[]	Percentage of unique reads from preseq c_curve output
output_read_count_mapped_filtered	File[]	Read counts of the mapped and filtered BAM files
output_data_sorted_dedup_bam_files	File[]	BAM files without duplicate reads.
output_percent_mitochondrial_reads	File[]	Percentage of mitochondrial reads.
output_picard_mark_duplicates_files	File[]	Picard MarkDuplicates metrics files.
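
After a run, a CWL runner such as cwltool reports the outputs as a JSON object keyed by the output IDs listed above. A small check like the following can confirm that every expected output was produced. This is a sketch only: the `missing_outputs` helper and the sample result are hypothetical, and an output is treated as missing when its value is absent, null, or an empty list.

```python
import json

# Output IDs taken from the Outputs table of this workflow.
EXPECTED_OUTPUTS = [
    "output_pbc_files",
    "output_bowtie_log",
    "output_read_count_mapped",
    "output_preseq_c_curve_files",
    "output_percentage_uniq_reads",
    "output_read_count_mapped_filtered",
    "output_data_sorted_dedup_bam_files",
    "output_percent_mitochondrial_reads",
    "output_picard_mark_duplicates_files",
]


def missing_outputs(result):
    """Return expected output IDs that are absent, null, or empty."""
    return [key for key in EXPECTED_OUTPUTS if not result.get(key)]


# Hypothetical runner output with only one output populated.
sample_result = json.loads(
    '{"output_bowtie_log": [{"class": "File", "basename": "sample1.log"}]}'
)
print(missing_outputs(sample_result))
```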

Permalink:

https://w3id.org/cwl/view/git/c269cecf317c699d6f3a0f44782e90914bce62b5/v1.0/ATAC-seq_pipeline/03-map-pe-blacklist-removal.cwl