Workflow: ATACseq.cwl

Fetched 2023-01-11 12:37:55 GMT

Inputs

ID Type Title Doc
fastq1 File[]

List of fastq files containing the first mate of raw reads. Multiple files are provided if the same library was multiplexed across multiple lanes. Reads from different fastq files are pooled after alignment. See also parameter "fastq2".

fastq2 File[]

List of fastq files containing the second mate of raw reads. Important: this list must have the same length as parameter "fastq1".
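The length requirement can be checked before a run is started. The following is a minimal sketch; the helper name pair_fastqs and the file names are hypothetical and not part of the workflow.

```python
def pair_fastqs(fastq1, fastq2):
    """Pair first-mate and second-mate files position by position.

    Raises ValueError if the two lists differ in length, which the
    workflow documentation states is not allowed.
    """
    if len(fastq1) != len(fastq2):
        raise ValueError(
            f"fastq1 has {len(fastq1)} files but fastq2 has {len(fastq2)}"
        )
    return list(zip(fastq1, fastq2))


# Hypothetical example: one library sequenced on two lanes.
pairs = pair_fastqs(
    ["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"],
    ["lane1_R2.fastq.gz", "lane2_R2.fastq.gz"],
)
```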

genome File

Path to the reference genome in fasta format. Bowtie2 index files (".1.bt2", ".2.bt2", ...) as well as a samtools index (".fai") have to be located in the same directory.
All of these files can be downloaded for the most common genome builds at https://support.illumina.com/sequencing/sequencing_software/igenome.html. Alternatively, you can use "bowtie2-build" and "samtools faidx" to create them yourself.
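A pre-flight check for the index files might look like the sketch below. It rests on two assumptions that are not stated on this page: that the Bowtie2 index is a small index with the six standard ".bt2" files, and that its base name is the fasta file name without its extension (pass bt2_base explicitly if your index is named differently). The function name is hypothetical.

```python
from pathlib import Path

# Standard file suffixes of a small Bowtie2 index (assumption: not a
# large ".bt2l" index).
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2",
                ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(fasta, bt2_base=None):
    """Return the expected index files that do not exist on disk."""
    fasta = Path(fasta)
    # Assumption: index base name = fasta path without its extension.
    base = Path(bt2_base) if bt2_base else fasta.with_suffix("")
    expected = [Path(str(fasta) + ".fai")]
    expected += [Path(str(base) + suffix) for suffix in BT2_SUFFIXES]
    return [path for path in expected if not path.exists()]
```

An empty return value means all seven files were found next to the fasta file.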

adapter1 String (Optional)

Adapter sequence for first reads. If not specified (set to "null"), trim_galore will try to autodetect which of the following was used:
 - Illumina universal adapter (AGATCGGAAGAGC)
 - Nextera adapter (CTGTCTCTTATA)
 - Illumina Small RNA 3-prime Adapter (TGGAATTCTCGG)
You can directly choose one of the above configurations by setting the string to "illumina", "nextera", or "small_rna", or specify the adapter sequence manually (e.g. "AGATCGGAAGAGC").

adapter2 String (Optional)

Adapter sequence for second reads. If not specified (set to "null"), trim_galore will try to autodetect which of the following was used:
 - Illumina universal adapter (AGATCGGAAGAGC)
 - Nextera adapter (CTGTCTCTTATA)
 - Illumina Small RNA 3-prime Adapter (TGGAATTCTCGG)
You can directly choose one of the above configurations by setting the string to "illumina", "nextera", or "small_rna", or specify the adapter sequence manually (e.g. "AGATCGGAAGAGC").
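The three accepted forms of the adapter parameters (null, a preset name, or an explicit sequence) can be told apart with a small check before submitting a job. This is only an illustrative sketch: the function name and the returned labels are hypothetical, and the workflow itself may validate the value differently.

```python
import re

# Preset names listed in the parameter documentation.
PRESETS = {"illumina", "nextera", "small_rna"}


def classify_adapter(value):
    """Classify an adapter1/adapter2 value as documented for the workflow."""
    if value is None:
        return "autodetect"   # trim_galore tries to detect the adapter
    if value in PRESETS:
        return "preset"       # one of the named configurations
    if re.fullmatch(r"[ACGTN]+", value):
        return "sequence"     # manually specified adapter sequence
    raise ValueError(f"not a preset name or DNA sequence: {value!r}")
```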

bin_size Integer

Bin size used for generating coverage tracks. The larger the bin size, the smaller the coverage track files, but the less precise the signal. For single-bp resolution, set to 1.
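The trade-off can be made concrete by counting bins: a sequence of a given length is covered by the length divided by the bin size, rounded up. The sketch below is plain arithmetic with a hypothetical helper name; actual track file sizes also depend on compression and on how adjacent bins with equal signal are stored.

```python
def n_bins(sequence_length, bin_size):
    """Number of bins needed to cover a sequence (ceiling division)."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    return -(-sequence_length // bin_size)


# A 1,000,000 bp sequence at three resolutions.
bins_per_resolution = {size: n_bins(1_000_000, size) for size in (1, 10, 50)}
```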

sample_id String

Sample ID used for naming the output files.

genome_info File

Path to a tab-delimited file listing chromosome sizes in the following format:
 chromosome_name<tab>total_number_of_bp
For the most common UCSC genome builds, you can find corresponding files at: https://github.com/CompEpigen/ATACseq_workflows/tree/master/chrom_sizes. Alternatively, you can generate them yourself using the UCSC script fetchChromSizes (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes) as follows:
 fetchChromSizes hg38 > hg38.chrom.sizes
If you are dealing with a non-UCSC build, you can generate such a file from a samtools index using:
 awk -v OFS='\t' '{print $1,$2}' hg38.fa.fai > hg38.chrom.sizes
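Where awk is not available, the same conversion from a samtools ".fai" index can be sketched in Python. The function name is hypothetical; it relies only on the fact that the first two tab-separated columns of a ".fai" file are the sequence name and its length.

```python
def fai_to_chrom_sizes(fai_path, out_path):
    """Write name<tab>length for every sequence listed in a .fai index."""
    with open(fai_path) as fai, open(out_path, "w") as out:
        for line in fai:
            if not line.strip():
                continue  # skip blank lines
            # Columns 1 and 2 of a .fai file: sequence name, length in bp.
            name, length = line.rstrip("\n").split("\t")[:2]
            out.write(f"{name}\t{length}\n")
```

For example, fai_to_chrom_sizes("hg38.fa.fai", "hg38.chrom.sizes") would produce the file expected by this parameter.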

macs2_qvalue Float

Q-value cutoff used for peak calling by MACS2. The default is 0.05.

effective_genome_size Long

The effectively mappable genome size. For details, see: https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html

ignoreForNormalization String (Optional)

Space-delimited list of chromosome names to ignore when calculating the scaling factor, given as a single string. Default: "chrX chrY chrM"

max_mapping_insert_length Long

Maximum insert length between the two reads of a pair. ATAC-seq can produce very long inserts, so a value of at least 1500 is recommended. Note, however, that alignment takes significantly longer with larger maximum insert lengths. The default is 2500.
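The inputs listed above are supplied to a CWL runner as a job file, which may be written in JSON. The sketch below assembles one in Python. All paths and the sample name are hypothetical, bin_size=10 is an arbitrary illustration rather than a documented default, and the effective genome size shown is the GRCh38 value as remembered from the deepTools table, which should be verified against the linked page. The other defaults are the ones stated in the parameter descriptions.

```python
import json


def cwl_file(path):
    """Wrap a path in the File object form used in CWL job files."""
    return {"class": "File", "path": path}


def build_job(sample_id, fastq1, fastq2, genome, genome_info,
              effective_genome_size, bin_size=10, macs2_qvalue=0.05,
              max_mapping_insert_length=2500,
              ignore_for_normalization="chrX chrY chrM",
              adapter1=None, adapter2=None):
    """Assemble the input object for ATACseq.cwl as a plain dict."""
    if len(fastq1) != len(fastq2):
        raise ValueError("fastq1 and fastq2 must have the same length")
    return {
        "sample_id": sample_id,
        "fastq1": [cwl_file(p) for p in fastq1],
        "fastq2": [cwl_file(p) for p in fastq2],
        "genome": cwl_file(genome),
        "genome_info": cwl_file(genome_info),
        "adapter1": adapter1,  # None is serialized as JSON null
        "adapter2": adapter2,
        "bin_size": bin_size,
        "macs2_qvalue": macs2_qvalue,
        "effective_genome_size": effective_genome_size,
        "ignoreForNormalization": ignore_for_normalization,
        "max_mapping_insert_length": max_mapping_insert_length,
    }


job = build_job(
    sample_id="sample1",                    # hypothetical
    fastq1=["lane1_R1.fastq.gz"],           # hypothetical paths
    fastq2=["lane1_R2.fastq.gz"],
    genome="hg38.fa",
    genome_info="hg38.chrom.sizes",
    effective_genome_size=2913022398,       # assumed GRCh38 value, verify
)
job_json = json.dumps(job, indent=2)
print(job_json)
```

Saved to a file such as job.json, the result could then be passed to a runner, for example with "cwltool ATACseq.cwl job.json".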

Steps

ID Runs Label Doc
trim_and_map
merge_duprem_filter
qc_plot_fingerprint ../tools/deeptools_plotFingerprint.cwl (CommandLineTool)
converting_bam_to_bedpe ../tools/bedtools_bamtobed_pe.cwl (CommandLineTool)
qc_phantompeakqualtools ../tools/phantompeakqualtools.cwl (CommandLineTool)
create_summary_qc_report ../tools/multiqc_hack.cwl (CommandLineTool)
peak_calling_macs2_broad ../tools/macs2_callpeak_atac.cwl (CommandLineTool)
name_sorting_filtered_bam ../tools/samtools_sort_name.cwl (CommandLineTool)

Sort a bam file by read names.

peak_calling_macs2_narrow ../tools/macs2_callpeak_atac.cwl (CommandLineTool)
generating_coverage_tracks
generating_atac_signal_tags ../tools/generate_atac_signal_tags.cwl (CommandLineTool)
plot_fragment_size_distribution ../tools/plot_frag_size_distr.cwl (CommandLineTool)

Outputs

ID Type Label Doc
bam File
bowtie2_log File[]
multiqc_zip File
multiqc_html File
raw_fastqc_zip 4aa49dc258639f3a14c4280a60b832b1[]
bam_signal_tags File[]
raw_fastqc_html af60630199b1f7d1e8010ad76ec2fcf6[]
trim_galore_log 613ac2733c4c95f1e8f09468c2c152ee[]
duprem_fastqc_zip File[]
qc_crosscorr_plot File (Optional)
bigwig_signal_tags File[]
duprem_fastqc_html File[]
fragment_sizes_tsv File
picard_markdup_log File
trimmed_fastqc_zip c408137d37ea325cd987d3fc6d18e1f2[]
filtering_stats_tsv File
frag_size_distr_tsv File
frag_size_stats_tsv File
trimmed_fastqc_html af791f86c6001d5bac127a8d12d2d7e0[]
frag_size_distr_plot File
irreg_mappings_bedpe File
qc_crosscorr_summary File (Optional)
peaks_bed_macs2_broad f9d79e4e31e4111ee0eb8373b0e37bdf[]
peaks_xls_macs2_broad File[]
duprem_flagstat_output File
merged_flagstat_output File
peaks_bed_macs2_narrow File[]
peaks_xls_macs2_narrow File
qc_plot_fingerprint_tsv File (Optional)
filtered_flagstat_output File
qc_plot_fingerprint_plot File (Optional)
qc_plot_fingerprint_stderr File
qc_phantompeakqualtools_stderr File (Optional)
Permalink: https://w3id.org/cwl/view/git/da07ef9c506ba921438df0bc9f6e1ee57b7d5910/CWL/workflows/ATACseq.cwl