Motif Finding with HOMER with custom background regions

Fetched 2023-01-09 01:02:06 GMT

Verified with cwltool version 3.1.20221201130942

HOMER contains a novel motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm: it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. In this workflow, the second set is supplied explicitly as custom background regions. HOMER uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric or binomial enrichment calculations to determine motif enrichment. It also tries to account for sequencing bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to almost any nucleic acid motif finding problem.

For more information, please refer to the [Official documentation](http://homer.ucsd.edu/homer/motif/).
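To make the ZOOPS and enrichment ideas above concrete, here is a minimal Python sketch. It is not HOMER's algorithm: exact string matching on both strands stands in for HOMER's position weight matrix scoring, and the helper names are hypothetical. Each sequence counts at most once (ZOOPS), and a one-sided p-value for enrichment in the target set is computed either with the hypergeometric distribution (targets treated as a draw from the pooled target plus background sequences) or with a binomial whose success probability is the background hit rate.

```python
from math import comb

# Simplified illustration only: exact matching replaces HOMER's PWM scoring.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def zoops_count(sequences, motif: str) -> int:
    """Number of sequences with at least one hit on either strand.

    ZOOPS: a sequence contributes 0 or 1, however many hits it contains.
    """
    fwd = motif.upper()
    rev = revcomp(fwd)
    return sum(1 for s in sequences if fwd in s.upper() or rev in s.upper())


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n targets are drawn from N sequences, K of which have the motif."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def binomial_pvalue(k: int, n: int, p: float) -> float:
    """P(X >= k) for n targets, each a hit with background rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


target = ["GATTACAGATTACA", "CCTGTAATCGG", "AAAAAAA"]
background = ["CCCCCCC", "GATTACA", "TTTTTTT"]
motif = "GATTACA"

k, n = zoops_count(target, motif), len(target)          # 2 of 3 targets hit
b, m = zoops_count(background, motif), len(background)  # 1 of 3 background hit
print(hypergeom_pvalue(k, n, k + b, n + m))  # 0.5
print(binomial_pvalue(k, n, b / m))          # ~0.259 (7/27)
```

The first target has two motif copies but is counted once, and the second matches only on the reverse strand. With so few sequences, neither p-value indicates enrichment.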

This workflow is Open Source and may be reused according to the terms of: Apache License 2.0

Note that the tools invoked by the workflow may have separate licenses.

Inputs

ID	Type	Title	Doc
alias	String	Experiment short name/Alias
threads	Integer (Optional)	Threads number	Number of threads for those steps that support multithreading
motifs_db		Set motifs DB to check against	Set motifs DB to check against
skip_known	Boolean (Optional)	Skip known motif enrichment	Skip known motif enrichment
search_size	String (Optional)	Fragment size to use for motif finding. Can be set as <#>, <#,#> or given	Fragment size to use for motif finding. <#>: e.g. -size 300 gets sequences from -150 to +150 relative to the region center. <#,#>: e.g. -size -100,50 gets sequences from -100 to +50 relative to the center. given: uses the exact regions provided. Default=200
skip_denovo	Boolean (Optional)	Skip de novo motif enrichment	Skip de novo motif enrichment
motif_length	String (Optional)	Motif length(s) for de novo motif discovery	<#>[,<#>,<#>...] - motif length. Default=8,10,12
genome_fasta_file	File [FASTA]	Reference genome FASTA file	Reference genome FASTA file. Includes all chromosomes in a single file
use_hypergeometric	Boolean (Optional)	Use hypergeometric for p-values, instead of the default binomial. Useful if the number of background sequences is smaller than the number of target sequences	Use hypergeometric for p-values, instead of the default binomial. Useful if the number of background sequences is smaller than the number of target sequences
target_regions_file	File [BED]	Target regions. Headerless BED file with minimum [chrom start end name dummy strand] columns. Optionally, CSV	Target regions. Headerless BED file with minimum [chrom start end unique_id dummy strand] columns. Optionally, CSV
apply_mask_on_genome	Boolean (Optional)	Mask all repeats with N	Use the repeat-masked sequence (all repeats will be masked by N)
background_regions_file	File [BED]	Background regions. Headerless BED file with minimum [chrom start end name dummy strand] columns. Optionally, CSV	Background regions. Headerless BED file with minimum [chrom start end unique_id dummy strand] columns. Optionally, CSV
chopify_background_regions	Boolean (Optional)	Chop up large background regions to the average size of the target regions	Chop up large background regions to the average size of the target regions
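The `search_size` forms described above can be illustrated with a small Python sketch that converts a region into the window that would be scanned. It assumes the region center is the integer midpoint of `start` and `end` and does no clipping to chromosome bounds; `search_window` is a hypothetical helper, not part of HOMER or this workflow, and HOMER's own rounding conventions may differ.

```python
def search_window(start: int, end: int, size: str = "200") -> tuple[int, int]:
    """Return the (start, end) window implied by a search_size value.

    Assumption: center = integer midpoint; no clipping to chromosome ends.
    """
    if size == "given":
        # use the region exactly as provided
        return start, end
    center = (start + end) // 2
    if "," in size:
        # "<#,#>": explicit offsets relative to the center, e.g. "-100,50"
        left, right = (int(x) for x in size.split(","))
    else:
        # "<#>": symmetric window, e.g. "300" -> -150..+150
        half = int(size) // 2
        left, right = -half, half
    return center + left, center + right
```

For a region spanning 1000 to 1400, `search_window(1000, 1400, "300")` gives `(1050, 1350)`, `"-100,50"` gives `(1100, 1250)`, and `"given"` leaves the region unchanged.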

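The `chopify_background_regions` option can likewise be sketched in Python. `chopify` below is a hypothetical helper that assumes regions are `(chrom, start, end)` tuples, uses the mean target length as the piece size, keeps background regions that are not longer than that unchanged, and drops any leftover tail shorter than one piece. HOMER's actual handling of these edge cases may differ.

```python
def chopify(targets, background):
    """Split background regions longer than the mean target length into pieces.

    Hypothetical helper: leftover tails shorter than one piece are dropped.
    """
    lengths = [end - start for _, start, end in targets]
    piece = max(1, round(sum(lengths) / len(lengths)))
    chopped = []
    for chrom, start, end in background:
        if end - start <= piece:
            # already no larger than the target average: keep as-is
            chopped.append((chrom, start, end))
            continue
        # consecutive, non-overlapping windows of exactly `piece` bases
        for s in range(start, end - piece + 1, piece):
            chopped.append((chrom, s, s + piece))
    return chopped
```

With two 200 bp targets, a 650 bp background region becomes three 200 bp pieces and, under this assumption, its last 50 bp are discarded.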
Steps

ID	Runs	Doc
find_motifs	../tools/homer-find-motifs-genome.cwl (CommandLineTool)	HOMER contains a novel motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm: it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric or binomial enrichment calculations to determine motif enrichment. HOMER also tries to account for sequencing bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to almost any nucleic acid motif finding problem. Only selected parameters are implemented.
make_target_regions_unique	../tools/custom-bash.cwl (CommandLineTool)	Tool to run a custom script set as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file with the same name as the input's basename
make_background_regions_unique	../tools/custom-bash.cwl (CommandLineTool)	Tool to run a custom script set as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file with the same name as the input's basename
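Both `make_*_regions_unique` steps appear intended to give each region a unique ID in column 4, as the input descriptions ask for. The default sed script is not shown here, so the following is a hypothetical Python stand-in rather than the workflow's actual command. It appends the row index to each name, which guarantees uniqueness because the text after the last underscore is the row index and row indices never contain an underscore. The `delimiter` parameter covers the optional CSV input.

```python
import csv
import io


def make_names_unique(text: str, delimiter: str = "\t") -> str:
    """Rewrite column 4 of each headerless BED/CSV row as '<name>_<row index>'.

    Hypothetical stand-in for the workflow's custom-bash step.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for i, row in enumerate(reader):
        if not row:  # skip blank lines
            continue
        row[3] = f"{row[3]}_{i}"  # column 4 = name / unique_id
        writer.writerow(row)
    return out.getvalue()
```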

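To show how the workflow inputs could map onto the `find_motifs` step, here is a Python sketch that assembles a `findMotifsGenome.pl` argument list. The flags used (`-size`, `-len`, `-p`, `-bg`, `-mset`, `-noknown`, `-nomotif`, `-h`, `-mask`, `-chopify`) are HOMER options, but the exact mapping used by `homer-find-motifs-genome.cwl` is not shown here and is an assumption, as are the helper name, the default `threads=1`, and passing `motifs_db` to `-mset` (its type is not listed in the Inputs table).

```python
import shlex


def build_find_motifs_cmd(target_bed, background_bed, genome_fasta, out_dir,
                          search_size="200", motif_length="8,10,12", threads=1,
                          motifs_db=None, skip_known=False, skip_denovo=False,
                          use_hypergeometric=False, apply_mask_on_genome=False,
                          chopify_background_regions=False):
    """Assemble a findMotifsGenome.pl argument list from the workflow inputs.

    Assumed mapping; the real CWL tool may differ.
    """
    cmd = ["findMotifsGenome.pl", target_bed, genome_fasta, out_dir,
           "-size", search_size, "-len", motif_length,
           "-p", str(threads), "-bg", background_bed]
    if motifs_db:
        cmd += ["-mset", motifs_db]
    # boolean inputs become bare flags, appended only when enabled
    flags = {"-noknown": skip_known, "-nomotif": skip_denovo,
             "-h": use_hypergeometric, "-mask": apply_mask_on_genome,
             "-chopify": chopify_background_regions}
    cmd += [flag for flag, enabled in flags.items() if enabled]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_find_motifs_cmd(
        "targets.bed", "background.bed", "genome.fa", "homer_out",
        chopify_background_regions=True)))
```

Building a list rather than a single string keeps each argument intact if it is later passed to `subprocess.run`.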
Outputs

ID	Type	Label	Doc
homer_stderr_log	File [Textual format]	Homer stderr log	Homer stderr log
homer_stdout_log	File [Textual format]	Homer stdout log	Homer stdout log
homer_found_motifs	File	Compressed file with Homer motifs	Homer motifs
homer_known_motifs	File (Optional) [HTML]	Known motifs HTML file	Known motifs HTML file
homer_denovo_motifs	File (Optional) [HTML]	de novo motifs HTML file	de novo motifs HTML file

Permalink: https://w3id.org/cwl/view/git/564156a9e1cc7c3679a926c479ba3ae133b1bfd4/workflows/homer-motif-analysis-bg.cwl