Workflow: Motif Finding with HOMER with custom background regions

Fetched 2023-01-09 01:02:06 GMT

Motif Finding with HOMER with custom background regions
---------------------------------------------------

HOMER contains a novel motif discovery algorithm that was designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm, which means that it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric (or binomial) enrichment calculations to determine motif enrichment. HOMER also tries its best to account for sequence bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to pretty much any nucleic acid motif-finding problem.

For more information please refer to:
-------------------------------------

[Official documentation](http://homer.ucsd.edu/homer/motif/)
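The ZOOPS-plus-hypergeometric idea described above can be illustrated with a minimal sketch. It is not HOMER's implementation: the function names are hypothetical, and the motif is matched as an exact substring, whereas HOMER scores sequences against position weight matrices.

```python
from math import comb


def count_sequences_with_motif(sequences, motif):
    """ZOOPS-style count: each sequence contributes at most one hit,
    no matter how many times the motif occurs in it.
    Assumption: exact substring match stands in for real motif scoring."""
    return sum(1 for seq in sequences if motif in seq.upper())


def hypergeom_enrichment_pvalue(n_target, n_background, target_hits, background_hits):
    """Upper-tail hypergeometric p-value: probability of seeing at least
    `target_hits` motif-containing sequences among `n_target` sequences drawn
    without replacement from the pooled target + background set."""
    total = n_target + n_background
    with_motif = target_hits + background_hits
    max_hits = min(n_target, with_motif)
    tail = sum(
        comb(with_motif, k) * comb(total - with_motif, n_target - k)
        for k in range(target_hits, max_hits + 1)
    )
    return tail / comb(total, n_target)


targets = ["ACGTGGGA", "TTGGGATT"]
background = ["ACGTACGT", "TTTTAAAA"]
t_hits = count_sequences_with_motif(targets, "GGGA")
b_hits = count_sequences_with_motif(background, "GGGA")
p_value = hypergeom_enrichment_pvalue(len(targets), len(background), t_hits, b_hits)
```

A small p-value here means the motif is found in more target sequences than expected if target and background were drawn from the same pool.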


Inputs

ID Type Title Doc
alias String Experiment short name/Alias
threads Integer (Optional) Threads number

Number of threads for those steps that support multithreading

motifs_db Set motifs DB to check against

Set motifs DB to check against

skip_known Boolean (Optional) Skip known motif enrichment

Skip known motif enrichment

search_size String (Optional) Fragment size to use for motif finding. Can be set as <#> or <#,#>

Fragment size to use for motif finding. Default=200
<#> - e.g. -size 300 will get sequences from -150 to +150 relative to the center
<#,#> - e.g. -size -100,50 will get sequences from -100 to +50 relative to the center
given - will use the exact regions you give it
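The three forms of `search_size` can be read as offsets around a region's center. The following is a rough sketch of that interpretation; the function names are hypothetical, strand is ignored, and the rounding used for odd widths and for the center is an assumption rather than HOMER's documented behavior.

```python
def parse_search_size(size):
    """Turn a search_size string into (left, right) offsets from the center.
    Returns None for "given", meaning the regions are used exactly as supplied."""
    size = size.strip()
    if size == "given":
        return None
    if "," in size:
        left, right = (int(part) for part in size.split(","))
        return left, right
    half = int(size) // 2  # assumption: odd widths round down
    return -half, half


def resize_region(start, end, size):
    """Apply the offsets to the midpoint of [start, end)."""
    offsets = parse_search_size(size)
    if offsets is None:
        return start, end
    center = (start + end) // 2  # assumption: integer midpoint
    return center + offsets[0], center + offsets[1]
```

For example, a region spanning 1000 to 1200 has its center at 1100, so a size of `300` would give 950 to 1250 under these assumptions.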

skip_denovo Boolean (Optional) Skip de novo motif enrichment

Skip de novo motif enrichment

motif_length String (Optional) Motif length(s) for de novo motif discovery

<#>[,<#>,<#>...] - motif length. Default=8,10,12

genome_fasta_file File [FASTA] Reference genome FASTA file

Reference genome FASTA file. Includes all chromosomes in a single file

use_hypergeometric Boolean (Optional) Use hypergeometric for p-values, instead of default binomial. Useful if the number of background sequences is smaller than the number of target sequences

Use hypergeometric for p-values, instead of default binomial. Useful if the number of background sequences is smaller than the number of target sequences
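For comparison with the hypergeometric option, a binomial enrichment test can be sketched as below. This is an illustration only: the function name is hypothetical, and estimating the per-sequence hit probability directly from the background fraction is an assumption, not necessarily how HOMER derives it.

```python
from math import comb


def binomial_enrichment_pvalue(n_target, target_hits, n_background, background_hits):
    """Upper-tail binomial p-value: probability of at least `target_hits`
    motif-containing sequences among `n_target`, if each sequence independently
    contains the motif with the rate observed in the background."""
    rate = background_hits / n_background  # assumed estimate of the null rate
    return sum(
        comb(n_target, k) * rate**k * (1 - rate) ** (n_target - k)
        for k in range(target_hits, n_target + 1)
    )
```

The binomial form treats the background rate as a fixed, known quantity. That is a reasonable approximation when the background set is large, and a weaker one when it is smaller than the target set, which is where sampling without replacement (the hypergeometric form) is the more natural model.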

target_regions_file File [BED] Target regions. Headerless BED file with minimum [chrom start end name dummy strand] columns. Optionally, CSV

Target regions. Headerless BED file with minimum [chrom start end unique_id dummy strand] columns. Optionally, CSV
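The minimum six-column layout described above can be checked with a small parser. This is a sketch under assumptions: the function name and the returned dictionary keys are hypothetical, the CSV variant is assumed to be the same columns separated by commas, and the accepted strand symbols are `+`, `-` and `.`.

```python
def parse_region_line(line, delimiter="\t"):
    """Parse one headerless region line with at least
    [chrom start end unique_id dummy strand] columns."""
    fields = line.rstrip("\r\n").split(delimiter)
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, region_id, _dummy, strand = fields[:6]
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    if strand not in ("+", "-", "."):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return {"chrom": chrom, "start": start, "end": end, "id": region_id, "strand": strand}


# Tab-separated BED line, then the assumed comma-separated equivalent
bed_region = parse_region_line("chr1\t100\t300\tpeak_1\t0\t+\n")
csv_region = parse_region_line("chr1,100,300,peak_1,0,+", delimiter=",")
```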

apply_mask_on_genome Boolean (Optional) Mask all repeats with N

Use the repeat-masked sequence (all repeats will be masked by N)
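What masking repeats with N means for a sequence can be shown in a few lines. The sketch assumes the reference FASTA is soft-masked, that is, repeats are written in lowercase; the function name is hypothetical.

```python
def mask_repeats(sequence):
    """Replace every lowercase (soft-masked) base with N,
    leaving uppercase bases untouched."""
    return "".join("N" if base.islower() else base for base in sequence)


masked = mask_repeats("ACgtTa")
```

Masked positions can no longer contribute to motif matches, which keeps repetitive elements from dominating the discovered motifs.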

background_regions_file File [BED] Background regions. Headerless BED file with minimum [chrom start end name dummy strand] columns. Optionally, CSV

Background regions. Headerless BED file with minimum [chrom start end unique_id dummy strand] columns. Optionally, CSV

chopify_background_regions Boolean (Optional) Chop up large background regions to the avg size of target regions

Chop up large background regions to the avg size of target regions
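The chopping step can be pictured as follows. This is a minimal sketch, not HOMER's code: regions are `(chrom, start, end)` tuples, the function name is hypothetical, and both the integer average and the shorter leftover piece at the end of a region are assumptions about details the source does not state.

```python
def chopify(background, targets):
    """Split background regions longer than the average target size
    into consecutive pieces of that size."""
    avg = sum(end - start for _, start, end in targets) // len(targets)
    if avg <= 0:
        raise ValueError("average target size must be positive")
    chopped = []
    for chrom, start, end in background:
        if end - start <= avg:
            chopped.append((chrom, start, end))  # already small enough
            continue
        for piece_start in range(start, end, avg):
            # the final piece may be shorter than avg (assumption)
            chopped.append((chrom, piece_start, min(piece_start + avg, end)))
    return chopped


targets = [("chr1", 0, 100), ("chr1", 500, 700)]  # average size 150
background = [("chr2", 0, 400), ("chr2", 1000, 1100)]
pieces = chopify(background, targets)
```

Bringing background regions to roughly the same size as the targets keeps the two sets comparable when motif occurrences are counted per sequence.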

Steps

ID Runs Label Doc
find_motifs
../tools/homer-find-motifs-genome.cwl (CommandLineTool)

HOMER contains a novel motif discovery algorithm that was designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm, which means that it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric (or binomial) enrichment calculations to determine motif enrichment. HOMER also tries its best to account for sequence bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to pretty much any nucleic acid motif-finding problem.

Only selected parameters are implemented.
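A rough sketch of how the workflow inputs might translate into a `findMotifsGenome.pl` command line is given below. The flags are taken from the HOMER documentation, but which input feeds which flag is an assumption (in particular `motifs_db` to `-mset`), as are the function name and the output directory argument; the actual mapping lives in `homer-find-motifs-genome.cwl`.

```python
def build_find_motifs_command(
    target_bed,
    genome_fasta,
    output_dir,  # hypothetical: not a workflow input
    background_bed,
    search_size="200",
    motif_length="8,10,12",
    threads=1,
    motifs_db=None,
    skip_denovo=False,
    skip_known=False,
    use_hypergeometric=False,
    apply_mask_on_genome=False,
    chopify_background_regions=False,
):
    """Assemble the argument list; nothing is executed here."""
    cmd = [
        "findMotifsGenome.pl", target_bed, genome_fasta, output_dir,
        "-bg", background_bed,
        "-size", search_size,
        "-len", motif_length,
        "-p", str(threads),
    ]
    if motifs_db is not None:
        cmd += ["-mset", motifs_db]  # assumed mapping for motifs_db
    # Boolean inputs become bare flags when enabled
    optional_flags = [
        (skip_denovo, "-nomotif"),
        (skip_known, "-noknown"),
        (use_hypergeometric, "-h"),
        (apply_mask_on_genome, "-mask"),
        (chopify_background_regions, "-chopify"),
    ]
    cmd += [flag for enabled, flag in optional_flags if enabled]
    return cmd


command = build_find_motifs_command(
    "targets.bed", "genome.fa", "homer_out", "background.bed",
    search_size="-100,50", threads=4, use_hypergeometric=True,
)
```

Returning a list rather than a single string means a value such as `-100,50` can be passed to `subprocess.run` without any shell quoting.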

make_target_regions_unique
../tools/custom-bash.cwl (CommandLineTool)

Tool to run a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename

make_background_regions_unique
../tools/custom-bash.cwl (CommandLineTool)

Tool to run a custom script, set as the `script` input, with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename
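The source does not show the script these two steps pass to the tool, so the following is only a guess at the intent suggested by their names: making the fourth (ID) column of each region file unique. The function name and the row-number suffix scheme are hypothetical.

```python
def make_region_ids_unique(lines, delimiter="\t"):
    """Append the 1-based row number to the ID column (4th field)
    so that no two regions share an ID."""
    unique_lines = []
    for row_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        fields[3] = f"{fields[3]}_{row_number}"
        unique_lines.append(delimiter.join(fields))
    return unique_lines


regions = [
    "chr1\t0\t10\tpeak\t0\t+",
    "chr1\t20\t30\tpeak\t0\t-",
]
unique_regions = make_region_ids_unique(regions)
```

Suffixing every row, rather than only the repeated names, avoids collisions with IDs that already end in a number.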

Outputs

ID Type Label Doc
homer_stderr_log File [Textual format] Homer stderr log

Homer stderr log

homer_stdout_log File [Textual format] Homer stdout log

Homer stdout log

homer_found_motifs File Compressed file with Homer motifs

Homer motifs

homer_known_motifs File (Optional) [HTML] Known motifs html file

Known motifs html file

homer_denovo_motifs File (Optional) [HTML] de novo motifs html file

de novo motifs html file

Permalink: https://w3id.org/cwl/view/git/564156a9e1cc7c3679a926c479ba3ae133b1bfd4/workflows/homer-motif-analysis-bg.cwl