Workflow: Motif Finding with HOMER with random background regions

Fetched 2023-01-04 15:33:33 GMT

Motif Finding with HOMER with random background regions
---------------------------------------------------

HOMER contains a novel motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm: it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric (or binomial) enrichment calculations to determine motif enrichment. HOMER also tries its best to account for sequence bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to pretty much any nucleic acid motif finding problem.

Here is how we generate background for Motifs Analysis
-------------------------------------

1. Take an input file with regions in the form "chr" "start" "end"
2. Sort the regions file and remove duplicates
3. Extend each region by 20 kb in both directions
4. Merge all overlapping extended regions
5. Subtract the original (non-extended) regions from the merged extended ones
6. Randomly distribute the original (non-extended) regions within the regions obtained in the previous step
7. Extract a FASTA file from these randomly distributed regions and use it as the background

For more information please refer to:
-------------------------------------

[Official documentation](http://homer.ucsd.edu/homer/motif/)
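The background-generation steps 2 to 6 listed above can be sketched in plain Python on toy intervals. This is only an illustration of the interval logic, not the workflow's implementation, which runs bedtools and sort through CWL tools. All function names, the toy chromosome size, and the placement strategy in `shuffle_within` are assumptions made for this example. In particular, the random placement here does not prevent placed regions from overlapping each other and is not claimed to match the sampling used by bedtools shuffle.

```python
import random
from collections import defaultdict

FLANK = 20_000  # 20 kb extension on each side, as in step 3

def sort_unique(regions):
    """Step 2: drop exact duplicates and sort by (chrom, start, end)."""
    return sorted(set(regions))

def extend(regions, chrom_sizes, flank=FLANK):
    """Step 3: extend each region by `flank` bases, clamped to chromosome bounds."""
    return [(c, max(0, s - flank), min(chrom_sizes[c], e + flank))
            for c, s, e in regions]

def merge(regions):
    """Step 4: merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for c, s, e in sorted(regions):
        if merged and merged[-1][0] == c and s <= merged[-1][2]:
            merged[-1] = (c, merged[-1][1], max(merged[-1][2], e))
        else:
            merged.append((c, s, e))
    return merged

def subtract(a_regions, b_regions):
    """Step 5: report the parts of each A interval not covered by any B interval."""
    b_by_chrom = defaultdict(list)
    for c, s, e in sorted(b_regions):
        b_by_chrom[c].append((s, e))
    result = []
    for c, s, e in a_regions:
        cursor = s  # left edge of the part of A not yet handled
        for bs, be in b_by_chrom[c]:
            if be <= cursor or bs >= e:
                continue  # B interval does not overlap the remaining part of A
            if bs > cursor:
                result.append((c, cursor, bs))
            cursor = max(cursor, be)
        if cursor < e:
            result.append((c, cursor, e))
    return result

def shuffle_within(regions, allowed, seed=None):
    """Step 6: place each region, keeping its length, at a random allowed position."""
    rng = random.Random(seed)  # a fixed seed makes the placement reproducible
    placed = []
    for _, s, e in regions:
        length = e - s
        candidates = [(c, a, b) for c, a, b in allowed if b - a >= length]
        if not candidates:
            continue  # no allowed interval is long enough for this region
        # Weight each interval by its number of valid start positions.
        weights = [b - a - length + 1 for _, a, b in candidates]
        c, a, b = rng.choices(candidates, weights=weights, k=1)[0]
        start = rng.randint(a, b - length)
        placed.append((c, start, start + length))
    return placed

# Toy input: one duplicated region and one unsorted region on a 100 kb chromosome.
chrom_sizes = {"chr1": 100_000}
raw = [("chr1", 45_000, 45_100), ("chr1", 30_000, 30_200), ("chr1", 30_000, 30_200)]
targets = sort_unique(raw)
allowed = subtract(merge(extend(targets, chrom_sizes)), targets)
background = shuffle_within(targets, allowed, seed=42)
print(allowed)
print(background)
```

The background regions keep the lengths of the original regions and fall within 20 kb of them, but never on top of them, which is what makes them a locally matched background.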

Inputs

| ID | Type | Title | Doc |
|---|---|---|---|
| alias | String | Experiment short name/Alias | |
| threads | Integer (Optional) | Threads number | Number of threads for those steps that support multithreading |
| motifs_db | | Set motifs DB to check against | Set motifs DB to check against |
| skip_known | Boolean (Optional) | Skip known motif enrichment | Skip known motif enrichment |
| skip_denovo | Boolean (Optional) | Skip de novo motif enrichment | Skip de novo motif enrichment |
| regions_file | File [BED] | Regions file. Headerless BED file with minimum [chrom start end] columns. Optionally, CSV | Regions of interest. Formatted as headerless BED file with minimum [chrom start end] columns. Optionally, CSV |
| use_binomial | Boolean (Optional) | Use binomial distribution instead of hypergeometric to calculate p-values | Use binomial distribution instead of hypergeometric to calculate p-values |
| chrom_length_file | File [Textual format] | Chromosome length file | Chromosome length file |
| genome_fasta_file | File [FASTA] | Reference genome FASTA file | Reference genome FASTA file. Includes all chromosomes in a single file |
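
To illustrate what the `use_binomial` input switches between, here is a minimal sketch of ZOOPS-style counting followed by hypergeometric and binomial upper-tail p-values. It is not HOMER's code: the function names and toy sequences are made up for this example, and exact substring matching stands in for HOMER's motif matching. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb

def zoops_count(sequences, motif):
    """Count sequences containing the motif at least once (zero or one per sequence)."""
    return sum(1 for seq in sequences if motif in seq.upper())

def hypergeom_sf(k, n_target, n_total, k_total):
    """P(X >= k) when n_target sequences are drawn without replacement from
    n_total sequences, k_total of which contain the motif."""
    upper = min(n_target, k_total)
    total = comb(n_total, n_target)
    hits = sum(comb(k_total, i) * comb(n_total - k_total, n_target - i)
               for i in range(k, upper + 1))
    return hits / total

def binom_sf(k, n_target, p):
    """P(X >= k) for n_target independent sequences, each containing the
    motif with probability p (here estimated from the background)."""
    return sum(comb(n_target, i) * p**i * (1 - p)**(n_target - i)
               for i in range(k, n_target + 1))

# Toy data (hypothetical sequences, exact-match motif).
motif = "ACGTG"
target = ["ACGTGACC", "TTGACGTG", "CCCCCCCC", "ACGTGACGTG"]
background = ["TTTTTTTT", "GGGGACGTG", "AAAAAAAA", "CCCCGGGG", "TATATATA", "GCGCGCGC"]

k_target = zoops_count(target, motif)          # last target sequence counts once
k_background = zoops_count(background, motif)

p_hyper = hypergeom_sf(k_target, len(target),
                       len(target) + len(background),
                       k_target + k_background)
p_binom = binom_sf(k_target, len(target), k_background / len(background))
print(k_target, k_background, p_hyper, p_binom)
```

The hypergeometric test treats the target set as a draw without replacement from the combined pool of sequences, while the binomial test treats each target sequence as an independent trial with the background hit rate.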

Steps

ID Runs Label Doc
find_motifs
../tools/homer-find-motifs.cwl (CommandLineTool)

HOMER contains a novel motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm: it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric (or binomial) enrichment calculations to determine motif enrichment. HOMER also tries its best to account for sequence bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to pretty much any nucleic acid motif finding problem.

Only selected parameters are implemented.

make_unique
../tools/custom-bash.cwl (CommandLineTool)

Tool to run custom script set as `script` input with arguments from `param`. Default script runs sed command over the input file and exports results to the file with the same name as input's basename

bedtools_slop
../tools/bedtools-slop.cwl (CommandLineTool)

Increases the size of each feature in a feature file by a user-defined number of bases. If not using -b, then -l and -r should be used together

bedtools_sort
../tools/linux-sort.cwl (CommandLineTool)

Tool sorts data from `unsorted_file` by key

`default_output_filename` function returns file name identical to `unsorted_file`, if `output_filename` is not provided.

bedtools_merge
../tools/bedtools-merge.cwl (CommandLineTool)

Merges features from BED file. Only selected parameters are implemented.

bedtools_shuffle
../tools/bedtools-shuffle.cwl (CommandLineTool)

Randomly permutes the genomic locations of a feature file among a genome defined in a genome file. One can also provide an “exclusions” BED file that lists regions where the permuted features should not be placed, or instead an “inclusions” BED file that defines the coordinates in which features in -i should be randomly placed. To make the experiment reproducible, set the "seed" option. NOTE: only a limited set of parameters is implemented.

bedtools_subtract
../tools/bedtools-subtract.cwl (CommandLineTool)

Searches for features in B that overlap A by at least 1 base pair. If an overlapping feature is found in B, the overlapping portion is removed from A and the remaining portion of A is reported. If a feature in B overlaps all of a feature in A, the A feature will not be reported. All parameters other than -a and -b are left at their defaults.

bedtools_get_fasta_target
../tools/bedtools-getfasta.cwl (CommandLineTool)

Extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file. Only selected parameters are implemented.

bedtools_get_fasta_background
../tools/bedtools-getfasta.cwl (CommandLineTool)

Extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file. Only selected parameters are implemented.
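
The two getfasta steps above turn BED intervals into sequences for the target and background sets. A minimal sketch of that extraction follows, assuming the genome fits in memory and that regions use BED coordinates (0-based start, end not included). The helper names `read_fasta` and `get_fasta` and the toy genome are hypothetical. The `chrom:start-end` record names follow the default naming of bedtools getfasta.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}, using the first
    word of each header line as the name."""
    parts, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}

def get_fasta(genome, regions):
    """Return (record_name, sequence) pairs for BED-style intervals."""
    records = []
    for chrom, start, end in regions:
        # BED start is 0-based and end is exclusive, which matches Python slicing.
        records.append((f"{chrom}:{start}-{end}", genome[chrom][start:end]))
    return records

# Toy genome with two short "chromosomes" (made-up sequences).
genome = read_fasta([">chr1 toy", "ACGTAC", "GTTT", ">chr2", "GGGCCC"])
records = get_fasta(genome, [("chr1", 2, 8), ("chr2", 0, 3)])
for name, seq in records:
    print(f">{name}\n{seq}")
```

Running the same extraction once on the original regions and once on the shuffled regions yields the two sequence sets that a differential motif search compares.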

Outputs

| ID | Type | Label | Doc |
|---|---|---|---|
| homer_stderr_log | File [Textual format] | Homer stderr log | Homer stderr log |
| homer_stdout_log | File [Textual format] | Homer stdout log | Homer stdout log |
| homer_found_motifs | File | Compressed file with Homer motifs | Homer motifs |
| homer_known_motifs | File (Optional) [HTML] | Known motifs | Known motifs html file |
| homer_denovo_motifs | File (Optional) [HTML] | de novo motifs | de novo motifs html file |

Permalink: https://w3id.org/cwl/view/git/935a78f1aff757f977de4e3672aefead3b23606b/workflows/homer-motif-analysis.cwl