Workflow: Motif Finding with HOMER with target and background regions from peaks
Motif Finding with HOMER with target and background regions from peaks
---------------------------------------------------

HOMER contains a novel motif discovery algorithm that was designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm: it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. It uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric (or binomial) enrichment calculations to determine motif enrichment. HOMER also tries to account for sequence bias in the dataset. It was designed with ChIP-Seq and promoter analysis in mind, but can be applied to almost any nucleic acid motif-finding problem.

For more information, please refer to the [Official documentation](http://homer.ucsd.edu/homer/motif/).
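To make the ZOOPS idea concrete, the following is a minimal Python sketch, not HOMER's implementation. It assumes a motif is an exact k-mer matched on either strand (HOMER itself scores and optimizes degenerate motif matrices) and counts each sequence at most once. It then computes an upper-tail hypergeometric p-value over the pooled target + background set and a binomial p-value that uses the background hit rate as the per-sequence probability. The helper names and the +1/+2 pseudocount on the background rate are assumptions made for this illustration.

```python
from math import comb

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def zoops_count(sequences, motif):
    """ZOOPS: a sequence counts once if the motif occurs at least once on either strand."""
    rc = reverse_complement(motif)
    return sum(1 for s in sequences if motif in s or rc in s)


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(successes, draws)
    ways = sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k, upper + 1))
    return ways / comb(total, draws)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(target, background, motif):
    t_hits = zoops_count(target, motif)
    b_hits = zoops_count(background, motif)
    n_t, n_b = len(target), len(background)
    # Hypergeometric: probability of seeing at least t_hits motif-bearing sequences
    # when drawing n_t sequences from the pooled target + background set.
    p_hyper = hypergeom_sf(t_hits, n_t + n_b, t_hits + b_hits, n_t)
    # Binomial: background hit rate as the per-sequence success probability
    # (pseudocount is an assumption of this sketch, used to avoid p = 0).
    p_bg = (b_hits + 1) / (n_b + 2)
    p_binom = binom_sf(t_hits, n_t, p_bg)
    return {"t_hits": t_hits, "b_hits": b_hits, "p_hyper": p_hyper, "p_binom": p_binom}
```

For example, `motif_enrichment(["TTGACTCAT"] * 5, ["CCCCCCCCC"] * 5, "TGACTCA")` finds the motif in all five targets and no background sequence, giving a hypergeometric p-value of 1/252. A sequence with several copies of the motif still contributes only one hit, which is the point of ZOOPS scoring.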
Inputs
ID | Type | Title | Doc |
---|---|---|---|
alias | String | Experiment short name/Alias | |
threads | Integer (Optional) | Threads number | Number of threads for those steps that support multithreading |
motifs_db | | Set motifs DB to check against | Set motifs DB to check against |
skip_known | Boolean (Optional) | Skip known motif enrichment | Skip known motif enrichment |
search_size | String (Optional) | Fragment size to use for motif finding. Can be set as `<#>` or `<#,#>` | Fragment size to use for motif finding. `<#>`: e.g. `-size 300` gets sequences from -150 to +150 relative to the center. `<#,#>`: e.g. `-size -100,50` gets sequences from -100 to +50 relative to the center. `given`: uses the exact regions provided. Default: 200 |
skip_denovo | Boolean (Optional) | Skip de novo motif enrichment | Skip de novo motif enrichment |
motif_length | String (Optional) | Motif length(s) for de novo motif discovery | `<#>[,<#>,<#>...]`: motif length(s). Default: 8,10,12 |
regions_files_a | File[] [ENCODE narrow peak format] | Samples to select target regions from | Narrow peak files to select target regions from |
regions_files_b | File[] [ENCODE narrow peak format] | Samples to select background regions from | Narrow peak files to select background regions from |
genome_fasta_file | File [FASTA] | Reference genome FASTA file | Reference genome FASTA file. Includes all chromosomes in a single file |
use_hypergeometric | Boolean (Optional) | Use hypergeometric for p-values instead of the default binomial. Useful if there are fewer background sequences than target sequences | Use the hypergeometric distribution for p-values instead of the default binomial. Useful if there are fewer background sequences than target sequences |
diff_regions_file_a | File [BED] | Target region ranges. Headerless BED file with at least [chrom start end name dummy strand] columns. Optionally, CSV | Target region ranges. Headerless BED file with at least [chrom start end unique_id dummy strand] columns. Optionally, CSV |
diff_regions_file_b | File [BED] | Background region ranges. Headerless BED file with at least [chrom start end name dummy strand] columns. Optionally, CSV | Background region ranges. Headerless BED file with at least [chrom start end unique_id dummy strand] columns. Optionally, CSV |
apply_mask_on_genome | Boolean (Optional) | Mask all repeats with N | Use the repeat-masked sequence (all repeats will be masked by N) |
min_signal_regions_a | String (Optional) | Min signalValue for peaks selected as target regions | Discard all peaks from the narrowPeak files of target regions with a signalValue smaller than this threshold |
min_signal_regions_b | String (Optional) | Min signalValue for peaks selected as background regions | Discard all peaks from the narrowPeak files of background regions with a signalValue smaller than this threshold |
chopify_background_regions | Boolean (Optional) | Chop up large background regions to the average size of target regions | Chop up large background regions to the average size of target regions |
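Several of these inputs correspond to options of HOMER's `findMotifsGenome.pl` script (`-size`, `-len`, `-p`, `-bg`, `-mset`, `-noknown`, `-nomotif`, `-h`, `-mask`, `-chopify`). The Python sketch below shows one plausible mapping from workflow inputs to a command line. It is an assumption, not the workflow's actual command, which is defined in `../tools/homer-find-motifs-genome.cwl` and not shown here. The function name `build_find_motifs_command`, the `threads=1` default, and passing `motifs_db` through `-mset` are hypothetical. The `search_size` and `motif_length` defaults come from the table above.

```python
def build_find_motifs_command(target_bed, genome_fasta, output_dir, *,
                              background_bed=None, search_size="200",
                              motif_length="8,10,12", threads=1, motifs_db=None,
                              skip_known=False, skip_denovo=False,
                              use_hypergeometric=False, apply_mask_on_genome=False,
                              chopify_background_regions=False):
    """Hypothetical mapping of workflow inputs to a findMotifsGenome.pl argument list."""
    cmd = [
        "findMotifsGenome.pl", target_bed, genome_fasta, output_dir,
        "-size", search_size,      # <#>, <#,#> or "given"
        "-len", motif_length,      # comma-separated motif lengths
        "-p", str(threads),        # number of CPUs
    ]
    if background_bed:
        cmd += ["-bg", background_bed]   # explicit background regions
    if motifs_db:
        cmd += ["-mset", motifs_db]      # known-motif collection (assumed mapping)
    if skip_known:
        cmd.append("-noknown")           # skip known motif enrichment
    if skip_denovo:
        cmd.append("-nomotif")           # skip de novo motif discovery
    if use_hypergeometric:
        cmd.append("-h")                 # hypergeometric instead of binomial
    if apply_mask_on_genome:
        cmd.append("-mask")              # use repeat-masked sequence
    if chopify_background_regions:
        cmd.append("-chopify")           # chop background to avg target size
    return cmd
```

The function only builds the argument list. On a machine with HOMER installed, the list could be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a shell string avoids quoting problems with file paths.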
Steps
ID | Runs | Label | Doc |
---|---|---|---|
find_motifs | ../tools/homer-find-motifs-genome.cwl (CommandLineTool) | | Runs HOMER differential motif discovery (DNA only) to identify regulatory elements enriched in the target sequences relative to the background sequences, using ZOOPS scoring with hypergeometric or binomial enrichment calculations |
dedup_and_sort_diff_regions_a | ../tools/custom-bash.cwl (CommandLineTool) | | Runs the custom script set in the `script` input with arguments from `param`. The default script runs a `sed` command over the input file and writes the result to a file with the same name as the input's basename |
dedup_and_sort_diff_regions_b | ../tools/custom-bash.cwl (CommandLineTool) | | Runs the custom script set in the `script` input with arguments from `param`. The default script runs a `sed` command over the input file and writes the result to a file with the same name as the input's basename |
concat_dedup_and_sort_regions_a | ../tools/custom-bash.cwl (CommandLineTool) | | Runs the custom script set in the `script` input with arguments from `param`. The default script runs a `sed` command over the input file and writes the result to a file with the same name as the input's basename |
concat_dedup_and_sort_regions_b | ../tools/custom-bash.cwl (CommandLineTool) | | Runs the custom script set in the `script` input with arguments from `param`. The default script runs a `sed` command over the input file and writes the result to a file with the same name as the input's basename |
get_overlapped_with_diff_regions_a | ../tools/bedtools-intersect.cwl (CommandLineTool) | | Intersects features from files A and B. Only selected parameters are implemented |
get_overlapped_with_diff_regions_b | ../tools/bedtools-intersect.cwl (CommandLineTool) | | Intersects features from files A and B. Only selected parameters are implemented |
merge_overlapped_with_diff_regions_a | ../tools/bedtools-merge.cwl (CommandLineTool) | | Merges features from a BED file. Only selected parameters are implemented |
merge_overlapped_with_diff_regions_b | ../tools/bedtools-merge.cwl (CommandLineTool) | | Merges features from a BED file. Only selected parameters are implemented |
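The step names suggest the following flow for each side (target `_a`, background `_b`). The narrowPeak files are concatenated, filtered by `min_signal_regions_*`, deduplicated, and sorted. The differential regions are deduplicated and sorted. Peaks overlapping the differential regions are kept (bedtools intersect) and then merged (bedtools merge). This reading is inferred from the step names and input descriptions. The actual scripts and bedtools parameters live in the tool files and are not shown here.

The Python sketch below mimics that flow on in-memory `(chrom, start, end)` tuples with BED half-open coordinates. It assumes the intersect step reports each peak once if it overlaps any differential region, similar to `bedtools intersect -u`. It also assumes the merge step joins overlapping and book-ended regions, as `bedtools merge` does by default. All helper names are hypothetical.

```python
from collections import defaultdict


def read_narrowpeak(lines, min_signal=None):
    """Parse narrowPeak lines, dropping peaks whose signalValue (7th column) < min_signal."""
    regions = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if min_signal is not None and float(fields[6]) < min_signal:
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def dedup_and_sort(regions):
    """Drop exact duplicates and sort by chrom, then start, then end."""
    return sorted(set(regions))


def intersect_any(a, b):
    """Keep each region of `a` that overlaps at least one region of `b`, reported once."""
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in a
        if any(start < b_end and b_start < end for b_start, b_end in by_chrom[chrom])
    ]


def merge(regions):
    """Merge overlapping or book-ended regions on the same chromosome."""
    merged = []
    for chrom, start, end in dedup_and_sort(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def select_regions(peak_files, diff_regions, min_signal=None):
    """Hypothetical per-side selection: concat + filter + dedup/sort, intersect, merge."""
    peaks = dedup_and_sort(
        [r for lines in peak_files for r in read_narrowpeak(lines, min_signal)]
    )
    return merge(intersect_any(peaks, dedup_and_sort(diff_regions)))
```

Running `select_regions` once with `regions_files_a` and `diff_regions_file_a`, and once with the `_b` counterparts, would yield the target and background region sets that the `find_motifs` step compares.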
Outputs
ID | Type | Label | Doc |
---|---|---|---|
homer_stderr_log | File [Textual format] | HOMER stderr log | HOMER stderr log |
homer_stdout_log | File [Textual format] | HOMER stdout log | HOMER stdout log |
homer_found_motifs | File | Compressed file with HOMER motifs | HOMER motifs |
homer_known_motifs | File (Optional) [HTML] | Known motifs | Known motifs HTML file |
homer_denovo_motifs | File (Optional) [HTML] | De novo motifs | De novo motifs HTML file |
https://w3id.org/cwl/view/git/9e3c3e65c19873cd1ed3cf7cc3b94ebc75ae0cc5/workflows/homer-motif-analysis-peak.cwl