The ccre_distribution() function annotates genomic
regions with repeat elements using RepeatMasker data, computes
overlap-based repeat class percentages, and generates a summary table
along with an optional visualization.
When a genomic region overlaps both gene features and cCREs, the annotation follows this priority hierarchy:
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
query_grl |
GRangesList | Yes | GRangesList containing query region sets (e.g., biclustering-derived clusters). Each element represents a sample/cluster. Can also accept a single GRanges object | query_grl = my_clusters |
out_dir |
character | No (default: “./”) | Directory to save output files (cCRE tables and plots). Directory will be created if it does not exist | out_dir = "./ccre_results" |
ref_genome |
character | No (default: “hg38”) | Reference genome version. Must be either “hg38” (Human GRCh38) or “mm10” (Mouse GRCm38) | ref_genome = "hg38" |
ref_source |
character | No (default: “knownGene”) | Gene annotation source used to define gene models and TSS coordinates. Options: “knownGene” (UCSC knownGene via TxDb.Hsapiens.UCSC.hg38.knownGene or TxDb.Mmusculus.UCSC.mm10.knownGene) or “GENCODE” (GENCODE v49 for hg38, vM35 for mm10) | ref_source = "knownGene" |
mode |
character | No (default: “nearest”) | Annotation mode for assigning feature classes to query regions. Options: “nearest” (assigns each query to closest feature by center-to-boundary distance) or “weighted” (assigns proportionally by overlap length, can contribute fractionally to multiple classes) | mode = "nearest" |
plot |
logical | No (default: TRUE) | Whether to generate a stacked barplot visualization of cCRE and gene feature distributions | plot = TRUE |
The function generates the following output files in the specified
out_dir:
ccre_annotation.tsv
ccre_annotation.pdf (if plot = TRUE)

The chromhmm_distribution() function annotates genomic
regions with chromatin states using ChromHMM data, computes
overlap-based chromatin state percentages, and generates a summary table
along with an optional visualization.
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
query_grl |
GRangesList | Yes | GRangesList containing query region sets (e.g., biclustering-derived clusters). Each element represents a sample/cluster. Can also accept a single GRanges object | query_grl = my_clusters |
out_dir |
character | No (default: “./”) | Directory to save output files (repeat tables and plots). Directory will be created if it does not exist | out_dir = "./repeat_results" |
ref_genome |
character | No (default: “hg38”) | Reference genome version. Must be either “hg38” (Human GRCh38) or “mm10” (Mouse GRCm38) | ref_genome = "hg38" |
mode |
character | No (default: “nearest”) | Annotation mode for assigning repeat classes to query regions. Options: “nearest” (assigns each query to closest repeat by center-to-boundary distance) or “weighted” (assigns proportionally by overlap length, can contribute fractionally to multiple classes) | mode = "nearest" |
plot |
logical | No (default: TRUE) | Whether to generate a stacked barplot visualization of repeat element distributions | plot = TRUE |
The function generates the following output files in the specified
out_dir:
chromhmm_annotation.tsv
chromhmm_annotation.pdf (if plot = TRUE)
The repeat_distribution() function anotates genomic
regions with chromatin states using ChromHMM data, computes
overlap-based chromatin state percentages, and generates a summary table
along with an optional visualization.
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
query_grl |
GRangesList | Yes | GRangesList containing query region sets (e.g., biclustering-derived clusters). Each element represents a sample/cluster. Can also accept a single GRanges object | query_grl = my_clusters |
out_dir |
character | No (default: “./”) | Directory to save output files (repeat tables and plots). Directory will be created if it does not exist | out_dir = "./repeat_results" |
ref_genome |
character | No (default: “hg38”) | Reference genome version. Must be either “hg38” (Human GRCh38) or “mm10” (Mouse GRCm38) | ref_genome = "hg38" |
mode |
character | No (default: “nearest”) | Annotation mode for assigning repeat classes to query regions. Options: “nearest” (assigns each query to closest repeat by center-to-boundary distance) or “weighted” (assigns proportionally by overlap length, can contribute fractionally to multiple classes) | mode = "nearest" |
plot |
logical | No (default: TRUE) | Whether to generate a stacked barplot visualization of repeat element distributions | plot = TRUE |
The function generates the following output files in the specified
out_dir:
repeat_annotation.tsv
repeat_annotation.pdf (if plot = TRUE)
The get_matched_control() function generates background
control regions by randomly selecting genes with similar length to the
nearest gene of each target peak, then placing control regions at the
same TSS-relative distance on the selected genes. This approach
preserves the relationship between peaks and gene structure while
avoiding biases from using the same genomic neighborhoods.
What this function does:
Identifies the nearest gene for each target genomic region
Finds candidate genes with similar length (within tolerance threshold)
Randomly selects control genes from candidates, excluding the original gene
Places control regions at identical TSS-relative distances on control genes
Validates control regions are within chromosome boundaries
Generates multiple control regions per target (configurable replicates)
Automatically relaxes length tolerance if insufficient candidates found
Returns GRanges object with all generated control regions
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
target_gr |
GRanges | Yes | GRanges object containing target peak regions for which controls will be generated | target_gr = my_peaks |
ref_genome |
character | No (default: “hg38”) | Reference genome version: “hg38” (Human GRCh38) or “mm10” (Mouse GRCm38) | ref_genome = "hg38" |
style |
character | No (default: “UCSC”) | Chromosome naming style (e.g., “UCSC” uses “chr1”, “Ensembl” uses “1”) | style = "UCSC" |
n_rep |
integer | No (default: 1) | Number of control regions to generate per target peak | n_rep = 3 |
regions |
integer | No (default: 800) | Width of control regions in base pairs (centered on calculated position) | regions = 500 |
seed |
integer | No (default: 42) | Random seed for reproducible control region selection | seed = 12345 |
length_tolerance |
numeric | No (default: 0.2) | Initial tolerance for gene length matching (0.2 = ±20%). Automatically relaxed up to ±100% if insufficient candidates | length_tolerance = 0.3 |
n_rep * 2), tolerance
increases by 10% incrementsThe function returns a GRanges object containing all successfully generated control regions.
Object type: GRanges (GenomicRanges package)
Number of regions: Up to
n_rep × length(target_gr) regions
n_rep control
regionsRegion characteristics:
regions parameterstyle parameter
(e.g., “chr1” for UCSC)library(epigenomeR)
library(GenomicRanges)
# Load target peaks
target_peaks <- rtracklayer::import("CTCF_peaks.bed")
# Basic usage - 1 control per target
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "hg38"
)
# Generate multiple controls per target
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "hg38",
n_rep = 5,
regions = 500
)
# Mouse genome with relaxed length tolerance
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "mm10",
n_rep = 3,
length_tolerance = 0.3,
seed = 123
)
The calculate_TFBS_enrichment() function performs motif
enrichment analysis comparing a set of target genomic regions against
control regions using Fisher’s exact test to identify significantly
enriched transcription factor binding motifs.
What this function does:
Compares target genomic regions (GRanges) against background/control regions (GRanges)
Works directly with GRanges objects without file I/O for region inputs
Overlaps regions with a comprehensive motif library (stored as RDS files)
Optionally filters motif sites using functional region annotations (GRanges)
Counts motif occurrences in target vs control regions using
countOverlaps()
Performs Fisher’s exact test for each motif with pseudocount correction
Calculates odds ratios with standard errors
Adjusts p-values for multiple testing using Benjamini-Hochberg FDR
Identifies transcription factors with significantly enriched binding sites
Provides quantitative enrichment statistics for downstream interpretation
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
target_region |
GRanges | Yes | GRanges object containing genomic regions of interest | target_region = my_peaks |
control_region |
GRanges | Yes | GRanges object containing background/control regions for comparison | control_region = background_peaks |
functional_region |
GRanges or NULL | No (default: NULL) | Optional GRanges object with functional regions for filtering motif sites. Only motifs overlapping these regions are considered | functional_region = open_chromatin |
out_path |
character | No (default: “./TFBS_enrichment.tsv”) | File path to save enrichment results table (TSV format) | out_path = "./results/motif_enrichment.tsv" |
regions |
integer or NULL | No (default: NULL) | Size in base pairs to resize all regions around their center point. If NULL, regions are used as-is | regions = 500 |
ref_genome |
character | No (default: “hg38”) | Reference genome version: “hg38” or “mm10” | ref_genome = "hg38" |
style |
character or NULL | No (default: NULL) | Chromosome naming style (e.g., “UCSC”, “Ensembl”). If NULL, auto-detects from target_region | style = "UCSC" |
style is NULL, uses the style from
target_regionregions
parameterregions = NULL (default): Regions are used with
their original widthsregions is specified (e.g., 500): Both target and
control regions are resized to the specified width, centered on each
region’s midpoint.The function generates the following output file at the specified
output_path:
<output_path> (TSV format)
| feature | odds_ratio | pvalue | odds_ratio_se | FDR | target_hit | control_hit |
|---|---|---|---|---|---|---|
| (CRISPR targeting AFF4)AFF4 | 16.14696 | 2.348799e-123 | 0.1066921 | 1.327293e-122 | 194 | 768 |
| (CRISPR targeting ARID3B)ARID3B | 35.14312 | 1.320588e-76 | 0.1620585 | 3.858694e-76 | 78 | 95 |
| (CRISPR targeting ARID4B)ARID4B | 34.39295 | 7.107417e-223 | 0.1229221 | 4.257343e-221 | 308 | 1241 |
| (CRISPR targeting ATF1)ATF1 | 18.37715 | 3.968721e-74 | 0.1327106 | 1.132030e-73 | 98 | 242 |
| (CRISPR targeting ATF2)ATF2 | 18.67509 | 6.632538e-33 | 0.1978719 | 1.119124e-32 | 40 | 83 |
| (CRISPR targeting ATF3)ATF3 | 23.37675 | 9.848921e-136 | 0.1117167 | 7.023218e-135 | 174 | 444 |
library(epigenomeR)
target_region <- rtracklayer::import("target_peaks.bed")
control_region <- rtracklayer::import("control_peaks.bed")
# Basic enrichment analysis (using original region widths)
calculate_TFBS_enrichment(
target_region = target_region,
control_region = control_region,
out_path = "./TFBS_enrichment.tsv",
ref_genome = "hg38"
)
# With uniform region resizing (200bp centered windows)
calculate_TFBS_enrichment(
target_region = target_region,
control_region = control_region,
regions = 200,
out_path = "./TFBS_enrichment_200bp.tsv",
ref_genome = "hg38"
)
# With functional region filtering (e.g., open chromatin)
functional_region <- rtracklayer::import("ATAC_peaks.bed")
calculate_TFBS_enrichment(
target_region = target_region,
control_region = control_region,
regions = 500,
functional_region = functional_region,
out_path = "./TFBS_enrichment_filtered.tsv",
ref_genome = "hg38"
)
The TFBS_enrichment_heatmap() function generates
heatmaps to visualize transcription factor binding site (TFBS)
enrichment patterns across multiple genomic clusters or experimental
conditions, using results from TFBS_enrichment()
analysis.
What this function does:
Reads multiple TFBS enrichment result files from
TFBS_enrichment() output
Combines odds ratios, FDR values, and target hits across all samples/clusters
Filters TFBS based on significance (FDR ≤ 0.05), effect size (odds ratio ≥ 2), and coverage criteria
Allows filtering for specific transcription factors of interest via pattern matching
Applies log2 transformation to odds ratios for symmetric visualization
Generates a comprehensive heatmap showing all TFBS passing filtering criteria
(Optional) Creates a focused heatmap displaying the union of top N most enriched TFBS per sample
(Optional) Supports optional hierarchical clustering of samples for pattern discovery
Produces publication-ready visualizations with consistent formatting and scalable dimensions
Exports log2-transformed odds ratio and FDR matrices for downstream analysis
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
tsv_path |
character vector | Yes | Vector of file paths to TFBS enrichment result TSV files from TFBS_enrichment() analysis | tsv_path = c("cluster1.tsv", "cluster2.tsv") |
label |
character vector | Yes | Vector of sample/cluster labels corresponding to tsv_path. Must have same length as tsv_path | label = c("Cluster1", "Cluster2", "Cluster3") |
out_dir |
character | Yes | Output directory to save heatmaps and data matrices | out_dir = "./heatmaps" |
top_n |
integer | No (default: 20) | Number of top enriched TFBS per sample to display in second heatmap. If NULL, only the full heatmap is generated | top_n = 50 |
selected_tfs |
character vector | No (default: NULL) | Vector of transcription factor names or substrings to filter rows (TFBS). Matching is case-insensitive. If NULL, all TFBS retained | selected_tfs = c("CTCF", "AP1", "STAT") |
apply_cluster |
logical | No (default: FALSE) | Whether to apply hierarchical clustering to columns and reorder them. If TRUE, columns are clustered and reversed for visualization | apply_cluster = TRUE |
TFBS_enrichment() functionodds_ratio, FDR,
target_hitapply_cluster = FALSE): Columns
ordered by sorted labelsapply_cluster = TRUE):
The function generates the following output files in the specified
out_dir:
TFBS_heatmap_all.pdf

TFBS_heatmap_top<n>.pdf (if top_n
provided)
apply_cluster = TRUE
TFBS_odds_ratio_log2.csv

TFBS_FDR.csv

library(epigenomeR)
# Example 1: Basic usage - generate heatmap for all filtered TFBS
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_Cluster1.tsv",
"output/TFBS_enrichment_Cluster2.tsv",
"output/TFBS_enrichment_Cluster3.tsv"
),
label = c("Cluster1", "Cluster2", "Cluster3"),
out_dir = "heatmaps/all_tfbs"
)
# Example 2: With top N selection - focus on most enriched TFBS
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_Cluster1.tsv",
"output/TFBS_enrichment_Cluster2.tsv",
"output/TFBS_enrichment_Cluster3.tsv"
),
label = c("Cluster1", "Cluster2", "Cluster3"),
out_dir = "heatmaps/top10",
top_n = 10
)
# Example 3: Filter for specific transcription factors
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_TypeA.tsv",
"output/TFBS_enrichment_TypeB.tsv",
"output/TFBS_enrichment_TypeC.tsv"
),
label = c("TypeA", "TypeB", "TypeC"),
out_dir = "heatmaps/selected_tfs",
top_n = 15,
selected_tfs = c("CTCF", "AP1", "STAT", "NRF", "FOXO")
)