The get_matched_control() function generates background
control regions by randomly selecting genes with similar length to the
nearest gene of each target peak, then placing control regions at the
same TSS-relative distance on the selected genes. This approach
preserves the relationship between peaks and gene structure while
avoiding biases from using the same genomic neighborhoods.
What this function does:
Identifies the nearest gene for each target genomic region
Finds candidate genes with similar length (within tolerance threshold)
Randomly selects control genes from candidates, excluding the original gene
Places control regions at identical TSS-relative distances on control genes
Validates control regions are within chromosome boundaries
Generates multiple control regions per target (configurable replicates)
Automatically relaxes length tolerance if insufficient candidates found
Returns GRanges object with all generated control regions
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
target_gr |
GRanges | Yes | GRanges object containing target peak regions for which controls will be generated | target_gr = my_peaks |
ref_genome |
character | No (default: “hg38”) | Reference genome version: “hg38” (Human GRCh38) or “mm10” (Mouse GRCm38) | ref_genome = "hg38" |
style |
character | No (default: “UCSC”) | Chromosome naming style (e.g., “UCSC” uses “chr1”, “Ensembl” uses “1”) | style = "UCSC" |
n_rep |
integer | No (default: 1) | Number of control regions to generate per target peak | n_rep = 3 |
regions |
integer | No (default: 800) | Width of control regions in base pairs (centered on calculated position) | regions = 500 |
seed |
integer | No (default: 42) | Random seed for reproducible control region selection | seed = 12345 |
length_tolerance |
numeric | No (default: 0.2) | Initial tolerance for gene length matching (0.2 = ±20%). Automatically relaxed up to ±100% if insufficient candidates | length_tolerance = 0.3 |
n_rep * 2), tolerance
increases by 10% incrementsThe function returns a GRanges object containing all successfully generated control regions.
Object type: GRanges (GenomicRanges package)
Number of regions: Up to
n_rep × length(target_gr) regions
n_rep control
regionsRegion characteristics:
regions parameterstyle parameter
(e.g., “chr1” for UCSC)library(epigenomeR)
library(GenomicRanges)
# Load target peaks
target_peaks <- rtracklayer::import("CTCF_peaks.bed")
# Basic usage - 1 control per target
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "hg38"
)
# Generate multiple controls per target
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "hg38",
n_rep = 5,
regions = 500
)
# Mouse genome with relaxed length tolerance
control_regions <- get_matched_control(
target_gr = target_peaks,
ref_genome = "mm10",
n_rep = 3,
length_tolerance = 0.3,
seed = 123
)
The TFBS_enrichment() function performs motif enrichment
analysis comparing a set of target genomic regions against control
regions using Fisher’s exact test to identify significantly enriched
transcription factor binding motifs.
What this function does:
Compares target genomic regions (GRanges) against background/control regions (GRanges)
Works directly with GRanges objects without file I/O for region inputs
Overlaps regions with a comprehensive motif library (stored as RDS files)
Optionally filters motif sites using functional region annotations (GRanges)
Counts motif occurrences in target vs control regions using
countOverlaps()
Performs Fisher’s exact test for each motif with pseudocount correction
Calculates odds ratios with standard errors
Adjusts p-values for multiple testing using Benjamini-Hochberg FDR
Identifies transcription factors with significantly enriched binding sites
Provides quantitative enrichment statistics for downstream interpretation
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
target_region |
GRanges | Yes | GRanges object containing genomic regions of interest | target_region = my_peaks |
control_region |
GRanges | Yes | GRanges object containing background/control regions for comparison | control_region = background_peaks |
functional_region |
GRanges or NULL | No (default: NULL) | Optional GRanges object with functional regions for filtering motif sites. Only motifs overlapping these regions are considered | functional_region = open_chromatin |
out_path |
character | No (default: “./TFBS_enrichment.tsv”) | File path to save enrichment results table (TSV format) | out_path = "./results/motif_enrichment.tsv" |
regions |
integer or NULL | No (default: NULL) | Size in base pairs to resize all regions around their center point. If NULL, regions are used as-is | regions = 500 |
ref_genome |
character | No (default: “hg38”) | Reference genome version: “hg38” or “mm10” | ref_genome = "hg38" |
style |
character or NULL | No (default: NULL) | Chromosome naming style (e.g., “UCSC”, “Ensembl”). If NULL, auto-detects from target_region | style = "UCSC" |
style is NULL, uses the style from
target_regionregions
parameterregions = NULL (default): Regions are used with
their original widthsregions is specified (e.g., 500): Both target and
control regions are resized to the specified width, centered on each
region’s midpoint.The function generates the following output file at the specified
output_path:
<output_path> (TSV format)
| feature | odds_ratio | pvalue | odds_ratio_se | FDR | target_hit | control_hit |
|---|---|---|---|---|---|---|
| (CRISPR targeting AFF4)AFF4 | 16.14696 | 2.348799e-123 | 0.1066921 | 1.327293e-122 | 194 | 768 |
| (CRISPR targeting ARID3B)ARID3B | 35.14312 | 1.320588e-76 | 0.1620585 | 3.858694e-76 | 78 | 95 |
| (CRISPR targeting ARID4B)ARID4B | 34.39295 | 7.107417e-223 | 0.1229221 | 4.257343e-221 | 308 | 1241 |
| (CRISPR targeting ATF1)ATF1 | 18.37715 | 3.968721e-74 | 0.1327106 | 1.132030e-73 | 98 | 242 |
| (CRISPR targeting ATF2)ATF2 | 18.67509 | 6.632538e-33 | 0.1978719 | 1.119124e-32 | 40 | 83 |
| (CRISPR targeting ATF3)ATF3 | 23.37675 | 9.848921e-136 | 0.1117167 | 7.023218e-135 | 174 | 444 |
library(epigenomeR)
target_region <- rtracklayer::import("target_peaks.bed")
control_region <- rtracklayer::import("control_peaks.bed")
# Basic enrichment analysis (using original region widths)
TFBS_enrichment(
target_region = target_region,
control_region = control_region,
out_path = "./TFBS_enrichment.tsv",
ref_genome = "hg38"
)
# With uniform region resizing (200bp centered windows)
TFBS_enrichment(
target_region = target_region,
control_region = control_region,
regions = 200,
out_path = "./TFBS_enrichment_200bp.tsv",
ref_genome = "hg38"
)
# With functional region filtering (e.g., open chromatin)
functional_region <- rtracklayer::import("ATAC_peaks.bed")
TFBS_enrichment(
target_region = target_region,
control_region = control_region,
regions = 500,
functional_region = functional_region,
out_path = "./TFBS_enrichment_filtered.tsv",
ref_genome = "hg38"
)
The TFBS_enrichment_heatmap() function generates
heatmaps to visualize transcription factor binding site (TFBS)
enrichment patterns across multiple genomic clusters or experimental
conditions, using results from TFBS_enrichment()
analysis.
What this function does:
Reads multiple TFBS enrichment result files from
TFBS_enrichment() output
Combines odds ratios, FDR values, and target hits across all samples/clusters
Filters TFBS based on significance (FDR ≤ 0.05), effect size (odds ratio ≥ 2), and coverage criteria
Allows filtering for specific transcription factors of interest via pattern matching
Applies log2 transformation to odds ratios for symmetric visualization
Generates a comprehensive heatmap showing all TFBS passing filtering criteria
(Optional) Creates a focused heatmap displaying the union of top N most enriched TFBS per sample
(Optional) Supports optional hierarchical clustering of samples for pattern discovery
Produces publication-ready visualizations with consistent formatting and scalable dimensions
Exports log2-transformed odds ratio and FDR matrices for downstream analysis
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
tsv_path |
character vector | Yes | Vector of file paths to TFBS enrichment result TSV files from TFBS_enrichment() analysis | tsv_path = c("cluster1.tsv", "cluster2.tsv") |
label |
character vector | Yes | Vector of sample/cluster labels corresponding to tsv_path. Must have same length as tsv_path | label = c("Cluster1", "Cluster2", "Cluster3") |
out_dir |
character | Yes | Output directory to save heatmaps and data matrices | out_dir = "./heatmaps" |
top_n |
integer | No (default: NULL) | Number of top enriched TFBS per sample to display in second heatmap. If NULL, only the full heatmap is generated | top_n = 10 |
selected_tfs |
character vector | No (default: NULL) | Vector of transcription factor names or substrings to filter rows (TFBS). Matching is case-insensitive. If NULL, all TFBS retained | selected_tfs = c("CTCF", "AP1", "STAT") |
apply_cluster |
logical | No (default: FALSE) | Whether to apply hierarchical clustering to columns and reorder them. If TRUE, columns are clustered and reversed for visualization | apply_cluster = TRUE |
TFBS_enrichment() functionodds_ratio, FDR,
target_hitapply_cluster = FALSE): Columns
ordered by sorted labelsapply_cluster = TRUE):
The function generates the following output files in the specified
out_dir:
TFBS_heatmap_all.pdf

TFBS_heatmap_top<n>.pdf (if top_n
provided)
apply_cluster = TRUE
peak_no_perturb.csv

peak_no_perturb_fdr.csv

library(epigenomeR)
# Example 1: Basic usage - generate heatmap for all filtered TFBS
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_Cluster1.tsv",
"output/TFBS_enrichment_Cluster2.tsv",
"output/TFBS_enrichment_Cluster3.tsv"
),
label = c("Cluster1", "Cluster2", "Cluster3"),
out_dir = "heatmaps/all_tfbs"
)
# Example 2: With top N selection - focus on most enriched TFBS
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_Cluster1.tsv",
"output/TFBS_enrichment_Cluster2.tsv",
"output/TFBS_enrichment_Cluster3.tsv"
),
label = c("Cluster1", "Cluster2", "Cluster3"),
out_dir = "heatmaps/top10",
top_n = 10
)
# Example 3: Filter for specific transcription factors
TFBS_enrichment_heatmap(
tsv_path = c(
"output/TFBS_enrichment_TypeA.tsv",
"output/TFBS_enrichment_TypeB.tsv",
"output/TFBS_enrichment_TypeC.tsv"
),
label = c("TypeA", "TypeB", "TypeC"),
out_dir = "heatmaps/selected_tfs",
top_n = 15,
selected_tfs = c("CTCF", "AP1", "STAT", "NRF", "FOXO")
)