The peak_calling() function identifies enriched genomic
signal blocks (peaks) directly from one or more bedGraph files by
scanning per-chromosome read coverage, merging contiguous non-zero
segments into blocks, estimating local background using proportional
flanking windows, and applying an enrichment test with BH
correction.
What this function does:
This function reads aligned sequencing reads coverage tracks from bedGraph files and processes each file independently. For each chromosome, it extracts contiguous non-zero coverage intervals and merges adjacent intervals into larger “signal blocks” when the current interval starts exactly at the previous interval end.
For every block, the function calculates a block-level signal AUC (sum of coverage over the block) and estimates a local background AUC by extending upstream and downstream flanks with a length proportional to the block size (4.5× the block length on each side; bounded by chromosome edges). The background region includes the block itself plus both flanks, so the enrichment test is performed by comparing the block AUC against the total AUC in the expanded local neighborhood.
Enrichment significance is evaluated using a binomial tail
probability, where the total “trials” correspond to the local background
AUC and the success probability is the length fraction
(block_length / bg_length). The resulting p-values are
adjusted with BH to obtain q-values. Peaks are filtered by both
statistical significance (qvalue_cutoff) and fold change
(fc_cutoff), then exported as BED6+3 with MACS2-compatible
columns (signalValue, pValue,
qValue).
| Parameter | Type | Default | Description | Example |
|---|---|---|---|---|
bedgraph_path
|
character vector | — | Vector of bedGraph file paths. Each bedGraph is processed independently (file-level parallelism), and the function returns a named list of output BED file paths. |
bedgraph_path = c(“C1_H3K27ac.bedGraph”,
“C1_H3K4me3.bedGraph”)
|
out_dir
|
character |
“./”
|
Output directory to save peak BED files. Created recursively if it does not exist. |
out_dir = “./peaks”
|
ref_genome
|
character |
“hg38”
|
Reference genome build used to define standard chromosomes and
chromosome lengths. Supported values: “hg38”,
“mm10”. Chromosome naming style (UCSC vs
NCBI) is auto-detected from the first bedGraph
(presence/absence of chr prefix) and applied to the genome
object.
|
ref_genome = “mm10”
|
qvalue_cutoff
|
numeric |
0.05
|
BH-adjusted q-value cutoff for peak significance filtering. |
qvalue_cutoff = 0.01
|
fc_cutoff
|
numeric |
2
|
Fold-change cutoff for peak filtering. Fold change is computed as mean
signal in the block divided by mean signal in the local background
neighborhood: (auc / block_length) / (bg_auc / bg_length).
|
fc_cutoff = 3
|
The function generates the following output file(s) for each input
BAM in the specified out_dir:
<pair>_peaks.bed
<pair> refers to the BAM filename without extension
chrom: Chromosome name
chromStart: 0-based start coordinate
chromEnd: 1-based end coordinate (BED end; exclusive
semantics are not enforced and correspond to the block end)
name: Peak identifier in the format
chr:start-end
score: Integer score derived from
-log10(q_value) (scaled and capped at 1000)
strand: .
signalValue: Enrichment fold change (fc)
pValue: -log10(p_value) (protected from
underflow)
qValue: -log10(q_value) (protected from
underflow)
| chrom | chromStart | chromEnd | name | score | strand | signalValue | pValue | qValue | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | chr1 | 10081 | 10392 | chr1:10081-10392 | 1000 | . | 10.0032154340836 | 307.652655568589 | 307.652655568589 |
| 2 | chr1 | 51139 | 51682 | chr1:51139-51682 | 1000 | . | 10.0018416206262 | 307.652655568589 | 307.652655568589 |
| 3 | chr1 | 88003 | 88404 | chr1:88003-88404 | 1000 | . | 10.002493765586 | 307.652655568589 | 307.652655568589 |
| 4 | chr1 | 136877 | 137823 | chr1:136877-137823 | 1000 | . | 10 | 307.652655568589 | 307.652655568589 |
Example BED output showing peak-calling results for the
H3K4me1–H3K4me3 pair in sample C1 from the
test dataset.
library(multiEpiCore)
# ===== Test Data =======
bedgraph_dirs <- list(
"C1" = 'bedgraph/C1',
"C2" = 'bedgraph/C2',
"T1" = 'bedgraph/T1',
"T2" = 'bedgraph/T2'
)
for (name in names(bedgraph_dirs)) {
bedgraph_dir <- bedgraph_dirs[[name]]
bedgraph_paths <- list.files(path = bedgraph_dir, pattern = "\\.bedGraph$", recursive = TRUE, full.names = TRUE)
peak_calling(bedgraph_path = bedgraph_paths, out_dir = file.path("peak_calling", name))
}