1. Peak Calling

The peak_calling() function identifies enriched genomic signal blocks (peaks) directly from one or more bedGraph files by scanning per-chromosome read coverage, merging contiguous non-zero segments into blocks, estimating local background using proportional flanking windows, and applying an enrichment test with BH correction.

What this function does:

This function reads aligned sequencing reads coverage tracks from bedGraph files and processes each file independently. For each chromosome, it extracts contiguous non-zero coverage intervals and merges adjacent intervals into larger “signal blocks” when the current interval starts exactly at the previous interval end.

For every block, the function calculates a block-level signal AUC (sum of coverage over the block) and estimates a local background AUC by extending upstream and downstream flanks with a length proportional to the block size (4.5× the block length on each side; bounded by chromosome edges). The background region includes the block itself plus both flanks, so the enrichment test is performed by comparing the block AUC against the total AUC in the expanded local neighborhood.

Enrichment significance is evaluated using a binomial tail probability, where the total “trials” correspond to the local background AUC and the success probability is the length fraction (block_length / bg_length). The resulting p-values are adjusted with BH to obtain q-values. Peaks are filtered by both statistical significance (qvalue_cutoff) and fold change (fc_cutoff), then exported as BED6+3 with MACS2-compatible columns (signalValue, pValue, qValue).

Parameters

Parameter	Type	Default	Description	Example
`bedgraph_path`	character vector	—	Vector of bedGraph file paths. Each bedGraph is processed independently (file-level parallelism), and the function returns a named list of output BED file paths.	`bedgraph_path = c(“C1_H3K27ac.bedGraph”, “C1_H3K4me3.bedGraph”)`
`out_dir`	character	`“./”`	Output directory to save peak BED files. Created recursively if it does not exist.	`out_dir = “./peaks”`
`ref_genome`	character	`“hg38”`	Reference genome build used to define standard chromosomes and chromosome lengths. Supported values: `“hg38”`, `“mm10”`. Chromosome naming style (`UCSC` vs `NCBI`) is auto-detected from the first bedGraph (presence/absence of `chr` prefix) and applied to the genome object.	`ref_genome = “mm10”`
`qvalue_cutoff`	numeric	`0.05`	BH-adjusted q-value cutoff for peak significance filtering.	`qvalue_cutoff = 0.01`
`fc_cutoff`	numeric	`2`	Fold-change cutoff for peak filtering. Fold change is computed as mean signal in the block divided by mean signal in the local background neighborhood: `(auc / block_length) / (bg_auc / bg_length)`.	`fc_cutoff = 3`

Output Files

The function generates the following output file(s) for each input BAM in the specified out_dir:

Peak BED file - <pair>_peaks.bed
- <pair> refers to the BAM filename without extension
- The output is a BED file with a header, compatible with common peak-processing tools, and contains the following columns:
  - chrom: Chromosome name
  - chromStart: 0-based start coordinate
  - chromEnd: 1-based end coordinate (BED end; exclusive semantics are not enforced and correspond to the block end)
  - name: Peak identifier in the format chr:start-end
  - score: Integer score derived from -log10(q_value) (scaled and capped at 1000)
  - strand: .
  - signalValue: Enrichment fold change (fc)
  - pValue: -log10(p_value) (protected from underflow)
  - qValue: -log10(q_value) (protected from underflow)

	chrom	chromStart	chromEnd	name	score	strand	signalValue	pValue	qValue
1	chr1	10081	10392	chr1:10081-10392	1000	.	10.0032154340836	307.652655568589	307.652655568589
2	chr1	51139	51682	chr1:51139-51682	1000	.	10.0018416206262	307.652655568589	307.652655568589
3	chr1	88003	88404	chr1:88003-88404	1000	.	10.002493765586	307.652655568589	307.652655568589
4	chr1	136877	137823	chr1:136877-137823	1000	.	10	307.652655568589	307.652655568589

Example BED output showing peak-calling results for the H3K4me1–H3K4me3 pair in sample C1 from the test dataset.

Example Usage

library(multiEpiCore)

# ===== Test Data =======
bedgraph_dirs <- list(
  "C1" = 'bedgraph/C1',
  "C2" = 'bedgraph/C2',
  "T1" = 'bedgraph/T1',
  "T2" = 'bedgraph/T2'
)

for (name in names(bedgraph_dirs)) {
  bedgraph_dir <- bedgraph_dirs[[name]]
  bedgraph_paths <- list.files(path = bedgraph_dir, pattern = "\\.bedGraph$", recursive = TRUE, full.names = TRUE)
  peak_calling(bedgraph_path = bedgraph_paths, out_dir = file.path("peak_calling", name))
}

2. Peak Annotation

3. Peak Region Read Counting

4. Peak Differential Analysis