1. Peak Calling

The peak_calling() function identifies enriched genomic signal blocks (peaks) directly from one or more bedGraph files by scanning per-chromosome read coverage, merging contiguous non-zero segments into blocks, estimating local background using proportional flanking windows, and applying an enrichment test with BH correction.

What this function does:

This function reads aligned sequencing reads coverage tracks from bedGraph files and processes each file independently. For each chromosome, it extracts contiguous non-zero coverage intervals and merges adjacent intervals into larger “signal blocks” when the current interval starts exactly at the previous interval end.

For every block, the function calculates a block-level signal AUC (sum of coverage over the block) and estimates a local background AUC by extending upstream and downstream flanks with a length proportional to the block size (4.5× the block length on each side; bounded by chromosome edges). The background region includes the block itself plus both flanks, so the enrichment test is performed by comparing the block AUC against the total AUC in the expanded local neighborhood.

Enrichment significance is evaluated using a binomial tail probability, where the total “trials” correspond to the local background AUC and the success probability is the length fraction (block_length / bg_length). The resulting p-values are adjusted with BH to obtain q-values. Peaks are filtered by both statistical significance (qvalue_cutoff) and fold change (fc_cutoff), then exported as BED6+3 with MACS2-compatible columns (signalValue, pValue, qValue).

Parameters

Parameter Type Default Description Example
bedgraph_path character vector Vector of bedGraph file paths. Each bedGraph is processed independently (file-level parallelism), and the function returns a named list of output BED file paths. bedgraph_path = c(“C1_H3K27ac.bedGraph”, “C1_H3K4me3.bedGraph”)
out_dir character “./” Output directory to save peak BED files. Created recursively if it does not exist. out_dir = “./peaks”
ref_genome character “hg38” Reference genome build used to define standard chromosomes and chromosome lengths. Supported values: “hg38”, “mm10”. Chromosome naming style (UCSC vs NCBI) is auto-detected from the first bedGraph (presence/absence of chr prefix) and applied to the genome object. ref_genome = “mm10”
qvalue_cutoff numeric 0.05 BH-adjusted q-value cutoff for peak significance filtering. qvalue_cutoff = 0.01
fc_cutoff numeric 2 Fold-change cutoff for peak filtering. Fold change is computed as mean signal in the block divided by mean signal in the local background neighborhood: (auc / block_length) / (bg_auc / bg_length). fc_cutoff = 3

Output Files

The function generates the following output file(s) for each input BAM in the specified out_dir:

  1. Peak BED file - <pair>_peaks.bed
    • <pair> refers to the BAM filename without extension
    • The output is a BED file with a header, compatible with common peak-processing tools, and contains the following columns:
      • chrom: Chromosome name
      • chromStart: 0-based start coordinate
      • chromEnd: 1-based end coordinate (BED end; exclusive semantics are not enforced and correspond to the block end)
      • name: Peak identifier in the format chr:start-end
      • score: Integer score derived from -log10(q_value) (scaled and capped at 1000)
      • strand: .
      • signalValue: Enrichment fold change (fc)
      • pValue: -log10(p_value) (protected from underflow)
      • qValue: -log10(q_value) (protected from underflow)
chrom chromStart chromEnd name score strand signalValue pValue qValue
1 chr1 10081 10392 chr1:10081-10392 1000 . 10.0032154340836 307.652655568589 307.652655568589
2 chr1 51139 51682 chr1:51139-51682 1000 . 10.0018416206262 307.652655568589 307.652655568589
3 chr1 88003 88404 chr1:88003-88404 1000 . 10.002493765586 307.652655568589 307.652655568589
4 chr1 136877 137823 chr1:136877-137823 1000 . 10 307.652655568589 307.652655568589

Example BED output showing peak-calling results for the H3K4me1–H3K4me3 pair in sample C1 from the test dataset.

Example Usage

library(multiEpiCore)

# ===== Test Data =======
bedgraph_dirs <- list(
  "C1" = 'bedgraph/C1',
  "C2" = 'bedgraph/C2',
  "T1" = 'bedgraph/T1',
  "T2" = 'bedgraph/T2'
)

for (name in names(bedgraph_dirs)) {
  bedgraph_dir <- bedgraph_dirs[[name]]
  bedgraph_paths <- list.files(path = bedgraph_dir, pattern = "\\.bedGraph$", recursive = TRUE, full.names = TRUE)
  peak_calling(bedgraph_path = bedgraph_paths, out_dir = file.path("peak_calling", name))
}

2. Peak Annotation


3. Peak Region Read Counting


4. Peak Differential Analysis