1. Quality Control and Peak Count Heatmap Visualization Based on Reads

The all_qc_with_plot() function performs quality control on BAM/BED files and generates a log2-transformed peak count heatmap for tag-tag pairs.

What this function does:

  • Performs quality control analysis on BAM or BED files

  • Calculates read counts and peak counts for all tag pairs

  • Filters samples based on read count percentiles

  • Generates a heatmap visualizing peak counts between tag pairs

  • Highlights filtered (non-significant) pairs in the heatmap

  • Supports optional tag grouping and categorical annotation

  • Saves QC summary files and visualization as outputs

Parameters

Parameter Type Required Description Example
file_path character vector Yes Vector of BAM or BED file paths for QC and visualization file_path = c("sample1.bam", "sample2.bam")
filtered_path character vector No (default: NULL) Optional vector of file paths for filtered QC matrix filtered_path = c("filtered1.bam", "filtered2.bam")
filtered_percentile numeric No (default: 0.25) Numeric value between 0 and 1 to filter the lowest-read-count samples filtered_percentile = 0.25
output_path_dir character No (default: NULL) Directory to store output CSVs and PDF. If NULL, uses the directory of the first input file output_path_dir = "./results"
split_crf_by character No (default: “-”) Delimiter used to split tag pairs into individual TF names split_crf_by = "-"
save logical No (default: TRUE) If TRUE, writes CSV files and total reads summary save = TRUE
group_csv data frame No (default: NULL) Optional data frame defining tag groupings and categories for annotation group_csv = data.frame(tag = c("YY1", "cJun"), category = c("Group1", "Group2"))
tag_names character No (default: “tag”) Column name in group_csv for tag identifiers tag_names = "TF_name"
category_names character No (default: “category”) Column name in group_csv for group categories category_names = "TF_family"
target_pair_mapping_df data frame No (default: NULL) Optional data frame for renaming tag names in the heatmap matrix target_pair_mapping_df = NULL

Output Files

The function generates the following output files in the specified output_path_dir:

  1. All read counts CSV - all_read_counts_bam.csv or all_read_counts_bed.csv
    • Contains read counts for all input files
    • Two columns: file name and read count
    • Generated only if save = TRUE
    • File extension depends on input file type (BAM or BED)
  2. Filtered read counts CSV - filtered_read_counts_bam.csv or filtered_read_counts_bed.csv
    • Contains read counts for files passing the percentile filter
    • Excludes samples in the lowest percentile based on filtered_percentile parameter
    • Two columns: file name and read count
    • Generated only if save = TRUE
  3. Total reads summary - total_reads.txt
    • Text file containing total read count across all samples
    • Single numeric value
    • Generated only if save = TRUE
  4. Peak count heatmap - AAA_test_qc_heatmap.pdf
    • PDF visualization of log2-transformed peak counts for tag-tag pairs
    • Lower triangle matrix showing peak counts between transcription factor pairs
    • Filtered (non-significant) pairs displayed in grey
    • Color scale: blue (low) → white (medium) → red (high)
    • If group_csv provided, includes categorical annotations with colored blocks
    • Dimensions: 12 × 12 inches

Example Usage

# bed file:
input_path <- "./dir_for_bed_files"
bed_files <- list.files(path = input_path, pattern = "\\.bed$", recursive = TRUE, full.names = TRUE)

# test csv
tags <- c("H3K4me1", "H3K4me3", "H3K9ac", "H3K9me2", "H3K9me3", "H3S10ph", "H3K14ac", "H3K27ac", "H3K27me3", "H3K36me3", "H3K79me3", "H2A_XS139ph", "POLR2AphosphoS2", "AURORA_Aurora_B", "CBP_CREBBP", "CDK8", "EHMT1", "EHMT2", "EP300", "EZH2", "MLL1_KMT2A", "MLL4_MLL2_KMT2B", "MSK1", "MSK2", "PIM1", "SETD2", "SuVar39_SUV39H1", "CTCF", "Max", "Myc", "NRF1", "USF1", "USF2", "YY1", "cFos", "cJun")

splits <- c(rep(1, 12), 2, rep(3, 14), rep(4, 9))

group_labels <- c("Histone Modification", "", "Writer", "Transcription Factor")
group <- group_labels[splits]

group_df <- data.frame(tag = tags, category = group)

library(epigenomeR)
all_qc_with_plot(file_path = bed_files, filtered_path = NULL, output_path_dir = "output/qc_heatmap", group_csv = group_df)

Notes

  • The function automatically detects whether input files are BAM or BED format based on file extensions.

  • The filtered_percentile parameter removes samples with the lowest read counts (e.g., 0.25 removes the bottom 25%).

  • If filtered_path is provided, it overrides the percentile-based filtering for determining significant pairs in the heatmap.

  • The heatmap displays only the lower triangle of the matrix, with filtered pairs shown in grey.

  • Peak counts are log2-transformed (log2(count + 1)) for visualization.

  • When group_csv is provided, tags are sorted by category and displayed with colored block annotations.

  • The function requires the following R packages: ComplexHeatmap, latex2exp, circlize, and grid.

  • All file paths in file_path and filtered_path must exist, or the function will fail.

  • If output_path_dir is NULL, outputs are saved to the directory containing the first input file.