The all_qc_with_plot() function performs quality control
on BAM/BED files and generates a log2-transformed peak count heatmap for
tag-tag pairs.
What this function does:
Performs quality control analysis on BAM or BED files
Calculates read counts and peak counts for all tag pairs
Filters samples based on read count percentiles
Generates a heatmap visualizing peak counts between tag pairs
Highlights filtered (non-significant) pairs in the heatmap
Supports optional tag grouping and categorical annotation
Saves QC summary files and visualization as outputs
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
file_path |
character vector | Yes | Vector of BAM or BED file paths for QC and visualization | file_path = c("sample1.bam", "sample2.bam") |
filtered_path |
character vector | No (default: NULL) | Optional vector of file paths for filtered QC matrix | filtered_path = c("filtered1.bam", "filtered2.bam") |
filtered_percentile |
numeric | No (default: 0.25) | Numeric value between 0 and 1 to filter the lowest-read-count samples | filtered_percentile = 0.25 |
output_path_dir |
character | No (default: NULL) | Directory to store output CSVs and PDF. If NULL, uses the directory of the first input file | output_path_dir = "./results" |
split_crf_by |
character | No (default: “-”) | Delimiter used to split tag pairs into individual TF names | split_crf_by = "-" |
save |
logical | No (default: TRUE) | If TRUE, writes CSV files and total reads summary | save = TRUE |
group_csv |
data frame | No (default: NULL) | Optional data frame defining tag groupings and categories for annotation | group_csv = data.frame(tag = c("YY1", "cJun"), category = c("Group1", "Group2")) |
tag_names |
character | No (default: “tag”) | Column name in group_csv for tag identifiers |
tag_names = "TF_name" |
category_names |
character | No (default: “category”) | Column name in group_csv for group categories |
category_names = "TF_family" |
target_pair_mapping_df |
data frame | No (default: NULL) | Optional data frame for renaming tag names in the heatmap matrix | target_pair_mapping_df = NULL |
The function generates the following output files in the specified
output_path_dir:
all_read_counts_bam.csv or
all_read_counts_bed.csv
save = TRUE
filtered_read_counts_bam.csv or
filtered_read_counts_bed.csv
filtered_percentile parametersave = TRUE
total_reads.txt
save = TRUE
AAA_test_qc_heatmap.pdf
group_csv provided, includes categorical annotations
with colored blocks
# bed file:
input_path <- "./dir_for_bed_files"
bed_files <- list.files(path = input_path, pattern = "\\.bed$", recursive = TRUE, full.names = TRUE)
# test csv
tags <- c("H3K4me1", "H3K4me3", "H3K9ac", "H3K9me2", "H3K9me3", "H3S10ph", "H3K14ac", "H3K27ac", "H3K27me3", "H3K36me3", "H3K79me3", "H2A_XS139ph", "POLR2AphosphoS2", "AURORA_Aurora_B", "CBP_CREBBP", "CDK8", "EHMT1", "EHMT2", "EP300", "EZH2", "MLL1_KMT2A", "MLL4_MLL2_KMT2B", "MSK1", "MSK2", "PIM1", "SETD2", "SuVar39_SUV39H1", "CTCF", "Max", "Myc", "NRF1", "USF1", "USF2", "YY1", "cFos", "cJun")
splits <- c(rep(1, 12), 2, rep(3, 14), rep(4, 9))
group_labels <- c("Histone Modification", "", "Writer", "Transcription Factor")
group <- group_labels[splits]
group_df <- data.frame(tag = tags, category = group)
library(epigenomeR)
all_qc_with_plot(file_path = bed_files, filtered_path = NULL, output_path_dir = "output/qc_heatmap", group_csv = group_df)
The function automatically detects whether input files are BAM or BED format based on file extensions.
The filtered_percentile parameter removes samples
with the lowest read counts (e.g., 0.25 removes the bottom
25%).
If filtered_path is provided, it overrides the
percentile-based filtering for determining significant pairs in the
heatmap.
The heatmap displays only the lower triangle of the matrix, with filtered pairs shown in grey.
Peak counts are log2-transformed (log2(count + 1)) for visualization.
When group_csv is provided, tags are sorted by
category and displayed with colored block annotations.
The function requires the following R packages:
ComplexHeatmap, latex2exp,
circlize, and grid.
All file paths in file_path and
filtered_path must exist, or the function will
fail.
If output_path_dir is NULL, outputs are saved to the
directory containing the first input file.