The build_count_matrix() function builds a
fragment-overlap count matrix from paired-end BAM files over
user-specified genomic regions.
bam_path: Vector of BAM file
paths bam_path <- c("../sample1.bam", "../sample2.bam", "../sample3.bam")
regions: Genomic regions to evaluate,
either:
regions <- 800
regions <- "peaks.bed"save_dir: Output directory save_dir <- "output/count_matrices"
| Parameter | Default | Description |
|---|---|---|
ref_genome
|
“hg38” | Reference genome |
sample_name
|
NULL | Optional prefix added to the output filename. If provided, the final Feather file will include this value at the beginning of the filename. |
do_qc
|
FALSE | Perform quality control |
qc_percent
|
0.25 | QC filtering threshold (filter bottom 25%) |
force_chr_coord
|
FALSE |
By default, if the provided regions file contains a gene identifier
column (for example, gene_id in gff/tsv files or the 4th
“name” column in BED files), this column will be used as the row name
instead of the “CHR_start_end” coordinate string. Setting force_chr_coord = TRUE overrides this behavior and forces
the use of genomic coordinates as row names.
|
The generated count matrix is saved as a Feather file following these conventions:
Fixed-size bins (numeric
regions):
Count_Matrix_<BINSIZE>.feather (e.g.,
Count_Matrix_5000.feather)
Custom regions file (character
regions):
Count_Matrix_<prefix>.feather, where
<prefix> is the basename of the region file (without
extension). (e.g., "peaks.bed" →
Count_Matrix_peaks.feather)
When sample_name is provided:
<sample_name>_<Count_Matrix_xxx>.feather (e.g.,
C1_Count_Matrix_peaks.feather)
library(arrow)
count_matrix <- read_feather("output/Count_Matrix_800.feather")
head(count_matrix)
| pos | H3K27ac-H3K4me3 | H3K4me3-H3K4me3 | H3K4me1-H3K4me3 | H3K4me3-H3K9me3 | |
|---|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | chr1_1_800 | 0 | 0 | 0 | 0 |
| 2 | chr1_801_1600 | 0 | 0 | 0 | 0 |
| 3 | chr1_1601_2400 | 0 | 0 | 0 | 0 |
| 4 | chr1_2401_3200 | 0 | 0 | 0 | 0 |
| 5 | chr1_3201_4000 | 0 | 0 | 0 | 0 |
| 6 | chr1_4001_4800 | 0 | 0 | 0 | 0 |
# Generate count matrix with 800b bins
build_count_matrix(
bam_path = bam_files,
regions = 800,
save_dir = "output/5kb_bins"
)
# Use peak regions from BED file
build_count_matrix(
bam_path = bam_files,
regions = "peaks/merged_peaks.bed",
save_dir = "output/peak_counts"
)
# Enable QC filtering
build_count_matrix(
bam_path = bam_files,
regions = 800,
save_dir = "output/5kb_qc",
do_qc = TRUE,
qc_filtered_percentile = 0.25
)
The apply_transformations() function performs a
sequential series of normalization and transformation steps on a count
matrix and outputs the processed result.
cm_path: Path to the input count
matrix (.feather) The input file must be a valid .feather file
containing a positional column (pos) or any first column
that uniquely identifies genomic intervals. All remaining columns must
be numeric.cm_path <- "input/sample_counts.feather"
save_dir: Output directory save_dir <- "output/count_matrices"
| Parameter | Default | Description |
|---|---|---|
|
c(“remove0”, “libnorm”, “log2p1”, “qnorm”) | Vector specifying the transformation pipeline. Steps are executed in order, and the behavior depends on the sequence you provide (see the table below for details). |
|
FALSE |
If TRUE, writes a .feather file after each step, allowing
inspection of intermediate matrices.
|
The order of operations in transformations directly
determines the output. Below is the full list of supported steps:
| Transformation | Description |
|---|---|
remove0 |
Remove rows where all samples have zero counts |
libnorm |
Library size normalization (CPM, counts per million) |
log2p1 |
Log2 transformation: log2(x + 1) |
qnorm |
Quantile normalization across samples |
minmaxnorm |
Scale values to [0, 1] range |
sqrt |
Square root transformation |
After processing all selected transformations, a final Feather file named:
<original_name>_transformed.feather
If set save_each_step = TRUE, the output path will
be:
save_dir/
├── [count_matrix_prefix]_transformed.feather
├── [count_matrix_prefix]_remove0.feather
├── [count_matrix_prefix]_remove0_libnorm.feather
├── [count_matrix_prefix]_remove0_libnorm_log2p1.feather
└── [count_matrix_prefix]_remove0_libnorm_log2p1_qnorm.feather
apply_transformations(
cm_path = "Count_Matrix_800.feather",
transformations = c("remove0", "libnorm", "log2p1", "qnorm"),
save_dir = "./",
save_each_step = TRUE
)