The `qc` command performs read-level quality control on BAM files
and generates the input tables required for downstream
visualization with `qc_visualization()`.
This step is implemented entirely in bash and is designed to be fast, reproducible, and independent of R. It computes sequencing depth for each CRF–CRF pair and applies a percentile-based filtering strategy to remove low-coverage pairs prior to visualization or modeling.
What this script does:
- Runs `samtools idxstats` to compute total mapped read counts for each BAM file.
- Writes a table of per-pair counts (`all_read_count.tsv`) with columns:
  - `pair`: CRF–CRF pair name (derived from BAM filename)
  - `read_count`: total mapped reads
- Applies the percentile filter and writes the remaining pairs to `filtered_read_count.tsv`.

| Argument | Type | Default | Description | Example |
|---|---|---|---|---|
| `-i`, `--input` | character | — | Directory containing input BAM files (one per CRF pair) | `-i ./bam` |
| `-o`, `--output` | character | — | Output directory for QC tables | `-o ./qc` |
| `-p`, `--percentile` | numeric | `0.25` | Percentile threshold used to filter low-read-count CRF pairs | `-p 0.1` |
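The counting step can be approximated with a few lines of bash. The sketch below is illustrative only and is not the multiEpiPrep implementation: the function names `sum_mapped` and `count_pairs` are hypothetical, the pair name is assumed to be the BAM filename without its `.bam` extension, and every BAM is assumed to have an index next to it. It relies on the documented `samtools idxstats` output, where the third tab-separated column holds the number of mapped read-segments per reference.

```bash
# Sketch only: approximates the counting step, not the actual qc script.

# Sum column 3 (mapped read-segments per reference) of
# `samtools idxstats` output read from stdin.
sum_mapped() {
  awk -F'\t' '{ total += $3 } END { print total + 0 }'
}

# Print a pair/read_count table for every BAM file in a directory.
# Assumption: pair name = BAM filename without the .bam extension.
count_pairs() {
  local bam_dir=$1 bam
  printf 'pair\tread_count\n'
  for bam in "$bam_dir"/*.bam; do
    [ -e "$bam" ] || continue   # skip when the glob matches nothing
    printf '%s\t%s\n' \
      "$(basename "$bam" .bam)" \
      "$(samtools idxstats "$bam" | sum_mapped)"
  done
}
```

Under these assumptions, `count_pairs ./bam > all_read_count.tsv` would produce a table with the same two columns as the one written by `qc`.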
The command writes the following files to the output directory
given by `-o`:
all_read_count.tsv
| | pair | read_count |
|---|---|---|
| | `<chr>` | `<int>` |
| 1 | H3K27ac-H3K4me3 | 60573 |
| 2 | H3K4me3-H3K4me3 | 940240 |
| 3 | H3K4me1-H3K4me3 | 432292 |
| 4 | H3K4me3-H3K9me3 | 415540 |
| 5 | H3K27me3-H3K4me3 | 574643 |
| 6 | … | |
| 7 | H3K9me2-H3K9me3 | 201788 |
filtered_read_count.tsv
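Contains the pairs that remain after filtering with the percentile parameter (`-p`).

```bash
# Test data: one qc run per sample subdirectory under ./bam
for d in ./bam/*/; do
  sample=$(basename "$d")
  multiEpiPrep qc \
    -i "$d" \
    -o "./qc/${sample}" \
    -p 0.25
done
```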
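To make the effect of `-p` more concrete, here is a minimal sketch of one possible percentile filter applied to `all_read_count.tsv`. The exact cutoff rule used by `qc` is not described here, so everything in this sketch is an assumption: the hypothetical function `filter_by_percentile` uses the nearest-rank percentile of the `read_count` column and keeps rows at or above that value. A tool that interpolates between ranks (as R's `quantile()` does by default) could give a slightly different cutoff.

```bash
# Sketch only: one possible percentile filter; the rule used by
# multiEpiPrep qc may differ (for example, it may interpolate).

# filter_by_percentile TABLE P
# Prints the header plus every row whose read_count (column 2) is >= the
# nearest-rank P-th percentile (0 < P <= 1) of the read_count column.
filter_by_percentile() {
  local table=$1 p=$2 threshold
  threshold=$(
    tail -n +2 "$table" | cut -f2 | sort -n |
      awk -v p="$p" '
        { v[NR] = $1 }
        END {
          if (NR == 0) exit
          rank = int(p * NR); if (rank < p * NR) rank++   # ceil(p * n)
          if (rank < 1) rank = 1
          print v[rank]
        }'
  )
  # Keep the header (line 1) and rows at or above the threshold.
  awk -F'\t' -v t="$threshold" 'NR == 1 || $2 + 0 >= t + 0' "$table"
}
```

For example, `filter_by_percentile ./qc/all_read_count.tsv 0.25` would print the header and the pairs at or above the 25th-percentile read count, in their original order.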