1. QC by Percentile

The qc command performs read-level quality control at the BAM level and generates the input tables required for downstream visualization with qc_visualization().

This step is implemented entirely in bash and is designed to be fast, reproducible, and independent of R. It computes sequencing depth for each CRF–CRF pair and applies a percentile-based filtering strategy to remove low-coverage pairs prior to visualization or modeling.

What this script does:

  • Iterates over all BAM files in a user-specified directory.
  • Uses samtools idxstats to compute total mapped read counts for each BAM file.
  • Writes a complete read count table (all_read_count.tsv) with columns:
    • pair: CRF–CRF pair name (derived from BAM filename)
    • read_count: total mapped reads
  • Calculates a percentile-based threshold on read counts.
  • Filters out CRF pairs below the specified percentile, producing filtered_read_count.tsv.
  • Ensures BAM index files are present before processing to avoid silent failures.

Parameters

Argument Type Default Description Example
-i, –input character Directory containing input BAM files (one per CRF pair) -i ./bam
-o, –output character Output directory for QC tables -o ./qc
-p, –percentile numeric 0.25 Percentile threshold used to filter low-read-count CRF pairs -p 0.1

Output Files

The command generates the following output files in the specified out_dir:

  1. All read counts TSV - all_read_count.tsv
    • Contains read counts for all input files
    • Two columns: file prefix and read count
pair read_count
<chr> <int>
1 H3K27ac-H3K4me3 60573
2 H3K4me3-H3K4me3 940240
3 H3K4me1-H3K4me3 432292
4 H3K4me3-H3K9me3 415540
5 H3K27me3-H3K4me3 574643
6
7 H3K9me2-H3K9me3 201788
  1. Filtered read counts CSV - filtered_read_count.tsv
    • Contains read counts for files passing the percentile filter
    • Excludes samples in the lowest percentile based on filtered_percentile parameter
    • Two columns: file name and read count

Example Usage

# Test Data
for d in ./bam/*/; do
  sample=$(basename "$d")
  multiEpiPrep qc \
    -i "$d" \
    -o "./qc/${sample}" \
    -p 0.25
done