1. QC by Percentile

The qc command performs read-level quality control at the BAM level and generates the input tables required for downstream visualization with qc_visualization().

This step is implemented entirely in bash and is designed to be fast, reproducible, and independent of R. It computes sequencing depth for each CRF-CRF pair and applies a percentile-based filtering strategy to remove low-coverage pairs prior to visualization or modeling.

What this script does:

Iterates over all BAM files in a user-specified directory.
Uses samtools idxstats to compute total mapped read counts for each BAM file.
Writes a complete read count table (all_qc.tsv) with columns:
- pair: CRF-CRF pair name (derived from BAM filename)
- read_count: total mapped reads
Calculates a percentile-based threshold on read counts.
Filters out CRF pairs below the specified percentile, producing filtered_qc.tsv.
Ensures BAM index files are present before processing to avoid silent failures.

Parameters

Argument	Type	Default	Description	Example
`-i, –input`	character	—	Directory containing input BAM files (one per CRF pair)	`-i ./bam`
`-o, –output`	character	—	Output directory for QC tables	`-o ./qc`
`-p, –percentile`	numeric	`0.25`	Percentile threshold used to filter low-read-count CRF pairs	`-p 0.1`
`-e, –exclude`	character	`unknown,IgG_control`	Comma-separated list of CRF names to exclude. Exclusion is performed before read counting. pair mode: split `CRF1-CRF2.bam` by `-`; exclude if either CRF exactly matches any item. single mode: exclude only when the BAM basename exactly matches an item. mixed mode: exclude when any item appears as a substring in the filename (can be overly broad). Passing `-e` with no value disables exclusions.	`-e unknown`

Output Files

The command generates the following output files in the specified out_dir:

All read counts TSV - all_qc.tsv
- Contains read counts for all input files
- Two columns: file prefix and read count

	pair	read_count
	<chr>	<int>
1	H3K27me3-H3K4me1	2207658
2	H3K4me3-H3K4me3	1863056
3	H3K27me3-H3K9me3	1763720
4	H3K9me3-H3K9me3	1593466
5	H3K27me3-H3K4me3	1139048
6	H3K4me1-H3K4me3	856052
7	H3K4me3-H3K9me3	821540
8	H3K4me1-H3K9me3	603640
9	H3K4me1-H3K4me1	567902
10	H3K9me2-H3K9me3	394360
11	H3K27me3-H3K9me2	373104
12	H3K9me2-H3K9me2	222676
13	H3K4me3-H3K9me2	159988
14	H3K27ac-H3K4me3	120056
15	H3K27ac-H3K27me3	85816

Filtered read counts CSV - filtered_qc.tsv
- Contains read counts for files passing the percentile filter
- Excludes samples in the lowest percentile based on filtered_percentile parameter
- Two columns: file name and read count

	pair	read_count
	<chr>	<int>
1	H3K27me3-H3K4me1	2207658
2	H3K4me3-H3K4me3	1863056
3	H3K27me3-H3K9me3	1763720
4	H3K9me3-H3K9me3	1593466
5	H3K27me3-H3K4me3	1139048
6	H3K4me1-H3K4me3	856052
7	H3K4me3-H3K9me3	821540
8	H3K4me1-H3K9me3	603640
9	H3K4me1-H3K4me1	567902
10	H3K9me2-H3K9me3	394360
11	H3K27me3-H3K9me2	373104

Example Usage

# Test Data
BAM_DIR="./bam"
QC_DIR="./qc"

for d in "$BAM_DIR"/*/; do
  [[ -d "$d" ]] || continue
  sample="$(basename "$d")"
  echo "======================"
  echo "$sample"
  echo "======================"

  out="${QC_DIR}/${sample}"
  multiEpiPrep qc -i "$d" -o "$out" -e unknown,IgG_control
  echo ""
done

# Move unpassed files to unpassed/
for d in "$BAM_DIR"/*/; do
  [[ -d "$d" ]] || continue
  sample="$(basename "$d")"
  tsv="${QC_DIR}/${sample}/filtered_read_count.tsv"

  echo "======================"
  echo "$sample"
  echo "======================"

  # create unpassed dir
  unpassed_dir="${d}/unpassed"
  mkdir -p "$unpassed_dir"

  pairs=$(awk -F'\t' '
    NR==1 {
      for (i=1; i<=NF; i++) if ($i=="pair") col=i
      if (!col) { print "ERROR: pair column not found" > "/dev/stderr"; exit 1 }
      next
    }
    $col != "" { print $col }
  ' "$tsv")

  for bam in "$d"/*.bam; do
    [[ -e "$bam" ]] || continue
    pair="$(basename "$bam" .bam)"

    if ! printf '%s\n' "$pairs" | grep -Fxq -- "$pair"; then
      echo "Moving $bam -> $unpassed_dir"
      mv "$bam" "$unpassed_dir/"

      # move index if exists
      [[ -e "${bam}.bai" ]] && mv "${bam}.bai" "$unpassed_dir/"
    fi
  done
done

QC (Optional)

1. QC by Percentile

Parameters

Output Files

Example Usage