The frag_decomposition() function performs a complete
fragment length analysis pipeline including quality control, valley
detection, and visualization for nucleosome positioning analysis from
BAM files.
What this function does:
Performs quality control filtering on BAM files based on coverage percentile
Computes fragment lengths from filtered paired-end BAM files
Detects two local minimum valleys for nucleosome fragment classification
Generates fragment length histograms with valley cutoff lines
Creates summary statistics report (valley positions, min/max fragment lengths)
Produces fragment decomposition data with percentages
Generates bar plots showing fragment distribution across samples
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
file_path |
character vector | Yes | Paths to BAM files to analyze | file_path = c("sample1.bam", "sample2.bam") |
save_dir |
character | Yes | Directory path to save all output files and plots | save_dir = "./fragment_analysis" |
frag_decomp_file |
character | No | Path to generated fragment decomposition file (returned by function) | Auto-generated |
filtered_percentile |
numeric | No | Percentile threshold for quality control filtering (default: 0.25, removes bottom 25%) | filtered_percentile = 0.25 |
dens_reso |
numeric | No | Resolution for kernel density estimation in valley detection (default: 2^15 = 32768) | dens_reso = 2^15 |
target_pair_mapping_df |
data frame | No | Optional data frame for mapping sample names to display names (default: NULL) | target_pair_mapping_df = df |
density_kernel |
character | No | Kernel type for density estimation (default: โgaussianโ) | density_kernel = "gaussian" |
valley1_range |
numeric vector | No | Search range for first valley in bp (default: c(73, 221)) | valley1_range = c(73, 221) |
valley2_range |
numeric vector | No | Search range for second valley in bp (default: c(222, 368)) | valley2_range = c(222, 368) |
The function generates the following output files in the specified
save_dir:
premerge_all-qc_frag_lens.RData
premerge_frag_hist.pdf

summary_report.csv
local_min1: First valley position (bp)local_min2: Second valley position (bp)min_frag_len: Minimum fragment length (bp)max_frag_len: Maximum fragment length (bp)
premerge_frag_hist_with_cutoff.pdf

*_frag_decomp_with_perc.tsv
subnucleo,
monomer, dimer+
save_dir/barplots/*.pdf

library(epigenomeR)
frag_decomposition(file_path = bam_files_vector, save_dir = "output/frag_decomposition")
## Use in bash
## change with your length of scen_list
#SBATCH --array=0-1
module load samtools
# setup (change)
thres_name="valley-V-qc"
scen_list=( "T" "V" ) # input dir name
scen_simple_name_list=( "T" "V" ) # output name
target_list=( "IgG_control" "H3K36me3" "H3K4me1"... )
load_root_dir="./data_align" # directory path for scen_list
save_root_raw_dir="output/frag_decomposition/split_bam"
# main function
run_frag_decomposition() {
scen=${scen_list[${SLURM_ARRAY_TASK_ID}]}
scen_simple_name=${scen_simple_name_list[${SLURM_ARRAY_TASK_ID}]}
save_root_dir=${save_root_raw_dir}/${thres_name}
mkdir -p "${save_root_dir}"
load_dir=${load_root_dir}/${scen}
mixed_dir=${save_root_dir}/${scen_simple_name}_mixed
subneucleo_dir=${save_root_dir}/${scen_simple_name}_sub
mononeucleo_dir=${save_root_dir}/${scen_simple_name}_mono
dineucleo_dir=${save_root_dir}/${scen_simple_name}_di
mkdir -p "${mixed_dir}" "${subneucleo_dir}" "${mononeucleo_dir}" "${dineucleo_dir}"
cp -r ${load_dir}/* "${mixed_dir}/"
for target_1 in "${target_list[@]}"; do
for target_2 in "${target_list[@]}"; do
if [ "${target_1}" \< "${target_2}" ] || [ "${target_1}" = "${target_2}" ]; then
mixed_filename="${target_1}-${target_2}.bam"
mixed_dir_filename=${load_dir}/${mixed_filename}
if [[ ! -f "${mixed_dir_filename}" ]]; then
echo "Warning: ${mixed_dir_filename} not found, skip."
continue
fi
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=24 && $9<=126) || ($9<=-24 && $9>=-126)' | samtools view -b > ${subneucleo_dir}/${mixed_filename}
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=127 && $9<=297) || ($9<=-127 && $9>=-297)' | samtools view -b > ${mononeucleo_dir}/${mixed_filename}
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=298 && $9<=800) || ($9<=-298 && $9>=-800)' | samtools view -b > ${dineucleo_dir}/${mixed_filename}
fi
done
done
}
run_frag_decomposition