1. Fragment Length Analysis Pipeline

The frag_decomposition() function performs a complete fragment length analysis pipeline including quality control, valley detection, and visualization for nucleosome positioning analysis from BAM files.

What this function does:

Performs quality control filtering on BAM files based on coverage percentile
Computes fragment lengths from filtered paired-end BAM files
Detects two local minimum valleys for nucleosome fragment classification
Generates fragment length histograms with valley cutoff lines
Creates summary statistics report (valley positions, min/max fragment lengths)
Produces fragment decomposition data with percentages
Generates bar plots showing fragment distribution across samples

Parameters

Parameter	Type	Required	Description	Example
`file_path`	character vector	Yes	Paths to BAM files to analyze	`file_path = c("sample1.bam", "sample2.bam")`
`save_dir`	character	Yes	Directory path to save all output files and plots	`save_dir = "./fragment_analysis"`
`frag_decomp_file`	character	No	Path to generated fragment decomposition file (returned by function)	Auto-generated
`filtered_percentile`	numeric	No	Percentile threshold for quality control filtering (default: 0.25, removes bottom 25%)	`filtered_percentile = 0.25`
`dens_reso`	numeric	No	Resolution for kernel density estimation in valley detection (default: 2^15 = 32768)	`dens_reso = 2^15`
`target_pair_mapping_df`	data frame	No	Optional data frame for mapping sample names to display names (default: NULL)	`target_pair_mapping_df = df`
`density_kernel`	character	No	Kernel type for density estimation (default: “gaussian”)	`density_kernel = "gaussian"`
`valley1_range`	numeric vector	No	Search range for first valley in bp (default: c(73, 221))	`valley1_range = c(73, 221)`
`valley2_range`	numeric vector	No	Search range for second valley in bp (default: c(222, 368))	`valley2_range = c(222, 368)`

Output Files

The function generates the following output files in the specified save_dir:

Fragment length data - premerge_all-qc_frag_lens.RData
- Cached fragment length data from all filtered samples
- Numeric vector of fragment lengths
Initial histogram - premerge_frag_hist.pdf
- Fragment length distribution histogram without valley cutoffs
- Shows overall fragment length distribution
Summary report - summary_report.csv
- Contains key metrics:
  - local_min1: First valley position (bp)
  - local_min2: Second valley position (bp)
  - min_frag_len: Minimum fragment length (bp)
  - max_frag_len: Maximum fragment length (bp)
Histogram with cutoffs - premerge_frag_hist_with_cutoff.pdf
- Fragment length distribution with vertical lines at valley positions
- Used to visualize nucleosome fragment classification
Fragment decomposition data - *_frag_decomp_with_perc.tsv
- Per-sample fragment counts and percentages
- Classified by valley thresholds: subnucleo, monomer, dimer+
Bar plots - save_dir/barplots/*.pdf
- Visual representation of fragment distribution across samples
- Separate plots for different fragment categories

Example Usage

library(epigenomeR)
frag_decomposition(file_path = bam_files_vector, save_dir = "output/frag_decomposition")

2. Split bam files in bash

## Use in bash
## change with your length of scen_list
#SBATCH --array=0-1

module load samtools

# setup (change)
thres_name="valley-V-qc"
scen_list=( "T" "V" ) # input dir name
scen_simple_name_list=( "T" "V" ) # output name
target_list=( "IgG_control" "H3K36me3" "H3K4me1"... )
load_root_dir="./data_align" # directory path for scen_list
save_root_raw_dir="output/frag_decomposition/split_bam"

# main function
run_frag_decomposition() {
    scen=${scen_list[${SLURM_ARRAY_TASK_ID}]}
    scen_simple_name=${scen_simple_name_list[${SLURM_ARRAY_TASK_ID}]}

    save_root_dir=${save_root_raw_dir}/${thres_name}
    mkdir -p "${save_root_dir}"

    load_dir=${load_root_dir}/${scen}
    mixed_dir=${save_root_dir}/${scen_simple_name}_mixed
    subneucleo_dir=${save_root_dir}/${scen_simple_name}_sub
    mononeucleo_dir=${save_root_dir}/${scen_simple_name}_mono
    dineucleo_dir=${save_root_dir}/${scen_simple_name}_di

    mkdir -p "${mixed_dir}" "${subneucleo_dir}" "${mononeucleo_dir}" "${dineucleo_dir}"
    cp -r ${load_dir}/* "${mixed_dir}/"

    for target_1 in "${target_list[@]}"; do
        for target_2 in "${target_list[@]}"; do
            if [ "${target_1}" \< "${target_2}" ] || [ "${target_1}" = "${target_2}" ]; then
                mixed_filename="${target_1}-${target_2}.bam"
                mixed_dir_filename=${load_dir}/${mixed_filename}

                if [[ ! -f "${mixed_dir_filename}" ]]; then
                    echo "Warning: ${mixed_dir_filename} not found, skip."
                    continue
                fi
                samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=24 && $9<=126) || ($9<=-24 && $9>=-126)' | samtools view -b > ${subneucleo_dir}/${mixed_filename}
                samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=127 && $9<=297) || ($9<=-127 && $9>=-297)' | samtools view -b > ${mononeucleo_dir}/${mixed_filename}
                samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=298 && $9<=800) || ($9<=-298 && $9>=-800)' | samtools view -b > ${dineucleo_dir}/${mixed_filename}
            fi
        done
    done
}

run_frag_decomposition

Fragment Decomposition

1. Fragment Length Analysis Pipeline

Parameters

Output Files

Example Usage

2. Split bam files in bash