1. Adapter Identification and Trimming

The trim command scans merged, demultiplexed paired-end FASTQ files for a user-defined adapter structure, removes the identified adapter and tag sequences from each read, and writes the selected tag values (CB, UMI) into the read comment as SAM-compliant optional fields for downstream processing.

The adapter structure is explicitly defined as:

[cell barcode] - [spacer] - [umi] - [linker]

All four components are interpreted in this fixed order during matching. Depending on the experimental design, each component can be defined either by fixed length or by one or more candidate sequences.

For each detected FASTQ pair, the command performs the following steps:

Reads are processed in paired-end mode from merged demultiplexed FASTQ files, with one FASTQ pair per prefix. Files whose prefixes are listed in --exclude are skipped before adapter identification begins.
The command scans the 5’ region of each read according to the predefined adapter structure, where the expected tag order is always [cell barcode] - [spacer] - [umi] - [linker]. Each component is matched sequentially, and the parser allows user-controlled flexibility through mismatch tolerance and positional laxity. Depending on the library design, the adapter structure is identified on R1 only (with R2 passed through untrimmed) or independently on both R1 and R2, whose results are then combined into a single paired barcode.
Each adapter component can be specified in one of four supported formats: fixed length, fixed sequence, comma-separated candidate sequence list, or @file containing one candidate sequence per line. This allows the same command to support simple fixed-layout designs as well as more complex candidate-based barcode structures.
For sequence-based components, mismatches are allowed according to --error-rate, where the maximum mismatch count is computed as floor(length(tag) * error_rate). In addition, --laxity controls how many bases may be skipped when searching for the next expected tag in the structure.
Once CB and UMI values are identified, they are written as SAM-compliant optional fields (CB:Z:..., UB:Z:...) into the read comment — the portion of the FASTQ header after the first space — leaving the read name (QNAME) itself unchanged. This keeps read names within SAM length limits regardless of adapter structure complexity, and allows downstream alignment (bowtie2 --sam-append-comment) to carry these fields directly into BAM tags without further parsing.
The output consists of trimmed FASTQ files with CB/UMI information recorded in the read comment, written as paired gzipped FASTQ files to the specified output directory.
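The matching steps above can be sketched in bash as follows. This is a hypothetical illustration, not multiEpiPrep's implementation: the function names (`max_mismatches`, `hamming`, `find_tag`, `parse_read`) are invented here, only integer, single-sequence and comma-separated components are handled (no `@file`), mismatches are counted as Hamming distance (no indels), and the first acceptable offset and candidate are taken rather than the best one. With the default rate of 0.1, a 34-nt linker tolerates floor(3.4) = 3 mismatches, whereas a 4-nt cell barcode tolerates floor(0.4) = 0 and must match exactly.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of [cell barcode]-[spacer]-[umi]-[linker] matching on one read.
# Assumptions: Hamming distance only (no indels); first acceptable hit wins.

# floor(length(tag) * error_rate); awk's int() equals floor for non-negative values.
max_mismatches() { awk -v l="$1" -v r="$2" 'BEGIN { print int(l * r) }'; }

# Count mismatching positions between two equal-length sequences.
hamming() {
  local a=$1 b=$2 i n=0
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then n=$(( n + 1 )); fi
  done
  echo "$n"
}

# Try offsets pos..pos+laxity for any comma-separated candidate in spec.
# Prints "offset candidate" for the first hit; returns 1 if nothing matches.
find_tag() {
  local seq=$1 spec=$2 pos=$3 laxity=$4 rate=$5 off cand
  local -a cands
  IFS=, read -ra cands <<< "$spec"
  for (( off = pos; off <= pos + laxity; off++ )); do
    for cand in "${cands[@]}"; do
      (( off + ${#cand} <= ${#seq} )) || continue
      if (( $(hamming "${seq:off:${#cand}}" "$cand") <= $(max_mismatches "${#cand}" "$rate") )); then
        echo "$off $cand"
        return 0
      fi
    done
  done
  return 1
}

# parse_read SEQ CB SP UMI LINKER LAXITY RATE
# Each spec is an integer (fixed length), a sequence or a comma-separated list; "" = absent.
# Prints the CB and UMI values and the part of the read left after the last component.
parse_read() {
  local seq=$1 laxity=$6 rate=$7 pos=0 i spec hit off cand
  local -a specs=("$2" "$3" "$4" "$5")
  local -a values=("" "" "" "")
  for i in 0 1 2 3; do
    spec=${specs[i]}
    [[ -n $spec ]] || continue                 # component not used in this design
    if [[ $spec =~ ^[0-9]+$ ]]; then            # fixed length: copy the bases as-is
      (( pos + spec <= ${#seq} )) || return 1
      values[i]=${seq:pos:spec}
      pos=$(( pos + spec ))
    else                                        # sequence(s): mismatch/laxity-tolerant search
      hit=$(find_tag "$seq" "$spec" "$pos" "$laxity" "$rate") || return 1
      off=${hit%% *}
      cand=${hit#* }
      values[i]=$cand                           # assumption: report the matched candidate
      pos=$(( off + ${#cand} ))
    fi
  done
  echo "CB=${values[0]} UMI=${values[2]} SEQ=${seq:pos}"
}

# Example read: CB | spacer | UMI | shortened linker | remaining insert
example="AAAA""NNNNNNNN""ACGTACGT""GCGATCG""TTTTGGGG"
parse_read "$example" AAAA,CCCC 8 8 GCGATCG 0 0.1
# prints: CB=AAAA UMI=ACGTACGT SEQ=TTTTGGGG
```

Raising the laxity argument lets the linker be found one or more bases downstream of its expected position, which is the behaviour `--laxity` describes; setting the rate to 0 turns every sequence comparison into an exact match.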

Parameters

Argument	Type	Default	Description	Example
`-i, --input`	character	—	Directory containing the merged, demultiplexed paired-end FASTQ files (`.fastq.gz`).	`-i ./demux`
`-o, --output`	character	—	Output directory for the trimmed FASTQ files, with CB/UMI tags written into the read comment.	`-o ./adapter`
`--cb`	integer / character	—	Definition of the cell barcode component of the adapter structure. Supported formats: integer (fixed length); string (fixed sequence); comma-separated sequences (candidate sequence list); `@file` (one candidate sequence per line).	`--cb AAAA,CCCC,GGGG,TTTT`
`--sp`	integer / character	—	Definition of the spacer component of the adapter structure. Supported formats are the same as for `--cb`.	`--sp 8`
`--umi`	integer / character	—	Definition of the UMI component of the adapter structure. Supported formats are the same as for `--cb`.	`--umi 8`
`--linker`	integer / character	—	Definition of the linker component of the adapter structure. Supported formats are the same as for `--cb`.	`--linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG`
`-r, --error-rate`	numeric	`0.1`	Mismatch rate allowed for sequence-based tag matching. The maximum mismatch count is `floor(length(tag) * error_rate)`.	`-r 0.1`
`-l, --laxity`	integer	`0`	Maximum number of bases that may be skipped when searching for the next tag in the adapter structure.	`-l 2`
`-e, --exclude`	character vector	`c("unknown", "IgG_control")`	FASTQ prefixes to skip before adapter identification.	`-e unknown IgG_control`
`-j, --threads`	integer	auto-detect	Number of FASTQ prefixes processed in parallel. If omitted, all available CPU cores are used.	`-j 8`
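To illustrate how the four supported formats for `--cb`, `--sp`, `--umi` and `--linker` differ, the bash sketch below classifies a component value and expands it into candidate sequences. `spec_kind` and `spec_candidates` are hypothetical helpers, not multiEpiPrep's parser, and `barcodes.txt` is an invented file name. How the tool treats blank lines or surrounding whitespace in an `@file` is not documented, so the sketch passes the file through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the four component formats; not multiEpiPrep's parser.

# Classify a component value: integer, @file, comma-separated list, or single sequence.
spec_kind() {
  local spec=$1
  if [[ $spec =~ ^[0-9]+$ ]]; then
    echo "length"        # e.g. --umi 8
  elif [[ $spec == "@"* ]]; then
    echo "file"          # e.g. --cb @barcodes.txt (hypothetical file)
  elif [[ $spec == *,* ]]; then
    echo "list"          # e.g. --cb AAAA,CCCC,GGGG,TTTT
  else
    echo "sequence"      # e.g. --linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG
  fi
}

# Print the candidate sequences for a component, one per line.
# A fixed-length component has no candidates, so nothing is printed.
spec_candidates() {
  local spec=$1
  case $(spec_kind "$spec") in
    length)   ;;
    file)     cat "${spec#@}" ;;          # strip the leading "@" to get the path
    list)     tr ',' '\n' <<< "$spec" ;;
    sequence) printf '%s\n' "$spec" ;;
  esac
}

spec_kind AAAA,CCCC,GGGG,TTTT             # prints: list
spec_candidates AAAA,CCCC,GGGG,TTTT       # prints the four barcodes, one per line
```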

Read Name Annotation

When adapter components are successfully identified, CB and UMI values (if present in the adapter structure) are written as SAM-compliant optional fields into the read comment (the portion of the FASTQ header after the first space), while the read name (QNAME) itself is left unchanged. When the adapter structure is identified on both R1 and R2 (paired-tag mode), the values found independently on each read are combined into a single value per tag before being written.

Input example:
@A00123:45:H3F7MDSX2:1:1101:10000:1000 1:N:0:ATCGTAGC

Output example:
@A00123:45:H3F7MDSX2:1:1101:10000:1000 CB:Z:GGGG-AAAA UB:Z:CCCCCC-TTTTTT
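The header rewrite in this example can be reproduced with plain bash string operations, which is also handy when inspecting trimmed output. This is a sketch, not the tool's code: `annotate_header` and `get_tag` are hypothetical names, and, following the example, the original comment (`1:N:0:ATCGTAGC`) is replaced rather than kept, with tags separated by a single space.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the Input/Output example; not multiEpiPrep's code.

# Keep QNAME (text before the first space) and replace the comment with CB/UMI tags.
# Empty values are skipped, e.g. when the adapter structure has no UMI.
annotate_header() {
  local header=$1 cb=$2 umi=$3
  local out=${header%% *}
  if [[ -n $cb ]]; then out+=" CB:Z:$cb"; fi
  if [[ -n $umi ]]; then out+=" UB:Z:$umi"; fi
  printf '%s\n' "$out"
}

# Print the value of a Z-type tag (e.g. CB or UB) from an annotated header.
get_tag() {
  local header=$1 tag=$2 field
  local -a fields
  read -ra fields <<< "$header"           # split the header on whitespace
  for field in "${fields[@]}"; do
    if [[ $field == "$tag:Z:"* ]]; then
      printf '%s\n' "${field#"$tag:Z:"}"
      return 0
    fi
  done
  return 1
}

h=$(annotate_header "@A00123:45:H3F7MDSX2:1:1101:10000:1000 1:N:0:ATCGTAGC" GGGG-AAAA CCCCCC-TTTTTT)
echo "$h"          # same as the output example above
get_tag "$h" UB    # prints: CCCCCC-TTTTTT
```

On real output, header lines are every fourth line of the decompressed FASTQ, so `zcat OUT_DIR/{name1}-{name2}_R1.trimmed.fastq.gz | awk 'NR % 4 == 1' | head` lists a few annotated headers to check.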

Output Files

The command generates the following output files in the specified OUT_DIR:

OUT_DIR/{name1}-{name2}_R1.trimmed.fastq.gz
OUT_DIR/{name1}-{name2}_R2.trimmed.fastq.gz
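After a run, a quick sanity check is that each R1/R2 output pair holds the same number of records, assuming the command keeps or discards reads as pairs. The bash sketch below is a hypothetical helper (`count_reads` and `check_pairs` are not part of multiEpiPrep); it assumes standard four-line FASTQ records and relies only on the output naming shown above.

```bash
#!/usr/bin/env bash
# Hypothetical post-run check; assumes 4-line FASTQ records and *_R1/_R2.trimmed.fastq.gz names.

# Number of records in a gzipped FASTQ file.
count_reads() {
  local lines
  lines=$(gzip -cd -- "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Report each R1/R2 pair in DIR; return non-zero if a mate is missing or counts differ.
check_pairs() {
  local dir=$1 r1 r2 n1 n2 status=0
  for r1 in "$dir"/*_R1.trimmed.fastq.gz; do
    [[ -e $r1 ]] || continue                  # glob matched nothing: no outputs
    r2=${r1%_R1.trimmed.fastq.gz}_R2.trimmed.fastq.gz
    if [[ ! -e $r2 ]]; then
      echo "missing mate: $r2" >&2
      status=1
      continue
    fi
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    echo "$(basename "$r1" _R1.trimmed.fastq.gz): R1=$n1 R2=$n2"
    if (( n1 != n2 )); then status=1; fi
  done
  return "$status"
}

# Usage: check_pairs ./trim
```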

Example Usage

# 1) Regular Hiplex CUT&Tag
multiEpiPrep trim \
  -i ./demux \
  -o ./trim \
  --linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG,CACCGTCTCCGCCTCAGATGTGTATAAGAGACAG

# 2) UMI-containing Hiplex CUT&Tag
multiEpiPrep trim \
  -i ./demux \
  -o ./trim \
  --sp 8 \
  --umi 8 \
  --linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG,CACCGTCTCCGCCTCAGATGTGTATAAGAGACAG

# 3) Candidate CB sequence list
multiEpiPrep trim \
  -i ./demux \
  -o ./trim \
  --cb AAAA,CCCC,GGGG,TTTT \
  --umi 8 \
  --linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG,CACCGTCTCCGCCTCAGATGTGTATAAGAGACAG

# 4) Strict mode
multiEpiPrep trim \
  -i ./demux \
  -o ./trim \
  -r 0 \
  -l 0

# 5) Batch run: one trim call per sample subdirectory of ./demux
DEMUX_DIR="./demux"
TRIM_DIR="./trim"

for d in "$DEMUX_DIR"/*/; do
  [ -d "$d" ] || continue
  sample=$(basename "$d")
  echo "======================"
  echo "$sample"
  echo "======================"

  out="${TRIM_DIR}/${sample}"
  multiEpiPrep trim -i "$d" -o "$out" -g hg38 --linker GCGATCGAGGACGGCAGATGTGTATAAGAGACAG,CACCGTCTCCGCCTCAGATGTGTATAAGAGACAG
done
