1. Barcode FASTA Generation

The barcode command converts a user-provided barcode spreadsheet into two FASTA files (barcode_fwd.fasta and barcode_rev.fasta) that are used downstream for demultiplexing. The input spreadsheet must contain one column of target names and one column of barcode sequences. Each FASTA header includes the barcode name and sequence, which makes barcodes easy to identify during demultiplexing.

Parameters

Argument	Type	Default	Description	Example
`-i, --input`	character	—	Path to the barcode spreadsheet file. Supported formats include `.xlsx`, `.xls`, `.csv`, and `.tsv`. The file must contain at least one column representing targets (e.g., `target` or `crf`) and one column representing barcode sequences (e.g., `bc`, `barcode`, `sequence`, or `index`). Column names are detected automatically.	`-i ./barcodes.xlsx`
`-o, --output`	character	—	Output directory where generated FASTA files will be written. The directory will be created recursively if it does not exist. Two files will be generated inside this directory: `barcode_fwd.fasta` and `barcode_rev.fasta`. Existing files with the same names will be overwritten.	`-o ./output`

Input Requirements

The input spreadsheet must:
- be provided in `.csv`, `.tsv`, `.xlsx`, or `.xls` format
- contain at least two columns: one for target names (e.g., `target` or `crf`) and one for barcode sequences (e.g., `bc`, `barcode`, `sequence`, or `index`)
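For CSV input, a rough pre-flight check of the header row can catch a missing column before the command is run. The sketch below is only an illustration: the function name `has_barcode_columns` and the case-insensitive exact-match rule are assumptions, and the tool's own column auto-detection may behave differently.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of multiEpiPrep.
# Returns 0 if the first line of a CSV contains both a target-like column
# and a barcode-like column (candidate names taken from the text above).
has_barcode_columns() {
  local csv="$1"
  head -n 1 "$csv" | tr -d '\r"' | tr '[:upper:]' '[:lower:]' | awk -F',' '
    {
      for (i = 1; i <= NF; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim surrounding whitespace
        if ($i == "target" || $i == "crf") t = 1
        if ($i == "bc" || $i == "barcode" || $i == "sequence" || $i == "index") b = 1
      }
    }
    END { exit !(t && b) }
  '
}
```

A call such as `has_barcode_columns ./barcode/barcode.csv || echo "header check failed"` would flag a spreadsheet whose header lacks either column.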

Output Files

The command generates the following output files in the specified OUT_DIR:

OUT_DIR/barcode_fwd.fasta
OUT_DIR/barcode_rev.fasta

Both FASTA files contain identical content and represent the complete set of barcode sequences provided in the input spreadsheet.
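Since the two files are expected to be identical, a quick sanity check after running the command is straightforward. The following is a minimal sketch using only standard tools; the helper name `check_barcode_fastas` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that both FASTA files exist, are non-empty,
# and are byte-for-byte identical, as described above.
check_barcode_fastas() {
  local dir="$1"
  local fwd="${dir}/barcode_fwd.fasta"
  local rev="${dir}/barcode_rev.fasta"
  [[ -s "$fwd" && -s "$rev" ]] || { echo "missing or empty FASTA in ${dir}" >&2; return 1; }
  cmp -s "$fwd" "$rev" || { echo "FASTA files differ" >&2; return 1; }
  # Report the number of barcode records (FASTA header lines start with '>').
  echo "records: $(grep -c '^>' "$fwd")"
}
```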

Example Usage

# Test Data
BARCODE_CSV="./barcode/barcode.csv"
multiEpiPrep barcode -i "$BARCODE_CSV" -o "./barcode"

2. Demultiplex FASTQ by Barcode Combination

The demux command demultiplexes paired-end FASTQ files based on forward and reverse barcode combinations using cutadapt, and generates per-combination FASTQ outputs for downstream alignment.

The script itself is written in bash and contains no Python or R code of its own; it relies on the cutadapt executable for barcode matching. It is designed to be fast and deterministic. It matches barcodes with cutadapt using anchored 5' barcodes on both reads (referred to here as linked-adapter mode), removes empty outputs, and merges symmetric barcode combinations to ensure canonical CRF-CRF ordering.

What this script does:

- Parses required arguments (`--r1`, `--r2`, `--fwd`, `--rev`, `--output`) and optional parameters.
- Auto-detects CPU threads if `--threads` is not provided.
- Validates existence of input FASTQ and barcode FASTA files.
- Ensures output directory is either non-existent or empty (prevents accidental overwrite).
- Runs cutadapt in linked-adapter mode:
  - `-g ^file:<FWD_FASTA>`
  - `-G ^file:<REV_FASTA>`
  - `--no-indels`
  - `--action none`
- Removes empty FASTQ outputs after demultiplexing.
- Detects symmetric combinations (e.g., A-B and B-A):
  - Merges them if both exist.
  - Renames to canonical lexicographic order.
- Produces final per-combination R1/R2 FASTQ files ready for alignment.
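The output-directory check and the cutadapt call from the steps above can be sketched as follows. This is not the actual script: the function names `outdir_is_safe` and `build_cutadapt_cmd` and the array `CUTADAPT_CMD` are hypothetical, and the `-o`/`-p` output templates are an assumption based on the output file names documented for this command. The cutadapt options themselves (`-g`, `-G`, `--no-indels`, `--action`, `-e`, `-j`, `-o`, `-p`) are standard.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the documented steps, not the multiEpiPrep source.

# Succeeds if the directory does not exist or contains no entries.
outdir_is_safe() {
  local dir="$1"
  [[ ! -e "$dir" ]] && return 0
  [[ -d "$dir" ]] || return 1
  [[ -z "$(ls -A "$dir")" ]]
}

# Fills the global array CUTADAPT_CMD with a cutadapt call for the inputs.
# Arguments: R1 R2 FWD_FASTA REV_FASTA OUT_DIR ERROR_RATE THREADS
build_cutadapt_cmd() {
  local r1="$1" r2="$2" fwd="$3" rev="$4" out="$5" err="$6" threads="$7"
  CUTADAPT_CMD=(
    cutadapt
    -j "$threads"
    -e "$err"
    --no-indels                # allow mismatches only, no insertions/deletions
    --action none              # assign reads to barcodes without trimming them
    -g "^file:${fwd}"          # anchored 5' barcodes expected on R1
    -G "^file:${rev}"          # anchored 5' barcodes expected on R2
    -o "${out}/{name1}-{name2}_R1.fastq.gz"
    -p "${out}/{name1}-{name2}_R2.fastq.gz"
    "$r1" "$r2"
  )
}
```

Building the command as an array keeps paths with spaces intact. After `outdir_is_safe "$OUT_DIR"` succeeds and `build_cutadapt_cmd` has been called, the command would be run with `"${CUTADAPT_CMD[@]}"`.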

Parameters

Argument	Type	Default	Description	Example
`-1, --r1`	character	—	Input R1 FASTQ file (`.fastq.gz`).	`-1 raw.R1.fastq.gz`
`-2, --r2`	character	—	Input R2 FASTQ file (`.fastq.gz`).	`-2 raw.R2.fastq.gz`
`-f, --fwd`	character	—	Forward barcode FASTA file.	`-f barcodes_fwd.fa`
`-r, --rev`	character	—	Reverse barcode FASTA file.	`-r barcodes_rev.fa`
`-o, --output`	character	—	Output directory for demultiplexed FASTQ files (must not already contain files).	`-o ./demux_out`
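`-e, --error-rate`	numeric	`0`	Maximum allowed barcode mismatches, passed to cutadapt's `-e` option. Recent cutadapt versions interpret values below 1 as an error rate relative to barcode length and integers of 1 or more as an absolute number of mismatches.	`-e 2`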
`-j, --threads`	integer	auto-detect	Number of threads for cutadapt (falls back to `get_cpu_cores`).	`-j 16`
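The thread fallback could look roughly like the sketch below. The implementation of `get_cpu_cores` is not shown in this document, so `detect_threads` is a hypothetical stand-in that tries common system utilities and defaults to a single thread.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the get_cpu_cores fallback; the real helper may differ.
detect_threads() {
  local n=""
  if command -v nproc >/dev/null 2>&1; then
    n="$(nproc)"                                   # GNU coreutils
  elif command -v getconf >/dev/null 2>&1; then
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)"   # portable fallback
  fi
  if [[ -z "$n" ]] && command -v sysctl >/dev/null 2>&1; then
    n="$(sysctl -n hw.ncpu 2>/dev/null)"           # macOS / BSD
  fi
  # Default to one thread if detection failed or returned something odd.
  [[ "$n" =~ ^[0-9]+$ ]] && (( n > 0 )) || n=1
  echo "$n"
}

# Use a user-supplied value when present, otherwise auto-detect.
THREADS="${THREADS:-$(detect_threads)}"
```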

Output Files

The command generates the following output files in the specified OUT_DIR:

OUT_DIR/{name1}-{name2}_R1.fastq.gz
OUT_DIR/{name1}-{name2}_R2.fastq.gz

Each pair corresponds to a detected barcode combination.

Empty combinations (0 reads after decompression) are automatically removed.
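The check has to look at decompressed content because an empty `.fastq.gz` still occupies a few bytes on disk for its gzip header. A minimal sketch of such a cleanup, with the hypothetical function name `remove_empty_fastq`, might be:

```shell
#!/usr/bin/env bash
# Sketch: delete gzipped FASTQ files that contain no data once decompressed.
# Assumes non-empty FASTQ files start with '@', as valid FASTQ records do.
remove_empty_fastq() {
  local dir="$1" f
  for f in "$dir"/*.fastq.gz; do
    [[ -e "$f" ]] || continue        # the glob matched nothing
    # Read at most one decompressed byte; no byte means zero reads.
    if [[ -z "$(gzip -cd "$f" | head -c 1)" ]]; then
      rm -f "$f"
    fi
  done
}
```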

Symmetric pairs (e.g., A-B and B-A) are merged into a single canonical output (A-B) to ensure consistent downstream processing.
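One way to picture the merge is the sketch below. It is an illustration under stated assumptions, not the script's actual code: the function names are hypothetical, barcode names are assumed to contain no `-`, and merging is done by appending one gzip file to the other, which works because concatenated gzip members form a valid gzip stream.

```shell
#!/usr/bin/env bash
# Sketch of canonical naming and merging; assumes barcode names contain no '-'.

# Print a combination with its two names in lexicographic order, e.g. B-A -> A-B.
canonical_combo() {
  local LC_ALL=C                     # byte-wise comparison for reproducible ordering
  local a="${1%%-*}" b="${1#*-}"
  if [[ "$a" > "$b" ]]; then echo "${b}-${a}"; else echo "${a}-${b}"; fi
}

# Merge or rename every non-canonical file in a directory.
merge_symmetric() {
  local dir="$1" f base combo mate canon target
  for f in "$dir"/*_R[12].fastq.gz; do
    [[ -e "$f" ]] || continue
    base="$(basename "$f" .fastq.gz)"    # e.g. B-A_R1
    combo="${base%_R[12]}"               # e.g. B-A
    mate="${base##*_}"                   # R1 or R2
    canon="$(canonical_combo "$combo")"
    [[ "$canon" == "$combo" ]] && continue
    target="${dir}/${canon}_${mate}.fastq.gz"
    if [[ -e "$target" ]]; then
      cat "$f" >> "$target" && rm -f "$f"   # both orders exist: append, then drop
    else
      mv "$f" "$target"                     # only the reversed order exists: rename
    fi
  done
}
```

Because R1 and R2 files are processed in the same order, the appended reads stay paired across the two merged files.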

Example Usage

# Test Data
FWD_FASTA="./barcode/barcode_fwd.fasta"
REV_FASTA="./barcode/barcode_rev.fasta"
IN_DIR="./fastq"
FASTQ_PREFIX="test"
OUT_DIR="./demultiplex"

mkdir -p "$OUT_DIR"

for d in "$IN_DIR"/*; do
  [[ -d "$d" ]] || continue
  sample="$(basename "$d")"
  fwd_fastq="${d}/${FASTQ_PREFIX}_R1.fastq.gz"
  rev_fastq="${d}/${FASTQ_PREFIX}_R2.fastq.gz"

  multiEpiPrep demux \
    -1 "$fwd_fastq" \
    -2 "$rev_fastq" \
    -f "$FWD_FASTA" \
    -r "$REV_FASTA" \
    -o "${OUT_DIR}/${sample}"
done