1. Barcode FASTA Generation

The barcode command converts a user-provided barcode spreadsheet into two FASTA files (barcode_fwd.fasta and barcode_rev.fasta) that are used downstream for demultiplexing. The input spreadsheet must contain one column of target names and one column of barcode sequences. The output FASTA files are formatted with headers that include the barcode name and sequence for easy identification during demultiplexing.

Parameters

| Argument | Type | Default | Description | Example |
|---|---|---|---|---|
| -i, --input | character | | Path to the barcode spreadsheet file. Supported formats include .xlsx, .xls, .csv, and .tsv. The file must contain at least one column representing targets (e.g., target or crf) and one column representing barcode sequences (e.g., bc, barcode, sequence, or index). Column names are detected automatically. | -i ./barcodes.xlsx |
| -o, --output | character | | Output directory where generated FASTA files will be written. The directory will be created recursively if it does not exist. Two files will be generated inside this directory: barcode_fwd.fasta and barcode_rev.fasta. Existing files with the same names will be overwritten. | -o ./output |

Input Requirements

The input spreadsheet must:

  • be provided in .csv, .tsv, .xlsx, or .xls format
  • contain at least two columns: one for target names (e.g., “target” or “crf”) and one for barcode sequences (e.g., “bc”, “barcode”, “sequence”, or “index”)
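
As a rough illustration of these requirements, the sketch below writes a tiny two-column CSV and checks its header for a target-like and a barcode-like column. The function name check_barcode_header, the sample rows, and the case-insensitive exact-match rule are assumptions made for illustration; the tool's own column auto-detection may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of multiEpiPrep): succeeds when the
# CSV header has one target-like and one barcode-like column name.
check_barcode_header() {
  local csv="$1" header col has_target=0 has_barcode=0
  local -a cols
  # First line only; drop carriage returns and quotes, then lowercase.
  header="$(head -n 1 "$csv" | tr -d '\r"' | tr '[:upper:]' '[:lower:]')"
  IFS=',' read -r -a cols <<< "$header"
  for col in "${cols[@]}"; do
    case "$col" in
      target|crf) has_target=1 ;;
      bc|barcode|sequence|index) has_barcode=1 ;;
    esac
  done
  (( has_target && has_barcode ))
}

# Demo with made-up target names and barcode sequences.
demo_csv="$(mktemp)"
printf 'target,barcode\nH3K4me3,ACGTACGT\nH3K27ac,TGCATGCA\n' > "$demo_csv"
if check_barcode_header "$demo_csv"; then
  echo "header looks usable"
else
  echo "header is missing a target or barcode column" >&2
fi
rm -f "$demo_csv"
```

The check only inspects a comma-separated header, so .tsv and Excel inputs would need a different reader.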

Output Files

The command generates the following output files in the specified OUT_DIR:

  • OUT_DIR/barcode_fwd.fasta
  • OUT_DIR/barcode_rev.fasta

Both FASTA files contain identical content and represent the complete set of barcode sequences provided in the input spreadsheet.
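
A quick way to sanity-check the generated files is sketched below: it confirms that both FASTA files exist, that they are byte-identical as described above, and reports how many records they hold. The helper name check_barcode_fastas is hypothetical and not part of the tool.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the output directory of the barcode command.
check_barcode_fastas() {
  local out_dir="$1"
  local fwd="${out_dir}/barcode_fwd.fasta"
  local rev="${out_dir}/barcode_rev.fasta"
  # Both files must exist and be non-empty.
  if [[ ! -s "$fwd" || ! -s "$rev" ]]; then
    echo "missing or empty FASTA in ${out_dir}" >&2
    return 1
  fi
  # The two files are documented to be identical, so a byte comparison should pass.
  if ! cmp -s "$fwd" "$rev"; then
    echo "forward and reverse FASTA differ" >&2
    return 1
  fi
  # One header line (starting with '>') per barcode record.
  echo "records: $(grep -c '^>' "$fwd")"
}

# Usage: check_barcode_fastas ./barcode
```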

Example Usage

# Test Data
BARCODE_CSV="./barcode/barcode.csv"
multiEpiPrep barcode -i "$BARCODE_CSV" -o "./barcode"

2. Demultiplex FASTQ by Barcode Combination

The demux command demultiplexes paired-end FASTQ files based on forward and reverse barcode combinations using cutadapt, and generates per-combination FASTQ outputs for downstream alignment.

This step is implemented entirely in bash and is designed to be fast and deterministic; apart from cutadapt itself, it requires no Python or R code. It matches barcodes as anchored 5′ adapters using cutadapt's combinatorial demultiplexing (-g ^file: and -G ^file:), removes empty outputs, and merges symmetric barcode combinations to ensure canonical CRF-CRF ordering.

What this script does:

  • Parses required arguments (--r1, --r2, --fwd, --rev, --output) and optional parameters.
  • Auto-detects CPU threads if --threads is not provided.
  • Validates existence of input FASTQ and barcode FASTA files.
  • Ensures output directory is either non-existent or empty (prevents accidental overwrite).
  • Runs cutadapt with anchored 5′ barcode adapters read from the two FASTA files:
    • -g ^file:<FWD_FASTA>
    • -G ^file:<REV_FASTA>
    • --no-indels
    • --action none
  • Removes empty FASTQ outputs after demultiplexing.
  • Detects symmetric combinations (e.g., A-B and B-A):
    • Merges them if both exist.
    • Renames to canonical lexicographic order.
  • Produces final per-combination R1/R2 FASTQ files ready for alignment.
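
The cutadapt call described in the list above can be sketched roughly as follows. The helper build_cutadapt_cmd and its positional arguments are hypothetical, and the option order is an assumption; only the flags themselves (-g, -G, --no-indels, --action none, -e, -j, and the {name1}-{name2} output templates) come from this document and from cutadapt's documented interface. The function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: assemble (and print) a cutadapt call from the options listed
# in this section. Not the script's actual code.
build_cutadapt_cmd() {
  local r1="$1" r2="$2" fwd="$3" rev="$4" out="$5"
  local err="${6:-0}" threads="${7:-}"
  # Fall back to the number of online CPUs when no thread count is given.
  if [[ -z "$threads" ]]; then
    threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  fi
  local -a cmd=(
    cutadapt
    -j "$threads"
    -e "$err"
    --no-indels
    --action none
    -g "^file:${fwd}"
    -G "^file:${rev}"
    -o "${out}/{name1}-{name2}_R1.fastq.gz"
    -p "${out}/{name1}-{name2}_R2.fastq.gz"
    "$r1" "$r2"
  )
  # Print the command on one line; run it with "${cmd[@]}" instead.
  echo "${cmd[*]}"
}

build_cutadapt_cmd raw.R1.fastq.gz raw.R2.fastq.gz barcodes_fwd.fa barcodes_rev.fa ./demux_out 0 16
```

Keeping the arguments in an array avoids quoting problems with the ^file: prefix and the brace placeholders, which cutadapt expands itself.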

Parameters

| Argument | Type | Default | Description | Example |
|---|---|---|---|---|
| -1, --r1 | character | | Input R1 FASTQ file (.fastq.gz). | -1 raw.R1.fastq.gz |
| -2, --r2 | character | | Input R2 FASTQ file (.fastq.gz). | -2 raw.R2.fastq.gz |
| -f, --fwd | character | | Forward barcode FASTA file. | -f barcodes_fwd.fa |
| -r, --rev | character | | Reverse barcode FASTA file. | -r barcodes_rev.fa |
| -o, --output | character | | Output directory for demultiplexed FASTQ files (must not already contain files). | -o ./demux_out |
| -e, --error-rate | numeric | 0 | Maximum allowed barcode mismatches, passed to cutadapt's -e option. cutadapt interprets values below 1 as an error rate and integers of 1 or more as an absolute number of errors. | -e 2 |
| -j, --threads | integer | auto-detect | Number of threads for cutadapt. If omitted, the count is detected via get_cpu_cores. | -j 16 |

Output Files

The command generates the following output files in the specified OUT_DIR:

  • OUT_DIR/{name1}-{name2}_R1.fastq.gz
  • OUT_DIR/{name1}-{name2}_R2.fastq.gz

Each pair corresponds to a detected barcode combination.

Empty combinations (0 reads after decompression) are automatically removed.
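
A gzip file with no reads is still a few bytes on disk, so emptiness has to be tested on the decompressed content. The sketch below shows one way to do that; the helper name remove_empty_fastq is hypothetical, and the script's own check may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete .fastq.gz files that decompress to nothing.
remove_empty_fastq() {
  local dir="$1" f
  for f in "$dir"/*.fastq.gz; do
    [[ -e "$f" ]] || continue   # glob matched nothing
    # Read at most one decompressed byte; no byte means zero reads.
    if [[ -z "$(gzip -cd "$f" | head -c 1)" ]]; then
      rm -f "$f"
    fi
  done
}

# Usage: remove_empty_fastq "$OUT_DIR"
```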

Symmetric pairs (e.g., A-B and B-A) are merged into a single canonical output (A-B) to ensure consistent downstream processing.
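
The merge-and-rename step can be sketched as below. This is a minimal illustration, not the script's actual code: the helper name merge_symmetric is hypothetical, and it assumes barcode names contain no hyphen so that the file name splits cleanly at the first "-". It relies on the fact that concatenated gzip files form a valid gzip file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fold B-A outputs into canonical A-B outputs.
merge_symmetric() {
  local dir="$1" f base read_tag a b canon
  for f in "$dir"/*-*_R[12].fastq.gz; do
    [[ -e "$f" ]] || continue
    base="$(basename "$f")"
    read_tag="${base##*_}"   # e.g. R1.fastq.gz
    base="${base%_*}"        # e.g. B-A
    a="${base%%-*}"          # name before the first hyphen
    b="${base#*-}"           # name after the first hyphen
    # Already canonical (or a self-pair such as A-A): leave it alone.
    [[ "$a" > "$b" ]] || continue
    canon="${dir}/${b}-${a}_${read_tag}"
    if [[ -e "$canon" ]]; then
      # Append the swapped file to the canonical one, then drop it.
      cat "$f" >> "$canon" && rm -f "$f"
    else
      # Only the swapped order exists: rename it.
      mv "$f" "$canon"
    fi
  done
}

# Usage: merge_symmetric "$OUT_DIR"
```

Because R1 and R2 files are processed with the same rule, reads appended to A-B_R1 and A-B_R2 stay in the same order and remain paired.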

Example Usage

# Test Data
FWD_FASTA="./barcode/barcode_fwd.fasta"
REV_FASTA="./barcode/barcode_rev.fasta"
IN_DIR="./fastq"
FASTQ_PREFIX="test"
OUT_DIR="./demultiplex"

mkdir -p "$OUT_DIR"

for d in "$IN_DIR"/*; do
  [[ -d "$d" ]] || continue
  sample="$(basename "$d")"
  fwd_fastq="${d}/${FASTQ_PREFIX}_R1.fastq.gz"
  rev_fastq="${d}/${FASTQ_PREFIX}_R2.fastq.gz"

  multiEpiPrep demux \
    -1 "$fwd_fastq" \
    -2 "$rev_fastq" \
    -f "$FWD_FASTA" \
    -r "$REV_FASTA" \
    -o "${OUT_DIR}/${sample}"
done