The `barcode` command converts a user-provided barcode spreadsheet into two FASTA files (`barcode_fwd.fasta` and `barcode_rev.fasta`) that are used downstream for demultiplexing. The input spreadsheet must contain a column of target names and a column of barcode sequences. The FASTA headers include the barcode name and sequence so that each barcode is easy to identify during demultiplexing.

| Argument | Type | Default | Description | Example |
|---|---|---|---|---|
| `-i`, `--input` | character | none | Path to the barcode spreadsheet file. Supported formats are `.xlsx`, `.xls`, `.csv`, and `.tsv`. The file must contain at least one column of target names (e.g., `target` or `crf`) and one column of barcode sequences (e.g., `bc`, `barcode`, `sequence`, or `index`). Column names are detected automatically. | `-i ./barcodes.xlsx` |
| `-o`, `--output` | character | none | Output directory for the generated FASTA files; it is created recursively if it does not exist. Two files are written inside it, `barcode_fwd.fasta` and `barcode_rev.fasta`; existing files with the same names are overwritten. | `-o ./output` |

Input requirements: the input spreadsheet must

- be provided in `.csv`, `.tsv`, `.xlsx`, or `.xls` format;
- contain at least two columns: one for target names (e.g., `target` or `crf`) and one for barcode sequences (e.g., `bc`, `barcode`, `sequence`, or `index`).

The command generates the following files in the specified output directory (`OUT_DIR`):

- `OUT_DIR/barcode_fwd.fasta`
- `OUT_DIR/barcode_rev.fasta`

Both FASTA files have identical content and contain the complete set of barcode sequences from the input spreadsheet.
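
The spreadsheet-to-FASTA conversion can be pictured with a minimal sketch. It is not the tool's implementation: it handles only comma-separated files (no `.xlsx`, `.xls`, or `.tsv`), matches column names case-insensitively against the example names listed above, and the function name `csv_to_fasta` is hypothetical. It also writes only the target name as each FASTA header, whereas the real headers may include the sequence as well.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the spreadsheet -> FASTA step (CSV input only).
# Usage: csv_to_fasta <sheet.csv> <out_dir>
csv_to_fasta() {
  local csv="$1" out_dir="$2"
  mkdir -p "$out_dir"
  awk -F',' '
    NR == 1 {
      # Locate the target and barcode columns by lower-cased header name,
      # ignoring quotes, spaces and Windows carriage returns.
      for (i = 1; i <= NF; i++) {
        name = tolower($i); gsub(/[" \r]/, "", name)
        if (name == "target" || name == "crf") t = i
        if (name == "bc" || name == "barcode" || name == "sequence" || name == "index") s = i
      }
      if (!t || !s) { print "missing target or barcode column" > "/dev/stderr"; exit 1 }
      next
    }
    {
      id = $t; seq = toupper($s)
      gsub(/[" \r]/, "", id); gsub(/[" \r]/, "", seq)
      # Skip rows with an empty name or sequence.
      if (id != "" && seq != "") printf ">%s\n%s\n", id, seq
    }
  ' "$csv" > "$out_dir/barcode_fwd.fasta" || return 1
  # Both files carry identical content, as documented above.
  cp "$out_dir/barcode_fwd.fasta" "$out_dir/barcode_rev.fasta"
}
```

For example, `csv_to_fasta ./barcode/barcode.csv ./barcode` would write both FASTA files into `./barcode`, overwriting any existing copies.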
```bash
# Test Data
BARCODE_CSV="./barcode/barcode.csv"
multiEpiPrep barcode -i "$BARCODE_CSV" -o "./barcode"
```

The `demux` command demultiplexes paired-end FASTQ files by forward and reverse barcode combination using cutadapt, and writes one FASTQ pair per combination for downstream alignment.

This step is implemented entirely in bash and is designed to be fast, deterministic, and independent of Python or R. It matches barcodes as anchored 5' adapters on R1 and R2 (cutadapt's combinatorial demultiplexing), removes empty outputs, and merges symmetric barcode combinations so that every CRF-CRF pair has a single canonical order.

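What this script does:

- Parses the required arguments (`--r1`, `--r2`, `--fwd`, `--rev`, `--output`) and optional parameters.
- Detects the number of CPU cores when `--threads` is not provided.
- Runs cutadapt with `-g ^file:<FWD_FASTA>`, `-G ^file:<REV_FASTA>`, `--no-indels`, and `--action none`, so barcodes are matched without insertions or deletions and reads are assigned to a combination but left untrimmed.

| Argument | Type | Default | Description | Example |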
|---|---|---|---|---|
| `-1`, `--r1` | character | none | Input R1 FASTQ file (`.fastq.gz`). | `-1 raw.R1.fastq.gz` |
| `-2`, `--r2` | character | none | Input R2 FASTQ file (`.fastq.gz`). | `-2 raw.R2.fastq.gz` |
| `-f`, `--fwd` | character | none | Forward barcode FASTA file. | `-f barcodes_fwd.fa` |
| `-r`, `--rev` | character | none | Reverse barcode FASTA file. | `-r barcodes_rev.fa` |
| `-o`, `--output` | character | none | Output directory for demultiplexed FASTQ files (must not already contain files). | `-o ./demux_out` |
| `-e`, `--error-rate` | numeric | `0` | Maximum allowed barcode mismatches, passed to cutadapt's `-e`. Cutadapt reads values of 1 or more as an absolute number of errors and values below 1 as an error rate; `0` requires an exact match. | `-e 2` |
| `-j`, `--threads` | integer | auto-detect | Number of threads for cutadapt; if omitted, the value from `get_cpu_cores` is used. | `-j 16` |
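
The cutadapt call summarized in the list above can be sketched as a dry run that assembles the argument list without executing it. Assumptions: `detect_cores` is a hypothetical stand-in for the tool's `get_cpu_cores` (whose implementation is not shown), `build_demux_cmd` and the `DEMUX_CMD` array are illustrative names, and the `-o`/`-p` templates follow the output naming documented below, using cutadapt's `{name1}`/`{name2}` placeholders for combinatorial demultiplexing.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for get_cpu_cores: try nproc (GNU coreutils),
# then getconf, then sysctl (macOS/BSD), and fall back to 1.
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    echo 1
  fi
}

# Build (but do not run) a cutadapt command that matches anchored 5'
# barcodes on R1 (-g) and R2 (-G) and writes one file pair per combination.
# Usage: build_demux_cmd R1 R2 FWD_FASTA REV_FASTA OUT_DIR [ERR] [THREADS]
build_demux_cmd() {
  local r1="$1" r2="$2" fwd="$3" rev="$4" out="$5"
  local err="${6:-0}" threads="${7:-}"
  [[ -n "$threads" ]] || threads="$(detect_cores)"
  DEMUX_CMD=(
    cutadapt
    -e "$err"
    -j "$threads"
    --no-indels          # mismatches only, no insertions/deletions
    --action none        # match barcodes but leave reads untrimmed
    -g "^file:${fwd}"
    -G "^file:${rev}"
    -o "${out}/{name1}-{name2}_R1.fastq.gz"
    -p "${out}/{name1}-{name2}_R2.fastq.gz"
    "$r1" "$r2"
  )
}
```

After `build_demux_cmd raw.R1.fastq.gz raw.R2.fastq.gz barcodes_fwd.fa barcodes_rev.fa ./demux_out 0 16`, `echo "${DEMUX_CMD[*]}"` prints the command and `"${DEMUX_CMD[@]}"` runs it (cutadapt must be on `PATH`). Keeping the command in an array preserves each argument intact, including the literal `{name1}`/`{name2}` braces.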

The command generates the following files in the specified output directory (`OUT_DIR`):

- `OUT_DIR/{name1}-{name2}_R1.fastq.gz`
- `OUT_DIR/{name1}-{name2}_R2.fastq.gz`

Each pair corresponds to one detected barcode combination, where `{name1}` and `{name2}` are the forward and reverse barcode names.
Empty combinations (0 reads after decompression) are automatically removed.
Symmetric pairs (e.g., A-B and B-A) are
merged into a single canonical output (A-B) to ensure
consistent downstream processing.
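
The two clean-up steps above can be sketched as follows. Since the tool's own logic is not documented here, this rests on assumptions: barcode names contain no `-`, the canonical order is plain string sort order (so `CRF2-CRF1` folds into `CRF1-CRF2`), a combination counts as empty when its R1 file decompresses to nothing, and `is_empty_gz` and `postprocess_demux` are hypothetical names.

```bash
#!/usr/bin/env bash
# True if a gzipped file decompresses to zero bytes (i.e. zero reads).
is_empty_gz() {
  (( $(gzip -cd "$1" | head -c 1 | wc -c) == 0 ))
}

# Hypothetical post-processing of {name1}-{name2}_R1/_R2.fastq.gz files.
postprocess_demux() {
  local dir="$1" r1 base a b canon mate
  # Step 1: drop combinations whose R1 is empty, together with their R2.
  for r1 in "$dir"/*_R1.fastq.gz; do
    [[ -e "$r1" ]] || continue          # glob matched nothing
    if is_empty_gz "$r1"; then
      rm -f "$r1" "${r1%_R1.fastq.gz}_R2.fastq.gz"
    fi
  done
  # Step 2: fold X-Y into Y-X whenever X sorts after Y.
  for r1 in "$dir"/*_R1.fastq.gz; do
    [[ -e "$r1" ]] || continue
    base="$(basename "$r1" _R1.fastq.gz)"
    a="${base%%-*}"
    b="${base#*-}"
    [[ "$a" > "$b" ]] || continue       # already canonical (or A-A)
    canon="${b}-${a}"
    for mate in R1 R2; do
      # Concatenated gzip members form a valid gzip stream.
      cat "${dir}/${base}_${mate}.fastq.gz" >> "${dir}/${canon}_${mate}.fastq.gz"
      rm -f "${dir}/${base}_${mate}.fastq.gz"
    done
  done
}
```

This sketch appends the `B-A` mates to the `A-B` files unchanged. An alternative would be to swap R1 and R2 so that R1 always carries the first barcode; which behavior the tool uses is not stated.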
```bash
# Test Data
FWD_FASTA="./barcodes_fwd.fasta"
REV_FASTA="./barcodes_rev.fasta"
IN_DIR="./fastq"          # one subdirectory per sample
FASTQ_PREFIX="test"       # files are named <prefix>_R1/_R2.fastq.gz
OUT_DIR="./demultiplex"

mkdir -p "$OUT_DIR"

# Demultiplex each sample into its own new, empty output directory.
for d in "$IN_DIR"/*; do
  [[ -d "$d" ]] || continue   # skip anything that is not a directory
  sample="$(basename "$d")"
  fwd_fastq="${d}/${FASTQ_PREFIX}_R1.fastq.gz"
  rev_fastq="${d}/${FASTQ_PREFIX}_R2.fastq.gz"
  multiEpiPrep demux \
    -1 "$fwd_fastq" \
    -2 "$rev_fastq" \
    -f "$FWD_FASTA" \
    -r "$REV_FASTA" \
    -o "${OUT_DIR}/${sample}"
done
```
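
After the loop above finishes, a per-sample read count can confirm that each barcode combination received reads. This is a hypothetical helper, not part of multiEpiPrep; it assumes the `${OUT_DIR}/<sample>/{name1}-{name2}_R1.fastq.gz` layout produced by the loop and counts four lines per FASTQ record.

```bash
#!/usr/bin/env bash
# Number of reads in a gzipped FASTQ file (4 lines per record).
count_reads() {
  local n
  n="$(gzip -cd "$1" | wc -l)"
  echo $(( n / 4 ))
}

# Print a tab-separated table: sample, barcode combination, read count.
summarize_demux() {
  local out_dir="$1" d r1
  printf 'sample\tcombination\treads\n'
  for d in "$out_dir"/*/; do
    [[ -d "$d" ]] || continue
    for r1 in "$d"*_R1.fastq.gz; do
      [[ -e "$r1" ]] || continue      # sample directory has no outputs
      printf '%s\t%s\t%s\n' \
        "$(basename "$d")" \
        "$(basename "$r1" _R1.fastq.gz)" \
        "$(count_reads "$r1")"
    done
  done
}
```

For example, `summarize_demux ./demultiplex` prints one row per sample and barcode combination.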