Pakka treats pak entry names as attacker-controlled data and runs every extract, add, and verify through the same fail-fast validator. This document is the byte-level reference for what the validator rejects and why; the README has the short summary.
The validator (pakka_unsafe_entry_name) is format-independent — it
inspects the entry-name byte content, not the format-specific field
width — so a name that's legal in the on-disk directory can still be
rejected at extract or add time. The same rules apply to the entry-name
side of pakka_add_file / --as, so a pak built with pakka can be
re-extracted with pakka without surprises.
- Empty names.
- Absolute paths. Names beginning with `/` or `\`.
- Drive-letter prefixes. `C:...`, `D:...`, ...
- UNC paths. `\\server\share\...`.
- Parent-directory components. Any path component that is exactly `..` (slash- or backslash-separated). Substring matches like `foo..bar` or `..png` remain legal; only an exact `..` component is rejected.
- Colon in a component. Windows alternate-data-stream hazard (`file:stream`).
- Trailing dot or space in a component. Windows silently strips these, so `foo.` and `foo` would collide.
- Control bytes. `0x00`–`0x1F` or `0x7F` anywhere in the name.
- Windows reserved device names. `CON`, `PRN`, `AUX`, `NUL`, `COM0`–`COM9`, `LPT0`–`LPT9`, including with extensions (`NUL.txt`) and in subdirectories (`foo/CON`). Recent Windows versions accept `COM0` and `LPT0`, but pakka rejects them for safety. `COM10` and beyond are allowed; only the single-digit forms are reserved.
Validation runs before any mkdir, fread, or fopen, so a malicious
pak fails fast with no partial writes. Both POSIX and Windows traversal
forms are checked regardless of host OS, since pak archives are
portable.
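The rejection rules above can be sketched as a single pass over the name bytes. This is a minimal illustration, not pakka's actual code: the function name `sketch_unsafe_entry_name`, its `(name, len)` signature, and both helpers are hypothetical, and the case-insensitive device-name match is an assumption based on how Windows treats those names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the component's base name (the part before the
 * first '.') match a reserved device name? Assumed case-insensitive. */
static int is_reserved_device(const char *comp, size_t len)
{
    static const char *const fixed[] = { "CON", "PRN", "AUX", "NUL" };
    char base[5];
    size_t n = 0;

    while (n < len && comp[n] != '.')
        n++;
    if (n < 3 || n > 4)
        return 0;                       /* COM10 and longer fall out here */
    for (size_t i = 0; i < n; i++)
        base[i] = (char)toupper((unsigned char)comp[i]);
    base[n] = '\0';

    if (n == 3) {
        for (size_t i = 0; i < 4; i++)
            if (strcmp(base, fixed[i]) == 0)
                return 1;
        return 0;
    }
    /* COM0-COM9 and LPT0-LPT9: exactly one trailing digit. */
    return (strncmp(base, "COM", 3) == 0 || strncmp(base, "LPT", 3) == 0)
        && isdigit((unsigned char)base[3]);
}

/* Hypothetical helper: per-component rules. Empty components are left
 * alone here because the rule list does not mention them. */
static int unsafe_component(const char *comp, size_t len)
{
    if (len == 2 && comp[0] == '.' && comp[1] == '.')
        return 1;                       /* exact ".." */
    if (len > 0 && (comp[len - 1] == '.' || comp[len - 1] == ' '))
        return 1;                       /* trailing dot or space */
    for (size_t i = 0; i < len; i++)
        if (comp[i] == ':')
            return 1;                   /* alternate-data-stream hazard */
    return is_reserved_device(comp, len);
}

/* Returns 1 if the name should be rejected, 0 if it passes every rule.
 * Takes an explicit length so an embedded 0x00 byte can be detected. */
int sketch_unsafe_entry_name(const char *name, size_t len)
{
    size_t start = 0;

    assert(name != NULL || len == 0);
    if (len == 0)
        return 1;                       /* empty name */
    if (name[0] == '/' || name[0] == '\\')
        return 1;                       /* absolute path or UNC prefix */
    if (len >= 2 && isalpha((unsigned char)name[0]) && name[1] == ':')
        return 1;                       /* drive-letter prefix */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c < 0x20 || c == 0x7F)
            return 1;                   /* control byte */
    }
    /* Split on both separators regardless of host OS. */
    for (size_t i = 0; i <= len; i++) {
        if (i == len || name[i] == '/' || name[i] == '\\') {
            if (unsafe_component(name + start, i - start))
                return 1;
            start = i + 1;
        }
    }
    return 0;
}
```

A caller in this sketch would run the check over every selected entry first and only then begin creating directories or opening files, which is what keeps a rejected archive from leaving partial output behind.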
Three checks run beyond the entry-name validator when bytes would touch disk:
Before any byte is written, pakka sorts the selected entries'
normalized names — case-folded, with \ mapped to /, and trailing
dot or space stripped — and rejects the whole extraction if two entries
would materialize to the same path on Windows or HFS+.
Without this check, two entries that differ only in case or in a trailing dot would silently overwrite each other on case-insensitive filesystems.
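The preflight described above can be sketched as normalize, sort, then compare neighbours. The names `normalize_name` and `sketch_has_collision` are hypothetical, case folding is ASCII-only here, and stripping trailing dots and spaces from every component (rather than only the last) is an assumption; pakka's real normalization may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Normalize in place: '\' becomes '/', ASCII letters are lowercased, and
 * trailing dots/spaces are dropped from each component (assumption). */
static void normalize_name(char *s)
{
    size_t w = 0;

    for (size_t r = 0; ; r++) {
        char c = s[r];
        if (c == '\\')
            c = '/';
        if (c == '/' || c == '\0') {
            /* A component ends here: trim what Windows would strip. */
            while (w > 0 && (s[w - 1] == '.' || s[w - 1] == ' '))
                w--;
            if (c == '\0')
                break;
        }
        s[w++] = (char)tolower((unsigned char)c);
    }
    s[w] = '\0';
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Returns 1 if two names normalize to the same path, 0 if none do,
 * and -1 on allocation failure. The input names are not modified. */
int sketch_has_collision(const char *const *names, size_t count)
{
    char **norm;
    int found = 0;
    size_t i;

    assert(names != NULL || count == 0);
    if (count == 0)
        return 0;
    norm = calloc(count, sizeof *norm);
    if (norm == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        size_t n = strlen(names[i]) + 1;
        norm[i] = malloc(n);
        if (norm[i] == NULL) {
            found = -1;
            goto done;
        }
        memcpy(norm[i], names[i], n);
        normalize_name(norm[i]);
    }
    /* After sorting, any duplicates sit next to each other. */
    qsort(norm, count, sizeof *norm, cmp_str);
    for (i = 1; i < count; i++) {
        if (strcmp(norm[i - 1], norm[i]) == 0) {
            found = 1;
            break;
        }
    }
done:
    for (i = 0; i < count; i++)
        free(norm[i]);          /* free(NULL) is a no-op for unfilled slots */
    free(norm);
    return found;
}
```

Sorting first keeps the check at O(n log n) comparisons instead of comparing every pair of entries.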
Every directory descent into the -C destination tree, and every leaf
open, refuses to cross a symlink or reparse point:
- Modern POSIX: `openat` with `O_NOFOLLOW` on each path component.
- Legacy POSIX (pre-`openat`, BSDs below the floors documented in `platforms.md`): an `fchdir`-based emulation that re-`lstat`s each component.
- Windows: a `GetFileAttributesA`-based reparse check.
A planted symlink inside the destination directory cannot redirect a write outside it.
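For the modern POSIX path, the per-component descent might look like the sketch below. The function names are hypothetical, the relative path is assumed to have already passed the entry-name validator and to use `/` separators, and the legacy POSIX and Windows variants are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Walk `rel` beneath root_fd one component at a time, creating missing
 * directories. Returns an fd for the final directory, or -1 if any
 * component is a symlink or cannot be opened as a directory. */
int sketch_descend_nofollow(int root_fd, const char *rel)
{
    char comp[256];
    int dir;

    assert(rel != NULL);
    dir = dup(root_fd);
    while (dir >= 0 && *rel != '\0') {
        size_t n = strcspn(rel, "/");
        int next;

        if (n == 0) {               /* skip empty components */
            rel++;
            continue;
        }
        if (n >= sizeof comp) {
            close(dir);
            return -1;
        }
        memcpy(comp, rel, n);
        comp[n] = '\0';
        rel += n;

        if (mkdirat(dir, comp, 0777) != 0 && errno != EEXIST) {
            close(dir);
            return -1;
        }
        /* O_NOFOLLOW makes the open fail if comp is a symlink, even one
         * planted after the mkdirat call above. */
        next = openat(dir, comp, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
        close(dir);
        dir = next;
    }
    return dir;
}

/* Open the leaf for writing inside an already-vetted directory fd. */
int sketch_open_leaf_nofollow(int dir_fd, const char *leaf)
{
    return openat(dir_fd, leaf,
                  O_WRONLY | O_CREAT | O_TRUNC | O_NOFOLLOW, 0666);
}
```

Because every step is relative to a directory file descriptor rather than a path string, a symlink swapped in between steps cannot redirect a later step.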
When pakka -a recurses into a directory tree, any symlink or reparse
point found inside is reported and skipped rather than silently
followed. This mirrors the write-time protection: the
archive can never gain entries materialized from a redirected path.
For archives with compressed entries,
pakka_set_max_decompressed_size(archive, max_bytes, err) caps the
bytes any single pakka_open_entry or pakka_read_entry_alloc will
inflate. The default is 64 MiB; pass 0 to disable the cap. The cap rejects
zip-bomb-style high-ratio entries before they are inflated into memory.
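One way such a cap could be enforced is a running byte budget that the inflate loop consults before accepting each chunk. The struct, macro, and function names below are hypothetical and say nothing about how pakka stores the limit internally; only the 64 MiB default and the "0 disables" convention come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DEFAULT_MAX_DECOMPRESSED (64u * 1024u * 1024u) /* 64 MiB */

/* Hypothetical per-entry budget. max_bytes == 0 means "no cap". */
struct sketch_inflate_budget {
    uint64_t max_bytes;
    uint64_t produced;
};

/* Cheap early exit: refuse an entry whose declared size already exceeds
 * the cap, before any buffer is allocated. Returns 1 if acceptable. */
int sketch_declared_size_ok(uint64_t max_bytes, uint64_t declared)
{
    return max_bytes == 0 || declared <= max_bytes;
}

/* Account for `chunk` freshly inflated bytes. Returns 0 while within the
 * cap and -1 once the chunk would exceed it. The subtraction cannot
 * underflow because produced never exceeds a non-zero max_bytes. */
int sketch_budget_consume(struct sketch_inflate_budget *b, size_t chunk)
{
    assert(b != NULL);
    if (b->max_bytes != 0 && chunk > b->max_bytes - b->produced)
        return -1;
    b->produced += chunk;
    return 0;
}
```

Checking the running total as well as the declared size matters because a hostile directory can understate an entry's uncompressed length.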
pakka --verify runs the entry-name validator and the
normalized-collision preflight against every entry, streams every
payload to confirm the directory's offset / length point at
readable bytes, and exits non-zero on any error-level finding. None of
those checks touch the destination filesystem, so an archive can be
audited safely before extraction.
Pass PAKKA_VERIFY_DEEP to pakka_verify (library) for per-entry
CRC32 and decompression checks on top of the structural walk — CRC for
PK3 / PK4; byte-count check for Daikatana, whose custom codec has no
CRC.
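PK3 and PK4 files are ZIP archives, so the per-entry checksum is presumably the standard ZIP CRC-32 (reflected polynomial `0xEDB88320`). The sketch below is a plain bitwise implementation of that checksum for illustration; the name `sketch_crc32` is hypothetical, and a real implementation would more likely use a table-driven or library routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 as used by ZIP. Pass 0 as `crc` for the first chunk and
 * the previous return value for each later chunk, so a payload can be
 * checked while it is streamed. */
uint32_t sketch_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++) {
            /* If the low bit is set, shift and fold in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

A deep verify in this style would feed each decompressed chunk through the function and compare the final value with the CRC stored in the archive directory. The well-known check value for the ASCII string `123456789` is `0xCBF43926`.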