Low-latency XML DOM parsing for Zig with comptime-specialized parse modes and an in-tree benchmark/conformance harness.
- Single-pass XML parsing over []const u8 input.
- DOM layout backed by contiguous node/attribute arrays and span slices into source bytes.
- Comptime parse configuration via Document.parse(input, .{ ... }).
- Two parser profiles: strict and turbo.
- Raw borrowed accessors plus allocator-backed decoded helpers for text and attribute values.
- In-tree conformance suites and external parser benchmark harness.
Source: bench/results/latest.json (quick profile).
stream-turbo │████████████████████│ 3725.24 MB/s (100.00%)
stream-strict │███████████████████░│ 3577.71 MB/s (96.04%)
ours-turbo │█████████████████░░░│ 3077.73 MB/s (82.62%)
ours-strict │████████████████░░░░│ 2942.62 MB/s (78.99%)
pugixml │████████░░░░░░░░░░░░│ 1455.80 MB/s (39.08%)
rapidxml │███████░░░░░░░░░░░░░│ 1340.28 MB/s (35.98%)
| Profile | Passed | Rule |
|---|---|---|
| quick | 20/20 | ours-turbo >= max(pugixml, rapidxml) |
| quick | 20/20 | stream-turbo >= ours-turbo && stream-strict >= ours-strict |
zig build test
zig build conformance
zig build bench-compare

Minimal parse:
const std = @import("std");
const fastxml = @import("fastxml");
const options: fastxml.ParseOptions = .{};
const Document = fastxml.Types(options).Document;
pub fn main() !void {
    const src = "<root id='r'><child>text</child></root>";
    var doc = Document.init(std.heap.page_allocator);
    defer doc.deinit();
    try doc.parse(src, .{
        .mode = .strict,
        .validate_closing_tags = true,
    });
    const root = doc.nodeAt(1).?;
    std.debug.print("{s} {s}\n", .{ root.nameSlice(), root.getAttributeValueRaw("id").? });
}

- fastxml.ParseOptions
- fastxml.ParseMode
- fastxml.ParseError
- fastxml.ParseInt
- fastxml.MaxParseLen
- fastxml.Types(options).Document
- fastxml.Types(options).Node
- fastxml.Types(options).Attribute
const options: fastxml.ParseOptions = .{};
const types = fastxml.Types(options);
const Document = types.Document;
const Node = types.Node;
const Attribute = types.Attribute;

Index width is configurable at build time, following the same config-module pattern as htmlparser:
zig build test -Dintlen=u64

Supported widths are u16, u32, u64, and usize. The default is u32.
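To make the width trade-off concrete, the sketch below shows how an index type chosen at comptime changes the size of a span record. Span is a hypothetical name used only for illustration, and the layout is an assumption, not fastxml's actual internals. It uses the single-argument @intCast form from recent Zig releases.

```zig
const std = @import("std");

/// Hypothetical span record: a start/end pair into the source bytes,
/// stored with a caller-chosen index type. Not fastxml's actual layout.
fn Span(comptime Index: type) type {
    return struct {
        start: Index,
        end: Index,

        const Self = @This();

        /// Borrow the bytes this span covers from the original input.
        pub fn slice(self: Self, src: []const u8) []const u8 {
            const s: usize = @intCast(self.start);
            const e: usize = @intCast(self.end);
            return src[s..e];
        }
    };
}

pub fn main() void {
    const src = "<root id='r'/>";
    // Bytes 1..5 of the input hold the element name.
    const name = Span(u32){ .start = 1, .end = 5 };
    std.debug.print("{s}\n", .{name.slice(src)});
    // Narrower indexes shrink every span, at the cost of a smaller maximum input.
    std.debug.print("u16={d} u32={d} u64={d} bytes per span\n", .{
        @sizeOf(Span(u16)), @sizeOf(Span(u32)), @sizeOf(Span(u64)),
    });
}
```

Under this assumed layout, a u16 index keeps spans at 4 bytes but limits inputs to 64 KiB, while u64 doubles the span size relative to the u32 default.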
Document.parse is comptime-specialized:
try doc.parse(input, .{
    .mode = .turbo,
    .validate_closing_tags = false,
    .expand_dtd_entities = false,
    .max_entity_value_len = 4096,
    .drop_whitespace_text_nodes = true,
    .include_misc_nodes = true,
});

Parsing is always non-destructive and the original input is always []const u8.
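As a rough illustration of what comptime specialization means in Zig, the standalone sketch below takes its options as a comptime parameter, so a disabled check is resolved at compile time rather than tested at run time. Options and countOpenTags are hypothetical and deliberately simplistic (self-closing tags, comments and CDATA are ignored); this is not how fastxml implements Document.parse.

```zig
const std = @import("std");

/// Hypothetical options struct; fastxml's real ParseOptions has more fields.
const Options = struct {
    validate_closing_tags: bool = true,
};

/// Counts opening tags in `src`. Because `opts` is comptime-known, each
/// `opts.validate_closing_tags` test is resolved at compile time, so the
/// validation work is absent from the specialization that disables it.
fn countOpenTags(src: []const u8, comptime opts: Options) !usize {
    var opens: usize = 0;
    var closes: usize = 0;
    var i: usize = 0;
    while (i < src.len) : (i += 1) {
        if (src[i] != '<') continue;
        if (i + 1 < src.len and src[i + 1] == '/') {
            if (opts.validate_closing_tags) closes += 1;
        } else {
            opens += 1;
        }
    }
    if (opts.validate_closing_tags and opens != closes) return error.UnbalancedTags;
    return opens;
}

pub fn main() !void {
    const good = "<a><b></b></a>";
    const bad = "<a><b></a>";
    // Default options: closing tags are counted and checked.
    std.debug.print("{d}\n", .{try countOpenTags(good, .{})});
    // Validation disabled: the unbalanced input is accepted.
    std.debug.print("{d}\n", .{try countOpenTags(bad, .{ .validate_closing_tags = false })});
}
```

Each distinct options value produces its own instantiation of the function, which is the general mechanism a comptime-configured parser can use to avoid paying for checks it was told to skip.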
Use raw accessors when you want borrowed source slices:
const attr_raw = root.getAttributeValueRaw("id").?;
const text_raw = root.firstChild().?.valueRawSlice();

Use allocator-backed helpers when you want decoded values without mutating the source:
const attr = try root.getAttributeValue(std.heap.page_allocator, "id") orelse return;
defer std.heap.page_allocator.free(attr);
const inner = try root.innerText(std.heap.page_allocator);
defer std.heap.page_allocator.free(inner);

DTD/entity expansion is disabled by default. When expand_dtd_entities = true, fastxml parses internal <!ENTITY ...> declarations from the document doctype into a document-owned hash map and uses that map during decoded value access. max_entity_value_len caps each stored expanded entity value.
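The idea of decoding into a fresh buffer while leaving the source untouched can be sketched as follows. This standalone example handles only the five entities predefined by XML 1.0. The names decodeAlloc, decodeInto and matchEntity are hypothetical, and the DTD-declared entity map described above is not modelled here.

```zig
const std = @import("std");

const Entity = struct { name: []const u8, value: u8 };

/// The five entities predefined by XML 1.0.
const predefined = [_]Entity{
    .{ .name = "&amp;", .value = '&' },
    .{ .name = "&lt;", .value = '<' },
    .{ .name = "&gt;", .value = '>' },
    .{ .name = "&quot;", .value = '"' },
    .{ .name = "&apos;", .value = '\'' },
};

/// Returns the predefined entity at the start of `rest`, if any.
fn matchEntity(rest: []const u8) ?Entity {
    for (predefined) |e| {
        if (std.mem.startsWith(u8, rest, e.name)) return e;
    }
    return null;
}

/// Walks `raw` once and returns the decoded length. When `out` is non-null
/// the decoded bytes are also written into it. `raw` is only read.
fn decodeInto(raw: []const u8, out: ?[]u8) usize {
    var in: usize = 0;
    var n: usize = 0;
    while (in < raw.len) : (n += 1) {
        var byte = raw[in];
        var step: usize = 1;
        if (matchEntity(raw[in..])) |e| {
            byte = e.value;
            step = e.name.len;
        }
        if (out) |buf| buf[n] = byte;
        in += step;
    }
    return n;
}

/// Hypothetical helper: returns a caller-owned decoded copy of `raw`.
fn decodeAlloc(allocator: std.mem.Allocator, raw: []const u8) ![]u8 {
    // First pass measures, second pass fills an exactly sized buffer.
    const buf = try allocator.alloc(u8, decodeInto(raw, null));
    _ = decodeInto(raw, buf);
    return buf;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const raw = "a &lt; b &amp;&amp; c";
    const decoded = try decodeAlloc(allocator, raw);
    defer allocator.free(decoded);
    std.debug.print("{s}\n", .{decoded}); // prints "a < b && c"
}
```

Unrecognized references such as &foo; pass through unchanged in this sketch. A decoder that also consults a declared-entity map would need a second lookup at the same point and a length cap on each stored value.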
turbo keeps DOM construction but drops expensive validation work by default. strict enforces stronger well-formedness checks and is the correctness-first profile.
zig build test
zig build conformance
zig build tools -- run-conformance --suite bench/conformance/well_formedness_w3c_core.json
zig build bench-compare

Benchmark and conformance details are documented in bench/README.md.