Merged
149 changes: 85 additions & 64 deletions README.md
@@ -1,100 +1,121 @@
# Glyph11

Glyph11 is a dependency free, low allocation HTTP/1.1 parser for C#. It does not rely on any specific network technology but can be used with any (such as `Socket`, `NetworkStream`, `PipeReader` or anything else).
A zero-allocation, hardened HTTP/1.1 request parser — a pure-C# library and a C core
(`libglyph11`) reachable from .NET and the JVM. RFC 9110/9112 validation, configurable
resource limits, and request-smuggling / semantic checks fused into a single zero-copy pass.

![.NET](https://img.shields.io/badge/.NET-8.0%20%7C%209.0%20%7C%2010.0-512bd4)
[![NuGet](https://img.shields.io/nuget/v/Glyph11.svg)](https://www.nuget.org/packages/Glyph11/)
![.NET](https://img.shields.io/badge/.NET-8.0%20%7C%209.0%20%7C%2010.0-512bd4)
[![Benchmarks](https://img.shields.io/badge/benchmarks-live-blue)](https://dotnet-web-stack.github.io/Glyph11/)
[![Coverage](https://img.shields.io/sonar/coverage/MDA2AV_Glyph11?server=https%3A%2F%2Fsonarcloud.io)](https://sonarcloud.io/summary/new_code?id=MDA2AV_Glyph11)
[![Quality Gate](https://sonarcloud.io/api/project_badges/measure?project=MDA2AV_Glyph11&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=MDA2AV_Glyph11)

## Usage
Three ways to use the same hardened parser:

| | What | Header storage |
|---|---|---|
| **C# library** | pure managed `UltraHardenedParser` | pooled, internal |
| **.NET binding** | the C core via P/Invoke | caller-provided (zero-alloc) |
| **Kotlin binding** | the C core via Panama FFM | per-call, returned as a list |

Glyph11 works with any source that produces a `ReadOnlySequence<byte>` or `ReadOnlyMemory<byte>` — `PipeReader`, `Socket`, `NetworkStream`, or raw byte arrays.
## C# library (managed)

```csharp
using System.Buffers;
using System.Text;
using Glyph11.Protocol;
using Glyph11.Parser;
using Glyph11.Parser.UltraHardened;

var request = new BinaryRequest();
var limits = ParserLimits.Default;
var limits = ParserLimits.Default;

ReadOnlySequence<byte> buffer = ...; // from any network source
// From any source that yields a ReadOnlySequence<byte> (PipeReader, Socket, NetworkStream, …)
ReadOnlySequence<byte> buffer = new(Encoding.ASCII.GetBytes(
"GET /api/users?page=1 HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"));

// UltraHardenedParser fuses structural parsing, resource limits, and every
// semantic check (smuggling, traversal, Host rules, ...) into one pass.
// It throws HttpParseException on any protocol or semantic violation.
if (UltraHardenedParser.TryExtractFullHeaderValidated(ref buffer, request, in limits, out int bytesRead))
{
// All parsed fields are zero-copy slices into the original buffer:
// request.Method.Span → e.g. "GET"
// request.Path.Span → e.g. "/api/users"
// request.Version.Span → e.g. "HTTP/1.1"
// request.Headers → KeyValueList of name/value pairs
// request.QueryParameters → KeyValueList of query params

// The request is fully validated — safe to process.
// Then advance your reader by bytesRead.

// Reuse between requests — clear instead of reallocating:
request.Headers.Clear();
request.QueryParameters.Clear();
}
```

Glyph11 plugs into a `PipeReader` loop: read a buffer, call `TryExtractFullHeaderValidated`, advance the reader by `bytesRead`, and repeat.

## Parsers

Glyph11 ships two parsers:

- **`UltraHardenedParser`** — RFC 9110/9112 compliant with full validation, configurable resource limits, and every smuggling/semantic check fused into the parse pass. Recommended for internet-facing applications.
- **`FlexibleParser`** — Minimal validation for maximum throughput. Suitable for trusted environments where input is pre-validated.

## Performance
Console.WriteLine(Encoding.ASCII.GetString(request.Method.Span)); // GET
Console.WriteLine(Encoding.ASCII.GetString(request.Path.Span)); // /api/users

- **ROM path is zero-allocation** — no GC pressure regardless of request size
- **SIMD-accelerated validation** keeps the `UltraHardenedParser` within a small constant factor of the unvalidated `FlexibleParser`
- **Multi-segment linearization** provides ROM-speed parsing with a single upfront allocation
for (int i = 0; i < request.Headers.Count; i++)
{
var (name, value) = request.Headers[i]; // zero-copy slices
Console.WriteLine($"{Encoding.ASCII.GetString(name.Span)}: {Encoding.ASCII.GetString(value.Span)}");
}

See the [live benchmarks](https://dotnet-web-stack.github.io/Glyph11/) — the managed parser vs. the C core and its .NET (P/Invoke) and JVM (Panama FFM) bindings, contiguous and multi-segment.
// advance your reader by bytesRead; reuse `request` across calls (request.Clear()).
}
// throws HttpParseException on a protocol/semantic violation; returns false if incomplete.
```

## CI Workflows
`TryExtractFullHeaderROM(ref ReadOnlyMemory<byte>, …)` is the single-buffer (contiguous) fast path.
`FlexibleParser` is a minimal-validation variant for trusted, pre-validated input.

### Benchmarks
## .NET binding (C core via P/Invoke)

The **Benchmark** workflow (`.github/workflows/benchmark.yml`) measures parser throughput and allocation using BenchmarkDotNet.
Calls `libglyph11` directly — same validation, native speed, **zero allocation** (you provide the
header/query storage).

| Trigger | Job | What it does |
|---------|-----|--------------|
| `pull_request` | **Parser Benchmarks** | Runs `FlexibleParserBenchmark` and `UltraHardenedParserBenchmark`, compares against the baseline on `gh-pages`, and posts a comment on the PR. Fails if any metric regresses by more than 15%. |
| `workflow_dispatch` | **Full Benchmarks** | Runs all benchmarks (parsers + `AllSemanticChecksBenchmark`) and updates the baseline on `gh-pages`. |
```csharp
using System.Text;
using Glyph11.Native;

**Data flow:** benchmark results are stored as `benchmarks/data.js` on the `gh-pages` branch.
byte[] request = Encoding.ASCII.GetBytes(
"GET /api/users?page=1 HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n");

> The cross-language comparison on the [live site](https://dotnet-web-stack.github.io/Glyph11/) is produced separately by the **Cross-Language Benchmark** workflow (`.github/workflows/cross-bench.yml`), which benchmarks the C core, both bindings, and the managed parser, then publishes `benchmarks/cross-lang.json` to `gh-pages`.
Span<Glyph11Field> headers = stackalloc Glyph11Field[64];
Span<Glyph11Field> query = stackalloc Glyph11Field[32];

To publish updated benchmark data:
int status = Glyph11Parser.Parse(request, headers, query, Glyph11Limits.Default, out var r);
if (status == Glyph11Parser.Ok)
{
string Slice(Glyph11Span s) => Encoding.ASCII.GetString(request, (int)s.Offset, (int)s.Length);

1. Merge your changes to `main`.
2. Go to **Actions > Benchmark > Run workflow** on `main`.
Console.WriteLine(Slice(r.Method)); // GET
Console.WriteLine(Slice(r.Path)); // /api/users

### Compliance Probe
for (int i = 0; i < r.HeaderCount; i++)
Console.WriteLine($"{Slice(headers[i].Name)}: {Slice(headers[i].Value)}");
}
// status: 0 = OK, 1 = incomplete (read more), otherwise a protocol/limit error (→ HTTP 400 / 431).
```

The **Probe** workflow (`.github/workflows/probe.yml`) tests HTTP/1.1 compliance across multiple server frameworks using [Glyph11.Probe](src/Glyph11.Probe), a tool that sends malformed and ambiguous HTTP requests and checks the server's response against strict RFC 9110/9112 expectations.
Resolve the native library with the `GLYPH11_NATIVE_PATH` environment variable, or put
`libglyph11.{so,dll,dylib}` on the OS load path.
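
The binding examples read each parsed field as an offset/length pair into the caller's own buffer, so nothing is copied until the caller asks for a string. A minimal C sketch of that idea follows. The names `span_t`, `span_ptr`, and `span_equals` are hypothetical illustrations, not the `libglyph11` API, and the two-`int32_t` layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a span: a byte range into the caller's buffer. */
typedef struct {
    int32_t offset;
    int32_t length;
} span_t;

/* Return a pointer to the first byte of the span; no bytes are copied. */
const char *span_ptr(const char *buf, span_t s) {
    assert(s.offset >= 0 && s.length >= 0);
    return buf + s.offset;
}

/* Compare a span with a NUL-terminated string, in place. */
int span_equals(const char *buf, span_t s, const char *text) {
    size_t n = strlen(text);
    return (size_t)s.length == n && memcmp(span_ptr(buf, s), text, n) == 0;
}
```

A caller can dispatch on the method or path with `span_equals` and only materialize a string (for logging, say) when it is actually needed. The spans stay valid only as long as the input buffer does.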

## Kotlin / JVM binding (C core via Panama FFM)

```kotlin
import io.glyph11.Glyph11
import io.glyph11.Glyph11Span

val request = "GET /api/users?page=1 HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
.toByteArray(Charsets.ISO_8859_1)

val r = Glyph11.parse(request)
when {
r.isOk -> {
fun slice(s: Glyph11Span) = String(request, s.offset, s.length, Charsets.ISO_8859_1)
println(slice(r.method)) // GET
println(slice(r.path)) // /api/users
for (h in r.headers)
println("${slice(h.name)}: ${slice(h.value)}")
}
r.isIncomplete -> { /* read more bytes */ }
else -> println("rejected → HTTP ${Glyph11.httpCode(r.status)}") // 400 / 431
}
```

Servers tested: **Glyph11** (raw TCP + UltraHardenedParser), **Kestrel** (ASP.NET Core), **Flask** (Python), **Express** (Node.js), **Spring Boot** (Java), **Quarkus** (Java), **Nancy** (.NET), **Jetty** (Java), **Nginx** (native), **Apache** (native), **Caddy** (native), **Pingora** (Rust).
Requires JDK 21+ (FFM). Point at the library with `-Dglyph11.lib=/path/to/libglyph11.so`.

| Trigger | What it does |
|---------|--------------|
| `pull_request` | Starts all three servers, probes each one, evaluates results with strict status-code matching (e.g. a parser error must return `400`, not `404`), and posts a comparison table as a PR comment. Never fails the build — this is informational. |
| `workflow_dispatch` | Same as above, plus pushes `probe/data.js` to `gh-pages`. |
## Benchmarks

**Data flow:** probe results are stored as `probe/data.js` on the `gh-pages` branch.
Live cross-language numbers — managed vs. the C core and its .NET / JVM bindings, contiguous and
multi-segment: **<https://dotnet-web-stack.github.io/Glyph11/>**

To publish updated probe data:
## Build the native core (for the bindings)

1. Merge your changes to `main`.
2. Go to **Actions > Probe > Run workflow** on `main`.
```sh
cmake -S core -B core/build-rel -DGLYPH11_BUILD_TESTS=OFF
cmake --build core/build-rel # → core/build-rel/libglyph11.{so,dll,dylib}
```
33 changes: 21 additions & 12 deletions bench/README.md
@@ -33,19 +33,28 @@ benchmarks page.

| Payload | C# Ultra | Pure C | C# (FFI) | Kotlin (FFI) |
|---------|---------:|--------:|---------:|-------------:|
| ~95 B | 118 ns | 98 ns | 97 ns | 102 ns |
| 4 KB | 730 ns | 512 ns | 556 ns | 574 ns |
| 32 KB | 5028 ns | 3784 ns | 4254 ns | 4167 ns |
| ~95 B | 118 ns | 97 ns | 98 ns | 100 ns |
| 4 KB | 727 ns | 522 ns | 562 ns | 562 ns |
| 32 KB | 5039 ns | 3906 ns | 4122 ns | 4182 ns |

**Multi-segment** (3 segments):
**Multi-segment** (3 segments — every parser linearizes into a reused buffer, copy counted):

| Payload | C# Ultra | Pure C | C# (FFI) | Kotlin (FFI) |
|---------|---------:|--------:|---------:|-------------:|
| ~95 B | 257 ns | 101 ns | 106 ns | 111 ns |
| 4 KB | 1363 ns | 545 ns | 587 ns | 603 ns |
| 32 KB | 9262 ns | 4256 ns | 4624 ns | 4658 ns |

The FFI bindings track the pure-C floor (`[SuppressGCTransition]` for .NET,
reused off-heap buffers for Kotlin). Native multi-segment = contiguous + a
`memcpy`, so it stays close to contiguous and ~2× faster than the managed
multi-segment path (which allocates per call). Numbers vary run-to-run.
| ~95 B | 130 ns | 102 ns | 110 ns | 120 ns |
| 4 KB | 753 ns | 553 ns | 612 ns | 592 ns |
| 32 KB | 5406 ns | 4324 ns | 4567 ns | 4795 ns |

Multi-segment input must be linearized into a contiguous buffer first, and that
per-request copy is counted in every number above. To compare the **parsers** (not
buffer strategy), every path linearizes the same way: `CopyTo`/`memcpy` into a
**reused** scratch buffer, then parse. Multi-segment therefore equals contiguous plus
a `memcpy` for all of them, and native stays roughly 1.2× to 1.4× ahead in both modes
(the parse engine).

> The managed one-shot API `TryExtractFullHeaderValidated` instead allocates that
> buffer via `input.ToArray()` **every request** — ~9.2 µs vs ~5.4 µs at 32 KB. For a
> multi-segment hot path, hand-roll `CopyTo` + `TryExtractFullHeaderROM` (or, for the
> binding, linearize into a reused buffer before the native call). It's an API cost,
> not a parser difference — hence a note, not the comparison.

Numbers vary run-to-run.
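
The reused-buffer linearization described above can be sketched in C as follows. This is an illustration under stated assumptions, not the benchmark's code: `segment_t` and `linearize` are hypothetical names, and the caller is assumed to allocate `scratch` once and pass the result to whatever contiguous parse call it uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment type: one chunk of a request split across reads. */
typedef struct {
    const unsigned char *data;
    size_t len;
} segment_t;

/*
 * Copy every segment back to back into `scratch`, which the caller
 * allocates once and reuses across requests. Returns the number of bytes
 * written, or 0 if the segments do not fit in `cap` bytes (an empty input
 * also yields 0).
 */
size_t linearize(const segment_t *segs, size_t count,
                 unsigned char *scratch, size_t cap) {
    assert(scratch != NULL || cap == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (segs[i].len == 0) continue;            /* nothing to copy */
        if (segs[i].len > cap - total) return 0;   /* would overflow scratch */
        memcpy(scratch + total, segs[i].data, segs[i].len);
        total += segs[i].len;
    }
    return total;
}
```

Because the scratch buffer outlives the call, the only per-request cost is the copy itself, which is the "contiguous plus a `memcpy`" behaviour the tables measure.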
11 changes: 7 additions & 4 deletions bindings/dotnet/Glyph11.Bench/Program.cs
@@ -117,21 +117,24 @@ public static void Run(string dir)
var data = File.ReadAllBytes(Path.Combine(dir, file));
var rom = (ReadOnlyMemory<byte>)data;
var seq = ThreeSegments(data);
var lin = new byte[data.Length]; // reused linearization buffer (no per-call allocation)

// managed — ROM (single contiguous buffer)
double mRom = Best(iters, () => { req.Clear(); var r = rom; UltraHardenedParser.TryExtractFullHeaderROM(ref r, req, in ManagedLimits, out _); });
Console.WriteLine($"dotnet-managed-rom,{name},{mRom:F1}");

// managed — multi-segment (3 segments, linearized internally)
double mSeg = Best(iters, () => { req.Clear(); var s = seq; UltraHardenedParser.TryExtractFullHeaderValidated(ref s, req, in ManagedLimits, out _); });
// managed — multi-segment: linearize into the SAME reused buffer as the native paths,
// then ROM-parse, so the column compares the parser, not the linearization strategy.
// (The one-shot API TryExtractFullHeaderValidated would input.ToArray() instead — an
// allocation per request; that's an API cost, noted on the page/README, not here.)
double mSeg = Best(iters, () => { req.Clear(); seq.CopyTo(lin); ReadOnlyMemory<byte> r = lin; UltraHardenedParser.TryExtractFullHeaderROM(ref r, req, in ManagedLimits, out _); });
Console.WriteLine($"dotnet-managed-multiseg,{name},{mSeg:F1}");

// native binding (FFI) — contiguous
double ffi = Best(iters, () => Glyph11Parser.Parse(data, h, q, NativeLimits, out _));
Console.WriteLine($"dotnet-ffi,{name},{ffi:F1}");

// native binding (FFI) — multi-segment: linearize into a reused buffer, then parse
var lin = new byte[data.Length];
// native binding (FFI) — multi-segment: same reused-buffer linearization, then parse
double ffiSeg = Best(iters, () => { seq.CopyTo(lin); Glyph11Parser.Parse(lin, h, q, NativeLimits, out _); });
Console.WriteLine($"dotnet-ffi-multiseg,{name},{ffiSeg:F1}");
}
21 changes: 17 additions & 4 deletions bindings/kotlin/src/main/kotlin/io/glyph11/Glyph11.kt
@@ -11,19 +11,24 @@ import java.lang.invoke.MethodHandle
/** A byte range (offset + length) into the parsed input buffer (zero-copy). */
data class Glyph11Span(val offset: Int, val length: Int)

/** A parsed name/value pair (header or query parameter); spans index into the input. */
data class Glyph11Field(val name: Glyph11Span, val value: Glyph11Span)

/** Parsed request fields. Spans index into the input passed to [Glyph11.parse]. */
data class Glyph11Result(
val status: Int,
val method: Glyph11Span,
val target: Glyph11Span,
val path: Glyph11Span,
val version: Glyph11Span,
val headerCount: Int,
val queryCount: Int,
val headers: List<Glyph11Field>,
val query: List<Glyph11Field>,
val consumed: Long,
) {
val isOk: Boolean get() = status == 0
val isIncomplete: Boolean get() = status == 1
val headerCount: Int get() = headers.size
val queryCount: Int get() = query.size
}

/**
@@ -105,15 +110,23 @@ object Glyph11 {

fun span(off: Long) =
Glyph11Span(req.get(ValueLayout.JAVA_INT, off), req.get(ValueLayout.JAVA_INT, off + 4))
fun fields(seg: MemorySegment, count: Int): List<Glyph11Field> =
(0 until count).map { i ->
val b = i.toLong() * SIZEOF_FIELD
Glyph11Field(
Glyph11Span(seg.get(ValueLayout.JAVA_INT, b), seg.get(ValueLayout.JAVA_INT, b + 4)),
Glyph11Span(seg.get(ValueLayout.JAVA_INT, b + 8), seg.get(ValueLayout.JAVA_INT, b + 12)),
)
}

return Glyph11Result(
status = status,
method = span(0L),
target = span(8L),
path = span(16L),
version = span(24L),
headerCount = req.get(ValueLayout.JAVA_INT, OFF_HEADER_COUNT),
queryCount = req.get(ValueLayout.JAVA_INT, OFF_QUERY_COUNT),
headers = if (status == 0) fields(headers, req.get(ValueLayout.JAVA_INT, OFF_HEADER_COUNT)) else emptyList(),
query = if (status == 0) fields(query, req.get(ValueLayout.JAVA_INT, OFF_QUERY_COUNT)) else emptyList(),
consumed = if (status == 0) consumed.get(ValueLayout.JAVA_LONG, 0L) else 0L,
)
}
1 change: 1 addition & 0 deletions bindings/kotlin/src/main/kotlin/io/glyph11/Main.kt
@@ -43,6 +43,7 @@ private fun smoke() {
check("path", slice(valid, r.path) == "/api/users")
check("version", slice(valid, r.version) == "HTTP/1.1")
check("headerCount", r.headerCount == 2)
check("header name/value", r.headers[0].let { slice(valid, it.name) == "Host" && slice(valid, it.value) == "example.com" })
check("queryCount", r.queryCount == 2)
check("consumed", r.consumed.toInt() == valid.size)

50 changes: 25 additions & 25 deletions site/data.json
@@ -1,6 +1,6 @@
{
"unit": "ns/op",
"generated": "2026-06-04 19:51 UTC",
"generated": "2026-06-04 21:00 UTC",
"langs": [
{
"key": "dotnet-managed-rom",
@@ -39,38 +39,38 @@
{
"payload": "small",
"label": "~95 B",
"dotnet-managed-rom": 118.0,
"dotnet-managed-multiseg": 256.6,
"pure-c": 97.8,
"pure-c-multiseg": 101.4,
"dotnet-ffi": 97.2,
"dotnet-ffi-multiseg": 106.3,
"kotlin-ffi": 102.2,
"kotlin-ffi-multiseg": 110.8
"dotnet-managed-rom": 118.2,
"dotnet-managed-multiseg": 130.2,
"pure-c": 97.1,
"pure-c-multiseg": 102.3,
"dotnet-ffi": 97.8,
"dotnet-ffi-multiseg": 109.6,
"kotlin-ffi": 100.2,
"kotlin-ffi-multiseg": 120.5
},
{
"payload": "4k",
"label": "4 KB",
"dotnet-managed-rom": 730.2,
"dotnet-managed-multiseg": 1362.7,
"pure-c": 512.4,
"pure-c-multiseg": 545.4,
"dotnet-ffi": 555.5,
"dotnet-ffi-multiseg": 586.7,
"kotlin-ffi": 574.0,
"kotlin-ffi-multiseg": 602.7
"dotnet-managed-rom": 727.2,
"dotnet-managed-multiseg": 753.2,
"pure-c": 521.5,
"pure-c-multiseg": 553.3,
"dotnet-ffi": 562.0,
"dotnet-ffi-multiseg": 612.4,
"kotlin-ffi": 561.5,
"kotlin-ffi-multiseg": 591.5
},
{
"payload": "32k",
"label": "32 KB",
"dotnet-managed-rom": 5028.1,
"dotnet-managed-multiseg": 9261.7,
"pure-c": 3784.2,
"pure-c-multiseg": 4256.1,
"dotnet-ffi": 4254.5,
"dotnet-ffi-multiseg": 4624.2,
"kotlin-ffi": 4166.6,
"kotlin-ffi-multiseg": 4657.8
"dotnet-managed-rom": 5039.4,
"dotnet-managed-multiseg": 5406.0,
"pure-c": 3906.2,
"pure-c-multiseg": 4324.2,
"dotnet-ffi": 4121.6,
"dotnet-ffi-multiseg": 4567.0,
"kotlin-ffi": 4182.1,
"kotlin-ffi-multiseg": 4795.3
}
]
}