feat(express): Implement A2UI Express compiler, decompiler and parser by gspencergoog · Pull Request #1726 · a2ui-project/a2ui

gspencergoog · 2026-06-23T01:54:13Z

Summary

This PR implements the A2UI Express technical specification, introducing A2UI Express — a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It provides a complete end-to-end Python implementation, including an ANTLR-based compiler, a decompiler, a schema-based system prompt generator, helper scripts, and comprehensive test suites.

This is a refined, standalone extraction of the A2UI Express compiler/decompiler portions originally proposed in PR #1678, incorporating automated parser generation, strict validation, and extensive test coverage.

Changes

Build System & Code Generation:
- Added Hatch build hook in pack_specs_hook.py to automatically compile the ANTLR grammar Express.g4 into Python3 source files at build-time.
- The build hook renames the generated files to clean snake_case (express_lexer.py, express_parser.py, express_visitor.py) in a way that is safe on case-insensitive file systems, post-processes relative imports, and automatically formats the generated code with pyink.
- Updated pyproject.toml to include antlr4-python3-runtime as a runtime dependency, and antlr4-tools in the build system requirements.
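
The renaming step described above can be sketched roughly as follows. This is a minimal illustration, not the code in pack_specs_hook.py: the helper names `to_snake_case`, `safe_rename`, and `rename_generated` are hypothetical, and the two-step rename through a temporary name is an assumption about what a rename that is safe on case-insensitive file systems might look like.

```python
import re
import uuid
from pathlib import Path


def to_snake_case(stem: str) -> str:
    """Convert a CamelCase module stem such as 'ExpressLexer' to 'express_lexer'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem).lower()


def safe_rename(src: Path, dst: Path) -> Path:
    """Rename src to dst through a unique temporary name (hypothetical helper).

    Going through an intermediate name avoids depending on how a
    case-insensitive file system treats names that differ only by case.
    """
    tmp = src.with_name(f".{uuid.uuid4().hex}.tmp")
    src.rename(tmp)
    tmp.rename(dst)
    return dst


def rename_generated(out_dir: Path) -> list[str]:
    """Rename every CamelCase .py file in out_dir to snake_case."""
    result = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(out_dir.glob("*.py")):
        target = path.with_name(to_snake_case(path.stem) + ".py")
        if target.name != path.name:
            safe_rename(path, target)
        result.append(target.name)
    return result
```

Calling `rename_generated` on a directory that holds `ExpressLexer.py`, `ExpressParser.py`, and `ExpressVisitor.py` leaves the three snake_case module names listed in the bullet above.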
Compiler & Parser (a2ui.experimental.express):
- Implemented an ANTLR-based parsing pipeline using Express.g4 to parse line-oriented declarative layout files.
- The ExpressCompiler compiles the AST directly into standard A2UI v1.0 JSON payloads (with dynamic positional parameter resolution and variable flattening).
- Supports rich string types: standard strings, raw strings (r"..."), raw multiline strings (r"""..."""), and escaped carriage returns.
- Integrates a partial parser mode supporting streaming recovery for incomplete layouts.
- Incorporates strict enum validation for component properties, raising ValueError on mismatch rather than silently ignoring invalid values.
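
As an illustration of the strict enum behavior in the last bullet, here is a minimal sketch. The `validate_enum` helper and the `TEXT_FIELD_ENUMS` table are hypothetical and not taken from the PR; only the `variant` value `number` appears in this PR's catalog example, and the other allowed value is assumed for the demo.

```python
# Hypothetical schema fragment: property name -> allowed enum values.
# "number" appears in the catalog example; "text" is an assumption.
TEXT_FIELD_ENUMS: dict[str, tuple[str, ...]] = {
    "variant": ("text", "number"),
}


def validate_enum(
    component: str,
    prop: str,
    value: str,
    enums: dict[str, tuple[str, ...]],
) -> str:
    """Return value unchanged, or raise ValueError if it is not an allowed enum value."""
    allowed = enums.get(prop)
    if allowed is not None and value not in allowed:
        raise ValueError(
            f"Invalid value {value!r} for {component}.{prop}; "
            f"expected one of {list(allowed)}"
        )
    return value
```

Failing loudly here means a model that emits a misspelled enum value gets an error it can be re-prompted with, instead of a layout that silently falls back to a default.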
Decompiler:
- Implemented ExpressDecompiler to convert standard A2UI v1.0 JSON payloads back into the highly compact, line-oriented Express DSL.
Schema Helper & Prompt Generator:
- Implemented CatalogSchemaHelper to parse catalog schema definitions.
- Implemented ExpressPromptGenerator to compile active catalog schemas into positional signatures used to prompt generative models.
Evaluation & Testing Scripts:
- Added run_inference.py to evaluate the A2UI Express prompt contract by converting JSON examples to Express DSL via Gemini/Ollama/MLX models and validating the round-trip compilation.
- Added recreate_dsl_examples.py to programmatically regenerate the dynamic markdown documentation.
Documentation & Examples:
- Added comprehensive layout examples under specification/proposals/express/examples/*.a2ui (36 files) along with their corresponding compiled JSON targets.
- Created README.md and a2ui_express.md detailing the DSL grammar, compiler mechanics, and usage.
- Created express_dsl_examples.md detailing the active system prompt contract and compiled weather forecast examples.

Impact & Risks

The feature is fully experimental, contained in the a2ui.experimental.express namespace, and gated behind the A2UI_EXPRESS_ENABLED=true environment variable.
There is no impact on stable production paths or other existing SDK modules.
Build-time code generation introduces a dependency on antlr4 (via antlr4-tools and antlr4-python3-runtime) during development/builds, which is automatically resolved by Hatch and standard pip/uv environments.
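
A gate on that environment variable might look like the sketch below. The function names are hypothetical, and the exact comparison (a case-sensitive match on `true`) is an assumption based only on the `A2UI_EXPRESS_ENABLED=true` wording above.

```python
import os
from collections.abc import Mapping
from typing import Optional

_FLAG = "A2UI_EXPRESS_ENABLED"


def express_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is set to exactly 'true' (assumed semantics)."""
    env = os.environ if environ is None else environ
    return env.get(_FLAG, "") == "true"


def require_express_enabled(environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise if the experimental feature has not been switched on."""
    if not express_enabled(environ):
        raise RuntimeError(f"A2UI Express is experimental; set {_FLAG}=true to enable it.")
```

Passing the environment as a parameter keeps the check easy to exercise in tests without mutating `os.environ`.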

Testing

Added 44 robust unit tests under agent_sdks/python/a2ui_agent/tests/express/ including:
- test_compiler.py: Verifies parser correctness, token parsing, raw string handling, and carriage return unescaping.
- test_decompiler.py: Validates round-trip integrity (JSON -> Express -> JSON).
- test_integration.py: Tests the compiler against all 36 catalog layout examples.
- test_cli_tools.py: Tests script interfaces and prompt generation.
The tests can be executed via the standard Dart/Python test runners (e.g. uv run pytest).

…oducing a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It includes the compiler, decompiler, schema helper, and parser modules.

This contains the A2UI Express compiler/decompiler portions of a2ui-project#1678, with some additional issues fixed, additional tests, and refinements.

* **Compiler & Parser**: Implemented the `ExpressCompiler` and `Parser` in `a2ui.experimental.express` to parse line-oriented DSL and compile it into standard A2UI v1.0 JSON. Supports standard strings, raw strings (`r"..."`), raw multi-line strings (`r"""..."""`), and partial streaming recovery.
* **Strict Enum Validation**: Added strict validation for component property enums to raise ValueError on invalid inputs instead of silently ignoring them.
* **Event Context Compilation**: Simplified event context processing to avoid redundant compilation.
* **Decompiler**: Implemented `ExpressDecompiler` to convert standard v1.0 JSON payloads back into compact Express DSL.
* **Schema Helper & Prompt Generator**: Implemented `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used by generative models.
* **Examples**: Added 36 `.a2ui` layout examples and corresponding compiled `.json` targets.
* **Format Checks**: Integrated `pyink` style verification for specification proposals.

The feature is fully experimental and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable. It does not affect any stable production paths.

* Added 44 comprehensive unit tests in `tests/express/` covering parser correctness, thread-safe compilation, raw string escaping, strict enum validation, and round-trip integrity.

- Add support for resolving escaped carriage returns (\r) to the carriage return character in compiler's _unescape_string helper.
- Update the escape sequence regex pattern from \\(.) to \\([\s\S]) to match all escaped sequences including newlines.
- Rewrite the decompiler carriage return escaping test to be an elegant, end-to-end round-trip compilation and decompilation test.
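
The effect of the pattern change can be seen in a small sketch. This is not the compiler's `_unescape_string`: the function below is a stand-in, and the escape table and the fallback for unknown escapes (keep the escaped character) are assumptions. Only the `\\([\s\S])` pattern and the `\r` handling come from the commit message.

```python
import re

# "." does not match a newline by default, so \\(.) would skip a backslash
# that is followed by a literal newline; [\s\S] matches any character.
_ESCAPE_RE = re.compile(r"\\([\s\S])")

# Assumed escape table; only the "r" entry is confirmed by the commit message.
_SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}


def unescape_string(body: str) -> str:
    """Resolve backslash escapes in the body of a non-raw string literal."""

    def _replace(match: re.Match) -> str:
        ch = match.group(1)
        # Assumption: an unknown escape resolves to the escaped character itself.
        return _SIMPLE_ESCAPES.get(ch, ch)

    return _ESCAPE_RE.sub(_replace, body)
```

With this table, `unescape_string(r"a\rb")` yields an `a`, a carriage return character, and a `b`.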

jacobsimionato · 2026-06-23T04:15:53Z

I think this and the compiler test would be easier to read if they were expressed as folders of example inputs/outputs or examples of invalid inputs and corresponding errors. I think we'll want this soon anyway, so we can have language-agnostic conformance tests. I know this makes the test extremely strict, e.g. it will end up testing whitespace etc, but I think that could actually be a good thing to force all the compiler/decompiler implementations to just work exactly the same.

Happy to skip this now if you want to submit and iterate though! There are clearly tests here, so it shouldn't be hard for an agent to convert them to conformance tests, or at least data-driven tests.

Yeah, I see what you mean. Let's put that in another PR; this one is hard enough to review. We should do this, though.
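
For reference, the folder-of-examples approach suggested above could be sketched like this. The helper names and the pairing convention (`name.a2ui` next to `name.json`) are assumptions for illustration, and `compile_fn` is a placeholder for whatever compiler entry point the tests would call.

```python
import json
from pathlib import Path
from typing import Any, Callable


def collect_cases(examples_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every .a2ui source with the .json file that holds its expected output."""
    cases = []
    for source in sorted(examples_dir.glob("*.a2ui")):
        expected = source.with_suffix(".json")
        if not expected.exists():
            raise FileNotFoundError(f"Missing expected output for {source.name}")
        cases.append((source, expected))
    return cases


def run_cases(examples_dir: Path, compile_fn: Callable[[str], Any]) -> list[str]:
    """Return the names of the sources whose compiled output differs from the target."""
    failures = []
    for source, expected in collect_cases(examples_dir):
        actual = compile_fn(source.read_text(encoding="utf-8"))
        if actual != json.loads(expected.read_text(encoding="utf-8")):
            failures.append(source.name)
    return failures
```

This compares parsed JSON, so it ignores whitespace in the targets. A byte-for-byte comparison of serialized output would be the stricter variant the comment above describes.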

jacobsimionato · 2026-06-23T04:17:23Z

+  last_end = 0
+  for mo in re.finditer(tok_regex, text):
+    if mo.start() != last_end:
+      raise SyntaxError(f"Unexpected character: {text[last_end:mo.start()]!r}")


Are the exact errors that the compiler should throw formalized in the express spec? I think that'd be worth doing, and having conformance tests for them, to make sure that if we build and test error correction infra and prompts, they are portable across platforms.

No, but they should be. We have platform agnostic conformance tests already, but they don't have any error checking. I'll tackle this in a later PR.
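
For context, the gap check in the diff excerpt above can be placed in a complete, runnable tokenizer as sketched here. The token set is hypothetical and much smaller than the real lexical rules, and the trailing-input check after the loop is an addition that the excerpt does not show.

```python
import re

# Hypothetical token set for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r'"(?:\\[\s\S]|[^"\\])*"'),
    ("PUNCT", r"[(),=:\[\]]"),
    ("SKIP", r"[ \t\r\n]+"),
]
tok_regex = "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, lexeme) pairs, rejecting any unmatched characters."""
    tokens = []
    last_end = 0
    for mo in re.finditer(tok_regex, text):
        # finditer silently skips text that matches no rule, so a gap between
        # consecutive matches means there were unexpected characters.
        if mo.start() != last_end:
            raise SyntaxError(f"Unexpected character: {text[last_end:mo.start()]!r}")
        if mo.lastgroup != "SKIP":
            tokens.append((mo.lastgroup, mo.group()))
        last_end = mo.end()
    # Unmatched characters at the very end never produce a later match.
    if last_end != len(text):
        raise SyntaxError(f"Unexpected character: {text[last_end:]!r}")
    return tokens
```

If error messages like this one become part of the spec, conformance tests could assert on the exception type and message for a set of invalid inputs.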

jacobsimionato · 2026-06-23T04:20:01Z

  "description": "Unified catalog of basic A2UI components and functions.",
  "catalogId": "https://a2ui.org/specification/v1_0/catalogs/basic/catalog.json",
-  "instructions": "For layout, use the Row and Column components to organize other components.",
+  "instructions": "For layout, use the Row and Column components to organize other components.\n\n## Catalog Guidelines\n\n1. String Concatenation & Formatting: A2UI does not support binary operators like '+' or formatting symbols. To concatenate strings or dynamically inject data bindings into text, you must use the catalog function `formatString(value)` where the value string contains placeholders formatted as `${expression}`:\n   formatString(\"Hello ${/user/name}\")\n\n2. Strict Hierarchy: You must strictly adhere to the requested component nesting and hierarchy. If the prompt specifies that a component is 'inside' or 'contained in' another component, you MUST place it as a child of that specific component, not as a sibling or in a different container.\n\n## Examples\n\nExample 1: Dynamic text form\n```json\n[\n  {\n    \"version\": \"v1.0\",\n    \"createSurface\": {\n      \"surfaceId\": \"main\",\n      \"components\": [\n        {\n          \"id\": \"root\",\n          \"component\": \"Column\",\n          \"children\": [\"repField\", \"valueField\"]\n        },\n        {\n          \"id\": \"repField\",\n          \"component\": \"TextField\",\n          \"label\": \"Representative\",\n          \"value\": {\"path\": \"/form/rep\"},\n          \"placeholder\": \"Enter name\"\n        },\n        {\n          \"id\": \"valueField\",\n          \"component\": \"TextField\",\n          \"label\": \"Deal Value\",\n          \"value\": {\"path\": \"/form/value\"},\n          \"placeholder\": \"0.00\",\n          \"variant\": \"number\",\n          \"checks\": [\n            {\"call\": \"required\"}\n          ]\n        }\n      ],\n      \"dataModel\": {\n        \"form\": {\n          \"rep\": \"John Doe\",\n          \"value\": 1500.00\n        }\n      }\n    }\n  }\n]\n```\n\nExample 2: Dynamic list with templates\n```json\n[\n  {\n    \"version\": \"v1.0\",\n    \"createSurface\": {\n      \"surfaceId\": \"main\",\n      \"components\": [\n        {\n  
        \"id\": \"root\",\n          \"component\": \"Card\",\n          \"child\": \"breedList\"\n        },\n        {\n          \"id\": \"breedList\",\n          \"component\": \"List\",\n          \"children\": {\n            \"path\": \"/breeds\",\n            \"componentId\": \"breedTemplate\"\n          },\n          \"direction\": \"horizontal\"\n        },\n        {\n          \"id\": \"breedTemplate\",\n          \"component\": \"Image\",\n          \"url\": {\"path\": \"url\"}\n        }\n      ],\n      \"dataModel\": {\n        \"breeds\": [\n          {\n            \"url\": \"https://example.com/poodle.jpg\"\n          },\n          {\n            \"url\": \"https://example.com/lab.jpg\"\n          }\n        ]\n      }\n    }\n  }\n]\n```",


I wonder if the example here is actually sort of harmful: because alternative inference format prompt generators inline it into their prompt, it could confuse the LLM by being in the vanilla A2UI protocol format.

Happy to fix that in v1.1 or something though!

Hmm. Yeah, that's a problem. Right now, I check for markdown code blocks and decompile them if they are json, but that's fragile. I think we talked about having an "examples" field in addition to an "instructions" field, and that sounds like the way to go. Then the format agnostic, but catalog related, prompt instructions can go here, but examples go in the examples field, and they can be decompiled.
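
The fragile approach described in the reply above (scanning the instructions text for JSON code blocks) can be sketched as follows. The function name is hypothetical and this is not the prompt generator's code; it only shows why the approach depends on the exact markdown formatting of the `instructions` string.

```python
import json
import re
from typing import Any

# Built from single backticks so the pattern does not contain a literal fence.
FENCE = "`" * 3
_JSON_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"json\s*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json_examples(instructions: str) -> list[Any]:
    """Return the parsed payload of every fenced json block that is valid JSON."""
    examples = []
    for match in _JSON_BLOCK_RE.finditer(instructions):
        try:
            examples.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # A block that is not valid JSON is left alone rather than guessed at.
            continue
    return examples
```

A dedicated `examples` field, as proposed above, would remove the need for this kind of text scanning, because the examples would arrive as structured data that can be decompiled directly.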

jiahaog · 2026-06-23T03:08:09Z

            "id": "root",
            "component": "Card",
-            "child": "main-column"
+            "child": "main_column"


Not sure I follow, are these changes related to the PR?

Sort of. I can factor it out if you want. I defined identifiers more strictly, so that "-" wasn't a valid identifier character anymore, because if we ever want to have expressions (we don't yet, but we might), then allowing a minus sign in identifiers seems like a bad idea.
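
A stricter identifier rule of the kind described above might be expressed as in this sketch. The exact pattern is an assumption (the grammar's real identifier rule is not shown in this thread), and `normalize_identifier` is a hypothetical helper that mirrors the `main-column` to `main_column` change in the diff.

```python
import re

# Assumed rule: a letter or underscore, then letters, digits, or underscores.
# A hyphen is excluded so that "a-b" could later be parsed as a subtraction.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole name matches the assumed identifier rule."""
    return IDENTIFIER_RE.fullmatch(name) is not None


def normalize_identifier(name: str) -> str:
    """Replace hyphens with underscores, as done for the example ids."""
    return name.replace("-", "_")
```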

…ivate

Refactors compiler.py to add leading underscores to module-level helper functions, the lexical scanner rules, the tokenizer function, and the TokenParser class. This prevents them from being exposed in the public API, keeping the A2UI Express public interface clean and minimal.

…uild hook

Migrates the legacy recursive-descent parser and manual tokenizer in the Express compiler to a formal ANTLR4-based parsing pipeline.

* Created `Express.g4` grammar defining the full DSL syntax, implementing strict trailing comma checks, semicolon skipping, and support for C++-style block comments.
* Implemented `ExpressAstVisitor` and `ExpressErrorListener` inside a dedicated `visitor.py` module to cleanly construct AST nodes and handle lexer/parser syntax errors.
* Automated parser compilation inside `pack_specs_hook.py` by strictly executing `antlr4` from the active Python virtual environment (`sys.prefix`), raising a hard `RuntimeError` on failure to guarantee build determinism.
* Organized the generated files into a dedicated `generated/` sub-package and renamed modules to PEP 8 snake_case (`express_lexer.py`, `express_parser.py`, `express_visitor.py`) using a safe case-only rename pattern.
* Updated `compiler.py` and `visitor.py` imports, and added comprehensive test coverage verifying C++-style block comment skipping.
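
The parser-generation step in that commit could look roughly like the sketch below. It is not the contents of `pack_specs_hook.py`: the function names are hypothetical, the executable location under `sys.prefix` follows the usual virtual environment layout, and the `-no-listener` flag is an assumption (the commit only mentions lexer, parser, and visitor modules).

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def antlr4_executable(prefix: Optional[str] = None) -> Path:
    """Locate the antlr4 launcher inside the active environment, never on PATH."""
    root = Path(sys.prefix if prefix is None else prefix)
    if sys.platform == "win32":
        return root / "Scripts" / "antlr4.exe"
    return root / "bin" / "antlr4"


def build_command(grammar: str, out_dir: str, prefix: Optional[str] = None) -> list[str]:
    """Build the ANTLR invocation for a Python3 target with a visitor."""
    return [
        str(antlr4_executable(prefix)),
        "-Dlanguage=Python3",
        "-visitor",
        "-no-listener",  # assumption: only the visitor is generated
        "-o",
        str(out_dir),
        str(grammar),
    ]


def generate_parser(grammar: str, out_dir: str) -> None:
    """Run ANTLR and fail the build with RuntimeError if anything goes wrong."""
    cmd = build_command(grammar, out_dir)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:
        raise RuntimeError(f"Could not run antlr4: {exc}") from exc
    if result.returncode != 0:
        raise RuntimeError(f"antlr4 failed ({result.returncode}): {result.stderr.strip()}")
```

Resolving the executable from `sys.prefix` instead of `PATH` means the build uses the version pinned in the build requirements, which is the determinism the commit message refers to.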

…e package init

- Exclude '**/generated/**' directories from license checks both locally and in CI.
- Update the custom pack_specs_hook.py to automatically generate __init__.py inside the ANTLR output directory if missing, making the generated package completely self-building and safe to nuke.

Add `/generated/` to pyink's extend-exclude configurations in pyproject.toml so that auto-generated ANTLR files do not trigger style check failures.

Run pyink inside the custom pack_specs_hook.py build hook to automatically format newly generated parser files. This prevents unformatted generated files from dirtying the git working directory after builds and tests.

jacobsimionato · 2026-06-23T20:45:26Z

I have a moderate preference for deleting all these example files but keeping the script to generate them as necessary, e.g. for integration tests. Perhaps just keep 1 or 2 of these as a kind of documentation, so people can see what the format looks like at a glance?

Motivation: It would be great if people can add additional samples to the specification and not need to run scripts to keep all the other variants in sync. If we force every sample app and tool which wants to use the full set of samples to derive them from a single source of truth during the build process, then we ensure everything always uses the full set of samples.

I agree. I'd like to do that too, but let's do that in another PR too.

jiahaog

Nice!

…tion suite (#1740)

## Summary

This change integrates the A2UI Express compiler, decompiler, and prompt strategies directly into the Inspect-ai evaluation suite. It introduces the `express` evaluation strategy, adds a new A2UI v1.0 prompt evaluation dataset, and updates the tasks, solvers, scorers, and CI runners to support schema-validated A2UI v1.0 and A2UI Express evaluations.

This is a companion PR to the recently merged A2UI Express compiler implementation (PR #1726), enabling automated evaluation of LLM capability to generate A2UI Express DSL layouts and compiling them back to validated standard JSON.

## Changes

* **Evaluation Strategies** (`eval/a2ui_eval/strategies/`):
  * Implemented the `express` strategy in a new module [express.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/a2ui_eval/strategies/express.py). This strategy:
    * Generates system prompt instructions dynamically using `ExpressPromptGenerator` based on the active catalog schema.
    * Invokes the model to generate layout designs in A2UI Express DSL.
    * Implements `compile_express_dsl` to extract the generated Express DSL, compile it into standard A2UI v1.0 JSON, and perform schema validation using `A2uiSchemaManager`.
  * Updated [__init__.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/a2ui_eval/strategies/__init__.py) to register the new `express` strategy.
* **Tasks & Dataset** (`eval/`):
  * Added [v1_0_prompts.yaml](file:///Users/gspencer/code/a2ui/a2ui_express/eval/datasets/v1_0_prompts.yaml), a new evaluation dataset of prompts and target layouts designed for A2UI v1.0 and Express DSL testing.
  * Renamed the core evaluation task from `a2ui_v0_9_eval` to `a2ui_v0_9_1_eval` in [tasks.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tasks.py).
  * Upgraded [tasks.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tasks.py) to dynamically load the appropriate dataset and catalog schema (v0.9 vs v1.0/Express) depending on the selected strategy.
  * Updated the default grading model in [tasks.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tasks.py) to `google/gemini-3.5-flash`.
* **Scorers & Dataset Loader** (`eval/a2ui_eval/`):
  * Upgraded `a2ui_scorer` in [scorers.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/a2ui_eval/scorers.py) to support parameterizable protocol versions (`0.9` or `1.0`), ensuring strict validation checks are performed against the correct schema.
  * Updated [dataset.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/a2ui_eval/dataset.py) to allow passing a `default_catalog_path`.
* **CI Runners & Reporting**:
  * Updated [run_ci_evals.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/bin/run_ci_evals.py) and [report_evals.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/bin/report_evals.py) to run and report on the new `express` strategy alongside existing strategies.
  * Configured [main.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/main.py) to expose the express strategy option.
* **Testing**:
  * Updated [test_strategies.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tests/test_strategies.py) to verify the correctness of the new `express` solver pipeline and compilation chain.
  * Updated [test_run_ci_evals.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tests/test_run_ci_evals.py) to test the integration in the CI execution pipeline.

## Impact & Risks

* **No Production Risk**: All changes are isolated inside the `eval/` directory and do not affect any SDK runtime paths.
* **Gated Execution**: The Express solver activates `A2UI_EXPRESS_ENABLED=true` internally to compile outputs, which is perfectly safe and self-contained.

## Testing

* Local unit tests in the `eval/` suite can be executed using `pytest`:

  ```bash
  uv run pytest eval/tests/
  ```

  Both [test_strategies.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tests/test_strategies.py) and [test_run_ci_evals.py](file:///Users/gspencer/code/a2ui/a2ui_express/eval/tests/test_run_ci_evals.py) have been verified.

github-project-automation Bot added this to A2UI Jun 23, 2026

github-project-automation Bot moved this to Todo in A2UI Jun 23, 2026

gspencergoog mentioned this pull request Jun 23, 2026

feat(express): Implement A2UI Express compiler, decompiler and parser gspencergoog/A2UI#3

Closed

gspencergoog requested review from jacobsimionato and jiahaog June 23, 2026 01:54


gspencergoog force-pushed the express-pr2-compiler branch from d0b2afe to bc96bbb Compare June 23, 2026 02:06


gspencergoog force-pushed the express-pr2-compiler branch from 00f3465 to f414489 Compare June 23, 2026 02:20


gspencergoog force-pushed the express-pr2-compiler branch from f414489 to c2591da Compare June 23, 2026 02:29

gspencergoog force-pushed the express-pr2-compiler branch from 2c51b22 to a8224c2 Compare June 23, 2026 02:33

jacobsimionato reviewed Jun 23, 2026


jiahaog reviewed Jun 23, 2026


gspencergoog and others added 7 commits June 23, 2026 09:50

Merge branch 'main' into express-pr2-compiler

3da6665

build(express): exclude generated files from python code formatting

396654a


build(express): auto-format ANTLR generated files in build hook

086a5ea


Merge branch 'main' into express-pr2-compiler

d8bbb85

jacobsimionato approved these changes Jun 23, 2026


Update examples and README.md

be47836

jiahaog approved these changes Jun 23, 2026


Remove local inference info

146ae6e

gspencergoog force-pushed the express-pr2-compiler branch from 9429581 to 146ae6e Compare June 23, 2026 22:52

Merge branch 'main' into express-pr2-compiler

073f801

gspencergoog merged commit 8fbc256 into a2ui-project:main Jun 23, 2026
19 checks passed

github-project-automation Bot moved this from Todo to Done in A2UI Jun 23, 2026

gspencergoog mentioned this pull request Jun 23, 2026

feat(express): Integrate A2UI Express and v1.0 into Inspect-ai evaluation suite #1740

Merged

gspencergoog deleted the express-pr2-compiler branch June 23, 2026 23:06

github-actions Bot mentioned this pull request Jun 24, 2026

Evals failed on main (PR #1726) #1743

Closed
