feat(express): Implement A2UI Express compiler, decompiler and parser #1726

Merged
gspencergoog merged 12 commits into a2ui-project:main from gspencergoog:express-pr2-compiler
Jun 23, 2026
Conversation

@gspencergoog commented Jun 23, 2026

Collaborator

Summary

This PR implements the A2UI Express technical specification. A2UI Express is a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. The PR provides a complete end-to-end Python implementation, including an ANTLR-based compiler, a decompiler, a schema-based system prompt generator, helper scripts, and comprehensive test suites.

This is a refined, standalone extraction of the A2UI Express compiler/decompiler portions originally proposed in PR #1678, incorporating automated parser generation, strict validation, and extensive test coverage.

Changes

  • Build System & Code Generation:
    • Added a Hatch build hook in pack_specs_hook.py that automatically compiles the ANTLR grammar Express.g4 into Python3 source files at build time.
    • The build hook renames the generated modules to snake_case (express_lexer.py, express_parser.py, express_visitor.py) using a case-only rename pattern, post-processes relative imports, and formats the generated code with pyink.
    • Updated pyproject.toml to include antlr4-python3-runtime as a runtime dependency, and antlr4-tools in the build system requirements.
  • Compiler & Parser (a2ui.experimental.express):
    • Implemented an ANTLR-based parsing pipeline using Express.g4 to parse line-oriented declarative layout files.
    • The ExpressCompiler compiles the AST directly into standard A2UI v1.0 JSON payloads (with dynamic positional parameter resolution and variable flattening).
    • Supports rich string types: standard strings, raw strings (r"..."), raw multiline strings (r"""..."""), and escaped carriage returns.
    • Integrates a partial parser mode supporting streaming recovery for incomplete layouts.
    • Incorporates strict enum validation for component properties, raising ValueError on mismatch rather than silently ignoring invalid values.
  • Decompiler:
    • Implemented ExpressDecompiler to convert standard A2UI v1.0 JSON payloads back into the highly compact, line-oriented Express DSL.
  • Schema Helper & Prompt Generator:
    • Implemented CatalogSchemaHelper to parse catalog schema definitions.
    • Implemented ExpressPromptGenerator to compile active catalog schemas into positional signatures used to prompt generative models.
  • Evaluation & Testing Scripts:
    • Added run_inference.py to evaluate the A2UI Express prompt contract by converting JSON examples to Express DSL via Gemini/Ollama/MLX models and validating the round-trip compilation.
    • Added recreate_dsl_examples.py to programmatically regenerate the dynamic markdown documentation.
  • Documentation & Examples:
    • Added comprehensive layout examples under specification/proposals/express/examples/*.a2ui (36 files) along with their corresponding compiled JSON targets.
    • Created README.md and a2ui_express.md detailing the DSL grammar, compiler mechanics, and usage.
    • Created express_dsl_examples.md detailing the active system prompt contract and compiled weather forecast examples.
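
The strict enum validation listed under Compiler & Parser above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `validate_enum`, its signature, and the error wording are hypothetical, and the `vertical` option is assumed for illustration (only `horizontal` appears in the catalog example).

```python
def validate_enum(component, prop, value, allowed):
    """Return value if it is an allowed enum member, else raise ValueError.

    Hypothetical helper: name, signature, and message format are assumptions.
    """
    if value not in allowed:
        options = ", ".join(sorted(allowed))
        raise ValueError(
            f"{component}.{prop}: invalid value {value!r}; expected one of: {options}"
        )
    return value


# Usage: a valid value passes through unchanged, an invalid one raises.
direction = validate_enum("List", "direction", "horizontal", {"horizontal", "vertical"})
```

Raising instead of dropping the property means a model that emits an unknown enum value gets a visible failure that can be fed back for correction.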

Impact & Risks

  • The feature is fully experimental, contained in the a2ui.experimental.express namespace, and gated behind the A2UI_EXPRESS_ENABLED=true environment variable.
  • There is no impact on stable production paths or other existing SDK modules.
  • Build-time code generation introduces a dependency on antlr4 (via antlr4-tools and antlr4-python3-runtime) during development/builds, which is automatically resolved by Hatch and standard pip/uv environments.
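
For readers unfamiliar with environment-variable gating, the check might be sketched as below. The helper names `express_enabled` and `require_express` are hypothetical, and the case-insensitive comparison is an assumption; only the variable name and the value `true` come from this PR.

```python
import os

ENV_FLAG = "A2UI_EXPRESS_ENABLED"  # variable name taken from the PR description


def express_enabled(environ=None):
    """Return True when the flag is set to 'true' (case-insensitivity is assumed)."""
    env = os.environ if environ is None else environ
    return env.get(ENV_FLAG, "").strip().lower() == "true"


def require_express(environ=None):
    """Raise if the experimental feature has not been switched on."""
    if not express_enabled(environ):
        raise RuntimeError(
            f"A2UI Express is experimental; set {ENV_FLAG}=true to enable it."
        )
```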

Testing

  • Added 44 robust unit tests under agent_sdks/python/a2ui_agent/tests/express/ including:
    • test_compiler.py: Verifies parser correctness, token parsing, raw string handling, and carriage return unescaping.
    • test_decompiler.py: Validates round-trip integrity (JSON -> Express -> JSON).
    • test_integration.py: Tests the compiler against all 36 catalog layout examples.
    • test_cli_tools.py: Tests script interfaces and prompt generation.
  • The tests can be executed via the standard Python test runner (e.g. uv run pytest).

gemini-code-assist[bot]

This comment was marked as resolved.

@gspencergoog gspencergoog force-pushed the express-pr2-compiler branch from d0b2afe to bc96bbb Compare June 23, 2026 02:06
gemini-code-assist[bot]

This comment was marked as resolved.

@gspencergoog gspencergoog force-pushed the express-pr2-compiler branch from 00f3465 to f414489 Compare June 23, 2026 02:20
…oducing a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It includes the compiler, decompiler, schema helper, and parser modules.

This contains the A2UI Express compiler/decompiler portions of a2ui-project#1678, with some additional issues fixed, additional tests, and refinements.

* **Compiler & Parser**: Implemented the `ExpressCompiler` and `Parser` in `a2ui.experimental.express` to parse line-oriented DSL and compile it into standard A2UI v1.0 JSON. Supports standard strings, raw strings (`r"..."`), raw multi-line strings (`r"""..."""`), and partial streaming recovery.
* **Strict Enum Validation**: Added strict validation for component property enums to raise ValueError on invalid inputs instead of silently ignoring them.
* **Event Context Compilation**: Simplified event context processing to avoid redundant compilation.
* **Decompiler**: Implemented `ExpressDecompiler` to convert standard v1.0 JSON payloads back into compact Express DSL.
* **Schema Helper & Prompt Generator**: Implemented `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used by generative models.
* **Examples**: Added 36 `.a2ui` layout examples and corresponding compiled `.json` targets.
* **Format Checks**: Integrated `pyink` style verification for specification proposals.

The feature is fully experimental and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable. It does not affect any stable production paths.

* Added 44 comprehensive unit tests in `tests/express/` covering parser correctness, thread-safe compilation, raw string escaping, strict enum validation, and round-trip integrity.
gemini-code-assist[bot]

This comment was marked as resolved.

@gspencergoog gspencergoog force-pushed the express-pr2-compiler branch from f414489 to c2591da Compare June 23, 2026 02:29
- Add support for resolving escaped carriage returns (\r) to the carriage return character in compiler's _unescape_string helper.
- Update the escape sequence regex pattern from \\(.) to \\([\s\S]) to match all escaped sequences including newlines.
- Rewrite the decompiler carriage return escaping test to be an elegant, end-to-end round-trip compilation and decompilation test.
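
The regex change in this commit matters because `.` does not match a newline by default in Python's `re` module, so a backslash followed by a literal newline was left untouched by the old pattern. The sketch below illustrates the difference; the function name `unescape_string` and the escape table are assumptions, not the actual `_unescape_string` helper.

```python
import re

# Assumed escape table; the real helper may support more or fewer sequences.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}

OLD_PATTERN = re.compile(r"\\(.)")       # "." does not match a newline
NEW_PATTERN = re.compile(r"\\([\s\S])")  # matches any character, newline included


def unescape_string(text, pattern=NEW_PATTERN):
    """Resolve backslash escapes; unknown escapes yield the escaped character."""
    return pattern.sub(lambda m: _ESCAPES.get(m.group(1), m.group(1)), text)
```

With the new pattern, `unescape_string("a\\rb")` produces a real carriage return, and a backslash followed by a newline collapses to the newline alone.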
@gspencergoog gspencergoog force-pushed the express-pr2-compiler branch from 2c51b22 to a8224c2 Compare June 23, 2026 02:33
Comment thread agent_sdks/python/a2ui_agent/src/a2ui/experimental/express/compiler.py Outdated

Collaborator

I think this and the compiler test would be easier to read if they were expressed as folders of example inputs/outputs or examples of invalid inputs and corresponding errors. I think we'll want this soon anyway, so we can have language-agnostic conformance tests. I know this makes the test extremely strict, e.g. it will end up testing whitespace etc, but I think that could actually be a good thing to force all the compiler/decompiler implementations to just work exactly the same.

Happy to skip this now if you want to submit and iterate though! There are clearly tests here, so it shouldn't be hard for an agent to convert them to conformance tests, or at least data-driven tests.

Collaborator Author

Yeah, I see what you mean. Let's put that in another PR, this one is hard enough to review. We should do this, though.
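
The folder-based approach suggested in this thread could be sketched as follows. This is only an illustration under assumptions: the helper names `load_cases` and `failing_cases` are hypothetical, and it assumes each `.a2ui` input sits next to a `.json` file with the same stem, as the examples directory in this PR appears to do.

```python
import json
from pathlib import Path


def load_cases(root):
    """Pair every *.a2ui input with the *.json expectation of the same stem."""
    cases = []
    for source_path in sorted(Path(root).glob("*.a2ui")):
        expected_path = source_path.with_suffix(".json")
        if expected_path.exists():  # inputs without an expectation are skipped
            cases.append(
                (
                    source_path.name,
                    source_path.read_text(),
                    json.loads(expected_path.read_text()),
                )
            )
    return cases


def failing_cases(cases, compile_fn):
    """Return the names of cases whose compiled output differs from the expectation."""
    return [name for name, source, expected in cases if compile_fn(source) != expected]
```

Because the cases are plain files, the same folder could be consumed by compiler implementations in other languages, which is the portability benefit raised above.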

last_end = 0
for mo in re.finditer(tok_regex, text):
    if mo.start() != last_end:
        raise SyntaxError(f"Unexpected character: {text[last_end:mo.start()]!r}")

Collaborator

Are the exact errors that the compiler should throw formalized in the express spec? I think that'd be worth doing, and having conformance tests for them, to make sure that if we build and test error correction infra and prompts, they are portable across platforms.

Collaborator Author

No, but they should be. We have platform agnostic conformance tests already, but they don't have any error checking. I'll tackle this in a later PR.
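
For context, the gap check quoted at the top of this thread is a common pattern in regex-based tokenizers: `re.finditer` silently skips text that no token pattern matches, so any jump between consecutive matches marks an unexpected character. A self-contained sketch is below. The token set is invented for illustration and is not the Express grammar, and a later commit in this PR replaces the manual tokenizer with ANTLR.

```python
import re

# Hypothetical token set for illustration only.
TOKEN_SPEC = [
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[(),=]"),
    ("WS", r"\s+"),
]
tok_regex = "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)


def tokenize(text):
    """Return (kind, lexeme) pairs, raising SyntaxError on unmatched input."""
    tokens = []
    last_end = 0
    for mo in re.finditer(tok_regex, text):
        if mo.start() != last_end:  # finditer skipped something it could not match
            raise SyntaxError(f"Unexpected character: {text[last_end:mo.start()]!r}")
        last_end = mo.end()
        if mo.lastgroup != "WS":
            tokens.append((mo.lastgroup, mo.group()))
    if last_end != len(text):  # unmatched text after the final token
        raise SyntaxError(f"Unexpected character: {text[last_end:]!r}")
    return tokens
```

Pinning the exact message text, as in `Unexpected character: '$'`, is what a conformance test for errors would need to assert on.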

"description": "Unified catalog of basic A2UI components and functions.",
"catalogId": "https://a2ui.org/specification/v1_0/catalogs/basic/catalog.json",
"instructions": "For layout, use the Row and Column components to organize other components.",
"instructions": "For layout, use the Row and Column components to organize other components.\n\n## Catalog Guidelines\n\n1. String Concatenation & Formatting: A2UI does not support binary operators like '+' or formatting symbols. To concatenate strings or dynamically inject data bindings into text, you must use the catalog function `formatString(value)` where the value string contains placeholders formatted as `${expression}`:\n formatString(\"Hello ${/user/name}\")\n\n2. Strict Hierarchy: You must strictly adhere to the requested component nesting and hierarchy. If the prompt specifies that a component is 'inside' or 'contained in' another component, you MUST place it as a child of that specific component, not as a sibling or in a different container.\n\n## Examples\n\nExample 1: Dynamic text form\n```json\n[\n {\n \"version\": \"v1.0\",\n \"createSurface\": {\n \"surfaceId\": \"main\",\n \"components\": [\n {\n \"id\": \"root\",\n \"component\": \"Column\",\n \"children\": [\"repField\", \"valueField\"]\n },\n {\n \"id\": \"repField\",\n \"component\": \"TextField\",\n \"label\": \"Representative\",\n \"value\": {\"path\": \"/form/rep\"},\n \"placeholder\": \"Enter name\"\n },\n {\n \"id\": \"valueField\",\n \"component\": \"TextField\",\n \"label\": \"Deal Value\",\n \"value\": {\"path\": \"/form/value\"},\n \"placeholder\": \"0.00\",\n \"variant\": \"number\",\n \"checks\": [\n {\"call\": \"required\"}\n ]\n }\n ],\n \"dataModel\": {\n \"form\": {\n \"rep\": \"John Doe\",\n \"value\": 1500.00\n }\n }\n }\n }\n]\n```\n\nExample 2: Dynamic list with templates\n```json\n[\n {\n \"version\": \"v1.0\",\n \"createSurface\": {\n \"surfaceId\": \"main\",\n \"components\": [\n {\n \"id\": \"root\",\n \"component\": \"Card\",\n \"child\": \"breedList\"\n },\n {\n \"id\": \"breedList\",\n \"component\": \"List\",\n \"children\": {\n \"path\": \"/breeds\",\n \"componentId\": \"breedTemplate\"\n },\n \"direction\": \"horizontal\"\n },\n {\n \"id\": \"breedTemplate\",\n 
\"component\": \"Image\",\n \"url\": {\"path\": \"url\"}\n }\n ],\n \"dataModel\": {\n \"breeds\": [\n {\n \"url\": \"https://example.com/poodle.jpg\"\n },\n {\n \"url\": \"https://example.com/lab.jpg\"\n }\n ]\n }\n }\n }\n]\n```",

Collaborator

I wonder if the example here is actually sort of harmful: because alternative inference format prompt generators inline it into their prompt, it could confuse the LLM by being in the vanilla A2UI protocol format.

Happy to fix that in v1.1 or something though!

@gspencergoog gspencergoog Jun 23, 2026

Collaborator Author

Hmm. Yeah, that's a problem. Right now, I check for markdown code blocks and decompile them if they are json, but that's fragile. I think we talked about having an "examples" field in addition to an "instructions" field, and that sounds like the way to go. Then the format agnostic, but catalog related, prompt instructions can go here, but examples go in the examples field, and they can be decompiled.
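
The workaround described in this reply (finding JSON code blocks inside the markdown `instructions` string) might be sketched like this. The function name `extract_json_examples` is hypothetical and the actual detection logic in the PR may differ; the fence marker is built programmatically only so the sketch can sit inside a fenced block.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker
_JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_json_examples(instructions):
    """Return the parsed payload of every json-tagged fenced block in the text."""
    examples = []
    for match in _JSON_BLOCK.finditer(instructions):
        try:
            examples.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # not valid JSON; leave such blocks alone
    return examples
```

The fragility mentioned above is visible here: anything that deviates from the exact fence-and-tag shape is missed, which is the argument for a dedicated `examples` field.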

  "id": "root",
  "component": "Card",
- "child": "main-column"
+ "child": "main_column"

Collaborator

Not sure I follow, are these changes related to the PR?

Collaborator Author

Sort of. I can factor it out if you want. I defined identifiers more strictly, so that "-" wasn't a valid identifier character anymore, because if we ever want to have expressions (we don't yet, but we might), then allowing a minus sign in identifiers seems like a bad idea.
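
To make the stricter identifier rule concrete, a check along these lines would accept `main_column` and reject `main-column`. The exact character set in `Express.g4` is not shown in this thread, so the pattern below is an assumption, as is the helper name.

```python
import re

# Assumed identifier shape: letters, digits, and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name):
    """Return True if the whole string is an identifier under the assumed rule."""
    return _IDENTIFIER.fullmatch(name) is not None
```

Keeping `-` out of identifiers leaves `a-b` free to mean subtraction if expressions are ever added.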

Comment thread agent_sdks/python/a2ui_agent/src/a2ui/experimental/express/compiler.py Outdated
gspencergoog and others added 7 commits June 23, 2026 09:50
…ivate

Refactors compiler.py to add leading underscores to module-level helper functions, the lexical scanner rules, the tokenizer function, and the TokenParser class. This prevents them from being exposed in the public API, keeping the A2UI Express public interface clean and minimal.
…uild hook

Migrates the legacy recursive-descent parser and manual tokenizer in the
Express compiler to a formal ANTLR4-based parsing pipeline.

*   Created `Express.g4` grammar defining the full DSL syntax, implementing strict
    trailing comma checks, semicolon skipping, and support for C++-style block comments.
*   Implemented `ExpressAstVisitor` and `ExpressErrorListener` inside a dedicated
    `visitor.py` module to cleanly construct AST nodes and handle lexer/parser syntax errors.
*   Automated parser compilation inside `pack_specs_hook.py` by strictly executing
    `antlr4` from the active Python virtual environment (`sys.prefix`), raising a hard
    `RuntimeError` on failure to guarantee build determinism.
*   Organized the generated files into a dedicated `generated/` sub-package and renamed
    modules to PEP 8 snake_case (`express_lexer.py`, `express_parser.py`, `express_visitor.py`)
    using a safe case-only rename pattern.
*   Updated `compiler.py` and `visitor.py` imports, and added comprehensive test coverage
    verifying C++-style block comment skipping.
…e package init

- Exclude '**/generated/**' directories from license checks both locally and in CI.
- Update the custom pack_specs_hook.py to automatically generate __init__.py inside the ANTLR output directory if missing, making the generated package completely self-building and safe to nuke.
Add `/generated/` to pyink's extend-exclude configurations in pyproject.toml
so that auto-generated ANTLR files do not trigger style check failures.
Run pyink inside the custom pack_specs_hook.py build hook to automatically
format newly generated parser files. This prevents unformatted generated
files from dirtying the git working directory after builds and tests.
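
The build-hook behavior described in these commits (resolving `antlr4` from the active virtual environment, failing hard when it is missing, and creating `__init__.py` in the output directory) could be sketched as below. All function names are hypothetical and this is not the code in `pack_specs_hook.py`. The ANTLR flags shown are real tool options, but which ones the hook actually passes is an assumption.

```python
import os
import sys
from pathlib import Path


def find_venv_tool(name, prefix=None):
    """Locate a console script inside the given environment prefix (default: sys.prefix)."""
    root = Path(sys.prefix if prefix is None else prefix)
    bin_dir = root / ("Scripts" if os.name == "nt" else "bin")
    for candidate in (bin_dir / name, bin_dir / f"{name}.exe"):
        if candidate.is_file():
            return candidate
    # Fail hard rather than falling back to whatever is on PATH.
    raise RuntimeError(f"{name} not found under {bin_dir}; is antlr4-tools installed?")


def build_antlr_command(tool, grammar, out_dir):
    """Assemble an ANTLR invocation that emits a Python3 parser and visitor."""
    return [
        str(tool),
        "-Dlanguage=Python3",
        "-visitor",
        "-no-listener",
        "-o",
        str(out_dir),
        str(grammar),
    ]


def ensure_package_init(out_dir):
    """Create the output directory and an empty __init__.py if they are missing."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    init_file = out / "__init__.py"
    if not init_file.exists():
        init_file.write_text("")
    return init_file
```

The resulting command list would typically be handed to `subprocess.run(..., check=True)`, with a non-zero exit converted into the `RuntimeError` the commit message mentions.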

Collaborator

I have a moderate preference for deleting all these example files but keeping the script to generate them as necessary, e.g. for integration tests. Perhaps just keep 1 or 2 of these as a kind of documentation, so people can see what the format looks like at a glance?

Motivation: It would be great if people can add additional samples to the specification and not need to run scripts to keep all the other variants in sync. If we force every sample app and tool which wants to use the full set of samples to derive them from a single source of truth during the build process, then we ensure everything always uses the full set of samples.

Collaborator Author

I agree. I'd like to do that too, but let's do that in another PR too.

Comment thread specification/proposals/express/README.md Outdated

@jiahaog jiahaog left a comment

Collaborator

Nice!

@gspencergoog gspencergoog force-pushed the express-pr2-compiler branch from 9429581 to 146ae6e Compare June 23, 2026 22:52
@gspencergoog gspencergoog merged commit 8fbc256 into a2ui-project:main Jun 23, 2026
19 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in A2UI Jun 23, 2026
@gspencergoog gspencergoog deleted the express-pr2-compiler branch June 23, 2026 23:06
gspencergoog added a commit that referenced this pull request Jun 25, 2026
…tion suite (#1740)

## Summary
This change integrates the A2UI Express compiler, decompiler, and prompt strategies directly into the Inspect-ai evaluation suite. It introduces the `express` evaluation strategy, adds a new A2UI v1.0 prompt evaluation dataset, and updates the tasks, solvers, scorers, and CI runners to support schema-validated A2UI v1.0 and A2UI Express evaluations.

This is a companion PR to the recently merged A2UI Express compiler implementation (PR #1726), enabling automated evaluation of LLM capability to generate A2UI Express DSL layouts and compiling them back to validated standard JSON.

## Changes
* **Evaluation Strategies** (`eval/a2ui_eval/strategies/`):
  * Implemented the `express` strategy in a new module `eval/a2ui_eval/strategies/express.py`. This strategy:
    * Generates system prompt instructions dynamically using `ExpressPromptGenerator` based on the active catalog schema.
    * Invokes the model to generate layout designs in A2UI Express DSL.
    * Implements `compile_express_dsl` to extract the generated Express DSL, compile it into standard A2UI v1.0 JSON, and perform schema validation using `A2uiSchemaManager`.
  * Updated `eval/a2ui_eval/strategies/__init__.py` to register the new `express` strategy.
* **Tasks & Dataset** (`eval/`):
  * Added `eval/datasets/v1_0_prompts.yaml`, a new evaluation dataset of prompts and target layouts designed for A2UI v1.0 and Express DSL testing.
  * Renamed the core evaluation task from `a2ui_v0_9_eval` to `a2ui_v0_9_1_eval` in `eval/tasks.py`.
  * Upgraded `eval/tasks.py` to dynamically load the appropriate dataset and catalog schema (v0.9 vs v1.0/Express) depending on the selected strategy.
  * Updated the default grading model in `eval/tasks.py` to `google/gemini-3.5-flash`.
* **Scorers & Dataset Loader** (`eval/a2ui_eval/`):
  * Upgraded `a2ui_scorer` in `eval/a2ui_eval/scorers.py` to support parameterizable protocol versions (`0.9` or `1.0`), ensuring strict validation checks are performed against the correct schema.
  * Updated `eval/a2ui_eval/dataset.py` to allow passing a `default_catalog_path`.
* **CI Runners & Reporting**:
  * Updated `eval/bin/run_ci_evals.py` and `eval/bin/report_evals.py` to run and report on the new `express` strategy alongside existing strategies.
  * Configured `eval/main.py` to expose the express strategy option.
* **Testing**:
  * Updated `eval/tests/test_strategies.py` to verify the correctness of the new `express` solver pipeline and compilation chain.
  * Updated `eval/tests/test_run_ci_evals.py` to test the integration in the CI execution pipeline.

## Impact & Risks
* **No Production Risk**: All changes are isolated inside the `eval/` directory and do not affect any SDK runtime paths.
* **Gated Execution**: The Express solver activates `A2UI_EXPRESS_ENABLED=true` internally to compile outputs, which is perfectly safe and self-contained.

## Testing
* Local unit tests in the `eval/` suite can be executed using `pytest`:
  ```bash
  uv run pytest eval/tests/
  ```
  Both `eval/tests/test_strategies.py` and `eval/tests/test_run_ci_evals.py` have been verified.