feat(express): Implement A2UI Express compiler, decompiler and parser #1726
d0b2afe to bc96bbb
00f3465 to f414489
…oducing a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It includes the compiler, decompiler, schema helper, and parser modules. This contains the A2UI Express compiler/decompiler portions of a2ui-project#1678, with some additional issues fixed, additional tests, and refinements.

* **Compiler & Parser**: Implemented the `ExpressCompiler` and `Parser` in `a2ui.experimental.express` to parse line-oriented DSL and compile it into standard A2UI v1.0 JSON. Supports standard strings, raw strings (`r"..."`), raw multi-line strings (`r"""..."""`), and partial streaming recovery.
* **Strict Enum Validation**: Added strict validation for component property enums to raise `ValueError` on invalid inputs instead of silently ignoring them.
* **Event Context Compilation**: Simplified event context processing to avoid redundant compilation.
* **Decompiler**: Implemented `ExpressDecompiler` to convert standard v1.0 JSON payloads back into compact Express DSL.
* **Schema Helper & Prompt Generator**: Implemented `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used by generative models.
* **Examples**: Added 36 `.a2ui` layout examples and corresponding compiled `.json` targets.
* **Format Checks**: Integrated `pyink` style verification for specification proposals.

The feature is fully experimental and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable. It does not affect any stable production paths.

* Added 44 comprehensive unit tests in `tests/express/` covering parser correctness, thread-safe compilation, raw string escaping, strict enum validation, and round-trip integrity.
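Strict enum validation and positional signatures both read the same catalog schema. A minimal sketch of both ideas, assuming a simplified JSON Schema-style component definition: the `TextField` property list, the `variant` enum values, and the `Name(required, optional?)` notation are hypothetical and are not the actual output format of `ExpressPromptGenerator`.

```python
from typing import Any

# Hypothetical, simplified TextField definition in JSON Schema style; the
# real basic catalog's property list and enum values may differ.
TEXT_FIELD_SCHEMA: dict[str, Any] = {
    "properties": {
        "label": {"type": "string"},
        "value": {"type": "object"},
        "placeholder": {"type": "string"},
        "variant": {"type": "string", "enum": ["text", "number"]},
    },
    "required": ["label"],
}


def positional_signature(name: str, schema: dict[str, Any]) -> str:
    """List required properties first, then optional ones marked with '?'."""
    props = list(schema.get("properties", {}))
    required = [p for p in props if p in schema.get("required", [])]
    optional = [f"{p}?" for p in props if p not in required]
    return f"{name}({', '.join(required + optional)})"


def check_enums(name: str, schema: dict[str, Any], values: dict[str, Any]) -> None:
    """Raise ValueError for an enum-constrained property with an unlisted value."""
    properties = schema.get("properties", {})
    for prop, value in values.items():
        allowed = properties.get(prop, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}.{prop}: {value!r} is not one of {allowed}")
```

With this schema, `positional_signature("TextField", TEXT_FIELD_SCHEMA)` gives `TextField(label, value?, placeholder?, variant?)`, and an unknown `variant` raises instead of being silently dropped.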
f414489 to c2591da
- Add support for resolving escaped carriage returns (`\r`) to the carriage return character in the compiler's `_unescape_string` helper.
- Update the escape sequence regex pattern from `\\(.)` to `\\([\s\S])` so that it matches every escaped character, including newlines.
- Rewrite the decompiler carriage return escaping test as an end-to-end round-trip (compile, then decompile) test.
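A minimal sketch of an unescape helper built on the `\\([\s\S])` pattern, to show why the regex change matters: `.` does not match a newline unless `re.DOTALL` is set, so `\\(.)` leaves a backslash followed by a line break untouched, while `[\s\S]` matches any character. The escape table and the handling of unknown escapes are assumptions, not the compiler's actual `_unescape_string`.

```python
import re

# Hypothetical escape table; the real helper may support other sequences.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}

# [\s\S] matches any character, including a newline, unlike "." without re.DOTALL.
_ESCAPE_RE = re.compile(r"\\([\s\S])")


def unescape_string(body: str) -> str:
    """Resolve backslash escapes in the body of a (non-raw) string literal."""
    # Assumption: an unknown escape keeps the escaped character and drops the backslash.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)
```

Because `re.sub` scans left to right without overlapping matches, an escaped backslash followed by `n` (`\\n` in the source) correctly becomes a literal backslash and `n`, not a newline.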
2c51b22 to a8224c2
I think this and the compiler test would be easier to read if they were expressed as folders of example inputs/outputs, or examples of invalid inputs and corresponding errors. I think we'll want this soon anyway, so we can have language-agnostic conformance tests. I know this makes the test extremely strict, e.g., it will end up testing whitespace, etc., but I think that could actually be a good thing: it forces all the compiler/decompiler implementations to behave exactly the same.
Happy to skip this now if you want to submit and iterate though! There are clearly tests here, so it shouldn't be hard for an agent to convert them to conformance tests, or at least data-driven tests.
Yeah, I see what you mean. Let's put that in another PR; this one is hard enough to review. We should do this, though.
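A minimal sketch of the data-driven layout suggested in this thread, assuming a hypothetical folder convention: each `<name>.a2ui` input sits next to either `<name>.json` (expected output) or `<name>.error.txt` (a substring the raised error must contain). The compiler is passed in as `compile_fn`, a stand-in for whichever implementation is under test. This version compares parsed JSON, so whitespace in expected files is ignored; a decompiler variant would compare text exactly, which is the stricter behavior described above.

```python
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical case layout, one pair of files per case sharing a stem:
#   <name>.a2ui        Express input
#   <name>.json        expected compiled output, or
#   <name>.error.txt   substring expected in the raised error message


def run_conformance(root: Path, compile_fn: Callable[[str], Any]) -> list[str]:
    """Run every case under root and return a list of failure descriptions."""
    failures: list[str] = []
    for src in sorted(root.glob("*.a2ui")):
        expected_json = src.with_name(src.stem + ".json")
        expected_error = src.with_name(src.stem + ".error.txt")
        try:
            actual = compile_fn(src.read_text(encoding="utf-8"))
        except Exception as exc:  # implementations may raise any error type
            if expected_error.exists():
                wanted = expected_error.read_text(encoding="utf-8").strip()
                if wanted not in str(exc):
                    failures.append(f"{src.name}: error {exc!s} lacks {wanted!r}")
            else:
                failures.append(f"{src.name}: unexpected error {exc!s}")
            continue
        if expected_error.exists():
            failures.append(f"{src.name}: expected an error, got output")
        elif not expected_json.exists():
            failures.append(f"{src.name}: no expected output file")
        elif actual != json.loads(expected_json.read_text(encoding="utf-8")):
            failures.append(f"{src.name}: output differs from {expected_json.name}")
    return failures
```

Because the cases are plain files, the same folder could drive a harness in any other language.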
```python
last_end = 0
for mo in re.finditer(tok_regex, text):
    if mo.start() != last_end:
        raise SyntaxError(f"Unexpected character: {text[last_end:mo.start()]!r}")
```
Are the exact errors the compiler should throw formalized in the Express spec? I think that would be worth doing, along with conformance tests for them, so that any error-correction infrastructure and prompts we build and test are portable across platforms.
No, but they should be. We already have platform-agnostic conformance tests, but they don't cover error cases. I'll tackle this in a later PR.
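A self-contained sketch of the `re.finditer` scanning technique in the excerpt above (the manual tokenizer that later commits replace with ANTLR), using hypothetical token rules. Two details matter for precise errors: a gap between consecutive matches means some characters matched no rule, and `finditer` never reports an unmatched run after the last match, so the end of the input needs its own check. Including the offset is one way to make errors specific enough to formalize in the spec.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical token rules; the real Express grammar differs.
_TOKEN_RULES = [
    ("STRING", r'"(?:\\[\s\S]|[^"\\])*"'),
    ("NUMBER", r"-?\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNCT", r"[()\[\]{},=:]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t\r]+"),
]
_TOK_REGEX = "|".join(f"(?P<{name}>{pattern})" for name, pattern in _TOKEN_RULES)


class Token(NamedTuple):
    kind: str
    value: str
    offset: int


def tokenize(text: str) -> Iterator[Token]:
    last_end = 0
    for mo in re.finditer(_TOK_REGEX, text):
        if mo.start() != last_end:
            # Characters between two matches belong to no rule.
            raise SyntaxError(
                f"Unexpected character at offset {last_end}: {text[last_end:mo.start()]!r}"
            )
        last_end = mo.end()
        if mo.lastgroup != "SKIP":
            yield Token(mo.lastgroup, mo.group(), mo.start())
    if last_end != len(text):
        # finditer never reports a trailing unmatched run, so check it explicitly.
        raise SyntaxError(f"Unexpected character at offset {last_end}: {text[last_end:]!r}")
```

Since `tokenize` is a generator, errors surface only as tokens are consumed, e.g. via `list(tokenize(src))`.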
```diff
 "description": "Unified catalog of basic A2UI components and functions.",
 "catalogId": "https://a2ui.org/specification/v1_0/catalogs/basic/catalog.json",
-"instructions": "For layout, use the Row and Column components to organize other components.",
+"instructions": "For layout, use the Row and Column components to organize other components.\n\n## Catalog Guidelines\n\n1. String Concatenation & Formatting: A2UI does not support binary operators like '+' or formatting symbols. To concatenate strings or dynamically inject data bindings into text, you must use the catalog function `formatString(value)` where the value string contains placeholders formatted as `${expression}`:\n formatString(\"Hello ${/user/name}\")\n\n2. Strict Hierarchy: You must strictly adhere to the requested component nesting and hierarchy. If the prompt specifies that a component is 'inside' or 'contained in' another component, you MUST place it as a child of that specific component, not as a sibling or in a different container.\n\n## Examples\n\nExample 1: Dynamic text form\n```json\n[\n {\n \"version\": \"v1.0\",\n \"createSurface\": {\n \"surfaceId\": \"main\",\n \"components\": [\n {\n \"id\": \"root\",\n \"component\": \"Column\",\n \"children\": [\"repField\", \"valueField\"]\n },\n {\n \"id\": \"repField\",\n \"component\": \"TextField\",\n \"label\": \"Representative\",\n \"value\": {\"path\": \"/form/rep\"},\n \"placeholder\": \"Enter name\"\n },\n {\n \"id\": \"valueField\",\n \"component\": \"TextField\",\n \"label\": \"Deal Value\",\n \"value\": {\"path\": \"/form/value\"},\n \"placeholder\": \"0.00\",\n \"variant\": \"number\",\n \"checks\": [\n {\"call\": \"required\"}\n ]\n }\n ],\n \"dataModel\": {\n \"form\": {\n \"rep\": \"John Doe\",\n \"value\": 1500.00\n }\n }\n }\n }\n]\n```\n\nExample 2: Dynamic list with templates\n```json\n[\n {\n \"version\": \"v1.0\",\n \"createSurface\": {\n \"surfaceId\": \"main\",\n \"components\": [\n {\n \"id\": \"root\",\n \"component\": \"Card\",\n \"child\": \"breedList\"\n },\n {\n \"id\": \"breedList\",\n \"component\": \"List\",\n \"children\": {\n \"path\": \"/breeds\",\n \"componentId\": \"breedTemplate\"\n },\n \"direction\": \"horizontal\"\n },\n {\n \"id\": \"breedTemplate\",\n \"component\": \"Image\",\n \"url\": {\"path\": \"url\"}\n }\n ],\n \"dataModel\": {\n \"breeds\": [\n {\n \"url\": \"https://example.com/poodle.jpg\"\n },\n {\n \"url\": \"https://example.com/lab.jpg\"\n }\n ]\n }\n }\n }\n]\n```",
```
I wonder if the example here is actually somewhat harmful: because prompt generators for alternative inference formats inline it into their prompts, it could confuse the LLM by being in the vanilla A2UI protocol format.
Happy to fix that in v1.1 or something though!
Hmm. Yeah, that's a problem. Right now, I check for Markdown code blocks and decompile them if they are JSON, but that's fragile. I think we talked about having an "examples" field in addition to the "instructions" field, and that sounds like the way to go. Then the format-agnostic but catalog-related prompt instructions can go here, while examples go in the examples field, where they can be decompiled.
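A rough sketch of the code-block approach described in this reply, assuming a `decompile` callable that turns a parsed JSON payload into Express text (a stand-in; the `ExpressDecompiler` call signature is not shown here) and a hypothetical `a2ui` fence tag. It also shows where the fragility comes from: the rewrite only works if authors tag blocks as `json` and write valid JSON, which a dedicated `examples` field would make explicit.

```python
import json
import re
from typing import Any, Callable

_FENCE = "`" * 3  # a Markdown code fence, built here so this listing stays readable
# A JSON-tagged fenced block inside the catalog's "instructions" text.
_JSON_BLOCK = re.compile(_FENCE + r"json\n(.*?)\n" + _FENCE, re.DOTALL)


def rewrite_json_examples(
    instructions: str,
    decompile: Callable[[Any], str],
    lang: str = "a2ui",
) -> str:
    """Replace each JSON example block with its decompiled form."""

    def _replace(mo: re.Match) -> str:
        try:
            payload = json.loads(mo.group(1))
        except json.JSONDecodeError:
            # Not parseable JSON: leave the block untouched. Heuristics like
            # this are what make the code-block approach fragile.
            return mo.group(0)
        return f"{_FENCE}{lang}\n{decompile(payload)}\n{_FENCE}"

    return _JSON_BLOCK.sub(_replace, instructions)
```

With a separate `examples` field, the prompt generator could decompile each entry directly instead of pattern-matching prose.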
```diff
 "id": "root",
 "component": "Card",
-"child": "main-column"
+"child": "main_column"
```
Not sure I follow: are these changes related to the PR?
Sort of. I can factor it out if you want. I defined identifiers more strictly, so that "-" is no longer a valid identifier character. If we ever want to support expressions (we don't yet, but we might), allowing a minus sign in identifiers seems like a bad idea, because something like `a-b` would be ambiguous between an identifier and a subtraction.
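A small illustration of that ambiguity using two hypothetical identifier rules: when `-` is allowed inside identifiers, `a-b` lexes as a single identifier, so a future subtraction could not be written without spaces. Under the stricter rule, it lexes as `a`, `-`, `b`. The regexes are assumptions, not the rules in `Express.g4`.

```python
import re

# Hypothetical identifier rules, before and after the change.
LOOSE_IDENT = r"[A-Za-z_][A-Za-z0-9_-]*"
STRICT_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"


def is_valid_id(name: str) -> bool:
    """Check a component ID against the stricter rule."""
    return re.fullmatch(STRICT_IDENT, name) is not None


def lex(text: str, ident: str) -> list[str]:
    """Split text into identifier and minus-sign tokens using the given rule."""
    pattern = re.compile(f"(?P<IDENT>{ident})|(?P<MINUS>-)")
    return [f"{m.lastgroup}:{m.group()}" for m in pattern.finditer(text)]
```

For example, `lex("a-b", LOOSE_IDENT)` yields one `IDENT` token, while `lex("a-b", STRICT_IDENT)` yields `IDENT`, `MINUS`, `IDENT`; this is also why IDs such as `main-column` become `main_column`.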
…ivate

Refactors `compiler.py` to add leading underscores to module-level helper functions, the lexical scanner rules, the tokenizer function, and the `TokenParser` class. This marks them as private so they are not part of the public API, keeping the A2UI Express public interface clean and minimal.
…uild hook
Migrates the legacy recursive-descent parser and manual tokenizer in the
Express compiler to a formal ANTLR4-based parsing pipeline.
* Created `Express.g4` grammar defining the full DSL syntax, implementing strict
trailing comma checks, semicolon skipping, and support for C++-style block comments.
* Implemented `ExpressAstVisitor` and `ExpressErrorListener` inside a dedicated
`visitor.py` module to cleanly construct AST nodes and handle lexer/parser syntax errors.
* Automated parser compilation inside `pack_specs_hook.py` by strictly executing
`antlr4` from the active Python virtual environment (`sys.prefix`), raising a hard
`RuntimeError` on failure to guarantee build determinism.
* Organized the generated files into a dedicated `generated/` sub-package and renamed
modules to PEP 8 snake_case (`express_lexer.py`, `express_parser.py`, `express_visitor.py`)
using a safe case-only rename pattern.
* Updated `compiler.py` and `visitor.py` imports, and added comprehensive test coverage
verifying C++-style block comment skipping.
…e package init

- Exclude `**/generated/**` directories from license checks both locally and in CI.
- Update the custom `pack_specs_hook.py` to automatically generate `__init__.py` inside the ANTLR output directory if it is missing, making the generated package fully self-building and safe to delete and regenerate.
Add `/generated/` to pyink's `extend-exclude` configuration in `pyproject.toml` so that auto-generated ANTLR files do not trigger style check failures.
Run pyink inside the custom pack_specs_hook.py build hook to automatically format newly generated parser files. This prevents unformatted generated files from dirtying the git working directory after builds and tests.
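A sketch of the generation steps these commits describe: run `antlr4` from the active environment, fail hard on errors, add a package `__init__.py`, then format the output with `pyink`. It assumes both tools are installed as console scripts under `sys.prefix` (`bin/`, or `Scripts\` on Windows). The helper names are hypothetical, and this is not the code in `pack_specs_hook.py`; the snake_case renames and relative-import rewriting are omitted. `-Dlanguage=Python3`, `-visitor`, `-no-listener`, `-Xexact-output-dir` and `-o` are standard ANTLR 4 tool options, but whether the hook passes exactly these is an assumption.

```python
import os
import subprocess
import sys
from pathlib import Path


def venv_tool(name: str) -> Path:
    """Path of a console script installed in the active environment."""
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    suffix = ".exe" if os.name == "nt" else ""
    return Path(sys.prefix) / scripts_dir / f"{name}{suffix}"


def antlr_command(grammar: Path, out_dir: Path) -> list[str]:
    """Build an antlr4 invocation producing a Python 3 lexer, parser and visitor."""
    return [
        str(venv_tool("antlr4")),
        "-Dlanguage=Python3",
        "-visitor",
        "-no-listener",
        "-Xexact-output-dir",  # write into out_dir even if the grammar path has directories
        "-o",
        str(out_dir),
        str(grammar),
    ]


def run_or_fail(cmd: list[str]) -> None:
    """Run one build step; any failure becomes a hard RuntimeError."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:  # e.g. the tool is missing from this environment
        raise RuntimeError(f"could not run {cmd[0]}: {exc}") from exc
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with {result.returncode}:\n{result.stderr}")


def ensure_package_init(out_dir: Path) -> None:
    """Create __init__.py so the generated directory is importable as a package."""
    init_file = out_dir / "__init__.py"
    if not init_file.exists():
        init_file.write_text("", encoding="utf-8")


def generate_parser(grammar: Path, out_dir: Path) -> None:
    """Run ANTLR, add the package init, then format the output with pyink."""
    out_dir.mkdir(parents=True, exist_ok=True)
    run_or_fail(antlr_command(grammar, out_dir))
    ensure_package_init(out_dir)
    run_or_fail([str(venv_tool("pyink")), str(out_dir)])
```

Resolving tools from `sys.prefix`, rather than from whatever `antlr4` appears first on `PATH`, ties the build to the environment that Hatch set up.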
I have a moderate preference for deleting all these example files but keeping the script to generate them as necessary, e.g. for integration tests. Perhaps just keep 1 or 2 of these as a kind of documentation, so people can see what the format looks like at a glance?
Motivation: It would be great if people can add additional samples to the specification and not need to run scripts to keep all the other variants in sync. If we force every sample app and tool which wants to use the full set of samples to derive them from a single source of truth during the build process, then we ensure everything always uses the full set of samples.
I agree, and I'd like to do that, but let's handle it in another PR as well.
9429581 to 146ae6e
…tion suite (#1740)

## Summary

This change integrates the A2UI Express compiler, decompiler, and prompt strategies directly into the Inspect-ai evaluation suite. It introduces the `express` evaluation strategy, adds a new A2UI v1.0 prompt evaluation dataset, and updates the tasks, solvers, scorers, and CI runners to support schema-validated A2UI v1.0 and A2UI Express evaluations.

This is a companion PR to the recently merged A2UI Express compiler implementation (PR #1726). It enables automated evaluation of how well LLMs generate A2UI Express DSL layouts, which are then compiled back to validated standard JSON.

## Changes

* **Evaluation Strategies** (`eval/a2ui_eval/strategies/`):
  * Implemented the `express` strategy in a new module, `eval/a2ui_eval/strategies/express.py`. This strategy:
    * Generates system prompt instructions dynamically using `ExpressPromptGenerator` based on the active catalog schema.
    * Invokes the model to generate layout designs in A2UI Express DSL.
    * Implements `compile_express_dsl` to extract the generated Express DSL, compile it into standard A2UI v1.0 JSON, and perform schema validation using `A2uiSchemaManager`.
  * Updated `eval/a2ui_eval/strategies/__init__.py` to register the new `express` strategy.
* **Tasks & Dataset** (`eval/`):
  * Added `eval/datasets/v1_0_prompts.yaml`, a new evaluation dataset of prompts and target layouts designed for A2UI v1.0 and Express DSL testing.
  * Renamed the core evaluation task from `a2ui_v0_9_eval` to `a2ui_v0_9_1_eval` in `eval/tasks.py`.
  * Upgraded `eval/tasks.py` to dynamically load the appropriate dataset and catalog schema (v0.9 vs. v1.0/Express) depending on the selected strategy.
  * Updated the default grading model in `eval/tasks.py` to `google/gemini-3.5-flash`.
* **Scorers & Dataset Loader** (`eval/a2ui_eval/`):
  * Upgraded `a2ui_scorer` in `eval/a2ui_eval/scorers.py` to support parameterizable protocol versions (`0.9` or `1.0`), ensuring strict validation checks are performed against the correct schema.
  * Updated `eval/a2ui_eval/dataset.py` to allow passing a `default_catalog_path`.
* **CI Runners & Reporting**:
  * Updated `eval/bin/run_ci_evals.py` and `eval/bin/report_evals.py` to run and report on the new `express` strategy alongside existing strategies.
  * Configured `eval/main.py` to expose the `express` strategy option.
* **Testing**:
  * Updated `eval/tests/test_strategies.py` to verify the correctness of the new `express` solver pipeline and compilation chain.
  * Updated `eval/tests/test_run_ci_evals.py` to test the integration in the CI execution pipeline.

## Impact & Risks

* **No Production Risk**: All changes are isolated inside the `eval/` directory and do not affect any SDK runtime paths.
* **Gated Execution**: The Express solver sets `A2UI_EXPRESS_ENABLED=true` internally to compile outputs, which is safe and self-contained.
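A sketch of how an eval solver might open the gate only while compiling and then restore the caller's environment. The helper names are hypothetical, and treating only the value `true` (case-insensitive) as enabled is an assumption. Because `os.environ` is process-wide, every thread in the process sees the flag while the block is active, so "self-contained" holds at the process level rather than per task.

```python
import os
from contextlib import contextmanager
from typing import Iterator

FLAG = "A2UI_EXPRESS_ENABLED"


def express_enabled() -> bool:
    """Assumption: only the value "true" (case-insensitive) turns the gate on."""
    return os.environ.get(FLAG, "").strip().lower() == "true"


@contextmanager
def express_gate_open() -> Iterator[None]:
    """Set the gate for the duration of a block, then restore the prior state."""
    previous = os.environ.get(FLAG)
    os.environ[FLAG] = "true"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(FLAG, None)
        else:
            os.environ[FLAG] = previous
```

A solver would wrap only its compile step, e.g. `with express_gate_open(): payload = compile_fn(dsl)`, where `compile_fn` stands in for the real compiler call.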
## Testing

* Local unit tests in the `eval/` suite can be executed using `pytest`:

  ```bash
  uv run pytest eval/tests/
  ```

  Both `eval/tests/test_strategies.py` and `eval/tests/test_run_ci_evals.py` have been verified.
## Summary
This PR implements the A2UI Express technical specification, introducing A2UI Express, a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It provides a complete end-to-end Python implementation, including an ANTLR-based compiler, a decompiler, a schema-based system prompt generator, helper scripts, and comprehensive test suites.
This is a refined, standalone extraction of the A2UI Express compiler/decompiler portions originally proposed in PR #1678, incorporating automated parser generation, strict validation, and extensive test coverage.
## Changes
* `express_lexer.py`, `express_parser.py`, `express_visitor.py`), relative import post-processing, and automatic formatting of generated code with `pyink`.
* `antlr4-python3-runtime` as a runtime dependency, and `antlr4-tools` in the build system requirements.
* `a2ui.experimental.express`):
* `Express.g4` to parse line-oriented declarative layout files.
* `ExpressCompiler` compiles the AST directly into standard A2UI v1.0 JSON payloads (with dynamic positional parameter resolution and variable flattening).
* `r"..."`), raw multiline strings (`r"""..."""`), and escaped carriage returns.
* `ValueError` on mismatch rather than silently ignoring invalid values.
* `ExpressDecompiler` to convert standard A2UI v1.0 JSON payloads back into the highly compact, line-oriented Express DSL.
* `CatalogSchemaHelper` to parse catalog schema definitions.
* `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used to prompt generative models.
* `run_inference.py` to evaluate the A2UI Express prompt contract by converting JSON examples to Express DSL via Gemini/Ollama/MLX models and validating the round-trip compilation.
* `recreate_dsl_examples.py` to programmatically regenerate the dynamic markdown documentation.
* `specification/proposals/express/examples/*.a2ui` (36 files) along with their corresponding compiled JSON targets.

## Impact & Risks
* `a2ui.experimental.express` namespace, and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable.
* `antlr4` (via `antlr4-tools` and `antlr4-python3-runtime`) during development/builds, which is automatically resolved by Hatch and standard pip/uv environments.

## Testing
* `agent_sdks/python/a2ui_agent/tests/express/` including:
* `uv run pytest`).