Question: should MCU-scale language runtimes have a Tiny benchmark category? #181

@Alpha-Guardian

Hi MLCommons Tiny folks,

I wanted to share a small but unusual MCU language-runtime experiment and ask whether systems like this suggest a benchmark gap in the current Tiny landscape.

We built a public demo line called Engram and deployed it on a commodity ESP32-C3.

Current public numbers:

  • Host-side benchmark capability

    • LogiQA = 0.392523
    • IFEval = 0.780037
  • Published board proof

    • LogiQA 642 = 249 / 642 = 0.3878504672897196
    • host_full_match = 642 / 642
    • runtime artifact size = 1,380,771 bytes
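For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the board accuracy from the published counts and relates the artifact size to a flash budget. The 4 MiB flash figure is an assumption for illustration only (it is not stated above), and the helper names are hypothetical.

```python
# Published figures from this issue.
BOARD_CORRECT = 249
BOARD_TOTAL = 642
ARTIFACT_BYTES = 1_380_771

# ASSUMPTION: a 4 MiB flash part; the actual board configuration is not stated here.
ASSUMED_FLASH_BYTES = 4 * 1024 * 1024


def accuracy(correct: int, total: int) -> float:
    """Return the fraction of correctly answered items."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def flash_fraction(artifact_bytes: int, flash_bytes: int) -> float:
    """Return the share of the flash budget taken by the runtime artifact."""
    if flash_bytes <= 0:
        raise ValueError("flash_bytes must be positive")
    return artifact_bytes / flash_bytes


board_accuracy = accuracy(BOARD_CORRECT, BOARD_TOTAL)
artifact_share = flash_fraction(ARTIFACT_BYTES, ASSUMED_FLASH_BYTES)

print(f"board LogiQA accuracy: {board_accuracy:.6f}")
print(f"artifact share of assumed flash: {artifact_share:.1%}")
```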

Important scope note:

This is not presented as unrestricted open-input native LLM generation on MCU.

The board-side path is closer to a flash-resident, table-driven runtime with:

  • packed token weights
  • hashed lookup structures
  • fixed compiled probe batches
  • streaming fold / checksum style execution over precompiled structures
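To make the four ingredients above concrete, here is a toy host-side sketch in Python. It is not Engram's implementation: the slot layout, the FNV-1a hash, the linear probing, the argmax scoring rule, and the CRC-32 checksum are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import struct
import zlib

# Hypothetical slot layout: uint32 token hash, int8 weight, uint8 "used" flag.
SLOT = struct.Struct("<IbB")


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash, used here only as an example hash function."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def build_table(weights: dict[str, int], slots: int) -> bytes:
    """Pack token weights into a fixed-size open-addressing table (host side)."""
    table = bytearray(slots * SLOT.size)
    for token, weight in weights.items():
        h = fnv1a_32(token.encode())
        i = h % slots
        for _ in range(slots):
            _, _, used = SLOT.unpack_from(table, i * SLOT.size)
            if not used:
                SLOT.pack_into(table, i * SLOT.size, h, weight, 1)
                break
            i = (i + 1) % slots  # linear probing
        else:
            raise ValueError("table is full")
    return bytes(table)


def lookup(table: bytes, token: str) -> int:
    """Return the packed weight for a token, or 0 if it is not in the table.

    Only the 32-bit hash is stored, so two tokens with the same hash
    would be indistinguishable in this toy layout.
    """
    slots = len(table) // SLOT.size
    h = fnv1a_32(token.encode())
    i = h % slots
    for _ in range(slots):
        key, weight, used = SLOT.unpack_from(table, i * SLOT.size)
        if not used:
            return 0
        if key == h:
            return weight
        i = (i + 1) % slots
    return 0


def run_batch(table: bytes, probes: list[list[list[str]]]) -> tuple[list[int], int]:
    """Stream over a fixed probe batch, folding answers into a running CRC-32.

    Each probe is a list of answer options; each option is a list of tokens.
    The chosen answer is the option with the highest summed weight.
    """
    crc = 0
    answers = []
    for options in probes:
        scores = [sum(lookup(table, t) for t in option) for option in options]
        best = max(range(len(scores)), key=scores.__getitem__)
        answers.append(best)
        crc = zlib.crc32(bytes([best]), crc)
    return answers, crc


table = build_table({"yes": 5, "no": -3, "maybe": 1}, slots=8)
probes = [[["yes"], ["no"]], [["no", "no"], ["maybe"]]]
answers, checksum = run_batch(table, probes)
```

In this toy version the table is an immutable byte string, which stands in for a flash-resident structure, and the checksum gives a single value that a host could compare against its own run of the same batch.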

So this is not a standard vision/KWS/anomaly micro model. It is closer to a task-specialized language runtime whose behavior has been pushed into a very compact executable form.

Repo:
https://github.com/Alpha-Guardian/Engram

What I’m genuinely curious about is whether systems like this point to a missing benchmark category in the TinyML / MCU benchmark ecosystem.

Would something like the following make sense as a future benchmark direction?

  • constrained language-task execution
  • auditable board-measured language behavior
  • fixed-memory / fixed-artifact board deployment
  • explicit separation between host benchmark capability and board execution mode
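As a rough illustration of the last point, a benchmark report could keep board-measured accuracy, host agreement, and artifact size as separate fields. The sketch below is only one possible shape: the field names are hypothetical, and it assumes that a "full match" means the board's answer equals the host's answer on every item, which is my reading rather than a defined metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardReport:
    """Hypothetical report record; not an MLCommons Tiny format."""
    total: int
    board_correct: int   # board answers that equal the gold label
    host_match: int      # board answers that equal the host answer
    artifact_bytes: int

    @property
    def board_accuracy(self) -> float:
        return self.board_correct / self.total

    @property
    def host_full_match(self) -> bool:
        # ASSUMPTION: "full match" means agreement on every item.
        return self.host_match == self.total


def make_report(gold: list[int], host_pred: list[int],
                board_pred: list[int], artifact_bytes: int) -> BoardReport:
    """Score board output against gold labels and against host output separately."""
    if not (len(gold) == len(host_pred) == len(board_pred)):
        raise ValueError("gold, host_pred and board_pred must have equal length")
    if not gold:
        raise ValueError("at least one item is required")
    board_correct = sum(b == g for b, g in zip(board_pred, gold))
    host_match = sum(b == h for b, h in zip(board_pred, host_pred))
    return BoardReport(len(gold), board_correct, host_match, artifact_bytes)


# Toy data, not real benchmark output.
report = make_report(
    gold=[0, 1, 2, 1],
    host_pred=[0, 1, 1, 1],
    board_pred=[0, 1, 1, 1],
    artifact_bytes=1024,
)
```

Keeping the two comparisons apart means a board run can be audited for fidelity to the host pipeline independently of how well either scores on the task.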

If people here think this is out of scope for MLCommons Tiny, that would also be useful to know.
