
Wasmtime debugging v3: "debug main" environment. #45

Open
cfallin wants to merge 2 commits into bytecodealliance:main from cfallin:wasmtime-debugging-v3

cfallin (Member) commented Feb 21, 2026

cfallin added a commit to cfallin/wasmtime that referenced this pull request Feb 23, 2026
…ifact.

This PR adds logic to embed the original core Wasm module(s) from a
compilation into a new ELF section, alongside other metadata
sections. When a component is compiled, the core Wasms inside are
preserved, accessible by their `StaticModuleIndex`es.
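A minimal sketch of the kind of layout this implies: each core module's original bytes are concatenated into one section payload, with a side table mapping a module index to its byte range. `StaticModuleIndex` here is a local stand-in, and the layout is an assumption for illustration. The PR text does not describe Wasmtime's actual section format.

```rust
/// Hypothetical stand-in for Wasmtime's `StaticModuleIndex`; the real type,
/// and the real section layout, may well differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StaticModuleIndex(pub u32);

/// Sketch of an "original bytecode" section: all core modules' bytes
/// concatenated, plus an (offset, length) table indexed by module index.
#[derive(Default)]
pub struct BytecodeSection {
    data: Vec<u8>,
    ranges: Vec<(usize, usize)>,
}

impl BytecodeSection {
    /// Appends one core module's bytes and returns the index assigned to it.
    pub fn push(&mut self, wasm: &[u8]) -> StaticModuleIndex {
        let index = StaticModuleIndex(self.ranges.len() as u32);
        self.ranges.push((self.data.len(), wasm.len()));
        self.data.extend_from_slice(wasm);
        index
    }

    /// Returns the original bytes of the module at `index`, if any.
    pub fn get(&self, index: StaticModuleIndex) -> Option<&[u8]> {
        let &(start, len) = self.ranges.get(index.0 as usize)?;
        self.data.get(start..start + len)
    }

    /// Concatenated payload that would be written into the ELF section.
    pub fn payload(&self) -> &[u8] {
        &self.data
    }
}

fn main() {
    let mut section = BytecodeSection::default();
    // Stand-in byte strings; real entries would be complete core Wasm modules.
    let a = section.push(b"module-a");
    let b = section.push(b"module-b-longer");
    assert_eq!(section.get(a), Some(&b"module-a"[..]));
    assert_eq!(section.get(b), Some(&b"module-b-longer"[..]));
    assert_eq!(section.get(StaticModuleIndex(2)), None);
    println!("payload is {} bytes", section.payload().len()); // prints "payload is 23 bytes"
}
```

Keeping per-module ranges means a lookup by index never requires re-parsing the component, which is the property the paragraph above relies on.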

The need for this support arises from the guest-debugger
ecosystem. Consider either a debug
component (bytecodealliance/rfcs#45) or a bespoke debugger in native
code using Wasmtime's APIs. In either case, the existing APIs for
introspecting execution state provide a `Module` reference for the
instance in each stack frame, and breakpoints are configured as PC
offsets into these `Module`s. The debugger will somehow need to
associate these `Module`s with the original Wasm bytecode, including
e.g. any custom sections containing producer-specific encodings of
debug metadata, to do something useful. In particular, also note that
the GDB-stub protocol as extended for Wasm requires direct read access
to the Wasm bytecode (it shows up as part of a "memory map" that is
viewed by the standard read-remote-memory command); we can't delegate
this requirement to the remote end of the stub connection, but have to
handle it in the stub server that runs inside Wasmtime (whether as a
component or as bespoke native code).
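To illustrate the GDB-stub requirement, here is a hypothetical sketch of a stub serving a read-memory request out of embedded bytecode. The address encoding (module index in the upper 32 bits, bytecode offset in the lower 32) is an assumption made for this example only, not the encoding that LLDB's Wasm extensions or Wasmtime actually use. The hex-string reply mirrors the shape of a GDB `m addr,length` response, where each byte is sent as two hex digits.

```rust
/// Hypothetical address encoding: module index in the upper 32 bits,
/// byte offset into that module's bytecode in the lower 32 bits.
fn encode_code_addr(module: u32, offset: u32) -> u64 {
    (u64::from(module) << 32) | u64::from(offset)
}

/// Serves a read of `len` bytes at `addr` from the embedded bytecode of
/// each module, returning lowercase hex like a GDB `m` reply.
/// Returns None for an unknown module or an out-of-range read.
fn read_code_memory(modules: &[Vec<u8>], addr: u64, len: usize) -> Option<String> {
    let module = (addr >> 32) as usize;
    let offset = (addr & 0xffff_ffff) as usize;
    let bytes = modules.get(module)?;
    let end = offset.checked_add(len)?;
    let chunk = bytes.get(offset..end)?;
    Some(chunk.iter().map(|b| format!("{b:02x}")).collect())
}

fn main() {
    // Module 0 holds a bare Wasm header; module 1 holds two arbitrary bytes.
    let modules = vec![b"\0asm\x01\0\0\0".to_vec(), vec![0xde, 0xad]];
    println!("{:?}", read_code_memory(&modules, encode_code_addr(0, 0), 4)); // Some("0061736d")
    println!("{:?}", read_code_memory(&modules, encode_code_addr(1, 1), 2)); // None
}
```

The point of the sketch is that the stub needs the bytes in-process; nothing on the remote end of the connection can satisfy such a read.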

We have two main choices: carry the original bytecode all the way
through the Wasmtime compilation pipeline and present it via
`Module::bytecode()`, ready to use; or say that this task is
out-of-scope and that the debugger top-half can find it on disk
somehow.
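With the bytecode in hand, a debugger top-half can locate the producer-specific custom sections mentioned above directly. The sketch below walks the standard core Wasm binary layout: the magic `\0asm` and version 1, then sections of an `id` byte, a LEB128 size, and the contents, where custom sections have id 0 and begin with a LEB128-prefixed name. The function names are illustrative; it assumes the bytes came from something like the `Module::bytecode()` accessor described above, and a real tool would more likely use a parser crate such as `wasmparser`.

```rust
/// Reads an unsigned LEB128-encoded u32 at `*pos`, advancing `*pos`.
fn read_u32_leb128(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None; // more than 5 bytes: not a valid u32 encoding
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Lists (name, payload) for every custom section (id 0) in a core module.
fn custom_sections(wasm: &[u8]) -> Option<Vec<(String, &[u8])>> {
    // Magic number "\0asm" followed by version 1 as a little-endian u32.
    if wasm.get(0..8)? != &b"\0asm\x01\0\0\0"[..] {
        return None;
    }
    let mut pos = 8;
    let mut out = Vec::new();
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_u32_leb128(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let content = wasm.get(pos..end)?;
        if id == 0 {
            // Custom section: LEB128 name length, UTF-8 name, then payload.
            let mut p = 0;
            let name_len = read_u32_leb128(content, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            let name = std::str::from_utf8(content.get(p..name_end)?).ok()?;
            out.push((name.to_string(), &content[name_end..]));
        }
        pos = end;
    }
    Some(out)
}

fn main() {
    // Toy module: header, an empty type section (id 1), then a custom
    // section (id 0) named "dbg" with a 3-byte payload.
    let mut wasm = b"\0asm\x01\0\0\0".to_vec();
    wasm.extend_from_slice(&[1, 1, 0]);
    wasm.extend_from_slice(&[0, 7, 3, b'd', b'b', b'g', 1, 2, 3]);
    for (name, payload) in custom_sections(&wasm).expect("malformed module") {
        println!("custom section {name:?}: {} bytes", payload.len());
    }
}
```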

Unfortunately the latter ("out of scope, find the file") is somewhat
at odds with the desired developer experience:

- It means that we need some way of mapping a compiled Wasm artifact
  back to a source Wasm; absent "here's the full bytecode", that means
  "here's the path to the full bytecode", but that path is an
  identifier that may not be universally accessible (consider
  e.g. capabilities/preopens present for a debugger component) or
  portable (consider e.g. moving the artifact to a different machine).

  - Or we don't even provide that metadata, and require the user to
    explicitly specify the same module filename twice -- once to
    actually run it, and once as an argument to the debugger.

- It means that we should account for stale artifacts and flag the
  mismatch somehow; e.g. if the user starts debugging with Wasmtime,
  either from a `.cwasm` on disk or with one produced in-memory just
  for this run, and then rebuilds their source `.wasm`, the file on
  disk no longer matches the code being debugged. (The same problem
  exists one level up if the source code is edited, but the input to a
  Wasm-producing toolchain is definitely out of scope for Wasmtime.)

- It means that special logic is required in the case of components to
  map a module back to a specific component section (we would
  essentially have to expose the static module IDs, then require the
  debugger top-half to re-implement our exact flattening algorithm to
  find that core module).
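To make the stale-artifact point concrete: under the "find the file" alternative, the artifact would have to record some fingerprint of its input so that a debugger could notice a rebuilt `.wasm`. A minimal hypothetical sketch of that check follows. FNV-1a is used only because it fits in a few lines (a real design would more plausibly use a cryptographic hash), and none of these names come from Wasmtime.

```rust
/// 64-bit FNV-1a: a simple, non-cryptographic hash, used here only as a
/// stand-in fingerprint.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

#[derive(Debug, PartialEq)]
enum SourceCheck {
    Match,
    Stale,
}

/// Hypothetical check a debugger would need if it located the `.wasm` on
/// disk instead of reading bytes embedded in the compiled artifact.
fn check_on_disk(recorded: u64, on_disk: &[u8]) -> SourceCheck {
    if fnv1a64(on_disk) == recorded {
        SourceCheck::Match
    } else {
        SourceCheck::Stale
    }
}

fn main() {
    let compiled_from: &[u8] = b"original module bytes";
    let recorded = fnv1a64(compiled_from); // stored at compile time
    let rebuilt: &[u8] = b"rebuilt module bytes";
    println!("{:?}", check_on_disk(recorded, compiled_from)); // Match
    println!("{:?}", check_on_disk(recorded, rebuilt)); // Stale
}
```

Note that a fingerprint can only detect the mismatch; it cannot recover the original bytes, which is one reason embedding the bytecode sidesteps the problem entirely.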

The permissions issue alone was enough to convince me that we should
do something better than providing a filename (why should we have to
authorize the adapter to read the user's filesystem?), but all of the
other benefits -- ensuring an exact match and ensuring perfect
availability -- are a nice bonus. The main downside is that the
`.cwasm` becomes larger (possibly substantially so), but this overhead
is present only when guest debugging is enabled, the data has to be
present anyway, and this is likely not a dealbreaker.
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this pull request Feb 24, 2026
…ifact. (#12636)

* Debugging: preserve original Wasm bytecode inside of compiled ELF artifact.

* miri ignore tests with compilation

* Review feedback.
alexcrichton (Member) left a comment

I'm likely a bit biased, having helped review much of the work so far, but nonetheless I personally feel that this strikes a good balance between flexibility and usability, and that this is a good way to move forward. I'm hoping this can help pave the way (ish) for more and more parts of Wasmtime to be self-hosted wasm rather than built into the engine/API/etc.

We've still got a lot of details to figure out and bikeshed here, e.g. WITs, where the debugger wasms come from, CLI arguments, etc., but I'm confident we can find a good fit for all of these (if not just take all the existing prototyping as-is). Overall I believe that expose-the-debugger-as-WIT is the right way forward at this time, and then wasi:cli/run on top provides a usable way to consume that.

fitzgen (Member) left a comment

FWIW, I don't really think this is an "alternative" or "compromise" to the original debug adapter protocol; it is an incremental step and factoring of layers. You can use "debug main" to implement the original / long-term debug adapter components:

  • component $main imports the Wasm-level debugging interface from Wasmtime
  • $main wraps an inner $debug-adapter component that imports the Wasm-level debugging interface and exports a new source-level debugging interface
  • $main implements a DAP server and translates incoming DAP source-level queries into invocations on $debug-adapter's exported source-level interface, which the $debug-adapter translates to invocations on the Wasm-level interface via DWARF or whatever other debug info it has, etc...
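The layering in the list above can be sketched as two Rust traits plus an adapter between them. Everything here is hypothetical: the trait and method names, the `(module, pc)` breakpoint shape, and the `HashMap` line table standing in for DWARF are assumptions for illustration, not the actual WIT interfaces (which are still to be settled).

```rust
use std::collections::HashMap;

/// Hypothetical Wasm-level interface imported from the host (Wasmtime):
/// breakpoints are (module index, bytecode PC offset) pairs.
trait WasmDebug {
    fn set_breakpoint(&mut self, module: u32, pc: u32);
}

/// Hypothetical source-level interface exported by an inner $debug-adapter.
trait SourceDebug {
    fn set_source_breakpoint(&mut self, file: &str, line: u32) -> Result<(), String>;
}

/// Adapter layer: maps (file, line) to (module, pc) using a line table that
/// stands in for DWARF or other producer debug info.
struct DebugAdapter<W: WasmDebug> {
    wasm: W,
    line_table: HashMap<(String, u32), (u32, u32)>,
}

impl<W: WasmDebug> SourceDebug for DebugAdapter<W> {
    fn set_source_breakpoint(&mut self, file: &str, line: u32) -> Result<(), String> {
        let &(module, pc) = self
            .line_table
            .get(&(file.to_string(), line))
            .ok_or_else(|| format!("no code for {file}:{line}"))?;
        self.wasm.set_breakpoint(module, pc);
        Ok(())
    }
}

/// Recording stand-in for the host side, for demonstration only.
#[derive(Default)]
struct FakeHost {
    breakpoints: Vec<(u32, u32)>,
}

impl WasmDebug for FakeHost {
    fn set_breakpoint(&mut self, module: u32, pc: u32) {
        self.breakpoints.push((module, pc));
    }
}

fn main() {
    let mut line_table = HashMap::new();
    line_table.insert(("main.rs".to_string(), 10), (0, 0x42));
    let mut adapter = DebugAdapter { wasm: FakeHost::default(), line_table };
    // $main would make this call when a DAP `setBreakpoints` request arrives.
    adapter.set_source_breakpoint("main.rs", 10).unwrap();
    assert!(adapter.set_source_breakpoint("main.rs", 99).is_err());
    println!("{:?}", adapter.wasm.breakpoints); // [(0, 66)]
}
```

Because `DebugAdapter` is generic over `WasmDebug`, the same source-level layer could sit on top of any host that speaks the Wasm-level interface, which is the factoring the list describes.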

So all that is to say that I think this is great! Love when we can path-find near-term wins that also act as incremental milestones towards the long-term end goal. Ship it!

cfallin (Member, Author) commented Feb 26, 2026

Since we now have signoffs, and per the RFC process ("Once any stakeholder from a different group has signed off, the RFC will move into a 10 calendar day final comment period (FCP)"), we are moving into the Final Comment Period. If no objections or other issues are raised, this RFC should merge on 2026-03-08.

tschneidereit (Member) left a comment

I agree with Nick: this seems like an excellent step on the path to the original plan. What's more, I think it provides additional value in itself, so I think it's a really good addition!

JDevlieghere commented Feb 27, 2026

I read through the RFC and wanted to express my excitement and support for the proposed approach. I think decoupling the two levels makes a lot of sense. As the person who's been working on this in LLDB, I'm obviously excited about the possibility of debugging Wasmtime over the GDB remote protocol.

As I've said in my FOSDEM talk, I am convinced that the GDB remote approach is the way to go for compiled languages like C, C++, Swift & Rust. However, I also agree that it would be a mistake for interpreted languages, where the Debug Adapter Protocol is a much better fit.

"[...] it does not preclude building the DAP-based adapter ecosystem on top of the new world."

That's right, and that was the point I was trying to make in this discussion with @alexcrichton and @fitzgen. The RFC doesn't go as far as calling it out explicitly, but with the gdbserver + LLDB approach, you can leverage LLDB to do the heavy lifting and then use lldb-dap to provide a DAP interface, like you would for interpreted languages.
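For readers unfamiliar with the wire format on the gdbserver side: GDB remote serial protocol packets are framed as `$<data>#<checksum>`, where the checksum is the modulo-256 sum of the data bytes written as two hex digits. A minimal sketch of that framing follows; the function names are illustrative and not taken from Wasmtime or LLDB, and escaping and run-length encoding are omitted.

```rust
/// Frames a GDB remote serial protocol packet as `$<data>#<checksum>`,
/// where the checksum is the data bytes summed modulo 256, in lowercase hex.
fn frame_packet(data: &str) -> String {
    let sum = data.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    format!("${data}#{sum:02x}")
}

/// Parses a framed packet and verifies its checksum, returning the payload.
fn parse_packet(packet: &str) -> Option<&str> {
    let body = packet.strip_prefix('$')?;
    let (data, checksum) = body.rsplit_once('#')?;
    let expected = u8::from_str_radix(checksum, 16).ok()?;
    let sum = data.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    (sum == expected).then_some(data)
}

fn main() {
    // A memory-read request for 4 bytes at address 0.
    println!("{}", frame_packet("m0,4")); // $m0,4#fd
    // The standard "OK" reply, with its checksum verified.
    println!("{:?}", parse_packet("$OK#9a")); // Some("OK")
}
```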

bnjbvr (Member) left a comment

Looks great!


6 participants