Skip to content

[RFC][lldb][NVGPU] introducing "shadow functions" to cuda-lldb#94

Open
zhyty wants to merge 16 commits into
clayborg:meta-nvidiafrom
zhyty:shadow-functions-for-pr
Open

[RFC][lldb][NVGPU] introducing "shadow functions" to cuda-lldb#94
zhyty wants to merge 16 commits into
clayborg:meta-nvidiafrom
zhyty:shadow-functions-for-pr

Conversation

@zhyty

@zhyty zhyty commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

By default, disable breakpoint locations in host-side kernel wrapper functions on the CPU side once an associated NVIDIA GPU target exists. This is targeting CUDA programs where source-level breakpoints can otherwise resolve to the host launch wrapper instead of the actual device kernel.

These locations are not deleted. LLDB still creates them, but disables them by default so users can explicitly re-enable them if they really want to stop in the host launch path.

What are "shadow functions"?

When nvcc compiles CUDA source, a __global__ kernel typically has host-side launch machinery associated with it. In practice, source breakpoints may resolve to that host wrapper path because the host binary can contain symbol and line-table information for it.

Using shadow_functions.cu as an example:

  • my_kernel(int) is the host-visible wrapper entry point.
  • That wrapper quickly transitions into a __device_stub_ helper such as __device_stub__Z9my_kerneli(int), which performs the host-side CUDA launch boilerplate.
  • The actual GPU instructions for the kernel are not in the host .text for my_kernel(int) or __device_stub__...; they live in device code embedded in the binary.

From the user's point of view, my_kernel is "the kernel". From the host CPU symbol table's point of view, it is a wrapper around launch machinery. This PR treats those CPU-side wrapper locations as "shadow functions" and disables host breakpoint locations there when a GPU target is present.

How does this PR identify shadow functions?

The implementation no longer precomputes wrapper address ranges or maintains interval maps. Instead, it answers the question at breakpoint-location handling time using the owning symbol context and the module's indexed symbol lookup.

For a native breakpoint location:

  1. Resolve a SymbolContext for the location using function and symbol scope.
  2. Prefer the concrete owning function name, falling back to the owning symbol name when needed.
  3. Construct the expected device-stub base name by prefixing that name with __device_stub_.
  4. Query the same module with Module::FindFunctionSymbols(..., eFunctionNameTypeBase, ...).

If the module has a matching __device_stub_ function symbol, LLDB treats the native location as a host-side shadow wrapper and disables that native breakpoint location.

This matches the important property we care about: a CPU-side wrapper is identified by the presence of the corresponding CUDA device-stub symbol in the same module, and the lookup uses the symbol table index instead of scanning pre-recorded address ranges.

Plugging Into LLDB's Lifecycle

There are two integration points:

  • When a GPU plugin target is associated with an existing native target, Target::SetGPUPluginTarget walks the native target's current breakpoint locations and asks the GPU platform to inspect each one. This handles breakpoints that already existed before the GPU target was created.
  • When a new native breakpoint location is created later, BreakpointLocationList::AddLocation checks associated GPU plugin targets and lets each platform decide whether that location should be disabled.

The platform hook used by both paths is Platform::HandleNativeBreakpointLocation. In the NVIDIA implementation, PlatformNVGPU::HandleNativeBreakpointLocation resolves the symbol context for the location, checks whether it is a shadow function, and disables it if so.

Why this design?

This version is simpler than the earlier interval-map approach:

  • No module-level shadow-function bookkeeping.
  • No need to track wrapper address ranges.
  • No cleanup problem for stale interval-map entries on unload.
  • The lookup uses existing symbol table indexing for function-name searches.

It also keeps the user-visible behavior we want: the host breakpoint location still exists and is visible in LLDB, but it is disabled by default once the GPU target provides a better device-side interpretation.

Test Plan

lldb-dotest is unreliable in this setup, so I used llvm-lit directly:

$BUILD_DIR/bin/llvm-lit -v --test-output=all \
  $LLVM_PROJECT_ROOT/lldb/test/API/gpu/nvidia/shadow_functions/TestNVGPUShadowFunctions.py

The test covers:

  • Name breakpoints in CUDA kernels are not left enabled on the CPU target.
  • File/line breakpoints in CUDA kernels are not left enabled on the CPU target.
  • The behavior holds for multiple kernels in the same test binary.
  • Breakpoints created after GPU target creation are also filtered correctly.

TODOs

Ideally, we handle dlclose by re-enabling the shadow function host side breakpoint locations. We're deferring this to a future change.

Tom Yang added 9 commits April 8, 2026 17:10
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
on GPU target creation

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@zhyty zhyty requested a review from walter-erquinigo April 9, 2026 00:28
@agontarek agontarek self-requested a review April 9, 2026 20:49
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Tom Yang added 3 commits April 14, 2026 17:20
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@zhyty zhyty marked this pull request as ready for review April 15, 2026 02:01
@zhyty zhyty requested a review from clayborg April 15, 2026 16:12

@clayborg clayborg left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this does follow what NVidia does within GDB. Though the cost can be quite high and cause a lot of time wasted processing and identifying all shadow functions even though we might set a breakpoint in a few of them. We only care about the identifying which breakpoints are in shadow functions.

A good solution would only check each breakpoint location to see if it is a shadow breakpoint and disable it. We don't need to parse everything and make a huge map where 99% of the contents will never be accessed and making this map will cause delays in starting the debug sessions.

Comment thread lldb/include/lldb/Target/Platform.h Outdated
Comment thread lldb/include/lldb/Target/Statistics.h Outdated
Comment thread lldb/include/lldb/Target/Statistics.h Outdated
protected:
StatsDuration m_create_time;
StatsDuration m_load_core_time;
StatsDuration m_shadow_function_identification_time;

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove and make a virtual platform method to get statistics for a platform. The default Platform::GetStatistics() should get the plug-in name only:

    "platform": { 
        "name": "nvgpu",
    }

Subclasses should override this and call the base class and add any key/value pairs that make sense for the platform itself.

Comment thread lldb/include/lldb/Target/Platform.h Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated

@clayborg clayborg left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things to fix:

  • rvemo

Comment thread lldb/include/lldb/Target/Platform.h Outdated
Comment thread lldb/include/lldb/Target/Platform.h Outdated
Comment thread lldb/include/lldb/Target/Statistics.h Outdated
Comment thread lldb/source/Plugins/Platform/NVGPU/PlatformNVGPU.cpp Outdated
Comment thread lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
Comment thread lldb/source/Target/Statistics.cpp Outdated
Comment thread lldb/source/Target/Statistics.cpp Outdated
Tom Yang added 4 commits April 20, 2026 00:53
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
fixes

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
bool

we weren't really making use of it anyway. not sure if we'd need a
return in the future, but no need for now.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@zhyty zhyty requested a review from clayborg April 21, 2026 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants