[RFC][lldb][NVGPU] introducing "shadow functions" to cuda-lldb#94
zhyty wants to merge 16 commits into
clayborg
left a comment
So this does follow what NVIDIA does within GDB. However, the cost can be quite high: a lot of time is spent processing and identifying all shadow functions even though we might set breakpoints in only a few of them. We only care about identifying which breakpoints are in shadow functions.
A good solution would check only each breakpoint location to see whether it is a shadow breakpoint and, if so, disable it. We don't need to parse everything and build a huge map where 99% of the contents will never be accessed; building that map delays the start of debug sessions.
    protected:
      StatsDuration m_create_time;
      StatsDuration m_load_core_time;
      StatsDuration m_shadow_function_identification_time;
Remove this and add a virtual platform method that returns statistics for a platform. The default Platform::GetStatistics() should report the plug-in name only:
"platform": {
"name": "nvgpu"
}
Subclasses should override this and call the base class and add any key/value pairs that make sense for the platform itself.
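A minimal sketch of that suggestion, assuming a plain string map as a stand-in for LLDB's structured statistics types. The `StatsDict` alias, the `AddShadowTime` helper, and the `shadowFunctionIdentificationTime` key are hypothetical and only illustrate the "base class reports the name, subclass appends its own keys" pattern:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a structured-data dictionary; LLDB's real
// statistics code uses its own JSON types rather than a std::map.
using StatsDict = std::map<std::string, std::string>;

class Platform {
public:
  virtual ~Platform() = default;
  virtual std::string GetPluginName() const = 0;

  // Default behavior: report only the plug-in name.
  virtual StatsDict GetStatistics() const {
    return {{"name", GetPluginName()}};
  }
};

class PlatformNVGPU : public Platform {
public:
  std::string GetPluginName() const override { return "nvgpu"; }

  // Call the base class first, then add platform-specific key/value pairs.
  StatsDict GetStatistics() const override {
    StatsDict stats = Platform::GetStatistics();
    // Hypothetical key name, chosen for illustration only.
    stats["shadowFunctionIdentificationTime"] =
        std::to_string(m_shadow_time_sec);
    return stats;
  }

  // Hypothetical helper to accumulate time spent identifying shadow functions.
  void AddShadowTime(double seconds) { m_shadow_time_sec += seconds; }

private:
  double m_shadow_time_sec = 0.0;
};
```

With this shape, the timing member lives on the platform that owns the concept instead of on the generic target statistics.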
Summary
By default, disable breakpoint locations in host-side (CPU) kernel wrapper functions once an associated NVIDIA GPU target exists. This targets CUDA programs where source-level breakpoints can otherwise resolve to the host launch wrapper instead of the actual device kernel.
These locations are not deleted. LLDB still creates them, but disables them by default so users can explicitly re-enable them if they really want to stop in the host launch path.
What are "shadow functions"?
When `nvcc` compiles CUDA source, a `__global__` kernel typically has host-side launch machinery associated with it. In practice, source breakpoints may resolve to that host wrapper path because the host binary can contain symbol and line-table information for it.

Using `shadow_functions.cu` as an example:

- `my_kernel(int)` is the host-visible wrapper entry point.
- The wrapper is backed by a `__device_stub_` helper such as `__device_stub__Z9my_kerneli(int)`, which performs the host-side CUDA launch boilerplate.
- The device instructions are not in the host `.text` for `my_kernel(int)` or `__device_stub__...`; they live in device code embedded in the binary.

From the user's point of view, `my_kernel` is "the kernel". From the host CPU symbol table's point of view, it is a wrapper around launch machinery. This PR treats those CPU-side wrapper locations as "shadow functions" and disables host breakpoint locations there when a GPU target is present.

How does this PR identify shadow functions?
The implementation no longer precomputes wrapper address ranges or maintains interval maps. Instead, it answers the question at breakpoint-location handling time using the owning symbol context and the module's indexed symbol lookup.
For a native breakpoint location:

1. Resolve the `SymbolContext` for the location using function and symbol scope.
2. Take the owning function's name and form the candidate stub name by prefixing it with `__device_stub_`.
3. Look that name up in the owning module with `Module::FindFunctionSymbols(..., eFunctionNameTypeBase, ...)`.

If the module has a matching `__device_stub_` function symbol, LLDB treats the native location as a host-side shadow wrapper and disables that native breakpoint location.

This matches the important property we care about: a CPU-side wrapper is identified by the presence of the corresponding CUDA device-stub symbol in the same module, and the lookup uses the symbol table index instead of scanning pre-recorded address ranges.
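The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `ModuleSymbols` is a hypothetical stand-in for a module's indexed symbol table (the real lookup goes through `Module::FindFunctionSymbols`), and the assumption that the stub name is `__device_stub_` followed by the wrapper's mangled name is inferred from the `__device_stub__Z9my_kerneli` example.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a module's indexed function-symbol lookup.
struct ModuleSymbols {
  std::unordered_set<std::string> function_symbols;

  bool HasFunctionSymbol(const std::string &name) const {
    return function_symbols.count(name) != 0;
  }
};

// Assumption: the stub name is the fixed prefix plus the wrapper's mangled
// name, e.g. "_Z9my_kerneli" -> "__device_stub__Z9my_kerneli".
inline std::string MakeDeviceStubName(const std::string &mangled_name) {
  return "__device_stub_" + mangled_name;
}

// A location's owning function is treated as a shadow function when the same
// module also contains the corresponding device-stub symbol.
inline bool IsShadowFunction(const ModuleSymbols &module,
                             const std::string &mangled_name) {
  if (mangled_name.empty())
    return false; // No owning function or symbol was resolved.
  return module.HasFunctionSymbol(MakeDeviceStubName(mangled_name));
}
```

Because the decision is a single hashed lookup per breakpoint location, nothing has to be computed for functions that never receive a breakpoint.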
Plugging Into LLDB's Lifecycle
There are two integration points:
- `Target::SetGPUPluginTarget` walks the native target's current breakpoint locations and asks the GPU platform to inspect each one. This handles breakpoints that already existed before the GPU target was created.
- `BreakpointLocationList::AddLocation` checks associated GPU plugin targets and lets each platform decide whether that location should be disabled.

The platform hook used by both paths is `Platform::HandleNativeBreakpointLocation`. In the NVIDIA implementation, `PlatformNVGPU::HandleNativeBreakpointLocation` resolves the symbol context for the location, checks whether it is a shadow function, and disables it if so.

Why this design?
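This version is simpler than the earlier interval-map approach: it does not precompute wrapper address ranges or maintain interval maps, and it decides per breakpoint location using the module's indexed symbol lookup.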
It also keeps the user-visible behavior we want: the host breakpoint location still exists and is visible in LLDB, but it is disabled by default once the GPU target provides a better device-side interpretation.
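To make the two integration points and the disable-but-keep behavior concrete, here is a toy model. Every type in it is a simplified, hypothetical stand-in (the real classes are `Target`, `BreakpointLocationList`, and `Platform`), and the hook is modeled as a plain callback rather than a virtual platform method:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a native breakpoint location.
struct BreakpointLocation {
  std::string function_name;
  bool enabled = true;
};

// Hypothetical model of the platform hook that inspects one location.
using LocationHook = std::function<void(BreakpointLocation &)>;

class NativeTarget {
public:
  // Path for new locations: if a GPU target is already associated, let its
  // hook decide whether the location should be disabled.
  void AddLocation(const std::string &function_name) {
    BreakpointLocation loc;
    loc.function_name = function_name;
    m_locations.push_back(loc);
    if (m_gpu_hook)
      m_gpu_hook(m_locations.back());
  }

  // Path for GPU target creation: walk the locations that already exist.
  void SetGPUPluginTarget(LocationHook hook) {
    m_gpu_hook = std::move(hook);
    for (BreakpointLocation &loc : m_locations)
      m_gpu_hook(loc);
  }

  const std::vector<BreakpointLocation> &Locations() const {
    return m_locations;
  }

private:
  std::vector<BreakpointLocation> m_locations;
  LocationHook m_gpu_hook;
};
```

In this model a location is never removed from the list; the hook only clears its `enabled` flag, which mirrors the behavior of leaving the host location visible so that a user can re-enable it.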
Test Plan
`lldb-dotest` is unreliable in this setup, so I used `llvm-lit` directly:

The test covers:
TODOs
Ideally, we handle `dlclose` by re-enabling the shadow function host-side breakpoint locations. We're deferring this to a future change.