update go-nvml with changes from cuda 13.2.1#179
Conversation
2881034 to
b031cd0
Compare
|
Raised PR #180 to address the CI failures |
|
This doesn't seem to update any of versioned libs, or add any of the wrapper functions as described in the README. |
ce28140 to
8289fbb
Compare
|
@klueska I have added the wrapper functions. As for the new versioned methods, I am not able to do the expected symbol remapping as the input/output struct types are also versioned. These struct versions are also different in each of these versioned methods. |
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
8289fbb to
536f8d4
Compare
|
There should be an intermediate PR for CUDA 13.1.x that lands before this one. |
| const: | ||
| - {action: accept, from: "^NVML_"} | ||
| - {action: accept, from: "^nvml"} | ||
| - {action: ignore, from: "NVML_MCDM_SUPPORT"} |
There was a problem hiding this comment.
Why did this need ignoring?
There was a problem hiding this comment.
This variable was not being rendered properly, so it led to a compilation error in nvml.go. Ignoring it was the only way to unblock this
There was a problem hiding this comment.
This is what the constant gets rendered as in nvml.go
const (
// NO_UNVERSIONED_FUNC_DEFS as defined in go-nvml/<predefine>:24
NO_UNVERSIONED_FUNC_DEFS = 1
// MCDM_SUPPORT as defined in nvml/nvml.h:106
[77 67 68 77 95 83 85 80 80 79 82 84]
// API_VERSION as defined in nvml/nvml.h:111
API_VERSION = 13
// API_VERSION_STR as defined in nvml/nvml.h:112
API_VERSION_STR = "13"
// VALUE_NOT_AVAILABLE as defined in nvml/nvml.h:156
VALUE_NOT_AVAILABLE = -1
You can see that MCDM_SUPPORT is mis-rendered
| Sample1: sample1, | ||
| Sample2: sample2, | ||
| Metrics: [210]nvml.GpmMetric{ | ||
| Metrics: [333]nvml.GpmMetric{ |
There was a problem hiding this comment.
Why did this number need bumping?
There was a problem hiding this comment.
This struct's schema was updated in the latest nvml.h.
There was a problem hiding this comment.
You only need to create an alias to a versioned function if it appears in this list. In this update, this list is unmodified, so there are no new versioned function aliases necessary.
There was a problem hiding this comment.
Thanks for the tip, this is good to know
|
|
||
| # Update nvml.h from the NVIDIA CUDA redistributable JSON | ||
| update-nvml-h: CUDA_VERSION := 13.0.0 | ||
| update-nvml-h: CUDA_VERSION := 13.2.1 |
There was a problem hiding this comment.
We need an intermediate PR for the 13.1 change before this one. We are unfortunately behind on updating this, but that doesn't mean we should just skip a version.
| { | ||
| unsigned int counterId; //!< Counter ID, one of \ref nvmlPRMCounterId_t | ||
| /* Input data */ | ||
| nvmlPRMCounterInput_v1_t inData; //!< PRM input values |
There was a problem hiding this comment.
Since this struct has input values, we need to make sure our wrappers take this struct as an input. There should be other examples of this in older PRs.
There was a problem hiding this comment.
I didn't look closely yet, but do any of the structs used here require the NVML_STRUCT_VERSION to be set?
There was a problem hiding this comment.
Yes, this struct :
type RusdSettings_v1 struct {
Version uint32
PollMask uint64
}
I've addressed this in #182
|
The following commands were run to update the go-nvml API with the latest changes from CUDA 13.2.1
make update-nvml-hmake bindingsgo generate ./...