Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions pkg/nvml/device.go
Original file line number Diff line number Diff line change
Expand Up @@ -3276,21 +3276,23 @@ func (device nvmlDevice) GetCoolerInfo() (CoolerInfo, Return) {
// nvml.DeviceGetTemperatureV()
type TemperatureHandler struct {
device nvmlDevice
sensorType TemperatureSensors
}

func (handler TemperatureHandler) V1() (Temperature, Return) {
var temperature Temperature
temperature.Version = STRUCT_VERSION(temperature, 1)
temperature.SensorType = uint32(handler.sensorType)
ret := nvmlDeviceGetTemperatureV(handler.device, &temperature)
return temperature, ret
}

func (l *library) DeviceGetTemperatureV(device Device) TemperatureHandler {
return device.GetTemperatureV()
func (l *library) DeviceGetTemperatureV(device Device, sensorType TemperatureSensors) TemperatureHandler {
return device.GetTemperatureV(sensorType)
}

func (device nvmlDevice) GetTemperatureV() TemperatureHandler {
return TemperatureHandler{device}
func (device nvmlDevice) GetTemperatureV(sensorType TemperatureSensors) TemperatureHandler {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue with this is that as the temperature struct is extended with additional INPUT parameters, we would need to add additional arguments to this function. In other cases we explicitly allow the full struct to be passed in (see #159), but this was not a versioned struct.

If we were not dealing with a versioned struct, we would probably want to change this signature to (*Temperature) and only enure that we set the version in each of the handlers.

Copy link
Copy Markdown
Author

@dschervov dschervov Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am grateful for for the full explanation! I agree about inconsistent API 💯

What should we do than? Maybe we can create V2 function, with this profile: func (handler TemperatureHandler) V2(sensorType TemperatureSensors) (Temperature, Return);

This will make sure that all V1 funciton calls in user code will be correctly executed, and the people who familiar with C nvml will be using V2 where they explicitly say what kind of sensor they want to read.

Later then the new sensor will be added (for memory for example), we already have the interface (V2 function) to use it. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The V2 function can only be added once we have an nvmlTemperature_v2_t type defined -- which is not yet the case. For now, I think we may rather want something like:

diff --git a/pkg/nvml/device.go b/pkg/nvml/device.go
index 1cea1f9..7e378a3 100644
--- a/pkg/nvml/device.go
+++ b/pkg/nvml/device.go
@@ -3275,14 +3275,14 @@ func (device nvmlDevice) GetCoolerInfo() (CoolerInfo, Return) {
 
 // nvml.DeviceGetTemperatureV()
 type TemperatureHandler struct {
-	device nvmlDevice
-	sensorType TemperatureSensors
+	device           nvmlDevice
+	temperatureInput *Temperature
 }
 
 func (handler TemperatureHandler) V1() (Temperature, Return) {
 	var temperature Temperature
 	temperature.Version = STRUCT_VERSION(temperature, 1)
-	temperature.SensorType = uint32(handler.sensorType)
+	temperature.SensorType = handler.temperatureInput.SensorType
 	ret := nvmlDeviceGetTemperatureV(handler.device, &temperature)
 	return temperature, ret
 }
@@ -3291,8 +3291,8 @@ func (l *library) DeviceGetTemperatureV(device Device, sensorType TemperatureSen
 	return device.GetTemperatureV(sensorType)
 }
 
-func (device nvmlDevice) GetTemperatureV(sensorType TemperatureSensors) TemperatureHandler {
-	return TemperatureHandler{device, sensorType}
+func (device nvmlDevice) GetTemperatureV(temperature *Temperature) TemperatureHandler {
+	return TemperatureHandler{device, temperature}
 }
 
 // nvml.DeviceGetMarginTemperature()

but I'm still not convinced that this is correct / the best solution. @klueska what are your thoughts?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, i get it now! Thats cool! 💪

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there should be any parameters to the V() function. Any version specific parameters should be passed in the V1(), V2(), etc. methods.

Reagrding passing temperature *Temperature -- in the Go API we have tried to avoid passing the whole struct as input, and instead pass the individual INPUT parts of the struct as parameters instead. Is there a reason to break from that here?

Copy link
Copy Markdown
Author

@dschervov dschervov Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIs there a reason to break from that here

I think right now its not that necessarily, because the TemperatureSensor can only have one value = 0 (GPU), and it is used when we create struct by default in Go.

We can introduce V2 function with additional parameter: TemperatureSensor, when NVML api will make another sensor type (for example for MEMORY).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klueska this is a similar problem to what we had for the attestation report parameter for DeviceGetConfComputeGpuAttestationReport (see #159). As was the case there, for the original NVML API, the struct includes INPUT and OUTPUT fields. The additional complication here is that the function uses a VERSIONED struct. Our solution there was to explicitly allow users to pass in a reference to the struct to allow the input fields to be set.

@dschervov we don't control the version of the struct in the BINDINGS. These are determined by the NVML definitions that we are wrapping. I would also only expect a new struct version when we add new fields to the struct, and not when we add support for new values.

return TemperatureHandler{device, sensorType}
}

// nvml.DeviceGetMarginTemperature()
Expand Down
8 changes: 4 additions & 4 deletions pkg/nvml/mock/device.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions pkg/nvml/mock/interface.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions pkg/nvml/zz_generated.api.go
Original file line number Diff line number Diff line change
Expand Up @@ -588,7 +588,7 @@ type Interface interface {
DeviceGetTargetFanSpeed(Device, int) (int, Return)
DeviceGetTemperature(Device, TemperatureSensors) (uint32, Return)
DeviceGetTemperatureThreshold(Device, TemperatureThresholds) (uint32, Return)
DeviceGetTemperatureV(Device) TemperatureHandler
DeviceGetTemperatureV(Device, TemperatureSensors) TemperatureHandler
DeviceGetThermalSettings(Device, uint32) (GpuThermalSettings, Return)
DeviceGetTopologyCommonAncestor(Device, Device) (GpuTopologyLevel, Return)
DeviceGetTopologyNearestGpus(Device, GpuTopologyLevel) ([]Device, Return)
Expand Down Expand Up @@ -959,7 +959,7 @@ type Device interface {
GetTargetFanSpeed(int) (int, Return)
GetTemperature(TemperatureSensors) (uint32, Return)
GetTemperatureThreshold(TemperatureThresholds) (uint32, Return)
GetTemperatureV() TemperatureHandler
GetTemperatureV(TemperatureSensors) TemperatureHandler
GetThermalSettings(uint32) (GpuThermalSettings, Return)
GetTopologyCommonAncestor(Device) (GpuTopologyLevel, Return)
GetTopologyNearestGpus(GpuTopologyLevel) ([]Device, Return)
Expand Down