
fix(metrics): probe AMD L3 uncore and disable when unusable (fixes #660) #661

Merged: harp-intel merged 1 commit into main from fix/amd-turin-uncore-probe on Mar 12, 2026.


@harp-intel (Contributor)

On GCP AMD Turin instances, sysfs lists amd_l3 but perf cannot use the L3 PMU, causing "no metrics collected" when uncore events are included (see #660).

Changes:

  • Add an AMD-only metadata script that runs `perf stat` with a single L3 event.
  • When the probe fails (non-zero exit status, or stderr indicates the L3 PMU is unavailable), set `SupportsUncore = false` and remove `l3`/`df` from `UncoreDeviceIDs`, so that collection falls back to core-only events and still produces metrics.
  • In the legacy loader's `isCollectableEvent`, require `SupportsUncore` for all uncore-by-device events (not only those with the `UNC` name prefix), so AMD `l3`/`df` events are excluded when uncore is disabled.
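The probe-and-fallback flow in the list above can be sketched as follows. This is a minimal Go sketch under stated assumptions, not the PR's code: `Metadata`, `Event`, `runProbe`, `probeFailed`, `applyProbe`, and `isCollectable` are hypothetical names, and only `SupportsUncore`, `UncoreDeviceIDs`, and the `l3`/`df` device types come from the description. The `perf stat` arguments and the stderr markers treated as "L3 PMU unavailable" are also assumptions.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// Metadata is a hypothetical stand-in for the tool's platform metadata.
// Only the two fields named in the PR description are modeled.
type Metadata struct {
	SupportsUncore  bool
	UncoreDeviceIDs map[string][]int // uncore device type -> device IDs
}

// Event is a hypothetical, minimal event description.
type Event struct {
	Name   string
	Device string // "cpu" for core events, otherwise an uncore device type such as "l3"
}

// runProbe runs perf stat system-wide with a single event and returns the
// exit code and stderr. The perf arguments are illustrative assumptions.
func runProbe(event string) (int, string) {
	cmd := exec.Command("perf", "stat", "-a", "-e", event, "--", "sleep", "0.1")
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	err := cmd.Run()
	if err == nil {
		return 0, stderr.String()
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), stderr.String()
	}
	// perf could not be started at all (e.g. not installed).
	return -1, err.Error()
}

// probeFailed treats a non-zero exit, or stderr text suggesting the event
// could not be counted, as a failed probe. The marker strings are
// assumptions, not the PR's actual checks.
func probeFailed(exitCode int, stderr string) bool {
	if exitCode != 0 {
		return true
	}
	s := strings.ToLower(stderr)
	for _, marker := range []string{"<not supported>", "<not counted>", "event syntax error"} {
		if strings.Contains(s, marker) {
			return true
		}
	}
	return false
}

// applyProbe disables uncore and drops the AMD l3/df devices on failure.
func applyProbe(md *Metadata, exitCode int, stderr string) {
	if !probeFailed(exitCode, stderr) {
		return
	}
	md.SupportsUncore = false
	delete(md.UncoreDeviceIDs, "l3")
	delete(md.UncoreDeviceIDs, "df")
}

// isCollectable requires SupportsUncore for every uncore-by-device event,
// whatever its name, and requires that its device is still listed.
func isCollectable(md Metadata, ev Event) bool {
	if ev.Device == "" || ev.Device == "cpu" {
		return true // core event
	}
	if !md.SupportsUncore {
		return false
	}
	_, ok := md.UncoreDeviceIDs[ev.Device]
	return ok
}

func main() {
	md := Metadata{
		SupportsUncore:  true,
		UncoreDeviceIDs: map[string][]int{"l3": {0, 1}, "df": {0}},
	}
	// Simulate the GCP case where the probe exits non-zero.
	// On a real host one could instead call runProbe with an L3 event name.
	applyProbe(&md, 1, "")
	events := []Event{
		{Name: "cycles", Device: "cpu"},
		{Name: "l3_example_event", Device: "l3"}, // hypothetical event name
	}
	for _, ev := range events {
		fmt.Println(ev.Name, isCollectable(md, ev)) // prints "cycles true", then "l3_example_event false"
	}
}
```

Keying the collectability check on the event's device rather than on a `UNC` name prefix is what lets the AMD `l3`/`df` events be filtered out, since their names need not carry that prefix.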

Result: On GCP (and similar VMs), metrics are collected with core-only events; on bare-metal AMD Turin, uncore remains enabled when the probe succeeds.

Made with Cursor

harp-intel merged commit 7b188e9 into main on Mar 12, 2026 (5 checks passed).
harp-intel deleted the fix/amd-turin-uncore-probe branch on March 12, 2026 at 00:53.


Linked issue: The metrics command doesn't work on GCP's AMD Turin instances, e.g., c4d-standard-*
