Conversation
prashantbytesyntax
left a comment
There was a problem hiding this comment.
couple of comments, rest steps in testing for Dynamic Profiling cc @artursarlo and Marc
| # Import here to avoid circular dependency | ||
| from gprofiler.platform import get_cpu_model, get_hypervisor_vendor | ||
|
|
||
| hypervisor = get_hypervisor_vendor() |
There was a problem hiding this comment.
since this is only needed for custom event, can we lazy load this up
There was a problem hiding this comment.
How late do you mean by? I think doing this earlier than later is better. As this is done during initialization and only one time, it shouldn't affect overhead at all. It would be better than doing during dynamic profiling by custom event requested. Not sure if I understood your comment. Let me know if I misunderstood.
| system_profiler: Union["SystemProfiler", "NoopProfiler"] = NoopProfiler() | ||
|
|
||
| # When custom event is specified, only use perf (SystemProfiler), disable all language profilers | ||
| custom_event_mode = user_args.get("perf_event") is not None |
There was a problem hiding this comment.
can you help share or comment why we cannot run multi-language profiler when custom event for perf is triggered
There was a problem hiding this comment.
Language/runtime profilers generate samples based on OS time, whereas HW event based samples from perf are different, which you can't merge/combine for final collapsed file generation.
Does it make sense? I can add a comment explaining the limitation in the codes
| logger.critical("Failed to determine perf event to use") | ||
| raise | ||
|
|
||
| # When custom event is specified, use it directly and skip discovery |
There was a problem hiding this comment.
we would have to test with Dynamic profiling cc @artursarlo
|
|
||
|
|
||
| @lru_cache(maxsize=None) | ||
| def get_perf_available_events() -> Dict[str, str]: |
There was a problem hiding this comment.
we can probably leverage this to validate events with Dynamic Profiling cc @artursarlo
b257bb2 to
002cefb
Compare
There was a problem hiding this comment.
Pull request overview
Adds initial support for profiling via custom perf events (PMU/hardware/software/tracepoint/custom raw) to generate event-based flamegraphs, including CLI plumbing, event validation, and profile metadata enrichment.
Changes:
- Introduces new CLI options (
--perf-event,--perf-event-period,--hw-events-file) and integrates them into profiler selection and perf invocation. - Adds hardware event discovery/validation utilities (perf-list parsing, custom JSON events, PEBS modifier selection + VM fallback).
- Enriches profile output/metadata with sampling event/mode/period/frequency and adds platform/hypervisor detection support.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/prepare_machine-unknown-linux-musl.sh | Bumps bundled zlib version used in musl build prep. |
| scripts/check_pyinstaller.sh | Suppresses PyInstaller warning for newly added cpuid import. |
| requirements.txt | Adds cpuid dependency (x86_64 marker) for CPUID-based hypervisor detection. |
| mypy.ini | Adds mypy ignore config for cpuid imports. |
| gprofiler/utils/perf_process.py | Adds period-based (-c) vs frequency-based (-F) sampling selection in perf record command. |
| gprofiler/utils/perf.py | Extends perf event enum and adds perf-script sample counting log. |
| gprofiler/utils/hw_events.py | New module: event resolution (builtin/custom), precise modifier selection, accessibility tests, VM fallback. |
| gprofiler/resources/hw_events_template.json | New template JSON for defining custom raw PMU events per platform. |
| gprofiler/profilers/perf.py | Wires custom event args/period through SystemProfiler into PerfProcess, and forces FP mode for custom events. |
| gprofiler/profilers/factory.py | Adds “custom perf event mode” to disable language profilers and pass custom event args into SystemProfiler. |
| gprofiler/platform.py | Adds CPU codename detection and hypervisor vendor detection utilities. |
| gprofiler/metadata/system_metadata.py | Extends collected system info with hypervisor + CPU arch codename. |
| gprofiler/merge.py | Adds sampling metadata fields into merged profile header. |
| gprofiler/main.py | Adds CLI args + validation for custom events, and emits sampling metadata into snapshot metadata. |
| README.md | Documents hardware event profiling options and usage examples. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gprofiler/main.py
Outdated
| # Validate --perf-event-period and -f/--frequency are mutually exclusive | ||
| if args.perf_event_period and args.frequency != DEFAULT_SAMPLING_FREQUENCY: | ||
| parser.error( | ||
| "--perf-event-period and -f/--frequency are mutually exclusive. " | ||
| "Use --perf-event-period for period-based sampling or -f for frequency-based sampling." |
There was a problem hiding this comment.
The mutual-exclusion check for --perf-event-period vs -f/--profiling-frequency is incorrect: it only errors when args.frequency != DEFAULT_SAMPLING_FREQUENCY, so --perf-event-period ... -f 11 will be accepted even though the options are documented as mutually exclusive.
Consider enforcing this via an argparse mutually-exclusive group, or by detecting whether -f/--profiling-frequency was explicitly provided (and also fix the error text to mention --profiling-frequency, not --frequency).
| # Validate --perf-event-period and -f/--frequency are mutually exclusive | |
| if args.perf_event_period and args.frequency != DEFAULT_SAMPLING_FREQUENCY: | |
| parser.error( | |
| "--perf-event-period and -f/--frequency are mutually exclusive. " | |
| "Use --perf-event-period for period-based sampling or -f for frequency-based sampling." | |
| # Validate --perf-event-period and -f/--profiling-frequency are mutually exclusive | |
| if args.perf_event_period and any(opt in sys.argv for opt in ("-f", "--profiling-frequency")): | |
| parser.error( | |
| "--perf-event-period and -f/--profiling-frequency are mutually exclusive. " | |
| "Use --perf-event-period for period-based sampling or -f/--profiling-frequency for frequency-based sampling." |
| # Failed - try fallback for VMs | ||
| if is_vm and event_args[1].endswith(":p"): | ||
| # Remove modifier | ||
| event_without_modifier = event_args[1].rstrip(":p") |
There was a problem hiding this comment.
rstrip(":p") doesn’t remove the exact suffix ":p"; it strips any trailing characters that are either : or p. This is error-prone and can remove more than intended.
Use an exact suffix removal (e.g., removesuffix(":p") or event_args[1][:-2]) when building the fallback event string.
| event_without_modifier = event_args[1].rstrip(":p") | |
| event_without_modifier = event_args[1][:-2] |
| class SupportedPerfEvent(Enum): | ||
| """ | ||
| order here is crucial, the first one we try and succeed - will be used. | ||
| keep it mind that we should always use `PERF_DEFAULT` as a first try. | ||
| meaning - keeping with `perf`s' default is always preferred. | ||
| """ | ||
|
|
||
| PERF_DEFAULT = "default" | ||
| PERF_SW_CPU_CLOCK = "cpu-clock" | ||
| PERF_SW_TASK_CLOCK = "task-clock" | ||
| PERF_CUSTOM_EVENT = "custom" # User-specified event via --perf-event | ||
|
|
||
| def perf_extra_args(self) -> List[str]: | ||
| def perf_extra_args(self, custom_event_args: Optional[List[str]] = None) -> List[str]: | ||
| if self == SupportedPerfEvent.PERF_DEFAULT: | ||
| return [] | ||
| elif self == SupportedPerfEvent.PERF_CUSTOM_EVENT: | ||
| # Custom event args provided externally | ||
| return custom_event_args if custom_event_args else [] | ||
| return ["-e", self.value] |
There was a problem hiding this comment.
SupportedPerfEvent now includes PERF_CUSTOM_EVENT, but perf_extra_args() returns [] when no custom_event_args are passed. This means discover_appropriate_perf_event() will treat the custom option as “default perf” (redundant with PERF_DEFAULT) and may waste time retrying the same configuration.
Consider removing PERF_CUSTOM_EVENT from this enum (or excluding it from discovery iteration), or make perf_extra_args() raise/return a sentinel when PERF_CUSTOM_EVENT is selected without args.
|
|
||
| **Note:** On bare metal systems, hardware events use PEBS (Precise Event-Based Sampling) with `:pp` modifier for more accurate attribution. On VMs, a less precise `:p` modifier is used as a fallback. | ||
|
|
||
| **Custom Events Template:** A template file for custom PMU events is available at `gprofiler/resources/hw_events_template.json`. |
There was a problem hiding this comment.
README points users to gprofiler/resources/hw_events_template.json, but that file isn’t covered by the current MANIFEST.in patterns (only resources/burn, resources/perf, and a few subdirectories are included). In packaged installs (sdist/wheel), the template may be missing.
Ensure the new JSON template is included as package data (e.g., update MANIFEST.in / packaging configuration accordingly).
| **Custom Events Template:** A template file for custom PMU events is available at `gprofiler/resources/hw_events_template.json`. | |
| **Custom Events Template:** A template file for custom PMU events is available in the source tree at [`gprofiler/resources/hw_events_template.json`](gprofiler/resources/hw_events_template.json). If you installed gProfiler from a package and this file is not present locally, you can copy it from the project repository. |
| @@ -63,6 +81,10 @@ def get_profilers( | |||
| if isinstance(profiler_instance, SystemProfiler): | |||
| system_profiler = profiler_instance | |||
| else: | |||
| # In custom event mode, skip all process profilers | |||
| if custom_event_mode: | |||
| logger.debug(f"Skipping {profiler_name} profiler in custom event mode") | |||
| continue | |||
There was a problem hiding this comment.
In custom perf-event mode, the factory still instantiates every process profiler and will sys.exit(1) if any of them fails to initialize, even though they’re later skipped. This defeats the purpose of “disabling all language-specific profilers” and can break --perf-event runs on hosts missing optional profiler deps.
Skip constructing non-SystemProfiler profilers entirely when custom_event_mode is enabled (e.g., continue early in the registry loop unless the profiler is SystemProfiler).
| # Add sampling event information if custom event is being used | ||
| if isinstance(self.system_profiler, SystemProfiler) and self.system_profiler._custom_event_name: | ||
| from gprofiler.platform import get_hypervisor_vendor | ||
| from gprofiler.utils.hw_events import get_event_type, get_perf_available_events, get_precise_modifier | ||
|
|
||
| event_name = self.system_profiler._custom_event_name | ||
| hypervisor_vendor = get_hypervisor_vendor() | ||
| perf_events = get_perf_available_events() | ||
| event_type = get_event_type(event_name, perf_events) | ||
|
|
||
| # Use "custom" as fallback if event_type is None or empty | ||
| effective_type = event_type if event_type else "custom" | ||
| modifier = get_precise_modifier(event_name, effective_type, hypervisor_vendor) | ||
|
|
||
| metadata.update( | ||
| { | ||
| "sampling_event": event_name, | ||
| "sampling_mode": "period" if self.system_profiler._perf_period else "frequency", | ||
| "precise_modifier": modifier, | ||
| } | ||
| ) | ||
|
|
||
| if self.system_profiler._perf_period: | ||
| metadata.update({"sampling_period": self.system_profiler._perf_period}) | ||
| else: | ||
| metadata.update({"sampling_frequency": self.system_profiler._frequency}) |
There was a problem hiding this comment.
Sampling metadata recomputes precise_modifier from event_name/hypervisor, but the actual perf args may differ after validate_event_with_fallback (e.g., on VMs it may strip :p if the precise event isn’t accessible). This can cause the header metadata to disagree with what perf record actually used.
Prefer deriving the modifier from the resolved perf_event_args (stored on the SystemProfiler) or persist the final modifier/args during CLI validation and reuse it here.
| def validate_and_get_event_args( | ||
| event_name: str, hypervisor_vendor: str, hw_events_file: Optional[str] = None | ||
| ) -> List[str]: | ||
| """ | ||
| Validate event and return perf arguments for it. | ||
|
|
||
| Resolution order: | ||
| 1. Check perf list for built-in events | ||
| 2. Check custom events file (if specified) | ||
| 3. Raise error if not found | ||
|
|
||
| Returns list like ["-e", "event_name:modifier"] | ||
| """ | ||
| # Check if it's an uncore event (not supported for flamegraphs) | ||
| if event_name.startswith("uncore_") or "/uncore_" in event_name: | ||
| raise ValueError( | ||
| f"Uncore event '{event_name}' is not supported for flamegraph generation. " | ||
| f"Uncore events measure system-wide hardware activity and cannot be attributed to specific " | ||
| f"processes/threads." | ||
| ) | ||
|
|
||
| # First check perf list | ||
| perf_events = get_perf_available_events() | ||
| event_type = get_event_type(event_name, perf_events, hw_events_file) | ||
|
|
||
| if event_type and event_type != "custom": | ||
| # Found in perf list | ||
| modifier = get_precise_modifier(event_name, event_type, hypervisor_vendor) | ||
| event_with_modifier = f"{event_name}{modifier}" | ||
| return ["-e", event_with_modifier] | ||
|
|
||
| # Not in perf list, check custom events | ||
| custom_events = load_custom_events(hw_events_file) | ||
| if event_name not in custom_events: | ||
| # Event not found anywhere | ||
| available_builtin = list(perf_events.keys())[:10] # Show first 10 | ||
| available_custom = list(custom_events.keys()) | ||
|
|
||
| error_msg = f"Event '{event_name}' not found in perf built-in events" | ||
| if hw_events_file: | ||
| error_msg += f" or custom events file ({hw_events_file})" | ||
| error_msg += ".\n" | ||
| error_msg += f" Available built-in events (first 10): {available_builtin}\n" | ||
| if available_custom: | ||
| error_msg += f" Available custom events: {available_custom}\n" | ||
| else: | ||
| error_msg += " No custom events file provided. Use --hw-events-file to specify one.\n" | ||
| error_msg += f" Run '{perf_path()} list' to see all built-in events." | ||
|
|
||
| raise ValueError(error_msg) | ||
|
|
||
| # Found in custom events, get platform-specific config | ||
| platform = get_cpu_model() | ||
| event_config = custom_events[event_name] | ||
|
|
||
| if platform not in event_config: | ||
| supported_platforms = [k for k in event_config.keys() if not k.startswith("_")] | ||
| error_msg = ( | ||
| f"Custom event '{event_name}' not supported on platform '{platform}'.\n" | ||
| f" Supported platforms: {supported_platforms}" | ||
| ) | ||
| raise ValueError(error_msg) | ||
|
|
||
| # Get raw event code for this platform | ||
| platform_config = event_config[platform] | ||
| raw_event = platform_config.get("raw") | ||
|
|
||
| if not raw_event: | ||
| error_msg = f"Custom event '{event_name}' missing 'raw' field for platform '{platform}'" | ||
| raise ValueError(error_msg) | ||
|
|
||
| # Apply modifier for custom events (treated as hardware events) | ||
| modifier = get_precise_modifier(event_name, "custom", hypervisor_vendor) | ||
| event_with_modifier = f"{raw_event}{modifier}" | ||
|
|
||
| return ["-e", event_with_modifier] | ||
|
|
||
|
|
||
| def test_perf_event_accessible(event_args: List[str]) -> bool: | ||
| """ | ||
| Test if a perf event is accessible by running a quick perf record test. | ||
| Returns True if accessible, False otherwise. | ||
| """ | ||
| try: | ||
| run_process( | ||
| [perf_path(), "record", "-o", "/dev/null"] + event_args + ["--", "sleep", "0.1"], | ||
| suppress_log=True, | ||
| ) | ||
| return True | ||
| except CalledProcessError: | ||
| return False | ||
| except Exception: | ||
| return False | ||
|
|
||
|
|
||
| def validate_event_with_fallback(event_name: str, event_args: List[str], hypervisor_vendor: str) -> List[str]: | ||
| """ | ||
| Validate event accessibility with fallback for VMs. | ||
|
|
||
| For VMs: if event with :p modifier fails, retry without modifier. | ||
| For bare metal: no fallback, event must work as-is. | ||
|
|
||
| Returns validated event args or raises error. | ||
| """ | ||
| is_vm = hypervisor_vendor != "NONE" | ||
|
|
||
| # Test the event | ||
| if test_perf_event_accessible(event_args): | ||
| return event_args | ||
|
|
||
| # Failed - try fallback for VMs | ||
| if is_vm and event_args[1].endswith(":p"): | ||
| # Remove modifier | ||
| event_without_modifier = event_args[1].rstrip(":p") | ||
| fallback_args = ["-e", event_without_modifier] | ||
|
|
||
| if test_perf_event_accessible(fallback_args): | ||
| return fallback_args | ||
|
|
||
| # No fallback worked | ||
| error_msg = f"Cannot access perf event '{event_name}'. Check permissions and PMU availability." | ||
| raise RuntimeError(error_msg) |
There was a problem hiding this comment.
This PR introduces substantial new behavior in hw_events.py (perf list parsing, custom JSON event resolution, PEBS modifier selection, and VM fallback logic) but there are no unit tests covering it. Given the string parsing and platform-conditional behavior, regressions are likely.
Add tests that mock run_process/perf_path output to cover: uncore rejection, built-in vs custom resolution, modifier selection for VM vs bare metal, and the :p fallback behavior.
gprofiler/utils/perf.py
Outdated
| _process_single_sample(sample, pid_to_collapsed_stacks_counters, insert_dso_name) | ||
| sample_count += 1 | ||
|
|
||
| logger.info(f"Parsed perf script output: {sample_count} samples") |
There was a problem hiding this comment.
parse_perf_script_from_iterator logs the parsed sample count at INFO on every parse. This function runs every snapshot (and for multiple perf processes), so it can generate very high-volume logs in normal operation.
Switch this to DEBUG and/or only log when diagnostics mode is enabled.
| logger.info(f"Parsed perf script output: {sample_count} samples") | |
| logger.debug(f"Parsed perf script output: {sample_count} samples") |
| [mypy-humanfriendly.*] | ||
| ignore_missing_imports = True | ||
| [mypy-cpuid.*] | ||
| ignore_missing_imports = True |
There was a problem hiding this comment.
The mypy override is [mypy-cpuid.*], but the code imports the top-level module import cpuid (no submodule). Mypy won’t apply the .* rule to the root module, so the missing-stubs error will still occur.
Add a [mypy-cpuid] section (in addition to or instead of cpuid.*) with ignore_missing_imports = True.
| ignore_missing_imports = True | |
| ignore_missing_imports = True | |
| [mypy-cpuid] | |
| ignore_missing_imports = True |
| perf_event_options.add_argument( | ||
| "--perf-event-period", | ||
| type=int, | ||
| dest="perf_event_period", | ||
| help="Use period-based sampling instead of frequency (-c instead of -F). " | ||
| "Specify the number of events between samples (e.g., 10000 for sampling every 10000 events). " | ||
| "Only valid with --perf-event.", | ||
| ) |
There was a problem hiding this comment.
--perf-event-period is parsed as a plain int, so 0/negative values are accepted and then bypass the later validation because it uses truthiness (if args.perf_event_period ...). This can lead to silently falling back to frequency sampling or passing an invalid -c value to perf.
Use the existing positive_integer type (like other numeric CLI args) and validate with is not None rather than truthiness.
Signed-off-by: Min Lim <min.yeol.lim@intel.com> Signed-off-by: Min Yeol Lim <min.yeol.lim@intel.com>
Signed-off-by: Min Yeol Lim <min.yeol.lim@intel.com>
Signed-off-by: Min Yeol Lim <min.yeol.lim@intel.com>
002cefb to
32f2967
Compare
Signed-off-by: Min Yeol Lim <min.yeol.lim@intel.com>
32f2967 to
f1bb9b8
Compare
This is the initial implementation of the request filed at #1013.
Summary
Add support for hardware event-based flamegraph generation using custom perf events (PMU events). Users
can now generate flamegraphs based on specific hardware performance counters like cache misses, branch
mispredictions, and other PMU events instead of the default CPU time-based sampling.
Key Features
New CLI Options
--perf-eventcache-misses,branch-misses). Disables--perf-event-period--perf-event.-f.--hw-events-file--perf-event.Usage Examples