Fix native memory duress probe #2
Open
Bukhtawar wants to merge 10 commits into
added 9 commits
May 19, 2026 02:12
Signed-off-by: Pradeep L <spradeel@amazon.com>

# Conflicts:
#   sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/spi/AnalyticsSearchBackendPlugin.java
#   sandbox/plugins/analytics-backend-datafusion/rust/src/query_tracker.rs
#   sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionAnalyticsBackendPlugin.java
#   sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java
#   sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/nativelib/NativeBridge.java

# Conflicts:
#   server/src/main/java/org/opensearch/monitor/os/OsProbe.java
Rename per-task native_memory_bytes_threshold to native_memory_percent_threshold to match its actual semantics.

The setting was declared as Setting<Long> with a '_bytes_threshold' key, but NativeMemoryUsageTracker interpreted the value as a percent. Switch to Setting<Double> bounded to [0.0, 1.0], mirroring heap_percent_threshold. The effective per-task byte threshold is now budget * fraction, where budget is the backend-installed native-memory budget.

Also includes in-progress native-memory backpressure plumbing:
- NativeMemoryUsageTracker budget supplier
- AnalyticsShardTask now extends SearchShardTask so SBP observes it
- DataFusion plugin wires currentBytesByTaskId snapshot supplier
- native registry top-N FFM call
- OsProbe getProcessNativeMemoryBytes helper

Several rough edges flagged in code review remain (see PR description).

Signed-off-by: Pradeep L <spradeel@amazon.com>
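The "budget * fraction" rule in the commit above can be illustrated with a minimal sketch. This is not the PR's NativeMemoryUsageTracker: the class name NativeThresholdSketch, its method names, and the choice to treat a non-positive budget as "inert" are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of the per-task threshold math; not the PR's actual code. */
final class NativeThresholdSketch {
    private NativeThresholdSketch() {}

    /** Rejects values outside [0.0, 1.0], as a bounded Setting<Double> would. */
    static double validateFraction(double fraction) {
        if (Double.isNaN(fraction) || fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0.0, 1.0], got " + fraction);
        }
        return fraction;
    }

    /** Effective per-task byte threshold: budget * fraction. */
    static long effectiveThresholdBytes(long budgetBytes, double fraction) {
        validateFraction(fraction);
        if (budgetBytes <= 0) {
            return 0L; // assumption: no installed budget means there is no usable threshold
        }
        return (long) (budgetBytes * fraction);
    }

    /** True when a task's native bytes exceed the effective threshold. */
    static boolean exceeds(long taskBytes, LongSupplier budgetSupplier, double fraction) {
        long threshold = effectiveThresholdBytes(budgetSupplier.getAsLong(), fraction);
        return threshold > 0 && taskBytes > threshold; // a zero threshold is treated as inert
    }

    public static void main(String[] args) {
        // Example: a 1000-byte budget with a 0.25 fraction.
        System.out.println(effectiveThresholdBytes(1000L, 0.25));
        System.out.println(exceeds(300L, () -> 1000L, 0.25));
    }
}
```

Reading the budget through a supplier on every evaluation means a budget installed later by a backend takes effect without re-registering the setting; whether the PR does exactly this is not stated in the commit message.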
Tracing for the FFM Rust path and the Java BP tick.

- Rust ffm.rs: log enter/ok-exit/err-exit for df_execute_query and df_execute_with_context.
- Rust query_tracker.rs: log cancel_query found/not-found and drain_completed_query drained/not-drained.
- NativeMemoryUsageTracker.evaluate now logs every code path (skip, inert, below-threshold, exceeds).
- NativeMemoryUsageTracker.refresh distinguishes null vs empty supplier and logs heaviest 5 task IDs.
- SearchBackpressureService.doRun logs per-tracker reasons-produced count, merge result, limiter outcome.

Signed-off-by: Pradeep L <spradeel@amazon.com>
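As a rough illustration of the "heaviest 5 task IDs" logging mentioned above, the selection step could look like the sketch below. The class and method names are hypothetical, and the Map<Long, Long> shape of the per-task snapshot (task ID to bytes) is an assumption rather than something the commit message confirms.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: picks the task IDs with the largest native byte counts. */
final class HeaviestTasksSketch {
    private HeaviestTasksSketch() {}

    /** Returns up to n task IDs, ordered from heaviest to lightest. */
    static List<Long> heaviestTaskIds(Map<Long, Long> bytesByTaskId, int n) {
        return bytesByTaskId.entrySet()
            .stream()
            // sort by byte count, largest first
            .sorted(Map.Entry.<Long, Long>comparingByValue().reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Long, Long> snapshot = Map.of(1L, 10L, 2L, 30L, 3L, 20L);
        System.out.println(heaviestTaskIds(snapshot, 5));
    }
}
```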
Signed-off-by: Pradeep L <spradeel@amazon.com>
Signed-off-by: Pradeep L <spradeel@amazon.com>
Signed-off-by: Pradeep L <spradeel@amazon.com>
Signed-off-by: Pradeep L <spradeel@amazon.com>
Signed-off-by: Pradeep L <spradeel@amazon.com>
Signed-off-by: Pradeep L <spradeel@amazon.com>
9fc0af4 to
861c6e5
Compare
1. Duress probe now uses NativeMemoryUsageService pool usage (sum of per-task snapshot) instead of OsProbe.getProcessNativeMemoryBytes(). Process RSS overcounts: it includes Netty direct buffers, thread stacks, and mmap, not just DataFusion's pool.
2. Remove NODE_NATIVE_MEMORY_LIMIT_SETTING from NodeDuressSettings. It is redundant with NativeMemoryUsageService.getBudgetBytes(), which is installed by the backend plugin. Two budget settings that can diverge is a misconfiguration trap.
3. Replace the platform gate (isNativeTrackingSupported: Linux/macOS only) with hasSnapshotProvider(). The feature is active when a backend installs a supplier, not based on OS.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
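The three changes above can be sketched together as follows. This is a simplified stand-in, not NativeMemoryUsageService itself: the class PoolUsageProbeSketch, the install method, and the duressFraction parameter are hypothetical, and the snapshot is assumed to map task ID to bytes. Only the names hasSnapshotProvider and the idea of a backend-installed budget come from the commit message.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a pool-usage duress probe gated on an installed supplier. */
final class PoolUsageProbeSketch {
    // Null until a backend installs a snapshot supplier (task ID -> native bytes).
    private volatile Supplier<Map<Long, Long>> snapshotProvider;
    private volatile long budgetBytes;

    /** Called by a backend plugin; a single budget avoids two settings that can diverge. */
    void install(Supplier<Map<Long, Long>> provider, long budgetBytes) {
        this.budgetBytes = budgetBytes;
        this.snapshotProvider = provider;
    }

    /** The feature is active when a supplier is present, regardless of OS. */
    boolean hasSnapshotProvider() {
        return snapshotProvider != null;
    }

    /** Pool usage is the sum of the per-task snapshot, not process RSS. */
    long poolUsageBytes() {
        Supplier<Map<Long, Long>> provider = snapshotProvider;
        if (provider == null) {
            return 0L;
        }
        Map<Long, Long> snapshot = provider.get();
        if (snapshot == null) {
            return 0L;
        }
        long sum = 0L;
        for (Long bytes : snapshot.values()) {
            if (bytes != null) {
                sum += bytes;
            }
        }
        return sum;
    }

    /** duressFraction is an assumed knob: the share of the budget that counts as duress. */
    boolean isInDuress(double duressFraction) {
        if (!hasSnapshotProvider() || budgetBytes <= 0) {
            return false; // inert when no backend has installed a supplier or budget
        }
        return poolUsageBytes() >= budgetBytes * duressFraction;
    }

    public static void main(String[] args) {
        PoolUsageProbeSketch probe = new PoolUsageProbeSketch();
        probe.install(() -> Map.of(1L, 400L, 2L, 300L), 1000L);
        System.out.println(probe.poolUsageBytes() + " in duress at 0.5: " + probe.isInDuress(0.5));
    }
}
```

Because the probe only sums what the backend reports per task, memory held by Netty direct buffers, thread stacks, or mmap never enters the calculation, which is the overcounting problem point 1 describes.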
861c6e5 to
e048863
Compare
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.