Add execution stats to data-node fragment slow log #2

Merged
himshikhagupta merged 1 commit into himshikhagupta:analytics-slow-log from Bukhtawar:enhance-slow-log-stats on Jun 1, 2026
@Bukhtawar

Introduces a FragmentExecutionStats record whose execution-shape fields are derived from the resolved plan alternative, that is, the plan actually selected for execution on this node.

Fields (all zero-cost: each is a field read from an existing object):

  • rows_produced: output row count (already counted)
  • used_secondary_index: from resolved plan's DelegationDescriptor
  • delegated_predicate_count: number of predicates delegated to the secondary index; explains the cost when used_secondary_index=true
  • filter_tree_shape: CONJUNCTIVE vs INTERLEAVED_BOOLEAN_EXPRESSION; explains why delegation is expensive (complex boolean evaluation)
  • has_partial_aggregate: whether the fragment performs aggregation; explains high compute with low output rows
  • task_id: from the existing task object
  • id (X-Opaque-Id): from task headers
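
For readers without the diff at hand, the field list above could map onto a Java record roughly as follows. This is a minimal sketch, not the code from this PR: the component types, the `FilterTreeShape` enum name, and the `toLogFields()` helper are assumptions made for illustration; only the log field names and the two shape values come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical enum name; the two constants are the shapes named in the description.
enum FilterTreeShape { CONJUNCTIVE, INTERLEAVED_BOOLEAN_EXPRESSION }

// Sketch of the stats record. Component types are assumptions.
record FragmentExecutionStats(
    long rowsProduced,
    boolean usedSecondaryIndex,
    int delegatedPredicateCount,
    FilterTreeShape filterTreeShape,   // may be null when no filter tree applies (assumption)
    boolean hasPartialAggregate,
    long taskId,
    String opaqueId                    // X-Opaque-Id header value, may be null
) {
    // Hypothetical helper: flattens the record into ordered slow-log key/value pairs.
    Map<String, Object> toLogFields() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("rows_produced", rowsProduced);
        fields.put("used_secondary_index", usedSecondaryIndex);
        fields.put("delegated_predicate_count", delegatedPredicateCount);
        if (filterTreeShape != null) {
            fields.put("filter_tree_shape", filterTreeShape.name());
        }
        fields.put("has_partial_aggregate", hasPartialAggregate);
        fields.put("task_id", taskId);
        if (opaqueId != null) {
            fields.put("id", opaqueId);
        }
        return fields;
    }
}
```

Because every component is copied from a value the caller already holds, building the record adds no extra work on the execution path, which is the "zero-cost" property the description claims.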

Also refactors executeFragmentStreamingAsync to surface the resolved plan via a ResolvedExecution wrapper, ensuring the stats reflect the exact plan alternative that was executed.
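
The wrapper idea can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `PlanAlternative`, `FragmentExecutor`, `executeAsync`, the selection rule, and `runPlan` are all hypothetical stand-ins, and the real `ResolvedExecution` may carry different members. The point it shows is that the plan instance chosen for execution is the same one returned to the caller for stats.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for one plan alternative of a fragment.
record PlanAlternative(String name, boolean usesSecondaryIndex, int delegatedPredicates) {}

// Pairs the result with the plan that produced it (member names are assumptions).
record ResolvedExecution<T>(PlanAlternative resolvedPlan, T result) {}

final class FragmentExecutor {
    // Resolves one alternative, runs it asynchronously, and returns both together.
    // Assumes a non-empty list of alternatives.
    static CompletableFuture<ResolvedExecution<Long>> executeAsync(List<PlanAlternative> alternatives) {
        // Placeholder selection rule: prefer the first index-backed alternative.
        PlanAlternative chosen = alternatives.stream()
            .filter(PlanAlternative::usesSecondaryIndex)
            .findFirst()
            .orElse(alternatives.get(0));
        // The same 'chosen' instance is executed and then handed back for stats.
        return CompletableFuture.supplyAsync(() -> runPlan(chosen))
            .thenApply(rows -> new ResolvedExecution<>(chosen, rows));
    }

    // Placeholder for real execution: returns a made-up row count.
    private static long runPlan(PlanAlternative plan) {
        return plan.delegatedPredicates() * 10L;
    }
}
```

Returning the plan alongside the result avoids re-resolving the alternative when building the slow-log entry, which could otherwise report a plan different from the one that ran.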

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@himshikhagupta himshikhagupta merged commit 495ab23 into himshikhagupta:analytics-slow-log Jun 1, 2026
10 checks passed
