analytics-eng: preserve a Sort under a Fetch in substrait conversion #21912

Open
mch2 wants to merge 1 commit into opensearch-project:main from mch2:substrait-preserve-sort-under-fetch

mch2 (Member) commented May 31, 2026

Description

A single Calcite LogicalSort that carries both a collation and a fetch/offset (e.g. a PPL `sort x | head N`, which Calcite merges into one node) lowers via isthmus to Fetch(Sort(input)): two Substrait rels from one operator. DataFusionFragmentConvertor.replaceInput rewired the Fetch's input directly, dropping the Sort and producing Fetch(input). The limit then ran over the unordered concat-gather, so a multi-shard `sort | head N` returned the first N rows in arrival order instead of in sorted order.
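To make the failure concrete, here is a small, hypothetical Java sketch (not code from this PR). It models each shard's output as a list of `x` values and the concat-gather as plain concatenation in arrival order, then compares taking the first N gathered rows (the dropped-Sort shape Fetch(input)) with sorting the gathered rows before limiting (the Fetch(Sort(input)) shape). The class name, method names, and shard data are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Illustrative sketch only: shard outputs are modelled as List<Integer> values of x,
 * and the coordinator's concat-gather as concatenation in arrival order.
 * None of these names come from OpenSearch, Calcite, or DataFusion.
 */
public class GatherLimitSketch {

    // Concatenate shard results in the order the shards "arrive"; no global ordering.
    static List<Integer> concatGather(List<List<Integer>> shards) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> shard : shards) {
            out.addAll(shard);
        }
        return out;
    }

    // Buggy shape Fetch(input): take the first n rows of the unordered gather.
    static List<Integer> limitOnly(List<List<Integer>> shards, int n) {
        return concatGather(shards).stream().limit(n).collect(Collectors.toList());
    }

    // Fixed shape Fetch(Sort(input)): gather, sort globally, then take the first n rows.
    static List<Integer> sortThenLimit(List<List<Integer>> shards, int n) {
        return concatGather(shards).stream().sorted().limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical data: two shards, each already in local x order.
        List<List<Integer>> shards = List.of(List.of(5, 7, 9), List.of(1, 2, 8));
        System.out.println(limitOnly(shards, 2));     // [5, 7]
        System.out.println(sortThenLimit(shards, 2)); // [1, 2]
    }
}
```

Even when every shard is locally ordered, limiting before a global sort returns whichever shard arrived first, which matches the symptom described above.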

When the Fetch's input is a Sort, descend into it and rewire the Sort's input instead, so the rewired shape is Fetch(Sort(newInput)): gather, sort globally, then limit.
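The rewiring rule can be sketched with a hypothetical, simplified rel model. The records below stand in for Substrait rels and are not the substrait-java classes, and `rewireBuggy`/`rewireFixed` are illustrative helpers that mirror the before/after behaviour of replaceInput, not the actual DataFusionFragmentConvertor code.

```java
/**
 * Hypothetical, simplified model: these records are stand-ins for Substrait rels,
 * not the substrait-java API, and the rewire methods are illustrative only.
 */
public class ReplaceInputSketch {

    interface Rel {}

    record Scan(String name) implements Rel {}

    record Sort(Rel input, String key) implements Rel {}

    record Fetch(Rel input, long count) implements Rel {}

    // Old behaviour: rewire the Fetch's input directly, silently dropping any Sort below it.
    static Rel rewireBuggy(Fetch fetch, Rel newInput) {
        return new Fetch(newInput, fetch.count());
    }

    // Fixed behaviour: if the Fetch sits on a Sort (one LogicalSort lowered to two rels),
    // keep the Sort and rewire its input instead, yielding Fetch(Sort(newInput)).
    static Rel rewireFixed(Fetch fetch, Rel newInput) {
        if (fetch.input() instanceof Sort sort) {
            return new Fetch(new Sort(newInput, sort.key()), fetch.count());
        }
        // No Sort directly under the Fetch: behave exactly as before.
        return new Fetch(newInput, fetch.count());
    }

    public static void main(String[] args) {
        Fetch lowered = new Fetch(new Sort(new Scan("shard_scan"), "x"), 10);
        Rel gather = new Scan("concat_gather");
        System.out.println(rewireBuggy(lowered, gather));
        // Fetch[input=Scan[name=concat_gather], count=10]
        System.out.println(rewireFixed(lowered, gather));
        // Fetch[input=Sort[input=Scan[name=concat_gather], key=x], count=10]
    }
}
```

In this sketch only a Sort directly under the Fetch is preserved; any other input is replaced as before, so plain `head N` plans without a collation keep their existing shape.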

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
mch2 requested a review from a team as a code owner May 31, 2026 10:04
github-actions (Contributor)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions (Contributor)

✅ Gradle check result for f66e387: SUCCESS

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.47%. Comparing base (dad63c0) to head (f66e387).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21912      +/-   ##
============================================
- Coverage     73.51%   73.47%   -0.05%     
+ Complexity    75582    75526      -56     
============================================
  Files          6034     6034              
  Lines        342661   342661              
  Branches      49294    49294              
============================================
- Hits         251918   251760     -158     
- Misses        70712    70855     +143     
- Partials      20031    20046      +15     
