Support EXISTS subqueries in projection list #4168

Open
g31pranjal wants to merge 5 commits into FoundationDB:main from g31pranjal:exists_in_projection_list
Member

@g31pranjal commented May 14, 2026

Summary

  • Refactors ExistsValue to implement ValueWithChild instead of QuantifiedValue + NonEvaluableValue, making it a proper evaluable value that can appear in SELECT projection lists (not just WHERE clauses)
  • Adds eval() implementation to ExistsValue that returns a boolean based on whether the child value is non-null
  • Introduces ColumnarValue wrapper in the relational layer to track column positions for values in the projection list
  • Updates proto serialization (PExistsValue) to support the new generic PValue child field alongside the deprecated alias-based format
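
The evaluation behavior described in the first two bullets can be pictured with a minimal Java sketch. This is not the code from this PR: `Value`, `AliasLookupValue`, and `ExistsSketch` are hypothetical stand-ins, and the real `ExistsValue`, `ValueWithChild`, and evaluation context have richer signatures. The sketch only illustrates the stated semantics, assuming a value that wraps a single child and evaluates to true exactly when that child evaluates to something non-null.

```java
import java.util.Map;

// Hypothetical stand-in for a planner value evaluated against a binding map.
interface Value {
    Object eval(Map<String, Object> bindings);
}

// Hypothetical child value: returns whatever is bound to a quantifier alias.
final class AliasLookupValue implements Value {
    private final String alias;

    AliasLookupValue(String alias) {
        this.alias = alias;
    }

    @Override
    public Object eval(Map<String, Object> bindings) {
        return bindings.get(alias); // null when nothing is bound to the alias
    }
}

// Sketch of an EXISTS value with one child: true iff the child is non-null.
final class ExistsSketch implements Value {
    private final Value child;

    ExistsSketch(Value child) {
        this.child = child;
    }

    Value getChild() {
        return child;
    }

    @Override
    public Object eval(Map<String, Object> bindings) {
        return child.eval(bindings) != null;
    }
}
```

Because the sketch is an ordinary evaluable value with a child, it could sit anywhere a value is allowed, including a projection list, which is the point of the refactoring described above.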

Test plan

  • New exists-in-select.yamsql test file with 7+ test cases covering various EXISTS-in-projection scenarios

@g31pranjal added the enhancement (New feature or request) label May 14, 2026
@g31pranjal changed the title from "Exists in projection list" to "Support EXISTS subqueries in projection list" May 18, 2026
@g31pranjal requested a review from normen662 May 19, 2026 08:22
@g31pranjal self-assigned this May 19, 2026
final var underlyingValue = new ExistsValue(asExistential.getQuantifier().getAlias());
getDelegate().getCurrentPlanFragment().addOperator(asExistential);
final var underlyingValue = new ExistsValue(QuantifiedObjectValue.of(asExistential.getQuantifier()));
if (getDelegate().getPlanGenerationContext().shouldProcessLiteral()) {
Contributor


The plan generator is not yet designed to handle subqueries in the projection list, which I think is what is driving this condition: it prevents the plan for the EXISTS from being generated twice.

There should be a better way of doing this.

Member Author


Yes, the current approach is definitely wasteful. The selectElements are parsed twice: first to find out whether there are aggregations, and again to construct the QGM. That is what causes the EXISTS to be explored twice. shouldProcessLiteral seems to be a flag for skipping literal processing on the first pass, so I borrow it here to avoid attaching the quantifier to the expression. Happy to talk about potential improvements!
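
To make the pattern concrete, here is a minimal sketch, assuming a visitor that is run twice over the same select elements and registers an operator as a side effect. `PlanFragmentSketch`, `ContextSketch`, and `SelectVisitorSketch` are hypothetical names, not APIs from this codebase; only the idea of gating the side effect on a `shouldProcessLiteral`-style flag comes from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan fragment that collects operators as a side effect of visiting.
final class PlanFragmentSketch {
    final List<String> operators = new ArrayList<>();

    void addOperator(String operator) {
        operators.add(operator);
    }
}

// Hypothetical context flag, playing the role shouldProcessLiteral plays in the thread.
final class ContextSketch {
    boolean processLiterals;
}

final class SelectVisitorSketch {
    private final PlanFragmentSketch fragment;
    private final ContextSketch context;

    SelectVisitorSketch(PlanFragmentSketch fragment, ContextSketch context) {
        this.fragment = fragment;
        this.context = context;
    }

    // Returns a textual placeholder for the EXISTS value. The operator is
    // attached only while the flag is set, so a pass with the flag cleared
    // leaves the fragment untouched.
    String visitExists(String quantifierAlias) {
        if (context.processLiterals) {
            fragment.addOperator(quantifierAlias);
        }
        return "exists(" + quantifierAlias + ")";
    }
}
```

Visiting once with the flag cleared and once with it set attaches the operator exactly once. The cost of the approach is the one raised in the review: a flag named for literal handling ends up controlling an unrelated side effect.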

@g31pranjal force-pushed the exists_in_projection_list branch from 7513412 to 308c043 May 22, 2026 15:44
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 11
  • Dropped queries: 0
  • Plan changed + metrics changed: 4
  • Plan unchanged + metrics changed: 2
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Metrics only changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/exists-in-select.metrics.yaml: 11

Plan and Metrics Changed

These queries experienced both plan and metrics changes. This generally indicates a planner change that makes the planning for these queries substantially different. Some change in query plan metrics is expected, but the reviewer should still validate that these changes are not excessive.

Total: 4 queries

Statistical Summary (Plan and Metrics Changed)

task_count:

  • Average change: -110.8
  • Median change: -80
  • Standard deviation: 38.8
  • Range: -175 to -80
  • Queries changed: 4
  • No regressions! 🎉

transform_count:

  • Average change: -40.5
  • Median change: -34
  • Standard deviation: 9.6
  • Range: -57 to -34
  • Queries changed: 4
  • No regressions! 🎉

transform_yield_count:

  • Average change: -5.0
  • Median change: -3
  • Standard deviation: 2.4
  • Range: -9 to -3
  • Queries changed: 4
  • No regressions! 🎉

insert_new_count:

  • Average change: -14.3
  • Median change: -11
  • Standard deviation: 4.1
  • Range: -21 to -11
  • Queries changed: 4
  • No regressions! 🎉

insert_reused_count:

  • Average change: -1.0
  • Median change: -1
  • Standard deviation: 0.0
  • Range: -1 to -1
  • Queries changed: 3
  • No regressions! 🎉

There were no queries with significant regressions detected.

Minor Changes (Plan and Metrics Changed)

In addition, there were 4 queries with minor changes.

Only Metrics Changed

These queries experienced only metrics changes without any plan changes. If these metrics have substantially changed,
then a planner change has been made which affects planner performance but does not correlate with any new outcomes,
which could indicate a regression.

Total: 2 queries

Statistical Summary (Only Metrics Changed)

task_count:

  • Average change: -123.0
  • Median change: -123
  • Standard deviation: 0.0
  • Range: -123 to -123
  • Queries changed: 2
  • No regressions! 🎉

transform_count:

  • Average change: -45.0
  • Median change: -45
  • Standard deviation: 0.0
  • Range: -45 to -45
  • Queries changed: 2
  • No regressions! 🎉

transform_yield_count:

  • Average change: -3.0
  • Median change: -3
  • Standard deviation: 0.0
  • Range: -3 to -3
  • Queries changed: 2
  • No regressions! 🎉

insert_new_count:

  • Average change: -13.0
  • Median change: -13
  • Standard deviation: 0.0
  • Range: -13 to -13
  • Queries changed: 2
  • No regressions! 🎉

insert_reused_count:

  • Average change: -1.0
  • Median change: -1
  • Standard deviation: 0.0
  • Range: -1 to -1
  • Queries changed: 2
  • No regressions! 🎉

Significant Regressions (Only Metrics Changed)

There were 2 outliers detected. Outlier queries have a significant regression in at least one field. Statistically, this represents either an increase of more than two standard deviations above the mean or a large absolute increase (e.g., 100).

  • yaml-tests/src/test/resources/versions-tests.metrics.yaml:556: EXPLAIN select t3."__ROW_VERSION" AS version3, t3.id AS id3, t4."__ROW_VERSION" AS version4, t4.id AS id4, t3.col2, t4.col4 from t3, t4 where t3.col1 = 'b' and t4.col1 = 'b' and t4.col2 = t3.col2 and exists (select 1 from t4.col4 x where x = 2)
    • explain: COVERING(T3_VERSION_WITH_COL1 <,> -> [COL1: VALUE:[0], ID: KEY:[2]]) | FILTER _.COL1 EQUALS promote(@c41 AS STRING) | FETCH | FLATMAP q0 -> { ISCAN(T4_COL2_COL4 [EQUALS q0.COL2, EQUALS promote(@c69 AS LONG)]) | FILTER _.COL1 EQUALS promote(@c41 AS STRING) AS q1 RETURN (q0.__ROW_VERSION AS VERSION3, q0.ID AS ID3, q1.__ROW_VERSION AS VERSION4, q1.ID AS ID4, q0.COL2 AS COL2, q1.COL4 AS COL4) }
    • task_count: 2519 -> 2396 (-123)
    • transform_count: 835 -> 790 (-45)
    • transform_yield_count: 190 -> 187 (-3)
    • insert_new_count: 363 -> 350 (-13)
    • insert_reused_count: 34 -> 33 (-1)
  • yaml-tests/src/test/resources/versions-tests.metrics.yaml:574: EXPLAIN select t3.id AS id3, t4."__ROW_VERSION" AS version4, t4.id AS id4, t3.col2, t4.col4 from t3, t4 where t3.col1 = 'b' and t4.col1 = 'b' and t4.col2 = t3.col2 and exists (select 1 from t4.col4 x where x = 2)
    • explain: COVERING(T3_VERSION_WITH_COL1 <,> -> [COL1: VALUE:[0], ID: KEY:[2]]) | FILTER _.COL1 EQUALS promote(@c35 AS STRING) | FETCH | FLATMAP q0 -> { ISCAN(T4_COL2_COL4 [EQUALS q0.COL2, EQUALS promote(@c63 AS LONG)]) | FILTER _.COL1 EQUALS promote(@c35 AS STRING) AS q1 RETURN (q0.ID AS ID3, q1.__ROW_VERSION AS VERSION4, q1.ID AS ID4, q0.COL2 AS COL2, q1.COL4 AS COL4) }
    • task_count: 2519 -> 2396 (-123)
    • transform_count: 834 -> 789 (-45)
    • transform_yield_count: 190 -> 187 (-3)
    • insert_new_count: 363 -> 350 (-13)
    • insert_reused_count: 34 -> 33 (-1)
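
The outlier criterion stated at the start of this section can be sketched as follows. This illustrates the wording of the rule only and is not the report's actual implementation: the class and method names, the choice of population standard deviation, and the treatment of the threshold as "at least 100" are assumptions, with 100 being the example value given in the text.

```java
import java.util.ArrayList;
import java.util.List;

final class OutlierSketch {
    // Example absolute threshold taken from the report's wording ("e.g., 100").
    static final double ABSOLUTE_THRESHOLD = 100.0;

    static double mean(List<Integer> deltas) {
        double sum = 0.0;
        for (int d : deltas) {
            sum += d;
        }
        return sum / deltas.size();
    }

    // Population standard deviation (an assumption; the report does not say which form it uses).
    static double standardDeviation(List<Integer> deltas) {
        double mean = mean(deltas);
        double squared = 0.0;
        for (int d : deltas) {
            squared += (d - mean) * (d - mean);
        }
        return Math.sqrt(squared / deltas.size());
    }

    // Flags deltas that are more than two standard deviations above the mean,
    // or that are increases of at least the absolute threshold.
    static List<Integer> outliers(List<Integer> deltas) {
        double mean = mean(deltas);
        double stdDev = standardDeviation(deltas);
        List<Integer> flagged = new ArrayList<>();
        for (int d : deltas) {
            boolean statistical = stdDev > 0.0 && d > mean + 2 * stdDev;
            boolean absolute = d >= ABSOLUTE_THRESHOLD;
            if (statistical || absolute) {
                flagged.add(d);
            }
        }
        return flagged;
    }
}
```

For example, with the hypothetical per-query deltas 1, 2, 1, 2, 1, 2, 1, 2, 1, 40, the mean is 5.3 and the population standard deviation is about 11.6, so only the delta of 40 exceeds the two-standard-deviation cutoff of roughly 28.5.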
