Support LEFT OUTER JOIN and RIGHT OUTER JOIN #4122

Open
RobertBrunel wants to merge 3 commits into FoundationDB:main from RobertBrunel:left-join

Contributor

@RobertBrunel RobertBrunel commented Apr 30, 2026

Support LEFT OUTER JOIN and RIGHT OUTER JOIN

This change introduces support for left and right outer joins.

A dedicated QGM box (OuterJoinExpression) represents one outer join. It is strictly binary (unlike the SELECT box) and carries the join type (LEFT/RIGHT/FULL), the ON-clause predicates, and references to the “preserved” and “null-supplying” quantifiers. During the rewriting phase, the exploration rule RewriteOuterJoinRule canonicalizes the outer join box into two nested select boxes:

  • The preserved side is connected to the outer SelectExpression through a normal FOREACH quantifier.
  • The null-supplying side is wrapped in an inner SelectExpression carrying the ON predicates and is connected through a FOREACH quantifier with nullOnEmpty set to true.

The rewrite happens during canonicalization so that all subsequent planning rules (predicate push-down, join ordering, implementation) handle the join as a normal nested SELECT. No other rules need to know about OuterJoinExpression. The RewritingCostModel penalizes any surviving OuterJoinExpression, ensuring the rewritten form always wins.
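As a rough illustration of the semantics the canonical form is meant to preserve, here is a minimal Python sketch of the two nested selects modeled as nested loops. It is not the planner's code: the function names, the dict-based rows, and the sample tables are assumptions made up for this example, and `None` stands in for the null row produced by a quantifier with nullOnEmpty set.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = dict


def inner_select(null_supplying: List[Row], on: Callable[[Row, Row], bool], outer_row: Row) -> List[Row]:
    """Models the inner SELECT box: the null-supplying side filtered by the ON predicates."""
    return [r for r in null_supplying if on(outer_row, r)]


def left_outer_join(preserved: List[Row], null_supplying: List[Row],
                    on: Callable[[Row, Row], bool]) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Models the outer SELECT box over the two quantifiers."""
    for p in preserved:                      # normal FOREACH quantifier
        matches = inner_select(null_supplying, on, p)
        if not matches:                      # nullOnEmpty: produce one null instead of nothing
            matches = [None]
        for n in matches:
            yield (p, n)


def right_outer_join(left: List[Row], right: List[Row],
                     on: Callable[[Row, Row], bool]) -> Iterator[Tuple[Optional[Row], Row]]:
    """RIGHT OUTER JOIN as the mirror image: the right input is the preserved side."""
    for r, l in left_outer_join(right, left, lambda r_row, l_row: on(l_row, r_row)):
        yield (l, r)


# Hypothetical sample data.
emp = [{"id": 1, "dept": 10}, {"id": 2, "dept": 20}]
dept = [{"dept": 10, "name": "eng"}, {"dept": 30, "name": "ops"}]
same_dept = lambda e, d: e["dept"] == d["dept"]

left_rows = list(left_outer_join(emp, dept, same_dept))
right_rows = list(right_outer_join(emp, dept, same_dept))
```

In this model every preserved row appears at least once in the output, and a `None` partner marks a preserved row that found no match.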

Key changes:

  • OuterJoinExpression represents the outer join in the QGM.
  • QueryVisitor parses the OUTER JOIN syntax and constructs the logical OuterJoinExpression.
  • RewriteOuterJoinRule rewrites the OuterJoinExpression into nested SelectExpression boxes; it is registered in RewritingRuleSet so it fires during the canonicalization phase.
  • RewritingCostModel consults a new ExpressionCountProperty.outerJoinCount ahead of selectCount so an un-rewritten OuterJoinExpression is always more expensive than the canonical two-SELECT form.
  • Supporting changes in CardinalitiesProperty, RecordTypesProperty, LogicalPlanFragment, and RelationalExpressionMatchers to teach existing utilities about OuterJoinExpression, so cardinality, ordering, and record-type properties propagate correctly through the new node.
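The ordering of the two counts in the cost model can be pictured as a lexicographic comparison. The sketch below is only a toy model under that assumption: the class, field, and function names are hypothetical, and the real RewritingCostModel may consult further properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCounts:
    # Hypothetical stand-ins for the counts the cost model inspects.
    outer_join_count: int
    select_count: int


def cost_key(counts: ExpressionCounts):
    """Compare outer_join_count first, then select_count (tuples compare lexicographically)."""
    return (counts.outer_join_count, counts.select_count)


def naive_key(counts: ExpressionCounts):
    """Counter-example ordering that looks at select_count first."""
    return (counts.select_count, counts.outer_join_count)


unrewritten = ExpressionCounts(outer_join_count=1, select_count=0)  # one OuterJoinExpression
canonical = ExpressionCounts(outer_join_count=0, select_count=2)    # two nested SELECT boxes

winner = min([unrewritten, canonical], key=cost_key)
naive_winner = min([unrewritten, canonical], key=naive_key)
```

With `cost_key` the canonical form wins although it has more SELECT boxes. With `naive_key` the un-rewritten form would win, which is the outcome that checking the outer join count ahead of the select count avoids.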

Testing:

  • join-tests-outer.yamsql integration tests covering LEFT and RIGHT OUTER JOIN semantics, anti-join patterns, predicate placement (ON vs WHERE) on either side of the join, and predicate push-down into either source.
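To make the ON versus WHERE distinction and the anti-join pattern concrete, here is a small Python model of left outer join semantics. It is not taken from the test file: the tables, column names, and helper function are invented for illustration.

```python
def left_outer_join(preserved, null_supplying, on):
    """Reference semantics: each preserved row pairs with its matches, or with None if there are none."""
    out = []
    for p in preserved:
        matches = [n for n in null_supplying if on(p, n)] or [None]
        out.extend((p, n) for n in matches)
    return out


# Hypothetical sample data.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [{"cust": 1, "total": 50}, {"cust": 1, "total": 500}, {"cust": 2, "total": 20}]

# Predicate in ON:  ... LEFT JOIN orders o ON o.cust = c.id AND o.total > 100
# Customers without a qualifying order are kept and null-padded.
in_on = left_outer_join(customers, orders,
                        lambda c, o: o["cust"] == c["id"] and o["total"] > 100)

# Predicate in WHERE:  ... ON o.cust = c.id WHERE o.total > 100
# The filter runs after the join, and a comparison against NULL is not true,
# so null-padded rows are dropped.
joined = left_outer_join(customers, orders, lambda c, o: o["cust"] == c["id"])
in_where = [(c, o) for c, o in joined if o is not None and o["total"] > 100]

# Anti-join pattern:  ... ON o.cust = c.id WHERE o.cust IS NULL
# Keeps only the customers that have no orders at all.
anti_join = [c for c, o in joined if o is None]
```

Here `in_on` keeps all three customers, `in_where` keeps only customer 1, and `anti_join` yields only customer 3.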

Resolves #4151.

@RobertBrunel RobertBrunel self-assigned this Apr 30, 2026
@RobertBrunel RobertBrunel added the enhancement New feature or request label Apr 30, 2026
@RobertBrunel RobertBrunel requested a review from normen662 April 30, 2026 16:56
Collaborator

@alecgrieser alecgrieser left a comment

In general, this seems like a sound approach to me, and it appears to be correct in its aims. I have a few questions about the approach, as well as some documentation and testing suggestions. There are some comments in the OuterJoinExpression code that could probably be answered with either additional comments (perhaps in the PR or perhaps just in GitHub) or potentially more substantive code changes.

Comment thread docs/sphinx/source/reference/sql_commands/DQL/JOIN.rst Outdated
Comment thread docs/sphinx/source/reference/sql_commands/DQL/JOIN.rst Outdated
Comment thread docs/sphinx/source/reference/Joins.rst Outdated
Comment thread yaml-tests/src/test/java/YamlIntegrationTests.java Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql Outdated
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql
RobertBrunel added a commit to RobertBrunel/fdb-record-layer that referenced this pull request May 8, 2026
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of LEFT OUTER JOIN to `JOIN.rst`. Support for left joins is introduced by PR FoundationDB#4122.
@RobertBrunel RobertBrunel changed the title Support LEFT OUTER JOIN [draft] Support LEFT OUTER JOIN May 8, 2026
@RobertBrunel RobertBrunel force-pushed the left-join branch 2 times, most recently from 80dd43a to 76e80f0 Compare May 8, 2026 19:56
@RobertBrunel RobertBrunel marked this pull request as ready for review May 8, 2026 19:59
@RobertBrunel RobertBrunel force-pushed the left-join branch 3 times, most recently from ab490f9 to 1659dfa Compare May 11, 2026 14:53
@RobertBrunel RobertBrunel requested review from alecgrieser and hatyo May 11, 2026 14:53
…ectness

`PullUpNullOnEmptyRule` splits a `SelectExpression` featuring a null-on-empty quantifier into two selects. However, it assigns the predicates only to the lower select. To be correct, it needs to apply them to the upper `SelectExpression` as well (“to act on any nulls produced by this quantifier”, as the Javadoc comment on the rule already says). Without this bugfix, WHERE predicates may get incorrectly pushed past the null-on-empty boundary.
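The effect can be modeled in a few lines of Python. This is a simplified sketch, not the rule's implementation: it assumes a single null-rejecting predicate, uses `None` for the null row of a null-on-empty quantifier, and all function names are hypothetical.

```python
def null_on_empty(rows):
    """Models a null-on-empty quantifier: an empty input yields a single null row."""
    rows = list(rows)
    return rows if rows else [None]


def total_gt_100(row):
    """A null-rejecting predicate: a comparison against NULL is not true."""
    return row is not None and row["total"] > 100


def single_select(source, pred):
    """Before the split: the predicate sits above the null-on-empty quantifier."""
    return [r for r in null_on_empty(source) if pred(r)]


def split_lower_only(source, pred):
    """Buggy split: the predicate is applied only in the lower select."""
    lower = [r for r in source if pred(r)]
    return list(null_on_empty(lower))


def split_both(source, pred):
    """Fixed split: the upper select re-applies the predicate to any produced null."""
    lower = [r for r in source if pred(r)]
    return [r for r in null_on_empty(lower) if pred(r)]


source = [{"total": 20}]  # hypothetical input whose only row fails the predicate
expected = single_select(source, total_gt_100)
buggy = split_lower_only(source, total_gt_100)
fixed = split_both(source, total_gt_100)
```

In this model the lower-only variant filters the input down to nothing, the quantifier then manufactures a null row, and that row escapes the predicate. Re-applying the predicate in the upper select restores the original result.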

Testing:
* Introduce `PullUpNullOnEmptyRuleTest` and add a regression test.

Fixes FoundationDB#4148.
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 37
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/join-tests-outer.metrics.yaml: 37

@RobertBrunel RobertBrunel changed the title from "Support LEFT OUTER JOIN" to "Support LEFT OUTER JOIN and RIGHT OUTER JOIN" May 12, 2026
RobertBrunel added a commit to RobertBrunel/fdb-record-layer that referenced this pull request May 13, 2026
* General cleanup pass over `INNER_JOIN.rst` and `Joins.rst`.
* Rename `INNER_JOIN.rst` to `JOIN.rst`.
* Add documentation of OUTER JOIN to `JOIN.rst`. Support for outer joins is introduced in PR FoundationDB#4122.
Comment thread yaml-tests/src/test/resources/join-tests-outer.yamsql
Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for LEFT OUTER JOIN and RIGHT OUTER JOIN

4 participants