[SPARK-56984][DOCS] Document the SQL PATH feature #56040

Open
srielau wants to merge 6 commits into apache:master from srielau:SPARK-56984-document-path

Contributor

srielau commented May 21, 2026

What changes were proposed in this pull request?

Adds user-facing documentation for the SQL Standard PATH feature
introduced in Spark 4.2 (SPARK-56939 and related): the SET PATH
statement, the current_path() function, path-based resolution of
unqualified routines, tables, views, and session variables, and the
supporting infrastructure (system.builtin / system.session namespaces
with builtin. / session. shortcuts, spark.sql.path.enabled,
spark.sql.defaultPath).

New pages:

  • docs/sql-ref-syntax-aux-conf-mgmt-set-path.md — reference for the
    SET PATH statement, including a dedicated subsection on how
    DEFAULT_PATH is derived and how to change it.
  • docs/sql-ref-function-current-path.md — reference for the
    current_path() builtin.

Modified pages:

  • docs/sql-ref-name-resolution.md — new SQL Path section that
    introduces the concept, the system.builtin / system.session
    namespaces and their 2-part shortcuts, the path-walk for unqualified
    DML / queries vs the current-schema rule for DDL, the frozen-path
    behavior for persistent views and SQL UDFs, and the Reserved names and
    collisions subsection. Table / view and function resolution sections
    rewritten accordingly.
  • docs/sql-ref-identifier.md - new Reserved system names table
    linking back to the canonical description.
  • docs/sql-ref-syntax-aux-describe-function.md — examples for SQL
    UDF Function / Type / Input / Returns output, qualified builtin
    lookup (system.builtin.abs), and the SQL Path: row in
    DESCRIBE FUNCTION EXTENDED.
  • docs/sql-ref-syntax-aux-describe-table.md — example for the
    SQL Path row in DESCRIBE EXTENDED on a view.
  • docs/sql-ref-syntax-ddl-create-view.md, create-sql-function.md,
    create-function.md — document the session /
    system.session qualifier on temporary objects
    (INVALID_TEMP_OBJ_QUALIFIER otherwise), and add frozen-path examples.
  • docs/sql-ref-syntax-ddl-drop-view.md, drop-function.md - clarify
    DROP TEMPORARY FUNCTION vs DROP VIEW semantics and the
    qualifiers accepted in each.
  • docs/sql-ref-syntax-ddl-create-database.md — note discouraging
    the schema names session and builtin.
  • docs/sql-ref-syntax-aux-conf-mgmt-set.md,
    docs/sql-ref-syntax-aux-conf-mgmt.md, docs/sql-ref-syntax.md
    — cross-link SET PATH.
  • docs/sql-migration-guide.md — 4.1 → 4.2 entries for the
    builtin.x / session.x resolution change, the new temp-object
    qualifiers, and the opt-in PATH feature.

Why are the changes needed?

The PATH feature (SPARK-56939 and friends) shipped without external
documentation. Users have no published place to learn about SET PATH,
current_path(), the system.builtin / system.session namespaces, or
the path-walking resolution rules; the behavior change for partially
qualified builtin.x / session.x references is also a 4.1 → 4.2
migration concern that needs to be called out.

Does this PR introduce any user-facing change?

No. This change is documentation-only.

How was this patch tested?

  • Markdown lint clean on every touched file.
  • Spot-checked for non-ASCII typographic characters; none introduced.
  • Cross-checked every behavioral claim against the relevant test suites:
    sql-tests/inputs/sql-path.sql, SetPathSuite,
    FunctionQualificationSuite, RelationQualificationSuite,
    SQLFunctionSuite, and DescribeTableSuite
    ("DESCRIBE EXTENDED AS JSON for view shows SQL Path when PATH is
    enabled").

Local Jekyll build was attempted but blocked on Ruby 3 / Bundler 2.4.22
which were not installed in the local environment; relying on the GitHub
docs CI to validate the HTML build.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.7)

srielau added 5 commits May 21, 2026 15:58

### What changes were proposed in this pull request?

This change adds user-facing documentation for the SQL Standard PATH feature
introduced in Spark 4.2 (SPARK-56939 and related): the `SET PATH` statement,
the `current_path()` function, path-based resolution of unqualified routines,
tables, views, and session variables, and the supporting infrastructure
(`system.builtin` / `system.session` namespaces with `builtin.` / `session.`
shortcuts, `spark.sql.path.enabled`, `spark.sql.defaultPath`).

New pages:

- `docs/sql-ref-syntax-aux-conf-mgmt-set-path.md` - reference for the
  `SET PATH` statement, including a dedicated subsection on how
  `DEFAULT_PATH` is derived and how to change it.
- `docs/sql-ref-function-current-path.md` - reference for the
  `current_path()` builtin.

Modified pages:

- `docs/sql-ref-name-resolution.md` - new "SQL Path" section that introduces
  the concept, the `system.builtin` / `system.session` namespaces and their
  2-part shortcuts, the path-walk for unqualified DML / queries vs the
  current-schema rule for DDL, the frozen-path behavior for persistent views
  and SQL UDFs, and the "Reserved names and collisions" subsection. Table /
  view and function resolution sections rewritten accordingly.
- `docs/sql-ref-identifier.md` - new "Reserved system names" table linking
  back to "Reserved names and collisions".
- `docs/sql-ref-syntax-aux-describe-function.md` - examples for SQL UDF
  `Function / Type / Input / Returns` output, qualified builtin lookup
  (`system.builtin.abs`), and the `SQL Path:` row in
  `DESCRIBE FUNCTION EXTENDED`.
- `docs/sql-ref-syntax-aux-describe-table.md` - example for the `SQL Path`
  row in `DESCRIBE EXTENDED` on a view.
- `docs/sql-ref-syntax-ddl-create-view.md`, `create-sql-function.md`,
  `create-function.md` - allow `session` / `system.session` qualifier on
  temporary objects (`INVALID_TEMP_OBJ_QUALIFIER` otherwise), and add
  frozen-path examples.
- `docs/sql-ref-syntax-ddl-drop-view.md`, `drop-function.md` - clarify
  `DROP TEMPORARY FUNCTION` vs `DROP VIEW` semantics and the qualifiers
  accepted in each.
- `docs/sql-ref-syntax-ddl-create-database.md` - note discouraging the
  schema names `session` and `builtin`, with a link to the canonical
  description.
- `docs/sql-ref-syntax-aux-conf-mgmt-set.md`,
  `docs/sql-ref-syntax-aux-conf-mgmt.md`, `docs/sql-ref-syntax.md` -
  cross-link `SET PATH`.
- `docs/sql-migration-guide.md` - 4.1 -> 4.2 entries for the
  `builtin.x` / `session.x` resolution change, the new temp-object
  qualifiers, and the opt-in PATH feature.

### Why are the changes needed?

The PATH feature (SPARK-56939 and friends) shipped without external
documentation. Users have no published place to learn about `SET PATH`,
`current_path()`, the `system.builtin` / `system.session` namespaces, or the
path-walking resolution rules; the behavior change for partially qualified
`builtin.x` / `session.x` references is also a 4.1 -> 4.2 migration concern
that needs to be called out.

### Does this PR introduce _any_ user-facing change?

No. This change is documentation-only.

### How was this patch tested?

- Markdown lint clean on every touched file.
- Spot-checked for non-ASCII typographic characters; none introduced.
- Cross-checked every behavioral claim against the relevant test suites:
  `sql-tests/inputs/sql-path.sql`, `SetPathSuite`, `FunctionQualificationSuite`,
  `RelationQualificationSuite`, `SQLFunctionSuite`, and `DescribeTableSuite`
  ("DESCRIBE EXTENDED AS JSON for view shows SQL Path when PATH is enabled").

Local Jekyll build was attempted but blocked on Ruby 3 / Bundler 2.4.22
which were not installed in the local environment; relying on the GitHub
docs CI to validate the HTML build.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.7)

Consolidate the per-object resolution sections in
`sql-ref-name-resolution.md` into a single `Object name resolution`
section with the three subsections `Fully qualified`,
`Partially qualified`, and `Unqualified`. `Table and view resolution`
and `Function resolution` are now thin sections that list what each
kind of object can resolve to, carry their kind-specific notes
(common table expressions for relations) and errors
(`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`), and keep their
existing examples.

Move the conceptual material on the SQL Path (what it is, the
`system.builtin` / `system.session` namespaces, DML-vs-DDL,
`spark.sql.path.enabled` gating, the initial-value-of-PATH rule, and
the frozen-path semantics for persistent views and SQL UDFs) into
the Description section of `SET PATH`. Make the
`Reserved system names` section on the Identifiers page the
canonical reference for `system` / `session` / `builtin`, with the
mini-path table and the 3-part-bypass rule. Update cross-page links
to point at these new homes.
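
The frozen-path semantics mentioned above can be illustrated with a toy model. This is a hedged sketch, not Spark code: it assumes that "frozen" means the path in effect when a persistent view is created is stored with the view and used whenever the view body is resolved later. `OBJECTS`, `resolve`, `ToyView`, and the `cat.team_a` / `cat.team_b` schemas are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

# Hypothetical fully qualified functions that "exist" in this toy model.
OBJECTS = {"cat.team_a.score", "cat.team_b.score"}


def resolve(name, path):
    """Walk `path` in order and return the first qualified match, or None."""
    for namespace in path:
        candidate = f"{namespace}.{name}"
        if candidate in OBJECTS:
            return candidate
    return None


@dataclass(frozen=True)
class ToyView:
    """A persistent view that remembers the path it was created under."""
    body_function: str   # unqualified function referenced by the view body
    frozen_path: tuple   # copy of the session path at creation time

    def resolve_body(self):
        # The stored path is used, not the caller's current session path.
        return resolve(self.body_function, self.frozen_path)


session_path = ["cat.team_a"]
view = ToyView("score", tuple(session_path))   # path is captured here

session_path = ["cat.team_b"]                  # the session path changes later
print(view.resolve_body())                     # cat.team_a.score
print(resolve("score", session_path))          # cat.team_b.score
```

In this model the view keeps resolving `score` against the schema that was on the path at creation time, while a fresh unqualified reference in the session follows the new path.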

Tighten the prose pass-wide on the rewritten sections: drop
"worked examples", "in particular", "as special cases", "small
two-step", "straightforward", and similar filler; lead the 2-part
case with the common rule (`current_catalog`-prepend) and treat the
`session` / `builtin` mini-path as the exception; remove the bogus
`current_catalog.builtin.x` "special case" bullet from the 3-part
case; make `Frozen SQL Path` an inline note rather than a heading.

No behavior changes; documentation only.

Iterative copy-edit pass on the documentation introduced for the SQL
Path feature, plus a handful of small accuracy fixes uncovered during
review:

- `sql-ref-name-resolution.md`: the page intro and the
  per-object-kind sections are slimmed down. The DML/queries vs DDL
  rule is folded into the `Unqualified (1 part)` paragraph.
  Per-object error references (`TABLE_OR_VIEW_NOT_FOUND` /
  `UNRESOLVED_ROUTINE`) live with the corresponding sections.
- `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the Description is
  reworked into prose ("the initial value of PATH is DEFAULT_PATH ...");
  the Syntax block now follows the rest of the Spark SQL reference
  (square brackets, alternation with braces, ellipsis for repetition,
  production name on its own line); the grammar commits only to
  two-level `catalog.schema` references; the
  `spark.sql.functionResolution.sessionOrder` configuration is
  de-emphasized; `RESET PATH` wording is dropped in favor of describing
  the actual revert mechanism (`SET PATH = DEFAULT_PATH`); two applied
  examples are added (appending a shared-UDF schema; dropping
  `system.session` to force explicit qualification).
- `sql-ref-function-current-path.md`: the Syntax block matches the
  auto-generated style (`current_path()`), with the no-parens form
  noted briefly under Arguments. The "still works when disabled"
  disclaimer and per-page `spark.sql.path.enabled` toggle are removed.
- `sql-ref-identifier.md`: the *Reserved system names* section is
  rewritten to describe the actual tie-breaker behavior (per-object
  name collisions, not whole-schema hiding) and to use real catalog
  names in examples instead of the meta-syntactic `current_catalog.x`.
- `sql-ref-functions-builtin.md`: a one-paragraph intro explains that
  built-in functions live in `system.builtin` and can be referenced
  unambiguously via the fully qualified name.
- `sql-ref-syntax-aux-describe-table.md`: the JSON schema gains a
  `sql_path` entry; the worked view example is generic (the SQL Path
  row appears in the output but is no longer the headline); the
  `DESC FORMATTED ... AS JSON` outputs are pretty-printed.
- `sql-ref-syntax-ddl-create-view.md` /
  `sql-ref-syntax-ddl-create-sql-function.md`: the frozen-path note
  moves from the object-name parameter (where it didn't belong) to the
  query/expression parameter. Both pages now state that a persistent
  view / SQL UDF cannot reference temporary views, temporary functions,
  or session variables.
- `sql-ref-syntax-ddl-drop-function.md` /
  `sql-ref-syntax-ddl-drop-view.md`: the parameter prose is shortened;
  DROP VIEW gains a worked example of using `session.v` to drop a
  temporary view that shadows a persistent one. The stale
  AnalysisException output is replaced with `Error: TABLE_OR_VIEW_NOT_FOUND`.
- `sql-ref-syntax-ddl-create-database.md`: the discouraged-name note
  shrinks to a one-line pointer to *Reserved system names*.
- `sql-ref-syntax.md`: TOC links `CREATE FUNCTION (SQL)` explicitly.
- `sql-migration-guide.md`: the 4.1 -> 4.2 entry uses
  `spark_catalog.session.x` (a real catalog name) instead of the
  meta-syntactic `current_catalog.session.x`.

No new behavior; documentation copy and accuracy only.

Four small accuracy nits caught in a self-review pass:

1. SET PATH grammar: removed the self-referential `schema_name`
   production by inlining `catalog_name . schema_name` directly into
   `path_element`. The previous form defined `schema_name` recursively
   with itself, which is hard to read literally.

2. `SYSTEM_PATH` parameter: dropped the explicit ordering "expands to
   `system.builtin, system.session`". The actual order depends on
   `spark.sql.functionResolution.sessionOrder`, which the rest of the
   page de-emphasizes. Now reads "Expands to the two system namespaces,
   `system.builtin` and `system.session`."

3. SET PATH Description: "To revert mid-session, run
   `SET PATH = DEFAULT_PATH`" overstated the operation. The statement
   stores a snapshot of `DEFAULT_PATH` into `_sessionPath` rather than
   restoring the "never-set" state, so a later change to
   `spark.sql.defaultPath` is not picked up. Reworded to "re-apply the
   current default path mid-session" with a brief parenthetical that
   names the snapshot behavior.

4. Reserved system names: "spark.sql.catalog.system = ... is
   unsupported" was correct but suggested a clean rejection. In fact
   the v2 catalog loader does not special-case `system` and registering
   it gives undefined results, per the CatalogManager comment. Now
   reads "is unsupported and may yield undefined results".
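
The snapshot behavior in item 3 can be sketched as a toy model. This is not Spark's implementation: `ToySession` and its members are hypothetical, the model treats `DEFAULT_PATH` as equal to the configured value (a simplification), and the schema names are made up. It also assumes, as the item implies, that a session whose path was never set keeps following the configuration.

```python
class ToySession:
    """Toy stand-in for a session with a default path and a session path."""

    def __init__(self, default_path):
        # Stands in for the spark.sql.defaultPath configuration.
        self.default_path_conf = list(default_path)
        # None models the "never-set" state of the session path.
        self._session_path = None

    def set_path_to_default(self):
        # Models SET PATH = DEFAULT_PATH: copy the current default right now.
        self._session_path = list(self.default_path_conf)

    def current_path(self):
        # A never-set session follows the configuration; a set session does not.
        if self._session_path is None:
            return list(self.default_path_conf)
        return list(self._session_path)


s = ToySession(["system.builtin", "cat.default"])

# Never-set state: a configuration change is picked up.
s.default_path_conf = ["system.builtin", "cat.shared"]
print(s.current_path())   # ['system.builtin', 'cat.shared']

# After re-applying the default, the stored snapshot no longer tracks the conf.
s.set_path_to_default()
s.default_path_conf = ["system.builtin", "cat.other"]
print(s.current_path())   # ['system.builtin', 'cat.shared']
```

The second output shows why "revert" overstated the operation: the session holds a copy of the default taken at statement time rather than returning to the never-set state.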

Documentation only; no behavior changes.

Plain-English copy edit on the SQL Path documentation. No content
changes; word choices favor common terms over technical jargon and
remove minor stylistic inconsistencies.

- `synthetic` -> `virtual` for the `system` catalog and namespaces
  (matches usage elsewhere in the doc set).
- `is gated by` -> `is controlled by` for `spark.sql.path.enabled`.
- `how it is gated` -> `how to enable it`.
- `A SET PATH is scoped` -> `The effect of SET PATH is scoped`
  (avoids the awkward indefinite-article noun).
- `(cycle break) rather than recursing` -> `to avoid a cycle, rather
  than recursing`.
- `live marker` -> `re-evaluated each time` in a code comment.
- `spelled out` -> `qualified explicitly` in a code comment.
- `flip the preference` -> `reverse the preference`.
- `may yield undefined results` -> `produces undefined results`.
- `literally named X` -> `named X` (drop the redundant adverb).
- `extension-injected functions` ->
  `functions injected through SparkSessionExtensions` in the migration
  guide.

Documentation only; no behavior changes.

Contributor

@cloud-fan cloud-fan left a comment

Summary

Thanks for the comprehensive SQL Path docs — the new SET PATH and current_path() pages plus the restructured name-resolution page give the feature a coherent reference surface. A few accuracy and consistency items below; the one I'd most like to see addressed is the broken bullet list in create-sql-function.md (item 1) — the new paragraphs land in the middle of the disallowed-expressions list and orphan the Row producing functions bullet from its siblings, which will render as two unrelated lists.

Other items are accuracy gaps (the SET PATH schema_name parameter under-documents 3+ part namespaces that the grammar actually accepts; migration guide omits session variables from the PATH feature description; the abs shadowing example doesn't quite demonstrate the rule it captions; same cust_id column shows two different type spellings in describe-table) plus small consistency fixes.

Comment thread docs/sql-ref-syntax-ddl-create-sql-function.md Outdated
Comment thread docs/sql-ref-syntax-aux-conf-mgmt-set-path.md Outdated
Comment thread docs/sql-migration-guide.md Outdated
Comment thread docs/sql-ref-function-current-path.md Outdated
Comment thread docs/sql-ref-name-resolution.md
Comment thread docs/sql-ref-syntax-aux-describe-table.md
Comment thread docs/sql-ref-name-resolution.md Outdated

Seven items from the PR review:

1. `sql-ref-syntax-ddl-create-sql-function.md`: the new frozen-path
   paragraphs were inserted inside the bulleted list of disallowed
   expression types, orphaning the `Row producing functions such as
   explode` bullet (Kramdown rendered it as two separate lists with
   body paragraphs in between). Moved the paragraphs after the list.

2. `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the `schema_name`
   parameter previously said "Both parts are required" (i.e. exactly
   two), but the implementation accepts multi-level namespaces
   (`SetPathSuite` test "multi-level namespace (3+ parts) is
   accepted", and the `INVALID_SQL_PATH_SCHEMA_REFERENCE` error
   message itself documents the allowance). Updated to "`catalog.schema`
   or, for catalogs with multi-level namespaces, `catalog.ns1.ns2...`.
   At least two parts are required." The grammar block now reads
   `catalog_name . namespace [ . namespace ... ]`.

3. `sql-migration-guide.md`: the PATH-feature bullet omitted session
   variables (a documented PATH consumer with a dedicated test) and
   opened with "Spark 4.2 introduces..." while every other bullet in
   the section opens with "Since Spark 4.2,". Both fixed.

4. `sql-ref-function-current-path.md`: a stray "persisted view" in a
   code comment; the rest of the PR uses "persistent view". Fixed.

5. `sql-ref-identifier.md`: the canonical Reserved system names
   section now introduces the term "mini-path" in prose so that
   cross-page link text from `name-resolution.md` lands somewhere
   that defines it.

6. `sql-ref-syntax-aux-describe-table.md`: the same `cust_id` column
   appeared as `{"name": "int"}` in the view example and
   `{"name": "integer"}` in the legacy `customer` example. The
   doc's own JSON schema block specifies `int` for `IntegerType`,
   so the legacy example was wrong; aligned both to `int`.

7. `sql-ref-name-resolution.md`: the `abs` shadowing example created
   a 0-arg temp `abs()` and then called `abs(-5)` (one arg), which
   was a signature mismatch rather than a shadow. Rewrote with a
   matching `abs(x INT)` temp and an explicit `SET PATH =
   system.session, system.builtin, spark_catalog.default` so the
   unqualified `abs(-5)` actually resolves to the temp; the example
   then demonstrates `system.builtin.abs(-5)` reaching around the
   shadow.
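
The corrected shadowing example in item 7 can be modeled outside Spark. The following is a toy Python sketch, not Spark's resolver: the `OBJECTS` set, the `resolve` helper, and its first-match-wins walk are assumptions made for illustration. Only the path order and the qualified `system.builtin.abs` lookup come from the item above, and the sketch deliberately handles just 1-part and 3-part names.

```python
# Toy model of path-based function resolution; not Spark's implementation.
# OBJECTS and resolve() are hypothetical names used only for illustration.

# Fully qualified routines that "exist" in this model.
OBJECTS = {
    "system.builtin.abs",   # the built-in abs
    "system.session.abs",   # a temporary abs(x INT) that shadows it
}

# Path order taken from the rewritten example in item 7.
PATH = ["system.session", "system.builtin", "spark_catalog.default"]


def resolve(name, path=PATH):
    """Return the fully qualified name that `name` resolves to, or None."""
    parts = name.split(".")
    if len(parts) == 3:
        # Fully qualified reference: looked up directly, the path is not walked.
        return name if name in OBJECTS else None
    if len(parts) != 1:
        raise ValueError("this sketch only handles 1-part and 3-part names")
    # Unqualified reference: walk the path, first match wins (assumed rule).
    for namespace in path:
        candidate = f"{namespace}.{name}"
        if candidate in OBJECTS:
            return candidate
    return None


print(resolve("abs"))                 # system.session.abs
print(resolve("system.builtin.abs"))  # system.builtin.abs
```

With `system.session` listed first, the unqualified name lands on the temporary function, and the fully qualified name reaches around the shadow, which is the behavior the rewritten example is meant to show.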

Documentation only; no behavior changes.