From abfc56c4eea3d78489b8e7ee70530b21bbedbef5 Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Thu, 21 May 2026 15:58:37 +0200 Subject: [PATCH 1/6] [SPARK-56984][DOCS] Document the SQL PATH feature ### What changes were proposed in this pull request? This change adds user-facing documentation for the SQL Standard PATH feature introduced in Spark 4.2 (SPARK-56939 and related): the `SET PATH` statement, the `current_path()` function, path-based resolution of unqualified routines, tables, views, and session variables, and the supporting infrastructure (`system.builtin` / `system.session` namespaces with `builtin.` / `session.` shortcuts, `spark.sql.path.enabled`, `spark.sql.defaultPath`). New pages: - `docs/sql-ref-syntax-aux-conf-mgmt-set-path.md` - reference for the `SET PATH` statement, including a dedicated subsection on how `DEFAULT_PATH` is derived and how to change it. - `docs/sql-ref-function-current-path.md` - reference for the `current_path()` builtin. Modified pages: - `docs/sql-ref-name-resolution.md` - new "SQL Path" section that introduces the concept, the `system.builtin` / `system.session` namespaces and their 2-part shortcuts, the path-walk for unqualified DML / queries vs the current-schema rule for DDL, the frozen-path behavior for persistent views and SQL UDFs, and the "Reserved names and collisions" subsection. Table / view and function resolution sections rewritten accordingly. - `docs/sql-ref-identifier.md` - new "Reserved system names" table linking back to "Reserved names and collisions". - `docs/sql-ref-syntax-aux-describe-function.md` - examples for SQL UDF `Function / Type / Input / Returns` output, qualified builtin lookup (`system.builtin.abs`), and the `SQL Path:` row in `DESCRIBE FUNCTION EXTENDED`. - `docs/sql-ref-syntax-aux-describe-table.md` - example for the `SQL Path` row in `DESCRIBE EXTENDED` on a view. 
- `docs/sql-ref-syntax-ddl-create-view.md`, `create-sql-function.md`, `create-function.md` - allow `session` / `system.session` qualifier on temporary objects (`INVALID_TEMP_OBJ_QUALIFIER` otherwise), and add frozen-path examples. - `docs/sql-ref-syntax-ddl-drop-view.md`, `drop-function.md` - clarify `DROP TEMPORARY FUNCTION` vs `DROP VIEW` semantics and the qualifiers accepted in each. - `docs/sql-ref-syntax-ddl-create-database.md` - note discouraging the schema names `session` and `builtin`, with a link to the canonical description. - `docs/sql-ref-syntax-aux-conf-mgmt-set.md`, `docs/sql-ref-syntax-aux-conf-mgmt.md`, `docs/sql-ref-syntax.md` - cross-link `SET PATH`. - `docs/sql-migration-guide.md` - 4.1 -> 4.2 entries for the `builtin.x` / `session.x` resolution change, the new temp-object qualifiers, and the opt-in PATH feature. ### Why are the changes needed? The PATH feature (SPARK-56939 and friends) shipped without external documentation. Users have no published place to learn about `SET PATH`, `current_path()`, the `system.builtin` / `system.session` namespaces, or the path-walking resolution rules; the behavior change for partially qualified `builtin.x` / `session.x` references is also a 4.1 -> 4.2 migration concern that needs to be called out. ### Does this PR introduce _any_ user-facing change? No. This change is documentation-only. ### How was this patch tested? - Markdown lint clean on every touched file. - Spot-checked for non-ASCII typographic characters; none introduced. - Cross-checked every behavioral claim against the relevant test suites: `sql-tests/inputs/sql-path.sql`, `SetPathSuite`, `FunctionQualificationSuite`, `RelationQualificationSuite`, `SQLFunctionSuite`, and `DescribeTableSuite` ("DESCRIBE EXTENDED AS JSON for view shows SQL Path when PATH is enabled"). Local Jekyll build was attempted but blocked on Ruby 3 / Bundler 2.4.22 which were not installed in the local environment; relying on the GitHub docs CI to validate the HTML build. 
### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Opus 4.7) --- docs/sql-migration-guide.md | 3 + docs/sql-ref-function-current-path.md | 99 ++++++++ docs/sql-ref-identifier.md | 13 + docs/sql-ref-name-resolution.md | 228 +++++++++++++++-- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 230 ++++++++++++++++++ docs/sql-ref-syntax-aux-conf-mgmt-set.md | 3 + docs/sql-ref-syntax-aux-conf-mgmt.md | 1 + docs/sql-ref-syntax-aux-describe-function.md | 90 ++++++- docs/sql-ref-syntax-aux-describe-table.md | 29 +++ docs/sql-ref-syntax-ddl-create-database.md | 7 + docs/sql-ref-syntax-ddl-create-function.md | 19 +- .../sql-ref-syntax-ddl-create-sql-function.md | 99 +++++++- docs/sql-ref-syntax-ddl-create-view.md | 99 +++++++- docs/sql-ref-syntax-ddl-drop-function.md | 25 +- docs/sql-ref-syntax-ddl-drop-view.md | 21 +- docs/sql-ref-syntax.md | 1 + 16 files changed, 914 insertions(+), 53 deletions(-) create mode 100644 docs/sql-ref-function-current-path.md create mode 100644 docs/sql-ref-syntax-aux-conf-mgmt-set-path.md diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 66531397d2cc1..7324f3128a018 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -31,6 +31,9 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. 
- Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. +- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. Both also accept the 2-part shortcuts `builtin.x` and `session.x`. As a result, `builtin.func()` and `session.func()` now resolve to the synthetic system namespaces before any persistent schema literally named `builtin` or `session`. To restore the previous behavior (persistent schema first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). +- Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. 
+- Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [Name Resolution](sql-ref-name-resolution.html#sql-path) and [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). ## Upgrading from Spark SQL 4.0 to 4.1 diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md new file mode 100644 index 0000000000000..24be3e7ec22a0 --- /dev/null +++ b/docs/sql-ref-function-current-path.md @@ -0,0 +1,99 @@ +--- +layout: global +title: current_path function +displayTitle: current_path function +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Returns the effective SQL Path for the current session as a comma-separated string of +qualified namespace names. 
See [Name Resolution](sql-ref-name-resolution.html#sql-path) for a +description of how the path drives unqualified name resolution and +[`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html) for how to change it. + +### Syntax + +```sql +current_path() + +CURRENT_PATH +``` + +Like `current_user`, `current_schema`, and `current_catalog`, `current_path` accepts an empty +argument list or no parentheses at all. + +### Arguments + +This function takes no arguments. + +### Returns + +A non-nullable `STRING`. Each path entry is written as a dotted name with backticks added only +where required by Spark's identifier rules. Entries are separated by a single comma. + +When the path contains the virtual `CURRENT_SCHEMA` marker, the marker is materialized as the +catalog-qualified current schema (`current_catalog.current_schema`) each time +`current_path()` is evaluated, so subsequent `USE SCHEMA` statements are reflected without +re-issuing `SET PATH`. + +`current_path()` is a regular built-in function. It remains available, and returns the default +path, even when `spark.sql.path.enabled` is `false`. + +### Examples + +```sql +> SET spark.sql.path.enabled = true; + +> SELECT current_path(); + system.builtin,system.session,spark_catalog.default + +-- ANSI no-parens form returns the same value. +> SELECT CURRENT_PATH; + system.builtin,system.session,spark_catalog.default + +-- The output reflects the latest SET PATH. +> SET PATH = spark_catalog.default, system.builtin; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- CURRENT_SCHEMA on the path is re-evaluated on every call. +> SET PATH = CURRENT_SCHEMA, system.builtin; +> USE spark_catalog.finance; +> SELECT current_path(); + spark_catalog.finance,system.builtin +> USE spark_catalog.default; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- Inside a persisted view or SQL function body, current_path() returns the invoker's path, +-- not the frozen path captured at creation time. 
+> SET PATH = spark_catalog.default, system.builtin; +> CREATE VIEW v_path AS SELECT current_path() AS p; +> SET PATH = spark_catalog.other, system.builtin; +> SELECT * FROM v_path; + spark_catalog.other,system.builtin + +-- current_path() still returns the default path when SET PATH is disabled. +> SET spark.sql.path.enabled = false; +> SELECT current_path(); + system.builtin,system.session,spark_catalog.default +``` + +### Related Statements + +* [Name Resolution](sql-ref-name-resolution.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Built-in Functions](sql-ref-functions-builtin.html) diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index 7aca08ea9fd8d..0c768a212f431 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -52,6 +52,19 @@ An identifier is a string used to identify a database object such as a table, vi Any character from the character set. Use ` to escape special characters (e.g., `). +### Reserved system names + +A few names have a special meaning in object identifiers and should be avoided as user-defined +names. They are interpreted by Spark's [SQL Path](sql-ref-name-resolution.html#sql-path) and are +documented in detail under +[Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). + +| Name | Position | Notes | +| :--- | :------- | :---- | +| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Registering a v2 catalog under this name (`spark.sql.catalog.system = ...`) is not supported. | +| `builtin` | schema in any catalog | The 2-part form `builtin.x` is interpreted as the synonym `system.builtin.x`. A persistent schema literally named `builtin` is allowed but discouraged; reach it as `current_catalog.builtin.x`. | +| `session` | schema in any catalog | The 2-part form `session.x` is interpreted as the synonym `system.session.x`. 
A persistent schema literally named `session` is allowed but discouraged; reach it as `current_catalog.session.x`. | + ### Examples ```sql diff --git a/docs/sql-ref-name-resolution.md b/docs/sql-ref-name-resolution.md index 2532f05e164b3..1a3cb3ac5017c 100644 --- a/docs/sql-ref-name-resolution.md +++ b/docs/sql-ref-name-resolution.md @@ -19,7 +19,101 @@ license: | limitations under the License. --- -Name resolution is the process by which [identifiers](sql-ref-identifier.html) are resolved to specific column-, field-, parameter-, or table-references. +Name resolution is the process by which [identifiers](sql-ref-identifier.html) are resolved to specific column-, field-, parameter-, table-, function-, or variable-references. + +## SQL Path + +For unqualified references to functions, tables, views, and session variables Spark walks +an ordered list of namespaces known as the **SQL Path**. The first match along the path wins. + +The path is a list of catalog-qualified schema names. In addition to ordinary persistent schemas it +can refer to two virtual system namespaces: + +- `system.builtin` — the set of built-in functions provided by Spark (such as `abs`, `concat`, + `current_user`, `current_path`, ...). Includes functions injected by `SparkSessionExtensions`. +- `system.session` — the per-session namespace that holds temporary views, temporary functions, + and session variables created in the current session. + +Both system namespaces are special: they cannot be created or dropped, and persistent objects with +these names live in different (`spark_catalog`-qualified) schemas. The 2-part shortcuts +`builtin.name` and `session.name` are accepted as synonyms for `system.builtin.name` and +`system.session.name`. + +The path is observable through the [`current_path()`](sql-ref-function-current-path.html) function. + +### Enabling and setting the path + +The `SET PATH` statement is gated by the `spark.sql.path.enabled` configuration (default `false`). 
+When `false`, `SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`, but unqualified +resolution still walks a fixed default path and `current_path()` still returns it. + +When `spark.sql.path.enabled` is `true`, you can change the path with +[`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html), for example: + +```sql +SET PATH = spark_catalog.analytics, spark_catalog.default, system.builtin; +``` + +If `SET PATH` has not been issued in the session, the effective path is the **default path**, +which is either taken from the `spark.sql.defaultPath` configuration (when set) or composed +automatically from `system.builtin`, `system.session`, and the current schema. The same +`DEFAULT_PATH` value is what `SET PATH = DEFAULT_PATH` expands to. See +[How `DEFAULT_PATH` is derived](sql-ref-syntax-aux-conf-mgmt-set-path.html#how-default_path-is-derived) +for the full derivation and how to change it. + +Inside `SET PATH` the following shortcut tokens are accepted: + +| Token | Expands to | +| :---- | :--------- | +| `DEFAULT_PATH` | The default path described above. | +| `SYSTEM_PATH` | `system.builtin` and `system.session`, in the configured order. | +| `PATH` | The current value of the path (useful when appending). | +| `CURRENT_SCHEMA` / `CURRENT_DATABASE` | A virtual marker that resolves to the current schema (`current_catalog.current_schema`) every time the path is consulted. | + +### When the path is consulted + +The path participates only in **DML** (`SELECT`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`, ...) and in +query expressions inside DDL bodies. DDL itself — `CREATE TABLE`, `CREATE VIEW`, +`CREATE FUNCTION`, `DROP ...`, `ALTER ...`, etc. — resolves unqualified object names against the +current catalog and schema (`current_catalog.current_schema`), not the path. This is so that +`CREATE TABLE t` always creates `t` in the current schema regardless of how PATH is set. 
+ +When you create a persistent view or a SQL UDF, Spark captures the effective path at creation time +into the object's metadata. Each time the view or function is invoked its body resolves against +that **frozen path**, not the invoker's current path. Invocations of `current_schema()` and +`current_path()` inside the body still reflect the invoker's context. + +### Reserved names and collisions + +The SQL Path feature relies on three names being treated specially: + +- **`system`** as a catalog. Spark's `system` catalog is a synthetic namespace; it serves the + `system.builtin` and `system.session` entries and is not loadable as a v2 catalog plugin. The + current catalog cannot be `system` and `SET PATH` does not look up `system` through the v2 + catalog API. Registering a v2 catalog under the name `system` + (`spark.sql.catalog.system = ...`) is therefore not supported. + +- **`session`** as a schema name in any catalog. Persistent schemas literally named `session` are + allowed by the catalog API but are discouraged: the unqualified 2-part form `session.x` is + interpreted as the synonym `system.session.x` (a temporary object) by default. To target a + persistent schema called `session`, qualify it with the catalog name + (`spark_catalog.session.x`). + +- **`builtin`** as a schema name in any catalog. Persistent schemas literally named `builtin` are + similarly allowed but discouraged: the unqualified 2-part form `builtin.x` is interpreted as the + synonym `system.builtin.x`. To target a persistent schema called `builtin`, qualify it with the + catalog name (`spark_catalog.builtin.x`). + +These collisions matter only for 2-part names; 1-part lookups always go through the SQL Path, and +3-part names are never ambiguous. + +Two internal configurations let advanced users tune the behavior when collisions exist; ordinary +workloads should not need to change them. 
+ +| Configuration | Purpose | +| :------------ | :------ | +| `spark.sql.legacy.persistentCatalogFirst` | When `true`, the legacy lookup order is used for partially qualified `builtin.x` and `session.x`: the persistent catalog (e.g. `spark_catalog.builtin.x`) is tried first, and only if it does not yield a match does Spark fall back to the synthetic `system.builtin.x` / `system.session.x`. Default `false` (system namespace wins). | +| `spark.sql.functionResolution.sessionOrder` | Controls where the per-session `system.session` namespace sits relative to `system.builtin` and the current persistent schema when assembling the default path. Values: `first` (session, builtin, persistent), `second` (builtin, session, persistent — default), `last` (builtin, persistent, session). Affects both `DEFAULT_PATH` expansion and the unqualified search path reported in `UNRESOLVED_ROUTINE` / `TABLE_OR_VIEW_NOT_FOUND` errors. | ## Column, field, parameter, and variable resolution @@ -137,7 +231,10 @@ In detail, resolution of identifiers to a specific reference follows these rules 1. **Session Variables** - 1. Match the identifier to a variable name. If the identifier is qualified, the qualifier must be `session` or `system.session`. + 1. Match the identifier to a session variable name. + If the identifier is qualified, the qualifier must be `session` or `system.session`. + If the identifier is unqualified, `system.session` must be present on the SQL Path + (the default path includes it). 1. If the identifier is qualified, match to a field or map key of a variable following rule 1.c ### Limitations @@ -258,7 +355,7 @@ This restriction also applies to parameter references in SQL functions. 
## Table and view resolution -An identifier in table-reference can be any one of the following: +An identifier in a table reference can be any of the following: - Persistent table or view - Common table expression (CTE) @@ -270,23 +367,37 @@ Resolution of an identifier depends on whether it is qualified: If the identifier is fully qualified with three parts: `catalog.schema.relation`, it is unique. - If the identifier consists of two parts: `schema.relation`, it is further qualified with the result of `SELECT current_catalog()` to make it unique. + If the identifier consists of two parts: `schema.relation`, it is further qualified with the + result of `SELECT current_catalog()` to make it unique. + As a special case, the schema `session` is implicitly qualified with the catalog `system` and + interpreted as a temporary view. + + If the identifier is `system.session.relation`, it targets the temporary view scope only. - **Unqualified** 1. **Common table expression** - If the reference is within the scope of a `WITH` clause, match the identifier to a CTE starting with the immediately containing `WITH` clause and moving outwards from there. + If the reference is within the scope of a `WITH` clause, match the identifier to a CTE + starting with the immediately containing `WITH` clause and moving outwards from there. + + 1. **SQL Path walk** - 1. **Temporary view** + For each entry on the [SQL Path](#sql-path) in order: - Match the identifier to any temporary view defined within the current session. + - When the entry is `system.session`, attempt to match the identifier as a temporary view. + - Otherwise, fully qualify the identifier with the entry (`catalog.schema.relation`) and look + it up as a persistent relation. - 1. **Persisted table** + The first match wins. If no entry yields a match, the relation is unresolved. 
- Fully qualify the identifier by pre-pending the result of `SELECT current_catalog()` and `SELECT current_schema()` and look it up as a persistent relation. +If the relation cannot be resolved to any table, view, or CTE, Spark raises a +`TABLE_OR_VIEW_NOT_FOUND` error. The error includes the effective search path, for example +`searchPath = [system.builtin, system.session, spark_catalog.default]`. -If the relation cannot be resolved to any table, view, or CTE, Databricks raises a TABLE_OR_VIEW_NOT_FOUND error. +> Note: persistent views capture their creation-time SQL Path. When a persistent view is +> referenced, the body resolves against the frozen path rather than the invoker's current path; +> see [SQL Path](#sql-path). ### Examples @@ -317,7 +428,13 @@ If the relation cannot be resolved to any table, view, or CTE, Databricks raises > SELECT c1 FROM rel; 2 --- Temporary views cannot be qualified, so qualifiecation resolved to the table: +-- A temporary view can be qualified with `session` or `system.session`: +> SELECT c1 FROM session.rel; + 2 +> SELECT c1 FROM system.session.rel; + 2 + +-- Other 2-part qualifications resolve to the persisted table: > SELECT c1 FROM default.rel; 1 @@ -343,6 +460,25 @@ If the relation cannot be resolved to any table, view, or CTE, Databricks raises SELECT 1), cte; [TABLE_OR_VIEW_NOT_FOUND] The table or view `cte` cannot be found. + +-- PATH drives unqualified relation lookup order +> SET spark.sql.path.enabled = true; +> CREATE SCHEMA db_a; +> CREATE SCHEMA db_b; +> CREATE TABLE db_a.t USING parquet AS SELECT 1 AS v; +> CREATE TABLE db_b.t USING parquet AS SELECT 2 AS v; + +> SET PATH = spark_catalog.db_a, spark_catalog.db_b, system.builtin; +> SELECT v FROM t; + 1 + +> SET PATH = spark_catalog.db_b, spark_catalog.db_a, system.builtin; +> SELECT v FROM t; + 2 + +-- Three-part `system.session.x` references the temporary scope only: +> SELECT * FROM system.session.no_such_view; + [TABLE_OR_VIEW_NOT_FOUND] ... 
`system`.`session`.`no_such_view` ... ``` ## Function resolution @@ -351,9 +487,10 @@ A function reference is recognized by the mandatory trailing set of parentheses. It can resolve to: -- A builtin function provided by Spark, -- A temporary user defined function scoped to the current session, or -- A persistent user defined function. +- A built-in function provided by Spark or a `SparkSessionExtensions` injection + (`system.builtin`), +- A temporary user-defined function scoped to the current session (`system.session`), or +- A persistent user-defined function stored in a catalog schema. Resolution of a function name depends on whether it is qualified: @@ -361,27 +498,34 @@ Resolution of a function name depends on whether it is qualified: If the name is fully qualified with three parts: `catalog.schema.function`, it is unique. - If the name consists of two parts: `schema.function`, it is further qualified with the result of `SELECT current_catalog()` to make it unique. + If the name consists of two parts: `schema.function`, it is further qualified with the result of + `SELECT current_catalog()` to make it unique. As a special case, the 2-part shortcuts + `builtin.function` and `session.function` are accepted as synonyms for `system.builtin.function` + and `system.session.function` respectively. - The function is then looked up in the catalog. + The function is then looked up in the resulting namespace. - **Unqualified** - For unqualified function names Spark follows a fixed order of precedence (`PATH`): - - 1. **Builtin function** + For unqualified function names Spark walks the [SQL Path](#sql-path) and returns the first match: - If a function by this name exists among the set of built-in functions, that function is chosen. + 1. For each entry on the path in order: - 1. **Temporary function** + - When the entry is `system.builtin`, attempt to match against the set of built-in functions. 
+ - When the entry is `system.session`, attempt to match against temporary functions. + - Otherwise, fully qualify the function name with the entry (`catalog.schema.function`) and + look it up as a persistent function. - If a function by this name exists among the set of temporary functions, that function is chosen. + 2. The first match wins. If no entry yields a match, the function is unresolved. - 1. **Persisted function** +If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. The error includes +the effective search path, for example +`searchPath = [system.builtin, system.session, spark_catalog.default]`. - Fully qualify the function name by pre-pending the result of `SELECT current_catalog()` and `SELECT current_schema()` and look it up as a persistent function. - -If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. +> Note: SQL user-defined functions (`CREATE FUNCTION`) capture their creation-time SQL Path. When +> the function is invoked, the body resolves against the frozen path rather than the invoker's +> current path; see [SQL Path](#sql-path). Inside the body, `current_schema()` and +> `current_path()` still reflect the invoker's context. ### Examples @@ -420,4 +564,36 @@ If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. 
-- To resolve the persistent function it now needs qualification > SELECT spark_catalog.default.func(4, 3); 6 + +-- A built-in can always be reached by qualification, even when shadowed +> CREATE TEMPORARY FUNCTION abs() RETURNS INT RETURN 999; +> SELECT abs(-5); + 5 +> SELECT session.abs(); + 999 +> SELECT builtin.abs(-5); + 5 +> SELECT system.builtin.abs(-5); + 5 +> DROP TEMPORARY FUNCTION abs; + +-- PATH controls unqualified routine lookup order +> SET spark.sql.path.enabled = true; +> CREATE SCHEMA path_a; +> CREATE SCHEMA path_b; +> CREATE FUNCTION path_a.pick() RETURNS INT RETURN 10; +> CREATE FUNCTION path_b.pick() RETURNS INT RETURN 20; + +> SET PATH = spark_catalog.path_a, spark_catalog.path_b, system.builtin; +> SELECT pick(); + 10 + +> SET PATH = spark_catalog.path_b, spark_catalog.path_a, system.builtin; +> SELECT pick(); + 20 + +-- Unresolved routine lists the effective search path +> SET PATH = spark_catalog.default, system.builtin; +> SELECT does_not_exist(); + [UNRESOLVED_ROUTINE] ... searchPath: [`spark_catalog`.`default`, `system`.`builtin`] ... ``` diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md new file mode 100644 index 0000000000000..68e873eef80b5 --- /dev/null +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -0,0 +1,230 @@ +--- +layout: global +title: SET PATH +displayTitle: SET PATH +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +### Description + +`SET PATH` changes the **SQL Path** of the current session. The SQL Path is the ordered list of +namespaces Spark walks when resolving unqualified references to functions, tables, views, and +session variables. The first match along the path wins. See +[Name Resolution](sql-ref-name-resolution.html#sql-path) for the conceptual overview. + +The path is observable through the [`current_path()`](sql-ref-function-current-path.html) function. + +`SET PATH` is gated by `spark.sql.path.enabled`. When that configuration is `false` (the default), +`SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution still uses +a fixed default path, and `current_path()` still returns it. + +### Syntax + +```sql +SET PATH = path_element [, path_element ...] + +path_element + : DEFAULT_PATH + | SYSTEM_PATH + | PATH + | CURRENT_SCHEMA + | CURRENT_DATABASE + | schema_name + +schema_name + : { catalog_name . namespace_name [ . namespace_name ...] } +``` + +### Parameters + +* **`DEFAULT_PATH`** + + Expands to the value of the `spark.sql.defaultPath` configuration, or, when that configuration + is empty, to the spark-built-in default: `system.builtin`, `system.session`, and the current + schema, interleaved per `spark.sql.functionResolution.sessionOrder` + (`first` / `second` (default) / `last`). + +* **`SYSTEM_PATH`** + + Expands to the two system namespaces `system.builtin` and `system.session`, in the order + configured by `spark.sql.functionResolution.sessionOrder`. + +* **`PATH`** + + Expands to the **current** value of the SQL Path. 
Useful for appending entries without + re-typing them, for example `SET PATH = PATH, spark_catalog.analytics`. + `PATH` is not allowed in the value of `spark.sql.defaultPath` (it would create a cycle). + +* **`CURRENT_SCHEMA`** / **`CURRENT_DATABASE`** + + A virtual marker that resolves to the catalog-qualified current schema + (`current_catalog.current_schema`) every time the path is consulted. This means subsequent + `USE SCHEMA` statements are picked up without re-issuing `SET PATH`. + `CURRENT_DATABASE` is a synonym for `CURRENT_SCHEMA`. + +* **`schema_name`** + + An explicit catalog-qualified schema reference. At least two parts are required + (`catalog.namespace`). The catalog and schema do not need to exist at the time of `SET PATH`; + non-existent entries are silently skipped during name resolution. + + Identifier quoting follows the usual rules; backtick-quoted parts that contain a dot are + preserved, for example + `` spark_catalog.`sch.b` ``. + +### Notes + +* Duplicate entries are detected after expansion and raise `DUPLICATE_SQL_PATH_ENTRY`. + Comparisons honor the session's case sensitivity setting. +* `CURRENT_SCHEMA` and `CURRENT_DATABASE` are aliases and are flagged as a duplicate if both are + listed. +* Identifier case is preserved in storage and in `current_path()` output. +* Setting the path takes effect immediately. The change is scoped to the current session and is + reset by `RESET` or by closing the session. Cloned sessions inherit the parent's path + at clone time, but later changes in a child session do not propagate back. + +#### How `DEFAULT_PATH` is derived + +`DEFAULT_PATH` is what `SET PATH = DEFAULT_PATH` expands to, and it is also the effective path of +a session that has never issued `SET PATH`. It has two layers: + +1.
If `spark.sql.defaultPath` is set to a non-empty value, that value is parsed using the same + grammar as `SET PATH` (with one restriction: the `PATH` keyword is not allowed inside the conf + value, since it would be self-referential). The parsed value is `DEFAULT_PATH`. + + The conf value is validated for syntax at the time it is set; an invalid value is rejected + with `Cannot modify the value of the SQL config 'spark.sql.defaultPath'`. Static duplicates + inside the conf are tolerated (unlike interactive `SET PATH`, which rejects them) so a later + `USE SCHEMA` cannot turn a previously valid default into a runtime error. + + A `DEFAULT_PATH` token *inside* the conf value resolves to the spark-built-in default below + (cycle break) rather than recursing. + +2. If `spark.sql.defaultPath` is empty (the factory default), `DEFAULT_PATH` is the + **spark-built-in default**: `system.builtin`, `system.session`, and the current schema + (`current_catalog.current_schema`), interleaved per + `spark.sql.functionResolution.sessionOrder`: + + | `sessionOrder` | Order produced | + | :------------- | :------------- | + | `first` | `system.session`, `system.builtin`, `current_schema` | + | `second` (default) | `system.builtin`, `system.session`, `current_schema` | + | `last` | `system.builtin`, `current_schema`, `system.session` | + +To change `DEFAULT_PATH`, set the conf via any of the usual mechanisms: + +* In a session: `SET spark.sql.defaultPath = system.session, system.builtin, current_schema;` +* In static configuration: pass `--conf spark.sql.defaultPath=...` to `spark-submit`, set it in + `SparkConf`, or add it to `spark-defaults.conf`. +* To return to the spark-built-in default, clear the conf with + `RESET spark.sql.defaultPath` (or set it to an empty string). + +`spark.sql.functionResolution.sessionOrder` and `spark.sql.legacy.persistentCatalogFirst` are +internal configurations intended for advanced use; ordinary workloads should leave them at their +defaults. 
See [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). + +#### Error conditions + +| Condition | Cause | +| :-------- | :---- | +| `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED` | `SET PATH` was issued while `spark.sql.path.enabled` is `false`. | +| `INVALID_SQL_PATH_SCHEMA_REFERENCE` | A `schema_name` was given with fewer than two parts. | +| `DUPLICATE_SQL_PATH_ENTRY` | Two entries collapsed to the same concrete namespace after expansion. | + +#### Reserved names + +`system` is reserved as a catalog name and `session` / `builtin` are discouraged as schema names +because they collide with the 2-part shortcuts for the system namespaces. See +[Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions) for +the details and for the internal configurations that tune the behavior when collisions exist. + +### Examples + +```sql +-- Enable the feature first; the default is false. +> SET spark.sql.path.enabled = true; + +-- Observe the default path. +> SELECT current_path(); + system.builtin,system.session,spark_catalog.default + +-- Replace the path with explicit entries. +> SET PATH = spark_catalog.default, system.builtin; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- Identifier case is preserved. +> SET PATH = Spark_Catalog.Default, System.Builtin; +> SELECT current_path(); + Spark_Catalog.Default,System.Builtin + +-- Backtick-quoted parts that contain a dot round-trip with quoting. +> SET PATH = spark_catalog.`sch.b`, system.builtin; +> SELECT current_path(); + spark_catalog.`sch.b`,system.builtin + +-- DEFAULT_PATH and SYSTEM_PATH shortcuts. +> SET PATH = DEFAULT_PATH; +> SET PATH = SYSTEM_PATH; +> SELECT current_path(); + system.builtin,system.session + +-- Append an entry by referring to the current path. 
+> SET PATH = spark_catalog.default, system.builtin; +> SET PATH = PATH, spark_catalog.analytics; +> SELECT current_path(); + spark_catalog.default,system.builtin,spark_catalog.analytics + +-- CURRENT_SCHEMA is a live marker; USE SCHEMA updates the effective path. +> SET PATH = CURRENT_SCHEMA, system.builtin; +> USE spark_catalog.finance; +> SELECT current_path(); + spark_catalog.finance,system.builtin +> USE spark_catalog.default; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- DEFAULT_PATH can be customized via the conf. +> SET spark.sql.defaultPath = system.session, system.builtin, current_schema; +> SET PATH = DEFAULT_PATH; +> SELECT current_path(); + system.session,system.builtin,spark_catalog.default +> RESET spark.sql.defaultPath; + +-- Error cases. +> SET PATH = spark_catalog.default, spark_catalog.default; + [DUPLICATE_SQL_PATH_ENTRY] + +> SET PATH = my_schema_no_catalog; + [INVALID_SQL_PATH_SCHEMA_REFERENCE] + +-- PATH is rejected as a value of the DEFAULT_PATH conf (would cycle). +> SET spark.sql.defaultPath = PATH, system.builtin; + [Error: invalid value] + +-- SET PATH is rejected when the feature is disabled. 
+> SET spark.sql.path.enabled = false; +> SET PATH = spark_catalog.default; + [UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED] +``` + +### Related Statements + +* [Name Resolution](sql-ref-name-resolution.html) +* [`current_path` function](sql-ref-function-current-path.html) +* [SET](sql-ref-syntax-aux-conf-mgmt-set.html) +* [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) +* [USE DATABASE](sql-ref-syntax-ddl-usedb.html) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set.md b/docs/sql-ref-syntax-aux-conf-mgmt-set.md index 9e57a221f9688..396559ca48e74 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set.md @@ -25,6 +25,8 @@ The SET command sets a property, returns the value of an existing property or re To set SQL variables defined with [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) use [SET VAR](sql-ref-syntax-aux-set-var.html). +To change the session SQL Path used for unqualified name resolution use [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). 
+ ### Syntax ```sql @@ -72,3 +74,4 @@ SET spark.sql.variable.substitute; * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET VAR](sql-ref-syntax-aux-set-var.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt.md b/docs/sql-ref-syntax-aux-conf-mgmt.md index 3312bcb503500..6b809d4a94655 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt.md @@ -22,3 +22,4 @@ license: | * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET TIME ZONE](sql-ref-syntax-aux-conf-mgmt-set-timezone.html) + * [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) diff --git a/docs/sql-ref-syntax-aux-describe-function.md b/docs/sql-ref-syntax-aux-describe-function.md index 0c5a3d751a564..bc23daebc16ef 100644 --- a/docs/sql-ref-syntax-aux-describe-function.md +++ b/docs/sql-ref-syntax-aux-describe-function.md @@ -22,9 +22,15 @@ license: | ### Description `DESCRIBE FUNCTION` statement returns the basic metadata information of an -existing function. The metadata information includes the function name, implementing -class and the usage details. If the optional `EXTENDED` option is specified, the basic -metadata information is returned along with the extended usage information. +existing function. For built-in and external (Java/Hive) functions the output includes the +function name, implementing class, and usage details. For +[SQL user-defined functions](sql-ref-syntax-ddl-create-sql-function.html) the output describes +the function signature (input parameters, return type/columns) and, with `EXTENDED`, the +function body, characteristics, and the frozen +[SQL Path](sql-ref-name-resolution.html#sql-path) that was captured at creation time. + +If the optional `EXTENDED` option is specified, the basic metadata is returned along with the +extended information. 
### Syntax @@ -36,12 +42,12 @@ metadata information is returned along with the extended usage information. * **function_name** - Specifies a name of an existing function in the system. The function name may be - optionally qualified with a database name. If `function_name` is qualified with - a database then the function is resolved from the user specified database, otherwise - it is resolved from the current database. + Specifies a name of an existing function. The function name follows the regular + [name resolution](sql-ref-name-resolution.html#function-resolution) rules: unqualified + names walk the SQL Path; 2- and 3-part names may use the `system.builtin` / `system.session` + namespaces (or their shortcuts `builtin` / `session`). - **Syntax:** `[ database_name. ] function_name` + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` ### Examples @@ -102,6 +108,72 @@ DESC FUNCTION EXTENDED explode; | 10 | | 20 | +---------------------------------------------------------------+ + +-- Built-in functions can be qualified with `builtin` or `system.builtin`. +DESC FUNCTION system.builtin.abs; ++-------------------------------------------------------------------+ +|function_desc | ++-------------------------------------------------------------------+ +|Function: abs | +|Class: org.apache.spark.sql.catalyst.expressions.Abs | +|Usage: abs(expr) - Returns the absolute value of the numeric value.| ++-------------------------------------------------------------------+ + +-- Describe a SQL scalar UDF: the output uses the SQL function layout +-- (Function / Type / Input / Returns). +CREATE FUNCTION area(x DOUBLE, y DOUBLE) RETURNS DOUBLE RETURN x * y; +DESC FUNCTION area; ++-------------------------------+ +|function_desc | ++-------------------------------+ +|Function: spark_catalog.default.area| +|Type: SCALAR | +|Input: x DOUBLE | +| y DOUBLE | +|Returns: DOUBLE | ++-------------------------------+ + +-- Describe a SQL table UDF. 
+CREATE FUNCTION getemps(deptno INT) + RETURNS TABLE (id INT, name STRING) + RETURN SELECT id, name FROM employee WHERE employee.deptno = getemps.deptno; +DESC FUNCTION getemps; ++--------------------------------------+ +|function_desc | ++--------------------------------------+ +|Function: spark_catalog.default.getemps| +|Type: TABLE | +|Input: deptno INT | +|Returns: id INT | +| name STRING | ++--------------------------------------+ + +-- DESC FUNCTION EXTENDED for a SQL UDF adds the body, the characteristic clauses, +-- the captured SQL configs, the owner, the create time, and the frozen SQL Path +-- (when `spark.sql.path.enabled` is true). +SET spark.sql.path.enabled = true; +SET PATH = spark_catalog.default, system.builtin; +CREATE FUNCTION frozen_fn() RETURNS INT + COMMENT 'demo function' + RETURN (SELECT MAX(id) FROM frozen_t); +DESC FUNCTION EXTENDED frozen_fn; ++-----------------------------------------------------------------+ +|function_desc | ++-----------------------------------------------------------------+ +|Function: spark_catalog.default.frozen_fn | +|Type: SCALAR | +|Input: () | +|Returns: INT | +|Comment: demo function | +|Deterministic:false | +|Data Access: READS SQL DATA | +|Configs: spark.sql.ansi.enabled=true | +| ... 
| +|Owner: | +|Create Time: Wed Apr 30 14:05:43 PDT 2026 | +|Body: (SELECT MAX(id) FROM frozen_t) | +|SQL Path: spark_catalog.default, system.builtin | ++-----------------------------------------------------------------+ ``` ### Related Statements @@ -109,3 +181,5 @@ DESC FUNCTION EXTENDED explode; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html) * [DESCRIBE QUERY](sql-ref-syntax-aux-describe-query.html) +* [CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index 46d9432f5d072..b86a7bafbd793 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -274,6 +274,34 @@ DESCRIBE customer salesdb.customer.name; -- Returns the table metadata in JSON format. DESC FORMATTED customer AS JSON; {"table_name":"customer","catalog_name":"spark_catalog","schema_name":"default","namespace":["default"],"columns":[{"name":"cust_id","type":{"name":"integer"},"nullable":true},{"name":"name","type":{"name":"string"},"comment":"Short name","nullable":true},{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],"location": "file:/tmp/salesdb.db/custom...","created_time":"2020-04-07T14:05:43Z","last_access":"UNKNOWN","created_by":"None","type":"MANAGED","provider":"parquet","partition_provider":"Catalog","partition_columns":["state"]} + +-- DESCRIBE EXTENDED on a view emits view-specific rows. When `spark.sql.path.enabled` is true, +-- the output also includes the frozen `SQL Path` that the view body resolves against; see +-- [Name Resolution](sql-ref-name-resolution.html#sql-path). 
+SET spark.sql.path.enabled = true; +SET PATH = spark_catalog.default, system.builtin; +CREATE VIEW recent_customers AS + SELECT cust_id, name FROM customer WHERE cust_id > 1000; + +DESCRIBE EXTENDED recent_customers; ++----------------------------+---------------------------------------+--------+ +| col_name| data_type| comment| ++----------------------------+---------------------------------------+--------+ +| cust_id| int| null| +| name| string| null| +| | | | +|# Detailed Table Information| | | +| Catalog | spark_catalog| | +| Database| default| | +| Table| recent_customers| | +| Type| VIEW| | +| View Text|SELECT cust_id, name FROM customer ... | | +| View Original Text|SELECT cust_id, name FROM customer ... | | +| View Schema Mode| COMPENSATION| | +| View Catalog and Namespace| spark_catalog.default | | +| View Query Output Columns| [`cust_id`, `name`] | | +| SQL Path| spark_catalog.default, system.builtin| | ++----------------------------+---------------------------------------+--------+ ``` ### Related Statements @@ -281,3 +309,4 @@ DESC FORMATTED customer AS JSON; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DESCRIBE QUERY](sql-ref-syntax-aux-describe-query.html) * [DESCRIBE FUNCTION](sql-ref-syntax-aux-describe-function.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47844724..2c90c69c3fd59 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -38,6 +38,12 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the name of the database to be created. + > Note: avoid naming a database `session` or `builtin`. The catalog API accepts both names, + > but the 2-part forms `session.x` and `builtin.x` are interpreted as the temporary and + > built-in namespaces respectively, which hides a persistent database with one of these + > names. 
See + > [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). + * **IF NOT EXISTS** Creates a database with the given name if it does not exist. If a database with the same name already exists, nothing will happen. @@ -85,3 +91,4 @@ DESCRIBE DATABASE EXTENDED customer_db; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-function.md b/docs/sql-ref-syntax-ddl-create-function.md index e0e2545f5ee3f..2565870494410 100644 --- a/docs/sql-ref-syntax-ddl-create-function.md +++ b/docs/sql-ref-syntax-ddl-create-function.md @@ -50,8 +50,9 @@ CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ] * **TEMPORARY** Indicates the scope of function being created. When `TEMPORARY` is specified, the - created function is valid and visible in the current session. No persistent - entry is made in the catalog for these kind of functions. + created function is valid and visible in the current session. Temporary functions live in the + per-session `system.session` namespace. No persistent entry is made in the catalog for this + kind of function. * **IF NOT EXISTS** @@ -62,9 +63,19 @@ CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ] * **function_name** - Specifies a name of function to be created. The function name may be optionally qualified with a database name. + Specifies the name of the function to be created. - **Syntax:** `[ database_name. ] function_name` + * For a **permanent** function the name may be optionally qualified with a database name + (or a catalog and database). If the name is not qualified, the function is created in the + current schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` + + * For a **temporary** function the name may be optionally qualified with the session schema + (`session` or `system.session`).
Any other qualifier is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`. + + **Syntax:** `[ { session | system.session } . ] function_name` * **class_name** diff --git a/docs/sql-ref-syntax-ddl-create-sql-function.md b/docs/sql-ref-syntax-ddl-create-sql-function.md index 649cd895a1974..1f7da5f433def 100644 --- a/docs/sql-ref-syntax-ddl-create-sql-function.md +++ b/docs/sql-ref-syntax-ddl-create-sql-function.md @@ -58,7 +58,10 @@ characteristic - **TEMPORARY** - The scope of the function being created. When you specify `TEMPORARY`, the created function is valid and visible in the current session. No persistent entry is made in the catalog. + The scope of the function being created. When you specify `TEMPORARY`, the created function is + valid and visible in the current session. Temporary functions live in the per-session + `system.session` namespace and are dropped when the session ends. No persistent entry is made in + the catalog. - **IF NOT EXISTS** @@ -66,10 +69,30 @@ characteristic - **function_name** - A name for the function. For a permanent function, you can optionally qualify the function name, or it will be created under the current catalog and namespace. - If the name is not qualified the permanent function is created in the current schema. + A name for the function. - **Syntax:** `[ database_name. ] function_name` + * For a **permanent** function, you can optionally qualify the function name with a database name + (or a catalog and database). If the name is not qualified, the permanent function is created in + the current schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` + + * For a **temporary** function, you can optionally qualify the function name with the session + schema (`session` or `system.session`). Any other qualifier (including + `system.builtin`, the current schema, or an arbitrary database name) is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`.
For example, `CREATE TEMPORARY FUNCTION session.f ...` and + `CREATE TEMPORARY FUNCTION system.session.f ...` are accepted. + + **Syntax:** `[ { session | system.session } . ] function_name` + + The function name must be unique among all routines (procedures and functions) in its schema. + +> Note: a SQL UDF captures the SQL Path that was in effect when `CREATE FUNCTION` ran. When the +> function is invoked, the body resolves against that frozen path, not the invoker's current path. +> Inside the body, `current_schema()` and `current_path()` still reflect the invoker's context. +> See [Name Resolution](sql-ref-name-resolution.html#sql-path). +> Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the +> captured path. - **function_parameter** @@ -296,8 +319,76 @@ characteristic Returns: INT ``` +### Create a temporary SQL function with a session qualifier + +```sql +-- Unqualified, `session`-qualified, and `system.session`-qualified names all create the same +-- temporary function in the per-session `system.session` namespace. +> CREATE TEMPORARY FUNCTION add_one(x INT) RETURNS INT RETURN x + 1; + +> CREATE OR REPLACE TEMPORARY FUNCTION session.add_one(x INT) RETURNS INT + RETURN x + 1; + +> CREATE OR REPLACE TEMPORARY FUNCTION system.session.add_one(x INT) RETURNS INT + RETURN x + 1; + +-- All three names refer to the same temporary function: +> SELECT add_one(1), session.add_one(1), system.session.add_one(1); + 2 2 2 + +-- DROP TEMPORARY FUNCTION accepts the same qualifiers: +> DROP TEMPORARY FUNCTION session.add_one; + +-- Any other qualifier on a TEMPORARY function is rejected. +> CREATE TEMPORARY FUNCTION mydb.bad_temp() RETURNS INT RETURN 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `mydb` is not allowed for temporary FUNCTION ... + +> CREATE TEMPORARY FUNCTION system.builtin.bad_temp() RETURNS INT RETURN 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `system`.`builtin` is not allowed for temporary FUNCTION ... 
+``` + +### Frozen SQL Path + +A SQL UDF captures the SQL Path that is in effect at `CREATE FUNCTION` time. The body resolves +against that frozen path on every invocation, even if the caller's session has set a different +PATH. See [Name Resolution](sql-ref-name-resolution.html#sql-path). + +```sql +> SET spark.sql.path.enabled = true; + +> CREATE SCHEMA path_a; +> CREATE SCHEMA path_b; +> CREATE TABLE path_a.t USING parquet AS SELECT 10 AS id; +> CREATE TABLE path_b.t USING parquet AS SELECT 20 AS id; + +-- The PATH at CREATE FUNCTION time points at path_a, so unqualified `t` in the body binds to +-- path_a.t. +> SET PATH = spark_catalog.path_a, system.builtin; +> CREATE FUNCTION default.frozen_fn() RETURNS INT + RETURN (SELECT MAX(id) FROM t); + +-- Flip the live PATH. The function body still resolves `t` against the frozen path. +> SET PATH = spark_catalog.path_b, system.builtin; + +-- A bare query follows the LIVE path: +> SELECT MAX(id) FROM t; + 20 + +-- The function body follows its FROZEN path: +> SELECT default.frozen_fn(); + 10 + +-- DESCRIBE FUNCTION EXTENDED shows the captured path: +> DESC FUNCTION EXTENDED default.frozen_fn; + Function: spark_catalog.default.frozen_fn + ... 
+ SQL Path: spark_catalog.path_a, system.builtin +``` + ### Related Statements * [SHOW FUNCTIONS](sql-ref-syntax-aux-show-functions.html) * [DESCRIBE FUNCTION](sql-ref-syntax-aux-describe-function.html) * [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-view.md b/docs/sql-ref-syntax-ddl-create-view.md index 2d832636b38fc..59434b76c7e2a 100644 --- a/docs/sql-ref-syntax-ddl-create-view.md +++ b/docs/sql-ref-syntax-ddl-create-view.md @@ -40,9 +40,11 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident * **[ GLOBAL ] TEMPORARY** - TEMPORARY views are session-scoped and will be dropped when session ends - because it skips persisting the definition in the underlying metastore, if any. - GLOBAL TEMPORARY views are tied to a system preserved temporary database `global_temp`. + `TEMPORARY` views are session-scoped and are dropped when the session ends; + no entry is persisted in the underlying metastore. + Temporary views live in the per-session `system.session` namespace. + + `GLOBAL TEMPORARY` views are tied to the system-preserved temporary database `global_temp`. * **IF NOT EXISTS** @@ -51,9 +53,28 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident * **view_identifier** - Specifies a view name, which may be optionally qualified with a database name. + Specifies a view name. + + * For a **persistent** view the name may be optionally qualified with a database name (or a + catalog and database). If the name is not qualified the view is created in the current + schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] view_name` + + * For a **temporary** view the name may be optionally qualified with the session schema + (`session` or `system.session`). Any other qualifier is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`. 
For example, `CREATE TEMPORARY VIEW session.v ...` and + `CREATE TEMPORARY VIEW system.session.v ...` are accepted; `CREATE TEMPORARY VIEW mydb.v ...` + is not. - **Syntax:** `[ database_name. ] view_name` + **Syntax:** `[ { session | system.session } . ] view_name` + + The fully qualified view name must be unique within its schema. + +> Note: a persistent view captures the SQL Path that was in effect when `CREATE VIEW` ran. When the +> view is referenced, the body resolves against that frozen path, not the invoker's current path. +> See [Name Resolution](sql-ref-name-resolution.html#sql-path). +> Use [DESCRIBE EXTENDED](sql-ref-syntax-aux-describe-table.html) to inspect the captured path. * **create_view_clauses** @@ -98,8 +119,76 @@ CREATE OR REPLACE VIEW open_orders WITH SCHEMA EVOLUTION AS SELECT * FROM orders WHERE status = 'open'; ``` +### Create a temporary view with a session qualifier + +```sql +-- Unqualified, `session`-qualified, and `system.session`-qualified names all create the same +-- temporary view in the per-session `system.session` namespace. +CREATE TEMPORARY VIEW recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +CREATE OR REPLACE TEMPORARY VIEW session.recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +CREATE OR REPLACE TEMPORARY VIEW system.session.recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +-- All three names address the same temporary view: +SELECT count(*) FROM recent_orders; +SELECT count(*) FROM session.recent_orders; +SELECT count(*) FROM system.session.recent_orders; + +-- DROP VIEW accepts the same qualifiers (there is no DROP TEMPORARY VIEW form): +DROP VIEW session.recent_orders; + +-- Any other qualifier on a TEMPORARY view is rejected. +CREATE TEMPORARY VIEW mydb.bad_temp AS SELECT 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `mydb` is not allowed for temporary VIEW ... 
+ +CREATE TEMPORARY VIEW system.builtin.bad_temp AS SELECT 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `system`.`builtin` is not allowed for temporary VIEW ... +``` + +### Frozen SQL Path + +A persistent view captures the SQL Path that is in effect at `CREATE VIEW` time. The view body +resolves against that frozen path on every reference, even when the caller's session has set a +different PATH. See [Name Resolution](sql-ref-name-resolution.html#sql-path). + +```sql +> SET spark.sql.path.enabled = true; + +> CREATE SCHEMA views_a; +> CREATE SCHEMA views_b; +> CREATE TABLE views_a.t USING parquet AS SELECT 1 AS id; +> CREATE TABLE views_b.t USING parquet AS SELECT 2 AS id; + +-- The PATH at CREATE VIEW time points at views_a, so unqualified `t` in the view body binds to +-- views_a.t. +> SET PATH = spark_catalog.views_a, system.builtin; +> CREATE VIEW default.v_frozen AS SELECT id FROM t; + +-- Flip the live PATH. The view body still resolves `t` against the frozen path. +> SET PATH = spark_catalog.views_b, system.builtin; + +-- A bare query follows the LIVE path: +> SELECT id FROM t; + 2 + +-- The view body follows its FROZEN path: +> SELECT id FROM default.v_frozen; + 1 + +-- DESCRIBE EXTENDED shows the captured path: +> DESCRIBE EXTENDED default.v_frozen; + ... + SQL Path spark_catalog.views_a, system.builtin +``` + ### Related Statements * [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) * [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) * [SHOW VIEWS](sql-ref-syntax-aux-show-views.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-drop-function.md b/docs/sql-ref-syntax-ddl-drop-function.md index bef31d74afcff..ebf5b01404a19 100644 --- a/docs/sql-ref-syntax-ddl-drop-function.md +++ b/docs/sql-ref-syntax-ddl-drop-function.md @@ -34,14 +34,31 @@ DROP [ TEMPORARY ] FUNCTION [ IF EXISTS ] function_name * **function_name** - Specifies the name of an existing function. 
The function name may be - optionally qualified with a database name. + Specifies the name of an existing function. Whether `function_name` refers to a temporary + function or a persistent function is selected by the `TEMPORARY` keyword, not by the + identifier. Without `TEMPORARY` the name always targets a persistent function (even if + you write `session.f` or `system.session.f`, in which case Spark looks for a persistent + function in a schema literally named `session`). - **Syntax:** `[ database_name. ] function_name` + * With `TEMPORARY`: the name may be optionally qualified with the session schema + (`session` or `system.session`); for example, `DROP TEMPORARY FUNCTION f`, + `DROP TEMPORARY FUNCTION session.f`, and `DROP TEMPORARY FUNCTION system.session.f` all + drop the same temporary function. + + **Syntax:** `[ { session | system.session } . ] function_name` + + * Without `TEMPORARY`: the name may be optionally qualified with a database name (or a catalog + and database) and resolves to a persistent function in that schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` + + Built-in functions in `system.builtin` cannot be dropped: `DROP FUNCTION system.builtin.abs` + raises `FORBIDDEN_OPERATION`. * **TEMPORARY** - Should be used to delete the `TEMPORARY` function. + Required to drop a temporary function. Without `TEMPORARY`, `DROP FUNCTION` only considers + persistent functions. * **IF EXISTS** diff --git a/docs/sql-ref-syntax-ddl-drop-view.md b/docs/sql-ref-syntax-ddl-drop-view.md index 5b680d7f907e0..eca58c840e436 100644 --- a/docs/sql-ref-syntax-ddl-drop-view.md +++ b/docs/sql-ref-syntax-ddl-drop-view.md @@ -23,6 +23,14 @@ license: | `DROP VIEW` removes the metadata associated with a specified view from the catalog. + +Unlike `DROP FUNCTION`, `DROP VIEW` has no `TEMPORARY` keyword.
The choice between a temporary +view and a persistent view is driven by the identifier alone: + +* An unqualified `view_name` first matches a temporary view, then a persistent view in + the current schema. +* A name qualified with `session` or `system.session` targets only the temporary view scope. +* Any other qualifier targets only persistent views. + ### Syntax ```sql @@ -37,9 +45,18 @@ DROP VIEW [ IF EXISTS ] view_identifier * **view_identifier** - Specifies the view name to be dropped. The view name may be optionally qualified with a database name. + Specifies the view name to be dropped. + + * For a **persistent** view the name may be optionally qualified with a database name + (or a catalog and database). + + **Syntax:** `[ catalog_name. ] [ database_name. ] view_name` + + * For a **temporary** view the name may be optionally qualified with the session schema + (`session` or `system.session`); for example, `DROP VIEW v`, `DROP VIEW session.v`, and + `DROP VIEW system.session.v` all drop the same temporary view. - **Syntax:** `[ database_name. ] view_name` + **Syntax:** `[ { session | system.session } . ] view_name` ### Examples diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index d8c37dc021985..ef1597cd96b41 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -123,6 +123,7 @@ You use SQL scripting to execute procedural logic in SQL. 
* [REFRESH FUNCTION](sql-ref-syntax-aux-cache-refresh-function.html) * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) + * [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) * [SET VAR](sql-ref-syntax-aux-set-var.html) * [SHOW COLLATIONS](sql-ref-syntax-aux-show-collations.html) * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html) From 066802a8c4bf3bc02bf66661b51275c3cc055bc4 Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Thu, 21 May 2026 18:34:55 +0200 Subject: [PATCH 2/6] [SPARK-56984][DOCS][FOLLOWUP] Restructure SQL Path docs Consolidate the per-object resolution sections in `sql-ref-name-resolution.md` into a single `Object name resolution` section with the three subsections `Fully qualified`, `Partially qualified`, and `Unqualified`. `Table and view resolution` and `Function resolution` are now thin sections that list what each kind of object can resolve to, carry their kind-specific notes (common table expressions for relations) and errors (`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`), and keep their existing examples. Move the conceptual material on the SQL Path (what it is, the `system.builtin` / `system.session` namespaces, DML-vs-DDL, `spark.sql.path.enabled` gating, the initial-value-of-PATH rule, and the frozen-path semantics for persistent views and SQL UDFs) into the Description section of `SET PATH`. Make the `Reserved system names` section on the Identifiers page the canonical reference for `system` / `session` / `builtin`, with the mini-path table and the 3-part-bypass rule. Update cross-page links to point at these new homes. 
Tighten the prose pass-wide on the rewritten sections: drop "worked examples", "in particular", "as special cases", "small two-step", "straightforward", and similar filler; lead the 2-part case with the common rule (`current_catalog`-prepend) and treat the `session` / `builtin` mini-path as the exception; remove the bogus `current_catalog.builtin.x` "special case" bullet from the 3-part case; make `Frozen SQL Path` an inline note rather than a heading. No behavior changes; documentation only. --- docs/sql-migration-guide.md | 4 +- docs/sql-ref-function-current-path.md | 7 +- docs/sql-ref-identifier.md | 29 ++- docs/sql-ref-name-resolution.md | 210 ++++-------------- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 148 ++++++------ docs/sql-ref-syntax-aux-describe-function.md | 8 +- docs/sql-ref-syntax-aux-describe-table.md | 2 +- docs/sql-ref-syntax-ddl-create-database.md | 2 +- .../sql-ref-syntax-ddl-create-sql-function.md | 4 +- docs/sql-ref-syntax-ddl-create-view.md | 4 +- 10 files changed, 163 insertions(+), 255 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 7324f3128a018..6a61914780bab 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -31,9 +31,9 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. 
The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. -- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. Both also accept the 2-part shortcuts `builtin.x` and `session.x`. As a result, `builtin.func()` and `session.func()` now resolve to the synthetic system namespaces before any persistent schema literally named `builtin` or `session`. To restore the previous behavior (persistent schema first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). +- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the synthetic system namespace first and the current catalog second, so a persistent schema literally named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. 
Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `current_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). - Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. -- Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [Name Resolution](sql-ref-name-resolution.html#sql-path) and [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). +- Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). 
## Upgrading from Spark SQL 4.0 to 4.1 diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md index 24be3e7ec22a0..085e7cd4defe7 100644 --- a/docs/sql-ref-function-current-path.md +++ b/docs/sql-ref-function-current-path.md @@ -20,9 +20,10 @@ license: | --- Returns the effective SQL Path for the current session as a comma-separated string of -qualified namespace names. See [Name Resolution](sql-ref-name-resolution.html#sql-path) for a -description of how the path drives unqualified name resolution and -[`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html) for how to change it. +qualified namespace names. See [`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html) for a +description of what the path is, how it is gated, and how to change it, and +[Name Resolution](sql-ref-name-resolution.html) for how the path drives unqualified name +resolution. ### Syntax diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index 0c768a212f431..c69de9959a4e0 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -54,16 +54,31 @@ An identifier is a string used to identify a database object such as a table, vi ### Reserved system names -A few names have a special meaning in object identifiers and should be avoided as user-defined -names. They are interpreted by Spark's [SQL Path](sql-ref-name-resolution.html#sql-path) and are -documented in detail under -[Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). +`system`, `session`, and `builtin` have special meaning and should not be used as user-defined +catalog or schema names. | Name | Position | Notes | | :--- | :------- | :---- | -| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Registering a v2 catalog under this name (`spark.sql.catalog.system = ...`) is not supported. 
| -| `builtin` | schema in any catalog | The 2-part form `builtin.x` is interpreted as the synonym `system.builtin.x`. A persistent schema literally named `builtin` is allowed but discouraged; reach it as `current_catalog.builtin.x`. | -| `session` | schema in any catalog | The 2-part form `session.x` is interpreted as the synonym `system.session.x`. A persistent schema literally named `session` is allowed but discouraged; reach it as `current_catalog.session.x`. | +| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Not loadable as a v2 catalog plugin; `spark.sql.catalog.system = ...` is unsupported, and the current catalog cannot be `system`. | +| `builtin` | schema | A persistent schema literally named `builtin` is allowed but discouraged: `builtin.x` follows the mini-path below. To reach a persistent `builtin` schema, use `current_catalog.builtin.x`. | +| `session` | schema | A persistent schema literally named `session` is allowed but discouraged: `session.x` follows the mini-path below. To reach a persistent `session` schema, use `current_catalog.session.x`. | + +For 2-part references starting with `builtin` or `session`, Spark chooses the implicit catalog +using the mini-path below and returns the first match: + +| `spark.sql.legacy.persistentCatalogFirst` | Mini-path tried in order | +| :-------------------------------------- | :----------------------- | +| `false` (default) | `system.session.x` / `system.builtin.x`, then `current_catalog.session.x` / `current_catalog.builtin.x` | +| `true` (legacy) | `current_catalog.session.x` / `current_catalog.builtin.x`, then `system.session.x` / `system.builtin.x` | + +3-part names skip the mini-path: `system.session.x` and `system.builtin.x` always target the +system namespace, and `current_catalog.session.x` / `current_catalog.builtin.x` always target the +persistent schema. 
+ +The `system.builtin` and `system.session` namespaces are described in +[SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). Temporary objects in `system.session` are +documented under [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) and +[CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html). ### Examples diff --git a/docs/sql-ref-name-resolution.md b/docs/sql-ref-name-resolution.md index 1a3cb3ac5017c..be20bf3872f95 100644 --- a/docs/sql-ref-name-resolution.md +++ b/docs/sql-ref-name-resolution.md @@ -21,100 +21,6 @@ license: | Name resolution is the process by which [identifiers](sql-ref-identifier.html) are resolved to specific column-, field-, parameter-, table-, function-, or variable-references. -## SQL Path - -For unqualified references to functions, tables, views, and session variables Spark walks -an ordered list of namespaces known as the **SQL Path**. The first match along the path wins. - -The path is a list of catalog-qualified schema names. In addition to ordinary persistent schemas it -can refer to two virtual system namespaces: - -- `system.builtin` — the set of built-in functions provided by Spark (such as `abs`, `concat`, - `current_user`, `current_path`, ...). Includes functions injected by `SparkSessionExtensions`. -- `system.session` — the per-session namespace that holds temporary views, temporary functions, - and session variables created in the current session. - -Both system namespaces are special: they cannot be created or dropped, and persistent objects with -these names live in different (`spark_catalog`-qualified) schemas. The 2-part shortcuts -`builtin.name` and `session.name` are accepted as synonyms for `system.builtin.name` and -`system.session.name`. - -The path is observable through the [`current_path()`](sql-ref-function-current-path.html) function. - -### Enabling and setting the path - -The `SET PATH` statement is gated by the `spark.sql.path.enabled` configuration (default `false`). 
-When `false`, `SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`, but unqualified -resolution still walks a fixed default path and `current_path()` still returns it. - -When `spark.sql.path.enabled` is `true`, you can change the path with -[`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html), for example: - -```sql -SET PATH = spark_catalog.analytics, spark_catalog.default, system.builtin; -``` - -If `SET PATH` has not been issued in the session, the effective path is the **default path**, -which is either taken from the `spark.sql.defaultPath` configuration (when set) or composed -automatically from `system.builtin`, `system.session`, and the current schema. The same -`DEFAULT_PATH` value is what `SET PATH = DEFAULT_PATH` expands to. See -[How `DEFAULT_PATH` is derived](sql-ref-syntax-aux-conf-mgmt-set-path.html#how-default_path-is-derived) -for the full derivation and how to change it. - -Inside `SET PATH` the following shortcut tokens are accepted: - -| Token | Expands to | -| :---- | :--------- | -| `DEFAULT_PATH` | The default path described above. | -| `SYSTEM_PATH` | `system.builtin` and `system.session`, in the configured order. | -| `PATH` | The current value of the path (useful when appending). | -| `CURRENT_SCHEMA` / `CURRENT_DATABASE` | A virtual marker that resolves to the current schema (`current_catalog.current_schema`) every time the path is consulted. | - -### When the path is consulted - -The path participates only in **DML** (`SELECT`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`, ...) and in -query expressions inside DDL bodies. DDL itself — `CREATE TABLE`, `CREATE VIEW`, -`CREATE FUNCTION`, `DROP ...`, `ALTER ...`, etc. — resolves unqualified object names against the -current catalog and schema (`current_catalog.current_schema`), not the path. This is so that -`CREATE TABLE t` always creates `t` in the current schema regardless of how PATH is set. 
- -When you create a persistent view or a SQL UDF, Spark captures the effective path at creation time -into the object's metadata. Each time the view or function is invoked its body resolves against -that **frozen path**, not the invoker's current path. Invocations of `current_schema()` and -`current_path()` inside the body still reflect the invoker's context. - -### Reserved names and collisions - -The SQL Path feature relies on three names being treated specially: - -- **`system`** as a catalog. Spark's `system` catalog is a synthetic namespace; it serves the - `system.builtin` and `system.session` entries and is not loadable as a v2 catalog plugin. The - current catalog cannot be `system` and `SET PATH` does not look up `system` through the v2 - catalog API. Registering a v2 catalog under the name `system` - (`spark.sql.catalog.system = ...`) is therefore not supported. - -- **`session`** as a schema name in any catalog. Persistent schemas literally named `session` are - allowed by the catalog API but are discouraged: the unqualified 2-part form `session.x` is - interpreted as the synonym `system.session.x` (a temporary object) by default. To target a - persistent schema called `session`, qualify it with the catalog name - (`spark_catalog.session.x`). - -- **`builtin`** as a schema name in any catalog. Persistent schemas literally named `builtin` are - similarly allowed but discouraged: the unqualified 2-part form `builtin.x` is interpreted as the - synonym `system.builtin.x`. To target a persistent schema called `builtin`, qualify it with the - catalog name (`spark_catalog.builtin.x`). - -These collisions matter only for 2-part names; 1-part lookups always go through the SQL Path, and -3-part names are never ambiguous. - -Two internal configurations let advanced users tune the behavior when collisions exist; ordinary -workloads should not need to change them. 
- -| Configuration | Purpose | -| :------------ | :------ | -| `spark.sql.legacy.persistentCatalogFirst` | When `true`, the legacy lookup order is used for partially qualified `builtin.x` and `session.x`: the persistent catalog (e.g. `spark_catalog.builtin.x`) is tried first, and only if it does not yield a match does Spark fall back to the synthetic `system.builtin.x` / `system.session.x`. Default `false` (system namespace wins). | -| `spark.sql.functionResolution.sessionOrder` | Controls where the per-session `system.session` namespace sits relative to `system.builtin` and the current persistent schema when assembling the default path. Values: `first` (session, builtin, persistent), `second` (builtin, session, persistent — default), `last` (builtin, persistent, session). Affects both `DEFAULT_PATH` expansion and the unqualified search path reported in `UNRESOLVED_ROUTINE` / `TABLE_OR_VIEW_NOT_FOUND` errors. | - ## Column, field, parameter, and variable resolution Identifiers in expressions can be references to any one of the following: @@ -233,8 +139,8 @@ In detail, resolution of identifiers to a specific reference follows these rules 1. Match the identifier to a session variable name. If the identifier is qualified, the qualifier must be `session` or `system.session`. - If the identifier is unqualified, `system.session` must be present on the SQL Path - (the default path includes it). + If the identifier is unqualified, `system.session` must be present on the + [SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) (the default path includes it). 1. If the identifier is qualified, match to a field or map key of a variable following rule 1.c ### Limitations @@ -353,52 +259,55 @@ This restriction also applies to parameter references in SQL functions. 
frm.a lat.b func.c ``` -## Table and view resolution - -An identifier in a table reference can be any of the following: +## Object name resolution -- Persistent table or view -- Common table expression (CTE) -- [Temporary view](sql-ref-syntax-ddl-create-view.html) +Tables, views, and functions follow the same resolution rule. It depends on how many parts the +identifier has. -Resolution of an identifier depends on whether it is qualified: +### Fully qualified (3 parts) — `catalog.schema.object` -- **Qualified** +The reference is unique and is looked up in `catalog.schema`. `system.builtin.object` identifies +a built-in function; `system.session.object` identifies a temporary view, function, or session +variable. - If the identifier is fully qualified with three parts: `catalog.schema.relation`, it is unique. +### Partially qualified (2 parts) — `schema.object` - If the identifier consists of two parts: `schema.relation`, it is further qualified with the - result of `SELECT current_catalog()` to make it unique. - As a special case, the schema `session` is implicitly qualified with the catalog `system` and - interpreted as a temporary view. +The identifier is qualified with `current_catalog` — producing +`current_catalog.schema.object` — unless the leading part is `session` (or `builtin`, for +functions). In that case Spark uses the +[mini-path](sql-ref-identifier.html#reserved-system-names) to choose the implicit catalog, +returning the first match: - If the identifier is `system.session.relation`, it targets the temporary view scope only. 
+| `spark.sql.legacy.persistentCatalogFirst` | Mini-path tried in order | +| :-------------------------------------- | :----------------------- | +| `false` (default) | `system.session.x` / `system.builtin.x`, then `current_catalog.session.x` / `current_catalog.builtin.x` | +| `true` (legacy) | `current_catalog.session.x` / `current_catalog.builtin.x`, then `system.session.x` / `system.builtin.x` | -- **Unqualified** +### Unqualified (1 part) — `object` - 1. **Common table expression** +In queries and DML, Spark walks the [SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) and +returns the first match. In DDL, the identifier is qualified with `current_catalog.current_schema`. - If the reference is within the scope of a `WITH` clause, match the identifier to a CTE - starting with the immediately containing `WITH` clause and moving outwards from there. +> Note: persistent views and SQL UDFs capture the SQL Path at `CREATE` time. When the view or +> function is invoked, its body resolves names — tables, views, and functions — +> against that frozen path, not the invoker's current path. `current_schema()` and +> `current_path()` inside the body still return the invoker's context. See +> [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). - 1. **SQL Path walk** - - For each entry on the [SQL Path](#sql-path) in order: +## Table and view resolution - - When the entry is `system.session`, attempt to match the identifier as a temporary view. - - Otherwise, fully qualify the identifier with the entry (`catalog.schema.relation`) and look - it up as a persistent relation. +A table reference can be a persistent table or view, a temporary view, or a common table +expression (CTE). - The first match wins. If no entry yields a match, the relation is unresolved. 
+Resolution follows [Object name resolution](#object-name-resolution), with one addition for +unqualified references: when the reference is inside a `WITH` clause, Spark first matches the +identifier against CTEs from the innermost `WITH` outward. If no CTE matches, Spark walks the +SQL Path. -If the relation cannot be resolved to any table, view, or CTE, Spark raises a -`TABLE_OR_VIEW_NOT_FOUND` error. The error includes the effective search path, for example +If the relation cannot be resolved, Spark raises `TABLE_OR_VIEW_NOT_FOUND`. The error includes +the effective search path, for example `searchPath = [system.builtin, system.session, spark_catalog.default]`. -> Note: persistent views capture their creation-time SQL Path. When a persistent view is -> referenced, the body resolves against the frozen path rather than the invoker's current path; -> see [SQL Path](#sql-path). - ### Examples ```sql @@ -483,50 +392,19 @@ If the relation cannot be resolved to any table, view, or CTE, Spark raises a ## Function resolution -A function reference is recognized by the mandatory trailing set of parentheses. - -It can resolve to: +A function reference is recognized by the trailing parentheses. It can resolve to: -- A built-in function provided by Spark or a `SparkSessionExtensions` injection - (`system.builtin`), -- A temporary user-defined function scoped to the current session (`system.session`), or -- A persistent user-defined function stored in a catalog schema. +- A built-in function (`system.builtin`), including functions injected by + `SparkSessionExtensions`. +- A temporary function in the current session (`system.session`). +- A persistent function in a catalog schema. -Resolution of a function name depends on whether it is qualified: +Resolution follows [Object name resolution](#object-name-resolution). -- **Qualified** - - If the name is fully qualified with three parts: `catalog.schema.function`, it is unique. 
- - If the name consists of two parts: `schema.function`, it is further qualified with the result of - `SELECT current_catalog()` to make it unique. As a special case, the 2-part shortcuts - `builtin.function` and `session.function` are accepted as synonyms for `system.builtin.function` - and `system.session.function` respectively. - - The function is then looked up in the resulting namespace. - -- **Unqualified** - - For unqualified function names Spark walks the [SQL Path](#sql-path) and returns the first match: - - 1. For each entry on the path in order: - - - When the entry is `system.builtin`, attempt to match against the set of built-in functions. - - When the entry is `system.session`, attempt to match against temporary functions. - - Otherwise, fully qualify the function name with the entry (`catalog.schema.function`) and - look it up as a persistent function. - - 2. The first match wins. If no entry yields a match, the function is unresolved. - -If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. The error includes -the effective search path, for example +If the function cannot be resolved, Spark raises `UNRESOLVED_ROUTINE`. The error includes the +effective search path, for example `searchPath = [system.builtin, system.session, spark_catalog.default]`. -> Note: SQL user-defined functions (`CREATE FUNCTION`) capture their creation-time SQL Path. When -> the function is invoked, the body resolves against the frozen path rather than the invoker's -> current path; see [SQL Path](#sql-path). Inside the body, `current_schema()` and -> `current_path()` still reflect the invoker's context. 
- ### Examples ```sql diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md index 68e873eef80b5..1d049c80388fc 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -21,16 +21,51 @@ license: | ### Description -`SET PATH` changes the **SQL Path** of the current session. The SQL Path is the ordered list of -namespaces Spark walks when resolving unqualified references to functions, tables, views, and -session variables. The first match along the path wins. See -[Name Resolution](sql-ref-name-resolution.html#sql-path) for the conceptual overview. +`SET PATH` changes the **SQL Path** of the current session. -The path is observable through the [`current_path()`](sql-ref-function-current-path.html) function. +The SQL Path is an ordered list of catalog-qualified schema names that Spark walks when +resolving unqualified references to functions, tables, views, and session variables in queries +and DML (`SELECT`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`). The first match wins. DDL +(`CREATE TABLE`, `CREATE VIEW`, `CREATE FUNCTION`, `DROP`, `ALTER`, ...) resolves unqualified +object names against `current_catalog.current_schema`, not the path; so `CREATE TABLE t` always +creates `t` in the current schema regardless of the path. -`SET PATH` is gated by `spark.sql.path.enabled`. When that configuration is `false` (the default), -`SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution still uses -a fixed default path, and `current_path()` still returns it. +The path can include two virtual namespaces in the `system` catalog: + +- `system.builtin` — built-in functions, including those injected by + `SparkSessionExtensions`. +- `system.session` — temporary views, temporary functions, and session variables in the + current session. + +`SET PATH` is gated by `spark.sql.path.enabled`. 
When it is `false` (the default), `SET PATH` +raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and +[`current_path()`](sql-ref-function-current-path.html) still use the default path. + +The path within a session: + +1. The initial value of `PATH` is `DEFAULT_PATH`. +2. `DEFAULT_PATH` is either: + + - the value of `spark.sql.defaultPath`, when that configuration is set; or + - a built-in value composed of `system.builtin`, `system.session`, and the current schema, + when `spark.sql.defaultPath` is empty. + +3. To override the initial value, set `spark.sql.defaultPath`, or set + `spark.sql.functionResolution.sessionOrder` to reorder the entries of the built-in value in + step 2. + +See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation rules. + +A `SET PATH` is scoped to the current session and is lost when the session ends. To revert +mid-session, run `SET PATH = DEFAULT_PATH`. Cloned sessions inherit the parent's path at clone +time; later changes in the child do not propagate back. + +Persistent views and SQL UDFs capture the path at `CREATE` time into the object's metadata. +Each invocation resolves the body against that frozen path, not the invoker's current path; +`current_schema()` and `current_path()` inside the body still return the invoker's context. + +The leading names `session` and `builtin` have special meaning in 2-part references; see +[Reserved system names](sql-ref-identifier.html#reserved-system-names). ### Syntax @@ -53,10 +88,37 @@ schema_name * **`DEFAULT_PATH`** - Expands to the value of the `spark.sql.defaultPath` configuration, or, when that configuration - is empty, to the spark-built-in default: `system.builtin`, `system.session`, and the current - schema, interleaved per `spark.sql.functionResolution.sessionOrder` - (`first` / `second` (default) / `last`). + Expands to the session's default path. The default path has two layers: + + 1. 
If `spark.sql.defaultPath` is set to a non-empty value, that value is parsed using the same + grammar as `SET PATH` (with one restriction: the `PATH` keyword is not allowed inside the + conf value, since it would be self-referential). + + The conf value is validated for syntax at the time it is set; an invalid value is rejected. + Static duplicates inside the conf are tolerated (unlike interactive `SET PATH`, which + rejects them) so a later `USE SCHEMA` cannot turn a previously valid default into a runtime + error. A `DEFAULT_PATH` token inside the conf value resolves to the spark-built-in default + below (cycle break) rather than recursing. + + 2. If `spark.sql.defaultPath` is empty (the factory setting), the spark-built-in default + applies: `system.builtin`, `system.session`, and the current schema + (`current_catalog.current_schema`), interleaved per + `spark.sql.functionResolution.sessionOrder`: + + | `sessionOrder` | Order produced | + | :------------- | :------------- | + | `first` | `system.session`, `system.builtin`, `current_schema` | + | `second` (default) | `system.builtin`, `system.session`, `current_schema` | + | `last` | `system.builtin`, `current_schema`, `system.session` | + + To change the default path, set `spark.sql.defaultPath` via any of the usual mechanisms + (`SET spark.sql.defaultPath = ...` at runtime, `--conf` on `spark-submit`, `SparkConf`, or + `spark-defaults.conf`); clear it with `RESET spark.sql.defaultPath` to return to the + spark-built-in default. + + `spark.sql.functionResolution.sessionOrder` and `spark.sql.legacy.persistentCatalogFirst` are + internal configurations intended for advanced use; ordinary workloads should leave them at + their defaults. * **`SYSTEM_PATH`** @@ -85,58 +147,15 @@ schema_name Identifier quoting follows the usual rules; backtick-quoted parts that contain a dot are preserved, for example `spark_catalog.` + `` `sch.b` ``. 
-### Description +### Semantics -* Duplicate entries are detected after expansion and raise `DUPLICATE_SQL_PATH_ENTRY`. - Comparisons honor the session's case sensitivity setting. -* `CURRENT_SCHEMA` and `CURRENT_DATABASE` are aliases and are flagged as a duplicate if both are - listed. +* Setting the path takes effect immediately. * Identifier case is preserved in storage and in `current_path()` output. -* Setting the path takes effect immediately. The change is scoped to the current session and is - reset by `RESET` or by closing the session. Cloned sessions inherit the parent's path - at clone time, but later changes in a child session do not propagate back. - -#### How `DEFAULT_PATH` is derived - -`DEFAULT_PATH` is what `SET PATH = DEFAULT_PATH` expands to, and it is also the effective path of -a session that has never issued `SET PATH`. It has two layers: - -1. If `spark.sql.defaultPath` is set to a non-empty value, that value is parsed using the same - grammar as `SET PATH` (with one restriction: the `PATH` keyword is not allowed inside the conf - value, since it would be self-referential). The parsed value is `DEFAULT_PATH`. - - The conf value is validated for syntax at the time it is set; an invalid value is rejected - with `Cannot modify the value of the SQL config 'spark.sql.defaultPath'`. Static duplicates - inside the conf are tolerated (unlike interactive `SET PATH`, which rejects them) so a later - `USE SCHEMA` cannot turn a previously valid default into a runtime error. - - A `DEFAULT_PATH` token *inside* the conf value resolves to the spark-built-in default below - (cycle break) rather than recursing. - -2. 
If `spark.sql.defaultPath` is empty (the factory default), `DEFAULT_PATH` is the - **spark-built-in default**: `system.builtin`, `system.session`, and the current schema - (`current_catalog.current_schema`), interleaved per - `spark.sql.functionResolution.sessionOrder`: - - | `sessionOrder` | Order produced | - | :------------- | :------------- | - | `first` | `system.session`, `system.builtin`, `current_schema` | - | `second` (default) | `system.builtin`, `system.session`, `current_schema` | - | `last` | `system.builtin`, `current_schema`, `system.session` | - -To change `DEFAULT_PATH`, set the conf via any of the usual mechanisms: - -* In a session: `SET spark.sql.defaultPath = system.session, system.builtin, current_schema;` -* In static configuration: pass `--conf spark.sql.defaultPath=...` to `spark-submit`, set it in - `SparkConf`, or add it to `spark-defaults.conf`. -* To return to the spark-built-in default, clear the conf with - `RESET spark.sql.defaultPath` (or set it to an empty string). - -`spark.sql.functionResolution.sessionOrder` and `spark.sql.legacy.persistentCatalogFirst` are -internal configurations intended for advanced use; ordinary workloads should leave them at their -defaults. See [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). +* Duplicate entries are detected after expansion and raise `DUPLICATE_SQL_PATH_ENTRY`. + Comparisons honor the session's case sensitivity setting. Because `CURRENT_DATABASE` is an + alias for `CURRENT_SCHEMA`, listing both is flagged as a duplicate. -#### Error conditions +### Error conditions | Condition | Cause | | :-------- | :---- | @@ -144,13 +163,6 @@ defaults. See [Reserved names and collisions](sql-ref-name-resolution.html#reser | `INVALID_SQL_PATH_SCHEMA_REFERENCE` | A `schema_name` was given with fewer than two parts. | | `DUPLICATE_SQL_PATH_ENTRY` | Two entries collapsed to the same concrete namespace after expansion. 
| -#### Reserved names - -`system` is reserved as a catalog name and `session` / `builtin` are discouraged as schema names -because they collide with the 2-part shortcuts for the system namespaces. See -[Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions) for -the details and for the internal configurations that tune the behavior when collisions exist. - ### Examples ```sql diff --git a/docs/sql-ref-syntax-aux-describe-function.md b/docs/sql-ref-syntax-aux-describe-function.md index bc23daebc16ef..3dca36833c3d4 100644 --- a/docs/sql-ref-syntax-aux-describe-function.md +++ b/docs/sql-ref-syntax-aux-describe-function.md @@ -27,7 +27,7 @@ function name, implementing class, and usage details. For [SQL user-defined functions](sql-ref-syntax-ddl-create-sql-function.html) the output describes the function signature (input parameters, return type/columns) and, with `EXTENDED`, the function body, characteristics, and the frozen -[SQL Path](sql-ref-name-resolution.html#sql-path) that was captured at creation time. +[SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) that was captured at creation time. If the optional `EXTENDED` option is specified, the basic metadata is returned along with the extended information. @@ -44,8 +44,10 @@ extended information. Specifies a name of an existing function. The function name follows the regular [name resolution](sql-ref-name-resolution.html#function-resolution) rules: unqualified - names walk the SQL Path; 2- and 3-part names may use the `system.builtin` / `system.session` - namespaces (or their shortcuts `builtin` / `session`). + names walk the SQL Path; 3-part names target the chosen `catalog.schema` directly + (including the system namespaces `system.builtin` and `system.session`); 2-part names that + lead with `builtin` or `session` follow a mini-path across the system namespace and the + current catalog. **Syntax:** `[ catalog_name. ] [ database_name. 
] function_name` diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index b86a7bafbd793..395fea1f0bcb3 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -277,7 +277,7 @@ DESC FORMATTED customer AS JSON; -- DESCRIBE EXTENDED on a view emits view-specific rows. When `spark.sql.path.enabled` is true, -- the output also includes the frozen `SQL Path` that the view body resolves against; see --- [Name Resolution](sql-ref-name-resolution.html#sql-path). +-- [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). SET spark.sql.path.enabled = true; SET PATH = spark_catalog.default, system.builtin; CREATE VIEW recent_customers AS diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 2c90c69c3fd59..7228fe6f85869 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -42,7 +42,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name > but the 2-part forms `session.x` and `builtin.x` are interpreted as the temporary and > built-in namespaces respectively, which hides a persistent database with one of these > names. See - > [Reserved names and collisions](sql-ref-name-resolution.html#reserved-names-and-collisions). + > [Reserved system names](sql-ref-identifier.html#reserved-system-names). * **IF NOT EXISTS** diff --git a/docs/sql-ref-syntax-ddl-create-sql-function.md b/docs/sql-ref-syntax-ddl-create-sql-function.md index 1f7da5f433def..a4bb2c1e3daa1 100644 --- a/docs/sql-ref-syntax-ddl-create-sql-function.md +++ b/docs/sql-ref-syntax-ddl-create-sql-function.md @@ -90,7 +90,7 @@ characteristic > Note: a SQL UDF captures the SQL Path that was in effect when `CREATE FUNCTION` ran. When the > function is invoked, the body resolves against that frozen path, not the invoker's current path. 
> Inside the body, `current_schema()` and `current_path()` still reflect the invoker's context. -> See [Name Resolution](sql-ref-name-resolution.html#sql-path). +> See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). > Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the > captured path. @@ -351,7 +351,7 @@ characteristic A SQL UDF captures the SQL Path that is in effect at `CREATE FUNCTION` time. The body resolves against that frozen path on every invocation, even if the caller's session has set a different -PATH. See [Name Resolution](sql-ref-name-resolution.html#sql-path). +PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). ```sql > SET spark.sql.path.enabled = true; diff --git a/docs/sql-ref-syntax-ddl-create-view.md b/docs/sql-ref-syntax-ddl-create-view.md index 59434b76c7e2a..d0fd9b98e5b6c 100644 --- a/docs/sql-ref-syntax-ddl-create-view.md +++ b/docs/sql-ref-syntax-ddl-create-view.md @@ -73,7 +73,7 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident > Note: a persistent view captures the SQL Path that was in effect when `CREATE VIEW` ran. When the > view is referenced, the body resolves against that frozen path, not the invoker's current path. -> See [Name Resolution](sql-ref-name-resolution.html#sql-path). +> See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). > Use [DESCRIBE EXTENDED](sql-ref-syntax-aux-describe-table.html) to inspect the captured path. * **create_view_clauses** @@ -153,7 +153,7 @@ CREATE TEMPORARY VIEW system.builtin.bad_temp AS SELECT 1; A persistent view captures the SQL Path that is in effect at `CREATE VIEW` time. The view body resolves against that frozen path on every reference, even when the caller's session has set a -different PATH. See [Name Resolution](sql-ref-name-resolution.html#sql-path). +different PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). 
```sql > SET spark.sql.path.enabled = true; From 8f6d8e2a50c2a6d01c285cd629938b04736f7e75 Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Fri, 22 May 2026 11:07:03 +0200 Subject: [PATCH 3/6] [SPARK-56984][DOCS][FOLLOWUP] Tighten SQL Path doc copy and accuracy Iterative copy-edit pass on the documentation introduced for the SQL Path feature, plus a handful of small accuracy fixes uncovered during review: - `sql-ref-name-resolution.md`: the page intro and the per-object-kind sections are slimmed down. The DML/queries vs DDL rule is folded into the `Unqualified (1 part)` paragraph. Per-object error references (`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`) live with the corresponding sections. - `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the Description is reworked into prose ("the initial value of PATH is DEFAULT_PATH ..."); the Syntax block now follows the rest of the Spark SQL reference (square brackets, alternation with braces, ellipsis for repetition, production name on its own line); the grammar commits only to two-level `catalog.schema` references; the `spark.sql.functionResolution.sessionOrder` configuration is de-emphasized; `RESET PATH` wording is dropped in favor of describing the actual revert mechanism (`SET PATH = DEFAULT_PATH`); two applied examples are added (appending a shared-UDF schema; dropping `system.session` to force explicit qualification). - `sql-ref-function-current-path.md`: the Syntax block matches the auto-generated style (`current_path()`), with the no-parens form noted briefly under Arguments. The "still works when disabled" disclaimer and per-page `spark.sql.path.enabled` toggle are removed. - `sql-ref-identifier.md`: the *Reserved system names* section is rewritten to describe the actual tie-breaker behavior (per-object name collisions, not whole-schema hiding) and to use real catalog names in examples instead of the meta-syntactic `current_catalog.x`. 
- `sql-ref-functions-builtin.md`: a one-paragraph intro explains that built-in functions live in `system.builtin` and can be referenced unambiguously via the fully qualified name. - `sql-ref-syntax-aux-describe-table.md`: the JSON schema gains a `sql_path` entry; the worked view example is generic (the SQL Path row appears in the output but is no longer the headline); the `DESC FORMATTED ... AS JSON` outputs are pretty-printed. - `sql-ref-syntax-ddl-create-view.md` / `sql-ref-syntax-ddl-create-sql-function.md`: the frozen-path note moves from the object-name parameter (where it didn't belong) to the query/expression parameter. Both pages now state that a persistent view / SQL UDF cannot reference temporary views, temporary functions, or session variables. - `sql-ref-syntax-ddl-drop-function.md` / `sql-ref-syntax-ddl-drop-view.md`: the parameter prose is shortened; DROP VIEW gains a worked example of using `session.v` to drop a temporary view that shadows a persistent one. The stale AnalysisException output is replaced with `Error: TABLE_OR_VIEW_NOT_FOUND`. - `sql-ref-syntax-ddl-create-database.md`: the discouraged-name note shrinks to a one-line pointer to *Reserved system names*. - `sql-ref-syntax.md`: TOC links `CREATE FUNCTION (SQL)` explicitly. - `sql-migration-guide.md`: the 4.1 -> 4.2 entry uses `spark_catalog.session.x` (a real catalog name) instead of the meta-syntactic `current_catalog.session.x`. No new behavior; documentation copy and accuracy only. 
--- docs/sql-migration-guide.md | 2 +- docs/sql-ref-function-current-path.md | 17 +--- docs/sql-ref-functions-builtin.md | 4 + docs/sql-ref-identifier.md | 24 +++--- docs/sql-ref-name-resolution.md | 18 ++--- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 79 +++++++++---------- docs/sql-ref-syntax-aux-describe-function.md | 4 +- docs/sql-ref-syntax-aux-describe-table.md | 63 +++++++++++++-- docs/sql-ref-syntax-ddl-create-database.md | 5 +- .../sql-ref-syntax-ddl-create-sql-function.md | 18 ++--- docs/sql-ref-syntax-ddl-create-view.md | 15 ++-- docs/sql-ref-syntax-ddl-drop-function.md | 23 ++---- docs/sql-ref-syntax-ddl-drop-view.md | 35 ++++---- docs/sql-ref-syntax.md | 3 +- 14 files changed, 156 insertions(+), 154 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 6a61914780bab..10b4652b5db04 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -31,7 +31,7 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. 
-- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the synthetic system namespace first and the current catalog second, so a persistent schema literally named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `current_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). +- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the synthetic system namespace first and the current catalog second, so a persistent schema literally named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). 
See [Reserved system names](sql-ref-identifier.html#reserved-system-names). - Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. - Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md index 085e7cd4defe7..c23b860072946 100644 --- a/docs/sql-ref-function-current-path.md +++ b/docs/sql-ref-function-current-path.md @@ -29,16 +29,11 @@ resolution. ```sql current_path() - -CURRENT_PATH ``` -Like `current_user`, `current_schema`, and `current_catalog`, `current_path` accepts an empty -argument list or no parentheses at all. - ### Arguments -This function takes no arguments. +This function takes no arguments. The parentheses may be omitted. ### Returns @@ -50,14 +45,9 @@ catalog-qualified current schema (`current_catalog.current_schema`) each time `current_path()` is evaluated, so subsequent `USE SCHEMA` statements are reflected without re-issuing `SET PATH`. -`current_path()` is a regular built-in function. 
It remains available, and returns the default -path, even when `spark.sql.path.enabled` is `false`. - ### Examples ```sql -> SET spark.sql.path.enabled = true; - > SELECT current_path(); system.builtin,system.session,spark_catalog.default @@ -86,11 +76,6 @@ path, even when `spark.sql.path.enabled` is `false`. > SET PATH = spark_catalog.other, system.builtin; > SELECT * FROM v_path; spark_catalog.other,system.builtin - --- current_path() still returns the default path when SET PATH is disabled. -> SET spark.sql.path.enabled = false; -> SELECT current_path(); - system.builtin,system.session,spark_catalog.default ``` ### Related Statements diff --git a/docs/sql-ref-functions-builtin.md b/docs/sql-ref-functions-builtin.md index 1912a1e577d59..22e52d0500c53 100644 --- a/docs/sql-ref-functions-builtin.md +++ b/docs/sql-ref-functions-builtin.md @@ -17,6 +17,10 @@ license: | limitations under the License. --- +All built-in functions live in the virtual schema `system.builtin`. They can always be referenced +unambiguously by their fully qualified name (for example `system.builtin.abs`), regardless of any +user-defined function that may share the same name. + ### Aggregate Functions {% include_api_gen generated-agg-funcs-table.html %} #### Examples diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index c69de9959a4e0..e185ce04e8db8 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -60,20 +60,16 @@ catalog or schema names. | Name | Position | Notes | | :--- | :------- | :---- | | `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Not loadable as a v2 catalog plugin; `spark.sql.catalog.system = ...` is unsupported, and the current catalog cannot be `system`. | -| `builtin` | schema | A persistent schema literally named `builtin` is allowed but discouraged: `builtin.x` follows the mini-path below. To reach a persistent `builtin` schema, use `current_catalog.builtin.x`. 
| -| `session` | schema | A persistent schema literally named `session` is allowed but discouraged: `session.x` follows the mini-path below. To reach a persistent `session` schema, use `current_catalog.session.x`. | - -For 2-part references starting with `builtin` or `session`, Spark chooses the implicit catalog -using the mini-path below and returns the first match: - -| `spark.sql.legacy.persistentCatalogFirst` | Mini-path tried in order | -| :-------------------------------------- | :----------------------- | -| `false` (default) | `system.session.x` / `system.builtin.x`, then `current_catalog.session.x` / `current_catalog.builtin.x` | -| `true` (legacy) | `current_catalog.session.x` / `current_catalog.builtin.x`, then `system.session.x` / `system.builtin.x` | - -3-part names skip the mini-path: `system.session.x` and `system.builtin.x` always target the -system namespace, and `current_catalog.session.x` / `current_catalog.builtin.x` always target the -persistent schema. +| `builtin` | schema | A persistent schema literally named `builtin` is allowed but discouraged because it collides with `system.builtin`. | +| `session` | schema | A persistent schema literally named `session` is allowed but discouraged because it collides with `system.session`. | + +A partially qualified 2-part reference like `builtin.x` or `session.x` resolves to +`system.builtin.x` / `system.session.x` if such an object exists, and otherwise falls back to +the same name in the current catalog. So an object in a persistent `builtin` or `session` +schema is shadowed only when an object of the same name exists in the corresponding system +namespace. The shadowed object stays reachable via its fully qualified 3-part name (for example +`spark_catalog.session.x`). Set `spark.sql.legacy.persistentCatalogFirst` to `true` to flip the +preference: the current catalog is tried first and the system namespace becomes the fallback.
The `system.builtin` and `system.session` namespaces are described in [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). Temporary objects in `system.session` are diff --git a/docs/sql-ref-name-resolution.md b/docs/sql-ref-name-resolution.md index be20bf3872f95..42e9149041068 100644 --- a/docs/sql-ref-name-resolution.md +++ b/docs/sql-ref-name-resolution.md @@ -50,7 +50,7 @@ In detail, resolution of identifiers to a specific reference follows these rules 1. **Parameterless function reference** - If the identifier is unqualified and matches `current_user`, `current_date`, or `current_timestamp`: Resolve it as one of these functions. + If the identifier is unqualified and matches `current_user`, `current_date`, `current_time`, `current_timestamp`, or `current_path`: Resolve it as one of these functions. 1. **Column DEFAULT specification** @@ -280,8 +280,8 @@ returning the first match: | `spark.sql.legacy.persistentCatalogFirst` | Mini-path tried in order | | :-------------------------------------- | :----------------------- | -| `false` (default) | `system.session.x` / `system.builtin.x`, then `current_catalog.session.x` / `current_catalog.builtin.x` | -| `true` (legacy) | `current_catalog.session.x` / `current_catalog.builtin.x`, then `system.session.x` / `system.builtin.x` | +| `false` (default) | the system namespace (`system.session.x` / `system.builtin.x`), then the current catalog's `session.x` / `builtin.x` | +| `true` (legacy) | the current catalog's `session.x` / `builtin.x`, then the system namespace (`system.session.x` / `system.builtin.x`) | ### Unqualified (1 part) — `object` @@ -371,7 +371,6 @@ the effective search path, for example [TABLE_OR_VIEW_NOT_FOUND] The table or view `cte` cannot be found. 
-- PATH drives unqualified relation lookup order -> SET spark.sql.path.enabled = true; > CREATE SCHEMA db_a; > CREATE SCHEMA db_b; > CREATE TABLE db_a.t USING parquet AS SELECT 1 AS v; @@ -392,14 +391,8 @@ the effective search path, for example ## Function resolution -A function reference is recognized by the trailing parentheses. It can resolve to: - -- A built-in function (`system.builtin`), including functions injected by - `SparkSessionExtensions`. -- A temporary function in the current session (`system.session`). -- A persistent function in a catalog schema. - -Resolution follows [Object name resolution](#object-name-resolution). +A function reference is recognized by the trailing parentheses, and follows +[Object name resolution](#object-name-resolution). If the function cannot be resolved, Spark raises `UNRESOLVED_ROUTINE`. The error includes the effective search path, for example @@ -456,7 +449,6 @@ effective search path, for example > DROP TEMPORARY FUNCTION abs; -- PATH controls unqualified routine lookup order -> SET spark.sql.path.enabled = true; > CREATE SCHEMA path_a; > CREATE SCHEMA path_b; > CREATE FUNCTION path_a.pick() RETURNS INT RETURN 10; diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md index 1d049c80388fc..be628d9f41f8c 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -41,20 +41,11 @@ The path can include two virtual namespaces in the `system` catalog: raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and [`current_path()`](sql-ref-function-current-path.html) still use the default path. -The path within a session: - -1. The initial value of `PATH` is `DEFAULT_PATH`. -2. 
`DEFAULT_PATH` is either: - - - the value of `spark.sql.defaultPath`, when that configuration is set; or - - a built-in value composed of `system.builtin`, `system.session`, and the current schema, - when `spark.sql.defaultPath` is empty. - -3. To override the initial value, set `spark.sql.defaultPath`, or set - `spark.sql.functionResolution.sessionOrder` to reorder the entries of the built-in value in - step 2. - -See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation rules. +The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is either the value of +`spark.sql.defaultPath`, or, when that configuration is empty, a built-in value composed of +`system.builtin`, `system.session`, and the current schema. To override, set +`spark.sql.defaultPath`. See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation +rules. A `SET PATH` is scoped to the current session and is lost when the session ends. To revert mid-session, run `SET PATH = DEFAULT_PATH`. Cloned sessions inherit the parent's path at clone @@ -70,18 +61,18 @@ The leading names `session` and `builtin` have special meaning in 2-part referen ### Syntax ```sql -SET PATH = path_element [, path_element ...] +SET PATH = path_element [ , ... ] path_element - : DEFAULT_PATH - | SYSTEM_PATH - | PATH - | CURRENT_SCHEMA - | CURRENT_DATABASE - | schema_name + { DEFAULT_PATH | + SYSTEM_PATH | + PATH | + CURRENT_SCHEMA | + CURRENT_DATABASE | + schema_name } schema_name - : { catalog_name . namespace_name [ . namespace_name ...] } + catalog_name . schema_name ``` ### Parameters @@ -102,28 +93,16 @@ schema_name 2. 
If `spark.sql.defaultPath` is empty (the factory setting), the spark-built-in default applies: `system.builtin`, `system.session`, and the current schema - (`current_catalog.current_schema`), interleaved per - `spark.sql.functionResolution.sessionOrder`: - - | `sessionOrder` | Order produced | - | :------------- | :------------- | - | `first` | `system.session`, `system.builtin`, `current_schema` | - | `second` (default) | `system.builtin`, `system.session`, `current_schema` | - | `last` | `system.builtin`, `current_schema`, `system.session` | + (`current_catalog.current_schema`), in that order. To change the default path, set `spark.sql.defaultPath` via any of the usual mechanisms (`SET spark.sql.defaultPath = ...` at runtime, `--conf` on `spark-submit`, `SparkConf`, or `spark-defaults.conf`); clear it with `RESET spark.sql.defaultPath` to return to the spark-built-in default. - `spark.sql.functionResolution.sessionOrder` and `spark.sql.legacy.persistentCatalogFirst` are - internal configurations intended for advanced use; ordinary workloads should leave them at - their defaults. - * **`SYSTEM_PATH`** - Expands to the two system namespaces `system.builtin` and `system.session`, in the order - configured by `spark.sql.functionResolution.sessionOrder`. + Expands to `system.builtin, system.session`. * **`PATH`** @@ -140,12 +119,12 @@ schema_name * **`schema_name`** - An explicit catalog-qualified schema reference. At least two parts are required - (`catalog.namespace`). The catalog and schema do not need to exist at the time of `SET PATH`; - non-existent entries are silently skipped during name resolution. + An explicit catalog-qualified schema reference (`catalog.schema`). Both parts are required. + The catalog and schema do not need to exist at the time of `SET PATH`; non-existent entries + are silently skipped during name resolution. 
- Identifier quoting follows the usual rules; backtick-quoted parts that contain a dot are - preserved, for example `spark_catalog.` + `` `sch.b` ``. + Identifier quoting follows the usual rules. Backtick-quoted parts that contain a dot are + preserved, for example ``spark_catalog.`sch.b` ``. ### Semantics @@ -216,6 +195,24 @@ schema_name system.session,system.builtin,spark_catalog.default > RESET spark.sql.defaultPath; +-- Append a schema of shared UDFs so callers do not have to qualify them. +> CREATE SCHEMA spark_catalog.shared_udfs; +> CREATE FUNCTION spark_catalog.shared_udfs.to_iso_date(d DATE) RETURNS STRING + RETURN date_format(d, 'yyyy-MM-dd'); +> SET PATH = PATH, spark_catalog.shared_udfs; +> SELECT to_iso_date(DATE'2026-05-22'); + 2026-05-22 + +-- Drop system.session from the path to force temporary objects to be spelled out. +> CREATE TEMPORARY FUNCTION revenue() RETURNS INT RETURN 42; +> SELECT revenue(); -- resolves via the default path + 42 +> SET PATH = system.builtin, current_schema; +> SELECT revenue(); -- now must be qualified + [UNRESOLVED_ROUTINE] `revenue` ... +> SELECT session.revenue(); + 42 + -- Error cases. > SET PATH = spark_catalog.default, spark_catalog.default; [DUPLICATE_SQL_PATH_ENTRY] diff --git a/docs/sql-ref-syntax-aux-describe-function.md b/docs/sql-ref-syntax-aux-describe-function.md index 3dca36833c3d4..2da1b9466fc23 100644 --- a/docs/sql-ref-syntax-aux-describe-function.md +++ b/docs/sql-ref-syntax-aux-describe-function.md @@ -151,9 +151,7 @@ DESC FUNCTION getemps; +--------------------------------------+ -- DESC FUNCTION EXTENDED for a SQL UDF adds the body, the characteristic clauses, --- the captured SQL configs, the owner, the create time, and the frozen SQL Path --- (when `spark.sql.path.enabled` is true). -SET spark.sql.path.enabled = true; +-- the captured SQL configs, the owner, the create time, and the frozen SQL Path. 
SET PATH = spark_catalog.default, system.builtin; CREATE FUNCTION frozen_fn() RETURNS INT COMMENT 'demo function' diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index 395fea1f0bcb3..b535110c8f9ef 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -105,6 +105,10 @@ to return the metadata pertaining to a partition or column respectively. "view_schema_mode": "", "view_catalog_and_namespace": "", "view_query_output_columns": ["col1", "col2"], + // SQL Path captured at the time of permanent view creation + "sql_path": [ + {"catalog_name": "", "namespace": [""]} + ], // Spark SQL configurations captured at the time of permanent view creation "view_creation_spark_configuration": { "conf1": "", @@ -272,13 +276,29 @@ DESCRIBE customer salesdb.customer.name; +---------+----------+ -- Returns the table metadata in JSON format. +-- (Formatted for readability; the actual output is on a single line.) DESC FORMATTED customer AS JSON; -{"table_name":"customer","catalog_name":"spark_catalog","schema_name":"default","namespace":["default"],"columns":[{"name":"cust_id","type":{"name":"integer"},"nullable":true},{"name":"name","type":{"name":"string"},"comment":"Short name","nullable":true},{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],"location": "file:/tmp/salesdb.db/custom...","created_time":"2020-04-07T14:05:43Z","last_access":"UNKNOWN","created_by":"None","type":"MANAGED","provider":"parquet","partition_provider":"Catalog","partition_columns":["state"]} - --- DESCRIBE EXTENDED on a view emits view-specific rows. When `spark.sql.path.enabled` is true, --- the output also includes the frozen `SQL Path` that the view body resolves against; see --- [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). 
-SET spark.sql.path.enabled = true; +{ + "table_name": "customer", + "catalog_name": "spark_catalog", + "schema_name": "default", + "namespace": ["default"], + "columns": [ + {"name": "cust_id", "type": {"name": "integer"}, "nullable": true}, + {"name": "name", "type": {"name": "string"}, "comment": "Short name", "nullable": true}, + {"name": "state", "type": {"name": "varchar", "length": 20}, "nullable": true} + ], + "location": "file:/tmp/salesdb.db/custom...", + "created_time": "2020-04-07T14:05:43Z", + "last_access": "UNKNOWN", + "created_by": "None", + "type": "MANAGED", + "provider": "parquet", + "partition_provider": "Catalog", + "partition_columns": ["state"] +} + +-- DESCRIBE EXTENDED on a view emits view-specific rows. SET PATH = spark_catalog.default, system.builtin; CREATE VIEW recent_customers AS SELECT cust_id, name FROM customer WHERE cust_id > 1000; @@ -302,6 +322,37 @@ DESCRIBE EXTENDED recent_customers; | View Query Output Columns| [`cust_id`, `name`] | | | SQL Path| spark_catalog.default, system.builtin| | +----------------------------+---------------------------------------+--------+ + +-- The same metadata in JSON form. +-- (Formatted for readability; the actual output is on a single line.) 
+DESCRIBE EXTENDED recent_customers AS JSON; +{ + "table_name": "recent_customers", + "catalog_name": "spark_catalog", + "schema_name": "default", + "namespace": ["default"], + "columns": [ + {"name": "cust_id", "type": {"name": "int"}, "nullable": true}, + {"name": "name", "type": {"name": "string", "collation": "UTF8_BINARY"}, "nullable": true} + ], + "created_time": "2026-05-22T10:00:00Z", + "last_access": "UNKNOWN", + "created_by": "Spark 4.2.0", + "type": "VIEW", + "collation": "UTF8_BINARY", + "view_text": "SELECT cust_id, name FROM customer WHERE cust_id > 1000", + "view_original_text": "SELECT cust_id, name FROM customer WHERE cust_id > 1000", + "view_schema_mode": "COMPENSATION", + "view_catalog_and_namespace": "spark_catalog.default", + "view_query_output_columns": ["cust_id", "name"], + "sql_path": [ + {"catalog_name": "spark_catalog", "namespace": ["default"]}, + {"catalog_name": "system", "namespace": ["builtin"]} + ], + "view_creation_spark_configuration": { + "spark.sql.ansi.enabled": "true" + } +} ``` ### Related Statements diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 7228fe6f85869..9125ca78dc9ee 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -38,10 +38,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the name of the database to be created. - > Note: avoid naming a database `session` or `builtin`. The catalog API accepts both names, - > but the 2-part forms `session.x` and `builtin.x` are interpreted as the temporary and - > built-in namespaces respectively, which hides a persistent database with one of these - > names. See + > Note: avoid naming a database `session` or `builtin`; see > [Reserved system names](sql-ref-identifier.html#reserved-system-names). 
* **IF NOT EXISTS** diff --git a/docs/sql-ref-syntax-ddl-create-sql-function.md b/docs/sql-ref-syntax-ddl-create-sql-function.md index a4bb2c1e3daa1..f1f1c8ce851a0 100644 --- a/docs/sql-ref-syntax-ddl-create-sql-function.md +++ b/docs/sql-ref-syntax-ddl-create-sql-function.md @@ -87,13 +87,6 @@ characteristic The function name must be unique among all routines (procedures and functions) in its schema. -> Note: a SQL UDF captures the SQL Path that was in effect when `CREATE FUNCTION` ran. When the -> function is invoked, the body resolves against that frozen path, not the invoker's current path. -> Inside the body, `current_schema()` and `current_path()` still reflect the invoker's context. -> See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). -> Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the -> captured path. - - **function_parameter** Specifies a parameter of the function. @@ -147,6 +140,15 @@ characteristic - [Aggregate functions](sql-ref-functions-builtin.md#aggregate-functions) - [Window functions](sql-ref-functions-builtin.md#analytic-window-functions) - [Ranking functions](sql-ref-functions-builtin.md#ranking-window-functions) + + A persistent SQL UDF cannot reference temporary views, temporary functions, or session + variables. + + The SQL Path in effect at `CREATE FUNCTION` time is captured into the function's metadata; the + body resolves against that frozen path on every invocation, not the invoker's current path. + `current_schema()` and `current_path()` inside the body still return the invoker's context. + Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the + captured path. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). - Row producing functions such as `explode` Within the body of the function you can refer to parameter by its unqualified name or by qualifying the parameter with the function name. 
@@ -354,8 +356,6 @@ against that frozen path on every invocation, even if the caller's session has s PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). ```sql -> SET spark.sql.path.enabled = true; - > CREATE SCHEMA path_a; > CREATE SCHEMA path_b; > CREATE TABLE path_a.t USING parquet AS SELECT 10 AS id; diff --git a/docs/sql-ref-syntax-ddl-create-view.md b/docs/sql-ref-syntax-ddl-create-view.md index d0fd9b98e5b6c..f6fc6c0e85c75 100644 --- a/docs/sql-ref-syntax-ddl-create-view.md +++ b/docs/sql-ref-syntax-ddl-create-view.md @@ -71,11 +71,6 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident The fully qualified view name must be unique within its schema. -> Note: a persistent view captures the SQL Path that was in effect when `CREATE VIEW` ran. When the -> view is referenced, the body resolves against that frozen path, not the invoker's current path. -> See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). -> Use [DESCRIBE EXTENDED](sql-ref-syntax-aux-describe-table.html) to inspect the captured path. - * **create_view_clauses** These clauses are optional and order insensitive. It can be of following formats. @@ -96,8 +91,16 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident The default is `WITH SCHEMA COMPENSATION`. * **query** + A [SELECT](sql-ref-syntax-qry-select.html) statement that constructs the view from base tables or other views. + A persistent view cannot reference temporary views, temporary functions, or session variables. + + For a persistent view, the SQL Path in effect at `CREATE VIEW` time is captured into the view's + metadata; the body resolves against that frozen path on every reference, not the invoker's + current path. Use [DESCRIBE EXTENDED](sql-ref-syntax-aux-describe-table.html) to inspect the + captured path. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). 
+ ### Examples ```sql @@ -156,8 +159,6 @@ resolves against that frozen path on every reference, even when the caller's ses different PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). ```sql -> SET spark.sql.path.enabled = true; - > CREATE SCHEMA views_a; > CREATE SCHEMA views_b; > CREATE TABLE views_a.t USING parquet AS SELECT 1 AS id; diff --git a/docs/sql-ref-syntax-ddl-drop-function.md b/docs/sql-ref-syntax-ddl-drop-function.md index ebf5b01404a19..b9272e34b81d6 100644 --- a/docs/sql-ref-syntax-ddl-drop-function.md +++ b/docs/sql-ref-syntax-ddl-drop-function.md @@ -34,26 +34,13 @@ DROP [ TEMPORARY ] FUNCTION [ IF EXISTS ] function_name * **function_name** - Specifies the name of an existing function. Whether `function_name` refers to a temporary - function or a persistent function is selected by the `TEMPORARY` keyword, not by the - identifier — without `TEMPORARY` the name always targets a persistent function (even if - you write `session.f` or `system.session.f`, in which case Spark looks for a persistent - function in a schema literally named `session`). + Specifies the name of an existing function. With `TEMPORARY`, the name may optionally be + qualified with `session` or `system.session`. Without `TEMPORARY`, the name may optionally be + qualified with a database (or a catalog and database) and resolves to a persistent function. - * With `TEMPORARY`: the name may be optionally qualified with the session schema - (`session` or `system.session`); for example, `DROP TEMPORARY FUNCTION f`, - `DROP TEMPORARY FUNCTION session.f`, and `DROP TEMPORARY FUNCTION system.session.f` all - drop the same temporary function. + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` - **Syntax:** `[ { session | system.session } . ] function_name` - - * Without `TEMPORARY`: the name may be optionally qualified with a database name (or a catalog - and database) and resolves to a persistent function in that schema. - - **Syntax:** `[ catalog_name. 
] [ database_name. ] function_name` - - The built-in namespace `system.builtin` cannot be dropped: `DROP FUNCTION system.builtin.abs` - raises `FORBIDDEN_OPERATION`. + Functions in `system.builtin` cannot be dropped. * **TEMPORARY** diff --git a/docs/sql-ref-syntax-ddl-drop-view.md b/docs/sql-ref-syntax-ddl-drop-view.md index eca58c840e436..16f711a9074eb 100644 --- a/docs/sql-ref-syntax-ddl-drop-view.md +++ b/docs/sql-ref-syntax-ddl-drop-view.md @@ -23,14 +23,6 @@ license: | `DROP VIEW` removes the metadata associated with a specified view from the catalog. -Unlike `DROP FUNCTION`, `DROP VIEW` has no `TEMPORARY` keyword. The choice between a temporary -view and a persistent view is driven by the identifier alone: - -* An unqualified `view_name` first matches a temporary view, then a persistent view in - the current schema. -* A name qualified with `session` or `system.session` targets only the temporary view scope. -* Any other qualifier targets only persistent views. - ### Syntax ```sql @@ -45,18 +37,11 @@ DROP VIEW [ IF EXISTS ] view_identifier * **view_identifier** - Specifies the view name to be dropped. - - * For a **persistent** view the name may be optionally qualified with a database name - (or a catalog and database). - - **Syntax:** `[ catalog_name. ] [ database_name. ] view_name` + Specifies the view name to be dropped. The name may be optionally qualified with a database + name (or a catalog and database). A name qualified with `session` or `system.session` + targets a temporary view. - * For a **temporary** view the name may be optionally qualified with the session schema - (`session` or `system.session`); for example, `DROP VIEW v`, `DROP VIEW session.v`, and - `DROP VIEW system.session.v` all drop the same temporary view. - - **Syntax:** `[ { session | system.session } . ] view_name` + **Syntax:** `[ catalog_name. ] [ database_name. 
] view_name` ### Examples @@ -70,12 +55,20 @@ DROP VIEW userdb.employeeView; -- Assumes a view named `employeeView` does not exist. -- Throws exception DROP VIEW employeeView; -Error: org.apache.spark.sql.AnalysisException: Table or view not found: employeeView; -(state=,code=0) +Error: TABLE_OR_VIEW_NOT_FOUND -- Assumes a view named `employeeView` does not exist,Try with IF EXISTS -- this time it will not throw exception DROP VIEW IF EXISTS employeeView; + +-- A temporary view that shadows a persistent view with the same name. +-- An unqualified DROP VIEW drops the temporary view first; qualifying with `session` +-- always targets the temporary view explicitly. +CREATE VIEW default.recent_orders AS SELECT * FROM orders WHERE order_date > current_date - 7; +CREATE TEMPORARY VIEW recent_orders AS SELECT * FROM orders WHERE order_date = current_date; + +DROP VIEW session.recent_orders; -- drops the temporary view +DROP VIEW default.recent_orders; -- drops the persistent view ``` ### Related Statements diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index ef1597cd96b41..1e0ea4a2b8d64 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -29,7 +29,8 @@ Data Definition Statements are used to create or modify the structure of databas * [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html) * [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) * [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html) - * [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html) + * [CREATE FUNCTION (External)](sql-ref-syntax-ddl-create-function.html) + * [CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html) * [CREATE TABLE](sql-ref-syntax-ddl-create-table.html) * [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) * [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) From f1c1ba12b602aac029b566b530a546c83c9e1dec Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Fri, 22 May 2026 11:17:13 +0200 Subject: [PATCH 4/6] [SPARK-56984][DOCS][FOLLOWUP] 
Self-review accuracy fixes Four small accuracy nits caught in a self-review pass: 1. SET PATH grammar: removed the self-referential `schema_name` production by inlining `catalog_name . schema_name` directly into `path_element`. The previous form defined `schema_name` recursively with itself, which is hard to read literally. 2. `SYSTEM_PATH` parameter: dropped the explicit ordering "expands to `system.builtin, system.session`". The actual order depends on `spark.sql.functionResolution.sessionOrder`, which the rest of the page de-emphasizes. Now reads "Expands to the two system namespaces, `system.builtin` and `system.session`." 3. SET PATH Description: "To revert mid-session, run `SET PATH = DEFAULT_PATH`" overstated the operation. The statement stores a snapshot of `DEFAULT_PATH` into `_sessionPath` rather than restoring the "never-set" state, so a later change to `spark.sql.defaultPath` is not picked up. Reworded to "re-apply the current default path mid-session" with a brief parenthetical that names the snapshot behavior. 4. Reserved system names: "spark.sql.catalog.system = ... is unsupported" was correct but suggested a clean rejection. In fact the v2 catalog loader does not special-case `system` and registering it gives undefined results, per the CatalogManager comment. Now reads "is unsupported and may yield undefined results". Documentation only; no behavior changes. --- docs/sql-ref-identifier.md | 2 +- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 15 +++++++-------- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index e185ce04e8db8..f9501626d8077 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -59,7 +59,7 @@ catalog or schema names. | Name | Position | Notes | | :--- | :------- | :---- | -| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. 
Not loadable as a v2 catalog plugin; `spark.sql.catalog.system = ...` is unsupported, and the current catalog cannot be `system`. | +| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Spark does not load `system` through the v2 catalog API; setting `spark.sql.catalog.system = ...` is unsupported and may yield undefined results. The current catalog cannot be `system`. | | `builtin` | schema | A persistent schema literally named `builtin` is allowed but discouraged because it collides with `system.builtin`. | | `session` | schema | A persistent schema literally named `session` is allowed but discouraged because it collides with `system.session`. | diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md index be628d9f41f8c..729344f38aae1 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -47,9 +47,11 @@ The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is ei `spark.sql.defaultPath`. See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation rules. -A `SET PATH` is scoped to the current session and is lost when the session ends. To revert -mid-session, run `SET PATH = DEFAULT_PATH`. Cloned sessions inherit the parent's path at clone -time; later changes in the child do not propagate back. +A `SET PATH` is scoped to the current session and is lost when the session ends. To re-apply +the current default path mid-session, run `SET PATH = DEFAULT_PATH`. (This stores a snapshot of +`DEFAULT_PATH` at the moment of the statement; later changes to `spark.sql.defaultPath` are not +picked up automatically.) Cloned sessions inherit the parent's path at clone time; later changes +in the child do not propagate back. Persistent views and SQL UDFs capture the path at `CREATE` time into the object's metadata. 
Each invocation resolves the body against that frozen path, not the invoker's current path; @@ -69,10 +71,7 @@ path_element PATH | CURRENT_SCHEMA | CURRENT_DATABASE | - schema_name } - -schema_name - catalog_name . schema_name + catalog_name . schema_name } ``` ### Parameters @@ -102,7 +101,7 @@ schema_name * **`SYSTEM_PATH`** - Expands to `system.builtin, system.session`. + Expands to the two system namespaces, `system.builtin` and `system.session`. * **`PATH`** From 5a72819158a28077f83c2d6b73cc96e6ef22f487 Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Fri, 22 May 2026 11:25:19 +0200 Subject: [PATCH 5/6] [SPARK-56984][DOCS][FOLLOWUP] Editorial pass on SQL Path docs Plain-English copy edit on the SQL Path documentation. No content changes; word choices favor common terms over technical jargon and remove minor stylistic inconsistencies. - `synthetic` -> `virtual` for the `system` catalog and namespaces (matches usage elsewhere in the doc set). - `is gated by` -> `is controlled by` for `spark.sql.path.enabled`. - `how it is gated` -> `how to enable it`. - `A SET PATH is scoped` -> `The effect of SET PATH is scoped` (avoids the awkward indefinite-article noun). - `(cycle break) rather than recursing` -> `to avoid a cycle, rather than recursing`. - `live marker` -> `re-evaluated each time` in a code comment. - `spelled out` -> `qualified explicitly` in a code comment. - `flip the preference` -> `reverse the preference`. - `may yield undefined results` -> `produces undefined results`. - `literally named X` -> `named X` (drop the redundant adverb). - `extension-injected functions` -> `functions injected through SparkSessionExtensions` in the migration guide. Documentation only; no behavior changes. 
--- docs/sql-migration-guide.md | 2 +- docs/sql-ref-function-current-path.md | 2 +- docs/sql-ref-identifier.md | 10 +++++----- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 20 +++++++++---------- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 10b4652b5db04..c50124bdbc662 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -31,7 +31,7 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. -- Since Spark 4.2, the synthetic `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and extension-injected functions; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. 
As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the synthetic system namespace first and the current catalog second, so a persistent schema literally named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). +- Since Spark 4.2, the virtual `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and functions injected through `SparkSessionExtensions`; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the system namespace first and the current catalog second, so a persistent schema named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). 
- Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. - Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md index c23b860072946..06cb42932b797 100644 --- a/docs/sql-ref-function-current-path.md +++ b/docs/sql-ref-function-current-path.md @@ -21,7 +21,7 @@ license: | Returns the effective SQL Path for the current session as a comma-separated string of qualified namespace names. See [`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html) for a -description of what the path is, how it is gated, and how to change it, and +description of what the path is, how to enable it, and how to change it, and [Name Resolution](sql-ref-name-resolution.html) for how the path drives unqualified name resolution. diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index f9501626d8077..609a4cfa29118 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -59,17 +59,17 @@ catalog or schema names. 
| Name | Position | Notes | | :--- | :------- | :---- | -| `system` | catalog | Synthetic catalog hosting `system.builtin` and `system.session`. Spark does not load `system` through the v2 catalog API; setting `spark.sql.catalog.system = ...` is unsupported and may yield undefined results. The current catalog cannot be `system`. | -| `builtin` | schema | A persistent schema literally named `builtin` is allowed but discouraged because it collides with `system.builtin`. | -| `session` | schema | A persistent schema literally named `session` is allowed but discouraged because it collides with `system.session`. | +| `system` | catalog | Virtual catalog hosting `system.builtin` and `system.session`. Spark does not load `system` through the v2 catalog API; setting `spark.sql.catalog.system = ...` is unsupported and produces undefined results. The current catalog cannot be `system`. | +| `builtin` | schema | A persistent schema named `builtin` is allowed but discouraged because it collides with `system.builtin`. | +| `session` | schema | A persistent schema named `session` is allowed but discouraged because it collides with `system.session`. | An unqualified 2-part reference like `builtin.x` or `session.x` resolves to `system.builtin.x` / `system.session.x` if such an object exists, and otherwise falls back to the same name in the current catalog. So an object in a persistent `builtin` or `session` schema is shadowed only when an object of the same name exists in the corresponding system namespace. The shadowed object stays reachable via its fully qualified 3-part name (for example -`spark_catalog.session.x`). Set `spark.sql.legacy.persistentCatalogFirst` to `true` to flip the -preference: the current catalog is tried first and the system namespace becomes the fallback. +`spark_catalog.session.x`). Set `spark.sql.legacy.persistentCatalogFirst` to `true` to reverse +the preference: the current catalog is tried first and the system namespace becomes the fallback. 
The `system.builtin` and `system.session` namespaces are described in [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). Temporary objects in `system.session` are diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md index 729344f38aae1..3921615d702b2 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -37,8 +37,8 @@ The path can include two virtual namespaces in the `system` catalog: - `system.session` — temporary views, temporary functions, and session variables in the current session. -`SET PATH` is gated by `spark.sql.path.enabled`. When it is `false` (the default), `SET PATH` -raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and +`SET PATH` is controlled by `spark.sql.path.enabled`. When it is `false` (the default), +`SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and [`current_path()`](sql-ref-function-current-path.html) still use the default path. The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is either the value of @@ -47,11 +47,11 @@ The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is ei `spark.sql.defaultPath`. See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation rules. -A `SET PATH` is scoped to the current session and is lost when the session ends. To re-apply -the current default path mid-session, run `SET PATH = DEFAULT_PATH`. (This stores a snapshot of -`DEFAULT_PATH` at the moment of the statement; later changes to `spark.sql.defaultPath` are not -picked up automatically.) Cloned sessions inherit the parent's path at clone time; later changes -in the child do not propagate back. +The effect of `SET PATH` is scoped to the current session and is lost when the session ends. To +re-apply the current default path mid-session, run `SET PATH = DEFAULT_PATH`. 
(This stores a +snapshot of `DEFAULT_PATH` at the moment of the statement; later changes to +`spark.sql.defaultPath` are not picked up automatically.) Cloned sessions inherit the parent's +path at clone time; later changes in the child do not propagate back. Persistent views and SQL UDFs capture the path at `CREATE` time into the object's metadata. Each invocation resolves the body against that frozen path, not the invoker's current path; @@ -88,7 +88,7 @@ path_element Static duplicates inside the conf are tolerated (unlike interactive `SET PATH`, which rejects them) so a later `USE SCHEMA` cannot turn a previously valid default into a runtime error. A `DEFAULT_PATH` token inside the conf value resolves to the spark-built-in default - below (cycle break) rather than recursing. + below to avoid a cycle, rather than recursing. 2. If `spark.sql.defaultPath` is empty (the factory setting), the spark-built-in default applies: `system.builtin`, `system.session`, and the current schema @@ -178,7 +178,7 @@ path_element > SELECT current_path(); spark_catalog.default,system.builtin,spark_catalog.analytics --- CURRENT_SCHEMA is a live marker; USE SCHEMA updates the effective path. +-- CURRENT_SCHEMA is re-evaluated each time; USE SCHEMA updates the effective path. > SET PATH = CURRENT_SCHEMA, system.builtin; > USE spark_catalog.finance; > SELECT current_path(); @@ -202,7 +202,7 @@ path_element > SELECT to_iso_date(DATE'2026-05-22'); 2026-05-22 --- Drop system.session from the path to force temporary objects to be spelled out. +-- Drop system.session from the path to force temporary objects to be qualified explicitly. 
> CREATE TEMPORARY FUNCTION revenue() RETURNS INT RETURN 42; > SELECT revenue(); -- resolves via the default path 42 From d04ce7afb903e15e6edba1703a7b767cd2e8e6f4 Mon Sep 17 00:00:00 2001 From: Serge Rielau Date: Fri, 22 May 2026 14:08:50 +0200 Subject: [PATCH 6/6] [SPARK-56984][DOCS][FOLLOWUP] Address review comments from @cloud-fan Seven items from the PR review: 1. `sql-ref-syntax-ddl-create-sql-function.md`: the new frozen-path paragraphs were inserted inside the bulleted list of disallowed expression types, orphaning the `Row producing functions such as explode` bullet (Kramdown rendered it as two separate lists with body paragraphs in between). Moved the paragraphs after the list. 2. `sql-ref-syntax-aux-conf-mgmt-set-path.md`: the `schema_name` parameter previously said "Both parts are required" (i.e. exactly two), but the implementation accepts multi-level namespaces (`SetPathSuite` test "multi-level namespace (3+ parts) is accepted", and the `INVALID_SQL_PATH_SCHEMA_REFERENCE` error message itself documents the allowance). Updated to "`catalog.schema` or, for catalogs with multi-level namespaces, `catalog.ns1.ns2...`. At least two parts are required." The grammar block now reads `catalog_name . namespace [ . namespace ... ]`. 3. `sql-migration-guide.md`: the PATH-feature bullet omitted session variables (a documented PATH consumer with a dedicated test) and opened with "Spark 4.2 introduces..." while every other bullet in the section opens with "Since Spark 4.2,". Both fixed. 4. `sql-ref-function-current-path.md`: a stray "persisted view" in a code comment; the rest of the PR uses "persistent view". Fixed. 5. `sql-ref-identifier.md`: the canonical Reserved system names section now introduces the term "mini-path" in prose so that cross-page link text from `name-resolution.md` lands somewhere that defines it. 6. 
`sql-ref-syntax-aux-describe-table.md`: the same `cust_id` column appeared as `{"name": "int"}` in the view example and `{"name": "integer"}` in the legacy `customer` example. The doc's own JSON schema block specifies `int` for `IntegerType`, so the legacy example was wrong; aligned both to `int`. 7. `sql-ref-name-resolution.md`: the `abs` shadowing example created a 0-arg temp `abs()` and then called `abs(-5)` (one arg), which was a signature mismatch rather than a shadow. Rewrote with a matching `abs(x INT)` temp and an explicit `SET PATH = system.session, system.builtin, spark_catalog.default` so the unqualified `abs(-5)` actually resolves to the temp; the example then demonstrates `system.builtin.abs(-5)` reaching around the shadow. Documentation only; no behavior changes. --- docs/sql-migration-guide.md | 2 +- docs/sql-ref-function-current-path.md | 2 +- docs/sql-ref-identifier.md | 10 ++++----- docs/sql-ref-name-resolution.md | 22 ++++++++++++++----- docs/sql-ref-syntax-aux-conf-mgmt-set-path.md | 5 +++-- docs/sql-ref-syntax-aux-describe-table.md | 2 +- .../sql-ref-syntax-ddl-create-sql-function.md | 2 +- 7 files changed, 28 insertions(+), 17 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index c50124bdbc662..265e933c09123 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -33,7 +33,7 @@ license: | - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. - Since Spark 4.2, the virtual `system` catalog hosts the new `system.builtin` and `system.session` namespaces. 
`system.builtin` exposes built-in functions and functions injected through `SparkSessionExtensions`; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the system namespace first and the current catalog second, so a persistent schema named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). - Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. -- Spark 4.2 introduces the SQL standard `PATH` feature: the `SET PATH` statement, the `current_path()` function, the path-based resolution of unqualified routines / tables / views, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). 
+- Since Spark 4.2, the SQL standard `PATH` feature is available: the `SET PATH` statement, the `current_path()` function, path-based resolution of unqualified routines, tables, views, and session variables, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). ## Upgrading from Spark SQL 4.0 to 4.1 diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md index 06cb42932b797..afe0d4f6ba54d 100644 --- a/docs/sql-ref-function-current-path.md +++ b/docs/sql-ref-function-current-path.md @@ -69,7 +69,7 @@ re-issuing `SET PATH`. > SELECT current_path(); spark_catalog.default,system.builtin --- Inside a persisted view or SQL function body, current_path() returns the invoker's path, +-- Inside a persistent view or SQL function body, current_path() returns the invoker's path, -- not the frozen path captured at creation time. > SET PATH = spark_catalog.default, system.builtin; > CREATE VIEW v_path AS SELECT current_path() AS p; diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index 609a4cfa29118..14d783994787e 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -63,11 +63,11 @@ catalog or schema names. | `builtin` | schema | A persistent schema named `builtin` is allowed but discouraged because it collides with `system.builtin`. | | `session` | schema | A persistent schema named `session` is allowed but discouraged because it collides with `system.session`. 
| -An unqualified 2-part reference like `builtin.x` or `session.x` resolves to -`system.builtin.x` / `system.session.x` if such an object exists, and otherwise falls back to -the same name in the current catalog. So an object in a persistent `builtin` or `session` -schema is shadowed only when an object of the same name exists in the corresponding system -namespace. The shadowed object stays reachable via its fully qualified 3-part name (for example +An unqualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to +choose the implicit catalog: by default it resolves to `system.builtin.x` / `system.session.x` +if such an object exists, and otherwise falls back to the same name in the current catalog. So +an object in a persistent `builtin` or `session` schema is shadowed only when an object of the +same name exists in the corresponding system namespace. The shadowed object stays reachable via its fully qualified 3-part name (for example `spark_catalog.session.x`). Set `spark.sql.legacy.persistentCatalogFirst` to `true` to reverse the preference: the current catalog is tried first and the system namespace becomes the fallback. diff --git a/docs/sql-ref-name-resolution.md b/docs/sql-ref-name-resolution.md index 42e9149041068..3d574e58a9ad2 100644 --- a/docs/sql-ref-name-resolution.md +++ b/docs/sql-ref-name-resolution.md @@ -436,17 +436,27 @@ effective search path, for example > SELECT spark_catalog.default.func(4, 3); 6 --- A built-in can always be reached by qualification, even when shadowed -> CREATE TEMPORARY FUNCTION abs() RETURNS INT RETURN 999; +-- A built-in can always be reached by qualification, even when shadowed. +-- Put system.session ahead of system.builtin so a matching temp `abs` shadows the built-in. +> SET PATH = system.session, system.builtin, spark_catalog.default; +> CREATE TEMPORARY FUNCTION abs(x INT) RETURNS INT RETURN x + 100; + +-- Unqualified abs(-5) resolves to the temp (-5 + 100 = 95). 
> SELECT abs(-5); + 95 + +-- system.builtin.abs and builtin.abs reach the built-in around the shadow. +> SELECT system.builtin.abs(-5); 5 -> SELECT session.abs(); - 999 > SELECT builtin.abs(-5); 5 -> SELECT system.builtin.abs(-5); - 5 + +-- session.abs reaches the temp explicitly. +> SELECT session.abs(-5); + 95 + > DROP TEMPORARY FUNCTION abs; +> SET PATH = DEFAULT_PATH; -- PATH controls unqualified routine lookup order > CREATE SCHEMA path_a; diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md index 3921615d702b2..6c5efd2443612 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -71,7 +71,7 @@ path_element PATH | CURRENT_SCHEMA | CURRENT_DATABASE | - catalog_name . schema_name } + catalog_name . namespace [ . namespace ... ] } ``` ### Parameters @@ -118,7 +118,8 @@ path_element * **`schema_name`** - An explicit catalog-qualified schema reference (`catalog.schema`). Both parts are required. + An explicit catalog-qualified schema reference (`catalog.schema` or, for catalogs with + multi-level namespaces, `catalog.ns1.ns2...`). At least two parts are required. The catalog and schema do not need to exist at the time of `SET PATH`; non-existent entries are silently skipped during name resolution. 
diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index b535110c8f9ef..cb84b0c7fefb2 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -284,7 +284,7 @@ DESC FORMATTED customer AS JSON; "schema_name": "default", "namespace": ["default"], "columns": [ - {"name": "cust_id", "type": {"name": "integer"}, "nullable": true}, + {"name": "cust_id", "type": {"name": "int"}, "nullable": true}, {"name": "name", "type": {"name": "string"}, "comment": "Short name", "nullable": true}, {"name": "state", "type": {"name": "varchar", "length": 20}, "nullable": true} ], diff --git a/docs/sql-ref-syntax-ddl-create-sql-function.md b/docs/sql-ref-syntax-ddl-create-sql-function.md index f1f1c8ce851a0..19f3e120f070f 100644 --- a/docs/sql-ref-syntax-ddl-create-sql-function.md +++ b/docs/sql-ref-syntax-ddl-create-sql-function.md @@ -140,6 +140,7 @@ characteristic - [Aggregate functions](sql-ref-functions-builtin.md#aggregate-functions) - [Window functions](sql-ref-functions-builtin.md#analytic-window-functions) - [Ranking functions](sql-ref-functions-builtin.md#ranking-window-functions) + - Row producing functions such as `explode` A persistent SQL UDF cannot reference temporary views, temporary functions, or session variables. @@ -149,7 +150,6 @@ characteristic `current_schema()` and `current_path()` inside the body still return the invoker's context. Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the captured path. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). - - Row producing functions such as `explode` Within the body of the function you can refer to parameter by its unqualified name or by qualifying the parameter with the function name.