diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 66531397d2cc1..265e933c09123 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -31,6 +31,9 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. +- Since Spark 4.2, the virtual `system` catalog hosts the new `system.builtin` and `system.session` namespaces. `system.builtin` exposes built-in functions and functions injected through `SparkSessionExtensions`; `system.session` exposes temporary views, temporary functions, and session variables created in the current session. As a result, 2-part references like `builtin.func()` and `session.func()` now follow a mini-path that tries the system namespace first and the current catalog second, so a persistent schema named `builtin` or `session` is no longer reached by `builtin.func()` / `session.func()` when the system namespace contains an object of the same name. 
To restore the previous behavior (current catalog first), set `spark.sql.legacy.persistentCatalogFirst` to `true`. Persistent schemas with these names are still allowed but should be reached with an explicit catalog prefix (for example, `spark_catalog.session.x`). See [Reserved system names](sql-ref-identifier.html#reserved-system-names). +- Since Spark 4.2, `CREATE TEMPORARY VIEW`, `CREATE TEMPORARY FUNCTION`, and the corresponding `DROP` statements accept the `session` and `system.session` qualifiers on the object name (in addition to the previously supported unqualified form); for example, `CREATE TEMPORARY VIEW system.session.v AS ...` and `DROP TEMPORARY FUNCTION session.f` are now valid. Any other qualifier on a temporary object is rejected with `INVALID_TEMP_OBJ_QUALIFIER`. +- Since Spark 4.2, the SQL standard `PATH` feature is available: the `SET PATH` statement, the `current_path()` function, path-based resolution of unqualified routines, tables, views, and session variables, and the configurations `spark.sql.path.enabled` (default `false`) and `spark.sql.defaultPath`. The feature is opt-in; when `spark.sql.path.enabled` is `false`, unqualified resolution falls back to a fixed default path and `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) and [Name Resolution](sql-ref-name-resolution.html). ## Upgrading from Spark SQL 4.0 to 4.1 diff --git a/docs/sql-ref-function-current-path.md b/docs/sql-ref-function-current-path.md new file mode 100644 index 0000000000000..afe0d4f6ba54d --- /dev/null +++ b/docs/sql-ref-function-current-path.md @@ -0,0 +1,85 @@ +--- +layout: global +title: current_path function +displayTitle: current_path function +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Returns the effective SQL Path for the current session as a comma-separated string of +qualified namespace names. See [`SET PATH`](sql-ref-syntax-aux-conf-mgmt-set-path.html) for a +description of what the path is, how to enable it, and how to change it, and +[Name Resolution](sql-ref-name-resolution.html) for how the path drives unqualified name +resolution. + +### Syntax + +```sql +current_path() +``` + +### Arguments + +This function takes no arguments. The parentheses may be omitted. + +### Returns + +A non-nullable `STRING`. Each path entry is written as a dotted name with backticks added only +where required by Spark's identifier rules. Entries are separated by a single comma. + +When the path contains the virtual `CURRENT_SCHEMA` marker, the marker is materialized as the +catalog-qualified current schema (`current_catalog.current_schema`) each time +`current_path()` is evaluated, so subsequent `USE SCHEMA` statements are reflected without +re-issuing `SET PATH`. + +### Examples + +```sql +> SELECT current_path(); + system.builtin,system.session,spark_catalog.default + +-- ANSI no-parens form returns the same value. +> SELECT CURRENT_PATH; + system.builtin,system.session,spark_catalog.default + +-- The output reflects the latest SET PATH. 
+> SET PATH = spark_catalog.default, system.builtin; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- CURRENT_SCHEMA on the path is re-evaluated on every call. +> SET PATH = CURRENT_SCHEMA, system.builtin; +> USE spark_catalog.finance; +> SELECT current_path(); + spark_catalog.finance,system.builtin +> USE spark_catalog.default; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- Inside a persistent view or SQL function body, current_path() returns the invoker's path, +-- not the frozen path captured at creation time. +> SET PATH = spark_catalog.default, system.builtin; +> CREATE VIEW v_path AS SELECT current_path() AS p; +> SET PATH = spark_catalog.other, system.builtin; +> SELECT * FROM v_path; + spark_catalog.other,system.builtin +``` + +### Related Statements + +* [Name Resolution](sql-ref-name-resolution.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Built-in Functions](sql-ref-functions-builtin.html) diff --git a/docs/sql-ref-functions-builtin.md b/docs/sql-ref-functions-builtin.md index 1912a1e577d59..22e52d0500c53 100644 --- a/docs/sql-ref-functions-builtin.md +++ b/docs/sql-ref-functions-builtin.md @@ -17,6 +17,10 @@ license: | limitations under the License. --- +All built-in functions live in the virtual schema `system.builtin`. They can always be referenced +unambiguously by their fully qualified name (for example `system.builtin.abs`), regardless of any +user-defined function that may share the same name. + ### Aggregate Functions {% include_api_gen generated-agg-funcs-table.html %} #### Examples diff --git a/docs/sql-ref-identifier.md b/docs/sql-ref-identifier.md index 7aca08ea9fd8d..14d783994787e 100644 --- a/docs/sql-ref-identifier.md +++ b/docs/sql-ref-identifier.md @@ -52,6 +52,30 @@ An identifier is a string used to identify a database object such as a table, vi Any character from the character set. Use ` to escape special characters (e.g., `). 
+### Reserved system names
+
+`system`, `session`, and `builtin` have special meaning and should not be used as user-defined
+catalog or schema names.
+
+| Name | Position | Notes |
+| :--- | :------- | :---- |
+| `system` | catalog | Virtual catalog hosting `system.builtin` and `system.session`. Spark does not load `system` through the v2 catalog API; setting `spark.sql.catalog.system = ...` is unsupported and produces undefined results. The current catalog cannot be `system`. |
+| `builtin` | schema | A persistent schema named `builtin` is allowed but discouraged because it collides with `system.builtin`. |
+| `session` | schema | A persistent schema named `session` is allowed but discouraged because it collides with `system.session`. |
+
+A 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to
+choose the implicit catalog: by default it resolves to `system.builtin.x` / `system.session.x`
+if such an object exists, and otherwise falls back to the same name in the current catalog. So
+an object in a persistent `builtin` or `session` schema is shadowed only when an object of the
+same name exists in the corresponding system namespace. The shadowed object stays reachable via its fully qualified 3-part name (for example
+`spark_catalog.session.x`). Set `spark.sql.legacy.persistentCatalogFirst` to `true` to reverse
+the preference: the current catalog is tried first and the system namespace becomes the fallback.
+
+The `system.builtin` and `system.session` namespaces are described in
+[SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). Temporary objects in `system.session` are
+documented under [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) and
+[CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html).
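The mini-path preference can be illustrated with a short session. This is a sketch derived only from the rules stated in this section, not captured output: the function `f` and the persistent schema `spark_catalog.session` are hypothetical names, and changing `spark.sql.legacy.persistentCatalogFirst` at runtime with `SET` is an assumption. Results are described in comments rather than shown.

```sql
-- Hypothetical setup: a persistent schema named `session` in the current catalog,
-- holding a function with the same name as a temporary function.
> CREATE SCHEMA spark_catalog.session;
> CREATE FUNCTION spark_catalog.session.f() RETURNS INT RETURN 1;
> CREATE TEMPORARY FUNCTION f() RETURNS INT RETURN 2;

-- Default order (spark.sql.legacy.persistentCatalogFirst = false):
-- session.f() is tried as system.session.f first, so the temporary function is chosen.
> SELECT session.f();

-- The shadowed persistent function stays reachable through its 3-part name.
> SELECT spark_catalog.session.f();

-- Legacy order (assumed to be settable at runtime): the current catalog is tried first,
-- so session.f() now reaches the persistent function.
> SET spark.sql.legacy.persistentCatalogFirst = true;
> SELECT session.f();
```

Only the catalog-qualified form (`spark_catalog.session.f`) is unaffected by the setting, which is why it is the recommended way to reach objects in a persistent `session` or `builtin` schema.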
+ ### Examples ```sql diff --git a/docs/sql-ref-name-resolution.md b/docs/sql-ref-name-resolution.md index 2532f05e164b3..3d574e58a9ad2 100644 --- a/docs/sql-ref-name-resolution.md +++ b/docs/sql-ref-name-resolution.md @@ -19,7 +19,7 @@ license: | limitations under the License. --- -Name resolution is the process by which [identifiers](sql-ref-identifier.html) are resolved to specific column-, field-, parameter-, or table-references. +Name resolution is the process by which [identifiers](sql-ref-identifier.html) are resolved to specific column-, field-, parameter-, table-, function-, or variable-references. ## Column, field, parameter, and variable resolution @@ -50,7 +50,7 @@ In detail, resolution of identifiers to a specific reference follows these rules 1. **Parameterless function reference** - If the identifier is unqualified and matches `current_user`, `current_date`, or `current_timestamp`: Resolve it as one of these functions. + If the identifier is unqualified and matches `current_user`, `current_date`, `current_time`, `current_timestamp`, or `current_path`: Resolve it as one of these functions. 1. **Column DEFAULT specification** @@ -137,7 +137,10 @@ In detail, resolution of identifiers to a specific reference follows these rules 1. **Session Variables** - 1. Match the identifier to a variable name. If the identifier is qualified, the qualifier must be `session` or `system.session`. + 1. Match the identifier to a session variable name. + If the identifier is qualified, the qualifier must be `session` or `system.session`. + If the identifier is unqualified, `system.session` must be present on the + [SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) (the default path includes it). 1. If the identifier is qualified, match to a field or map key of a variable following rule 1.c ### Limitations @@ -256,37 +259,54 @@ This restriction also applies to parameter references in SQL functions. 
frm.a lat.b func.c ``` -## Table and view resolution - -An identifier in table-reference can be any one of the following: +## Object name resolution -- Persistent table or view -- Common table expression (CTE) -- [Temporary view](sql-ref-syntax-ddl-create-view.html) +Tables, views, and functions follow the same resolution rule. It depends on how many parts the +identifier has. -Resolution of an identifier depends on whether it is qualified: +### Fully qualified (3 parts) — `catalog.schema.object` -- **Qualified** +The reference is unique and is looked up in `catalog.schema`. `system.builtin.object` identifies +a built-in function; `system.session.object` identifies a temporary view, function, or session +variable. - If the identifier is fully qualified with three parts: `catalog.schema.relation`, it is unique. +### Partially qualified (2 parts) — `schema.object` - If the identifier consists of two parts: `schema.relation`, it is further qualified with the result of `SELECT current_catalog()` to make it unique. +The identifier is qualified with `current_catalog` — producing +`current_catalog.schema.object` — unless the leading part is `session` (or `builtin`, for +functions). In that case Spark uses the +[mini-path](sql-ref-identifier.html#reserved-system-names) to choose the implicit catalog, +returning the first match: -- **Unqualified** +| `spark.sql.legacy.persistentCatalogFirst` | Mini-path tried in order | +| :-------------------------------------- | :----------------------- | +| `false` (default) | the system namespace (`system.session.x` / `system.builtin.x`), then the current catalog's `session.x` / `builtin.x` | +| `true` (legacy) | the current catalog's `session.x` / `builtin.x`, then the system namespace (`system.session.x` / `system.builtin.x`) | - 1. 
**Common table expression** +### Unqualified (1 part) — `object` - If the reference is within the scope of a `WITH` clause, match the identifier to a CTE starting with the immediately containing `WITH` clause and moving outwards from there. +In queries and DML, Spark walks the [SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) and +returns the first match. In DDL, the identifier is qualified with `current_catalog.current_schema`. - 1. **Temporary view** +> Note: persistent views and SQL UDFs capture the SQL Path at `CREATE` time. When the view or +> function is invoked, its body resolves names — tables, views, and functions — +> against that frozen path, not the invoker's current path. `current_schema()` and +> `current_path()` inside the body still return the invoker's context. See +> [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). - Match the identifier to any temporary view defined within the current session. +## Table and view resolution - 1. **Persisted table** +A table reference can be a persistent table or view, a temporary view, or a common table +expression (CTE). - Fully qualify the identifier by pre-pending the result of `SELECT current_catalog()` and `SELECT current_schema()` and look it up as a persistent relation. +Resolution follows [Object name resolution](#object-name-resolution), with one addition for +unqualified references: when the reference is inside a `WITH` clause, Spark first matches the +identifier against CTEs from the innermost `WITH` outward. If no CTE matches, Spark walks the +SQL Path. -If the relation cannot be resolved to any table, view, or CTE, Databricks raises a TABLE_OR_VIEW_NOT_FOUND error. +If the relation cannot be resolved, Spark raises `TABLE_OR_VIEW_NOT_FOUND`. The error includes +the effective search path, for example +`searchPath = [system.builtin, system.session, spark_catalog.default]`. 
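The frozen-path rule for persistent views noted under Object name resolution can be sketched as follows. The schema, table, and view names are hypothetical, `spark.sql.path.enabled` is assumed to be `true`, and the comments state what the rule implies rather than captured output.

```sql
> SET spark.sql.path.enabled = true;
> CREATE SCHEMA spark_catalog.db_a;
> CREATE SCHEMA spark_catalog.db_b;
> CREATE TABLE spark_catalog.db_a.t USING parquet AS SELECT 1 AS v;
> CREATE TABLE spark_catalog.db_b.t USING parquet AS SELECT 2 AS v;

-- The unqualified `t` in the view body is resolved against the path in effect at CREATE time.
> SET PATH = spark_catalog.db_a, system.builtin;
> CREATE VIEW spark_catalog.default.v_frozen AS SELECT v FROM t;

-- Changing the session path later does not re-point the view: per the rule above,
-- its body still resolves `t` through the captured path (db_a).
> SET PATH = spark_catalog.db_b, system.builtin;
> SELECT v FROM spark_catalog.default.v_frozen;

-- An unqualified `t` written directly in the query follows the current path (db_b).
> SELECT v FROM t;
```

The view name is written with all three parts because `CREATE VIEW` is DDL: an unqualified view name would be placed in `current_catalog.current_schema`, independent of the path.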
### Examples @@ -317,7 +337,13 @@ If the relation cannot be resolved to any table, view, or CTE, Databricks raises > SELECT c1 FROM rel; 2 --- Temporary views cannot be qualified, so qualifiecation resolved to the table: +-- A temporary view can be qualified with `session` or `system.session`: +> SELECT c1 FROM session.rel; + 2 +> SELECT c1 FROM system.session.rel; + 2 + +-- Other 2-part qualifications resolve to the persisted table: > SELECT c1 FROM default.rel; 1 @@ -343,45 +369,34 @@ If the relation cannot be resolved to any table, view, or CTE, Databricks raises SELECT 1), cte; [TABLE_OR_VIEW_NOT_FOUND] The table or view `cte` cannot be found. -``` - -## Function resolution - -A function reference is recognized by the mandatory trailing set of parentheses. - -It can resolve to: - -- A builtin function provided by Spark, -- A temporary user defined function scoped to the current session, or -- A persistent user defined function. -Resolution of a function name depends on whether it is qualified: +-- PATH drives unqualified relation lookup order +> CREATE SCHEMA db_a; +> CREATE SCHEMA db_b; +> CREATE TABLE db_a.t USING parquet AS SELECT 1 AS v; +> CREATE TABLE db_b.t USING parquet AS SELECT 2 AS v; -- **Qualified** - - If the name is fully qualified with three parts: `catalog.schema.function`, it is unique. - - If the name consists of two parts: `schema.function`, it is further qualified with the result of `SELECT current_catalog()` to make it unique. - - The function is then looked up in the catalog. - -- **Unqualified** - - For unqualified function names Spark follows a fixed order of precedence (`PATH`): - - 1. **Builtin function** - - If a function by this name exists among the set of built-in functions, that function is chosen. +> SET PATH = spark_catalog.db_a, spark_catalog.db_b, system.builtin; +> SELECT v FROM t; + 1 - 1. 
**Temporary function** +> SET PATH = spark_catalog.db_b, spark_catalog.db_a, system.builtin; +> SELECT v FROM t; + 2 - If a function by this name exists among the set of temporary functions, that function is chosen. +-- Three-part `system.session.x` references the temporary scope only: +> SELECT * FROM system.session.no_such_view; + [TABLE_OR_VIEW_NOT_FOUND] ... `system`.`session`.`no_such_view` ... +``` - 1. **Persisted function** +## Function resolution - Fully qualify the function name by pre-pending the result of `SELECT current_catalog()` and `SELECT current_schema()` and look it up as a persistent function. +A function reference is recognized by the trailing parentheses, and follows +[Object name resolution](#object-name-resolution). -If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. +If the function cannot be resolved, Spark raises `UNRESOLVED_ROUTINE`. The error includes the +effective search path, for example +`searchPath = [system.builtin, system.session, spark_catalog.default]`. ### Examples @@ -420,4 +435,45 @@ If the function cannot be resolved Spark raises an `UNRESOLVED_ROUTINE` error. -- To resolve the persistent function it now needs qualification > SELECT spark_catalog.default.func(4, 3); 6 + +-- A built-in can always be reached by qualification, even when shadowed. +-- Put system.session ahead of system.builtin so a matching temp `abs` shadows the built-in. +> SET PATH = system.session, system.builtin, spark_catalog.default; +> CREATE TEMPORARY FUNCTION abs(x INT) RETURNS INT RETURN x + 100; + +-- Unqualified abs(-5) resolves to the temp (-5 + 100 = 95). +> SELECT abs(-5); + 95 + +-- system.builtin.abs and builtin.abs reach the built-in around the shadow. +> SELECT system.builtin.abs(-5); + 5 +> SELECT builtin.abs(-5); + 5 + +-- session.abs reaches the temp explicitly. 
+> SELECT session.abs(-5); + 95 + +> DROP TEMPORARY FUNCTION abs; +> SET PATH = DEFAULT_PATH; + +-- PATH controls unqualified routine lookup order +> CREATE SCHEMA path_a; +> CREATE SCHEMA path_b; +> CREATE FUNCTION path_a.pick() RETURNS INT RETURN 10; +> CREATE FUNCTION path_b.pick() RETURNS INT RETURN 20; + +> SET PATH = spark_catalog.path_a, spark_catalog.path_b, system.builtin; +> SELECT pick(); + 10 + +> SET PATH = spark_catalog.path_b, spark_catalog.path_a, system.builtin; +> SELECT pick(); + 20 + +-- Unresolved routine lists the effective search path +> SET PATH = spark_catalog.default, system.builtin; +> SELECT does_not_exist(); + [UNRESOLVED_ROUTINE] ... searchPath: [`spark_catalog`.`default`, `system`.`builtin`] ... ``` diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md new file mode 100644 index 0000000000000..6c5efd2443612 --- /dev/null +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set-path.md @@ -0,0 +1,239 @@ +--- +layout: global +title: SET PATH +displayTitle: SET PATH +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +### Description + +`SET PATH` changes the **SQL Path** of the current session. 
+ +The SQL Path is an ordered list of catalog-qualified schema names that Spark walks when +resolving unqualified references to functions, tables, views, and session variables in queries +and DML (`SELECT`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`). The first match wins. DDL +(`CREATE TABLE`, `CREATE VIEW`, `CREATE FUNCTION`, `DROP`, `ALTER`, ...) resolves unqualified +object names against `current_catalog.current_schema`, not the path; so `CREATE TABLE t` always +creates `t` in the current schema regardless of the path. + +The path can include two virtual namespaces in the `system` catalog: + +- `system.builtin` — built-in functions, including those injected by + `SparkSessionExtensions`. +- `system.session` — temporary views, temporary functions, and session variables in the + current session. + +`SET PATH` is controlled by `spark.sql.path.enabled`. When it is `false` (the default), +`SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and +[`current_path()`](sql-ref-function-current-path.html) still use the default path. + +The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is either the value of +`spark.sql.defaultPath`, or, when that configuration is empty, a built-in value composed of +`system.builtin`, `system.session`, and the current schema. To override, set +`spark.sql.defaultPath`. See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation +rules. + +The effect of `SET PATH` is scoped to the current session and is lost when the session ends. To +re-apply the current default path mid-session, run `SET PATH = DEFAULT_PATH`. (This stores a +snapshot of `DEFAULT_PATH` at the moment of the statement; later changes to +`spark.sql.defaultPath` are not picked up automatically.) Cloned sessions inherit the parent's +path at clone time; later changes in the child do not propagate back. + +Persistent views and SQL UDFs capture the path at `CREATE` time into the object's metadata. 
+Each invocation resolves the body against that frozen path, not the invoker's current path; +`current_schema()` and `current_path()` inside the body still return the invoker's context. + +The leading names `session` and `builtin` have special meaning in 2-part references; see +[Reserved system names](sql-ref-identifier.html#reserved-system-names). + +### Syntax + +```sql +SET PATH = path_element [ , ... ] + +path_element + { DEFAULT_PATH | + SYSTEM_PATH | + PATH | + CURRENT_SCHEMA | + CURRENT_DATABASE | + catalog_name . namespace [ . namespace ... ] } +``` + +### Parameters + +* **`DEFAULT_PATH`** + + Expands to the session's default path. The default path has two layers: + + 1. If `spark.sql.defaultPath` is set to a non-empty value, that value is parsed using the same + grammar as `SET PATH` (with one restriction: the `PATH` keyword is not allowed inside the + conf value, since it would be self-referential). + + The conf value is validated for syntax at the time it is set; an invalid value is rejected. + Static duplicates inside the conf are tolerated (unlike interactive `SET PATH`, which + rejects them) so a later `USE SCHEMA` cannot turn a previously valid default into a runtime + error. A `DEFAULT_PATH` token inside the conf value resolves to the Spark built-in default + below rather than recursing, which avoids a cycle. + + 2. If `spark.sql.defaultPath` is empty (the factory setting), the Spark built-in default + applies: `system.builtin`, `system.session`, and the current schema + (`current_catalog.current_schema`), in that order. + + To change the default path, set `spark.sql.defaultPath` via any of the usual mechanisms + (`SET spark.sql.defaultPath = ...` at runtime, `--conf` on `spark-submit`, `SparkConf`, or + `spark-defaults.conf`); clear it with `RESET spark.sql.defaultPath` to return to the + Spark built-in default. + +* **`SYSTEM_PATH`** + + Expands to the two system namespaces, `system.builtin` and `system.session`.
+ +* **`PATH`** + + Expands to the **current** value of the SQL Path. Useful for appending entries without + re-typing them, for example `SET PATH = PATH, spark_catalog.analytics`. + `PATH` is not allowed in the value of `spark.sql.defaultPath` (it would create a cycle). + +* **`CURRENT_SCHEMA`** / **`CURRENT_DATABASE`** + + A virtual marker that resolves to the catalog-qualified current schema + (`current_catalog.current_schema`) every time the path is consulted. This means subsequent + `USE SCHEMA` statements are picked up without re-issuing `SET PATH`. + `CURRENT_DATABASE` is a synonym for `CURRENT_SCHEMA`. + +* **`schema_name`** + + An explicit catalog-qualified schema reference (`catalog.schema` or, for catalogs with + multi-level namespaces, `catalog.ns1.ns2...`). At least two parts are required. + The catalog and schema do not need to exist at the time of `SET PATH`; non-existent entries + are silently skipped during name resolution. + + Identifier quoting follows the usual rules. Backtick-quoted parts that contain a dot are + preserved, for example ``spark_catalog.`sch.b` ``. + +### Semantics + +* Setting the path takes effect immediately. +* Identifier case is preserved in storage and in `current_path()` output. +* Duplicate entries are detected after expansion and raise `DUPLICATE_SQL_PATH_ENTRY`. + Comparisons honor the session's case sensitivity setting. Because `CURRENT_DATABASE` is an + alias for `CURRENT_SCHEMA`, listing both is flagged as a duplicate. + +### Error conditions + +| Condition | Cause | +| :-------- | :---- | +| `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED` | `SET PATH` was issued while `spark.sql.path.enabled` is `false`. | +| `INVALID_SQL_PATH_SCHEMA_REFERENCE` | A `schema_name` was given with fewer than two parts. | +| `DUPLICATE_SQL_PATH_ENTRY` | Two entries collapsed to the same concrete namespace after expansion. | + +### Examples + +```sql +-- Enable the feature first; the default is false. 
+> SET spark.sql.path.enabled = true; + +-- Observe the default path. +> SELECT current_path(); + system.builtin,system.session,spark_catalog.default + +-- Replace the path with explicit entries. +> SET PATH = spark_catalog.default, system.builtin; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- Identifier case is preserved. +> SET PATH = Spark_Catalog.Default, System.Builtin; +> SELECT current_path(); + Spark_Catalog.Default,System.Builtin + +-- Backtick-quoted parts that contain a dot round-trip with quoting. +> SET PATH = spark_catalog.`sch.b`, system.builtin; +> SELECT current_path(); + spark_catalog.`sch.b`,system.builtin + +-- DEFAULT_PATH and SYSTEM_PATH shortcuts. +> SET PATH = DEFAULT_PATH; +> SET PATH = SYSTEM_PATH; +> SELECT current_path(); + system.builtin,system.session + +-- Append an entry by referring to the current path. +> SET PATH = spark_catalog.default, system.builtin; +> SET PATH = PATH, spark_catalog.analytics; +> SELECT current_path(); + spark_catalog.default,system.builtin,spark_catalog.analytics + +-- CURRENT_SCHEMA is re-evaluated each time; USE SCHEMA updates the effective path. +> SET PATH = CURRENT_SCHEMA, system.builtin; +> USE spark_catalog.finance; +> SELECT current_path(); + spark_catalog.finance,system.builtin +> USE spark_catalog.default; +> SELECT current_path(); + spark_catalog.default,system.builtin + +-- DEFAULT_PATH can be customized via the conf. +> SET spark.sql.defaultPath = system.session, system.builtin, current_schema; +> SET PATH = DEFAULT_PATH; +> SELECT current_path(); + system.session,system.builtin,spark_catalog.default +> RESET spark.sql.defaultPath; + +-- Append a schema of shared UDFs so callers do not have to qualify them. 
+> CREATE SCHEMA spark_catalog.shared_udfs; +> CREATE FUNCTION spark_catalog.shared_udfs.to_iso_date(d DATE) RETURNS STRING + RETURN date_format(d, 'yyyy-MM-dd'); +> SET PATH = PATH, spark_catalog.shared_udfs; +> SELECT to_iso_date(DATE'2026-05-22'); + 2026-05-22 + +-- Drop system.session from the path to force temporary objects to be qualified explicitly. +> CREATE TEMPORARY FUNCTION revenue() RETURNS INT RETURN 42; +> SELECT revenue(); -- resolves via the default path + 42 +> SET PATH = system.builtin, current_schema; +> SELECT revenue(); -- now must be qualified + [UNRESOLVED_ROUTINE] `revenue` ... +> SELECT session.revenue(); + 42 + +-- Error cases. +> SET PATH = spark_catalog.default, spark_catalog.default; + [DUPLICATE_SQL_PATH_ENTRY] + +> SET PATH = my_schema_no_catalog; + [INVALID_SQL_PATH_SCHEMA_REFERENCE] + +-- PATH is rejected as a value of the DEFAULT_PATH conf (would cycle). +> SET spark.sql.defaultPath = PATH, system.builtin; + [Error: invalid value] + +-- SET PATH is rejected when the feature is disabled. +> SET spark.sql.path.enabled = false; +> SET PATH = spark_catalog.default; + [UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED] +``` + +### Related Statements + +* [Name Resolution](sql-ref-name-resolution.html) +* [`current_path` function](sql-ref-function-current-path.html) +* [SET](sql-ref-syntax-aux-conf-mgmt-set.html) +* [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) +* [USE DATABASE](sql-ref-syntax-ddl-usedb.html) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set.md b/docs/sql-ref-syntax-aux-conf-mgmt-set.md index 9e57a221f9688..396559ca48e74 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set.md @@ -25,6 +25,8 @@ The SET command sets a property, returns the value of an existing property or re To set SQL variables defined with [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) use [SET VAR](sql-ref-syntax-aux-set-var.html). 
+To change the session SQL Path used for unqualified name resolution use [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). + ### Syntax ```sql @@ -72,3 +74,4 @@ SET spark.sql.variable.substitute; * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET VAR](sql-ref-syntax-aux-set-var.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt.md b/docs/sql-ref-syntax-aux-conf-mgmt.md index 3312bcb503500..6b809d4a94655 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt.md @@ -22,3 +22,4 @@ license: | * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET TIME ZONE](sql-ref-syntax-aux-conf-mgmt-set-timezone.html) + * [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) diff --git a/docs/sql-ref-syntax-aux-describe-function.md b/docs/sql-ref-syntax-aux-describe-function.md index 0c5a3d751a564..2da1b9466fc23 100644 --- a/docs/sql-ref-syntax-aux-describe-function.md +++ b/docs/sql-ref-syntax-aux-describe-function.md @@ -22,9 +22,15 @@ license: | ### Description `DESCRIBE FUNCTION` statement returns the basic metadata information of an -existing function. The metadata information includes the function name, implementing -class and the usage details. If the optional `EXTENDED` option is specified, the basic -metadata information is returned along with the extended usage information. +existing function. For built-in and external (Java/Hive) functions the output includes the +function name, implementing class, and usage details. For +[SQL user-defined functions](sql-ref-syntax-ddl-create-sql-function.html) the output describes +the function signature (input parameters, return type/columns) and, with `EXTENDED`, the +function body, characteristics, and the frozen +[SQL Path](sql-ref-syntax-aux-conf-mgmt-set-path.html) that was captured at creation time. 
+ +If the optional `EXTENDED` option is specified, the basic metadata is returned along with the +extended information. ### Syntax @@ -36,12 +42,14 @@ metadata information is returned along with the extended usage information. * **function_name** - Specifies a name of an existing function in the system. The function name may be - optionally qualified with a database name. If `function_name` is qualified with - a database then the function is resolved from the user specified database, otherwise - it is resolved from the current database. + Specifies a name of an existing function. The function name follows the regular + [name resolution](sql-ref-name-resolution.html#function-resolution) rules: unqualified + names walk the SQL Path; 3-part names target the chosen `catalog.schema` directly + (including the system namespaces `system.builtin` and `system.session`); 2-part names that + lead with `builtin` or `session` follow a mini-path across the system namespace and the + current catalog. - **Syntax:** `[ database_name. ] function_name` + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` ### Examples @@ -102,6 +110,70 @@ DESC FUNCTION EXTENDED explode; | 10 | | 20 | +---------------------------------------------------------------+ + +-- Built-in functions can be qualified with `builtin` or `system.builtin`. +DESC FUNCTION system.builtin.abs; ++-------------------------------------------------------------------+ +|function_desc | ++-------------------------------------------------------------------+ +|Function: abs | +|Class: org.apache.spark.sql.catalyst.expressions.Abs | +|Usage: abs(expr) - Returns the absolute value of the numeric value.| ++-------------------------------------------------------------------+ + +-- Describe a SQL scalar UDF: the output uses the SQL function layout +-- (Function / Type / Input / Returns). 
+CREATE FUNCTION area(x DOUBLE, y DOUBLE) RETURNS DOUBLE RETURN x * y; +DESC FUNCTION area; ++-------------------------------+ +|function_desc | ++-------------------------------+ +|Function: spark_catalog.default.area| +|Type: SCALAR | +|Input: x DOUBLE | +| y DOUBLE | +|Returns: DOUBLE | ++-------------------------------+ + +-- Describe a SQL table UDF. +CREATE FUNCTION getemps(deptno INT) + RETURNS TABLE (id INT, name STRING) + RETURN SELECT id, name FROM employee WHERE employee.deptno = getemps.deptno; +DESC FUNCTION getemps; ++--------------------------------------+ +|function_desc | ++--------------------------------------+ +|Function: spark_catalog.default.getemps| +|Type: TABLE | +|Input: deptno INT | +|Returns: id INT | +| name STRING | ++--------------------------------------+ + +-- DESC FUNCTION EXTENDED for a SQL UDF adds the body, the characteristic clauses, +-- the captured SQL configs, the owner, the create time, and the frozen SQL Path. +SET PATH = spark_catalog.default, system.builtin; +CREATE FUNCTION frozen_fn() RETURNS INT + COMMENT 'demo function' + RETURN (SELECT MAX(id) FROM frozen_t); +DESC FUNCTION EXTENDED frozen_fn; ++-----------------------------------------------------------------+ +|function_desc | ++-----------------------------------------------------------------+ +|Function: spark_catalog.default.frozen_fn | +|Type: SCALAR | +|Input: () | +|Returns: INT | +|Comment: demo function | +|Deterministic:false | +|Data Access: READS SQL DATA | +|Configs: spark.sql.ansi.enabled=true | +| ... 
| +|Owner: | +|Create Time: Wed Apr 30 14:05:43 PDT 2026 | +|Body: (SELECT MAX(id) FROM frozen_t) | +|SQL Path: spark_catalog.default, system.builtin | ++-----------------------------------------------------------------+ ``` ### Related Statements @@ -109,3 +181,5 @@ DESC FUNCTION EXTENDED explode; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html) * [DESCRIBE QUERY](sql-ref-syntax-aux-describe-query.html) +* [CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index 46d9432f5d072..cb84b0c7fefb2 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -105,6 +105,10 @@ to return the metadata pertaining to a partition or column respectively. "view_schema_mode": "", "view_catalog_and_namespace": "", "view_query_output_columns": ["col1", "col2"], + // SQL Path captured at the time of permanent view creation + "sql_path": [ + {"catalog_name": "", "namespace": [""]} + ], // Spark SQL configurations captured at the time of permanent view creation "view_creation_spark_configuration": { "conf1": "", @@ -272,8 +276,83 @@ DESCRIBE customer salesdb.customer.name; +---------+----------+ -- Returns the table metadata in JSON format. +-- (Formatted for readability; the actual output is on a single line.) 
DESC FORMATTED customer AS JSON; -{"table_name":"customer","catalog_name":"spark_catalog","schema_name":"default","namespace":["default"],"columns":[{"name":"cust_id","type":{"name":"integer"},"nullable":true},{"name":"name","type":{"name":"string"},"comment":"Short name","nullable":true},{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],"location": "file:/tmp/salesdb.db/custom...","created_time":"2020-04-07T14:05:43Z","last_access":"UNKNOWN","created_by":"None","type":"MANAGED","provider":"parquet","partition_provider":"Catalog","partition_columns":["state"]} +{ + "table_name": "customer", + "catalog_name": "spark_catalog", + "schema_name": "default", + "namespace": ["default"], + "columns": [ + {"name": "cust_id", "type": {"name": "int"}, "nullable": true}, + {"name": "name", "type": {"name": "string"}, "comment": "Short name", "nullable": true}, + {"name": "state", "type": {"name": "varchar", "length": 20}, "nullable": true} + ], + "location": "file:/tmp/salesdb.db/custom...", + "created_time": "2020-04-07T14:05:43Z", + "last_access": "UNKNOWN", + "created_by": "None", + "type": "MANAGED", + "provider": "parquet", + "partition_provider": "Catalog", + "partition_columns": ["state"] +} + +-- DESCRIBE EXTENDED on a view emits view-specific rows. +SET PATH = spark_catalog.default, system.builtin; +CREATE VIEW recent_customers AS + SELECT cust_id, name FROM customer WHERE cust_id > 1000; + +DESCRIBE EXTENDED recent_customers; ++----------------------------+---------------------------------------+--------+ +| col_name| data_type| comment| ++----------------------------+---------------------------------------+--------+ +| cust_id| int| null| +| name| string| null| +| | | | +|# Detailed Table Information| | | +| Catalog | spark_catalog| | +| Database| default| | +| Table| recent_customers| | +| Type| VIEW| | +| View Text|SELECT cust_id, name FROM customer ... | | +| View Original Text|SELECT cust_id, name FROM customer ... 
| | +| View Schema Mode| COMPENSATION| | +| View Catalog and Namespace| spark_catalog.default | | +| View Query Output Columns| [`cust_id`, `name`] | | +| SQL Path| spark_catalog.default, system.builtin| | ++----------------------------+---------------------------------------+--------+ + +-- The same metadata in JSON form. +-- (Formatted for readability; the actual output is on a single line.) +DESCRIBE EXTENDED recent_customers AS JSON; +{ + "table_name": "recent_customers", + "catalog_name": "spark_catalog", + "schema_name": "default", + "namespace": ["default"], + "columns": [ + {"name": "cust_id", "type": {"name": "int"}, "nullable": true}, + {"name": "name", "type": {"name": "string", "collation": "UTF8_BINARY"}, "nullable": true} + ], + "created_time": "2026-05-22T10:00:00Z", + "last_access": "UNKNOWN", + "created_by": "Spark 4.2.0", + "type": "VIEW", + "collation": "UTF8_BINARY", + "view_text": "SELECT cust_id, name FROM customer WHERE cust_id > 1000", + "view_original_text": "SELECT cust_id, name FROM customer WHERE cust_id > 1000", + "view_schema_mode": "COMPENSATION", + "view_catalog_and_namespace": "spark_catalog.default", + "view_query_output_columns": ["cust_id", "name"], + "sql_path": [ + {"catalog_name": "spark_catalog", "namespace": ["default"]}, + {"catalog_name": "system", "namespace": ["builtin"]} + ], + "view_creation_spark_configuration": { + "spark.sql.ansi.enabled": "true" + } +} ``` ### Related Statements @@ -281,3 +360,4 @@ DESC FORMATTED customer AS JSON; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DESCRIBE QUERY](sql-ref-syntax-aux-describe-query.html) * [DESCRIBE FUNCTION](sql-ref-syntax-aux-describe-function.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47844724..9125ca78dc9ee 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -38,6 
+38,9 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the name of the database to be created. + > Note: avoid naming a database `session` or `builtin`; see + > [Reserved system names](sql-ref-identifier.html#reserved-system-names). + * **IF NOT EXISTS** Creates a database with the given name if it does not exist. If a database with the same name already exists, nothing will happen. @@ -85,3 +88,4 @@ DESCRIBE DATABASE EXTENDED customer_db; * [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html) * [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-function.md b/docs/sql-ref-syntax-ddl-create-function.md index e0e2545f5ee3f..2565870494410 100644 --- a/docs/sql-ref-syntax-ddl-create-function.md +++ b/docs/sql-ref-syntax-ddl-create-function.md @@ -50,8 +50,9 @@ CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ] * **TEMPORARY** Indicates the scope of function being created. When `TEMPORARY` is specified, the - created function is valid and visible in the current session. No persistent - entry is made in the catalog for these kind of functions. + created function is valid and visible in the current session. Temporary functions live in the + per-session `system.session` namespace. No persistent entry is made in the catalog for these + kinds of functions. * **IF NOT EXISTS** @@ -62,9 +63,19 @@ CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ] * **function_name** - Specifies a name of function to be created. The function name may be optionally qualified with a database name. + Specifies the name of the function to be created. - **Syntax:** `[ database_name. ] function_name` + * For a **permanent** function the name may be optionally qualified with a database name + (or a catalog and database). If the name is not qualified the function is created in the + current schema. + + **Syntax:** `[ catalog_name. ] [ database_name. 
] function_name` + + * For a **temporary** function the name may be optionally qualified with the session schema + (`session` or `system.session`). Any other qualifier is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`. + + **Syntax:** `[ { session | system.session } . ] function_name` * **class_name** diff --git a/docs/sql-ref-syntax-ddl-create-sql-function.md b/docs/sql-ref-syntax-ddl-create-sql-function.md index 649cd895a1974..19f3e120f070f 100644 --- a/docs/sql-ref-syntax-ddl-create-sql-function.md +++ b/docs/sql-ref-syntax-ddl-create-sql-function.md @@ -58,7 +58,10 @@ characteristic - **TEMPORARY** - The scope of the function being created. When you specify `TEMPORARY`, the created function is valid and visible in the current session. No persistent entry is made in the catalog. + The scope of the function being created. When you specify `TEMPORARY`, the created function is + valid and visible in the current session. Temporary functions live in the per-session + `system.session` namespace and are dropped when the session ends. No persistent entry is made in + the catalog. - **IF NOT EXISTS** @@ -66,10 +69,23 @@ characteristic - **function_name** - A name for the function. For a permanent function, you can optionally qualify the function name, or it will be created under the current catalog and namespace. - If the name is not qualified the permanent function is created in the current schema. + A name for the function. - **Syntax:** `[ database_name. ] function_name` + * For a **permanent** function, you can optionally qualify the function name with a database name + (or a catalog and database). If the name is not qualified the permanent function is created in + the current schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` + + * For a **temporary** function, you can optionally qualify the function name with the session + schema (`session` or `system.session`). 
Any other qualifier — including + `system.builtin`, the current schema, or an arbitrary database name — is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`. For example, `CREATE TEMPORARY FUNCTION session.f ...` and + `CREATE TEMPORARY FUNCTION system.session.f ...` are accepted. + + **Syntax:** `[ { session | system.session } . ] function_name` + + The function name must be unique among all routines (procedures and functions) in its schema. - **function_parameter** @@ -126,6 +142,15 @@ characteristic - [Ranking functions](sql-ref-functions-builtin.md#ranking-window-functions) - Row producing functions such as `explode` + A persistent SQL UDF cannot reference temporary views, temporary functions, or session + variables. + + The SQL Path in effect at `CREATE FUNCTION` time is captured into the function's metadata; the + body resolves against that frozen path on every invocation, not the invoker's current path. + `current_schema()` and `current_path()` inside the body still return the invoker's context. + Use [DESCRIBE FUNCTION EXTENDED](sql-ref-syntax-aux-describe-function.html) to inspect the + captured path. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). + Within the body of the function you can refer to parameter by its unqualified name or by qualifying the parameter with the function name. - **characteristic** @@ -296,8 +321,74 @@ characteristic Returns: INT ``` +### Create a temporary SQL function with a session qualifier + +```sql +-- Unqualified, `session`-qualified, and `system.session`-qualified names all create the same +-- temporary function in the per-session `system.session` namespace. 
+> CREATE TEMPORARY FUNCTION add_one(x INT) RETURNS INT RETURN x + 1; + +> CREATE OR REPLACE TEMPORARY FUNCTION session.add_one(x INT) RETURNS INT + RETURN x + 1; + +> CREATE OR REPLACE TEMPORARY FUNCTION system.session.add_one(x INT) RETURNS INT + RETURN x + 1; + +-- All three names refer to the same temporary function: +> SELECT add_one(1), session.add_one(1), system.session.add_one(1); + 2 2 2 + +-- DROP TEMPORARY FUNCTION accepts the same qualifiers: +> DROP TEMPORARY FUNCTION session.add_one; + +-- Any other qualifier on a TEMPORARY function is rejected. +> CREATE TEMPORARY FUNCTION mydb.bad_temp() RETURNS INT RETURN 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `mydb` is not allowed for temporary FUNCTION ... + +> CREATE TEMPORARY FUNCTION system.builtin.bad_temp() RETURNS INT RETURN 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `system`.`builtin` is not allowed for temporary FUNCTION ... +``` + +### Frozen SQL Path + +A SQL UDF captures the SQL Path that is in effect at `CREATE FUNCTION` time. The body resolves +against that frozen path on every invocation, even if the caller's session has set a different +PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). + +```sql +> CREATE SCHEMA path_a; +> CREATE SCHEMA path_b; +> CREATE TABLE path_a.t USING parquet AS SELECT 10 AS id; +> CREATE TABLE path_b.t USING parquet AS SELECT 20 AS id; + +-- The PATH at CREATE FUNCTION time points at path_a, so unqualified `t` in the body binds to +-- path_a.t. +> SET PATH = spark_catalog.path_a, system.builtin; +> CREATE FUNCTION default.frozen_fn() RETURNS INT + RETURN (SELECT MAX(id) FROM t); + +-- Flip the live PATH. The function body still resolves `t` against the frozen path. 
+> SET PATH = spark_catalog.path_b, system.builtin; + +-- A bare query follows the LIVE path: +> SELECT MAX(id) FROM t; + 20 + +-- The function body follows its FROZEN path: +> SELECT default.frozen_fn(); + 10 + +-- DESCRIBE FUNCTION EXTENDED shows the captured path: +> DESC FUNCTION EXTENDED default.frozen_fn; + Function: spark_catalog.default.frozen_fn + ... + SQL Path: spark_catalog.path_a, system.builtin +``` + ### Related Statements * [SHOW FUNCTIONS](sql-ref-syntax-aux-show-functions.html) * [DESCRIBE FUNCTION](sql-ref-syntax-aux-describe-function.html) * [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-create-view.md b/docs/sql-ref-syntax-ddl-create-view.md index 2d832636b38fc..f6fc6c0e85c75 100644 --- a/docs/sql-ref-syntax-ddl-create-view.md +++ b/docs/sql-ref-syntax-ddl-create-view.md @@ -40,9 +40,11 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident * **[ GLOBAL ] TEMPORARY** - TEMPORARY views are session-scoped and will be dropped when session ends - because it skips persisting the definition in the underlying metastore, if any. - GLOBAL TEMPORARY views are tied to a system preserved temporary database `global_temp`. + `TEMPORARY` views are session-scoped and are dropped when the session ends; + no entry is persisted in the underlying metastore. + Temporary views live in the per-session `system.session` namespace. + + `GLOBAL TEMPORARY` views are tied to the system-preserved temporary database `global_temp`. * **IF NOT EXISTS** @@ -51,9 +53,23 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident * **view_identifier** - Specifies a view name, which may be optionally qualified with a database name. + Specifies a view name. + + * For a **persistent** view the name may be optionally qualified with a database name (or a + catalog and database). 
If the name is not qualified the view is created in the current + schema. + + **Syntax:** `[ catalog_name. ] [ database_name. ] view_name` - **Syntax:** `[ database_name. ] view_name` + * For a **temporary** view the name may be optionally qualified with the session schema + (`session` or `system.session`). Any other qualifier is rejected with + `INVALID_TEMP_OBJ_QUALIFIER`. For example, `CREATE TEMPORARY VIEW session.v ...` and + `CREATE TEMPORARY VIEW system.session.v ...` are accepted; `CREATE TEMPORARY VIEW mydb.v ...` + is not. + + **Syntax:** `[ { session | system.session } . ] view_name` + + The fully qualified view name must be unique within its schema. * **create_view_clauses** @@ -75,8 +91,16 @@ CREATE [ OR REPLACE ] [ [ GLOBAL ] TEMPORARY ] VIEW [ IF NOT EXISTS ] view_ident The default is `WITH SCHEMA COMPENSATION`. * **query** + A [SELECT](sql-ref-syntax-qry-select.html) statement that constructs the view from base tables or other views. + A persistent view cannot reference temporary views, temporary functions, or session variables. + + For a persistent view, the SQL Path in effect at `CREATE VIEW` time is captured into the view's + metadata; the body resolves against that frozen path on every reference, not the invoker's + current path. Use [DESCRIBE EXTENDED](sql-ref-syntax-aux-describe-table.html) to inspect the + captured path. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). + ### Examples ```sql @@ -98,8 +122,74 @@ CREATE OR REPLACE VIEW open_orders WITH SCHEMA EVOLUTION AS SELECT * FROM orders WHERE status = 'open'; ``` +### Create a temporary view with a session qualifier + +```sql +-- Unqualified, `session`-qualified, and `system.session`-qualified names all create the same +-- temporary view in the per-session `system.session` namespace. 
+CREATE TEMPORARY VIEW recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +CREATE OR REPLACE TEMPORARY VIEW session.recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +CREATE OR REPLACE TEMPORARY VIEW system.session.recent_orders + AS SELECT * FROM orders WHERE order_date > current_date - INTERVAL 7 DAYS; + +-- All three names address the same temporary view: +SELECT count(*) FROM recent_orders; +SELECT count(*) FROM session.recent_orders; +SELECT count(*) FROM system.session.recent_orders; + +-- DROP VIEW accepts the same qualifiers (there is no DROP TEMPORARY VIEW form): +DROP VIEW session.recent_orders; + +-- Any other qualifier on a TEMPORARY view is rejected. +CREATE TEMPORARY VIEW mydb.bad_temp AS SELECT 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `mydb` is not allowed for temporary VIEW ... + +CREATE TEMPORARY VIEW system.builtin.bad_temp AS SELECT 1; + [INVALID_TEMP_OBJ_QUALIFIER] qualifier `system`.`builtin` is not allowed for temporary VIEW ... +``` + +### Frozen SQL Path + +A persistent view captures the SQL Path that is in effect at `CREATE VIEW` time. The view body +resolves against that frozen path on every reference, even when the caller's session has set a +different PATH. See [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html). + +```sql +> CREATE SCHEMA views_a; +> CREATE SCHEMA views_b; +> CREATE TABLE views_a.t USING parquet AS SELECT 1 AS id; +> CREATE TABLE views_b.t USING parquet AS SELECT 2 AS id; + +-- The PATH at CREATE VIEW time points at views_a, so unqualified `t` in the view body binds to +-- views_a.t. +> SET PATH = spark_catalog.views_a, system.builtin; +> CREATE VIEW default.v_frozen AS SELECT id FROM t; + +-- Flip the live PATH. The view body still resolves `t` against the frozen path. 
+> SET PATH = spark_catalog.views_b, system.builtin; + +-- A bare query follows the LIVE path: +> SELECT id FROM t; + 2 + +-- The view body follows its FROZEN path: +> SELECT id FROM default.v_frozen; + 1 + +-- DESCRIBE EXTENDED shows the captured path: +> DESCRIBE EXTENDED default.v_frozen; + ... + SQL Path spark_catalog.views_a, system.builtin +``` + ### Related Statements * [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) * [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) * [SHOW VIEWS](sql-ref-syntax-aux-show-views.html) +* [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) +* [Name Resolution](sql-ref-name-resolution.html) diff --git a/docs/sql-ref-syntax-ddl-drop-function.md b/docs/sql-ref-syntax-ddl-drop-function.md index bef31d74afcff..b9272e34b81d6 100644 --- a/docs/sql-ref-syntax-ddl-drop-function.md +++ b/docs/sql-ref-syntax-ddl-drop-function.md @@ -34,14 +34,18 @@ DROP [ TEMPORARY ] FUNCTION [ IF EXISTS ] function_name * **function_name** - Specifies the name of an existing function. The function name may be - optionally qualified with a database name. + Specifies the name of an existing function. With `TEMPORARY`, the name may optionally be + qualified with `session` or `system.session`. Without `TEMPORARY`, the name may optionally be + qualified with a database (or a catalog and database) and resolves to a persistent function. - **Syntax:** `[ database_name. ] function_name` + **Syntax:** `[ catalog_name. ] [ database_name. ] function_name` + + Functions in `system.builtin` cannot be dropped. * **TEMPORARY** - Should be used to delete the `TEMPORARY` function. + Required to drop a temporary function. Without `TEMPORARY`, `DROP FUNCTION` only considers + persistent functions. 
* **IF EXISTS** diff --git a/docs/sql-ref-syntax-ddl-drop-view.md b/docs/sql-ref-syntax-ddl-drop-view.md index 5b680d7f907e0..16f711a9074eb 100644 --- a/docs/sql-ref-syntax-ddl-drop-view.md +++ b/docs/sql-ref-syntax-ddl-drop-view.md @@ -37,9 +37,11 @@ DROP VIEW [ IF EXISTS ] view_identifier * **view_identifier** - Specifies the view name to be dropped. The view name may be optionally qualified with a database name. + Specifies the view name to be dropped. The name may be optionally qualified with a database + name (or a catalog and database). A name qualified with `session` or `system.session` + targets a temporary view. - **Syntax:** `[ database_name. ] view_name` + **Syntax:** `[ catalog_name. ] [ database_name. ] view_name` ### Examples @@ -53,12 +55,20 @@ DROP VIEW userdb.employeeView; -- Assumes a view named `employeeView` does not exist. -- Throws exception DROP VIEW employeeView; -Error: org.apache.spark.sql.AnalysisException: Table or view not found: employeeView; -(state=,code=0) +Error: TABLE_OR_VIEW_NOT_FOUND -- Assumes a view named `employeeView` does not exist,Try with IF EXISTS -- this time it will not throw exception DROP VIEW IF EXISTS employeeView; + +-- A temporary view that shadows a persistent view with the same name. +-- An unqualified DROP VIEW drops the temporary view first; qualifying with `session` +-- always targets the temporary view explicitly. 
+CREATE VIEW default.recent_orders AS SELECT * FROM orders WHERE order_date > current_date - 7; +CREATE TEMPORARY VIEW recent_orders AS SELECT * FROM orders WHERE order_date = current_date; + +DROP VIEW session.recent_orders; -- drops the temporary view +DROP VIEW default.recent_orders; -- drops the persistent view ``` ### Related Statements diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index d8c37dc021985..1e0ea4a2b8d64 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -29,7 +29,8 @@ Data Definition Statements are used to create or modify the structure of databas * [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html) * [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) * [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html) - * [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html) + * [CREATE FUNCTION (External)](sql-ref-syntax-ddl-create-function.html) + * [CREATE FUNCTION (SQL)](sql-ref-syntax-ddl-create-sql-function.html) * [CREATE TABLE](sql-ref-syntax-ddl-create-table.html) * [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) * [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) @@ -123,6 +124,7 @@ You use SQL scripting to execute procedural logic in SQL. * [REFRESH FUNCTION](sql-ref-syntax-aux-cache-refresh-function.html) * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) + * [SET PATH](sql-ref-syntax-aux-conf-mgmt-set-path.html) * [SET VAR](sql-ref-syntax-aux-set-var.html) * [SHOW COLLATIONS](sql-ref-syntax-aux-show-collations.html) * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html)