[SPARK-56975][SS] Reject user-specified schema in DataStreamReader.table() #56017
Closed
PorridgeSwim wants to merge 1 commit into
Thanks! Merging to master/4.x.
HeartSaVioR pushed a commit that referenced this pull request on May 21, 2026.
### What changes were proposed in this pull request?
Make `DataStreamReader.table()` reject user-specified schemas by calling `assertNoSpecifiedSchema("table")`, mirroring `DataStreamReader.changes()`.
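The guard pattern itself is small. The sketch below is a hypothetical, dependency-free model of it, not Spark's source: `TableGuardSketch`, `ModelStreamReader`, and `Field` are illustrative names, and `IllegalArgumentException` stands in for the error Spark actually raises (error class `_LEGACY_ERROR_TEMP_1189`). It shows `table()` and `changes()` sharing the same check, so calling `.schema(...)` first fails fast instead of being ignored.

```scala
// Hypothetical, simplified model of the DataStreamReader guard; not Spark's code.
object TableGuardSketch {
  final case class Field(name: String, dataType: String)

  final class ModelStreamReader {
    // Plays the role of DataStreamReader's optional user-specified schema.
    private var userSpecifiedSchema: Option[Seq[Field]] = None

    def schema(fields: Seq[Field]): ModelStreamReader = {
      userSpecifiedSchema = Some(fields)
      this
    }

    // Fail fast if .schema(...) was called before an operation that cannot use it.
    // IllegalArgumentException is a stand-in for Spark's own error type.
    private def assertNoSpecifiedSchema(operation: String): Unit =
      if (userSpecifiedSchema.nonEmpty) {
        throw new IllegalArgumentException(
          s"User specified schema not supported with `$operation`.")
      }

    // With this change, table() runs the same check that changes() already ran.
    def table(name: String): String = {
      assertNoSpecifiedSchema("table")
      s"stream over catalog table $name"
    }

    def changes(name: String): String = {
      assertNoSpecifiedSchema("changes")
      s"change stream over $name"
    }
  }
}
```

Reusing one helper keeps the error wording identical across operations; only the operation name differs.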
### Why are the changes needed?
`DataStreamReader.table()` accepts a user-specified schema without complaint and then silently ignores it:
```scala
import org.apache.spark.sql.types.{IntegerType, StructType}

spark.readStream
  .schema(new StructType().add("a", IntegerType))
  .table("some_table") // before this change: no error, and the schema has no effect
```
A user-specified schema is not a meaningful input to `.table()`: catalog tables declare their own schema, and `TableCatalog.loadTable(Identifier)` has no parameter to receive a user schema, so even if Spark wanted to forward one, it couldn't. A `.schema(...)` call before `.table()` is therefore always a misconfiguration.
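To make the "nowhere to forward it" argument concrete, here is a hypothetical model of the pre-change behavior. `SilentDropSketch`, `ModelCatalog`, `CatalogTable`, and `PreFixReader` are illustrative names, not Spark types; the only property borrowed from Spark is that the catalog lookup takes an identifier and nothing else, so the resolved schema is always the catalog's.

```scala
// Hypothetical model of the pre-change behavior; not Spark's code.
object SilentDropSketch {
  final case class Column(name: String, dataType: String)

  // Stand-in for a catalog table, which carries its own declared schema.
  final case class CatalogTable(name: String, schema: Seq[Column])

  // Like TableCatalog.loadTable(Identifier): the identifier is the only input,
  // so there is no parameter through which a user schema could be passed.
  final class ModelCatalog(tables: Map[String, CatalogTable]) {
    def loadTable(ident: String): CatalogTable =
      tables.getOrElse(ident, throw new NoSuchElementException(s"Table not found: $ident"))
  }

  // The user schema is stored but never read: this is the silent drop.
  final class PreFixReader(catalog: ModelCatalog) {
    private var userSpecifiedSchema: Option[Seq[Column]] = None

    def schema(cols: Seq[Column]): PreFixReader = {
      userSpecifiedSchema = Some(cols)
      this
    }

    // Returns the schema the resulting stream would use: always the catalog's.
    def table(name: String): Seq[Column] = catalog.loadTable(name).schema
  }
}
```

Whatever the caller passes to `schema(...)`, the value returned by `table(...)` is determined by the catalog entry alone.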
The rest of `DataStreamReader` already surfaces this kind of misconfiguration as a clear error:
- `.load()` goes through `DataSourceV2Utils.getTableFromProvider`, which throws `_LEGACY_ERROR_TEMP_2242` ("`<provider>` source does not support user-specified schema") when the provider's `supportsExternalMetadata()` returns `false`.
- `.changes()` explicitly calls `assertNoSpecifiedSchema("changes")` and throws `_LEGACY_ERROR_TEMP_1189` ("User specified schema not supported with `changes`.").
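For comparison, the `.load()` path decides per provider rather than rejecting outright. The sketch below is a hypothetical model of that decision, not the code of `getTableFromProvider`: `LoadPathSketch`, `ModelProvider`, and `resolveSchema` are illustrative names, and `UnsupportedOperationException` stands in for Spark's error. The one detail taken from Spark is that `supportsExternalMetadata()` defaults to `false`.

```scala
// Hypothetical model of the .load() schema decision; not Spark's code.
object LoadPathSketch {
  final case class Column(name: String, dataType: String)

  // Stand-in for a V2 TableProvider; like Spark's, external metadata is opt-in.
  trait ModelProvider {
    def name: String
    def supportsExternalMetadata: Boolean = false
    def inferSchema(): Seq[Column]
  }

  // A user schema is honored only by providers that opt in; otherwise it is an error.
  def resolveSchema(provider: ModelProvider, userSchema: Option[Seq[Column]]): Seq[Column] =
    userSchema match {
      case Some(s) if provider.supportsExternalMetadata => s
      case Some(_) =>
        throw new UnsupportedOperationException(
          s"${provider.name} source does not support user-specified schema")
      case None => provider.inferSchema()
    }
}
```

Either way the user gets a schema that takes effect or an explicit error, never a silently discarded one.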
`.table()` is the odd one out: same invalid configuration, no error. Users can write `readStream.schema(s).table(name)`, see a working query, and reasonably assume that `s` took effect, when in fact the resulting stream uses the catalog schema and `s` was silently dropped. Raising a clear error aligns `.table()` with the existing behavior of `.load()` and `.changes()`.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added `DataStreamTableAPISuite` test `"read: user-specified schema is not allowed with table API"`.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #56017 from PorridgeSwim/forbidSpecifySchemaForTable.
Lead-authored-by: You Zhou <98635051+PorridgeSwim@users.noreply.github.com>
Co-authored-by: You Zhou <you.zhou@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 05b4d81)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>