[SPARK-56956][SDP] Introduce AutoCDC Flow Dataclasses #56042
Open
AnishMahto wants to merge 18 commits into
This was referenced May 21, 2026
Approved AutoCDC SPIP: https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
This is a stacked PR. Review incremental diff here: AnishMahto/spark@SPARK-56870-extend-microbatch-with-cdc-metadata...SPARK-56956-introduce-flow-data-classes
What changes were proposed in this pull request?
Introduce dataclasses for the unresolved AutoCDC flow (`AutoCdcFlow`) and the resolved AutoCDC flow (`AutoCdcMergeFlow`). Add wiring to analyze an `AutoCdcFlow` into an `AutoCdcMergeFlow`.
A small refactor was additionally made to the `UnresolvedFlow` and `ResolvedFlow` class hierarchy.
Why are the changes needed?
Support AutoCDC flow registration and analysis. AutoCDC flow execution will be supported in a future PR.
Previously, an `UnresolvedFlow` always also represented an untyped flow: a flow whose execution type (streaming, append-once, etc.) is not yet known. `AutoCdcFlow` is a specialized flow that supports only streaming, so its execution type is known at construction. It is still unresolved at registration time and needs to go through resolution to determine its position in the DAG and its input/output schemas.
Hence we introduce the intermediary child `UntypedFlow` of `UnresolvedFlow`, which all previous flows are classified as during registration. An `AutoCdcFlow` directly implements `UnresolvedFlow` (skipping `UntypedFlow` in its inheritance chain) because it is not untyped.
Does this PR introduce any user-facing change?
No, the AutoCDC feature is not released anywhere yet.
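For readers who want a picture of the flow hierarchy described under "Why are the changes needed?", here is a minimal sketch in Python. It is illustrative only and is not the code in this PR: the class names come from the description, but every field (`name`, `keys`, `input_schema`, `output_schema`) and the `analyze` helper are hypothetical, and the assumption that the output schema equals the input schema is made purely to keep the example short.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UnresolvedFlow(ABC):
    """A registered flow whose DAG position and schemas are not yet resolved."""
    name: str  # hypothetical field; the real fields are not listed in the description


@dataclass(frozen=True)
class UntypedFlow(UnresolvedFlow):
    """An unresolved flow whose execution type (streaming, append-once, ...) is unknown."""


@dataclass(frozen=True)
class AutoCdcFlow(UnresolvedFlow):
    """Streaming-only flow: unresolved, but its execution type is known at construction.

    It extends UnresolvedFlow directly and skips UntypedFlow.
    """
    keys: Tuple[str, ...]  # hypothetical field


@dataclass(frozen=True)
class ResolvedFlow(ABC):
    """A flow after resolution, with its schemas determined."""
    name: str
    input_schema: Tuple[str, ...]   # hypothetical representation of a schema
    output_schema: Tuple[str, ...]  # hypothetical representation of a schema


@dataclass(frozen=True)
class AutoCdcMergeFlow(ResolvedFlow):
    """Resolved counterpart of AutoCdcFlow."""
    keys: Tuple[str, ...]  # hypothetical field


def analyze(flow: AutoCdcFlow, input_schema: Tuple[str, ...]) -> AutoCdcMergeFlow:
    """Hypothetical stand-in for analysis: turn an AutoCdcFlow into an AutoCdcMergeFlow.

    Assumes, for illustration only, that the output schema equals the input schema.
    """
    return AutoCdcMergeFlow(
        name=flow.name,
        input_schema=input_schema,
        output_schema=input_schema,
        keys=flow.keys,
    )
```

In this sketch, an `AutoCdcFlow` is an `UnresolvedFlow` but not an `UntypedFlow`, which mirrors the split the description motivates.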
How was this patch tested?
`ConnectValidPipelineSuite` and `AutoCdcFlowSuite`
Was this patch authored or co-authored using generative AI tooling?
Co-authored.
Generated-by: Claude-Opus-4.7-thinking-xhigh