-
Notifications
You must be signed in to change notification settings - Fork 413
fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously the pydantic @model_validator would fail because it assumed statistics was a model instance. In a "before"" validator that is not the case. Use an "after" validator instead, where we can use instantiated and validated fields.
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I think we should just deprecate the top-level snapshot_id entirely.
For context, the "before model_validator` was added in 3b53edc#diff-769b43e1d8beaa86141f694679de2bbea3604a65f987a9acd7d9e9efca193b7eR181-R193 to maintain backwards compatibility and prep for deprecation
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops didnt mean to approve
|
@kevinjqliu Ok, do you want me to change the fix so that |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the java implementation has not deprecated the top-level snapshot_id. lets proceed with this change since it improves the current validation logic.
Thanks for the PR! Lets add a test and it should be good to go!
| @model_validator(mode="after") | ||
| def validate_snapshot_id(self) -> Self: | ||
| return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @model_validator(mode="after") | |
| def validate_snapshot_id(self) -> Self: | |
| return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id}) | |
| @model_validator(mode="after") | |
| def validate_snapshot_id(self) -> Self: | |
| self.snapshot_id = self.statistics.snapshot_id | |
| return self |
nit: use direct assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test, something like:
def test_set_statistics_update_without_snapshot_id(table_v2_with_statistics: Table) -> None:
"""Test that SetStatisticsUpdate auto-populates snapshot_id from statistics."""
snapshot_id = table_v2_with_statistics.metadata.current_snapshot_id
blob_metadata = BlobMetadata(
type="apache-datasketches-theta-v1",
snapshot_id=snapshot_id,
sequence_number=2,
fields=[1],
properties={"prop-key": "prop-value"},
)
statistics_file = StatisticsFile(
snapshot_id=snapshot_id,
statistics_path="s3://bucket/warehouse/stats.puffin",
file_size_in_bytes=124,
file_footer_size_in_bytes=27,
blob_metadata=[blob_metadata],
)
# Create update without explicitly providing snapshot_id
update = SetStatisticsUpdate(statistics=statistics_file)
# Validator should auto-populate snapshot_id from statistics
assert update.snapshot_id == snapshot_id
new_metadata = update_table_metadata(
table_v2_with_statistics.metadata,
(update,),
)
assert len(new_metadata.statistics) == 2
updated_statistics = [stat for stat in new_metadata.statistics if stat.snapshot_id == snapshot_id]
assert len(updated_statistics) == 1
Previously the pydantic
@model_validatoronSetStatisticsUpdatewould fail because it assumed statistics was a model instance. In a "before"" validator that is not the case.Use an "after" validator instead, where we can use instantiated and validated fields.
Before
After
Rationale for this change
Are these changes tested?
Yes, but only using the two-liners above.
Are there any user-facing changes?
No.