fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866

ragnard · 2025-12-26T22:18:20Z

Previously the pydantic @model_validator on SetStatisticsUpdate would fail because it assumed statistics was a model instance. In a "before"" validator that is not the case.

Use an "after" validator instead, where we can use instantiated and validated fields.

Before

>>> import pyiceberg.table.update
>>> pyiceberg.table.update.SetStatisticsUpdate.model_validate({'statistics': {'snapshot-id': 1234, 'file-size-in-bytes': 0, 'statistics-path': '', 'file-footer-size-in-bytes': 0, 'blob-metadata': []}})
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    pyiceberg.table.update.SetStatisticsUpdate.model_validate({'statistics': {'snapshot-id': 1234, 'file-size-in-bytes': 0, 'statistics-path': '', 'file-footer-size-in-bytes': 0, 'blob-metadata': []}})
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ragge/projects/github.com/ragnard/iceberg-python/.venv/lib/python3.14/site-packages/pydantic/main.py", line 716, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        obj,
        ^^^^
    ...<5 lines>...
        by_name=by_name,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ragge/projects/github.com/ragnard/iceberg-python/pyiceberg/table/update/__init__.py", line 191, in validate_snapshot_id
    data["snapshot_id"] = stats.snapshot_id
                          ^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'snapshot_id'

After

>>> import pyiceberg.table.update
>>> pyiceberg.table.update.SetStatisticsUpdate.model_validate({'statistics': {'snapshot-id': 1234, 'file-size-in-bytes': 0, 'statistics-path': '', 'file-footer-size-in-bytes': 0, 'blob-metadata': []}})
SetStatisticsUpdate(action='set-statistics', statistics=StatisticsFile(snapshot_id=1234, statistics_path='', file_size_in_bytes=0, file_footer_size_in_bytes=0, key_metadata=None, blob_metadata=[]), snapshot_id=1234)

Rationale for this change

Are these changes tested?

Yes, but only using the two-liners above.

Are there any user-facing changes?

No.

Previously the pydantic @model_validator would fail because it assumed statistics was a model instance. In a "before"" validator that is not the case. Use an "after" validator instead, where we can use instantiated and validated fields.

kevinjqliu

Thanks for the PR. I think we should just deprecate the top-level snapshot_id entirely.
For context, the "before model_validator` was added in 3b53edc#diff-769b43e1d8beaa86141f694679de2bbea3604a65f987a9acd7d9e9efca193b7eR181-R193 to maintain backwards compatibility and prep for deprecation

kevinjqliu

oops didnt mean to approve

ragnard · 2025-12-27T07:53:04Z

@kevinjqliu Ok, do you want me to change the fix so that snapshot_id is still there, but just not automatically populated?

kevinjqliu

Looks like the java implementation has not deprecated the top-level snapshot_id. lets proceed with this change since it improves the current validation logic.

Thanks for the PR! Lets add a test and it should be good to go!

kevinjqliu · 2025-12-28T19:14:50Z

pyiceberg/table/update/__init__.py

+    @model_validator(mode="after")
+    def validate_snapshot_id(self) -> Self:
+        return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id})


Suggested change

@model_validator(mode="after")

def validate_snapshot_id(self) -> Self:

return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id})

@model_validator(mode="after")

def validate_snapshot_id(self) -> Self:

self.snapshot_id = self.statistics.snapshot_id

return self

nit: use direct assignment

kevinjqliu · 2025-12-28T19:15:37Z

pyiceberg/table/update/__init__.py

add a test, something like:

def test_set_statistics_update_without_snapshot_id(table_v2_with_statistics: Table) -> None: """Test that SetStatisticsUpdate auto-populates snapshot_id from statistics.""" snapshot_id = table_v2_with_statistics.metadata.current_snapshot_id blob_metadata = BlobMetadata( type="apache-datasketches-theta-v1", snapshot_id=snapshot_id, sequence_number=2, fields=[1], properties={"prop-key": "prop-value"}, ) statistics_file = StatisticsFile( snapshot_id=snapshot_id, statistics_path="s3://bucket/warehouse/stats.puffin", file_size_in_bytes=124, file_footer_size_in_bytes=27, blob_metadata=[blob_metadata], ) # Create update without explicitly providing snapshot_id update = SetStatisticsUpdate(statistics=statistics_file) # Validator should auto-populate snapshot_id from statistics assert update.snapshot_id == snapshot_id new_metadata = update_table_metadata( table_v2_with_statistics.metadata, (update,), ) assert len(new_metadata.statistics) == 2 updated_statistics = [stat for stat in new_metadata.statistics if stat.snapshot_id == snapshot_id] assert len(updated_statistics) == 1

kevinjqliu approved these changes Dec 26, 2025

View reviewed changes

kevinjqliu requested changes Dec 26, 2025

View reviewed changes

kevinjqliu reviewed Dec 28, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866

fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866

Uh oh!

ragnard commented Dec 26, 2025 •

edited

Loading

Uh oh!

kevinjqliu left a comment

Uh oh!

kevinjqliu left a comment

Uh oh!

ragnard commented Dec 27, 2025

Uh oh!

kevinjqliu left a comment

Uh oh!

kevinjqliu Dec 28, 2025

Uh oh!

kevinjqliu Dec 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866

Are you sure you want to change the base?

fix: Validate SetStatisticsUpdate correctly (fixes #2865) #2866

Uh oh!

Conversation

ragnard commented Dec 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Before

After

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

Uh oh!

kevinjqliu left a comment

Choose a reason for hiding this comment

Uh oh!

kevinjqliu left a comment

Choose a reason for hiding this comment

Uh oh!

ragnard commented Dec 27, 2025

Uh oh!

kevinjqliu left a comment

Choose a reason for hiding this comment

Uh oh!

kevinjqliu Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

kevinjqliu Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ragnard commented Dec 26, 2025 •

edited

Loading