refactor: abstract CustomizableDataflow into pandas/polars subclasses #548
📦 TestPyPI package published
pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.9.dev22353825774
or with uv:
uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.9.dev22353825774
MCP server for Claude Code
claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.12.9.dev22353825774" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table
Learnings & Self-Review
Was it an improvement?
Structurally, yes. The core win is that [...] The [...] However, there's significant code duplication.
Performance implications around serialization
No change for pandas users. The [...] The polars path is unchanged too [...] One subtle behavioral change in [...]
What this PR doesn't do yet
cd36705 to 1360e5e
Make CustomizableDataflow an abstract base class with concrete PandasCustomizableDataflow and PolarsCustomizableDataflow subclasses. Widgets select their backend via a dataflow_klass attribute. This decouples pandas from the core dataflow module, enabling pandas to become an optional dependency in the future. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
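The structure described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the method signatures, the method bodies, and the widget class names (WidgetSketchBase, PandasWidgetSketch, PolarsWidgetSketch) are assumptions made for the example, and plain lists of dicts stand in for real dataframes so the sketch runs without pandas or polars installed. Only the four abstract method names, the two dataflow class names, and the dataflow_klass attribute come from the PR text.

```python
from abc import ABC, abstractmethod


class CustomizableDataflow(ABC):
    """Stand-in for the abstract base; signatures here are assumed."""

    @abstractmethod
    def _compute_processed_result(self, df): ...

    @abstractmethod
    def _build_error_dataframe(self, exc): ...

    @abstractmethod
    def _get_summary_sd(self, df): ...

    @abstractmethod
    def _df_to_obj(self, df): ...


class PandasCustomizableDataflow(CustomizableDataflow):
    # Placeholder bodies; the real class would call into pandas.
    def _compute_processed_result(self, df):
        return df

    def _build_error_dataframe(self, exc):
        return [{"err": str(exc)}]

    def _get_summary_sd(self, df):
        return {"backend": "pandas", "rows": len(df)}

    def _df_to_obj(self, df):
        return {"data": list(df)}


class PolarsCustomizableDataflow(CustomizableDataflow):
    # Placeholder bodies; the real class would call into polars.
    def _compute_processed_result(self, df):
        return df

    def _build_error_dataframe(self, exc):
        return [{"err": str(exc)}]

    def _get_summary_sd(self, df):
        return {"backend": "polars", "rows": len(df)}

    def _df_to_obj(self, df):
        return {"data": list(df)}


class WidgetSketchBase:
    """Hypothetical widget base: subclasses only pick a dataflow_klass."""

    dataflow_klass = None

    def __init__(self, df):
        kls = self.__class__

        # The inner class derives from whichever backend the widget selected.
        class InnerDataFlow(kls.dataflow_klass):
            pass

        self.dataflow = InnerDataFlow()
        self.summary = self.dataflow._get_summary_sd(df)


class PandasWidgetSketch(WidgetSketchBase):
    dataflow_klass = PandasCustomizableDataflow


class PolarsWidgetSketch(WidgetSketchBase):
    dataflow_klass = PolarsCustomizableDataflow
```

With this shape, the polars widget no longer needs to inherit from the pandas widget: the two widgets share a base and differ only in the class attribute. The abstract base also refuses direct instantiation, which makes a missing backend method fail early.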
1360e5e to 2db1042
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02ecf492b0
  self.exception = None
  kls = self.__class__
- class InnerDataFlow(CustomizableDataflow):
+ class InnerDataFlow(kls.dataflow_klass):
Route _df_to_obj through widget overrides
This refactor removed the InnerDataFlow._df_to_obj bridge, so _df_to_obj implementations on widget subclasses are no longer used during serialization. That breaks built-in override paths like GeopandasBase._df_to_obj (buckaroo/geopandas_buckaroo.py), which exists to coerce geopandas frames before pd_to_obj; after this change, serialization falls back to PandasCustomizableDataflow._df_to_obj and bypasses that conversion, causing geopandas widget payload generation to fail or produce incorrect data.
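One possible shape for the bridge this review comment asks for is sketched below. It is an assumption-laden illustration, not the code that was removed and not necessarily how the PR resolves the comment: PandasDataflowSketch, WidgetSketch, GeoWidgetSketch, and the serialize method are hypothetical names, and lists of dicts stand in for dataframes. The idea is that the inner dataflow forwards _df_to_obj to the widget, so an override on a widget subclass runs before the backend's default conversion.

```python
class PandasDataflowSketch:
    """Hypothetical backend dataflow with a default conversion."""

    def _df_to_obj(self, df):
        return {"data": list(df)}

    def serialize(self, df):
        # Serialization always goes through _df_to_obj.
        return self._df_to_obj(df)


class WidgetSketch:
    dataflow_klass = PandasDataflowSketch

    def __init__(self):
        widget = self
        kls = self.__class__

        class InnerDataFlow(kls.dataflow_klass):
            def _df_to_obj(inner_self, df):
                # Bridge: let the widget (and its subclasses) decide.
                return widget._df_to_obj(df)

        self.dataflow = InnerDataFlow()

    def _df_to_obj(self, df):
        # Default: call the backend implementation directly on the base
        # class, which avoids recursing back through the bridge.
        return self.dataflow_klass._df_to_obj(self.dataflow, df)


class GeoWidgetSketch(WidgetSketch):
    def _df_to_obj(self, df):
        # Stand-in for coercing a geopandas frame before conversion.
        coerced = [{k: str(v) for k, v in row.items()} for row in df]
        return super()._df_to_obj(coerced)
```

In this sketch a plain WidgetSketch serializes rows unchanged, while GeoWidgetSketch stringifies each value first, which mirrors the coercion step the review says is being bypassed.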
Summary
- Make CustomizableDataflow an abstract base class with 4 abstract methods: _compute_processed_result, _build_error_dataframe, _get_summary_sd, _df_to_obj
- Add PandasCustomizableDataflow (new file) and PolarsCustomizableDataflow (new file) as concrete implementations
- Widgets select their backend via a dataflow_klass class attribute instead of inheriting from the pandas widget
- Decouple pandas from the dataflow.py and dataflow_extras.py base classes

Motivation
This decouples the dataflow pipeline from pandas, enabling pandas to become an optional dependency. Previously PolarsBuckarooWidget inherited from the pandas widget and overrode methods; now each backend has its own clean dataflow subclass.

Files changed
| File | Change |
| --- | --- |
| buckaroo/dataflow/dataflow.py | Make CustomizableDataflow abstract, remove pandas imports |
| buckaroo/dataflow/dataflow_extras.py | sort_index check |
| buckaroo/dataflow/pandas_dataflow.py | PandasCustomizableDataflow |
| buckaroo/dataflow/polars_dataflow.py | PolarsCustomizableDataflow |
| buckaroo/buckaroo_widget.py | dataflow_klass, use it in InnerDataFlow |
| buckaroo/polars_buckaroo.py | dataflow_klass = PolarsCustomizableDataflow |
| buckaroo/server/data_loading.py | PandasCustomizableDataflow |
| tests/unit/dataflow/*.py | PandasCustomizableDataflow |

Test plan
🤖 Generated with Claude Code