refactor: Extract split filter provider interface to improve flexibility for user customization. #46
Merged
anlowee merged 35 commits into y-scope:release-0.293-clp-connector on Aug 25, 2025
Description
In this PR, we extract a split filter provider interface so that users can define their own config format for their own metadata database. Since the metadata filter is used only for split filtering, we rename it to split filter.

We add a new package, `split.filter`, and a base class, `ClpSplitFilterProvider`. The base class keeps all functions for processing scope, a concept from the previous metadata filter design (i.e., whether a filter applies to all schemas and all tables, to the tables under a certain schema, or only to a certain table). It also declares two abstract functions:

- `getCustomSplitFilterOptionsClass`: returns the user's own implementation of the `CustomSplitFilterOptions` class. It also acts as a guard so that the user won't forget to implement the `CustomSplitFilterOptions` class (see below for the definition of `CustomSplitFilterOptions`).
- `remapSplitFilterPushDown`: renamed from the old function `remapFilterSql`. It rewrites the split filter push-down expression using split-filter-specific logic, so the user only needs to override this function to handle that logic.

We move the current filter structure to a separate class, `ClpMetadataFilter`, which defines the basic structure of filters for all types of metadata databases:

- "columnName": the filter name.
- "customOptions": the split-filter-specific data structure, which the user should implement and which holds all split-filter-specific fields (e.g., "rangeMapping"). We provide an implementation for the MySQL metadata database in `ClpMySqlSplitFilterProvider.SplitDatabaseSpecific`.
- "required": whether the filter must exist after push down. As discussed, we will move this field into "customOptions" in the next PR.

We also add a new class, `ClpCustomSplitFilterOptionsDeserializer`, which deserializes the "customOptions" field into the class returned by `getCustomSplitFilterOptionsClass`.

In the config file for the MySQL metadata database, the only change is that "rangeMapping" moves into the "customOptions" field:

    {
      "clp.default.table_1": [
        {
          "columnName": "msg.timestamp",
          "customOptions": {
            "rangeMapping": {
              "lowerBound": "begin_timestamp",
              "upperBound": "end_timestamp"
            }
          },
          "required": true
        },
        {
          "columnName": "file_name"
        }
      ]
    }

Checklist
breaking change.
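To make the design in the Description more concrete, here is a minimal, self-contained Java sketch of a base provider with scope handling and the two abstract functions, plus a range-mapping subclass. It is an illustration under stated assumptions, not the code in this PR: the class names `SplitFilterSketch`, `SplitFilter`, `RangeMappingOptions`, and `RangeMappingProvider`, the method signatures, the dotted-prefix scope matching, and the regex-based rewrite of `>=` and `<=` are all hypothetical. JSON deserialization of "customOptions" is left out to keep the sketch dependency-free.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: every name and signature here is hypothetical,
// not the connector's actual code.
class SplitFilterSketch {
    /** Marker for provider-specific options carried in "customOptions". */
    interface CustomSplitFilterOptions {}

    /** One filter entry: "columnName", "customOptions", "required". */
    static final class SplitFilter {
        final String columnName;
        final CustomSplitFilterOptions customOptions;
        final boolean required;

        SplitFilter(String columnName, CustomSplitFilterOptions customOptions, boolean required) {
            this.columnName = columnName;
            this.customOptions = customOptions;
            this.required = required;
        }
    }

    /** Base class: owns scope handling, leaves provider-specific parts abstract. */
    abstract static class SplitFilterProvider {
        private final Map<String, List<SplitFilter>> filtersByScope;

        SplitFilterProvider(Map<String, List<SplitFilter>> filtersByScope) {
            this.filtersByScope = filtersByScope;
        }

        /** Collects filters whose scope equals the table scope or is a dotted prefix of it. */
        List<SplitFilter> filtersFor(String tableScope) {
            List<SplitFilter> result = new ArrayList<>();
            for (Map.Entry<String, List<SplitFilter>> e : filtersByScope.entrySet()) {
                String scope = e.getKey();
                if (tableScope.equals(scope) || tableScope.startsWith(scope + ".")) {
                    result.addAll(e.getValue());
                }
            }
            return result;
        }

        abstract Class<? extends CustomSplitFilterOptions> getCustomSplitFilterOptionsClass();

        abstract String remapSplitFilterPushDown(String tableScope, String pushDownExpression);
    }

    /** Assumed shape of the "rangeMapping" options from the config example. */
    static final class RangeMappingOptions implements CustomSplitFilterOptions {
        final String lowerBound;
        final String upperBound;

        RangeMappingOptions(String lowerBound, String upperBound) {
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    /** Rewrites comparisons on a mapped column into comparisons on its bound columns. */
    static final class RangeMappingProvider extends SplitFilterProvider {
        RangeMappingProvider(Map<String, List<SplitFilter>> filtersByScope) {
            super(filtersByScope);
        }

        @Override
        Class<? extends CustomSplitFilterOptions> getCustomSplitFilterOptionsClass() {
            return RangeMappingOptions.class;
        }

        @Override
        String remapSplitFilterPushDown(String tableScope, String pushDownExpression) {
            String result = pushDownExpression;
            for (SplitFilter filter : filtersFor(tableScope)) {
                if (!(filter.customOptions instanceof RangeMappingOptions)) {
                    continue;
                }
                RangeMappingOptions options = (RangeMappingOptions) filter.customOptions;
                String column = Pattern.quote("\"" + filter.columnName + "\"");
                // A split can match "col >= x" only if its upper bound is >= x,
                // and "col <= x" only if its lower bound is <= x.
                result = result.replaceAll(column + "\\s*>=",
                        Matcher.quoteReplacement(options.upperBound + " >="));
                result = result.replaceAll(column + "\\s*<=",
                        Matcher.quoteReplacement(options.lowerBound + " <="));
            }
            return result;
        }
    }

    /** Builds a provider mirroring the "clp.default.table_1" config entry. */
    static SplitFilterProvider exampleProvider() {
        SplitFilter timestamp = new SplitFilter("msg.timestamp",
                new RangeMappingOptions("begin_timestamp", "end_timestamp"), true);
        return new RangeMappingProvider(Collections.singletonMap(
                "clp.default.table_1", Collections.singletonList(timestamp)));
    }

    public static void main(String[] args) {
        String expr = "\"msg.timestamp\" >= 100 AND \"msg.timestamp\" <= 200";
        // prints: end_timestamp >= 100 AND begin_timestamp <= 200
        System.out.println(exampleProvider().remapSplitFilterPushDown("clp.default.table_1", expr));
    }
}
```

Keeping scope lookup in the base class means a provider for a different metadata database would only need to supply its own options class and its own rewrite, which is the extension point this PR describes.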
Validation performed
With `timestamp` as a required metadata filter:

- `SELECT 1 = 1 FROM default LIMIT 1`: throws an exception because the required filter is missing; even though the query reads no columns, it still scans the table.
- `SELECT query_id + 1 FROM default LIMIT 1`: throws an exception because the required filter is missing.
- `SELECT * FROM default WHERE ps = 'startup' LIMIT 1`: throws an exception because the required filter is missing.
- `SELECT COUNT(*) FROM default WHERE timestamp > FROM_UNIXTIME(0) AND timestamp < FROM_UNIXTIME(9999999999.1234);`: returns correctly.

Summary by CodeRabbit