[ENH] V1 → V2 API Migration - Tasks #1611
ported functions to APIv1
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

Coverage Diff (main vs. #1611)

| Metric | main | #1611 | +/- |
|---|---|---|---|
| Coverage | 52.78% | 44.98% | -7.80% |
| Files | 36 | 46 | +10 |
| Lines | 4331 | 4517 | +186 |
| Hits | 2286 | 2032 | -254 |
| Misses | 2045 | 2485 | +440 |
geetu040
left a comment
From a high-level review, I noticed a few points that need adjustment:
- Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
- I don't see the `api_context` being used in `tasks/functions`, so it's not clear to me how the SDK is actually using the new API interface here.
- Instead of moving entire methods out of `tasks/functions.py`, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
- API calls should be updated at the specific root functions (for example `_get_task_description`, `OpenMLTask._download_split`).
- For listing tasks, please follow the approach discussed in #1575 comment.

    )

    print(evals_setups.head(10))
    print(evals_setups.head(10))

Let's keep these changes away from this PR. If there are some ruff errors in the existing code, they should be fixed in another PR which will probably get merged soon.
I had accidentally run `ruff format .` on this branch. The ruff PR getting merged resolved these issues automatically, though.
geetu040
left a comment
I have left some comments, please take a look and make sure the signatures of all methods in `TasksAPI`, `TasksV1` and `TasksV2` stay the same.
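One way to keep the signatures aligned is to declare the shared methods once on the abstract base and compare the implementations against it. The sketch below is a minimal illustration, not the PR's code: the classes are simplified stand-ins for `TasksAPI`, `TasksV1` and `TasksV2`, the return type is a plain `dict`, and `signatures_match` is a hypothetical helper.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any


class TasksAPI(ABC):
    """Hypothetical, simplified stand-in for the PR's TasksAPI base class."""

    @abstractmethod
    def get(self, task_id: int, **kwargs: Any) -> dict[str, Any]:
        """Return a task representation for the given id."""


class TasksV1(TasksAPI):
    def get(self, task_id: int, **kwargs: Any) -> dict[str, Any]:
        # A real implementation would call the v1 endpoint here.
        return {"task_id": task_id, "api": "v1"}


class TasksV2(TasksAPI):
    def get(self, task_id: int, **kwargs: Any) -> dict[str, Any]:
        # A real implementation would call the v2 endpoint here.
        return {"task_id": task_id, "api": "v2"}


def signatures_match(base: type, *impls: type) -> bool:
    """Check that each abstract method has the same signature in every implementation."""
    for name in base.__abstractmethods__:
        expected = inspect.signature(getattr(base, name))
        for impl in impls:
            if inspect.signature(getattr(impl, name)) != expected:
                return False
    return True
```

A check like `signatures_match(TasksAPI, TasksV1, TasksV2)` could live in a unit test so that signature drift between the two API versions is caught early.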

        def get(self, dataset_id: int) -> OpenMLDataset | tuple[OpenMLDataset, Response]: ...


    class TasksAPI(ResourceAPI, ABC):

why are the methods commented out?
I was going to remove them. If I add abstract methods, they have to be for shared functions, right? The only shared function right now is `get`.

        # @abstractmethod
        # def list_tasks(
        #     self,
        #     *,
        #     task_type: TaskType | None = None,
        #     offset: int | None = None,
        #     size: int | None = None,
        #     **filters: Any,
        # ):

This method should simply be called `list`.
Replaced the name in TasksV1 (where the function actually exists)

    class TasksV1(TasksAPI):
        @openml.utils.thread_safe_if_oslo_installed

You can remove this; it's not needed. It's related to caching, which should be handled at the client.
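To illustrate what "handled at the client" could mean, here is a rough sketch in which the client owns both the cache and the locking, so the resource class carries no decorator. Everything here is hypothetical: `CachingClient`, its `fetch` callable and `get_description` are illustrative names, and the in-process `threading.Lock` is only a stand-in, not the same mechanism as `thread_safe_if_oslo_installed`.

```python
import threading
from typing import Callable, Dict


class CachingClient:
    """Hypothetical base client that owns caching and locking."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # stand-in for the real HTTP call
        self._cache: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, path: str) -> str:
        # Serialize cache access so concurrent callers do not fetch twice.
        with self._lock:
            if path not in self._cache:
                self._cache[path] = self._fetch(path)
            return self._cache[path]


class TasksV1:
    """Resource class with no cache or thread-safety logic of its own."""

    def __init__(self, http: CachingClient) -> None:
        self._http = http

    def get_description(self, task_id: int) -> str:
        return self._http.get(f"task/{task_id}")
```

With this split, swapping the caching strategy only touches the client, and the v1 and v2 resource classes stay identical in that respect.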

        def get(
            self,
            task_id: int,
            download_splits: bool = False,  # noqa: FBT002
            **get_dataset_kwargs: Any,
        ) -> OpenMLTask:
            """Download OpenML task for a given task ID.

            Downloads the task representation.
            Use the `download_splits` parameter to control whether the splits are downloaded.
            Moreover, you may pass additional parameter (args or kwargs) that are passed to
            :meth:`openml.datasets.get_dataset`.

            Parameters
            ----------
            task_id : int
                The OpenML task id of the task to download.
            download_splits: bool (default=False)
                Whether to download the splits as well.
            get_dataset_kwargs :
                Args and kwargs can be used pass optional parameters to
                :meth:`openml.datasets.get_dataset`.

            Returns
            -------
            task: OpenMLTask
            """
            if not isinstance(task_id, int):
                raise TypeError(f"Task id should be integer, is {type(task_id)}")

            task = self._get_task_description(task_id)
            dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
            # List of class labels available in dataset description
            # Including class labels as part of task meta data handles
            # the case where data download was initially disabled
            if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
                task.class_labels = dataset.retrieve_class_labels(task.target_name)
            # Clustering tasks do not have class labels
            # and do not offer download_split
            if download_splits and isinstance(task, OpenMLSupervisedTask):
                task.download_split()

            return task

        def _get_task_description(self, task_id: int) -> OpenMLTask:
            result = self._http.get(f"task/{task_id}", return_response=True)

            if isinstance(result, tuple):
                task, _response = result
            else:
                task = result

            return task

You should not copy this entirely from `tasks/functions.py`; only the specific part that loads the task object should be here, which would probably be:

    response = self._http.get(f"task/{task_id}")
    task = self._create_task_from_xml(response.text)
    return task

Did you mean to highlight the entire `get` function or only `_get_task_description`? Is this:

    dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
    # List of class labels available in dataset description
    # Including class labels as part of task meta data handles
    # the case where data download was initially disabled
    if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
        task.class_labels = dataset.retrieve_class_labels(task.target_name)
    # Clustering tasks do not have class labels
    # and do not offer download_split
    if download_splits and isinstance(task, OpenMLSupervisedTask):
        task.download_split()

not useful? Why?
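One possible reading of the suggestion, offered as an interpretation rather than the reviewer's confirmed intent, is that the post-processing is still useful but belongs in the SDK-level function, while the resource class only fetches and builds the task. The sketch below uses hypothetical stand-ins throughout: `Task` replaces `OpenMLTask`, `fetch` replaces the HTTP call plus XML parsing, and `labels_for_dataset` replaces the dataset lookup.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical, simplified stand-in for OpenMLTask."""

    task_id: int
    dataset_id: int
    class_labels: Optional[List[str]] = None
    splits_downloaded: bool = False


class TasksV1:
    """Resource class: only fetches and builds the task description."""

    def __init__(self, fetch: Callable[[int], dict]) -> None:
        self._fetch = fetch  # stand-in for the HTTP call plus XML parsing

    def get(self, task_id: int) -> Task:
        raw = self._fetch(task_id)
        return Task(task_id=task_id, dataset_id=raw["dataset_id"])


def get_task(
    api: TasksV1,
    task_id: int,
    download_splits: bool = False,
    labels_for_dataset: Callable[[int], List[str]] = lambda _id: [],
) -> Task:
    """SDK-level function: keeps validation and post-processing."""
    if not isinstance(task_id, int):
        raise TypeError(f"Task id should be integer, is {type(task_id)}")
    task = api.get(task_id)
    # Post-processing stays here, independent of the API version in use.
    task.class_labels = labels_for_dataset(task.dataset_id)
    if download_splits:
        task.splits_downloaded = True  # placeholder for task.download_split()
    return task
```

Under this split, a v2 resource class would only need to replace `TasksV1.get`; the class-label and split handling would be shared unchanged.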

            return self.__list_tasks(api_call=api_call)

        def __list_tasks(self, api_call: str) -> pd.DataFrame:  # noqa: C901, PLR0912

maybe use better helper functions like _create_list_url and _parse_list_response?
I'm confused about what you're trying to say here. Do you mean I should transfer the functionality of `list` (previously `list_tasks`) to `_create_list_url`, and rename `__list_tasks` to `_parse_list_response`?

        def get_tasks(
            self,
            task_ids: list[int],
            download_data: bool | None = None,
            download_qualities: bool | None = None,
        ) -> list[OpenMLTask]:

keep this method in tasks/functions.py, because we are sticking to the rule "minimal sdk changes for v1/v2 compatibility"
Shouldn't I do the same for create_task and delete_task too?
Edit: Just saw your comment below :D

        def create_task(
            self,
            task_type: TaskType,
            dataset_id: int,
            estimation_procedure_id: int,
            target_name: str | None = None,
            evaluation_measure: str | None = None,
            **kwargs: Any,
        ) -> (

This should stay in `tasks/functions.py`.

            bool
                True if the deletion was successful. False otherwise.
            """
            return openml.utils._delete_entity("task", task_id)

I'll implement this in the base class, which you can replace this with later.
As discussed today during the standup, makes sense

            return cls(**common_kwargs)

        def list_task_types(self) -> list[dict[str, str | int | None]]:

is this used anywhere?
Nope, there is an endpoint for it though, same for get_task_type.

            raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e


    def _get_estimation_procedure_list() -> list[dict[str, Any]]:

Keep this method here and, inside it, try to use the `list_estimation_procedures` method already implemented in `evaluations/functions.py`.
`list_estimation_procedures` returns only the "oml:name", whereas `_get_estimation_procedure_list` requires more items. They may call the same API, and `list_estimation_procedures` may be somewhat of a subset of `_get_estimation_procedure_list`, but that does not mean it can be used inside `_get_estimation_procedure_list`.