@satvshr
Contributor

@satvshr satvshr commented Jan 9, 2026

Metadata

@codecov-commenter

codecov-commenter commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 38.31522% with 227 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.98%. Comparing base (645ef01) to head (249efec).

Files with missing lines             Patch %   Lines
openml/_api/resources/tasks.py       14.07%    171 Missing ⚠️
openml/_api/http/client.py           46.37%    37 Missing ⚠️
openml/_api/runtime/fallback.py      0.00%     6 Missing ⚠️
openml/_api/runtime/core.py          81.48%    5 Missing ⚠️
openml/tasks/functions.py            50.00%    3 Missing ⚠️
openml/_api/resources/datasets.py    77.77%    2 Missing ⚠️
openml/_api/__init__.py              75.00%    1 Missing ⚠️
openml/_api/config.py                96.87%    1 Missing ⚠️
openml/tasks/task.py                 0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1611      +/-   ##
==========================================
- Coverage   52.78%   44.98%   -7.80%     
==========================================
  Files          36       46      +10     
  Lines        4331     4517     +186     
==========================================
- Hits         2286     2032     -254     
- Misses       2045     2485     +440     
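The headline figures are consistent with each other. A quick arithmetic check in Python, assuming Codecov truncates (rounds down) percentages to two decimals; all inputs are copied from the report:

```python
import math

# Numbers copied from the Codecov report above.
patch_coverage = 0.3831522  # 38.31522% of the patch is covered
per_file_missing = [171, 37, 6, 5, 3, 2, 1, 1, 1]
patch_missing = sum(per_file_missing)  # 227 uncovered patch lines

# If a fraction p of the patch is covered and m lines are missing,
# the patch has m / (1 - p) lines in total.
patch_total = round(patch_missing / (1 - patch_coverage))
patch_hits = patch_total - patch_missing


def pct_down(hits: int, lines: int) -> float:
    """Coverage in percent, truncated to two decimals (assumed rounding mode)."""
    return math.floor(hits / lines * 10_000) / 100


base = pct_down(2286, 4331)  # main
head = pct_down(2032, 4517)  # PR head
print(patch_total, patch_hits, base, head, round(head - base, 2))
# prints: 368 141 52.78 44.98 -7.8
```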

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
25 tasks
@satvshr satvshr marked this pull request as ready for review January 12, 2026 15:29
Contributor

@geetu040 geetu040 left a comment

From a high-level review, I noticed a few points that need adjustment:

  • Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
  • I don't see the api_context being used in tasks/functions, so it's not clear to me how the SDK is actually using the new API interface here.
  • Instead of moving entire methods out of tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
  • API calls should be updated at the specific root functions (for example _get_task_description, OpenMLTask._download_split).
  • For listing tasks, please follow the approach discussed in #1575 comment.

)

print(evals_setups.head(10))
print(evals_setups.head(10))
Contributor

Let's keep these changes out of this PR. If there are ruff errors in the existing code, they should be fixed in a separate PR, which will probably get merged soon.

Contributor Author

I accidentally ran ruff format . on this branch; the ruff PR getting merged resolved these issues automatically, though.

@satvshr satvshr marked this pull request as draft January 14, 2026 20:25
@satvshr satvshr changed the title [ENH] Tasks Migration [ENH] V1 → V2 API Migration - Tasks Jan 15, 2026
Contributor

@geetu040 geetu040 left a comment

I have left some comments; please take a look, and make sure the signatures of all methods in TasksAPI, TasksV1 and TasksV2 stay the same.
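One way to keep those signatures from drifting is a small test that compares each implementation against the abstract base with inspect.signature. This is a sketch: the class bodies and TasksV2Drifted are hypothetical, and the list signature is borrowed from the commented-out list_tasks draft in this PR:

```python
from __future__ import annotations

import inspect
from abc import ABC, abstractmethod
from typing import Any


class TasksAPI(ABC):
    @abstractmethod
    def get(self, task_id: int) -> Any: ...

    @abstractmethod
    def list(
        self, *, offset: int | None = None, size: int | None = None, **filters: Any
    ) -> Any: ...


class TasksV1(TasksAPI):
    def get(self, task_id: int) -> Any:
        return {"id": task_id}

    def list(
        self, *, offset: int | None = None, size: int | None = None, **filters: Any
    ) -> Any:
        return []


class TasksV2Drifted(TasksV1):
    # Extra parameter: this signature no longer matches the base class.
    def get(self, task_id: int, download_splits: bool = False) -> Any:
        return {"id": task_id}


def mismatched_methods(base: type, impl: type) -> list[str]:
    """Abstract methods of `base` whose signature differs in `impl`."""
    return [
        name
        for name in sorted(base.__abstractmethods__)
        if inspect.signature(getattr(base, name)) != inspect.signature(getattr(impl, name))
    ]


print(mismatched_methods(TasksAPI, TasksV1))         # []
print(mismatched_methods(TasksAPI, TasksV2Drifted))  # ['get']
```

Run as part of the test suite, a check like this fails as soon as one backend adds or renames a parameter.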

def get(self, dataset_id: int) -> OpenMLDataset | tuple[OpenMLDataset, Response]: ...


class TasksAPI(ResourceAPI, ABC):
Contributor

why are the methods commented out?

Contributor Author

I was going to remove them. If I add abstract methods, they have to be for shared functions, right? The only shared function right now is get.

Comment on lines +69 to +77
    # @abstractmethod
    # def list_tasks(
    #     self,
    #     *,
    #     task_type: TaskType | None = None,
    #     offset: int | None = None,
    #     size: int | None = None,
    #     **filters: Any,
    # ):
Contributor

this method should simply be called list

Contributor Author

Replaced the name in TasksV1 (where the function actually exists)



class TasksV1(TasksAPI):
    @openml.utils.thread_safe_if_oslo_installed
Contributor

you can remove this; it's not needed. It's related to caching and should be handled in the client.
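A sketch of what handling this in the client could look like: caching, and the lock that protects it, live in one place, so a resource class such as TasksV1 needs neither a decorator nor cache logic. CachingHTTPClient, fake_fetch and the string responses are hypothetical:

```python
from __future__ import annotations

import threading
from typing import Callable


class CachingHTTPClient:
    """Hypothetical client that owns caching and locking for every resource."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # performs the actual HTTP request
        self._cache: dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, path: str) -> str:
        # One lock guards the cache; all requests pass through it, so
        # fetches are serialized, which keeps the sketch simple.
        with self._lock:
            if path not in self._cache:
                self._cache[path] = self._fetch(path)
            return self._cache[path]


class TasksV1:
    """Resource class: no cache decorator, no lock, just the request."""

    def __init__(self, http: CachingHTTPClient) -> None:
        self._http = http

    def get(self, task_id: int) -> str:
        return self._http.get(f"task/{task_id}")


calls: list[str] = []


def fake_fetch(path: str) -> str:
    calls.append(path)
    return f"<response for {path}>"


tasks = TasksV1(CachingHTTPClient(fake_fetch))
tasks.get(1)
tasks.get(1)
print(calls)  # ['task/1']: the second call was served from the client's cache
```

A real client would probably use per-path locks or a file cache instead of one global lock, but the split of responsibilities stays the same.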

Comment on lines +27 to +80
    def get(
        self,
        task_id: int,
        download_splits: bool = False,  # noqa: FBT002
        **get_dataset_kwargs: Any,
    ) -> OpenMLTask:
        """Download OpenML task for a given task ID.

        Downloads the task representation.
        Use the `download_splits` parameter to control whether the splits are downloaded.
        Moreover, you may pass additional parameters (args or kwargs) that are passed to
        :meth:`openml.datasets.get_dataset`.

        Parameters
        ----------
        task_id : int
            The OpenML task id of the task to download.
        download_splits: bool (default=False)
            Whether to download the splits as well.
        get_dataset_kwargs :
            Args and kwargs can be used to pass optional parameters to
            :meth:`openml.datasets.get_dataset`.

        Returns
        -------
        task: OpenMLTask
        """
        if not isinstance(task_id, int):
            raise TypeError(f"Task id should be integer, is {type(task_id)}")

        task = self._get_task_description(task_id)
        dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
        # List of class labels available in dataset description
        # Including class labels as part of task meta data handles
        # the case where data download was initially disabled
        if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
            task.class_labels = dataset.retrieve_class_labels(task.target_name)
        # Clustering tasks do not have class labels
        # and do not offer download_split
        if download_splits and isinstance(task, OpenMLSupervisedTask):
            task.download_split()

        return task

    def _get_task_description(self, task_id: int) -> OpenMLTask:
        result = self._http.get(f"task/{task_id}", return_response=True)

        if isinstance(result, tuple):
            task, _response = result
        else:
            task = result

        return task
Contributor

you should not copy this entirely from tasks/functions.py; only the specific part which loads the task object should be here, which would probably be:

       response = self._http.get(f"task/{task_id}")
       task = self._create_task_from_xml(response.text)
       return task

Contributor Author

Did you mean to highlight the entire get function or only _get_task_description? Is this:

        dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
        # List of class labels available in dataset description
        # Including class labels as part of task meta data handles
        #   the case where data download was initially disabled
        if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
            task.class_labels = dataset.retrieve_class_labels(task.target_name)
        # Clustering tasks do not have class labels
        # and do not offer download_split
        if download_splits and isinstance(task, OpenMLSupervisedTask):
            task.download_split()

not useful? Why?
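One reading of the reviewer's point, given the "minimal SDK changes" goal stated earlier in the review: the dataset lookup, class labels and split download behave the same for v1 and v2, so they can stay in the SDK's get_task, and only the call that produces the task object moves behind the backend. A sketch with hypothetical names (Task, FakeTasksBackend, fake_class_labels); the task_type strings roughly stand in for the isinstance checks in the quoted code:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for OpenMLTask."""

    task_id: int
    dataset_id: int
    task_type: str  # "classification", "regression" or "clustering"
    target_name: str = "class"
    class_labels: list[str] = field(default_factory=list)
    splits_downloaded: bool = False


class FakeTasksBackend:
    """Version-specific part: only turns a stored response into a Task."""

    def __init__(self, responses: dict[int, dict]) -> None:
        self._responses = responses

    def get(self, task_id: int) -> Task:
        return Task(task_id=task_id, **self._responses[task_id])


def fake_class_labels(dataset_id: int, target_name: str) -> list[str]:
    # Stands in for get_dataset(dataset_id).retrieve_class_labels(target_name).
    return ["no", "yes"]


backend = FakeTasksBackend(
    {
        7: {"dataset_id": 70, "task_type": "classification"},
        8: {"dataset_id": 80, "task_type": "clustering"},
    }
)


def get_task(task_id: int, download_splits: bool = False) -> Task:
    task = backend.get(task_id)  # the only call that differs between v1 and v2
    # Everything below is version-independent and can stay in the SDK.
    if task.task_type == "classification":
        task.class_labels = fake_class_labels(task.dataset_id, task.target_name)
    if download_splits and task.task_type in ("classification", "regression"):
        task.splits_downloaded = True  # stands in for task.download_split()
    return task


print(get_task(7, download_splits=True))
print(get_task(8, download_splits=True))
```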


        return self.__list_tasks(api_call=api_call)

    def __list_tasks(self, api_call: str) -> pd.DataFrame:  # noqa: C901, PLR0912
Contributor

maybe use better helper functions like _create_list_url and _parse_list_response?

Contributor Author

@satvshr satvshr Jan 15, 2026

I am confused about what you're trying to say here. Do you mean I should move the functionality of list (previously list_tasks) into _create_list_url, and rename __list_tasks to _parse_list_response?

Comment on lines +366 to +371
    def get_tasks(
        self,
        task_ids: list[int],
        download_data: bool | None = None,
        download_qualities: bool | None = None,
    ) -> list[OpenMLTask]:
Contributor

keep this method in tasks/functions.py, because we are sticking to the rule "minimal SDK changes for v1/v2 compatibility".

Contributor Author

@satvshr satvshr Jan 15, 2026

Shouldn't I do the same for create_task and delete_task too?

Edit: Just saw your comment below :D

Comment on lines +414 to +422
    def create_task(
        self,
        task_type: TaskType,
        dataset_id: int,
        estimation_procedure_id: int,
        target_name: str | None = None,
        evaluation_measure: str | None = None,
        **kwargs: Any,
    ) -> (
Contributor

this should stay in tasks/functions.py

        bool
            True if the deletion was successful. False otherwise.
        """
        return openml.utils._delete_entity("task", task_id)
Contributor

I'll implement this in the base class, and you can replace this with it later.

Contributor Author

As discussed during today's standup, this makes sense.
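A sketch of such a shared delete on the base class, assuming the v1 scheme of sending a DELETE to "<entity>/<id>", which is what openml.utils._delete_entity("task", task_id) is assumed to wrap. ResourceAPI's constructor, the injected http_delete callable and fake_delete are hypothetical:

```python
from __future__ import annotations

from typing import Callable


class ResourceAPI:
    """Hypothetical base class: subclasses only declare their entity name."""

    resource_name: str = ""

    def __init__(self, http_delete: Callable[[str], bool]) -> None:
        self._http_delete = http_delete  # sends a DELETE request, True on success

    def delete(self, resource_id: int) -> bool:
        # Shared by every resource: "task" and 42 become "task/42".
        return self._http_delete(f"{self.resource_name}/{resource_id}")


class TasksV1(ResourceAPI):
    resource_name = "task"


sent: list[str] = []


def fake_delete(path: str) -> bool:
    sent.append(path)
    return True


print(TasksV1(fake_delete).delete(42), sent)  # True ['task/42']
```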


        return cls(**common_kwargs)

    def list_task_types(self) -> list[dict[str, str | int | None]]:
Contributor

is this used anywhere?

Contributor Author

@satvshr satvshr Jan 15, 2026

Nope. There is an endpoint for it, though; the same goes for get_task_type.

        raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e


def _get_estimation_procedure_list() -> list[dict[str, Any]]:
Contributor

keep this method here, and inside it try to use the method list_estimation_procedures, which is already implemented in evaluations/functions.py

Contributor Author

@satvshr satvshr Jan 15, 2026

list_estimation_procedures returns only the "oml:name" field, whereas _get_estimation_procedure_list requires more fields. They call the same API, and the output of list_estimation_procedures may be a subset of what _get_estimation_procedure_list returns, but that does not mean it can be used inside _get_estimation_procedure_list.
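The subset relationship only works in one direction: a names-only result is a projection of the full records, so names can be derived from the full list, but the full records cannot be rebuilt from names. A sketch of one possible arrangement, with hypothetical function names and made-up records, in which both views come from a single fetch:

```python
from __future__ import annotations

from typing import Any


def _fetch_procedures() -> list[dict[str, Any]]:
    # Stands in for the single API call both functions rely on (made-up data).
    return [
        {"id": 1, "task_type_id": 1, "name": "procedure A", "type": "crossvalidation"},
        {"id": 2, "task_type_id": 2, "name": "procedure B", "type": "holdout"},
    ]


def full_procedure_list() -> list[dict[str, Any]]:
    # Needs every field, so it cannot be built from a names-only result.
    return _fetch_procedures()


def procedure_names() -> list[str]:
    # Names are a projection of the full records, so this direction works.
    return [p["name"] for p in _fetch_procedures()]


print(procedure_names())  # ['procedure A', 'procedure B']
```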

4 participants