implement datasets sdk by 0xthc · Pull Request #14 · basalt-ai/basalt-python

0xthc · 2025-06-27T11:43:41Z

Summary by CodeRabbit

New Features
- Introduced comprehensive dataset management capabilities, including listing datasets, retrieving detailed dataset information, and adding new rows to datasets.
- Added a dedicated Dataset SDK, integrated into the main interface for streamlined access to dataset operations.
- Provided example usage in a new Jupyter notebook to guide users through typical dataset workflows.
Tests
- Added a complete test suite to ensure reliability of all dataset-related features.

coderabbitai · 2025-06-27T11:44:01Z

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces comprehensive dataset management capabilities to the Basalt SDK. It adds a new DatasetSDK for listing, retrieving, and modifying datasets, integrates dataset support into the main SDK facade, defines DTOs and protocols for datasets, implements endpoint handlers, and provides both usage examples and unit tests for the new functionality.

Changes

File(s)	Change Summary
`basalt/basalt_facade.py`, `basalt/basaltsdk.py`	Integrated DatasetSDK into BasaltFacade and BasaltSDK, updated constructors, and added new properties for dataset access.
`basalt/sdk/datasetsdk.py`	Added DatasetSDK class with methods for listing, retrieving, and adding dataset rows.
`basalt/utils/dtos.py`, `basalt/utils/protocols.py`	Introduced DTOs and protocol interfaces for dataset operations and results.
`basalt/endpoints/create_dataset_item.py`, `basalt/endpoints/get_dataset.py`, `basalt/endpoints/list_datasets.py`	Added endpoint handlers for creating dataset items, fetching datasets, and listing datasets.
`basalt/objects/dataset.py`, `basalt/ressources/datasets/dataset_types.py`	Added dataclasses for structured dataset and row representations, including serialization and deserialization methods.
`basalt/ressources/datasets/__init__.py`	Added a docstring-only module for datasets resource.
`examples/dataset_sdk_demo.ipynb`	Introduced a Jupyter notebook demonstrating dataset SDK usage, including listing, retrieving, adding rows, and error handling.
`tests/test_datasetsdk.py`	Added unit tests for DatasetSDK covering list, get, create, and error handling scenarios.
`basalt/_version.py`	Updated version string from "0.2.1" to "0.2.2".

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BasaltFacade
    participant BasaltSDK
    participant DatasetSDK
    participant API

    User->>BasaltFacade: Access datasets property
    BasaltFacade->>BasaltSDK: Return datasets property
    BasaltSDK->>DatasetSDK: Return self
    User->>DatasetSDK: list() / get(slug) / addRow(slug, ...)
    DatasetSDK->>API: Invoke corresponding endpoint
    API-->>DatasetSDK: Return response
    DatasetSDK-->>User: Return result (DTOs or error)

Poem

In the warren of data, a rabbit now gleams,
With datasets to list and fetch, fulfilling dreams.
Rows are added with care, columns in tow,
Facades and SDKs in a seamless flow.
With tests and a notebook, the journey’s complete—
Hop on, dear coder, for data so sweet!
🐇📊

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d867c9 and cd47fbc.

📒 Files selected for processing (1)

basalt/_version.py (1 hunks)

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (15)

basalt/utils/protocols.py (3)
1-2: Remove unused import and fix formatting.

The List import from typing is not used in this file and should be removed. Also, the imports could be better organized.
-from typing import Any, Optional, Protocol, Hashable, Tuple, TypeVar, Dict, Mapping, Literal, List
+from typing import Any, Optional, Protocol, Hashable, Tuple, TypeVar, Dict, Mapping, Literal
 from .dtos import GetResult, DescribeResult, ListResult, ListDatasetsResult, GetDatasetResult, CreateDatasetItemResult
34-39: Fix formatting issues in the IDatasetSDK protocol.

The protocol definition has formatting issues that should be addressed for consistency.
+
 class IDatasetSDK(Protocol):
     def list(self) -> ListDatasetsResult: ...
     def get(self, slug: str) -> GetDatasetResult: ...
-    def add_row(self, slug: str, values: Dict[str, str], name: Optional[str] = None, 
-                  ideal_output: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateDatasetItemResult: ...
+    def add_row(self, slug: str, values: Dict[str, str], name: Optional[str] = None,
+                ideal_output: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateDatasetItemResult: ...
40-46: Fix formatting and approve the IBasaltSDK extension.

The protocol extension correctly adds the datasets property, but needs a blank line before the class definition.
+
 class IBasaltSDK(Protocol):
     @property
     def prompt(self) -> IPromptSDK: ...
     @property
     def monitor(self) -> IMonitorSDK: ...
     @property
     def datasets(self) -> IDatasetSDK: ...
basalt/endpoints/list_datasets.py (1)
9-29: LGTM! Well-structured response dataclass with minor formatting fix needed.

The response dataclass is well-designed with proper deserialization logic. Just needs a formatting fix for the blank line before the class definition.
+
 @dataclass
 class ListDatasetsEndpointResponse:
tests/test_datasetsdk.py (1)
3-3: Remove unused import.

The parameterized import is not used in the test file.
-from parameterized import parameterized
basalt/endpoints/get_dataset.py (1)
8-9: Add missing blank line for PEP 8 compliance.

Static analysis correctly identifies missing blank lines before class definition.
 from ..utils.dtos import DatasetDTO, GetDatasetDTO
+
 @dataclass
basalt/sdk/datasetsdk.py (2)
4-4: Remove unused imports.

Static analysis correctly identifies unused imports that should be cleaned up.
-from typing import Dict, List, Optional, Tuple, Any
+from typing import Dict, Optional, Any
6-10: Remove unused DatasetRowDTO import.

The DatasetRowDTO import is not used in this file since the methods return the DTO directly from API responses.
from ..utils.dtos import (
    ListDatasetsDTO, GetDatasetDTO, CreateDatasetItemDTO,
    ListDatasetsResult, GetDatasetResult, CreateDatasetItemResult,
-    DatasetDTO, DatasetRowDTO
+    DatasetDTO
)
examples/dataset_example.py (3)
16-16: Remove unused imports.

The Dict, List, and Optional imports from typing are not used in this file.
-from typing import Dict, List, Optional
28-28: Fix PEP 8 formatting: add blank lines before function definitions.

Python PEP 8 requires 2 blank lines before top-level function definitions.
+

 def list_datasets():
Apply similar changes before each function definition (lines 45, 70, 107, 137).

Also applies to: 45-45, 70-70, 107-107, 137-137

156-158: Fix PEP 8 formatting: add blank line after main block.

Python PEP 8 requires 2 blank lines after class or function definitions.
 if __name__ == "__main__":
     main()
+
basalt/endpoints/create_dataset_item.py (1)
9-9: Fix PEP 8 formatting: add blank line before class definition.

Python PEP 8 requires 2 blank lines before top-level class definitions.
+

@dataclass
class CreateDatasetItemEndpointResponse:
basalt/utils/dtos.py (3)
133-133: Fix PEP 8 formatting: reduce excessive blank lines.

Too many blank lines before the datasets section.
-



+
 ListResult = Tuple[Optional[Exception], Optional[List[PromptListResponse]]]
136-136: Fix PEP 8 formatting: add blank line before section.

Python PEP 8 requires 2 blank lines before top-level class definitions.
+

 # ------------------------------ Datasets ----------------------------- #
142-142: Improve consistency in default values for collection fields.

The rows field defaults to None but gets an empty list in from_dict, while metadata defaults to None but gets an empty dict in from_dict. Consider consistent defaults.

For better consistency and to avoid None checks:
-    rows: List[Dict[str, Any]] = None
+    rows: List[Dict[str, Any]] = field(default_factory=list)
-    metadata: Dict[str, Any] = None
+    metadata: Dict[str, Any] = field(default_factory=dict)
Don't forget to import field from dataclasses if using this approach.

Also applies to: 160-160

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 871833c and e12146a.

📒 Files selected for processing (13)

basalt/basalt_facade.py (3 hunks)
basalt/basaltsdk.py (3 hunks)
basalt/endpoints/create_dataset_item.py (1 hunks)
basalt/endpoints/get_dataset.py (1 hunks)
basalt/endpoints/list_datasets.py (1 hunks)
basalt/objects/dataset.py (1 hunks)
basalt/ressources/datasets/__init__.py (1 hunks)
basalt/ressources/datasets/dataset_types.py (1 hunks)
basalt/sdk/datasetsdk.py (1 hunks)
basalt/utils/dtos.py (1 hunks)
basalt/utils/protocols.py (2 hunks)
examples/dataset_example.py (1 hunks)
tests/test_datasetsdk.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (9)

basalt/basaltsdk.py (2)

basalt/utils/protocols.py (4)

IPromptSDK (29-32)

IBasaltSDK (40-46)

IDatasetSDK (34-38)

datasets (46-46)

basalt/basalt_facade.py (1)

datasets (63-67)

basalt/utils/protocols.py (4)

basalt/sdk/datasetsdk.py (3)

get (50-70)

add_row (72-110)

add_row (135-179)

basalt/basaltsdk.py (3)

prompt (16-18)

monitor (21-23)

datasets (26-28)

basalt/basalt_facade.py (3)

prompt (49-53)

monitor (56-60)

datasets (63-67)

basalt/ressources/monitor/monitorsdk_types.py (1)

IMonitorSDK (9-176)

tests/test_datasetsdk.py (6)

basalt/sdk/datasetsdk.py (5)

DatasetSDK (18-179)

get (50-70)

add_row (72-110)

add_row (135-179)

get_dataset_object (112-133)

basalt/utils/logger.py (1)

Logger (3-26)

basalt/utils/dtos.py (2)

DatasetDTO (137-151)

DatasetRowDTO (155-169)

basalt/endpoints/list_datasets.py (2)

ListDatasetsEndpoint (32-64)

ListDatasetsEndpointResponse (10-29)

basalt/endpoints/get_dataset.py (2)

GetDatasetEndpoint (37-72)

GetDatasetEndpointResponse (10-34)

basalt/endpoints/create_dataset_item.py (2)

CreateDatasetItemEndpoint (39-87)

CreateDatasetItemEndpointResponse (10-36)

basalt/objects/dataset.py (5)

basalt/ressources/datasets/dataset_types.py (4)

DatasetRow (18-43)

from_dict (28-43)

from_dict (57-68)

Dataset (47-68)

basalt/endpoints/get_dataset.py (1)

from_dict (18-34)

basalt/endpoints/create_dataset_item.py (1)

from_dict (19-36)

basalt/utils/dtos.py (7)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/sdk/datasetsdk.py (1)

get (50-70)

examples/dataset_example.py (3)

basalt/basaltsdk.py (1)

datasets (26-28)

basalt/basalt_facade.py (1)

datasets (63-67)

basalt/sdk/datasetsdk.py (4)

get (50-70)

add_row (72-110)

add_row (135-179)

get_dataset_object (112-133)

basalt/sdk/datasetsdk.py (9)

basalt/utils/dtos.py (12)

ListDatasetsDTO (173-175)

GetDatasetDTO (179-181)

CreateDatasetItemDTO (185-191)

DatasetDTO (137-151)

DatasetRowDTO (155-169)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/utils/protocols.py (10)

IApi (17-18)

ILogger (48-51)

invoke (18-18)

datasets (46-46)

get (10-10)

get (30-30)

get (36-36)

error (51-51)

add_row (37-38)

warn (49-49)

examples/dataset_example.py (3)

list_datasets (28-43)

get_dataset (45-68)

create_dataset_item (70-105)

basalt/endpoints/list_datasets.py (2)

ListDatasetsEndpoint (32-64)

from_dict (17-29)

basalt/endpoints/get_dataset.py (2)

GetDatasetEndpoint (37-72)

from_dict (18-34)

basalt/endpoints/create_dataset_item.py (2)

CreateDatasetItemEndpoint (39-87)

from_dict (19-36)

basalt/objects/dataset.py (4)

Dataset (54-90)

DatasetRow (10-50)

from_dict (35-50)

from_dict (73-90)

basalt/basaltsdk.py (1)

datasets (26-28)

basalt/basalt_facade.py (1)

datasets (63-67)

basalt/endpoints/get_dataset.py (3)

basalt/utils/dtos.py (9)

DatasetDTO (137-151)

GetDatasetDTO (179-181)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/endpoints/create_dataset_item.py (3)

from_dict (19-36)

prepare_request (44-71)

decode_response (74-87)

basalt/endpoints/list_datasets.py (3)

from_dict (17-29)

prepare_request (37-48)

decode_response (51-64)

basalt/endpoints/create_dataset_item.py (4)

basalt/utils/dtos.py (9)

DatasetRowDTO (155-169)

CreateDatasetItemDTO (185-191)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/endpoints/get_dataset.py (2)

from_dict (18-34)

decode_response (59-72)

basalt/endpoints/list_datasets.py (2)

from_dict (17-29)

decode_response (51-64)

basalt/objects/dataset.py (2)

from_dict (35-50)

from_dict (73-90)

basalt/utils/dtos.py (6)

basalt/endpoints/get_dataset.py (1)

from_dict (18-34)

basalt/endpoints/create_dataset_item.py (1)

from_dict (19-36)

basalt/endpoints/list_datasets.py (1)

from_dict (17-29)

basalt/objects/dataset.py (2)

from_dict (35-50)

from_dict (73-90)

basalt/ressources/datasets/dataset_types.py (2)

from_dict (28-43)

from_dict (57-68)

basalt/sdk/datasetsdk.py (1)

get (50-70)

🪛 Ruff (0.11.9)

basalt/utils/protocols.py

1-1: typing.List imported but unused

Remove unused import: typing.List

(F401)

tests/test_datasetsdk.py

3-3: parameterized.parameterized imported but unused

Remove unused import: parameterized.parameterized

(F401)

examples/dataset_example.py

16-16: typing.Dict imported but unused

Remove unused import

(F401)

16-16: typing.List imported but unused

Remove unused import

(F401)

16-16: typing.Optional imported but unused

Remove unused import

(F401)

basalt/sdk/datasetsdk.py

4-4: typing.List imported but unused

Remove unused import

(F401)

4-4: typing.Tuple imported but unused

Remove unused import

(F401)

9-9: ..utils.dtos.DatasetRowDTO imported but unused

Remove unused import: ..utils.dtos.DatasetRowDTO

(F401)

135-135: Redefinition of unused add_row from line 72

(F811)

🪛 Flake8 (7.2.0)

basalt/utils/protocols.py

[error] 1-1: 'typing.List' imported but unused

(F401)

[error] 34-34: expected 2 blank lines, found 1

(E302)

[error] 38-38: continuation line over-indented for visual indent

(E127)

[error] 40-40: expected 2 blank lines, found 1

(E302)

basalt/endpoints/list_datasets.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

tests/test_datasetsdk.py

[error] 3-3: 'parameterized.parameterized' imported but unused

(F401)

examples/dataset_example.py

[error] 16-16: 'typing.Dict' imported but unused

(F401)

[error] 16-16: 'typing.List' imported but unused

(F401)

[error] 16-16: 'typing.Optional' imported but unused

(F401)

[error] 28-28: expected 2 blank lines, found 1

(E302)

[error] 45-45: expected 2 blank lines, found 1

(E302)

[error] 70-70: expected 2 blank lines, found 1

(E302)

[error] 107-107: expected 2 blank lines, found 1

(E302)

[error] 137-137: expected 2 blank lines, found 1

(E302)

[error] 156-156: expected 2 blank lines after class or function definition, found 1

(E305)

basalt/sdk/datasetsdk.py

[error] 4-4: 'typing.List' imported but unused

(F401)

[error] 4-4: 'typing.Tuple' imported but unused

(F401)

[error] 6-6: '..utils.dtos.DatasetRowDTO' imported but unused

(F401)

[error] 26-26: continuation line with same indent as next logical line

(E125)

[error] 135-135: redefinition of unused 'add_row' from line 72

(F811)

basalt/endpoints/get_dataset.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

basalt/endpoints/create_dataset_item.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

basalt/utils/dtos.py

[error] 133-133: too many blank lines (3)

(E303)

[error] 136-136: expected 2 blank lines, found 1

(E302)

🪛 Pylint (3.3.7)

basalt/utils/protocols.py

[refactor] 37-37: Too many arguments (6/5)

(R0913)

[refactor] 37-37: Too many positional arguments (6/5)

(R0917)

basalt/sdk/datasetsdk.py

[refactor] 72-72: Too many arguments (6/5)

(R0913)

[refactor] 72-72: Too many positional arguments (6/5)

(R0917)

[error] 135-135: method already defined line 72

(E0102)

[refactor] 135-135: Too many arguments (6/5)

(R0913)

[refactor] 135-135: Too many positional arguments (6/5)

(R0917)

🪛 GitHub Actions: Python SDK Tests

tests/test_datasetsdk.py

[error] 119-119: TypeError: DatasetSDK.add_row() got an unexpected keyword argument 'slug' in test_create_dataset_item

🔇 Additional comments (22)

basalt/ressources/datasets/__init__.py (1)

1-3: LGTM! Clean module initialization.

The docstring appropriately describes the module's purpose and follows Python conventions for package initialization.

basalt/basaltsdk.py (3)

1-1: LGTM! Proper import addition.

The IDatasetSDK import is correctly added to support the new dataset functionality.

10-13: LGTM! Consistent constructor pattern.

The constructor follows the established pattern of accepting SDK dependencies and storing them as private attributes.

25-28: LGTM! Consistent property implementation.

The datasets property follows the same pattern as the existing prompt and monitor properties, maintaining interface consistency.

basalt/basalt_facade.py (3)

2-2: LGTM! Proper imports for dataset integration.

The imports correctly add the necessary DatasetSDK class and IDatasetSDK protocol interface.

Also applies to: 5-5

44-46: LGTM! Consistent SDK instantiation and integration.

The DatasetSDK instantiation follows the same pattern as MonitorSDK, using appropriate dependencies, and is correctly passed to the BasaltSDK constructor.

62-67: LGTM! Consistent property exposure.

The datasets property follows the established facade pattern, providing consistent access to the DatasetSDK instance.

basalt/endpoints/list_datasets.py (1)

32-64: LGTM! Solid endpoint implementation with proper error handling.

The endpoint class follows good patterns with clear request preparation and robust response decoding that includes proper exception handling.
basalt/objects/dataset.py (3)

9-50: Well-implemented DatasetRow class with proper serialization.

The DatasetRow implementation correctly handles serialization between Python naming conventions (ideal_output) and API conventions (idealOutput). The to_dict() and from_dict() methods provide clean conversion for API interactions.

53-90: Dataset class implementation looks good.

The Dataset class properly handles nested DatasetRow objects and provides clean serialization methods. The from_dict() method correctly converts raw row data into DatasetRow instances.

1-91: ```shell
#!/bin/bash
echo "=== Locating any dataset_types.py files ==="
find . -type f -name 'dataset_types.py' || echo "No dataset_types.py found"

echo
echo "=== Searching repository for references to dataset_types ==="
rg -n "dataset_types" || echo "No references to dataset_types"

echo
echo "=== Searching for DatasetRowValue definition ==="
rg -n "class DatasetRowValue" || echo "No DatasetRowValue class found"
</details>
<details>
<summary>tests/test_datasetsdk.py (1)</summary>

`63-175`: **Well-structured test suite with comprehensive coverage.**

The test cases cover all major SDK operations including listing, retrieval, creation, and error handling. The mock setup is properly structured and assertions are thorough.

</details>
<details>
<summary>basalt/endpoints/get_dataset.py (1)</summary>

`10-72`: **Solid endpoint implementation following established patterns.**

The endpoint correctly handles both success and error responses, follows the same structure as other endpoints in the codebase, and properly integrates with the DTO system.

</details>
<details>
<summary>basalt/sdk/datasetsdk.py (2)</summary>

`72-110`: **First add_row method implementation is correct.**

This method properly handles adding rows by dataset slug and follows the established error handling patterns.

---

`112-133`: **get_dataset_object method is well implemented.**

The method properly converts DTOs to domain objects and handles error cases appropriately with logging.

</details>
<details>
<summary>basalt/ressources/datasets/dataset_types.py (4)</summary>

`8-15`: **DatasetRowValue class is well-defined.**

Simple, clear dataclass for representing labeled values in dataset rows.

---

`17-43`: **DatasetRow implementation handles flexible input formats.**

The from_dict method intelligently handles both list and dictionary formats for values, providing good flexibility for API responses.

---

`46-68`: **Dataset class properly manages rows collection.**

Clean implementation with proper handling of nested DatasetRow objects.

---

`1-69`: ```shell
#!/bin/bash
set -e

echo "=== 1. References to dataset_types.py ==="
rg -n "dataset_types" || echo "No references to dataset_types.py found"

echo -e "\n=== 2. Usage of DatasetRowValue outside its own file ==="
rg -n "DatasetRowValue" | grep -v "dataset_types.py" || echo "No external references to DatasetRowValue"

echo -e "\n=== 3. Instantiations of DatasetRow outside definitions ==="
rg -n "DatasetRow\(" | grep -Ev "dataset_types.py|objects/dataset.py" || echo "No external instantiations of DatasetRow"

echo -e "\n=== 4. Instantiations of Dataset outside definitions ==="
rg -n "Dataset\(" | grep -v "objects/dataset.py" || echo "No external instantiations of Dataset"
examples/dataset_example.py (1)

28-155: LGTM! Comprehensive dataset SDK example.

The example effectively demonstrates all key dataset operations:

Listing datasets

Getting dataset details

Creating dataset items

Working with dataset objects

The error handling is appropriate and the code follows established SDK patterns.

basalt/endpoints/create_dataset_item.py (1)

39-87: LGTM! Endpoint implementation follows established patterns.

The endpoint class correctly:

Handles optional fields in request preparation

Follows the same decode_response pattern as other endpoints

Uses appropriate error handling and response structure

basalt/utils/dtos.py (1)

135-197: LGTM! Dataset DTOs follow established patterns.

The new dataset DTOs are well-structured and consistent with existing patterns in the file:

Proper dataclass definitions with type annotations

Appropriate from_dict class methods for deserialization

Result type aliases follow the established tuple pattern

DTOs cover all necessary dataset operations

coderabbitai · 2025-06-27T11:51:40Z

+        err, row, warning = self.dataset_sdk.add_row(
+            slug="test-dataset",
+            values=values,
+            name="New Row",
+            ideal_output="New ideal output",
+            metadata={"source": "test"}
+        )


⚠️ Potential issue

Fix method signature for add_row call.

The pipeline failure indicates that DatasetSDK.add_row() doesn't accept a slug keyword argument. Based on the SDK implementation, the method signature issue needs to be resolved.

Looking at the SDK implementation, there are two add_row methods with different signatures. This test appears to be calling the wrong overload. The fix depends on the intended SDK design.

🧰 Tools

🪛 GitHub Actions: Python SDK Tests

[error] 119-119: TypeError: DatasetSDK.add_row() got an unexpected keyword argument 'slug' in test_create_dataset_item

🤖 Prompt for AI Agents

In tests/test_datasetsdk.py around lines 119 to 125, the call to add_row incorrectly uses a slug keyword argument which is not accepted by DatasetSDK.add_row(). Review the SDK implementation to determine the correct method signature and remove or replace the slug argument accordingly. Adjust the call to match the intended overload by passing only the supported parameters as per the SDK design.

coderabbitai · 2025-06-27T11:51:41Z

+    datasetRow: DatasetRowDTO
+    warning: Optional[str] = None
+    error: Optional[str] = None


⚠️ Potential issue

Fix type consistency in error handling.

The datasetRow field is typed as DatasetRowDTO (non-optional) but gets set to None when there's an error. This creates a type inconsistency.

- datasetRow: DatasetRowDTO + datasetRow: Optional[DatasetRowDTO]

Also applies to: 29-30

🤖 Prompt for AI Agents

In basalt/endpoints/create_dataset_item.py around lines 14 to 16 and also lines 29 to 30, the datasetRow field is typed as non-optional DatasetRowDTO but is assigned None in error cases, causing type inconsistency. Change the type annotation of datasetRow to Optional[DatasetRowDTO] to allow None values, ensuring type consistency when errors occur.

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (3)

tests/test_datasetsdk.py (1)
119-125: Verify method signature consistency.

Based on past review comments, there was a pipeline failure indicating that DatasetSDK.add_row() doesn't accept a slug keyword argument. However, the current test code matches the protocol definition in protocols.py. Please ensure the actual implementation matches the protocol.
#!/bin/bash
# Description: Verify the DatasetSDK.addRow method signature matches the protocol
# Expected: Method should accept slug as first parameter

ast-grep --pattern $'class DatasetSDK {
  $$$
  def addRow($_, $$$) {
    $$$
  }
  $$$
}'
basalt/endpoints/create_dataset_item.py (2)
14-16: Fix type consistency in error handling.

The datasetRow field is typed as DatasetRowDTO (non-optional) but gets set to None when there's an error. This creates a type inconsistency.
-    datasetRow: DatasetRowDTO
+    datasetRow: Optional[DatasetRowDTO]
29-30: Type inconsistency continues in from_dict method.

This line confirms the type issue where None is assigned to a field typed as non-optional.

The fix in the dataclass definition will resolve this assignment as well.

🧹 Nitpick comments (9)

basalt/utils/protocols.py (2)
1-1: Remove unused import.

The List import is not used in this file and should be removed to keep imports clean.
-from typing import Any, Optional, Protocol, Hashable, Tuple, TypeVar, Dict, Mapping, Literal, List
+from typing import Any, Optional, Protocol, Hashable, Tuple, TypeVar, Dict, Mapping, Literal
40-46: Fix formatting issue.

Missing blank lines before class definition.
+
+
 class IBasaltSDK(Protocol):
     @property
     def prompt(self) -> IPromptSDK: ...
     @property
     def monitor(self) -> IMonitorSDK: ...
     @property
     def datasets(self) -> IDatasetSDK: ...
basalt/endpoints/list_datasets.py (1)
9-9: Fix formatting issue.

Missing blank lines before class definition.
+
+
 @dataclass
 class ListDatasetsEndpointResponse:
tests/test_datasetsdk.py (1)
3-3: Remove unused import.

The parameterized import is not used in this test file and should be removed.
-from parameterized import parameterized
basalt/endpoints/get_dataset.py (1)
9-9: Fix formatting issue.

Missing blank lines before class definition.
+
+
 @dataclass
 class GetDatasetEndpointResponse:
basalt/sdk/datasetsdk.py (2)
4-4: Clean up unused imports.

Several imports are not used in this file and should be removed to keep the code clean.
-from typing import Dict, List, Optional, Tuple, Any
+from typing import Dict, Optional, Any
9-9: Remove unused DatasetRowDTO import.

The DatasetRowDTO import is not used in this file.
from ..utils.dtos import (
    ListDatasetsDTO, GetDatasetDTO, CreateDatasetItemDTO,
    ListDatasetsResult, GetDatasetResult, CreateDatasetItemResult,
-    DatasetDTO, DatasetRowDTO
+    DatasetDTO
)
examples/dataset_example.py (1)
16-16: Remove unused imports.

Several type imports are not used in this file.
-from typing import Dict, List, Optional
basalt/utils/dtos.py (1)
133-134: Fix formatting issues.

Remove excessive blank lines to follow PEP8 guidelines.
-

-
ListResult = Tuple[Optional[Exception], Optional[List[PromptListResponse]]]
+ListResult = Tuple[Optional[Exception], Optional[List[PromptListResponse]]]

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e12146a and e0d6c16.

📒 Files selected for processing (14)

basalt/basalt_facade.py (3 hunks)
basalt/basaltsdk.py (3 hunks)
basalt/endpoints/create_dataset_item.py (1 hunks)
basalt/endpoints/get_dataset.py (1 hunks)
basalt/endpoints/list_datasets.py (1 hunks)
basalt/objects/dataset.py (1 hunks)
basalt/ressources/datasets/__init__.py (1 hunks)
basalt/ressources/datasets/dataset_types.py (1 hunks)
basalt/sdk/datasetsdk.py (1 hunks)
basalt/utils/dtos.py (1 hunks)
basalt/utils/protocols.py (2 hunks)
examples/dataset_example.py (1 hunks)
examples/dataset_sdk_demo.ipynb (1 hunks)
tests/test_datasetsdk.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

basalt/ressources/datasets/init.py
basalt/basaltsdk.py
basalt/basalt_facade.py
basalt/objects/dataset.py
basalt/ressources/datasets/dataset_types.py

🧰 Additional context used

🧬 Code Graph Analysis (3)

basalt/endpoints/get_dataset.py (3)

basalt/utils/dtos.py (9)

DatasetDTO (137-151)

GetDatasetDTO (179-181)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/utils/protocols.py (2)

prepare_request (14-14)

decode_response (15-15)

basalt/endpoints/list_datasets.py (3)

from_dict (17-29)

prepare_request (37-48)

decode_response (51-64)

basalt/endpoints/list_datasets.py (3)

basalt/utils/dtos.py (9)

DatasetDTO (137-151)

ListDatasetsDTO (173-175)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/utils/protocols.py (3)

datasets (46-46)

prepare_request (14-14)

decode_response (15-15)

basalt/endpoints/get_dataset.py (3)

from_dict (18-34)

prepare_request (42-56)

decode_response (59-72)

examples/dataset_example.py (3)

basalt/basaltsdk.py (1)

datasets (26-28)

basalt/basalt_facade.py (1)

datasets (63-67)

basalt/sdk/datasetsdk.py (4)

get (50-70)

add_row_to_dataset (135-179)

addRow (72-110)

get_dataset_object (112-133)

🪛 Flake8 (7.2.0)

basalt/endpoints/create_dataset_item.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

basalt/endpoints/get_dataset.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

basalt/endpoints/list_datasets.py

[error] 9-9: expected 2 blank lines, found 1

(E302)

basalt/sdk/datasetsdk.py

[error] 4-4: 'typing.List' imported but unused

(F401)

[error] 4-4: 'typing.Tuple' imported but unused

(F401)

[error] 6-6: '..utils.dtos.DatasetRowDTO' imported but unused

(F401)

[error] 26-26: continuation line with same indent as next logical line

(E125)

basalt/utils/dtos.py

[error] 133-133: too many blank lines (3)

(E303)

[error] 136-136: expected 2 blank lines, found 1

(E302)

basalt/utils/protocols.py

[error] 1-1: 'typing.List' imported but unused

(F401)

[error] 34-34: expected 2 blank lines, found 1

(E302)

[error] 38-38: continuation line over-indented for visual indent

(E127)

[error] 40-40: expected 2 blank lines, found 1

(E302)

examples/dataset_example.py

[error] 16-16: 'typing.Dict' imported but unused

(F401)

[error] 16-16: 'typing.List' imported but unused

(F401)

[error] 16-16: 'typing.Optional' imported but unused

(F401)

[error] 28-28: expected 2 blank lines, found 1

(E302)

[error] 45-45: expected 2 blank lines, found 1

(E302)

[error] 70-70: expected 2 blank lines, found 1

(E302)

[error] 107-107: expected 2 blank lines, found 1

(E302)

[error] 137-137: expected 2 blank lines, found 1

(E302)

[error] 151-151: undefined name 'create_dataset_item'

(F821)

[error] 156-156: expected 2 blank lines after class or function definition, found 1

(E305)

tests/test_datasetsdk.py

[error] 3-3: 'parameterized.parameterized' imported but unused

(F401)

🪛 Ruff (0.11.9)

basalt/sdk/datasetsdk.py

4-4: typing.List imported but unused

Remove unused import

(F401)

4-4: typing.Tuple imported but unused

Remove unused import

(F401)

9-9: ..utils.dtos.DatasetRowDTO imported but unused

Remove unused import: ..utils.dtos.DatasetRowDTO

(F401)

basalt/utils/protocols.py

1-1: typing.List imported but unused

Remove unused import: typing.List

(F401)

examples/dataset_example.py

16-16: typing.Dict imported but unused

Remove unused import

(F401)

16-16: typing.List imported but unused

Remove unused import

(F401)

16-16: typing.Optional imported but unused

Remove unused import

(F401)

151-151: Undefined name create_dataset_item

(F821)

tests/test_datasetsdk.py

3-3: parameterized.parameterized imported but unused

Remove unused import: parameterized.parameterized

(F401)

🪛 Pylint (3.3.7)

basalt/sdk/datasetsdk.py

[refactor] 72-72: Too many arguments (6/5)

(R0913)

[refactor] 72-72: Too many positional arguments (6/5)

(R0917)

[refactor] 135-135: Too many arguments (6/5)

(R0913)

[refactor] 135-135: Too many positional arguments (6/5)

(R0917)

basalt/utils/protocols.py

[refactor] 37-37: Too many arguments (6/5)

(R0913)

[refactor] 37-37: Too many positional arguments (6/5)

(R0917)

examples/dataset_example.py

[error] 151-151: Undefined variable 'create_dataset_item'

(E0602)

🔇 Additional comments (6)

basalt/endpoints/list_datasets.py (1)

37-48: Consider if the unused dto parameter is intentional.

The prepare_request method accepts a ListDatasetsDTO parameter but doesn't use it. This is consistent with a list endpoint that doesn't require parameters, but please verify this is the intended design.

If the parameter is not needed, consider updating the protocol or documenting why it's unused.

tests/test_datasetsdk.py (1)

63-175: Excellent test coverage!

The test suite provides comprehensive coverage of all dataset SDK functionality including:

Listing datasets with proper mock setup

Getting specific datasets with validation

Creating dataset items with all parameters

Error handling scenarios

Object conversion methods

The test structure is well-organized and follows good testing practices.

basalt/endpoints/get_dataset.py (1)

37-72: Well-implemented endpoint following established patterns.

The endpoint implementation correctly follows the same pattern as other endpoints with proper request preparation and response decoding. The error handling approach is consistent and appropriate.

examples/dataset_sdk_demo.ipynb (1)

1-234: Excellent comprehensive example demonstrating dataset SDK usage.

This Jupyter notebook provides outstanding educational value with:

Progressive learning structure: Each section builds on the previous one

Complete operation coverage: Lists, retrieval, creation, and error handling

Robust error handling: Shows both basic and advanced error handling patterns

Realistic examples: Uses practical data and scenarios

Clear documentation: Well-commented code with explanatory markdown

The notebook effectively demonstrates best practices for:

SDK initialization and configuration

Safe error handling with try-catch blocks

Dynamic value generation with timestamps

Comprehensive parameter usage (values, names, metadata, ideal outputs)

Edge case testing (invalid slugs, incomplete data)

This serves as an excellent reference for users learning to integrate the dataset SDK.

basalt/sdk/datasetsdk.py (1)

18-179: Well-structured DatasetSDK implementation.

The DatasetSDK class provides a clean interface for dataset operations with proper error handling and method separation. The previous method redefinition issue has been properly resolved with distinct methods serving different purposes.

basalt/utils/dtos.py (1)

135-197: Well-structured dataset DTOs.

The dataset-related DTOs are well-designed with proper type annotations, default values, and from_dict class methods. They provide a clean interface for dataset operations.

coderabbitai · 2025-06-27T12:28:36Z

+class IDatasetSDK(Protocol):
+    def list(self) -> ListDatasetsResult: ...
+    def get(self, slug: str) -> GetDatasetResult: ...
+    def addRow(self, slug: str, values: Dict[str, str], name: Optional[str] = None, 
+                  ideal_output: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateDatasetItemResult: ...


🛠️ Refactor suggestion

Fix formatting issues and consider simplifying method signature.

The code has formatting issues and the addRow method has too many parameters, which can make it difficult to use and maintain.

+ + class IDatasetSDK(Protocol): def list(self) -> ListDatasetsResult: ... def get(self, slug: str) -> GetDatasetResult: ... - def addRow(self, slug: str, values: Dict[str, str], name: Optional[str] = None, - ideal_output: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateDatasetItemResult: ... + def addRow(self, slug: str, values: Dict[str, str], name: Optional[str] = None, + ideal_output: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> CreateDatasetItemResult: ...

Consider creating a dedicated DTO for the addRow parameters to reduce the method signature complexity:

def addRow(self, slug: str, row_data: CreateDatasetRowDTO) -> CreateDatasetItemResult: ...

🧰 Tools

🪛 Flake8 (7.2.0)

[error] 34-34: expected 2 blank lines, found 1

(E302)

[error] 38-38: continuation line over-indented for visual indent

(E127)

🪛 Pylint (3.3.7)

[refactor] 37-37: Too many arguments (6/5)

(R0913)

[refactor] 37-37: Too many positional arguments (6/5)

(R0917)

🤖 Prompt for AI Agents

In basalt/utils/protocols.py around lines 34 to 38, fix the formatting issues by properly aligning the method parameters and line breaks. Simplify the addRow method signature by creating a dedicated data transfer object (DTO) to encapsulate the parameters currently passed individually (values, name, ideal_output, metadata). Then update the addRow method to accept this DTO as a single argument along with slug, reducing complexity and improving maintainability.

coderabbitai · 2025-06-27T12:28:36Z

+        return cls(
+            datasets=[DatasetDTO.from_dict(dataset) for dataset in data["datasets"]],
+        )


🛠️ Refactor suggestion

Add error handling for missing datasets key.

The method assumes the "datasets" key exists in the response data, which could cause a KeyError if the API response format changes or is malformed.

return cls( - datasets=[DatasetDTO.from_dict(dataset) for dataset in data["datasets"]], + datasets=[DatasetDTO.from_dict(dataset) for dataset in data.get("datasets", [])], )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

return cls(

datasets=[DatasetDTO.from_dict(dataset) for dataset in data["datasets"]],

)

return cls(

datasets=[DatasetDTO.from_dict(dataset) for dataset in data.get("datasets", [])],

)

🤖 Prompt for AI Agents

In basalt/endpoints/list_datasets.py around lines 27 to 29, the code assumes the "datasets" key is always present in the data dictionary, which can cause a KeyError if it is missing. Add error handling by checking if the "datasets" key exists in the data before accessing it. If the key is missing, handle the situation gracefully by either returning an empty list or raising a clear exception with an informative message.

coderabbitai · 2025-06-27T12:28:36Z

+    dataset: DatasetDTO
+    error: Optional[str] = None


⚠️ Potential issue

Fix type inconsistency in dataclass field.

The dataset field is typed as DatasetDTO but can be set to None in the error case, causing a type mismatch.

- dataset: DatasetDTO + dataset: Optional[DatasetDTO] error: Optional[str] = None

This ensures type consistency throughout the error handling logic.

Also applies to: 28-29

🤖 Prompt for AI Agents

In basalt/endpoints/get_dataset.py at lines 14-15 and 28-29, the dataset field is typed as DatasetDTO but can be None in error cases, causing a type inconsistency. Update the type annotation of the dataset field to Optional[DatasetDTO] to allow None values and ensure type consistency in error handling.

coderabbitai · 2025-06-27T12:28:36Z

+            return None
+
+        if warning:
+            self._logger.warn(f"Warning when adding row to dataset {dataset.slug}: {warning}")


🛠️ Refactor suggestion

Use correct logger method name.

The logger method should be warning instead of warn for consistency with Python's logging standards.

- self._logger.warn(f"Warning when adding row to dataset {dataset.slug}: {warning}") + self._logger.warning(f"Warning when adding row to dataset {dataset.slug}: {warning}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

self._logger.warn(f"Warning when adding row to dataset {dataset.slug}: {warning}")

self._logger.warning(f"Warning when adding row to dataset {dataset.slug}: {warning}")

🤖 Prompt for AI Agents

In basalt/sdk/datasetsdk.py at line 169, replace the logger method call from self._logger.warn to self._logger.warning to align with Python's standard logging method names.

coderabbitai · 2025-06-27T12:28:36Z

+    get_dataset(dataset_slug)
+
+    # Create a new item in the dataset
+    create_dataset_item(dataset_slug)


⚠️ Potential issue

Fix undefined function call.

The function create_dataset_item is not defined. It should be add_row_to_dataset based on the function definition at line 70.

- create_dataset_item(dataset_slug) + add_row_to_dataset(dataset_slug)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

create_dataset_item(dataset_slug)

- create_dataset_item(dataset_slug)

+ add_row_to_dataset(dataset_slug)

🧰 Tools

🪛 Ruff (0.11.9)

151-151: Undefined name create_dataset_item

(F821)

🪛 Flake8 (7.2.0)

[error] 151-151: undefined name 'create_dataset_item'

(F821)

🪛 Pylint (3.3.7)

[error] 151-151: Undefined variable 'create_dataset_item'

(E0602)

🤖 Prompt for AI Agents

In examples/dataset_example.py at line 151, the function call to create_dataset_item is undefined. Replace this call with add_row_to_dataset, which is the correctly defined function at line 70, to fix the undefined function error.

coderabbitai · 2025-06-27T12:28:36Z

+    row = basalt.datasets.add_row(
+        dataset=dataset,
+        values=values,
+        name="Object Example Row",
+        ideal_output="Object example ideal output",
+        metadata={"created_by": "dataset_object_example"}
+    )


⚠️ Potential issue

Fix incorrect method name.

The method call should be add_row_to_dataset instead of add_row based on the DatasetSDK implementation.

- row = basalt.datasets.add_row( + row = basalt.datasets.add_row_to_dataset( dataset=dataset, values=values, name="Object Example Row", ideal_output="Object example ideal output", metadata={"created_by": "dataset_object_example"} )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

row = basalt.datasets.add_row(

dataset=dataset,

values=values,

name="Object Example Row",

ideal_output="Object example ideal output",

metadata={"created_by": "dataset_object_example"}

)

row = basalt.datasets.add_row_to_dataset(

dataset=dataset,

values=values,

name="Object Example Row",

ideal_output="Object example ideal output",

metadata={"created_by": "dataset_object_example"}

)

🤖 Prompt for AI Agents

In examples/dataset_example.py around lines 121 to 127, the method called on basalt.datasets is incorrectly named add_row. Rename this method call to add_row_to_dataset to match the DatasetSDK implementation and ensure the code functions correctly.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

tests/test_datasetsdk.py (1)

119-125: Method call appears correct based on current SDK implementation.

The test correctly calls addRow with the slug parameter, which matches the current SDK implementation signature. The past review comment about this issue appears to be outdated.

🧹 Nitpick comments (5)

tests/test_datasetsdk.py (1)
3-3: Remove unused import.

The parameterized import is not used anywhere in the test file.
-from parameterized import parameterized
basalt/sdk/datasetsdk.py (4)
4-4: Remove unused type imports.

Several type imports are not used in the current implementation.
-from typing import Dict, List, Optional, Tuple, Any
+from typing import Dict, Optional, Any
9-9: Remove unused DTO import.

DatasetRowDTO is imported but not used in this file.
-    DatasetDTO, DatasetRowDTO
+    DatasetDTO
15-15: Remove unused domain object imports.

The Dataset and DatasetRow classes are imported but not used in the current implementation.
-from ..objects.dataset import Dataset, DatasetRow
26-26: Fix indentation for consistency.

The continuation line should be indented differently from the next logical line.
         logger: ILogger
-        ):
+    ):

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0d6c16 and 568e8d1.

📒 Files selected for processing (3)

basalt/sdk/datasetsdk.py (1 hunks)
examples/dataset_sdk_demo.ipynb (1 hunks)
tests/test_datasetsdk.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

examples/dataset_sdk_demo.ipynb

🧰 Additional context used

🪛 Ruff (0.11.9)

tests/test_datasetsdk.py

3-3: parameterized.parameterized imported but unused

Remove unused import: parameterized.parameterized

(F401)

basalt/sdk/datasetsdk.py

4-4: typing.List imported but unused

Remove unused import

(F401)

4-4: typing.Tuple imported but unused

Remove unused import

(F401)

9-9: ..utils.dtos.DatasetRowDTO imported but unused

Remove unused import: ..utils.dtos.DatasetRowDTO

(F401)

15-15: ..objects.dataset.Dataset imported but unused

Remove unused import

(F401)

15-15: ..objects.dataset.DatasetRow imported but unused

Remove unused import

(F401)

🪛 Flake8 (7.2.0)

tests/test_datasetsdk.py

[error] 3-3: 'parameterized.parameterized' imported but unused

(F401)

basalt/sdk/datasetsdk.py

[error] 4-4: 'typing.List' imported but unused

(F401)

[error] 4-4: 'typing.Tuple' imported but unused

(F401)

[error] 6-6: '..utils.dtos.DatasetRowDTO' imported but unused

(F401)

[error] 15-15: '..objects.dataset.Dataset' imported but unused

(F401)

[error] 15-15: '..objects.dataset.DatasetRow' imported but unused

(F401)

[error] 26-26: continuation line with same indent as next logical line

(E125)

🪛 Pylint (3.3.7)

basalt/sdk/datasetsdk.py

[refactor] 72-72: Too many arguments (6/5)

(R0913)

[refactor] 72-72: Too many positional arguments (6/5)

(R0917)

🔇 Additional comments (6)

tests/test_datasetsdk.py (3)

70-88: LGTM! Well-structured test for list functionality.

The test properly mocks the API response, verifies the correct endpoint usage, and validates the returned data structure.

89-111: LGTM! Comprehensive test for get functionality.

The test validates both the API interaction and DTO construction, ensuring proper data flow through the SDK.

145-157: LGTM! Good error handling test coverage.

The test properly verifies that API errors are propagated correctly through the SDK.

basalt/sdk/datasetsdk.py (3)

30-48: LGTM! Clean implementation of list functionality.

The method properly handles API errors and transforms the response data into the expected format.

50-70: LGTM! Robust get method with proper error handling.

The implementation correctly handles both API errors and application-level errors from the response.

72-110: LGTM! Well-implemented addRow method.

The method signature and implementation look correct, properly handling all the optional parameters and error cases. The past review comment about method redefinition appears to have been resolved.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (6)

basalt/objects/dataset.py (2)
28-33: Remove commented-out dead code.

The commented code should be removed as it serves no purpose and clutters the implementation.
-        # if self.name:
-        #     result["name"] = self.name
-            
-        # if self.ideal_output:
-        #     result["idealOutput"] = self.ideal_output
-            
49-50: Optimize .get() calls by removing redundant None defaults.

The static analysis correctly identifies that .get("key", None) can be simplified to .get("key") since None is the default.
-            name=data.get("name", None),
-            ideal_output=data.get("idealOutput", None),
+            name=data.get("name"),
+            idealOutput=data.get("idealOutput"),
basalt/ressources/datasets/dataset_types.py (2)
40-41: Optimize .get() calls by removing redundant None defaults.

Static analysis correctly identifies redundant None defaults.
-            name=data.get("name", None),
-            idealOutput=data.get("idealOutput", None),
+            name=data.get("name"),
+            idealOutput=data.get("idealOutput"),
8-15: Consider adding validation to DatasetRowValue.

The DatasetRowValue class accepts any strings but might benefit from basic validation.
 @dataclass
 class DatasetRowValue:
     """
     A value in a dataset row
     """
     label: str
     value: str
+    
+    def __post_init__(self):
+        if not self.label:
+            raise ValueError("DatasetRowValue label cannot be empty")
basalt/utils/dtos.py (2)
166-167: Optimize .get() calls by removing redundant None defaults.

Consistent with the static analysis feedback across all files.
-            name=data.get("name", None),
-            idealOutput=data.get("idealOutput", None),
+            name=data.get("name"),
+            idealOutput=data.get("idealOutput"),
133-136: Fix formatting issues with blank lines.

The static analysis correctly identifies formatting issues.
-

-

-# ------------------------------ Datasets ----------------------------- #
+
+# ------------------------------ Datasets ----------------------------- #

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 568e8d1 and 3d867c9.

📒 Files selected for processing (4)

basalt/objects/dataset.py (1 hunks)
basalt/ressources/datasets/dataset_types.py (1 hunks)
basalt/utils/dtos.py (1 hunks)
examples/dataset_sdk_demo.ipynb (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

examples/dataset_sdk_demo.ipynb

🧰 Additional context used

🧬 Code Graph Analysis (1)

basalt/objects/dataset.py (4)

basalt/ressources/datasets/dataset_types.py (4)

DatasetRow (18-43)

from_dict (28-43)

from_dict (57-68)

Dataset (47-68)

basalt/endpoints/get_dataset.py (1)

from_dict (18-34)

basalt/utils/dtos.py (7)

from_dict (23-33)

from_dict (43-49)

from_dict (59-65)

from_dict (87-96)

from_dict (117-125)

from_dict (145-151)

from_dict (163-169)

basalt/sdk/datasetsdk.py (1)

get (50-70)

🪛 Ruff (0.11.9)

basalt/objects/dataset.py

49-49: Use data.get("name") instead of data.get("name", None)

Replace data.get("name", None) with data.get("name")

(SIM910)

50-50: Use data.get("idealOutput") instead of data.get("idealOutput", None)

Replace data.get("idealOutput", None) with data.get("idealOutput")

(SIM910)

basalt/ressources/datasets/dataset_types.py

40-40: Use data.get("name") instead of data.get("name", None)

Replace data.get("name", None) with data.get("name")

(SIM910)

41-41: Use data.get("idealOutput") instead of data.get("idealOutput", None)

Replace data.get("idealOutput", None) with data.get("idealOutput")

(SIM910)

basalt/utils/dtos.py

166-166: Use data.get("name") instead of data.get("name", None)

Replace data.get("name", None) with data.get("name")

(SIM910)

167-167: Use data.get("idealOutput") instead of data.get("idealOutput", None)

Replace data.get("idealOutput", None) with data.get("idealOutput")

(SIM910)

🪛 Flake8 (7.2.0)

basalt/objects/dataset.py

[error] 24-24: indentation contains mixed spaces and tabs

(E101)

[error] 24-24: continuation line unaligned for hanging indent

(E131)

[error] 25-25: indentation contains mixed spaces and tabs

(E101)

basalt/utils/dtos.py

[error] 133-133: too many blank lines (3)

(E303)

[error] 136-136: expected 2 blank lines, found 1

(E302)

🔇 Additional comments (2)

basalt/objects/dataset.py (1)

14-14: Ignore structural inconsistency warning: data models serve different purposes
The SDK’s internal representation (basalt/objects/dataset.py with values: Dict[str, str]) is intentionally a simple key–value map, whereas the resource layer (basalt/ressources/datasets/dataset_types.py) uses List[DatasetRowValue] to model incoming/outgoing API payloads. No changes are required.

Likely an incorrect or invalid review comment.

basalt/utils/dtos.py (1)

135-197: Dataset DTO integration verified across endpoints, SDK, and tests.

All new DTO classes are referenced consistently in:

basalt/endpoints/list_datasets.py

basalt/endpoints/get_dataset.py

basalt/endpoints/create_dataset_item.py

basalt/sdk/datasetsdk.py

tests/test_datasetsdk.py

No misalignments with API contracts or usage patterns were found.

coderabbitai · 2025-06-30T09:57:12Z

+    """
+    values: Dict[str, str] 
+    name: Optional[str] = None
+    ideal_output: Optional[str] = None


🛠️ Refactor suggestion

Inconsistent naming convention with other dataset classes.

This class uses ideal_output (snake_case) while similar classes in the codebase use idealOutput (camelCase). For consistency with the DTOs and API contracts, consider using camelCase.

- ideal_output: Optional[str] = None + idealOutput: Optional[str] = None

And update the corresponding references:

- "idealOutput": self.ideal_output + "idealOutput": self.idealOutput

- ideal_output=data.get("idealOutput", None), + idealOutput=data.get("idealOutput", None),

Also applies to: 24-25, 50-50

🤖 Prompt for AI Agents

In basalt/objects/dataset.py at lines 16, 24-25, and 50, the variable name ideal_output uses snake_case, which is inconsistent with the camelCase naming convention used in other dataset classes and API contracts. Rename ideal_output to idealOutput in the class definition and update all corresponding references within these lines to maintain consistency with the rest of the codebase.

coderabbitai · 2025-06-30T09:57:12Z

+						"name": self.name,
+						"idealOutput": self.ideal_output


⚠️ Potential issue

Fix indentation issues and mixed tabs/spaces.

Lines 24-25 contain mixed tabs and spaces which will cause syntax errors. The indentation is also incorrect for the dictionary structure.

- "name": self.name, - "idealOutput": self.ideal_output + "name": self.name, + "idealOutput": self.ideal_output

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"name": self.name,

"idealOutput": self.ideal_output

"name": self.name,

"idealOutput": self.ideal_output

🧰 Tools

🪛 Flake8 (7.2.0)

[error] 24-24: indentation contains mixed spaces and tabs

(E101)

[error] 24-24: continuation line unaligned for hanging indent

(E131)

[error] 25-25: indentation contains mixed spaces and tabs

(E101)

🤖 Prompt for AI Agents

In basalt/objects/dataset.py around lines 24 to 25, fix the indentation by replacing any tabs with spaces and ensure consistent indentation for the dictionary keys "name" and "idealOutput". Align these keys properly within the dictionary structure to avoid syntax errors caused by mixed tabs and spaces.

coderabbitai · 2025-06-30T09:57:13Z

+                values_list = [DatasetRowValue(**val) if isinstance(val, dict) else DatasetRowValue(label=val["label"], value=val["value"]) 
+                               for val in data["values"]]


🛠️ Refactor suggestion

Simplify and fix the complex value parsing logic.

The current logic has a redundant condition and could fail if val is not a dictionary in the list case.

- values_list = [DatasetRowValue(**val) if isinstance(val, dict) else DatasetRowValue(label=val["label"], value=val["value"]) - for val in data["values"]] + values_list = [ + DatasetRowValue(**val) if isinstance(val, dict) and "label" in val and "value" in val + else DatasetRowValue(label=str(val), value=str(val)) + for val in data["values"] + ]

This handles cases where list items might not be proper dictionaries and provides a fallback conversion.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

values_list = [DatasetRowValue(**val) if isinstance(val, dict) else DatasetRowValue(label=val["label"], value=val["value"])

for val in data["values"]]

values_list = [

DatasetRowValue(**val) if isinstance(val, dict) and "label" in val and "value" in val

else DatasetRowValue(label=str(val), value=str(val))

for val in data["values"]

]

🤖 Prompt for AI Agents

In basalt/ressources/datasets/dataset_types.py around lines 33 to 34, the current list comprehension for creating values_list has a redundant condition and may fail if val is not a dictionary. Simplify the logic by assuming val is always a dictionary and directly unpack it into DatasetRowValue, or add a safe fallback that handles non-dictionary items gracefully by extracting label and value keys if present. This will make the code more robust and easier to maintain.

coderabbitai · 2025-06-30T09:57:13Z

+    values: Dict[str, str]
+    name: Optional[str] = None
+    idealOutput: Optional[str] = None
+    metadata: Dict[str, Any] = None


⚠️ Potential issue

Fix mutable default for metadata field.

Same issue as the rows field - use field(default_factory=dict) for mutable defaults.

- metadata: Dict[str, Any] = None + metadata: Dict[str, Any] = field(default_factory=dict)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

metadata: Dict[str, Any] = None

metadata: Dict[str, Any] = field(default_factory=dict)

🤖 Prompt for AI Agents

In basalt/utils/dtos.py at line 160, the metadata field is assigned a mutable default value None, which should be replaced with a safe mutable default. Change the metadata field definition to use field(default_factory=dict) to avoid shared mutable defaults, similar to the rows field fix.

coderabbitai · 2025-06-30T09:57:13Z

+    slug: str
+    name: str
+    columns: List[str]
+    rows: List['DatasetRowDTO'] = None


⚠️ Potential issue

Fix mutable default for rows field.

Using None as a default for a mutable field in a dataclass is problematic. Use field(default_factory=list) instead.

- rows: List['DatasetRowDTO'] = None + rows: List['DatasetRowDTO'] = field(default_factory=list)

You'll need to import field from dataclasses if it's not already imported.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

rows: List['DatasetRowDTO'] = None

rows: List['DatasetRowDTO'] = field(default_factory=list)

🤖 Prompt for AI Agents

In basalt/utils/dtos.py at line 142, the rows field in the dataclass uses None as a default for a mutable list, which can cause shared mutable state issues. Replace the default None with field(default_factory=list) to ensure each instance gets its own list. Also, import field from dataclasses if it is not already imported.

0xthc force-pushed the implement-datasets-sdk branch from 47ef118 to e12146a Compare June 27, 2025 11:46

coderabbitai Bot reviewed Jun 27, 2025

View reviewed changes

implement datasets sdk

31bda63

0xthc force-pushed the implement-datasets-sdk branch from e12146a to 31bda63 Compare June 27, 2025 12:02

coderabbitai Bot reviewed Jun 27, 2025

View reviewed changes

rewrite example as notebook jupyter

6abcf23

0xthc force-pushed the implement-datasets-sdk branch from e0d6c16 to 6abcf23 Compare June 27, 2025 12:29

remove unused methods

568e8d1

coderabbitai Bot reviewed Jun 30, 2025

View reviewed changes

fix get dataset dict

3d867c9

coderabbitai Bot reviewed Jun 30, 2025

View reviewed changes

bump version

cd47fbc

0xthc merged commit fe31547 into master Jun 30, 2025
5 of 7 checks passed

coderabbitai Bot mentioned this pull request Jul 1, 2025

add support for async methods #15

Merged

CorentinGS deleted the implement-datasets-sdk branch January 23, 2026 15:15

	self._logger.warn(f"Warning when adding row to dataset {dataset.slug}: {warning}")
	self._logger.warning(f"Warning when adding row to dataset {dataset.slug}: {warning}")

	create_dataset_item(dataset_slug)
	- create_dataset_item(dataset_slug)
	+ add_row_to_dataset(dataset_slug)

		values_list = [DatasetRowValue(**val) if isinstance(val, dict) else DatasetRowValue(label=val["label"], value=val["value"])
		for val in data["values"]]

	metadata: Dict[str, Any] = None
	metadata: Dict[str, Any] = field(default_factory=dict)

	rows: List['DatasetRowDTO'] = None
	rows: List['DatasetRowDTO'] = field(default_factory=list)

Conversation

0xthc commented Jun 27, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

0xthc commented Jun 27, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)