Skip to content

refactor(phase 14): Model Endpoints Registry and Per-Partition Presets#444

Open
hedhoud wants to merge 20 commits into
refactor/hexagonalfrom
refactor/phase14-B
Open

refactor(phase 14): Model Endpoints Registry and Per-Partition Presets#444
hedhoud wants to merge 20 commits into
refactor/hexagonalfrom
refactor/phase14-B

Conversation

@hedhoud
Copy link
Copy Markdown
Collaborator

@hedhoud hedhoud commented Jun 4, 2026

Context: Phase 14H defines the admin API contract for model endpoints, presets, and partition preset assignment while the service, repository, migration, and runtime-resolution work lands separately. This PR stays intentionally API-focused so reviewers can validate the route contracts before the data layer is complete.

Supersedes #442 and keeps the work on the upstream linagora/openrag branch.

Commit structure:

14H.0 - Model endpoint API
Adds the model endpoint request/response contract, admin router, and DI provider entry point needed by that router. This gives operators the future API surface for registering endpoints, validating them, and choosing defaults without tying the router to a concrete storage implementation yet.

14H.1 - Preset API schemas
Adds the preset request/response contract for indexation and retrieval presets. This keeps validation rules close to the API boundary before the preset service starts persisting and resolving those configs.

14H.2 - Preset admin router
Adds the preset management routes and focused router tests. This makes the preset API shape reviewable now, while real behavior remains delegated to the future service layer.

14H.3 - Partition preset schemas
Extends the admin schema contract so partitions can carry embedder, indexation preset, retrieval preset, chat depth, and chat LLM references. This also keeps the explicit-null validation behavior field-specific.

14H.4 - Partition config routes
Adds partition config update/read routes and tests. Missing service methods return explicit not-implemented responses until Person A's partition resolution work is merged.

14H.5 - Router registration
Registers the Phase 14 admin routers in the FastAPI application, adds OpenAPI tags, and verifies the stable mount points.

Validation:

Local validation passed after the six-commit rewrite: ruff, layer import guard, and the full unit suite. GitHub checks are expected to re-run after the force-with-lease push.

Remaining Phase 14 dependencies:

The database migration, repositories, endpoint service, preset service, partition resolution, default seeding, and full E2E behavior are still outside this PR and depend on the Person A/shared Phase 14 work.

Summary by CodeRabbit

  • New Features

    • Admin UI/API: manage model endpoints (create/list/get/update/delete/validate/set-default), pipeline presets (CRUD + options), and partition configuration (get/update).
    • OpenAPI: added "Model Endpoints" and "Presets" tags for clearer API docs.
    • Configs: new per-partition indexation/retrieval and global models/presets/partition settings.
  • Chores

    • Database migration adding tables for endpoints and presets; storage client timeout set.
  • Tests

    • New unit tests for admin routers, schemas, partition routes, DI providers, configs, and mount assertions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Phase 14 admin APIs for model endpoint registry, pipeline presets, and partition config: new Pydantic schemas, admin-only routers delegating to DI services, optional DI getters with 503 fallback, main app mounts (/model-endpoints, /presets), updated ports/persistence stubs, migration/schema updates, and unit tests.

Changes

Phase 14 Admin APIs: Model Endpoints, Presets, and Partition Configuration

Layer / File(s) Summary
Phase 14 Schema Definitions
openrag/api/schemas/admin/model_endpoint_schemas.py, openrag/api/schemas/admin/preset_schemas.py, openrag/api/schemas/admin/partition_schemas.py, openrag/api/schemas/admin/__init__.py
Defines ModelEndpointType and PresetType, plus request/response DTOs and validators for model endpoints, presets, and partitions (normalization, non-empty checks, explicit null rejection, update-payload presence checks).
Admin Router Endpoints
openrag/api/routers/admin/model_endpoints.py, openrag/api/routers/admin/presets.py, openrag/api/routers/admin/partitions.py
Adds admin-protected routers: model endpoints (create, list, get, update, delete, set-default, validate), presets (options, create, list, get, update, delete), and partition config (PATCH update, GET resolved config). Endpoints delegate to DI services and forward only provided update fields via exclude_unset=True.
DI Service Integration
openrag/di/providers.py
Adds _get_optional_service helper and exported getters get_model_endpoint_service and get_preset_service that resolve from the initialized container or raise HTTP 503 when absent.
API App Wiring and OpenAPI Tags
openrag/api/main.py
Mounts model_endpoints_router at /model-endpoints and presets_router at /presets, extends Tags enum with MODEL_ENDPOINTS and PRESETS, and updates docstrings for OpenAPI helpers and the Ray Serve wrapper.
Ports and Persistence Stubs
openrag/core/ports/model_endpoint_repo.py, openrag/services/persistence/model_endpoint_repo.py
Revises model-endpoint repository interface to use ModelEndpointRow types and replaces upsert with explicit create, update, rename, and set_default method signatures; updates persistence stub signatures accordingly.
Core Configs and Domain Models
openrag/core/config/indexation_pipeline.py, openrag/core/config/retrieval_pipeline.py, openrag/core/config/model_endpoints.py, openrag/core/config/presets.py, openrag/core/models/preset.py, openrag/core/config/root.py
Adds Indexation/Retrieval pipeline configs, ModelsConfig/ModelEndpointRow, PresetsConfig, Preset/Partition domain models, and composes them into root Settings with models, presets, and partitions.
Schema/DB Migration and Catalog
openrag/services/persistence/migrations/alembic/versions/06dd2101ea3a_add_endpoints_presets_phase14.py, openrag/services/persistence/schema.py
Adds Alembic migration for model_endpoints and pipeline_presets tables, adds files.indexation_config and partition columns, introduces SQLAlchemy table objects model_endpoints and pipeline_presets, and consolidates metadata usage.
Comprehensive Test Coverage
tests/unit/api/schemas/admin/test_phase14_schemas.py, tests/unit/api/routers/admin/test_phase14_admin_routers.py, tests/unit/api/routers/admin/test_phase14_partition_routes.py, tests/unit/api/routers/test_router_imports.py, tests/unit/api/test_main_proxy_headers.py, tests/unit/di/test_container.py, tests/unit/core/config/*
Adds tests for schema normalization/validation, router behavior (payload normalization, update-field forwarding, validation), partition config 501 fallback, DI optional-provider resolution/503 behavior, static mounting assertions, and frozen-field + mutable-dict invariants for configs. Test helpers and fake services included.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ModelEndpointsRouter
  participant DIGetter as get_model_endpoint_service
  participant ModelEndpointService
  Client->>ModelEndpointsRouter: POST /model-endpoints (CreateModelEndpointRequest)
  ModelEndpointsRouter->>DIGetter: resolve service via DI
  DIGetter->>ModelEndpointService: create_model_endpoint(payload)
  ModelEndpointService-->>ModelEndpointsRouter: ModelEndpointResponse
  ModelEndpointsRouter-->>Client: 201 Created
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • Ahmath-Gadji

Poem

🐰 In admin fields the endpoints bloom,
Presets and partitions find their room,
Schemas trim and routers start to sing,
DI wires hum and tests take wing,
A rabbit cheers: "Phase 14 — let's groom!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 87.22% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The pull request title clearly identifies the main changes: it's about Phase 14 refactoring that adds a Model Endpoints Registry and Per-Partition Presets functionality. This accurately reflects the primary objectives of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/phase14-B

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feat Add a new feature refactor labels Jun 4, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
openrag/api/schemas/admin/preset_schemas.py (1)

52-56: ⚡ Quick win

Use field name from validator context for clearer error messages.

Line 56 passes the hardcoded string "name" to _reject_explicit_null, which works correctly for this validator since it only applies to the name field. However, this pattern is brittle—if someone later adds more fields to the validator decorator, the error message will remain hardcoded as "name cannot be null". Using info.field_name makes the validator more maintainable and consistent with the pattern used elsewhere.

♻️ Proposed refactor to use field name from context
-    `@field_validator`("name")
+    `@field_validator`("name", mode="before")
     `@classmethod`
-    def validate_name(cls, value: str | None) -> str | None:
+    def validate_name(cls, value: str | None, info) -> str | None:
         """Normalize the optional replacement name and reject null."""
-        return _normalize_name(_reject_explicit_null("name", value))
+        return _normalize_name(_reject_explicit_null(info.field_name, value))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/api/schemas/admin/preset_schemas.py` around lines 52 - 56, The
validator validate_name uses a hardcoded "name" when calling
_reject_explicit_null which is brittle; change the validator to accept the
pydantic FieldValidationInfo (e.g., add an info parameter) and pass
info.field_name into _reject_explicit_null so the error message uses the actual
field name (update validate_name signature to include info and call
_normalize_name(_reject_explicit_null(info.field_name, value))). Ensure
imports/types for the info param match the project’s pydantic version and keep
the `@field_validator`("name") decorator as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/api/schemas/admin/partition_schemas.py`:
- Around line 58-62: The validator validate_non_empty_name currently passes the
hardcoded string "name" to _reject_explicit_null, causing incorrect error
messages; update validate_non_empty_name to accept the validator info (e.g.,
change signature to def validate_non_empty_name(cls, value: str | None, info) ->
str | None) and call _reject_explicit_null(info.field_name, value) before
passing the result into _normalize_name so the error message identifies the
actual field (embedder, indexation_preset, or retrieval_preset).

---

Nitpick comments:
In `@openrag/api/schemas/admin/preset_schemas.py`:
- Around line 52-56: The validator validate_name uses a hardcoded "name" when
calling _reject_explicit_null which is brittle; change the validator to accept
the pydantic FieldValidationInfo (e.g., add an info parameter) and pass
info.field_name into _reject_explicit_null so the error message uses the actual
field name (update validate_name signature to include info and call
_normalize_name(_reject_explicit_null(info.field_name, value))). Ensure
imports/types for the info param match the project’s pydantic version and keep
the `@field_validator`("name") decorator as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f7a5e4be-26cd-41c4-a7d4-741aa2584645

📥 Commits

Reviewing files that changed from the base of the PR and between e207d64 and e596cac.

📒 Files selected for processing (15)
  • openrag/api/main.py
  • openrag/api/routers/admin/model_endpoints.py
  • openrag/api/routers/admin/partitions.py
  • openrag/api/routers/admin/presets.py
  • openrag/api/schemas/admin/__init__.py
  • openrag/api/schemas/admin/model_endpoint_schemas.py
  • openrag/api/schemas/admin/partition_schemas.py
  • openrag/api/schemas/admin/preset_schemas.py
  • openrag/di/providers.py
  • tests/unit/api/routers/admin/test_phase14_admin_routers.py
  • tests/unit/api/routers/admin/test_phase14_partition_routes.py
  • tests/unit/api/routers/test_router_imports.py
  • tests/unit/api/schemas/admin/test_phase14_schemas.py
  • tests/unit/api/test_main_proxy_headers.py
  • tests/unit/di/test_container.py

Comment thread openrag/api/schemas/admin/partition_schemas.py Outdated
@hedhoud hedhoud changed the title Add Phase 14 admin API layer refactor(phase 14H): add admin APIs for model endpoints, presets, and partition config Jun 4, 2026
@hedhoud hedhoud force-pushed the refactor/phase14-B branch from e596cac to 81fe53c Compare June 4, 2026 09:33
@hedhoud hedhoud force-pushed the refactor/phase14-B branch from 81fe53c to c8f8b61 Compare June 4, 2026 09:48
Ahmath-Gadji and others added 2 commits June 4, 2026 12:06
- `core/config/model_endpoints.py`: ModelEndpointConfig, ModelsConfig (frozen ConfigMixin with mutable dicts), ModelEndpointRow
- `core/ports/model_endpoint_repo.py`: redesign port with 7 explicit methods (create, get, list_all, update, rename, delete, set_default); drop upsert
- `services/persistence/model_endpoint_repo.py`: update stub to match new port
- `tests/unit/core/config/test_model_endpoints.py`: frozen-field + mutable-dict invariant tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- `core/config/indexation_pipeline.py`: IndexationPipelineConfig (chunking, VLM, contextualization, entity extraction, topic tagging)
- `core/config/retrieval_pipeline.py`: RetrievalPipelineConfig (flat per-partition, named endpoint refs)
- `core/models/preset.py`: PresetRow, PartitionRow, PartitionConfig (runtime-resolved)
- `core/config/presets.py`: PresetsConfig (frozen ConfigMixin with mutable dicts)
- `core/config/root.py`: add presets + partitions fields to Settings
- `tests/unit/core/config/test_pipeline_configs.py`: frozen-field + mutable-dict invariant tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
openrag/core/config/model_endpoints.py (2)

10-10: ⚡ Quick win

Switch this relative import to a root-based absolute import.

Line 10 should follow the same import style used across this PR (core.* from repo root).

Proposed change
-from .base import ConfigMixin
+from core.config.base import ConfigMixin

As per coding guidelines: **/*.py: Use absolute imports from the openrag/ directory as the Python path root, avoiding relative imports across packages.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/model_endpoints.py` at line 10, Replace the relative
import in model_endpoints.py with a root-based absolute import: change the
statement that currently imports ConfigMixin via ".base" to import it from
"core.config.base" so the file uses the same absolute import style as the rest
of the project (look for the import of ConfigMixin in model_endpoints.py and
update it to reference core.config.base.ConfigMixin).

48-48: ⚡ Quick win

Harden ModelEndpointRow validation to match the repository/admin contract.

ModelEndpointRow uses unconstrained model_type: str and default batch_size/timeout without gt=0, unlike ModelEndpointConfig and the admin request schemas (which enforce ModelEndpointType and gt=0). Even though PgModelEndpointRepository is currently stubbed, tightening ModelEndpointRow prevents contract drift when persistence is implemented.

Proposed change
-from typing import Any
+from typing import Any, Literal
@@
 class ModelEndpointRow(BaseModel):
@@
-    model_type: str  # embedder | reranker | llm | vlm
+    model_type: Literal["embedder", "reranker", "llm", "vlm"]
@@
-    batch_size: int = 32
-    timeout: float = 30.0
+    batch_size: int = Field(default=32, gt=0)
+    timeout: float = Field(default=30.0, gt=0)

Also applies to: 51-52

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/model_endpoints.py` at line 48, ModelEndpointRow
currently declares model_type: str and unconstrained batch_size/timeout which
diverges from the admin/repository contract; change ModelEndpointRow to use the
ModelEndpointType enum for model_type (import ModelEndpointType) and tighten
batch_size and timeout with Pydantic Field validators (gt=0) and the same
defaults as ModelEndpointConfig so the row schema matches the admin
request/ModelEndpointConfig contract and prevents future drift.
openrag/core/config/indexation_pipeline.py (1)

12-12: ⚡ Quick win

Use package-root absolute import instead of a relative import.

Line 12 should use a root-based import to match the repository import convention.

Proposed change
-from .chunking import ChunkerConfig
+from core.config.chunking import ChunkerConfig

As per coding guidelines: **/*.py: Use absolute imports from the openrag/ directory as the Python path root, avoiding relative imports across packages.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/indexation_pipeline.py` at line 12, Replace the relative
import in indexation_pipeline by using a package-root absolute import: change
the statement importing ChunkerConfig (currently "from .chunking import
ChunkerConfig") to import from the repository root module path (e.g. "from
openrag.core.config.chunking import ChunkerConfig") so it follows the project's
absolute-import convention and matches other modules.
tests/unit/core/config/test_model_endpoints.py (1)

10-29: ⚡ Quick win

Add contract tests for ModelEndpointRow validation boundaries.

This file currently validates ModelsConfig mutability/frozen behavior only. Please add negative tests for invalid model_type and non-positive batch_size/timeout so the new row contract is regression-protected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/core/config/test_model_endpoints.py` around lines 10 - 29, Add
tests to validate ModelEndpointRow contract boundaries by creating three
negative test functions in the same test module: (1)
test_model_endpoint_row_invalid_model_type: instantiate ModelEndpointRow with an
invalid model_type (e.g., "not-a-model") and assert it raises ValidationError
via pytest.raises; (2) test_model_endpoint_row_non_positive_batch_size: create
ModelEndpointRow with batch_size set to 0 and another with a negative value and
assert each raises ValidationError; (3)
test_model_endpoint_row_non_positive_timeout: create ModelEndpointRow with
timeout 0 and with a negative value and assert each raises ValidationError. Use
the ModelEndpointRow constructor directly and import ValidationError/pytest as
in the file, ensuring each test targets the model_type, batch_size, and timeout
fields respectively.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/core/config/indexation_pipeline.py`:
- Line 21: IndexationPipelineConfig currently types parsing_strategy and
contextualization_mode as plain str which allows any value; change their
annotations to typing.Literal with the supported options (e.g.,
parsing_strategy: Literal["pymupdf", "marker", "docling"] = "marker" and
contextualization_mode: Literal[...supported values...] = <current_default>) and
import Literal from typing (or typing_extensions for older Python) so Pydantic
validation will reject invalid config values; keep the existing default values
and update any tests or callers expecting str if necessary.

In `@openrag/core/config/retrieval_pipeline.py`:
- Around line 18-26: The schema currently allows any string for the retrieval
mode and an unconstrained similarity_threshold; update the model so "type" is a
Literal restricted to "single", "multiQuery", or "hyde" (import typing.Literal
and replace the current type: str) and tighten similarity_threshold to a
validated Field with bounds (ge=0, le=1) so out-of-range values are rejected;
keep existing Field constraints for top_k/top_n and leave reranker/llm as
optional strings.

In `@openrag/core/models/preset.py`:
- Around line 17-18: Add invariant validation for persisted partition/preset
fields so invalid states cannot be loaded from the DB: enforce that preset_type
(in the Preset model) is one of the allowed values ("indexation" or
"retrieval"), that numeric config fields like chat_history_depth are
non-negative, and that dimension (or any vector dimension fields) is a positive
integer; implement these checks in the model validation/constructor for Preset
and the related Partition/Config classes referenced around the same file (lines
near the current preset_type/config definitions and the blocks at the other
noted ranges) so any invalid persisted record raises a clear validation error
instead of leaking into runtime.

---

Nitpick comments:
In `@openrag/core/config/indexation_pipeline.py`:
- Line 12: Replace the relative import in indexation_pipeline by using a
package-root absolute import: change the statement importing ChunkerConfig
(currently "from .chunking import ChunkerConfig") to import from the repository
root module path (e.g. "from openrag.core.config.chunking import ChunkerConfig")
so it follows the project's absolute-import convention and matches other
modules.

In `@openrag/core/config/model_endpoints.py`:
- Line 10: Replace the relative import in model_endpoints.py with a root-based
absolute import: change the statement that currently imports ConfigMixin via
".base" to import it from "core.config.base" so the file uses the same absolute
import style as the rest of the project (look for the import of ConfigMixin in
model_endpoints.py and update it to reference core.config.base.ConfigMixin).
- Line 48: ModelEndpointRow currently declares model_type: str and unconstrained
batch_size/timeout which diverges from the admin/repository contract; change
ModelEndpointRow to use the ModelEndpointType enum for model_type (import
ModelEndpointType) and tighten batch_size and timeout with Pydantic Field
validators (gt=0) and the same defaults as ModelEndpointConfig so the row schema
matches the admin request/ModelEndpointConfig contract and prevents future
drift.

In `@tests/unit/core/config/test_model_endpoints.py`:
- Around line 10-29: Add tests to validate ModelEndpointRow contract boundaries
by creating three negative test functions in the same test module: (1)
test_model_endpoint_row_invalid_model_type: instantiate ModelEndpointRow with an
invalid model_type (e.g., "not-a-model") and assert it raises ValidationError
via pytest.raises; (2) test_model_endpoint_row_non_positive_batch_size: create
ModelEndpointRow with batch_size set to 0 and another with a negative value and
assert each raises ValidationError; (3)
test_model_endpoint_row_non_positive_timeout: create ModelEndpointRow with
timeout 0 and with a negative value and assert each raises ValidationError. Use
the ModelEndpointRow constructor directly and import ValidationError/pytest as
in the file, ensuring each test targets the model_type, batch_size, and timeout
fields respectively.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b346e97-366e-408c-97c8-a03236147482

📥 Commits

Reviewing files that changed from the base of the PR and between c8f8b61 and 7ad7b10.

📒 Files selected for processing (10)
  • openrag/core/config/indexation_pipeline.py
  • openrag/core/config/model_endpoints.py
  • openrag/core/config/presets.py
  • openrag/core/config/retrieval_pipeline.py
  • openrag/core/config/root.py
  • openrag/core/models/preset.py
  • openrag/core/ports/model_endpoint_repo.py
  • openrag/services/persistence/model_endpoint_repo.py
  • tests/unit/core/config/test_model_endpoints.py
  • tests/unit/core/config/test_pipeline_configs.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/core/config/test_pipeline_configs.py

model_config = ConfigDict(extra="ignore")

chunking: ChunkerConfig = Field(default_factory=ChunkerConfig)
parsing_strategy: str = "marker" # "pymupdf" | "marker" | "docling"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

ls
rg -n "class IndexationPipelineConfig|parsing_strategy|contextualization_mode" openrag/core/config/indexation_pipeline.py
nl -ba openrag/core/config/indexation_pipeline.py | sed -n '1,120p'

Repository: linagora/openrag

Length of output: 614


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import pathlib
p = pathlib.Path("openrag/core/config/indexation_pipeline.py")
text = p.read_text(encoding="utf-8").splitlines()
for i in [15,16,17,18,19,20,21,22,23,24,25,29,30,31,32,33,34,35]:
    if 1 <= i <= len(text):
        print(f"{i:4d}: {text[i-1]}")
PY

rg -n "parsing_strategy|contextualization_mode" openrag | head -n 200

Repository: linagora/openrag

Length of output: 1089


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show rest of the config file (bounded)
python3 - <<'PY'
import pathlib
p = pathlib.Path("openrag/core/config/indexation_pipeline.py")
lines = p.read_text(encoding="utf-8").splitlines()
for start in [1,1,40,60,80]:
    end = min(len(lines), start+120)
PY

# Print the whole file if it's small enough; otherwise print key sections
python3 - <<'PY'
import pathlib
p = pathlib.Path("openrag/core/config/indexation_pipeline.py")
lines = p.read_text(encoding="utf-8").splitlines()
print("lines:", len(lines))
PY

# Show sections that mention these fields
rg -n "parsing_strategy|contextualization_mode" openrag/core/config/indexation_pipeline.py
echo "---- context around parsing_strategy ----"
rg -n "parsing_strategy" -C 3 openrag/core/config/indexation_pipeline.py
echo "---- context around contextualization_mode ----"
rg -n "contextualization_mode" -C 3 openrag/core/config/indexation_pipeline.py

echo "---- find usages across openrag ----"
rg -n "parsing_strategy" openrag | head -n 200
rg -n "contextualization_mode" openrag | head -n 200

Repository: linagora/openrag

Length of output: 1215


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "IndexationPipelineConfig" openrag | head -n 200

echo "---- search for parsing strategy keywords ----"
rg -n "\"pymupdf\"|\"marker\"|\"docling\"" openrag | head -n 200
rg -n "\bdocling\b|\bpymupdf\b|\bmarker\b" openrag | head -n 200

echo "---- search for contextualization mode keywords ----"
rg -n "\"none\"|\"simple\"|\"structured\"" openrag | head -n 200
rg -n "\bcontextualization_mode\b|\benable_contextualization\b|\bcontextualization\b" openrag | head -n 200

Repository: linagora/openrag

Length of output: 7459


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "indexation\.parsing_strategy|parsing_strategy\s*=" openrag | head -n 200
rg -n "indexation\.contextualization_mode|contextualization_mode\s*=" openrag | head -n 200
rg -n "\.indexation\." openrag | head -n 200
rg -n "IndexationPipelineConfig\(" -S openrag | head -n 200
rg -n "preset\.indexation|indexation\s*:" openrag/core | head -n 200

Repository: linagora/openrag

Length of output: 42


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find any mention of the config keys anywhere
rg -n "parsing_strategy" . | head -n 200
rg -n "contextualization_mode" . | head -n 200

# Look for dynamic/dict-style access patterns
rg -n "get\(['\"]parsing_strategy['\"]\)|\[['\"]parsing_strategy['\"]\]" . | head -n 200
rg -n "get\(['\"]contextualization_mode['\"]\)|\[['\"]contextualization_mode['\"]\]" . | head -n 200

# Look for any selection/dispatch based on these values by scanning for the value literals
rg -n "\"marker\"|\"pymupdf\"|\"docling\"" openrag | head -n 200
rg -n "\"none\"|\"simple\"|\"structured\"" openrag/core/indexing openrag/core/config openrag/core/prompts openrag/core/chunking 2>/dev/null | head -n 200

Repository: linagora/openrag

Length of output: 731


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Search for direct attribute usage patterns
rg -n "\.parsing_strategy\b|parsing_strategy\s*\]" openrag | head -n 200
rg -n "\.contextualization_mode\b|contextualization_mode\s*\]" openrag | head -n 200

# 2) Search for dict-style access patterns in code
rg -n "(\.get\(|\[\s*['\"])" openrag | rg -n "parsing_strategy|contextualization_mode" | head -n 200

# 3) Search for dispatch based on parser registry / parser names
rg -n "parser_registry|register\(" openrag/core/indexing openrag/services openrag/core | head -n 200
rg -n "select.*parser|get_parser|create_.*parser|parser\s*=\s*|parser_name" openrag/core openrag/services | head -n 200

# 4) Search for the config object usage in the indexing pipeline code
rg -n "IndexationPipelineConfig|preset\.indexation|enable_contextualization" openrag | head -n 200

Repository: linagora/openrag

Length of output: 42


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate runtime entry points for the indexation pipeline config
rg -n "IndexationPipelineConfig|indexation\.|preset\.indexation|enable_contextualization" openrag | head -n 200

# Show preset model definition
python3 - <<'PY'
import pathlib
p = pathlib.Path("openrag/core/models/preset.py")
lines = p.read_text(encoding="utf-8").splitlines()
for i,l in enumerate(lines, start=1):
    if i<=120:
        pass
print("lines:", len(lines))
for i in range(1,120):
    if i<=len(lines):
        print(f"{i:4d}: {lines[i-1]}")
PY

# Show config model again for context
python3 - <<'PY'
import pathlib
p = pathlib.Path("openrag/core/config/indexation_pipeline.py")
lines = p.read_text(encoding="utf-8").splitlines()
print("lines:", len(lines))
for i in range(1,len(lines)+1):
    print(f"{i:4d}: {lines[i-1]}")
PY

# Find where presets are turned into an actual indexing job/pipeline
rg -n "indexation|indexing" openrag/core | head -n 200

Repository: linagora/openrag

Length of output: 7196


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Print key modules that likely consume IndexationPipelineConfig
python3 - <<'PY'
import pathlib
paths = [
  "openrag/core/indexing/dispatcher.py",
  "openrag/core/indexing/validators.py",
  "openrag/core/config/presets.py",
  "openrag/core/config/root.py",
  "openrag/core/config/indexation.py",
]
for path in paths:
    p = pathlib.Path(path)
    if not p.exists():
        print(f"Missing: {path}")
        continue
    print("\n====", path, "====")
    lines = p.read_text(encoding="utf-8").splitlines()
    for i, l in enumerate(lines[:220], start=1):
        print(f"{i:4d}: {l}")
    if len(lines) > 220:
        print(f"... ({len(lines)} lines total; truncated)")
PY

# Broader searches for camelCase / dynamic JSON keys that might be used by workers/clients
rg -n "parsingStrategy|contextualizationMode|contextualization_mode|parsing_strategy" openrag | head -n 200
rg -n "docling|pymupdf|marker" openrag/core/indexing openrag/services/workers | head -n 200

# Look for config keys being forwarded as dict/JSON
rg -n "model_dump\\(|dict\\(|json\\.|config\\s*\\[|get\\(|\\\"parsing_strategy\\\"|\\\"contextualization_mode\\\"" openrag/core openrag/services | head -n 200

Repository: linagora/openrag

Length of output: 30339


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path("tests/unit/core/config/test_pipeline_configs.py")
lines = p.read_text(encoding="utf-8").splitlines()
for i in range(1,80):
    if i<=len(lines):
        print(f"{i:4d}: {lines[i-1]}")
PY

echo "---- literal key usage ----"
rg -n "\"parsing_strategy\"|'parsing_strategy'|\"contextualization_mode\"|'contextualization_mode'" openrag tests docs | head -n 200

echo "---- search for contextualization dispatch ----"
rg -n "contextualize|contextualization" openrag/core/indexing openrag/core/chunking openrag/core/prompts | head -n 200
rg -n "structured|simple|none" openrag/core/indexing/contextualize.py openrag/core | head -n 200

echo "---- search for parser dispatch ----"
rg -n "parser_registry|register\(|Parser|parse\(" openrag/core/indexing/parsers | head -n 200
rg -n "pymupdf|marker|docling" openrag/core/indexing | head -n 200

Repository: linagora/openrag

Length of output: 22956


🏁 Script executed:

#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("openrag/core/config/indexation_pipeline.py")
for i,l in enumerate(p.read_text(encoding="utf-8").splitlines()[:40], start=1):
    print(f"{i:4d}: {l}")
PY

Repository: linagora/openrag

Length of output: 1651


Constrain enum-like indexation preset fields to supported values

IndexationPipelineConfig.parsing_strategy and IndexationPipelineConfig.contextualization_mode are currently typed as str, so Pydantic will accept any string during config deserialization. Constrain them with Literal[...] to fail fast on invalid config values.

Proposed change
+from typing import Literal
+
 class IndexationPipelineConfig(BaseModel):
@@
-    parsing_strategy: str = "marker"  # "pymupdf" | "marker" | "docling"
+    parsing_strategy: Literal["pymupdf", "marker", "docling"] = "marker"
@@
-    contextualization_mode: str = "none"  # "none" | "simple" | "structured"
+    contextualization_mode: Literal["none", "simple", "structured"] = "none"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parsing_strategy: str = "marker" # "pymupdf" | "marker" | "docling"
from typing import Literal
class IndexationPipelineConfig(BaseModel):
parsing_strategy: Literal["pymupdf", "marker", "docling"] = "marker"
contextualization_mode: Literal["none", "simple", "structured"] = "none"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/indexation_pipeline.py` at line 21,
IndexationPipelineConfig currently types parsing_strategy and
contextualization_mode as plain str which allows any value; change their
annotations to typing.Literal with the supported options (e.g.,
parsing_strategy: Literal["pymupdf", "marker", "docling"] = "marker" and
contextualization_mode: Literal[...supported values...] = <current_default>) and
import Literal from typing (or typing_extensions for older Python) so Pydantic
validation will reject invalid config values; keep the existing default values
and update any tests or callers expecting str if necessary.

Comment on lines +18 to +26
type: str = "single" # "single" | "multiQuery" | "hyde"
reranker: str | None = None # endpoint name; None = use default
llm: str | None = None # endpoint name for multiQuery / hyde expansion

top_k: int = Field(default=50, gt=0, le=1000) # vector candidates fetched
top_n: int = Field(default=10, gt=0, le=1000) # final results after reranking
enable_reranker: bool = True
similarity_threshold: float = 0.6
include_related: bool = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain retrieval mode and threshold at schema level.

type is documented as a closed set but currently accepts any string, and similarity_threshold accepts out-of-range values. This lets invalid configs pass and fail later at runtime.

Suggested fix
+from typing import Literal
 from pydantic import BaseModel, ConfigDict, Field
@@
-    type: str = "single"  # "single" | "multiQuery" | "hyde"
+    type: Literal["single", "multiQuery", "hyde"] = "single"
@@
-    similarity_threshold: float = 0.6
+    similarity_threshold: float = Field(default=0.6, ge=0.0, le=1.0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/retrieval_pipeline.py` around lines 18 - 26, The schema
currently allows any string for the retrieval mode and an unconstrained
similarity_threshold; update the model so "type" is a Literal restricted to
"single", "multiQuery", or "hyde" (import typing.Literal and replace the current
type: str) and tighten similarity_threshold to a validated Field with bounds
(ge=0, le=1) so out-of-range values are rejected; keep existing Field
constraints for top_k/top_n and leave reranker/llm as optional strings.

Comment on lines +17 to +18
preset_type: str # "indexation" | "retrieval"
config: dict[str, Any]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add invariant validation on persisted partition/preset fields.

These models currently accept invalid states (e.g., unknown preset_type, negative chat_history_depth, non-positive dimension) that can leak from DB into runtime resolution.

Suggested fix
 from datetime import datetime
-from typing import Any
+from typing import Any, Literal
@@
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
-    preset_type: str  # "indexation" | "retrieval"
+    preset_type: Literal["indexation", "retrieval"]
@@
-    dimension: int = 1024
+    dimension: int = Field(default=1024, gt=0)
@@
-    chat_history_depth: int = 0
+    chat_history_depth: int = Field(default=0, ge=0)
@@
-    chat_history_depth: int = 0
+    chat_history_depth: int = Field(default=0, ge=0)

Also applies to: 32-35, 53-54

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/models/preset.py` around lines 17 - 18, Add invariant validation
for persisted partition/preset fields so invalid states cannot be loaded from
the DB: enforce that preset_type (in the Preset model) is one of the allowed
values ("indexation" or "retrieval"), that numeric config fields like
chat_history_depth are non-negative, and that dimension (or any vector dimension
fields) is a positive integer; implement these checks in the model
validation/constructor for Preset and the related Partition/Config classes
referenced around the same file (lines near the current preset_type/config
definitions and the blocks at the other noted ranges) so any invalid persisted
record raises a clear validation error instead of leaking into runtime.

Ahmath-Gadji and others added 2 commits June 4, 2026 12:43
On low-resource machines the Milvus 2.6 gRPC handshake can exceed the
default 10-second channel-ready wait, causing pymilvus to throw
MilvusException (code=2, "Fail connecting to server … illegal connection
params or server unavailable").  The underlying TCP port is open and the
HTTP health endpoint responds OK — the channel just needs more time to
complete the gRPC negotiation.

- Pass `timeout=60` to both `MilvusClient` and `AsyncMilvusClient` at
  construction time in `MilvusVectorStore.__init__`.
- Store the value in `self._timeout` so any future callers have a single
  source of truth.

Root-cause confirmed by reproducing the error with the default timeout and
a successful connect with timeout=60 against the same Milvus 2.6.11
container.
… migration

- openrag/services/persistence/schema.py: add model_endpoints and pipeline_presets tables, extend partitions with 9 new columns, add indexation_config to files
- openrag/services/persistence/migrations/alembic/versions/06dd2101ea3a_add_endpoints_presets_phase14.py: idempotent migration with composite PKs, JSONB defaults, check constraints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hedhoud hedhoud changed the title refactor(phase 14H): add admin APIs for model endpoints, presets, and partition config refactor(phase 14): Model Endpoints Registry and Per-Partition Presets Jun 4, 2026
Ahmath-Gadji and others added 5 commits June 4, 2026 14:42
…handler logs

- Fix migration 06dd2101ea3a: `index_exists()` called with 1 arg instead of
  the required 2 (`table`, `index`), crashing Alembic at startup and causing
  all endpoints to return 500 (container degraded to None)
- Add try/except in AuthMiddleware.dispatch() so a None container returns
  JSONResponse(503) instead of propagating AttributeError through
  ServerErrorMiddleware to a 500
- Add `message`, `method`, and `path` fields to both error handler log calls
  so failing requests are diagnosable without a full traceback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t/partition repos

- PgModelEndpointRepository: 7 methods (create, get, list_all, update,
  rename, delete, set_default) with atomic transaction for set_default
- PgPresetRepository: 5 methods (get, list_all, upsert, delete,
  count_partitions_using) with ON CONFLICT upsert and safe column dispatch
- PgPartitionRepository: 3 new methods (get_partition_row,
  list_partition_rows, update_partition) + _row_to_full_dict helper
- Extend PartitionRepository port with get_partition_row,
  list_partition_rows, update_partition abstract methods
- Extend PresetRepository port with count_partitions_using abstract method
- 18 new unit tests covering all new methods with fake pool/conn pattern

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Ahmath-Gadji Ahmath-Gadji force-pushed the refactor/phase14-B branch 2 times, most recently from de28be1 to 80e28e7 Compare June 5, 2026 10:09
- seed_defaults(): reads Settings + env-var overrides (EMBEDDER_ENDPOINT,
  LLM_ENDPOINT, VLM_ENDPOINT, RERANKER_ENDPOINT and model counterparts),
  inserts one is_default=True row per type when the DB is empty for that type
- load_all(): fetches all rows, rebuilds config.models.{embedder,llm,vlm,
  reranker} dicts atomically via clear()+update(); injects virtual "default"
  alias pointing to the is_default=True row per type
- create_model_endpoint(): validates model_type, raises 409 on duplicate
- update_model_endpoint(): updates fields and/or renames, evicts stale
  client-cache entry, reloads config.models
- delete_model_endpoint(): guards last-endpoint-of-type, evicts cache
- set_default(): swaps is_default flag, evicts "default" cache alias
- validate_endpoint(): async HTTP probe of {url}/models, returns reachable +
  model_found + models_served dict
- 17 unit tests covering all methods
- PresetService with seed_defaults(), load_all(), create_preset(),
  update_preset(), delete_preset(), get_preset(), list_presets()
- 6 hardcoded default seeds: 3 indexation (default, legal, finance)
  and 3 retrieval (default, multiquery, hyde)
- Pydantic validation via IndexationPipelineConfig / RetrievalPipelineConfig
- Atomic in-place swap of config.presets.indexation / .retrieval dicts
- Back-reference to PartitionService.load_partitions() on preset update
- Referential integrity guard: raises 422 if partition(s) use the preset
- 17 unit tests, all passing
- add resolve_partition_row() to map preset names to validated PartitionConfig
- add load_partitions() with atomic clear()+update() swap into config.partitions
- add seed_default_partition() for idempotent default-partition bootstrap
- extend create_partition() with preset/config columns + pre-write validation
- add update_partition() with merged-row validation and cache re-resolution
- accept optional config in constructor; wire settings in DI container
feat(14G) injected config into the container's PartitionService, which made
create_partition validate presets against config.presets at runtime. But
nothing seeds that registry at startup yet (that is 14K), so every partition
create returned [CONFIG_ERROR]: preset 'default' not found.

Drop the container wiring so PartitionService runs with config=None — the
legacy passthrough — until 14K seeds the registries on boot and 14J's
per-partition retrieval path is complete. The resolution methods and their
unit tests stay; they are exercised directly with a config.
Complete the Phase 14 composition-root wiring so the endpoint and preset registries are available during application startup.

The container now seeds and loads model endpoints before presets, then resolves partition configs after both registries are populated. This closes the runtime gap where the Phase 14 admin API surface existed but service-backed routes could not use the real services.

The partition service is also built with Settings again, so 14J can consume resolved per-partition configs once startup has loaded them.
@hedhoud hedhoud force-pushed the refactor/phase14-B branch from f711db7 to 44d5c16 Compare June 5, 2026 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat Add a new feature refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants