Merged
Changes from all commits (14 commits)
3 changes: 2 additions & 1 deletion .gitignore
@@ -215,4 +215,5 @@ cython_debug/

# logs
*.log
-*.log.*
+*.log.*
+reports/
22 changes: 14 additions & 8 deletions README.md
@@ -59,10 +59,16 @@ The library is built on a **three-layer architecture** with a framework-agnostic
## 🚀 **Quick Start**

```python
-from views_evaluation import PandasAdapter, NativeEvaluator
+from views_evaluation import EvaluationFrame, NativeEvaluator
+import numpy as np

-# 1. Convert DataFrames → EvaluationFrame
-ef = PandasAdapter.from_dataframes(actual=actuals, predictions=predictions_list, target="ged_sb_best")
+# 1. Construct EvaluationFrame with NumPy arrays
+ef = EvaluationFrame(
+    y_true=y_true_array,
+    y_pred=y_pred_array,  # shape (N, S) where S >= 1
+    identifiers={'time': times, 'unit': units, 'origin': origins, 'step': steps},
+    metadata={'target': 'ged_sb_best'},
+)

# 2. Configure and evaluate
config = {
@@ -89,7 +95,7 @@ VIEWS Evaluation ensures **forecasting accuracy and model robustness** as the **

### **Pipeline Integration:**
1. **Model Predictions** →
-2. **PandasAdapter** (DataFrame → EvaluationFrame) →
+2. **EvaluationFrame** (validated NumPy container) →
3. **NativeEvaluator** (metrics computation) →
4. **EvaluationReport** (structured results)
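A toy sketch of this four-stage flow, using plain NumPy stand-ins for the library's classes (the `EvaluationFrame`, `NativeEvaluator`, and `EvaluationReport` roles are mimicked here, not their real APIs):

```python
import numpy as np

# 1. Model predictions: N observations, S samples each
rng = np.random.default_rng(0)
y_true = rng.poisson(2.0, size=100).astype(float)           # (N,)
y_pred = y_true[:, None] + rng.normal(0, 1, size=(100, 5))  # (N, S)

# 2. "EvaluationFrame" role: validate shape and finiteness up front
assert y_pred.ndim == 2 and y_pred.shape[0] == y_true.shape[0]
assert np.isfinite(y_true).all() and np.isfinite(y_pred).all()

# 3. "NativeEvaluator" role: metric over the per-row sample mean
point = y_pred.mean(axis=1)
mse = float(np.mean((y_true - point) ** 2))

# 4. "EvaluationReport" role: structured result
report = {"target": "ged_sb_best", "metrics": {"mse": mse}}
```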

@@ -195,7 +201,7 @@ config = {
---

* **Data Integrity Checks**: Validates input arrays for shape consistency, NaN/infinity, and required identifiers.
-* **Automatic Index Matching**: `PandasAdapter` aligns actual and predicted values based on MultiIndex structures.
+* **Framework-Agnostic Core**: All evaluation operates on pure NumPy arrays via `EvaluationFrame`.
* **Metric Catalog & Profiles**: Hyperparameters are managed through named evaluation profiles with a Chain of Responsibility resolver (model overrides → profile → fail loud).
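The Chain of Responsibility resolution can be sketched as follows; `resolve_param` and the `PROFILES` mapping are illustrative names, not the library's API:

```python
# Assumed profile data for illustration only
PROFILES = {"base": {"threshold": 0.5}, "hydranet_ucdp": {"threshold": 0.25}}

def resolve_param(name, model_overrides, profile):
    """Resolve a hyperparameter: model overrides → profile → fail loud."""
    if name in model_overrides:            # 1) model-level override wins
        return model_overrides[name]
    if name in PROFILES.get(profile, {}):  # 2) fall back to the named profile
        return PROFILES[profile][name]
    # 3) no silent default: fail loudly
    raise KeyError(f"No value for {name!r} in overrides or profile {profile!r}")
```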

---
@@ -223,11 +229,11 @@ Level 0 — Pure Core (NumPy + SciPy only, zero framework imports)
Profiles Named hyperparameter sets (base, hydranet_ucdp, ...)

Level 1 — Bridge / Adapter
-PandasAdapter DataFrame → EvaluationFrame conversion (PHASE-3-DELETE)
+EvaluationFrame Validated NumPy data container
EvaluationReport Results container with DataFrame/dict export

Level 2 — Legacy Orchestrator
-EvaluationManager Deprecated wrapper; delegates to Level 0
+MetricCatalog Genome registry and parameter resolver
```

**Key design decisions:**
@@ -244,7 +250,7 @@ views-evaluation/
├── views_evaluation/
│ ├── __init__.py # Public API exports
│ ├── adapters/
-│ │ └── pandas.py # PandasAdapter (PHASE-3-DELETE)
+│ │ └── __init__.py # Reserved for future framework bridges
│ ├── evaluation/
│ │ ├── config_schema.py # EvaluationConfig TypedDict
│ │ ├── evaluation_frame.py # Core data container
4 changes: 3 additions & 1 deletion documentation/ADRs/000_use_of_adrs.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

4 changes: 3 additions & 1 deletion documentation/ADRs/001_silicon_based_agent_protocol.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

4 changes: 3 additions & 1 deletion documentation/ADRs/010_ontology_of_evaluation.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

12 changes: 7 additions & 5 deletions documentation/ADRs/011_topology_and_dependency_rules.md
@@ -2,15 +2,17 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

## Context

In complex evaluation systems, architectural fragility often emerges not from incorrect logic, but from uncontrolled dependencies between components.

-The Evaluation repository pre-Feb 2026 suffered from "Pandas-heavy" coupling. Higher-level logic (EvaluationManager) depended on Pandas `MultiIndex` internals for alignment, which constrained our ability to scale probabilistic forecasts (N, S) due to memory/performance limits of Pandas' "lists-in-cells."
+The Evaluation repository pre-Feb 2026 suffered from "Pandas-heavy" coupling. Higher-level logic (e.g., Pipeline Core) depended on Pandas `MultiIndex` internals for alignment, which constrained our ability to scale probabilistic forecasts (N, S) due to memory/performance limits of Pandas' "lists-in-cells."

Without explicit topology rules, we risk high-level math modules beginning to depend on implementation details (e.g., NumPy indexing vs Xarray coordinates).

@@ -29,16 +31,16 @@ Violations are architectural defects.
The Evaluation Core is the lowest-level layer (most stable).

- **Level 0: Evaluation Core** (Pure NumPy, `EvaluationFrame`, `NativeEvaluator`). No external imports except `numpy` and `scipy`.
-- **Level 1: Adapters** (Framework-specific bridges like `PandasAdapter`). May depend on Level 0.
-- **Level 2: Orchestration** (e.g., `EvaluationManager`, Pipeline Core). May depend on Level 1 and Level 0.
+- **Level 1: Adapters** (Framework-specific bridges, reserved for future use). May depend on Level 0.
+- **Level 2: Orchestration** (e.g., Pipeline Core — external to this repo). May depend on Level 1 and Level 0.

Dependency direction must always flow **toward the Core**.

## Forbidden Patterns

- Math kernels importing `pandas` or `polars`.
- `EvaluationFrame` containing anything other than NumPy arrays.
-- Higher-level modules (e.g., `EvaluationManager`) passing DataFrames directly into metric functions.
+- Higher-level modules (e.g., external orchestrators) passing DataFrames directly into metric functions.

If a dependency feels “convenient but wrong,” it probably is.
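As an illustration of enforcing these rules, a guard test might scan Level-0 sources for forbidden imports; this checker is a sketch, not part of the repo:

```python
import ast

FORBIDDEN = {"pandas", "polars"}

def forbidden_imports(source: str) -> set:
    """Return the forbidden top-level packages imported by a source string."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits |= {a.name.split(".")[0] for a in node.names} & FORBIDDEN
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in FORBIDDEN:
                hits.add(root)
    return hits
```

A CI job could run this over every file in the Evaluation Core and fail the build on any hit.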

6 changes: 4 additions & 2 deletions documentation/ADRs/012_authority_over_inference.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

@@ -58,5 +60,5 @@ the system **must fail loudly and immediately**.
- Improves debuggability: we can inspect the `EvaluationFrame` and see exactly what the system *thinks* it is evaluating.

### Negative
-- Requires more metadata in the `EvaluationFrame` and `PandasAdapter`.
+- Requires more metadata in the `EvaluationFrame` and external adapters.
- Some "convenient" hacks are disallowed.
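A minimal sketch of the fail-loud rule; the helper name `require_meta` is assumed for illustration:

```python
def require_meta(metadata: dict, key: str):
    """Return a required metadata value, refusing to infer a missing one."""
    if key not in metadata:
        raise ValueError(
            f"Missing required metadata {key!r}; refusing to infer it "
            "from column names or data shape (ADR-012)."
        )
    return metadata[key]
```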
4 changes: 3 additions & 1 deletion documentation/ADRs/013_observability_and_explicit_failure.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

6 changes: 4 additions & 2 deletions documentation/ADRs/014_boundary_contracts_and_validation.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

@@ -25,7 +27,7 @@ Every boundary between components (e.g., Adapter → Core) must define:
- Declared invariants.

### 2. Validation at Entry
-All configuration and external inputs must be validated at the system boundary (e.g., in `EvaluationManager` or `Adapters`).
+All configuration and external inputs must be validated at the system boundary (e.g., in the `EvaluationFrame` constructor or `NativeEvaluator`).
- Before execution begins.
- Before orchestration proceeds.
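A sketch of what entry validation could look like; the real checks live in the `EvaluationFrame` constructor, and this standalone signature is an assumption:

```python
import numpy as np

REQUIRED_IDS = {"time", "unit", "origin", "step"}

def validate_inputs(y_true, y_pred, identifiers):
    """Check shapes, finiteness, and required identifiers at the boundary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_pred.ndim != 2 or y_pred.shape[0] != y_true.shape[0]:
        raise ValueError(f"y_pred must be (N, S) with N={y_true.shape[0]}")
    if not (np.isfinite(y_true).all() and np.isfinite(y_pred).all()):
        raise ValueError("NaN/inf detected at the boundary")
    missing = REQUIRED_IDS - identifiers.keys()
    if missing:
        raise ValueError(f"Missing identifiers: {sorted(missing)}")
    return y_true, y_pred
```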

4 changes: 3 additions & 1 deletion documentation/ADRs/020_multi_perspective_testing.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

6 changes: 4 additions & 2 deletions documentation/ADRs/021_intent_contracts_for_classes.md
@@ -2,7 +2,9 @@

**Status:** Accepted
**Date:** 2026-02-25
-**Deciders:** Project maintainers, Gemini CLI
+**Deciders:** Project maintainers
+**Consulted:** —
+**Informed:** All contributors

---

@@ -14,7 +16,7 @@ To prevent semantic drift, non-trivial classes require an explicit declaration of

## Decision

-All **non-trivial and substantial classes** (e.g., `EvaluationFrame`, `NativeEvaluator`, `PandasAdapter`) must have an explicit **intent contract**.
+All **non-trivial and substantial classes** (e.g., `EvaluationFrame`, `NativeEvaluator`, `EvaluationReport`) must have an explicit **intent contract**.

An intent contract is a short, human-readable description of:
- **Purpose**: what the class is for.
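A hypothetical intent contract rendered as a class docstring; the layout is an assumption, since the ADR mandates the content rather than a specific format:

```python
class EvaluationFrame:
    """Validated NumPy container for evaluation inputs.

    Intent contract:
      Purpose: hold aligned y_true/y_pred arrays plus identifiers.
      Responsibilities: validate shape, finiteness, required identifiers.
      Non-responsibilities: no metric computation, no DataFrame I/O.
    """
```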
2 changes: 2 additions & 0 deletions documentation/ADRs/022_evolution_and_stability.md
@@ -3,6 +3,8 @@
**Status:** Proposed (Deferred)
**Date:** 2026-02-25
**Deciders:** —
+**Consulted:** —
+**Informed:** All contributors

---

69 changes: 69 additions & 0 deletions documentation/ADRs/023_technical_risk_register.md
@@ -0,0 +1,69 @@
# ADR-023: Technical Risk Register

**Status:** Accepted
**Date:** 2026-03-31
**Deciders:** Project maintainers
**Consulted:** —
**Informed:** All contributors

---

## Context

As the views-evaluation codebase matures through its EvaluationFrame refactor and metric catalog implementation, structural risks have been identified through repo-assimilation and expert review. Without a centralized, living register of these risks, concerns are scattered across reports, post-mortems, and tribal knowledge.

A formalized risk register ensures that architectural concerns are:
- tracked with consistent metadata,
- prioritized by severity,
- linked to their source of discovery,
- and revisited systematically.

---

## Decision

This repository maintains a **Technical Risk Register** at `reports/technical_risk_register.md` as a first-class governance artifact.

### Concern Format

Each entry uses:
- **ID:** `C-xx` for concerns, `D-xx` for disagreements
- **Tier:** 1 (critical) through 4 (informational)
- **Trigger:** The specific circumstance under which the risk becomes actionable
- **Source:** How the concern was identified (e.g. repo-assimilation, expert review, falsification audit)
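A hypothetical entry in this format (the concern itself is invented for illustration):

```markdown
### C-07 (example): sample-dimension broadcasting in metric kernels
- **ID:** C-07
- **Tier:** 2
- **Trigger:** Adding a metric that reduces over S before alignment checks run
- **Source:** expert review
```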

### Tier Definitions

| Tier | Severity | Response |
|------|----------|----------|
| 1 | Critical — blocks release or causes data corruption | Must be resolved before next release |
| 2 | High — significant architectural risk | Must have a mitigation plan within one sprint |
| 3 | Medium — known weakness, bounded impact | Track and address opportunistically |
| 4 | Low/Informational — minor or cosmetic | Document and revisit during tech debt cleanup |

### Lifecycle

- Concerns are opened during expert reviews, tech debt audits, repo-assimilation, and falsification audits.
- Concerns are closed when the risk is resolved, mitigated, or explicitly accepted with rationale.
- The register header tracks the total count for quick reference.

---

## Consequences

### Positive
- Centralized visibility of all known risks
- Consistent prioritization and tracking
- Prevents risks from being forgotten between conversations

### Negative
- Requires discipline to keep updated
- Risk of register staleness if not reviewed regularly

---

## References

- `reports/technical_risk_register.md`
- Repo-assimilation output (2026-03-31)
- `reports/technical_debt_backlog.md` (related but focuses on actionable debt, not structural risks)
12 changes: 5 additions & 7 deletions documentation/ADRs/030_evaluation_strategy.md
@@ -1,12 +1,10 @@
# ADR-030: Evaluation Strategy

-| ADR Info | Details |
-|---------------------|-------------------|
-| Subject | Evaluation Strategy |
-| ADR Number | 030 |
-| Status | Accepted |
-| Author | Xiaolong, Mihai|
-| Date | 16.07.2025 |
+**Status:** Accepted
+**Date:** 2025-07-16
+**Deciders:** Xiaolong, Mihai
+**Consulted:** —
+**Informed:** All contributors

## Context
To ensure reliable and realistic model performance assessment, our forecasting framework supports both **offline** and **online** evaluation strategies. These strategies serve complementary purposes: offline evaluation simulates the forecasting process retrospectively, while online evaluation assesses actual deployed forecasts against observed data.
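The offline strategy can be sketched as a rolling-origin backtest; this naive last-value forecaster is illustrative only, not the framework's model:

```python
def rolling_origin_mae(series, horizon=2, min_train=3):
    """Simulate retrospective forecasting: for each origin, forecast
    `horizon` steps ahead with the last observed value, then score."""
    errs = []
    for origin in range(min_train, len(series) - horizon + 1):
        forecast = series[origin - 1]          # naive last-value forecast
        actual = series[origin + horizon - 1]  # observed value at the horizon
        errs.append(abs(actual - forecast))
    return sum(errs) / len(errs)
```

Online evaluation would apply the same scoring, but to forecasts that were actually issued by the deployed system, once the corresponding observations arrive.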
12 changes: 5 additions & 7 deletions documentation/ADRs/031_evaluation_metrics.md
@@ -1,12 +1,10 @@
# ADR-031: Evaluation Metrics

-| ADR Info | Details |
-|---------------------|--------------------|
-| Subject | Evaluation Metrics |
-| ADR Number | 031 |
-| Status | Accepted |
-| Author | Xiaolong |
-| Date | 12.09.2024 |
+**Status:** Accepted
+**Date:** 2024-09-12
+**Deciders:** Xiaolong
+**Consulted:** —
+**Informed:** All contributors

## Context
In the context of the VIEWS pipeline, it is necessary to evaluate the models using a robust set of metrics that account for the characteristics of conflict data, such as right-skewness and zero-inflation in the outcome variable.
12 changes: 5 additions & 7 deletions documentation/ADRs/032_metric_calculation_schemas.md
@@ -1,12 +1,10 @@
# ADR-032: Metric Calculation Schemas

-| ADR Info | Details |
-|---------------------|-------------------|
-| Subject | Metric Calculation |
-| ADR Number | 032 |
-| Status | Accepted|
-| Author | Mihai, Xiaolong|
-| Date | 31.10.2024 |
+**Status:** Accepted
+**Date:** 2024-10-31
+**Deciders:** Mihai, Xiaolong
+**Consulted:** —
+**Informed:** All contributors

## Context
Traditional machine learning metrics do not directly translate to time-series forecasting across multiple horizons. A standardized approach to regrouping data is necessary.
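A minimal sketch of horizon-wise regrouping: score each forecast step separately, rather than pooling all horizons into one number (illustrative; the actual schemas are defined in this ADR):

```python
import numpy as np

def per_step_mae(y_true, y_pred, steps):
    """Mean absolute error computed separately for each forecast step."""
    y_true, y_pred, steps = map(np.asarray, (y_true, y_pred, steps))
    out = {}
    for s in np.unique(steps):
        mask = steps == s
        out[int(s)] = float(np.mean(np.abs(y_true[mask] - y_pred[mask])))
    return out
```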
20 changes: 9 additions & 11 deletions documentation/ADRs/040_evaluation_input_schema.md
@@ -1,18 +1,16 @@
# ADR-040: Evaluation Input Schema

-| ADR Info | Details |
-|---------------------|-------------------------|
-| Subject | Evaluation Input Schema |
-| ADR Number | 040 |
-| Status | Accepted |
-| Author | Xiaolong |
-| Date | 16.06.2025 |
+**Status:** Accepted
+**Date:** 2025-06-16
+**Deciders:** Xiaolong
+**Consulted:** —
+**Informed:** All contributors

## Context

A consistent input format is required to compare model performance across the VIEWS pipeline.
-Two integration paths exist: the native path (primary) and the legacy path (`EvaluationManager`,
-deprecated per ADR-011).
+The native path via `EvaluationFrame` is the sole integration path. The legacy
+`EvaluationManager` path was removed in Phase 3.

## Decision

@@ -42,9 +40,9 @@ Prediction type (point vs. sample) is determined structurally from the number of
No name-based inference occurs (ADR-012). Callers must ensure all cells in a prediction column
have the same number of values.
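The structural determination can be sketched as follows (assumed helper, not the library's function): a single prediction column is a point forecast, more than one is a sample forecast.

```python
import numpy as np

def prediction_type(y_pred) -> str:
    """Classify predictions by shape alone; no column-name inference."""
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 1 or y_pred.shape[1] == 1:
        return "point"
    return "sample"
```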

-### Native Path Invariants (PandasAdapter)
+### Native Path Invariants

-When using `PandasAdapter`, the following identifiers are synthesised automatically:
+When constructing an `EvaluationFrame`, the following identifiers must be provided:

| Identifier | Source |
|------------|--------------------------------------------------|