views-platform · Polichinel · Apr 6, 2026
diff --git a/apis/un_fao/main.py b/apis/un_fao/main.py
@@ -1,11 +1,8 @@
 import wandb
-import warnings
 from pathlib import Path
 from views_faoapi.managers.model import APIPathManager
 from views_faoapi.managers.api import FAOApiManager
 
-warnings.filterwarnings("ignore")
-
 try:
     model_path = APIPathManager(Path(__file__))
 except FileNotFoundError as fnf_error:

diff --git a/docs/ADRs/001_ontology.md b/docs/ADRs/001_ontology.md
@@ -20,7 +20,7 @@ The repository recognizes the following ontological categories:
 ### Domain Entities
 | Category | Location | Description |
 |----------|----------|-------------|
-| **Models** | `models/*/` | Individual forecasting model launchers (~66). Each is a thin `main.py` + config directory that delegates to an external architecture package. |
+| **Models** | `models/*/` | Individual forecasting model launchers (66 active). Each is a thin `main.py` + config directory that delegates to an external architecture package. |
 | **Ensembles** | `ensembles/*/` | Ensemble aggregation launchers (5). Aggregate predictions from constituent models. |
 
 ### Configuration Entities

diff --git a/docs/ADRs/004_evolution.md b/docs/ADRs/004_evolution.md
@@ -0,0 +1,121 @@
+
+# ADR-004: Rules for Evolution and Stability
+
+**Status:** Accepted  
+**Date:** 2026-04-05  
+**Deciders:** Project maintainers  
+**Informed:** All contributors  
+
+---
+
+## Context
+
+The preceding ADRs establish:
+
+- **ADR-001:** the ontology of the repository (what exists)
+- **ADR-002:** the topology of the repository (how components may relate)
+- **ADR-003:** semantic authority (who owns meaning and how it is declared)
+
+Together, these decisions define the system's structure and semantics at a point in time.
+
+What they do **not** yet define is how the system is allowed to **change over time**:
+- which components are expected to be stable
+- which components may evolve freely
+- what constitutes a breaking change
+- when compatibility guarantees apply
+- when a new ADR is required
+
+In views-models, these questions are now concrete:
+
+- 68 models and 5 ensembles share identical partition boundaries across 73 files; a boundary change is a coordinated multi-file update (Risk R1).
+- External consumers (the VIEWS platform, UN FAO API) depend on `white_mustang` ensemble output; breaking changes have real downstream cost.
+- The config key vocabulary (`regression_targets`, `prediction_format`, `rolling_origin_stride`) is enforced by tests; adding or renaming required keys is a breaking change to all 68+ models.
+- Contributors regularly express uncertainty about what is safe to change. The answer varies by component: hyperparameters can change freely, while partition boundaries must never change without coordination.
+
+Multiple trigger conditions from the original deferred ADR-004 template are now met.
+
+---
+
+## Decision
+
+The repository adopts a three-tier stability classification for its components:
+
+### Tier 1 — Stable (change requires ADR or explicit team decision)
+
+| Component | Examples | Rationale |
+|---|---|---|
+| Partition boundaries | `(121, 444)`, `(445, 492)`, `(493, 540)` | Cross-model comparability depends on identical splits |
+| Required config keys | `name`, `algorithm`, `level`, `steps`, `time_steps`, `deployment_status` | Enforced by `test_config_completeness.py`; adding or removing a key breaks all models |
+| Config file set | The 6 config files per model | Enforced by `test_model_structure.py`; scaffold builder generates this set |
+| CLI argument contract | `-r`, `-t`, `-e`, `-f`, `--sweep` | All `run.sh` and integration tests depend on this interface |
+| Deployment status vocabulary | `shadow`, `deployed`, `baseline`, `deprecated` | Enforced by test; production gating depends on it |
+
+### Tier 2 — Conventional (change requires updating all models + tests)
+
+| Component | Examples | Rationale |
+|---|---|---|
+| Model naming convention | `adjective_noun` lowercase | Enforced by `test_model_structure.py`; catalog scripts depend on pattern |
+| Directory structure | `configs/`, `artifacts/`, `data/`, `main.py`, `run.sh` | Enforced by tests; scaffold builder generates this layout |
+| CLI import pattern | `from views_pipeline_core.cli import ForecastingModelArgs` | Enforced by `test_cli_pattern.py` |
+| Ensemble dependency declarations | `config_meta["models"]` list | Enforced by `test_ensemble_configs.py` |
+
+### Tier 3 — Volatile (changed freely by model owners)
+
+| Component | Examples | Rationale |
+|---|---|---|
+| Hyperparameters | All keys in `config_hyperparameters.py` beyond `steps`/`time_steps` | Algorithm-specific; model owner's domain |
+| Querysets | Feature selection and transformation chains in `config_queryset.py` | Model owner's domain |
+| W&B experiment tracking | Run names, tags, logging frequency | Operational convenience |
+| Model-specific README content | Beyond scaffold-generated sections | Documentation convenience |
+
+---
+
+## Rationale
+
+The three-tier model makes the cost of change explicit:
+
+- **Stable** components have high coordination cost and downstream impact. Changes require an ADR or explicit team decision, plus updates to all affected models and tests.
+- **Conventional** components have moderate coordination cost. Changes propagate across the model zoo but don't affect external consumers.
+- **Volatile** components are model-local. No coordination required.
+
+This classification reflects the existing reality (tests already enforce Stable and Conventional tiers) while making the rules discoverable for contributors.
+
+---
+
+## Consequences
+
+### Positive
+- Contributors can immediately determine whether a change is safe to make unilaterally
+- The cost of adding new required config keys is made explicit before the change is attempted
+- Partition boundary changes are recognized as architectural events, not routine updates
+
+### Negative
+- Stable components resist change even when change is desirable — the coordination cost is real
+- The 73-file partition duplication (intentional per ADR-002) amplifies the cost of Stable-tier changes
+- Model owners may be tempted to treat Conventional components as Volatile; tests are the enforcement mechanism
+
+---
+
+## Implementation Notes
+
+- Stability tiers are enforced primarily by the test suite; no CI job or dedicated tooling enforces them (see ADR-005, Known Gaps)
+- The integration test runner (`run_integration_tests.sh`) provides behavioral verification but is not in CI; Stable-tier changes should include an integration test run
+- ADR-001 already defines a stability classification consistent with these tiers; this ADR makes the rules actionable
+
+---
+
+## Open Questions
+
+- Should partition boundary changes require a formal migration tool (updating all 73 files atomically)?
+- Should there be a deprecation protocol for removing models (currently only `electric_relaxation` is deprecated)?
+- Should Tier 2 changes require a PR review from a specific set of maintainers?
+
+---
+
+## References
+
+- [ADR-001](001_ontology.md) — Ontology stability levels
+- [ADR-002](002_topology.md) — Self-contained config files (why duplication is intentional)
+- [ADR-003](003_authority.md) — Authority of declarations
+- [ADR-005](005_testing.md) — Testing enforces tiers
+- [ADR-009](009_boundary_contracts.md) — Boundary contracts define the Stable-tier interface
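The three-tier classification in `004_evolution.md` can be read as a lookup from component to change rule. The following is a minimal sketch under stated assumptions: the dictionary keys are hypothetical labels derived from the table rows, and nothing in the repository is known to expose such a mapping.

```python
from enum import Enum


class Tier(Enum):
    """Stability tiers from ADR-004; values are the change rules in the tier headings."""
    STABLE = "change requires ADR or explicit team decision"
    CONVENTIONAL = "change requires updating all models + tests"
    VOLATILE = "changed freely by model owners"


# Rows of the three ADR-004 tables. The key names are hypothetical labels
# chosen for this sketch, not identifiers from the repository.
COMPONENT_TIERS = {
    "partition_boundaries": Tier.STABLE,
    "required_config_keys": Tier.STABLE,
    "config_file_set": Tier.STABLE,
    "cli_argument_contract": Tier.STABLE,
    "deployment_status_vocabulary": Tier.STABLE,
    "model_naming_convention": Tier.CONVENTIONAL,
    "directory_structure": Tier.CONVENTIONAL,
    "cli_import_pattern": Tier.CONVENTIONAL,
    "ensemble_dependency_declarations": Tier.CONVENTIONAL,
    "hyperparameters": Tier.VOLATILE,
    "querysets": Tier.VOLATILE,
    "wandb_experiment_tracking": Tier.VOLATILE,
    "model_readme_content": Tier.VOLATILE,
}


def change_rule(component: str) -> str:
    """Return the tier name and the change rule that applies to a component."""
    tier = COMPONENT_TIERS[component]
    return f"{tier.name.title()}: {tier.value}"


if __name__ == "__main__":
    print(change_rule("partition_boundaries"))
    print(change_rule("hyperparameters"))
```

A contributor unsure whether a change needs coordination would look up the component first; an unknown component raises `KeyError`, which in this sketch signals that the ADR tables need a new row.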
diff --git a/docs/ADRs/005_testing.md b/docs/ADRs/005_testing.md
@@ -25,12 +25,12 @@ We adopt a three-team testing taxonomy:
 |------|---------|----------------------|
 | **Green** (Correctness) | Verify the system works as intended | `test_config_completeness.py` — required keys exist, values are valid |
 | **Beige** (Convention) | Catch configuration drift and convention violations | `test_model_structure.py` — naming, file presence; `test_config_partitions.py` — delegation to shared module; `test_cli_pattern.py` — CLI import consistency |
-| **Red** (Adversarial) | Expose failure modes by testing edge cases | Not yet implemented — future work |
+| **Red** (Adversarial) | Expose failure modes by testing edge cases | `test_failure_modes.py` — config loading error paths |
 
 ### Test Design Principles
 
 1. **Tests must run without ML dependencies** — Tests parse source code and use `importlib.util` to load config modules, avoiding dependency on `views_pipeline_core`, `ingester3`, or algorithm packages.
-2. **Tests are parametrized over all models** — Every test runs against all ~66 models, catching drift immediately.
+2. **Tests are parametrized over all models** — Every test runs against all 66 models, catching drift immediately.
 3. **Tests run fast** — The full suite completes in ~2 seconds.
 
 ### Current Test Suite
@@ -42,6 +42,10 @@ We adopt a three-team testing taxonomy:
 | `tests/test_model_structure.py` | Beige | Naming convention, required files, config directory structure |
 | `tests/test_cli_pattern.py` | Beige | New CLI import pattern, no explicit `wandb.login()` |
 | `tests/test_catalogs.py` | Green | No `exec()` usage, markdown generation correctness |
+| `tests/test_ensemble_configs.py` | Green | Ensemble structure, required keys, constituent model existence and level consistency |
+| `tests/test_darts_reproducibility.py` | Green | DARTS reproducibility gate parameter completeness (skipped without `views_r2darts2`) |
+| `tests/test_algorithm_coherence.py` | Beige | Algorithm-to-package mapping, requirements.txt consistency with main.py imports |
+| `tests/test_failure_modes.py` | Red | Config loading error paths: syntax errors, missing functions, non-existent files |
 
 ### Test Requirements for Changes
 
@@ -53,11 +57,11 @@ We adopt a three-team testing taxonomy:
 
 ## Known Gaps
 
-- No red-team (adversarial) tests yet
 - Catalog generation function tests require `views_pipeline_core` (skipped in most dev environments)
-- No cross-validation between `config_meta.algorithm` and `main.py` manager import
-- No ensemble config tests
-- Tests are not wired into CI (`.github/workflows/`)
+- DARTS reproducibility tests require `views_r2darts2` (skipped without it); no equivalent for stepshifter or baseline
+- Tests are not wired into CI (`.github/workflows/`) — see Risk Register C-03
+- No static validation of queryset correctness — see Risk Register C-02
+- Red-team coverage is limited to config loading infrastructure; no adversarial tests for runtime behavior
 
 ---
 

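Test design principle 1 in `005_testing.md` (loading config modules through `importlib.util` so that no ML dependency is imported) can be sketched as follows. This is an illustration, not the repository's test code: the accessor name `get_meta_config` is a hypothetical assumption, and `REQUIRED_KEYS` is only a subset of the keys listed in ADR-004.

```python
import importlib.util
import tempfile
from pathlib import Path

# Subset of the required keys named in ADR-004 (assumed sufficient for the sketch).
REQUIRED_KEYS = {"name", "algorithm", "level", "deployment_status"}
# Deployment status vocabulary from ADR-004.
VALID_STATUSES = {"shadow", "deployed", "baseline", "deprecated"}


def load_config_module(path: Path):
    """Load a config file as a module using only the standard library."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Raises SyntaxError or FileNotFoundError on bad input.
    spec.loader.exec_module(module)
    return module


def check_meta(path: Path) -> list[str]:
    """Return a list of problems found in a config_meta-style file."""
    module = load_config_module(path)
    # get_meta_config is a hypothetical accessor name used for this sketch.
    getter = getattr(module, "get_meta_config", None)
    if getter is None:
        return ["missing function: get_meta_config"]
    meta = getter()
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(meta))]
    status = meta.get("deployment_status")
    if status is not None and status not in VALID_STATUSES:
        problems.append(f"invalid deployment_status: {status!r}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "config_meta.py"
        config.write_text(
            "def get_meta_config():\n"
            "    return {'name': 'demo_model', 'algorithm': 'Demo',\n"
            "            'deployment_status': 'live'}\n"
        )
        print(check_meta(config))
```

Returning a list of problems rather than raising on the first one lets a parametrized test report every violation for a model in a single run.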
diff --git a/docs/ADRs/010_technical_risk_register.md b/docs/ADRs/010_technical_risk_register.md
@@ -0,0 +1,90 @@
+
+# ADR-010: Technical Risk Register as a Governance Artifact
+
+**Status:** Accepted  
+**Date:** 2026-04-05  
+**Deciders:** Project maintainers  
+**Informed:** All contributors  
+
+---
+
+## Context
+
+Repository assimilation (April 2026) identified 11 structural risks in views-models, ranging from partition coordination fragility (high severity) to untested scaffold builders (medium severity). These risks are architectural — they emerge from design decisions, not from bugs in any single file.
+
+Without a persistent, structured register, risks are:
+- Discovered during audits but forgotten between them
+- Discussed informally but not tracked to resolution
+- Rediscovered by new contributors who lack context on prior analysis
+
+---
+
+## Decision
+
+The repository maintains a **Technical Risk Register** at `reports/technical_risk_register.md`.
+
+### Concern Format
+
+Each entry uses this format:
+
+| Field | Description |
+|---|---|
+| **ID** | `C-xx` for concerns, `D-xx` for disagreements |
+| **Tier** | 1 (critical) through 4 (informational) |
+| **Title** | Short description |
+| **Trigger** | The specific circumstance under which the risk becomes actionable |
+| **Source** | How this concern was identified (e.g., repo-assimilation, expert review, incident) |
+| **Status** | Open, Mitigated, Accepted, Resolved |
+| **Notes** | Additional context, references to related ADRs or PRs |
+
+### Tier Definitions
+
+| Tier | Meaning | Response |
+|---|---|---|
+| **1** | Critical structural risk; failure would affect multiple models or external consumers | Must be addressed or explicitly accepted with rationale |
+| **2** | Significant risk; failure would affect a class of models or a governance mechanism | Should be addressed in the next development cycle |
+| **3** | Moderate risk; failure would cause inconvenience or require manual intervention | Address when adjacent work touches the area |
+| **4** | Informational; noted for awareness | No action required unless promoted |
+
+### When Entries Are Added
+
+Concerns are opened during:
+- Repository assimilation audits
+- Expert code reviews
+- Tech debt cleanup sessions
+- Falsification audits
+- Incident post-mortems
+
+### When Entries Are Closed
+
+Concerns are closed when any of the following holds:
+- The underlying risk is eliminated (code change + test)
+- The risk is explicitly accepted with rationale (documented in Notes)
+- The risk is superseded by a different concern
+
+---
+
+## Rationale
+
+A structured register makes risks visible, trackable, and reviewable. It prevents the pattern of "we know about that problem" without any record of what "that problem" actually is.
+
+---
+
+## Consequences
+
+### Positive
+- Risks persist across conversations and contributors
+- New contributors can quickly understand known architectural weaknesses
+- Audit findings have a concrete landing place
+
+### Negative
+- Register requires maintenance; stale entries reduce trust
+- Risk of "concern inflation" if trivial items are registered at high tiers
+
+---
+
+## References
+
+- `reports/technical_risk_register.md` — the register itself
+- [ADR-004](004_evolution.md) — Evolution rules that influence risk severity
+- [ADR-005](005_testing.md) — Testing gaps are a common risk source
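The concern format in `010_technical_risk_register.md` maps naturally onto a small validated record. The sketch below assumes that `xx` in the ID means two or more digits and that an `Accepted` status requires non-empty Notes (following "explicitly accepted with rationale"); the example's tier, trigger, and source values are illustrative, not taken from the actual register.

```python
import re
from dataclasses import dataclass

STATUSES = ("Open", "Mitigated", "Accepted", "Resolved")
# "C-xx" for concerns, "D-xx" for disagreements; two or more digits is an assumption.
ID_PATTERN = re.compile(r"[CD]-\d{2,}")


@dataclass(frozen=True)
class RegisterEntry:
    """One row of the Technical Risk Register, using the ADR-010 field names."""
    id: str
    tier: int
    title: str
    trigger: str
    source: str
    status: str = "Open"
    notes: str = ""

    def __post_init__(self) -> None:
        if not ID_PATTERN.fullmatch(self.id):
            raise ValueError(f"bad ID {self.id!r}: expected C-xx or D-xx")
        if self.tier not in (1, 2, 3, 4):
            raise ValueError(f"bad tier {self.tier!r}: expected 1 through 4")
        if self.status not in STATUSES:
            raise ValueError(f"bad status {self.status!r}")
        # An accepted risk must document its rationale in Notes.
        if self.status == "Accepted" and not self.notes.strip():
            raise ValueError("Accepted entries need a rationale in notes")

    def to_row(self) -> str:
        """Render the entry as a markdown table row."""
        cells = (self.id, str(self.tier), self.title, self.trigger,
                 self.source, self.status, self.notes)
        return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    # Tier, trigger, and source below are illustrative values only.
    entry = RegisterEntry(
        id="C-03",
        tier=2,
        title="Tests are not wired into CI",
        trigger="A Stable-tier change is merged without a local test run",
        source="repo-assimilation",
    )
    print(entry.to_row())
```

Validating at construction time keeps malformed IDs and unexplained acceptances out of the register, which addresses the "stale entries reduce trust" consequence at the point of entry.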
diff --git a/docs/ADRs/README.md b/docs/ADRs/README.md
@@ -13,7 +13,7 @@ These ADRs define system philosophy and governance:
 - **[ADR-001](001_ontology.md)** — Ontology of the Repository
 - **[ADR-002](002_topology.md)** — Topology and Dependency Rules
 - **[ADR-003](003_authority.md)** — Authority of Declarations Over Inference
-- **ADR-004** — Evolution and Stability *(Deferred)*
+- **[ADR-004](004_evolution.md)** — Rules for Evolution and Stability
 - **[ADR-005](005_testing.md)** — Testing as Mandatory Critical Infrastructure
 - **[ADR-006](006_intent_contracts.md)** — Intent Contracts for Non-Trivial Classes
 - **[ADR-007](007_silicon_agents.md)** — Silicon-Based Agents as Untrusted Contributors
@@ -27,23 +27,27 @@ These ADRs define system philosophy and governance:
 - **Ontology (001)** defines what exists.
 - **Topology (002)** defines structural direction.
 - **Authority (003)** defines who owns meaning.
+- **Evolution (004)** defines stability tiers and change rules.
 - **Boundary Contracts (009)** define interaction rules.
 - **Observability (008)** enforces failure semantics.
 - **Testing (005)** verifies system integrity.
 - **Intent Contracts (006)** bind class-level behavior.
 - **Automation Governance (007)** constrains silicon-based agents.
+- **Risk Register (010)** tracks structural risks.
 
 ---
 
 ## Project-Specific ADRs (010+)
 
-No project-specific ADRs have been written yet. Candidates:
+- **[ADR-010](010_technical_risk_register.md)** — Technical Risk Register as a Governance Artifact
 
-- **ADR-010** — Partition Boundary Semantics (why 121-444/445-492/493-540)
-- **ADR-011** — CM-before-PGM Ensemble Ordering
-- **ADR-012** — Config Key Evolution Policy (how to add new required keys)
-- **ADR-013** — Model Naming Convention and Governance
-- **ADR-014** — Conda Environment Sharing via run.sh
+Candidates for future ADRs:
+
+- Partition Boundary Semantics — ViewsMonth 121 = Jan 1990, 444 = Dec 2016, 492 = Dec 2020, 540 = Dec 2024; rationale for these split points is undocumented (C-21)
+- CM-before-PGM Ensemble Ordering
+- Config Key Evolution Policy (how to add new required keys)
+- Model Naming Convention and Governance
+- Conda Environment Sharing via run.sh
 
 ---
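The ViewsMonth dates quoted in the README candidate list are consistent with a month index in which month 1 is January 1980. The sketch below makes that assumption explicit (it is inferred from the four quoted dates, not stated in this diff) and also checks that the three partition ranges from ADR-004 are contiguous. The function names are hypothetical.

```python
from datetime import date

# Partition boundaries as listed in ADR-004 (inclusive ViewsMonth ranges).
PARTITIONS = [(121, 444), (445, 492), (493, 540)]


def views_month_to_date(month_id: int) -> date:
    """Convert a ViewsMonth index to the first day of that calendar month.

    Assumes month 1 is January 1980, which matches the dates in the README.
    """
    if month_id < 1:
        raise ValueError("ViewsMonth indices start at 1")
    years, month_index = divmod(month_id - 1, 12)
    return date(1980 + years, month_index + 1, 1)


def partitions_are_contiguous(parts: list[tuple[int, int]]) -> bool:
    """True if every range is well-formed and each starts right after the previous one ends."""
    well_formed = all(start <= end for start, end in parts)
    adjacent = all(nxt[0] == cur[1] + 1 for cur, nxt in zip(parts, parts[1:]))
    return well_formed and adjacent


if __name__ == "__main__":
    for start, end in PARTITIONS:
        first = views_month_to_date(start)
        last = views_month_to_date(end)
        print(f"{start}-{end}: {first:%b %Y} to {last:%b %Y} ({end - start + 1} months)")
    print("contiguous:", partitions_are_contiguous(PARTITIONS))
```

Under this assumption the three ranges span 324, 48, and 48 months respectively, with no gaps or overlaps between them.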