3 changes: 0 additions & 3 deletions apis/un_fao/main.py
@@ -1,11 +1,8 @@
import wandb
import warnings
from pathlib import Path
from views_faoapi.managers.model import APIPathManager
from views_faoapi.managers.api import FAOApiManager

warnings.filterwarnings("ignore")

try:
    model_path = APIPathManager(Path(__file__))
except FileNotFoundError as fnf_error:
2 changes: 1 addition & 1 deletion docs/ADRs/001_ontology.md
@@ -20,7 +20,7 @@ The repository recognizes the following ontological categories:
### Domain Entities
| Category | Location | Description |
|----------|----------|-------------|
-| **Models** | `models/*/` | Individual forecasting model launchers (~66). Each is a thin `main.py` + config directory that delegates to an external architecture package. |
+| **Models** | `models/*/` | Individual forecasting model launchers (66 active). Each is a thin `main.py` + config directory that delegates to an external architecture package. |
| **Ensembles** | `ensembles/*/` | Ensemble aggregation launchers (5). Aggregate predictions from constituent models. |

### Configuration Entities
121 changes: 121 additions & 0 deletions docs/ADRs/004_evolution.md
@@ -0,0 +1,121 @@

# ADR-004: Rules for Evolution and Stability

**Status:** Accepted
**Date:** 2026-04-05
**Deciders:** Project maintainers
**Informed:** All contributors

---

## Context

The preceding ADRs establish:

- **ADR-001:** the ontology of the repository (what exists)
- **ADR-002:** the topology of the repository (how components may relate)
- **ADR-003:** semantic authority (who owns meaning and how it is declared)

Together, these decisions define the system's structure and semantics at a point in time.

What they do **not** yet define is how the system is allowed to **change over time**:
- which components are expected to be stable
- which components may evolve freely
- what constitutes a breaking change
- when compatibility guarantees apply
- when a new ADR is required

In views-models, these questions are now concrete:

- 68 models and 5 ensembles share identical partition boundaries across 73 files; a boundary change is a coordinated multi-file update (Risk R1).
- External consumers (the VIEWS platform, UN FAO API) depend on `white_mustang` ensemble output; breaking changes have real downstream cost.
- The config key vocabulary (`regression_targets`, `prediction_format`, `rolling_origin_stride`) is enforced by tests; adding or renaming required keys is a breaking change to all 68+ models.
- Contributors regularly express uncertainty about what is safe to change. Hyperparameters may be changed freely; partition boundaries must never be changed without coordination.

Multiple trigger conditions from the original deferred ADR-004 template are now met.

---

## Decision

The repository adopts a three-tier stability classification for its components:

### Tier 1 — Stable (change requires ADR or explicit team decision)

| Component | Examples | Rationale |
|---|---|---|
| Partition boundaries | `(121, 444)`, `(445, 492)`, `(493, 540)` | Cross-model comparability depends on identical splits |
| Required config keys | `name`, `algorithm`, `level`, `steps`, `time_steps`, `deployment_status` | Enforced by `test_config_completeness.py`; adding/removing breaks all models |
| Config file set | The 6 config files per model | Enforced by `test_model_structure.py`; scaffold builder generates this set |
| CLI argument contract | `-r`, `-t`, `-e`, `-f`, `--sweep` | All `run.sh` and integration tests depend on this interface |
| Deployment status vocabulary | `shadow`, `deployed`, `baseline`, `deprecated` | Enforced by test; production gating depends on it |
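
The Tier 1 rules in the table above can be sketched as plain checks on a model's configuration. This is a minimal illustration, not the repository's test code: the key set, status vocabulary, and boundaries are copied from the table, while `check_tier1`, `partitions_are_contiguous`, and the merged `example` dict (with made-up values) are assumptions.

```python
# Values copied from the Tier 1 table; everything else here is illustrative.
REQUIRED_KEYS = {"name", "algorithm", "level", "steps", "time_steps", "deployment_status"}
DEPLOYMENT_STATUSES = {"shadow", "deployed", "baseline", "deprecated"}
PARTITION_BOUNDARIES = [(121, 444), (445, 492), (493, 540)]


def check_tier1(config: dict) -> list:
    """Return a list of human-readable Tier 1 violations (empty if none)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(config))
    if missing:
        problems.append(f"missing required keys: {missing}")
    status = config.get("deployment_status")
    if status is not None and status not in DEPLOYMENT_STATUSES:
        problems.append(f"unknown deployment_status: {status!r}")
    return problems


def partitions_are_contiguous(boundaries) -> bool:
    """True if every range is non-empty and starts right after the previous one ends."""
    if any(start > end for start, end in boundaries):
        return False
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(boundaries, boundaries[1:]))


# Hypothetical merged config for one model; all values are made up.
example = {
    "name": "example_model",
    "algorithm": "SomeAlgorithm",
    "level": "cm",
    "steps": [1, 2, 3],
    "time_steps": 36,
    "deployment_status": "shadow",
}

print(check_tier1(example))
print(partitions_are_contiguous(PARTITION_BOUNDARIES))
```

Returning a list of problems rather than raising on the first one lets a parametrized test report every violation for a model in a single failure message.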

### Tier 2 — Conventional (change requires updating all models + tests)

| Component | Examples | Rationale |
|---|---|---|
| Model naming convention | `adjective_noun` lowercase | Enforced by `test_model_structure.py`; catalog scripts depend on pattern |
| Directory structure | `configs/`, `artifacts/`, `data/`, `main.py`, `run.sh` | Enforced by tests; scaffold builder generates this layout |
| CLI import pattern | `from views_pipeline_core.cli import ForecastingModelArgs` | Enforced by `test_cli_pattern.py` |
| Ensemble dependency declarations | `config_meta["models"]` list | Enforced by `test_ensemble_configs.py` |
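
Two of the Tier 2 conventions above lend themselves to static checks. The sketch below is an assumption about how such checks might look, not the contents of `test_model_structure.py` or `test_cli_pattern.py`: the regular expression is one possible reading of `adjective_noun` lowercase, and both function names are hypothetical. The import check uses `ast`, so `views_pipeline_core` does not need to be installed.

```python
import ast
import re

# Assumed reading of `adjective_noun`: two lowercase ASCII words joined by one underscore.
MODEL_NAME_RE = re.compile(r"[a-z]+_[a-z]+")


def is_valid_model_name(name: str) -> bool:
    """True if the whole name matches the assumed naming pattern."""
    return MODEL_NAME_RE.fullmatch(name) is not None


def uses_cli_import(source: str) -> bool:
    """True if source has `from views_pipeline_core.cli import ForecastingModelArgs`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == "views_pipeline_core.cli":
            if any(alias.name == "ForecastingModelArgs" for alias in node.names):
                return True
    return False


print(is_valid_model_name("electric_relaxation"))
print(uses_cli_import("from views_pipeline_core.cli import ForecastingModelArgs\n"))
```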

### Tier 3 — Volatile (changed freely by model owners)

| Component | Examples | Rationale |
|---|---|---|
| Hyperparameters | All keys in `config_hyperparameters.py` beyond `steps`/`time_steps` | Algorithm-specific; model owner's domain |
| Querysets | Feature selection and transformation chains in `config_queryset.py` | Model owner's domain |
| W&B experiment tracking | Run names, tags, logging frequency | Operational convenience |
| Model-specific README content | Beyond scaffold-generated sections | Documentation convenience |

---

## Rationale

The three-tier model makes the cost of change explicit:

- **Stable** components have high coordination cost and downstream impact. Changes require an ADR or explicit team decision, plus updates to all affected models and tests.
- **Conventional** components have moderate coordination cost. Changes propagate across the model zoo but don't affect external consumers.
- **Volatile** components are model-local. No coordination required.

This classification reflects the existing reality (tests already enforce Stable and Conventional tiers) while making the rules discoverable for contributors.

---

## Consequences

### Positive
- Contributors can immediately determine whether a change is safe to make unilaterally
- The cost of adding new required config keys is made explicit before the change is attempted
- Partition boundary changes are recognized as architectural events, not routine updates

### Negative
- Stable components resist change even when change is desirable — the coordination cost is real
- The 73-file partition duplication (intentional per ADR-002) amplifies the cost of Stable-tier changes
- Model owners may be tempted to treat Conventional components as Volatile; tests are the enforcement mechanism

---

## Implementation Notes

- Stability tiers are enforced primarily by the test suite, not by tooling
- The integration test runner (`run_integration_tests.sh`) provides behavioral verification but is not in CI; Stable-tier changes should include an integration test run
- ADR-001 already defines a stability classification consistent with these tiers; this ADR makes the rules actionable

---

## Open Questions

- Should partition boundary changes require a formal migration tool (updating all 73 files atomically)?
- Should there be a deprecation protocol for removing models (currently only `electric_relaxation` is deprecated)?
- Should Tier 2 changes require a PR review from a specific set of maintainers?

---

## References

- [ADR-001](001_ontology.md) — Ontology stability levels
- [ADR-002](002_topology.md) — Self-contained config files (why duplication is intentional)
- [ADR-003](003_authority.md) — Authority of declarations
- [ADR-005](005_testing.md) — Testing enforces tiers
- [ADR-009](009_boundary_contracts.md) — Boundary contracts define the Stable-tier interface
16 changes: 10 additions & 6 deletions docs/ADRs/005_testing.md
@@ -25,12 +25,12 @@ We adopt a three-team testing taxonomy:
|------|---------|----------------------|
| **Green** (Correctness) | Verify the system works as intended | `test_config_completeness.py` — required keys exist, values are valid |
| **Beige** (Convention) | Catch configuration drift and convention violations | `test_model_structure.py` — naming, file presence; `test_config_partitions.py` — delegation to shared module; `test_cli_pattern.py` — CLI import consistency |
-| **Red** (Adversarial) | Expose failure modes by testing edge cases | Not yet implemented — future work |
+| **Red** (Adversarial) | Expose failure modes by testing edge cases | `test_failure_modes.py` — config loading error paths |

### Test Design Principles

1. **Tests must run without ML dependencies** — Tests parse source code and use `importlib.util` to load config modules, avoiding dependency on `views_pipeline_core`, `ingester3`, or algorithm packages.
-2. **Tests are parametrized over all models** — Every test runs against all ~66 models, catching drift immediately.
+2. **Tests are parametrized over all models** — Every test runs against all 66 models, catching drift immediately.
3. **Tests run fast** — The full suite completes in ~2 seconds.
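
The first principle above relies on loading a config file by path rather than importing it through its package. A minimal sketch of that technique with the standard-library `importlib.util` follows. The helper name `load_config_module`, the accessor name `get_meta_config`, and the file contents are assumptions made for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config_module(path):
    """Load a single .py file as a module without importing its package."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs only this file's top-level code
    return module


# Usage with a throwaway config file; `get_meta_config` is a hypothetical accessor name.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config_meta.py"
    config_path.write_text(
        "def get_meta_config():\n"
        "    return {'name': 'example_model', 'algorithm': 'SomeAlgorithm'}\n"
    )
    meta = load_config_module(config_path).get_meta_config()

print(meta["name"])
```

This approach avoids heavy dependencies only when the config file itself does not import them at module level, because `exec_module` still executes the file's top-level statements.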

### Current Test Suite
@@ -42,6 +42,10 @@ We adopt a three-team testing taxonomy:
| `tests/test_model_structure.py` | Beige | Naming convention, required files, config directory structure |
| `tests/test_cli_pattern.py` | Beige | New CLI import pattern, no explicit `wandb.login()` |
| `tests/test_catalogs.py` | Green | No `exec()` usage, markdown generation correctness |
+| `tests/test_ensemble_configs.py` | Green | Ensemble structure, required keys, constituent model existence and level consistency |
+| `tests/test_darts_reproducibility.py` | Green | DARTS reproducibility gate parameter completeness (skipped without `views_r2darts2`) |
+| `tests/test_algorithm_coherence.py` | Beige | Algorithm-to-package mapping, requirements.txt consistency with main.py imports |
+| `tests/test_failure_modes.py` | Red | Config loading error paths: syntax errors, missing functions, non-existent files |
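
The three error paths named for `test_failure_modes.py` (syntax errors, missing functions, non-existent files) can be illustrated with a small classifier. This is a sketch under assumptions, not the actual test file: `classify_config_load`, `demo`, and the accessor name `get_meta_config` are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def classify_config_load(path, required_function):
    """Return 'ok', 'missing_file', 'syntax_error', or 'missing_function'."""
    path = Path(path)
    if not path.is_file():
        return "missing_file"
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except SyntaxError:
        return "syntax_error"
    if not callable(getattr(module, required_function, None)):
        return "missing_function"
    return "ok"


def demo():
    """Exercise the success path and the three error paths on temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        good = tmp / "good.py"
        good.write_text("def get_meta_config():\n    return {}\n")
        broken = tmp / "broken.py"
        broken.write_text("def get_meta_config(:\n")  # deliberate syntax error
        no_func = tmp / "no_func.py"
        no_func.write_text("X = 1\n")
        absent = tmp / "absent.py"  # never created
        return {
            name: classify_config_load(p, "get_meta_config")
            for name, p in [("good", good), ("broken", broken),
                            ("no_func", no_func), ("absent", absent)]
        }


print(demo())
```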

### Test Requirements for Changes

@@ -53,11 +57,11 @@ We adopt a three-team testing taxonomy:

## Known Gaps

-- No red-team (adversarial) tests yet
 - Catalog generation function tests require `views_pipeline_core` (skipped in most dev environments)
-- No cross-validation between `config_meta.algorithm` and `main.py` manager import
-- No ensemble config tests
-- Tests are not wired into CI (`.github/workflows/`)
+- DARTS reproducibility tests require `views_r2darts2` (skipped without it); no equivalent for stepshifter or baseline
+- Tests are not wired into CI (`.github/workflows/`) — see Risk Register C-03
+- No static validation of queryset correctness — see Risk Register C-02
+- Red-team coverage is limited to config loading infrastructure; no adversarial tests for runtime behavior

---

90 changes: 90 additions & 0 deletions docs/ADRs/010_technical_risk_register.md
@@ -0,0 +1,90 @@

# ADR-010: Technical Risk Register as a Governance Artifact

**Status:** Accepted
**Date:** 2026-04-05
**Deciders:** Project maintainers
**Informed:** All contributors

---

## Context

Repository assimilation (April 2026) identified 11 structural risks in views-models, ranging from partition coordination fragility (high severity) to untested scaffold builders (medium severity). These risks are architectural — they emerge from design decisions, not from bugs in any single file.

Without a persistent, structured register, risks are:
- Discovered during audits but forgotten between them
- Discussed informally but not tracked to resolution
- Rediscovered by new contributors who lack context on prior analysis

---

## Decision

The repository maintains a **Technical Risk Register** at `reports/technical_risk_register.md`.

### Concern Format

Each entry uses this format:

| Field | Description |
|---|---|
| **ID** | `C-xx` for concerns, `D-xx` for disagreements |
| **Tier** | 1 (critical) through 4 (informational) |
| **Title** | Short description |
| **Trigger** | The specific circumstance under which the risk becomes actionable |
| **Source** | How this concern was identified (e.g., repo-assimilation, expert review, incident) |
| **Status** | Open, Mitigated, Accepted, Resolved |
| **Notes** | Additional context, references to related ADRs or PRs |
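
The concern format above maps naturally onto a small validated record. The sketch below is illustrative only: the class name `RegisterEntry`, the reading of `xx` as exactly two digits, and the example entry `C-99` are assumptions, and the register itself is a Markdown file rather than Python objects.

```python
import re
from dataclasses import dataclass

STATUSES = {"Open", "Mitigated", "Accepted", "Resolved"}
ID_RE = re.compile(r"[CD]-\d{2}")  # assumption: "xx" means exactly two digits


@dataclass(frozen=True)
class RegisterEntry:
    id: str
    tier: int
    title: str
    trigger: str
    source: str
    status: str = "Open"
    notes: str = ""

    def __post_init__(self):
        # Validate against the field descriptions in the concern format table.
        if not ID_RE.fullmatch(self.id):
            raise ValueError(f"id must look like C-xx or D-xx, got {self.id!r}")
        if self.tier not in (1, 2, 3, 4):
            raise ValueError(f"tier must be 1 through 4, got {self.tier!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}")


# Hypothetical entry; none of these values come from the real register.
entry = RegisterEntry(
    id="C-99",
    tier=3,
    title="Example concern",
    trigger="Example trigger condition",
    source="expert review",
)
print(entry.status)
```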

### Tier Definitions

| Tier | Meaning | Response |
|---|---|---|
| **1** | Critical structural risk; failure would affect multiple models or external consumers | Must be addressed or explicitly accepted with rationale |
| **2** | Significant risk; failure would affect a class of models or a governance mechanism | Should be addressed in the next development cycle |
| **3** | Moderate risk; failure would cause inconvenience or require manual intervention | Address when adjacent work touches the area |
| **4** | Informational; noted for awareness | No action required unless promoted |

### When Entries Are Added

Concerns are opened during:
- Repository assimilation audits
- Expert code reviews
- Tech debt cleanup sessions
- Falsification audits
- Incident post-mortems

### When Entries Are Closed

Concerns are closed when:
- The underlying risk is eliminated (code change + test)
- The risk is explicitly accepted with rationale (documented in Notes)
- The risk is superseded by a different concern

---

## Rationale

A structured register makes risks visible, trackable, and reviewable. It prevents the pattern of "we know about that problem" without any record of what "that problem" actually is.

---

## Consequences

### Positive
- Risks persist across conversations and contributors
- New contributors can quickly understand known architectural weaknesses
- Audit findings have a concrete landing place

### Negative
- Register requires maintenance; stale entries reduce trust
- Risk of "concern inflation" if trivial items are registered at high tiers

---

## References

- `reports/technical_risk_register.md` — the register itself
- [ADR-004](004_evolution.md) — Evolution rules that influence risk severity
- [ADR-005](005_testing.md) — Testing gaps are a common risk source
18 changes: 11 additions & 7 deletions docs/ADRs/README.md
@@ -13,7 +13,7 @@ These ADRs define system philosophy and governance:
- **[ADR-001](001_ontology.md)** — Ontology of the Repository
- **[ADR-002](002_topology.md)** — Topology and Dependency Rules
- **[ADR-003](003_authority.md)** — Authority of Declarations Over Inference
-- **ADR-004** — Evolution and Stability *(Deferred)*
+- **[ADR-004](004_evolution.md)** — Rules for Evolution and Stability
- **[ADR-005](005_testing.md)** — Testing as Mandatory Critical Infrastructure
- **[ADR-006](006_intent_contracts.md)** — Intent Contracts for Non-Trivial Classes
- **[ADR-007](007_silicon_agents.md)** — Silicon-Based Agents as Untrusted Contributors
@@ -27,23 +27,27 @@ These ADRs define system philosophy and governance:
- **Ontology (001)** defines what exists.
- **Topology (002)** defines structural direction.
- **Authority (003)** defines who owns meaning.
+- **Evolution (004)** defines stability tiers and change rules.
- **Boundary Contracts (009)** define interaction rules.
- **Observability (008)** enforces failure semantics.
- **Testing (005)** verifies system integrity.
- **Intent Contracts (006)** bind class-level behavior.
- **Automation Governance (007)** constrains silicon-based agents.
+- **Risk Register (010)** tracks structural risks.

---

## Project-Specific ADRs (010+)

-No project-specific ADRs have been written yet. Candidates:
+- **[ADR-010](010_technical_risk_register.md)** — Technical Risk Register as a Governance Artifact

-- **ADR-010** — Partition Boundary Semantics (why 121-444/445-492/493-540)
-- **ADR-011** — CM-before-PGM Ensemble Ordering
-- **ADR-012** — Config Key Evolution Policy (how to add new required keys)
-- **ADR-013** — Model Naming Convention and Governance
-- **ADR-014** — Conda Environment Sharing via run.sh
+Candidates for future ADRs:
+
+- Partition Boundary Semantics — ViewsMonth 121 = Jan 1990, 444 = Dec 2016, 492 = Dec 2020, 540 = Dec 2024; rationale for these split points is undocumented (C-21)
+- CM-before-PGM Ensemble Ordering
+- Config Key Evolution Policy (how to add new required keys)
+- Model Naming Convention and Governance
+- Conda Environment Sharing via run.sh
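
The ViewsMonth values quoted in the first candidate above are consistent with a simple month counter in which id 1 is January 1980. That origin is an assumption inferred from the four stated mappings, and the function name is hypothetical. A short sketch of the arithmetic:

```python
def views_month_to_year_month(month_id: int) -> tuple:
    """Convert a ViewsMonth id to (year, month), assuming id 1 is January 1980."""
    if month_id < 1:
        raise ValueError("month_id must be >= 1")
    years, month_index = divmod(month_id - 1, 12)
    return 1980 + years, month_index + 1


for month_id in (121, 444, 492, 540):
    print(month_id, views_month_to_year_month(month_id))
# 121 (1990, 1)
# 444 (2016, 12)
# 492 (2020, 12)
# 540 (2024, 12)
```

Under the same assumption, each partition begins in January (445 is January 2017, 493 is January 2021), so every split point falls on a calendar-year boundary.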

---
