Merged
27 commits
f3c99b5
feat(purple_alien): align config_sweep with operational hyperparameters
Polichinel Feb 24, 2026
554f61a
fix(purple_alien): add missing 'name' key to sweep_config
Polichinel Feb 24, 2026
32e9215
fix(baselines): rename config key months → window_months, add time_steps
Polichinel Mar 14, 2026
94aad18
feat(configs): declare prediction_format in all model configs
Polichinel Mar 15, 2026
b04a97b
feat: add test suite, base_docs governance, fix config gaps and archi…
Polichinel Mar 15, 2026
66cab0b
chore: remove obsolete debug scripts and archived logs
Polichinel Mar 15, 2026
773c9f2
fix(tests): resolve ruff lint errors in test suite
Polichinel Mar 15, 2026
654b56e
feat: add integration test runner script
Polichinel Mar 15, 2026
df2c6ee
fix(models): add missing requirements.txt for 33 models
Polichinel Mar 15, 2026
5a2fd2e
fix(integration-tests): use single conda env, exclude purple_alien
Polichinel Mar 15, 2026
be49655
fix(configs): rename targets→regression_targets, metrics→regression_p…
Polichinel Mar 16, 2026
b75097b
fix(darts): add missing ReproducibilityGate parameters to 6 models
Polichinel Mar 16, 2026
90e3753
fix(darts): add missing architecture-specific params to 4 models
Polichinel Mar 16, 2026
f2f7cbc
fix(configs): fix queryset typo, migrate ensemble target/metrics keys
Polichinel Mar 16, 2026
e2d53e7
refactor(docs): merge base_docs/ into docs/, add reports/, remove scr…
Polichinel Mar 16, 2026
e08f901
feat(tests): add regression-prevention tests for top 3 review gaps
Polichinel Mar 16, 2026
88eef88
refactor(tests): import ReproducibilityGate params from views_r2darts2
Polichinel Mar 16, 2026
b580043
fix(docs): remove stale common/ reference from ADR-001
Polichinel Mar 16, 2026
8a5641b
fix(docs): remove 8 stale common/ references from governance docs
Polichinel Mar 16, 2026
f246257
feat(models): commit remaining ranger model files
Polichinel Mar 16, 2026
be69ee9
feat(integration-tests): add --level flag to filter by cm/pgm
Polichinel Mar 16, 2026
3922297
docs(integration-tests): add full guide and link from README
Polichinel Mar 17, 2026
f27facf
feat(integration-tests): add --library flag to filter by architecture…
Polichinel Mar 17, 2026
d3e6805
fix(docs): add --library flag to README integration testing table
Polichinel Mar 17, 2026
c846ca2
fix(configs): remove duplicate import in blank_space config_queryset
Polichinel Mar 17, 2026
b8709d8
fix(docs): remove hardcoded model count from integration test guide
Polichinel Mar 17, 2026
231a3ae
feat(ci): add pytest workflow for push and PR gates
Polichinel Mar 17, 2026
22 changes: 22 additions & 0 deletions .github/workflows/run_tests.yml
@@ -0,0 +1,22 @@
name: Run Tests

on:
  push:
    branches: [main, development]
  pull_request:
    branches: [main, development]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install views_pipeline_core pytest

      - name: Run tests
        run: pytest
6 changes: 5 additions & 1 deletion .gitignore
@@ -6,6 +6,9 @@
# But please, take a second to consult with the team before doing so anyways.


# Integration test logs
logs/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -86,6 +89,7 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache/
cover/

# Translations
@@ -245,7 +249,7 @@ cython_debug/
*.bak

# txt logs
# *.txt
*.txt

# logs
*.log
36 changes: 36 additions & 0 deletions README.md
@@ -39,6 +39,7 @@ APPWRITE_DATASTORE_PROJECT_ID=""
- [Ensemble scripts](#ensemble-scripts)
- [Ensemble filesystem](#ensemble-filesystem)
- [Running an ensemble](#running-an-ensemble)
- [Integration Testing](#integration-testing)
- [Implemented Models](#implemented-models)
- [Model Catalogs](#catalogs)
- [Country-Month Models](#country-month-model-catalog)
@@ -332,6 +333,41 @@ Consequently, in order to train a model and generate predictions, execute either
As of now, the only implemented model architecture is the [stepshifter model](https://github.com/views-platform/views-stepshifter/blob/main/README.md). Experienced users can develop their own model architectures, including custom model class managers. Head over to [views-pipeline-core](https://github.com/views-platform/views-pipeline-core) for further information on the model class manager and on how to develop new model architectures.


## Integration Testing
<a name="integration-testing"></a>

The repository includes an integration test runner that verifies that models haven't been broken by changes in this repo or in upstream/downstream packages. It trains and evaluates every model end-to-end on the calibration and validation partitions, running them sequentially in a single shared conda environment, and produces a summary table of PASS/FAIL/TIMEOUT results with per-model logs.

```bash
# Run all models (calibration + validation)
bash run_integration_tests.sh

# Run only country-month models
bash run_integration_tests.sh --level cm

# Run only baseline models
bash run_integration_tests.sh --library baseline

# Run specific models with a custom timeout
bash run_integration_tests.sh --models "counting_stars bad_blood" --timeout 3600
```

| Flag | Default | Description |
|------|---------|-------------|
| `--models "m1 m2"` | all models | Run only these models |
| `--level` `cm` or `pgm` | no filter | Run only models at this level of analysis |
| `--library NAME` | no filter | Run only models using this library (baseline/stepshifter/r2darts2/hydranet) |
| `--exclude "m1 m2"` | `"purple_alien"` | Skip these models (replaces the default, does not append) |
| `--partitions "p1 p2"` | `"calibration validation"` | Partitions to test |
| `--timeout SECONDS` | `1800` | Max wall-clock time per model run |
| `--env NAME` | `views_pipeline` | Conda environment to activate |

Logs are written to `logs/integration_test_<timestamp>/` with a `summary.log` and per-model logs under `{partition}/{model}.log`.

For the full guide — including how model discovery works, how to read failure logs, and important caveats — see [docs/run_integration_tests.md](docs/run_integration_tests.md).
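The summary log can also be post-processed programmatically, e.g. to fail a CI job when any model fails. A minimal sketch in Python — the exact line format of `summary.log` is an assumption here (model name followed by a PASS/FAIL/TIMEOUT verdict token); adjust the pattern to the real output:

```python
import re
from collections import Counter

def tally_results(summary_lines):
    """Count PASS/FAIL/TIMEOUT verdicts in integration-test summary lines.

    Assumes each result line contains one of the three verdict tokens,
    e.g. "counting_stars  calibration  PASS". Lines without a verdict
    (headers, separators) are skipped.
    """
    verdicts = Counter()
    for line in summary_lines:
        match = re.search(r"\b(PASS|FAIL|TIMEOUT)\b", line)
        if match:
            verdicts[match.group(1)] += 1
    return dict(verdicts)

# Demo with fabricated log lines:
sample = [
    "counting_stars  calibration  PASS",
    "bad_blood       calibration  FAIL",
    "bad_blood       validation   TIMEOUT",
]
print(tally_results(sample))  # {'PASS': 1, 'FAIL': 1, 'TIMEOUT': 1}
```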

---

## Implemented Models

In addition to making it easy to create new models and ensembles, the views-models repository maintains model catalogs that give an organized overview of all implemented models. The catalog information is collected from each model's metadata and includes:
85 changes: 0 additions & 85 deletions compare_configs.py

This file was deleted.

38 changes: 9 additions & 29 deletions create_catalogs.py
@@ -1,4 +1,5 @@
import os
import importlib.util
import logging
logging.basicConfig(
level=logging.ERROR, format="%(asctime)s %(name)s - %(levelname)s - %(message)s"
@@ -38,27 +39,26 @@ def extract_models(model_class):
"""

model_dict = {}
tmp_dict = {}
config_meta = os.path.join(model_class.configs, 'config_meta.py')
config_deployment = os.path.join(model_class.configs, 'config_deployment.py')
config_hyperparameters = os.path.join(model_class.configs, 'config_hyperparameters.py')


if os.path.exists(config_meta):
logging.info(f"Found meta config: {config_meta}")
with open(config_meta, 'r') as file:
code = file.read()
exec(code, {}, tmp_dict)
model_dict.update(tmp_dict['get_meta_config']())
spec = importlib.util.spec_from_file_location("config_meta", config_meta)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_dict.update(module.get_meta_config())
model_dict['queryset'] = create_link(model_dict['queryset'], model_class.queryset_path) if 'queryset' in model_dict else 'None'


if os.path.exists(config_deployment):
logging.info(f"Found deployment config: {config_deployment}")
with open(config_deployment, 'r') as file:
code = file.read()
exec(code, {}, tmp_dict)
model_dict.update(tmp_dict['get_deployment_config']())
spec = importlib.util.spec_from_file_location("config_deployment", config_deployment)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_dict.update(module.get_deployment_config())

if os.path.exists(config_hyperparameters):
logging.info(f"Found hyperparameters config: {config_hyperparameters}")
@@ -192,9 +192,6 @@ def replace_table_in_section(content, section_name, new_table):


if __name__ == "__main__":
#import time
#start_time = time.time()

models_list_cm = []
models_list_pgm = []
ensemble_list = []
@@ -224,20 +221,6 @@ def replace_table_in_section(content, section_name, new_table):



# markdown_table_pgm = generate_markdown_table(models_list_pgm)
# with open('pgm_model_catalog.md', 'w') as f:
# f.write(markdown_table_pgm)

# markdown_table_cm = generate_markdown_table(models_list_cm)
# with open('cm_model_catalog.md', 'w') as f:
# f.write(markdown_table_cm)

# markdown_table_ensembles = generate_markdown_table(ensemble_list)
# with open('ensembles_catalog.md', 'w') as f:
# f.write(markdown_table_ensembles)





markdown_table_cm = generate_markdown_table(models_list_cm)
@@ -252,6 +235,3 @@ def replace_table_in_section(content, section_name, new_table):
markdown_table_ensembles,
)


#print("--- %s seconds ---" % (time.time() - start_time))
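The diff above replaces `exec()` of raw config text with `importlib.util`-based module loading, which is safer and gives real tracebacks pointing at the config file. The pattern as a standalone sketch — the helper name `load_config_function` is illustrative, not part of the repo:

```python
import importlib.util
import os
import tempfile

def load_config_function(path, func_name):
    """Load a Python config module from an arbitrary file path and return
    one of its functions, mirroring the importlib-based loading now used
    in create_catalogs.py instead of exec()."""
    spec = importlib.util.spec_from_file_location("config_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config file as a module
    return getattr(module, func_name)

# Self-contained demo with a throwaway config file:
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config_meta.py")
    with open(cfg_path, "w") as f:
        f.write("def get_meta_config():\n    return {'name': 'demo_model'}\n")
    get_meta_config = load_config_function(cfg_path, "get_meta_config")
    print(get_meta_config())  # {'name': 'demo_model'}
```

Unlike `exec(code, {}, tmp_dict)`, a failure inside the config raises an exception whose traceback names the actual file and line.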

79 changes: 79 additions & 0 deletions docs/ADRs/000_use_of_adrs.md
@@ -0,0 +1,79 @@
# ADR-000: Use of Architecture Decision Records (ADRs)

**Status:** Accepted
**Date:** 2026-03-15
**Deciders:** Simon (project maintainer)
**Informed:** All contributors

---

## Context

views-models is a monorepo containing ~66 forecasting models, 5 ensembles, data extractors, postprocessors, and tooling for the VIEWS conflict prediction platform. The repository has multiple contributors, evolving conventions, and a history of implicit decisions that have led to architectural drift (e.g., two CLI patterns, duplicated partition configs, inconsistent config keys).

Without a shared record of *why* decisions were made, the project risks:
- Re-litigating settled questions (e.g., why all models use the same partition boundaries)
- Accidental reversals of critical design choices
- Accumulating invisible technical debt
- Losing institutional memory as contributors change

---

## Decision

We will use **Architecture Decision Records (ADRs)** to document significant technical, architectural, and conceptual decisions in this project.

ADRs are:
- Written in Markdown
- Stored in the repository under `docs/ADRs/`
- Numbered sequentially
- Treated as first-class project artifacts

---

## When to Write an ADR

Write an ADR when making a decision that:
- Affects model configuration conventions or required config keys
- Changes partition boundaries, training windows, or evaluation methodology
- Introduces new shared infrastructure or conventions
- Changes the CLI API pattern or model launcher conventions
- Modifies ensemble reconciliation logic or CM/PGM ordering
- Affects the CI/CD pipeline or catalog generation

Do **not** write ADRs for:
- Adding a new model that follows existing conventions
- Routine hyperparameter changes within a single model
- Documentation-only changes

---

## Lifecycle

- **Proposed** — decision under consideration
- **Accepted** — decision is active and authoritative
- **Superseded** — replaced by a newer ADR
- **Deprecated** — decision remains but should no longer be used

Decisions are never deleted. If a decision changes, it is **superseded**, not erased.

---

## Consequences

### Positive
- Clearer decision-making across a multi-contributor forecasting platform
- Fewer repeated debates about config conventions
- Easier onboarding for new model developers
- Better long-term coherence as the model zoo grows

### Negative
- Small upfront cost in writing
- Requires discipline to maintain

---

## References

- `docs/ADRs/adr_template.md`
- `docs/ADRs/README.md`