Observer Strategic Backlog

This file tracks what Observer is buying next.

Observer is no longer best understood as "a test harness with extra features".

It is becoming a deterministic verification platform with explicit contracts, product certification, derived analytics, and repo-owned operational truth.

That means the backlog should not be driven by "what other runners have".

It should be driven by the platform capabilities that make verification durable, comparable, governable, and trustworthy.

Direction

The current strategic direction is:

  • verification as an explicit contract, not discovery folklore
  • product certification as a first-class platform layer
  • deterministic and canonical artifacts as the source of truth
  • comparison, history, and drift as first-class product concerns
  • policy and provenance as the path from "green" to "trustworthy"

Backlog Rules

Any backlog item that violates these rules should be rejected:

  • determinism is the gate
  • canonical contracts outrank convenience sugar
  • tracked identity must come from explicit data, not heuristic inference
  • product semantics belong in Observer contracts, not shell glue
  • explanation should be mechanically derived from evidence already present in the model

Release Readiness: crates.io Push

Goal:

  • publish the reusable crates with deterministic metadata and no licensing ambiguity

Why this matters:

  • crates.io publication is now blocked by mechanical issues, not by missing product direction
  • the publish set should be explicit and reproducible rather than inferred ad hoc at release time

Backlog:

  • Add explicit SPDX headers to repo-owned core platform source files and verification scripts.
  • Add explicit internal dependency versions for publishable path dependencies.
  • Add repository, homepage, and package descriptions to crate manifests.
  • Mark observer-selftest as internal-only rather than part of the public publish set.
  • Decide the intended public publish set and publish order for observer-core, observer-rust, observer-rust-host, and observer-rust-lib.
  • Resolve the crates.io namespace plan for the CLI crate: publish the package as frogfish-observer while keeping the executable name observer.
  • Run final dry-run publication checks for the intended publish set after naming is settled.

Guardrails:

  • no publishing of crates whose names or ownership are still ambiguous
  • no release artifact whose license metadata disagrees with the repository split-license policy

Priority 1: Verification Provenance And Attestation

Goal:

  • make every product and stage result attestable

Why this matters:

  • a pass result should mean something stronger than "it happened on one machine"
  • teams should be able to ask exactly what contract was satisfied, with what toolchain, against what evidence

Backlog:

  • Specify canonical verification provenance records for runs and products.
  • Record exact provider host identity, tool build stamp, and config identity alongside suite and inventory identity.
  • Define a config normalization and hashing contract where one does not yet exist.
  • Define artifact lineage fields strong enough to say which produced artifacts were actually certified.
  • Expose provenance cleanly in derived product artifacts and explorer views.

Non-goals:

  • no vague "environment fingerprint" heuristics
  • no provenance fields that are unstable across equivalent runs
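
One way to picture the config normalization and hashing contract is the sketch below. It is an illustration only: the function names are hypothetical, the config is assumed to be flat key/value pairs, and FNV-1a stands in for whatever content hash Observer actually chooses. The point is that equivalent configs lower to the same bytes and therefore the same identity.

```rust
use std::collections::BTreeMap;

/// Hypothetical normalization: trim whitespace and sort keys.
/// Later duplicates overwrite earlier ones here; a real contract
/// would more likely reject duplicate keys outright.
pub fn normalize_config(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Render the normalized config as one canonical byte string.
pub fn canonical_bytes(config: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = Vec::new();
    for (k, v) in config {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        out.extend_from_slice(&(k.len() as u64).to_be_bytes());
        out.extend_from_slice(k.as_bytes());
        out.extend_from_slice(&(v.len() as u64).to_be_bytes());
        out.extend_from_slice(v.as_bytes());
    }
    out
}

/// FNV-1a 64-bit: a dependency-free stand-in for a real content hash.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Config identity: hash of the canonical bytes of the normalized config.
pub fn config_identity(pairs: &[(&str, &str)]) -> u64 {
    fnv1a64(&canonical_bytes(&normalize_config(pairs)))
}
```

Under this sketch, reordering keys or adding incidental whitespace does not change the identity, while changing any value does.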

Priority 2: First-Class Regression And Degradation Semantics

Goal:

  • move beyond pass or fail into mechanically derived change judgment

Why this matters:

  • the most expensive verification failures are often green builds that are silently getting worse

Backlog:

  • Specify first-class comparison result kinds such as regressed, drifted, missing_target, new_target, and artifact_contract_changed.
  • Extend compare artifacts to distinguish correctness failure from degradation and drift.
  • Define target-set delta semantics for disappearance, addition, and explicit rename mapping.
  • Define telemetry regression semantics that stay non-canonical by default but are still comparable.
  • Define artifact contract drift semantics for schema and structural output changes.

Guardrails:

  • no heuristic target matching
  • explicit tracked identity or explicit mapping only
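
A minimal sketch of target-set delta semantics with explicit rename mapping follows. The type and function names are hypothetical, and only three result kinds are modeled. A rename is reported only when the mapping is declared and both ends check out; nothing is inferred from name similarity.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical subset of comparison result kinds for target sets.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TargetDelta {
    MissingTarget(String),
    NewTarget(String),
    Renamed { from: String, to: String },
}

/// Compare two tracked target sets. `renames` is an explicit, declared
/// mapping from baseline identity to candidate identity.
pub fn target_set_delta(
    baseline: &BTreeSet<String>,
    candidate: &BTreeSet<String>,
    renames: &BTreeMap<String, String>,
) -> Vec<TargetDelta> {
    let mut deltas = Vec::new();
    let mut explained_new = BTreeSet::new();

    // Targets present in the baseline but absent from the candidate.
    for id in baseline.difference(candidate) {
        match renames.get(id) {
            // Accept a rename only if the declared destination is
            // really new in the candidate.
            Some(to) if candidate.contains(to) && !baseline.contains(to) => {
                explained_new.insert(to.clone());
                deltas.push(TargetDelta::Renamed {
                    from: id.clone(),
                    to: to.clone(),
                });
            }
            _ => deltas.push(TargetDelta::MissingTarget(id.clone())),
        }
    }

    // Targets present only in the candidate and not explained by a rename.
    for id in candidate.difference(baseline) {
        if !explained_new.contains(id) {
            deltas.push(TargetDelta::NewTarget(id.clone()));
        }
    }
    deltas
}
```

Because both sets are ordered, the output order is stable across equivalent runs. Split and merge mappings are left out of this sketch.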

Priority 3: Verification Policy

Goal:

  • let teams express which verification changes are acceptable and which are release blockers

Why this matters:

  • this is where Observer becomes a release gate brain instead of a report generator

Backlog:

  • Specify a policy surface above compare, product, and artifact checks.
  • Support rules such as "no tracked target may disappear", "compare delta must be empty", and "telemetry regressions above threshold fail release".
  • Support policy severity levels such as pass, warning, and fail.
  • Define how policy results compose into product certification.
  • Emit policy verdicts as structured machine-readable artifacts, not console-only text.

Guardrails:

  • policy must be explicit and declarative
  • policy evaluation must be reproducible from declared inputs only
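
The rules above could be modeled roughly as follows. This is a sketch under assumptions: the `Rule`, `CompareSummary`, and `Verdict` shapes are invented for illustration and are not Observer's policy surface. It shows policy as plain declared data, evaluated only from declared inputs, producing structured verdicts.

```rust
/// Severity levels, ordered so that the worst one wins on rollup.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Pass,
    Warning,
    Fail,
}

/// Hypothetical declared inputs to policy evaluation.
pub struct CompareSummary {
    pub missing_targets: usize,
    pub delta_count: usize,
    pub telemetry_regression_pct: u32,
}

/// Hypothetical declarative rules mirroring the examples above.
pub enum Rule {
    NoTrackedTargetMayDisappear,
    CompareDeltaMustBeEmpty { on_violation: Severity },
    TelemetryRegressionAbove { threshold_pct: u32 },
}

/// One structured verdict per rule, suitable for emitting as an artifact.
#[derive(Debug, PartialEq, Eq)]
pub struct Verdict {
    pub rule: &'static str,
    pub severity: Severity,
}

/// Evaluate every rule against the declared inputs, in rule order.
pub fn evaluate(rules: &[Rule], input: &CompareSummary) -> Vec<Verdict> {
    rules
        .iter()
        .map(|rule| match rule {
            Rule::NoTrackedTargetMayDisappear => Verdict {
                rule: "no_tracked_target_may_disappear",
                severity: if input.missing_targets == 0 {
                    Severity::Pass
                } else {
                    Severity::Fail
                },
            },
            Rule::CompareDeltaMustBeEmpty { on_violation } => Verdict {
                rule: "compare_delta_must_be_empty",
                severity: if input.delta_count == 0 {
                    Severity::Pass
                } else {
                    *on_violation
                },
            },
            Rule::TelemetryRegressionAbove { threshold_pct } => Verdict {
                rule: "telemetry_regression_above_threshold",
                severity: if input.telemetry_regression_pct > *threshold_pct {
                    Severity::Fail
                } else {
                    Severity::Pass
                },
            },
        })
        .collect()
}

/// Compose verdicts: the overall result is the worst individual severity.
pub fn overall(verdicts: &[Verdict]) -> Severity {
    verdicts
        .iter()
        .map(|v| v.severity)
        .max()
        .unwrap_or(Severity::Pass)
}
```

Taking the maximum severity is one possible composition rule; how policy results actually compose into product certification is still an open backlog item.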

Priority 4: Longitudinal Tracked Identity

Goal:

  • make Observer history-bearing for durable verification targets

Why this matters:

  • teams need to know what happened to one target across many builds, not just one run in isolation

Backlog:

  • Specify tracked identity objects above ephemeral suite case identity.
  • Define explicit rename, split, merge, and removal mapping primitives.
  • Add build-to-build history derivation keyed by tracked identity.
  • Support questions such as when a target first started failing or when its output contract changed.
  • Surface historical identity cleanly in compare-index and explorer flows.

Guardrails:

  • no fuzzy rename inference
  • no natural-language guessing over names or stderr shapes
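
As an illustration of history derivation keyed by tracked identity, the sketch below answers "when did this target start failing". The `BuildRecord` shape and the monotonically increasing build number are assumptions made for the example. Matching is by exact tracked identity only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Fail,
}

/// Hypothetical per-build evidence row keyed by explicit tracked identity.
pub struct BuildRecord {
    pub build: u32,
    pub tracked_id: String,
    pub outcome: Outcome,
}

/// Return the build at which the target's current failing streak began,
/// or `None` if the target is unknown or its latest outcome is a pass.
pub fn first_failing_build(records: &[BuildRecord], tracked_id: &str) -> Option<u32> {
    // Select rows by exact identity; no fuzzy matching on names.
    let mut history: Vec<(u32, Outcome)> = records
        .iter()
        .filter(|r| r.tracked_id == tracked_id)
        .map(|r| (r.build, r.outcome))
        .collect();
    // Order by build number so the result does not depend on input order.
    history.sort_by_key(|(build, _)| *build);

    let mut streak_start = None;
    for (build, outcome) in history {
        match outcome {
            Outcome::Fail => {
                if streak_start.is_none() {
                    streak_start = Some(build);
                }
            }
            // Any pass resets the streak.
            Outcome::Pass => streak_start = None,
        }
    }
    streak_start
}
```

Renames, splits, and merges would be resolved through the explicit mapping primitives before a query like this runs; the sketch assumes identities are already resolved.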

Priority 5: Artifact Contract Verification

Goal:

  • treat produced artifacts as contractual verification subjects, not just side effects

Why this matters:

  • many serious products fail at artifact boundaries, compatibility boundaries, and downstream-consumption boundaries

Backlog:

  • Specify schema-oriented artifact contract checks.
  • Support backward-compatibility and invariance checks for declared artifacts.
  • Define artifact lineage and downstream consumability checks.
  • Add product-stage patterns for compiler outputs, package metadata, manifests, and other structured artifacts.
  • Make artifact contract drift a first-class compare and policy input.
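
A schema-oriented backward-compatibility check might look like the sketch below. The flat field-to-type schema and the drift kinds are hypothetical simplifications; real artifact contracts would likely need nesting and optionality. The compatibility rule used here (additions are fine, removals and type changes are not) is one common convention, not a decision recorded in this backlog.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldType {
    Str,
    Int,
    Bool,
}

/// Hypothetical flat artifact schema: field name to field type.
pub type Schema = BTreeMap<String, FieldType>;

#[derive(Debug, PartialEq, Eq)]
pub enum ContractDrift {
    FieldRemoved(String),
    FieldTypeChanged { field: String, from: FieldType, to: FieldType },
    FieldAdded(String),
}

/// List every structural difference between two declared schemas.
pub fn schema_drift(baseline: &Schema, candidate: &Schema) -> Vec<ContractDrift> {
    let mut drift = Vec::new();
    for (field, old_ty) in baseline {
        match candidate.get(field) {
            None => drift.push(ContractDrift::FieldRemoved(field.clone())),
            Some(new_ty) if new_ty != old_ty => drift.push(ContractDrift::FieldTypeChanged {
                field: field.clone(),
                from: *old_ty,
                to: *new_ty,
            }),
            Some(_) => {}
        }
    }
    for field in candidate.keys() {
        if !baseline.contains_key(field) {
            drift.push(ContractDrift::FieldAdded(field.clone()));
        }
    }
    drift
}

/// Assumed rule: only additive drift keeps existing consumers working.
pub fn is_backward_compatible(drift: &[ContractDrift]) -> bool {
    drift.iter().all(|d| matches!(d, ContractDrift::FieldAdded(_)))
}
```

The drift list itself, not just the boolean, is what would feed compare and policy as a first-class input.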

Priority 6: Differential Verification As A First-Class Mode

Goal:

  • make baseline-versus-candidate judgment a direct platform concern

Why this matters:

  • compare should not remain only an after-the-fact viewer story

Backlog:

  • Define a first-class differential verification mode over baseline and candidate evidence.
  • Support target-set, artifact, telemetry, and policy comparison in one structured flow.
  • Emit one regression judgment artifact suitable for release gating.
  • Define how differential verification composes with product certification.

Priority 7: Cross-Layer Product Verification

Goal:

  • verify one product across unit, artifact, deployment, compatibility, and operational layers in one explicit model

Why this matters:

  • this is one of the strongest category differentiators available to Observer

Backlog:

  • Extend product certification guidance and examples for multi-layer release contracts.
  • Support stage patterns that mix provider-backed tests, workflow verification, consumer compatibility, and deployment-shaped checks.
  • Keep this layer local, explicit, and deterministic rather than drifting into CI-YAML replacement territory.

Strategic Integration Direction: CMake Product-Model Ingestion

Goal:

  • let Observer certify products that CMake already knows how to construct

Why this matters:

  • CMake and Observer are structurally complementary
  • CMake defines construction truth
  • Observer can define certification truth
  • this is not a CTest story and should not be framed as one

Working framing:

  • CMake defines the product surface
  • Observer defines the certification surface
  • CMake constructs
  • Observer certifies

Backlog:

  • Define what explicit CMake product data Observer should ingest.
  • Prefer machine-readable CMake model outputs over scraped text or convention guessing.
  • Identify the first valuable certification primitives for CMake-shaped products.
  • Support certification of targets, install trees, export sets, package outputs, generated artifacts, and downstream consumer compatibility.
  • Support product and subproduct rollup over CMake-defined components and outputs.
  • Define how target-set drift, artifact drift, and packaging drift should appear in compare and policy flows for CMake-built products.

Candidate first questions:

  • did all declared targets produce the expected artifact set?
  • is the install or package surface complete and internally consistent?
  • do exported targets still work for downstream consumers?
  • are required docs, manifests, and metadata present for shipped components?
  • did any component or artifact drift in a way policy should reject?

Current achieved slice:

  • Observer can ingest explicit CMake File API reply data into a canonical lowered model.
  • Observer can hash that lowered model deterministically.
  • Observer can certify one repo-owned CMake stage with observer_cmake_model.
  • Observer can derive analytics from CMake child report evidence rather than only stage summary metadata.

Parity backlog:

  • Validate the unreleased CMake thin slice on Linux and WSL2 before release, and decide explicitly whether lowered-model identity should preserve host tool metadata such as CMake version and generator.

Next:

  • Add install-tree certification primitives backed by explicit CMake install rule data.
  • Add export-set certification primitives backed by explicit exported target data.
  • Add generated-artifact certification beyond final target artifact existence.
  • Add downstream consumer certification patterns for exported targets and packages.
  • Define the explicit contract shape for downstream consumer verification over exported CMake targets.
  • Support repo-owned downstream consumer fixtures that consume exported targets without hidden runner magic.
  • Keep downstream consumer verification as ordinary Observer certification stages rather than embedding ad hoc compile logic inside the CMake runner.

After that:

  • Add package-surface certification only where package outputs are explicit in CMake truth or explicit Observer contract data.
  • Add product and subproduct rollup semantics over CMake-defined components and outputs.
  • Add compare semantics for CMake target-set drift, artifact drift, install drift, and export drift.
  • Define compare result kinds for CMake products such as missing target, new target, artifact drift, install drift, and export drift.
  • Make CMake child evidence produce compare-ready tracked rows rather than only stage-local certification rows.
  • Define how CMake compare semantics compose with broader product compare and compare-index flows.

Later:

  • Add policy inputs and policy verdict composition over CMake-derived compare results.
  • Define policy rules for CMake-shaped products such as no target disappearance, no export loss, and no install drift.
  • Define how CMake-specific policy verdicts compose into top-level product certification.
  • Emit CMake policy outcomes as structured artifacts rather than viewer-only summaries.
  • Add provenance fields strong enough to attest which CMake model, artifacts, and toolchain facts were actually certified.
  • Record canonical identity for the certified lowered CMake model alongside child evidence and derived analytics.
  • Record which declared artifact paths, install evidence, and export evidence were actually consumed by certification.
  • Record explicit toolchain and generator facts needed to explain what CMake-defined product surface was certified.
  • Keep Observer's own repo build on its native toolchain until the CMake certification surface is broad enough to exercise real platform semantics rather than only thin-slice plumbing.

Guardrails:

  • CTest is not the conceptual center of this integration
  • Observer should not become "better CTest"
  • Ninja, CTest, and similar tools may exist in the surrounding toolchain, but they do not define the platform boundary
  • Observer should integrate with explicit CMake truth, not inherit CMake testing folklore

Priority 8: Explainable Failure And Drift

Goal:

  • make failures and regressions mechanically explainable from structured evidence

Why this matters:

  • high-rigor teams need "why" without hand-waving

Backlog:

  • Derive structured explanations for exit-code changes, target disappearance, artifact field loss, and telemetry baseline movement.
  • Expose explanation records through compare and product outputs.
  • Improve viewer UX around evidence-linked explanations.

Guardrails:

  • explanations must be derived from explicit evidence already in the model
  • no speculative narrative generation in canonical flows
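
To make "mechanically derived" concrete, here is a small sketch that produces explanation records purely by diffing two evidence records. The `Evidence` and `Explanation` shapes are hypothetical, and only two of the explanation kinds listed above are covered.

```rust
use std::collections::BTreeSet;

/// Hypothetical structured evidence for one target in one build.
#[derive(Debug, PartialEq, Eq)]
pub struct Evidence {
    pub exit_code: i32,
    pub artifact_fields: Vec<String>,
}

/// Explanation records: each one points at a concrete evidence difference.
#[derive(Debug, PartialEq, Eq)]
pub enum Explanation {
    ExitCodeChanged { from: i32, to: i32 },
    ArtifactFieldLost { field: String },
}

/// Derive explanations from baseline and candidate evidence only.
/// Nothing is guessed; an empty result means no modeled difference.
pub fn explain(baseline: &Evidence, candidate: &Evidence) -> Vec<Explanation> {
    let mut out = Vec::new();

    if baseline.exit_code != candidate.exit_code {
        out.push(Explanation::ExitCodeChanged {
            from: baseline.exit_code,
            to: candidate.exit_code,
        });
    }

    // Use ordered sets so duplicates collapse and output order is stable.
    let have: BTreeSet<&str> = candidate.artifact_fields.iter().map(String::as_str).collect();
    let lost: BTreeSet<&str> = baseline
        .artifact_fields
        .iter()
        .map(String::as_str)
        .filter(|f| !have.contains(f))
        .collect();
    for field in lost {
        out.push(Explanation::ArtifactFieldLost {
            field: field.to_string(),
        });
    }
    out
}
```

Each record can be linked back to the evidence rows it was derived from, which is what a viewer would need in order to show evidence-linked explanations.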

Priority 9: Conformance Packs And Reusable Verification Modules

Goal:

  • let teams ship reusable verification contracts, not just local suites

Why this matters:

  • this is how Observer can grow into an ecosystem of serious quality contracts

Backlog:

  • Define packaging for reusable conformance suites, policies, and artifact checks.
  • Support examples such as CLI conformance, package-format conformance, compiler-output conformance, and protocol conformance.
  • Define deterministic installation and versioning rules for reusable packs.

Later Productization Layers

These are promising, but should come after provenance, regression semantics, policy, and artifact contracts are real:

  • certification explorer and hierarchical product evidence UI
  • verification bill of materials
  • portable verification bundles
  • richer verification graph replay and evidence explorers

These should be consequences of a stronger evidence model, not substitutes for one.

Certification Explorer Direction

This is not "make the HTML prettier".

It is a future certification system UI, and it may justify becoming a standalone GUI application.

Its job would be to let a user explore:

  • products
  • subproducts
  • components
  • certification stages
  • tracked targets
  • artifact contracts
  • regressions, degradations, drift, and policy outcomes

The intended questions are:

  • why is this product certified?
  • which subproduct or component is blocking certification?
  • which target degraded without fully failing?
  • which artifact contract drifted?
  • what changed relative to the previous build?
  • what evidence actually supports this verdict?

Required platform dependencies:

  • explicit product and subproduct composition
  • verification provenance and attestation
  • tracked identity across builds
  • first-class regression and degradation semantics
  • verification policy
  • artifact lineage and artifact contract evidence

Guardrails:

  • the UI must not lead the model
  • it should render explicit evidence, not invent semantics the platform does not have
  • it should remain a certification explorer, not drift into a generic CI dashboard

Foundation Already Bought

These items are no longer backlog direction. They are now platform foundation:

  • Rust provider host and self-hosted Rust verification path
  • canonical inventory and exact conformance fixtures for the Rust provider path
  • product certification as a first-class top-level contract
  • repo-owned self-certification tree at product.json plus tests/
  • examples that explain the structural handoff from teaching shape to operational truth
  • first repo-owned CMake product-model dogfood slice with canonical lowering, hashing, certification, and mixed product analytics

Small Remaining Ergonomics Work

These are real, but they are not the main direction of the platform:

  • If approved, add crates/observer-rust-macros.
  • Add #[observer_rust::test("...")] macro support.
  • Keep macro expansion mechanically equivalent to explicit registration.
  • Add tests proving macro and explicit registry produce the same inventory and execution behavior.

Decision note:

  • v0 stays explicit-registry-only unless a macro path can be proven mechanically equivalent
  • determinism is the gate; any sugar that changes canonical inventory or execution semantics is rejected
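
The equivalence requirement can be sketched with a toy registry. Everything here is hypothetical: the `Registry` type is not Observer's API, and a declarative `macro_rules!` macro stands in for the proposed attribute macro, which would need a separate proc-macro crate. The shape of the proof is the same: the sugar must expand to the same explicit registration calls, and a test compares inventory and execution results.

```rust
pub type TestFn = fn() -> bool;

/// Toy explicit registry, standing in for the real registration surface.
#[derive(Default)]
pub struct Registry {
    entries: Vec<(String, TestFn)>,
}

impl Registry {
    pub fn register(&mut self, name: &str, f: TestFn) {
        self.entries.push((name.to_string(), f));
    }

    /// Canonical inventory: test names in sorted order.
    pub fn inventory(&self) -> Vec<String> {
        let mut names: Vec<String> = self.entries.iter().map(|(n, _)| n.clone()).collect();
        names.sort();
        names
    }

    /// Run every test and report results in canonical (sorted) order.
    pub fn run(&self) -> Vec<(String, bool)> {
        let mut results: Vec<(String, bool)> =
            self.entries.iter().map(|(n, f)| (n.clone(), f())).collect();
        results.sort();
        results
    }
}

/// Sugar that expands to nothing but explicit `register` calls.
macro_rules! observer_tests {
    ($registry:expr, { $($name:literal => $f:path),* $(,)? }) => {
        $( $registry.register($name, $f); )*
    };
}

fn adds() -> bool {
    1 + 1 == 2
}

fn trims() -> bool {
    " x ".trim() == "x"
}

/// Registry built by hand.
pub fn explicit_registry() -> Registry {
    let mut r = Registry::default();
    r.register("text/trims", trims);
    r.register("math/adds", adds);
    r
}

/// Registry built through the macro.
pub fn macro_registry() -> Registry {
    let mut r = Registry::default();
    observer_tests!(r, { "math/adds" => adds, "text/trims" => trims });
    r
}
```

A test would then assert that `explicit_registry()` and `macro_registry()` yield identical `inventory()` and `run()` output, regardless of registration order.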

Aspirational Edge

This section is not a promise of near-term delivery.

It exists to show how far the project could go if Observer fully buys into being a certification platform rather than a mere test harness.

The point of this section is to mark out directional space.

We may never reach all of it directly. The value is that aiming toward this edge will force the platform to grow stronger intermediate capabilities, and those capabilities will likely unlock important new ideas we do not yet know how to name.

Productizable Far Edge

These are the far-edge bets most worth aiming at because they could put Observer into a genuinely different class of system.

  • verification certificates and proof-carrying builds
  • temporal policy and temporal verification over tracked build history
  • claim-space coverage over verification claims, invariants, policies, and release-contract dark areas
  • counterexample-guided reduction of failing verification evidence and rerun scope

Manifesto version:

  • a build is not complete until its verification claims, evidence, and policy verdict are first-class artifacts

Deep Semantic Foundation

These are strong aspirational foundations for how Observer may eventually think, even if they are not exposed directly as product slogans.

  • verification status as a richer algebra or lattice, not a bare pass/fail bit
  • explicit claim-and-evidence reasoning over what concrete evidence justifies which abstract verification claims
  • compositional verification where larger certification systems can be built from smaller ones without semantic collapse

These should guide design taste more than roadmap marketing.
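
As a thinking aid only, the lattice idea can be sketched with two independent axes. The axes and their names are assumptions chosen for illustration, not a proposed status model. Statuses form a product lattice: two statuses can be incomparable, and rollup is the join, so composing smaller certifications never hides the worst fact on either axis.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Correctness {
    Pass,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stability {
    Stable,
    Drifted,
    Regressed,
}

/// Hypothetical two-axis status; ordered component-wise (a partial order).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Status {
    pub correctness: Correctness,
    pub stability: Stability,
}

impl Status {
    /// Least element: passing and stable.
    pub const BOTTOM: Status = Status {
        correctness: Correctness::Pass,
        stability: Stability::Stable,
    };

    /// Least upper bound: the worse value on each axis.
    pub fn join(self, other: Status) -> Status {
        Status {
            correctness: self.correctness.max(other.correctness),
            stability: self.stability.max(other.stability),
        }
    }

    /// Partial order: `self` is no worse than `other` on every axis.
    pub fn leq(self, other: Status) -> bool {
        self.correctness <= other.correctness && self.stability <= other.stability
    }
}

/// Roll up component statuses into one product status via join.
pub fn rollup(parts: &[Status]) -> Status {
    parts.iter().fold(Status::BOTTOM, |acc, s| acc.join(*s))
}
```

Because join is commutative, associative, and idempotent, the rollup does not depend on the order or grouping of components, which is the property compositional certification would need.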

Research Lenses, Not Roadmap Commitments

These ideas are useful at the edge of the subject matter, but should remain thinking tools unless and until they become mechanically defensible platform features.

  • abstract-interpretation style views of evidence-to-claim lowering
  • logic-engine or solver-style verification reasoning
  • semantic equivalence of verification objects under explicit transformation rules
  • other high-rigor models that emerge from the platform once stronger evidence contracts exist

Guardrail:

  • aspirational depth is welcome, but Observer must not buy mystery by sacrificing determinism, canonical grounding, or explicit contracts

Expected Payoff Of Aiming High

If this edge is correct, the path toward it should keep unlocking intermediate gains such as:

  • stronger provenance
  • stronger regression semantics
  • stronger policy
  • stronger history and tracked identity
  • stronger artifact contracts
  • stronger certification UX

And somewhere along that path, new capabilities that are hard to see today should become obvious because the platform surface will have changed.