diff --git a/docs/tutorials/ci-integration.md b/docs/tutorials/ci-integration.md
new file mode 100644
index 0000000..abf809b
--- /dev/null
+++ b/docs/tutorials/ci-integration.md
@@ -0,0 +1,209 @@
+# Running trace-tests in a CI Pipeline
+
+Configure a GitHub Actions workflow that installs `agentrust-trace-tests`, runs the full conformance suite, and uploads the test report as an artifact.
+
+## What you'll learn
+
+- A working GitHub Actions workflow file with matrix Python version testing
+- How to use `CMCP_DEV_MODE=1` to run the software-only TEE path in standard CI
+- How to read a failure back to the specific error code and field it names
+- When to skip hardware attestation tests that require real TEE hardware
+
+## Prerequisites
+
+```bash
+pip install agentrust-trace-tests pytest pytest-json-report
+```
+
+---
+
+## Write the workflow
+
+Create `.github/workflows/trace-tests.yml` in your repository:
+
+```yaml
+name: TRACE conformance
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+jobs:
+  conformance:
+    runs-on: ubuntu-latest
+
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: pip install agentrust-trace-tests pytest pytest-json-report
+
+      - name: Run conformance suite
+        env:
+          CMCP_DEV_MODE: "1"
+        run: |
+          pytest --tb=short \
+            --json-report \
+            --json-report-file=report-${{ matrix.python-version }}.json
+
+      - name: Upload test report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: trace-report-py${{ matrix.python-version }}
+          path: report-${{ matrix.python-version }}.json
+```
+
+The `fail-fast: false` setting lets all Python versions finish even if one matrix leg fails, so you can see whether a failure is version-specific.
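
With one JSON report per matrix leg, the version-specific check can be scripted. The sketch below is a hypothetical helper, not part of `agentrust-trace-tests`. It assumes each report follows the `pytest-json-report` layout (a top-level `tests` list whose entries carry `nodeid` and `outcome`) and that you key the parsed reports by Python version yourself.

```python
from collections import defaultdict


def version_specific_failures(reports: dict[str, dict]) -> dict[str, list[str]]:
    """Map each test that fails on some, but not all, versions to those versions.

    `reports` maps a Python version label (e.g. "3.11") to a parsed
    pytest-json-report document. Assumed layout: report["tests"] is a list
    of {"nodeid": ..., "outcome": ...} entries.
    """
    failing = defaultdict(list)
    for version, report in reports.items():
        for test in report.get("tests", []):
            if test.get("outcome") == "failed":
                failing[test["nodeid"]].append(version)
    # A test that fails on every leg is a general failure, not a version-specific one.
    return {
        nodeid: sorted(versions)
        for nodeid, versions in failing.items()
        if len(versions) < len(reports)
    }
```

To use it, download the per-version artifacts, parse each file with `json.loads`, and pass the resulting dict keyed by version label.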
+
+---
+
+## Set CMCP_DEV_MODE for software-only CI
+
+Standard CI runners have no TEE hardware. Set `CMCP_DEV_MODE=1` to allow records with `runtime.platform: "software-only"` to pass TR-RTE without a real attestation measurement.
+
+```yaml
+- name: Run conformance suite
+  env:
+    CMCP_DEV_MODE: "1"
+  run: pytest --tb=short
+```
+
+When this environment variable is absent, TR-RTE checks that `runtime.platform` is a registered hardware TEE enum (`intel-tdx`, `amd-sev-snp`, `nvidia-h100`, etc.). With `CMCP_DEV_MODE=1`, the `software-only` platform value passes. Never set this flag in production verification.
+
+---
+
+## Skip hardware attestation tests
+
+Some tests require a live TEE to produce a real attestation report. Mark them so they skip automatically on standard runners:
+
+```python
+import os
+import pytest
+
+HW_AVAILABLE = os.getenv("TEE_HARDWARE") == "1"
+
+@pytest.mark.skipif(not HW_AVAILABLE, reason="requires real TEE hardware")
+def test_amd_sev_snp_measurement_round_trip():
+    ...
+```
+
+In your workflow, standard jobs skip these tests silently. A separate job on hardware-provisioned runners can set `TEE_HARDWARE=1` to run the full set.
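
The per-test `skipif` above can also be centralised in `conftest.py`. This is a minimal sketch, assuming tests are tagged with a `hardware` marker (`@pytest.mark.hardware`) and that the same `TEE_HARDWARE` variable signals real hardware. The helper name `hardware_enabled` is hypothetical; `pytest_collection_modifyitems` is a standard pytest hook.

```python
# conftest.py (sketch)
import os
from collections.abc import Mapping


def hardware_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the runner declares real TEE hardware via TEE_HARDWARE=1."""
    return env.get("TEE_HARDWARE") == "1"


def pytest_collection_modifyitems(config, items):
    """Attach a skip marker to every test marked `hardware` on runners without a TEE."""
    if hardware_enabled():
        return
    # Imported lazily so hardware_enabled() can be imported without pytest installed.
    import pytest

    skip_hw = pytest.mark.skip(reason="requires real TEE hardware")
    for item in items:
        if "hardware" in item.keywords:
            item.add_marker(skip_hw)
```

With a hook like this, individual tests only need the marker, and the skip reason stays consistent across the suite.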
+
+You can also use pytest markers to exclude the hardware group entirely in CI:
+
+```bash
+pytest -m "not hardware" --tb=short
+```
+
+Register the marker in `pytest.ini` or `pyproject.toml` (the `pyproject.toml` form is shown here) to avoid the `PytestUnknownMarkWarning`:
+
+```toml
+[tool.pytest.ini_options]
+markers = [
+    "hardware: tests that require a physical TEE (deselect with -m 'not hardware')",
+    "level0: Level 0 conformance tests",
+    "level1: Level 1 conformance tests",
+    "level2: Level 2 conformance tests",
+]
+```
+
+---
+
+## Read a failure back to its field
+
+When a test fails, `--tb=short` shows the assertion message. Some messages include the error code directly; others, like this one, show only the failed comparison:
+
+```
+FAILED tests/test_level0.py::TestLevel0Conformance::test_policy_enforcement_mode_known
+AssertionError: assert 'strict' in {'advisory', 'enforce', 'silent'}
+```
+
+Match the test name to the module (here `tr_pol`, because the test checks the policy enforcement mode) and look up the code in the [Error Codes](../error-codes.md) table. For failures surfaced by `runner.run()`, the `Finding` object carries the code directly:
+
+```python
+from trace_tests.runner import run
+from trace_tests.loader import load_record
+
+record, fmt = load_record("trust-record.json")
+results = run(record, fmt, level=1)
+
+for module_id, findings in results.items():
+    for f in findings:
+        if f.failed():
+            print(f"{f.code} {f.message}")
+```
+
+Output:
+
+```
+TR-RTE-001 TR-RTE-001: runtime.platform must be a recognised TEE enum value, got 'custom-tee'
+```
+
+The code `TR-RTE-001` maps to `runtime.platform`. Fix the field in your record and re-run.
+
+---
+
+## Upload the test report as an artifact
+
+`pytest-json-report` writes a machine-readable JSON file. The `upload-artifact` step makes it available in the GitHub Actions UI under the run summary.
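
Because the report is machine-readable, error codes can be pulled out of it without reading logs. The following is a hedged sketch rather than part of the published tooling: `failed_codes` is a hypothetical helper, the `tests[].call.crash.message` path is assumed from the `pytest-json-report` format, and the code pattern is inferred from examples such as `TR-RTE-001`.

```python
import re

# Assumed code shape, inferred from examples such as TR-RTE-001 and TR-POL-002.
CODE_RE = re.compile(r"\bTR-[A-Z]{3}-\d{3}\b")


def failed_codes(report: dict) -> dict[str, list[str]]:
    """Map each failed test's nodeid to the TRACE error codes in its failure message.

    Only the call phase is inspected; failures raised during setup or
    teardown would need the same lookup under those keys.
    """
    out: dict[str, list[str]] = {}
    for test in report.get("tests", []):
        if test.get("outcome") != "failed":
            continue
        message = test.get("call", {}).get("crash", {}).get("message", "")
        out[test["nodeid"]] = sorted(set(CODE_RE.findall(message)))
    return out
```

A failed test whose message shows only a bare comparison maps to an empty list, which is a cue to look the code up by module instead. For reference, this is the workflow step that publishes the file such a script would read: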
+
+```yaml
+- name: Upload test report
+  if: always()
+  uses: actions/upload-artifact@v4
+  with:
+    name: trace-report-py${{ matrix.python-version }}
+    path: report-${{ matrix.python-version }}.json
+```
+
+`if: always()` uploads the report even when the test step fails, so you can inspect failures without re-running.
+
+To produce a JUnit XML report instead (a format that most test-reporting tools and actions can consume):
+
+```bash
+pytest --tb=short --junit-xml=results.xml
+```
+
+```yaml
+- name: Publish test results
+  uses: actions/upload-artifact@v4
+  if: always()
+  with:
+    name: junit-results
+    path: results.xml
+```
+
+---
+
+## Matrix testing across Python versions
+
+The `strategy.matrix` block runs the full suite on Python 3.10, 3.11, and 3.12 in parallel. Each leg uploads its own artifact so version-specific regressions are visible without comparing logs manually.
+
+To add a new version, append it to the list:
+
+```yaml
+matrix:
+  python-version: ["3.10", "3.11", "3.12", "3.13"]
+```
+
+If your tests or fixtures use a stdlib API that changed between versions, the matrix will catch it. The `trace_tests` library targets the same version range, so failures here usually indicate a compatibility problem in your custom tests or fixtures rather than in the library itself.
+
+---
+
+## Summary
+
+You have a GitHub Actions workflow that installs `agentrust-trace-tests`, runs the suite across three Python versions with `CMCP_DEV_MODE=1`, and saves per-version JSON reports as artifacts. Hardware attestation tests are marked and skipped on standard runners. When a test fails, its error code maps directly to the spec field that failed.
+
+For more on what each error code means, see [Error Codes](../error-codes.md). To write custom tests against specific modules, see [Writing Conformance Tests](./writing-conformance-tests.md).
diff --git a/docs/tutorials/writing-conformance-tests.md b/docs/tutorials/writing-conformance-tests.md
new file mode 100644
index 0000000..110eb5e
--- /dev/null
+++ b/docs/tutorials/writing-conformance-tests.md
@@ -0,0 +1,247 @@
+# Writing Custom TRACE Conformance Tests
+
+Write a custom pytest test that verifies a specific field in a TRACE Trust Record using the `trace_tests` library.
+
+## What you'll learn
+
+- What the three conformance levels require and which modules activate at each level
+- How to build a minimal TRACE record fixture and call the module `check()` functions directly
+- How to interpret `Finding` results and match them to error codes
+
+## Prerequisites
+
+```bash
+pip install agentrust-trace-tests pytest cryptography
+```
+
+---
+
+## Understand the conformance levels
+
+TRACE defines three levels. Each level activates a cumulative set of modules:
+
+| Level | Required modules | Typical use |
+|-------|-----------------|-------------|
+| 0 | TR-ENV, TR-SIG, TR-POL | Software-only development and staging |
+| 1 | Level 0 + TR-RTE, TR-SCA | Production TEE-attested records |
+| 2 | Level 1 + TR-TXN, TR-ANC | Full records with SCITT transparency anchoring |
+
+At Level 0 you can set `runtime.platform` to `"software-only"` and skip hardware attestation entirely. At Level 1 you must supply a real TEE measurement from AMD SEV-SNP, Intel TDX, NVIDIA H100, or similar. Level 2 adds a SCITT receipt URI and a bound tool-call transcript hash.
+
+The `runner.run()` function respects this table. Modules not required at the requested level are never invoked.
+
+---
+
+## Run the existing test suite
+
+The published suite uses pytest markers to group tests by level:
+
+```bash
+# Run everything
+pytest
+
+# Run only Level 0 tests
+pytest -m level0
+
+# Short traceback, stop after the first failure
+pytest --tb=short -x
+```
+
+Each test file uses a pytest fixture (defined in `tests/conftest.py`) that loads a JSON vector or builds a signed record in memory.
+The fixture names match the level they cover: `valid_level0`, `signed_eat_fixture`, `trust_record`.
+
+A passing run looks like:
+
+```
+tests/test_level0.py::TestLevel0Conformance::test_schema_valid PASSED
+tests/test_level0.py::TestLevel0Conformance::test_eat_profile_sentinel PASSED
+...
+```
+
+A skip appears when a test is conditional on an optional field:
+
+```
+tests/test_level0.py::TestLevel0Conformance::test_transcript_digest_when_present SKIPPED
+```
+
+A failure looks like:
+
+```
+FAILED tests/test_level1.py::test_tee_platform_present
+AssertionError: TR-RTE-001: runtime.platform must be a recognised TEE enum value
+```
+
+---
+
+## Understand the module system
+
+Each module is a Python file under `src/trace_tests/modules/`. Every module exposes a `check()` function that accepts the extracted TRACE fields dict and returns `list[Finding]`.
+
+```python
+# Every module follows this pattern
+from trace_tests.modules import tr_pol
+from trace_tests.result import Finding, Status
+
+findings: list[Finding] = tr_pol.check(trace_dict)
+```
+
+A `Finding` carries three fields, plus four helper methods that test its status:
+
+```python
+finding.code      # e.g. "TR-POL-001"
+finding.status    # Status.PASS, Status.FAIL, Status.SKIP, or Status.UNVERIFIED
+finding.message   # human-readable detail
+
+finding.passed()      # True when status == PASS
+finding.failed()      # True when status == FAIL
+finding.skipped()     # True when status == SKIP
+finding.unverified()  # True when status == UNVERIFIED
+```
+
+`UNVERIFIED` is distinct from `SKIP`. It means the record carries no signature that can be verified. At Level 0 this is allowed; at Level 1 and above it counts as a failure so a caller cannot mistake an unverified record for a passing one.
+
+---
+
+## Write a custom test
+
+The simplest pattern is to build a minimal trace dict, call the module directly, and assert on the findings.
+
+```python
+# tests/custom/test_my_policy.py
+import pytest
+from trace_tests.modules import tr_pol
+from trace_tests.result import Status
+
+
+def _policy_trace(bundle_hash: str, enforcement_mode: str) -> dict:
+    return {
+        "policy": {
+            "bundle_hash": bundle_hash,
+            "enforcement_mode": enforcement_mode,
+        }
+    }
+
+
+def test_sha256_bundle_hash_passes():
+    trace = _policy_trace(
+        bundle_hash="sha256:" + "a" * 64,
+        enforcement_mode="enforce",
+    )
+    findings = tr_pol.check(trace)
+    assert all(not f.failed() for f in findings), findings
+
+
+def test_md5_bundle_hash_fails_tr_pol_001():
+    trace = _policy_trace(bundle_hash="md5:abc123", enforcement_mode="enforce")
+    codes = {f.code for f in tr_pol.check(trace) if f.failed()}
+    assert "TR-POL-001" in codes
+
+
+def test_unknown_enforcement_mode_fails_tr_pol_002():
+    trace = _policy_trace(
+        bundle_hash="sha256:" + "b" * 64,
+        enforcement_mode="strict",  # not in {enforce, advisory, silent}
+    )
+    codes = {f.code for f in tr_pol.check(trace) if f.failed()}
+    assert "TR-POL-002" in codes
+```
+
+For modules that need the full raw record (TR-SIG), pass both the extracted trace and the raw record:
+
+```python
+from trace_tests.modules.tr_sig import check as check_sig
+
+# fmt is "cmcp-runtime" for cMCP envelopes, "trace" for plain TRACE records
+findings = check_sig(trace=record["trace"], record=record, fmt="cmcp-runtime", level=0)
+```
+
+For TR-ENV, pass `max_age_seconds` to override the default 24-hour freshness window in tests:
+
+```python
+from trace_tests.modules.tr_env import check as check_env
+
+findings = check_env(trace, max_age_seconds=3600)
+```
+
+---
+
+## Build a signed fixture
+
+When you need a cryptographically valid record (required for TR-SIG tests at Level 1+), generate a key pair and sign the canonical JSON body yourself:
+
+```python
+import base64
+import json
+import time
+
+from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
+
+
+def _b64url(b: bytes) -> str:
+    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()
+
+
+def _canonical_json(d: dict) -> bytes:
+    return json.dumps(d, sort_keys=True, separators=(",", ":"), ensure_ascii=True).encode()
+
+
+def build_signed_cmcp_record(platform: str = "tpm2") -> dict:
+    priv = Ed25519PrivateKey.generate()
+    pub_raw = priv.public_key().public_bytes_raw()
+    x = _b64url(pub_raw)
+    kid = f"test-{pub_raw[:4].hex()}"
+
+    record = {
+        "cmcp_version": "1.0",
+        "trace": {
+            "eat_profile": "tag:agentrust.io,2026:trace-v0.1",
+            "iat": int(time.time()) - 30,
+            "subject": "spiffe://cmcp.gateway/session/my-test",
+            "runtime": {
+                "platform": platform,
+                "measurement": "sha256:" + "a" * 64,
+            },
+            "policy": {
+                "bundle_hash": "sha256:" + "b" * 64,
+                "enforcement_mode": "enforce",
+            },
+            "data_class": "internal",
+            "cnf": {"jwk": {"kty": "OKP", "crv": "Ed25519", "x": x, "kid": kid}},
+        },
+        "gateway": {"session_id": "my-test"},
+        "signature": "",
+    }
+
+    # Sign the canonical JSON of everything except the signature field itself.
+    body = _canonical_json({k: v for k, v in record.items() if k != "signature"})
+    record["signature"] = _b64url(priv.sign(body))
+    return record
+```
+
+This matches the helper pattern in `tests/conftest.py` and the `_build_signed_cmcp_record` function used by the published fixtures.
+
+---
+
+## Interpret error codes
+
+Error codes follow the pattern `TR-<MODULE>-<NNN>`. The module prefix tells you which spec section failed; the number points to the specific field. The full table is at [Error Codes](../error-codes.md).
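
Judging by the codes used in this tutorial (`TR-ENV-001`, `TR-POL-002`), a code splits cleanly into a module prefix and a number. The sketch below shows one way to do that split in your own tooling. `TraceCode` and `parse_code` are hypothetical names, not part of `trace_tests`, and the three-letter, three-digit shape is an assumption drawn from those examples.

```python
import re
from typing import NamedTuple

# Assumed shape: "TR-", a three-letter module prefix, "-", a three-digit number.
_CODE_RE = re.compile(r"TR-([A-Z]{3})-(\d{3})")


class TraceCode(NamedTuple):
    module: str  # e.g. "TR-POL"
    number: int  # e.g. 2


def parse_code(code: str) -> TraceCode:
    """Split an error code into its module prefix and numeric part."""
    match = _CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a TRACE error code: {code!r}")
    return TraceCode(module=f"TR-{match.group(1)}", number=int(match.group(2)))
```

Grouping findings by `parse_code(f.code).module` gives a quick per-module failure count when a record fails several checks at once.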
+
+Common codes you will encounter:
+
+| Code | Field | Fix |
+|------|-------|-----|
+| TR-ENV-001 | `eat_profile` | Must be `tag:agentrust.io,2026:trace-v0.1` |
+| TR-ENV-002 | `iat` | Must be a Unix timestamp in the last 24 hours |
+| TR-SIG-001 | `signature` | Signature missing or does not verify |
+| TR-SIG-002 | `cnf.jwk` | Key must be OKP/Ed25519 |
+| TR-POL-001 | `policy.bundle_hash` | Must match `sha256:<64 hex chars>` |
+| TR-POL-002 | `policy.enforcement_mode` | Must be `enforce`, `advisory`, or `silent` |
+| TR-RTE-001 | `runtime.platform` | Must be a registered TEE platform enum |
+
+When a finding carries `status == Status.UNVERIFIED`, the record has no signature that can be verified. At Level 1 or above this counts as a failure, not a benign skip.
+
+---
+
+## Summary
+
+You ran the existing suite with pytest, called individual module `check()` functions directly, and built a signed test fixture from scratch. The `Finding` dataclass with `code`, `status`, and `message` fields is the single interface across all seven modules.
+
+Next steps: [CI Integration](./ci-integration.md) shows how to run these tests in GitHub Actions with matrix Python versions and artifact upload.
diff --git a/mkdocs.yml b/mkdocs.yml
index 4d6b509..48b5763 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -122,6 +122,9 @@ extra_css:
 nav:
   - Home: README.md
   - Getting Started: docs/quickstart.md
+  - Tutorials:
+    - Writing conformance tests: docs/tutorials/writing-conformance-tests.md
+    - CI integration: docs/tutorials/ci-integration.md
   - Test Modules:
     - Overview: docs/modules.md
     - Envelope (TR-ENV): docs/modules/tr-env.md