Merged
27 commits
61f9b62
Ignore notes & snippets subdirs in `git`
goodboy Apr 10, 2026
bca7f10
Reorganize `.gitignore` by skill/purpose
goodboy Apr 11, 2026
dca7967
Add `lastfailed` cache inspection to `/run-tests` skill
goodboy Apr 14, 2026
fae2525
Expand `/run-tests` venv pre-flight to cover all cases
goodboy Apr 14, 2026
c72dc8c
Bump `xonsh` to latest pre `0.23` release
goodboy Apr 16, 2026
e8e4657
Pin `xonsh` to GH `main` in editable mode
goodboy Apr 17, 2026
ba111b3
Split py-version-gated uv dependency-groups
goodboy Jun 10, 2026
47ac8c0
Avoid skip `.ipc._ringbuf` import when no `cffi`
goodboy Apr 17, 2026
e5d4d94
Import-or-skip `.devx.` tests requiring `greenback`
goodboy Apr 24, 2026
fc3ab95
Add global 200s `pytest-timeout`
goodboy Apr 21, 2026
c170007
Add zombie-actor check to `run-tests` skill
goodboy Apr 23, 2026
974cb15
Use SIGINT-first ladder in `run-tests` cleanup
goodboy Apr 23, 2026
a26619d
Claude-perms: ensure /commit-msg files can be written!
goodboy Apr 23, 2026
cda9423
Codify capture-pipe hang lesson in skills
goodboy Apr 24, 2026
4e9f392
Drop global `pytest-timeout` cap from `pyproject.toml`
goodboy Apr 28, 2026
8c6b7fd
Add posix-multithreaded-`fork()` explainer doc
goodboy Apr 29, 2026
3dc3f84
Flip back to default `pytest` capture for CI
goodboy Apr 29, 2026
3f18caa
Add `test_register_duplicate_name` race analysis
goodboy May 5, 2026
0a3772c
Add `RuntimeVars` env-var lift design plan
goodboy May 6, 2026
9defec4
Bump to latest `pytest` release!
goodboy May 13, 2026
cec0731
Mk `test_no_runtime()` not require `pytest-trio`
goodboy May 14, 2026
dd65194
Add `logspec` leaf-mod Route B follow-up doc
goodboy May 29, 2026
6b0cb17
Address Copilot review nits on PR #461
goodboy Jun 17, 2026
9f1a64f
Relock `uv.lock` for py3.13+ & `pytest` CVE
goodboy Jun 17, 2026
c5feeac
Pin `pytest>=9.0.3` for CVE-2025-71176 floor
goodboy Jun 17, 2026
c0fdfa4
Pin sdist-install step to py3.13
goodboy Jun 17, 2026
26f2b23
Address Copilot review nits on PR #461 (round 2)
goodboy Jun 17, 2026
125 changes: 125 additions & 0 deletions .claude/notes/rt_vars_lift_plan.md
@@ -0,0 +1,125 @@
# `RuntimeVars` env-var lift — design plan

Status: **draft, awaiting user edits**

## Goal

Consolidate the sprawl of pytest CLI flags + ad-hoc env vars +
hardcoded fixture defaults into a *single* env-var-encoded
runtime-vars envelope, with a typed in-memory representation
(`tractor.runtime._state.RuntimeVars`) as the sole source of
truth.

## Why now

- `--tpt-proto`, `--spawn-backend`, `--diag-on-hang`,
`--diag-capture-delay` and (soon) `TRACTOR_REG_ADDR` etc. are
proliferating. Each adds a parsing seam.
- `tests/devx/test_debugger.py` invokes example scripts as
separate subprocesses; they currently can't see the
fixture-allocated `reg_addr` at all (root cause of why
parametrizing devx scripts on `reg_addr` is on your TODO).
- Concurrent pytest sessions on the same host collide on
shared defaults (the `registry@1616` race we just fixed is
one symptom; per-session unique addr is the structural
fix).
- `tractor.runtime._state.RuntimeVars: Struct` is already
defined and **unused** — its docstring even says it
"should be utilized as possible for future calls."

## Design

### Module: `tractor/_testing/_rtvars.py`

Lifted from `modden.runtime.env`, ~50 LOC, no new deps.

```python
import psutil

from tractor.runtime._state import RuntimeVars

_TRACTOR_RT_VARS_OSENV: str = '_TRACTOR_RT_VARS'

def dump_rtvars(rtvars: RuntimeVars|dict) -> tuple[str, str]:
    '''str-serialize via `str(dict)`; ast.literal_eval-able'''

def load_rtvars(env: dict) -> RuntimeVars:
    '''ast.literal_eval the env-var value, hydrate to struct'''

def get_rtvars(proc: psutil.Process|None = None) -> RuntimeVars:
    '''read the var from a target proc's env (or current)'''

def update_rtvars(
    rtvars: RuntimeVars|dict|None = None,
    update_osenv: bool|dict = True,
) -> tuple[str, str]:
    '''mutate + re-encode + (optionally) write to os.environ'''
```

### Encoding choice: `str(dict)` + `ast.literal_eval`

Pros:
- stdlib only
- handles all the types tractor's tests need: `str`, `int`,
`float`, `bool`, `None`, `list`, `tuple`, `dict`
- human-readable in the env (greppable, inspectable via
`cat /proc/<pid>/environ | tr '\0' '\n'`)

Cons:
- non-literal types (msgspec Structs, `Path`, custom classes)
must be lowered first (fine for the test fixture set)
- not stable across Python versions for esoteric repr cases
(we don't hit any)

Alternatives considered:
- **msgpack**: adds a dep + binary form is ungreppable
- **json**: doesn't preserve tuples (they become lists), and
tuples are a common type for `reg_addr`
- **toml/yaml**: heavier deps, no real benefit
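
A minimal sketch of the round-trip behind this encoding choice, under some assumptions: it works on plain dicts rather than the real `RuntimeVars` struct, and the keys other than `_registry_addrs` are illustrative only. It shows that `str(dict)` plus `ast.literal_eval` keeps a tuple address intact, whereas a JSON round-trip turns it into a list.

```python
import ast
import json

# Env-var name as given in the plan's module stub.
_TRACTOR_RT_VARS_OSENV = '_TRACTOR_RT_VARS'


def dump_rtvars(rtvars: dict) -> tuple[str, str]:
    '''Encode a plain dict as an (env-key, env-value) pair via `str()`.'''
    return _TRACTOR_RT_VARS_OSENV, str(rtvars)


def load_rtvars(env: dict) -> dict:
    '''Decode the env-var value back into a dict of Python literals.'''
    return ast.literal_eval(env[_TRACTOR_RT_VARS_OSENV])


# Illustrative payload; only `_registry_addrs` is named in the plan.
rtvars = {
    '_registry_addrs': [('127.0.0.1', 1616)],
    '_debug_mode': False,
    'loglevel': None,
}

key, value = dump_rtvars(rtvars)
roundtripped = load_rtvars({key: value})

# Same payload through JSON, for comparison.
via_json = json.loads(json.dumps(rtvars))
```

Here `roundtripped` compares equal to `rtvars` with the address still a `tuple`, while `via_json` holds the address as a `list`.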

### `RuntimeVars` becomes the single source of truth

The legacy `_runtime_vars: dict[str, Any]` global in
`runtime/_state.py` becomes a *cached view* of a
`RuntimeVars` singleton instance:

- `get_runtime_vars()` returns either the struct or a
`.to_dict()` view depending on caller's preference
- `set_runtime_vars(...)` validates against the struct schema
- spawn-time SpawnSpec sends the struct (already does
conceptually — just gets typed)
- `__setattr__` `breakpoint()` debug instrumentation gets
removed (unrelated cleanup, mentioned in conversation)

### Migration path

**Phase 0** *(prep)*: strip the stray `breakpoint()` from
`RuntimeVars.__setattr__`.

**Phase 1**: land `_rtvars.py` as a leaf module, used only by
test infra. Subprocess-spawned scripts in `tests/devx/`
read `_TRACTOR_RT_VARS` on startup → reconstruct
`RuntimeVars` → call `tractor.open_root_actor(**rtvars.as_kwargs())`.
Concurrent runs become deterministically isolated because each
session writes a unique `_registry_addrs` into the env.

**Phase 2**: migrate runtime callers (`_state.get_runtime_vars`,
spawn `SpawnSpec`, `Actor.async_main`) to operate on the
struct directly, with the dict as a compat view that gets
deprecated.

**Phase 3** *(structural)*: per-session bindspace subdir
`/run/user/<uid>/tractor/<session_uuid>/` — encoded in the
rt-vars envelope, picked up by every subactor automatically.
Obsoletes the entire bindspace-leak warning class.
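
The Phase 1 hand-off can be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: the child is an inline stand-in for a `tests/devx/` example script, it only prints the address instead of calling `tractor.open_root_actor()`, and the port number is arbitrary.

```python
import ast
import os
import subprocess
import sys

ENV_KEY = '_TRACTOR_RT_VARS'  # name taken from the plan

# Stand-in for an example script: decode the envelope at startup and
# print the registry address it would have used.
CHILD_SRC = '''
import ast, os
rtvars = ast.literal_eval(os.environ['_TRACTOR_RT_VARS'])
print(rtvars['_registry_addrs'][0])
'''


def run_child(reg_addr: tuple[str, int]) -> tuple[str, int]:
    '''Spawn the child with a session-specific envelope in its env.'''
    env = dict(os.environ)
    env[ENV_KEY] = str({'_registry_addrs': [reg_addr]})
    proc = subprocess.run(
        [sys.executable, '-c', CHILD_SRC],
        env=env,
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    # The child printed the tuple's repr; parse it back for comparison.
    return ast.literal_eval(proc.stdout.strip())


seen_addr = run_child(('127.0.0.1', 12345))
```

Because the envelope travels in the child's environment, two sessions that pick different addresses never need to agree on a shared default.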

## Open design questions (user input wanted)

- (placeholder for your edits)
- (placeholder)
- (placeholder)

## Out-of-scope for this lift

- Anything in `modden.runtime.env` related to `Spawn`,
`WmCtl`, `Wks` — that's a workspace orchestration layer,
not an env-var helper. We only lift the four utility
functions + the var name constant.
- Switching to msgpack/json — explicitly chosen against
above.
22 changes: 14 additions & 8 deletions .claude/settings.local.json
@@ -1,8 +1,16 @@
{
"permissions": {
"allow": [
"Bash(date *)",
"Bash(cp .claude/*)",
"Read(.claude/**)",
"Read(.claude/skills/run-tests/**)",
"Write(.claude/**/*commit_msg*)",
"Write(.claude/git_commit_msg_LATEST.md)",
"Skill(run-tests)",
"Skill(close-wkt)",
"Skill(open-wkt)",
"Skill(prompt-io)",
"Bash(date *)",
"Bash(git diff *)",
"Bash(git log *)",
"Bash(git status)",
@@ -23,14 +31,12 @@
"Bash(UV_PROJECT_ENVIRONMENT=py* uv sync:*)",
"Bash(UV_PROJECT_ENVIRONMENT=py* uv run:*)",
"Bash(echo EXIT:$?:*)",
"Write(.claude/*commit_msg*)",
"Write(.claude/git_commit_msg_LATEST.md)",
"Skill(run-tests)",
"Skill(close-wkt)",
"Skill(open-wkt)",
"Skill(prompt-io)"
"Bash(echo \"EXIT=$?\")",
"Read(/tmp/**)"
],
Comment on lines 33 to 36

Owner Author


🤖 response authored by claude-code

Intentional for now — this is a local-only
settings.local.json (per-dev convenience, not shipped
policy). The suite reads many transient /tmp artifacts
beyond the registry socks (pytest's /tmp/pytest-*, UDS
binds, stackscope dumps), so a blanket Read(/tmp/**)
avoids whack-a-mole on patterns. Can tighten later if we
formalize a shared policy.

"deny": [],
"ask": []
}
},
"prefersReducedMotion": false,
"outputStyle": "default"
}
66 changes: 66 additions & 0 deletions .claude/skills/conc-anal/SKILL.md
@@ -229,3 +229,69 @@ Unlike asyncio, trio allows checkpoints in
that does `await` can itself be cancelled (e.g.
by nursery shutdown). Watch for cleanup code that
assumes it will run to completion.

### Unbounded waits in cleanup paths

Any `await <event>.wait()` in a teardown path is
a latent deadlock unless the event's setter is
GUARANTEED to fire. If the setter depends on
external state (peer disconnects, child process
exit, subsequent task completion) that itself
depends on the current task's progress, you have
a mutual wait.

Rule: **bound every `await X.wait()` in cleanup
paths with `trio.move_on_after()`** unless you
can prove the setter is unconditionally reachable
from the state at the await site. Concrete recent
example: `ipc_server.wait_for_no_more_peers()` in
`async_main`'s finally (see
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
"probe iteration 3") — it was unbounded, and when
one peer-handler was stuck the wait-for-no-more-
peers event never fired, deadlocking the whole
actor-tree teardown cascade.

### The capture-pipe-fill hang pattern (grep this first)

When investigating any hang in the test suite,
**especially under fork-based backends**, first
check whether the hang reproduces under `pytest
-s` (`--capture=no`). If `-s` makes it go away
you're not looking at a trio concurrency bug —
you're looking at a Linux pipe-buffer fill.

Mechanism: pytest replaces fds 1,2 with pipe
write-ends. Fork-child subactors inherit those
fds. High-volume error-log tracebacks (cancel
cascade spew) fill the 64KB pipe buffer. Child
`write()` blocks. Child can't exit. Parent's
`waitpid`/pidfd wait blocks. Deadlock cascades up
the tree.
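
The pipe-fill step can be observed in isolation with a small stdlib-only probe. This is a sketch assuming a POSIX-like platform; `pipe_capacity()` is a hypothetical helper, and the write end is made non-blocking so that a full pipe raises instead of hanging the way a subactor's `write()` would.

```python
import os


def pipe_capacity(chunk: int = 1024) -> int:
    '''Count how many bytes a fresh pipe accepts before a write would block.'''
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise instead of hanging when full
    written = 0
    try:
        while True:
            # Nobody reads from `read_fd`, just like a capture pipe
            # whose reader is busy waiting on the child to exit.
            written += os.write(write_fd, b'x' * chunk)
    except BlockingIOError:
        pass  # the buffer is full; a blocking writer would stall here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


capacity = pipe_capacity()
```

With a blocking fd, the point where `BlockingIOError` is raised here is where the child would stall, which is the first link in the deadlock chain described above.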

Pre-existing guards in `tests/conftest.py` encode
this knowledge — grep these BEFORE blaming
concurrency:

```python
# tests/conftest.py:258
if loglevel in ('trace', 'debug'):
    # XXX: too much logging will lock up the subproc (smh)
    loglevel: str = 'info'

# tests/conftest.py:316
# can lock up on the `_io.BufferedReader` and hang..
stderr: str = proc.stderr.read().decode()
```

See the full post-mortem in
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
for the canonical reproduction. Cost several
investigation sessions before catching it —
because the capture-pipe symptom was masked by
deeper cascade-deadlocks. Once the cascades were
fixed, the tree tore down enough to generate
pipe-filling log volume → capture-pipe finally
surfaced. Grep-note for future-self: **if a
multi-subproc tractor test hangs, `pytest -s`
first, conc-anal second.**