perf: optimize signals hot paths #2756
Four targeted optimizations driven by CodSpeed flamegraphs and local CPU profiles:

1. O(1) dependency revalidation. link() fell back to scanning the sub's dep list from the head whenever a dependency was re-read non-consecutively during the same recompute (e.g. doc.meta.author then doc.meta.timestamp), making dep tracking O(n^2) in dep-list length. Links now carry a per-recompute generation stamp and the membership test is a single integer compare. This was 51% of the saturated listened-paths reconcile bench (12k deps).

2. Batched-write subscriber-walk skip. Writing the same signal N times in one batch walked the full subscriber list N times; after the first walk every sub is already queued. setSignal now skips the walk while the global notify epoch is unchanged. The epoch advances whenever a notification is consumed (heap removal or tracked-effect run) and a signal's cache is invalidated when it gains a subscriber, so the skip only fires when the walk would be a provable no-op.

3. Reconcile allocation reduction. getAllKeys allocated spread arrays, a Set, and Array.from per tree node; it now returns the existing key array when key sets match. unwrap skips $TARGET symbol lookups on primitive leaves (which forced boxing).

4. Small allocation cleanups: getKeys only wraps Object.keys in untrack for proxy sources, snapshotImpl's no-override walk reads each property once via the descriptor, and effects reuse one bound runner instead of allocating a closure per update.

Local results (same workloads as the CodSpeed suite, dev build):

- reconcile, all ~12k paths subscribed: 27.5ms -> 5.6ms (4.9x)
- reconcile, single deep() effect: 13.4ms -> 5.6ms (2.4x)
- updateSignals 1->1000 fanout: 1.44ms -> 0.06ms (24x)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
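To make item 1 concrete, here is a minimal sketch of the difference between a head-to-tail membership scan and a generation-stamp compare. The `DepLink` and `Subscriber` shapes and all function names are hypothetical simplifications for illustration, not the actual `@solidjs/signals` internals.

```typescript
// Hypothetical, simplified shapes; not the real @solidjs/signals types.
interface DepLink {
  gen: number;              // recompute pass in which this link was last validated
  nextDep: DepLink | null;  // next link in the subscriber's dep list
}

interface Subscriber {
  depGen: number;            // bumped once per recompute
  deps: DepLink | null;      // head of the dep list
  depsTail: DepLink | null;  // last link validated in the current pass
}

// Scan approach: walk the list from the head up to depsTail. O(n) per check.
function isValidByScan(target: DepLink, sub: Subscriber): boolean {
  if (sub.depsTail === null) return false;
  for (let l = sub.deps; l !== null; l = l.nextDep) {
    if (l === target) return true;
    if (l === sub.depsTail) break;
  }
  return false;
}

// Stamp approach: one integer compare. O(1) per check.
function isValidByStamp(target: DepLink, sub: Subscriber): boolean {
  return target.gen === sub.depGen;
}

function startRecompute(sub: Subscriber): void {
  sub.depGen++;
  sub.depsTail = null;
}

// Called when a dep is read in order and its existing link is reused.
function reuseInOrder(link: DepLink, sub: Subscriber): void {
  link.gen = sub.depGen;
  sub.depsTail = link;
}

// Three links a -> b -> c left over from the previous pass.
const c: DepLink = { gen: 0, nextDep: null };
const b: DepLink = { gen: 0, nextDep: c };
const a: DepLink = { gen: 0, nextDep: b };
const sub: Subscriber = { depGen: 0, deps: a, depsTail: c };

startRecompute(sub);
reuseInOrder(a, sub);
reuseInOrder(b, sub);

// Both checks agree: a and b were revalidated this pass, c was not.
const byScan = [a, b, c].map(l => isValidByScan(l, sub));
const byStamp = [a, b, c].map(l => isValidByStamp(l, sub));
console.log(byScan, byStamp);
```

In this toy model the two checks return the same answers, but the scan costs time proportional to the validated prefix on every non-consecutive re-read, which is where the quadratic behavior comes from.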
🦋 Changeset detected

Latest commit: 91d1fd9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Merging this PR will improve performance by 76.5%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | createComputations:create1to1 | 81.5 ms | 97.4 ms | -16.33% |
| ❌ | read | 25.3 µs | 28.2 µs | -10.4% |
| ❌ | read | 32 µs | 34.9 µs | -8.27% |
| ❌ | construct | 56.7 µs | 60.5 µs | -6.25% |
| ⚡ | updateSignals:update1to1000 | 183.8 ms | 1.9 ms | ×97 |
| ⚡ | propagation:diamond | 2.7 ms | 1.2 ms | ×2.3 |
| ⚡ | propagation:avoidable | 2 ms | 1.2 ms | +70.03% |
| ⚡ | reconcile: deep tree, single deep() effect | 76.6 ms | 54.6 ms | +40.44% |
| ⚡ | reconcile: deep tree, all ~12k paths subscribed | 84.2 ms | 65.6 ms | +28.39% |
| ⚡ | hasAllowed | 26.5 µs | 24.4 µs | +8.76% |
| ⚡ | ownKeys | 163.8 µs | 152 µs | +7.75% |
Comparing brenelz:perf/signals-hot-paths (91d1fd9) with next (b26bc04)
Used the new Fable 5 model to see if it had any perf suggestions, and it came up with these. I more or less wanted to see how CodSpeed would show these perf improvements. Feel free to take any of them or discard them.
Summary

Four targeted optimizations to `@solidjs/signals` hot paths, driven by the CodSpeed flamegraphs from run 6a2a5421 plus local `--cpu-prof` profiles of the same workloads.

1. O(1) dependency revalidation (`graph.ts`, `types.ts`, `core.ts`)

When a computation re-reads, non-consecutively, a dependency it already saw earlier in the same recompute pass (e.g. `doc.meta.author` then `doc.meta.timestamp`, where `doc.meta` is read twice), `link()` fell back to `isValidLink`, which scans the dep list from the head to check membership. With large dep lists this is O(n²): it was 51% of total time in the `reconcile: deep tree, all ~12k paths subscribed` bench.

Links now carry a `_gen` stamp of the subscriber's `_depGen` recompute counter, set on creation and on in-order reuse. The membership question "was this dep already revalidated this pass?" becomes one integer compare. Equivalence: a link is inside the `[head.._depsTail]` prefix iff it was created or (B)-reused during the current pass, which is exactly what the stamp records.

2. Skip redundant subscriber walks on batched writes (`core.ts`, `heap.ts`, `effect.ts`)

Writing the same signal N times in one batch walked the full subscriber list N times. After the first walk, every subscriber is already queued, so the remaining walks are provable no-ops (the `updateSignals:update1to1000` flamegraph is ~95% `insertSubs` + `insertIntoHeap`).

`setSignal` now skips the walk while a global `notifyEpoch` is unchanged. The epoch advances whenever a queued notification is consumed: any heap removal (`deleteFromHeap`, which all recompute paths go through) or a tracked-effect run draining `_modified`. A signal's cached epoch is invalidated when it gains a new subscriber (`link()`) and on optimistic writes, which mutate subscriber lane state and always walk.

3. Reconcile allocation reduction (`reconcile.ts`)

- `getAllKeys` allocated `[...keys, ...nextKeys]` + a `Set` + `Array.from` for every tracked node in the tree; it now returns the existing keys array when key sets match (the common shape), and merges without the spread copies otherwise.
- `unwrap` did `$TARGET` symbol lookups on primitive leaves, forcing boxing; primitives now return immediately.

4. Small allocation cleanups (`store.ts`, `utils.ts`, `core.ts`)

- `getKeys` only wraps `Object.keys` in an `untrack` closure when the source is itself a wrapped store (plain objects can't trigger traps).
- `snapshotImpl`'s no-override walk reads each property once via the descriptor instead of three times.
- Effects reuse one bound runner instead of allocating `_runEffect.bind(null, el)` per update (`runEffect` already no-ops on stale `_modified`, so re-enqueueing the same function is safe).

Results (local, dev build, same workloads as the CodSpeed suite)

| Workload | Before | After | Speedup |
|---|---|---|---|
| reconcile, all ~12k paths subscribed | 27.5 ms | 5.6 ms | 4.9x |
| reconcile, single `deep()` effect | 13.4 ms | 5.6 ms | 2.4x |
| updateSignals 1->1000 fanout | 1.44 ms | 0.06 ms | 24x |

Expected CodSpeed movement: `update1to1000` (183.8 ms), `reconcile all paths` (84.2 ms), and `reconcile deep()` (76.6 ms) should drop substantially. The DOM-lane benches won't move much; their flamegraphs are 75–95% jsdom internals.

Trade-offs

- An extra field per link (`_gen`), +2 fields per node (`_depGen`, `_notifyEpoch`): ~8–16 bytes each, which may show as a sub-percent tick on the creation benches.
- `notifyEpoch` is a monotonically increasing double, so there is no wraparound concern in practice.

Testing

- `tsc -p tsconfig.build.json` clean
- … `suppressComputedRecompute`, dispose, and mid-batch new subscribers

🤖 Generated with Claude Code
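The batched-write skip from section 2 can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions: `SignalNode`, `EffectNode`, `walkedEpoch`, `pending`, and `flush` are hypothetical names, a flat array stands in for the real heap, and optimistic writes are not modeled.

```typescript
// Hypothetical toy model; the real implementation uses a heap and lanes.
interface EffectNode { queued: boolean; run: () => void; }
interface SignalNode<T> { value: T; subs: EffectNode[]; walkedEpoch: number; }

let notifyEpoch = 0;              // advances whenever a queued notification is consumed
let walks = 0;                    // counts full subscriber walks, for the demo
const pending: EffectNode[] = []; // stand-in for the real heap/queue

function createSignalNode<T>(value: T): SignalNode<T> {
  return { value, subs: [], walkedEpoch: -1 };
}

function subscribe<T>(sig: SignalNode<T>, eff: EffectNode): void {
  sig.subs.push(eff);
  sig.walkedEpoch = -1; // the new subscriber is not queued yet, so force a walk
}

function setSignal<T>(sig: SignalNode<T>, value: T): void {
  sig.value = value;
  // Nothing was consumed since the last walk: every subscriber is still queued.
  if (sig.walkedEpoch === notifyEpoch) return;
  walks++;
  for (const eff of sig.subs) {
    if (!eff.queued) {
      eff.queued = true;
      pending.push(eff);
    }
  }
  sig.walkedEpoch = notifyEpoch;
}

function flush(): void {
  let eff: EffectNode | undefined;
  while ((eff = pending.shift()) !== undefined) {
    eff.queued = false;
    notifyEpoch++; // consuming a notification invalidates every cached epoch
    eff.run();
  }
}

let runs = 0;
const count = createSignalNode(0);
for (let i = 0; i < 1000; i++) {
  subscribe(count, { queued: false, run: () => { runs++; } });
}

for (let i = 1; i <= 100; i++) setSignal(count, i); // one batch, 100 writes
const walksInBatch = walks;     // only the first write walked the 1000 subscribers
flush();
setSignal(count, 101);          // epoch moved during flush, so this write walks again
const walksAfterFlush = walks;
console.log(walksInBatch, runs, walksAfterFlush);
```

The design point is that the skip is conservative: any event that could leave a subscriber un-queued either bumps the global epoch or resets the signal's cached epoch, so a skipped walk would have found nothing to enqueue.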
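For the `getAllKeys` change in section 3, a rough sketch of the allocation-avoiding merge might look like the following. `mergeKeys` is a hypothetical stand-in, and the fast path here assumes matching key sets also arrive in the same order, which may not be how the actual function compares them.

```typescript
// Hypothetical stand-in for getAllKeys; assumes `keys` has no duplicates
// (as with Object.keys output).
function mergeKeys(keys: string[], nextKeys: string[]): string[] {
  // Fast path: same keys in the same order, so reuse the existing array.
  if (keys.length === nextKeys.length) {
    let same = true;
    for (let i = 0; i < keys.length; i++) {
      if (keys[i] !== nextKeys[i]) {
        same = false;
        break;
      }
    }
    if (same) return keys;
  }
  // Slow path: build the union without spread copies, copying lazily
  // only once a genuinely new key shows up.
  const seen = new Set(keys);
  let merged = keys;
  for (const k of nextKeys) {
    if (!seen.has(k)) {
      if (merged === keys) merged = keys.slice();
      seen.add(k);
      merged.push(k);
    }
  }
  return merged;
}

const prevKeys = ["id", "title", "meta"];
const sameShape = mergeKeys(prevKeys, ["id", "title", "meta"]);
const grown = mergeKeys(prevKeys, ["title", "tags"]);
console.log(sameShape === prevKeys, grown);
```

When the shape is unchanged, which the description calls the common case, this returns the input array itself and allocates nothing; the union array is only created when `nextKeys` introduces a key not already present.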