1 change: 1 addition & 0 deletions package.json
@@ -8,6 +8,7 @@
  ],
  "scripts": {
    "test": "vitest run",
    "bench": "vitest run --config packages/core/vitest.perf.config.ts",
    "lint": "biome check .",
    "lint:fix": "biome check --write .",
    "typecheck": "tsc --build tsconfig.build.json",
73 changes: 73 additions & 0 deletions packages/core/__perf__/README.md
@@ -0,0 +1,73 @@
# `@logic-md/core` perf assertions

Pre-merge regression assertions on the three core paths most likely to acquire
silent quadratic behaviour, per the analysis in #46.

## Running

From the repository root:

```bash
npm run bench
```

This invokes vitest with [`vitest.perf.config.ts`](../vitest.perf.config.ts),
which picks up only `**/__perf__/**/*.perf.ts` files and runs them serially in
a single fork (for stable timings). Default `npm test` does not run the bench
suite — `*.perf.ts` is outside the default `**/*.test.ts` glob.

## Coverage

| File | Asserts |
|---|---|
| [`compiler.perf.ts`](compiler.perf.ts) | `compileWorkflow` on a 200-step linear chain |
| [`expression.perf.ts`](expression.perf.ts) | `evaluate` × 10,000 calls on the same template against varying contexts |
| [`dag.perf.ts`](dag.perf.ts) | `resolve` on a 1000-step linear chain |

Linear chains are the worst-case input shape — depth equals node count, which
maximises the impact of any per-pop or per-level work in the DAG resolver and
maximises the per-step traversal cost in the compiler.
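
For reference, this is the shape `makeLinearChainSpec(3)` in [`_helpers.ts`](_helpers.ts) produces: three steps, depth three, each step gated on its predecessor.

```ts
const spec = {
  spec_version: "1.0",
  name: "linear-chain-perf",
  steps: {
    step_0: { description: "first", instructions: "first step in linear chain" },
    step_1: { description: "step 1", instructions: "step 1 in linear chain", needs: ["step_0"] },
    step_2: { description: "step 2", instructions: "step 2 in linear chain", needs: ["step_1"] },
  },
};
```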

## Calibration methodology

Thresholds are calibrated against `main` per the methodology agreed in #46:

1. Run the bench on `main` repeatedly across multiple developer-machine
sessions with varying background load.
2. Take the worst observed elapsed time per metric.
3. Multiply by **1.5** (Math.ceil) for slower-machine headroom.
4. Round up to a clean number for the assertion threshold.
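
In code, steps 2 and 3 amount to the following (a hypothetical helper, shown only to pin down the arithmetic; the step-4 round-up to a clean number stays a judgment call, e.g. 4349 → 4500 for the compiler bench):

```ts
// Hypothetical helper illustrating steps 2 and 3 of the methodology above.
function paddedWorst(observedMs: number[], headroom = 1.5): number {
  return Math.ceil(Math.max(...observedMs) * headroom);
}

// Compiler bench calibration data (see compiler.perf.ts):
paddedWorst([746, 778, 1318, 1326, 1398, 2102, 2607, 2899]); // → 4349, locked as 4500
```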

The +50% headroom is wider than the +25% suggested in the original #46 review,
based on observed variance on Windows developer machines (single-shot timings
can vary up to ~3× between quiet and loaded sessions). The bench is opt-in, not
default-CI, so this trade-off favours stable execution at the cost of slightly
weaker regression sensitivity. Once the algorithmic fixes in PRs 2-4 land, the
assertion margin will widen substantially (~100× for the compiler fix), which
provides a much sharper proof-of-fix signal than the initial calibration.

Each `*.perf.ts` file documents its own calibration data in a header comment so
that recalibration after a change is auditable. If a fix legitimately reduces
the workload (e.g. PR 2 in the #46 sequence eliminating the per-step DAG
re-resolution), the threshold should NOT be tightened in the same PR — leave
the headroom widening as visible proof of the fix.

## Adding a new bench

1. Create `<name>.perf.ts` next to existing files.
2. Use `describe` + `test` from `vitest`.
3. Always include a warm-up call before timed measurement (let v8 optimise the
hot path).
4. Run `node` directly with the same workload 5 times against `main`, capture
raw timings, document them in a header comment, and lock the worst × 1.25.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale multiplier in the "Adding a new bench" guide (×1.25 vs ×1.5 in the methodology).

Line 62 still says "lock the worst × 1.25", but the calibration methodology section (line 38) uses ×1.5. A future contributor following the "Adding a new bench" steps will under-provision headroom and produce a flakier assertion than intended.

📝 Proposed fix
-4. Run `node` directly with the same workload 5 times against `main`, capture
-   raw timings, document them in a header comment, and lock the worst × 1.25.
+4. Run `node` directly with the same workload 5 times against `main`, capture
+   raw timings, document them in a header comment, and lock the worst × 1.5
+   (see Calibration methodology above).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/__perf__/README.md` at line 62, The README contains a stale
multiplier in the "Adding a new bench" guide: replace the phrase "lock the worst
× 1.25" with the correct multiplier "lock the worst × 1.5" so it matches the
calibration methodology described earlier (the "calibration methodology" section
that specifies ×1.5); update any nearby explanatory text or examples that
reference ×1.25 to use ×1.5 to keep the guide consistent.


P3: The new-bench checklist conflicts with the calibration methodology: it says worst × 1.25 while the documented policy is worst × 1.5.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/__perf__/README.md, line 62:

<comment>The new-bench checklist conflicts with the calibration methodology: it says worst × 1.25 while the documented policy is worst × 1.5.</comment>

<file context>
@@ -0,0 +1,73 @@
+3. Always include a warm-up call before timed measurement (let v8 optimise the
+   hot path).
+4. Run `node` directly with the same workload 5 times against `main`, capture
+   raw timings, document them in a header comment, and lock the worst × 1.25.
+
+## Why these three?
</file context>


## Why these three?

These are the three concrete candidates surfaced in [#46](../../../../issues/46) — places where the implementation is correct at small scale but algorithmically quadratic+ at scale, currently invisible to all 325 unit tests. The bench suite is the regression net for the full sequence:

- **PR 1 (this scaffold):** establish discipline; assertions pass on main.
- **PR 2:** compiler fix (compileStep accepting pre-computed dagResult).
- **PR 3:** expression cache (AST cache in `evaluate`).
- **PR 4:** DAG sort tightening (eliminate per-pop queue sort and level-filter loop).

After each fix, re-running `npm run bench` shows the assertion margin widening — which IS the proof.
67 changes: 67 additions & 0 deletions packages/core/__perf__/_helpers.ts
@@ -0,0 +1,67 @@
// =============================================================================
// Perf-test helpers — synthetic spec generators for scaling assertions
// =============================================================================
// These are NOT part of the public API. They live under __perf__/ and are only
// used by the bench suite (`npm run bench`).
// =============================================================================

import type { LogicSpec, Step, WorkflowContext } from "../types.js";

/**
 * Generate a `LogicSpec` with `n` steps in a strict linear chain
 * (step_0 → step_1 → … → step_{n-1}).
 *
 * Linear chains are the worst case for several scaling concerns:
 *   - DAG resolve's level-grouping filter (D = N depths)
 *   - compileWorkflow's per-step DAG re-resolution (N×(V+E) traversal)
 *   - Token-budget warnings as the prompt segment grows.
 */
export function makeLinearChainSpec(n: number): LogicSpec {
  if (n < 1) {
    throw new Error(`makeLinearChainSpec requires n >= 1, got ${n}`);
  }
  const steps: Record<string, Step> = {
    step_0: {
      description: "first",
      instructions: "first step in linear chain",
    },
  };
  for (let i = 1; i < n; i++) {
    steps[`step_${i}`] = {
      description: `step ${i}`,
      instructions: `step ${i} in linear chain`,
      needs: [`step_${i - 1}`],
    };
  }
  return {
    spec_version: "1.0",
    name: "linear-chain-perf",
    steps,
  };
}

/**
 * Just the `steps` map from `makeLinearChainSpec(n)`.
 * Useful when calling `resolve(steps)` directly.
 */
export function makeLinearChainSteps(n: number): Record<string, Step> {
  const spec = makeLinearChainSpec(n);
  return spec.steps as Record<string, Step>;
}
Comment on lines +47 to +50

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the type of LogicSpec.steps in types.ts
rg -n "steps" --type ts -A 2 -B 2 packages/core/types.ts



🏁 Script executed:

#!/bin/bash
# Find makeLinearChainSpec implementation
rg -n "makeLinearChainSpec" --type ts -A 10 packages/core/__perf__/_helpers.ts | head -40



Replace the `as Record<string, Step>` cast with a proper type fix.

The cast suppresses a real type mismatch: `makeLinearChainSpec` always populates `steps`, but TypeScript sees `LogicSpec.steps` as optional (`Record<string, Step> | undefined`). While the function logic guarantees `steps` are present, the type definition doesn't reflect this. Either make `steps` non-optional in `LogicSpec` (or in a dedicated return type from `makeLinearChainSpec`), or use a non-null assertion (`spec.steps!`) if the optional definition must remain. The silent cast hides a type safety gap.
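
For illustration, a self-contained sketch of the intersection-type option; the local `Step` and `LogicSpec` declarations are simplified stand-ins for the real definitions in `types.ts`, and the `n >= 1` guard is omitted for brevity:

```ts
// Sketch only. Step and LogicSpec below are simplified stand-ins for types.ts.
interface Step {
  description: string;
  instructions: string;
  needs?: string[];
}

interface LogicSpec {
  spec_version: string;
  name: string;
  steps?: Record<string, Step>; // the finding's premise: optional in types.ts
}

// Narrowed alias: identical to LogicSpec, but with `steps` guaranteed present.
type LinearChainSpec = LogicSpec & { steps: Record<string, Step> };

function makeLinearChainSpec(n: number): LinearChainSpec {
  const steps: Record<string, Step> = {
    step_0: { description: "first", instructions: "first step in linear chain" },
  };
  for (let i = 1; i < n; i++) {
    steps[`step_${i}`] = {
      description: `step ${i}`,
      instructions: `step ${i} in linear chain`,
      needs: [`step_${i - 1}`],
    };
  }
  return { spec_version: "1.0", name: "linear-chain-perf", steps };
}

// The `as Record<string, Step>` cast then disappears:
function makeLinearChainSteps(n: number): Record<string, Step> {
  return makeLinearChainSpec(n).steps;
}
```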

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/__perf__/_helpers.ts` around lines 47 - 50, The current
makeLinearChainSteps uses an unsafe cast (as Record<string, Step>) to silence
that LogicSpec.steps is optional; replace this by fixing the types instead of
casting: either update the LogicSpec definition (or the specific return type of
makeLinearChainSpec) to make steps non-optional, or if you must keep
LogicSpec.steps optional, use a non-null assertion when accessing spec.steps in
makeLinearChainSteps (spec.steps!) so the compiler sees you intended a present
value; change the type in LogicSpec or the makeLinearChainSpec return type to
reflect that steps is always populated, or use spec.steps! in
makeLinearChainSteps and remove the as Record<string, Step> cast.


/**
 * Default `WorkflowContext` for compile-bench measurements.
 */
export function makeWorkflowContext(): WorkflowContext {
  return {
    currentStep: "step_0",
    previousOutputs: {},
    input: {},
    attemptNumber: 1,
    branchReason: null,
    previousFailureReason: null,
    totalSteps: 0,
    completedSteps: [],
    dagLevels: [],
  };
}
57 changes: 57 additions & 0 deletions packages/core/__perf__/compiler.perf.ts
@@ -0,0 +1,57 @@
// =============================================================================
// Perf assertion: compileWorkflow scaling
// =============================================================================
// Pins the cost of compiling a 200-step linear-chain workflow against current
// `main`. Linear chains are the worst-case shape for `compileWorkflow` because
// every `compileStep` call re-resolves the full DAG (Candidate 1 in #46).
//
// Chain size of 200 (rather than 1000) keeps the bench under 2 seconds per
// run; once Candidate 1's fix lands the same workload should drop ~100×, and
// the assertion margin will widen dramatically — exactly the proof-of-fix
// signal Rain asked for in his sequencing comment.
//
// Threshold calibration methodology (per #46 review):
// 1. Run on `main` 5 times.
// 2. Take the worst observed elapsed time.
// 3. Multiply by 1.25 (Math.ceil) for slower-machine headroom.

P2: Stale methodology comment: line says "Multiply by 1.25" but the actual calibration used ×1.5 (as documented just below at line 23). This contradiction in the same header will mislead future recalibrators.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/__perf__/compiler.perf.ts, line 16:

<comment>Stale methodology comment: line says "Multiply by 1.25" but the actual calibration used ×1.5 (as documented just below at line 23). This contradiction in the same header will mislead future recalibrators.</comment>

<file context>
@@ -0,0 +1,57 @@
+// Threshold calibration methodology (per #46 review):
+//   1. Run on `main` 5 times.
+//   2. Take the worst observed elapsed time.
+//   3. Multiply by 1.25 (Math.ceil) for slower-machine headroom.
+//   4. Lock that value in as the assertion threshold.
+//
</file context>
Suggested change:
-// 3. Multiply by 1.25 (Math.ceil) for slower-machine headroom.
+// 3. Multiply by 1.5 (Math.ceil) for slower-machine headroom.

// 4. Lock that value in as the assertion threshold.
Comment on lines +13 to +17

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale ×1.25 comment in the methodology block; actual calibration used ×1.5.

Lines 13–17 still describe the original ×1.25 methodology, while lines 25–30 correctly document the ×1.5 approach actually used. Having both in the same header will confuse future recalibrators — the lines below should supersede, but that relies on the reader noticing the contradiction.

📝 Proposed fix
-// Threshold calibration methodology (per `#46` review):
-//   1. Run on `main` 5 times.
-//   2. Take the worst observed elapsed time.
-//   3. Multiply by 1.25 (Math.ceil) for slower-machine headroom.
-//   4. Lock that value in as the assertion threshold.
+// Threshold calibration methodology (per README calibration section):
+//   1. Run on `main` 5 times across quiet and loaded sessions.
+//   2. Take the worst observed elapsed time.
+//   3. Multiply by 1.5 (Math.ceil) for slower-machine headroom.
+//   4. Lock that value in as the assertion threshold.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/__perf__/compiler.perf.ts` around lines 13 - 17, Update the
"Threshold calibration methodology" comment block so it matches the actual
calibration multiplier used (×1.5) instead of the stale ×1.25; locate the header
titled "Threshold calibration methodology" in
packages/core/__perf__/compiler.perf.ts and change the step describing "Multiply
by 1.25 (Math.ceil) for slower-machine headroom." to mention "Multiply by 1.5"
(and update any parenthetical/math note accordingly) so the top-of-file
methodology no longer contradicts the later documentation that documents ×1.5.

//
// Calibration data captured 2026-05-07 on Node v22.18.0 across multiple
// developer-machine sessions with varying background load:
// quiet runs: 746ms, 778ms, 1318ms, 1326ms, 1398ms
// loaded runs: 2102ms, 2607ms, 2899ms
// worst observed = 2899ms → ceil(2899 × 1.5) = 4349ms → 4500ms (rounded)
//
// The +50% headroom (rather than the +25% in the original methodology) reflects
// observed variance on Windows developer machines under realistic background
// load. The bench is opt-in (`npm run bench`, NOT default `npm test`), so this
// trade-off favours stable execution at the cost of slightly weaker regression
// sensitivity. Once Candidate 1's fix lands, the assertion margin will widen
// from ~1.5× to ~100×, providing a much sharper proof-of-fix signal.
// =============================================================================

import { describe, expect, test } from "vitest";
import { compileWorkflow } from "../index.js";
import { makeLinearChainSpec, makeWorkflowContext } from "./_helpers.js";

/**
 * Calibrated threshold for compileWorkflow on a 200-step linear chain.
 * See header comment for methodology and raw data.
 */
const COMPILE_200_STEP_THRESHOLD_MS = 4500;

describe("perf: compileWorkflow scaling", () => {
  test(`compileWorkflow on 200-step linear chain completes <${COMPILE_200_STEP_THRESHOLD_MS}ms`, () => {
    const spec = makeLinearChainSpec(200);
    const ctx = makeWorkflowContext();

    // Warm-up: let v8 optimise the hot path before measurement.
    compileWorkflow(spec, ctx);

    const t0 = performance.now();
    compileWorkflow(spec, ctx);
    const elapsed = performance.now() - t0;

    expect(elapsed).toBeLessThan(COMPILE_200_STEP_THRESHOLD_MS);
  });
});
43 changes: 43 additions & 0 deletions packages/core/__perf__/dag.perf.ts
@@ -0,0 +1,43 @@
// =============================================================================
// Perf assertion: resolve() scaling on a 1000-step linear chain
// =============================================================================
// Pins the cost of topological sort + level grouping on the worst-case DAG
// shape (linear chain, where depth = N). Catches regressions in the per-pop
// queue sort, neighbour sort, and level-filter loop in `dag.ts`.
// Threshold calibrated against current `main` (5 runs, take worst, +25%).
// =============================================================================

import { describe, expect, test } from "vitest";
import { resolve } from "../index.js";
import { makeLinearChainSteps } from "./_helpers.js";

/**
 * Calibrated threshold for resolve() on a 1000-step linear chain.
 *
 * Calibration methodology: multiple runs on `main` across developer-machine
 * sessions with varying background load; take worst observed, multiply by 1.5
 * for headroom.
 *
 * Calibration data captured 2026-05-07 on Node v22.18.0:
 *   quiet runs:  117ms, 128ms, 143ms, 152ms, 215ms
 *   loaded runs: 419ms, 484ms
 *   worst observed = 484ms → ceil(484 × 1.5) = 727ms → 800ms (rounded)
 */
const RESOLVE_1000_STEP_THRESHOLD_MS = 800;

describe("perf: dag.resolve scaling", () => {
  test(`resolve(1000-step linear chain) completes <${RESOLVE_1000_STEP_THRESHOLD_MS}ms`, () => {
    const steps = makeLinearChainSteps(1000);

    // Warm-up.
    const warm = resolve(steps);
    expect(warm.ok).toBe(true);

    const t0 = performance.now();
    const r = resolve(steps);
    const elapsed = performance.now() - t0;

    expect(r.ok).toBe(true);
    expect(elapsed).toBeLessThan(RESOLVE_1000_STEP_THRESHOLD_MS);
  });
});
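
For context on what "per-pop queue sort" and "level-filter loop" refer to, here is an illustrative Kahn-style sketch; it is not the actual `dag.ts` implementation, whose details may differ:

```ts
// Illustrative only, NOT the actual dag.ts. A Kahn-style resolve sketch
// showing where the two guarded costs live on a linear chain.
function resolveSketch(steps: Record<string, { needs?: string[] }>) {
  const names = Object.keys(steps);
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of names) {
    indegree.set(n, steps[n].needs?.length ?? 0);
    dependents.set(n, []);
  }
  for (const n of names) {
    for (const d of steps[n].needs ?? []) dependents.get(d)?.push(n);
  }

  const depth = new Map<string, number>();
  const queue = names.filter((n) => indegree.get(n) === 0);
  for (const n of queue) depth.set(n, 0);

  const order: string[] = [];
  while (queue.length > 0) {
    queue.sort(); // guarded cost 1: re-sorting the ready queue on every pop
    const n = queue.shift() as string;
    order.push(n);
    for (const m of dependents.get(n) ?? []) {
      depth.set(m, Math.max(depth.get(m) ?? 0, (depth.get(n) ?? 0) + 1));
      const remaining = (indegree.get(m) ?? 0) - 1;
      indegree.set(m, remaining);
      if (remaining === 0) queue.push(m);
    }
  }

  // Guarded cost 2: one filter pass over ALL nodes per depth level.
  // On a linear chain depth D = N, so this loop alone is O(N²).
  const maxDepth = order.length > 0 ? Math.max(...order.map((n) => depth.get(n) ?? 0)) : 0;
  const levels: string[][] = [];
  for (let d = 0; d <= maxDepth; d++) {
    levels.push(order.filter((n) => depth.get(n) === d));
  }
  return { order, levels };
}
```

On a 1000-step linear chain the level-filter pass alone touches 10⁶ node/depth pairs, which is the scaling behaviour this assertion pins.
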
50 changes: 50 additions & 0 deletions packages/core/__perf__/expression.perf.ts
@@ -0,0 +1,50 @@
// =============================================================================
// Perf assertion: evaluate() throughput on repeated expressions
// =============================================================================
// Pins the cost of evaluating the same `{{ ... }}` expression 10,000 times
// against varying contexts. Catches regressions in tokenize/parse hot path
// (e.g. accidental disabling of an AST cache once one is added in PR 3).
// Threshold calibrated against current `main` (5 runs, take worst, +25%).
// =============================================================================

import { describe, expect, test } from "vitest";
import { evaluate } from "../index.js";

/**
 * Calibrated threshold for 10,000 evaluate() calls on the same template.
 *
 * Calibration methodology: multiple runs on `main` across developer-machine
 * sessions with varying background load; take worst observed, multiply by 1.5
 * for headroom. The +50% (rather than the original +25%) reflects observed
 * variance on Windows developer machines.
 *
 * Calibration data captured 2026-05-07 on Node v22.18.0:
 *   quiet runs:  135ms, 197ms, 234ms, 268ms, 382ms
 *   loaded runs: 617ms
 *   worst observed = 617ms → ceil(617 × 1.5) = 926ms → 1000ms (rounded)
 */
const EVAL_10K_THRESHOLD_MS = 1000;

describe("perf: evaluate() throughput", () => {
  test(`evaluate same expression 10,000 times <${EVAL_10K_THRESHOLD_MS}ms`, () => {
    const tmpl = "{{ output.findings.length > 3 && output.confidence >= 0.6 }}";

    // Warm-up: prime the parser path.
    for (let i = 0; i < 100; i++) {
      evaluate(tmpl, { output: { findings: [], confidence: 0 } });
    }

    const t0 = performance.now();
    for (let i = 0; i < 10_000; i++) {
      evaluate(tmpl, {
        output: {
          findings: new Array(i % 5),
          confidence: (i % 100) / 100,
        },
      });
    }
    const elapsed = performance.now() - t0;

    expect(elapsed).toBeLessThan(EVAL_10K_THRESHOLD_MS);
  });
});
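
For context, a rough sketch of the kind of AST cache PR 3 could introduce; `parse` and `execute` here are hypothetical stand-ins for the expression module's internals, not its real API:

```ts
// Illustrative sketch of the PR 3 idea: cache the parsed AST per template
// string so repeated evaluate() calls pay the tokenize/parse cost once.
// `parse` and `execute` are hypothetical stand-ins, not the module's real API.
function makeCachedEvaluate<AST>(
  parse: (template: string) => AST,
  execute: (ast: AST, context: Record<string, unknown>) => unknown,
) {
  const astCache = new Map<string, AST>();
  return (template: string, context: Record<string, unknown>): unknown => {
    let ast = astCache.get(template);
    if (ast === undefined) {
      ast = parse(template); // cache miss: tokenize + parse once
      astCache.set(template, ast);
    }
    return execute(ast, context); // per-call cost is execution only
  };
}
```

If such a cache were accidentally disabled, the 10,000-call loop above would regress back towards full parse cost per call and trip the threshold.
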
21 changes: 21 additions & 0 deletions packages/core/vitest.perf.config.ts
@@ -0,0 +1,21 @@
// =============================================================================
// Vitest config for the bench suite (`npm run bench`)
// =============================================================================
// Picks up only `__perf__/**/*.perf.ts`, runs them serially in a single fork
// for stable timings, and bypasses the default `**/*.test.ts` glob so the
// bench suite never runs as part of `npm test`.
// =============================================================================

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["**/__perf__/**/*.perf.ts"],
    // One fork, serialised, to minimise cross-test interference on timings.
    // (vitest 4 moved pool sub-options to top level; `pool: "forks"` plus
    // per-file warm-up is sufficient for stable timings here.)
    pool: "forks",

P2: Benchmark config claims single-worker/serialized execution, but pool: "forks" alone does not prevent parallel workers, so perf timings can still be noisy.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/vitest.perf.config.ts, line 17:

<comment>Benchmark config claims single-worker/serialized execution, but `pool: "forks"` alone does not prevent parallel workers, so perf timings can still be noisy.</comment>

<file context>
@@ -0,0 +1,21 @@
+		// One fork, serialised, to minimise cross-test interference on timings.
+		// (vitest 4 moved pool sub-options to top level; `pool: "forks"` plus
+		// per-file warm-up is sufficient for stable timings here.)
+		pool: "forks",
+		// 60s ceiling — well above any realistic threshold; only fires on hangs.
+		testTimeout: 60_000,
</file context>

    // 60s ceiling — well above any realistic threshold; only fires on hangs.
    testTimeout: 60_000,
  },
});
Comment on lines +11 to +21

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find the vitest version in the project
find . -name "package.json" -type f | head -5 | xargs grep -l "vitest" | head -1 | xargs cat | grep -A 2 -B 2 "vitest"



🏁 Script executed:

# Let's check the specific file mentioned in the review
cat -n packages/core/vitest.perf.config.ts



🏁 Script executed:

# Search for any other vitest config files to understand the overall setup
find . -name "*vitest*" -type f | grep -v node_modules



🏁 Script executed:

# Let's check other vitest config files to see if maxWorkers is used elsewhere
cat packages/core/vitest.config.ts



🏁 Script executed:

# Check the main vitest config
cat vitest.config.ts



🏁 Script executed:

# Search for any references to singleFork, maxWorkers, or pool in the codebase
rg "(singleFork|maxWorkers|pool)" --type ts --type js



🌐 Web query:

vitest 4 pool forks maxWorkers configuration serialization

💡 Result:

In Vitest 4 (as of 2026), the default test execution pool is 'forks', which uses Node.js child_process.fork for running tests in isolated processes.[1][2] The maximum number of concurrent workers (including forks) is controlled by the top-level test.maxWorkers option, which accepts a number or percentage string (e.g., 4 or '50%') and defaults to all available parallelism (watch mode: half).[3] Finer control over the forks pool, such as the number of forks, is available via test.poolOptions.forks. Historical documentation shows options like minForks and maxForks,[4][5] alongside isolate and singleFork.[6][7] Official docs do not explicitly list all poolOptions.forks keys but confirm their use for pool-specific settings like isolate: false.[1][6] For IPC communication in the forks pool, Vitest uses child_process.fork with stdio: 'pipe' and serialization: 'advanced'. This was explicitly added in a recent refactor to simplify IPC channel serialization, replacing prior v8.serialize wrapping for config objects.[8] The fork options also support execArgv for custom Node flags.[9] Example configuration: import { defineConfig } from 'vitest/config' export default defineConfig({ test: { pool: 'forks', // default maxWorkers: '50%', poolOptions: { forks: { isolate: true, // default // maxForks?: number (from historical docs) execArgv: ['--max-old-space-size=4096'] } } } }) [4][5][8] Note: Official docs prioritize maxWorkers for concurrency; poolOptions.forks provides pool-specific tuning but lacks full current documentation. For custom pools, implement deserialize in PoolWorker.[10]



🌐 Web query:

vitest singleFork deprecated vitest 4 maxWorkers

💡 Result:

In Vitest 4 (as of 2026-05-07), the singleFork option under poolOptions.forks is not deprecated. It remains a valid configuration for running all tests in a single forked child process when using the default pool: 'forks' [1][2][3][4]. Official migration guides for Vitest 4 do not list it as deprecated [1][5][6]. Recent discussions and code examples from 2025 continue to use it [2]. The migration guide shows singleFork: true alongside maxWorkers: 1 as an example configuration [1].



🌐 Web query:

vitest 4.1.0 pool forks default workers concurrent

💡 Result:

In Vitest 4.1.0, the default pool is 'forks', which uses child_process.fork to run test files in separate processes.[1][2][3] This has been the default since around Vitest 2.x for better compatibility with native modules.[4][5][3] The number of concurrent workers (forks/processes) is controlled by maxWorkers, which defaults to all available parallelism (os.availableParallelism) if not in watch mode, or half in watch mode.[6][7] poolOptions.forks.maxForks and minForks can override this, but defaults are not explicitly documented beyond examples showing maxForks: 2 (not authoritative).[8] For concurrency within a worker (e.g., test.concurrent), maxConcurrency defaults to 5.[9][10] File parallelism is enabled by default (fileParallelism: true).[11]



🌐 Web query:

vitest 4 poolOptions.forks singleFork configuration example

💡 Result:

In Vitest 4 (and current versions), configure poolOptions.forks.singleFork to true to run all tests in a single fork process instead of multiple parallel forks. This is useful for libraries with native dependencies (e.g., Prisma, bcrypt) or when process-related APIs like process.chdir() are needed, as forks support them unlike threads [1][2]. Configuration example in vitest.config.ts (or .js): import { defineConfig } from 'vitest/config' export default defineConfig({ test: { pool: 'forks', // Optional if default; explicitly set for clarity [1][2] poolOptions: { forks: { singleFork: true // Runs tests in single child_process.fork [2][5][9][10] } } } }) CLI equivalent: vitest --pool forks --poolOptions.forks.singleFork=true [3][4]. This replaces older --no-threads behavior, which now maps to --pool forks (multi-forks) or with singleFork: true for single process [3][4]. Official docs confirm forks pool uses child_process with configurable options via poolOptions.forks [1][2]. For workspaces, set per-project [5][6].



Add `poolOptions.forks.singleFork: true` to enforce single-fork serialization.

The config currently relies on `pool: "forks"` alone, which does not enforce serialization. By default, vitest 4 uses `maxWorkers` to run multiple concurrent forks, allowing the three `.perf.ts` files to execute in parallel and introduce CPU contention that inflates timing measurements.

In vitest 4, `singleFork: true` under `poolOptions.forks` is the correct mechanism to run all tests in a single forked process:

🔧 Proposed fix
 export default defineConfig({
 	test: {
 		include: ["**/__perf__/**/*.perf.ts"],
-		// One fork, serialised, to minimise cross-test interference on timings.
-		// (vitest 4 moved pool sub-options to top level; `pool: "forks"` plus
-		// per-file warm-up is sufficient for stable timings here.)
 		pool: "forks",
+		poolOptions: {
+			forks: {
+				singleFork: true, // Run all tests in a single fork for stable timings
+			},
+		},
 		// 60s ceiling — well above any realistic threshold; only fires on hangs.
 		testTimeout: 60_000,
 	},
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/vitest.perf.config.ts` around lines 11 - 21, The test pool
currently sets pool: "forks" but doesn't enforce single-fork serialization;
update the Vitest config created by defineConfig so that the test section
includes poolOptions with forks.singleFork set to true (i.e., add poolOptions: {
forks: { singleFork: true } }) so all "**/__perf__/**/*.perf.ts" tests run in
one fork and avoid parallel CPU contention; ensure you modify the object under
test in the exported default config where pool is set.