Skip to content

Enhance Trainer API with HMR, early-stop, and cleanup improvements#101

Open
k-taro56 wants to merge 79 commits into
mainfrom
eng-615
Open

Enhance Trainer API with HMR, early-stop, and cleanup improvements#101
k-taro56 wants to merge 79 commits into
mainfrom
eng-615

Conversation

@k-taro56

@k-taro56 k-taro56 commented May 2, 2026

Copy link
Copy Markdown
Contributor

This pull request migrates the Arkor build and development workflow from esbuild to Rolldown, introduces a robust hot module replacement (HMR) system for the Studio dev server, and refines the cleanup and shutdown logic for long-lived resources. Documentation has been updated to explain the new HMR behavior in both English and Japanese, and dependency management has been adjusted accordingly. These changes streamline the developer experience, enable live code updates during training runs, and improve resource handling during shutdown.

Build system migration and HMR integration:

  • Migrated the build system from esbuild to Rolldown in arkor build, ensuring bundles target the current Node version and keeping external dependencies unbundled. (packages/arkor/package.json, packages/arkor/src/cli/commands/build.ts, packages/arkor/src/core/rolldownConfig) [1] [2] [3]
  • Added a persistent Rolldown watcher for src/arkor/ in dev mode, enabling HMR for trainer code. The watcher notifies the SPA via SSE (/api/dev/events), and the dev server can hot-swap callbacks or gracefully early-stop training runs based on config changes. (packages/arkor/src/cli/commands/dev.ts, packages/arkor/src/studio/hmr.ts, packages/arkor/src/studio/trainRegistry.ts) [1] [2] [3] [4] [5] [6]

Cleanup and shutdown improvements:

  • Introduced a reusable registerCleanupHook utility to handle resource cleanup on process exit and signals, ensuring proper teardown order (e.g., HMR watcher before token file removal). (packages/arkor/src/cli/cleanupHooks.ts, packages/arkor/src/cli/commands/dev.ts) [1] [2]

Documentation updates:

  • Updated English and Japanese docs to describe the new HMR workflow, explaining how live code changes are handled during training and how the system differentiates between callback-only and config changes. (docs/concepts/studio.mdx, docs/ja/concepts/studio.mdx, AGENTS.md) [1] [2] [3]
  • Clarified the use of Rolldown in the build process and updated package policy documentation accordingly. (AGENTS.md)

Dependency and test adjustments:

  • Removed esbuild from dependencies and added Rolldown. Updated test comments to reference Rolldown instead of esbuild. (packages/arkor/package.json, packages/arkor/src/cli/commands/start.test.ts) [1] [2]

Security and internal API notes:

  • Documented the security-sensitive allow-list for SSE token query parameters and emphasized that HMR/early-stop internals remain behind Symbol.for brands, not exposed on the public SDK surface. (AGENTS.md) [1] [2]

Summary by CodeRabbit

  • New Features
    • Studio dev-loop HMR with SSE rebuild events, callback hot-swap, and automatic in-progress training restart/hot-swap; build tooling migrated to Rolldown.
    • Manifest payloads now include a configuration hash, and Studio exposes an HMR-enabled flag to the app.
  • Bug Fixes / Reliability
    • More robust signal shutdown/cleanup coordination and safer, concurrent token/CSRF handling.
    • Deterministic config hashing to drive HMR decisions; improved cache-busting for dynamic imports.
  • Documentation
    • Updated Studio security and dev-loop workflow docs (including Japanese).
  • Tests
    • Expanded unit and E2E coverage for HMR, SSE routing/token defenses, runner shutdown, trainer inspection, and CLI/server flows.

k-taro56 added 7 commits May 2, 2026 21:27
- Integrated Rolldown for hot module replacement (HMR) in `arkor dev`, allowing real-time updates to the training interface without page refresh.
- Implemented `requestEarlyStop` and `replaceCallbacks` methods in the Trainer API to facilitate graceful stopping of training jobs and dynamic callback updates during execution.
- Updated documentation to reflect new features and usage patterns for improved developer guidance.
- Adjusted cleanup logic for better resource management during development sessions.
…l shutdown

- Introduced `replaceCallbacks` method in the Trainer API to allow dynamic updates of lifecycle callbacks during training runs.
- Enhanced signal handling for graceful early stopping, ensuring in-flight checkpoints are preserved during HMR rebuilds.
- Added `registerCleanupHook` for streamlined resource management on process exit, improving cleanup logic across development commands.
- Updated documentation to reflect new features and usage patterns for better developer guidance.
…internal callback swapping

- Removed the public `replaceCallbacks` method from the Trainer interface to prevent exposure of the hot-swapping functionality.
- Introduced an internal mechanism for callback swapping using a `Symbol.for`-keyed brand, allowing for dynamic updates during training runs without affecting the public API.
- Updated signal handling to ensure seamless integration with the new callback swapping logic, enhancing the hot module replacement (HMR) experience.
- Revised documentation to reflect changes in the Trainer API and clarify the internal callback management process.
…lement internal early-stop handling

- Removed the public `requestEarlyStop` method from the Trainer interface to prevent exposure of the early-stop functionality.
- Introduced an internal mechanism for early stopping using a `Symbol.for`-keyed brand, allowing for graceful stopping after the next checkpoint without affecting the public API.
- Updated signal handling to ensure seamless integration with the new early-stop logic, enhancing the hot module replacement (HMR) experience.
- Revised documentation to reflect changes in the Trainer API and clarify the internal early-stop management process.
…llback replacement

- Updated the Trainer API to remove public exposure of `requestEarlyStop` and `replaceCallbacks` methods, enhancing encapsulation.
- Implemented internal mechanisms for early stopping and callback swapping using `Symbol.for`-keyed brands, ensuring seamless integration during training runs.
- Revised signal handling to improve the hot module replacement (HMR) experience and maintain clean resource management.
- Updated documentation to reflect these changes and clarify the internal management processes for developers.
- Replaced esbuild with Rolldown for building the project, ensuring external dependencies are resolved correctly.
- Implemented a cleanup hook system to manage resource disposal on process termination signals.
- Enhanced the development experience with hot module replacement (HMR) capabilities, allowing for seamless updates during training runs.
- Updated documentation to reflect changes in the development loop and HMR behavior.
- Added tests for new signal handling and configuration hashing functionality.
Copilot AI review requested due to automatic review settings May 2, 2026 17:02
@mintlify

mintlify Bot commented May 2, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
arkor-92aeef0e 🟢 Ready View Preview May 2, 2026, 5:03 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Comment thread packages/arkor/src/studio/server.ts Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8abc594369

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/arkor/src/studio/server.ts Outdated
Comment on lines +421 to +423
const hotSwapTargets = activeTrains.notifyCallbackReload(nextHash);
const restartTargets =
activeTrains.requestEarlyStopOnMismatch(nextHash);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Dispatch train signals only once per rebuild

notifyCallbackReload/requestEarlyStopOnMismatch run inside each /api/dev/events subscriber callback, so one rebuild sends signals once per connected SSE client (and again on reconnect when createHmrCoordinator.subscribe replays lastEvent). In the common case of two open Studio tabs, the same child can receive two SIGTERMs for one config-changing rebuild; the second signal triggers the runner’s forced-exit path (exit(143)) instead of checkpoint-preserving early stop. Move signal dispatch out of per-subscriber delivery (or dedupe per rebuild hash) so each child is signaled at most once per rebuild event.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades the local Studio dev experience by migrating the trainer bundling pipeline from esbuild to Rolldown, adding an HMR/SSE channel to propagate rebuild events to the Studio SPA, and introducing signal-based “hot-swap callbacks vs early-stop + restart” behavior for in-flight training subprocesses.

Changes:

  • Replace esbuild-based trainer bundling with Rolldown (one-shot build + watch/HMR).
  • Add /api/dev/events SSE stream + SPA client wiring to refresh manifest and coordinate restarts/hot-swaps.
  • Introduce internal trainer inspection/callback replacement/early-stop hooks (via Symbol.for brands), plus cleanup utilities and updated docs/tests.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pnpm-lock.yaml Removes esbuild and adds rolldown to the lockfile graph.
packages/arkor/package.json Swaps dependency from esbuild to rolldown.
packages/arkor/src/core/rolldownConfig.ts Centralizes Rolldown input options + entry/outDir resolution and Node target derivation.
packages/arkor/src/cli/commands/build.ts Implements arkor build via Rolldown bundling instead of esbuild.
packages/arkor/src/studio/hmr.ts Adds lazy Rolldown watcher coordinator that emits ready/rebuild/error events.
packages/arkor/src/studio/hmr.test.ts Tests HMR coordinator behavior (ready/rebuild/error/replay/dispose).
packages/arkor/src/studio/manifest.ts Adds configHash to manifest summary and splits “summarise built manifest” vs “build + summarise”.
packages/arkor/src/studio/trainRegistry.ts Adds per-child registry + policy for SIGUSR2 hot-swap vs SIGTERM early-stop.
packages/arkor/src/studio/trainRegistry.test.ts Tests TrainRegistry signaling decisions and error tolerance.
packages/arkor/src/studio/server.ts Adds /api/dev/events SSE route and integrates TrainRegistry into rebuild handling.
packages/arkor/src/studio/server.test.ts Adds tests for /api/dev/events token rules, loopback guard, and SSE framing.
packages/arkor/src/core/configHash.ts Adds stable hashing of JobConfig for “callbacks-only vs restart” decisions.
packages/arkor/src/core/configHash.test.ts Tests determinism and sensitivity of hashJobConfig.
packages/arkor/src/core/trainerInspection.ts Introduces internal inspection + callback replacement + early-stop brands via Symbol.for.
packages/arkor/src/core/trainer.ts Implements callback hot-swap support + early-stop latch; attaches internal brands.
packages/arkor/src/core/trainer.test.ts Adds coverage for early-stop and callback hot-swap behavior via internal brands.
packages/arkor/src/core/runnerSignals.ts Adds signal handlers: SIGTERM graceful early-stop + SIGUSR2 callback reload.
packages/arkor/src/core/runnerSignals.test.ts Tests SIGTERM/SIGUSR2 handler behaviors with branded trainers.
packages/arkor/src/core/runner.ts Installs/removes new signal handlers around trainer.start()/trainer.wait().
packages/arkor/src/core/runner.test.ts Adds a test for SIGTERM early-stop behavior in the runner.
packages/arkor/src/cli/cleanupHooks.ts Adds reusable cleanup-hook registration for exit/signals.
packages/arkor/src/cli/commands/dev.ts Wires HMR coordinator into Studio server and refactors shutdown cleanup via cleanup hooks.
packages/arkor/src/cli/commands/start.test.ts Updates test comment to refer to Rolldown instead of esbuild.
packages/studio-app/src/lib/api.ts Adds typed DevEvent + openDevEvents() for SSE notifications.
packages/studio-app/src/components/RunTraining.tsx SPA listens to dev SSE events, refreshes manifest, and coordinates restart/hot-swap UI state.
docs/concepts/studio.mdx Documents HMR workflow, including hash-based hot-swap vs restart behavior (EN).
docs/ja/concepts/studio.mdx Documents HMR workflow (JA).
AGENTS.md Updates repository policy/docs for new SSE allow-list, HMR internals, and Rolldown build pipeline.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

const [log, setLog] = useState("");
const [manifest, setManifest] = useState<ManifestResult | null>(null);
const [hmrStatus, setHmrStatus] = useState<
"idle" | "rebuilding" | "early-stopping" | "restarting" | "hot-swapped"
Comment thread packages/arkor/src/studio/server.ts Outdated
import { ensureProjectState } from "../core/projectState";
import { readState } from "../core/state";
import { readManifestSummary } from "./manifest";
import { readManifestSummary, summariseBuiltManifest } from "./manifest";
Comment on lines +190 to 196
function scheduleHmrCleanup(hmr: { dispose: () => Promise<void> }): void {
// Registered before the studio-token cleanup so it runs first on
// shutdown — Node fires signal handlers in registration order, and we
// want the watcher to release file handles before the outermost
// process.exit.
registerCleanupHook({ cleanup: () => hmr.dispose() });
}
Comment on lines +97 to +113
requestEarlyStopOnMismatch(
nextConfigHash: string | null,
): RestartTarget[] {
const targets: RestartTarget[] = [];
for (const [pid, entry] of this.entries) {
if (
nextConfigHash === null ||
entry.configHash === null ||
entry.configHash !== nextConfigHash
) {
try {
entry.child.kill("SIGTERM");
} catch {
// child already exited; close handler will clean up.
}
targets.push({ pid, trainFile: entry.trainFile });
}
Comment thread packages/arkor/src/core/trainer.ts Outdated
Comment on lines +275 to +284
await callbacks.onCheckpoint?.(ctx);
// Early-stop latch: a checkpoint just landed, so the in-flight work
// is durable. Cancel the cloud job and end `wait()` cleanly.
if (earlyStopRequested && earlyStopDeferred) {
await trainer.cancel();
if (earlyStopDeferred.timer) clearTimeout(earlyStopDeferred.timer);
earlyStopDeferred.resolve();
earlyStopDeferred = null;
return { terminal: true, artifacts: terminalResult?.artifacts ?? [] };
}
Comment on lines +129 to +137
function startWatcher(): void {
if (watcher || disposed) return;
if (!existsSync(resolved.entry)) {
broadcast({
type: "error",
message: `Build entry not found: ${resolved.entry}. Create ${BUILD_DEFAULTS.entry} or pass an explicit entry argument.`,
});
return;
}
@k-taro56 k-taro56 self-assigned this May 3, 2026
…ous BUNDLE_END on macOS

- Modified the test to assert that the late subscriber receives the same event as the prior subscriber, addressing issues with spurious BUNDLE_END events in rolldown@1.0.0-rc.17 on macOS.
- Added comments to clarify the reasoning behind the change and noted a future task to revisit the assertion after rolldown stabilizes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31f510869c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +71 to +72
restartPendingRef.current = true;
setHmrStatus(runningRef.current ? "early-stopping" : "idle");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Ignore restart hints when no local run is active

restartPendingRef is set for every payload.restart event even when this tab is not running a training stream. Since /api/dev/events broadcasts restart events to all connected tabs, a passive tab can cache restartPendingRef=true from another tab’s run; the next time the user runs training in this tab, finally will auto-trigger an extra run, creating an unintended duplicate job and extra cloud/GPU spend. Only mark restart pending when runningRef.current is true (or otherwise correlate the restart target to the local run).

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +109 to +112
} catch {
// child already exited; close handler will clean up.
}
targets.push({ pid, trainFile: entry.trainFile });
Comment thread packages/arkor/src/studio/server.ts Outdated
import { ensureProjectState } from "../core/projectState";
import { readState } from "../core/state";
import { readManifestSummary } from "./manifest";
import { readManifestSummary, summariseBuiltManifest } from "./manifest";
const [log, setLog] = useState("");
const [manifest, setManifest] = useState<ManifestResult | null>(null);
const [hmrStatus, setHmrStatus] = useState<
"idle" | "rebuilding" | "early-stopping" | "restarting" | "hot-swapped"
Comment on lines +276 to 285
// Early-stop latch: a checkpoint just landed, so the in-flight work
// is durable. Cancel the cloud job and end `wait()` cleanly.
if (earlyStopRequested && earlyStopDeferred) {
await trainer.cancel();
if (earlyStopDeferred.timer) clearTimeout(earlyStopDeferred.timer);
earlyStopDeferred.resolve();
earlyStopDeferred = null;
return { terminal: true, artifacts: terminalResult?.artifacts ?? [] };
}
return { terminal: false, artifacts: terminalResult?.artifacts ?? [] };
…ce failures

- Added a test to verify that the SIGINT exit handler is armed even if the `persistStudioToken` function fails, preventing the dev server from idling indefinitely.
- Updated the `runDev` function to register the studio-token cleanup hook unconditionally, ensuring it executes on process termination signals regardless of token persistence success.
- Enhanced cleanup logic to maintain proper signal handling during development sessions, improving overall stability and user experience.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55b4a2e8f6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +66 to +73
if (payload.restart) {
// `/api/dev/events` is a broadcast — every open Studio tab gets
// this event. Only flip the auto-restart latch when *this* tab
// is actually running a stream right now; otherwise a passive
// tab would silently auto-spawn an extra job the next time the
// user clicks Run training, doubling cloud spend.
if (runningRef.current) {
// Training run is early-stopping; the active stream will

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope restart latch to runs actually signaled for restart

The handler sets restartPendingRef for any payload.restart while this tab is running, but rebuild events are broadcast across all tabs and can contain mixed outcomes (restartTargets and hotSwapTargets) for different subprocesses. In a multi-run setup, a tab whose run was only hot-swapped can still latch restartPendingRef=true, then auto-spawn an extra training job when its current run eventually finishes, increasing unintended cloud/GPU usage.

Useful? React with 👍 / 👎.

Comment on lines +135 to +136
entry.child.kill("SIGTERM");
entry.earlyStopRequested = true;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check signal delivery before marking restart targets

ChildProcess.kill() returns a boolean and does not throw when a process is already gone, but this path treats any non-throw as success. If kill("SIGTERM") returns false during an exit race, the entry is still marked earlyStopRequested and added to targets, so the UI may wait for/restart a run that was never actually signaled.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +46
useEffect(() => {
const es = openDevEvents();
const onMessage = (raw: MessageEvent) => {
Comment on lines +86 to +90
setHmrStatus("hot-swapped");
window.setTimeout(() => {
setHmrStatus((s) => (s === "hot-swapped" ? "idle" : s));
}, 1500);
} else {
Comment thread packages/arkor/src/studio/server.ts Outdated
},
cancel() {
activeTrains.unregister(child.pid);
child.kill();
Comment thread packages/arkor/src/cli/cleanupHooks.ts Outdated
Comment on lines +34 to +59
// Synchronous wrapper so signal handlers preserve "cleanup landed
// before this function returns" — important for sync cleanups (e.g.
// `unlinkSync`) and for tests that assert the side effect right after
// invoking the handler. Async cleanups are fire-and-forget with a
// catch so a hung dispose doesn't block exit.
const run = (): void => {
if (done) return;
done = true;
try {
const result = options.cleanup();
if (result && typeof (result as Promise<void>).catch === "function") {
(result as Promise<void>).catch(() => {
// best-effort: shutdown is racing other cleanup paths
});
}
} catch {
// best-effort
}
};

process.on("exit", run);

for (const sig of TERMINATING_SIGNALS) {
process.on(sig, () => {
run();
if (options.exitOnSignal) process.exit(0);

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa6a826f27

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/arkor/src/studio/server.ts Outdated
Comment on lines +496 to +498
const configHash: string | null = options.hmr
? options.hmr.getCurrentConfigHash()
: null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route explicit train files with their own HMR hash

When /api/train is called with body.file, the spawned arkor start <file> rebuilds that explicit entry into .arkor/build/index.mjs, but this snapshot still records the HMR coordinator's current hash for the default src/arkor/index.ts. In HMR-enabled Studio, a run launched from an alternate file can therefore be classified for SIGUSR2/restart based on a different artifact, and the default watcher may overwrite the same index.mjs that the child reloads from; e.g. editing only the default entry can hot-swap callbacks into an alternate-entry run from the wrong bundle. Treat explicit files as unhashable/restart-only or build/watch a separate artifact for that entry.

Useful? React with 👍 / 👎.

Comment on lines +189 to +193
try {
await trainer.cancel();
} catch {
// intentionally ignored; see comment above.
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate fallback cancel failures

For hand-rolled trainers that the runner accepts but that lack the internal early-stop brand, a SIGTERM/HMR stop falls back to trainer.cancel() here. If that cancel rejects (for example, a transient cloud/network failure), this catch resolves requestTrainerEarlyStop() successfully, so installShutdownHandlers exits with code 0 even though the remote job may still be running; this reintroduces the orphaned-job false-success path for the unbranded trainer shape.

Useful? React with 👍 / 👎.

Comment on lines +382 to +384
const manifest = await readManifestSummary(trainCwd, {
prebuiltOutFile: hmrOutFile,
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid serving stale artifacts before HMR is ready

On HMR-enabled startup, this always passes the prebuilt path to readManifestSummary, which imports it whenever .arkor/build/index.mjs already exists. If that file is left over from a previous build and the watcher has not emitted its first ready yet, the first /api/manifest poll can show an old trainer/config; because arkor start also skips rebuilding when the artifact exists, a quick Run click can launch the stale artifact until the watcher catches up. Gate this fast path on the coordinator having a successful current build, not just file existence.

Useful? React with 👍 / 👎.

Comment on lines +376 to +379
if (process.platform !== "win32") {
const r = safeKill(entry.child, "SIGUSR2");
if (r === "ok") {
hotSwapTargets.push(target);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Delay hot-swap signals until the child is ready

A same-hash rebuild can hit this branch immediately after /api/train registers a newly spawned child, before that child has imported runner.ts and installed its SIGUSR2 callback-reload handler. In that startup window on Unix, child.kill("SIGUSR2") can terminate the Node process with the default signal action while the server reports it as a hot-swap target, so the SPA does not schedule a restart and the user sees the run die instead of continue. Add a child-ready marker before allowing SIGUSR2, or route pre-ready children through restart.

Useful? React with 👍 / 👎.

Comment thread packages/arkor/package.json Outdated
Comment on lines +66 to +70
"@types/node": "^24",
"@vitest/coverage-v8": "^4.1.5",
"tsdown": "^0.22.0",
"typescript": "^5",
"vitest": "^4.1.5"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep published devDependencies catalog-bound

The repository's AGENTS.md dependency-catalog rule says the published packages' runtime dependencies are the only literal-semver carve-out and that their devDependencies stay catalog:-bound. Replacing these with literals here (and the same pattern in packages/create-arkor/package.json) breaks the single-source-of-truth catalog policy, so future bumps of TypeScript/Vitest/tsdown can drift between the published packages and the rest of the workspace instead of being one-line catalog updates.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- Introduced a NO_OP_DISPOSE function for consistent signal registration failure handling.
- Enhanced shutdown signal handling to ensure accurate exit codes based on received signals.
- Updated requestTrainerEarlyStop to handle errors more gracefully and ensure proper exit codes.
- Improved callback reload handler to avoid crashing on unsupported platforms.
- Refactored trainer test cases to utilize Response.json for consistency and clarity.
- Updated various promises to use void for better readability and to avoid unhandled promise rejections.
- Enhanced type safety and clarity in several areas, including the use of readonly arrays and improved error handling.
- Cleaned up unnecessary comments and improved code readability across multiple files.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@coderabbitai

coderabbitai Bot commented May 31, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

Adds Rolldown-based Studio HMR across build, runtime, server, CLI, and frontend paths. It introduces deterministic config hashing, signal-based early-stop and callback hot-swap, SSE-driven rebuild events, train restart routing, updated dev tooling, and expanded tests/docs.

Changes

Studio HMR End-to-End Flow

Layer / File(s) Summary
Build and shared runtime contracts
packages/arkor/package.json, pnpm-workspace.yaml, packages/arkor/src/core/rolldownConfig.ts, packages/arkor/src/core/moduleCacheBust.ts, packages/arkor/src/core/configHash.ts, packages/arkor/src/core/signalExit.ts, packages/arkor/src/cli/commands/build.ts
Replaces esbuild with Rolldown, adds shared build-path helpers, cache-busting, deterministic JobConfig hashing, and shared signal exit codes, with tests for hashing and cache-bust behavior.
Trainer lifecycle and signal handling
packages/arkor/src/cli/cleanupHooks.ts, packages/arkor/src/cli/cleanupHooks.test.ts, packages/arkor/src/core/runnerSignals.ts, packages/arkor/src/core/runnerSignals.test.ts, packages/arkor/src/core/trainerInspection.ts, packages/arkor/src/core/trainerInspection.test.ts, packages/arkor/src/core/trainer.ts, packages/arkor/src/core/trainer.test.ts, packages/arkor/src/core/runner.ts, packages/arkor/src/core/runner.test.ts, eslint.config.ts
Adds cleanup-hook coordination, two-stage shutdown and callback reload handlers, trainer inspection and callback replacement hooks, runner nonce output handling, and trainer early-stop internals with supporting tests and lint exceptions.
HMR coordinator and train registry
packages/arkor/src/studio/hmr.ts, packages/arkor/src/studio/hmr.test.ts, packages/arkor/src/studio/trainRegistry.ts, packages/arkor/src/studio/trainRegistry.test.ts, packages/arkor/src/studio/manifest.ts
Implements HMR event/state flow, bundle inspection and manifest summarization, and train registry rebuild dispatch policy with platform-specific signaling and supporting tests.
Studio server HMR routes and train orchestration
packages/arkor/src/studio/server.ts, packages/arkor/src/studio/server.test.ts
Extends Studio server options and meta injection, tightens stream token allow-listing, rewrites /api/train and /api/dev/events orchestration with train registry integration, and adds comprehensive server tests.
CLI dev wiring and token cleanup
packages/arkor/src/cli/commands/dev.ts, packages/arkor/src/cli/commands/dev.test.ts, packages/arkor/src/cli/commands/start.test.ts
Integrates the HMR coordinator into runDev, replaces manual token cleanup with cleanup-hook registration and timing-safe token identity checks, and updates CLI tests around async cleanup and token-file concurrency.
Frontend HMR support
packages/studio-app/src/lib/api.ts, packages/studio-app/src/lib/api.test.ts, packages/studio-app/src/components/RunTraining.tsx, packages/studio-app/vite.config.ts, packages/studio-app/src/components/RunTraining.hmr.test.tsx
Adds HMR detection and dev-event streaming APIs, propagates spawned PID from /api/train, and implements RunTraining restart and hot-swap state coordination with supporting tests and Vite dev meta injection.
Documentation and end-to-end validation
AGENTS.md, docs/concepts/studio.mdx, docs/ja/concepts/studio.mdx, e2e/cli/vitest.config.ts, e2e/studio/src/specs/hmr.spec.ts, e2e/cli/src/arkor-build.test.ts, package.json
Updates Studio security and HMR docs, refreshes dev-loop narratives, expands end-to-end HMR validation, and clarifies the related test harness timeout comment.

Estimated code review effort: 4 (Complex) | ~60 minutes

Suggested reviewers: soleil-colza

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.02% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title matches the main themes of the PR: HMR, early-stop handling, and cleanup improvements for the trainer workflow.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch eng-615
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch eng-615
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch eng-615

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4030b9c43

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Clearing here makes the user's broken-state edit the
// source of truth: no auto-restart fires until the next
// successful rebuild re-arms the latch.
restartPendingRef.current = false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve restart after a fixed HMR error

When an HMR error arrives after a prior rebuild already put this child into the registry's earlyStopRequested state, clearing restartPendingRef here loses the only restart signal for that child. The server's rebuild dispatch skips entries with earlyStopRequested, so if the user fixes the compile error before the early-stop child exits, the successful rebuild will not include this pid in restartTargets and the SPA cannot re-arm the latch; the child then exits and stays idle instead of restarting with the fixed bundle.

Useful? React with 👍 / 👎.

Comment on lines +221 to +222
void fetchManifest()
.then(setManifest)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore stale manifest fetches after HMR errors

Fresh evidence beyond the server-side /api/manifest guard: a ready/rebuild event can start this async fetchManifest(), then a newer HMR error event can set the manifest to { error } before this request resolves. Because the promise is not versioned or cancelled, its later setManifest can overwrite the error with the last-good manifest, re-enabling the Run button for stale code until a later poll restores the error state.

Useful? React with 👍 / 👎.

Comment thread packages/arkor/src/studio/hmr.ts Outdated
Comment on lines +218 to +231
async function inspectBundle(outFile: string): Promise<InspectionResult> {
try {
const mod = (await import(moduleCacheBustUrl(outFile))) as Record<
string,
unknown
>;
const inspection = findInspectableTrainer(mod);
if (!inspection) return null;
return {
configHash: hashJobConfig(inspection.config),
trainerName: inspection.name,
};
} catch {
return null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate inspection failures to HMR error state

If a rebuilt module throws during evaluation (for example top-level user code throws), Rolldown still emits BUNDLE_END, but this catch converts the failed import into configHash: null and a successful ready/rebuild event. That means /api/manifest will not stay in the HMR error state and the registry may SIGTERM-restart active runs against an artifact that arkor start cannot import, leaving the SPA to churn into a failed restart instead of surfacing the build/import failure as an error.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/arkor/src/core/trainerInspection.test.ts (1)

91-121: ⚡ Quick win

Add the missing default: trainer inspection case.

This block calls out shape #4 in the regression note, but it never asserts findInspectableTrainer({ default: trainer }). A direct export default createTrainer(...) path can regress without failing this suite.

Proposed test addition
   const trainerC = brandedTrainer("default-arkor");
   const inspectionC = findInspectableTrainer({
     default: createArkor({ trainer: trainerC }),
   });
   expect(inspectionC?.name).toBe("default-arkor");

-  const trainerD = brandedTrainer("default-nested");
+  const trainerD = brandedTrainer("default-direct");
+  const inspectionD = findInspectableTrainer({ default: trainerD });
+  expect(inspectionD?.name).toBe("default-direct");
+
+  const trainerE = brandedTrainer("default-nested");
+  const inspectionE = findInspectableTrainer({
+    default: { trainer: trainerE },
+  });
+  expect(inspectionE?.name).toBe("default-nested");
-  const inspectionD = findInspectableTrainer({
-    default: { trainer: trainerD },
-  });
-  expect(inspectionD?.name).toBe("default-nested");
As per coding guidelines, "Add vitest test cases under `packages/*/src/**/*.test.ts` for SDK/CLI/scaffold logic changes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/core/trainerInspection.test.ts` around lines 91 - 121, Add
a new assertion to cover the missing "default: trainer" shape by calling
findInspectableTrainer with an object whose default property is the bare
trainer; specifically create a trainer via brandedTrainer (e.g.,
brandedTrainer("default-bare")) and assert that findInspectableTrainer({
default: trainer }) returns the expected inspection (check .name), similar to
the other cases in the test that use findInspectableTrainer, brandedTrainer, and
createArkor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/arkor/src/core/configHash.ts`:
- Around line 64-85: stableStringifyRec currently calls a value's toJSON before
adding the object to the seen set and uses Array.prototype.map which skips
sparse array holes; change it so that when maybeToJSON is a function you add the
current value to seen before invoking (so circular returns from toJSON are
detected like JSON.stringify), then call (maybeToJSON as (key: string) =>
unknown).call(value, key) and recurse on that result; for arrays, stop using
value.map and instead iterate numeric indices 0..value.length-1, treat missing
slots (i in value === false) and slots that serialize to undefined as "null"
(i.e. use stableStringifyRec per index and fall back to "null") to match
JSON.stringify behavior and ensure circulars are detected via the seen set.

In `@packages/arkor/src/core/trainer.test.ts`:
- Around line 1817-1849: The test's fetcher mock expects events for job id
"j-falsy" but the create-job stub returns minimalJobRow.id (currently "j-stop"),
causing wait() to hit the wrong stream URL; update the create-job response to
return a job with id "j-falsy" (i.e. ensure minimalJobRow.id or the object
returned in the POST branch matches "j-falsy") so the GET to
/v1/jobs/j-falsy/events/stream in the fetcher mock is exercised by the
Trainer.wait() flow.

---

Nitpick comments:
In `@packages/arkor/src/core/trainerInspection.test.ts`:
- Around line 91-121: Add a new assertion to cover the missing "default:
trainer" shape by calling findInspectableTrainer with an object whose default
property is the bare trainer; specifically create a trainer via brandedTrainer
(e.g., brandedTrainer("default-bare")) and assert that findInspectableTrainer({
default: trainer }) returns the expected inspection (check .name), similar to
the other cases in the test that use findInspectableTrainer, brandedTrainer, and
createArkor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 429d904f-75a0-497d-bfbb-9ea6225bd3e4

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0eee1 and d4030b9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (43)
  • AGENTS.md
  • docs/concepts/studio.mdx
  • docs/ja/concepts/studio.mdx
  • docs/ja/studio/jobs.mdx
  • e2e/cli/vitest.config.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • eslint.config.ts
  • packages/arkor/package.json
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/cli/cleanupHooks.ts
  • packages/arkor/src/cli/commands/build.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/start.test.ts
  • packages/arkor/src/core/configHash.test.ts
  • packages/arkor/src/core/configHash.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • packages/arkor/src/core/moduleCacheBust.ts
  • packages/arkor/src/core/projectState.test.ts
  • packages/arkor/src/core/rolldownConfig.ts
  • packages/arkor/src/core/runner.test.ts
  • packages/arkor/src/core/runner.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/core/runnerSignals.ts
  • packages/arkor/src/core/schemas.test.ts
  • packages/arkor/src/core/signalExit.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/trainerInspection.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/manifest.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/studio/trainRegistry.test.ts
  • packages/arkor/src/studio/trainRegistry.ts
  • packages/cli-internal/src/templates.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/vite.config.ts
  • pnpm-workspace.yaml

Comment thread packages/arkor/src/core/configHash.ts Outdated
Comment thread packages/arkor/src/core/trainer.test.ts

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…andling

fix: improve inspectBundle error handling and manifest fetching logic in HMR

test: add assertions for default-exported trainer shapes in trainerInspection tests

feat: implement monotonic counter for HMR SSE handler to prevent stale state

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e726548bca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +96 to +98
// restored the error. The polling effect's own fetches don't need
// this because they observe `cancelled` synchronously inside the
// `tick()` closure; this counter only protects the cross-handler

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Version manifest polling against HMR errors

When HMR is enabled, a regular 5s manifest poll that started before a later error SSE can still resolve after the error handler has set { error }, because this code explicitly leaves polling fetches outside the manifestFetchSeqRef guard. In projects whose manifest import/build is slow (for example top-level await or filesystem contention), that stale poll can overwrite the current build-error state and re-enable the Run button for the last-good artifact until the next poll restores the error.

Useful? React with 👍 / 👎.

Comment on lines +452 to +456
if (restartPendingRef.current) {
// Fast path: SSE event already landed before exit. Fire the
// restart synchronously without waiting for the grace
// window so the common case has no perceptible delay.
restartPendingRef.current = false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Suppress auto-restart after failed early-stop exits

When an HMR-triggered early-stop fails to cancel the cloud job, the runner now exits nonzero (exit=143), but streamTraining() still resolves normally for any /api/train stream that closed after a 200 response. If restartPendingRef is set, this branch auto-spawns a new run without checking that exit marker, so the fixed nonzero exit path is still treated as restartable and can overlap a new cloud job with the one whose cancel failed.

Useful? React with 👍 / 👎.

Comment thread packages/arkor/src/studio/server.ts Outdated
Comment on lines +911 to +912
const { baseUrl: rpcBaseUrl, token: rpcToken } =
await resolveCredentialsAndBaseUrl();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Capture cancel credentials at spawn time

This server-side Stop path re-reads credentials/base URL at cancel time, while the child created the cloud job using whatever credentials it loaded at trainer.start(). If the user logs in/out or otherwise changes ~/.arkor credentials while a run is in flight, Stop can send the cancel POST to the wrong account/control plane, swallow the failure, and then SIGKILL the child that still held the original credentials, leaving the original cloud job running.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e41acc7f28

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

// Hash mismatch (or one side is null): graceful restart.
const r = safeKill(entry.child, "SIGTERM");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid SIGTERM restarts on Windows

When Studio HMR runs on Windows and a rebuild requires a restart, this SIGTERM does not reach the runner's installShutdownHandlers; Node's subprocess.kill() docs state that on Windows these signals cause the process to be killed "forcefully and abruptly" (https://nodejs.org/api/child_process.html#subprocesskillsignal). In that environment the child cannot early-stop or send cancel(), so a config-changing rebuild can orphan the cloud job and lose the checkpoint-preservation behavior this path is meant to provide.

Useful? React with 👍 / 👎.


async function emitBuildSucceeded(): Promise<void> {
if (disposed) return;
const seq = ++buildSeq;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Invalidate stale inspections when rebuild starts

This sequence guard only advances after a later build reaches BUNDLE_END (or ERROR), so if the previous bundle's inspectBundle() is slow (for example user top-level await during dynamic import) and a new edit has already started compiling but has not ended yet, the older inspection still passes seq === buildSeq and broadcasts/signals active children for stale source. Rolldown's watch cycle includes BUNDLE_START before BUNDLE_END, so the in-flight inspection should be invalidated when the newer rebuild starts, not only when it finishes.

Useful? React with 👍 / 👎.

Comment thread packages/arkor/src/studio/hmr.ts Outdated
}
watcher = watch({
...rolldownInputOptions(resolved),
output: { file: resolved.outFile, format: "esm" },

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid importing while the watcher writes

With HMR enabled, the watcher writes .arkor/build/index.mjs at this same path while /api/train can concurrently spawn arkor start, whose runStart skips rebuilding whenever that artifact already exists. If the user clicks Run during the watcher's non-atomic write window, the child can import a partially-written bundle and fail the run instead of starting from a coherent artifact. Gate training while a rebuild is in progress or publish the watched bundle via temp-file-and-rename before making it visible to arkor start.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
packages/arkor/src/studio/server.ts (1)

1620-1624: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constrain static asset reads to assetsDir.

readAsset() joins the request path directly onto assetsDir, and app.get("*") feeds it c.req.path verbatim. Any traversal sequence that survives routing normalization can therefore escape the bundled assets tree and read arbitrary local files. Because these GETs are only host-guarded, not Studio-token gated, this is a local file disclosure on the loopback server.

🔒 Suggested hardening
 async function readAsset(relPath: string): Promise<Response | null> {
   const cleaned = relPath.replace(/^\/+/, "");
   try {
-      const file = await readFile(join(assetsDir, cleaned));
+      const filePath = resolve(assetsDir, cleaned);
+      if (
+        filePath !== assetsDir &&
+        !filePath.startsWith(`${assetsDir}${sep}`)
+      ) {
+        return null;
+      }
+      const file = await readFile(filePath);
       const ext = cleaned.slice(cleaned.lastIndexOf(".") + 1);
       if (ext === "html") {

Also applies to: 1647-1655

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/server.ts` around lines 1620 - 1624, readAsset
currently concatenates the request path onto assetsDir and can be abused for
path traversal; to fix, decode and normalize the incoming relPath, then compute
the absolute target with path.resolve(assetsDir, cleaned) (use the same resolved
base: const base = path.resolve(assetsDir)), and verify the resulting target is
inside base (e.g. use path.relative(base, target) and ensure it does not start
with ".." and does not contain path.sep at the front); if the check fails return
null (or an appropriate error) and do not call readFile. Also reject suspicious
inputs (null bytes, absolute paths) before resolving. Apply this same guard to
the other occurrence around lines 1647-1655.
packages/arkor/src/cli/commands/dev.test.ts (1)

663-682: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

These persist-failure tests do not actually fail on Windows.

Line 671 and Line 738 rely on chmodSync(..., 0o555) to block writes to ~/.arkor, but NTFS does not enforce directory writes that way. On Windows both tests can go green without ever hitting the persistStudioToken catch path, so this regression coverage disappears on part of the CI matrix. Use a platform-neutral failure trigger here, such as making studioTokenPath() a directory or mocking the write boundary.

Also applies to: 726-747

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/cli/commands/dev.test.ts` around lines 663 - 682, The
tests relying on chmodSync to make ~/.arkor read-only are flaky on Windows;
replace the platform-specific permission trick with a platform-neutral
write-failure trigger: locate the token path via studioTokenPath(...) (or the
exported persistStudioToken/writeCredentials function) and either (a) create a
directory at that exact path before calling runDev({ port: 4203 }) so the token
write will fail with ENOTDIR, or (b) mock/spyon the module method that writes
the token (persistStudioToken or writeCredentials) to throw an EACCES-like error
for this test; ensure you restore the original behavior in the finally block and
keep assertions (expect(runDev...).resolves.toBeUndefined and
expect(serve).toHaveBeenCalledTimes(1)) unchanged.
packages/studio-app/src/components/RunTraining.tsx (2)

141-167: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The poller can still overwrite an HMR error with stale manifest data.

Line 151 is only gated by cancelled, which flips on unmount, not on newer HMR events. If a /api/manifest poll started before an SSE error frame resolves afterward, this effect will restore the last-good manifest and re-enable Run against broken code until the next refresh. Reuse the same seq/generation gate for poll fetches, or invalidate in-flight polls when HMR events land.

Possible fix
   async function tick() {
+    const seq = ++manifestFetchSeqRef.current;
     try {
       const m = await fetchManifest();
-      if (!cancelled) setManifest(m);
+      if (!cancelled && seq === manifestFetchSeqRef.current) {
+        setManifest(m);
+      }
     } catch (err: unknown) {
-      if (!cancelled) {
+      if (!cancelled && seq === manifestFetchSeqRef.current) {
         setManifest({
           error: err instanceof Error ? err.message : String(err),
         });
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/studio-app/src/components/RunTraining.tsx` around lines 141 - 167,
The poller can overwrite a newer HMR error because it only checks cancelled;
modify the effect to use a generation/sequence gate: create a ref (e.g.,
manifestSeqRef) that you increment whenever an SSE/HMR error frame or any HMR
event invalidates the manifest, capture the current seq at the start of tick
(before calling fetchManifest), then before calling setManifest or scheduling
the next timer ensure the captured seq still equals manifestSeqRef.current; if
it differs, discard the poll result (and do not schedule another timeout). Keep
using the existing symbols (useEffect, cancelled, timer, tick, fetchManifest,
setManifest) but add and check the manifestSeqRef around the async fetch and
before state updates so in-flight polls are invalidated when HMR events arrive.

334-510: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add a jsdom test suite for this restart state machine.

This component now owns the risky HMR logic - buffered pre-spawn events, late-SSE grace windows, and "Stop beats auto-restart" arbitration - but the provided coverage only hits API helpers and a caption-refresh E2E. A small Testing Library suite here would catch regressions before they become flaky cross-process Playwright failures. As per coding guidelines: Include jsdom-based Testing Library cases for Studio components. Based on learnings: Do not split implementation across multiple PRs; land docs, tests, and code in the same PR unless the user explicitly says otherwise.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/studio-app/src/components/RunTraining.tsx` around lines 334 - 510,
Add a jsdom Testing Library suite for the RunTraining component exercising the
run() state machine: write tests to simulate streamTraining callbacks and
SSE-like pre-spawn events to validate that pendingPreSpawnEventsRef is drained,
restartPendingRef wins over hot-swap, user Abort
(trainingAbortRef)/ac.signal.aborted prevents logging and cancels restarts, and
the 250ms restart grace window behavior (currentPidRef, restartGraceTimerRef)
toggles hmrStatus correctly; stub/mock streamTraining and isHmrEnabled, drive
onSpawn via the pid callback, and assert UI/state transitions (setHmrStatus
outcomes "idle"/"early-stopping"/"restarting"/"hot-swapped") and that run(file)
re-invokes when appropriate. Ensure tests run under jsdom (Testing Library +
jest) and are added alongside the component.

Sources: Coding guidelines, Learnings

packages/arkor/src/cli/commands/dev.ts (1)

199-230: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Read-then-unlink still lets a concurrent arkor dev instance lose its token.

Line 224 and Line 230 are a TOCTOU pair: another process can rewrite ~/.arkor/studio-token after the read succeeds but before this process unlinks it. In that interleaving, we compare our bytes and then delete the other process's fresh token anyway, so the shared-path 403 regression is still reachable. This needs an atomic ownership/lease scheme, not a best-effort content check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/cli/commands/dev.ts` around lines 199 - 230, The current
read-then-unlink in registerCleanupHook (readFileSync → tokensEqual →
unlinkSync) is TOCTOU and can delete a freshly written token from another
process; replace the best-effort content check with an explicit ownership/lease:
when creating the token (persistStudioToken or wherever the token is written)
also atomically acquire a per-process lease by creating a companion
lock/ownership file using fs.openSync(lockPath, 'wx') (or writeFileSync with
flag 'wx') containing a unique owner id (pid/uuid); on cleanup, instead of just
reading the token file, read the lock file and compare the owner id to our
stored id and only then unlink the token and remove the lock; update
registerCleanupHook, persistStudioToken (or the token-writing site), and use
tokensEqual only for debug — use lockPath, persistStudioToken,
registerCleanupHook, and the stored owner id to locate the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@packages/arkor/src/cli/commands/dev.test.ts`:
- Around line 663-682: The tests relying on chmodSync to make ~/.arkor read-only
are flaky on Windows; replace the platform-specific permission trick with a
platform-neutral write-failure trigger: locate the token path via
studioTokenPath(...) (or the exported persistStudioToken/writeCredentials
function) and either (a) create a directory at that exact path before calling
runDev({ port: 4203 }) so the token write will fail with ENOTDIR, or (b)
mock/spyon the module method that writes the token (persistStudioToken or
writeCredentials) to throw an EACCES-like error for this test; ensure you
restore the original behavior in the finally block and keep assertions
(expect(runDev...).resolves.toBeUndefined and
expect(serve).toHaveBeenCalledTimes(1)) unchanged.

In `@packages/arkor/src/cli/commands/dev.ts`:
- Around line 199-230: The current read-then-unlink in registerCleanupHook
(readFileSync → tokensEqual → unlinkSync) is TOCTOU and can delete a freshly
written token from another process; replace the best-effort content check with
an explicit ownership/lease: when creating the token (persistStudioToken or
wherever the token is written) also atomically acquire a per-process lease by
creating a companion lock/ownership file using fs.openSync(lockPath, 'wx') (or
writeFileSync with flag 'wx') containing a unique owner id (pid/uuid); on
cleanup, instead of just reading the token file, read the lock file and compare
the owner id to our stored id and only then unlink the token and remove the
lock; update registerCleanupHook, persistStudioToken (or the token-writing
site), and use tokensEqual only for debug — use lockPath, persistStudioToken,
registerCleanupHook, and the stored owner id to locate the changes.

In `@packages/arkor/src/studio/server.ts`:
- Around line 1620-1624: readAsset currently concatenates the request path onto
assetsDir and can be abused for path traversal; to fix, decode and normalize the
incoming relPath, then compute the absolute target with path.resolve(assetsDir,
cleaned) (use the same resolved base: const base = path.resolve(assetsDir)), and
verify the resulting target is inside base (e.g. use path.relative(base, target)
and ensure it does not start with ".." and does not contain path.sep at the
front); if the check fails return null (or an appropriate error) and do not call
readFile. Also reject suspicious inputs (null bytes, absolute paths) before
resolving. Apply this same guard to the other occurrence around lines 1647-1655.

In `@packages/studio-app/src/components/RunTraining.tsx`:
- Around line 141-167: The poller can overwrite a newer HMR error because it
only checks cancelled; modify the effect to use a generation/sequence gate:
create a ref (e.g., manifestSeqRef) that you increment whenever an SSE/HMR error
frame or any HMR event invalidates the manifest, capture the current seq at the
start of tick (before calling fetchManifest), then before calling setManifest or
scheduling the next timer ensure the captured seq still equals
manifestSeqRef.current; if it differs, discard the poll result (and do not
schedule another timeout). Keep using the existing symbols (useEffect,
cancelled, timer, tick, fetchManifest, setManifest) but add and check the
manifestSeqRef around the async fetch and before state updates so in-flight
polls are invalidated when HMR events arrive.
- Around line 334-510: Add a jsdom Testing Library suite for the RunTraining
component exercising the run() state machine: write tests to simulate
streamTraining callbacks and SSE-like pre-spawn events to validate that
pendingPreSpawnEventsRef is drained, restartPendingRef wins over hot-swap, user
Abort (trainingAbortRef)/ac.signal.aborted prevents logging and cancels
restarts, and the 250ms restart grace window behavior (currentPidRef,
restartGraceTimerRef) toggles hmrStatus correctly; stub/mock streamTraining and
isHmrEnabled, drive onSpawn via the pid callback, and assert UI/state
transitions (setHmrStatus outcomes
"idle"/"early-stopping"/"restarting"/"hot-swapped") and that run(file)
re-invokes when appropriate. Ensure tests run under jsdom (Testing Library +
jest) and are added alongside the component.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: be0f3b24-0b07-46b0-87c3-4a8c45682ee7

📥 Commits

Reviewing files that changed from the base of the PR and between e726548 and e41acc7.

📒 Files selected for processing (18)
  • AGENTS.md
  • docs/concepts/studio.mdx
  • e2e/cli/vitest.config.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • eslint.config.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/start.test.ts
  • packages/arkor/src/core/runner.test.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/server.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/vite.config.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (19)
docs/**/*.mdx

📄 CodeRabbit inference engine (AGENTS.md)

Verify Mintlify heading slugs with curl before adding cross-page anchor links. Mintlify preserves /, =, and full-width JP parens in ids, strips ASCII parens ( ) and backticks `, and converts spaces to -. Confirm actual rendered ids before committing links.

Files:

  • docs/concepts/studio.mdx
**

⚙️ CodeRabbit configuration file

**: # Arkor Development Guide

Note: Claude Code automatically loads this file.

Repository shape

pnpm + Turbo monorepo. Workspaces are declared in pnpm-workspace.yaml (packages/*, e2e/*, docs).

Path Role
packages/arkor Published arkor SDK + CLI + bundled local Studio server (Hono). bin/arkordist/bin.mjs. Library entry → dist/index.mjs.
packages/create-arkor Published create-arkor scaffolder (pnpm create arkor).
packages/cli-internal Private workspace package. Source is bundled into arkor and create-arkor via tsdown's deps.alwaysBundle. Never appears as a runtime dependency on npm.
packages/studio-app Private Vite + React 19 SPA. pnpm --filter @arkor/studio-app bundle builds it; packages/arkor/scripts/copy-studio-assets.mjs copies dist/ into packages/arkor/dist/assets/.
e2e/cli Private vitest suite that spawns the built dist/bin.mjs of both CLIs in temp dirs.
e2e/studio Private Playwright suite that spawns arkor dev against an in-process fake cloud-api and drives the Studio SPA in Chromium.
docs Mintlify sources for docs.arkor.ai.

Common commands

Root scripts fan out via Turbo (which respects ^build deps in turbo.json):

pnpm install
pnpm build          # turbo run build across all packages
pnpm typecheck      # tsc --noEmit across all packages
pnpm lint           # oxlint --deny-warnings, then strict ESLint 10 (single root config each)
pnpm format         # oxfmt --write across the repo (config in oxfmt.config.ts)
pnpm format:check   # oxfmt --check; CI gates on this (no writes)
pnpm test           # vitest run across all packages (incl. e2e)
pnpm test:coverage  # writes lcov + cobertura + junit per package; CI uploads lcov to Codecov, cobertura to GitHub nativ...

Files:

  • docs/concepts/studio.mdx
  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • AGENTS.md
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs}: Avoid the em dash (U+2014) in code comments, string literals, and template literals. Use colons, periods, commas, parentheses, spaced hyphens (" - "), or reworded sentences instead
Run pnpm lint (oxlint --deny-warnings followed by strict ESLint 10 with type-aware rules) before submitting code

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs,json,css,html}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run pnpm format using oxfmt to format all JS/TS/JSON/CSS/HTML files before submitting code

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include vitest cases for SDK/CLI/scaffolder logic, jsdom-based Testing Library cases for Studio components, or visual UI documentation when submitting PRs

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.{ts,tsx,js,jsx,mts,mjs}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

Do not use em-dashes (U+2014) anywhere in code comments, strings, and template literals. Instead use colons, periods, commas, parentheses, space-separated hyphens (-), or rephrase sentences.

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
packages/{arkor,create-arkor,cli-internal}/**/*.{ts,tsx,js,mts}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

CLI runtime messages, generated file templates, and test names must also follow the no em-dash rule

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx}: Prefer interface for defining object shapes in TypeScript (TypeScript-specific type definition pattern)
Use camelCase for variable names in TypeScript/JavaScript code
Always use async/await for promises in TypeScript/JavaScript code
Use timingSafeEqual for security-critical token comparisons (e.g., Studio CSRF token validation)

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
packages/arkor/src/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

Add vitest test cases for SDK/CLI/scaffold logic changes

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

When bumping oxlint or oxfmt in the catalog, run pnpm install --config.minimum-release-age=0 once so platform bindings land in pnpm-lock.yaml. Regular pnpm install thereafter will resolve them normally.

Files:

  • packages/arkor/src/cli/commands/start.test.ts
  • packages/studio-app/vite.config.ts
  • e2e/cli/vitest.config.ts
  • packages/studio-app/src/lib/api.ts
  • packages/arkor/src/studio/server.ts
  • eslint.config.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/core/runner.test.ts
  • e2e/studio/src/specs/hmr.spec.ts
  • packages/arkor/src/core/trainer.test.ts
  • packages/arkor/src/core/trainer.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/lib/api.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/cli/commands/dev.test.ts
packages/studio-app/vite.config.ts

📄 CodeRabbit inference engine (AGENTS.md)

Configure the arkor-studio-token Vite plugin with apply: "serve" (dev-only) to inject <meta name="arkor-studio-token"> into index.html on each request. Never apply this plugin during vite build to prevent baking a stale token into production.

Files:

  • packages/studio-app/vite.config.ts
e2e/cli/**/*.{ts,tsx,js,mts}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

E2E CLI tests must consume the built dist/bin.mjs from arkor and create-arkor packages, ensuring pnpm build is run first when executing tests standalone

Files:

  • e2e/cli/vitest.config.ts
packages/arkor/src/studio/server.ts

📄 CodeRabbit inference engine (AGENTS.md)

Validate every /api/* request via X-Arkor-Studio-Token header (or ?studioToken= query for EventSource) using timingSafeEqual comparison. Only GET stream-only routes can be added to the eventStreamPathPattern allow-list regex; never add mutation endpoints.

Files:

  • packages/arkor/src/studio/server.ts
AGENTS.md

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain AGENTS.md as a living document with current agent status and architectural decisions

Files:

  • AGENTS.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Maintain English/Japanese documentation pairs: README.mdREADME.ja.md, CONTRIBUTING.mdCONTRIBUTING.ja.md, and docs/docs/ja/. Update both sides in the same PR.

Files:

  • AGENTS.md
eslint.config.ts

📄 CodeRabbit inference engine (AGENTS.md)

ESLint 10 reads eslint.config.ts flat config at repo root (type-aware via projectService). Configure strict rules: typescript-eslint strictTypeChecked + stylisticTypeChecked, unicorn, import-x, promise, n, react-hooks, jsx-a11y, vitest. Add per-file overrides for SPA (React), tests, Playwright fixtures, and scripts. Keep auto-fix unsafe rules (unicorn/no-typeof-undefined, no-unnecessary-type-assertion) set to off with comments explaining the failure modes.

Files:

  • eslint.config.ts
e2e/studio/**/*.{ts,tsx,js,mts}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

E2E Studio tests must consume the built dist/bin.mjs from arkor and the bundled Studio assets, requiring playwright chromium installation and pnpm build as prerequisites

Files:

  • e2e/studio/src/specs/hmr.spec.ts
packages/arkor/src/cli/commands/dev.ts

📄 CodeRabbit inference engine (AGENTS.md)

Generate a 32-byte base64url CSRF token per dev server launch and pass it to buildStudioApp({ studioToken, hmr })

Files:

  • packages/arkor/src/cli/commands/dev.ts
packages/arkor/src/studio/hmr.ts

📄 CodeRabbit inference engine (AGENTS.md)

Keep a Rolldown watcher over src/arkor/ to track rebuilds and push events over /api/dev/events (SSE). Dynamic-import artifacts on success, extract TrainerInspection snapshots via Symbol.for("arkor.trainer.inspect") brand, and compute stable configHash from cloud-side JobConfig.

Files:

  • packages/arkor/src/studio/hmr.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Use the Repository pattern for data access
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Treat `createArkor` as a function that returns a frozen, opaque manifest tagged with `_kind: "arkor"`; pass it to tooling, not as a programmable client for user code
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Both e2e/cli and e2e/studio declare `arkor` (and `create-arkor`) as `workspace:*` devDependencies so Turbo's `^build` produces `dist/bin.mjs` exactly once before test runs (no pretest hooks, no concurrent races). Standalone runs need a prior `pnpm build`.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: E2E coverage uses `c8` wrapping vitest (not vitest's native coverage) so child CLI V8 coverage is captured and remapped through tsdown sourcemaps back to `src/`. Set `CREATE_ARKOR_BUILD_SOURCEMAP=1` for E2E coverage builds; CI declares it in turbo.json so it propagates and busts the no-sourcemap cache variant. Release builds do NOT set this and ship without `.map` files.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Release SBOMs are generated with `pnpm sbom --prod --sbom-spec-version 1.7` (CycloneDX 1.7 and SPDX 2.3 formats), attested under the trusted builder's OIDC identity, and uploaded to both Codecov and GitHub native code coverage. Bundled runtime inputs of `arkor/studio-app` (e.g. `react`, `fontsource-variable/geist*`) must be `dependencies`, not `devDependencies`, so they appear in the production SBOM.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Do NOT split implementation (features, schema, behavior changes) across multiple PRs. Land docs (English + Japanese), tests, and code in the same PR unless the user explicitly says otherwise.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Node version: published packages declare `engines.node >=22.22.0` (raised from 22.6 to dodge the Jan 2026 async-hooks CVE). Use Node 24 (latest preferred) for development.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Preserve the SIGTERM-and-let-the-child-handle-it pattern in HMR subprocess lifecycle. Do not replace it with SIGKILL escalation (orphans Cloud jobs) or always-hot-swap (risks stale JobConfig). Do not surface `requestEarlyStop()` or `replaceCallbacks()` as public Trainer interface methods; keep them behind `Symbol.for` brands (dev-only HMR primitives).
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: HMR Symbols (`Symbol.for("arkor.trainer.inspect")`, `Symbol.for("arkor.trainer.replaceCallbacks")`, `Symbol.for("arkor.trainer.requestEarlyStop")`) are dev-only HMR primitives and must never be exposed in the published Trainer public interface. User code that wants similar semantics should compose `abortSignal` + `cancel()` per the cookbook.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: When touching the Studio server or SPA fetch layer, preserve: token via header for `fetch`, query param for `EventSource`, host-header guard, no CORS, timing-safe compare. The Vite plugin is dev-only (`apply: "serve"`); running it during `vite build` would bake a stale per-launch token into production.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: CORS is intentionally NOT configured on `/api/*`. The SPA is same-origin so reflecting `*` would let "simple" cross-origin POSTs reach handlers. The token check rejects those; cross-origin tabs cannot read the SPA's `<meta>`. This prevents another browser tab on the same machine from POSTing `/api/train` (RCE-grade exposure).
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Generated files copied into package directories are gitignored: `packages/*/CONTRIBUTING.md` (from root), `packages/arkor/docs/` (from root `docs/`). Always edit the source under repo root, not the copies.
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-11T14:01:54.065Z
Learning: Do not call a HuggingFace model name "non-existent" based on training data alone. Verify (e.g. WebFetch) before flagging in issues or PR comments. If unverifiable, hedge ("could not confirm") rather than asserting absence.
🪛 LanguageTool
AGENTS.md

[style] ~100-~100: Consider using the typographical ellipsis character here instead.
Context: ...emoved. The build / coverage lanes set ARKOR_INTERNAL_SCAFFOLD_ARKOR_SPEC=file:.../packages/arkor so scaffolded fixtures ...

(ELLIPSIS)


[uncategorized] ~132-~132: The official name of this software platform is spelled with a capital “H”.
Context: ...TOCOL`. - Release SBOMs (build.yaml): each release sh...

(GITHUB)


[uncategorized] ~132-~132: The official name of this software platform is spelled with a capital “H”.
Context: ...ifact, and postflight in release.yaml attaches them t...

(GITHUB)


[uncategorized] ~132-~132: The official name of this software platform is spelled with a capital “H”.
Context: ...npm emits 2.3), so release-dry-run.yaml regener...

(GITHUB)

🔇 Additional comments (11)
packages/arkor/src/core/trainer.ts (1)

32-32: LGTM!

Also applies to: 157-157, 248-248

eslint.config.ts (1)

4-4: LGTM!

Also applies to: 31-38, 40-98, 154-167, 176-176, 334-334, 357-357

packages/arkor/src/core/runner.test.ts (1)

171-171: LGTM!

packages/arkor/src/core/trainer.test.ts (1)

279-279: LGTM!

Also applies to: 674-675, 856-858, 1841-1841

packages/arkor/src/studio/hmr.ts (1)

108-108: LGTM!

Also applies to: 111-111, 347-347

packages/studio-app/src/components/RunTraining.tsx (1)

241-251: Em-dash check: no U+2014 characters in RunTraining.tsx
packages/studio-app/src/components/RunTraining.tsx contains no U+2014 em-dash characters, so the em-dash guideline issue is not present here.

			> Likely an incorrect or invalid review comment.
packages/studio-app/src/lib/api.ts (1)

174-183: LGTM!

Also applies to: 262-268, 414-424, 447-453, 476-485, 510-515

packages/studio-app/vite.config.ts (1)

61-63: LGTM!

AGENTS.md (1)

100-100: LGTM!

Also applies to: 132-132

docs/concepts/studio.mdx (1)

15-15: LGTM!

e2e/cli/vitest.config.ts (1)

22-25: LGTM!

Also applies to: 45-46

@socket-security

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedjsdom@​29.1.18110010093100
Addedglobals@​17.6.01001008691100
Addedhono@​4.12.23991009796100

View full report

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8c110d181

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const snap = win32CancelSnapshots.find(
(s) => s.pid === target.pid,
);
if (!snap?.jobId || !snap.scope) continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fall back to late scope before Windows HMR cancel

On Windows, child.kill("SIGTERM") forcefully terminates the runner (Node documents this as a forceful/abrupt kill), so this server-side cancel is the only chance to release the cloud job for HMR restarts. In first-run anonymous projects the registry scope is commonly null at spawn because .arkor/state.json is created inside the child during trainer.start(), but the job id may be known by the time a rebuild forces restart; this !snap.scope guard then skips the cancel POST and the abrupt kill can leave the remote job running. Fresh evidence versus the earlier Windows SIGTERM finding: the new mitigation still only uses the spawn-time snapshot and does not do the late state-file fallback that the manual Stop path added.

Useful? React with 👍 / 👎.

Comment on lines +1108 to +1109
const { baseUrl: rpcBaseUrl, token: rpcToken } =
await resolveCredentialsAndBaseUrl();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the spawn credentials for Windows HMR cancels

When a Windows HMR restart hits this fallback, the child has just been killed abruptly, so this POST is the only process that can cancel the cloud job. Resolving credentials here uses whatever account/control plane is on disk at rebuild time; if the user logged in/out or switched hosts after the run was spawned, the cancel goes to the wrong place, the error is swallowed, and the original job keeps running. Fresh evidence versus the earlier credential-capture report: the manual Stop path now uses its spawn-time spawnRpc, but this new Windows HMR fallback still bypasses that snapshot.

Useful? React with 👍 / 👎.

Comment on lines +757 to +758
void trainer
.cancel()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Disarm the timeout latch before fallback cancel

When the early-stop timeout fires, the latch stays armed until this fallback cancel() settles. If a checkpoint.saved event arrives during that in-flight cancel POST, the checkpoint branch still sees earlyStopRequested && earlyStopDeferred and issues a second trainer.cancel(); on APIs that reject duplicate/already-cancelled requests, that second POST can mark the run failed and make the SIGTERM handler exit nonzero even though the timeout fallback cancel may have succeeded. Clear or mark the latch before starting the fallback cancel so late checkpoints cannot re-enter the cancel path.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/arkor/src/studio/server.ts (1)

354-362: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The manifest fast path drifts from custom HMR entries.

options.hmr can be watching an explicit entry/output, but this code always re-derives hmrOutFile from cwd alone. In an arkor dev session started against a non-default entry, /api/manifest will keep reading the default .arkor/build/index.mjs while the watcher is publishing a different artifact, so the SPA polls stale or missing manifest data even though HMR is healthy. Please plumb the coordinator's resolved outFile into buildStudioApp instead of recomputing it here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/server.ts` around lines 354 - 362, The current
hmrOutFile computation recomputes the build entry from trainCwd (using
resolveBuildEntry) and ignores a coordinator-resolved output; update the logic
so buildStudioApp receives and uses the coordinator's resolved outFile instead
of recomputing it here: stop deriving hmrOutFile from resolveBuildEntry({ cwd:
trainCwd }) when options.hmr is true and instead accept a
resolvedOutFile/outFile parameter (the coordinator's value) and assign
hmrOutFile = coordinatorOutFile (or the passed-in resolvedOutFile) so
readManifestSummary and the /api/manifest fast path use the coordinator-provided
artifact; keep fallback behavior for undefined/out-of-date files as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/arkor/src/cli/commands/dev.ts`:
- Around line 261-278: Replace the non-atomic existsSync+renameSync path restore
with an atomic "create only if absent" operation: in the try block where you
currently call renameSync(reapPath, path), use fs.linkSync(reapPath, path) (or
fs.copyFileSync(reapPath, path, fs.constants.COPYFILE_EXCL)) so the creation
will fail with EEXIST if path was recreated, and in the catch for that EEXIST
simply unlinkSync(reapPath); keep the unlinkSync(reapPath) branch when path
already existed before attempting the atomic create; reference existsSync,
unlinkSync, renameSync (replace), linkSync or copyFileSync with COPYFILE_EXCL,
reapPath and path to locate the change.

In `@packages/arkor/src/studio/hmr.ts`:
- Around line 431-445: The staging rename currently changes mtime/ctime and
breaks inspectBundle() caching because moduleCacheBustUrl(outFile) is used; fix
by comparing content hashes and skipping the atomic rename when the staging
content equals the existing artifact: compute a checksum (e.g., SHA256) of
stagingFile and resolved.outFile at the BUNDLE_END/rename path (the code around
stagingFile and watcher), and only call renameSync when hashes differ;
alternatively, update inspectBundle() and moduleCacheBustUrl() to use a
content-hash-based cache key (read file bytes and derive the cache key) so
identical-byte rebuilds map to the same inspection record.

In `@packages/arkor/src/studio/server.ts`:
- Around line 1074-1121: The win32 restart path snapshots only pid/jobId/scope
and re-resolves credentials later, reintroducing races (null jobId, null scope,
auth/host switches) — instead attach a spawn-time RPC cancel context (baseUrl,
token and any resolved RPC client info) to each registry entry when the child is
spawned (augment whatever structure activeTrains.list()/activeTrains registry
returns), then in the restart loop (the restartTargets handling after
activeTrains.dispatchRebuild) look up that stored RPC snapshot (instead of
calling resolveCredentialsAndBaseUrl/createRpc again) and call
rpc.v1.jobs[":id"].cancel.$post with the stored snapshot; still keep the
existing late fallback to jobId/state when the spawn-time snapshot is missing,
but prefer the stored spawn-time RPC context to avoid races.

In `@packages/studio-app/src/components/RunTraining.tsx`:
- Around line 463-489: The current guard only suppresses auto-restart for
explicit nonzero exit codes; change it so a missing trailer (exitCode === null)
also suppresses restart when a restart was pending. Update the condition around
the block that manipulates restartPendingRef.current, currentPidRef.current,
restartGraceTimerRef.current, setHmrStatus and setLog (the existing if that
checks exitCode) to: if ((exitCode !== null && exitCode !== 0) || (exitCode ===
null && restartPendingRef.current)) { ... } so the same suppression/cleanup and
log (via appendCapped) runs for null when a restart was pending.

---

Outside diff comments:
In `@packages/arkor/src/studio/server.ts`:
- Around line 354-362: The current hmrOutFile computation recomputes the build
entry from trainCwd (using resolveBuildEntry) and ignores a coordinator-resolved
output; update the logic so buildStudioApp receives and uses the coordinator's
resolved outFile instead of recomputing it here: stop deriving hmrOutFile from
resolveBuildEntry({ cwd: trainCwd }) when options.hmr is true and instead accept
a resolvedOutFile/outFile parameter (the coordinator's value) and assign
hmrOutFile = coordinatorOutFile (or the passed-in resolvedOutFile) so
readManifestSummary and the /api/manifest fast path use the coordinator-provided
artifact; keep fallback behavior for undefined/out-of-date files as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 3eac6c93-4022-4549-b988-df0e2aa0795a

📥 Commits

Reviewing files that changed from the base of the PR and between e41acc7 and c8c110d.

📒 Files selected for processing (8)
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/server.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (52)
  • GitHub Check: install · yarn-berry · macos-latest · node 26.0.0
  • GitHub Check: install · npm · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · bun · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · yarn-berry · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-9 · macos-latest · node 26.0.0
  • GitHub Check: install · yarn · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-11 · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-10 · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-10 · macos-latest · node 26.0.0
  • GitHub Check: install · pnpm-9 · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · npm · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn-berry · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · bun · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · windows-latest · node 24.0.0
  • GitHub Check: install · pnpm-11 · macos-latest · node 24.12.0
  • GitHub Check: install · pnpm-11 · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · yarn-berry · windows-latest · node 24.12.0
  • GitHub Check: install · bun · windows-latest · node 24.12.0
  • GitHub Check: install · pnpm-11 · macos-latest · node >=22.22.0 <23
  • GitHub Check: install · yarn · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-10 · macos-latest · node 24.0.0
  • GitHub Check: install · bun · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-11 · macos-latest · node 22.22.0
  • GitHub Check: install · bun · windows-latest · node 26.0.0
  • GitHub Check: install · bun · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-10 · windows-latest · node 26.0.0
  • GitHub Check: install · npm · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-11 · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn · windows-latest · node 26.0.0
  • GitHub Check: install · pnpm-10 · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · yarn-berry · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn · windows-latest · node 24.12.0
  • GitHub Check: install · pnpm-9 · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · npm · windows-latest · node 24.12.0
  • GitHub Check: install · yarn-berry · windows-latest · node 26.0.0
  • GitHub Check: install · yarn · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-11 · windows-latest · node 24.12.0
  • GitHub Check: install · yarn · windows-latest · node 24.0.0
  • GitHub Check: install · pnpm-9 · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-9 · windows-latest · node 24.12.0
  • GitHub Check: install · npm · windows-latest · node 26.0.0
  • GitHub Check: install · pnpm-11 · windows-latest · node 24.0.0
  • GitHub Check: install · npm · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · windows-latest · node 24.12.0
  • GitHub Check: install · npm · windows-latest · node 24.0.0
  • GitHub Check: install · npm · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · pnpm-9 · windows-latest · node 22.22.0
  • GitHub Check: install · npm · windows-latest · node 22.22.0
  • GitHub Check: install · yarn · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · npm · windows-latest · node 26.0.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node 22.22.0
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs}: Avoid the em dash (U+2014) in code comments, string literals, and template literals. Use colons, periods, commas, parentheses, spaced hyphens (" - "), or reworded sentences instead
Run pnpm lint (oxlint --deny-warnings followed by strict ESLint 10 with type-aware rules) before submitting code

Files:

  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**/*.{ts,tsx,js,jsx,mts,mjs,cts,cjs,json,css,html}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run pnpm format using oxfmt to format all JS/TS/JSON/CSS/HTML files before submitting code

Files:

  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**/*.{ts,tsx,js,jsx,mts,mjs}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

Do not use em-dashes (U+2014) anywhere in code comments, strings, and template literals. Instead use colons, periods, commas, parentheses, space-separated hyphens (-), or rephrase sentences.

Files:

  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

TypeScript: prefer interface for defining object shapes; use camelCase for variable names; always use async/await for promises

Files:

  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**

⚙️ CodeRabbit configuration file

**: # Arkor Development Guide

Note: Claude Code automatically loads this file.

Repository shape

pnpm + Turbo monorepo. Workspaces are declared in pnpm-workspace.yaml (packages/*, e2e/*, docs).

Path Role
packages/arkor Published arkor SDK + CLI + bundled local Studio server (Hono). bin/arkordist/bin.mjs. Library entry → dist/index.mjs.
packages/create-arkor Published create-arkor scaffolder (pnpm create arkor).
packages/cli-internal Private workspace package. Source is bundled into arkor and create-arkor via tsdown's deps.alwaysBundle. Never appears as a runtime dependency on npm.
packages/studio-app Private Vite + React 19 SPA. pnpm --filter @arkor/studio-app bundle builds it; packages/arkor/scripts/copy-studio-assets.mjs copies dist/ into packages/arkor/dist/assets/.
e2e/cli Private vitest suite that spawns the built dist/bin.mjs of both CLIs in temp dirs.
e2e/studio Private Playwright suite that spawns arkor dev against an in-process fake cloud-api and drives the Studio SPA in Chromium.
docs Mintlify sources for docs.arkor.ai.

Common commands

Root scripts fan out via Turbo (which respects ^build deps in turbo.json):

pnpm install
pnpm build          # turbo run build across all packages
pnpm typecheck      # tsc --noEmit across all packages
pnpm lint           # oxlint --deny-warnings, then strict ESLint 10 (single root config each)
pnpm format         # oxfmt --write across the repo (config in oxfmt.config.ts)
pnpm format:check   # oxfmt --check; CI gates on this (no writes)
pnpm test           # vitest run across all packages (incl. e2e)
pnpm test:coverage  # writes lcov + cobertura + junit per package; CI uploads lcov to Codecov, cobertura to GitHub nativ...

Files:

  • packages/studio-app/src/components/RunTraining.tsx
  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/lib/api.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include vitest cases for SDK/CLI/scaffolder logic, jsdom-based Testing Library cases for Studio components, or visual UI documentation when submitting PRs

Files:

  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/cli/commands/dev.test.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Test files must be placed under packages//src/**/.test.ts with vitest convention; e2e scenarios in e2e/cli/src and e2e/studio/**/*.test.ts

Files:

  • packages/studio-app/src/lib/api.test.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/cli/commands/dev.test.ts
packages/{arkor,create-arkor,cli-internal}/**/*.{ts,tsx,js,mts}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

CLI runtime messages, generated file templates, and test names must also follow the no em-dash rule

Files:

  • packages/arkor/src/studio/server.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/cli/commands/dev.test.ts
  • packages/arkor/src/studio/hmr.ts
**/studio/server.ts

📄 CodeRabbit inference engine (AGENTS.md)

**/studio/server.ts: Validate every /api/* request via X-Arkor-Studio-Token header (or ?studioToken= query for EventSource) using timingSafeEqual; enforce host-header allow-list (127.0.0.1/localhost) for DNS-rebinding defence; do NOT configure CORS
Restrict EventSource ?studioToken= query param allow-list to GET stream-only routes via eventStreamPathPattern regex in studio/server.ts; adding entries is CSRF-sensitive and must never include mutation endpoints

Files:

  • packages/arkor/src/studio/server.ts
**/cli/commands/dev.ts

📄 CodeRabbit inference engine (AGENTS.md)

Generate a 32-byte base64url CSRF token per arkor dev launch in dev.ts; persist to ~/.arkor/studio-token (mode 0600) and inject into Studio SPA via Vite plugin

Files:

  • packages/arkor/src/cli/commands/dev.ts
**/studio/hmr.ts

📄 CodeRabbit inference engine (AGENTS.md)

Keep Rolldown watcher over src/arkor/ in packages/arkor/src/studio/hmr.ts; on rebuild, pull TrainerInspection snapshot and compute configHash for hot-swap decision logic

Files:

  • packages/arkor/src/studio/hmr.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-12T10:25:25.551Z
Learning: Use pnpm + Turbo monorepo structure with workspaces declared in pnpm-workspace.yaml (packages/*, e2e/*, docs)
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-12T10:25:25.551Z
Learning: For all new features, SDK/CLI/Studio behaviour changes, or schema additions, include docs in both English and Japanese (README.md ↔ README.ja.md, docs/ ↔ docs/ja/) AND vitest test cases in the same PR; do not defer docs or tests to follow-ups
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-12T10:25:25.551Z
Learning: Set ARKOR_INTERNAL_SCAFFOLD_ARKOR_SPEC=file:.../packages/arkor in build/coverage CI lanes so fixtures resolve to in-workspace arkor; install-matrix does NOT set this and packs fresh arkor-*.tgz instead
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-06-12T10:25:25.551Z
Learning: When bumping oxlint or oxfmt in catalog, run pnpm install --config.minimum-release-age=0 once so platform bindings land in pnpm-lock.yaml; fixes pnpm 10.x minimumReleaseAge interaction
🪛 OpenGrep (1.22.0)
packages/studio-app/src/lib/api.ts

[ERROR] 473-473: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

Comment on lines +261 to +278
// Foreign token: restore it, UNLESS the other process already
// re-wrote `path` after our rename claimed the old copy. Rename
// REPLACES an existing destination rather than failing, so a
// bare rename-back would clobber that fresher token with the
// older claimed one. The existence probe shrinks the clobber
// window from "read → unlink" (previous design, milliseconds
// spanning a token comparison) to the few instructions between
// existsSync and renameSync, and the losing outcome in that
// residual window is restore-the-older-token (the other dev
// server 403s until its own next rewrite), not delete-the-token
// outright.
try {
if (existsSync(path)) {
// Newest writer wins: discard the claimed older copy.
unlinkSync(reapPath);
} else {
renameSync(reapPath, path);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the foreign token without a replace-on-exist race.

This fallback still does existsSync(path) and then renameSync(reapPath, path). If another arkor dev recreates path after the existence check but before the rename-back, this branch overwrites that newer token with the older claimed one, so the sibling Studio can still fall back to 403s. Restore with an atomic "create only if absent" step instead, then discard reapPath on EEXIST.

Possible fix
-import { existsSync, readFileSync, renameSync, unlinkSync } from "node:fs";
+import {
+  constants as fsConstants,
+  copyFileSync,
+  readFileSync,
+  renameSync,
+  unlinkSync,
+} from "node:fs";
...
       try {
-        if (existsSync(path)) {
-          // Newest writer wins: discard the claimed older copy.
-          unlinkSync(reapPath);
-        } else {
-          renameSync(reapPath, path);
-        }
-      } catch {
+        copyFileSync(reapPath, path, fsConstants.COPYFILE_EXCL);
+        unlinkSync(reapPath);
+      } catch (err) {
+        const code =
+          typeof err === "object" && err !== null && "code" in err
+            ? String(err.code)
+            : "";
+        if (code === "EEXIST") {
+          try {
+            unlinkSync(reapPath);
+          } catch {
+            // best-effort
+          }
+          return;
+        }
         try {
           unlinkSync(reapPath);
         } catch {
           // best-effort
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/cli/commands/dev.ts` around lines 261 - 278, Replace the
non-atomic existsSync+renameSync path restore with an atomic "create only if
absent" operation: in the try block where you currently call
renameSync(reapPath, path), use fs.linkSync(reapPath, path) (or
fs.copyFileSync(reapPath, path, fs.constants.COPYFILE_EXCL)) so the creation
will fail with EEXIST if path was recreated, and in the catch for that EEXIST
simply unlinkSync(reapPath); keep the unlinkSync(reapPath) branch when path
already existed before attempting the atomic create; reference existsSync,
unlinkSync, renameSync (replace), linkSync or copyFileSync with COPYFILE_EXCL,
reapPath and path to locate the change.

Comment on lines +431 to +445
// The watcher writes to a STAGING file beside the real artefact,
// which BUNDLE_END atomically renames onto `outFile` (Codex P2,
// round 81). Writing `outFile` directly would let a concurrent
// `/api/train` spawn race the watcher's non-atomic write:
// `runStart` skips its own rebuild whenever the artefact exists,
// so a Run click landing inside the write window would
// dynamic-import a partially-written bundle and crash the run.
// `renameSync` within the same directory is atomic on POSIX and
// uses MoveFileEx(MOVEFILE_REPLACE_EXISTING) on Windows, so
// readers only ever observe the previous complete artefact or
// the new complete artefact, never a torn write. Bonus: a build
// that errors before BUNDLE_END leaves the last-good artefact
// in place instead of clobbering it with partial output.
const stagingFile = `${resolved.outFile}.hmr-staging`;
watcher = watch({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Identical-byte rebuilds no longer reuse the cached inspection module.

Replacing resolved.outFile with a freshly written staging file changes mtime/ctime on every BUNDLE_END, even when the bytes are unchanged. inspectBundle() still cache-busts through moduleCacheBustUrl(outFile), so duplicate or no-op rebuilds now import a brand-new URL, rerun top-level module code, and retain another permanent ESM-cache record. That breaks the "same bytes collapse to the same inspection record" invariant this file relies on, especially for the duplicate BUNDLE_END case already documented in hmr.test.ts. Please either skip the rename when the staged content hash matches the current artifact, or switch the inspection cache key to a content hash.

Also applies to: 467-472

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/hmr.ts` around lines 431 - 445, The staging rename
currently changes mtime/ctime and breaks inspectBundle() caching because
moduleCacheBustUrl(outFile) is used; fix by comparing content hashes and
skipping the atomic rename when the staging content equals the existing
artifact: compute a checksum (e.g., SHA256) of stagingFile and resolved.outFile
at the BUNDLE_END/rename path (the code around stagingFile and watcher), and
only call renameSync when hashes differ; alternatively, update inspectBundle()
and moduleCacheBustUrl() to use a content-hash-based cache key (read file bytes
and derive the cache key) so identical-byte rebuilds map to the same inspection
record.

Comment on lines +1074 to +1121
// win32 (Codex P1, round 81): Node's `subprocess.kill("SIGTERM")`
// on Windows terminates the child "forcefully and abruptly"
// (per Node's child_process docs), so the runner's
// `installShutdownHandlers` never runs and the child can't
// issue its own graceful `cancel()` POST. Snapshot each active
// child's parsed jobId + spawn scope BEFORE dispatch (after
// dispatch the kill may already be reaping entries via close
// handlers) so the server can fire the cloud cancel on the
// child's behalf for every restart target.
const win32CancelSnapshots =
process.platform === "win32"
? activeTrains.list().map((entry) => ({
pid: entry.child.pid,
jobId: entry.jobId,
scope: entry.scope,
}))
: [];
const { hotSwapTargets, restartTargets } = activeTrains.dispatchRebuild(
nextHash,
nextArtifactContentHash,
);
if (process.platform === "win32" && restartTargets.length > 0) {
for (const target of restartTargets) {
const snap = win32CancelSnapshots.find(
(s) => s.pid === target.pid,
);
if (!snap?.jobId || !snap.scope) continue;
const { jobId: snapJobId, scope: snapScope } = snap;
// Fire-and-forget per child, mirroring the manual-Stop
// cancel path: SIGTERM-as-forceful-kill on win32 means
// nobody else will release the cloud job. Best-effort;
// cloud reaper / TTL is the safety net on failure.
void (async () => {
try {
const { baseUrl: rpcBaseUrl, token: rpcToken } =
await resolveCredentialsAndBaseUrl();
const rpc = createRpc(rpcBaseUrl, rpcToken);
await rpc.v1.jobs[":id"].cancel.$post({
param: { id: snapJobId },
query: {
orgSlug: snapScope.orgSlug,
projectSlug: snapScope.projectSlug,
},
});
} catch {
// best-effort
}
})();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Windows restart cancel dropped the robust spawn-time cancel context.

This branch snapshots only pid/jobId/scope and then resolves credentials again at rebuild time. That reopens the same races the manual Stop path above already accounted for: jobId can still be null when the rebuild lands, first-run anonymous runs still have scope === null, and a login or control-plane switch mid-run makes the cancel POST target the wrong account/host. On win32 the child is already being terminated, so any of those cases leaves the original cloud job orphaned. Please reuse the same spawn-time RPC snapshot plus late jobId/state fallback here, ideally by storing the cancel context on the registry entry instead of re-implementing a weaker cancel flow in this subscriber.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/server.ts` around lines 1074 - 1121, The win32
restart path snapshots only pid/jobId/scope and re-resolves credentials later,
reintroducing races (null jobId, null scope, auth/host switches) — instead
attach a spawn-time RPC cancel context (baseUrl, token and any resolved RPC
client info) to each registry entry when the child is spawned (augment whatever
structure activeTrains.list()/activeTrains registry returns), then in the
restart loop (the restartTargets handling after activeTrains.dispatchRebuild)
look up that stored RPC snapshot (instead of calling
resolveCredentialsAndBaseUrl/createRpc again) and call
rpc.v1.jobs[":id"].cancel.$post with the stored snapshot; still keep the
existing late fallback to jobId/state when the spawn-time snapshot is missing,
but prefer the stored spawn-time RPC context to avoid races.

Comment on lines +463 to 489
// Codex P1 (PR #101 round 80): a NONZERO exit suppresses auto-
// restart. The runner exits 143 when an HMR-triggered early-stop
// failed to cancel the cloud job (cancel POST rejected); spawning
// a fresh run on top of that would overlap a new cloud job with
// the one whose cancel never landed (double GPU spend, no
// operator signal). Any other nonzero exit is a crash, and
// restarting a crashing run in a loop would churn cloud jobs
// with the same broken artefact. Exit 0 (graceful early-stop
// succeeded) and `null` (no marker parsed: spawn failure path
// already surfaced its own error) keep the existing restart
// behaviour.
if (exitCode !== null && exitCode !== 0) {
restartPendingRef.current = false;
currentPidRef.current = null;
setHmrStatus("idle");
if (restartGraceTimerRef.current !== null) {
clearTimeout(restartGraceTimerRef.current);
restartGraceTimerRef.current = null;
}
setLog((prev) =>
appendCapped(
prev,
`\n[error] ${err instanceof Error ? err.message : String(err)}\n`,
`\n[hmr] run exited with code ${exitCode}; auto-restart suppressed. Fix the issue and press Run to start a new job.\n`,
),
);
} finally {
if (trainingAbortRef.current === ac) trainingAbortRef.current = null;
// Always release the running flag, including the user-initiated
// abort path. setState on an already-unmounted component is a
// no-op in React 18+, so the unmount-cleanup case handles itself.
setRunning(false);
return;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat a missing exit= trailer as restart-unsafe.

streamTraining() returns null when the train stream ends without a parsed exit marker. This branch only suppresses restarts for explicit nonzero codes, so an HMR-triggered early-stop that loses its trailer still falls through to the restart path when restartPendingRef.current is set. That recreates the overlapping-job risk this guard is meant to stop. Only auto-restart after an explicit exitCode === 0; treat null as suppression too when a restart is pending.

Possible fix
-    if (exitCode !== null && exitCode !== 0) {
+    const restartUnsafe = restartPendingRef.current
+      ? exitCode !== 0
+      : exitCode !== null && exitCode !== 0;
+    if (restartUnsafe) {
       restartPendingRef.current = false;
       currentPidRef.current = null;
       setHmrStatus("idle");
       if (restartGraceTimerRef.current !== null) {
         clearTimeout(restartGraceTimerRef.current);
         restartGraceTimerRef.current = null;
       }
       setLog((prev) =>
         appendCapped(
           prev,
-          `\n[hmr] run exited with code ${exitCode}; auto-restart suppressed. Fix the issue and press Run to start a new job.\n`,
+          `\n[hmr] run exited with ${
+            exitCode === null ? "an unknown status" : `code ${exitCode}`
+          }; auto-restart suppressed. Fix the issue and press Run to start a new job.\n`,
         ),
       );
       return;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/studio-app/src/components/RunTraining.tsx` around lines 463 - 489,
The current guard only suppresses auto-restart for explicit nonzero exit codes;
change it so a missing trailer (exitCode === null) also suppresses restart when
a restart was pending. Update the condition around the block that manipulates
restartPendingRef.current, currentPidRef.current, restartGraceTimerRef.current,
setHmrStatus and setLog (the existing if that checks exitCode) to: if ((exitCode
!== null && exitCode !== 0) || (exitCode === null && restartPendingRef.current))
{ ... } so the same suppression/cleanup and log (via appendCapped) runs for null
when a restart was pending.

Comment on lines 447 to +474
while (true) {
const { value, done } = await reader.read();
if (done) break;
onChunk(decoder.decode(value, { stream: true }));
const text = decoder.decode(value, { stream: true });
trackTail(text);
onChunk(text);
}
// Flush any bytes the streaming decoder buffered for a multi-byte
// UTF-8 sequence that landed split across the final two chunks.
// Without this, the last character of the trainer's output gets
// silently dropped when it happens to be non-ASCII (Japanese log
// lines, emoji progress bars, etc.).
const tail = decoder.decode();
if (tail) onChunk(tail);
if (tail) {
trackTail(tail);
onChunk(tail);
}
} finally {
signal?.removeEventListener("abort", onAbort);
// Release the reader lock so a subsequent caller can re-acquire
// the body if needed. Mirrors `iterateSseFrames`'s finally clause.
reader.releaseLock();
}
// `null` when no marker was found: aborted streams, `error=` paths,
// truncated bodies. Callers treat null as "unknown" and apply their
// own conservative default.
const m = EXIT_MARKER_PATTERN.exec(tailBuf);
return m?.[1] === undefined ? null : Number.parseInt(m[1], 10);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't forward the synthetic exit= trailer into onChunk.

streamTraining parses \n---\nexit=<code>\n only after the read loop, but every decoded chunk has already been passed to onChunk. That means RunTraining will log the protocol footer verbatim on normal exits. Keep a small holdback tail, strip the trailer once the stream closes, and only emit the non-marker bytes.

🧰 Tools
🪛 OpenGrep (1.22.0)

[ERROR] 473-473: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 41 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread AGENTS.md
1. Passes it to `buildStudioApp({ studioToken })`. The Hono server validates every `/api/*` request via `X-Arkor-Studio-Token` header (or `?studioToken=` query for `EventSource`, which can't set headers). Comparison uses `timingSafeEqual`.
1. Passes it to `buildStudioApp({ studioToken, hmr })` (alongside the HMR coordinator from `createHmrCoordinator`, which is what wires up the SSE stream and per-rebuild signal dispatch). The Hono server validates every `/api/*` request via `X-Arkor-Studio-Token` header (or `?studioToken=` query for `EventSource`, which can't set headers). Comparison uses `timingSafeEqual`. The query-token allow-list lives in `eventStreamPathPattern` in [packages/arkor/src/studio/server.ts](packages/arkor/src/studio/server.ts), currently `/api/jobs/:id/events` and `/api/dev/events`. **Adding to that regex is CSRF-sensitive: each entry must be a GET stream-only route, never a mutation endpoint.**
2. Persists it to `~/.arkor/studio-token` (mode 0600) so the SPA dev workflow (`pnpm --filter @arkor/studio-app dev`) can read it via the `arkor-studio-token` Vite plugin in [packages/studio-app/vite.config.ts](packages/studio-app/vite.config.ts), which injects `<meta name="arkor-studio-token">` into `index.html` on each request. Persistence failure must NOT block server start (read-only `$HOME` on Docker, etc.); just warn.
3. Cleans up on `exit`/SIGINT/SIGTERM/SIGHUP via `unlinkSync`.
k-taro56 added 2 commits July 3, 2026 01:08
rolldown 1.0.2 is the suspected culprit behind the intermittent macOS
e2e signal-death exits in arkor-build.test.ts (native teardown crash,
see rolldown/rolldown#9722); 1.1.1 is the first release past the fix
window. pnpm moves to the current 11.x patch line.
… cleanup

- moduleCacheBust: key dynamic-import cache-busting on a truncated
  sha256 of the artefact bytes instead of mtime+ctime+size (kernel
  timestamp-tick collisions could serve a stale module to the HMR
  inspect / manifest / SIGUSR2 reload paths); read failures now get a
  unique miss-<n> token so a shared sentinel can't be cache-poisoned
- hmr: capture hash/contentHash synchronously before the inspection
  await so a faster next build can't tear an event (configHash from
  build N paired with fingerprints from build N+1)
- server: force null configHash/contentHash baselines for custom-entry
  /api/train spawns so a later default-entry rebuild can't hot-swap the
  wrong trainer's callbacks; document the nonce env-block caveat
- trainerInspection: accept `export const arkor = createTrainer(...)`
  (runner executes it with top precedence, so inspection missing it
  could hot-swap callbacks into the wrong trainer instance)
- manifest: drop the runBuild() fallback that raced the watcher's
  atomic staging+rename publish; import failures now surface as 400
- RunTraining: clear the restart latch at run() entry so a disowned
  grace timer can't auto-spawn an unrequested run hours later
- cleanupHooks: 5s watchdog hint on hung async cleanups; document the
  exit-owner contract
- configHash: correct the toJSON-returns-this comment/test (deliberate
  fail-safe divergence from JSON.stringify, which yields "{}")
- tests: select HMR rebuilds by contentHash (macOS spurious BUNDLE_END),
  poll for the SIGUSR2 skip warning instead of a fixed 50ms wait
  (Windows CI), and assert clean exits with code+signal+stderr context
  in the arkor-build e2e

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/arkor/src/studio/server.ts (1)

260-262: 🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Update the CSRF comment to include /api/dev/events.

The allow-list now accepts query tokens for both job events and /api/dev/events, but this comment still says only the job-event stream route. Please keep the security comment aligned with the actual SSE-only allow-list.

As per coding guidelines, preserve "token via header for fetch, query param for EventSource, host-header guard, no CORS, timing-safe compare."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/server.ts` around lines 260 - 262, The CSRF
security comment in the server allow-list is outdated because it only mentions
the job-event stream route, but the SSE query-token exception also applies to
/api/dev/events. Update the comment near the allow-list logic in server.ts so it
accurately reflects both SSE-only routes while preserving the existing guidance
about token via header for fetch, query param for EventSource, host-header
guard, no CORS, and timing-safe compare.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/arkor/src/studio/manifest.ts`:
- Around line 113-131: The fallback in manifest summarisation is incorrectly
rebuilding the watcher-owned artifact when prebuiltOutFile is missing, which can
race HMR/polling and break the atomic publish flow. Update
summariseBuiltManifest in manifest.ts so that, when opts.prebuiltOutFile is
provided, it only uses that artifact path and does not call runBuild() in the
missing-file case; instead, let the missing/invalid prebuilt bundle surface as
an error through the existing import/error handling path.

In `@packages/arkor/src/studio/server.test.ts`:
- Around line 663-678: The custom-entry guard test is only tracking
getCurrentConfigHash() and getCurrentArtifactContentHash(), so it can still pass
even if getCurrentArtifactHash() is consulted; update the fakeHmr mock and
assertions in server.test.ts to count getCurrentArtifactHash() calls as well,
and verify that this getter is skipped when the custom-entry path is expected to
avoid coordinator hash capture. Use the existing fakeHmr methods and the
custom-entry guard assertions to locate the affected test cases.

---

Outside diff comments:
In `@packages/arkor/src/studio/server.ts`:
- Around line 260-262: The CSRF security comment in the server allow-list is
outdated because it only mentions the job-event stream route, but the SSE
query-token exception also applies to /api/dev/events. Update the comment near
the allow-list logic in server.ts so it accurately reflects both SSE-only routes
while preserving the existing guidance about token via header for fetch, query
param for EventSource, host-header guard, no CORS, and timing-safe compare.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 645d7b83-d37d-4ba0-9a50-37712d0211f4

📥 Commits

Reviewing files that changed from the base of the PR and between c8c110d and 13941c7.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (22)
  • e2e/cli/src/arkor-build.test.ts
  • package.json
  • packages/arkor/package.json
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/cli/cleanupHooks.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/arkor/src/core/configHash.test.ts
  • packages/arkor/src/core/configHash.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • packages/arkor/src/core/moduleCacheBust.ts
  • packages/arkor/src/core/runner.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/core/runnerSignals.ts
  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/trainerInspection.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/manifest.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/server.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/studio-app/src/components/RunTraining.tsx
📜 Review details
⏰ Context from checks skipped due to timeout. (73)
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: install · yarn · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn · macos-latest · node 26.0.0
  • GitHub Check: install · pnpm-10 · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-9 · macos-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-11 · macos-latest · node 26.0.0
  • GitHub Check: install · yarn-berry · macos-latest · node 26.0.0
  • GitHub Check: install · npm · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · macos-latest · node 26.0.0
  • GitHub Check: install · bun · macos-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · windows-latest · node 26.0.0
  • GitHub Check: install · pnpm-11 · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · pnpm-10 · macos-latest · node 22.22.0
  • GitHub Check: install · bun · windows-latest · node 26.0.0
  • GitHub Check: install · pnpm-9 · windows-latest · node 26.0.0
  • GitHub Check: install · bun · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · npm · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · yarn · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn-berry · windows-latest · node 24.0.0
  • GitHub Check: install · pnpm-9 · windows-latest · node >=26.0.0 <27
  • GitHub Check: install · pnpm-9 · windows-latest · node 24.12.0
  • GitHub Check: install · npm · windows-latest · node >=24.12.0 <25
  • GitHub Check: install · yarn · windows-latest · node 24.12.0
  • GitHub Check: install · npm · windows-latest · node 24.0.0
  • GitHub Check: install · npm · windows-latest · node 24.12.0
  • GitHub Check: install · pnpm-10 · windows-latest · node 24.0.0
  • GitHub Check: install · yarn · windows-latest · node 24.0.0
  • GitHub Check: install · pnpm-9 · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · yarn · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · pnpm-9 · windows-latest · node 24.0.0
  • GitHub Check: install · npm · windows-latest · node 22.22.0
  • GitHub Check: install · pnpm-11 · windows-latest · node 22.22.0
  • GitHub Check: install · pnpm-10 · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · yarn · windows-latest · node 22.22.0
  • GitHub Check: install · npm · windows-latest · node >=22.22.0 <23
  • GitHub Check: install · pnpm-10 · windows-latest · node 22.22.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node 24.12.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=22.22.0 <23
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=26.0.0 <27
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node 24.12.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.8.0 <24.10.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node 26.0.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.10.0 <24.12.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.12.0 <25
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.1.0 <24.3.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.10.0 <24.12.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.8.0 <24.10.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node 26.0.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.0.0 <24.1.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.10.0 <24.12.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node 22.22.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.5.0 <24.8.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node 22.22.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node 26.0.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node 24.12.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.5.0 <24.8.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=26.0.0 <27
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.3.0 <24.5.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=22.22.0 <23
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.12.0 <25
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=26.0.0 <27
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.12.0 <25
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.0.0 <24.1.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.0.0 <24.1.0
  • GitHub Check: typecheck · lint · test · build · windows-latest · node >=24.1.0 <24.3.0
  • GitHub Check: typecheck · lint · test · build · macos-latest · node >=24.3.0 <24.5.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.8.0 <24.10.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.3.0 <24.5.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.1.0 <24.3.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=22.22.0 <23
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node 22.22.0
  • GitHub Check: typecheck · lint · test · build · ubuntu-latest · node >=24.5.0 <24.8.0
  • GitHub Check: coverage · upload to Codecov
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{yml,yaml,json,html,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Do not use the em dash (U+2014) in YAML, Markdown, JSON, or HTML files; the repo-wide scripts/check-no-em-dash.mts guard checks tracked files for this character and its HTML entity.

Files:

  • packages/arkor/package.json
  • package.json
**/*.{json,html,md,yaml,yml}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

Avoid using the em dash character (U+2014) in Markdown, JSON, HTML, and YAML files as well; the repository-wide check also blocks the HTML entity form.

Files:

  • packages/arkor/package.json
  • package.json
packages/*/package.json

📄 CodeRabbit inference engine (AGENTS.md)

Use catalog: for direct registry dependencies in workspace package manifests, keep workspace-internal packages on workspace:*, and preserve the literal semver carve-out for the runtime dependencies of the published arkor and create-arkor packages.

Files:

  • packages/arkor/package.json
**

⚙️ CodeRabbit configuration file

**: # Arkor Development Guide

Note: Claude Code automatically loads this file.

Repository shape

pnpm + Turbo monorepo. Workspaces are declared in pnpm-workspace.yaml (packages/*, e2e/*, docs).

Path Role
packages/arkor Published arkor SDK + CLI + bundled local Studio server (Hono). bin/arkordist/bin.mjs. Library entry → dist/index.mjs.
packages/create-arkor Published create-arkor scaffolder (pnpm create arkor).
packages/cli-internal Private workspace package. Source is bundled into arkor and create-arkor via tsdown's deps.alwaysBundle. Never appears as a runtime dependency on npm.
packages/studio-app Private Vite + React 19 SPA. pnpm --filter @arkor/studio-app bundle builds it; packages/arkor/scripts/copy-studio-assets.mjs copies dist/ into packages/arkor/dist/assets/.
e2e/cli Private vitest suite that spawns the built dist/bin.mjs of both CLIs in temp dirs.
e2e/studio Private Playwright suite that spawns arkor dev against an in-process fake cloud-api and drives the Studio SPA in Chromium.
docs Mintlify sources for docs.arkor.ai.

Common commands

Root scripts fan out via Turbo (which respects ^build deps in turbo.json):

pnpm install
pnpm build          # turbo run build across all packages
pnpm typecheck      # tsc --noEmit across all packages
pnpm lint           # oxlint --deny-warnings, then strict ESLint 10 (single root config each)
pnpm format         # oxfmt --write across the repo (config in oxfmt.config.ts)
pnpm format:check   # oxfmt --check; CI gates on this (no writes)
pnpm test           # vitest run across all packages (incl. e2e)
pnpm test:coverage  # writes lcov + cobertura + junit per package; CI uploads lcov to Codecov, cobertura to GitHub nativ...

Files:

  • packages/arkor/package.json
  • package.json
  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/runner.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • e2e/cli/src/arkor-build.test.ts
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/core/configHash.ts
  • packages/arkor/src/core/runnerSignals.ts
  • packages/arkor/src/studio/manifest.ts
  • packages/arkor/src/core/trainerInspection.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/arkor/src/cli/cleanupHooks.ts
  • packages/arkor/src/core/moduleCacheBust.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/core/configHash.test.ts
  • packages/arkor/src/studio/server.ts
packages/**/*.{js,jsx,ts,tsx,mjs,cjs,mts,cts}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Avoid the em dash (U+2014) in code across all packages, including comments, string literals, template literals, CLI messages, generated templates, and test names; use other punctuation or rewording instead.

Files:

  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/runner.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/core/configHash.ts
  • packages/arkor/src/core/runnerSignals.ts
  • packages/arkor/src/studio/manifest.ts
  • packages/arkor/src/core/trainerInspection.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/arkor/src/cli/cleanupHooks.ts
  • packages/arkor/src/core/moduleCacheBust.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/core/configHash.test.ts
  • packages/arkor/src/studio/server.ts
**/*.{js,jsx,ts,tsx,mjs,cjs}

📄 CodeRabbit inference engine (CONTRIBUTING.ja.md)

Avoid using the em dash character (U+2014) in comments, string literals, and template literals in all JavaScript/TypeScript packages; use punctuation, spaced hyphens, or rephrase instead.

Files:

  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/runner.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • e2e/cli/src/arkor-build.test.ts
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/core/configHash.ts
  • packages/arkor/src/core/runnerSignals.ts
  • packages/arkor/src/studio/manifest.ts
  • packages/arkor/src/core/trainerInspection.ts
  • packages/studio-app/src/components/RunTraining.tsx
  • packages/arkor/src/cli/cleanupHooks.ts
  • packages/arkor/src/core/moduleCacheBust.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/cli/commands/dev.ts
  • packages/studio-app/src/components/RunTraining.hmr.test.tsx
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/hmr.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/core/configHash.test.ts
  • packages/arkor/src/studio/server.ts
packages/*/src/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

Add Vitest tests alongside SDK/CLI/scaffold logic changes, and cover new behavior in packages/*/src/**/*.test.ts when the change is code-path specific.

Files:

  • packages/arkor/src/core/trainerInspection.test.ts
  • packages/arkor/src/core/moduleCacheBust.test.ts
  • packages/arkor/src/cli/cleanupHooks.test.ts
  • packages/arkor/src/core/runnerSignals.test.ts
  • packages/arkor/src/studio/server.test.ts
  • packages/arkor/src/studio/hmr.test.ts
  • packages/arkor/src/core/configHash.test.ts
e2e/cli/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

For CLI flow changes, add or update an e2e/cli scenario that exercises the built dist/bin.mjs in a realistic temp directory.

Files:

  • e2e/cli/src/arkor-build.test.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: arkorlab/arkor

Timestamp: 2026-07-03T08:11:23.573Z
Learning: Use Node 24 (latest preferred) for development, while preserving the published `engines.node >=22.22.0` constraint.
🪛 ast-grep (0.44.0)
packages/arkor/src/studio/server.ts

[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { spawn, type ChildProcessByStdio } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { spawn, type ChildProcessByStdio } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { spawn, type ChildProcessByStdio } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)


[warning] Importing child_process exposes a command-execution surface; ensure any command/argument built from input is validated, and prefer execFile/spawn with an argument array over exec.
Context: import { spawn, type ChildProcessByStdio } from "node:child_process";
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(detect-child-process-typescript)

🔇 Additional comments (25)
packages/arkor/src/studio/hmr.ts (2)

18-122: LGTM!

Also applies to: 196-217, 558-653


336-365: 🩺 Stability & Availability

inspectBundle() is already content-addressed. It imports moduleCacheBustUrl(resolved.outFile), so the cache key follows the file bytes, not the mutable path. Reusing the same module record on a later revert is expected here.

			> Likely an incorrect or invalid review comment.
packages/arkor/src/studio/hmr.test.ts (1)

108-119: LGTM!

Also applies to: 182-186

packages/arkor/src/studio/manifest.ts (1)

42-73: LGTM!

Also applies to: 95-95

packages/arkor/src/studio/server.test.ts (1)

2288-2346: LGTM!

packages/arkor/src/cli/commands/dev.ts (1)

274-291: Previously flagged: restore the foreign token atomically.

This still uses existsSync(path) before renameSync(reapPath, path), leaving the same replace-on-exist race from the previous review.

packages/studio-app/src/components/RunTraining.tsx (2)

474-500: Previously flagged: treat a missing exit trailer as restart-unsafe.

The restart suppression still only handles explicit nonzero exits, so a pending HMR restart can proceed after streamTraining() returns null.


350-360: LGTM!

Also applies to: 502-510

e2e/cli/src/arkor-build.test.ts (1)

7-7: LGTM!

Also applies to: 22-42, 61-61, 76-76, 95-98

packages/studio-app/src/components/RunTraining.hmr.test.tsx (1)

339-372: LGTM!

package.json (1)

37-37: LGTM!

packages/arkor/src/studio/server.ts (2)

1108-1144: Still unresolved: Win32 restart cancel uses live credentials instead of the spawn-time cancel context.

This branch still snapshots only pid, jobId, and scope, then calls resolveCredentialsAndBaseUrl() during restart cancellation. That is the same race previously flagged for auth/control-plane switches and null first-run state.


56-133: LGTM!

Also applies to: 211-259, 263-545, 593-1035, 1037-1107, 1145-1198, 1714-1743

packages/arkor/src/core/configHash.ts (1)

32-135: LGTM!

packages/arkor/src/core/trainerInspection.ts (1)

226-280: LGTM!

Also applies to: 315-321

packages/arkor/src/core/trainerInspection.test.ts (1)

44-112: LGTM!

packages/arkor/src/core/runner.ts (1)

18-38: LGTM!

Also applies to: 91-107

packages/arkor/src/core/runnerSignals.test.ts (1)

394-406: LGTM!

packages/arkor/src/core/runnerSignals.ts (1)

149-157: 🩺 Stability & Availability

No change needed here. moduleCacheBustUrl(entryPath) already catches read failures and falls back to a unique miss-* token, so it won't escape the SIGUSR2 handler on missing or unreadable artifacts.

			> Likely an incorrect or invalid review comment.
packages/arkor/package.json (1)

58-61: Correctly placed as a literal-semver runtime dependency.

Rolldown backs the persistent Studio watcher used by arkor dev at runtime (not just arkor build), so it belongs in dependencies with literal semver rather than catalog:, per the carve-out for the published arkor/create-arkor packages.

Verified ^1.1.1 is the current published version: Latest version: 1.1.1, last published: 7 days ago.

Source: Path instructions

packages/arkor/src/core/configHash.test.ts (1)

228-243: LGTM!

packages/arkor/src/core/moduleCacheBust.test.ts (1)

2-2: LGTM!

Also applies to: 24-33, 35-43, 44-62, 64-75, 77-91, 95-103

packages/arkor/src/core/moduleCacheBust.ts (1)

1-65: LGTM!

Also applies to: 68-75

packages/arkor/src/cli/cleanupHooks.ts (1)

76-88: LGTM!

Also applies to: 103-216

packages/arkor/src/cli/cleanupHooks.test.ts (1)

259-303: LGTM!

Comment on lines +113 to +131
if (opts.prebuiltOutFile && existsSync(opts.prebuiltOutFile)) {
// No `runBuild()` fallback on import failure here, deliberately.
// An earlier revision fell through to a fresh `runBuild()` to
// recover from the watcher's then-non-atomic artefact writes
// (a poll could `import()` partial bytes). The watcher now
// publishes via staging-file + `renameSync` (atomic), so an
// import failure of an EXISTING artefact means the bundle is
// genuinely broken (throws at import time): rebuilding the same
// source can't fix that, and the fallback build would race the
// watcher by writing the same watcher-owned
// `.arkor/build/index.mjs` outside the atomic-publish protocol,
// exposing concurrent `/api/train` imports to torn reads.
// Rethrowing lets `/api/manifest` surface the import error as a
// 400 (the HMR coordinator broadcasts its own matching `error`
// frame for the same broken bundle, so both channels agree).
return summariseBuiltManifest(opts.prebuiltOutFile);
}
const { outFile } = await runBuild({ cwd, quiet: true });
const url = `${pathToFileURL(outFile).href}?t=${Date.now()}`;
const mod = (await import(url)) as Record<string, unknown>;
const candidate = mod.arkor ?? mod.default;
if (!isArkor(candidate)) return EMPTY;
const trainer = candidate.trainer ? { name: candidate.trainer.name } : null;
return { trainer };
return summariseBuiltManifest(outFile);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Do not fall back to runBuild() on the watcher-owned artifact in HMR mode.

When prebuiltOutFile is supplied but missing, this falls through to runBuild(), which writes .arkor/build/index.mjs outside the staging and rename protocol. During Studio polling, that can race a /api/train spawn that skips rebuild once the artifact appears, reintroducing torn imports.

Proposed fix
-  if (opts.prebuiltOutFile && existsSync(opts.prebuiltOutFile)) {
+  if (opts.prebuiltOutFile) {
+    if (!existsSync(opts.prebuiltOutFile)) return EMPTY;
     // No `runBuild()` fallback on import failure here, deliberately.
@@
     return summariseBuiltManifest(opts.prebuiltOutFile);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (opts.prebuiltOutFile && existsSync(opts.prebuiltOutFile)) {
// No `runBuild()` fallback on import failure here, deliberately.
// An earlier revision fell through to a fresh `runBuild()` to
// recover from the watcher's then-non-atomic artefact writes
// (a poll could `import()` partial bytes). The watcher now
// publishes via staging-file + `renameSync` (atomic), so an
// import failure of an EXISTING artefact means the bundle is
// genuinely broken (throws at import time): rebuilding the same
// source can't fix that, and the fallback build would race the
// watcher by writing the same watcher-owned
// `.arkor/build/index.mjs` outside the atomic-publish protocol,
// exposing concurrent `/api/train` imports to torn reads.
// Rethrowing lets `/api/manifest` surface the import error as a
// 400 (the HMR coordinator broadcasts its own matching `error`
// frame for the same broken bundle, so both channels agree).
return summariseBuiltManifest(opts.prebuiltOutFile);
}
const { outFile } = await runBuild({ cwd, quiet: true });
const url = `${pathToFileURL(outFile).href}?t=${Date.now()}`;
const mod = (await import(url)) as Record<string, unknown>;
const candidate = mod.arkor ?? mod.default;
if (!isArkor(candidate)) return EMPTY;
const trainer = candidate.trainer ? { name: candidate.trainer.name } : null;
return { trainer };
return summariseBuiltManifest(outFile);
if (opts.prebuiltOutFile) {
if (!existsSync(opts.prebuiltOutFile)) return EMPTY;
// No `runBuild()` fallback on import failure here, deliberately.
// An earlier revision fell through to a fresh `runBuild()` to
// recover from the watcher's then-non-atomic artefact writes
// (a poll could `import()` partial bytes). The watcher now
// publishes via staging-file + `renameSync` (atomic), so an
// import failure of an EXISTING artefact means the bundle is
// genuinely broken (throws at import time): rebuilding the same
// source can't fix that, and the fallback build would race the
// watcher by writing the same watcher-owned
// `.arkor/build/index.mjs` outside the atomic-publish protocol,
// exposing concurrent `/api/train` imports to torn reads.
// Rethrowing lets `/api/manifest` surface the import error as a
// 400 (the HMR coordinator broadcasts its own matching `error`
// frame for the same broken bundle, so both channels agree).
return summariseBuiltManifest(opts.prebuiltOutFile);
}
const { outFile } = await runBuild({ cwd, quiet: true });
return summariseBuiltManifest(outFile);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/manifest.ts` around lines 113 - 131, The fallback
in manifest summarisation is incorrectly rebuilding the watcher-owned artifact
when prebuiltOutFile is missing, which can race HMR/polling and break the atomic
publish flow. Update summariseBuiltManifest in manifest.ts so that, when
opts.prebuiltOutFile is provided, it only uses that artifact path and does not
call runBuild() in the missing-file case; instead, let the missing/invalid
prebuilt bundle surface as an error through the existing import/error handling
path.

Comment on lines +663 to +678
let configHashCalls = 0;
let contentHashCalls = 0;
const fakeHmr = {
subscribe: () => () => undefined,
getCurrentConfigHash: () => {
configHashCalls += 1;
return "default-entry-hash";
},
getCurrentArtifactHash: () => "default-artefact-hash",
getCurrentArtifactContentHash: () => {
contentHashCalls += 1;
return "default-artefact-content-hash";
},
getLastEventType: () => null,
async dispose() {},
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Count getCurrentArtifactHash() in the custom-entry guard.

The test says coordinator hash capture is skipped entirely, but this getter can still be consulted and the test would pass.

Proposed fix
       let configHashCalls = 0;
+      let artifactHashCalls = 0;
       let contentHashCalls = 0;
@@
-        getCurrentArtifactHash: () => "default-artefact-hash",
+        getCurrentArtifactHash: () => {
+          artifactHashCalls += 1;
+          return "default-artefact-hash";
+        },
@@
       expect(configHashCalls).toBe(0);
+      expect(artifactHashCalls).toBe(0);
       expect(contentHashCalls).toBe(0);

Also applies to: 704-705

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/arkor/src/studio/server.test.ts` around lines 663 - 678, The
custom-entry guard test is only tracking getCurrentConfigHash() and
getCurrentArtifactContentHash(), so it can still pass even if
getCurrentArtifactHash() is consulted; update the fakeHmr mock and assertions in
server.test.ts to count getCurrentArtifactHash() calls as well, and verify that
this getter is skipped when the custom-entry path is expected to avoid
coordinator hash capture. Use the existing fakeHmr methods and the custom-entry
guard assertions to locate the affected test cases.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13941c7b91

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +88 to +89
const code = earlyStopFailed ? SIGNAL_EXIT_CODE[signal] : 0;
process.exit(code);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve SIGINT as a nonzero interrupted exit

When arkor start is run directly and the user presses Ctrl-C, a successful early-stop/cancel takes this : 0 branch for SIGINT, so the process exits successfully instead of with the usual interrupted status. Shell or CI wrappers such as arkor start || cleanup_on_failure will treat a user-aborted run as success; keep the zero-success behavior scoped to the Studio SIGTERM restart path or otherwise preserve 128 + signal for user interrupts.

Useful? React with 👍 / 👎.

Comment on lines +519 to +520
const configHash: string | null =
options.hmr && !trainFile ? options.hmr.getCurrentConfigHash() : null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pair the spawn hash with the artifact being loaded

When a Run click lands after the watcher has renamed a new .arkor/build/index.mjs into place but before inspectBundle() updates lastSuccessConfigHash, this records the previous build's config hash while the child imports the new artifact. The next HMR event then sees an artificial old-vs-new mismatch and SIGTERMs/restarts a run that was already spawned with the new config, causing avoidable cancel/restart churn in the save-then-immediately-run path.

Useful? React with 👍 / 👎.

cwd: resolved.cwd,
platform: "node",
logLevel: "warn",
transform: { target: resolveNodeTarget() },

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Target the Node engine floor when bundling

When arkor build runs on a newer dev/CI Node but the resulting .arkor/build/index.mjs is later executed on the supported >=22.22.0 floor, deriving Rolldown's transform target from the build host can preserve syntax that Node 22 does not parse. The previous build path targeted node22.22, and the CLI docs describe build/start for CI, servers, and scripts, so the artifact should remain runnable on the package's declared engine floor rather than only on the build machine's current runtime.

Useful? React with 👍 / 👎.

// (via `moduleCacheBustUrl`) collapses no-op signals onto the
// same URL; the leak is bounded to "one per real edit", which
// is fundamentally what HMR has to retain.
const url = moduleCacheBustUrl(entryPath);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Load the artifact version classified for hot-swap

If a callback-only rebuild is classified and SIGUSR2 is sent, but the child is busy and another save renames a newer .arkor/build/index.mjs before this handler runs, moduleCacheBustUrl(entryPath) imports the newer artifact rather than the one whose configHash was matched. A config-changing follow-up build can therefore have its callbacks installed into the old cloud job until the later SIGTERM arrives, violating the hash gate's guarantee that hot-swaps only use bundles with the spawned config.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants