diff --git a/.agents/diary/cli-io-timeouts.md b/.agents/diary/cli-io-timeouts.md new file mode 100644 index 0000000..e301288 --- /dev/null +++ b/.agents/diary/cli-io-timeouts.md @@ -0,0 +1,5 @@ +[Out-of-scope decision] Excluded a global public `--timeout` flag from the first slice because request, inactivity, and total wait deadlines mean different things across commands. The spec keeps public configuration narrow until usage shows which budgets users need to tune. + +[Out-of-scope decision] Excluded automatic retry policy changes because bounded waiting and retry behavior should be designed separately. Mixing them in this spec would obscure the core product contract: stalled I/O must fail clearly and predictably. + +[Out-of-scope decision] Excluded broad local filesystem and credentials-store timeout policy from the first implementation plan. Those boundaries remain cancellation-aware, but the timeout work should focus on Prisma-controlled API, SDK, callback, polling, and stream boundaries to avoid false positives and unnecessary complexity. diff --git a/.agents/projects/cli-io-timeouts.plan.md b/.agents/projects/cli-io-timeouts.plan.md new file mode 100644 index 0000000..bbec9b9 --- /dev/null +++ b/.agents/projects/cli-io-timeouts.plan.md @@ -0,0 +1,460 @@ +# CLI I/O Timeouts Plan + +## Assumptions + +**A1** The first implementation should use generous fixed defaults in code rather than public flags or user-facing environment variables. This satisfies FR9, FR13, and A7 from the spec without adding configuration surface. + +**A2** Timeout protection should target Prisma-controlled boundaries first: Management API calls, Compute SDK calls, OAuth callback/token exchange boundaries, GitHub installation polling, domain polling API calls, and remote log-stream inactivity. Local filesystem and credentials-store calls remain cancellation-aware but do not receive new timeout policy unless implementation reveals a clear CLI-controlled stall.
+ +**A3** Dedicated timeout behavior tests are not required for the first slice. Existing tests should be updated only when signatures or error contracts change, and the project should still build and pass the existing test suite. + +**A4** Node.js `>=22.12.0` is available, so the implementation can rely on modern AbortSignal primitives when they keep the code smaller. If a primitive produces ambiguous abort reasons, prefer a tiny local helper that preserves timeout-vs-cancel meaning. + +**A5** Existing SDK operation deadlines such as deploy/promote/update/remove `timeoutSeconds: 120` are remote operation deadlines, not the generic stalled-I/O timeout contract. Keep them unless they directly conflict with OPERATION_TIMEOUT conversion. + +## Open Questions + +None. + +## Phases + +### Phase 1: Shared Timeout Contract + +**Status**: ☐ Not started + +**Goal**: Add one small, reusable timeout/error boundary that preserves the distinction between user cancellation and timeout-caused aborts. + +**Requirements**: FR5, FR6, FR7, FR8, FR10, FR11, FR15, NFR2, NFR4, NFR5, NFR6, NFR7, NFR8 + +**Changes**: + +- Add an `OPERATION_TIMEOUT` error constructor to `packages/cli/src/shell/errors.ts` with stable metadata for operation label, duration, timeout kind, and optional command/domain context. +- Teach `packages/cli/src/shell/command-runner.ts` to translate timeout-specific errors into the new `CliError` while continuing to translate root-signal cancellation into `COMMAND_CANCELED`. +- Add a small timeout utility under `packages/cli/src/shell` or `packages/cli/src/lib` that can derive a child signal from the root command signal without aborting the root signal and without losing the timeout reason. +- Keep the helper API narrow: operation label, duration, timeout kind, parent signal, and optional domain are enough for this slice. +- Avoid adding global command timeouts, public flags, or test-only configuration hooks. 
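A minimal sketch of the child-signal helper described in the Phase 1 changes, assuming nothing about the real code in `packages/cli/src/shell`. The names `withTimeout`, `OperationTimeoutError`, `TimeoutOptions`, and `TimeoutKind` are hypothetical. It derives a child `AbortController` from the parent signal so that the timeout never aborts the root signal, and it keeps the timeout reason distinguishable from user cancellation.

```typescript
// Hypothetical names throughout; this is an illustration, not the CLI's actual helper.
type TimeoutKind = "request" | "inactivity";

interface TimeoutOptions {
  label: string; // operation label, e.g. "GET /v1/me"
  durationMs: number; // deadline for this single boundary
  kind: TimeoutKind;
  parent?: AbortSignal; // root command signal (user cancellation)
  domain?: string; // optional logical error domain
}

type TimeoutMeta = Omit<TimeoutOptions, "parent">;

class OperationTimeoutError extends Error {
  readonly code = "OPERATION_TIMEOUT";
  readonly meta: TimeoutMeta;
  constructor(meta: TimeoutMeta) {
    super(`${meta.label} timed out after ${meta.durationMs}ms`);
    this.name = "OperationTimeoutError";
    this.meta = meta;
  }
}

async function withTimeout<T>(
  options: TimeoutOptions,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const { parent, ...meta } = options;
  parent?.throwIfAborted(); // already canceled: do not start the operation

  const child = new AbortController();
  // Rejects when the child aborts, so a callee that ignores the signal still unblocks us.
  const aborted = new Promise<never>((_, reject) => {
    child.signal.addEventListener("abort", () => reject(child.signal.reason), { once: true });
  });
  aborted.catch(() => {}); // mark as handled in case run() throws synchronously

  // Cancellation flows parent -> child only; the child never aborts the parent.
  const onParentAbort = () => child.abort(parent?.reason);
  parent?.addEventListener("abort", onParentAbort, { once: true });

  const timer = setTimeout(() => child.abort(new OperationTimeoutError(meta)), meta.durationMs);
  try {
    return await Promise.race([run(child.signal), aborted]);
  } catch (error) {
    // If our timer caused the abort, surface the timeout instead of a generic AbortError.
    if (child.signal.reason instanceof OperationTimeoutError) throw child.signal.reason;
    throw error;
  } finally {
    clearTimeout(timer);
    parent?.removeEventListener("abort", onParentAbort);
  }
}
```

Under these assumptions, the command runner could map `OperationTimeoutError` to `OPERATION_TIMEOUT` and any abort that originates from the parent signal to `COMMAND_CANCELED`, because the two cases reject with different reasons.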
+ +**Acceptance Criteria**: + +- Timeout-caused aborts render as `OPERATION_TIMEOUT` with exit code `1` in both normal and streaming command runners. +- User cancellation still renders as `COMMAND_CANCELED` with exit code `130`. +- JSON error envelopes include non-sensitive timeout metadata. +- `pnpm --filter @prisma/cli build` passes. + +### Phase 2: Prisma API And SDK Request Boundaries + +**Status**: ☐ Not started + +**Goal**: Apply generous stalled-I/O deadlines to Prisma-controlled request and SDK call sites without changing command-level behavior. + +**Requirements**: FR1, FR2, FR5, FR6, FR8, FR9, FR10, FR11, FR15, FR16, NFR1, NFR2, NFR3, NFR5, NFR7, NFR8 + +**Changes**: + +- Route Management API calls in `packages/cli/src/lib/app/preview-provider.ts` through the shared timeout boundary, including domain, branch, compute-service, and project/service discovery calls. +- Route Compute SDK calls in `packages/cli/src/lib/app/preview-provider.ts` through the shared timeout boundary where the SDK accepts a signal, while preserving existing SDK operation deadlines for deploy/promote/update/remove. +- Route direct Management API calls in `packages/cli/src/controllers/project.ts` and `packages/cli/src/controllers/app-env.ts` through the same boundary for project, SCM installation, source repository, environment variable, and branch lookup/mutation calls. +- Route auth-related Management API calls in `packages/cli/src/lib/auth/auth-ops.ts` and `packages/cli/src/lib/auth/login.ts` through the same boundary for `/v1/me`, workspace lookup, login URL/token exchange-adjacent calls where the SDK accepts cancellation, and callback success-page workspace lookup. +- Preserve command-specific catches that convert platform states into domain-specific errors. Timeout errors should pass through unchanged or be converted only at the command boundary. 
+- Do not add timeouts to local package discovery, framework build execution, credentials-store calls, or arbitrary filesystem reads in this phase. + +**Acceptance Criteria**: + +- Every Prisma-controlled API/SDK call identified in the touched files either uses the shared timeout boundary or has a documented reason in code for not using it. +- Existing domain-specific errors such as `PROJECT_NOT_FOUND`, `DOMAIN_VERIFICATION_TIMEOUT`, and deploy failures remain reachable and are not masked by generic timeout conversion. +- Commands can still run longer than one request timeout while making progress across multiple bounded calls. +- `pnpm --filter @prisma/cli build` passes. +- `pnpm --filter @prisma/cli test` passes, unless existing unrelated tests are already failing; any unrelated failure must be documented before continuing. + +### Phase 3: Long-Lived Workflow Inactivity Boundaries + +**Status**: ☐ Not started + +**Goal**: Protect long-lived Prisma-controlled waits from silent stalls without imposing total command deadlines. + +**Requirements**: FR2, FR3, FR4, FR5, FR6, FR8, FR9, FR10, FR12, FR15, FR16, NFR1, NFR2, NFR3, NFR5, NFR6, NFR8 + +**Changes**: + +- Keep `app domain wait --timeout` as the explicit total wait budget in `packages/cli/src/controllers/app.ts`, and apply request-scoped deadlines only to each status refresh inside the polling loop. +- Keep GitHub installation/repository approval polling in `packages/cli/src/controllers/project.ts` under its existing total wait semantics, and apply request-scoped deadlines to each SCM/source repository refresh. +- Add remote log-stream inactivity protection around `streamDeploymentLogs` in `packages/cli/src/lib/app/preview-provider.ts` or the `app logs` controller path, without treating a long but active stream as timed out. +- Leave local `app run` and local build processes without total deadlines. 
If implementation shows a bounded startup wait controlled by the CLI, use an inactivity deadline only around that startup boundary. +- Ensure timeout metadata identifies whether the failure came from a request boundary or inactivity boundary. + +**Acceptance Criteria**: + +- `app domain wait --timeout 0` keeps poll-once behavior and does not get reinterpreted as a network timeout setting. +- Long-lived workflows remain allowed to run indefinitely when they keep making observable progress or are designed to stream. +- Silent remote log-stream stalls fail with `OPERATION_TIMEOUT`, while user interruption remains `COMMAND_CANCELED`. +- `pnpm --filter @prisma/cli build` passes. +- `pnpm --filter @prisma/cli test` passes, unless existing unrelated tests are already failing; any unrelated failure must be documented before continuing. + +### Phase 4: Product Documentation And Error Surface Alignment + +**Status**: ☐ Not started + +**Goal**: Make the timeout behavior part of the documented CLI contract without adding new public configuration surface. + +**Requirements**: FR6, FR7, FR8, FR10, FR11, FR12, FR13, FR14, NFR2, NFR4, NFR5, NFR6, NFR7 + +**Changes**: + +- Update `docs/product/error-conventions.md` to include `OPERATION_TIMEOUT`, its meaning, exit code behavior, and distinction from `COMMAND_CANCELED` and `DOMAIN_VERIFICATION_TIMEOUT`. +- Update `docs/product/command-spec.md` only where existing command-specific wait semantics need clarification, especially `app domain wait --timeout` and any long-lived streaming/wait command text. +- Keep output stream conventions unchanged in `docs/product/output-conventions.md`; update only if the implementation exposes a new structured timeout metadata convention that needs central documentation. +- Avoid documenting global timeout flags, public timeout environment variables, or retry behavior. 
+ +**Acceptance Criteria**: + +- Product docs describe the timeout error contract clearly enough for users, CI, and agents to distinguish timeout from cancellation. +- Docs preserve existing command behavior and do not introduce out-of-scope configuration promises. +- `pnpm --filter @prisma/cli build` passes. + +## Supplement: Command Timeout Callstacks + +These callstacks describe the intended timeout placement, not exact implementation shape. Durations are proposed generous defaults for planning. They should remain fixed internal defaults in the first slice, not public command flags. + +Timeout constants: + +- `API_REQUEST_TIMEOUT = 60s`: one Prisma Management API HTTP request. +- `SDK_REQUEST_TIMEOUT = 60s`: one Compute SDK request-style operation such as list/show/create. +- `SDK_LONG_OPERATION_TIMEOUT = 120s`: existing Compute SDK remote operation deadline for deploy/promote/update/remove polling operations. +- `DOMAIN_WAIT_TOTAL_TIMEOUT = --timeout, default 15m`: existing user-facing `app domain wait` total wait budget. +- `GITHUB_INSTALL_TOTAL_TIMEOUT = 120s`: existing GitHub App installation/repository approval polling budget. +- `LOG_STREAM_INACTIVITY_TIMEOUT = 10m`: remote log stream may run forever while active, but fails after this much silence from the remote stream. +- `UPDATE_CHECK_REGISTRY_TIMEOUT = 3s`: existing advisory registry lookup timeout; this is not part of command execution and must not change the command result. +- `NO_TIMEOUT`: no new timeout because the command is local-only, user-driven, or a plausible long-running user-controlled path. 
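The constants above could be expressed as fixed internal defaults along the following lines. This is a sketch under stated assumptions: the `TIMEOUTS` object, `InactivityTimeoutError`, and `withInactivityTimeout` are hypothetical names, and only the durations come from the list above. The wrapper illustrates the `LOG_STREAM_INACTIVITY_TIMEOUT` semantics: the deadline restarts on every item, so an active stream can run indefinitely while a silent one fails.

```typescript
// Durations mirror the planning defaults; all identifiers here are illustrative.
const SECOND = 1_000;
const MINUTE = 60 * SECOND;

const TIMEOUTS = {
  API_REQUEST_TIMEOUT: 60 * SECOND,
  SDK_REQUEST_TIMEOUT: 60 * SECOND,
  SDK_LONG_OPERATION_TIMEOUT: 120 * SECOND,
  DOMAIN_WAIT_TOTAL_TIMEOUT_DEFAULT: 15 * MINUTE, // overridden by `app domain wait --timeout`
  GITHUB_INSTALL_TOTAL_TIMEOUT: 120 * SECOND,
  LOG_STREAM_INACTIVITY_TIMEOUT: 10 * MINUTE,
  UPDATE_CHECK_REGISTRY_TIMEOUT: 3 * SECOND,
} as const;

class InactivityTimeoutError extends Error {
  readonly code = "OPERATION_TIMEOUT";
  readonly kind = "inactivity";
  readonly label: string;
  readonly idleMs: number;
  constructor(label: string, idleMs: number) {
    super(`${label} produced no data for ${idleMs}ms`);
    this.name = "InactivityTimeoutError";
    this.label = label;
    this.idleMs = idleMs;
  }
}

// Re-yields items from `source`, failing only when no item arrives within `idleMs`.
async function* withInactivityTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
  label: string,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();

  const nextOrIdle = async (): Promise<IteratorResult<T>> => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const idle = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new InactivityTimeoutError(label, idleMs)), idleMs);
    });
    try {
      return await Promise.race([iterator.next(), idle]);
    } finally {
      clearTimeout(timer); // the clock only runs while waiting on the source
    }
  };

  try {
    while (true) {
      const result = await nextOrIdle();
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cleanup; do not await a source that may itself be stalled.
    Promise.resolve(iterator.return?.()).catch(() => {});
  }
}
```

A call such as `withInactivityTimeout(stream, TIMEOUTS.LOG_STREAM_INACTIVITY_TIMEOUT, "app logs stream")` would then bound silence rather than total stream lifetime.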
+ +Shared prelude for normal command execution: + +```text +runCli() + maybeWriteCachedUpdateNotification() + read cached state only + optionally spawn update worker; original command does not wait + +update worker, when spawned + fetch npm registry [UPDATE_CHECK_REGISTRY_TIMEOUT] +``` + +Version and help commands: + +```text +prisma-cli --version + read bundled package metadata [NO_TIMEOUT] + +prisma-cli version + buildVersionResult() [NO_TIMEOUT] + +prisma-cli --help / group help + commander help rendering [NO_TIMEOUT] +``` + +Auth commands: + +```text +prisma-cli auth login + performLogin() + create localhost callback server [NO_TIMEOUT] + sdk.getLoginUrl() [API_REQUEST_TIMEOUT] + open browser [NO_TIMEOUT] + wait for browser callback or pasted callback URL [NO_TIMEOUT] + sdk.handleCallback() / token exchange [API_REQUEST_TIMEOUT] + resolveWorkspaceName() + GET /v1/workspaces/{id} [API_REQUEST_TIMEOUT] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + GET /v1/workspaces/{id} fallback [API_REQUEST_TIMEOUT] + +prisma-cli auth logout + performLogout() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + GET /v1/workspaces/{id} fallback [API_REQUEST_TIMEOUT] + +prisma-cli auth whoami + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + GET /v1/workspaces/{id} fallback [API_REQUEST_TIMEOUT] +``` + +Project commands: + +```text +prisma-cli project list + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + listRealWorkspaceProjects() + GET /v1/projects [API_REQUEST_TIMEOUT] + read local binding [NO_TIMEOUT] + +prisma-cli project show [--project] + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + inspectProjectBinding() + GET /v1/projects [API_REQUEST_TIMEOUT] + read local binding [NO_TIMEOUT] + +prisma-cli project create + requireComputeAuth() [NO_TIMEOUT for credentials-store 
boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + provider.createProject() + Compute SDK createProject [SDK_REQUEST_TIMEOUT] + write local binding [NO_TIMEOUT] + +prisma-cli project link [id-or-name] + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + listRealWorkspaceProjects() + GET /v1/projects [API_REQUEST_TIMEOUT] + optional provider.createProject() + Compute SDK createProject [SDK_REQUEST_TIMEOUT] + write local binding [NO_TIMEOUT] +``` + +Project environment commands: + +```text +prisma-cli project env add --role/--branch [--project] + requireClientAndProject() + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() -> GET /v1/me [API_REQUEST_TIMEOUT] + resolveProjectTarget() -> GET /v1/projects [API_REQUEST_TIMEOUT] + resolveScopeToApi() + GET /v1/projects/{projectId}/branches or equivalent [API_REQUEST_TIMEOUT] + optional branch creation endpoint [API_REQUEST_TIMEOUT] + findVariableByNaturalKey() + GET /v1/environment-variables [API_REQUEST_TIMEOUT] + POST /v1/environment-variables [API_REQUEST_TIMEOUT] + +prisma-cli project env update --role/--branch [--project] + requireClientAndProject() [same as env add] + resolveScopeToApi() [same as env add] + findVariableByNaturalKey() + GET /v1/environment-variables [API_REQUEST_TIMEOUT] + PATCH /v1/environment-variables/{id} [API_REQUEST_TIMEOUT] + +prisma-cli project env list [--role/--branch] [--project] + requireClientAndProject() [same as env add] + resolveScopeToApi() [same as env add] + GET /v1/environment-variables [API_REQUEST_TIMEOUT] + +prisma-cli project env remove --role/--branch [--project] +prisma-cli project env rm --role/--branch [--project] + requireClientAndProject() [same as env add] + resolveScopeToApi() [same as env add] + findVariableByNaturalKey() + GET /v1/environment-variables [API_REQUEST_TIMEOUT] + DELETE /v1/environment-variables/{id} [API_REQUEST_TIMEOUT] +``` + +Git 
commands: + +```text +prisma-cli git connect [git-url] [--project] + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + resolveProjectTarget() + GET /v1/projects [API_REQUEST_TIMEOUT] + inspect existing source repository + GET source repository endpoint(s) [API_REQUEST_TIMEOUT] + resolve GitHub App installation/repository access + GET /v1/scm-installations [API_REQUEST_TIMEOUT] + GET /v1/scm-installations/{id}/repositories [API_REQUEST_TIMEOUT] + if installation or access missing: + POST /v1/scm-installations/install-intents [API_REQUEST_TIMEOUT] + open browser [NO_TIMEOUT] + poll for installation/access [GITHUB_INSTALL_TOTAL_TIMEOUT] + each GET /v1/scm-installations [API_REQUEST_TIMEOUT] + each GET /v1/scm-installations/{id}/repositories [API_REQUEST_TIMEOUT] + connect repository endpoint [API_REQUEST_TIMEOUT] + +prisma-cli git disconnect [--project] + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() + GET /v1/me [API_REQUEST_TIMEOUT] + resolveProjectTarget() + GET /v1/projects [API_REQUEST_TIMEOUT] + inspect existing source repository + GET source repository endpoint(s) [API_REQUEST_TIMEOUT] + disconnect repository endpoint [API_REQUEST_TIMEOUT] +``` + +Branch commands: + +```text +prisma-cli branch list + current preview real mode returns FEATURE_UNAVAILABLE [NO_TIMEOUT] + fixture mode reads local/mock state [NO_TIMEOUT] + +prisma-cli branch show + current preview real mode returns FEATURE_UNAVAILABLE [NO_TIMEOUT] + fixture mode reads local/mock state [NO_TIMEOUT] + +prisma-cli branch use [name] + current preview real mode returns FEATURE_UNAVAILABLE [NO_TIMEOUT] + fixture mode reads/writes local/mock state and may prompt [NO_TIMEOUT] +``` + +Local app commands: + +```text +prisma-cli app build + detect framework and build locally [NO_TIMEOUT] + run local build process [NO_TIMEOUT] + +prisma-cli app run + detect framework locally [NO_TIMEOUT] + run local 
dev/runtime process [NO_TIMEOUT] +``` + +App deployment and app resource commands: + +```text +prisma-cli app deploy [options] + read local project pin and infer local project shape [NO_TIMEOUT] + requireProviderAndDeployProjectContext() + requireComputeAuth() [NO_TIMEOUT for credentials-store boundary] + readAuthState() -> GET /v1/me [API_REQUEST_TIMEOUT] + GET /v1/projects [API_REQUEST_TIMEOUT] + optional provider.createProject() -> SDK createProject [SDK_REQUEST_TIMEOUT] + resolve/create branch/app + GET /v1/projects/{projectId}/branches [API_REQUEST_TIMEOUT] + optional POST /v1/projects/{projectId}/branches [API_REQUEST_TIMEOUT] + GET /v1/compute-services [API_REQUEST_TIMEOUT] + optional POST /v1/compute-services [API_REQUEST_TIMEOUT] + local framework detection/customization [NO_TIMEOUT] + provider.deployApp() + local build strategy [NO_TIMEOUT] + Compute SDK deploy remote polling [SDK_LONG_OPERATION_TIMEOUT] + write selected app/local deployment state [NO_TIMEOUT] + +prisma-cli app show [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() + GET /v1/compute-services [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + +prisma-cli app open [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + open browser [NO_TIMEOUT] + +prisma-cli app list-deploys [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + +prisma-cli app show-deploy + requirePreviewAppProvider() + requireComputeAuth() [NO_TIMEOUT for 
credentials-store boundary] + provider.showDeployment() + SDK showVersion [SDK_REQUEST_TIMEOUT] + findAppForDeployment() + SDK listProjects [SDK_REQUEST_TIMEOUT] + SDK listServices [SDK_REQUEST_TIMEOUT] + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + readCurrentWorkspaceId() + stateStore.read() [NO_TIMEOUT] + fallback readAuthState() -> GET /v1/me [API_REQUEST_TIMEOUT] + +prisma-cli app promote [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + provider.promoteDeployment() + Compute SDK promote remote polling [SDK_LONG_OPERATION_TIMEOUT] + +prisma-cli app rollback [--to ] [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + provider.promoteDeployment() + Compute SDK promote remote polling [SDK_LONG_OPERATION_TIMEOUT] + +prisma-cli app remove [--app] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.removeApp() + SDK showService [SDK_REQUEST_TIMEOUT] + Compute SDK destroyService remote polling [SDK_LONG_OPERATION_TIMEOUT] +``` + +App domain commands: + +```text +prisma-cli app domain add [--app] [--project] [--branch] + resolveAppDomainTarget() + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + provider.listApps() [API_REQUEST_TIMEOUT] + provider.addDomain() + POST /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + on 409: GET /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + +prisma-cli app domain show [--app] [--project] [--branch] + resolveAppDomainTarget() [same as domain 
add] + resolveDomainByHostname() + GET /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + provider.showDomain() + GET /v1/domains/{id} [API_REQUEST_TIMEOUT] + +prisma-cli app domain remove [--app] [--project] [--branch] + resolveAppDomainTarget() [same as domain add] + resolveDomainByHostname() + GET /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + provider.removeDomain() + DELETE /v1/domains/{id} [API_REQUEST_TIMEOUT] + +prisma-cli app domain retry [--app] [--project] [--branch] + resolveAppDomainTarget() [same as domain add] + resolveDomainByHostname() + GET /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + provider.retryDomain() + POST /v1/domains/{id}/retry [API_REQUEST_TIMEOUT] + +prisma-cli app domain wait [--timeout 15m] [--app] [--project] [--branch] + resolveAppDomainTarget() [same as domain add] + resolveDomainByHostname() + GET /v1/compute-services/{id}/domains [API_REQUEST_TIMEOUT] + wait loop [DOMAIN_WAIT_TOTAL_TIMEOUT] + sleep poll interval [bounded by remaining DOMAIN_WAIT_TOTAL_TIMEOUT] + provider.showDomain() + GET /v1/domains/{id} [API_REQUEST_TIMEOUT] +``` + +App logs command: + +```text +prisma-cli app logs [--app] [--deployment] [--project] + requireProviderAndProjectContext() [API_REQUEST_TIMEOUT on auth/project requests] + resolve deployment target + provider.listApps() [API_REQUEST_TIMEOUT] + provider.listDeployments() + SDK showService [SDK_REQUEST_TIMEOUT] + SDK listVersions [SDK_REQUEST_TIMEOUT] + optional provider.showDeployment() + SDK showVersion [SDK_REQUEST_TIMEOUT] + provider.streamDeploymentLogs() + get log auth token [NO_TIMEOUT for credentials-store boundary] + stream remote logs [LOG_STREAM_INACTIVITY_TIMEOUT] +``` + +## Revision Log diff --git a/.agents/projects/cli-io-timeouts.spec.md b/.agents/projects/cli-io-timeouts.spec.md new file mode 100644 index 0000000..689da8d --- /dev/null +++ b/.agents/projects/cli-io-timeouts.spec.md @@ -0,0 +1,122 @@ +# CLI I/O Timeouts Spec + +## Problem + +CLI commands 
that perform external I/O currently rely on cooperative cancellation, but they can still wait forever when an API request, SDK operation, filesystem boundary, browser/login callback, child process, polling loop, or stream stalls without resolving. That is bad for humans because the CLI appears broken, and bad for agents/CI because a hung process blocks automation until an external supervisor kills it. + +The goal is to make every I/O-based command terminate predictably when progress stops, while preserving legitimate long-running workflows such as deployment, domain verification, GitHub installation approval, OAuth login, local dev processes, and log streaming. + +Success means no command can hang indefinitely because of a stalled I/O boundary, timed-out work produces stable actionable CLI errors, user cancellation remains distinct from timeout, and automation can safely branch on structured error codes rather than wall-clock watchdogs. + +The case against this work is that overly aggressive timeouts can create false failures for users on slow networks, slow builds, slow domain propagation, or long-lived streams. A single command-wide timeout is especially risky because it punishes legitimate progress and makes commands like `app logs`, `app run`, and `app domain wait` unreliable by default. + +## Stakeholders + +Primary actors: + +- CLI users need commands to fail clearly instead of hanging, and they need long-running workflows to stay usable when work is still making progress. +- CI and agent operators need bounded command behavior, stable structured timeout errors, and no surprise prompts or decorative output in automation. +- CLI maintainers need one cross-command timeout model that prevents each command from inventing incompatible deadlines and error behavior. + +Secondary beneficiaries: + +- Platform API and SDK teams get clearer timeout reports that identify stalled boundaries rather than generic cancellations. 
+- Support teams get error metadata that distinguishes user cancellation, command-specific wait expiry, API unavailability, and stalled local runtime work. + +## Functional Requirements + +**FR1** Every CLI command that performs Prisma-controlled external I/O must have bounded waits for stalled I/O boundaries. Covered boundaries include Prisma API requests, SDK operations, CLI-owned callback waits, polling sleeps, and remote streams where inactivity can be classified without punishing plausible slow-but-healthy user work. + +**FR2** Timeouts must be scoped to the smallest user-meaningful stalled boundary rather than the whole command by default. A command may run longer than any single I/O timeout when it is making progress across multiple bounded steps. + +**FR3** Command-level deadlines must be reserved for commands whose purpose is explicitly to wait for an eventual condition. Existing command-specific wait semantics, such as `app domain wait --timeout`, remain the user-facing total wait budget for that condition. + +**FR4** Long-lived commands must not time out merely because they are long-lived. `app logs`, local app runtime commands, interactive OAuth login, GitHub installation approval waits, deploy/build operations, and domain verification waits must only fail on inactivity, an explicit command wait deadline, process failure, user cancellation, or a terminal remote state. + +**FR5** User cancellation must remain distinct from timeout. `SIGINT`, `SIGTERM`, prompt escape cancellation, and upstream runtime aborts must continue to produce `COMMAND_CANCELED` and exit `130`; timeout-caused aborts must not be reported as user cancellation. + +**FR6** Timeout failures must produce stable structured errors at the command boundary. JSON output must use `OPERATION_TIMEOUT`, the logical error domain, a summary, why, fix, and structured metadata identifying the timed-out operation and configured duration when known. 
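To make FR5 and FR6 concrete, the following sketch shows one possible shape for the structured timeout error. Only the field names `code`, domain, summary, why, fix, and metadata come from the requirements; the nesting under `error`, the `meta` key names, and the helper `toTimeoutEnvelope` are assumptions for illustration, not the CLI's actual envelope.

```typescript
// Hypothetical envelope shape; the real JSON envelope may nest or name fields differently.
type TimeoutKind = "request" | "inactivity" | "total-deadline";

interface TimeoutMeta {
  operation: string; // label of the timed-out operation
  kind: TimeoutKind;
  durationMs?: number; // omitted when the configured duration is not known
}

interface TimeoutEnvelope {
  error: {
    code: "OPERATION_TIMEOUT";
    domain: string;
    summary: string;
    why: string;
    fix: string;
    meta: TimeoutMeta;
  };
}

// Exit codes stated in this spec and plan for the two abort outcomes.
const EXIT_CODES = { OPERATION_TIMEOUT: 1, COMMAND_CANCELED: 130 } as const;

function toTimeoutEnvelope(domain: string, meta: TimeoutMeta): TimeoutEnvelope {
  const after = meta.durationMs === undefined ? "" : ` after ${meta.durationMs}ms`;
  return {
    error: {
      code: "OPERATION_TIMEOUT",
      domain,
      summary: `${meta.operation} timed out${after}`,
      why: `The ${meta.kind} boundary stopped making progress.`,
      fix: "Retry, check network/VPN/proxy state, or rerun with --trace for details.",
      meta,
    },
  };
}
```

Keeping `meta` limited to labels, kind, and duration is one way to stay within the non-sensitive metadata constraint: no tokens, headers, or absolute paths pass through this type.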
+ +**FR7** Timeout failures must use exit code `1`, except that usage errors involving invalid timeout configuration continue to use exit code `2`. + +**FR8** Human timeout errors must identify what stopped making progress and what the user can do next. The message should prefer action-oriented recovery such as retrying, checking network/VPN/proxy state, inspecting platform status, increasing an explicit wait timeout where supported, or using `--trace` for details. + +**FR9** Defaults must be generous enough for normal slow networks and platform latency. Default deadlines must protect against indefinite hangs; they must not optimize for fast failure or time out plausible slow-but-healthy cases. + +**FR10** Timeout behavior must be consistent across human, `--json`, TTY, non-TTY, CI, quiet, verbose, and trace modes. Output stream rules remain unchanged: structured data on stdout, human status and errors on stderr. + +**FR11** Timeout metadata exposed to agents must be stable and non-sensitive. It may include operation labels, command labels, duration milliseconds, whether the timeout was inactivity-based or total-deadline-based, and the last known resource status. It must not include tokens, raw request headers, secrets, or full local absolute paths. + +**FR12** Commands that already expose a user-configurable wait timeout must continue to honor that setting. A timeout value of `0` keeps the documented command-specific meaning where one exists, such as poll-once snapshot mode for domain wait. + +**FR13** Timeout configuration must not become a broad new public command surface in the first slice. The default behavior should work without adding global `--timeout` flags. Any new user-facing timeout knobs require source-of-truth product documentation before implementation. + +**FR14** Update checks and other advisory background work must never extend the command's runtime or change the original command result because of timeout handling.
Advisory work should remain best-effort and non-blocking. + +**FR15** Timeout handling must compose with existing AbortSignal cancellation. A timeout that aborts an internal operation must not accidentally abort unrelated sibling work or change the root command signal's user-cancellation meaning. + +**FR16** Timeouts should be applied only where the CLI can reasonably control or classify the boundary, including Prisma-managed endpoints, SDK operations, and bounded local operations. The CLI must avoid adding timeouts to plausible user-controlled slow paths where a timeout would create more false positives than protection. + +## Non-Functional Requirements + +**NFR1** Reliability: no externally backed command may wait indefinitely on a stalled operation when the underlying runtime supports cooperative abort or when the CLI can observe inactivity. + +**NFR2** Automation safety: CI and agent runs must receive deterministic process termination and stable error envelopes without relying on external shell watchdogs. + +**NFR3** Low false-positive rate: default timeout budgets must be conservative. A normal user on a slow connection or a normally slow deployment must not see timeout failures unless progress has actually stalled at a boundary the CLI controls, or an explicit wait deadline has expired. + +**NFR4** Observability: verbose or trace output must make timeout diagnosis possible without exposing secrets. Trace mode may include underlying abort/timeout causes and operation labels. + +**NFR5** Maintainability: timeout policy must be centralized enough that new commands inherit the same behavior and error taxonomy, while still allowing command-specific wait semantics where documented. + +**NFR6** Compatibility: existing documented behavior for `COMMAND_CANCELED`, `DOMAIN_VERIFICATION_TIMEOUT`, stream output, JSON envelopes, and stdout/stderr separation must not regress.
+ +**NFR7** Security: timeout errors and metadata must never print credentials, token-derived secrets, Authorization headers, secret environment variable values, or full absolute local paths. + +**NFR8** Portability: timeout behavior must work in the supported Node.js runtime across macOS, Linux, Windows, TTY, non-TTY, and CI environments. + +## Assumptions + +**A1** Step-scoped and inactivity-scoped timeouts are the right default. A root command timeout would be simpler, but it would incorrectly fail legitimate long-running commands and collapse useful diagnostics into one generic deadline. + +**A2** The first implementation should add `OPERATION_TIMEOUT` as the generic timeout error code for stalled operational I/O, while retaining existing command-specific timeout codes for domain verification and similar explicit wait commands. + +**A3** Timeout aborts should be represented separately from user cancellation even if both are implemented with AbortSignal internally. + +**A4** Defaults should be documented as product behavior before implementation. The exact durations should be decided during planning after command inventory, but the spec intentionally requires generous defaults rather than fast-fail defaults. + +**A5** Existing public command-specific timeout flags should keep their current behavior. This spec should not reinterpret `app domain wait --timeout` as a lower-level network request timeout. + +**A6** A global public `--timeout` flag is not part of the first slice. It would be too ambiguous because different commands need request deadlines, inactivity deadlines, and total wait deadlines. + +**A7** Timeout behavior does not require dedicated tests in the first slice. Planning should keep implementation simple and avoid building test-only timeout configurability unless it is already needed for another reason. + +**A8** Local build and run child processes should avoid total deadlines. 
Where timeout protection is needed, it should be limited to inactivity or startup boundaries the CLI can reasonably observe and classify. + +## Downstream Effects + +Timeouts become part of the CLI product contract, not just an implementation detail. That means docs, help text for commands with explicit wait deadlines, error codes, and support playbooks must stay aligned. + +In 6-12 months, a step-scoped model should make the CLI safer to extend because new remote commands can inherit the same timeout/error behavior. The maintenance cost is that every new kind of long-lived operation must declare whether it is request-bound, inactivity-bound, or total-deadline-bound. + +The main negative effect is that users on unusually slow networks may see timeout errors that did not exist before. The mitigation is generous defaults, operation-specific recovery guidance, and preserving explicit wait knobs only where they map to user intent. + +Agents and CI will have a better branching surface, but any downstream scripts that currently rely on a process hanging until an external supervisor kills it will observe earlier failures. That is an intentional behavior change. + +## Out of Scope + +**OS1** Adding a global `--timeout` flag across all commands. + +**OS2** Changing the command grammar or adding new command groups. + +**OS3** Changing platform API server-side timeout behavior. + +**OS4** Retrying operations automatically beyond already documented retry behavior. Timeouts and retries are related, but this spec only requires bounded waiting and clear failure. + +**OS5** Changing existing domain verification semantics beyond ensuring they compose with lower-level I/O timeouts. + +**OS6** Replacing external CI/job-level timeouts. Shell supervisors remain useful as a last-resort guardrail, but the CLI should not depend on them for normal stalled I/O. + +## Open Questions + +None.