
Persist workspace volume cleanup state to stop repeated GC removal attempts #1351

@rowan-stein

Description


Summary

The GC logs show repeated attempts to remove workspace volumes for closed threads on every sweep, and again after each server restart. Example:

[Nest] ... LOG [ContainerService] Removing volume name=ha_ws_<id> force=true
[Nest] ... DEBUG [ContainerService] Volume already removed name=ha_ws_<id>

This indicates the system does not persist the fact that a thread’s workspace volume was already removed (or is absent). The volume GC reprocesses the same closed threads indefinitely because candidate selection includes every thread with status='closed' and the cooldown lives only in memory.

User report

  • A growing number of log entries related to volume removal.
  • After a server restart the same logs appear again, implying the DB retains outdated state.

Research specification (Emerson Gray)

Root cause

  • VolumeGcService selects candidates with where: { status: 'closed' } and never persists a per-thread cleanup state.
  • Cooldown is in-memory; restarts reset it and cause retries on the same closed threads.
  • Docker removal is idempotent (404 swallowed) but the DB doesn’t reflect “already cleaned”.
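The interaction of the last two bullets can be sketched as follows. The `cooldown` map, `COOLDOWN_MS`, and `isThrottled` are hypothetical names for illustration, not the actual internals of VolumeGcService:

```typescript
// Hypothetical sketch of the current (buggy) behavior described above.
// The in-memory cooldown map is lost on restart, so every closed thread
// becomes a fresh candidate on the first sweep after a redeploy.
const cooldown = new Map<string, number>(); // threadId -> last attempt (ms epoch)
const COOLDOWN_MS = 60 * 60 * 1000; // illustrative value only

function isThrottled(threadId: string, nowMs: number): boolean {
  const last = cooldown.get(threadId);
  return last !== undefined && nowMs - last < COOLDOWN_MS;
}

// Candidate selection has no "already cleaned" filter, e.g.:
//   prisma.thread.findMany({ where: { status: 'closed' } })
// so the same closed threads are returned on every sweep.
```

Because the map starts empty after a restart, `isThrottled` returns false for every thread and the GC retries all of them, producing the log pattern shown above.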

Fix plan

  1. Schema: add a nullable timestamp field on Thread:

    • workspaceVolumeRemovedAt DateTime?
    • Migration: add column; backward compatible; optional index (status, workspaceVolumeRemovedAt).
  2. VolumeGcService:

    • Change candidate selection to where: { status: 'closed', workspaceVolumeRemovedAt: null }.
    • When no containers reference the volume:
      • On successful removal OR 404/not_found, set workspaceVolumeRemovedAt = now.
      • On other errors, do not mark; allow retry.
    • Keep in-memory cooldown as secondary throttle.
    • Use idempotent guarded update: updateMany({ where: { id, workspaceVolumeRemovedAt: null }, data: { workspaceVolumeRemovedAt: now } }).
  3. ThreadCleanupCoordinator + Provider:

    • deleteWorkspaceVolume(threadId): after removal or confirmed absence, mark workspaceVolumeRemovedAt.
    • Adjust provider contract to return a small outcome enum (recommended):
      • { outcome: 'removed' | 'not_found' | 'referenced' }.
    • container.service.ts removeVolume: return 'removed' | 'not_found'; throw on other errors; keep swallowing 404.
  4. Concurrency & idempotency:

    • Removal remains idempotent at Docker.
    • DB marking is race-safe via conditional update; GC and coordinator won’t conflict.
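The steps above can be sketched as one piece. `ThreadStore` below is a tiny in-memory stand-in for the Prisma queries (names like `workspaceVolumeRemovedAt` come from the plan; `handleSweepResult` is a hypothetical helper), so the filter and the race-safe guarded update can be demonstrated without a database:

```typescript
// Sketch of the proposed GC marking logic under the assumptions above.
type RemovalOutcome = 'removed' | 'not_found' | 'referenced';

interface ThreadRow {
  id: string;
  status: 'open' | 'closed';
  workspaceVolumeRemovedAt: Date | null;
}

class ThreadStore {
  constructor(private rows: ThreadRow[]) {}

  // Mirrors: findMany({ where: { status: 'closed', workspaceVolumeRemovedAt: null } })
  candidates(): ThreadRow[] {
    return this.rows.filter(
      (r) => r.status === 'closed' && r.workspaceVolumeRemovedAt === null,
    );
  }

  // Mirrors the guarded updateMany: only a row still NULL is touched, so
  // concurrent markers (GC vs. coordinator) cannot conflict; returns the
  // affected-row count like updateMany does.
  markCleaned(id: string, now: Date): number {
    const row = this.rows.find(
      (r) => r.id === id && r.workspaceVolumeRemovedAt === null,
    );
    if (!row) return 0;
    row.workspaceVolumeRemovedAt = now;
    return 1;
  }
}

function handleSweepResult(
  store: ThreadStore,
  id: string,
  outcome: RemovalOutcome,
  now: Date,
): void {
  // 'removed' and 'not_found' both mean the volume is gone: persist that.
  // 'referenced' (or a thrown error upstream) leaves the row for retry.
  if (outcome === 'removed' || outcome === 'not_found') {
    store.markCleaned(id, now);
  }
}
```

After one sweep, threads whose volumes were removed or already absent drop out of `candidates()` permanently, which is exactly the persistence the in-memory cooldown could not provide.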

Tests

  • VolumeGcService:
    • Filters candidates to closed threads with workspaceVolumeRemovedAt=NULL.
    • Marks the timestamp on removed and not_found; does not mark when referenced; does not mark on non-404 errors; does not mark when the container runner is down.
  • ThreadCleanupCoordinator:
    • Marks on success and not_found; skips marking on referenced; does not mark on error.
  • Optional: provider/container.service mapping of 404 to not_found.
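The optional 404-mapping test could look roughly like this. `mapRemoveError` and the `statusCode` error shape are assumptions for illustration, since the issue does not show the Docker client's actual error type:

```typescript
// Hypothetical helper mirroring the proposed container.service.ts contract:
// 404 ("volume already removed") maps to 'not_found'; anything else rethrows,
// so callers never mark workspaceVolumeRemovedAt on a real failure.
type RemoveResult = 'removed' | 'not_found';

function mapRemoveError(err: { statusCode?: number }): RemoveResult {
  if (err.statusCode === 404) return 'not_found';
  throw err;
}
```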

Acceptance criteria

  • After first sweep post-deploy, threads whose volumes are removed/absent are marked and no longer retried.
  • Server restart does not re-trigger volume removal attempts for already cleaned threads.
  • Existing behavior for open threads and referenced volumes is preserved.
