DRIFT.md: 12 additions & 12 deletions
@@ -1,19 +1,19 @@
1
1
# Live API Drift Detection
2
2
3
-
llmock produces responses shaped like real LLM APIs. Providers change their APIs over time. **Drift** means the mock no longer matches reality — your tests pass against llmock but break against the real API.
3
+
aimock produces responses shaped like real LLM APIs. Providers change their APIs over time. **Drift** means the mock no longer matches reality — your tests pass against aimock but break against the real API.
4
4
5
5
## Three-Layer Approach
6
6
7
7
Drift detection compares three independent sources to triangulate the cause of any mismatch:
8
8
9
-
| SDK types = Real API? | Real API = llmock? | Diagnosis |
9
+
| SDK types = Real API? | Real API = aimock? | Diagnosis |
| Yes | No | **llmock drift** — response builders need updating |
11
+
| Yes | No | **aimock drift** — response builders need updating |
12
12
| No | No | **Provider changed before SDK update** — flag, wait for SDK catch-up |
13
13
| Yes | Yes | **No drift** — all clear |
14
14
| No | Yes | **SDK drift** — provider deprecated something SDK still references |
15
15
16
-
Two-way comparison (mock vs real) can't distinguish between "we need to fix llmock" and "the SDK hasn't caught up yet." Three-way comparison can.
16
+
Two-way comparison (mock vs real) can't distinguish between "we need to fix aimock" and "the SDK hasn't caught up yet." Three-way comparison can.
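
The table above reduces to a small decision function over the two pairwise comparisons. The following is a minimal sketch of that mapping; the names `Diagnosis` and `diagnose` are hypothetical and are not taken from the aimock codebase.

```typescript
// Hypothetical names for illustration; not aimock's actual API.
type Diagnosis =
  | "no drift"
  | "aimock drift"
  | "provider changed before SDK update"
  | "SDK drift";

// Map the two pairwise comparisons onto the four rows of the table.
function diagnose(sdkMatchesReal: boolean, realMatchesMock: boolean): Diagnosis {
  if (sdkMatchesReal && realMatchesMock) return "no drift";
  if (sdkMatchesReal && !realMatchesMock) return "aimock drift";
  if (!sdkMatchesReal && !realMatchesMock) return "provider changed before SDK update";
  return "SDK drift"; // SDK disagrees with the real API, but the mock still matches it
}
```

A two-way check only has `realMatchesMock`, so it cannot tell the "aimock drift" row apart from the "provider changed before SDK update" row; the second input is what separates them.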
17
17
18
18
## Running Drift Tests
19
19
@@ -40,9 +40,9 @@ Each provider's tests skip independently if its key is not set. You can run drif
40
40
41
41
### Severity levels
42
42
43
-
- **critical** — Test fails. llmock produces a different shape than the real API for a field that both the SDK and real API agree on. This means llmock needs an update.
44
-
- **warning** — Test passes (unless `STRICT_DRIFT=1`). The real API has a field that neither the SDK nor llmock knows about, or the SDK and real API disagree. Usually means a provider added something new.
45
-
- **info** — Always passes. Known intentional differences (usage fields are always zero, optional fields llmock omits, etc.).
43
+
- **critical** — Test fails. aimock produces a different shape than the real API for a field that both the SDK and real API agree on. This means aimock needs an update.
44
+
- **warning** — Test passes (unless `STRICT_DRIFT=1`). The real API has a field that neither the SDK nor aimock knows about, or the SDK and real API disagree. Usually means a provider added something new.
45
+
- **info** — Always passes. Known intentional differences (usage fields are always zero, optional fields aimock omits, etc.).
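
The severity rules above can be read as a classification step followed by a pass/fail decision. Here is a rough sketch under stated assumptions: the `FieldComparison` shape, the `knownDifference` allowlist flag, and the order in which the rules are checked are all illustrative guesses, not the actual drift test implementation.

```typescript
type Severity = "critical" | "warning" | "info";

// Hypothetical per-field comparison result; field names are assumptions.
interface FieldComparison {
  sdkMatchesReal: boolean;  // SDK types and real API agree on this field
  mockMatchesReal: boolean; // aimock and real API agree on this field
  knownDifference: boolean; // listed as an intentional difference
}

// Returns null when all three sources agree and there is nothing to report.
function classify(c: FieldComparison): Severity | null {
  if (c.sdkMatchesReal && c.mockMatchesReal) return null;
  if (c.knownDifference) return "info";    // intentional, always passes
  if (c.sdkMatchesReal) return "critical"; // only the mock is out of step
  return "warning";                        // SDK and real API disagree
}

// critical always fails; warning fails only in strict mode; info never fails.
function shouldFail(severity: Severity, strict: boolean): boolean {
  if (severity === "critical") return true;
  if (severity === "warning") return strict;
  return false;
}
```

In this sketch the caller would derive `strict` from the `STRICT_DRIFT=1` environment variable mentioned above.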
46
46
47
47
### Example report output
48
48
@@ -86,7 +86,7 @@ When a `critical` drift is detected:
86
86
87
87
## Model Deprecation
88
88
89
-
The `models.drift.ts` test scrapes model names referenced in llmock's test files, README, and fixtures, then checks each provider's model listing API to verify they still exist.
89
+
The `models.drift.ts` test scrapes model names referenced in aimock's test files, README, and fixtures, then checks each provider's model listing API to verify they still exist.
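
At its core, the check described above is a set difference between the model names the repository references and the names the provider currently lists. A minimal sketch of that comparison follows; `findMissingModels` is a hypothetical helper, and the scraping and API calls that would produce its inputs are omitted.

```typescript
// Return referenced model names that are absent from the provider's listing.
// Duplicates in `referenced` are collapsed; the result is sorted for stable reports.
function findMissingModels(referenced: string[], listed: string[]): string[] {
  const available = new Set(listed);
  const unique = Array.from(new Set(referenced));
  return unique.filter((name) => !available.has(name)).sort();
}
```

Any name returned here would be a candidate for the deprecation handling described in this section.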
90
90
91
91
When a model is deprecated:
92
92
@@ -106,7 +106,7 @@ When a model is deprecated:
106
106
107
107
## WebSocket Drift Coverage
108
108
109
-
In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols (4 verified + 2 canary = 6 WS tests):
109
+
In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):
110
110
111
111
| Protocol | Text | Tool Call | Real Endpoint | Status |
@@ -118,13 +118,13 @@ In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model dep
118
118
119
119
**Auth**: Uses the same `OPENAI_API_KEY` and `GOOGLE_API_KEY` environment variables as HTTP tests. No new secrets needed.
120
120
121
-
**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local llmock server.
121
+
**How it works**: A TLS WebSocket client (`ws-providers.ts`) connects to real provider endpoints using `node:tls` with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing `ws-test-client.ts` plaintext client against the local aimock server.
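
For readers unfamiliar with RFC 6455 framing, the sketch below encodes a single masked client-to-server text frame, which is the kind of frame a hand-rolled client has to write to the TLS socket. It is an illustration of the wire format only, not the code in `ws-providers.ts`; the function name is hypothetical, and it handles payloads up to 65535 bytes and takes the 4-byte mask key as a parameter (a real client would generate it randomly per frame).

```typescript
// Encode one masked text frame (FIN=1, opcode=0x1) as described in RFC 6455, section 5.2.
function encodeClientTextFrame(payload: Uint8Array, maskKey: Uint8Array): Uint8Array {
  if (maskKey.length !== 4) throw new Error("mask key must be 4 bytes");
  if (payload.length > 0xffff) throw new Error("sketch handles payloads up to 65535 bytes");

  const extended = payload.length > 125; // lengths above 125 use a 16-bit extended length
  const headerLength = 2 + (extended ? 2 : 0) + 4;
  const frame = new Uint8Array(headerLength + payload.length);

  frame[0] = 0x80 | 0x1; // FIN bit + text opcode
  let offset = 2;
  if (extended) {
    frame[1] = 0x80 | 126; // MASK bit + marker for 16-bit length
    frame[2] = payload.length >> 8;
    frame[3] = payload.length & 0xff;
    offset = 4;
  } else {
    frame[1] = 0x80 | payload.length; // MASK bit + 7-bit length
  }

  frame.set(maskKey, offset);
  offset += 4;

  // Client frames must be masked: XOR each payload byte with the rotating key.
  for (let i = 0; i < payload.length; i++) {
    frame[offset + i] = payload[i] ^ maskKey[i % 4];
  }
  return frame;
}
```

Server-to-client frames are not masked, so the receiving side of a client only needs to parse the header and length.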
122
122
123
123
### Gemini Live: unverified
124
124
125
-
llmock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live) — `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.
125
+
aimock's Gemini Live handler implements the text-based `BidiGenerateContent` protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live) — `setup`/`setupComplete` handshake, `clientContent` with turns, `serverContent` with `modelTurn.parts[].text`, and `toolCall` responses. The protocol format is correct per the docs.
126
126
127
-
However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate llmock's output against a real API response.
127
+
However, as of March 2026, the only models that support `bidiGenerateContent` are native-audio models (`gemini-2.5-flash-native-audio-*`), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate aimock's output against a real API response.
128
128
129
129
A canary test (`ws-gemini-live.drift.ts`) queries the Gemini model listing API on each drift run and checks for a non-audio model that supports `bidiGenerateContent`. When Google ships one, the canary will flag it and the full drift tests can be enabled.
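
The canary's filtering step could look roughly like the sketch below. It assumes the model listing response exposes a `supportedGenerationMethods` array per model, and that audio-only models can be recognized by `native-audio` in their name, following the naming pattern mentioned above. Both are assumptions for illustration, and `findTextCapableLiveModels` is a hypothetical name rather than the function in `ws-gemini-live.drift.ts`.

```typescript
// Assumed subset of a model listing entry; only the fields this sketch needs.
interface ListedModel {
  name: string;
  supportedGenerationMethods?: string[];
}

// Return names of models that support bidiGenerateContent and are not native-audio models.
function findTextCapableLiveModels(models: ListedModel[]): string[] {
  return models
    .filter((m) => (m.supportedGenerationMethods ?? []).indexOf("bidiGenerateContent") !== -1)
    .filter((m) => m.name.indexOf("native-audio") === -1) // assumption: name-based audio check
    .map((m) => m.name);
}
```

An empty result corresponds to the current situation, where the full Gemini Live drift tests stay disabled; a non-empty result is what the canary would flag.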