From cfd4aa954e79cb2f812b244436e9dd436f78540c Mon Sep 17 00:00:00 2001 From: Stackbilt Date: Thu, 28 May 2026 20:38:56 -0500 Subject: [PATCH] =?UTF-8?q?docs(groq):=20S6=20polish=20=E2=80=94=20streami?= =?UTF-8?q?ng=20caveat,=20surcharge=20note,=20API=20reference=20(#69)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Final docs sweep for the Groq built-in-tools feature. All issue #69 acceptance criteria are satisfied across S0–S5; S6 is documentation completeness only (no code change): - Streaming caveat — builtInTools is accepted while streaming and the search runs server-side, but the streaming path emits content deltas only; structured metadata.builtInToolResults / reasoning are non-streaming only. Documented (not gated) since rejecting would block legitimate streamed answers that used search internally. README + CHANGELOG. - CreditLedger surcharge note — Cost Tracking section now states built-in tool surcharges (web search, code interpreter, browser automation) are billed per-use and not token-tracked in v1 (the issue's explicit S6 ask). - API Reference — added BuiltInTool / BuiltInToolType / BuiltInToolResult to Key Types; updated LLMRequest (builtInTools), LLMResponse (metadata) and the GroqProvider class row to reflect the now-shipped surface. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 3 ++- README.md | 11 ++++++++--- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f376c2a..a98ba75 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,7 +5,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/). Versions use [Se ## [Unreleased] -Groq built-in tools (issue #69), landing across stacked PRs. Additive only. +Groq built-in tools (issue #69). Additive only. ### Added - **`LLMRequest.builtInTools`** — optional `Array<{ type: BuiltInToolType }>` requesting server-side tools (web search, code interpreter, etc.). Normalized identifiers: `web_search`, `visit_website`, `browser_automation`, `code_interpreter`, `wolfram_alpha`. Ignored by providers/models that don't advertise `supportsBuiltInTools`. @@ -20,6 +20,7 @@ Groq built-in tools (issue #69), landing across stacked PRs. Additive only. ### Notes - Built-in tool surcharges are billed by the provider and are **not** attributed per-call in `TokenUsage`; use `CreditLedger` for accounting. +- Streaming with `builtInTools` runs the search server-side but emits content deltas only — structured `metadata.builtInToolResults` / `metadata.reasoning` are available on non-streaming `generateResponse` only. ## [1.9.0] — 2026-05-22 diff --git a/README.md b/README.md index 3343f12..e1772e7 100644 --- a/README.md +++ b/README.md @@ -178,6 +178,8 @@ const llm = new LLMProviders({ }); ``` +> **Server-side built-in tool surcharges are not token-tracked.** Costs in `TokenUsage` and the ledger cover token spend only. Groq's built-in tools bill separately per use (web search ~$5/1k requests, code interpreter ~$0.18/hr, browser automation ~$0.08/hr) and are **not** attributed per call in v1. If you run `builtInTools` workloads, account for these surcharges out of band (e.g. a manual ledger adjustment or a separate budget line). + ## Model Catalog & Runtime Selection Model selection is driven by a declarative catalog rather than a hardcoded fallback array. The selector intersects: @@ -428,6 +430,7 @@ Notes: - **Cost.** Built-in tool surcharges (e.g. web search ~$5/1k requests) are billed by the provider and are **not** attributed per-call in `TokenUsage`; track them via `CreditLedger` if needed. - **Citations.** Structured search results surface on `LLMResponse.metadata.builtInToolResults` — `Array<{ type, name?, arguments?, results: [{ title, url, content, score }] }>`. Only executions that ran a web search appear (e.g. `code_interpreter` runs, which carry no citations, are omitted); the field is absent when no search ran. Citation sub-fields are passed through as the provider returns them — treat them as best-effort and validate URLs before use. - **Reasoning.** When the model exposes its internal reasoning (the queries it searched), it surfaces on `LLMResponse.metadata.reasoning` as a string. Absent when the model doesn't emit it. +- **Streaming.** `builtInTools` is accepted on streaming requests and the search still runs server-side, but the streaming path emits content deltas only — structured `metadata.builtInToolResults` and `metadata.reasoning` are **not** surfaced while streaming. Use non-streaming `generateResponse` when you need the structured citations. ```typescript const citations = res.metadata?.builtInToolResults?.[0]?.results ?? []; @@ -509,7 +512,7 @@ fs.writeFileSync('fixtures/openai.json', JSON.stringify(shape, null, 2)); | `AnthropicProvider` | Anthropic Claude models (streaming, tools) | | `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS/Gemma 4/Llama 4, batch) | | `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) | -| `GroqProvider` | Groq fast inference (streaming, tools on GPT-OSS/LLaMA 3.3 70B) | +| `GroqProvider` | Groq fast inference (streaming, tools on GPT-OSS/LLaMA 3.3 70B; server-side built-in tools on Compound systems and GPT-OSS) | | `NvidiaProvider` | NVIDIA NIM inference (streaming, tools on Llama/Nemotron/Mistral) | | `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation | @@ -542,8 +545,10 @@ fs.writeFileSync('fixtures/openai.json', JSON.stringify(shape, null, 2)); | Type | Description | |------|-------------| -| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format, cache, lora | -| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls | +| `LLMRequest` | Unified request: messages, model, temperature, tools, builtInTools, response_format, cache, lora | +| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls, metadata (builtInToolResults, reasoning) | +| `BuiltInTool` / `BuiltInToolType` | Server-side tool request: `{ type }` where type is `web_search` \| `visit_website` \| `browser_automation` \| `code_interpreter` \| `wolfram_alpha` | +| `BuiltInToolResult` | A surfaced built-in execution: `{ type, name?, arguments?, results: [{ title, url, content, score }] }` on `metadata.builtInToolResults` | | `TokenUsage` | Token counts, cost, and cached token fields (cachedInputTokens, cacheReadInputTokens, cacheCreationInputTokens) | | `CacheHints` | Cache strategy, key, ttl, sessionId, cacheablePrefix for provider-agnostic prompt caching | | `ToolExecutor` | Interface for `generateResponseWithTools`: `execute(name, args) => Promise` |