Merged
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/). Versions use [Se

## [Unreleased]

-Groq built-in tools (issue #69), landing across stacked PRs. Additive only.
+Groq built-in tools (issue #69). Additive only.

### Added
- **`LLMRequest.builtInTools`** — optional `Array<{ type: BuiltInToolType }>` requesting server-side tools (web search, code interpreter, etc.). Normalized identifiers: `web_search`, `visit_website`, `browser_automation`, `code_interpreter`, `wolfram_alpha`. Ignored by providers/models that don't advertise `supportsBuiltInTools`.
@@ -20,6 +20,7 @@ Groq built-in tools (issue #69), landing across stacked PRs. Additive only.

### Notes
- Built-in tool surcharges are billed by the provider and are **not** attributed per-call in `TokenUsage`; use `CreditLedger` for accounting.
- Streaming with `builtInTools` runs the search server-side but emits content deltas only — structured `metadata.builtInToolResults` / `metadata.reasoning` are available on non-streaming `generateResponse` only.
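
The normalized identifiers listed under Added can be checked before a request is built. The following is a minimal sketch, not the library's implementation: `BUILT_IN_TOOL_TYPES`, `isBuiltInToolType`, and `toBuiltInTools` are hypothetical names, and `BuiltInToolType` is redeclared locally to mirror the changelog entry rather than imported.

```typescript
// Normalized identifiers as listed in the changelog entry above.
// Local mirror for illustration only; the real type is exported by the library.
const BUILT_IN_TOOL_TYPES = [
  "web_search",
  "visit_website",
  "browser_automation",
  "code_interpreter",
  "wolfram_alpha",
] as const;

type BuiltInToolType = (typeof BUILT_IN_TOOL_TYPES)[number];

// Hypothetical type guard: true only for one of the five normalized identifiers.
function isBuiltInToolType(value: string): value is BuiltInToolType {
  return (BUILT_IN_TOOL_TYPES as readonly string[]).includes(value);
}

// Hypothetical helper: builds a value shaped like `LLMRequest.builtInTools`,
// silently dropping any name that is not a normalized identifier.
function toBuiltInTools(names: string[]): Array<{ type: BuiltInToolType }> {
  return names.filter(isBuiltInToolType).map((type) => ({ type }));
}

const builtInTools = toBuiltInTools(["web_search", "code_interpreter", "bogus"]);
console.log(builtInTools.map((t) => t.type).join(","));
```

Dropping unknown names is one possible policy; throwing on them may suit callers that prefer to fail fast.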

## [1.9.0] — 2026-05-22

11 changes: 8 additions & 3 deletions README.md
@@ -178,6 +178,8 @@ const llm = new LLMProviders({
});
```

> **Server-side built-in tool surcharges are not token-tracked.** Costs in `TokenUsage` and the ledger cover token spend only. Groq's built-in tools bill separately per use (web search ~$5/1k requests, code interpreter ~$0.18/hr, browser automation ~$0.08/hr) and are **not** attributed per call in v1. If you run `builtInTools` workloads, account for these surcharges out of band (e.g. a manual ledger adjustment or a separate budget line).
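
One way to size the out-of-band budget line is to multiply your own usage counters by the approximate rates quoted above. This is only a sketch under stated assumptions: `BuiltInToolUsage` and `estimateBuiltInToolSurchargeUsd` are hypothetical names, the counters are ones you would have to collect yourself, and the rates are the rough figures from the note, not authoritative pricing.

```typescript
// Approximate rates quoted in the note above. Treat them as placeholders and
// confirm against the provider's current pricing before relying on them.
const APPROX_RATES_USD = {
  webSearchPerRequest: 5 / 1000, // ~$5 per 1k requests
  codeInterpreterPerHour: 0.18, // ~$0.18/hr
  browserAutomationPerHour: 0.08, // ~$0.08/hr
} as const;

// Hypothetical shape: usage you track yourself, outside TokenUsage.
interface BuiltInToolUsage {
  webSearchRequests: number;
  codeInterpreterHours: number;
  browserAutomationHours: number;
}

// Returns the estimated surcharge in USD for one accounting period.
function estimateBuiltInToolSurchargeUsd(usage: BuiltInToolUsage): number {
  return (
    usage.webSearchRequests * APPROX_RATES_USD.webSearchPerRequest +
    usage.codeInterpreterHours * APPROX_RATES_USD.codeInterpreterPerHour +
    usage.browserAutomationHours * APPROX_RATES_USD.browserAutomationPerHour
  );
}

// Example period: 2,000 searches, 10 interpreter hours, 5 browser hours.
const estimate = estimateBuiltInToolSurchargeUsd({
  webSearchRequests: 2000,
  codeInterpreterHours: 10,
  browserAutomationHours: 5,
});
console.log(estimate.toFixed(2)); // prints "12.20"
```

The result is an estimate to record alongside token costs, not a value the library reports.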

## Model Catalog & Runtime Selection

Model selection is driven by a declarative catalog rather than a hardcoded fallback array. The selector intersects:
@@ -428,6 +430,7 @@ Notes:
- **Cost.** Built-in tool surcharges (e.g. web search ~$5/1k requests) are billed by the provider and are **not** attributed per-call in `TokenUsage`; track them via `CreditLedger` if needed.
- **Citations.** Structured search results surface on `LLMResponse.metadata.builtInToolResults` — `Array<{ type, name?, arguments?, results: [{ title, url, content, score }] }>`. Only executions that ran a web search appear (e.g. `code_interpreter` runs, which carry no citations, are omitted); the field is absent when no search ran. Citation sub-fields are passed through as the provider returns them — treat them as best-effort and validate URLs before use.
- **Reasoning.** When the model exposes its internal reasoning (the queries it searched), it surfaces on `LLMResponse.metadata.reasoning` as a string. Absent when the model doesn't emit it.
- **Streaming.** `builtInTools` is accepted on streaming requests and the search still runs server-side, but the streaming path emits content deltas only — structured `metadata.builtInToolResults` and `metadata.reasoning` are **not** surfaced while streaming. Use non-streaming `generateResponse` when you need the structured citations.
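
The Citations note above recommends validating URLs before use. A minimal sketch of one way to do that follows. The interfaces are declared locally from the shape given in that note (the type of `arguments` is not stated there, so `unknown` is an assumption), and `validCitations` is a hypothetical helper, not part of the library.

```typescript
// Shapes copied from the Citations note above; declared locally so the sketch
// is self-contained rather than imported from the library.
interface Citation {
  title: string;
  url: string;
  content: string;
  score: number;
}

interface BuiltInToolResult {
  type: string;
  name?: string;
  arguments?: unknown; // assumption: the note does not state this field's type
  results: Citation[];
}

// Hypothetical helper: flattens all tool results and keeps only citations
// whose URL parses and uses http or https.
function validCitations(toolResults: BuiltInToolResult[] | undefined): Citation[] {
  return (toolResults ?? [])
    .flatMap((r) => r.results)
    .filter((c) => {
      try {
        const { protocol } = new URL(c.url);
        return protocol === "http:" || protocol === "https:";
      } catch {
        return false; // new URL() throws on unparseable input
      }
    });
}

// Sample data standing in for `res.metadata?.builtInToolResults`.
const sample: BuiltInToolResult[] = [
  {
    type: "web_search",
    results: [
      { title: "ok", url: "https://example.com/a", content: "", score: 0.9 },
      { title: "script", url: "javascript:alert(1)", content: "", score: 0.5 },
      { title: "garbage", url: "not a url", content: "", score: 0.1 },
    ],
  },
];

console.log(validCitations(sample).map((c) => c.url));
```

Passing `undefined` returns an empty array, which matches the case where the field is absent because no search ran or the request was streamed.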

```typescript
const citations = res.metadata?.builtInToolResults?.[0]?.results ?? [];
@@ -509,7 +512,7 @@ fs.writeFileSync('fixtures/openai.json', JSON.stringify(shape, null, 2));
| `AnthropicProvider` | Anthropic Claude models (streaming, tools) |
| `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS/Gemma 4/Llama 4, batch) |
| `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) |
-| `GroqProvider` | Groq fast inference (streaming, tools on GPT-OSS/LLaMA 3.3 70B) |
+| `GroqProvider` | Groq fast inference (streaming, tools on GPT-OSS/LLaMA 3.3 70B; server-side built-in tools on Compound systems and GPT-OSS) |
| `NvidiaProvider` | NVIDIA NIM inference (streaming, tools on Llama/Nemotron/Mistral) |
| `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation |

@@ -542,8 +545,10 @@ fs.writeFileSync('fixtures/openai.json', JSON.stringify(shape, null, 2));

| Type | Description |
|------|-------------|
-| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format, cache, lora |
-| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls |
+| `LLMRequest` | Unified request: messages, model, temperature, tools, builtInTools, response_format, cache, lora |
+| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls, metadata (builtInToolResults, reasoning) |
+| `BuiltInTool` / `BuiltInToolType` | Server-side tool request: `{ type }` where type is `web_search` \| `visit_website` \| `browser_automation` \| `code_interpreter` \| `wolfram_alpha` |
+| `BuiltInToolResult` | A surfaced built-in execution: `{ type, name?, arguments?, results: [{ title, url, content, score }] }` on `metadata.builtInToolResults` |
| `TokenUsage` | Token counts, cost, and cached token fields (cachedInputTokens, cacheReadInputTokens, cacheCreationInputTokens) |
| `CacheHints` | Cache strategy, key, ttl, sessionId, cacheablePrefix for provider-agnostic prompt caching |
| `ToolExecutor` | Interface for `generateResponseWithTools`: `execute(name, args) => Promise<unknown>` |