15 changes: 10 additions & 5 deletions docs/external-dependencies.md
@@ -19,7 +19,7 @@
| 11 | BigQuery Metrics | `api.anthropic.com/api/claude_code/metrics` | HTTPS | Enabled by default |
| 12 | MCP Proxy | `mcp-proxy.anthropic.com` | HTTPS+WS | When using MCP tools |
| 13 | MCP Registry | `api.anthropic.com/mcp-registry` | HTTPS | When querying MCP servers |
| 14 | Bing Search | `www.bing.com` | HTTPS | WebSearch tool |
| 14 | Web Search Pages | `www.bing.com`, `search.brave.com` | HTTPS | WebSearch tool; switch backends with `WEB_SEARCH_ADAPTER=bing|brave` |
| 15 | Google Cloud Storage (updates) | `storage.googleapis.com` | HTTPS | Version check |
| 16 | GitHub Raw (Changelog/Stats) | `raw.githubusercontent.com` | HTTPS | Update notices |
| 17 | Claude in Chrome Bridge | `bridge.claudeusercontent.com` | WSS | Chrome integration |
@@ -121,12 +121,16 @@ MCP server proxy hosted by Anthropic.
- **Endpoint**: `https://api.anthropic.com/mcp-registry/v0/servers?version=latest&visibility=commercial`
- **File**: `src/services/mcp/officialRegistry.ts`

### 14. Bing Search
### 14. Web Search Pages

The default adapter for the WebSearch tool; scrapes Bing search results.
The WebSearch tool can scrape Bing search result pages directly, or obtain search context
through Brave's LLM Context API; set `WEB_SEARCH_ADAPTER=bing|brave` to switch backends explicitly.

- **Endpoint**: `https://www.bing.com/search?q={query}&setmkt=en-US`
- **File**: `src/tools/WebSearchTool/adapters/bingAdapter.ts`
- **Bing endpoint**: `https://www.bing.com/search?q={query}&setmkt=en-US`
- **Brave endpoint**: `https://api.search.brave.com/res/v1/llm/context?q={query}`
- **Files**:
  - `src/tools/WebSearchTool/adapters/bingAdapter.ts`
  - `src/tools/WebSearchTool/adapters/braveAdapter.ts`
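
The two endpoint templates above can be turned into request URLs roughly as follows. This is a minimal sketch, not the code in `bingAdapter.ts` or `braveAdapter.ts`: the helper name `buildSearchUrl` and the `SearchBackend` type are hypothetical, and only the URL templates come from the documentation.

```typescript
// Hypothetical helper (not part of the PR): fills in the documented
// endpoint templates for each backend.
type SearchBackend = 'bing' | 'brave'

function buildSearchUrl(backend: SearchBackend, query: string): string {
  // encodeURIComponent escapes spaces, '&', '=' and other reserved characters
  // so the query cannot break out of the `q` parameter.
  const q = encodeURIComponent(query)
  switch (backend) {
    case 'bing':
      return `https://www.bing.com/search?q=${q}&setmkt=en-US`
    case 'brave':
      return `https://api.search.brave.com/res/v1/llm/context?q=${q}`
  }
}

// Example: a query containing a space and an ampersand.
const exampleUrl: string = buildSearchUrl('bing', 'bun & node')
```

Note that the Brave template uses the `api.search.brave.com` host, while the dependency tables list `search.brave.com`; the sketch follows the endpoint line as written.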

There is also a Domain Blocklist lookup:
- **Endpoint**: `https://api.anthropic.com/api/web/domain_info?domain={domain}`
@@ -201,6 +205,7 @@ The default adapter for the WebSearch tool; scrapes Bing search results.
| `{region}-aiplatform.googleapis.com` | Google Vertex AI | HTTPS |
| `{resource}.services.ai.azure.com` | Azure Foundry | HTTPS |
| `www.bing.com` | Bing Search | HTTPS |
| `search.brave.com` | Brave Search | HTTPS |
| `storage.googleapis.com` | Auto-update | HTTPS |
| `raw.githubusercontent.com` | Changelog / plugin stats | HTTPS |
| `bridge.claudeusercontent.com` | Chrome Bridge | WSS |
44 changes: 23 additions & 21 deletions docs/features/web-search-tool.md
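@@ -1,11 +1,11 @@
# WEB_SEARCH_TOOL: Web Search Tool

> Implementation status: adapter architecture complete; the Bing adapter is the current default backend
> Implementation status: adapter architecture complete; supports three backends: API / Bing / Brave
> References: core tool, not gated by a feature flag (always enabled)

## 1. Overview

WebSearchTool lets the model search the internet for up-to-date information. The original implementation supported only Anthropic API server-side search (the `web_search_20250305` server tool), which is unavailable behind third-party proxy endpoints. It has been refactored into an adapter architecture, adding Bing search page parsing as a fallback so that search works with any API endpoint.
WebSearchTool lets the model search the internet for up-to-date information. The original implementation supported only Anthropic API server-side search (the `web_search_20250305` server tool), which is unavailable behind third-party proxy endpoints. It has been refactored into an adapter architecture that supports API server-side search plus two HTML parsing backends, Bing and Brave, so that search works with any API endpoint.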

⚠️ Potential issue | 🟡 Minor

Correct backend description for Brave.

Line 8 describes Brave as an “HTML parsing backend,” but BraveSearchAdapter calls Brave’s LLM Context API and maps JSON grounding data. Please adjust this wording to avoid conflating it with Bing’s HTML parsing path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/features/web-search-tool.md` at line 8, Update the docs text for
WebSearchTool to correctly describe Brave: change the phrase calling Brave an
“HTML parsing backend” to state that Brave (handled by BraveSearchAdapter) uses
Brave’s LLM Context API and returns JSON grounding data that is mapped into the
tool, whereas Bing uses the HTML-parsing backend; reference WebSearchTool,
BraveSearchAdapter and the Bing HTML parsing path to make the distinction clear.


## 2. Implementation Architecture

@@ -21,9 +21,13 @@ WebSearchTool.call()
│     └── uses the web_search_20250305 server tool
│         makes a second API call via queryModelWithStreaming
└── BingSearchAdapter: Bing HTML scraping + regex extraction (current default)
      └── fetches the Bing search page HTML directly
          regex-extracts title/URL/snippet from b_algo blocks
├── BingSearchAdapter: Bing HTML scraping + regex extraction
│     └── fetches the Bing search page HTML directly
│         regex-extracts title/URL/snippet from b_algo blocks
└── BraveSearchAdapter: Brave LLM Context API
      └── calls Brave's HTTPS GET endpoint
          maps the grounding payload to title/URL/snippet
```

### 2.2 Module Structure
@@ -37,8 +41,9 @@ WebSearchTool.call()
| Adapter factory | `src/tools/WebSearchTool/adapters/index.ts` | `createAdapter()` factory function; selects the backend |
| API adapter | `src/tools/WebSearchTool/adapters/apiAdapter.ts` | Wraps the original `queryModelWithStreaming` logic; uses the server tool |
| Bing adapter | `src/tools/WebSearchTool/adapters/bingAdapter.ts` | Bing HTML scraping + regex parsing |
| Unit tests | `src/tools/WebSearchTool/__tests__/bingAdapter.test.ts` | 32 test cases |
| Integration tests | `src/tools/WebSearchTool/__tests__/bingAdapter.integration.ts` | Verifies against real network requests |
| Brave adapter | `src/tools/WebSearchTool/adapters/braveAdapter.ts` | Brave LLM Context API adaptation and result mapping |
| Unit tests | `src/tools/WebSearchTool/__tests__/bingAdapter.test.ts`, `src/tools/WebSearchTool/__tests__/braveAdapter*.test.ts`, `src/tools/WebSearchTool/__tests__/adapterFactory.test.ts` | Tests for Bing / Brave parsing and factory logic |
| Integration tests | `src/tools/WebSearchTool/__tests__/bingAdapter.integration.ts`, `src/tools/WebSearchTool/__tests__/braveAdapter.integration.ts` | Verifies against real network requests |

### 2.3 Data Flow

@@ -49,20 +54,18 @@ WebSearchTool.call()
validateInput(): checks that query is non-empty and that allowed/blocked domains are not both set
createAdapter() → BingSearchAdapter (currently hard-coded)
createAdapter() → ApiSearchAdapter | BingSearchAdapter | BraveSearchAdapter
adapter.search(query, { allowedDomains, blockedDomains, signal, onProgress })
├── onProgress({ type: 'query_update', query })
├── axios.get(bing.com/search?q=...&setmkt=en-US)
│ └── 13 Edge browser request headers
├── axios.get(search-engine-url)
│ └── API auth request headers
├── extractBingResults(html): regex-extracts <li class="b_algo"> blocks
│ ├── resolveBingUrl(): decodes the base64 redirect URL
│ ├── extractSnippet(): three-level fallback snippet extraction
│ └── decodeHtmlEntities(): he.decode
├── extractResults(payload): extracts results per backend
│ └── grounding → SearchResult[] mapping
├── client-side domain filtering (allowedDomains / blockedDomains)
@@ -117,19 +120,18 @@ Redirect URL format returned by Bing: `bing.com/ck/a?...&u=a1aHR0cHM6Ly9...`

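## 4. Adapter Selection Logic

Currently `createAdapter()` is hard-coded to return `BingSearchAdapter`; the original logic is kept in comments
`createAdapter()` selects a backend in the following priority order and caches the adapter instance by the selected backend key

```typescript
export function createAdapter(): WebSearchAdapter {
  return new BingSearchAdapter()
  // Selection logic kept in comments:
  // 1. WEB_SEARCH_ADAPTER environment variable forces api|bing
  // 2. isFirstPartyAnthropicBaseUrl() → API adapter
  // 3. Third-party endpoint → Bing adapter
  // 1. WEB_SEARCH_ADAPTER=api|bing|brave set explicitly
  // 2. Official Anthropic API base URL → ApiSearchAdapter
  // 3. Third-party proxy / unofficial endpoint → BingSearchAdapter
}
```

To restore automatic selection, uncomment the logic in `index.ts`.
When `WEB_SEARCH_ADAPTER=brave` is set explicitly, the Brave LLM Context API backend is used instead, and it requires
`BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY`.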
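
The priority order described here can be sketched as a pure function. This is an illustration under stated assumptions, not the contents of `adapters/index.ts`: the names `resolveAdapterKey` and `requireBraveKey` are hypothetical, unrecognized `WEB_SEARCH_ADAPTER` values are assumed to fall through to auto-detection, and `BRAVE_SEARCH_API_KEY` is assumed to take precedence over `BRAVE_API_KEY` (the docs only say that one of the two is required).

```typescript
type AdapterKey = 'api' | 'bing' | 'brave'

const KNOWN_KEYS: readonly AdapterKey[] = ['api', 'bing', 'brave']

// Hypothetical helper: applies the documented priority order.
function resolveAdapterKey(
  env: Record<string, string | undefined>,
  isFirstPartyBaseUrl: boolean,
): AdapterKey {
  // 1. An explicit, recognized WEB_SEARCH_ADAPTER value wins.
  const explicit = env['WEB_SEARCH_ADAPTER']
  const match = KNOWN_KEYS.find(key => key === explicit)
  if (match !== undefined) return match
  // 2. Official Anthropic base URL → API adapter.
  // 3. Anything else (third-party proxy) → Bing adapter.
  return isFirstPartyBaseUrl ? 'api' : 'bing'
}

// Hypothetical helper: the Brave backend needs one of two key variables.
// Assumption: BRAVE_SEARCH_API_KEY is checked first, and a missing key throws.
function requireBraveKey(env: Record<string, string | undefined>): string {
  const key = env['BRAVE_SEARCH_API_KEY'] ?? env['BRAVE_API_KEY']
  if (key === undefined || key === '') {
    throw new Error('Set BRAVE_SEARCH_API_KEY or BRAVE_API_KEY to use the Brave backend')
  }
  return key
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the selection logic testable without mutating global state.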

## 5. Interface Definitions

10 changes: 6 additions & 4 deletions docs/tools/search-and-navigation.mdx
@@ -146,14 +146,15 @@ The AI's information gathering is not limited to local code:

### How WebSearch Is Implemented

WebSearch supports two search backends through the adapter pattern, chosen by the factory function `createAdapter()` in `src/tools/WebSearchTool/adapters/`:
WebSearch supports three search backends through the adapter pattern, chosen by the factory function `createAdapter()` in `src/tools/WebSearchTool/adapters/`:

```
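Adapter architecture:
WebSearchTool.call()
  → createAdapter() selects the backend
    ├─ ApiSearchAdapter: Anthropic API server-side search (requires an official API key)
    └─ BingSearchAdapter: scrapes and parses Bing search pages directly (no API key required)
    ├─ BingSearchAdapter: scrapes and parses Bing search pages directly (no API key required)
    └─ BraveSearchAdapter: calls the Brave LLM Context API and parses the response (requires a Brave API key)
  → adapter.search(query, options)
  → converts to the unified SearchResult[] format and returns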
```
@@ -166,8 +167,9 @@ WebSearch supports two search backends through the adapter pattern, chosen by `src/tools/WebSear
|--------|------|--------|
| 1 | Environment variable `WEB_SEARCH_ADAPTER=api` | `ApiSearchAdapter` |
| 2 | Environment variable `WEB_SEARCH_ADAPTER=bing` | `BingSearchAdapter` |
| 3 | API base URL points to the official Anthropic endpoint | `ApiSearchAdapter` |
| 4 | Third-party proxy / unofficial endpoint | `BingSearchAdapter` |
| 3 | Environment variable `WEB_SEARCH_ADAPTER=brave` | `BraveSearchAdapter` |
| 4 | API base URL points to the official Anthropic endpoint | `ApiSearchAdapter` |
| 5 | Third-party proxy / unofficial endpoint | `BingSearchAdapter` |

Adapters are stateless and are cached and reused within a session.

70 changes: 70 additions & 0 deletions src/tools/WebSearchTool/__tests__/adapterFactory.test.ts
@@ -0,0 +1,70 @@
import { afterEach, describe, expect, mock, test } from 'bun:test'

let isFirstPartyBaseUrl = true

mock.module('../adapters/apiAdapter.js', () => ({
  ApiSearchAdapter: class ApiSearchAdapter {},
}))

mock.module('../adapters/bingAdapter.js', () => ({
  BingSearchAdapter: class BingSearchAdapter {},
}))

mock.module('../adapters/braveAdapter.js', () => ({
  BraveSearchAdapter: class BraveSearchAdapter {},
}))

mock.module('../../../utils/model/providers.js', () => ({
  isFirstPartyAnthropicBaseUrl: () => isFirstPartyBaseUrl,
}))

const { createAdapter } = await import('../adapters/index')

const originalWebSearchAdapter = process.env.WEB_SEARCH_ADAPTER

afterEach(() => {
  isFirstPartyBaseUrl = true

  if (originalWebSearchAdapter === undefined) {
    delete process.env.WEB_SEARCH_ADAPTER
  } else {
    process.env.WEB_SEARCH_ADAPTER = originalWebSearchAdapter
  }
})
Comment on lines +21 to +33

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the test file structure
cat -n src/tools/WebSearchTool/__tests__/adapterFactory.test.ts

Repository: claude-code-best/claude-code

Length of output: 2631


🏁 Script executed:

# Now examine the adapters/index.ts to see the caching mechanism
cat -n src/tools/WebSearchTool/adapters/index.ts

Repository: claude-code-best/claude-code

Length of output: 1794


🏁 Script executed:

# Check what isFirstPartyBaseUrl is and where it's defined
rg -n "isFirstPartyBaseUrl" src/tools/WebSearchTool/ -A 2 -B 2

Repository: claude-code-best/claude-code

Length of output: 2390


Reset the factory cache between test cases.

The module-level cachedAdapter and cachedAdapterKey state in src/tools/WebSearchTool/adapters/index.ts is never cleared between tests. While cache invalidation works via the adapterKey comparison, proper test isolation requires explicitly resetting this module state in the afterEach() block alongside the environment variable cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/WebSearchTool/__tests__/adapterFactory.test.ts` around lines 21 -
33, Reset the module-level cache after each test: in the afterEach() teardown
add logic to clear the adapter cache by setting cachedAdapter and
cachedAdapterKey back to their initial/undefined state so createAdapter will
re-evaluate on the next import; reference the module-level symbols cachedAdapter
and cachedAdapterKey (used by createAdapter in the adapters index) and ensure
they are reset alongside restoring process.env.WEB_SEARCH_ADAPTER.
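
One way to act on this suggestion is to expose a test-only reset hook next to the cache. The sketch below is self-contained and only mirrors the idea: `cachedAdapter` and `cachedAdapterKey` are the names mentioned in the comment, while `getAdapter`, `resetAdapterCacheForTests`, and the `SearchAdapterLike` shape are hypothetical and do not appear in the PR.

```typescript
// Stand-in for the real adapter classes; only identity matters here.
interface SearchAdapterLike {
  readonly backend: string
}

// Module-level cache, keyed by the selected backend.
let cachedAdapter: SearchAdapterLike | undefined
let cachedAdapterKey: string | undefined

// Returns the cached instance while the key is unchanged, otherwise rebuilds.
function getAdapter(key: string): SearchAdapterLike {
  if (cachedAdapter !== undefined && cachedAdapterKey === key) {
    return cachedAdapter
  }
  cachedAdapter = { backend: key }
  cachedAdapterKey = key
  return cachedAdapter
}

// Hypothetical test-only hook: clears the cache so each test starts clean.
function resetAdapterCacheForTests(): void {
  cachedAdapter = undefined
  cachedAdapterKey = undefined
}
```

With a hook like this, the `afterEach()` block could call `resetAdapterCacheForTests()` after restoring `process.env.WEB_SEARCH_ADAPTER`, so no test depends on the instance built by a previous one.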


describe('createAdapter', () => {
  test('reuses the same instance when the selected backend does not change', () => {
    process.env.WEB_SEARCH_ADAPTER = 'brave'

    const firstAdapter = createAdapter()
    const secondAdapter = createAdapter()

    expect(firstAdapter).toBe(secondAdapter)
    expect(firstAdapter.constructor.name).toBe('BraveSearchAdapter')
  })

  test('rebuilds the adapter when WEB_SEARCH_ADAPTER changes', () => {
    process.env.WEB_SEARCH_ADAPTER = 'brave'
    const braveAdapter = createAdapter()

    process.env.WEB_SEARCH_ADAPTER = 'bing'
    const bingAdapter = createAdapter()

    expect(bingAdapter).not.toBe(braveAdapter)
    expect(bingAdapter.constructor.name).toBe('BingSearchAdapter')
  })

  test('selects the API adapter for first-party Anthropic URLs', () => {
    delete process.env.WEB_SEARCH_ADAPTER
    isFirstPartyBaseUrl = true

    expect(createAdapter().constructor.name).toBe('ApiSearchAdapter')
  })

  test('selects the Bing adapter for third-party Anthropic base URLs', () => {
    delete process.env.WEB_SEARCH_ADAPTER
    isFirstPartyBaseUrl = false

    expect(createAdapter().constructor.name).toBe('BingSearchAdapter')
  })
})
106 changes: 106 additions & 0 deletions src/tools/WebSearchTool/__tests__/braveAdapter.extract.test.ts
@@ -0,0 +1,106 @@
import { describe, expect, test } from 'bun:test'
import { extractBraveResults } from '../adapters/braveAdapter'

🛠️ Refactor suggestion | 🟠 Major

Use src/ alias import in tests as well.

Switch this relative import to the configured path alias for consistency with project standards.

As per coding guidelines, "Use `src/` path alias for imports; valid paths like `import { ... } from 'src/utils/...'`."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/WebSearchTool/__tests__/braveAdapter.extract.test.ts` at line 2,
Replace the relative import of extractBraveResults from
'../adapters/braveAdapter' with the project's src/ path alias import (i.e.,
import extractBraveResults via the src/ alias) so tests use the same module
resolution as the app; update the import statement referencing braveAdapter and
the extractBraveResults symbol accordingly.


describe('extractBraveResults', () => {
  test('extracts generic grounding results', () => {
    const results = extractBraveResults({
      grounding: {
        generic: [
          {
            title: 'Example Title 1',
            url: 'https://example.com/page1',
            snippets: ['First result description'],
          },
          {
            title: 'Example Title 2',
            url: 'https://example.com/page2',
            snippets: ['Second result description'],
          },
        ],
      },
    })

    expect(results).toEqual([
      {
        title: 'Example Title 1',
        url: 'https://example.com/page1',
        snippet: 'First result description',
      },
      {
        title: 'Example Title 2',
        url: 'https://example.com/page2',
        snippet: 'Second result description',
      },
    ])
  })

  test('combines generic, poi, and map grounding results', () => {
    const results = extractBraveResults({
      grounding: {
        generic: [{ title: 'Generic', url: 'https://example.com/generic' }],
        poi: { title: 'POI', url: 'https://maps.example.com/poi' },
        map: [{ title: 'Map', url: 'https://maps.example.com/map' }],
      },
    })

    expect(results).toEqual([
      { title: 'Generic', url: 'https://example.com/generic', snippet: undefined },
      { title: 'POI', url: 'https://maps.example.com/poi', snippet: undefined },
      { title: 'Map', url: 'https://maps.example.com/map', snippet: undefined },
    ])
  })

  test('joins multiple snippets into one summary string', () => {
    const results = extractBraveResults({
      grounding: {
        generic: [
          {
            title: 'Joined Snippets',
            url: 'https://example.com/joined',
            snippets: ['First snippet.', 'Second snippet.'],
          },
        ],
      },
    })

    expect(results[0].snippet).toBe('First snippet. Second snippet.')
  })

  test('skips entries without a title or URL', () => {
    const results = extractBraveResults({
      grounding: {
        generic: [
          { title: 'Missing URL' },
          { url: 'https://example.com/missing-title' },
          { title: 'Valid', url: 'https://example.com/valid' },
        ],
      },
    })

    expect(results).toEqual([
      { title: 'Valid', url: 'https://example.com/valid', snippet: undefined },
    ])
  })

  test('deduplicates repeated URLs across grounding buckets', () => {
    const results = extractBraveResults({
      grounding: {
        generic: [{ title: 'First', url: 'https://example.com/dup' }],
        poi: { title: 'Second', url: 'https://example.com/dup' },
        map: [{ title: 'Third', url: 'https://example.com/dup' }],
      },
    })

    expect(results).toEqual([
      { title: 'First', url: 'https://example.com/dup', snippet: undefined },
    ])
  })

  test('returns empty array when grounding is missing', () => {
    expect(extractBraveResults({})).toEqual([])
  })

  test('returns empty array when grounding arrays are absent', () => {
    expect(extractBraveResults({ grounding: {} })).toEqual([])
  })
})
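
The behaviour these tests pin down can be met by a mapping along the following lines. This is a sketch inferred only from the test cases above, not the code in `braveAdapter.ts`: the function name `extractBraveResultsSketch` and the type names are hypothetical, and the payload shape is assumed to contain nothing beyond the fields the tests use.

```typescript
interface SearchResult {
  title: string
  url: string
  snippet: string | undefined
}

interface GroundingEntry {
  title?: string
  url?: string
  snippets?: string[]
}

// Assumed payload shape: only the fields exercised by the tests.
interface BravePayload {
  grounding?: {
    generic?: GroundingEntry[]
    poi?: GroundingEntry
    map?: GroundingEntry[]
  }
}

function extractBraveResultsSketch(payload: BravePayload): SearchResult[] {
  const grounding = payload.grounding
  if (grounding === undefined) return []

  // Flatten the buckets in the order the tests expect: generic, poi, map.
  const entries: GroundingEntry[] = [
    ...(grounding.generic ?? []),
    ...(grounding.poi !== undefined ? [grounding.poi] : []),
    ...(grounding.map ?? []),
  ]

  const seenUrls = new Set<string>()
  const results: SearchResult[] = []
  for (const entry of entries) {
    // Skip entries that lack a title or a URL.
    if (!entry.title || !entry.url) continue
    // Keep only the first entry for each URL.
    if (seenUrls.has(entry.url)) continue
    seenUrls.add(entry.url)
    // Join multiple snippets with a single space; no snippets → undefined.
    const snippet =
      entry.snippets !== undefined && entry.snippets.length > 0
        ? entry.snippets.join(' ')
        : undefined
    results.push({ title: entry.title, url: entry.url, snippet })
  }
  return results
}
```

Tracking seen URLs in a `Set` gives first-wins deduplication across buckets, which matches the test where only the `generic` entry survives for a repeated URL.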