Add brave as alternative WebSearchTool #245
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,11 @@ | ||
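| # WEB_SEARCH_TOOL — Web Search Tool | ||
| | ||
| - > Implementation status: adapter architecture complete; the Bing adapter is the current default backend | ||
| + > Implementation status: adapter architecture complete; three backends supported: API / Bing / Brave | ||
| > Reference count: core tool, no feature-flag gating (always enabled) | ||
| | ||
| ## 1. Feature Overview | ||
| | ||
| - WebSearchTool lets the model search the internet for up-to-date information. The original implementation supported only Anthropic API server-side search (the `web_search_20250305` server tool), which is unavailable behind third-party proxy endpoints. It has now been refactored into an adapter architecture, with Bing search-page parsing added as a fallback, so that search works with any API endpoint. | ||
| + WebSearchTool lets the model search the internet for up-to-date information. The original implementation supported only Anthropic API server-side search (the `web_search_20250305` server tool), which is unavailable behind third-party proxy endpoints. It has now been refactored into an adapter architecture that supports API server-side search plus two HTML-parsing backends, Bing and Brave, so that search works with any API endpoint. | ||

Contributor review comment: Correct backend description for Brave. Line 8 describes Brave as an “HTML parsing backend,” but …

| | ||
| ## 2. Implementation Architecture | ||
| | ||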
@@ -21,9 +21,13 @@ WebSearchTool.call() | |
| │ └── uses the web_search_20250305 server tool | ||
| │ makes a second API call via queryModelWithStreaming | ||
| │ | ||
| - └── BingSearchAdapter — Bing HTML scraping + regex extraction (current default) | ||
| - └── fetches the Bing search-results page HTML directly | ||
| - regex-extracts title/URL/snippet from b_algo blocks | ||
| + ├── BingSearchAdapter — Bing HTML scraping + regex extraction | ||
| + │ └── fetches the Bing search-results page HTML directly | ||
| + │ regex-extracts title/URL/snippet from b_algo blocks | ||
| + │ | ||
| + └── BraveSearchAdapter — Brave LLM Context API | ||
| + └── calls the Brave HTTPS GET endpoint | ||
| + maps the grounding payload to title/URL/snippet | ||
| ``` | ||
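
The adapter split in the diagram can be read as a small strategy interface: `WebSearchTool.call()` depends only on a `search()` method, and each backend implements it. The sketch below is hypothetical. The option names (`allowedDomains`, `blockedDomains`, `signal`, `onProgress`) and the `{ type: 'query_update', query }` progress event are taken from the data-flow description in this doc, and the `{ title, url, snippet }` result shape from the Brave tests; `SearchOptions`, `StaticSearchAdapter`, and `runSearch` are illustrative names, not the PR's actual interface definitions.

```typescript
// Hypothetical sketch of the adapter contract; names are illustrative,
// not the definitions from the PR's interface section.
interface SearchResult {
  title: string
  url: string
  snippet?: string
}

type SearchProgress = { type: 'query_update'; query: string }

interface SearchOptions {
  allowedDomains?: string[]
  blockedDomains?: string[]
  signal?: AbortSignal
  onProgress?: (event: SearchProgress) => void
}

interface WebSearchAdapter {
  search(query: string, options: SearchOptions): Promise<SearchResult[]>
}

// Illustrative backend returning canned results. A real adapter would call the
// Anthropic server tool, fetch Bing HTML, or query the Brave API here.
class StaticSearchAdapter implements WebSearchAdapter {
  private readonly results: SearchResult[]

  constructor(results: SearchResult[]) {
    this.results = results
  }

  async search(query: string, options: SearchOptions): Promise<SearchResult[]> {
    // Report the query first, mirroring the query_update step in the data flow.
    options.onProgress?.({ type: 'query_update', query })
    if (options.signal?.aborted) {
      throw new Error('search aborted')
    }
    return this.results
  }
}

// Callers depend only on WebSearchAdapter, so any backend can be swapped in.
async function runSearch(
  adapter: WebSearchAdapter,
  query: string,
): Promise<{ results: SearchResult[]; progress: SearchProgress[] }> {
  const progress: SearchProgress[] = []
  const results = await adapter.search(query, {
    onProgress: event => {
      progress.push(event)
    },
  })
  return { results, progress }
}
```

Because the tool only sees `WebSearchAdapter`, adding Brave requires a new class and a factory branch, with no change to the calling code.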
| | ||
| ### 2.2 Module Structure | ||
@@ -37,8 +41,9 @@ WebSearchTool.call() | |
| | Adapter factory | `src/tools/WebSearchTool/adapters/index.ts` | `createAdapter()` factory function; selects the backend | | ||
| | API adapter | `src/tools/WebSearchTool/adapters/apiAdapter.ts` | Wraps the original `queryModelWithStreaming` logic; uses the server tool | | ||
| | Bing adapter | `src/tools/WebSearchTool/adapters/bingAdapter.ts` | Bing HTML scraping + regex parsing | | ||
| - | Unit tests | `src/tools/WebSearchTool/__tests__/bingAdapter.test.ts` | 32 test cases | | ||
| - | Integration tests | `src/tools/WebSearchTool/__tests__/bingAdapter.integration.ts` | Verification against real network requests | | ||
| + | Brave adapter | `src/tools/WebSearchTool/adapters/braveAdapter.ts` | Brave LLM Context API adaptation and result mapping | | ||
| + | Unit tests | `src/tools/WebSearchTool/__tests__/bingAdapter.test.ts`, `src/tools/WebSearchTool/__tests__/braveAdapter*.test.ts`, `src/tools/WebSearchTool/__tests__/adapterFactory.test.ts` | Bing / Brave parsing and factory-logic tests | | ||
| + | Integration tests | `src/tools/WebSearchTool/__tests__/bingAdapter.integration.ts`, `src/tools/WebSearchTool/__tests__/braveAdapter.integration.ts` | Verification against real network requests | | ||
| | ||
| ### 2.3 Data Flow | ||
| | ||
@@ -49,20 +54,18 @@ WebSearchTool.call() | |
| validateInput() — checks that query is non-empty and that allowed/blocked domain lists are not both set | ||
| │ | ||
| ▼ | ||
| - createAdapter() → BingSearchAdapter (currently hard-coded) | ||
| + createAdapter() → ApiSearchAdapter | BingSearchAdapter | BraveSearchAdapter | ||
| │ | ||
| ▼ | ||
| adapter.search(query, { allowedDomains, blockedDomains, signal, onProgress }) | ||
| │ | ||
| ├── onProgress({ type: 'query_update', query }) | ||
| │ | ||
| - ├── axios.get(bing.com/search?q=...&setmkt=en-US) | ||
| - │ └── 13 Edge browser request headers | ||
| + ├── axios.get(search-engine-url) | ||
| + │ └── API authentication request headers | ||
| │ | ||
| - ├── extractBingResults(html) — regex-extracts <li class="b_algo"> blocks | ||
| - │ ├── resolveBingUrl() — decodes base64 redirect URLs | ||
| - │ ├── extractSnippet() — three-tier fallback snippet extraction | ||
| - │ └── decodeHtmlEntities() — he.decode | ||
| + ├── extractResults(payload) — extracts results per backend | ||
| + │ └── grounding → SearchResult[] mapping | ||
| │ | ||
| ├── client-side domain filtering (allowedDomains / blockedDomains) | ||
| │ | ||
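
The client-side filtering step in the data flow is listed but not spelled out in this hunk. Below is a minimal sketch, assuming that a domain entry matches the host itself and any of its subdomains, and that `allowedDomains` and `blockedDomains` are never both set (which `validateInput()` enforces). The helper names `hostMatches` and `filterByDomain` are hypothetical.

```typescript
interface SearchResult {
  title: string
  url: string
  snippet?: string
}

// Hypothetical matching rule: "example.com" matches example.com and *.example.com.
function hostMatches(hostname: string, domain: string): boolean {
  const host = hostname.toLowerCase()
  const target = domain.toLowerCase().replace(/^\.+/, '')
  return host === target || host.endsWith(`.${target}`)
}

function filterByDomain(
  results: SearchResult[],
  allowedDomains?: string[],
  blockedDomains?: string[],
): SearchResult[] {
  return results.filter(result => {
    let hostname: string
    try {
      hostname = new URL(result.url).hostname
    } catch {
      // Drop results whose URL cannot be parsed.
      return false
    }
    if (allowedDomains && allowedDomains.length > 0) {
      // Allow-list mode: keep only matching hosts.
      return allowedDomains.some(d => hostMatches(hostname, d))
    }
    if (blockedDomains && blockedDomains.length > 0) {
      // Block-list mode: drop matching hosts.
      return !blockedDomains.some(d => hostMatches(hostname, d))
    }
    return true
  })
}
```

Filtering after extraction keeps the rule identical across backends, at the cost of possibly returning fewer results than the engine produced.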
@@ -117,19 +120,18 @@ Redirect URL format returned by Bing: `bing.com/ck/a?...&u=a1aHR0cHM6Ly9...` | |
| | ||
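
The redirect format in the hunk header (`bing.com/ck/a?...&u=a1aHR0cHM6Ly9...`) suggests the target URL travels in the `u` parameter as an `a1` prefix followed by base64 (`aHR0cHM6Ly9` begins with the base64 encoding of `https://`). A hypothetical decoder in the spirit of `resolveBingUrl()` is sketched below. It assumes a URL-safe base64 payload with padding stripped and returns the input unchanged whenever decoding fails; it is not the PR's implementation.

```typescript
// Hypothetical sketch: the "a1" prefix and base64url payload are inferred
// from the example redirect URL, not from the PR's source.
function decodeBase64Url(payload: string): string {
  let b64 = payload.replace(/-/g, '+').replace(/_/g, '/')
  // Restore the padding that URL-safe encoders usually strip.
  while (b64.length % 4 !== 0) {
    b64 += '='
  }
  const binary = atob(b64)
  const bytes = Uint8Array.from(binary, ch => ch.charCodeAt(0))
  return new TextDecoder().decode(bytes)
}

function resolveBingUrl(href: string): string {
  let parsed: URL
  try {
    parsed = new URL(href)
  } catch {
    return href
  }
  const host = parsed.hostname
  const isBing = host === 'bing.com' || host.endsWith('.bing.com')
  const encoded = parsed.searchParams.get('u')
  if (!isBing || parsed.pathname !== '/ck/a' || !encoded || !encoded.startsWith('a1')) {
    return href
  }
  try {
    const target = decodeBase64Url(encoded.slice(2))
    // Only accept payloads that decode to an absolute http(s) URL.
    return /^https?:\/\//.test(target) ? target : href
  } catch {
    return href
  }
}
```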
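| ## 4. Adapter Selection Logic | ||
| | ||
| - Currently, `createAdapter()` hard-codes a return of `BingSearchAdapter`; the original logic is kept, commented out: | ||
| + `createAdapter()` selects the backend in the following priority order and caches the adapter instance, keyed by the selected backend: | ||
| | ||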
| ```typescript | ||
| export function createAdapter(): WebSearchAdapter { | ||
| - return new BingSearchAdapter() | ||
| - // Selection logic kept as comments: | ||
| - // 1. WEB_SEARCH_ADAPTER environment variable forces api|bing | ||
| - // 2. isFirstPartyAnthropicBaseUrl() → API adapter | ||
| - // 3. Third-party endpoint → Bing adapter | ||
| + // 1. WEB_SEARCH_ADAPTER=api|bing|brave set explicitly | ||
| + // 2. Official Anthropic API base URL → ApiSearchAdapter | ||
| + // 3. Third-party proxy / unofficial endpoint → BingSearchAdapter | ||
| } | ||
| ``` | ||
| | ||
| - To restore automatic selection, uncomment the logic in `index.ts`. | ||
| + When `WEB_SEARCH_ADAPTER=brave` is set explicitly, the Brave LLM Context API backend is used instead; it requires | ||
| + `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY`. | ||
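
The priority order and per-backend caching described in this section can be sketched as follows. This is a hypothetical reconstruction, not the PR's `index.ts`: the adapter classes are stand-ins, the environment and the first-party check are passed in as parameters instead of reading `process.env` and calling `isFirstPartyAnthropicBaseUrl()`, and `resolveBraveApiKey()` and `resetAdapterCache()` are illustrative helpers. The key lookup order (`BRAVE_SEARCH_API_KEY` before `BRAVE_API_KEY`) and the fallback for unrecognized `WEB_SEARCH_ADAPTER` values are assumptions.

```typescript
type BackendKey = 'api' | 'bing' | 'brave'
type Env = Record<string, string | undefined>

interface WebSearchAdapter {
  readonly backend: BackendKey
}

// Stand-ins for ApiSearchAdapter / BingSearchAdapter / BraveSearchAdapter.
class ApiSearchAdapter implements WebSearchAdapter {
  readonly backend = 'api' as const
}
class BingSearchAdapter implements WebSearchAdapter {
  readonly backend = 'bing' as const
}
class BraveSearchAdapter implements WebSearchAdapter {
  readonly backend = 'brave' as const
  readonly apiKey: string
  constructor(apiKey: string) {
    this.apiKey = apiKey
  }
}

// Assumed lookup order: BRAVE_SEARCH_API_KEY first, then BRAVE_API_KEY.
function resolveBraveApiKey(env: Env): string {
  const key = env.BRAVE_SEARCH_API_KEY || env.BRAVE_API_KEY
  if (!key) {
    throw new Error('WEB_SEARCH_ADAPTER=brave requires BRAVE_SEARCH_API_KEY or BRAVE_API_KEY')
  }
  return key
}

function selectBackend(env: Env, isFirstPartyBaseUrl: boolean): BackendKey {
  const forced = env.WEB_SEARCH_ADAPTER?.trim().toLowerCase()
  // 1. An explicit override wins; unrecognized values fall through (assumption).
  if (forced === 'api' || forced === 'bing' || forced === 'brave') {
    return forced
  }
  // 2. Official Anthropic base URL -> server-side search.
  // 3. Anything else (third-party proxy) -> Bing scraping.
  return isFirstPartyBaseUrl ? 'api' : 'bing'
}

// Module-level cache: remembers the last adapter together with its backend key.
let cached: { key: BackendKey; adapter: WebSearchAdapter } | undefined

function createAdapter(env: Env, isFirstPartyBaseUrl: boolean): WebSearchAdapter {
  const key = selectBackend(env, isFirstPartyBaseUrl)
  if (cached && cached.key === key) {
    return cached.adapter
  }
  const adapter =
    key === 'api'
      ? new ApiSearchAdapter()
      : key === 'bing'
        ? new BingSearchAdapter()
        : new BraveSearchAdapter(resolveBraveApiKey(env))
  cached = { key, adapter }
  return adapter
}

// Hypothetical test hook: clears the module-level cache between test cases.
function resetAdapterCache(): void {
  cached = undefined
}
```

Passing the environment and the first-party flag as parameters keeps the selection logic testable without module mocks, and an explicit reset hook is one way to keep a module-level cache from leaking state between test cases.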
| | ||
| ## 5. Interface Definitions | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| import { afterEach, describe, expect, mock, test } from 'bun:test' | ||
|
|
||
| let isFirstPartyBaseUrl = true | ||
|
|
||
| mock.module('../adapters/apiAdapter.js', () => ({ | ||
| ApiSearchAdapter: class ApiSearchAdapter {}, | ||
| })) | ||
|
|
||
| mock.module('../adapters/bingAdapter.js', () => ({ | ||
| BingSearchAdapter: class BingSearchAdapter {}, | ||
| })) | ||
|
|
||
| mock.module('../adapters/braveAdapter.js', () => ({ | ||
| BraveSearchAdapter: class BraveSearchAdapter {}, | ||
| })) | ||
|
|
||
| mock.module('../../../utils/model/providers.js', () => ({ | ||
| isFirstPartyAnthropicBaseUrl: () => isFirstPartyBaseUrl, | ||
| })) | ||
|
|
||
| const { createAdapter } = await import('../adapters/index') | ||
|
|
||
| const originalWebSearchAdapter = process.env.WEB_SEARCH_ADAPTER | ||
|
|
||
| afterEach(() => { | ||
| isFirstPartyBaseUrl = true | ||
|
|
||
| if (originalWebSearchAdapter === undefined) { | ||
| delete process.env.WEB_SEARCH_ADAPTER | ||
| } else { | ||
| process.env.WEB_SEARCH_ADAPTER = originalWebSearchAdapter | ||
| } | ||
| }) | ||

Contributor review comment on lines +21 to +33: Reset the factory cache between test cases. The module-level …

| | ||
| describe('createAdapter', () => { | ||
| test('reuses the same instance when the selected backend does not change', () => { | ||
| process.env.WEB_SEARCH_ADAPTER = 'brave' | ||
|
|
||
| const firstAdapter = createAdapter() | ||
| const secondAdapter = createAdapter() | ||
|
|
||
| expect(firstAdapter).toBe(secondAdapter) | ||
| expect(firstAdapter.constructor.name).toBe('BraveSearchAdapter') | ||
| }) | ||
|
|
||
| test('rebuilds the adapter when WEB_SEARCH_ADAPTER changes', () => { | ||
| process.env.WEB_SEARCH_ADAPTER = 'brave' | ||
| const braveAdapter = createAdapter() | ||
|
|
||
| process.env.WEB_SEARCH_ADAPTER = 'bing' | ||
| const bingAdapter = createAdapter() | ||
|
|
||
| expect(bingAdapter).not.toBe(braveAdapter) | ||
| expect(bingAdapter.constructor.name).toBe('BingSearchAdapter') | ||
| }) | ||
|
|
||
| test('selects the API adapter for first-party Anthropic URLs', () => { | ||
| delete process.env.WEB_SEARCH_ADAPTER | ||
| isFirstPartyBaseUrl = true | ||
|
|
||
| expect(createAdapter().constructor.name).toBe('ApiSearchAdapter') | ||
| }) | ||
|
|
||
| test('selects the Bing adapter for third-party Anthropic base URLs', () => { | ||
| delete process.env.WEB_SEARCH_ADAPTER | ||
| isFirstPartyBaseUrl = false | ||
|
|
||
| expect(createAdapter().constructor.name).toBe('BingSearchAdapter') | ||
| }) | ||
| }) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| import { describe, expect, test } from 'bun:test' | ||
| import { extractBraveResults } from '../adapters/braveAdapter' | ||

Contributor review comment (🛠️ Refactor suggestion | 🟠 Major): Use …; switch this relative import to the configured path alias for consistency with project standards. As per coding guidelines, "…"

| | ||
| describe('extractBraveResults', () => { | ||
| test('extracts generic grounding results', () => { | ||
| const results = extractBraveResults({ | ||
| grounding: { | ||
| generic: [ | ||
| { | ||
| title: 'Example Title 1', | ||
| url: 'https://example.com/page1', | ||
| snippets: ['First result description'], | ||
| }, | ||
| { | ||
| title: 'Example Title 2', | ||
| url: 'https://example.com/page2', | ||
| snippets: ['Second result description'], | ||
| }, | ||
| ], | ||
| }, | ||
| }) | ||
|
|
||
| expect(results).toEqual([ | ||
| { | ||
| title: 'Example Title 1', | ||
| url: 'https://example.com/page1', | ||
| snippet: 'First result description', | ||
| }, | ||
| { | ||
| title: 'Example Title 2', | ||
| url: 'https://example.com/page2', | ||
| snippet: 'Second result description', | ||
| }, | ||
| ]) | ||
| }) | ||
|
|
||
| test('combines generic, poi, and map grounding results', () => { | ||
| const results = extractBraveResults({ | ||
| grounding: { | ||
| generic: [{ title: 'Generic', url: 'https://example.com/generic' }], | ||
| poi: { title: 'POI', url: 'https://maps.example.com/poi' }, | ||
| map: [{ title: 'Map', url: 'https://maps.example.com/map' }], | ||
| }, | ||
| }) | ||
|
|
||
| expect(results).toEqual([ | ||
| { title: 'Generic', url: 'https://example.com/generic', snippet: undefined }, | ||
| { title: 'POI', url: 'https://maps.example.com/poi', snippet: undefined }, | ||
| { title: 'Map', url: 'https://maps.example.com/map', snippet: undefined }, | ||
| ]) | ||
| }) | ||
|
|
||
| test('joins multiple snippets into one summary string', () => { | ||
| const results = extractBraveResults({ | ||
| grounding: { | ||
| generic: [ | ||
| { | ||
| title: 'Joined Snippets', | ||
| url: 'https://example.com/joined', | ||
| snippets: ['First snippet.', 'Second snippet.'], | ||
| }, | ||
| ], | ||
| }, | ||
| }) | ||
|
|
||
| expect(results[0].snippet).toBe('First snippet. Second snippet.') | ||
| }) | ||
|
|
||
| test('skips entries without a title or URL', () => { | ||
| const results = extractBraveResults({ | ||
| grounding: { | ||
| generic: [ | ||
| { title: 'Missing URL' }, | ||
| { url: 'https://example.com/missing-title' }, | ||
| { title: 'Valid', url: 'https://example.com/valid' }, | ||
| ], | ||
| }, | ||
| }) | ||
|
|
||
| expect(results).toEqual([ | ||
| { title: 'Valid', url: 'https://example.com/valid', snippet: undefined }, | ||
| ]) | ||
| }) | ||
|
|
||
| test('deduplicates repeated URLs across grounding buckets', () => { | ||
| const results = extractBraveResults({ | ||
| grounding: { | ||
| generic: [{ title: 'First', url: 'https://example.com/dup' }], | ||
| poi: { title: 'Second', url: 'https://example.com/dup' }, | ||
| map: [{ title: 'Third', url: 'https://example.com/dup' }], | ||
| }, | ||
| }) | ||
|
|
||
| expect(results).toEqual([ | ||
| { title: 'First', url: 'https://example.com/dup', snippet: undefined }, | ||
| ]) | ||
| }) | ||
|
|
||
| test('returns empty array when grounding is missing', () => { | ||
| expect(extractBraveResults({})).toEqual([]) | ||
| }) | ||
|
|
||
| test('returns empty array when grounding arrays are absent', () => { | ||
| expect(extractBraveResults({ grounding: {} })).toEqual([]) | ||
| }) | ||
| }) | ||
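
The tests above pin down the expected mapping from a Brave LLM Context payload to `SearchResult[]`: buckets are read in the order `generic`, `poi`, `map`; `poi` is a single object while the other two are arrays; multiple `snippets` are joined with a space; entries missing a title or URL are skipped; and a repeated URL keeps its first occurrence. The sketch below is one hypothetical implementation that satisfies those cases. The payload types are inferred only from the test fixtures, and the PR's `braveAdapter.ts` may be structured differently.

```typescript
interface SearchResult {
  title: string
  url: string
  snippet: string | undefined
}

// Shapes inferred from the test fixtures; the real API may return more fields.
interface BraveGroundingItem {
  title?: string
  url?: string
  snippets?: string[]
}

interface BravePayload {
  grounding?: {
    generic?: BraveGroundingItem[]
    poi?: BraveGroundingItem
    map?: BraveGroundingItem[]
  }
}

function extractBraveResults(payload: BravePayload): SearchResult[] {
  const grounding = payload.grounding
  if (!grounding) {
    return []
  }
  // Bucket order matters: it decides which entry wins when URLs repeat.
  const items: BraveGroundingItem[] = [
    ...(grounding.generic ?? []),
    ...(grounding.poi ? [grounding.poi] : []),
    ...(grounding.map ?? []),
  ]
  const seen = new Set<string>()
  const results: SearchResult[] = []
  for (const item of items) {
    // Skip incomplete entries and URLs that were already emitted.
    if (!item.title || !item.url || seen.has(item.url)) {
      continue
    }
    seen.add(item.url)
    const snippet =
      item.snippets && item.snippets.length > 0 ? item.snippets.join(' ') : undefined
    results.push({ title: item.title, url: item.url, snippet })
  }
  return results
}
```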