fix(bilibili): subtitle-first path for multi-part videos ignored the p parameter and fetched the wrong part #403
Open
wmsdsb wants to merge 3 commits into
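Problem: for a Bilibili multi-part video (e.g. a 62-part course), when a `?p=36` link is submitted, the subtitle-first path fetches the `cid` from the `x/web-interface/view` API without taking the `p` parameter into account. It therefore falls back to part 1's `cid`, and the generated notes are for part 1. Meanwhile, yt-dlp correctly downloaded the p36 audio, but that audio was skipped because the subtitle path had already returned a result.
Fix:
- `url_parser` adds `extract_bilibili_p_number()` to extract the `p` parameter from the URL
- `_get_cid()` in `bilibili_subtitle` accepts `p` and takes the matching part's `cid` from `data.pages[p-1]`
- `fetch_subtitles()` calls `extract_bilibili_p_number()` and passes `p` through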
Pull request overview
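This PR fixes a bug in the "subtitle-first" path for Bilibili multi-part videos: `?p=N` was not passed through, so the `cid` obtained from the view API defaulted to part 1 and the generated notes were for the wrong part. After the fix, the fetched subtitles match the part whose audio yt-dlp downloads.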
Changes:
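- Adds a URL helper, `extract_bilibili_p_number()`, that parses the part number `p` from a Bilibili link.
- The subtitle fetch path passes `p` through when resolving the `cid` and selects the matching part's `cid` from `data.pages[p-1]`.
- Adds a `p` field to the logs and to the returned `raw` metadata, to make troubleshooting and tracing easier.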
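A minimal sketch of the `data.pages[p-1]` selection described above, assuming the `data` object of the view response has already been decoded into a dict shaped like `{"cid": ..., "pages": [{"page": 1, "cid": ...}, ...]}`. The name `select_cid` is hypothetical, and falling back to the top-level `cid` when `p` is missing or out of range is an illustrative choice, not necessarily what the PR does.

```python
from typing import Any, Mapping, Optional


def select_cid(data: Mapping[str, Any], p: Optional[int]) -> Optional[int]:
    """Pick the cid for part p (1-based) from a decoded view API `data` object.

    Hypothetical helper: falls back to the top-level cid (part 1) when p is
    missing or outside the range of available parts.
    """
    pages = data.get("pages") or []
    if p is not None and 1 <= p <= len(pages):
        # pages is a 0-based list while p is 1-based, hence p - 1
        return pages[p - 1].get("cid")
    # No usable p: keep the previous behaviour (top-level cid, i.e. part 1)
    return data.get("cid")


data = {"cid": 1001, "pages": [{"page": 1, "cid": 1001}, {"page": 2, "cid": 1002}]}
print(select_cid(data, 2))     # 1002
print(select_cid(data, None))  # 1001
```

Falling back instead of raising keeps single-part videos and links without `?p=` working exactly as before; a stricter variant could reject an out-of-range `p` outright.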
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
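| backend/app/utils/url_parser.py | Adds parsing of the part parameter `p` from Bilibili URLs (including short-link handling and the trailing `/pN` form). |
| backend/app/downloaders/bilibili_subtitle.py | Passes `p` through the subtitle-first path and selects the correct `cid` for that part, so part 1's subtitles are no longer fetched by mistake. |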
```diff
@@ -1,5 +1,5 @@
 import re
-from typing import Optional
+from typing import Optional, Tuple
```
Comment on lines +76 to +79
```python
# Match the trailing /pN form (less common)
match = re.search(r'/p(\d+)(?:/?$|\?|&)', url)
if match:
    return int(match.group(1))
```
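A self-contained sketch of how `extract_bilibili_p_number()` might look, assuming it checks the `?p=N` query parameter first and then falls back to the trailing `/pN` regex from the excerpt above. The short-link (`b23.tv`) handling mentioned in the file summary is deliberately left out, since resolving a short link needs a network request; the example URLs are made up.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_bilibili_p_number(url: str) -> Optional[int]:
    """Return the 1-based part number from a Bilibili URL, or None if absent.

    Sketch only: short links (b23.tv) would have to be resolved first.
    """
    # Common form: ...?p=36, possibly alongside other query parameters
    values = parse_qs(urlparse(url).query).get("p")
    if values and values[0].isdecimal():
        return int(values[0])

    # Less common form: trailing /pN, same pattern as in the diff excerpt
    match = re.search(r'/p(\d+)(?:/?$|\?|&)', url)
    if match:
        return int(match.group(1))
    return None


print(extract_bilibili_p_number("https://www.bilibili.com/video/BV1xx411c7mD?p=36"))  # 36
print(extract_bilibili_p_number("https://www.bilibili.com/video/BV1xx411c7mD"))       # None
```

Returning `None` rather than `1` when no part is given lets the caller distinguish "no `p` in the URL" from an explicit `?p=1`.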
```diff
 from app.services.cookie_manager import CookieConfigManager
 from app.utils.logger import get_logger
-from app.utils.url_parser import extract_video_id
+from app.utils.url_parser import extract_video_id, extract_bilibili_p_number
```
Comment on lines 128 to +136
```diff
 def fetch_subtitles(self, video_url: str) -> Optional[TranscriptResult]:
     bvid = extract_video_id(video_url, "bilibili")
     if not bvid:
         logger.info("Could not extract the BV id from the URL")
         return None

-    cid = self._get_cid(bvid)
+    # Extract the part (P) number
+    p = extract_bilibili_p_number(video_url)
```
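A hypothetical, network-free sketch of the wiring in the excerpt above: the part number is read from the same URL as the BV id and passed into the `cid` lookup. `FakeViewClient`, `resolve_cid`, `VIEW_RESPONSES`, and the simplified `BV[0-9A-Za-z]{10}` pattern are stand-ins invented for illustration; the real downloader calls the `x/web-interface/view` API and returns a `TranscriptResult`.

```python
import re
from typing import Any, Dict, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for decoded x/web-interface/view `data` objects, keyed by bvid
VIEW_RESPONSES: Dict[str, Dict[str, Any]] = {
    "BV1xx411c7mD": {
        "cid": 5001,  # top-level cid: part 1
        "pages": [{"page": i, "cid": 5000 + i} for i in range(1, 63)],  # 62 parts
    }
}


class FakeViewClient:
    """Illustrative only: mirrors the shape of the fix, not the real downloader."""

    def _get_cid(self, bvid: str, p: Optional[int] = None) -> Optional[int]:
        data = VIEW_RESPONSES.get(bvid)
        if data is None:
            return None
        pages = data.get("pages") or []
        if p is not None and 1 <= p <= len(pages):
            return pages[p - 1]["cid"]  # the fix: choose the requested part
        return data.get("cid")  # no usable p: top-level cid (part 1)

    def resolve_cid(self, video_url: str) -> Optional[int]:
        match = re.search(r"BV[0-9A-Za-z]{10}", video_url)
        if not match:
            return None
        # Read the part number from the same URL and pass it through
        values = parse_qs(urlparse(video_url).query).get("p")
        p = int(values[0]) if values and values[0].isdecimal() else None
        return self._get_cid(match.group(0), p)


client = FakeViewClient()
print(client.resolve_cid("https://www.bilibili.com/video/BV1xx411c7mD?p=36"))  # 5036
```

Before the fix, the equivalent lookup ignored `p` and would have returned `5001` (part 1) for the same URL, which is exactly the mismatch with the p36 audio described in this PR.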
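Summary of changes
Fixes the subtitle-first path for Bilibili multi-part videos: when a `?p=N` link was submitted, `p` was not passed through, so notes were always generated from part 1's `cid`.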
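Why
The Bilibili `x/web-interface/view` API returns part 1's `cid` by default. When a user submits `?p=36` for a multi-part video (e.g. a 62-part course), `extract_video_id()` keeps only the BV id and drops `p`, so the subtitle path uses part 1's `cid` and fetches part 1's subtitles, while yt-dlp downloads the correct p36 audio. The two don't match, and GPT generates notes from the wrong subtitles.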
What changed
How it was tested
Regression risk
Checklist
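- Branch naming: `feature/*` / `fix/*` / `release/*` / `hotfix/*`
- Target branch: `develop`; urgent production fixes → `master`; releases → see §4.3
- Commit messages use the `type(scope): subject` format (CONTRIBUTING.md §5.1)
- Docs updated (`README.md` / `CHANGELOG.md` / `CLAUDE.md` / module README, if applicable)
- No `.env` files or large binaries committed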