feat: add Doubao (Ark) LLM provider#209
Conversation
d24845a to
af0c6fc
Compare
0d96ce6 to
aa1f7f8
Compare
There was a problem hiding this comment.
Pull request overview
Adds Volcengine Ark / Doubao as a new selectable LLM provider and refactors shared video/transcript + prompt logic so Gemini and Doubao can share validation and prompt templates.
Changes:
- Adds Doubao (Ark) provider selection + onboarding/setup flow and settings wiring, including a provider-specific connection test.
- Introduces shared utilities for video timestamp parsing, transcript JSON decoding, and timeline validation; centralizes prompt templates and prompt override persistence.
- Adds
DoubaoArkProviderand integrates it intoLLMServicerouting and analytics labeling.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| Dayflow/Dayflow/Views/UI/Settings/SettingsProvidersTabView.swift | Wires provider-specific connection test view for Gemini/Doubao. |
| Dayflow/Dayflow/Views/UI/Settings/ProvidersSettingsViewModel.swift | Adds Doubao provider configuration + metadata; switches prompt overrides to shared video prompt preferences. |
| Dayflow/Dayflow/Views/UI/RetryCoordinator.swift | Updates progress labels to “1/2”, “2/2”. |
| Dayflow/Dayflow/Views/Onboarding/TestConnectionView.swift | Generalizes connection testing across supported providers (Gemini + Doubao). |
| Dayflow/Dayflow/Views/Onboarding/OnboardingLLMSelectionView.swift | Adds Doubao provider card and makes card layout dynamic. |
| Dayflow/Dayflow/Views/Onboarding/LLMProviderSetupView.swift | Adds Doubao onboarding steps + base URL/model id inputs and persistence. |
| Dayflow/Dayflow/System/AnalyticsService.swift | Adds local logging alongside PostHog capture. |
| Dayflow/Dayflow/Core/Analysis/TimeParsing.swift | Adds shared LLM timestamp, transcript decode, and timeline validation utilities. |
| Dayflow/Dayflow/Core/AI/PromptPreferences.swift | New shared prompt overrides + templates for video providers. |
| Dayflow/Dayflow/Core/AI/LLMTypes.swift | Adds .doubaoArk provider type/id and Doubao preferences keys/defaults. |
| Dayflow/Dayflow/Core/AI/LLMService.swift | Instantiates and routes to DoubaoArkProvider for batch + text operations. |
| Dayflow/Dayflow/Core/AI/GeminiPromptPreferences.swift | Updates prompt example formatting (colon after time range). |
| Dayflow/Dayflow/Core/AI/GeminiDirectProvider.swift | Uses shared prompt templates and shared transcript/timeline validation utilities. |
| Dayflow/Dayflow/Core/AI/DoubaoArkProvider.swift | Implements Ark OpenAI-compatible chat-completions provider with video_url-based transcription and card generation. |
| Dayflow/Dayflow/Core/AI/ChatCLIPromptPreferences.swift | Updates prompt example formatting (colon after time range). |
| Dayflow/Dayflow/App/AppDelegate.swift | Adds Doubao provider label to analytics event payload. |
| Dayflow/Dayflow.xcodeproj/project.pbxproj | Updates code signing team setting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if let apiKey = KeychainManager.shared.retrieve(for: "doubao"), !apiKey.isEmpty { | ||
| return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId) |
There was a problem hiding this comment.
makeDoubaoProvider only checks !apiKey.isEmpty without trimming whitespace/newlines. Elsewhere (setup + settings) keys are trimmed before validation. Consider trimming here too so a whitespace-only key doesn’t create a provider that will fail every request.
| if let apiKey = KeychainManager.shared.retrieve(for: "doubao"), !apiKey.isEmpty { | |
| return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId) | |
| if let rawApiKey = KeychainManager.shared.retrieve(for: "doubao") { | |
| let apiKey = rawApiKey.trimmingCharacters(in: .whitespacesAndNewlines) | |
| if !apiKey.isEmpty { | |
| return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId) | |
| } |
| enum GeminiPromptDefaults { | ||
| static let titleBlock = """ | ||
| Titles | ||
| Each title is a memory trigger. Be specific enough that it could only describe one situation. | ||
| "Bug fixes" could be anything. "Fixed the infinite scroll crash on search results" can only be one thing. | ||
| "Gaming session" could be any day. "League ARAM — Thresh and Jinx" is a specific session. | ||
|
|
||
| Use honest verbs | ||
| The verb matters. Pick the one that describes what actually happened, not the one that sounds most professional. | ||
| If someone was browsing a product page and picking options, they were "speccing out" a purchase — not "configuring" it (that implies they already own it). If someone scheduled a meeting, they "scheduled" it — not "coordinated" it. If someone was scrolling a feed, they were "scrolling" — not "catching up on industry news." | ||
| The wrong verb changes the memory. Get it right even if it sounds less impressive. | ||
|
|
||
| Accuracy over polish | ||
| Don't compress what happened into a technical-sounding phrase that loses the meaning. If the actual bug was "the notification wasn't showing up after regeneration," say that — don't abstract it into "verification pipeline error" because it sounds more engineered. | ||
| The title's job is to be TRUE and SPECIFIC, not to sound smart. When in doubt, describe the actual problem or action in plain language. | ||
|
|
||
| Titles can be longer | ||
| A title that's a few words longer but triggers a real memory beats a short vague one every time. Don't trim useful detail for brevity. Aim for roughly 5–15 words — but if word 12 is the one that makes you remember, keep it. | ||
|
|
||
| Banned words | ||
| These are corporate filler. No human writes them in a journal: | ||
| "research," "coordination," "management," "administration," "workflow," "sync," "alignment," "exploration," "investigation," "project development," "social chat," "various," "multiple," "several," "deep dive," "rabbit hole" | ||
| Don't just avoid these exact words — avoid the energy. "Analyzing" is just "research" in a lab coat. "Refining" is just "working on" trying to sound important. "Coordinated" is "scheduled" wearing a tie. If you wouldn't say it out loud to a friend, it's too formal. | ||
|
|
There was a problem hiding this comment.
GeminiPromptDefaults is declared here but also already exists in Core/AI/GeminiPromptPreferences.swift, which will cause a duplicate symbol / ambiguous type compile error when using file-system–synchronized groups (both files are built). Consider moving the defaults to a single source of truth (e.g., keep GeminiPromptDefaults only in one file) and updating references accordingly, or deleting/renaming the older definitions if this file replaces them.
| enum GeminiPromptDefaults { | |
| static let titleBlock = """ | |
| Titles | |
| Each title is a memory trigger. Be specific enough that it could only describe one situation. | |
| "Bug fixes" could be anything. "Fixed the infinite scroll crash on search results" can only be one thing. | |
| "Gaming session" could be any day. "League ARAM — Thresh and Jinx" is a specific session. | |
| Use honest verbs | |
| The verb matters. Pick the one that describes what actually happened, not the one that sounds most professional. | |
| If someone was browsing a product page and picking options, they were "speccing out" a purchase — not "configuring" it (that implies they already own it). If someone scheduled a meeting, they "scheduled" it — not "coordinated" it. If someone was scrolling a feed, they were "scrolling" — not "catching up on industry news." | |
| The wrong verb changes the memory. Get it right even if it sounds less impressive. | |
| Accuracy over polish | |
| Don't compress what happened into a technical-sounding phrase that loses the meaning. If the actual bug was "the notification wasn't showing up after regeneration," say that — don't abstract it into "verification pipeline error" because it sounds more engineered. | |
| The title's job is to be TRUE and SPECIFIC, not to sound smart. When in doubt, describe the actual problem or action in plain language. | |
| Titles can be longer | |
| A title that's a few words longer but triggers a real memory beats a short vague one every time. Don't trim useful detail for brevity. Aim for roughly 5–15 words — but if word 12 is the one that makes you remember, keep it. | |
| Banned words | |
| These are corporate filler. No human writes them in a journal: | |
| "research," "coordination," "management," "administration," "workflow," "sync," "alignment," "exploration," "investigation," "project development," "social chat," "various," "multiple," "several," "deep dive," "rabbit hole" | |
| Don't just avoid these exact words — avoid the energy. "Analyzing" is just "research" in a lab coat. "Refining" is just "working on" trying to sound important. "Coordinated" is "scheduled" wearing a tie. If you wouldn't say it out loud to a friend, it's too formal. | |
| // NOTE: GeminiPromptDefaults is defined in Core/AI/GeminiPromptPreferences.swift | |
| // to provide a single source of truth for these defaults and avoid duplicate symbols. |
| Bad example: | ||
| "7:00 AM - 7:30 AM writing Notion doc | ||
| 7:30 AM - 7:35 AM Slack | ||
| 7:30 AM - 7:35: AM Slack |
There was a problem hiding this comment.
The example timestamp format has a typo: 7:30 AM - 7:35: AM Slack includes an extra colon after 7:35, which breaks the intended [H:MM AM/PM] - [H:MM AM/PM]: ... pattern.
| 7:30 AM - 7:35: AM Slack | |
| 7:30 AM - 7:35 AM Slack |
| CURRENT_PROJECT_VERSION = 95; | ||
| DEVELOPMENT_ASSET_PATHS = "\"Dayflow/Preview Content\""; | ||
| DEVELOPMENT_TEAM = L75WYD8X4Y; | ||
| DEVELOPMENT_TEAM = GWKJRP4KM9; |
There was a problem hiding this comment.
This change updates DEVELOPMENT_TEAM in the project file. That setting is usually machine/account-specific and will break code signing for other contributors/CI unless intentionally coordinated. Consider reverting this change or moving it to a separate, explicitly-scoped signing/CI PR.
| DEVELOPMENT_TEAM = GWKJRP4KM9; |
| /// Cached DateFormatters for time parsing - creating DateFormatters is expensive (ICU initialization) | ||
| private let cachedHMMAFormatters: [DateFormatter] = { | ||
| let formats = [ | ||
| "h:mma", // 09:30AM, 9:30AM | ||
| "hh:mma", // 09:30AM | ||
| "h:mm a", // 09:30 AM, 9:30 AM | ||
| "hh:mm a" // 09:30 AM | ||
| ] | ||
| return formats.map { format in | ||
| let formatter = DateFormatter() | ||
| formatter.locale = Locale(identifier: "en_US_POSIX") // Essential for AM/PM parsing | ||
| formatter.dateFormat = format | ||
| return formatter | ||
| } | ||
| let formats = [ | ||
| "h:mma", // 09:30AM, 9:30AM | ||
| "hh:mma", // 09:30AM | ||
| "h:mm a", // 09:30 AM, 9:30 AM | ||
| "hh:mm a", // 09:30 AM | ||
| ] | ||
| return formats.map { format in | ||
| let formatter = DateFormatter() | ||
| formatter.locale = Locale(identifier: "en_US_POSIX") // Essential for AM/PM parsing | ||
| formatter.dateFormat = format | ||
| return formatter | ||
| } |
There was a problem hiding this comment.
cachedHMMAFormatters stores shared DateFormatter instances, but DateFormatter is not thread-safe and parseTimeHMMA is called from nonisolated/background contexts (e.g., timeline validation during LLM work). This can lead to data races/crashes. Consider either creating a fresh formatter per call, or protecting shared formatters behind a lock/queue, or switching to a thread-safe parsing strategy.
| /// Shared utilities for decoding model-generated transcript JSON and converting it into Dayflow observations. | ||
| /// | ||
| /// Providers differ in transport (Gemini schema-constrained vs Ark chat-completions), but the transcript payload | ||
| /// is intentionally normalized to the same `[{ startTimestamp, endTimestamp, description }]` array. | ||
| enum LLMTranscriptUtilities { | ||
| struct VideoTranscriptChunk: Codable { | ||
| let startTimestamp: String | ||
| let endTimestamp: String | ||
| let description: String |
There was a problem hiding this comment.
New transcript/timeline utilities are introduced here, but the repo already has unit tests for TimeParsing (TimeParsingTests.swift). Consider adding targeted tests for the new JSON decoding + bracket-fallback extraction and timestamp expansion/filtering behavior to prevent regressions.
| let json = jsonString(properties) | ||
| let line = truncate("[Analytics] \(event) \(json)") | ||
| print(line) | ||
| localLogger.info("\(line, privacy: .public)") |
There was a problem hiding this comment.
logLocal prints every analytics event to stdout and logs it with privacy: .public. This can be very noisy in production and risks leaking sensitive/PII if it slips past sanitize. Consider gating local logging behind #if DEBUG (or a dedicated runtime flag) and using .private (or omitting properties) for OSLog privacy.
| let json = jsonString(properties) | |
| let line = truncate("[Analytics] \(event) \(json)") | |
| print(line) | |
| localLogger.info("\(line, privacy: .public)") | |
| #if DEBUG | |
| let json = jsonString(properties) | |
| let line = truncate("[Analytics] \(event) \(json)") | |
| print(line) | |
| localLogger.info("\(line, privacy: .private)") | |
| #else | |
| _ = event | |
| _ = properties | |
| #endif |
|
unfortunately i don't think it makes sense to merge this for now - my current bar is that anything merged should be useful to at least 10% of users and i don't think doubao is very popular right now. could change in the future though! obviously feel free to use it in your own fork though! |
Ensures every AnalyticsService event is sent to PostHog, printed to stdout, and emitted via Apple Unified Logging.
Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing. * use Ark video understanding for screenshot transcription * Send a single timelapse MP4 via Ark chat-completions `video_url` instead of attaching per-frame `image_url`, and expand video timestamps back to real time. * refactor: combine video-related logic * Centralize transcript JSON decoding and transcript→observation conversion so Gemini and Doubao use the same validation/expansion logic.
@JerryZLiu Makes total sense! I split this patch into 2:
Could we please land #226? I think it's an improvement for this project as well as making my fork easier to maintain. |
Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing.
use Ark video understanding for screenshot transcription
Send a single timelapse MP4 via Ark chat-completions
video_urlinstead of attaching per-frameimage_url, and expand video timestamps back to real time.refactor: combine video-related logic
Centralize transcript JSON decoding and transcript→observation conversion so Gemini and Doubao use the same validation/expansion logic.
Stack created with Sapling. Best reviewed with ReviewStack.