Skip to content

feat: add Doubao (Ark) LLM provider#209

Closed
vegerot wants to merge 2 commits intoJerryZLiu:mainfrom
vegerot:pr209
Closed

feat: add Doubao (Ark) LLM provider#209
vegerot wants to merge 2 commits intoJerryZLiu:mainfrom
vegerot:pr209

Conversation

@vegerot
Copy link
Copy Markdown
Contributor

@vegerot vegerot commented Feb 17, 2026

Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing.

  • use Ark video understanding for screenshot transcription

  • Send a single timelapse MP4 via Ark chat-completions video_url instead of attaching per-frame image_url, and expand video timestamps back to real time.

  • refactor: combine video-related logic

  • Centralize transcript JSON decoding and transcript→observation conversion so Gemini and Doubao use the same validation/expansion logic.


Stack created with Sapling. Best reviewed with ReviewStack.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Volcengine Ark / Doubao as a new selectable LLM provider and refactors shared video/transcript + prompt logic so Gemini and Doubao can share validation and prompt templates.

Changes:

  • Adds Doubao (Ark) provider selection + onboarding/setup flow and settings wiring, including a provider-specific connection test.
  • Introduces shared utilities for video timestamp parsing, transcript JSON decoding, and timeline validation; centralizes prompt templates and prompt override persistence.
  • Adds DoubaoArkProvider and integrates it into LLMService routing and analytics labeling.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
Dayflow/Dayflow/Views/UI/Settings/SettingsProvidersTabView.swift Wires provider-specific connection test view for Gemini/Doubao.
Dayflow/Dayflow/Views/UI/Settings/ProvidersSettingsViewModel.swift Adds Doubao provider configuration + metadata; switches prompt overrides to shared video prompt preferences.
Dayflow/Dayflow/Views/UI/RetryCoordinator.swift Updates progress labels to “1/2”, “2/2”.
Dayflow/Dayflow/Views/Onboarding/TestConnectionView.swift Generalizes connection testing across supported providers (Gemini + Doubao).
Dayflow/Dayflow/Views/Onboarding/OnboardingLLMSelectionView.swift Adds Doubao provider card and makes card layout dynamic.
Dayflow/Dayflow/Views/Onboarding/LLMProviderSetupView.swift Adds Doubao onboarding steps + base URL/model id inputs and persistence.
Dayflow/Dayflow/System/AnalyticsService.swift Adds local logging alongside PostHog capture.
Dayflow/Dayflow/Core/Analysis/TimeParsing.swift Adds shared LLM timestamp, transcript decode, and timeline validation utilities.
Dayflow/Dayflow/Core/AI/PromptPreferences.swift New shared prompt overrides + templates for video providers.
Dayflow/Dayflow/Core/AI/LLMTypes.swift Adds .doubaoArk provider type/id and Doubao preferences keys/defaults.
Dayflow/Dayflow/Core/AI/LLMService.swift Instantiates and routes to DoubaoArkProvider for batch + text operations.
Dayflow/Dayflow/Core/AI/GeminiPromptPreferences.swift Updates prompt example formatting (colon after time range).
Dayflow/Dayflow/Core/AI/GeminiDirectProvider.swift Uses shared prompt templates and shared transcript/timeline validation utilities.
Dayflow/Dayflow/Core/AI/DoubaoArkProvider.swift Implements Ark OpenAI-compatible chat-completions provider with video_url-based transcription and card generation.
Dayflow/Dayflow/Core/AI/ChatCLIPromptPreferences.swift Updates prompt example formatting (colon after time range).
Dayflow/Dayflow/App/AppDelegate.swift Adds Doubao provider label to analytics event payload.
Dayflow/Dayflow.xcodeproj/project.pbxproj Updates code signing team setting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +129 to +130
if let apiKey = KeychainManager.shared.retrieve(for: "doubao"), !apiKey.isEmpty {
return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeDoubaoProvider only checks !apiKey.isEmpty without trimming whitespace/newlines. Elsewhere (setup + settings) keys are trimmed before validation. Consider trimming here too so a whitespace-only key doesn’t create a provider that will fail every request.

Suggested change
if let apiKey = KeychainManager.shared.retrieve(for: "doubao"), !apiKey.isEmpty {
return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId)
if let rawApiKey = KeychainManager.shared.retrieve(for: "doubao") {
let apiKey = rawApiKey.trimmingCharacters(in: .whitespacesAndNewlines)
if !apiKey.isEmpty {
return DoubaoArkProvider(apiKey: apiKey, endpoint: endpoint, modelId: modelId)
}

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +64
enum GeminiPromptDefaults {
static let titleBlock = """
Titles
Each title is a memory trigger. Be specific enough that it could only describe one situation.
"Bug fixes" could be anything. "Fixed the infinite scroll crash on search results" can only be one thing.
"Gaming session" could be any day. "League ARAM — Thresh and Jinx" is a specific session.

Use honest verbs
The verb matters. Pick the one that describes what actually happened, not the one that sounds most professional.
If someone was browsing a product page and picking options, they were "speccing out" a purchase — not "configuring" it (that implies they already own it). If someone scheduled a meeting, they "scheduled" it — not "coordinated" it. If someone was scrolling a feed, they were "scrolling" — not "catching up on industry news."
The wrong verb changes the memory. Get it right even if it sounds less impressive.

Accuracy over polish
Don't compress what happened into a technical-sounding phrase that loses the meaning. If the actual bug was "the notification wasn't showing up after regeneration," say that — don't abstract it into "verification pipeline error" because it sounds more engineered.
The title's job is to be TRUE and SPECIFIC, not to sound smart. When in doubt, describe the actual problem or action in plain language.

Titles can be longer
A title that's a few words longer but triggers a real memory beats a short vague one every time. Don't trim useful detail for brevity. Aim for roughly 5–15 words — but if word 12 is the one that makes you remember, keep it.

Banned words
These are corporate filler. No human writes them in a journal:
"research," "coordination," "management," "administration," "workflow," "sync," "alignment," "exploration," "investigation," "project development," "social chat," "various," "multiple," "several," "deep dive," "rabbit hole"
Don't just avoid these exact words — avoid the energy. "Analyzing" is just "research" in a lab coat. "Refining" is just "working on" trying to sound important. "Coordinated" is "scheduled" wearing a tie. If you wouldn't say it out loud to a friend, it's too formal.

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GeminiPromptDefaults is declared here but also already exists in Core/AI/GeminiPromptPreferences.swift, which will cause a duplicate symbol / ambiguous type compile error when using file-system–synchronized groups (both files are built). Consider moving the defaults to a single source of truth (e.g., keep GeminiPromptDefaults only in one file) and updating references accordingly, or deleting/renaming the older definitions if this file replaces them.

Suggested change
enum GeminiPromptDefaults {
static let titleBlock = """
Titles
Each title is a memory trigger. Be specific enough that it could only describe one situation.
"Bug fixes" could be anything. "Fixed the infinite scroll crash on search results" can only be one thing.
"Gaming session" could be any day. "League ARAM — Thresh and Jinx" is a specific session.
Use honest verbs
The verb matters. Pick the one that describes what actually happened, not the one that sounds most professional.
If someone was browsing a product page and picking options, they were "speccing out" a purchase — not "configuring" it (that implies they already own it). If someone scheduled a meeting, they "scheduled" it — not "coordinated" it. If someone was scrolling a feed, they were "scrolling" — not "catching up on industry news."
The wrong verb changes the memory. Get it right even if it sounds less impressive.
Accuracy over polish
Don't compress what happened into a technical-sounding phrase that loses the meaning. If the actual bug was "the notification wasn't showing up after regeneration," say that — don't abstract it into "verification pipeline error" because it sounds more engineered.
The title's job is to be TRUE and SPECIFIC, not to sound smart. When in doubt, describe the actual problem or action in plain language.
Titles can be longer
A title that's a few words longer but triggers a real memory beats a short vague one every time. Don't trim useful detail for brevity. Aim for roughly 515 words — but if word 12 is the one that makes you remember, keep it.
Banned words
These are corporate filler. No human writes them in a journal:
"research," "coordination," "management," "administration," "workflow," "sync," "alignment," "exploration," "investigation," "project development," "social chat," "various," "multiple," "several," "deep dive," "rabbit hole"
Don't just avoid these exact words — avoid the energy. "Analyzing" is just "research" in a lab coat. "Refining" is just "working on" trying to sound important. "Coordinated" is "scheduled" wearing a tie. If you wouldn't say it out loud to a friend, it's too formal.
// NOTE: GeminiPromptDefaults is defined in Core/AI/GeminiPromptPreferences.swift
// to provide a single source of truth for these defaults and avoid duplicate symbols.

Copilot uses AI. Check for mistakes.
Bad example:
"7:00 AM - 7:30 AM writing Notion doc
7:30 AM - 7:35 AM Slack
7:30 AM - 7:35: AM Slack
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example timestamp format has a typo: 7:30 AM - 7:35: AM Slack includes an extra colon after 7:35, which breaks the intended [H:MM AM/PM] - [H:MM AM/PM]: ... pattern.

Suggested change
7:30 AM - 7:35: AM Slack
7:30 AM - 7:35 AM Slack

Copilot uses AI. Check for mistakes.
CURRENT_PROJECT_VERSION = 95;
DEVELOPMENT_ASSET_PATHS = "\"Dayflow/Preview Content\"";
DEVELOPMENT_TEAM = L75WYD8X4Y;
DEVELOPMENT_TEAM = GWKJRP4KM9;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change updates DEVELOPMENT_TEAM in the project file. That setting is usually machine/account-specific and will break code signing for other contributors/CI unless intentionally coordinated. Consider reverting this change or moving it to a separate, explicitly-scoped signing/CI PR.

Suggested change
DEVELOPMENT_TEAM = GWKJRP4KM9;

Copilot uses AI. Check for mistakes.
Comment on lines 3 to +16
/// Cached DateFormatters for time parsing - creating DateFormatters is expensive (ICU initialization)
private let cachedHMMAFormatters: [DateFormatter] = {
let formats = [
"h:mma", // 09:30AM, 9:30AM
"hh:mma", // 09:30AM
"h:mm a", // 09:30 AM, 9:30 AM
"hh:mm a" // 09:30 AM
]
return formats.map { format in
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US_POSIX") // Essential for AM/PM parsing
formatter.dateFormat = format
return formatter
}
let formats = [
"h:mma", // 09:30AM, 9:30AM
"hh:mma", // 09:30AM
"h:mm a", // 09:30 AM, 9:30 AM
"hh:mm a", // 09:30 AM
]
return formats.map { format in
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US_POSIX") // Essential for AM/PM parsing
formatter.dateFormat = format
return formatter
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cachedHMMAFormatters stores shared DateFormatter instances, but DateFormatter is not thread-safe and parseTimeHMMA is called from nonisolated/background contexts (e.g., timeline validation during LLM work). This can lead to data races/crashes. Consider either creating a fresh formatter per call, or protecting shared formatters behind a lock/queue, or switching to a thread-safe parsing strategy.

Copilot uses AI. Check for mistakes.
Comment on lines +258 to +266
/// Shared utilities for decoding model-generated transcript JSON and converting it into Dayflow observations.
///
/// Providers differ in transport (Gemini schema-constrained vs Ark chat-completions), but the transcript payload
/// is intentionally normalized to the same `[{ startTimestamp, endTimestamp, description }]` array.
enum LLMTranscriptUtilities {
struct VideoTranscriptChunk: Codable {
let startTimestamp: String
let endTimestamp: String
let description: String
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New transcript/timeline utilities are introduced here, but the repo already has unit tests for TimeParsing (TimeParsingTests.swift). Consider adding targeted tests for the new JSON decoding + bracket-fallback extraction and timestamp expansion/filtering behavior to prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +232 to +235
let json = jsonString(properties)
let line = truncate("[Analytics] \(event) \(json)")
print(line)
localLogger.info("\(line, privacy: .public)")
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logLocal prints every analytics event to stdout and logs it with privacy: .public. This can be very noisy in production and risks leaking sensitive/PII if it slips past sanitize. Consider gating local logging behind #if DEBUG (or a dedicated runtime flag) and using .private (or omitting properties) for OSLog privacy.

Suggested change
let json = jsonString(properties)
let line = truncate("[Analytics] \(event) \(json)")
print(line)
localLogger.info("\(line, privacy: .public)")
#if DEBUG
let json = jsonString(properties)
let line = truncate("[Analytics] \(event) \(json)")
print(line)
localLogger.info("\(line, privacy: .private)")
#else
_ = event
_ = properties
#endif

Copilot uses AI. Check for mistakes.
@JerryZLiu
Copy link
Copy Markdown
Owner

unfortunately i don't think it makes sense to merge this for now - my current bar is that anything merged should be useful to at least 10% of users and i don't think doubao is very popular right now. could change in the future though! obviously feel free to use it in your own fork though!

Ensures every AnalyticsService event is sent to PostHog, printed to stdout, and emitted via Apple Unified Logging.
Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing.

* use Ark video understanding for screenshot transcription

* Send a single timelapse MP4 via Ark chat-completions `video_url` instead of attaching per-frame `image_url`, and expand video timestamps back to real time.

* refactor: combine video-related logic

* Centralize transcript JSON decoding and transcript→observation conversion so Gemini and Doubao use the same validation/expansion logic.
@vegerot
Copy link
Copy Markdown
Contributor Author

vegerot commented Mar 5, 2026

unfortunately i don't think it makes sense to merge this for now - my current bar is that anything merged should be useful to at least 10% of users and i don't think doubao is very popular right now. could change in the future though! obviously feel free to use it in your own fork though!

@JerryZLiu Makes total sense! I split this patch into 2:

  1. refactor: centralize LLM prompt templates and transcript utilities #226 - making it easier to add video-understanding LLMs in the future
  2. Add Doubao support, on my fork

Could we please land #226? I think it's an improvement for this project as well as making my fork easier to maintain.

@vegerot vegerot closed this Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants