feat: add Doubao (Ark) LLM provider #212

Open
vegerot wants to merge 4 commits into JerryZLiu:main from vegerot:pr212

Conversation

vegerot commented Feb 20, 2026

Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing.

  • Use Ark video understanding for screenshot transcription.

  • Send a single timelapse MP4 via Ark chat-completions `video_url` instead of attaching per-frame `image_url` parts, then expand the video timestamps back to real time.

  • Use structured output for Doubao.
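
The timestamp expansion mentioned in the list above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function names and the `speedup` factor (real seconds represented by one second of timelapse video) are assumptions introduced here.

```swift
import Foundation

/// Parses "MM:SS" or "HH:MM:SS" into whole seconds; returns nil for malformed input.
func parseVideoTimestamp(_ text: String) -> Int? {
    let parts = text.split(separator: ":", omittingEmptySubsequences: false)
    guard (2...3).contains(parts.count) else { return nil }
    var total = 0
    for part in parts {
        guard let value = Int(part), value >= 0 else { return nil }
        total = total * 60 + value  // fold hours -> minutes -> seconds
    }
    return total
}

/// Maps a position inside the compressed timelapse back to wall-clock time.
/// `speedup` is a hypothetical compression factor, not a value taken from the PR.
func expandToRealTime(videoTimestamp: String, batchStart: Date, speedup: Double) -> Date? {
    guard let videoSeconds = parseVideoTimestamp(videoTimestamp) else { return nil }
    return batchStart.addingTimeInterval(Double(videoSeconds) * speedup)
}
```

With a speedup of 20, a model-reported timestamp of `01:30` (90 video seconds) would correspond to 1800 seconds after the start of the batch.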

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Stack created with Sapling. Best reviewed with ReviewStack.

Copilot AI left a comment

Pull request overview

This PR expands Dayflow’s LLM infrastructure to support structured JSON output (schemas + shared prompt/templates + shared transcript/timeline validation) and adds Volcengine Ark/Doubao as an additional provider option across onboarding and settings.

Changes:

  • Add Ark/Doubao provider end-to-end (selection, setup, test-connection, routing, and service/provider implementation).
  • Introduce shared prompt/schema utilities and shared transcript/timeline parsing + validation helpers to standardize provider output.
  • Add local analytics event logging alongside PostHog capture.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| Dayflow/Dayflow/Views/UI/Settings/SettingsProvidersTabView.swift | Wires provider-specific connection tests (Gemini vs Doubao). |
| Dayflow/Dayflow/Views/UI/Settings/ProvidersSettingsViewModel.swift | Adds Doubao provider configuration/visibility + migrates prompt overrides storage API to VideoPromptPreferences. |
| Dayflow/Dayflow/Views/UI/RetryCoordinator.swift | Updates step labels to show progress (1/2, 2/2). |
| Dayflow/Dayflow/Views/Onboarding/TestConnectionView.swift | Generalizes connection testing to multiple providers and adds Doubao test logic. |
| Dayflow/Dayflow/Views/Onboarding/OnboardingLLMSelectionView.swift | Adds Doubao onboarding card and makes card sizing dynamic based on provider count. |
| Dayflow/Dayflow/Views/Onboarding/LLMProviderSetupView.swift | Adds Doubao onboarding setup flow (key + base URL + model ID + test). |
| Dayflow/Dayflow/System/AnalyticsService.swift | Routes analytics through a new “PostHog + local log” capture path. |
| Dayflow/Dayflow/Core/Analysis/TimeParsing.swift | Adds shared timestamp parsing, transcript decoding, and timeline validation utilities. |
| Dayflow/Dayflow/Core/AI/PromptPreferences.swift | Introduces shared prompt override storage and shared prompt templates. |
| Dayflow/Dayflow/Core/AI/LLMTypes.swift | Adds Doubao provider types + preferences keys/defaults. |
| Dayflow/Dayflow/Core/AI/LLMService.swift | Adds provider creation/routing for Doubao for both batch and text actions. |
| Dayflow/Dayflow/Core/AI/LLMSchema.swift | Adds reusable JSON schema strings for structured output. |
| Dayflow/Dayflow/Core/AI/GeminiPromptPreferences.swift | Removes Gemini-specific prompt override storage (replaced by shared Video prompt prefs). |
| Dayflow/Dayflow/Core/AI/GeminiDirectProvider.swift | Switches to shared prompt/templates/utilities; adds schema-driven structured output configuration. |
| Dayflow/Dayflow/Core/AI/DoubaoArkProvider.swift | Adds Ark/Doubao OpenAI-compatible provider implementation (chat-completions + schema + video transcription + card generation). |
| Dayflow/Dayflow/Core/AI/ChatCLIPromptPreferences.swift | Tweaks detailed-summary format example lines. |
| Dayflow/Dayflow/App/AppDelegate.swift | Includes Doubao in analysis_job_started provider analytics. |
| Dayflow/Dayflow.xcodeproj/project.pbxproj | Updates signing team ID. |

Comment on lines 910 to 912
  private func geminiTranscribeRequest(fileURI: String, mimeType: String, prompt: String, batchId: Int64?, groupId: String, model: GeminiModel, attempt: Int) async throws -> (String, String) {
-     let transcriptionSchema: [String:Any] = [
-         "type":"ARRAY",
-         "items": [
-             "type":"OBJECT",
-             "properties":[
-                 "startTimestamp":["type":"STRING"],
-                 "endTimestamp": ["type":"STRING"],
-                 "description": ["type":"STRING"]
-             ],
-             "required":["startTimestamp","endTimestamp","description"],
-             "propertyOrdering":["startTimestamp","endTimestamp","description"]
-         ]
-     ]
-
+     let transcriptionSchemaObject = try! JSONSerialization.jsonObject(with: Data(LLMSchema.screenRecordingTranscriptionSchema.utf8))
      let generationConfig: [String: Any] = [

Copilot AI Mar 4, 2026

Using try! to parse LLMSchema.screenRecordingTranscriptionSchema will crash the app if the schema string ever becomes invalid JSON (including from future edits/merge conflicts). Prefer a non-crashing path (e.g., guard let ... else { ... }) and fail the request gracefully or omit the schema in that case.
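
A non-crashing variant of the schema parse could look like the sketch below. The helper names `parseSchema` and `makeGenerationConfig` are hypothetical and not part of the PR; only the `responseMimeType` and `responseJsonSchema` keys come from the diff under review. The sketch takes the "omit the schema" option rather than failing the request.

```swift
import Foundation

/// Parses a JSON schema string without crashing.
/// Returns nil when the string is not valid JSON (hypothetical helper).
func parseSchema(_ schemaJSON: String) -> Any? {
    return try? JSONSerialization.jsonObject(with: Data(schemaJSON.utf8))
}

/// Builds a generation config and omits the schema key when parsing fails,
/// so a broken schema string degrades the request instead of crashing the app.
func makeGenerationConfig(schemaJSON: String) -> [String: Any] {
    var config: [String: Any] = ["responseMimeType": "application/json"]
    if let schema = parseSchema(schemaJSON) {
        config["responseJsonSchema"] = schema
    }
    return config
}
```

Whether to omit the schema or fail the request with a thrown error is a design choice; omitting keeps the request alive but loses the structured-output guarantee.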

Copilot uses AI. Check for mistakes.
Comment on lines +1499 to +1500
"responseMimeType": "application/json",
"responseJsonSchema": activityCardsSchemaObject

Copilot AI Mar 4, 2026

activityCardsSchemaObject is optional, but is passed into generationConfig as Any here. If it remains an Optional(...) at runtime, JSONSerialization.data(withJSONObject:) will throw because Optionals are not valid JSON objects. Unwrap it before inserting into the dictionary (or omit the key when nil).
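
One way to keep an Optional out of the dictionary is to insert the key only when a value is present. This is a sketch under assumptions: `setIfPresent` and `makeRequestConfig` are illustrative names, not functions from the PR.

```swift
import Foundation

/// Inserts `value` under `key` only when it is non-nil, so the dictionary
/// never stores an Optional boxed as `Any` (hypothetical helper).
func setIfPresent(_ value: Any?, forKey key: String, in dict: inout [String: Any]) {
    guard let value = value else { return }
    dict[key] = value
}

/// Builds and serializes a config; the schema key is simply absent when `schema` is nil.
func makeRequestConfig(schema: Any?) -> Data? {
    var generationConfig: [String: Any] = ["responseMimeType": "application/json"]
    setIfPresent(schema, forKey: "responseJsonSchema", in: &generationConfig)
    // Check validity up front rather than relying on a thrown error.
    guard JSONSerialization.isValidJSONObject(generationConfig) else { return nil }
    return try? JSONSerialization.data(withJSONObject: generationConfig)
}
```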

Comment on lines 224 to 228
private func captureToPostHogAndLocal(_ name: String, properties: [String: Any]) {
Task.detached(priority: .utility) {
PostHogSDK.shared.capture("person_props_updated", properties: payload)
self.logLocal(name, properties: properties)
PostHogSDK.shared.capture(name, properties: properties)
}

Copilot AI Mar 4, 2026

captureToPostHogAndLocal logs the provided properties without re-sanitizing them. Today most call sites pass already-sanitized payloads, but this makes it easy for a future call site to accidentally log sensitive keys locally (via print/OSLog). Consider sanitizing inside captureToPostHogAndLocal as a defense-in-depth measure.
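
The defense-in-depth idea can be sketched as a sanitizing step applied once, before any sink sees the payload. Everything here is an assumption for illustration: the deny-list, the `sanitize` function, and the sink-closure shape are hypothetical, and the real AnalyticsService may sanitize differently.

```swift
import Foundation

/// Hypothetical deny-list of key fragments; deliberately coarse.
let sensitiveKeyFragments = ["key", "token", "secret", "password"]

/// Drops every property whose name contains a sensitive fragment.
func sanitize(_ properties: [String: Any]) -> [String: Any] {
    return properties.filter { entry in
        let name = entry.key.lowercased()
        return !sensitiveKeyFragments.contains(where: { name.contains($0) })
    }
}

/// Sanitizes once, then hands the same safe payload to every sink
/// (for example a remote capture call and a local logger).
func captureToSinks(_ name: String,
                    properties: [String: Any],
                    sinks: [(String, [String: Any]) -> Void]) {
    let safe = sanitize(properties)
    for sink in sinks {
        sink(name, safe)
    }
}
```

Sanitizing inside the shared capture path means a future call site cannot bypass it by forgetting to pre-sanitize.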

Comment on lines +233 to +234
let line = truncate("[Analytics] \(event) \(json)")
print(line)

Copilot AI Mar 4, 2026

logLocal unconditionally prints every analytics event to stdout. This can produce a lot of noise in production builds and may have performance/operational impact; consider gating print behind #if DEBUG (and rely on Logger for release) or making local logging configurable.

Suggested change
- let line = truncate("[Analytics] \(event) \(json)")
- print(line)
+ let line = truncate("[Analytics] \(event) \(json)")
+ #if DEBUG
+ print(line)
+ #endif

  Bad example:
  "7:00 AM - 7:30 AM writing Notion doc
- 7:30 AM - 7:35 AM Slack
+ 7:30 AM - 7:35: AM Slack

Copilot AI Mar 4, 2026

The example line has a formatting typo: 7:30 AM - 7:35: AM Slack (extra colon after 7:35). This will be copied by models/users and should match the intended [start] - [end]: ... format.

Suggested change
- 7:30 AM - 7:35: AM Slack
+ 7:30 AM - 7:35 AM: Slack

  CURRENT_PROJECT_VERSION = 95;
  DEVELOPMENT_ASSET_PATHS = "\"Dayflow/Preview Content\"";
- DEVELOPMENT_TEAM = L75WYD8X4Y;
+ DEVELOPMENT_TEAM = GWKJRP4KM9;

Copilot AI Mar 4, 2026

This changes the Xcode project DEVELOPMENT_TEAM value. Committing a team ID update typically breaks code signing for other developers/CI and is usually environment-specific; suggest reverting this change unless the repo explicitly intends to rotate the shared team ID.

Suggested change
- DEVELOPMENT_TEAM = GWKJRP4KM9;
+ DEVELOPMENT_TEAM = "";

Comment on lines 4 to +8
  private let cachedHMMAFormatters: [DateFormatter] = {
-     let formats = [
-         "h:mma", // 09:30AM, 9:30AM
-         "hh:mma", // 09:30AM
-         "h:mm a", // 09:30 AM, 9:30 AM
-         "hh:mm a" // 09:30 AM
-     ]
-     return formats.map { format in
-         let formatter = DateFormatter()
-         formatter.locale = Locale(identifier: "en_US_POSIX") // Essential for AM/PM parsing
-         formatter.dateFormat = format
-         return formatter
-     }
+   let formats = [
+     "h:mma", // 09:30AM, 9:30AM
+     "hh:mma", // 09:30AM
+     "h:mm a", // 09:30 AM, 9:30 AM

Copilot AI Mar 4, 2026

TimeParsing.swift is using 2-space indentation in this section, while most Swift files in this repo use 4-space indentation (e.g., AppDelegate.swift). Consider reformatting to match the prevailing style to avoid churn in future diffs.

vegerot force-pushed the pr212 branch 3 times, most recently from 9bc81d6 to db963e3 (March 5, 2026 02:02)
vegerot changed the title from "feat: use structured output to improve responses" to "feat: add Doubao (Ark) LLM provider" (Mar 5, 2026)
vegerot and others added 4 commits (March 6, 2026 16:12)
- Extract shared prompt templates into LLMPromptTemplates (GeminiPromptPreferences.swift)
- Add VideoPromptPreferences/VideoPromptOverrides/VideoPromptSections types,
  replacing GeminiPromptPreferences/GeminiPromptOverrides/GeminiPromptSections
- Centralize transcript JSON decoding and observation conversion in
  LLMTranscriptUtilities (TimeParsing.swift) for reuse across providers
- Refactor GeminiDirectProvider to use LLMPromptTemplates and LLMTranscriptUtilities
- Refactor TestConnectionView to accept a provider parameter with
  finishFailure/finishSuccess helpers for clean multi-provider support
- Fix OnboardingLLMSelectionView card-width calculation to be dynamic
  based on card count rather than hard-coded divisor of 3
- Update SettingsProvidersTabView and ProvidersSettingsViewModel to use
  new VideoPrompt* types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ensures every AnalyticsService event is sent to PostHog, printed to stdout, and emitted via Apple Unified Logging.

Adds Volcengine Ark/Doubao as a selectable provider with onboarding + settings wiring and connection testing.

* use Ark video understanding for screenshot transcription

* Send a single timelapse MP4 via Ark chat-completions `video_url` instead of attaching per-frame `image_url`, and expand video timestamps back to real time.

* use structured output for Doubao

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2 participants