Add iMessage sync support#224

Open
sternryan wants to merge 3 commits into wesm:main from sternryan:imessage-upstream

@sternryan

Summary

  • New sync-imessage command that reads macOS ~/Library/Messages/chat.db and imports messages into msgvault
  • Implements gmail.API interface over SQLite chat.db for plug-and-play integration with existing sync infrastructure
  • Uses existing SourceType field in sync.Options (added by IMAP support)

New files

  • internal/imessage/client.go - gmail.API implementation over chat.db
  • internal/imessage/parser.go - MIME builder, timestamp conversion
  • internal/imessage/models.go - chat.db row types
  • internal/imessage/parser_test.go - tests for parsing utilities
  • cmd/msgvault/cmd/sync_imessage.go - CLI command
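
For readers unfamiliar with the timestamp conversion that parser.go is described as handling, here is a minimal sketch. It is not the PR's code. It assumes chat.db's message.date counts from the Cocoa reference date (2001-01-01 UTC), in nanoseconds on newer macOS and in whole seconds on older releases. The function name appleTimeToUTC and the 1e11 threshold for telling the two apart are illustrative choices.

```go
package main

import (
	"fmt"
	"time"
)

// appleEpoch is the Cocoa reference date, 2001-01-01 00:00:00 UTC.
var appleEpoch = time.Date(2001, time.January, 1, 0, 0, 0, 0, time.UTC)

// appleTimeToUTC converts a chat.db message.date value to a UTC time.
// Assumption: values above 1e11 are nanoseconds since the Cocoa epoch
// (newer macOS); smaller values are whole seconds (older macOS).
// The threshold is a heuristic, not something taken from the PR.
func appleTimeToUTC(v int64) time.Time {
	const nanosThreshold = 100_000_000_000 // 1e11
	const nanosPerSecond = 1_000_000_000
	if v > nanosThreshold {
		// Split into seconds and nanoseconds to avoid Duration overflow.
		return time.Unix(appleEpoch.Unix()+v/nanosPerSecond, v%nanosPerSecond).UTC()
	}
	return time.Unix(appleEpoch.Unix()+v, 0).UTC()
}

func main() {
	fmt.Println(appleTimeToUTC(0).Format(time.RFC3339))
	fmt.Println(appleTimeToUTC(86_400).Format(time.RFC3339))               // seconds form
	fmt.Println(appleTimeToUTC(86_400 * 1_000_000_000).Format(time.RFC3339)) // nanoseconds form
}
```

Both the seconds and nanoseconds inputs in main describe the same instant, one day after the epoch.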

Usage

msgvault sync-imessage

Tested against macOS Messages database with ~10k conversations.

Implement a sync-imessage command that reads from macOS's
~/Library/Messages/chat.db and imports messages into the msgvault
archive using the existing sync infrastructure.

New files:
- internal/imessage/client.go: gmail.API implementation over chat.db
- internal/imessage/parser.go: MIME builder, timestamp conversion
- internal/imessage/models.go: chat.db row types
- internal/imessage/parser_test.go: tests for parsing utilities
- cmd/msgvault/cmd/sync_imessage.go: CLI command

Uses existing SourceType field in sync.Options (added by IMAP support).
@roborev-ci

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (0508613)

Summary Verdict: Adds a new sync-imessage client but contains high-severity data loss issues regarding missing attachments and attributedBody parsing, alongside bugs with date filtering and nil pointer dereferences.

High

  • internal/imessage/client.go:266 and 295

    • Problem: The importer only reads message.text. Many Messages DB rows keep the visible text only in
      attributedBody when text is NULL, so those messages will be archived as blank messages with empty snippets.
    • Fix: Select attributedBody and decode it as a fallback whenever message.text is null; add coverage for rows where only attributedBody is populated.
  • internal/imessage/client.go:258 and 301

    • Problem: cache_has_attachments is fetched but never used, and the code never reads the attachment tables or emits MIME parts for them. Messages with photos/files/stickers will therefore be imported without their attachments, silently losing data.
    • Fix: Read attachment metadata via the join tables and feed attachment content through the existing archive pipeline, or fail/warn explicitly until attachment support is implemented.

Medium

  • cmd/msgvault/cmd/sync_imessage.go:80 and 88

    • Problem: --after and --before are parsed with time.Parse, which treats YYYY-MM-DD as midnight UTC. On a local macOS CLI this shifts the intended day boundary
      for non-UTC users, causing messages around midnight local time to be included or excluded incorrectly.
    • Fix: Parse date-only filters with time.ParseInLocation(..., time.Local) so filtering matches the user’s local calendar day.
  • internal/imessage/client.go:340

    • Problem: GetMessagesRawBatch pre-allocates the results slice and continues on errors, leaving nil entries in the slice. This may cause nil pointer dereferences when the caller iterates over the results.
    • Fix: Initialize var results []*gmail.RawMessage and append(results, msg) for successfully fetched messages.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- Parse --after/--before dates with time.ParseInLocation for local
  timezone instead of UTC (fixes midnight boundary off-by-one)
- Switch GetMessagesRawBatch from pre-allocated indexed slice to
  append-based, eliminating nil entries on fetch errors
- Log Warn when messages have attachments that won't be archived
  (attachment extraction not yet implemented)
@sternryan
Author

Addressed roborev feedback in 3357a07:

High — attributedBody (line 266, 295): Already handled — GetMessageRaw falls back to extractAttributedBodyText(msg.AttributedBody) when msg.Text is nil (lines 307-312). The attributedBody column is selected in the query and the decoder handles NSKeyedArchiver binary plists from macOS Ventura+.

High — attachments (line 258, 301): Added Warn-level log when cache_has_attachments != 0, making the data gap explicit at runtime. Full attachment extraction (reading from attachment/message_attachment_join tables) is planned as a follow-up.

Medium — date filter timezone (line 80, 88): Switched time.Parse to time.ParseInLocation("2006-01-02", ..., time.Local) so --after 2024-01-01 resolves to midnight local time, not midnight UTC.
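
A small sketch of the difference between the two parsing calls, for illustration only. The helper name parseLocalDate is hypothetical and is not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// parseLocalDate (hypothetical helper) parses a YYYY-MM-DD flag value as
// midnight in the machine's local time zone instead of midnight UTC.
func parseLocalDate(s string) (time.Time, error) {
	return time.ParseInLocation("2006-01-02", s, time.Local)
}

func main() {
	local, err := parseLocalDate("2024-01-01")
	if err != nil {
		fmt.Println("invalid date:", err)
		return
	}
	// time.Parse without zone information in the layout yields UTC.
	utc, _ := time.Parse("2006-01-02", "2024-01-01")
	fmt.Println("local midnight:", local)
	fmt.Println("UTC midnight:  ", utc)
	fmt.Println("difference:    ", local.Sub(utc))
}
```

On a machine whose zone is not UTC, the printed difference equals the local UTC offset, which is the size of the day-boundary shift the review describes.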

Medium — nil entries in batch (line 340): Changed GetMessagesRawBatch from pre-allocated indexed slice (make(len) + results[i] = msg) to append-based (make(0, cap) + append), producing compact nil-free results.
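
The append-based pattern described above could look roughly like the sketch below. RawMessage stands in for gmail.RawMessage, and fetchBatch and its fetch callback are hypothetical names used so the example runs on its own. The real GetMessagesRawBatch signature is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// RawMessage stands in for gmail.RawMessage; the ID field is illustrative.
type RawMessage struct{ ID string }

// fetchBatch calls fetch for every ID and returns only the messages that
// were fetched successfully, so the result never contains nil entries.
func fetchBatch(ids []string, fetch func(string) (*RawMessage, error)) []*RawMessage {
	// Length 0, capacity len(ids): no nil placeholders are created up front.
	results := make([]*RawMessage, 0, len(ids))
	for _, id := range ids {
		msg, err := fetch(id)
		if err != nil || msg == nil {
			continue // skip the failure instead of leaving a nil slot
		}
		results = append(results, msg)
	}
	return results
}

func main() {
	fetch := func(id string) (*RawMessage, error) {
		if id == "bad" {
			return nil, errors.New("row not found")
		}
		return &RawMessage{ID: id}, nil
	}
	for _, m := range fetchBatch([]string{"a", "bad", "b"}, fetch) {
		fmt.Println(m.ID)
	}
}
```

One consequence of this design is that result positions no longer line up with the input IDs, so a caller that needs to correlate them has to match on the message's own identifier.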

@roborev-ci

roborev-ci bot commented Mar 29, 2026

roborev: Combined Review (3357a07)

Summary verdict: PR is not ready to merge due to 3 medium-severity correctness issues in the iMessage import path.

Medium

  • Blank-body imports for messages stored outside message.text
    Location: internal/imessage/client.go
    Problem: The importer only reads message.text and converts NULL to an empty body. In chat.db, some visible messages store content only in attributedBody or other non-text fields, so these messages will be archived as blank even though they contain user-visible text.
    Fix: Decode attributedBody as a fallback when text is NULL, or explicitly detect and skip unsupported message shapes instead of silently importing empty messages.

  • Invalid Message-ID generation from raw iMessage GUIDs
    Location: internal/imessage/parser.go
    Problem: buildMIME inserts the raw iMessage GUID directly into Message-ID. iMessage GUIDs commonly contain characters like : and /, which are not valid in an unquoted RFC 5322 msg-id local part, so the generated header can be syntactically invalid and rejected or rewritten by MIME tooling.
    Fix: Sanitize the GUID before using it in Message-ID (for example, hash or base64url encode it), or omit the header if no valid stable value is available.

  • NULL service values cause message rows to be skipped
    Location: internal/imessage/models.go, internal/imessage/client.go
    Problem: The message.service column can be NULL for system messages or placeholders. Scanning that into a non-pointer string causes Scan to fail, which results in those messages being silently skipped during sync.
    Fix: Make messageRow.Service nullable (for example *string or sql.NullString) and handle the missing value safely when assigning labelIDs.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- Decode attributedBody as fallback when message.text is NULL
  (macOS Ventura+ stores content exclusively in NSKeyedArchiver blob)
- Hash iMessage GUIDs for Message-ID header (raw GUIDs contain ':'
  and '/' which are invalid in RFC 5322 msg-id local-part)
- Make messageRow.Service nullable (*string) to handle NULL service
  column on system messages instead of failing Scan
- Add extractAttributedBodyText with NSKeyedArchiver plist decoder
- Add tests for attributedBody extraction and Message-ID sanitization
@sternryan sternryan requested a review from wesm as a code owner March 29, 2026 18:59
@sternryan
Author

Addressed round-2 roborev feedback in 26d2a2d:

Blank-body imports (attributedBody): Added attributedBody column to SELECT/Scan, extractAttributedBodyText() decoder for NSKeyedArchiver binary plists, and fallback logic when text is NULL. Includes test coverage with synthetic NSKeyedArchiver blobs.

Invalid Message-ID from raw GUIDs: iMessage GUIDs like p:0/ABC123 contain : and / which are invalid in RFC 5322 msg-id. Now SHA-256 hashes the GUID and uses the first 24 hex chars as the local-part. Test updated to verify raw GUIDs are not emitted.
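
As an illustration of the hashing approach described above, the sketch below hashes a GUID with SHA-256 and keeps the first 24 hex characters as the local part. The function name messageIDFor and the domain passed to it are assumptions. The thread does not say what the PR puts after the @.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// messageIDFor derives a Message-ID whose local part contains only hex
// digits, so characters such as ':' and '/' from the GUID never reach
// the header. The domain argument is a hypothetical placeholder.
func messageIDFor(guid, domain string) string {
	sum := sha256.Sum256([]byte(guid))
	local := hex.EncodeToString(sum[:])[:24] // first 24 hex chars of the digest
	return "<" + local + "@" + domain + ">"
}

func main() {
	fmt.Println(messageIDFor("p:0/ABC123", "imessage.invalid"))
}
```

Because the hash is deterministic, re-importing the same GUID yields the same Message-ID, which keeps the value stable across syncs.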

NULL service column: Changed messageRow.Service from string to *string so Scan handles NULL values from system messages instead of failing. Label assignment checks for nil before use.
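
A minimal sketch of the nil check on a nullable service field. With database/sql, scanning a NULL column into a *string destination leaves the pointer nil rather than returning an error. The helper serviceLabel and its "unknown" fallback are hypothetical, since the thread does not show how labels are actually assigned.

```go
package main

import "fmt"

// messageRow mirrors only the field discussed above; the real struct in
// models.go has more columns.
type messageRow struct {
	Service *string // nil when message.service is NULL
}

// serviceLabel returns a label for the row's service. The "unknown"
// fallback is a hypothetical choice for this sketch.
func serviceLabel(r messageRow) string {
	if r.Service == nil || *r.Service == "" {
		return "unknown"
	}
	return *r.Service
}

func main() {
	svc := "iMessage"
	fmt.Println(serviceLabel(messageRow{Service: &svc}))
	fmt.Println(serviceLabel(messageRow{})) // NULL service column
}
```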

@wesm
Owner

wesm commented Mar 29, 2026

Since we now have WhatsApp, iMessage, and Google Voice PRs, I’m going to review all three and make sure the storage layer is coherent, and I will work on getting them merged one by one, but just bear with me!

@sternryan
Author

Stumbling on this repo really helped clear up a lot of my backlog and unlocked a bunch of stuff for me, thank you so much for creating this! I'm so stoked for my first contribution!

@roborev-ci

roborev-ci bot commented Mar 29, 2026

roborev: Combined Review (26d2a2d)

Verdict: Changes are not ready to merge due to 1 high-severity issue and 3 medium-severity issues.

High

  • internal/imessage/parser.go:161
    extractAttributedBodyText assumes message.attributedBody is an NSKeyedArchiver/plist blob, but modern Messages databases often store it in Apple's typedstream/NSArchiver format. For messages where message.text is NULL, this can decode to "", causing many recent iMessages to be archived with empty bodies.
    Fix: Decode the actual typedstream format used by attributedBody, or add a robust fallback that can extract the NSString payload from real blobs. Add tests using real Ventura/Sonoma samples.
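
One possible shape for the "robust fallback" mentioned in the fix is a byte-pattern heuristic like the sketch below. It is not a typedstream decoder and is not the PR's code. It assumes the text follows the first "NSString" marker after a 5-byte preamble, prefixed either by a 1-byte length or by 0x81 and a 2-byte little-endian length. That layout is an assumption based on community tooling rather than Apple documentation, so it would need validating against real blobs. The function name extractTypedstreamText is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf8"
)

// extractTypedstreamText tries to pull the NSString payload out of a
// typedstream-encoded attributedBody blob using the assumed layout
// described in the lead-in. It reports false when the layout does not match.
func extractTypedstreamText(blob []byte) (string, bool) {
	marker := []byte("NSString")
	i := bytes.Index(blob, marker)
	if i < 0 {
		return "", false
	}
	rest := blob[i+len(marker):]
	const preamble = 5 // assumed fixed-size bytes between marker and length
	if len(rest) < preamble+1 {
		return "", false
	}
	rest = rest[preamble:]
	n := int(rest[0])
	rest = rest[1:]
	if n == 0x81 { // assumed escape: next two bytes hold the length
		if len(rest) < 2 {
			return "", false
		}
		n = int(binary.LittleEndian.Uint16(rest))
		rest = rest[2:]
	}
	if n > len(rest) {
		return "", false
	}
	text := rest[:n]
	if !utf8.Valid(text) {
		return "", false
	}
	return string(text), true
}

func main() {
	// Synthetic blob shaped like the assumed layout; not a real chat.db sample.
	blob := []byte("\x04\x0bstreamtyped")
	blob = append(blob, []byte("NSString")...)
	blob = append(blob, 0x01, 0x94, 0x84, 0x01, 0x2b) // 5-byte preamble
	blob = append(blob, 5)                            // 1-byte length
	blob = append(blob, []byte("hello")...)
	text, ok := extractTypedstreamText(blob)
	fmt.Printf("%q %v\n", text, ok)
}
```

Returning an explicit ok flag lets the caller distinguish "no text found" from a legitimately empty message, so unsupported blobs can be logged or skipped instead of archived as blank.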

Medium

  • internal/imessage/parser.go:47, internal/imessage/parser.go:98, internal/imessage/parser.go:149, cmd/msgvault/cmd/sync_imessage.go:76
    Participant identifiers are written directly into RFC 2822 headers without strict validation. Because normalizeIdentifier accepts arbitrary strings containing @, and --me is passed through unchanged, crafted values containing CR/LF can inject additional headers into archived messages.
    Fix: Reject or sanitize control characters, strictly validate addresses before header generation, and validate --me at CLI parsing time. For invalid identifiers, omit them or replace them with a safe placeholder.

  • internal/imessage/client.go:160, internal/imessage/client.go:252
    Sync queries pull every row from message without filtering iMessage reaction/system rows. Tapbacks, stickers, and similar records are stored as separate rows in chat.db, so they may be imported as standalone messages with empty or misleading content.
    Fix: Filter to actual user-visible messages, or map reaction/system rows to explicit reaction metadata instead of normal messages.

  • internal/imessage/client.go:252
    Edited and unsent messages are not handled. On newer macOS versions, those rows may have empty text and store edit state in message_summary_info; the current query does not read that column, so archived content can be missing or stale.
    Fix: Select and decode message_summary_info, use the latest edited body when present, and mark unsent messages appropriately. Add coverage for edited/unsent rows.
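
For the header-injection finding above, a sketch of the "reject control characters" option might look like this. The name safeHeaderValue is hypothetical, and the PR's normalizeIdentifier is not reproduced here. The idea is to run every participant identifier, and the --me value, through such a check before it is written into a header.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

var errUnsafeHeaderValue = errors.New("header value contains control characters")

// safeHeaderValue trims surrounding whitespace, then rejects values that
// are empty or contain CR, LF, or any other control character, so a
// crafted identifier cannot start a new header line.
func safeHeaderValue(v string) (string, error) {
	v = strings.TrimSpace(v)
	if v == "" {
		return "", errors.New("header value is empty")
	}
	for _, r := range v {
		if unicode.IsControl(r) {
			return "", errUnsafeHeaderValue
		}
	}
	return v, nil
}

func main() {
	for _, in := range []string{"+15551234567", "me@example.com\r\nBcc: evil@example.com"} {
		if v, err := safeHeaderValue(in); err != nil {
			fmt.Printf("rejected %q: %v\n", in, err)
		} else {
			fmt.Printf("accepted %q\n", v)
		}
	}
}
```

Rejecting outright, rather than stripping the offending bytes, avoids silently turning a hostile value into a different but plausible-looking address.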


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)
