
Seed canonical morph types and regenerate search index #2219

Draft
myieye wants to merge 17 commits into feat/sync-morph-types from claude/fix-morph-type-syncing-bQo8X

@myieye
Collaborator

@myieye myieye commented Mar 24, 2026

Summary

This PR ensures all canonical morph type definitions are seeded into new and existing CRDT projects, and regenerates the full-text search index to include morph-type prefix/postfix tokens in searchable headwords.

Key Changes

  • Added canonical morph type definitions (CanonicalMorphTypes.cs): Defines 20 standard morph types matching FieldWorks/LibLCM specifications with verified GUIDs, names, abbreviations, and affixes. Uses a frozen dictionary for immutability and fast lookups.

  • Automatic seeding on project creation and migration: Modified CrdtProjectsService and CurrentProjectService to seed canonical morph types when:

    • Creating a new project with SeedNewProjectData: true
    • Opening an existing project that lacks morph types (via MigrateDb)
  • Search index regeneration: Added migration 20260318120000_RegenerateSearchTableForMorphTypes, which clears the EntrySearchRecord table to force a full FTS rebuild with the updated morph-type affixes.

  • Legacy snapshot patching: Updated CrdtFwdataProjectSyncService to patch legacy snapshots missing morph types during sync operations, preventing duplicates when syncing with FwData.

  • Comprehensive test coverage: Added MorphTypeSeedingTests verifying:

    • New projects receive all canonical morph types
    • Existing projects without morph types get seeded on open
    • Seeding is idempotent (no duplicates on repeated opens)
    • All MorphTypeKind enum values are covered

    Added an integration test in Sena3SyncTests that validates the canonical definitions against the FwData morph types.

  • Regression test versioning: Updated RegressionTestHelper to version 3 for snapshot compatibility.
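Because every canonical morph type has a fixed GUID, idempotent seeding can reduce to "insert whichever GUIDs are missing". A minimal Python sketch of that idea, assuming a project's morph types can be viewed as a GUID-keyed map. `MorphType`, `seed_morph_types`, the three placeholder GUIDs, and the `MappingProxyType` standing in for the C# frozen dictionary are all hypothetical; this is not the code in this PR. In the sketch, the same function covers a new project, a repeated open, and an existing project that is missing some types.

```python
import uuid
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class MorphType:
    id: uuid.UUID
    name: str
    prefix: str = ""   # token written before the form, e.g. "-" for a suffix
    postfix: str = ""  # token written after the form, e.g. "-" for a prefix


# Placeholder GUIDs: the real definitions use fixed GUIDs taken from LibLCM.
ROOT, PREFIX, SUFFIX = (uuid.UUID(int=n) for n in (1, 2, 3))

# Read-only mapping standing in for the frozen dictionary (only 3 types shown).
CANONICAL_MORPH_TYPES = MappingProxyType({
    ROOT: MorphType(ROOT, "root"),
    PREFIX: MorphType(PREFIX, "prefix", postfix="-"),
    SUFFIX: MorphType(SUFFIX, "suffix", prefix="-"),
})


def seed_morph_types(project: dict) -> list:
    """Add each canonical morph type whose fixed GUID is missing; return the ones added."""
    added = [mt for guid, mt in CANONICAL_MORPH_TYPES.items() if guid not in project]
    for mt in added:
        project[mt.id] = mt
    return added


new_project = {}
first = seed_morph_types(new_project)   # new project: all canonical types are added
second = seed_morph_types(new_project)  # repeated open: nothing is added

legacy_project = {ROOT: CANONICAL_MORPH_TYPES[ROOT]}
patched = seed_morph_types(legacy_project)  # only the missing types are added
```

Keying on the fixed GUID rather than on the name means a repeated run can never create a second "root", which is the property the idempotency test checks.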

Implementation Details

  • Morph types are seeded with fixed GUIDs matching SIL.LCModel constants, ensuring consistency across projects and with FieldWorks data.
  • Seeding uses PreDefinedData.PredefinedMorphTypes() to integrate with existing predefined data infrastructure.
  • Search index regeneration is safe: if the FTS table is cleared, it is lazily rebuilt on the next query.
  • Sync operations detect and skip seeding if morph types already exist, preventing duplicates in legacy workflows.
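The legacy-snapshot patch can be pictured as follows. This assumes, hypothetically, that the sync layer decides what to create in FwData by diffing the last synced snapshot against the current CRDT state. A legacy snapshot recorded no morph types, so a naive diff would treat every seeded morph type as new and try to create it again in FwData, which already has them under the same GUIDs. `diff_creates` and `patch_legacy_snapshot` are illustrative Python names, not the API of CrdtFwdataProjectSyncService.

```python
import uuid


def diff_creates(before: dict, after: dict) -> set:
    """IDs present in `after` but missing from `before`, i.e. what a sync would create."""
    return set(after) - set(before)


def patch_legacy_snapshot(snapshot_morph_types: dict, crdt_morph_types: dict) -> dict:
    """A snapshot with no morph types predates morph-type syncing; assume it already had them."""
    if snapshot_morph_types:
        return snapshot_morph_types
    return dict(crdt_morph_types)


# Hypothetical state: canonical types seeded into the CRDT project, legacy snapshot empty.
crdt_morph_types = {uuid.UUID(int=1): "root", uuid.UUID(int=2): "prefix"}
legacy_snapshot = {}

naive = diff_creates(legacy_snapshot, crdt_morph_types)  # both types look "new"
patched = diff_creates(
    patch_legacy_snapshot(legacy_snapshot, crdt_morph_types), crdt_morph_types
)  # nothing to create, so no duplicates
```

A snapshot that already lists morph types is returned unchanged, so genuine additions still show up in the diff.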

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p

@coderabbitai

coderabbitai bot commented Mar 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@github-actions github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 24, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from dc45a12 to 823531a March 24, 2026 14:27
claude and others added 7 commits March 24, 2026 15:28
- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates
- Add tests: seeding, Sena3 verification, sync with legacy snapshots
- Add v3 to RegressionVersion enum (v3.sql dump to be generated)

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p
… check

- Add SecondaryOrder = 0 to all morph types that were relying on the default
- Add assertion that all canonical morph types exist in FwData (not just the reverse)

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p
@myieye myieye force-pushed the claude/fix-morph-type-syncing-bQo8X branch from 46c8f87 to f8f4cfe March 24, 2026 15:39
@myieye myieye changed the base branch from claude/add-lexeme-headwords-TowRX to feat/sync-morph-types March 24, 2026 15:40
@myieye myieye marked this pull request as ready for review March 24, 2026 15:40
@myieye myieye closed this Mar 24, 2026
@myieye myieye reopened this Mar 24, 2026
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

UI unit Tests

  1 files  ±0   54 suites  ±0   28s ⏱️ +3s
140 tests ±0  140 ✅ ±0  0 💤 ±0  0 ❌ ±0 
207 runs  ±0  207 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit da8250a. ± Comparison against base commit e9f7255.

♻️ This comment has been updated with latest results.

@argos-ci

argos-ci bot commented Mar 24, 2026

The latest updates on your projects.

Build: default
Status: ⚠️ Changes detected (6 changed)
Updated (UTC): Mar 31, 2026, 8:25 AM

@myieye myieye marked this pull request as draft March 27, 2026 08:17
@rmunn
Contributor

rmunn commented Mar 31, 2026

Now working on getting tests to pass. Writing down my findings as I go, so this is going to be a little bit stream-of-consciousness.

Sorting order tests are behaving non-deterministically, with different sort orders returned on different runs of the same test. For example, one run of the test might produce the sort order [suffix, root, prefix] and the next run might produce [suffix, prefix, root].

I'm starting to suspect homograph numbers may be to blame. The MorphTokens_DoNotAffectSortOrder test creates three entries: "aaaa" (a root), "-aaaa" (a suffix), and "aaaa-" (a prefix). Those should not be treated as homographs... but if they are being given homograph numbers, then the order of the homograph numbers may depend entirely on the order in which the homographs are found.

... And I just noticed that the sorting tests end up sorting those three words in an order that matches the alphabetical ordering of their GUIDs. That would be consistent with the theory that the homograph number is what's being sorted on here.

... But a good argument against that is that the tests that are failing are the LcmCrdt sorting tests, where the "sort by HomographNumber" lines are still commented out! Might have to search for a new theory.
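Results that track GUID order are what any sort produces when its key ties: a stable sort keeps rows in arrival order, and without an explicit ORDER BY that order may follow the primary key. A hypothetical Python illustration, not the LcmCrdt query: if the morph type is unavailable to the sort, "aaaa", "aaaa-" and "-aaaa" compare equal and the result depends on input order; adding explicit tie-breakers makes it deterministic. `Entry`, the key functions, and the `MORPH_RANK` values are made up for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    id: uuid.UUID
    lexeme: str      # form without morph tokens
    morph_type: str  # "root", "prefix" or "suffix"


# Morph tokens around the lexeme: a prefix is written "aaaa-", a suffix "-aaaa".
TOKENS = {"root": ("", ""), "prefix": ("", "-"), "suffix": ("-", "")}
# Hypothetical tie-break ranks; illustrative values only.
MORPH_RANK = {"root": 0, "prefix": 1, "suffix": 2}


def headword(e: Entry) -> str:
    before, after = TOKENS[e.morph_type]
    return before + e.lexeme + after


def ambiguous_key(e: Entry):
    # Morph tokens are ignored, so all three entries tie on this key.
    return e.lexeme


def deterministic_key(e: Entry):
    # Break ties by morph type, then by id, so input order no longer matters.
    return (e.lexeme, MORPH_RANK[e.morph_type], e.id)


def order(rows, key):
    return [headword(e) for e in sorted(rows, key=key)]


root = Entry(uuid.UUID(int=3), "aaaa", "root")
prefix = Entry(uuid.UUID(int=1), "aaaa", "prefix")
suffix = Entry(uuid.UUID(int=2), "aaaa", "suffix")
run1, run2 = [root, prefix, suffix], [suffix, prefix, root]  # two arrival orders
```

With `ambiguous_key` the two runs come back in different orders; with `deterministic_key` both return ["aaaa", "aaaa-", "-aaaa"].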

@rmunn rmunn self-assigned this Mar 31, 2026
@rmunn
Contributor

rmunn commented Mar 31, 2026

It looks like the predefined morph types aren't being seeded in the sorting tests, and that may be the cause of the failures.

Morph types are now required for correct sorting order, so we don't have
an option like _seedWs for not seeding them.

3 participants