Seed canonical morph types and regenerate search index #2219
myieye wants to merge 17 commits into feat/sync-morph-types
dc45a12 to 823531a
- Add CanonicalMorphTypes with all 19 morph-type definitions (GUIDs from LibLCM)
- Seed morph-types for new projects via PreDefinedData.PredefinedMorphTypes
- Seed morph-types for existing projects in MigrateDb (before FTS refresh)
- Add EF migration to clear FTS table so headwords are rebuilt with morph tokens
- Patch legacy snapshots (empty MorphTypes) in sync layer to prevent duplicates
- Add tests: seeding, Sena3 verification, sync with legacy snapshots
- Add v3 to RegressionVersion enum (v3.sql dump to be generated)

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p
… check

- Add SecondaryOrder = 0 to all morph types that were relying on the default
- Add assertion that all canonical morph types exist in FwData (not just the reverse)

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p
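The two-way assertion described in this commit can be pictured as a set comparison in both directions. The following is only a sketch in Python, not the test code from this PR: the function name and the placeholder IDs are hypothetical, and the real check compares the GUIDs from CanonicalMorphTypes against the morph types read from FwData.

```python
from __future__ import annotations


def diff_morph_types(canonical: set[str], fwdata: set[str]) -> tuple[set[str], set[str]]:
    """Return (ids only in the canonical list, ids only in FwData)."""
    only_canonical = canonical - fwdata
    only_fwdata = fwdata - canonical
    return only_canonical, only_fwdata


# Placeholder IDs for illustration; these are not the real LibLCM GUIDs.
canonical_ids = {"id-root", "id-prefix", "id-suffix"}
fwdata_ids = {"id-root", "id-prefix", "id-suffix"}

only_canonical, only_fwdata = diff_morph_types(canonical_ids, fwdata_ids)

# Checking one direction alone would miss a canonical entry that FwData lacks.
assert not only_fwdata, f"FwData has morph types missing from the canonical list: {only_fwdata}"
assert not only_canonical, f"Canonical morph types missing from FwData: {only_canonical}"
```

Checking both differences is what catches a canonical definition that has no counterpart in FwData, which a one-directional check would let through.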
46c8f87 to f8f4cfe
Now working on getting tests to pass. Writing down my findings as I go, so this is going to be a little bit stream-of-consciousness.

Sorting order tests are behaving non-deterministically, with different sort orders coming back on different runs of the same test. I.e., one run of the test might produce the sort order [suffix, root, prefix] and the next run of the same test might produce [suffix, prefix, root].

I'm starting to suspect homograph numbers may be to blame. The MorphTokens_DoNotAffectSortOrder test creates three entries, "aaaa" (a root), "-aaaa" (a suffix), and "aaaa-" (a prefix). Those should not be treated as homographs... but if they are being given homograph numbers, then the order of the homograph numbers may depend entirely on the order in which the homographs are found.

... And I just noticed that the sorting tests end up sorting those three words in an order that matches the alphabetical ordering of their GUIDs. Which would be consistent with the theory that the homograph number is what's being sorted on here.

... But a good argument against that is that the failing tests are the LcmCrdt sorting tests, where the "sort by HomographNumber" lines are still commented out! Might have to search for a new theory.
It looks like the predefined morph types aren't being seeded in the sorting tests, and that may be the cause of the failures.
Morph types are now required for correct sorting order, so we don't have an option like _seedWs for not seeding them.
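To make the failure mode discussed in this thread concrete, here is a rough Python sketch, not the LcmCrdt sorting code. It assumes that morph tokens are ignored when comparing headwords, so the three test entries tie on their base form, and that a per-morph-type order value then breaks the tie. The `SECONDARY_ORDER` values, the `Entry` class, and the IDs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entry:
    id: str
    headword: str
    morph_type: str


# Hypothetical ranks; the real values come from the seeded morph types.
SECONDARY_ORDER = {"root": 0, "prefix": 1, "suffix": 2}


def base_form(headword: str) -> str:
    """Drop leading/trailing '-' tokens so they do not affect sort order."""
    return headword.strip("-")


def ambiguous_key(entry: Entry) -> str:
    # All three test entries collapse to "aaaa", so this key cannot order them.
    return base_form(entry.headword)


def deterministic_key(entry: Entry) -> tuple[str, int, str]:
    # Tie-break on morph-type order, then on id as a last resort.
    return (base_form(entry.headword), SECONDARY_ORDER[entry.morph_type], entry.id)


ENTRIES = [
    Entry("c3", "aaaa", "root"),
    Entry("a1", "-aaaa", "suffix"),
    Entry("b2", "aaaa-", "prefix"),
]


def sorted_types(entries: list[Entry], key) -> list[str]:
    return [e.morph_type for e in sorted(entries, key=key)]


# Try every input order: the ambiguous key lets the input order leak through.
ambiguous_orders = {tuple(sorted_types(list(p), ambiguous_key)) for p in permutations(ENTRIES)}
deterministic_orders = {tuple(sorted_types(list(p), deterministic_key)) for p in permutations(ENTRIES)}

print(len(ambiguous_orders))      # 6
print(len(deterministic_orders))  # 1
```

When the morph types are not seeded there is nothing like `SECONDARY_ORDER` to consult, so the result falls back to whatever order the rows happen to arrive in, which would match the run-to-run differences described above.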
Summary
This PR ensures all canonical morph type definitions are seeded into new and existing CRDT projects, and regenerates the full-text search index to include morph-type prefix/postfix tokens in searchable headwords.
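As a rough illustration of what "morph-type prefix/postfix tokens in searchable headwords" means, the Python sketch below builds a headword from a lexeme form and its morph type. It is not the code from this PR: the function name and the small token table are assumptions for illustration, with the suffix and prefix tokens taken from the "-aaaa" and "aaaa-" examples in the conversation.

```python
from __future__ import annotations

# Hypothetical subset: morph type -> (leading token, trailing token).
MORPH_TOKENS: dict[str, tuple[str, str]] = {
    "prefix": ("", "-"),
    "suffix": ("-", ""),
    "infix": ("-", "-"),
}


def searchable_headword(
    lexeme: str,
    morph_type: str,
    tokens: dict[str, tuple[str, str]] = MORPH_TOKENS,
) -> str:
    """Wrap the lexeme form in the tokens defined for its morph type, if any."""
    leading, trailing = tokens.get(morph_type, ("", ""))
    return f"{leading}{lexeme}{trailing}"


print(searchable_headword("aaaa", "suffix"))  # -aaaa
print(searchable_headword("aaaa", "prefix"))  # aaaa-
print(searchable_headword("aaaa", "root"))    # aaaa

# With no morph types available, every headword falls back to the bare form.
print(searchable_headword("aaaa", "suffix", tokens={}))  # aaaa
```

The last call shows why headwords indexed before the morph types existed need to be rebuilt: they would have been stored without their tokens.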
Key Changes
- Added canonical morph type definitions (CanonicalMorphTypes.cs): Defines 20 standard morph types matching FieldWorks/LibLCM specifications with verified GUIDs, names, abbreviations, and affixes. Uses frozen dictionary for immutability and performance.
- Automatic seeding on project creation and migration: Modified CrdtProjectsService and CurrentProjectService to seed canonical morph types when:
  - a new project is created with SeedNewProjectData: true
  - an existing project is migrated (MigrateDb)
- Search index regeneration: Added migration 20260318120000_RegenerateSearchTableForMorphTypes that clears the EntrySearchRecord table to force FTS rebuild with updated morph-type affixes.
- Legacy snapshot patching: Updated CrdtFwdataProjectSyncService to patch legacy snapshots missing morph types during sync operations, preventing duplicates when syncing with FwData.
- Comprehensive test coverage: Added MorphTypeSeedingTests verifying:
  - MorphTypeKind enum values are covered
- Added integration test in Sena3SyncTests validating canonical definitions match FwData morph types.
- Regression test versioning: Updated RegressionTestHelper to version 3 for snapshot compatibility.

Implementation Details
- Seeding goes through PreDefinedData.PredefinedMorphTypes() to integrate with existing predefined data infrastructure.

https://claude.ai/code/session_01WDKE2vXP4gjMWjfn4cmL4p
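The seeding and index-clearing steps described above can be sketched with Python's built-in sqlite3 module. This is a simplified model, not the EF Core migration or the MigrateDb code from this PR. The EntrySearchRecord table name comes from the description; the MorphType table, its columns, the placeholder IDs, and the function names are assumptions.

```python
import sqlite3

# Placeholder rows; the real definitions use the LibLCM GUIDs.
CANONICAL_MORPH_TYPES = [
    ("id-root", "root"),
    ("id-prefix", "prefix"),
    ("id-suffix", "suffix"),
]


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE MorphType (Id TEXT PRIMARY KEY, Name TEXT NOT NULL)")
    conn.execute("CREATE TABLE EntrySearchRecord (Id TEXT PRIMARY KEY, Headword TEXT NOT NULL)")


def morph_type_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM MorphType").fetchone()[0]


def seed_morph_types(conn: sqlite3.Connection) -> int:
    """Insert any missing canonical morph types; return how many were added."""
    before = morph_type_count(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO MorphType (Id, Name) VALUES (?, ?)",
        CANONICAL_MORPH_TYPES,
    )
    return morph_type_count(conn) - before


def migrate(conn: sqlite3.Connection) -> int:
    """Seed first, then empty the search table so it is rebuilt with morph tokens."""
    added = seed_morph_types(conn)
    conn.execute("DELETE FROM EntrySearchRecord")
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
create_schema(conn)
# An existing project that already has one morph type and a stale search record.
conn.execute("INSERT INTO MorphType VALUES ('id-root', 'root')")
conn.execute("INSERT INTO EntrySearchRecord VALUES ('e1', 'aaaa')")

first_run = migrate(conn)
second_run = migrate(conn)
print(first_run, second_run)  # 2 0
```

Keying the insert on the ID makes the seeding safe to repeat, which is the same property that keeps already-seeded projects from ending up with duplicate morph types.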