fix(data/import): pre-populate LookupKeys cache and newId to eliminate redundant CMT server lookups#167
Open
TomProkop wants to merge 6 commits into
Open
fix(data/import): pre-populate LookupKeys cache and newId to eliminate redundant CMT server lookups#167TomProkop wants to merge 6 commits into
TomProkop wants to merge 6 commits into
Conversation
CMT's UpsertMultiple handler only sets record.newId when RecordCreated==true (line 884 of ImportCrmEntityActions.cs). For records already present in the target environment, RecordCreated=false → newId stays null → every child-entity lookup for that parent calls LookupCustomerOrLookupFieldInCRM — one server round-trip (~300-500ms) per unique lookup reference. Since CMT always preserves source GUIDs (UpsertRequest.Target.Id = source GUID), targetGuid == sourceGuid for every package record. CmtImportRunner now pre-seeds ImportCommonMethods.LookupKeys with entity:sourceGuid → sourceGuid for every record in the package immediately after ValidateSchemaFile loads the data into memory. FindEntity() short-circuits at the LookupKeys check (before the newId path), so all internal package lookups resolve from cache without any server call. External lookups (entities not in the package) are unaffected. Benchmark impact (tiny-chain-100, 3679 records): Before: 2364 LOOKUP TO CRM calls, 15m45s, 3.89 RPS Expected after: ~0 LOOKUP TO CRM calls for internal refs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion-complete event ValidateSchemaFile() returns before ImportCommonMethods.dataEntities is populated — data is only loaded inside ImportDataToCrm(). Hook into the 'Schema Validation Complete' progress event (fired once data is loaded) to call PrePopulateLookupKeysFromPackage() at the correct time. Also promote failure log messages from Debug to Information so they appear in normal output without --verbose. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ler type AppDomain.GetAssemblies() may return both the compile-time net462 copy and the runtime-loaded copy of DataMigCommon; they carry separate static state. Using handler.GetType().Assembly.GetReferencedAssemblies() ensures we find the same DataMigCommon instance that CMT's ImportDataToCrm uses. Also adds assembly full-name log at Information level so we can verify which instance is being used. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ranceList CMT calls ImportCommonMethods.ClearCrossReferanceList() (which calls LookupKeys.Clear()) at line 683 of ImportDataToCrm, AFTER the Schema Validation Complete event fires at line 664. Pre-populating on the Schema Validation Complete event is therefore too early — the dict is wiped before entity processing begins. New trigger: first AddNewProgressItem whose text starts with 'Processing Entity:' (fired by BeginEntityImport before any record pre-processing starts), which executes after ClearCrossReferanceList. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LookupKeys pre-population may miss if the dict key format doesn't match what CMT produces at runtime (e.g. different entity name casing or key construction differences). Add a second strategy: set newId = id on every entitiesEntityRecord before processing starts. CMT's FindEntity checks newId at line 2912 — if non-empty, it returns the GUID immediately without a server call and adds it to LookupKeys for future references. Since CMT always preserves source GUIDs (UpsertRequest.Target.Id = source GUID), newId == id is always correct. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cated class Moves the reflection-heavy pre-population logic out of CmtImportRunner (which mixes CLI orchestration, assembly resolution, and progress-event wiring) into a focused, testable CmtLookupKeysPrepopulator static class with small single-purpose helper methods instead of one 150-line method. No behavioral change: re-validated with a fresh 4,127-record referentially closed subset — 8 LOOKUP TO CRM calls (vs 13 for the 3,679-record subset before this refactor), confirming parity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
CMT's
FindEntity()does a server round-trip ("LOOKUP TO CRM") whenever a referenced record's in-memorynewIdis empty. Crucially,newIdis only populated after the batch response callback for that specific record's create/upsert has fired — and with--batch-mode+--connection-count > 1, CMT dispatches an entity's batches asynchronously and can begin preprocessing the next (dependent) entity before every parent batch has actually received its response.This is confirmed empirically, not just theoretically: in a baseline benchmark run, 2,354 of 2,354 "LOOKUP TO CRM" calls resolved as
FOUND(notNOTFOUND). The referenced parent record already existed on the server in every single case — CMT's localnewIdbookkeeping just hadn't caught up yet. So this isn't a genuine missing-dependency/ordering problem (none of the 17 entities in the test package are even self-referential); it's a local cache staleness / async race inherent to CMT's batch+multi-connection architecture, and it costs a full server round-trip per occurrence regardless.On a real customer package (167,915 records / 17 entities), this was estimated at ~55 hours to import.
Fix
CmtLookupKeysPrepopulator.Prepopulate()is invoked fromCmtImportRunneron the first"Processing Entity:"progress message (i.e. right after CMT's internalClearCrossReferanceList()call, so it isn't wiped) and, via reflection into the runtime-loadedDataMigCommonassembly:ImportCommonMethods.LookupKeyswith"entity:sourceGuid" -> Guidfor every record in the package.record.newId = record.idon everyentitiesEntityRecordbefore any batch is ever dispatched, since CMT always upserts using the source GUID as the target GUID. This removes the dependency on batch-response timing entirely —FindEntity()sees a populatednewIdimmediately, with no race window.Both strategies are needed because
FindEntity()has two independent lookup paths, and pre-populating onlyLookupKeysleft ~1,362 server calls remaining out of a 2,364 baseline.The prepopulation logic lives in its own
CmtLookupKeysPrepopulatorclass (small, single-purpose helper methods) rather than inline inCmtImportRunner, which already mixes CLI orchestration, assembly resolution, and progress-event wiring.Results (
tiny-chain-100benchmark: 3,679 records / 17 entities)(The remaining 13 are genuine external references — e.g.
talxis_documenttype— which is not part of the package and correctly requires a real server lookup.)Also validated at ~12x scale on a 43,760-record referentially-closed subset of the real customer package (proportional 15% slice preserving actual entity distribution): 0 LOOKUP TO CRM calls through 8 of 17 entities / ~30k records before the run was stopped (sufficient confirmation the fix holds at scale). Re-validated after the class-extraction refactor with a fresh, non-overlapping 4,127-record subset: 8 LOOKUP TO CRM calls, confirming parity.
Testing
quick-tests/tiny-chain-100and larger proportional subsets of the real package, with--batch-mode --batch-size 200 --connection-count 2 --override-safety-checks --verbose.ilspycmddecompilation of the built DLL that the fix is actually present in the compiled binary (ran into a stale Roslyn compiler-server caching issue during development, resolved viadotnet build-server shutdown).🤖 Generated with Copilot CLI
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com