Skip to content

fix(data/import): pre-populate LookupKeys cache and newId to eliminate redundant CMT server lookups#167

Open
TomProkop wants to merge 6 commits into
masterfrom
fix/cmt-lookupkeys-newid-prepopulation
Open

fix(data/import): pre-populate LookupKeys cache and newId to eliminate redundant CMT server lookups#167
TomProkop wants to merge 6 commits into
masterfrom
fix/cmt-lookupkeys-newid-prepopulation

Conversation

@TomProkop

@TomProkop TomProkop commented Jul 2, 2026

Copy link
Copy Markdown
Member

Problem

CMT's FindEntity() does a server round-trip ("LOOKUP TO CRM") whenever a referenced record's in-memory newId is empty. Crucially, newId is only populated after the batch response callback for that specific record's create/upsert has fired — and with --batch-mode + --connection-count > 1, CMT dispatches an entity's batches asynchronously and can begin preprocessing the next (dependent) entity before every parent batch has actually received its response.

This is confirmed empirically, not just theoretically: in a baseline benchmark run, 2,354 of 2,354 "LOOKUP TO CRM" calls resolved as FOUND (not NOTFOUND). The referenced parent record already existed on the server in every single case — CMT's local newId bookkeeping just hadn't caught up yet. So this isn't a genuine missing-dependency/ordering problem (none of the 17 entities in the test package are even self-referential); it's a local cache staleness / async race inherent to CMT's batch+multi-connection architecture, and it costs a full server round-trip per occurrence regardless.

On a real customer package (167,915 records / 17 entities), this was estimated at ~55 hours to import.

Fix

CmtLookupKeysPrepopulator.Prepopulate() is invoked from CmtImportRunner on the first "Processing Entity:" progress message (i.e. right after CMT's internal ClearCrossReferanceList() call, so it isn't wiped) and, via reflection into the runtime-loaded DataMigCommon assembly:

  1. Pre-seeds ImportCommonMethods.LookupKeys with "entity:sourceGuid" -> Guid for every record in the package.
  2. Sets record.newId = record.id on every entitiesEntityRecord before any batch is ever dispatched, since CMT always upserts using the source GUID as the target GUID. This removes the dependency on batch-response timing entirely — FindEntity() sees a populated newId immediately, with no race window.

Both strategies are needed because FindEntity() has two independent lookup paths, and pre-populating only LookupKeys left ~1,362 server calls remaining out of a 2,364 baseline.

The prepopulation logic lives in its own CmtLookupKeysPrepopulator class (small, single-purpose helper methods) rather than inline in CmtImportRunner, which already mixes CLI orchestration, assembly resolution, and progress-event wiring.

Results (tiny-chain-100 benchmark: 3,679 records / 17 entities)

LOOKUP TO CRM calls Total time
Baseline 2,364 15m45s
LookupKeys-only 1,362 ~10m
This fix (LookupKeys + newId) 13 ~4m49s

(The remaining 13 are genuine external references — e.g. talxis_documenttype — which is not part of the package and correctly requires a real server lookup.)

Also validated at ~12x scale on a 43,760-record referentially-closed subset of the real customer package (proportional 15% slice preserving actual entity distribution): 0 LOOKUP TO CRM calls through 8 of 17 entities / ~30k records before the run was stopped (sufficient confirmation the fix holds at scale). Re-validated after the class-extraction refactor with a fresh, non-overlapping 4,127-record subset: 8 LOOKUP TO CRM calls, confirming parity.

Testing

  • Manual benchmarks against a live Dataverse SIT environment using quick-tests/tiny-chain-100 and larger proportional subsets of the real package, with --batch-mode --batch-size 200 --connection-count 2 --override-safety-checks --verbose.
  • Confirmed via ilspycmd decompilation of the built DLL that the fix is actually present in the compiled binary (ran into a stale Roslyn compiler-server caching issue during development, resolved via dotnet build-server shutdown).

🤖 Generated with Copilot CLI

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

TomProkop and others added 6 commits July 2, 2026 17:22
CMT's UpsertMultiple handler only sets record.newId when RecordCreated==true
(line 884 of ImportCrmEntityActions.cs). For records already present in the
target environment, RecordCreated=false → newId stays null → every child-entity
lookup for that parent calls LookupCustomerOrLookupFieldInCRM — one server
round-trip (~300-500ms) per unique lookup reference.

Since CMT always preserves source GUIDs (UpsertRequest.Target.Id = source GUID),
targetGuid == sourceGuid for every package record. CmtImportRunner now
pre-seeds ImportCommonMethods.LookupKeys with entity:sourceGuid → sourceGuid
for every record in the package immediately after ValidateSchemaFile loads the
data into memory.

FindEntity() short-circuits at the LookupKeys check (before the newId path),
so all internal package lookups resolve from cache without any server call.
External lookups (entities not in the package) are unaffected.

Benchmark impact (tiny-chain-100, 3679 records):
  Before: 2364 LOOKUP TO CRM calls, 15m45s, 3.89 RPS
  Expected after: ~0 LOOKUP TO CRM calls for internal refs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion-complete event

ValidateSchemaFile() returns before ImportCommonMethods.dataEntities is
populated — data is only loaded inside ImportDataToCrm(). Hook into the
'Schema Validation Complete' progress event (fired once data is loaded)
to call PrePopulateLookupKeysFromPackage() at the correct time.

Also promote failure log messages from Debug to Information so they appear
in normal output without --verbose.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ler type

AppDomain.GetAssemblies() may return both the compile-time net462 copy and
the runtime-loaded copy of DataMigCommon; they carry separate static state.
Using handler.GetType().Assembly.GetReferencedAssemblies() ensures we find
the same DataMigCommon instance that CMT's ImportDataToCrm uses.

Also adds assembly full-name log at Information level so we can verify
which instance is being used.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ranceList

CMT calls ImportCommonMethods.ClearCrossReferanceList() (which calls
LookupKeys.Clear()) at line 683 of ImportDataToCrm, AFTER the Schema
Validation Complete event fires at line 664. Pre-populating on the
Schema Validation Complete event is therefore too early — the dict is
wiped before entity processing begins.

New trigger: first AddNewProgressItem whose text starts with
'Processing Entity:' (fired by BeginEntityImport before any record
pre-processing starts), which executes after ClearCrossReferanceList.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LookupKeys pre-population may miss if the dict key format doesn't match
what CMT produces at runtime (e.g. different entity name casing or
key construction differences). Add a second strategy: set newId = id on
every entitiesEntityRecord before processing starts.

CMT's FindEntity checks newId at line 2912 — if non-empty, it returns
the GUID immediately without a server call and adds it to LookupKeys
for future references. Since CMT always preserves source GUIDs
(UpsertRequest.Target.Id = source GUID), newId == id is always correct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cated class

Moves the reflection-heavy pre-population logic out of CmtImportRunner
(which mixes CLI orchestration, assembly resolution, and progress-event
wiring) into a focused, testable CmtLookupKeysPrepopulator static class
with small single-purpose helper methods instead of one 150-line method.

No behavioral change: re-validated with a fresh 4,127-record referentially
closed subset — 8 LOOKUP TO CRM calls (vs 13 for the 3,679-record subset
before this refactor), confirming parity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant