fix(locorda_core): per-property vclk stamps for LWW concurrent-merge (closes #50)#61
fix(locorda_core): per-property vclk stamps for LWW concurrent-merge (closes #50)#61kkalass wants to merge 7 commits into
Conversation
… truth) The vocabulary spec/vocabularies/crdt-mechanics.ttl defines crdt:installationIri (see proposed-changes/007), but spec/mappings/core-v1.ttl still referenced the old name crdt:installationId. Bring the mapping in line with the vocabulary and regenerate mapping_bootstrap.g.dart accordingly. No semantic change.
…lary Adds the vocabulary and core merge-contract scaffolding needed to scope LWW (and future per-property/per-value) CRDT bookkeeping at the right granularity, fixing issue #50 conceptually. Vocabulary changes (spec/vocabularies): * sync.ttl - New class sync:PropertyStatement: framework metadata about a (resource, property) pair, distinct from rdf:Statement (which identifies a specific (s,p,o) triple). - New property sync:property (domain PropertyStatement, range rdf:Property). - sync:hasStatement.range widened to a union of rdf:Statement, sync:PropertyStatement and sync:ResourceStatement (one mechanism, three statement granularities). - sync:resource.domain widened to union of ResourceStatement and PropertyStatement. * crdt-mechanics.ttl - New class crdt:VersionedClock: immutable, content-addressed HLC snapshot. Hash is over (installationIri, logicalTime) pairs sorted by installationIri; physicalTime is a tie-break annotation, NOT part of the hash (consistent with crdt:clockHash). - New property crdt:vclk: attaches a VersionedClock snapshot to a framework metadata node (PropertyStatement, rdf:Statement, ...). Merge semantics: vector-clock domination on the referenced vclks; ties broken by max(physicalTime) within the vclk on 'concurrent'. Mapping changes (spec/mappings/core-v1.ttl): * New <#property-statement> ClassMapping: (sync:resource, sync:property) identifying + Immutable; crdt:vclk LWW (placeholder; merger applies vclk-domination semantics). * New <#versioned-clock> ClassMapping: crdt:hasClockEntry Immutable with mc:disableBlankNodePathIdentification true (vclk is content-addressed; no path identification for embedded clock entries). * Traversal boundaries extended with sync:resource, sync:property, crdt:vclk, crdt:hasClockEntry to keep framework-side metadata cleanly separated. Regenerates packages/locorda_core/lib/src/generated/mapping_bootstrap.g.dart. No implementation change yet; this commit is vocabulary + contract only. Concept reference will follow as proposed-changes/028.
Concept note for the issue #50 fix, replacing the PR #60 per-property clock attempt with a sync:PropertyStatement + crdt:VersionedClock design. Covers problem reproduction, three statement granularities, vclk content-addressing, per-property merge semantics, GC strategy and a worked example showing both properties surviving the fix.
Change the VersionedClock entry identity from crdt:installationIri to crdt:forClockEntry — a reference to the document's existing crdt:ClockEntry IRI. Rationale: - The framework currently does not write crdt:installationIri onto its ClockEntry nodes; installation identity is conveyed purely through the deterministic lcrd-clk-md5-<hash(installationLocalId)> fragment. - The ClockEntry IRI is content-addressed, stable across local and remote storage, and identical on every installation observing the entry — so it is semantically equivalent to an installation-keyed vector clock as the per-entry key, without requiring additional triples on existing ClockEntry nodes. - crdt:installationIri remains in the vocabulary as OPTIONAL (annotated) for future cross-document discovery use cases. Each VersionedClock entry carries its own crdt:logicalTime / crdt:physicalTime as frozen snapshot values; the referenced ClockEntry may have advanced since the snapshot. The vclk content hash is over (forClockEntry, logicalTime) pairs sorted by forClockEntry; physicalTime is annotation only. Mapping additions: * crdt:forClockEntry → Immutable in <#clock-mappings>. * Traversal boundary extended for crdt:forClockEntry. Updates proposed-changes/028 and regenerates vocab + mapping bootstrap.
Without per-property causality information, LwwRegister.remoteMerge() fell back to comparing document-level HLC physical timestamps when two installations had the same logical time (ClockComparison.concurrent). If installation B had a higher maxPhysicalTime than A, it would win ALL LWW properties — including ones it never touched — silently discarding A's uncontested writes. Fix (proposal 028): on every LWW property write, emit a sync:PropertyStatement keyed on (subject, predicate) carrying a crdt:VersionedClock (vclk) IRI. During concurrent merge, each property is now resolved against its own per-(s,p) vclk rather than the whole document clock, so uncontested writes win unconditionally. Details: - LwwRegister.generateMetadata() emits PropertyStatement + vclk node; suppresses the statement when the property's vclk equals the document- wide implicit default (no overhead for the common uncontested case). GC removes stale vclk subgraphs on re-save of the same property. - RemoteCrdtMergeContext now receives MetadataGenerator so the merge path can resolve vclks from the remote graph's PropertyStatements. - RemoteDocumentMerger._extractStatements() recognises the new sync:PropertyStatement shape (sync:resource / sync:property) alongside the existing rdf:Statement reification shape. - OrganizedGraph carries documentIri to distinguish framework-data subjects (which must NOT get PropertyStatements) from app-data. - CrdtDocumentManager opts out of per-property stamping for the framework-data pass to avoid recursive self-stamping of the PropertyStatement / VersionedClock nodes themselves. - New test case 43_concurrent_lww_different_props exercises the exact scenario from the bug report: A and B diverge at logicalTime=3, A changes both name+description, B changes only name; after cross-sync B wins name (higher physical time) but A retains description. - All existing expected_stored_graph.ttl fixtures updated to include the new PropertyStatement metadata. Closes #50
On Windows/Linux expect DesktopGDriveAuth; on all other platforms (macOS, iOS, Android) expect GoogleSignInGDriveAuth, matching the runtime dispatch in gdrive_auth_io.dart.
| // Remove all old triples that the new emission will re-introduce, | ||
| // plus the inbound hasStatement edge. The new emission re-adds them | ||
| // via `addNodes(documentIri, sync:hasStatement, ...)`. | ||
| triplesToRemove.addAll(oldFrameworkGraph.findTriples(subject: stmtIri)); |
There was a problem hiding this comment.
I think this is dangerous - If I understand correctly, this would also remove triples of the same statement not managed by us. We must not touch anything that was possibly added by a future version (or a different program).
| // plus the inbound hasStatement edge. The new emission re-adds them | ||
| // via `addNodes(documentIri, sync:hasStatement, ...)`. | ||
| triplesToRemove.addAll(oldFrameworkGraph.findTriples(subject: stmtIri)); | ||
| triplesToRemove.addAll(oldFrameworkGraph.findTriples( |
There was a problem hiding this comment.
actually, find without subject needs to iterate all triples - can hasStatement be on any subject iri? or on the documentIri?
|
|
||
| // Vclk subgraph GC: if the OLD vclk this stmt pointed to is now | ||
| // unreferenced by any OTHER PropertyStatement, drop its subgraph too. | ||
| final staleVclks = oldFrameworkGraph |
There was a problem hiding this comment.
.findTriples is quite optimized for subject/predicate, but I wonder if garbage collection really belongs here in localValueChange - isn't this a separate thing we need to do after all individual property value changes where handled? Also because the idea of property statements is not so very special for LWW - I think that assigning metadata to a subject/property pair is a pretty general idea. And regarding vclk: this is a bit more specific, yes - but not necessarily bound to LWW either, right? So cleaning up unused vclk references IMHO really should be separate - what do you think? And about the semantics above that a vclk needs to be referenced only if it is not equivalent to the base vclk - maybe there should be a general helper? But beware: please discuss this issue only, do not change anything for it without my explicit consent.
| // the writer's HLC at save time). When PropertyStatements exist on | ||
| // BOTH sides, compare their vclks as proper vector clocks and resolve | ||
| // independently from other properties. | ||
| // 2. Fall back to the document-wide clock comparison when per-property |
There was a problem hiding this comment.
should this really be an all-or-nothing kind of thing? My intention in the concept was, that the vclk is taken from the property if possible, else the base clock, else the document clock. This is done per side in the merge, so a remote could have the property vclk and local the document wide one. It must not fall back to comparing document wide on both sides. Or did I misunderstand something?
Summary
Fixes #50 —
LwwRegisterconcurrent merge used document-level clock, causing silent data loss on uncontested LWW properties.Root Cause
Without per-property causality information,
LwwRegister.remoteMerge()fell back to comparing document-level HLC physical timestamps when two installations had the same logical time (ClockComparison.concurrent). If installation B had a highermaxPhysicalTimethan A, it would win all LWW properties — including ones it never touched — silently discarding A's uncontested writes.Fix (proposal 028)
On every LWW property write, emit a
sync:PropertyStatementkeyed on(subject, predicate)carrying acrdt:VersionedClock(vclk) IRI. During concurrent merge, each property is now resolved against its own per-(s,p) vclk rather than the whole document clock, so uncontested writes win unconditionally.Changes
LwwRegister.generateMetadata()emitsPropertyStatement+ vclk node; suppresses the statement when the property's vclk equals the document-wide implicit default (no overhead for the common uncontested case). GC removes stale vclk subgraphs on re-save.RemoteCrdtMergeContextnow receivesMetadataGeneratorso the merge path can resolve vclks from the remote graph'sPropertyStatements.RemoteDocumentMerger._extractStatements()recognises the newsync:PropertyStatementshape (sync:resource/sync:property) alongside the existingrdf:Statementreification shape.OrganizedGraphcarriesdocumentIrito distinguish framework-data subjects (which must NOT get PropertyStatements) from app-data.CrdtDocumentManageropts out of per-property stamping for the framework-data pass to avoid recursive self-stamping of thePropertyStatement/VersionedClocknodes.43_concurrent_lww_different_propsexercises the exact scenario from the bug report: A and B diverge atlogicalTime=3, A changes bothname+description, B changes onlyname; after cross-sync B winsname(higher physical time) but A retainsdescription.expected_stored_graph.ttlfixtures updated to include the newPropertyStatementmetadata.Preparatory commits
spec: introduce sync:PropertyStatement and crdt:VersionedClock vocabularydocs(proposed-changes): 028 — property statements and versioned clocksspec: use crdt:forClockEntry as VersionedClock identity anchor