The current implementation of src/db-document-immer.ts is not optimal. Our goal is to make it much more efficient in both speed and memory.
- Build a synthetic benchmark that applies N patches (e.g. 1k) against a document with deletes/sets + string columns; run it for the `db-document-immer`, `db-document-immutable`, and `string` codecs so we can compare.
- Extend the patchflow core to apply patches in batches instead of one-by-one when computing a value; expose a batch API from session/graph.
- Add `applyPatchBatch` to all codecs (string/immutable/immer). Start with a trivial implementation that loops and calls `applyPatch`.
- Implement a fast `applyPatchBatch` for the immer codec:
  - apply all patch operations inside a single `produce` call
  - track dirty keys and update indexes once at the end (no per-op rebuilds)
  - avoid allocating a new DbDocumentImmer per op
- Improve immutable + string implementations:
  - keep incremental index updates for immutable
  - fast-path no-ops for string (or precompute diff application where possible)
- Re-run the benchmark and record before/after numbers in this section.
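The batching idea above can be sketched without the real codec. Note that `DbDoc`, `PatchOp`, and the index shape below are illustrative stand-ins, not the actual DbDocumentImmer API; with immer, the mutation would sit inside a single `produce(doc, draft => { ... })` call instead of the manual shallow copy used here:

```typescript
// Sketch only: one structural copy per batch, dirty keys tracked,
// indexes rebuilt once at the end instead of once per op.
type PatchOp =
  | { type: "set"; key: string; value: Record<string, unknown> }
  | { type: "delete"; key: string };

interface DbDoc {
  records: Map<string, Record<string, unknown>>;
  // one index per indexed column: column value -> set of record keys
  indexes: Map<string, Map<unknown, Set<string>>>;
}

function applyPatchBatch(doc: DbDoc, ops: PatchOp[]): DbDoc {
  const records = new Map(doc.records); // one copy for the whole batch
  const dirty = new Set<string>();      // keys touched by any op
  for (const op of ops) {
    if (op.type === "set") {
      records.set(op.key, { ...records.get(op.key), ...op.value });
    } else {
      records.delete(op.key);
    }
    dirty.add(op.key);
  }
  // update index entries once per dirty key, not once per op
  const indexes = new Map<string, Map<unknown, Set<string>>>();
  for (const [column, index] of doc.indexes) {
    const next = new Map(index);
    for (const value of next.keys()) {
      const keys = new Set(next.get(value)!);
      for (const k of dirty) keys.delete(k);
      if (keys.size) next.set(value, keys);
      else next.delete(value);
    }
    for (const k of dirty) {
      const rec = records.get(k);
      if (rec && column in rec) {
        const v = rec[column];
        if (!next.has(v)) next.set(v, new Set());
        next.get(v)!.add(k);
      }
    }
    indexes.set(column, next);
  }
  return { records, indexes };
}
```

The point the benchmark should confirm: the per-op copies and per-op index rebuilds dominate, so amortizing them over a batch is where the win is.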
Patchflow currently assumes that Patch.time is globally unique within a document’s patch graph. In practice this is false when the same userId can commit concurrently from multiple clients/processes (e.g. browser + backend service in CoCalc-lite, or multiple browser tabs). When two patches share the same logical time, one can overwrite the other in stores that key patches by time, causing silent data loss/corruption.
The fix is to make patch identity be a pair (time, clientId) rather than the scalar time. The clientId is an opaque, per-client random identifier (configurable generator; default crypto-random). Ordering becomes lexicographic by (time, clientId), and collisions become vanishingly unlikely (only possible if two clients pick the same random clientId).
To keep patch ids opaque but still debuggable, represent the (time, clientId) pair as a single string:
`PatchId = "<time36>_<client>"`

- `time36`: logical time in milliseconds since epoch, encoded in base36 and left-padded to a fixed width so lexicographic ordering matches numeric ordering.
- `client`: base64url (or base32/base36) encoding of a crypto-random byte string (96–128 bits typical).
Example:
`0kqv9gq3a_2VbCq9qWZ7yqfQ`
Notes:
- Use monotone logical time per client: `t = max(lastT + 1, Date.now())` to avoid backwards clock jumps.
- Provide small `encodePatchId`/`decodePatchId` helpers for debugging and tests. Most of patchflow should treat `PatchId` as opaque and only compare by string order.
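A minimal sketch of those helpers, assuming a 9-digit base36 time field (enough for millisecond timestamps beyond year 5000) and 96 bits of randomness; the widths and alphabet are choices, not a fixed spec:

```typescript
import { randomBytes } from "node:crypto";

const TIME36_WIDTH = 9;

export function newClientId(bytes = 12): string {
  // 96 bits of randomness; base64url keeps the id URL/filename safe
  return randomBytes(bytes).toString("base64url");
}

export function encodePatchId(time: number, clientId: string): string {
  // fixed-width base36 time so string order matches numeric order
  return `${time.toString(36).padStart(TIME36_WIDTH, "0")}_${clientId}`;
}

export function decodePatchId(id: string): { time: number; clientId: string } {
  const sep = id.indexOf("_");
  return {
    time: parseInt(id.slice(0, sep), 36),
    clientId: id.slice(sep + 1),
  };
}

// Monotone per-client logical clock: never goes backwards
// even if the wall clock does.
let lastT = 0;
export function nextTime(now: () => number = Date.now): number {
  lastT = Math.max(lastT + 1, now());
  return lastT;
}
```

Because the time field is fixed-width, plain string comparison of two PatchIds is exactly the lexicographic `(time, clientId)` order the design calls for.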
- Introduce a first-class patch id
  - Add `PatchId` (either `{ time: number; clientId: string }` or a canonical encoded string key).
    - Simplest would be `${time in ms since epoch}-${base64 clientId}`, since it already sorts properly and we can treat it as a single atomic value for parents, etc. So `type PatchId = string;`.
  - Update `Patch`/`PatchEnvelope` to always carry `clientId`.
  - Treat missing `clientId` on ingest as legacy: `clientId = "legacy"` (or similar) for backward compatibility.
- Update core types and APIs
  - Change `parents?: number[]` to `parents?: PatchId[]` (or `PatchKey[]`).
  - Update `PatchGraph` public methods that currently accept/return times: `getHeads(): PatchId[]`, `getPatch(id: PatchId): PatchEnvelope`; `versions()` should return patch ids (unique), not just times.
  - If needed for convenience/debugging, provide explicit helpers like `headTimes()` that are clearly not unique ids.
- Update PatchGraph internals
  - Replace internal maps keyed by `number` time with maps keyed by `PatchId` (or encoded key).
  - Rework head computation, ancestor traversal, and merge caching to use patch ids.
  - Update deterministic ordering to sort by `(time asc, clientId lex asc)` only.
- Update Session
  - Add `clientId?: string` / `clientIdFactory?: () => string` to `SessionOptions`.
  - Generate a stable `clientId` once per `Session` instance by default (crypto-random).
  - Include `clientId` in every local patch envelope and in parent references.
  - Remove the "mod 1024 time-slot guarantees uniqueness" framing; uniqueness is now provided by `(time, clientId)` (userId range limits can remain for other reasons, but not for uniqueness).
- Update PatchStore adapter contract
  - Ensure `append()` and `loadInitial()` transport the `clientId` and parent PatchIds.
  - Consider extending pagination options beyond `sinceTime` (which is now ambiguous) to `since?: PatchId` or `sinceKey?: string` for robust incremental loads.
- Update adapters + tests
  - Update the in-memory patch store adapter and all tests to use patch ids.
  - Add a regression test reproducing the real failure mode: two sessions with the same `userId` and identical clock time but different `clientId` must produce two distinct patches in the graph/store.
  - Add a compatibility test: patches without `clientId` ingest as legacy and remain addressable.
- CoCalc-side migration notes (outside patchflow core)
  - Any concrete PatchStore backed by a DB table keyed by time must be updated to key by `(time, clientId)` (e.g. add a `client_id` column or encode a composite key).
  - Existing historical patches can be treated as `clientId="legacy"` without rewriting history; parents referencing old patches use the legacy id.
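The failure mode the regression test must cover can be shown without the full Session machinery; `PatchRecord` and the map-based stores here are stand-ins for the real graph/store types:

```typescript
// Stand-in types: the real PatchEnvelope/PatchGraph differ; this only shows
// why keying by (time, clientId) prevents the silent-overwrite failure mode.
type PatchRecord = { time: number; clientId: string; data: string };

const encodeKey = (p: PatchRecord) =>
  `${p.time.toString(36).padStart(9, "0")}_${p.clientId}`;

// Same userId, identical clock time, two different clients:
const fromTab = { time: 1700000000000, clientId: "tab-A", data: "hello" };
const fromBackend = { time: 1700000000000, clientId: "svc-B", data: "world" };

// Legacy store keyed by time alone: the second append silently
// overwrites the first -- one patch is lost.
const byTime = new Map<number, PatchRecord>();
byTime.set(fromTab.time, fromTab);
byTime.set(fromBackend.time, fromBackend);

// Store keyed by the composite PatchId: both patches survive,
// and sorting the keys gives the deterministic (time, clientId) order.
const byId = new Map<string, PatchRecord>();
byId.set(encodeKey(fromTab), fromTab);
byId.set(encodeKey(fromBackend), fromBackend);
```

The real test would drive two `Session` instances with a pinned clock against one shared store and assert the graph ends up with two distinct patches.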
Goals: extract the generic patch-DAG realtime sync core from CoCalc into a standalone, MIT-licensed package with clear adapters and comprehensive tests. Keep the core transport-agnostic and file-system-agnostic.
- core/types: `Document`, `Patch`, `PatchId`/logical time, `Snapshot`, `Version`, error types.
- core/diff: pluggable diff/patch codec; include a default string codec (wraps DiffMatchPatch) plus hooks for custom codecs.
- core/patch-graph: deterministic DAG manager (similar to `SortedPatchList`); handles add/merge, known-heads tracking, snapshots, caching, dedup (file-load dedup), and value computation `value({time?, without?})`.
- core/session: orchestrates local doc state + patch graph + adapters; provides commit, undo/redo, snapshot scheduling, load-more-history, and exposes events (change, user-change, has-unsaved-changes).
- adapters/patch-store (transport): interface for loading initial history, appending local patches/snapshots, and subscribing to remote patches.
- adapters/file (optional): read/write doc strings, optional delta-write support, optional FS watch feed → patches.
- adapters/presence (optional): cursor/presence feed with throttling; kept out of the core session unless injected.
- adapters/clock/log (optional): injectable clock and logger to avoid global dependencies.
- testing harness: in-memory implementations of patch-store/file/presence to exercise the core without any I/O.
Types used below:
```ts
type PatchEnvelope = { patch: Patch; source?: string; seq?: number; isSnapshot?: boolean; snapshotValue?: string };
type SnapshotEnvelope = { time: number; value: string; version: number; seq?: number };
type DocCodec = {
  fromString(s: string): Document;
  toString(doc: Document): string;
  makePatch(a: Document, b: Document): CompressedPatch;
  applyPatch(doc: Document, patch: CompressedPatch): Document;
};
```
Patch store / transport (required):

```ts
interface PatchStore {
  // load initial state (optionally paged); returns patches + known snapshot if any
  loadInitial(opts?: {
    sinceTime?: number;
  }): Promise<{ patches: PatchEnvelope[]; hasMore?: boolean }>;
  // append a local patch/snapshot to the shared log
  append(envelope: PatchEnvelope): Promise<void>;
  // subscribe to remote envelopes; returns unsubscribe
  subscribe(onEnvelope: (env: PatchEnvelope) => void): () => void;
}
```

File adapter (optional):
```ts
interface FileAdapter {
  read(): Promise<string>; // throws ENOENT-style error if missing
  write(content: string, opts?: { base?: string }): Promise<void>; // base enables delta writes
  watch?(onChange: (delta?: { patch?: CompressedPatch; seq?: number }) => void): () => void;
}
```

Presence adapter (optional):
```ts
interface PresenceAdapter {
  publish(state: any): void;
  subscribe(onState: (state: any, clientId: string) => void): () => void;
}
```

Session construction sketch:
```ts
type SessionDeps = {
  codec: DocCodec;
  patchStore: PatchStore;
  clock?: () => number;
  file?: FileAdapter;
  presence?: PresenceAdapter;
  log?: (msg: string, data?: any) => void;
};
```

- (done) Lift `Document`/`Patch`/`SortedPatchList`-equivalent into core modules with zero CoCalc deps.
- (done) Write in-memory `PatchStore` + `DocCodec` (string) and port existing SortedPatchList tests; add coverage for merges, snapshots, undo/redo, `value(without)`.
- (done) Rebuild a slim `Session` around adapters; keep file/presence optional. Replace `reuseInFlight`-style saves with a small dirty/queue state machine.
- (done) Add file adapter + tests: overlapping saves, init/close races, remote patches arriving during disk write.
- (done) Add presence adapter (optional) and keep it out of the core unless supplied.
- Integrate back into CoCalc via adapters (conat-backed PatchStore, FS/watch wrapper), without reintroducing CoCalc-specific code into the core.
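The in-memory `PatchStore` the testing harness needs could look roughly like this; `Patch`/`PatchEnvelope` are narrowed stand-ins for the real core types, and the harness class name is hypothetical:

```typescript
// Minimal in-memory PatchStore for exercising the core with no I/O.
type Patch = { time: number; data: string };
type PatchEnvelope = { patch: Patch; seq?: number; isSnapshot?: boolean };

class MemoryPatchStore {
  private log: PatchEnvelope[] = [];
  private listeners = new Set<(env: PatchEnvelope) => void>();

  async loadInitial(opts?: { sinceTime?: number }) {
    const since = opts?.sinceTime ?? -Infinity;
    return { patches: this.log.filter((e) => e.patch.time > since) };
  }

  async append(envelope: PatchEnvelope): Promise<void> {
    this.log.push({ ...envelope, seq: this.log.length });
    // fan out to subscribers, emulating the remote feed
    for (const fn of this.listeners) fn(envelope);
  }

  subscribe(onEnvelope: (env: PatchEnvelope) => void): () => void {
    this.listeners.add(onEnvelope);
    return () => this.listeners.delete(onEnvelope);
  }
}
```

Sharing one such store between two sessions makes merge/concurrency tests synchronous and deterministic, which is the main reason to keep the core transport-agnostic.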