From a2407e3de1147ed5c534e5bc23e4b8ae2a5a5b0f Mon Sep 17 00:00:00 2001 From: Aaron D Goldman Date: Thu, 8 Jun 2023 14:21:08 -0700 Subject: [PATCH 1/7] fix: Sha256Sum rename and short eventId format --- CIPs/cip-124.md | 179 ++++++++++++++++++++--------------------- tables/streamtypes.csv | 2 +- 2 files changed, 90 insertions(+), 91 deletions(-) diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index a68c7bc..19f3bcb 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -6,7 +6,7 @@ discussions-to: https://forum.ceramic.network/t/cip-124-recon-tip-synchronizatio status: Draft category: Networking created: 2023-01-18 -edited: 2023-06-05 +edited: 2023-06-08 --- ## Simple Summary @@ -24,7 +24,7 @@ Stream sets are bundles of streams that can be gossiped about as a group or in s Currently nodes broadcast updates to streams to every node in the network using a single libp2p pubsub topic. This incurs a lot of work on all nodes to process messages that they don’t necessarily care about. It also means that the throughput of the network is limited by the bandwidth, leading either to prioritizing high bandwidth nodes or greatly limiting the network throughput to support low bandwidth nodes. Furthermore, if a node missed the broadcast, it would not detect the missing stream events unless it hears a later update or uses some out of band synchronization protocol like "historical data sync" in ComposeDB that scans the Ethereum blockchain for anchor transactions. -Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. 
A secondary goal of stream sets is to give a structure for sharding a stream set across mulitple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. +Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges. @@ -46,51 +46,39 @@ sort-value = kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr An eventId is a special type of [StreamId](https://cips.ceramic.network/CIPs/cip-59) that encodes information about an event and its order in a stream set. It is constructed as follows, -```xml -eventId = concatBytes( - varint(0xce), - varint(0x71), - eventIdBytes -) -``` - -Where `0xce` is the streamid multicode, `0x71` is the streamtype code for an eventId (note that this is also the multicode for dag-cbor). 
The `eventIdBytes` are created as follows, - ```js -eventIdBytes = EncodeDagCbor([ // ordered list not dict - concatBytes( - varint(networkId), - last16Bytes(sort-value), - last16Bytes(hash-sha256(stream-controller-did)), - last8Bytes(init-event-cid) - ), - prevTimestamp, - eventHeigh, - eventCID -]) +concatBytes( + varint(0xce), // streamid varint + varint(0x30), // multicodec varint + varint(network_id), // network_id varint + last8bytes(sha256(sort_value)), // separator [u8; 8] + last8bytes(sha256(controller)), // controller [u8; 8] + last4bytes(init_event_cid_bytes), // StreamID [u8; 4] + cbor(event_height), // event_height cbor unsigned int + event_cid_bytes, // [u8] +) ``` Where: - -* `networkId` is a number as defined in the [networkIds table](../tables/networkIds.csv) - -* `sort-value` is based on a user provided value for *sort-key* and *sort-value* -* `stream-controller-did` is the controller DID of the stream this event belongs to -* `init-event-cid` is the CID of the first, e.g. InitEvent, of the stream this event belongs to -* `prevTimestamp` is the unix timestamp in most recent TimeEvent that came before this event. Note that for InitEvents this value is 0, and for TimeEvents this value is the timestamp of the TimeEvent that came before it. -* `eventHeight` is the "height" of the event since the most recent TimeEvent. For InitEvents and TimeEvents this value is 0. -* `eventCID` the CID of the event itself -* `last16Bytes` and `last8Bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter +* `0xce` is the streamid multicode +* `0x30` is the streamtype code for an eventId (note that this is also the multicode for multicodec). 
+* `network_id` is a number as defined in the [networkIds table](../tables/networkIds.csv) +* `sort_value` is based on a user provided value for *sort-key* and *sort-value* +* `controller` is the controller DID of the stream this event belongs to +* `init_event_cid_bytes` is the CID of the first, e.g. InitEvent, of the stream this event belongs to +* `event_height` is the "height" of the event InitEvent. For InitEvents this value is 0 +* `event_cid_bytes` the CID of the event itself +* `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter ### Recon Message The Recon protocol uses a binary string as a message for communication. This message is constructed in the following way, ``` -(EventId (Ahash EventId)+ )+ +(EventId (Ahash EventId)+ ) ``` -Every recon message starts and ends with an eventId and in between every eventId there is an ahash (see [Appendix A](#appendix-a-associative-fash-function)) of all of the eventIds in-between. For efficiency the ahash can be represented using a b-tree under the hood (see [Appendix B](#appendix-b-btree-b-hash-trees))), but this is not a strict requirement. The message can be a binary string because both eventIds and ahash use multicodes so the parser can know when the end of any particular eventId or ahash has been reached. +Every recon message starts and ends with an eventId and in between every eventId there is an ahash (see [Appendix A](#appendix-a-associative-fash-function)) of all of the eventIds in-between. For efficiency the ahash can be represented using a b-tree under the hood (see [Appendix B](#appendix-b-btree-b-hash-trees)), but this is not a strict requirement. The message can be a binary string because both eventIds and ahash use multicodes so the parser can know when the end of any particular eventId or ahash has been reached. ### Stream Set Ranges @@ -99,22 +87,19 @@ With the definition of eventIds above we get an absolute ordering of events. 
We For example, to construct the range of all streams defined by the *Model* `kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr`, we would construct the start and stop eventIds as follows: ```js -startEventIdBytes = EncodeDagCbor([ - concatBytes( - 0x00, // mainnet - last16Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), - last16Bytes(repeat16(0x00)), // stream controller DID - last8Bytes(repeat8(0x00)) // streamid - ) -]) -endEventIdBytes = EncodeDagCbor([ - concatBytes( - 0x00, // mainnet - last16Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), - last16Bytes(repeat16(0xff)), // stream controller DID - last8Bytes(repeat8(0xff)) // streamid - ) -]) +start = eventId( + network_id = 0x00, // mainnet + sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + controller = last8Bytes(repeat8(0x00)), // stream controller DID + init_event = last4Bytes(repeat4(0x00)) // streamid +) + +stop = eventId( + network_id = 0x00, // mainnet + sort_value = last8Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), + controller = last8Bytes(repeat8(0xff)), // stream controller DID + init_event = last4Bytes(repeat4(0xff)) // streamid +) ``` Given this it should be simple to see how we could split the range into subsets as well. 
@@ -122,24 +107,19 @@ Given this it should be simple to see how we could split the range into subsets If you want to subscribe only to a specific stream within a *Model* you can use the following structure for your start and stop eventId: ```js -startEventIdBytes = EncodeDagCbor([ - concatBytes( - 0x00, // mainnet - last16Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), - last16Bytes(hash-sha256(stream-controller-did)), - last8Bytes(init-event-cid) - ), - 0 // the begining of time -]) -endEventIdBytes = EncodeDagCbor([ - concatBytes( - 0x00, // mainnet - last16Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), - last16Bytes(hash-sha256(stream-controller-did)), - last8Bytes(init-event-cid) - ), - 2^32-1 // Sun Feb 07 2106 06:28:15 GMT+0000 (hopefully not actually the end of time, you can specify a larger number if desired) -]) +start = eventId( + network_id = 0x00, // mainnet + sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID + init_event = last4Bytes(repeat4(init-event-cid)) // streamid +) + +end = eventId( + network_id = 0x00, // mainnet + sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID + init_event = last4Bytes(repeat4(init-event-cid)) + 1 // streamid +) ``` ### Interactive Sync Algorithm @@ -209,7 +189,7 @@ they also merge to `"ape"`, `"hog"` nothing to send we are in sync ``` -> (ape, h(bee,cat,doe,eel,fox,gnu), hog) - <- (ape, h(bee,cat,doe,eel,fox,gnu), hog)![ring](../assets/cip-124/ring.png) + <- (ape, h(bee,cat,doe,eel,fox,gnu), hog) ``` #### Random Peer Sync Order @@ -237,8 +217,8 @@ We can announce and discover peers in the same network by only including the net ``` eventId = concatBytes( varint(0xce), - varint(0x71), - EncodeDagCbor([ 
varint(networkId) ]) + varint(0x30), + varint(networkId) ) ``` @@ -250,12 +230,8 @@ We can announce and discover peers in the same network and that are using the sa eventId = concatBytes( varint(0xce), varint(0x71), - EncodeDagCbor([ - concatBytes( - varint(networkId), - last16Bytes(sort-value) - ) - ]) + varint(networkId), + last16Bytes(sha256(sort-value)) ) ``` @@ -356,11 +332,11 @@ WIP implementation in [rust-ceramic](https://github.com/3box/rust-ceramic/) Recon works well when you want to synchronize event that are contiguous in the key ordering. This opens up for the possibility of an attacker creating junk events where each of them have a separate controller DID to uniformly spread the events across a given stream set. This could significantly slow down the synchronization process, since if we want to exclude spam there would be many holes in the ranges. A possible mitigation for this could be to add more sophisticated forms of access control where only certain DIDs are allowed to write to certain ranges. Further options should be explored as well. -The associative hash functions are only secure if the node is asked to produce the eventIds that hash to the SHAs that make up the SumOfShas associative hash. +The associative hash functions are only secure if the node is asked to produce the eventIds that hash to the SHAs that make up the Sha256Sum associative hash. If a node is allowed to claim a hash is in the set but not show the string it is a Sha256 of it could trivially turn any Sha256Sum into any other. It's important that a node that receives a new eventId over recon synchronizes the data of this event and validates it before it relays this eventId to other peers. 
Otherwise invalid eventIds might be relayed -## Appendix A: Associative Hash Function (SumOfSha256s) +## Appendix A: Associative Hash Function (Sha256Sum) An associative hash function can simply be defined as a hash function that is associative: @@ -371,24 +347,26 @@ h(h(a, b), c) = h(a, h(b, c)) In Recon this is useful because we often want to compute the hash of all events between two given events. If we were to use a normal cryptographic hash function, it would be rather expensive to recompute this hash every time we split a range. It would also be a big overhead to keep a list of pre-computed hashes. Instead we can use an associative hash and store our events in a tree based on this "ahash". If we need to further split a range we simply recurse into the given sub-tree and join the ahashes we need from that level. We can compute the associative hash by traversing the depth of our tree only twice, e.g. `2*log_b(n)`, where b is the fanout and n is the number of eventIds. -### SumOfSha256s +### Sha256Sum -In particular we define an associative hash function "SumOfSha256s" as simply the sum of sha2-256 hashes over leaf elements: +In particular we define an associative hash function "Sha256Sum" as simply the sum of sha2-256 hashes over leaf elements: -``` -sum(sha256(eventId) for eventIds in stream set) +```py +sum(sha256(eventId) for eventIds in stream_set) ``` -We also register the multihash code `0xXX` to be able to represent SumOfSha256s as a multihash. +We also register the multihash code `0xce12` to be able to represent Sha256Sum as a multihash. #### ahash -To get the ahash of a set we start by using SHA256 two convert the set elements to 32 byte hashs. +To get the ahash of a set we start by using SHA256 two convert the set elements to 32 byte hashes. Next each of the 32 byte hashes are treated as an array of 8 unsigned little endian 32 bit integers. -To add to ahashs we use piecewise addition with all the additions here mod `2^32`. 
`C = A + B` +To add to Sha256Sum hashes we use piecewise addition with all the additions here mod `2^32`. -``` +`C = A + B` + +```js c[0] = a[0] + b[0]; c[1] = a[1] + b[1]; c[2] = a[2] + b[2]; c[3] = a[3] + b[3]; c[4] = a[4] + b[4]; c[5] = a[5] + b[5]; c[6] = a[6] + b[6]; c[7] = a[7] + b[7]; ``` @@ -399,8 +377,29 @@ If you have a large set distributed across many nodes the hasher can hash and ad then send the hashes to one node for final combination. Little endian 32 bit integers are chosen since x86 and arm CPUs use little endian unsigned integers. -u32 was chosen since it will fit in a JS number and can be calculated in js without on reliance. -big number libraries. +u32 was chosen since it will fit in a JS number and can be calculated in js without reliance on big number libraries. + +By storing the Sha256Sum in a SQL table we can add Sha256Sum with the built in SQL sum. + +```sql +CREATE TABLE data ( + key TEXT, + h0 INTEGER, h1 INTEGER, h2 INTEGER, h3 INTEGER, + h4 INTEGER, h5 INTEGER, h6 INTEGER, h7 INTEGER +); +``` + +```sql +INSERT INTO data (key, h0, h1, h2, h3, h4, h5, h6, h7) +VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?); +``` + +```sql +SELECT sum(h0) & 0xFFFFFFFF, sum(h1) & 0xFFFFFFFF, sum(h2) & 0xFFFFFFFF, sum(h3) & 0xFFFFFFFF, + sum(h4) & 0xFFFFFFFF, sum(h5) & 0xFFFFFFFF, sum(h6) & 0xFFFFFFFF, sum(h7) & 0xFFFFFFFF +FROM data +WHERE key > 'k' AND key < 'l'; +``` Treating the hash as u8 was rejected since it is less performant and using the xor as the associative add was rejected since having a value twice will look the same as not havening that element at all. 
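The Sha256Sum addition described in Appendix A above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the helper names (`sha256_words`, `add_words`, `ahash`, `xor_words`) are hypothetical, treating the all-zero value as the hash of an empty set is an assumption, and multihash encoding is left out. It reads each SHA-256 digest as 8 little-endian u32 words, adds them lane by lane mod `2^32`, and shows why xor was rejected as the combining operation.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # every lane addition is taken mod 2^32
ZERO = (0,) * 8    # assumption: the sum over an empty set is all zeros


def sha256_words(data: bytes) -> tuple:
    """SHA-256 of `data`, read as 8 unsigned little-endian 32-bit integers."""
    return struct.unpack("<8I", hashlib.sha256(data).digest())


def add_words(a: tuple, b: tuple) -> tuple:
    """C = A + B: piecewise addition of two 8-word hashes, each lane mod 2^32."""
    return tuple((x + y) & MASK for x, y in zip(a, b))


def ahash(items) -> tuple:
    """Sum of the SHA-256 word arrays of every item in the iterable."""
    total = ZERO
    for item in items:
        total = add_words(total, sha256_words(item))
    return total


def xor_words(a: tuple, b: tuple) -> tuple:
    """The rejected alternative: piecewise xor instead of addition."""
    return tuple(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    keys = [b"bee", b"cat", b"doe", b"eel", b"fox", b"gnu"]
    left, right = keys[:3], keys[3:]
    # Partial sums computed separately combine into the sum of the whole range.
    print(add_words(ahash(left), ahash(right)) == ahash(keys))
    # With xor, inserting the same element twice is indistinguishable from never inserting it.
    h = sha256_words(b"ape")
    print(xor_words(xor_words(ZERO, h), h) == ZERO)
```

Because the addition is associative and commutative, the partial sums for `left` and `right` could be computed on different nodes and merged later, which is the property the appendix relies on when splitting ranges.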
diff --git a/tables/streamtypes.csv b/tables/streamtypes.csv index d13bf61..5f7ca1f 100644 --- a/tables/streamtypes.csv +++ b/tables/streamtypes.csv @@ -4,4 +4,4 @@ CAIP-10 Link, 0x01, Link blockchain accounts to DIDs, Model, 0x02, Defines a schema shared by group of documents in ComposeDB Model Instance Document, 0x03, Represents a json document in ComposeDB, UNLOADABLE, 0x04, A stream that is not meant to be loaded, -EventId, 0x71, An event id encoded as a dag-cbor list of attributes, https://cips.ceramic.network/CIPs/cip-124 +EventId, 0x30, An event id encoded as a cip-124 EventID, https://cips.ceramic.network/CIPs/cip-124 From f1c9584b32aa74ed529dce11a81c08c023725937 Mon Sep 17 00:00:00 2001 From: Aaron D Goldman Date: Fri, 9 Jun 2023 10:50:16 -0700 Subject: [PATCH 2/7] Fix: EventId updated to 0x05 --- CIPs/cip-124.md | 12 ++++++------ tables/streamtypes.csv | 8 ++++---- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index 19f3bcb..fc69fb7 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -49,7 +49,7 @@ An eventId is a special type of [StreamId](https://cips.ceramic.network/CIPs/cip ```js concatBytes( varint(0xce), // streamid varint - varint(0x30), // multicodec varint + varint(0x05), // cip-124 EventID varint varint(network_id), // network_id varint last8bytes(sha256(sort_value)), // separator [u8; 8] last8bytes(sha256(controller)), // controller [u8; 8] @@ -60,13 +60,13 @@ concatBytes( ``` Where: -* `0xce` is the streamid multicode -* `0x30` is the streamtype code for an eventId (note that this is also the multicode for multicodec). +* `0xce` is the streamid multicode as defined in the [multicodec table](https://github.com/multiformats/multicodec/blob/master/table.csv). +* `0x05` is the streamtype code for an cip-124 EventID as defined in the [streamTypes table](../tables/streamtypes.csv). 
* `network_id` is a number as defined in the [networkIds table](../tables/networkIds.csv) * `sort_value` is based on a user provided value for *sort-key* and *sort-value* * `controller` is the controller DID of the stream this event belongs to -* `init_event_cid_bytes` is the CID of the first, e.g. InitEvent, of the stream this event belongs to -* `event_height` is the "height" of the event InitEvent. For InitEvents this value is 0 +* `init_event_cid_bytes` is the CID of the first Event of the this stream. +* `event_height` is the "height" of the event InitEvent. For InitEvents this value is `0` else `prev.event_height + 1`. * `event_cid_bytes` the CID of the event itself * `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter @@ -217,7 +217,7 @@ We can announce and discover peers in the same network by only including the net ``` eventId = concatBytes( varint(0xce), - varint(0x30), + varint(0x05), varint(networkId) ) ``` diff --git a/tables/streamtypes.csv b/tables/streamtypes.csv index 5f7ca1f..a2ca23b 100644 --- a/tables/streamtypes.csv +++ b/tables/streamtypes.csv @@ -1,7 +1,7 @@ name, code, description, specification Tile, 0x00, A stream type representing a json document, https://cips.ceramic.network/CIPs/cip-8 CAIP-10 Link, 0x01, Link blockchain accounts to DIDs, https://cips.ceramic.network/CIPs/cip-7 -Model, 0x02, Defines a schema shared by group of documents in ComposeDB -Model Instance Document, 0x03, Represents a json document in ComposeDB, -UNLOADABLE, 0x04, A stream that is not meant to be loaded, -EventId, 0x30, An event id encoded as a cip-124 EventID, https://cips.ceramic.network/CIPs/cip-124 +Model, 0x02, Defines a schema shared by group of documents in ComposeDB, https://github.com/ceramicnetwork/js-ceramic/tree/main/packages/stream-model +Model Instance Document, 0x03, Represents a json document in ComposeDB, 
https://github.com/ceramicnetwork/js-ceramic/tree/main/packages/stream-model-instance +UNLOADABLE, 0x04, A stream that is not meant to be loaded, https://github.com/ceramicnetwork/js-ceramic/blob/main/packages/stream-model/src/model.ts#L163-L165 +EventId, 0x05, An event id encoded as a cip-124 EventID, https://cips.ceramic.network/CIPs/cip-124 From 1c50385980d418c75cf2283ce703a90128a237f3 Mon Sep 17 00:00:00 2001 From: Aaron D Goldman Date: Tue, 13 Jun 2023 09:18:19 -0700 Subject: [PATCH 3/7] fix: rename Sha256sum to Sha256a --- CIPs/cip-124.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index fc69fb7..5aad263 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -332,11 +332,11 @@ WIP implementation in [rust-ceramic](https://github.com/3box/rust-ceramic/) Recon works well when you want to synchronize event that are contiguous in the key ordering. This opens up for the possibility of an attacker creating junk events where each of them have a separate controller DID to uniformly spread the events across a given stream set. This could significantly slow down the synchronization process, since if we want to exclude spam there would be many holes in the ranges. A possible mitigation for this could be to add more sophisticated forms of access control where only certain DIDs are allowed to write to certain ranges. Further options should be explored as well. -The associative hash functions are only secure if the node is asked to produce the eventIds that hash to the SHAs that make up the Sha256Sum associative hash. If a node is allowed to claim a hash is in the set but not show the string it is a Sha256 of it could trivially turn any Sha256Sum into any other. +The associative hash functions are only secure if the node is asked to produce the eventIds that hash to the SHAs that make up the Sha256a associative hash. 
If a node is allowed to claim a hash is in the set but not show the string it is a Sha256 of it could trivially turn any Sha256a into any other. It's important that a node that receives a new eventId over recon synchronizes the data of this event and validates it before it relays this eventId to other peers. Otherwise invalid eventIds might be relayed -## Appendix A: Associative Hash Function (Sha256Sum) +## Appendix A: Associative Hash Function (Sha256a) An associative hash function can simply be defined as a hash function that is associative: @@ -347,22 +347,22 @@ h(h(a, b), c) = h(a, h(b, c)) In Recon this is useful because we often want to compute the hash of all events between two given events. If we were to use a normal cryptographic hash function, it would be rather expensive to recompute this hash every time we split a range. It would also be a big overhead to keep a list of pre-computed hashes. Instead we can use an associative hash and store our events in a tree based on this "ahash". If we need to further split a range we simply recurse into the given sub-tree and join the ahashes we need from that level. We can compute the associative hash by traversing the depth of our tree only twice, e.g. `2*log_b(n)`, where b is the fanout and n is the number of eventIds. -### Sha256Sum +### Sha256a -In particular we define an associative hash function "Sha256Sum" as simply the sum of sha2-256 hashes over leaf elements: +In particular we define an associative hash function "Sha256a" as simply the sum of sha2-256 hashes over leaf elements: ```py sum(sha256(eventId) for eventIds in stream_set) ``` -We also register the multihash code `0xce12` to be able to represent Sha256Sum as a multihash. +We also register the multihash code `0xce12` to be able to represent Sha256a as a multihash. #### ahash To get the ahash of a set we start by using SHA256 two convert the set elements to 32 byte hashes. 
Next each of the 32 byte hashes are treated as an array of 8 unsigned little endian 32 bit integers. -To add to Sha256Sum hashes we use piecewise addition with all the additions here mod `2^32`. +To add to Sha256a hashes we use piecewise addition with all the additions here mod `2^32`. `C = A + B` @@ -379,7 +379,7 @@ then send the hashes to one node for final combination. Little endian 32 bit integers are chosen since x86 and arm CPUs use little endian unsigned integers. u32 was chosen since it will fit in a JS number and can be calculated in js without reliance on big number libraries. -By storing the Sha256Sum in a SQL table we can add Sha256Sum with the built in SQL sum. +By storing the Sha256a in a SQL table we can add Sha256a with the built in SQL sum. ```sql CREATE TABLE data ( From 706f89baa7cc24a8145275f2045dd8cef9215c25 Mon Sep 17 00:00:00 2001 From: Aaron D Goldman Date: Wed, 19 Jul 2023 15:26:20 -0700 Subject: [PATCH 4/7] feat: Event height detail for parsing out the CID --- CIPs/cip-124.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index 3e87f57..5529130 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -55,7 +55,7 @@ concatBytes( last8bytes(sha256(controller)), // controller [u8; 8] last4bytes(init_event_cid_bytes), // StreamID [u8; 4] cbor(event_height), // event_height cbor unsigned int - event_cid_bytes, // [u8] + event_cid_bytes, // [u8] a CID or the 0 byte to indicate a fencepost ) ``` @@ -66,10 +66,28 @@ Where: * `sort_value` is based on a user provided value for *sort-key* and *sort-value* * `controller` is the controller DID of the stream this event belongs to * `init_event_cid_bytes` is the CID of the first Event of the this stream. -* `event_height` is the "height" of the event InitEvent. For InitEvents this value is `0` else `prev.event_height + 1`. -* `event_cid_bytes` the CID of the event itself +* `event_height` is the "height" of the event InitEvent. 
For InitEvents this value is `1` else `prev.event_height + 1`. +* `event_cid_bytes` the CID of the event itself or the 0 byte for a fencepost as it doesn't reference an event. * `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter +Event height [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1-2.1) + * 0 - 23 + * 0xXX + * the literal byte + * 24 - 255 + * 0x18XX + * the 24 byte then the u8 + * 256 - 65,535 + * 0x19XXXX + * the 25 byte then the u16 + * 65,536 - 4,294,967,295 + * 0x1aXXXXXXXX + * the 26 byte then the u32 + * 4,294,967,296 - 18,446,744,073,709,551,615 + * 0x1bXXXXXXXXXXXXXXXX + * the 27 byte then the u64 + + ### Recon Message The Recon protocol uses a binary string as a message for communication. This message is constructed in the following way, From cfc293a8c67e6b65b1ed1a2d134ec5da9c458627 Mon Sep 17 00:00:00 2001 From: Aaron D Goldman Date: Wed, 19 Jul 2023 15:42:19 -0700 Subject: [PATCH 5/7] fix: fmt --- CIPs/cip-124.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index 944b4f8..35ddd27 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -8,6 +8,8 @@ category: Networking created: 2023-01-18 edited: 2023-07-19 --- + + ## Simple Summary @@ -24,9 +26,10 @@ Stream sets are bundles of streams that can be gossiped about as a group or in s Currently nodes broadcast updates to streams to every node in the network using a single libp2p pubsub topic. This incurs a lot of work on all nodes to process messages that they don’t necessarily care about. It also means that the throughput of the network is limited by the bandwidth, leading either to prioritizing high bandwidth nodes or greatly limiting the network throughput to support low bandwidth nodes. 
Furthermore, if a node missed the broadcast, it would not detect the missing stream events unless it hears a later update or uses some out of band synchronization protocol like "historical data sync" in ComposeDB that scans the Ethereum blockchain for anchor transactions. -Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. +Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. + +Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. 
Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges.
-Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges.

 ## Specification

@@ -67,7 +70,7 @@ Where:
 * `controller` is the controller DID of the stream this event belongs to
 * `init_event_cid_bytes` is the CID of the first Event of the this stream.
 * `event_height` is the "height" of the event InitEvent. For InitEvents this value is `1` else `prev.event_height + 1`.
-* `event_cid_bytes` the CID of the event itself or the 0 byte for a fencepost as it doesn't reference an event.
+* `event_cid_bytes` the CID of the event itself or the 0 byte for a fencepost as it doesn't reference an event.
 * `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter

 Event height [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1-2.1)
@@ -99,7 +102,7 @@ Every recon message starts and ends with an eventId and in between every eventId

 ### Stream Set Ranges

-With the definition of eventIds above we get an absolute ordering of events. We can now define subsets of the total range of all eventIds by defining a start and a stop eventId.
+With the definition of eventIds above we get an absolute ordering of events. We can now define subsets of the total range of all eventIds by defining a start and a stop eventId.

 For example, to construct the range of all streams defined by the *Model* `kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr`, we would construct the start and stop eventIds as follows:

@@ -252,6 +255,7 @@ eventId = concatBytes(
 )
 ```

+
 ## Rationale

@@ -330,6 +334,7 @@ We could change LibP2P PubSub to only send the events that a node cares about to

 This approach was rejected because it does not solve the missed messages problem.

+
 ## Backwards Compatibility

@@ -353,6 +358,7 @@ The associative hash functions are only secure if the node is asked to produce t

 It's important that a node that receives a new eventId over recon synchronizes the data of this event and validates it before it relays this eventId to other peers. Otherwise invalid eventIds might be relayed

+
 ## Appendix A: Associative Hash Function (Sha256a)

 An associative hash function can simply be defined as a hash function that is associative:
@@ -428,7 +434,6 @@ A b-tree with fanout 2:

 ![fanout2](../assets/cip-124/b_hash_tree_2.png)

-
 ## Appendix B: B#tree (B hash trees) e.g. [MST](https://hal.inria.fr/hal-02303490/document) / [Prolly Trees](https://docs.dolthub.com/architecture/storage-engine/prolly-tree)

From 06d05b43d1b2ae50a46f38e3a34a297e055f2cd0 Mon Sep 17 00:00:00 2001
From: Aaron D Goldman
Date: Mon, 31 Jul 2023 09:36:31 -0700
Subject: [PATCH 6/7] fix: add fencepost decode should return None

---
 CIPs/cip-124.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md
index 35ddd27..5046fd0 100644
--- a/CIPs/cip-124.md
+++ b/CIPs/cip-124.md
@@ -6,7 +6,7 @@ discussions-to: https://forum.ceramic.network/t/cip-124-recon-tip-synchronizatio
 status: Draft
 category: Networking
 created: 2023-01-18
-edited: 2023-07-19
+edited: 2023-07-31
 ---

@@ -58,7 +58,7 @@ concatBytes(
   last8bytes(sha256(controller)), // controller [u8; 8]
   last4bytes(init_event_cid_bytes), // StreamID [u8; 4]
   cbor(event_height), // event_height cbor unsigned int
-  event_cid_bytes, // [u8] a CID or the 0 byte to indicate a fencepost
+  event_cid_bytes, // [u8] a CID or the (0x00 or 0xFF) byte to indicate a fencepost
 )
 ```

@@ -69,8 +69,8 @@ Where:
 * `sort_value` is based on a user provided value for *sort-key* and *sort-value*
 * `controller` is the controller DID of the stream this event belongs to
 * `init_event_cid_bytes` is the CID of the first Event of the this stream.
-* `event_height` is the "height" of the event InitEvent. For InitEvents this value is `1` else `prev.event_height + 1`.
-* `event_cid_bytes` the CID of the event itself or the 0 byte for a fencepost as it doesn't reference an event.
+* `event_height` is the "height" of the event InitEvent. For InitEvents this value is `0` else `prev.event_height + 1`.
+* `event_cid_bytes` the CID of the event itself or the (0x00 or 0xFF) byte for a fencepost as it doesn't reference an event.
 * `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter

 Event height [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1-2.1)
@@ -90,6 +90,8 @@ Event height [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html
 * 0x1bXXXXXXXXXXXXXXXX
   * the 27 byte then the u64

+When decoding if you reach an invalid value stop decoding and return a None value. This is not an EventID it is a fencepost.
+
 ### Recon Message

 The Recon protocol uses a binary string as a message for communication. This message is constructed in the following way,

From d3d47af2854f7abd306bf7145b4d243dd16be3bf Mon Sep 17 00:00:00 2001
From: Aaron D Goldman
Date: Wed, 2 Aug 2023 15:06:27 -0700
Subject: [PATCH 7/7] feat: add sort key to the separator

---
 CIPs/cip-124.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md
index 5046fd0..6daf002 100644
--- a/CIPs/cip-124.md
+++ b/CIPs/cip-124.md
@@ -6,7 +6,7 @@ discussions-to: https://forum.ceramic.network/t/cip-124-recon-tip-synchronizatio
 status: Draft
 category: Networking
 created: 2023-01-18
-edited: 2023-07-31
+edited: 2023-08-02
 ---

@@ -54,7 +54,7 @@ concatBytes(
   varint(0xce), // streamid varint
   varint(0x05), // cip-124 EventID varint
   varint(network_id), // network_id varint
-  last8bytes(sha256(sort_value)), // separator [u8; 8]
+  last8bytes(sha256(sort_key + "|" + sort_value)), // separator [u8; 8]
   last8bytes(sha256(controller)), // controller [u8; 8]
   last4bytes(init_event_cid_bytes), // StreamID [u8; 4]
   cbor(event_height), // event_height cbor unsigned int
@@ -111,14 +111,14 @@ For example, to construct the range of all streams defined by the *Model* `kjzl6
 ```js
 start = eventId(
   network_id = 0x00, // mainnet
-  sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
+  sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
   controller = last8Bytes(repeat8(0x00)), // stream controller DID
   init_event = last4Bytes(repeat4(0x00)) // streamid
 )

 stop = eventId(
   network_id = 0x00, // mainnet
-  sort_value = last8Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr),
+  sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
   controller = last8Bytes(repeat8(0xff)), // stream controller DID
   init_event = last4Bytes(repeat4(0xff)) // streamid
 )
@@ -131,14 +131,14 @@ If you want to subscribe only to a specific stream within a *Model* you can use
 ```js
 start = eventId(
   network_id = 0x00, // mainnet
-  sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
+  sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
   controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID
   init_event = last4Bytes(repeat4(init-event-cid)) // streamid
 )

 end = eventId(
   network_id = 0x00, // mainnet
-  sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
+  sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
   controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID
   init_event = last4Bytes(repeat4(init-event-cid)) + 1 // streamid
 )