diff --git a/CIPs/cip-124.md b/CIPs/cip-124.md index 3e87f57..6daf002 100644 --- a/CIPs/cip-124.md +++ b/CIPs/cip-124.md @@ -6,8 +6,10 @@ discussions-to: https://forum.ceramic.network/t/cip-124-recon-tip-synchronizatio status: Draft category: Networking created: 2023-01-18 -edited: 2023-06-23 +edited: 2023-08-02 --- + + ## Simple Summary @@ -24,9 +26,10 @@ Stream sets are bundles of streams that can be gossiped about as a group or in s Currently nodes broadcast updates to streams to every node in the network using a single libp2p pubsub topic. This incurs a lot of work on all nodes to process messages that they don’t necessarily care about. It also means that the throughput of the network is limited by the bandwidth, leading either to prioritizing high bandwidth nodes or greatly limiting the network throughput to support low bandwidth nodes. Furthermore, if a node missed the broadcast, it would not detect the missing stream events unless it hears a later update or uses some out of band synchronization protocol like "historical data sync" in ComposeDB that scans the Ethereum blockchain for anchor transactions. -Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. 
+Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes. + +Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges. -Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges. 
## Specification @@ -51,11 +54,11 @@ concatBytes( varint(0xce), // streamid varint varint(0x05), // cip-124 EventID varint varint(network_id), // network_id varint - last8bytes(sha256(sort_value)), // separator [u8; 8] + last8bytes(sha256(sort_key + "|" + sort_value)), // separator [u8; 8] last8bytes(sha256(controller)), // controller [u8; 8] last4bytes(init_event_cid_bytes), // StreamID [u8; 4] cbor(event_height), // event_height cbor unsigned int - event_cid_bytes, // [u8] + event_cid_bytes, // [u8] a CID or the (0x00 or 0xFF) byte to indicate a fencepost ) ``` @@ -67,9 +70,28 @@ Where: * `controller` is the controller DID of the stream this event belongs to * `init_event_cid_bytes` is the CID of the first Event of the this stream. * `event_height` is the "height" of the event InitEvent. For InitEvents this value is `0` else `prev.event_height + 1`. -* `event_cid_bytes` the CID of the event itself +* `event_cid_bytes` is the CID of the event itself, or the 0x00 or 0xFF byte for a fencepost, which does not reference an event. * `last8bytes` and `last4bytes` takes the last N bytes of the input and prepends with zeros if the input is shorter +Event height is encoded as a [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1-2.1): + * 0 - 23 + * 0xXX + * the value itself as a single byte + * 24 - 255 + * 0x18XX + * the byte 0x18 (24), then the value as a u8 + * 256 - 65,535 + * 0x19XXXX + * the byte 0x19 (25), then the value as a u16 + * 65,536 - 4,294,967,295 + * 0x1aXXXXXXXX + * the byte 0x1a (26), then the value as a u32 + * 4,294,967,296 - 18,446,744,073,709,551,615 + * 0x1bXXXXXXXXXXXXXXXX + * the byte 0x1b (27), then the value as a u64 + +When decoding, if you reach an invalid value, stop decoding and return a None value: the input is not an EventID but a fencepost. + ### Recon Message The Recon protocol uses a binary string as a message for communication. 
This message is constructed in the following way, @@ -82,21 +104,21 @@ Every recon message starts and ends with an eventId and in between every eventId ### Stream Set Ranges -With the definition of eventIds above we get an absolute ordering of events. We can now define subsets of the total range of all eventIds by defining a start and a stop eventId. +With the definition of eventIds above we get an absolute ordering of events. We can now define subsets of the total range of all eventIds by defining a start and a stop eventId. For example, to construct the range of all streams defined by the *Model* `kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr`, we would construct the start and stop eventIds as follows: ```js start = eventId( network_id = 0x00, // mainnet - sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), controller = last8Bytes(repeat8(0x00)), // stream controller DID init_event = last4Bytes(repeat4(0x00)) // streamid ) stop = eventId( network_id = 0x00, // mainnet - sort_value = last8Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr), + sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), controller = last8Bytes(repeat8(0xff)), // stream controller DID init_event = last4Bytes(repeat4(0xff)) // streamid ) @@ -109,14 +131,14 @@ If you want to subscribe only to a specific stream within a *Model* you can use ```js start = eventId( network_id = 0x00, // mainnet - sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID init_event = last4Bytes(repeat4(init-event-cid)) // streamid ) end = eventId( 
network_id = 0x00, // mainnet - sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), + sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)), controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID init_event = last4Bytes(repeat4(init-event-cid)) + 1 // streamid ) @@ -235,6 +257,7 @@ eventId = concatBytes( ) ``` + ## Rationale @@ -313,6 +336,7 @@ We could change LibP2P PubSub to only send the events that a node cares about to This approach was rejected because it does not solve the missed messages problem. + ## Backwards Compatibility @@ -336,6 +360,7 @@ The associative hash functions are only secure if the node is asked to produce t It's important that a node that receives a new eventId over recon synchronizes the data of this event and validates it before it relays this eventId to other peers. Otherwise invalid eventIds might be relayed + ## Appendix A: Associative Hash Function (Sha256a) An associative hash function can simply be defined as a hash function that is associative: @@ -411,7 +436,6 @@ A b-tree with fanout 2: ![fanout2](../assets/cip-124/b_hash_tree_2.png) - ## Appendix B: B#tree (B hash trees) e.g. [MST](https://hal.inria.fr/hal-02303490/document) / [Prolly Trees](https://docs.dolthub.com/architecture/storage-engine/prolly-tree)
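Note on the `event_height` hunk above: the CBOR unsigned integer table it adds can be sketched in a few lines. This is a minimal illustration and not part of the patch. The names `encodeEventHeight` and `decodeEventHeight` are hypothetical, `null` stands in for the "None value" the text mentions, and accepting non-shortest encodings on decode is an assumption, since the hunk does not say whether they must be rejected.

```js
// Hypothetical helpers illustrating the event_height encoding; not CIP-124 API.

// Encode a non-negative integer as a CBOR unsigned integer (RFC 8949, major type 0),
// always choosing the shortest form from the table in the hunk.
function encodeEventHeight(height) {
  const n = BigInt(height);
  if (n < 0n || n > 0xffffffffffffffffn) {
    throw new RangeError("event_height must fit in a u64");
  }
  // 0 - 23: the value is its own single byte.
  if (n <= 23n) return Uint8Array.of(Number(n));
  // [initial byte, payload width in bytes, largest value for that width]
  const forms = [
    [0x18, 1, 0xffn],
    [0x19, 2, 0xffffn],
    [0x1a, 4, 0xffffffffn],
    [0x1b, 8, 0xffffffffffffffffn],
  ];
  const [head, width] = forms.find(([, , max]) => n <= max);
  const out = new Uint8Array(1 + width);
  out[0] = head;
  // Fill the payload big-endian, starting from the least significant byte.
  let rest = n;
  for (let i = width; i >= 1; i--) {
    out[i] = Number(rest & 0xffn);
    rest >>= 8n;
  }
  return out;
}

// Decode an event height starting at `offset`.
// Returns { height, length } or null when the bytes are not a CBOR unsigned
// integer (for example the 0xFF fencepost byte, or a truncated payload).
function decodeEventHeight(bytes, offset = 0) {
  if (offset >= bytes.length) return null;
  const head = bytes[offset];
  if (head <= 23) return { height: BigInt(head), length: 1 };
  const width = { 0x18: 1, 0x19: 2, 0x1a: 4, 0x1b: 8 }[head];
  if (width === undefined || offset + 1 + width > bytes.length) return null;
  let height = 0n;
  for (let i = 1; i <= width; i++) {
    height = (height << 8n) | BigInt(bytes[offset + i]);
  }
  return { height, length: 1 + width };
}
```

Heights are handled as `BigInt` so the full u64 range survives the round trip. A caller assembling an EventID would append the bytes from `encodeEventHeight` after the 4-byte StreamID suffix, and a caller parsing one would treat a `null` result as a sign that the input is a fencepost and not an EventID.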