-
Notifications
You must be signed in to change notification settings - Fork 0
hashing_method
Current algorithm of computing hash of value is to serialize that value using CBOR and then compute hash of serialized data. It's deficient for following reasons:
-
It doesn't work for Merkle trees. Root hash of Merkle tree is not hash of serialized tree!
-
CBOR allows several valid encodings of same value. For example lists have two encodings: definite and indefinite arrays. Furthemore is has mechanism of extensions so it's possible that encoding of some value could change in the future and will break our code.
In order to resolve these problems folloing scheme is proposed.
First change is expose fold structure of hash function. For efficiency reasons
it's desirable to have mutable accumulator which requires ST monad. (It's
straightfowrward to generalize to PrimMonad but ST will keep exposition
clearer)
class CryptoHash alg where
type HashState alg :: * -> *
newHashState :: ST s (HashState alg s)
updateHashState :: HashState alg s -> ByteString -> ST s ()
freezeDigest :: HashState alg s -> ST s (Hash alg)Minor efficiency concern is whether we should provide ByteString alternative
for hashing small objects (think Int etc) where overhead of creation and
collections of full ByteString becomes noticeable.
Another type class describes how values should be hashed:
class CryptoHashable a where
hashValue :: CryptoHash alg => HashState alg s -> a -> ST s ()Definition of hash is changed to:
hash :: (CryptoHashable a, CryptoHash alg) => a -> Hash alg
hash a = runST $ do
st <- newHashState
hashValue st a
freezeDigest stHashing is supposed to be done by hashing optional type/constructor tag and every field. For example:
data Foo = Foo Int Char Double
instance CryptoHashable Foo where
hashValue st (Foo a b c) = do
updateHashState st (1 :: Int)
updateHashState st a
updateHashState st b
updateHashState st cSo for records hash is computed by depth first traversal. It's not completely clear how to define hashes for other structures.
- Fixed width integers. Low-endian encoding of value? Also do not provide
instance for
Int&Wordsince thoose are variable width or cast them toInt64&Word64first? - Tuples are just sequence of fields?
- Do we need to add number of fields to hash?
- How should we treat lists/vectors? Structurually? Special sequence tag, length and then fold over fields?
Handling of Merkle trees presents special interest. It's one of motivations of this proposal and it presents its own difficulties. Here is implementation of nonempty binary Merkle tree which will be used for illustration purpose:
data MerkleTree alg a
= Branch (Hash alg) (MerkleTree alg a) (MerkleTree alg a)
| Leaf (Hash alg) aFirst thing to notice is that MerkleTree is parametrized by hash type and
CryptoHashable will work with any hashing algorithm. This means:
-
hash (tree :: MerkleTree alg a)returns hash of hash of root node! but when used in
Currently we sign CBOR serializarion of value. It naturally suffers from similar problems as hashing do. Possible solution is to sign hash which is computed using method above.