Skip to content

hashing_method

Alexey Khudyakov edited this page Oct 14, 2019 · 2 revisions

Current algorithm of computing hash of value is to serialize that value using CBOR and then compute hash of serialized data. It's deficient for following reasons:

  1. It doesn't work for Merkle trees. Root hash of Merkle tree is not hash of serialized tree!

  2. CBOR allows several valid encodings of same value. For example lists have two encodings: definite and indefinite arrays. Furthemore is has mechanism of extensions so it's possible that encoding of some value could change in the future and will break our code.

In order to resolve these problems folloing scheme is proposed.

New hashing

First change is expose fold structure of hash function. For efficiency reasons it's desirable to have mutable accumulator which requires ST monad. (It's straightfowrward to generalize to PrimMonad but ST will keep exposition clearer)

class CryptoHash alg where
  type HashState alg :: * -> *
  newHashState    :: ST s (HashState alg s)
  updateHashState :: HashState alg s -> ByteString -> ST s ()
  freezeDigest    :: HashState alg s -> ST s (Hash alg)

Minor efficiency concern is whether we should provide ByteString alternative for hashing small objects (think Int etc) where overhead of creation and collections of full ByteString becomes noticeable.

Another type class describes how values should be hashed:

class CryptoHashable a where
  hashValue :: CryptoHash alg => HashState alg s -> a -> ST s ()

Definition of hash is changed to:

hash :: (CryptoHashable a, CryptoHash alg) => a -> Hash alg
hash a = runST $ do
  st <- newHashState
  hashValue st a
  freezeDigest st

Instance definition

Hashing is supposed to be done by hashing optional type/constructor tag and every field. For example:

data Foo = Foo Int Char Double

instance CryptoHashable Foo where
  hashValue st (Foo a b c) = do
    updateHashState st (1 :: Int)
	updateHashState st a
	updateHashState st b
	updateHashState st c

So for records hash is computed by depth first traversal. It's not completely clear how to define hashes for other structures.

  • Fixed width integers. Low-endian encoding of value? Also do not provide instance for Int & Word since thoose are variable width or cast them to Int64 & Word64 first?
  • Tuples are just sequence of fields?
  • Do we need to add number of fields to hash?
  • How should we treat lists/vectors? Structurually? Special sequence tag, length and then fold over fields?

Merkle trees

Handling of Merkle trees presents special interest. It's one of motivations of this proposal and it presents its own difficulties. Here is implementation of nonempty binary Merkle tree which will be used for illustration purpose:

data MerkleTree alg a
  = Branch (Hash alg) (MerkleTree alg a) (MerkleTree alg a)
  | Leaf   (Hash alg) a

First thing to notice is that MerkleTree is parametrized by hash type and CryptoHashable will work with any hashing algorithm. This means:

  • hash (tree :: MerkleTree alg a) returns hash of hash of root node! but when used in

Signatures

Currently we sign CBOR serializarion of value. It naturally suffers from similar problems as hashing do. Possible solution is to sign hash which is computed using method above.

Clone this wiki locally