add table-ID consistency check.

# FNV-1a 64-bit on 32-bit MCUs: Principles, Implementation, and Trade-offs

<img width="1024" height="559" alt="Image" src="https://github.com/user-attachments/assets/89c238dc-e8ff-4ebe-b96f-b2f0c2b0307c" />

## Core Conclusion First

The key point of FNV-1a 64-bit is not whether a 32-bit MCU can handle 64-bit integers. The real questions are these:

1. Why this hash works.
2. What each step in the implementation is actually doing.
3. Which scenarios it fits, and which it does not.

Once those three points are clear, the engineering trade-off between 32-bit and 64-bit becomes much easier to discuss, and the conclusion is much closer to how real-world selection decisions are actually made.

The standard FNV-1a update process can be summarized as one fixed loop:

- Start from `offset_basis`
- For each input byte, XOR it into the state first
- Then multiply by a fixed `FNV_Prime`
- Keep the result within a fixed bit width

For the 64-bit version, the standard constants are:

```c
#define FNV1A64_OFFSET_BASIS 0xCBF29CE484222325ULL
#define FNV1A64_PRIME        0x00000100000001B3ULL
```

These two constants were not chosen arbitrarily. The `offset_basis` determines the initial state, and `FNV_Prime` determines how the state is diffused after each newly absorbed byte.

---

## Start with the Principle

### FNV-1a does only two things: absorb input and diffuse state

Each round of FNV-1a performs only two steps:

```text
state = state XOR byte
state = state * FNV_Prime   (mod 2^N)
```

The first step injects the current byte into the state.  
The second step re-mixes the state so that a low-bit change can quickly propagate across a wider bit range.

Together, these two steps define the essential behavior of FNV-1a:

- Input order affects the result
- Every new byte changes all subsequent states
- The full input does not need to be buffered, so streaming updates are natural

That is one reason FNV-1a is well suited to embedded systems. UART streams, network packets, flash block reads, and log parsing are all naturally processed incrementally as data arrives.

### Why it is “XOR first, then multiply by the prime”

The most common confusion in the FNV family is the difference between FNV-1 and FNV-1a.

- FNV-1: multiply first, then XOR
- FNV-1a: XOR first, then multiply

In engineering practice, FNV-1a is used more often. The reason is straightforward: for short data, small messages, and short strings, FNV-1a usually provides better distribution. So for most ordinary use cases, FNV-1a is the more reasonable default.

This matters even more on MCUs, because many real inputs are short by nature, for example:

- command words
- configuration keys
- protocol field names
- file path fragments
- resource tags
- small binary headers

If most inputs are only a few bytes to a few dozen bytes long, then “distribution quality on small inputs” usually matters more than theoretical throughput on large blocks.

### Why `FNV_Prime` has this specific value

The 32-bit version uses this prime:

```text
0x01000193 = 2^24 + 2^8 + 0x93
```

The 64-bit version uses this prime:

```text
0x00000100000001B3 = 2^40 + 2^8 + 0xB3
```

This class of prime has two engineering advantages:

1. Its bit structure was selected specifically for hash quality, not simply because it is “easy to compute.”
2. Because it has clear high-bit and low-bit structure, the multiplication can, on some targets, be decomposed into shifts, a small-constant multiply, and carry propagation.

So an implementation of FNV-1a does not necessarily have to rely on a single native multiply instruction. On resource-constrained targets, the structure of the prime itself provides room for decomposition.

### `offset_basis` is not decorative; it is the initial perturbation

If the state started at `0`, the algorithm would be more prone to predictable degeneracy. FNV-1a uses a fixed nonzero `offset_basis` as the starting state so that the first byte is processed from an already perturbed initial state.

For the 32-bit version, the constant is:

```c
0x811C9DC5
```

For the 64-bit version, the constant is:

```c
0xCBF29CE484222325
```

This guarantees that the same input always yields the same standard result across implementations and platforms.

---

## Understanding the Code through `hash_32a.c`

### Why explain a 64-bit algorithm using 32-bit source code

`hash_32a.c` is not a 64-bit file, but it is an excellent teaching entry point for a simple reason:

- the update path is short
- the key operations are not over-abstracted
- both “direct multiplication” and “shift-add expansion” appear in the same codebase
- it is well suited for explaining the per-byte update mechanism of FNV-1a

In other words, the value of this file is not that it is 32-bit, but that it exposes the algorithm steps very clearly.

### First look at an equivalent simplified core loop

The following example is not a verbatim copy of the original file. It is an equivalent form organized around the structure of `hash_32a.c`, with emphasis on the essential logic:

```c
#include <stdint.h>
#include <stddef.h>

#define FNV1A32_OFFSET_BASIS 0x811C9DC5u
#define FNV1A32_PRIME        0x01000193u

static uint32_t fnv1a32_step(uint32_t state, uint8_t byte)
{
    state ^= (uint32_t)byte;

    /* equivalent to state *= 0x01000193u */
    state += (state << 1)
          +  (state << 4)
          +  (state << 7)
          +  (state << 8)
          +  (state << 24);

    return state;
}

uint32_t fnv1a32_buf(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t state = FNV1A32_OFFSET_BASIS;

    while (len--) {
        state = fnv1a32_step(state, *p++);
    }

    return state;
}
```

This code already exposes the full essence of FNV-1a.

### What the first line is really doing: absorbing the current byte into the state

```c
state ^= (uint32_t)byte;
```

This line only injects the byte; it does not diffuse it.

After this operation, the new byte influences the state, but only locally. If the function returned immediately here, the change would mostly remain concentrated in the lower bits.

So the real point of FNV-1a is not the XOR alone, but the fact that the XOR is immediately followed by a fixed multiplication.

### What the second step is really doing: diffusing the state

The 32-bit prime is `0x01000193`. Expanding it gives:

```text
0x01000193 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 1
```

Therefore:

```text
state * 0x01000193
= state
+ (state << 1)
+ (state << 4)
+ (state << 7)
+ (state << 8)
+ (state << 24)
```

That is the mathematical origin of the shift-add branch in the source code.  
In other words, what may look like a “bit trick optimization” is in fact simply an implementation of multiplication by a fixed constant.

This detail matters, because it shows that FNV-1a never has to be written only as a single multiply instruction. As long as the multiplier is fixed, it can be expanded according to the structure of the constant.

### Why the source keeps both the direct multiply path and the expanded shift-add path

In `hash_32a.c`, there is both a direct multiply-by-`FNV_32_PRIME` path and a shift-add expanded path. That is not duplication. It is an explicit acknowledgement of one fact:

- the algorithm is fixed
- the best implementation path depends on the compiler and hardware

On some platforms, direct multiplication is better.  
On others, the shift-add form may optimize better or make cost easier to reason about.

This is exactly the kind of design that fits MCU engineering, because embedded implementations care not only about whether something works, but also about:

- whether the instruction sequence is stable
- whether the compiler inserts extra helper library calls
- whether the implementation introduces uncontrollable long latency
- whether it is easy to statically analyze and estimate performance

---

## Moving Smoothly from 32-bit Code to a 64-bit Implementation

### The 64-bit version does not change the algorithm

Switching from 32-bit to 64-bit does not change the update steps at all:

```c
static uint64_t fnv1a64_step(uint64_t state, uint8_t byte)
{
    state ^= (uint64_t)byte;
    state *= 0x00000100000001B3ULL;
    return state;
}
```

Only two things change:

1. The state width expands from 32 bits to 64 bits.
2. The constants change from the 32-bit `offset_basis / prime` pair to the 64-bit pair.

So the main benefit of the 64-bit version is not that the algorithm becomes more complex, but that the state space becomes much larger, which lowers the probability of accidental collisions in ordinary engineering use.

### The most direct 64-bit reference implementation

```c
#include <stdint.h>
#include <stddef.h>

#define FNV1A64_OFFSET_BASIS 0xCBF29CE484222325ULL
#define FNV1A64_PRIME        0x00000100000001B3ULL

uint64_t fnv1a64_buf(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint64_t state = FNV1A64_OFFSET_BASIS;

    while (len--) {
        state ^= (uint64_t)(*p++);
        state *= FNV1A64_PRIME;
    }

    return state;
}
```

If the target platform can afford the cost of `uint64_t` addition and multiplication, this is already the clearest, easiest-to-verify, and most portable form.

### A more practical engineering form: an incremental context interface

FNV-1a is naturally suited to an “init, feed chunks, get result” interface rather than only a one-shot function that hashes a complete buffer.

```c
typedef struct {
    uint64_t state;
} fnv1a64_ctx_t;

static inline void fnv1a64_init(fnv1a64_ctx_t *ctx)
{
    ctx->state = FNV1A64_OFFSET_BASIS;
}

static inline void fnv1a64_update(fnv1a64_ctx_t *ctx, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    while (len--) {
        ctx->state ^= (uint64_t)(*p++);
        ctx->state *= FNV1A64_PRIME;
    }
}

static inline uint64_t fnv1a64_final(const fnv1a64_ctx_t *ctx)
{
    return ctx->state;
}
```

This form maps directly into:

- UART receive state machines
- fragmented network packet assembly
- file system chunked reads
- flash page scanning
- protocol parsers
- log aggregators

RFC 9923 also provides context-style incremental interfaces, which confirms that this usage is itself a standard way to apply FNV.

---

## How to Judge Suitability by Use Case

### Suitable scenarios: fast lookup, indexing, and mapping

FNV has long been used in many non-cryptographic hash scenarios, including database indexing, DNS servers, file fingerprints, caches, symbol tables, hash tables, and search/index systems.

Mapped to MCUs and embedded software, the most common use cases are:

- command-word hashing
- configuration-key hashing
- resource-path and resource-name mapping
- protocol field-name matching
- hash bucket indexing
- internal object identifiers
- lightweight non-security file or block identifiers

These use cases share the same characteristics:

- speed and implementation simplicity matter
- input lengths are usually not large
- the goal is quick distinction and lookup
- resistance to active adversaries is not required

### Also well suited to streaming data

FNV-1a is naturally friendly to streaming input. As long as the current state is preserved, additional input can be absorbed continuously. RFC 9923 also explicitly states that the FNV result of a prefix can be used directly as the `offset_basis` for the next segment, and the final result is equivalent to hashing the concatenated whole in one pass.

That means:

- prefixes can be reused
- shared headers can be precomputed
- layered protocols can accumulate segment by segment
- constant parts can be hashed in advance

On MCUs, this is valuable because it can reduce repeated computation and save CPU time.

### Unsuitable scenarios: any security-adversarial use

RFC 9923 is explicit: FNV is a non-cryptographic hash and is not suitable for scenarios that require collisions, preimages, or second preimages to be computationally difficult to construct.

So the following uses should be excluded immediately:

- security authentication
- tamper-resistant digests
- public hash tables exposed to adversarial input
- protocol designs requiring collision resistance
- using the hash value as a security token

If the input comes from an untrusted external entity and the attacker can intentionally craft data, then FNV-1a should not be used as a security mechanism.

### It should also not be treated as a CRC replacement

CRC is for transmission error detection.  
FNV-1a is for hash distribution and fast mapping.

They solve different problems and are not interchangeable.  
If the goal is error detection, start with CRC.  
If the goal is indexing or hash bucket distribution, then consider FNV.

---

## Pros and Cons Should Be Evaluated Separately

### Advantages

#### 1. Very small implementation footprint

The core loop of FNV-1a is short, has minimal dependencies, and is easy to port.  
In MCU engineering, that means it is easy to place into boot stages, bootloaders, tiny libraries, protocol-stack internals, or other code-size-sensitive modules.

#### 2. Easy to audit and verify

The algorithm steps are fixed and the state update is very direct.  
Compared with more structurally complex hash functions, FNV-1a is easier to inspect manually and easier to validate with test vectors.

#### 3. Naturally supports streaming input

The entire input does not have to be collected up front.  
That makes it very friendly to streams, chunked reads, and protocol parsing.

#### 4. Well suited to short strings and short keys

For configuration keys, command words, identifiers, and path fragments, FNV-1a is often already practical enough.

#### 5. The 64-bit version significantly enlarges the state space

When the number of objects grows, hash tables become larger, or identifiers live longer, the 64-bit version is much less prone to accidental collisions than the 32-bit version in ordinary engineering use.

### Disadvantages

#### 1. It is not a cryptographic hash

This is the most important limitation.  
Once the requirement involves security, FNV-1a should no longer be used as a core mechanism.

#### 2. Collision resistance is limited

It is sufficient for ordinary engineering mapping, but not for adversarial input.  
Maliciously crafted inputs on an exposed interface can degrade bucket distribution, worsen lookup performance, and even create denial-of-service risk.

#### 3. The 64-bit version is not always cost-effective on a 32-bit MCU

A larger state space does not automatically mean a better overall implementation cost.  
Whether 64-bit is worth it depends on object scale, collision risk, toolchain code generation quality, and requirements for throughput and code size.

---

## Finally, Look at 32-bit vs 64-bit Use in Practice

### Decide based on requirements first, then bit width

Bit width should not be chosen by first asking whether the platform is 32-bit or 64-bit. The better questions are:

- How large is the input population?
- How long must the hash value be kept?
- How high is the cost of a collision?
- Does the result need to be reused across modules, files, or versions?
- Is occasional collision-induced degradation acceptable?

If the number of objects is small, the hash is only used for temporary internal indexing, and the lifetime is short, then 32-bit is often sufficient.  
If there are many objects, the hash table is larger, the result is preserved long term, or the collision risk needs to be reduced further, then 64-bit is the safer choice.

### A 32-bit MCU does not mean 64-bit FNV-1a is impossible

RFC 9923 explicitly provides two implementation paths for 64-bit FNV:

- one for targets with 64-bit arithmetic support
- one for targets that have only 32-bit arithmetic

So “can a 32-bit MCU use 64-bit FNV-1a” is not an algorithm question at all. It is purely an implementation-cost question.

### When the toolchain support is good, use `uint64_t` directly

If the compiler can reliably generate high-quality 64-bit arithmetic code, then the direct `uint64_t` version is usually the best first choice:

- shortest code
- best readability
- easiest verification
- easiest alignment with the standard constants

### When 64-bit multiplication is expensive, decompose using the prime structure

In the “64-bit FNV for 32-bit arithmetic only” section of RFC 9923, the 64-bit prime is written in a decomposable form:

```c
/* 64-bit FNV_prime = 2^40 + 2^8 + 0xb3 */
#define FNV64primeX 0x01B3
#define FNV64shift  8
```

The idea behind this is very clear:

- keep a 64-bit state
- do not force the implementation into a single full-width 64-bit multiply
- instead, exploit the structure `2^40 + 2^8 + 0xB3` and turn the update into local multiplication, shifts, and carry propagation

This is the same category of idea as the shift-add expansion in `hash_32a.c`, except that the state is now split into multiple smaller segments instead of a single 32-bit word.

### A more practical engineering recommendation

#### Prefer the 32-bit version when:

- the hash is only used for local lookup
- the input set is limited
- speed and code size matter more
- the collision risk of 32-bit is acceptable

#### Prefer the 64-bit version when:

- the hash value must be kept or reused long term
- the number of objects is relatively large
- the cost of a collision is higher
- you want a larger state space without introducing a more complex algorithm

#### Benchmark before deciding when:

- the platform lacks efficient native 64-bit multiplication
- the compiler version is unstable or inconsistent
- both code size and throughput are sensitive
- you are comparing 32-bit FNV, 64-bit FNV, CRC, or other non-cryptographic hashes

In those cases, the most effective solution is not arguing, but measuring three things directly:

1. cycles per byte
2. code size
3. collision behavior on the real target input set

---

## Summary

When discussing FNV-1a 64-bit on a 32-bit MCU, the most reasonable order of understanding is:

1. Start from the principle: for each byte, XOR first, then multiply by a fixed prime.
2. Then look at the code: `hash_32a.c` already exposes this core update path very clearly.
3. Then look at usage: FNV-1a is suitable for fast indexing, mapping, and streaming hashing, but not for security use.
4. Only then compare bit widths: the real difference between 32-bit and 64-bit is state space and implementation cost, not the algorithmic steps.

Once organized this way, the discussion no longer gets stuck on the narrow question of “can a 32-bit MCU do 64-bit multiplication,” and instead returns to the three dimensions that actually determine selection: **hash principle, implementation method, and applicability boundary**.

---

## References

- `hash_32a.c`  
  `https://github.com/lcn2/fnv/blob/master/hash_32a.c`

- `RFC 9923 - The FNV Non-Cryptographic Hash Algorithm`  
  `https://www.rfc-editor.org/rfc/rfc9923.html`


add table-ID consistency check. #47

Description

FNV-1a 64-bit on 32-bit MCUs: Principles, Implementation, and Trade-offs

Core Conclusion First

Start with the Principle

FNV-1a does only two things: absorb input and diffuse state

Why it is “XOR first, then multiply by the prime”

Why FNV_Prime has this specific value

offset_basis is not decorative; it is the initial perturbation

Understanding the Code through hash_32a.c

Why explain a 64-bit algorithm using 32-bit source code

First look at an equivalent simplified core loop

What the first line is really doing: absorbing the current byte into the state

What the second step is really doing: diffusing the state

Why the source keeps both the direct multiply path and the expanded shift-add path

Moving Smoothly from 32-bit Code to a 64-bit Implementation

The 64-bit version does not change the algorithm

The most direct 64-bit reference implementation

A more practical engineering form: an incremental context interface

How to Judge Suitability by Use Case

Suitable scenarios: fast lookup, indexing, and mapping

Also well suited to streaming data

Unsuitable scenarios: any security-adversarial use

It should also not be treated as a CRC replacement

Pros and Cons Should Be Evaluated Separately

Advantages

1. Very small implementation footprint

2. Easy to audit and verify

3. Naturally supports streaming input

4. Well suited to short strings and short keys

5. The 64-bit version significantly enlarges the state space

Disadvantages

1. It is not a cryptographic hash

2. Collision resistance is limited

3. The 64-bit version is not always cost-effective on a 32-bit MCU

Finally, Look at 32-bit vs 64-bit Use in Practice

Decide based on requirements first, then bit width

A 32-bit MCU does not mean 64-bit FNV-1a is impossible

When the toolchain support is good, use uint64_t directly

When 64-bit multiplication is expensive, decompose using the prime structure

A more practical engineering recommendation

Prefer the 32-bit version when:

Prefer the 64-bit version when:

Benchmark before deciding when:

Summary

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Why `FNV_Prime` has this specific value

`offset_basis` is not decorative; it is the initial perturbation

Understanding the Code through `hash_32a.c`

When the toolchain support is good, use `uint64_t` directly