symbolic: support Set[Seq[T]] (needed for message-passing protocols)

## Summary

The symbolic checker rejects `Set[Seq[T]]` types. This blocks symbolic checking of specs that model messages as sets of tuples — the natural encoding for message-passing protocols (Raft, Paxos, PBFT, etc.).

```
check error: unsupported construct for symbolic checking: Set[Seq[T]]
requires exponential encoding (tracks membership of every possible sequence).
hint: use --bfs to check with explicit-state BFS instead
```

## Why this is fundamentally hard

### Current set encoding: bitvector membership

Sets in the symbolic checker use an **exploded bitvector encoding** — one Z3 `Bool` variable per possible element in the domain:

```
Set[0..5]  →  6 Z3 Bools: [member_0, member_1, ..., member_5]
```

Set operations map cleanly:
- `x in S` → `member_x`
- `S union T` → element-wise OR
- `S intersect T` → element-wise AND
- `S \ T` → element-wise AND-NOT
- `{} (empty)` → all false

This is efficient because the domain is small and discrete. For `Set[0..N]` with N=100, you get 100 Z3 Bools — perfectly manageable.

### Why sequences break this

A `Seq[0..M]` with max length L has **Σᵢ₌₀ᴸ (M+1)ⁱ** possible values (including the empty sequence). Each distinct sequence needs its own membership Bool.

| M (element range) | L (max length) | Possible sequences | Z3 Bools needed |
|---|---|---|---|
| 2 | 3 | 40 | 40 |
| 3 | 3 | 85 | 85 |
| 5 | 5 | 9,331 | 9,331 |
| 10 | 5 | 161,051 | 161,051 |
| 10 | 9 | ~10B | infeasible |

For Raft messages (`[term, src, dest, lastLogTerm, lastLogIndex]` — 5 fields, each bounded by N/MaxVal/MaxLogLen), even small constants (N=2, MaxVal=2, MaxLogLen=3) produce thousands of possible tuples. At realistic sizes the bitvector is infeasible.

### Current sequence encoding

Sequences use an **exploded positional encoding**: 1 Z3 Int for length + L Z3 variables for elements:

```
Seq[Int] with max_len=3  →  4 Z3 vars: [len, elem_0, elem_1, elem_2]
```

This is efficient for individual sequence variables but doesn't compose with the bitvector set encoding — you'd need one such group per possible sequence value in the set, which is the exponential blowup.

## Possible approaches

### 1. Bounded enumeration (feasible for small domains)

When element type is bounded (0..M) and max_len is known, enumerate all possible sequences and assign each a unique integer ID. Then `Set[Seq[0..M]]` becomes `Set[0..K]` using the existing bitvector encoding.

**Encoding:** Sequence → integer via positional encoding. For `[a₀, a₁]` with elements in 0..M: `id = len_offset + a₀ * (M+1) + a₁`.

**Pros:** Reuses existing bitvector infrastructure. All set operations work unchanged.
**Cons:** Only works when total sequence count < threshold (e.g. 10,000). Needs encode/decode functions in the encoder. Doesn't help for realistic protocol message spaces.

### 2. Z3 native Array(Seq, Bool) encoding

Z3 has native `Array` sort: `Array(KeySort, ValueSort)`. A set can be encoded as `Array(SeqSort, Bool)` where `Select(arr, seq) == true` means membership.

```
Set[Seq[Int]]  →  Z3 Array(Seq(Int), Bool)

x in S         →  Select(S, x)
S union T      →  MapOr(S, T)  (element-wise OR via Z3 map)
S = {}         →  ConstArray(Seq(Int), false)
S ∪ {x}        →  Store(S, x, true)
```

**Pros:** Handles unbounded domains. No exponential blowup. Elegant encoding.
**Cons:** Z3's reasoning about `Array(Seq, Bool)` is incomplete — the solver may return `Unknown` or diverge. Inductive checking (the primary symbolic use case) may hit undecidability. Performance unpredictable.

### 3. Quantifier-based encoding

Encode set membership as a Z3 predicate: `member(S, x) ↔ (x = a ∨ x = b ∨ ...)` where `a, b, ...` are the elements added to S. Union becomes disjunction of membership predicates.

**Pros:** Works for any element type Z3 can compare.
**Cons:** Formula size grows with number of set operations in the spec. Quantifier reasoning in Z3 is a known bottleneck. May not terminate for complex properties.

### 4. Workaround: Dict[Int, Seq[T]] with counter (current recommendation)

Replace `Set[Seq[T]]` with `Dict[Int, Seq[T]]` keyed by a monotonic counter. Membership test becomes `any i in 0..counter: msgs[i] == target`.

**Pros:** Works today with existing encoder. No symbolic changes needed.
**Cons:** Changes the model semantics (ordered/indexed vs set-based). Counter variable adds state dimensions. Membership test is O(counter) in the formula.

### 5. Keep --bfs fallback

The `--bfs` explicit-state backend handles `Set[Seq[T]]` natively with no limitations. For bounded model checking (which is the primary use case for protocol verification), BFS is already the right tool.

**Pros:** Already works. No changes needed.
**Cons:** Loses symbolic mode benefits (unbounded reasoning, inductive proofs).

## Recommendation

Approach 1 (bounded enumeration) is the most practical incremental improvement — it extends the existing architecture and handles the cases where symbolic checking is most useful (small parameter spaces). For large protocol specs, BFS remains the right tool regardless.

Approach 2 (Z3 native arrays) is the theoretically correct solution but carries significant risk of solver incompleteness. Worth investigating as a research direction but not as a reliability-critical feature.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

symbolic: support Set[Seq[T]] (needed for message-passing protocols) #35

Summary

Why this is fundamentally hard

Current set encoding: bitvector membership

Why sequences break this

Current sequence encoding

Possible approaches

1. Bounded enumeration (feasible for small domains)

2. Z3 native Array(Seq, Bool) encoding

3. Quantifier-based encoding

4. Workaround: Dict[Int, Seq[T]] with counter (current recommendation)

5. Keep --bfs fallback

Recommendation

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

symbolic: support Set[Seq[T]] (needed for message-passing protocols) #35

Description

Summary

Why this is fundamentally hard

Current set encoding: bitvector membership

Why sequences break this

Current sequence encoding

Possible approaches

1. Bounded enumeration (feasible for small domains)

2. Z3 native Array(Seq, Bool) encoding

3. Quantifier-based encoding

4. Workaround: Dict[Int, Seq[T]] with counter (current recommendation)

5. Keep --bfs fallback

Recommendation

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions