Skip to content

symbolic: support Set[Seq[T]] (needed for message-passing protocols) #35

@danwt

Description

@danwt

Summary

The symbolic checker rejects Set[Seq[T]] types. This blocks symbolic checking of specs that model messages as sets of tuples — the natural encoding for message-passing protocols (Raft, Paxos, PBFT, etc.).

check error: unsupported construct for symbolic checking: Set[Seq[T]]
requires exponential encoding (tracks membership of every possible sequence).
hint: use --bfs to check with explicit-state BFS instead

Why this is fundamentally hard

Current set encoding: bitvector membership

Sets in the symbolic checker use an exploded bitvector encoding — one Z3 Bool variable per possible element in the domain:

Set[0..5]  →  6 Z3 Bools: [member_0, member_1, ..., member_5]

Set operations map cleanly:

  • x in Smember_x
  • S union T → element-wise OR
  • S intersect T → element-wise AND
  • S \ T → element-wise AND-NOT
  • {} (empty) → all false

This is efficient because the domain is small and discrete. For Set[0..N] with N=100, you get 100 Z3 Bools — perfectly manageable.

Why sequences break this

A Seq[0..M] with max length L has Σᵢ₌₀ᴸ (M+1)ⁱ possible values (including the empty sequence). Each distinct sequence needs its own membership Bool.

M (element range) L (max length) Possible sequences Z3 Bools needed
2 3 40 40
3 3 85 85
5 5 9,331 9,331
10 5 161,051 161,051
10 9 ~10B infeasible

For Raft messages ([term, src, dest, lastLogTerm, lastLogIndex] — 5 fields, each bounded by N/MaxVal/MaxLogLen), even small constants (N=2, MaxVal=2, MaxLogLen=3) produce thousands of possible tuples. At realistic sizes the bitvector is infeasible.

Current sequence encoding

Sequences use an exploded positional encoding: 1 Z3 Int for length + L Z3 variables for elements:

Seq[Int] with max_len=3  →  4 Z3 vars: [len, elem_0, elem_1, elem_2]

This is efficient for individual sequence variables but doesn't compose with the bitvector set encoding — you'd need one such group per possible sequence value in the set, which is the exponential blowup.

Possible approaches

1. Bounded enumeration (feasible for small domains)

When element type is bounded (0..M) and max_len is known, enumerate all possible sequences and assign each a unique integer ID. Then Set[Seq[0..M]] becomes Set[0..K] using the existing bitvector encoding.

Encoding: Sequence → integer via positional encoding. For [a₀, a₁] with elements in 0..M: id = len_offset + a₀ * (M+1) + a₁.

Pros: Reuses existing bitvector infrastructure. All set operations work unchanged.
Cons: Only works when total sequence count < threshold (e.g. 10,000). Needs encode/decode functions in the encoder. Doesn't help for realistic protocol message spaces.

2. Z3 native Array(Seq, Bool) encoding

Z3 has native Array sort: Array(KeySort, ValueSort). A set can be encoded as Array(SeqSort, Bool) where Select(arr, seq) == true means membership.

Set[Seq[Int]]  →  Z3 Array(Seq(Int), Bool)

x in S         →  Select(S, x)
S union T      →  MapOr(S, T)  (element-wise OR via Z3 map)
S = {}         →  ConstArray(Seq(Int), false)
S ∪ {x}        →  Store(S, x, true)

Pros: Handles unbounded domains. No exponential blowup. Elegant encoding.
Cons: Z3's reasoning about Array(Seq, Bool) is incomplete — the solver may return Unknown or diverge. Inductive checking (the primary symbolic use case) may hit undecidability. Performance unpredictable.

3. Quantifier-based encoding

Encode set membership as a Z3 predicate: member(S, x) ↔ (x = a ∨ x = b ∨ ...) where a, b, ... are the elements added to S. Union becomes disjunction of membership predicates.

Pros: Works for any element type Z3 can compare.
Cons: Formula size grows with number of set operations in the spec. Quantifier reasoning in Z3 is a known bottleneck. May not terminate for complex properties.

4. Workaround: Dict[Int, Seq[T]] with counter (current recommendation)

Replace Set[Seq[T]] with Dict[Int, Seq[T]] keyed by a monotonic counter. Membership test becomes any i in 0..counter: msgs[i] == target.

Pros: Works today with existing encoder. No symbolic changes needed.
Cons: Changes the model semantics (ordered/indexed vs set-based). Counter variable adds state dimensions. Membership test is O(counter) in the formula.

5. Keep --bfs fallback

The --bfs explicit-state backend handles Set[Seq[T]] natively with no limitations. For bounded model checking (which is the primary use case for protocol verification), BFS is already the right tool.

Pros: Already works. No changes needed.
Cons: Loses symbolic mode benefits (unbounded reasoning, inductive proofs).

Recommendation

Approach 1 (bounded enumeration) is the most practical incremental improvement — it extends the existing architecture and handles the cases where symbolic checking is most useful (small parameter spaces). For large protocol specs, BFS remains the right tool regardless.

Approach 2 (Z3 native arrays) is the theoretically correct solution but carries significant risk of solver incompleteness. Worth investigating as a research direction but not as a reliability-critical feature.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions