Seshat — Egyptian goddess of writing, knowledge, and measurement.
An honest computational study of Linear A — a measurement, not a decipherment. Linear A is undeciphered and, from the surviving corpus alone, probably undecipherable: ~1.4k mostly 1–2-sign inscriptions, an unknown language, and no bilingual key. No method — AI, quantum, or human — extracts information that isn't there. So SESHAT asks the answerable question instead:
What can be computationally inferred about Linear A, and where is the information-theoretic limit — validated on Linear B, where we know the answer?
Every method is calibrated on Linear B (deciphered Mycenaean Greek). If a technique can't recover what we already know about Linear B, we don't trust it on Linear A. A rigorous negative result is itself the headline.
| Phase | Question | Result |
|---|---|---|
| 1: Information limit | How much structure does the corpus hold? | Linear A H(next\|prev) ≈ 3.7 bits, redundancy 31% vs Linear B 63%: far sparser, quantifying why it resists decipherment |
| 2: GNN sign embeddings | Does pure co-occurrence encode phonology? | On Linear B, recovers vowels (≈36% vs 24.5% baseline, +8–11 pts across seeds, null-controlled z≈4) but not consonants (null) |
| 3: Linear B → Linear A transfer | Do shared AB signs carry their Linear B vowels in Linear A? | No. The transfer lift sits inside the null (\|z\|<2) → Linear A does not distributionally mirror Linear B, consistent with a different (non-Greek) language |
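The Phase 1 quantities can be illustrated with a minimal sketch. It assumes inscriptions are given as lists of sign labels, that H(next|prev) is estimated from adjacent pairs within an inscription, and that redundancy is taken as 1 − H / log2(inventory size). The function names and that redundancy definition are assumptions for illustration, not necessarily what `seshat-analysis` implements.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Estimate H(next | prev) in bits from adjacent sign pairs."""
    bigrams = Counter()
    for seq in sequences:
        bigrams.update(zip(seq, seq[1:]))  # pairs never cross inscriptions
    total = sum(bigrams.values())
    if total == 0:
        return 0.0  # only 1-sign inscriptions: no pairs to measure
    prev_counts = Counter()
    for (prev, _), n in bigrams.items():
        prev_counts[prev] += n
    h = 0.0
    for (prev, _), n in bigrams.items():
        # weight: p(prev, next); term: log2 p(next | prev)
        h -= (n / total) * math.log2(n / prev_counts[prev])
    return h


def redundancy(sequences):
    """Assumed definition: 1 - H(next|prev) / log2(|sign inventory|)."""
    vocab = {sign for seq in sequences for sign in seq}
    if len(vocab) < 2:
        return 0.0
    return 1.0 - conditional_entropy(sequences) / math.log2(len(vocab))
```

On a corpus dominated by 1–2-sign inscriptions, very few pairs contribute to the estimate, so any figure obtained this way should be read alongside the pair count.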
One method, two questions, one honest picture (shared axis). Left: it detects Linear B's own vowel structure; the real lift sits far outside the degree-preserving null. Right: it finds no B→A transfer; the real lift is buried in the null. We measure where the signal is and where it isn't.
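The comparison against a degree-preserving null can be sketched roughly as follows. This assumes an unweighted, undirected sign graph given as an edge list and uses the standard double-edge-swap rewiring. The helper names (`double_edge_swap`, `z_score`, `degrees`) are hypothetical, and the project's own null model may differ (for example, by handling edge weights).

```python
import random
import statistics
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def double_edge_swap(edges, n_swaps, rng):
    """Rewire (a-b, c-d) -> (a-d, c-b), keeping every node's degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def z_score(real, null_values):
    """How many null standard deviations the real statistic sits from the null mean."""
    sd = statistics.stdev(null_values)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (real - statistics.mean(null_values)) / sd
```

In use, the same probe would be run once on the real graph and once on each rewired graph, with `z_score(real_lift, null_lifts)` summarising the result.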
Full methods + results: docs/ADR-0002.
The long-term vision (a SUBSTRATE-style lab for ancient scripts): docs/ADR-0003.
| component | language | role |
|---|---|---|
| `seshat-analysis/` | Python (PyTorch) | info-limit, GNN sign embeddings, Linear B validation, null-model controls, transfer probe, figures |
| `seshat-core/` | Rust | corpus parser, sign inventory, bigram matrices (the data foundation) |
| `seshat-anneal/` | C++/CUDA | QUBO / simulated + quantum-inspired annealing engine; a future refinement layer (Phase 4), validated on synthetic data only, not a Linear A decipherment claim |
| `seshat-viz/` | Rust/egui | interactive sign tables and heatmaps |
Linear A corpus (Younger DB) ──► Phase 1: information limit (entropy, Zipf, hapax)
sign co-occurrence graph ──► Phase 2: GNN embeddings (skip-gram, message-passing)
──► validate on Linear B (vowel/consonant probe + null control)
──► Phase 3: transfer probe to shared signs (honest negative)
[Phase 4 — future] ──► annealing refinement, seeded by the above
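Two of the Phase 1 statistics named above (Zipf, hapax) can be sketched in a few lines. The definitions here are assumptions for illustration: the hapax ratio is taken as the share of distinct signs that occur exactly once, and the Zipf exponent as the least-squares slope of log frequency against log rank. The function names are hypothetical.

```python
import math
from collections import Counter


def hapax_ratio(tokens):
    """Share of distinct signs that occur exactly once (assumed definition)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank); about -1 for classic Zipf."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct signs")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With a sign inventory this small, a fitted slope is a rough descriptive statistic rather than evidence for or against a power law.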
cd seshat-analysis && pip install -e .
python -m seshat_analysis.gnn_validate --data ../data # Phase 2 (Linear B recovery)
python -m seshat_analysis.gnn_nullmodel --data ../data # null-model control
python -m seshat_analysis.linear_a_transfer --data ../data # Phase 3 (transfer — the negative)
python -m seshat_analysis.phase23_figure --data ../data --recompute # the figures
pytest # 11 tests
CPU is sufficient (the graphs are ~50 signs); everything is seeded and deterministic.
- Linear A: John Younger's Linear A Database (Univ. of Kansas), in `data/corpus/linear_a/`
- Linear B: attested words (Ventris & Chadwick 1953; Duhoux & Morpurgo Davies), in `data/corpus/linear_b/`
- Linear B sign values: the standard Ventris grid, read from the authoritative Unicode Linear B Syllabary character names, not hand-typed; the exact readings used are saved in `data/linear_b_grid_used.json`
- Comparanda: Luwian, Hurrian, in `data/corpus/`
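Reading sign values from Unicode character names, as described in the list above, can be sketched with Python's standard `unicodedata` module. The function name, the output layout, and the vowel/consonant split (last letter of the reading is the vowel) are simplifying assumptions; the layout of `data/linear_b_grid_used.json` is not shown here and may differ.

```python
import re
import unicodedata

# Linear B Syllabary block in Unicode: U+10000..U+1007F.
BLOCK = range(0x10000, 0x10080)
# Matches names such as "LINEAR B SYLLABLE B001 DA" or "LINEAR B SYLLABLE B025 A2".
NAME_RE = re.compile(r"^LINEAR B SYLLABLE (B\d{3}) ([A-Z]+)(\d?)$")


def linear_b_grid():
    """Map sign numbers (e.g. 'B001') to readings taken from Unicode names."""
    grid = {}
    for cp in BLOCK:
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        match = NAME_RE.match(name)
        if match is None:
            continue  # unassigned, or a "LINEAR B SYMBOL ..." sign with no reading
        number, reading, variant = match.groups()
        grid[number] = {
            "char": chr(cp),
            "reading": (reading + variant).lower(),
            # Assumption: last letter is the vowel. Signs such as AU need
            # special handling that this sketch does not attempt.
            "vowel": reading[-1].lower(),
            "consonant": reading[:-1].lower(),
        }
    return grid
```

Because the readings come from the interpreter's Unicode database rather than a hand-typed table, transcription errors are ruled out, though the result does depend on the Unicode version bundled with Python.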
Linear A is not deciphered here; no phonetic value is asserted for any undeciphered sign. Deciphered scripts (Linear B) are ground truth; undeciphered Linear A is treated as measurement, never announcement. Methods are trusted only after they recover known structure on Linear B and survive a null-model control.
SESHAT is the Aegean module of a planned computational-epigraphy lab — a
SUBSTRATE-style set of per-script tools: deciphered scripts (Akkadian, Sumerian,
Egyptian…) get real tooling (OCR, transliteration, search); undeciphered ones get
honest limit-analysis, like here. That platform is a documented north star
(docs/ADR-0003), deliberately not yet
started — one honest module at a time.
Antonio Zambudio Rodriguez
