QuantumDrizzy/SESHAT

SESHAT — Measuring the Information Limit of Linear A

Seshat — Egyptian goddess of writing, knowledge, and measurement.

An honest computational study of Linear A — a measurement, not a decipherment. Linear A is undeciphered and, from the surviving corpus alone, probably undecipherable: ~1.4k mostly 1–2-sign inscriptions, an unknown language, and no bilingual key. No method — AI, quantum, or human — extracts information that isn't there. So SESHAT asks the answerable question instead:

What can be computationally inferred about Linear A, and where is the information-theoretic limit — validated on Linear B, where we know the answer?

Every method is calibrated on Linear B (deciphered Mycenaean Greek). If a technique can't recover what we already know about Linear B, we don't trust it on Linear A. A rigorous negative result is itself the headline.


Results (the honest headline)

| Phase | Question | Result |
| --- | --- | --- |
| 1: Information limit | How much structure does the corpus hold? | Linear A H(next \| prev) ≈ 3.7 bits, redundancy 31% vs Linear B 63%: far sparser, quantifying why it resists decipherment |
| 2: GNN sign embeddings | Does pure co-occurrence encode phonology? | On Linear B, recovers vowels (≈36% vs 24.5% baseline, +8–11 pts across seeds, null-controlled z ≈ 4) but not consonants (null) |
| 3: Linear B → Linear A transfer | Do shared AB signs carry their Linear B vowels in Linear A? | No. The transfer lift sits inside the null (\|z\| < 2), so Linear A does not distributionally mirror Linear B, consistent with a different (non-Greek) language |
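
The Phase 1 quantities can be illustrated with a minimal sketch. It assumes a plug-in (maximum-likelihood) estimate of H(next | prev) from adjacent sign pairs and takes redundancy to mean 1 − H / log2(V) for an inventory of V signs; the README does not spell out either definition, so both are assumptions, and the function names `conditional_entropy` and `redundancy` are hypothetical rather than taken from seshat-analysis.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Plug-in estimate of H(next | prev) in bits from adjacent sign pairs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no bigrams at all (e.g. only 1-sign inscriptions)
    h = 0.0
    for (prev, _), n in pair_counts.items():
        p_pair = n / total                    # P(prev, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | prev)
        h -= p_pair * math.log2(p_next_given_prev)
    return h


def redundancy(h_bits, inventory_size):
    """Assumed definition: 1 - H / log2(V), where V is the sign inventory size."""
    if inventory_size < 2:
        return 0.0
    return 1.0 - h_bits / math.log2(inventory_size)


# Toy data, not corpus data: after "A" the next sign is "B" or "C" equally often.
toy = [["A", "B"], ["A", "C"]]
h = conditional_entropy(toy)
print(h, redundancy(h, 4))
```

Plug-in estimates are biased downward on small samples, which matters for a corpus dominated by 1–2-sign inscriptions, so a real analysis would likely need a bias correction or a resampling check on top of this.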

Signal vs no-signal

One method, two questions, one honest picture (shared axis). Left: it detects Linear B's own vowel structure; the real lift sits far outside the degree-preserving null. Right: it finds no Linear B → Linear A transfer; the real lift is buried in the null. We measure where the signal is and where it isn't.
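
The comparison of a real lift against a degree-preserving null can be sketched as follows. This is an illustration under stated assumptions, not the project's `gnn_nullmodel` code: it uses double-edge swaps as one common way to build a degree-preserving null on an undirected simple graph, and the helper names (`double_edge_swap`, `degrees`, `z_score`) are hypothetical. In a real run the probe statistic would be recomputed on every rewired graph; here the null values are simply passed in.

```python
import random
import statistics
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def double_edge_swap(edges, n_swaps, rng):
    """Rewire (a-b, c-d) into (a-d, c-b); every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    if len(edges) < 2:
        return edges
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or a duplicate edge
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # keep the graph simple (no parallel edges)
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def z_score(real, null_values):
    """Distance of the real statistic from the null mean, in null std units."""
    sigma = statistics.stdev(null_values)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (real - statistics.fmean(null_values)) / sigma


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
rewired = double_edge_swap(ring, 10, random.Random(0))
print(degrees(rewired) == degrees(ring))
```

Read this way, the left panel corresponds to a z far above the |z| < 2 band and the right panel to a z inside it.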

Full methods + results: docs/ADR-0002. The long-term vision (a SUBSTRATE-style lab for ancient scripts): docs/ADR-0003.


How it works (each language where it fits)

| component | language | role |
| --- | --- | --- |
| seshat-analysis/ | Python (PyTorch) | info-limit, GNN sign embeddings, Linear B validation, null-model controls, transfer probe, figures |
| seshat-core/ | Rust | corpus parser, sign inventory, bigram matrices (the data foundation) |
| seshat-anneal/ | C++/CUDA | QUBO / simulated + quantum-inspired annealing engine; a future refinement layer (Phase 4), validated on synthetic data only, not a Linear A decipherment claim |
| seshat-viz/ | Rust/egui | interactive sign tables and heatmaps |

Linear A corpus (Younger DB)  ──►  Phase 1: information limit (entropy, Zipf, hapax)
sign co-occurrence graph      ──►  Phase 2: GNN embeddings (skip-gram, message-passing)
                              ──►           validate on Linear B (vowel/consonant probe + null control)
                              ──►  Phase 3: transfer probe to shared signs (honest negative)
[Phase 4 — future]            ──►  annealing refinement, seeded by the above
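
The Zipf and hapax measurements named in Phase 1 of the diagram can be sketched as below. The definitions are assumptions, since the README does not give them: the hapax ratio is taken as the share of distinct types that occur exactly once, and the Zipf exponent as the least-squares slope of log frequency against log rank. The function names are hypothetical.

```python
import math
from collections import Counter


def hapax_ratio(tokens):
    """Share of distinct types that occur exactly once (assumed definition)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy tokens with frequencies 6, 3, 2 (exactly proportional to 1/rank).
toy = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
print(zipf_slope(toy), hapax_ratio(["a", "a", "b", "c"]))
```

A slope near −1 is the classic Zipfian signature; on a corpus this small the fit is noisy, so the slope is better treated as a descriptive statistic than as evidence about the underlying language.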

Reproduce

cd seshat-analysis && pip install -e .
python -m seshat_analysis.gnn_validate     --data ../data   # Phase 2 (Linear B recovery)
python -m seshat_analysis.gnn_nullmodel    --data ../data   # null-model control
python -m seshat_analysis.linear_a_transfer --data ../data  # Phase 3 (transfer — the negative)
python -m seshat_analysis.phase23_figure   --data ../data --recompute   # the figures
pytest                                                      # 11 tests

CPU is sufficient (the graphs are ~50 signs); everything is seeded and deterministic.

Data & provenance

  • Linear A: John Younger's Linear A Database (Univ. of Kansas) — data/corpus/linear_a/
  • Linear B: attested words (Ventris & Chadwick 1953; Duhoux & Morpurgo Davies) — data/corpus/linear_b/
  • Linear B sign values: the standard Ventris grid, read from the authoritative Unicode Linear B Syllabary character names — not hand-typed; the exact readings used are saved in data/linear_b_grid_used.json
  • Comparanda: Luwian, Hurrian — data/corpus/
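
Reading sign values from Unicode character names, as described for the Linear B grid above, could look roughly like this. It is a sketch using only Python's standard `unicodedata` module; the function name `linear_b_grid` and the returned dictionary layout are assumptions and do not describe the actual contents of data/linear_b_grid_used.json.

```python
import unicodedata

PREFIX = "LINEAR B SYLLABLE "


def linear_b_grid():
    """Map sign numbers (e.g. 'B008') to readings (e.g. 'a') via Unicode names."""
    grid = {}
    # U+10000..U+1007F is the Linear B Syllabary block.
    for code_point in range(0x10000, 0x10080):
        name = unicodedata.name(chr(code_point), "")
        if not name.startswith(PREFIX):
            continue  # unassigned code point or a non-syllabic symbol
        parts = name[len(PREFIX):].split()
        if len(parts) != 2:
            continue  # unexpected name shape; skip rather than guess
        sign_id, reading = parts
        grid[sign_id] = reading.lower()
    return grid


grid = linear_b_grid()
print(grid["B008"], grid["B001"])
```

Deriving the grid from character names keeps the readings traceable to a published standard, which is the point of the "not hand-typed" note above.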

Honesty contract

Linear A is not deciphered here; no phonetic value is asserted for any undeciphered sign. Deciphered scripts (Linear B) are ground truth; undeciphered Linear A is treated as measurement, never announcement. Methods are trusted only after they recover known structure on Linear B and survive a null-model control.

Bigger picture

SESHAT is the Aegean module of a planned computational-epigraphy lab — a SUBSTRATE-style set of per-script tools: deciphered scripts (Akkadian, Sumerian, Egyptian…) get real tooling (OCR, transliteration, search); undeciphered ones get honest limit-analysis, like here. That platform is a documented north star (docs/ADR-0003), deliberately not yet started — one honest module at a time.

Author

Antonio Zambudio Rodriguez

About

Combinatorial-optimization engine (QUBO + simulated annealing) applied to the undeciphered Linear A script — anchor-constraint solving + cross-linguistic n-gram modeling.
