This repository features a comprehensive pipeline for creating a digital philological edition of Epictetus and Marcus Aurelius. It includes Python scripts for API data acquisition, Gemini 3 Pro-powered translation, and CLTK-based lexical tagging. The final output is an interactive, web-based library with dynamic lemma-highlighting readers.
The goal of this project is to create a technically rigorous, digital-first edition of the core Stoic corpus. By integrating Large Language Models (LLMs) with computational linguistics, we move beyond static translations into an interactive database where English text is live-mapped to Greek technical lemmas.
- Structured Acquisition: Extracting standardized `perseus-grc2` editions via the Scaife Viewer CTS API to ensure scholarly consistency.
- Context-Aware LLM Translation: Utilizing Gemini 3 Pro with a sliding context window (150 characters of leading/trailing Greek) to maintain narrative and thematic continuity.
- Philological Lemmatization: Using the Classical Language Toolkit (CLTK) to identify the root dictionary form (lemma) of technical terms, ensuring that all inflected forms of a word are captured in thematic analysis.
- Automated Sanitization: Implementing a regex-based cleaning layer to eliminate structural artifacts and "token leaks" common in high-reasoning LLM outputs.
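The sliding-context-window scheme described above can be sketched as a small helper. This is a minimal illustration, not the repository's actual implementation; the function name and the chapter-list representation are assumptions:

```python
def build_context_window(chapters, i, window=150):
    """Return (leading, current, trailing) for chapter i, where the leading
    and trailing Greek context are clipped to `window` characters, per the
    150-character sliding window described in the README."""
    leading = "".join(chapters[:i])[-window:]   # tail of everything before
    trailing = "".join(chapters[i + 1:])[:window]  # head of everything after
    return leading, chapters[i], trailing
```

The clipped context is then prepended and appended to the translation prompt so the model sees how each chapter connects to its neighbors without paying for the full corpus in every request.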
To reproduce the project, execute the following scripts in the order listed below.
- `fetch_enchiridion.py`: Requests the Greek text of the Enchiridion from the Scaife API and saves it as `enchiridion_greek_source.json`.
- `fetch_meditations.py`: Requests the Greek text of the Meditations and saves it as `meditations_greek_source.json`.
- `fetch_discourses.py`: Requests the Greek text of the Discourses from the Scaife API and saves it as `discourses_greek_source.json`.
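A fetch script can be sketched roughly as follows. The Scaife CTS endpoint and the `GetPassage` request are standard CTS conventions, but the exact URN shown for the Enchiridion is an assumption, as is the function naming:

```python
BASE = "https://scaife-cts.perseus.org/api/cts"

# Assumed URN for the perseus-grc2 edition of the Enchiridion.
ENCHIRIDION_URN = "urn:cts:greekLit:tlg0557.tlg002.perseus-grc2"

def build_cts_url(urn, ref):
    """Compose a CTS GetPassage URL for a single passage reference."""
    return f"{BASE}?request=GetPassage&urn={urn}:{ref}"

def fetch_passage(urn, ref):
    """Fetch one passage as TEI XML; the real scripts parse this and
    accumulate the chapters into a *_greek_source.json file."""
    import requests  # imported lazily so the URL helper has no dependencies
    resp = requests.get(build_cts_url(urn, ref), timeout=30)
    resp.raise_for_status()
    return resp.text
```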
- `translate_enchiridion.py`: Generates an instructional, imperative translation and Stoic commentary.
- `translate_meditations.py`: Generates a reflective, personal translation and commentary using high-level reasoning (`ThinkingConfig`).
- `translate_discourses.py`: Generates a conversational, dialectic translation and pedagogical commentary.
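The three translation styles can be expressed as per-work system instructions passed to the model. This is a hedged sketch using the `google-genai` SDK; the model identifier, the instruction wording, and the thinking budget are all assumptions, not values from the repository:

```python
# Hypothetical system instructions mirroring the three styles above.
STYLE_INSTRUCTIONS = {
    "enchiridion": "Translate in an instructional, imperative register with Stoic commentary.",
    "meditations": "Translate in a reflective, personal register with commentary.",
    "discourses": "Translate in a conversational, dialectic register with pedagogical commentary.",
}

def translate_chapter(work, leading, greek, trailing):
    """Translate one chapter, supplying sliding-window Greek context."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    prompt = (
        f"Context before:\n{leading}\n\n"
        f"Translate this passage:\n{greek}\n\n"
        f"Context after:\n{trailing}"
    )
    resp = client.models.generate_content(
        model="gemini-3-pro-preview",  # model name is an assumption
        contents=prompt,
        config=types.GenerateContentConfig(
            system_instruction=STYLE_INSTRUCTIONS[work],
            thinking_config=types.ThinkingConfig(thinking_budget=2048),  # assumed budget
        ),
    )
    return resp.text
```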
- `enchiridion_lexical_tagging.py`: Applies the Master Stoic Lexicon to the Enchiridion using CLTK lemmatization.
- `meditations_lexical_tagging.py`: Applies the Master Stoic Lexicon to the Meditations, with specific logic to handle word collisions (e.g., horme vs. aphorme).
- `discourses_lexical_tagging.py`: Applies the Master Stoic Lexicon to the Discourses, optimized for the larger volume of text and frequent cross-references.
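The tagging step amounts to lemmatizing each token and matching lemmas against the lexicon. The sketch below is illustrative only: the lexicon slice, glosses, and function names are assumptions, and exact-lemma matching is one plausible way to keep colliding terms like horme and aphorme distinct:

```python
# Hypothetical slice of the Master Stoic Lexicon: lemma -> English gloss.
STOIC_LEXICON = {
    "προαίρεσις": "prohairesis (faculty of choice)",
    "ὁρμή": "horme (impulse toward)",
    "ἀφορμή": "aphorme (impulse away)",
}

def tag_tokens(lemma_pairs):
    """Given (surface_form, lemma) pairs, return (surface, lemma, gloss)
    hits. Matching on the full lemma string keeps ὁρμή and ἀφορμή
    separate, avoiding the substring collision mentioned above."""
    return [(tok, lemma, STOIC_LEXICON[lemma])
            for tok, lemma in lemma_pairs
            if lemma in STOIC_LEXICON]

def lemmatize(tokens):
    """Produce (surface, lemma) pairs with CLTK's backoff lemmatizer.
    Requires the Ancient Greek models downloaded on first run."""
    from cltk.lemmatize.grc import GreekBackoffLemmatizer  # lazy import
    return GreekBackoffLemmatizer().lemmatize(tokens)
```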
- `clean_enchiridion.py`: Strips Markdown artifacts and repairs JSON leaks in the Enchiridion database.
- `clean_meditations.py`: Cleans the Meditations database and standardizes chapter reference formatting (e.g., "Book 1.1").
- `clean_discourses.py`: Cleans the Discourses database and standardizes the multi-book reference structure (e.g., "1.1", "4.1").
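A regex-based sanitization pass of the kind described can be sketched as follows. The specific patterns are guesses at typical LLM "token leaks" (stray code fences and bold markers around JSON payloads), not the repository's actual cleaning rules:

```python
import re

# Matches an opening ``` or ```json fence, or a closing ``` fence.
FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$", re.MULTILINE)

def clean_llm_output(text):
    """Strip Markdown fences and stray bold markers that high-reasoning
    LLM outputs sometimes leak around structured payloads."""
    text = FENCE_RE.sub("", text)
    text = text.replace("**", "")  # stray bold markers
    return text.strip()
```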
Install the required Python environment:
```bash
pip install requests google-genai cltk
```
Note: CLTK will download Ancient Greek models on the first run of the tagging scripts.
You must provide a Google API Key to run the translation scripts:
```bash
export GOOGLE_API_KEY='your_api_key_here'
```
Run the scripts in this specific order to ensure data integrity:
- Run all `fetch_*.py` scripts.
- Run all `translate_*.py` scripts.
- Run all `*_lexical_tagging.py` scripts.
- Run all `clean_*.py` scripts.
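The four phases above can be driven by a small orchestration script. This runner is a sketch, not part of the repository; the `dry_run` flag is a hypothetical convenience for checking the ordering without executing anything:

```python
import subprocess
import sys

# The four phases, in the order required for data integrity.
PIPELINE = [
    ["fetch_enchiridion.py", "fetch_meditations.py", "fetch_discourses.py"],
    ["translate_enchiridion.py", "translate_meditations.py", "translate_discourses.py"],
    ["enchiridion_lexical_tagging.py", "meditations_lexical_tagging.py",
     "discourses_lexical_tagging.py"],
    ["clean_enchiridion.py", "clean_meditations.py", "clean_discourses.py"],
]

def run_pipeline(dry_run=True):
    """Run every script phase by phase; return the execution order."""
    order = []
    for phase in PIPELINE:
        for script in phase:
            order.append(script)
            if not dry_run:
                subprocess.run([sys.executable, script], check=True)
    return order
```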
The project results are viewed through a web browser using the dedicated HTML readers. Because the HTML files use the browser `fetch()` API to load JSON data, you must serve them from a local web server rather than opening them directly from disk.
Steps:
- Open a terminal in the project directory.
- Start the Python server:

  ```bash
  python3 -m http.server 9000
  ```

- Open your browser to `http://localhost:9000`.
- Select `index.html`, `Meditations_Reader.html`, `Enchiridion_Reader.html`, or `Discourses_Reader.html` to begin.