Script for translating PO/TS/RESX/STRINGS/TXT localization files using Google Gemini API.
Install dependencies:
pip install -r requirements.txt
Obtain a Google API key (for example from AI Studio), then set:
set GOOGLE_API_KEY=your_google_api_key
python process.py your_file.po
python process.py your_file.ts
python process.py your_file.resx
python process.py your_file.strings
python process.py your_file.txt
Output files are written as *.ai-translated.po, *.ai-translated.ts, *.ai-translated.resx, *.ai-translated.strings, or *.ai-translated.txt.
Set target language (default is kk):
python process.py your_file.po --target-lang fr
python process.py your_file.po --target-lang fr_CA
python process.py your_file.po --thinking-level medium
Force re-translation of all translatable messages:
python process.py your_file.po --retranslate-all
Default processing behavior:
- translates unfinished messages (
untranslated+fuzzy/unfinished) - skips already translated messages unless
--retranslate-allis used
By default, vocabulary and project rules are auto-detected from target language under data/:
data/<target-lang>/vocab.txtdata/<target-lang>/rules.md
Recommended layout:
data/
kk/
vocab.txt
rules.md
fr/
vocab.txt
rules.md
fr_CA/
vocab.txt
rules.md
Locale fallback is supported:
- for
--target-lang fr_CA, the script first triesdata/fr_CA/vocab.txt/data/fr_CA/rules.md - if not found, it falls back to
data/fr/vocab.txt/data/fr/rules.md
Legacy flat naming is still accepted as a fallback:
vocab-<target-lang>.txtrules-<target-lang>.md
Override them per run:
python process.py your_file.po --vocab custom-vocab.txt --rules custom-rules.md
--vocab also accepts a glossary .po file. Only entries that are actually translated
(so untranslated, fuzzy, and obsolete entries are ignored) are converted to vocabulary pairs
and injected into the translation prompt:
python process.py your_file.po --vocab approved-glossary.po
Quick inline rule override:
python process.py your_file.po --rules-str "Use polite formal tone for settings labels."
--rules-str is merged with file-based rules when both are present.
Startup output prints both:
Vocabulary source(file:<path>ornone)Rules source(file:<path>,inline:--rules-str, combined, ornone)Thinking level(minimal,low,medium,high, or provider default)
Run a terminology discovery pass that builds a translated glossary (msgid=term, msgstr=translation) as PO:
python extract_terms.py your_file.po
Optional controls:
python extract_terms.py your_file.po --out glossary.po --batch-size 200 --parallel-requests 4
python extract_terms.py your_file.po --thinking-level high
Defaults:
- mode:
--mode all(extract full glossary) - output format:
--out-format po - output path:
<input>.glossary.po
The generated glossary .po can be reviewed and then reused directly during translation:
python process.py your_file.po --vocab your_file.glossary.po
When you run missing-term extraction with --vocab and --out-format po, the output PO is merged automatically:
python extract_terms.py your_file.po --mode missing --vocab data/kk/vocab.txt --out-format po
That PO contains:
- translated entries imported from the supplied vocabulary
- newly extracted missing terms as
fuzzyentries for review
So the resulting file can be passed straight back into translation:
python process.py your_file.po --vocab your_file.missing-terms.po
To get previous behavior (missing terms only, JSON output):
python extract_terms.py your_file.po --mode missing --out-format json --vocab data/kk/vocab.txt
Run a QA pass on an already translated .po file. The checker sends structured source / translation
pairs to Gemini and merges model findings with deterministic local checks for placeholders, tags,
accelerators, plural slots, and approved vocabulary usage:
python check_translations.py your_file.po
Default output path:
your_file.translation-check.json
Optional controls:
python check_translations.py your_file.po --probe 25
python check_translations.py your_file.po --out report.json --batch-size 100 --parallel-requests 4
python check_translations.py your_file.po --vocab approved-glossary.po --rules custom-rules.md
python check_translations.py your_file.po --rules-str "Keep menu labels short and imperative."
python check_translations.py your_file.po --thinking-level low
--probe and --num-messages are aliases. They limit how many translated messages are sent to Gemini,
which is useful for prompt testing and quick validation runs.
Defaults follow the same resource lookup as the translation script:
data/<target-lang>/vocab.txtdata/<target-lang>/rules.md
--vocab also accepts a glossary .po file, so you can point the checker at a reviewed glossary PO
directly.
Run an instruction-driven revision pass on an already translated file. The script reviews each existing source/translation pair, keeps entries that already satisfy the instruction, and updates only the entries that actually need a change.
For formats that keep source and translation in the same file (.po, .ts):
python revise_translations.py your_file.po --instruction "Change the translation of Save to Store"
python revise_translations.py your_file.ts --instruction "Use a shorter translation for Close"
For formats where the translated file no longer retains the original source text (.strings, .resx, .txt),
pass both the translated file and the original source file:
python revise_translations.py translated.ai-translated.strings --source-file original.strings --instruction "Change the translation of viewer to browser only where needed"
python revise_translations.py translated.ai-translated.resx --source-file original.resx --instruction "Replace toolbar with command bar when the source says toolbar"
python revise_translations.py translated.txt --source-file source.txt --instruction "Use formal tone for the word Exit"
Behavior:
- default output path:
<input>.revised.<ext> - default output path:
<input>-revised.<ext> - original file is left untouched unless
--in-placeis used --dry-runpreviews the review without writing a file- changed AI-reviewed entries are marked as
fuzzy/unfinishedfor human review
Useful controls:
python revise_translations.py your_file.po --instruction "Replace preferences with settings" --dry-run --probe 50
python revise_translations.py your_file.po --instruction "Use the term archive instead of package" --out revised.po
python revise_translations.py translated.ai-translated.strings --source-file original.strings --instruction "Shorten app names where possible" --batch-size 80 --parallel-requests 4
For .strings files, this project uses the following convention:
- commented key/value entries (
/* "key" = "value"; */) are treated as untranslated source entries - uncommented entries (
"key" = "value";) are treated as already translated - leading
.stringscomments are passed to the model as contextual translator notes
Translated output for .strings preserves file encoding (including UTF BOM when present) and writes translated entries as uncommented lines.
Escaped sequences are preserved in .strings output. The pipeline normalizes model-returned literal escapes to the source style for common control escapes (for example \n, \t, \r, \a, \b, \f, \v) to avoid accidental double-escaping like \\n or \\a.
For .txt files:
- each line is treated as one independent message
- blank/whitespace-only lines are preserved and skipped
- translated output preserves original line order and line breaks
The translation pipeline now normalizes all formats (.po, .ts, .resx, .strings, .txt) into a shared internal entry model with common fields (message, context, note, status, flags, plural data, and string type), then syncs updates back to each native file format on save.
Status values are normalized as:
untranslated: no translation content yetfuzzy: review-required translations (for example POfuzzyor TSunfinished)translated: translated and not fuzzyskipped: non-localizable entries (for example typed/binary.resxresources)
If Poedit complains after you change placeholder order, check the format flag on that entry:
#, c-format: reordering is allowed with positional placeholders, e.g.%2$s,%1$s.#, python-format: positional%2$sis not valid; you cannot safely reorder plain%splaceholders.
For python-format entries, reordering requires source-side named placeholders, for example:
msgid "From %(src)s to %(dst)s"
msgstr "%(dst)s konumuna %(src)s"Always preserve the same placeholder set and types (%s, %d, names) between msgid and msgstr.
python -m unittest discover -s tests -p "test_*.py" -v