We are building a highly scalable, Google Translation API-compatible service. This service provides intelligent translation for multiple language pairs, with built-in quality estimation, database caching, brand integration, and an offline reviewer system.
- Intelligent Pre-processing: Automatically skips translation for emojis, links, numbers, and HTML tags.
- Brand Name Protection: Preserves your exact brand name across translations by using variable placeholder injection (`{{BRAND_NAME}}`).
- Language Detection: Uses `langdetect` to verify source languages.
- Complexity Routing: Computes text readability and complexity. Simple text uses fast local models; complex text leverages LLM integration for contextual and nuanced translations.
- Machine Translation: Powered by Hugging Face `MarianMT` (Helsinki-NLP) models.
- LLM Translation Integration: Automatically routes highly complex or context-heavy JSON structures to the LLM backend.
- Quality Estimation: Scores every translation natively using cross-lingual semantic similarity (`sentence-transformers`).
- Caching Layer: High-scoring translations (>= 85%) are automatically cached in PostgreSQL and returned for exact-match input text to save compute.
- S3 Bucket Processing: Asynchronously translates entire directories of JSON documents stored in AWS S3 or MinIO.
- Offline Reviewer Module: Automatically scans the translation cache database in the background to fix translations with high complexity and low trust scores using the LLM backend.
Once we receive an input text via the API, it passes through the following strict, step-by-step pipeline:
Before invoking any heavy translation models, the input is validated and verified:
- Translatability Check: Checks whether the input contains any translatable text. Content such as links, emojis, numbers, currency symbols, and HTML tags is skipped rather than translated.
- Source/Target Compatibility: Automatically detects the input language using `langdetect` to verify it matches the requested source language.
- Language Supported: Validates that the requested source and target languages are within our supported pairs.
- Cache Check: Queries the PostgreSQL database. If this exact text was translated before with a high trust score (>=85%), we return the cached result immediately.
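The translatability check above could be sketched roughly as follows. This is a minimal illustration using only the standard library; the regular expressions and the `is_translatable` name are assumptions for this example, not the service's actual rules.

```python
import re

# Hypothetical patterns; the real service may use different or stricter rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
HTML_TAG_RE = re.compile(r"<[^>]+>")
# Digits, whitespace, common punctuation, and a few currency symbols.
NUMERIC_RE = re.compile(r"[\d\s.,:;%+\-/()$€£¥₹]+")


def is_translatable(text: str) -> bool:
    """Return True if alphabetic content remains after removing
    links, HTML tags, numbers, and currency symbols."""
    stripped = URL_RE.sub(" ", text)
    stripped = HTML_TAG_RE.sub(" ", stripped)
    stripped = NUMERIC_RE.sub(" ", stripped)
    # Emojis and other symbols are not alphabetic, so they are ignored here.
    return any(ch.isalpha() for ch in stripped)
```

Inputs for which `is_translatable` returns `False` can be returned to the caller unchanged, without loading any translation model.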
- Brand Context: If a `brand_uuid` is supplied, the pipeline extracts the brand's industry, tone, target audience, and glossary.
- Brand Protection: The brand's exact name is wrapped in a `{{BRAND_NAME}}` placeholder so the translation engine avoids localizing the brand name.
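As an illustration of the placeholder approach, the sketch below masks the brand name before translation and restores it afterwards. The function names and the case-insensitive matching are assumptions made for this example; the service's own implementation may differ.

```python
import re

PLACEHOLDER = "{{BRAND_NAME}}"


def protect_brand(text: str, brand_name: str) -> str:
    """Replace every occurrence of the brand name with the placeholder.

    Case-insensitive matching is an assumption of this sketch.
    """
    pattern = re.compile(re.escape(brand_name), re.IGNORECASE)
    return pattern.sub(PLACEHOLDER, text)


def restore_brand(translated: str, brand_name: str) -> str:
    """Put the exact brand name back after translation."""
    return translated.replace(PLACEHOLDER, brand_name)
```

For example, `protect_brand("Acme Cloud makes backups easy.", "Acme Cloud")` yields `{{BRAND_NAME}} makes backups easy.`, and `restore_brand` reverses the substitution on the translated string.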
- If the text is valid and not cached, we calculate a complexity score from 1-100. This is done using a mix of NLP heuristics (token length, vocabulary diversity, and Flesch reading ease).
- Simple Inputs (Score < 50): The text is passed to our local, fast Machine Translation model (`MarianMT`/Helsinki-NLP).
- Complex Inputs (Score >= 50): The text is routed through an LLM sequence using structured generation prompts that preserve structural integrity (e.g. JSON Arrays) and follow your specific brand guidelines.
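To make the routing step concrete, here is a rough sketch of a complexity score that combines the three heuristics named above and applies the threshold of 50. The weights, the vowel-group syllable counter, and the ASCII-only tokenizer are all assumptions for illustration; only the Flesch reading ease formula itself is standard.

```python
import re

ROUTING_THRESHOLD = 50  # scores >= 50 go to the LLM, per the rules above


def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 100.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / n_sentences) - 84.6 * (syllables / len(words))


def complexity_score(text: str) -> int:
    """Blend readability, length, and vocabulary diversity into 1..100.

    The 0.6 / 0.3 / 0.1 weights are illustrative, not the service's values.
    """
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 1
    readability = min(100.0, max(0.0, 100.0 - flesch_reading_ease(text)))
    length = min(100.0, len(words) * 2.0)
    diversity = 100.0 * len({w.lower() for w in words}) / len(words)
    score = 0.6 * readability + 0.3 * length + 0.1 * diversity
    return int(min(100, max(1, round(score))))


def route(text: str) -> str:
    return "llm" if complexity_score(text) >= ROUTING_THRESHOLD else "marianmt"
```

A short sentence such as "The cat sat on the mat." lands well below the threshold and is routed to `marianmt`, while dense, polysyllabic legal text is routed to `llm`.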
- Quality Scoring: The generated translation is scored against the source text using a reference-free, cross-lingual semantic similarity model. This gives us a confidence/quality score between 0.0 and 1.0, so the 85% trust threshold corresponds to a score of 0.85.
- Database Storage: The original text, translated text, detected languages, translation time, and the quality score are all stored in the database.
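The scoring step can be pictured as a similarity between two sentence embeddings. The sketch below assumes cosine similarity, clamped to the 0.0 to 1.0 range, over vectors that a `sentence-transformers` model would produce in the real service. The vectors here are plain lists so the example runs without any model, and the function names are hypothetical.

```python
import math
from typing import Sequence

TRUST_THRESHOLD = 0.85  # the 85% trust score used by the cache check


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def quality_score(source_vec: Sequence[float], target_vec: Sequence[float]) -> float:
    """Clamp cosine similarity into [0.0, 1.0] (clamping is an assumption)."""
    return max(0.0, min(1.0, cosine_similarity(source_vec, target_vec)))


def is_trusted(score: float) -> bool:
    """True if a stored translation is good enough to serve from the cache."""
    return score >= TRUST_THRESHOLD
```

With this shape, every result can be stored together with its score, and `is_trusted` decides whether a later cache lookup may return it directly.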
- An asynchronous backend process (`/api/v1/review/start`) regularly audits the database. Any cached translation that falls below an 85% trust score while having a high complexity score is sent back through an LLM `fix_translation` prompt to automatically improve the cache over time.
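The reviewer's selection rule can be sketched as a simple filter over cached rows. The `CachedTranslation` record, its field names, and the use of 50 as the "high complexity" cutoff (borrowed from the routing threshold) are assumptions for this example; the actual database schema is not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CachedTranslation:
    # Hypothetical shape of a cache row.
    source_text: str
    translated_text: str
    quality_score: float   # 0.0 .. 1.0
    complexity_score: int  # 1 .. 100


def needs_review(row: CachedTranslation,
                 trust_threshold: float = 0.85,
                 complexity_threshold: int = 50) -> bool:
    """Low trust AND high complexity, as described above."""
    return (row.quality_score < trust_threshold
            and row.complexity_score >= complexity_threshold)


def select_for_review(rows: Iterable[CachedTranslation]) -> List[CachedTranslation]:
    return [row for row in rows if needs_review(row)]
```

Each selected row would then be sent through the `fix_translation` prompt and its cache entry updated with the improved translation.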
You can control how the heavy machine learning models are loaded via the .env file:
- `IS_DYNAMIC_LOADING=true` (Default): Models are loaded into RAM on demand the first time a specific language pair is requested. This allows the app to start up instantly.
- `IS_DYNAMIC_LOADING=false`: The application downloads and loads all supported language models during startup. (Note: This can take several minutes and gigabytes of RAM during the first boot, but it avoids model-loading delay on the first request.)
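A minimal sketch of how such a flag might be read at startup is shown below. It assumes the `.env` values have already been loaded into the process environment; the helper names and the set of accepted truthy strings are illustrative assumptions.

```python
import os
from typing import List, Mapping, Optional, Sequence, Tuple

LanguagePair = Tuple[str, str]


def is_dynamic_loading(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read IS_DYNAMIC_LOADING, defaulting to true when it is unset."""
    env = os.environ if environ is None else environ
    raw = env.get("IS_DYNAMIC_LOADING", "true")
    # Accepting "1", "yes", and "on" is an assumption of this sketch.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def pairs_to_preload(supported_pairs: Sequence[LanguagePair],
                     environ: Optional[Mapping[str, str]] = None) -> List[LanguagePair]:
    """Return the language pairs whose models should be loaded at startup."""
    if is_dynamic_loading(environ):
        return []  # load lazily on first request
    return list(supported_pairs)
```

With dynamic loading enabled, startup preloads nothing and each model is loaded on first use; with it disabled, every supported pair is loaded before the first request is served.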