Complete issue 131 Translate punctuation, overrides, and article context #132

Merged
konard merged 7 commits into main from issue-131-84c06d630fcf on Jun 1, 2026

@konard
Contributor

@konard konard commented May 31, 2026

Issue #131 - Translate punctuation, overrides, and article context

Fixes the Translate-page regression for:

California (/ˌkælɪˈfɔːrniə/) is a state in the Western United States that lies on the Pacific Coast.

Root cause

The issue combined several translation gaps:

  • punctuation/brackets and parenthesized IPA could enter lexical linking instead of remaining source text;
  • the n-gram gate split source-backed phrasal verbs such as lies on;
  • the Translate flow lacked a local source-backed override layer for missing or low-quality upstream data;
  • Russian target data and grammar forms were missing for California, state, Western United States, Pacific Coast, relative that, and geographic lie on;
  • linked article context was collected in the result but had no experimental recursive translation UI;
  • the JS semantic-lexicon changes needed the Rust mirror to stay in parity.

Fix

  • Preserves parenthesized pronunciation spans and excludes brackets from lexical phrase links.
  • Adds a default virtual-source-overrides tier with Links Notation rendering and semantic concept injection.
  • Adds source-backed override data for the issue sentence, including lie on, that, California, state, Western United States, and Pacific Coast.
  • Applies Russian naturalization rules for locative region case, relative that, prepositional objects, and sentence-level then punctuation.
  • Lets Translate resolve local semantic/virtual target labels before fetching target-language source data.
  • Adds experimental linked article context translation with summary fetching, bounded translation input, and cache keys that include source URL, target language, section, and revision.
  • Adds Translate UI controls for the virtual source and article-context experiment, plus linked article action buttons.
  • Mirrors the semantic-lexicon override merge, normalized glossary keys, and grammar-form lookup helpers in Rust.
  • Updates the issue #131 case study, generated formalize docs, changeset, and browser evidence.
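
The locative-case rule and grammar-form lookup listed above can be illustrated with a minimal sketch. It assumes a small in-memory table; the names `GRAMMAR_FORMS`, `LOCATIVE_PREPOSITION_CASE`, and `inflectAfterPreposition` are hypothetical and are not taken from the repository. The two phrases are the ones that appear in the issue sentence.

```javascript
// Hypothetical grammar-form table: nominative phrase -> known case forms.
const GRAMMAR_FORMS = new Map([
  ['Тихоокеанское побережье', { prepositional: 'Тихоокеанском побережье' }],
  ['запад США', { prepositional: 'западе США' }],
]);

// In a locative (where?) sense, "на" and "в" govern the prepositional case.
const LOCATIVE_PREPOSITION_CASE = new Map([
  ['на', 'prepositional'],
  ['в', 'prepositional'],
]);

// Returns "<preposition> <inflected phrase>", falling back to the
// nominative phrase when no form is recorded for it.
function inflectAfterPreposition(preposition, nominativePhrase) {
  const grammaticalCase = LOCATIVE_PREPOSITION_CASE.get(preposition);
  const forms = GRAMMAR_FORMS.get(nominativePhrase);
  const inflected = forms?.[grammaticalCase] ?? nominativePhrase;
  return `${preposition} ${inflected}`;
}

console.log(inflectAfterPreposition('на', 'Тихоокеанское побережье'));
```

A table lookup like this only covers phrases that have recorded forms; anything else passes through unchanged, so a missing entry shows up as an uninflected phrase rather than a wrong guess.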

Reproduction / verification

The reported sentence now translates to:

Калифорния (/ˌkælɪˈfɔːrniə/) это штат на западе США, который расположен на Тихоокеанском побережье.

The result has no unresolved translation questions. The regression test also covers parenthesis exclusion, lying on / lies on source-backed phrasal verb matching, virtual-source override link rendering, article-context caching, and Rust-side semantic override parity.
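
The phrasal-verb matching covered by the regression test can be approximated with a longest-match-first pass over word tokens. This is a sketch under assumptions: `PHRASES`, `MAX_PHRASE_WORDS`, and `groupPhrases` are hypothetical names, and the real n-gram gate may work differently. The concept ids `lex:en:lie_on` and `Q430265` are the ones mentioned in this pull request.

```javascript
// Hypothetical phrase table: lowercased multi-word phrase -> concept id.
const PHRASES = new Map([
  ['lies on', 'lex:en:lie_on'],
  ['lying on', 'lex:en:lie_on'],
  ['pacific coast', 'Q430265'],
]);
const MAX_PHRASE_WORDS = 3;

// Greedily groups words into the longest known phrase starting at each
// position, so "lies on" is kept together instead of being split.
function groupPhrases(words, phrases = PHRASES) {
  const result = [];
  let i = 0;
  while (i < words.length) {
    let matched = false;
    const longest = Math.min(MAX_PHRASE_WORDS, words.length - i);
    for (let size = longest; size >= 2; size -= 1) {
      const candidate = words.slice(i, i + size).join(' ');
      const concept = phrases.get(candidate.toLowerCase());
      if (concept) {
        result.push({ text: candidate, concept });
        i += size;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No phrase starts here: emit the single word without a concept.
      result.push({ text: words[i], concept: null });
      i += 1;
    }
  }
  return result;
}

console.log(groupPhrases(['that', 'lies', 'on', 'the', 'Pacific', 'Coast']));
```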

UI Evidence

[Screenshot: Translate UI article context]

Tests

  • node --test js/tests/integration/issue-21-context-detection.test.js js/tests/integration/issue-56.test.js js/tests/integration/issue-96-paragraph-translation.test.js js/tests/integration/issue-131.test.js
  • npm test
  • npm run check
  • cargo test
  • node scripts/check-js-rust-parity.mjs --base origin/main

Closes #131.

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #131
@konard konard self-assigned this May 31, 2026
@konard konard changed the title from "[WIP] Issue on Translate page: California (/ˌkælɪˈfɔːrniə/) is a state in the Wes..." to "Fix Translate punctuation and phrasal verbs for issue #131" on Jun 1, 2026
@konard konard marked this pull request as ready for review June 1, 2026 00:20
@konard
Contributor Author

konard commented Jun 1, 2026

Working session summary

Implemented, pushed, and marked PR #132 ready for review:

#132

What changed:

  • Added shared punctuation-aware tokenization in text-tokenization.js.
  • Preserved parenthesized IPA pronunciation as source text during translation.
  • Fixed phrasal-verb handling so lies on stays one phrase.
  • Added semantic lexicon entries for geographic lie on and Pacific Coast / Q430265.
  • Added regression coverage in issue-131.test.js.
  • Added the requested case study under docs/case-studies/issue-131.
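
As an illustration of punctuation-aware tokenization that keeps parenthesized pronunciation as source text, here is a minimal sketch. It does not reproduce the contents of text-tokenization.js; `tokenizePreservingParens` and `linkableWords` are hypothetical names, and nested parentheses are not handled.

```javascript
// Splits text into tokens. A parenthesized span (for example an IPA
// pronunciation) becomes one opaque "source" token; other tokens are
// words or single punctuation characters.
function tokenizePreservingParens(text) {
  const tokens = [];
  // Alternatives are tried in order at each position:
  // parenthesized span, word, any other non-space character.
  const pattern = /\([^()]*\)|[\p{L}\p{N}]+(?:['’-][\p{L}\p{N}]+)*|\S/gu;
  for (const match of text.matchAll(pattern)) {
    const value = match[0];
    let kind = 'punctuation';
    if (value.startsWith('(') && value.endsWith(')')) kind = 'source';
    else if (/^[\p{L}\p{N}]/u.test(value)) kind = 'word';
    tokens.push({ kind, value, start: match.index });
  }
  return tokens;
}

// Only word tokens are candidates for lexical linking.
function linkableWords(text) {
  return tokenizePreservingParens(text)
    .filter((token) => token.kind === 'word')
    .map((token) => token.value);
}

console.log(linkableWords('California (/ˌkælɪˈfɔːrniə/) is a state.'));
```

Because the parenthesized span is emitted as a single token with its original text, it can be copied into the translation verbatim while brackets and IPA symbols never reach the phrase linker.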

Verification:

  • node --test js/tests/integration/issue-131.test.js passed.
  • npm test passed: unit 63, integration 299 with 10 skipped, e2e 82.
  • npm run check passed.
  • Fresh GitHub Actions on head 463817ed5db5c579141ab86c2c8e0f0fd41d609e passed: JS Checks, Rust Checks, and Broken Link Checker.

Research sources used in the case study include Oxford lie_1, Wikidata Q430265, Wikidata data access docs, and the Wikibase API docs. The working tree is clean.


This summary was automatically extracted from the AI working session output.

@konard
Contributor Author

konard commented Jun 1, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Model: GPT-5.5
  • Provider: OpenAI
  • Public pricing estimate: $31.099819

📊 Context and tokens usage:

  • 754.6K / 1.1M (72%) input tokens, 72.1K / 128K (56%) output tokens

Total: (754.6K + 20.3M cached) input tokens, 72.1K output tokens, $31.099819 cost

🤖 Models used:

  • Tool: OpenAI Codex
  • Requested: gpt-5.5
  • Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (54837KB)


The working session has now ended. Feel free to review and add any feedback on the solution draft.

@konard
Contributor Author

konard commented Jun 1, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard
Contributor Author

konard commented Jun 1, 2026

## Planned: virtual links and source overrides

The regression is fixed with direct lexicon entries, but the issue's broader
request is a reusable override layer. A good next step is a structured
`virtual-source-overrides` registry with entries like:

```json
{
  "id": "lex:en:lie_on",
  "kind": "lexical-sense",
  "sourceUrl": "https://www.oxfordlearnersdictionaries.com/definition/english/lie_1",
  "sourceStatus": "external-source",
  "upstreamTarget": "wiktionary",
  "labels": { "en": ["lie on", "lies on"], "ru": ["расположен на"] }
}
```

The formalizer can then expose a virtual link when external sources are missing
data, and the UI can show whether the entry is local-only, source-backed, or
ready to contribute upstream.
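
One way such a registry could be consulted is sketched below, assuming entries shaped like the JSON example in this comment. The functions `normalizeKey`, `buildOverrideIndex`, and `resolveOverride` are hypothetical names for illustration, not the repository's API.

```javascript
// One registry entry, shaped like the proposed JSON example.
const overrides = [
  {
    id: 'lex:en:lie_on',
    kind: 'lexical-sense',
    sourceUrl: 'https://www.oxfordlearnersdictionaries.com/definition/english/lie_1',
    sourceStatus: 'external-source',
    labels: { en: ['lie on', 'lies on'], ru: ['расположен на'] },
  },
];

// Normalizes a label so lookups ignore case and extra whitespace.
function normalizeKey(label) {
  return label.normalize('NFC').trim().replace(/\s+/g, ' ').toLowerCase();
}

// Builds a lookup from normalized source-language labels to entries.
function buildOverrideIndex(entries, sourceLang) {
  const index = new Map();
  for (const entry of entries) {
    for (const label of entry.labels[sourceLang] ?? []) {
      index.set(normalizeKey(label), entry);
    }
  }
  return index;
}

// Returns the first target-language label plus provenance, or null when
// the registry has no entry or no label for the target language.
function resolveOverride(index, phrase, targetLang) {
  const entry = index.get(normalizeKey(phrase));
  const target = entry?.labels[targetLang]?.[0];
  if (!target) return null;
  return { id: entry.id, target, sourceStatus: entry.sourceStatus };
}

const overrideIndex = buildOverrideIndex(overrides, 'en');
console.log(resolveOverride(overrideIndex, 'lies on', 'ru'));
```

Returning `sourceStatus` alongside the target label gives the UI what it needs to show whether an entry is local-only or source-backed.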

## Planned: recursive article translation

Recursive translation of linked Wikipedia articles should be behind an
experimental flag because it can explode network and token usage. The proposed
shape:

1. Add a per-link action such as "Translate linked context".
2. Fetch the source article summary or selected section, not the whole article
   by default.
3. Run the existing formalize/translate pipeline on that bounded text.
4. Cache by source URL, target language, selected section, and source revision.
5. Surface translated context as expandable evidence, not as part of the main
   sentence translation.

This keeps the main Translate page deterministic while enabling deeper context
when reviewers need it.
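
Step 4 can be sketched as follows. The key is built from the four listed fields; `articleContextCacheKey` and `createContextCache` are hypothetical names, and the `translate` callback stands in for the existing formalize/translate pipeline.

```javascript
// Builds a cache key from source URL, target language, section, and
// revision. JSON encoding avoids collisions from delimiter characters
// that may appear inside URLs or section names.
function articleContextCacheKey({ sourceUrl, targetLang, section = '', revision }) {
  if (!sourceUrl || !targetLang || revision === undefined || revision === null) {
    throw new TypeError('sourceUrl, targetLang, and revision are required');
  }
  return JSON.stringify([sourceUrl, targetLang, section, String(revision)]);
}

// Wraps a translate function so repeated requests for the same article
// revision and section reuse the stored result.
function createContextCache(translate) {
  const cache = new Map();
  return function getTranslatedContext(request, text) {
    const key = articleContextCacheKey(request);
    if (!cache.has(key)) {
      cache.set(key, translate(text, request.targetLang));
    }
    return cache.get(key);
  };
}
```

Including the revision in the key means an edited article is translated again instead of serving stale context, while unchanged articles cost no additional network or token usage.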

## Planned: grammar quality follow-up

The current fix removes unresolved and wrongly linked phrases. It does not
solve Russian agreement and case for every phrase. A later grammar pass should
handle examples such as:

- `в Запад США` -> `на западе США` or another context-appropriate expression;
- `расположен на Тихоокеанское побережье` -> case/agreement-aware wording.

That work belongs in a separate grammar requirement because it affects many
sentences beyond issue #131.

We do it here, in this pull request. It is not beyond this issue's scope; it is inside it.

We need to ensure all changes are correct, consistent, validated, tested, documented, logged, and fully meet every discussed requirement in the deepest and widest possible sense (check the issue description and all comments in the issue and in the pull request, and list each and every requirement before checking whether it was addressed). Nothing should be deferred or delayed. The scope is the entire repository: every change made in one place should also be applied everywhere else in the codebase and docs. Ensure all CI/CD checks pass.

Please plan and execute everything in this single pull request. You have unlimited time and context, as context auto-compacts and you can continue indefinitely, until each and every requirement is fully addressed and everything is completely done.

@konard konard marked this pull request as draft June 1, 2026 10:06
@konard
Contributor Author

konard commented Jun 1, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-01T10:06:03.858Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

@konard konard changed the title from "Fix Translate punctuation and phrasal verbs for issue #131" to "Complete issue 131 Translate punctuation, overrides, and article context" on Jun 1, 2026
@konard konard marked this pull request as ready for review June 1, 2026 11:01
@konard
Contributor Author

konard commented Jun 1, 2026

Working session summary

Implemented and pushed the completed fix to PR #132:

#132

PR is ready for review, working tree is clean, and origin/main is an ancestor of the branch. Current head is c0f77af.

What changed:

  • Added virtual source overrides and Russian grammar naturalization for the issue sentence.
  • Added experimental linked-article context translation UI.
  • Preserved IPA pronunciation punctuation correctly.
  • Mirrored semantic-lexicon changes in Rust to satisfy JS/Rust parity.
  • Updated docs, case study evidence, changeset, and PR description.

Verified locally:

  • npm test
  • npm run check
  • cargo test
  • node scripts/check-js-rust-parity.mjs --base origin/main

Fresh CI on c0f77af is green:

  • Broken Link Checker: success
  • JS Checks and release: success
  • Rust Checks: success

This summary was automatically extracted from the AI working session output.

@konard
Contributor Author

konard commented Jun 1, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Model: GPT-5.5
  • Provider: OpenAI
  • Public pricing estimate: $48.262345

📊 Context and tokens usage:

  • 966.0K / 1.1M (92%) input tokens, 116.2K / 128K (91%) output tokens

Total: (966.0K + 33.4M cached) input tokens, 116.2K output tokens, $48.262345 cost

🤖 Models used:

  • Tool: OpenAI Codex
  • Requested: gpt-5.5
  • Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (2 chunks) (180892KB)


The working session has now ended. Feel free to review and add any feedback on the solution draft.

@konard
Contributor Author

konard commented Jun 1, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard
Contributor Author

konard commented Jun 1, 2026

Double-check that everything is fully implemented as described in #131, and that nothing is left as planned or delayed.

We need to ensure all changes are correct, consistent, validated, tested, documented, logged, and fully meet every discussed requirement in the deepest and widest possible sense (check the issue description and all comments in the issue and in the pull request, and list each and every requirement before checking whether it was addressed). Nothing should be deferred or delayed. The scope is the entire repository: every change made in one place should also be applied everywhere else in the codebase and docs. Ensure all CI/CD checks pass.

Please plan and execute everything in this single pull request. You have unlimited time and context, as context auto-compacts and you can continue indefinitely, until each and every requirement is fully addressed and everything is completely done.

@konard konard marked this pull request as draft June 1, 2026 13:40
@konard
Contributor Author

konard commented Jun 1, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-01T13:40:29.939Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

@konard konard marked this pull request as ready for review June 1, 2026 13:54
@konard
Contributor Author

konard commented Jun 1, 2026

Working session summary

Continued and finalized PR #132: #132

I pushed commit 48c73831bd4e8633f66c90c97009d67e7cf753af to issue-131-84c06d630fcf, refreshing the issue-131 final audit docs, PR/CI snapshots, and saved local verification logs. I also marked the PR ready for review.

Verification passed:

  • node --test js/tests/integration/issue-131.test.js
  • npm test
  • npm run check
  • cargo test
  • node scripts/check-js-rust-parity.mjs --base origin/main

Fresh GitHub CI for 48c7383 is green: Broken Link Checker, JS Checks and release, and Rust Checks all completed successfully. Working tree is clean.


This summary was automatically extracted from the AI working session output.

@konard
Contributor Author

konard commented Jun 1, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Model: GPT-5.5
  • Provider: OpenAI
  • Public pricing estimate: $9.823415

📊 Context and tokens usage:

  • 235.8K / 1.1M (22%) input tokens, 24.2K / 128K (19%) output tokens

Total: (235.8K + 6.4M cached) input tokens, 24.2K output tokens, $9.823415 cost

🤖 Models used:

  • Tool: OpenAI Codex
  • Requested: gpt-5.5
  • Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (30475KB)


The working session has now ended. Feel free to review and add any feedback on the solution draft.

@konard
Contributor Author

konard commented Jun 1, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard konard merged commit b570834 into main Jun 1, 2026
21 checks passed

Development

Successfully merging this pull request may close these issues.

Issue on Translate page: California (/ˌkælɪˈfɔːrniə/) is a state in the Wes...

1 participant