fix(repository): v2.73 pre-upload non-ASCII validation for scripts/mapdb#2322
mrhoribu wants to merge 1 commit into …
## Summary
Adds a pre-upload check to `;repo upload` and `;repo upload-mapdb` that rejects files containing non-ASCII characters and prints a per-character report identifying every occurrence.
Smart quotes, em dashes, box-drawing characters, and similar non-ASCII content sometimes slip into scripts via editor auto-correct or copy-paste from web sources. The server-side tooling and many downstream consumers assume ASCII, so catching this client-side before upload saves a round trip and surfaces a clear, actionable error.
## Behavior
When a file is clean, upload proceeds exactly as before — no change.
When non-ASCII characters are present, the upload is aborted and a report is printed showing line, column, the offending character, and its codepoint:
```
[repository: error: non-ASCII characters detected in /path/to/script.lic]
[repository: found 1 non-ASCII character; upload aborted]
[repository: line 88, col 73: ─ (U+2500)]
```
Multi-byte UTF-8 characters report as a single entry (one row per character, not per byte). Lines that aren't valid UTF-8 fall back to a byte-wise scan so stray bytes are still reported rather than crashing the check.
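The per-line rule above can be sketched in plain Ruby, assuming the scan works on one line at a time. `scan_line_for_non_ascii` is a hypothetical name. Rendering stray bytes as `\xNN` text, with the raw byte value stored in `codepoint:`, is also an assumption and not necessarily what the PR does:

```ruby
# Hypothetical per-line scanner. Valid UTF-8 is walked per character, so a
# three-byte character such as U+2500 yields one entry; invalid UTF-8 is
# walked per byte so stray bytes are still reported instead of raising.
def scan_line_for_non_ascii(line, line_no)
  violations = []
  utf8 = line.dup.force_encoding(Encoding::UTF_8)

  if utf8.valid_encoding?
    utf8.each_char.with_index(1) do |ch, col|
      next if ch.ord <= 0x7F # 0x7F (DEL) is still ASCII
      violations << { line: line_no, col: col, char: ch, codepoint: ch.ord }
    end
  else
    # Assumption: in fallback mode col is a byte offset and codepoint is the raw byte.
    line.b.each_byte.with_index(1) do |byte, col|
      next if byte <= 0x7F
      violations << { line: line_no, col: col, char: format("\\x%02X", byte), codepoint: byte }
    end
  end

  violations
end

scan_line_for_non_ascii("puts \u201Chi\u201D\n", 1)
# two entries: col 6 (U+201C) and col 9 (U+201D)
scan_line_for_non_ascii("a\x80b".b, 2)
# one entry: col 2, char "\\x80", codepoint 0x80
```

The code checks `valid_encoding?` first because calling `ord` on a malformed character in a UTF-8-tagged string raises `ArgumentError`. That is the crash the byte-wise fallback avoids.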
Applies to both:
- `upload_file` — checked after `find_file` resolves the path
- `upload_mapdb` — checked after `Map.save_json` writes the file
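The guard placement can be sketched with a hypothetical stand-in class. `UploaderSketch`, its `find_file`, and the `uploaded` list are placeholders. `check_non_ascii` is simplified here to a whole-file `ascii_only?` test without the per-character report. Under the same assumptions, `upload_mapdb` would carry the same one-line guard after `Map.save_json` writes the file:

```ruby
require "tempfile"

# Hypothetical stand-in for the repository uploader; only the guard
# placement mirrors the PR description.
class UploaderSketch
  attr_reader :uploaded

  def initialize
    @uploaded = []
  end

  def upload_file(name)
    path = find_file(name)                    # resolve the path first
    return false unless check_non_ascii(path) # the one-line guard
    @uploaded << path                         # placeholder for the real upload
    true
  end

  private

  # Placeholder for the real path lookup.
  def find_file(name)
    File.expand_path(name)
  end

  # Simplified stand-in: true when every byte is below 0x80.
  # The real helper also prints the per-character report.
  def check_non_ascii(path)
    File.binread(path).ascii_only?
  end
end

clean = Tempfile.new(["clean", ".lic"])
clean.close
File.binwrite(clean.path, "echo 'ok'\n")

dirty = Tempfile.new(["dirty", ".lic"])
dirty.close
File.binwrite(dirty.path, "echo \u2018ok\u2019\n")

uploader = UploaderSketch.new
uploader.upload_file(clean.path) # => true
uploader.upload_file(dirty.path) # => false
```

Because the guard returns before anything is sent, clean files take exactly the same path as before. The guard is the only new branch.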
## Implementation
Two private helpers in the existing `Uploader`-style class:
- `non_ascii_violations(file_path)` — returns `Array<Hash>` with `{ line:, col:, char:, codepoint: }` entries. Reads line-by-line to keep memory bounded for large map JSON files.
- `check_non_ascii(file_path)` — runs the scan, echoes the report when violations exist, returns `true` (proceed) or `false` (abort).
No new dependencies. No changes to existing upload behavior on clean files.
No changes to the wire protocol or server contract.
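The two helpers described above could look roughly like the following in plain Ruby. This is a sketch, not the PR's code. It assumes a binary-mode `File.foreach` for the line-by-line read, writes through an `out:` IO in place of Lich's echo, and renders invalid bytes as `\xNN`. The report strings follow the example in the Behavior section:

```ruby
require "stringio"
require "tempfile"

# Sketch of non_ascii_violations: reads one line at a time in binary mode,
# tries UTF-8 per line, and falls back to bytes when the line is invalid.
def non_ascii_violations(file_path)
  violations = []
  File.foreach(file_path, mode: "rb").with_index(1) do |raw, line_no|
    utf8 = raw.dup.force_encoding(Encoding::UTF_8)
    if utf8.valid_encoding?
      # One entry per character, so multi-byte characters are reported once.
      utf8.each_char.with_index(1) do |ch, col|
        next if ch.ord <= 0x7F
        violations << { line: line_no, col: col, char: ch, codepoint: ch.ord }
      end
    else
      # Invalid UTF-8: report each high byte; col is a byte offset here.
      raw.each_byte.with_index(1) do |byte, col|
        next if byte <= 0x7F
        violations << { line: line_no, col: col, char: format("\\x%02X", byte), codepoint: byte }
      end
    end
  end
  violations
end

# Sketch of check_non_ascii: true means proceed, false means abort.
# `out` is an assumption standing in for Lich's echo.
def check_non_ascii(file_path, out: $stdout)
  violations = non_ascii_violations(file_path)
  return true if violations.empty?

  noun = violations.size == 1 ? "character" : "characters"
  out.puts "[repository: error: non-ASCII characters detected in #{file_path}]"
  out.puts "[repository: found #{violations.size} non-ASCII #{noun}; upload aborted]"
  violations.each do |v|
    out.puts format("[repository: line %d, col %d: %s (U+%04X)]",
                    v[:line], v[:col], v[:char], v[:codepoint])
  end
  false
end

# Example: the second line contains U+2500 (box drawing).
demo = Tempfile.new(["demo", ".lic"])
demo.close
File.binwrite(demo.path, "# ok\nputs '\u2500'\n")
log = StringIO.new
check_non_ascii(demo.path, out: log) # => false
puts log.string # three lines, ending with the line 2, col 7 entry for U+2500
```

Only one line is held in memory at a time, although the violations array itself still grows with the number of hits.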
## Scope
- `upload_file` — added one guard line
- `upload_mapdb` — added one guard line
- Two new private helpers in the same class
- Version bump 2.72 → 2.73 + changelog entry
Total: +44 / -0 lines.
## Testing
Verified against a real failure case (`gauntletcharger.lic` with a box-drawing character `─`, U+2500, at line 88). Also tested locally with:
- Clean ASCII files (passes through, no output)
- Single-character violations (smart quote, em dash, box-drawing)
- Multi-line violations
- Invalid UTF-8 bytes (falls back gracefully, still reports)
- The 0x7F / 0x80 ASCII boundary (0x7F, the last ASCII codepoint, passes; 0x80 is reported)
Happy to add an RSpec file if there's a fixture pattern preferred for this script — the helpers are pure functions of file contents and would be straightforward to spec.