fix(repository): v2.73 pre-upload non-ASCII validation for scripts/mapdb #2322

Open
mrhoribu wants to merge 1 commit into master from fix/repository-v2.73

@mrhoribu mrhoribu commented May 12, 2026

…d mapdb

## Summary

Adds a pre-upload check to `;repo upload` and `;repo upload-mapdb` that rejects files containing non-ASCII characters and prints a per-character report identifying every occurrence.

Smart quotes, em dashes, box-drawing characters, and similar non-ASCII content sometimes slip into scripts via editor auto-correct or copy-paste from web sources. The server-side tooling and many downstream consumers assume ASCII, so catching this client-side before upload saves a round trip and surfaces a clear, actionable error.

## Behavior

When a file is clean, upload proceeds exactly as before — no change.

When non-ASCII characters are present, the upload is aborted and a report is printed showing line, column, the offending character, and its codepoint:
```
[repository: error: non-ASCII characters detected in /path/to/script.lic]
[repository: found 1 non-ASCII character; upload aborted]
[repository:   line 88, col 73: ─ (U+2500)]
```
Multi-byte UTF-8 characters report as a single entry (one row per character, not per byte). Lines that aren't valid UTF-8 fall back to a byte-wise scan so stray bytes are still reported rather than crashing the check.
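
The two scan modes described above can be sketched as follows. This is a minimal illustration, not the PR's code: `scan_line` is a hypothetical name, and counting columns in characters for valid lines and in bytes for the fallback is an assumption.

```ruby
# Hypothetical helper: scans one line and returns one Hash per offending item.
# Assumption: `line` is tagged UTF-8, as File.foreach returns by default.
def scan_line(line, line_no)
  violations = []
  if line.valid_encoding?
    # Character-wise scan: a multi-byte character yields exactly one entry.
    line.each_char.with_index(1) do |ch, col|
      next if ch.ord <= 0x7F
      violations << { line: line_no, col: col, char: ch, codepoint: ch.ord }
    end
  else
    # Byte-wise fallback: each byte above 0x7F is reported on its own,
    # so an invalid sequence cannot raise from String#ord.
    line.each_byte.with_index(1) do |byte, col|
      next if byte <= 0x7F
      violations << { line: line_no, col: col, char: format("\\x%02X", byte), codepoint: byte }
    end
  end
  violations
end

# Two smart quotes, six bytes in UTF-8, but only two entries.
puts scan_line("x = \u201Chi\u201D\n", 1).size
```

Splitting on `valid_encoding?` keeps the common path character-accurate while still reporting stray bytes instead of raising.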

Applies to both:
- `upload_file` — checked after `find_file` resolves the path
- `upload_mapdb` — checked after `Map.save_json` writes the file
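
The guard placement in the two entry points might look roughly like this. Only the names `upload_file`, `upload_mapdb`, `find_file`, and `check_non_ascii` come from the description above; `UploaderSketch`, `save_mapdb_json`, `transmit`, and the in-memory `files` hash are hypothetical stand-ins so the sketch runs on its own.

```ruby
# Hypothetical skeleton showing where the guard sits; not the real class.
class UploaderSketch
  attr_reader :sent

  def initialize(files)
    @files = files # name => contents, standing in for the filesystem
    @sent = []
  end

  def upload_file(name)
    path = find_file(name)
    return false if path.nil?
    return false unless check_non_ascii(path) # guard runs once the path is resolved
    transmit(path)
  end

  def upload_mapdb
    path = save_mapdb_json # stand-in for the step that writes the map JSON
    return false unless check_non_ascii(path) # guard runs once the file exists
    transmit(path)
  end

  private

  def find_file(name)
    @files.key?(name) ? name : nil
  end

  def save_mapdb_json
    @files["map.json"] ||= "{}\n"
    "map.json"
  end

  # Simplified check: true means proceed, false means abort.
  def check_non_ascii(path)
    @files.fetch(path).ascii_only?
  end

  def transmit(path)
    @sent << path
    true
  end
end

uploader = UploaderSketch.new({ "clean.lic" => "echo 'hi'\n" })
puts uploader.upload_file("clean.lic")
```

Because the guard returns before anything is transmitted, a rejected file never reaches the upload step.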

## Implementation

Two private helpers in the existing `Uploader`-style class:
- `non_ascii_violations(file_path)` — returns `Array<Hash>` with `{ line:, col:, char:, codepoint: }` entries. Reads line-by-line to keep memory bounded for large map JSON files.
- `check_non_ascii(file_path)` — runs the scan, echoes the report when violations exist, returns `true` (proceed) or `false` (abort).

No new dependencies. No changes to existing upload behavior on clean files.
No changes to the wire protocol or server contract.
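
A rough sketch of how the report and the proceed/abort return value might fit together. It assumes `codepoint` is stored as an Integer, hard-codes the `[repository: ...]` wrapper seen in the sample output, and takes the violations and an output object as extra parameters so it runs standalone; the PR's `check_non_ascii` takes only `file_path`. `format_report` and the singular/plural wording are hypothetical.

```ruby
# Hypothetical formatter: builds report lines in the shape of the sample output.
def format_report(file_path, violations)
  noun = violations.size == 1 ? "character" : "characters" # pluralization is assumed
  lines = []
  lines << "[repository: error: non-ASCII characters detected in #{file_path}]"
  lines << "[repository: found #{violations.size} non-ASCII #{noun}; upload aborted]"
  violations.each do |v|
    lines << format("[repository:   line %d, col %d: %s (U+%04X)]",
                    v[:line], v[:col], v[:char], v[:codepoint])
  end
  lines
end

# Returns true (proceed) when there is nothing to report, false (abort) otherwise.
def check_non_ascii(file_path, violations, out: $stdout)
  return true if violations.empty?

  format_report(file_path, violations).each { |line| out.puts(line) }
  false
end

sample = [{ line: 88, col: 73, char: "\u2500", codepoint: 0x2500 }]
check_non_ascii("/path/to/script.lic", sample)
```

Keeping the formatting separate from the boolean decision makes each piece easy to test in isolation.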

## Scope

- `upload_file` — added one guard line
- `upload_mapdb` — added one guard line
- Two new private helpers in the same class
- Version bump 2.72 → 2.73 + changelog entry

Total: +44 / -0 lines.

## Testing

Verified against a real failure case (`gauntletcharger.lic` with a box-drawing character, U+2500, at line 88; an em dash would be U+2014). Also tested locally with:

- Clean ASCII files (passes through, no output)
- Single-character violations (smart quote, em dash, box-drawing)
- Multi-line violations
- Invalid UTF-8 bytes (falls back gracefully, still reports)
- The 0x7F / 0x80 ASCII boundary
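
The boundary and single-character cases listed above can be spot-checked with Ruby's built-in `String#ascii_only?`. This is only a sanity check of the ASCII boundary, not the PR's scanner, and the sample strings are made up for illustration.

```ruby
# Each entry maps a label to [text, expected result of ascii_only?].
samples = {
  "clean script line"          => ["echo 'hello'\n", true],
  "0x7F (DEL), last ASCII"     => ["\x7F", true],
  "0x80, first non-ASCII byte" => ["\x80".b, false],
  "smart quote U+201C"         => ["\u201C", false],
  "em dash U+2014"             => ["\u2014", false],
  "box-drawing U+2500"         => ["\u2500", false],
  "invalid UTF-8 byte"         => ["abc\xFF", false]
}

# Compare the built-in predicate against the expectation for every sample.
results = samples.map do |label, (text, expected)|
  [label, text.ascii_only? == expected]
end

results.each { |label, ok| puts "#{ok ? 'ok  ' : 'FAIL'} #{label}" }
```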

Happy to add an RSpec file if there's a fixture pattern preferred for this script — the helpers are pure functions of file contents and would be straightforward to spec.

coderabbitai Bot commented May 12, 2026

Warning

Rate limit exceeded

@mrhoribu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd5a97a1-b74b-4934-bd5a-e377dbb8df45

📥 Commits

Reviewing files that changed from the base of the PR and between 7389607 and ece2346.

📒 Files selected for processing (1)
  • scripts/repository.lic