
feat: Add Document Chunking Sandbox — Issue #164 #257

Open
Arjita-2B-18 wants to merge 2 commits into avishek0769:main from Arjita-2B-18:feature/164-document-chunking-sandbox


@Arjita-2B-18

Contributor

Summary

Adds an interactive chunking sandbox at /sandbox so contributors and
users can preview how text will be split before running the full
ingestion pipeline.

Changes

Frontend

  • src/App.tsx — Added /sandbox as a ProtectedRoute
  • src/pages/Sandbox.tsx — New page with:
    • Text paste area with live character counter
    • Sliders for chunk size (50–2000) and overlap (auto-clamped to at most chunkSize - 10)
    • "Preview Chunks" button calling POST /api/chunk-preview
    • Result cards with per-chunk character count
    • Purple highlight on the overlap region in each chunk (except the first)
    • Stats pills: total chunks / size / overlap
    • Reset button, full error + loading states
    • Matches existing design system: bg-[#0b0b0f], Sidebar, accent-blue, accent-purple
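The overlap clamp and the per-chunk highlight described above can be sketched roughly as follows. This is an illustration only: `clampOverlap` and `splitOverlap` are hypothetical names, and the sketch assumes the highlighted region is simply the first `overlap` characters of every chunk after the first. The actual Sandbox.tsx may do this differently.

```typescript
// Hypothetical helpers; names and exact behavior are assumptions, not the PR's code.

/** Keep overlap within [0, chunkSize - 10], mirroring the slider's auto-clamp. */
function clampOverlap(chunkSize: number, overlap: number): number {
  const max = Math.max(0, chunkSize - 10);
  return Math.min(Math.max(0, overlap), max);
}

/** Split a chunk into the part shared with the previous chunk and the new part. */
function splitOverlap(
  chunk: string,
  index: number,
  overlap: number
): { shared: string; fresh: string } {
  // The first chunk has no predecessor, so nothing is highlighted.
  const cut = index === 0 ? 0 : Math.min(overlap, chunk.length);
  return { shared: chunk.slice(0, cut), fresh: chunk.slice(cut) };
}
```

A result card could then render `shared` inside the purple-highlighted span and `fresh` as plain text.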

Backend

  • backend/controllers/chat.controller.js
    • Added isolated chunkText() utility function (no UI coupling, reusable by worker)
    • Added chunkPreview handler using asyncHandler + ApiResponse/ApiError
    • Input validation: required string, max 100k chars, chunkSize 10–5000, overlap ≥ 0
    • Zero DB writes. Zero vector operations. Zero production chat data created.
  • backend/routes/chat.routes.js — Registered POST /chunk-preview
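For reviewers who want a mental model of the backend pieces listed above, here is a minimal sketch of a sliding-window chunker plus the stated input checks. It assumes fixed-size character windows advancing by `chunkSize - overlap`; the PR's `chunkText()` may split differently. `validatePreviewInput` is a hypothetical name, and the `overlap < chunkSize` check and the rejection of empty strings are assumptions not stated in the description.

```typescript
// Sketch only: assumes fixed-size character windows; the PR's chunkText() may differ.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must satisfy 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // always >= 1, so the loop terminates
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Hypothetical validator reflecting the limits listed in the PR description.
// Returns an error message, or null when the input is acceptable.
function validatePreviewInput(text: unknown, chunkSize: number, overlap: number): string | null {
  if (typeof text !== "string" || text.length === 0) return "text is required";
  if (text.length > 100000) return "text exceeds 100k characters";
  if (!Number.isInteger(chunkSize) || chunkSize < 10 || chunkSize > 5000) {
    return "chunkSize must be between 10 and 5000";
  }
  if (!Number.isInteger(overlap) || overlap < 0) return "overlap must be >= 0";
  // Assumption: not stated in the PR, but needed so the window always advances.
  if (overlap >= chunkSize) return "overlap must be smaller than chunkSize";
  return null;
}
```

With these definitions, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each chunk after the first begins with the last character of the previous one.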

Acceptance Criteria

  • Users can paste text and choose chunking parameters
  • Page shows resulting chunks with counts and overlap behavior highlighted
  • Sandbox does not store vectors or create production chat data

Notes

  • chunkText() is intentionally isolated in both files so the ingestion
    worker can import the backend version directly in a future PR
  • No new npm dependencies added

Closes #164

@avishek0769 added the Hard and SSoC26 labels on Jun 14, 2026
@avishek0769

Owner

@Arjita-2B-18 resolve merge conflicts

- Add /sandbox route in App.tsx (ProtectedRoute)
- Create src/pages/Sandbox.tsx with interactive preview UI
- Add chunkText() pure utility (isolated for worker reuse)
- Add POST /api/chunk-preview in chat.controller.js
- Register route in chat.routes.js before /:chatId wildcard

Closes avishek0769#164
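The note about registering the route before the /:chatId wildcard matters because routers that match in registration order would otherwise treat "chunk-preview" as a chat ID. The toy matcher below illustrates that ordering effect. It is not Express and not the project's router; the handler names `chatById` and `chunkPreview` are hypothetical, and HTTP methods are ignored for brevity.

```typescript
// Toy first-match router, NOT Express itself; it only mimics registration-order matching.
type Route = { pattern: string; name: string };

function matches(pattern: string, path: string): boolean {
  const p = pattern.split("/").filter(Boolean);
  const s = path.split("/").filter(Boolean);
  if (p.length !== s.length) return false;
  // A ":param" segment matches any single path segment.
  return p.every((seg, i) => seg.startsWith(":") || seg === s[i]);
}

function resolve(routes: Route[], path: string): string | undefined {
  // The first registered route that matches wins.
  return routes.find((r) => matches(r.pattern, path))?.name;
}

const wildcardFirst: Route[] = [
  { pattern: "/:chatId", name: "chatById" },
  { pattern: "/chunk-preview", name: "chunkPreview" },
];
const previewFirst: Route[] = [...wildcardFirst].reverse();

console.log(resolve(wildcardFirst, "/chunk-preview")); // chatById
console.log(resolve(previewFirst, "/chunk-preview")); // chunkPreview
```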
@Arjita-2B-18 force-pushed the feature/164-document-chunking-sandbox branch from c22cb20 to 0d06076 on June 15, 2026 01:29

Labels

Hard (This issue is hard to solve), SSoC26 (Social Summer of Code - 2026)

Development

Successfully merging this pull request may close these issues.

Add a document chunking sandbox

2 participants