
fix(backend): prevent DoS by limiting scraper fetch payload size#95

Open
Vedhant26 wants to merge 2 commits into
avishek0769:mainfrom
Vedhant26:fix/backend-scraper-dos

Conversation

@Vedhant26

Contributor

What changed

  • Added safeFetchText helper in backend/utils/ragUtilities.js.
  • Modified scrapeWebpage and scrapeTitle to use safeFetchText instead of blindly buffering (await fetch(url)).text() into memory.
  • The new fetch logic enforces a 5MB maximum payload size and validates that the Content-Type is a text-based document.
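
For readers without the diff open, here is a minimal sketch of what a helper like this could look like. It is not the code from this PR: the Content-Type allow-list, the `maxBytes` and `fetchImpl` options, and the exact control flow are assumptions made for illustration. It relies only on the global `fetch`, `AbortController`, and `TextDecoder` available in Node.js 18+.

```javascript
// Hypothetical sketch; the real safeFetchText in this PR may differ.
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB cap described above

async function safeFetchText(url, { maxBytes = MAX_BYTES, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const res = await fetchImpl(url, { signal: controller.signal });
  const tooLarge = () =>
    new Error(`Payload too large. Exceeded ${maxBytes / (1024 * 1024)}MB limit.`);

  // Accept only text-like content types (this allow-list is an assumption).
  const type = (res.headers.get("content-type") || "").toLowerCase();
  if (!/^(text\/|application\/(json|xml|xhtml\+xml))/.test(type)) {
    controller.abort();
    throw new Error(`Unsupported Content-Type: ${type || "unknown"}`);
  }

  // Fast path: reject early when the server declares an oversized body.
  const declared = Number(res.headers.get("content-length"));
  if (declared > maxBytes) {
    controller.abort();
    throw tooLarge();
  }

  // Content-Length can be missing or wrong, so count bytes while streaming.
  if (!res.body) return "";
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.byteLength;
    if (received > maxBytes) {
      await reader.cancel(); // stop downloading the rest of the body
      controller.abort();
      throw tooLarge();
    }
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered partial character
}
```

The streaming byte count is the important part: a Content-Length check alone is not enough, because a chunked or endless response does not have to declare its size.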

Why

Calling fetch(url).text() on user-provided URLs without a size limit creates an uncontrolled resource consumption vulnerability (DoS). If a malicious actor supplies a URL that serves an endless data stream or a multi-gigabyte file, the Node.js process tries to buffer the entire payload into a single string. This exhausts the heap and crashes the backend API with an out-of-memory (OOM) fatal error. This fix rejects and aborts oversized payloads before they can crash the server.

How to test

Run the backend:
    pnpm run dev
Expected result:

  • The RAG scraper works normally for regular documentation pages.
  • Attempting to ingest a massive file (e.g., an ISO or a mock 10GB stream) throws a "Payload too large. Exceeded 5MB limit." error and safely aborts the operation without crashing the API.
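
One way to reproduce the oversized-stream case locally is a small mock server that never ends its response. The sketch below is an assumption about how such a test could be set up, not part of this PR; `startEndlessServer` is a hypothetical helper name, and it uses only Node's built-in `http` module.

```javascript
const http = require("node:http");

// Hypothetical test helper: serves an endless text/plain stream on localhost.
function startEndlessServer(port = 0) {
  const chunk = "a".repeat(64 * 1024);
  const server = http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    let open = true;
    res.on("close", () => {
      open = false; // client went away, stop producing data
    });
    const pump = () => {
      // Respect backpressure so the mock itself does not exhaust memory.
      while (open && res.write(chunk)) {}
      if (open) res.once("drain", pump);
    };
    pump();
  });
  return new Promise((resolve) => {
    // Port 0 asks the OS for any free port.
    server.listen(port, "127.0.0.1", () => {
      resolve({ server, url: `http://127.0.0.1:${server.address().port}/` });
    });
  });
}
```

Pointing the scraper at the returned `url` should then trigger the size-limit error instead of unbounded memory growth; call `server.close()` when the check is done.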

Screenshots (if UI change)

N/A

Related issue

Closes #94


Pre-Submission Checklist

  • Branch named following GSSoC convention
  • Changes committed with meaningful commit message
  • Changes pushed to remote fork
  • Relevant tests executed successfully
  • Code follows project contribution guidelines

@Vedhant26

Contributor Author

@avishek0769 I am ready for any constructive criticism :)

@avishek0769 added the labels Medium ("This is issue is not easy to solve but not hard") and SSoC26 ("Social Summer of Code - 2026") on May 30, 2026
@avishek0769

Owner

@Vedhant26 One suggestion: instead of throwing an error when the 5 MB limit is exceeded, return the data collected up to that limit. That way, the function still provides useful output rather than failing entirely once the limit is reached.
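
A rough sketch of the truncating behaviour suggested here, for discussion only. `readTextUpTo` is a hypothetical helper name, and returning a `truncated` flag alongside the text is an assumption rather than something agreed in this thread. It takes a fetch `Response` and stops reading once the byte budget is used up.

```javascript
// Hypothetical helper: reads at most maxBytes from a fetch Response body
// and returns what was collected instead of throwing.
async function readTextUpTo(res, maxBytes = 5 * 1024 * 1024) {
  if (!res.body) return { text: "", truncated: false };
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) return { text: text + decoder.decode(), truncated: false };
    const remaining = maxBytes - received;
    if (value.byteLength > remaining) {
      // Keep only the bytes that still fit, then stop the download.
      // Skipping the final flush drops a character split by the cut.
      text += decoder.decode(value.subarray(0, remaining), { stream: true });
      await reader.cancel();
      return { text, truncated: true };
    }
    received += value.byteLength;
    text += decoder.decode(value, { stream: true });
  }
}
```

The trade-off is that the returned text can end mid-tag or mid-sentence, so callers such as scrapeWebpage would need to tolerate partial markup; the `truncated` flag lets them log or surface that case.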

@avishek0769

Owner

@Vedhant26 Any update on this PR?

@Vedhant26

Contributor Author

Working on it, I'll reach out soon.

@Vedhant26 Vedhant26 left a comment

Contributor Author


.

@avishek0769

Owner

@Vedhant26 Resolve the conflicts



Development

Successfully merging this pull request may close these issues.

Security: Unconstrained fetch payload size causes Server DoS (OOM Crash)

2 participants