Skip to content

Conversation

@ankit-mehta07
Copy link

@ankit-mehta07 ankit-mehta07 commented Feb 3, 2026

Summary

Fixes #569

Summary

  • Added file size validation to the document upload endpoint
  • Prevents oversized files from being uploaded to S3
  • Introduced a reusable validator for file size checks
  • Added test coverage for upload size validation
  • Updated documentation to reflect the new restriction

Motivation

This change avoids unnecessary S3 storage usage and prevents downstream issues when handling very large documents.

Summary by CodeRabbit

  • New Features

    • File uploads now enforce a configurable 50MB size limit and reject empty files with clear errors.
  • Documentation

    • Added "File Size Restrictions" detailing limits and error responses (413 for too large, 422 for empty).
  • Tests

    • Added upload tests covering oversized, empty, and within-limit files.
  • Bug Fix / API

    • Normalized several endpoint paths by removing trailing slashes (minor URL changes).

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

Adds a configurable document upload size limit (default 50MB), enforces size and emptiness validation before S3 upload, updates docs, and adds tests covering oversized, empty, and valid uploads.

Changes

Cohort / File(s) Summary
Configuration
backend/app/core/config.py
Added MAX_DOCUMENT_UPLOAD_SIZE_MB setting (default 50).
Validation
backend/app/services/documents/validators.py
New validate_document_file(UploadFile) that checks filename, measures size, rejects empty files (422) and oversized files (413), and returns size in bytes.
Route integration
backend/app/api/routes/documents.py
Imported and invoked validate_document_file in upload_doc prior to pre-transform validation to abort early on invalid files.
Documentation
backend/app/api/docs/documents/upload.md
Added "File Size Restrictions" section documenting 50MB default limit and error responses (413, 422).
Tests
backend/app/tests/api/routes/documents/test_route_document_upload.py
Added tests for oversized (51MB → 413), empty (→ 422), and within-limit (1MB → success + DB record) uploads.
Route path normalization
backend/app/api/routes/login.py, backend/app/api/routes/private.py, backend/app/api/routes/utils.py, backend/app/tests/.../test_login.py, backend/app/tests/.../test_private.py
Removed trailing slashes from several route decorators and corresponding tests (e.g., /reset-password//reset-password, /users//users, /health//health).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as Upload Handler
    participant Validator
    participant S3
    participant DB as Database

    Client->>Handler: POST /upload (multipart file)
    activate Handler

    Handler->>Validator: validate_document_file(file)
    activate Validator
    alt validation fails
        Validator-->>Handler: HTTPException (413 or 422)
        Handler-->>Client: Error response
    else validation passes
        Validator-->>Handler: file_size (bytes)
        deactivate Validator

        Handler->>Handler: pre_transform_validation(...)
        Handler->>S3: Upload file
        activate S3
        S3-->>Handler: Upload success (URL/metadata)
        deactivate S3

        Handler->>DB: Create Document record
        activate DB
        DB-->>Handler: Record created
        deactivate DB

        Handler-->>Client: 201 Created (document metadata)
    end
    deactivate Handler
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I checked each byte with careful paws,

Fifty megs now hold the upload laws.
Empty files hop away, denied—
Good records saved with S3 as guide. 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes out-of-scope changes: removal of trailing slashes from API endpoints in login.py, private.py, and utils.py routes, unrelated to the stated objective of file size validation. Remove unrelated API endpoint formatting changes and revert trailing slash modifications in login.py, private.py, and utils.py to keep the PR focused on file size validation.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add file size validation to document upload endpoint' clearly summarizes the main change: implementing file size validation for document uploads.
Linked Issues check ✅ Passed The PR fully addresses issue #569 by implementing file size restriction (50MB configurable limit), validation logic, error handling, documentation, and comprehensive tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

  • 136.113.208.247/32 (new)
  • 34.170.211.100/32
  • 35.222.179.152/32

Failure to add the new IP will result in interrupted reviews.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/api/routes/documents.py (1)

108-123: ⚠️ Potential issue | 🟡 Minor

Add return type annotation to upload_doc.

The function is missing a return type annotation. As per coding guidelines, all function parameters and return values must have type hints.

✅ Suggested change
-):
+) -> APIResponse[DocumentUploadResponse]:
🤖 Fix all issues with AI agents
In `@backend/app/services/documents/validators.py`:
- Line 53: The log line should be prefixed with the function name and mask
sensitive values: update the logger.info call in the function (e.g.,
validate_document_file) to use the "[validate_document_file]" prefix and call
mask_string on file.filename and any other sensitive fields (e.g.,
logger.info(f"[validate_document_file] Document file validated:
{mask_string(file.filename)} ({file_size} bytes)")). Ensure you import/use the
existing mask_string utility and keep file_size non-sensitive.
🧹 Nitpick comments (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)

329-424: Use a temp-file factory fixture (and avoid 51MB writes).
These tests duplicate file-creation logic and write large files byte-by-byte. A factory fixture aligns with test guidelines and speeds up IO.

💡 Example refactor
+@pytest.fixture
+def temp_file_factory():
+    def _make(size_bytes: int, suffix: str) -> Path:
+        with NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as fp:
+            fp.truncate(size_bytes)
+            return Path(fp.name)
+    return _make
-        with NamedTemporaryFile(mode="wb", suffix=".pdf", delete=False) as fp:
-            chunk_size = 1024 * 1024
-            for _ in range(51):
-                fp.write(b"0" * chunk_size)
-            fp.flush()
-            large_file = Path(fp.name)
+        large_file = temp_file_factory(51 * 1024 * 1024, ".pdf")
As per coding guidelines, Use factory pattern for test fixtures in `backend/app/tests/`.

detail="Empty file uploaded"
)

logger.info(f"Document file validated: {file.filename} ({file_size} bytes)")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Prefix and mask the log message.
Logging must include the function-name prefix and mask sensitive values like filenames.

🔧 Suggested change
+from app.utils import mask_string
...
-    logger.info(f"Document file validated: {file.filename} ({file_size} bytes)")
+    logger.info(
+        f"[validate_document_file] Document file validated: "
+        f"{mask_string(file.filename)} ({file_size} bytes)"
+    )
As per coding guidelines, Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`.
🤖 Prompt for AI Agents
In `@backend/app/services/documents/validators.py` at line 53, The log line should
be prefixed with the function name and mask sensitive values: update the
logger.info call in the function (e.g., validate_document_file) to use the
"[validate_document_file]" prefix and call mask_string on file.filename and any
other sensitive fields (e.g., logger.info(f"[validate_document_file] Document
file validated: {mask_string(file.filename)} ({file_size} bytes)")). Ensure you
import/use the existing mask_string utility and keep file_size non-sensitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Documents: Put size limit on files being uploaded

1 participant