-
Notifications
You must be signed in to change notification settings - Fork 9
Add file size validation to document upload endpoint #576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds a configurable document upload size limit (default 50MB), enforces size and emptiness validation before S3 upload, updates docs, and adds tests covering oversized, empty, and valid uploads. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Handler as Upload Handler
participant Validator
participant S3
participant DB as Database
Client->>Handler: POST /upload (multipart file)
activate Handler
Handler->>Validator: validate_document_file(file)
activate Validator
alt validation fails
Validator-->>Handler: HTTPException (413 or 422)
Handler-->>Client: Error response
else validation passes
Validator-->>Handler: file_size (bytes)
deactivate Validator
Handler->>Handler: pre_transform_validation(...)
Handler->>S3: Upload file
activate S3
S3-->>Handler: Upload success (URL/metadata)
deactivate S3
Handler->>DB: Create Document record
activate DB
DB-->>Handler: Record created
deactivate DB
Handler-->>Client: 201 Created (document metadata)
end
deactivate Handler
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Important Action Needed: IP Allowlist UpdateIf your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:
Failure to add the new IP will result in interrupted reviews. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/documents.py (1)
108-123:⚠️ Potential issue | 🟡 MinorAdd return type annotation to
upload_doc.The function is missing a return type annotation. As per coding guidelines, all function parameters and return values must have type hints.
✅ Suggested change
-): +) -> APIResponse[DocumentUploadResponse]:
🤖 Fix all issues with AI agents
In `@backend/app/services/documents/validators.py`:
- Line 53: The log line should be prefixed with the function name and mask
sensitive values: update the logger.info call in the function (e.g.,
validate_document_file) to use the "[validate_document_file]" prefix and call
mask_string on file.filename and any other sensitive fields (e.g.,
logger.info(f"[validate_document_file] Document file validated:
{mask_string(file.filename)} ({file_size} bytes)")). Ensure you import/use the
existing mask_string utility and keep file_size non-sensitive.
🧹 Nitpick comments (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
329-424: Use a temp-file factory fixture (and avoid 51MB writes).
These tests duplicate file-creation logic and write large files byte-by-byte. A factory fixture aligns with test guidelines and speeds up IO.As per coding guidelines, Use factory pattern for test fixtures in `backend/app/tests/`.💡 Example refactor
+@pytest.fixture +def temp_file_factory(): + def _make(size_bytes: int, suffix: str) -> Path: + with NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as fp: + fp.truncate(size_bytes) + return Path(fp.name) + return _make- with NamedTemporaryFile(mode="wb", suffix=".pdf", delete=False) as fp: - chunk_size = 1024 * 1024 - for _ in range(51): - fp.write(b"0" * chunk_size) - fp.flush() - large_file = Path(fp.name) + large_file = temp_file_factory(51 * 1024 * 1024, ".pdf")
| detail="Empty file uploaded" | ||
| ) | ||
|
|
||
| logger.info(f"Document file validated: {file.filename} ({file_size} bytes)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefix and mask the log message.
Logging must include the function-name prefix and mask sensitive values like filenames.
🔧 Suggested change
+from app.utils import mask_string
...
- logger.info(f"Document file validated: {file.filename} ({file_size} bytes)")
+ logger.info(
+ f"[validate_document_file] Document file validated: "
+ f"{mask_string(file.filename)} ({file_size} bytes)"
+ )🤖 Prompt for AI Agents
In `@backend/app/services/documents/validators.py` at line 53, The log line should
be prefixed with the function name and mask sensitive values: update the
logger.info call in the function (e.g., validate_document_file) to use the
"[validate_document_file]" prefix and call mask_string on file.filename and any
other sensitive fields (e.g., logger.info(f"[validate_document_file] Document
file validated: {mask_string(file.filename)} ({file_size} bytes)")). Ensure you
import/use the existing mask_string utility and keep file_size non-sensitive.
Summary
Fixes #569
Summary
Motivation
This change avoids unnecessary S3 storage usage and prevents downstream issues when handling very large documents.
Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fix / API