Skip to content

feat(portal): stream file uploads to reduce memory from O(file_size) to O(5 MiB)#409

Open
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:feat/streaming-file-upload
Open

feat(portal): stream file uploads to reduce memory from O(file_size) to O(5 MiB)#409
lilongen wants to merge 2 commits intoboxlite-ai:mainfrom
lilongen:feat/streaming-file-upload

Conversation

@lilongen
Copy link
Copy Markdown

Summary

  • Replace full-file Vec<UploadChunk> buffering in upload_tar with a bounded mpsc::channel(4) + spawned reader task, capping peak memory at ~5 MiB regardless of file size
  • Extract stream_file_chunks helper (accepts impl AsyncRead) for testability and single-responsibility
  • Use std::mem::take for dest_path/container_id (first chunk only, zero-copy on subsequent chunks)
  • Always await reader JoinHandle before checking gRPC result to surface root-cause errors first
  • Return error on receiver drop instead of silent Ok(()) to detect incomplete uploads

Motivation

upload_tar previously read the entire tar file into a Vec<UploadChunk> before streaming to the guest. For a 500 MB file, this consumed ~500 MB of heap. The download_tar path in the same file already used streaming correctly — this PR closes the asymmetry.

Scenario Before After
500 MB upload ~500 MB heap ~5 MiB
1 GB upload ~1 GB heap ~5 MiB

Test plan

  • cargo fmt -- --check clean
  • cargo clippy -p boxlite --no-default-features --lib -- -D warnings clean (0 warnings)
  • cargo test -p boxlite --no-default-features --lib — 608 passed, 0 failed
  • 8 new unit tests covering all branches of stream_file_chunks:
    • Multi-chunk (3 MiB), single-chunk (500 B), partial last chunk, empty file
    • Receiver dropped (cancellation) returns error
    • I/O read error propagation via FailingReader mock
    • Byte-level data integrity verification
    • upload_tar file-not-found error path

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 27, 2026 05:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the portal-side tar upload path to stream file contents instead of buffering the entire archive in memory, bringing upload behavior in line with the already-streaming download path and reducing peak heap usage for large uploads.

Changes:

  • Reworked FilesInterface::upload_tar to stream UploadChunks via a bounded mpsc channel + ReceiverStream rather than prebuilding a Vec<UploadChunk>.
  • Added stream_file_chunks helper to read any AsyncRead in CHUNK_SIZE increments and emit UploadChunks.
  • Added unit tests covering chunking behavior, integrity, cancellation/receiver-drop behavior, and error propagation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +57 to +73
let upload_result = self.client.upload(stream).await;

// Always join the reader task first — its error is the root cause when
// both the reader and the gRPC call fail (e.g. read error causes the
// stream to end, which causes the server to report failure).
match reader_handle.await {
Ok(Err(read_err)) => return Err(read_err),
Err(join_err) => {
return Err(BoxliteError::Internal(format!(
"Upload reader task failed: {}",
join_err
)));
}
Ok(Ok(())) => {}
}

// Use futures::stream::iter for the upload stream
let stream = futures::stream::iter(chunks);

let response = self
.client
.upload(stream)
.await
.map_err(map_tonic_err)?
.into_inner();
let response = upload_result.map_err(map_tonic_err)?.into_inner();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upload_tar returns the reader task error before inspecting the gRPC result. If the server rejects the upload early (e.g. permission denied) and drops the receiver, stream_file_chunks will likely return Internal("Upload stream closed..."), which then masks the real tonic Status from upload_result. Consider combining both results: if upload_result is Err, prefer that error unless the reader error is a real I/O failure (e.g. Storage("Failed to read...")), or include both errors in the returned message.

Copilot uses AI. Check for mistakes.
Comment on lines +57 to 71
let upload_result = self.client.upload(stream).await;

// Always join the reader task first — its error is the root cause when
// both the reader and the gRPC call fail (e.g. read error causes the
// stream to end, which causes the server to report failure).
match reader_handle.await {
Ok(Err(read_err)) => return Err(read_err),
Err(join_err) => {
return Err(BoxliteError::Internal(format!(
"Upload reader task failed: {}",
join_err
)));
}
Ok(Ok(())) => {}
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the gRPC call completes, upload_tar always awaits reader_handle. If the upload fails quickly (connection error) but the file read blocks indefinitely (e.g. stalled filesystem), this can hang the whole operation and hide the gRPC failure. Consider aborting the reader task (or using a timeout) when upload_result is Err, while still joining normally on the success path to surface read errors.

Copilot uses AI. Check for mistakes.
@lilongen lilongen force-pushed the feat/streaming-file-upload branch from 37ca16f to 67e33de Compare April 2, 2026 14:40
lile and others added 2 commits April 21, 2026 15:30
…to O(5 MiB)

Replace full-file Vec buffering in upload_tar with a bounded mpsc channel
(capacity=4) and a spawned reader task. This caps peak memory at ~5 MiB
regardless of file size, matching the streaming pattern already used in
download_tar and the guest-side upload handler.

Key changes:
- Extract stream_file_chunks helper accepting impl AsyncRead for testability
- Use std::mem::take for dest_path/container_id (first chunk only, zero-copy)
- Always await reader JoinHandle before checking gRPC result (root-cause priority)
- Return error on receiver drop instead of silent Ok (incomplete upload detection)
- Add 8 unit tests: multi-chunk, single-chunk, partial, empty, receiver-drop,
  read-error (FailingReader mock), data-integrity, and file-not-found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the server rejects an upload (e.g. permission denied) and drops the
receiver, stream_file_chunks returns a generic "stream closed" error that
masks the real tonic Status. Now distinguish real I/O errors (Storage)
from stream-closed errors: prefer the gRPC error for the latter since
it contains the actual root cause.

Also fix clippy::io_other_error lint (Rust 1.94).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 21, 2026 07:30
@lilongen lilongen force-pushed the feat/streaming-file-upload branch from 67e33de to 409fd00 Compare April 21, 2026 07:30
@lilongen
Copy link
Copy Markdown
Author

PR #409 Review Fix

Copilot Comment #1: Reader error masks gRPC error — Fixed

When the gRPC server rejects an upload (e.g. permission denied) and drops the receiver, stream_file_chunks returns a generic Internal("Upload stream closed...") that masks the real tonic::Status.

Fix: Distinguish real I/O errors (Storage) from stream-closed errors:

  • Storage error → return directly (reader I/O failure is the root cause)
  • Other error (e.g. "stream closed") → check upload_result first; if gRPC also failed, return the gRPC error (it's the real root cause)

Copilot Comment #2: Reader may block on stalled filesystem — Not fixing

When gRPC fails, the ReceiverStream is dropped, which closes the channel. stream_file_chunks will see tx.send() fail and exit immediately in the vast majority of cases. The only scenario where the reader truly blocks is a stalled filesystem (system-level fault), which adding a timeout/abort wouldn't meaningfully help with. Low risk, high complexity trade-off.

Additional

  • Fixed clippy::io_other_error lint (Rust 1.94: Error::new(ErrorKind::Other, ...)Error::other(...))
  • All 620 tests pass, clippy clean, fmt clean

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

fn map_tonic_err(err: tonic::Status) -> BoxliteError {
BoxliteError::Internal(err.to_string())
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_tonic_err currently converts tonic::Status into BoxliteError::Internal, which loses the error classification (BoxliteError::Rpc) that the rest of the portal interfaces get via impl From<tonic::Status> for BoxliteError (see src/shared/src/errors.rs). Consider removing this helper and using ? / BoxliteError::from(status) (or at least mapping to BoxliteError::Rpc) so callers and the FFI layer can distinguish RPC failures from internal errors.

Suggested change
BoxliteError::Internal(err.to_string())
BoxliteError::Rpc(err.to_string())

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants