Skip to content

Conversation

@brendandahl
Copy link
Collaborator

@brendandahl brendandahl commented Jan 6, 2026

Previously, when EMSCRIPTEN_FETCH_STREAM_DATA was used, we would collect all the streamed data chunks and create the full response when finished in the XHR polyfill. The full response is not used by the C fetch API, so we can avoid storing it when streaming is enabled.

Previously, when `EMSCRIPTEN_FETCH_STREAM_DATA` was used we would collect
all the streamed data chunks and create the full response when finished.
This caused excessive memory usage. Now we leave it up to the user to
process the data and collect it as needed.
@brendandahl brendandahl requested review from kripken and sbc100 January 6, 2026 22:24
@brendandahl
Copy link
Collaborator Author

See #25383 (comment)

@brendandahl
Copy link
Collaborator Author

/cc @inolen

@kripken
Copy link
Member

kripken commented Jan 7, 2026

Is this a breaking change?

@brendandahl
Copy link
Collaborator Author

Is this a breaking change?

No, the old code that used the firefox onprogress events wouldn't pass all the data when finished to the C API. There's a test for it in

assert(fetch->data == 0); // The data was streamed via onprogress, no bytes available here.

@kripken
Copy link
Member

kripken commented Jan 7, 2026

But I'm still confused, the description says

Now we leave it up to the user to process the data and collect it as needed.

What if the user doesn't do that properly - could existing code not break?

@brendandahl
Copy link
Collaborator Author

Let me reword that... I was thinking of it from the perspective of the FetchXHR polyfill not the C API. The polyfill behavior does change (we no longer collect all the bytes), but this isn't visible to anyone using the emscripten C API.

@inolen
Copy link
Collaborator

inolen commented Jan 7, 2026

Hey @brendandahl thanks! This looks great for not allocating the entire response when streaming. I was going to post a patch in that comment thread, but I got a bit distracted and haven't had time to run tests on my changes.

There are two other issues with the code right now that I noticed when using this with thousands of files. These aren't directly related to your new code, so I can always open a new PR if desired -

1.) saveResponseAndStatus leaks the allocated data when using the streaming mode. Since xhr.response is NULL, ptr / ptrLen remain 0, and it then overwrites ptr with 0 leaking the allocated data.
2.) the realloc's feel unnecessary (although functional) if the new buffer is smaller than the old.

Here was my current WIP state -
https://gist.github.com/inolen/8bfcb72c58061512475c69b84f75524d

@brendandahl
Copy link
Collaborator Author

I haven't looked into the other issues you mentioned yet. A separate PR would be good. We should probably run some of the fetch tests with lsan.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see, thanks!

@brendandahl brendandahl merged commit 035c704 into emscripten-core:main Jan 8, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants