http: align header value validation with Fetch spec #61597
Conversation
mcollina
left a comment
lgtm
cc @nodejs/tsc this requires a few more sets of eyes
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main   #61597      +/-   ##
==========================================
- Coverage   89.77%   89.75%   -0.02%
==========================================
  Files         673      673
  Lines      203840   203949     +109
  Branches    39180    39194      +14
==========================================
+ Hits       182998   183064      +66
- Misses      13156    13194      +38
- Partials     7686     7691       +5
```
RafaelGSS
left a comment
Even though this seems like a fix, I believe this could be considered a semver-major PR.
For node:http, this is not a fix but an elective behavioural change: so, not required, but optional and still compliant.
pimterry
left a comment
As is, this is definitely a semver-major breaking change imo - it's directly exposed by the public http.validateHeaderValue API, and changes that method's behaviour significantly. Easy to imagine cases where that breaks things, or plausibly creates vulnerabilities.
Even with a major bump, I'm against doing this by default. From a quick search it seems that most other backend runtimes & proxies etc are stricter here, and there are significantly different tradeoffs for traffic from untrusted browser clients vs potentially internal-use clients like Node. Eliminating CR/LF drops the most obvious request smuggling risks, but I'm concerned this will expose other weird & wonderful edge cases of confused header value interpretations later on. It's better UX & ecosystem behaviour if our defaults match other similar runtimes and strictly follow the RFCs as far as possible.
Also, doing some digging - it seems we use the same `checkInvalidHeaderChar` method that this regex powers to validate response status messages (here) before sending them. It's extra likely that we don't want to relax that, I think - doing so clearly ignores the RFCs, and there's no use case for this at all AFAICT.
All that said, I'd be perfectly happy with this as optional behaviour, applied to header values only (not the status message). The JSDOM case makes perfect sense and I'm sure there are other similar scenarios, but imo it should be opt-in.
There are two approaches I can see:
- Use the existing `insecureHTTPParser` option to relax this. Very easy & convenient to implement. Right now that's only used for parsing received requests or responses I think, but insecure parsing of our own inputs would fit reasonably imo. This is already configurable at the whole-process level (by CLI arg) or with a per-request option.
- Or, we create a new HTTP request option for this (`relaxedHeaderValueValidation` or something). A bit more work and duplication, but allows you to change this setting without enabling insecure parsing more generally.
Mainly depends whether there are cases where you'd want to enable this but not relax parsing of the corresponding response at the same time imo. Thoughts would be interesting (from @domenic as well for the jsdom use case).
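For reference, a quick sketch of the current strict behaviour that this PR would relax, via the public API mentioned above (a minimal illustration of today's behaviour, as I understand it):

```js
const http = require('node:http');

// Today, any control character other than HTAB is rejected:
// the allowed set is roughly HTAB, 0x20-0x7e and 0x80-0xff.
try {
  http.validateHeaderValue('X-Debug', 'a\x01b');
} catch (err) {
  console.log(err.code); // 'ERR_INVALID_CHAR'
}

// Under Fetch-style rules, only NUL, CR and LF (and characters > 0xff)
// would be rejected, so 'a\x01b' would pass validation instead of throwing.
```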
My personal feeling is that it's simplest to just align with modern specs (Fetch and RFC 9110) on the most-permissive set. In practice, that's what "the ecosystem at large" uses: i.e., any software that wants to interoperate with the entire ecosystem will inevitably need to support such characters. (There are various "feature requests" from people on the Node.js issue tracker who are prevented from talking to their own backend or frontend because of this, because other parts of their stack use a larger set of characters. E.g. #21509, #39684, #22064 (comment), maybe #50213. See also httpwg/http-core#683, where there are some reports from the Python ecosystem.)

I think it's a losing battle trying to hold up stricter standards on an API-by-API basis, and I'm glad that even the HTTPWG has recognized that, as part of RFC 9110 updating the old stricter rules in RFC 7230. I also think it's important that Node.js's global `fetch()` follow the Fetch spec's rules here. There may be parts of the Node.js community which would like to impose stricter standards by default on other parts of the Node.js API, e.g. raw usage of `node:http`.

Stepping back, I think @pimterry is concerned about the impact of this on various different subsystems. Here are all the ones I see mentioned plus a few more:
My reading of RFC 9110 is that all of (1)-(4) should use the loose validation rules. And my initial testing shows that Node.js rejects all of (1)-(4) currently, whereas e.g. Python's standard library does not.

I also think that having (5) mismatch the specs is unfortunate, but on the other hand, I didn't know that (5) existed before today. I agree that per the specs, (6) should exclude control characters.
+1 as semver major
I don't think that's an accurate characterization. Browsers have standardized on relaxed rules, agreed. RFC 9110 though still clearly says control characters are invalid and excludes them in the grammar, but just says they may be retained on receipt under specific circumstances when you know it's safe (which we don't). It doesn't say anything at all suggesting that sending these values is valid. I also can't see any other shifts in the ecosystem outside browsers to actively broaden support either.

Did some quick testing today. For receiving inbound request header values containing 0x1:
For sending outbound requests with header values containing 0x1:
I'd read that as a fairly mixed split, with more modern & backend-focused runtimes (Go, Rust) generally leaning stricter. It is definitely true that some big chunks of the HTTP ecosystem will reject traffic entirely if you include these values in your headers, and the RFC says they're invalid, so I do think it's sensible for Node to stop people from sending these by default. I'm open to supporting this behind an option though.

Imo, the only use case for this is maximum compatibility when handling arbitrary traffic, right? That's useful if you're acting as a general-purpose client (a browser, curl, etc) or HTTP proxy (I maintain one of these - I will use this myself!) but it's usually not what you want if you're implementing a backend, any kind of reverse proxy/gateway, or scripting. In those cases you roughly know or control the traffic you expect, and usually want to be strict & get errors if you're sending invalid HTTP rather than trying to be maximally flexible. Imo most Node.js use cases fall in the latter category.

In terms of subsystem scenarios as described above, 2 & 3 (inbound message parsing) are covered by `insecureHTTPParser` already. Thinking through the use cases, if you're handling arbitrary traffic and aiming to maximize compatibility, I think you are almost always going to want to enable this alongside `insecureHTTPParser`.
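For reference, a minimal sketch of the inbound case (subsystems 2 & 3 above) that the existing option already covers, assuming the behaviour described in this thread:

```js
const http = require('node:http');

// The existing option relaxes parsing of *received* messages, so a server
// created this way will accept header bytes the strict parser would reject.
const server = http.createServer({ insecureHTTPParser: true }, (req, res) => {
  // A client that sent 'X-Test: a\x01b' shows up here un-rejected.
  console.log(JSON.stringify(req.headers['x-test']));
  res.end('ok');
});

server.listen(0, () => {
  console.log(`listening on port ${server.address().port}`);
});
```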
Add support for relaxed HTTP header value validation when using the `insecureHTTPParser` option. This extends the existing option (which already relaxes inbound HTTP parsing) to also relax outbound header value validation.

By default, header values are validated strictly per RFC 7230/9110, rejecting control characters (0x00-0x1f except HTAB) and DEL (0x7f). When `insecureHTTPParser: true` is set on a request/response, or the `--insecure-http-parser` flag is used globally, header values are validated per Fetch spec rules, only rejecting NUL (0x00), CR (0x0d), LF (0x0a), and characters > 0xff.

This allows Node.js to interoperate with servers/clients that use control characters in header values while maintaining security by always rejecting CR/LF (response splitting) and NUL characters.

Refs: nodejs#61582
Refs: https://fetch.spec.whatwg.org/#header-value
Force-pushed from bf3c5a1 to 41ebd2a
Updated the implementation based on @pimterry's feedback. The implementation reuses the existing `insecureHTTPParser` option.
I generally disagree with your post, but I disagree with this part most emphatically. I don't think we should couple client and server behavior. I don't think you should make a program switch to a completely different HTTP parser, which has a lot of side effects, just to get spec-compliant behavior. And especially you should not make a program switch to one marked "insecure".
This is not accurate. It's per RFC 7230, but not RFC 9110 (which obsoletes RFC 7230).
Thanks @domenic for the feedback! I see there's a disagreement on the approach. I'm happy to implement either approach - let me know which direction you'd prefer.

Also, I'll correct the comment about RFC 7230 vs 9110 - you're right that RFC 9110 obsoletes 7230 and has different (more relaxed) rules.
> RFC 9110 obsoletes RFC 7230 and has more relaxed rules. The strict validation follows RFC 7230 rules, not RFC 9110.
It is not spec compliant to send these characters in general. RFC 9110 says that field values containing control characters other than HTAB are invalid. Fetch & browsers intentionally ignore this, and that's a specific decision made under different constraints. I agree we need to manage compat with that better, it is important, but it's not the same context as Node's defaults.
The current option already affects both servers & clients (parsing of client requests & server responses, respectively). Using the existing insecure option for this new behaviour too would add coupling, but between request & response handling on a single exchange: i.e. if you can send 0x1 in your request headers, you can also receive it in the response headers.

If we create a new option for this behaviour, and you want to accept control chars in headers in both directions, to fully match fetch, right now you'd need to use two options:

```js
http.request('https://example.com', {
  relaxedHeaderValidation: true, // New option: let me send 0x1 in request headers
  insecureHTTPParser: true       // Existing option: let me receive 0x1 in response headers
});
```

That seems like poor UX to me, unless there's a good reason you'd want to support this in only one way? I do agree it would be nice to avoid enabling other insecure behaviour though just to read these header values.

Option C: we make a new parser option in llhttp to cover these header values specifically, and then expose this in Node separately, to split behaviour into 'insecure parsing' (full backward compat, more risky, actively discouraged), 'relaxed parsing' (= fetch-compatible, broadly accepted as secure AFAIK, but not fully compliant), and 'strict parsing' (strictly RFC 9110 compliant, the current default), and then have a single option which enables this relaxed parsing/validation in both directions, without insecure parsing.

It's a pain to have a whole separate mode, but the llhttp code supports separate flags to finely control these behaviours internally, and exposing a middle-ground mode in Node here could be a nice way to help encourage more people away from insecure parsing, if it covers lots of use cases in a safer way. (For reference, what exactly the insecure flag currently enables is listed here.)
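To make Option C concrete, a rough sketch of the shape it might take (the option name is purely illustrative - nothing like this exists today):

```js
const http = require('node:http');

// Hypothetical middle-ground mode: Fetch-compatible ("relaxed") parsing and
// validation in both directions, without enabling full insecure parsing.
// 'relaxedHTTPParsing' is an illustrative name, not a real Node.js option.
const req = http.request('https://example.com', {
  relaxedHTTPParsing: true,
  headers: { 'X-Test': 'a\x01b' }, // accepted in this mode, rejected by the strict default
});
req.on('error', console.error);
req.end();
```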
@RajeshKumar11 Thanks for your work here, sorry this has got a bit tied up in debate! If I were you I'd hold off on more changes for a minute, and let's see where we end up on the discussion here. I'm hoping somebody else from @nodejs/http or @nodejs/undici may want to chime in at some point as well to help drive a clearer consensus in one direction or the other.
@domenic Maybe our long GitHub comments aren't the best discussion format. Do you want to discuss more directly? Happy to chat: https://cal.com/httptoolkit-tim/
This PR adds support for relaxed HTTP header value validation via the existing `insecureHTTPParser` option. This addresses the use case where Node.js needs to interoperate with servers/clients that use control characters in header values (per the Fetch spec).

Behavior

Default (Strict) - per RFC 7230/9110

Header values are validated strictly, rejecting:
- `0x00`-`0x1f` (except HTAB `0x09`)
- `0x7f` (DEL)
- characters > `0xff`

With `insecureHTTPParser: true` (Lenient) - per Fetch spec

Header values are validated leniently, only rejecting:
- `0x00` (NUL)
- `0x0a` (LF)
- `0x0d` (CR)
- characters > `0xff`

This allows control characters like `0x01`-`0x08`, `0x0b`-`0x0c`, `0x0e`-`0x1f`, and `0x7f` when the option is enabled.

Usage
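A minimal usage sketch, based on the behaviour described above (per-request opt-in, or the global CLI flag):

```js
const http = require('node:http');

// Per-request opt-in: with this change, header values are validated using
// the Fetch spec rules (only NUL, CR, LF and characters > 0xff rejected).
const req = http.request('http://example.com/', {
  insecureHTTPParser: true,
  headers: {
    'X-Custom': 'value\x01with-control-char', // rejected by default, allowed here
  },
});
req.on('error', console.error);
req.end();

// Or enable it process-wide:
//   node --insecure-http-parser app.js
```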
What stays strict (regardless of option)

Per @pimterry's feedback:
- `http.validateHeaderValue()` - the public API remains strict

Security Consideration

This change is safe because:
- CR (`0x0d`) and LF (`0x0a`) are always rejected, even in lenient mode
- NUL (`0x00`) is rejected in both modes
- Users must explicitly opt in to `insecureHTTPParser` to get lenient behavior

Changes
- `lib/_http_common.js`: Added strict and lenient regex patterns; `checkInvalidHeaderChar` now accepts an optional `lenient` parameter
- `lib/_http_outgoing.js`: Added `_isLenientHeaderValidation()` method; updated `setHeader`, `appendHeader`, `addTrailers` to use lenient validation when enabled
- `test/parallel/test-http-invalidheaderfield2.js`: Updated to test both strict and lenient modes

Refs: #61582
Refs: https://fetch.spec.whatwg.org/#header-value
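For reference, a sketch of what the strict vs lenient checks described above amount to (identifier names are illustrative; the actual names in `lib/_http_common.js` may differ):

```js
// Strict (default): allow HTAB, printable ASCII (0x20-0x7e) and obs-text (0x80-0xff).
const strictInvalidHeaderCharRegex = /[^\t\x20-\x7e\x80-\xff]/;

// Lenient (Fetch-style): reject only NUL, LF, CR and anything above 0xff.
const lenientInvalidHeaderCharRegex = /[^\x01-\x09\x0b\x0c\x0e-\xff]/;

function checkInvalidHeaderChar(value, lenient = false) {
  const re = lenient ? lenientInvalidHeaderCharRegex : strictInvalidHeaderCharRegex;
  return re.test(value); // true => the value contains a forbidden character
}

console.log(checkInvalidHeaderChar('a\x01b'));        // true  (strict rejects 0x01)
console.log(checkInvalidHeaderChar('a\x01b', true));  // false (lenient allows 0x01)
console.log(checkInvalidHeaderChar('a\r\nb', true));  // true  (CR/LF always rejected)
```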