bpo-36274: Encode request lines with surrogate escapes#12315
Conversation
There was a problem hiding this comment.
I believe there is too much duplication in the unit tests for both of your pull requests. Please use a loop or two (like what is done in test_invalid_headers()).
There was a problem hiding this comment.
Sure; done. Any preference on which approach to take, though?
While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.
Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.
https://bugs.python.org/issue36274
d57b342 to
0573040
Compare
jaraco
left a comment
There was a problem hiding this comment.
This change looks good to me and is my preferred option from those presented. The only thing I'd like to see is some explanation in the tests linking to the justification (already made in the issue tracker).
|
After reviewing this request with @ericsnowcurrently, we've decided that this approach is dangerous in that it has the potential to expose users unexpectedly to non-compliant behavior, where as currently they are assured compliance. In particular, if a user had input from a source where it was surrogate-escaped non-ascii unicode, the request would currently be rejected but now will be accepted. Instead, we would like to see a more explicit opt-in, such as through a separate method or through a setting on the call and/or client object. As a result, I'll be closing this PR ad the alternate and will follow up in the bug. |
While this is out of spec according to RFC 7230 (which limits
expected octets to some subset of ASCII), it is often useful to
be able to mimic an out-of-spec client when testing a server or
application.
Don't use Latin-1 (though that would be in keeping with how we
handle headers and bodies) to encourage callers to write
RFC-complient clients. Rather, use surrogate escape sequences
('\udc80' - '\udcff') to increase friction while still
allowing out-of-spec requests to be expressable.
This is the second fix proposed in the bug report; the first was submitted as #12314 so reviewers can decide between fixes.
https://bugs.python.org/issue36274
https://bugs.python.org/issue36274