Add implementation of MultipartFormData streaming encoder #200
const response = await fetch("https://http-me.glitch.me/meow?header=cat:é");
strictEqual(response.headers.get('cat'), "é");
@andreiltd, is this ready to review, or are you still expecting to make changes to it?
Yes, this is ready to review :)
Either I'm missing something in looking at the diff, or there is some code missing: see the inline comment on `form-data-encoder.h`.
One thing that I can't fully evaluate, because it'll presumably be part of the implementation of `encode_stream`: IIUC, field names will always first have their newlines normalized to `CRLF`, and then have those escaped. It'd be good to fold those into one operation, if possible.
Otherwise, I left a few comments and suggestions. I'm a bit concerned about allocation and general failure handling, so it'd be good to go over those aspects in some detail.
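To illustrate the point about folding normalization and escaping into one operation, here is a minimal sketch. It assumes the escaping rules quoted later in the diff (LF to `%0A`, CR to `%0D`, `"` to `%22`), under which any CR, LF, or CRLF sequence ends up as `%0D%0A`. The name `normalize_and_escape_name` is hypothetical, and the sketch uses a plain `std::string`, so it does not model the fallible allocation the PR's `std::optional` return type suggests.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not the PR's code: normalizes newlines in a field
// name to CRLF and percent-escapes CR, LF and '"' in a single pass.
std::string normalize_and_escape_name(std::string_view src) {
  std::string out;
  out.reserve(src.size());
  for (size_t i = 0; i < src.size(); i++) {
    char c = src[i];
    if (c == '\r') {
      // CR or CRLF: both normalize to CRLF, which escapes to %0D%0A.
      out += "%0D%0A";
      if (i + 1 < src.size() && src[i + 1] == '\n') {
        i++; // consume the LF of an existing CRLF pair
      }
    } else if (c == '\n') {
      // Lone LF: normalizes to CRLF as well.
      out += "%0D%0A";
    } else if (c == '"') {
      out += "%22";
    } else {
      out += c;
    }
  }
  return out;
}
```

With this shape there is no intermediate normalized string, so only one buffer has to be sized and allocated.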
if (i + 1 < src.size() && src[i + 1] == LF) {
  len += newline_len;
  i++;
} else {
  len += newline_len;
}
Suggested change:
len += newline_len;
if (i + 1 < src.size() && src[i + 1] == LF) {
  i++;
}
const char CR = '\r';
const char *CRLF = "\r\n";

size_t compute_normalized_len(std::string_view src, const char *newline) {
It seems like this is only ever called with `CRLF` as the input for `newline`. Given that, does it make sense to remove the `newline` parameter and hardcode use of `CRLF` instead? Chances are LLVM emits the same code anyway, but it'd be better not to have to rely on that.
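For reference, a sketch of what the function could look like with the newline length hardcoded and the simplified branch from the earlier suggestion applied. Only the CR branch is visible in the diff, so the rest of the loop (the lone-LF and default cases) and the constant name `kCrlfLen` are assumptions.

```cpp
#include <cstddef>
#include <string_view>

// Sketch only: length of `src` once every CR, LF, or CRLF sequence is
// normalized to CRLF. The newline length is hardcoded instead of being
// derived from a `newline` parameter.
size_t compute_normalized_len(std::string_view src) {
  constexpr size_t kCrlfLen = 2; // length of "\r\n"
  size_t len = 0;
  for (size_t i = 0; i < src.size(); i++) {
    if (src[i] == '\r') {
      len += kCrlfLen;
      if (i + 1 < src.size() && src[i + 1] == '\n') {
        i++; // CRLF is already normalized; skip its LF
      }
    } else if (src[i] == '\n') {
      len += kCrlfLen;
    } else {
      len++;
    }
  }
  return len;
}
```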
// `%0A`, 0x0D (CR) with `%0D` and 0x22 (") with `%22`.
//
// https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data
std::optional<std::string> escape_newlines(std::string_view str) {
Since this isn't just for newlines, it'd be good to give it a different name. Perhaps `escape_name`, given that this applies escaping of a set of characters in names?
static size_t query_length(JSContext *cx, HandleObject self);
static JSObject *encode_stream(JSContext *cx, HandleObject self);
static JSObject *create(JSContext *cx, HandleObject form_data);
The implementation of these seems to be missing? (And in the case of `query_length` I also can't find any uses.)
bool MultipartFormDataImpl::handle_entry_header(JSContext *cx, StreamContext &stream) {
  auto entry = stream.entries->begin()[chunk_idx_];
  auto header = fmt::memory_buffer();
  auto name = escape_newlines(entry.name).value();
I think this needs a null check?
RootedString type_str(cx, Blob::type(obj));
auto type = core::encode(cx, type_str);

if (!filename || !type) {
The `filename` check needs to happen right after the call to `escape_newlines`: if that operation errors, we don't want to run any more fallible code.
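A rough sketch of the ordering being asked for here and in the earlier null-check comment: test each `std::optional` immediately after the call that produced it, before calling `.value()` or running any further fallible step. Everything below is hypothetical stand-in code. In particular, `escape_name` fails on an embedded NUL purely to have a failure case to demonstrate, and `build_header` is not the PR's `handle_entry_header`.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the PR's escaping helper: returns std::nullopt
// on failure. An embedded NUL is treated as the failure case purely for
// illustration.
std::optional<std::string> escape_name(std::string_view s) {
  if (s.find('\0') != std::string_view::npos) {
    return std::nullopt;
  }
  return std::string(s);
}

// Builds a header line, bailing out as soon as any step fails so that no
// further fallible work runs after an error.
std::optional<std::string> build_header(std::string_view name, std::string_view filename) {
  auto escaped_name = escape_name(name);
  if (!escaped_name) {
    return std::nullopt; // checked before dereferencing or doing more work
  }

  auto escaped_filename = escape_name(filename);
  if (!escaped_filename) {
    return std::nullopt; // checked right after the call that can fail
  }

  return "Content-Disposition: form-data; name=\"" + *escaped_name +
         "\"; filename=\"" + *escaped_filename + "\"";
}
```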
// Base64 encode bytes to string
auto bytes = std::move(res.unwrap());
auto bytes_str = std::string_view((char *)(bytes.ptr.get()), bytes.size());
auto base64_str = base64::forgivingBase64Encode(bytes_str, base64::base64EncodeTable);
Does this need a null check?
  return outbuf.size() - read;
}

template <typename I> size_t write(I first, I last) {
IIUC, this method is deliberately infallible? If so, could you document that fact?

bool is_draining() { return (file_leftovers_ || remainder_.size()); }

template <typename I> size_t write_and_cache_remainder(StreamContext &stream, I first, I last);
The return value isn't ever used, it seems. Maybe remove it?
auto leftover = datasz - written;
if (leftover > 0) {
  MOZ_ASSERT(remainder_.empty());
  remainder_.assign(first + written, last);
Is this an infallible operation? It seems like it should need to allocate memory, so I'm not sure I understand how that works. And ideally, if it does allocate, we should make that fallible—or at least assert that it didn't fail and abort execution.
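To make the distinction concrete, here is a simplified model, not the PR's classes: a `write` that is infallible because it only copies into caller-provided space, and a `write_and_cache_remainder` that reports allocation failure when stashing the leftover bytes. The class name, members, and `bool` return are assumptions. The `try`/`catch` also assumes exceptions are enabled; in a build with exceptions disabled, a failed allocation would typically terminate the process instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <string_view>

// Hypothetical, simplified model of the writer discussed above.
class BoundedWriter {
public:
  BoundedWriter(char *buf, size_t capacity) : buf_(buf), capacity_(capacity) {}

  // Infallible by construction: copies at most the free space left in the
  // caller-provided buffer, never allocates, and returns the bytes copied.
  size_t write(std::string_view data) {
    size_t n = std::min(data.size(), capacity_ - used_);
    std::copy_n(data.data(), n, buf_ + used_);
    used_ += n;
    return n;
  }

  // Fallible: writes what fits and caches the rest for a later drain.
  // Returns false if caching the remainder fails to allocate.
  bool write_and_cache_remainder(std::string_view data) {
    size_t written = write(data);
    if (written == data.size()) {
      return true;
    }
    assert(remainder_.empty());
    try {
      remainder_.assign(data.data() + written, data.size() - written);
    } catch (const std::bad_alloc &) {
      return false;
    }
    return true;
  }

  std::string_view written() const { return {buf_, used_}; }
  const std::string &remainder() const { return remainder_; }

private:
  char *buf_;
  size_t capacity_;
  size_t used_ = 0;
  std::string remainder_;
};
```

Returning `bool` rather than the written size also lines up with the earlier observation that the size return value is never used.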
I did a quick comparison of encoded `FormData` between SM and Chrome using the following JS code:
Simple FormData
The results are:
I also did some manual fuzzing to test the streaming logic by varying the buffer sizes the encoder writes into. Specifically, I tested these buffer sizes in bytes: 1, 2, 4, 8, 32, 1024, and 8192 by changing the size in the implementation. Though, having some unit test framework would be nice.
Relevant RFC: https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1
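The buffer-size sweep described above could be automated along these lines even without a full unit test framework. This is a sketch under assumptions: `ChunkSource` is a hypothetical stand-in for the streaming encoder that simply hands out a fixed payload in chunks, and the check is only that the reassembled output is byte-identical for every buffer size.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the streaming encoder: hands out a fixed
// payload in chunks no larger than the caller's buffer.
class ChunkSource {
public:
  explicit ChunkSource(std::string payload) : payload_(std::move(payload)) {}

  // Copies up to `cap` bytes into `buf`; returns 0 once drained.
  size_t read_into(char *buf, size_t cap) {
    size_t n = std::min(cap, payload_.size() - pos_);
    std::copy_n(payload_.data() + pos_, n, buf);
    pos_ += n;
    return n;
  }

private:
  std::string payload_;
  size_t pos_ = 0;
};

// Drains a fresh source through a buffer of `buf_size` bytes and returns
// the reassembled output.
std::string drain_with_buffer(const std::string &payload, size_t buf_size) {
  ChunkSource source(payload);
  std::vector<char> buf(buf_size);
  std::string out;
  while (size_t n = source.read_into(buf.data(), buf.size())) {
    out.append(buf.data(), n);
  }
  return out;
}

// Returns true if every buffer size from the manual test yields output
// identical to the original payload.
bool sweep_buffer_sizes(const std::string &payload) {
  constexpr size_t kSizes[] = {1, 2, 4, 8, 32, 1024, 8192};
  for (size_t size : kSizes) {
    if (drain_with_buffer(payload, size) != payload) {
      return false;
    }
  }
  return true;
}
```

Swapping `ChunkSource` for the real encoder would exercise the remainder-caching paths at each buffer size.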