Add support for zero-copy reading #29

Open
mateuszj opened this issue Sep 6, 2016 · 4 comments

mateuszj commented Sep 6, 2016

As I see it, reading a huge binary payload currently requires the reader function to copy it into a caller-supplied buffer. What I would like instead is to get a pointer into the memory-mapped binary data, rather than providing a buffer that the reader function eventually fills with a copy of that data.
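For illustration, the difference being asked for looks roughly like this; cmp_read_bin is CMP's existing copying call, while cmp_read_bin_zero_copy is purely hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include "cmp.h"

/* Today: cmp_read_bin copies the payload into a caller-supplied buffer.
 * (*size is the buffer capacity going in and the payload length coming out,
 * if I recall CMP's convention correctly.) */
bool read_blob_copying(cmp_ctx_t *cmp, uint8_t *buf, uint32_t *size) {
    return cmp_read_bin(cmp, buf, size);
}

/* Desired (hypothetical -- no such function exists in CMP): hand back a
 * pointer into the memory-mapped input instead of copying the payload. */
bool cmp_read_bin_zero_copy(cmp_ctx_t *cmp, const void **data, uint32_t *size);
```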

antoinealb (Contributor) commented:

I think this might be hard to achieve due to the stream-oriented API of cmp. The data might not always come from a memory buffer; it could be streamed over the network, read from a file, etc. If you need such functions, I suggest implementing them at the reader level instead.
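A sketch of what "at the reader level" could mean, assuming a memory-backed reader whose state the application owns (the cmp_reader signature and the ctx->buf field are CMP's; mem_ctx_t and mem_reader are made up here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include "cmp.h"

/* Hypothetical memory-backed reader state; the application owns the buffer
 * and the cursor. */
typedef struct {
    const uint8_t *base;
    size_t size;
    size_t pos;
} mem_ctx_t;

/* Matches CMP's cmp_reader signature: copy `limit` bytes out of the buffer
 * and advance the cursor. */
bool mem_reader(cmp_ctx_t *ctx, void *data, size_t limit) {
    mem_ctx_t *mem = (mem_ctx_t *)ctx->buf;

    if (mem->pos + limit > mem->size) {
        return false;
    }
    memcpy(data, mem->base + mem->pos, limit);
    mem->pos += limit;
    return true;
}
```

Because the application controls mem_ctx_t, it can hand out pointers into the buffer for large payloads itself, without CMP having to know about it.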

camgunz (Owner) commented Sep 12, 2016

Hi!

I see what you're saying. If you're reading individual members (u8, s16, etc.), those all get copied. If you have a big binary blob in your data, you can just get a pointer to it like you normally would, so it's not as bad as it could be, but it's still inefficient.
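Concretely, with a memory-backed reader like the hypothetical mem_ctx_t sketched above, the "get a pointer to it" workaround might look like this (assuming cmp_read_bin_size consumes only the bin header; borrow_bin is a made-up name):

```c
#include <stdint.h>
#include "cmp.h"

/* Same shape as the hypothetical memory-backed reader state sketched earlier. */
typedef struct {
    const uint8_t *base;
    size_t size;
    size_t pos;
} mem_ctx_t;

/* Read only the bin header, then return a zero-copy view of the payload and
 * step the cursor past it. Returns NULL on error or truncated input. */
const uint8_t *borrow_bin(cmp_ctx_t *cmp, uint32_t *bin_size) {
    mem_ctx_t *mem = (mem_ctx_t *)cmp->buf;

    if (!cmp_read_bin_size(cmp, bin_size)) {
        return NULL;
    }
    if (mem->pos + *bin_size > mem->size) {
        return NULL;
    }

    const uint8_t *blob = mem->base + mem->pos;
    mem->pos += *bin_size;
    return blob;
}
```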

The reason I made that decision is that MessagePack stores its data in big-endian format. To convert the data from big-endian to little-endian, you either convert it in place or convert a copy of it. As @antoinealb pointed out, CMP doesn't assume it can modify the data in place, so it has to make a copy.
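For a fixed-width scalar, the "convert a copy" path amounts to something like the following sketch (not CMP's actual internals):

```c
#include <stdint.h>

/* Build the host-order value from a copy of the four big-endian bytes.
 * The alternative -- byte-swapping src in place -- would require CMP to own
 * and be allowed to modify the source, which it can't assume. */
uint32_t read_be32_from_copy(const uint8_t *src) {
    return ((uint32_t)src[0] << 24) |
           ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |
           ((uint32_t)src[3]);
}
```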

It might be possible to add zero-copy reading without modifying the API, but it probably isn't. There are a number of potential solutions (rough, hypothetical sketches of both follow the list):

1. Add a zero-copy reader and a use_zero_copy field to cmp_ctx_t

   Pros: Minimal API changes necessary
   Cons: Enlarges the cmp_ctx_t struct and adds a branch to every read

2. Add a cmp_stream_ctx_t and a cmp_buffer_ctx_t, and #define the current API to the stream API

   Pros: Fast, and the semantics are clear
   Cons: Makes CMP much more complex; essentially requires doubling the API
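Neither option exists in CMP; purely as an illustration of the shapes involved, they might look something like this (all names here are invented):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder callback types, shaped like CMP's stream callbacks. */
typedef bool (*reader_fn)(void *ctx, void *data, size_t limit);
typedef size_t (*writer_fn)(void *ctx, const void *data, size_t count);
/* Option 1's extra callback: hand back a pointer instead of copying. */
typedef bool (*zero_copy_reader_fn)(void *ctx, const void **data, size_t size);

/* Option 1: one context type, plus a flag and a second reader. Every
 * cmp_read_* call would branch on use_zero_copy. */
typedef struct {
    uint8_t error;
    void *buf;
    reader_fn read;
    zero_copy_reader_fn read_zero_copy;
    bool use_zero_copy;
    writer_fn write;
} zero_copy_ctx_sketch_t;

/* Option 2: split into a stream flavor (callbacks, as today) and a buffer
 * flavor (base pointer plus cursor), roughly doubling the read/write API. */
typedef struct {
    uint8_t error;
    void *buf;
    reader_fn read;
    writer_fn write;
} stream_ctx_sketch_t;

typedef struct {
    uint8_t error;
    const uint8_t *base;
    size_t size;
    size_t pos;
} buffer_ctx_sketch_t;
```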

I'm leaning towards the stream/buffer option because it's the faster and cleaner of the two, and the complexity can be managed pretty easily. It's also possible to handle endianness in a reasonable way at that point.

Unfortunately I can't put a time estimate on this, and I'd like to do it myself rather than take a PR. Though if some intrepid soul wants to submit one and put up with what will certainly be a lot of heartless criticism and nitpicking, I won't stop them :)


leaveye commented Sep 14, 2016

I am facing the same issue. Thanks a lot for taking on this enhancement.

camgunz (Owner) commented May 8, 2017

Thinking about this some more, I think the main issue is that the code uses cmp_read_object for nearly everything: it reads the data through the reader callback into a stack-allocated cmp_object_t, then copies the data out of that into whatever the caller passed. The main reason I did it that way was to avoid mutating the caller's output parameters in the event of an error. Say, for example, you call cmp_read_ext and reading the type succeeds but reading the size fails: now your type output parameter has been mutated while the size output parameter hasn't.
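For illustration, the trade-off looks roughly like this; read_s8 and read_u32 are stand-ins for CMP's internal reads, not its real function names:

```c
#include <stdbool.h>
#include <stdint.h>
#include "cmp.h"

/* Stand-ins for CMP's internal reads (not real CMP function names). */
bool read_s8(cmp_ctx_t *ctx, int8_t *out);
bool read_u32(cmp_ctx_t *ctx, uint32_t *out);

/* Direct decode: no intermediate object, but a failed size read leaves *type
 * already mutated while *size is untouched. */
bool read_ext_direct(cmp_ctx_t *ctx, int8_t *type, uint32_t *size) {
    if (!read_s8(ctx, type))  return false;
    if (!read_u32(ctx, size)) return false;
    return true;
}

/* Current style: decode into locals first, copy out only on full success, at
 * the cost of the extra intermediate step. */
bool read_ext_buffered(cmp_ctx_t *ctx, int8_t *type, uint32_t *size) {
    int8_t local_type;
    uint32_t local_size;

    if (!read_s8(ctx, &local_type))  return false;
    if (!read_u32(ctx, &local_size)) return false;

    *type = local_type;
    *size = local_size;
    return true;
}
```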

Changing this is a subtle difference in behavior/semantics, but I think it's worth it: getting rid of the intermediate cmp_object_t step could probably come close to doubling CMP's speed.

The other advantage is that it then becomes possible to implement an "unpacked buffer" on top of CMP, where you decode a MessagePack-encoded buffer directly into a different buffer while copying the data only once. This way I don't need to split the API or modify cmp_ctx_t, and a buffer interface stays out of scope.
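One way such an "unpacked buffer" layer might look, reusing the hypothetical mem_ctx_t/mem_reader from the earlier sketch and assuming cmp_read_str/cmp_read_bin pass the caller's buffer straight to the reader callback for the payload (newer cmp_init also takes a skip callback, while older versions take only read and write):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "cmp.h"

/* The hypothetical memory-backed reader from the earlier sketch. */
typedef struct { const uint8_t *base; size_t size; size_t pos; } mem_ctx_t;
bool mem_reader(cmp_ctx_t *ctx, void *data, size_t limit);

/* Made-up "unpacked" destination layout. */
typedef struct {
    char     name[64];
    uint32_t name_len;
    uint8_t  payload[1024];
    uint32_t payload_len;
} unpacked_record_t;

/* Decode straight from src into *out: each payload byte is copied exactly
 * once, by mem_reader, from the encoded buffer to the destination. */
bool unpack_record(const uint8_t *src, size_t src_size, unpacked_record_t *out) {
    mem_ctx_t mem = { src, src_size, 0 };
    cmp_ctx_t cmp;

    /* Adjust the cmp_init call for the CMP version in use (the skip and
     * write callbacks are unused here). */
    cmp_init(&cmp, &mem, mem_reader, NULL, NULL);

    out->name_len = sizeof(out->name);
    out->payload_len = sizeof(out->payload);

    return cmp_read_str(&cmp, out->name, &out->name_len) &&
           cmp_read_bin(&cmp, out->payload, &out->payload_len);
}
```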
