Suggest new encoding scheme #2686
So far I have reviewed several options.

The first is ssz: a relatively simple, non-self-describing binary encoding designed to replace RLP in Ethereum 2.0. Example:

```go
type Example struct {
	Field1 uint32
	Field2 [20]byte
	Field3 []byte // dynamic byte array
	Field4 [32]byte
}
```

The encoding is a concatenation of:

This is an optimization for resource-restricted environments, so that they can decode any field without allocating memory for the whole struct. The primary target is the EVM, so we may not need it. The benefits of this optimization are questionable for go-spacemesh, as SVM (which might have used it) declares its own encoding. Besides basic types (fixed and dynamic byte slices, uint8-64, bool, structs), ssz supports:
Another part of ssz specifies a particular merkleization scheme together with the encoding, so that every object in the protocol can be hashed accordingly. There are actively maintained libraries in many languages (Go, Rust, Java, Nim, Python, JS) because they are used in eth2 clients. I have used https://github.com/ferranbt/fastssz ; it is efficient and well tested, but it doesn't implement a streaming interface and lacks some minor things, mostly related to style, that we will need.

It generally satisfies the properties that we are looking for. But as it is self-describing (it includes tags, in a manner similar to JSON), the encoding is somewhat longer, so I decided not to pursue this option.
Borsh shares a lot of similarities with ssz (excluding merkleization, which we won't use). The most significant difference is that offsets are not used to encode dynamic-length fields; instead, they are length-prefixed in place. So for my example above, the borsh encoding looks like:

Besides that, there are a few minor differences:

Libraries are implemented by the NEAR organization: https://github.com/orgs/near/repositories?q=borsh&type=&language=&sort=
Thanks for the summary! That's very helpful. Ideally, we would use the same serialization library for all serialization in our code, so a single library that supports both our crypto requirements and SVM use cases (including potential future ones) has an advantage. For example, support for maps might fall under that category... Have you looked at Cap'n Proto? It seems that the latest versions define a canonicalized encoding that would make it a viable option (unlike protobufs or flatbuffers, which explicitly say they're not deterministic).
I think it would be good to use a consistent encoding everywhere, assuming that multiple Spacemesh clients may exist in the future. Based on the available SMIPs, SVM could use either ssz or borsh, but according to @YaronWittenstein, the current SVM encoding is more compact. Maybe we can use it in go-spacemesh too.
In the previous project we considered using Cap'n Proto and decided against it:
So it introduces additional complexity without obvious benefits. For the Spacemesh use case, a simple binary protocol (anything similar to ssz or borsh, as both are very straightforward) seems to be a better choice: it will be less error-prone, and it needs to address only some of the security considerations listed for Cap'n Proto.
After some thought, I am leaning towards borsh:
I made a library for Go, https://github.com/dshulyak/borsh ; it provides a streaming interface and doesn't rely on reflection. For now, it supports the types that we currently use in the go-spacemesh codebase.
One more option is to stick with XDR and rewrite the library using a code-generation approach. I hadn't looked into XDR's details before, but it is actually not that different from borsh (XDR uses big-endian integers instead of little-endian, but in general the encoding scheme is the same). Update: XDR actually seems to have more overhead than borsh, based on spacemeshos/SMIPS#23
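Two per-field sources of XDR overhead follow directly from RFC 4506: every item occupies a multiple of four bytes (a boolean is a 4-byte enum), and variable-length opaque data is zero-padded to a 4-byte boundary. The sketch below encodes the same bool and 5-byte slice both ways; it does not reproduce the measurements from SMIPS#23, and the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xdrBool encodes a boolean as XDR does: a 4-byte big-endian enum (0 or 1).
func xdrBool(buf []byte, v bool) []byte {
	var x uint32
	if v {
		x = 1
	}
	return binary.BigEndian.AppendUint32(buf, x)
}

// xdrOpaque encodes variable-length opaque data: a 4-byte big-endian length,
// the bytes, then zero padding up to a multiple of four bytes.
func xdrOpaque(buf, p []byte) []byte {
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(p)))
	buf = append(buf, p...)
	for pad := (4 - len(p)%4) % 4; pad > 0; pad-- {
		buf = append(buf, 0)
	}
	return buf
}

// borshBool encodes a boolean as a single byte.
func borshBool(buf []byte, v bool) []byte {
	if v {
		return append(buf, 1)
	}
	return append(buf, 0)
}

// borshBytes encodes a dynamic byte slice: little-endian u32 length, then the bytes.
func borshBytes(buf, p []byte) []byte {
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(p)))
	return append(buf, p...)
}

func main() {
	payload := []byte{1, 2, 3, 4, 5}
	x := xdrOpaque(xdrBool(nil, true), payload)
	b := borshBytes(borshBool(nil, true), payload)
	fmt.Println(len(x), len(b)) // 16 10
}
```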
The XDR library consumes an excessive amount of memory when parsing some data: #3014
#2701 may be caused by a bug in the XDR library; I have no other ideas (needs to be confirmed).
Description
Currently, the XDR codec is implemented in a way that is very CPU- and memory-intensive, causing nodes to allocate a lot of memory and eventually crash.
We want to explore more efficient codecs.
Codecs must meet the following prerequisites:
In addition to the "must-haves", we want the chosen library to be CPU- and memory-efficient.