Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codec Vectors #12

Merged
merged 15 commits into from
Sep 18, 2024
Merged

Codec Vectors #12

merged 15 commits into from
Sep 18, 2024

Conversation

davxy
Copy link
Contributor

@davxy davxy commented Aug 30, 2024

README first

  • ASN.1 schema
  • JAM compact integers codec

Once successfully processed by some of the teams, the codec will be applied to:

Partially address #10

@davxy davxy marked this pull request as ready for review August 30, 2024 16:35
@vekexasia
Copy link

I believe the disputes_exstrinsic.json test is wrong.

the encoded value starts with
02dd1b65c036547750d2f84ff4c6fac7de56944658530a62e81c6cc290087440d00300000001
mine is
02dd1b65c036547750d2f84ff4c6fac7de56944658530a62e81c6cc290087440d00301

that 1 is/shouldbe the vote which by formula 277 is encoded with generic E.

According to 272, x being between 0 and 2^(7) means that l` is 0 hence (2^8 - 2^(8-0) + 1/1 = 1

looks like E_4 was applied instead of E

@davxy
Copy link
Contributor Author

davxy commented Aug 31, 2024

@vekexasia 03000000 is the verdict "epoch index" (in json "age", which is an u32).
Follows the bool which is encoded as 01 as expected. Looks like in your string the epoch index in instead encoded as just 03 (a u8).

I think eq. 277 encoding of a must be explicitly specified as E_4 (gavofyork/graypaper#67)

From (272)

Note that at present this is utilized only in encoding the length prefix of variable-length sequences

Which make sense.

@vekexasia
Copy link

vekexasia commented Aug 31, 2024

ha! yes sorry the a component has nothing specified so I assumed E not E_4 . but indeed it makes sense to constantly encode it as E_4

@vekexasia
Copy link

Also I think i found another 2 issues while tesitng assurances.

  • First of all it seems you're using E_2 instead of E when encoding validatorIndex. the codec for Ea is not formally specified hence I believe that we should use E or formally write the codec in the paper. To be honest I would just stick to E even if _2 is more than enough for jam.

  • Are you testing the decoding from bin? because (276) has no intrinsic length discriminator. this means that either we specify Ea codec to include a discriminator on the bitsequence length or I find it impossible to sequentially decode the extrinsic unless we use a bottom-up approach which I think is not the ideal solution.

@davxy
Copy link
Contributor Author

davxy commented Aug 31, 2024

@vekexasia

First of all it seems you're using $E_2$ instead of $E$

This situation is analogous to the previous one. I align with the guidance provided in the GP, just below equation 272, which I report here:

"Note that at present this is utilized only in encoding the length prefix of variable-length sequences."

Introducing variable-length encoding for arbitrary fields (e.g., using $E$ for some u16 fields and $E_2$ for others) is a bit of a trap for implementers and may lead to interoperability issues.

While it's true that, the given vector has at most $V$ entries, this approach could save a small amount of bytes (less than $V$), I question whether such a minor gain is truly worth. In my opinion, it may not be. However, I acknowledge that my perspective may not be definitive in this matter.

For now, I'll adhere to the guidance provided just below equation 272. Should it become a requirement, ideally with a clear rationale, I will gladly adjust the vectors accordingly. :-)

If is not required then maybe is the case to make it explicit in the GP.

cc @gavofyork ^^^


Are you testing the decoding from bin?

If you are referring to $f \in \mathbb{B}_C$ (eq. 123), this is a fixed-length bit sequence, so there is no need for a length discriminator.

In the proposed vectors, C is 2. I've updated the README with some notes about this subject

@vekexasia
Copy link

From (272)

Note that at present this is utilized only in encoding the length prefix of variable-length sequences

oh my. I'm sorry I must have missed this from your previous statement. I must also have missed it from the graypaper when I implemented the codecs a while ago.

Ultimately it makes sense to have a stricter codec ( Eg: E2 instead of E) but then I believe it should be explicit in the paper.

For example it makes a lot of sense to use E2 for the validatorIndex but leaving it open to the developer implementation should be avoided.

I believe one of the reasons the graypaper exists is to have a formal specification and avoid relying on "pseudocode" or reference implementation.

@vekexasia
Copy link

vekexasia commented Aug 31, 2024

Question.

https://github.com/davxy/jam-test-vectors/blob/codec-vectors/codec/data/work_result_1.json#L1-L9

I see you set gas_ratio < 0 but if i am not mistaken g in L defined in (121) forbids it to be negative being in NG which is an alias of N264.

I know the readme states that values may conflict but if my previous statement above is right, the codec should also forbid encoding negative values:

(282):
image

(271):
image

@davxy
Copy link
Contributor Author

davxy commented Sep 1, 2024

@vekexasia fixed

@vekexasia
Copy link

hey @davxy looks like the work report (Set W) is also wrongly serialized.

According to (283) first entry should be xa while it seems your binary has xs as first entry

@davxy
Copy link
Contributor Author

davxy commented Sep 1, 2024

@vekexasia this has been reported here #9 (comment)

see this gavofyork/graypaper#67 for a modification proposal to the GP

codec/schema.asn Outdated Show resolved Hide resolved
[
{
"anchor": "0x0cffbf67aae50aeed3c6f8f0d9bf7d854ffd87cef8358cbbaa587a9e3bd1a776",
"bitfield": "0x01",
Copy link

@emielsebastiaan emielsebastiaan Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd like to be more explicit about the actual value of the bitsequence bitfield (and not the octet representation, I'd change this JSON-testvector to show the following value. The size of the fixed length bitsequence is set to 2 (GP_constant_C). The suggested change below avoids any confusion about its length. That said, this is not really necessary, since the byte representation is also correct (and JSON will eventually not be used).

"bitfield": [
    true,
    false
],

@boymaas
Copy link

boymaas commented Sep 13, 2024

@davxy I have a question regarding the header binary test vectors: I'm curious as to why we use four bytes to encode an integer as defined in (272) when a single byte would suffice. We can determine the length by reading the prefix, even though it's defined as a u32 we can pack it in one byte. Isn't it the purpose of (272) to make it as compact as possible?

image

@boymaas
Copy link

boymaas commented Sep 13, 2024

Another question concerning the validator_count: I assume that this count can change under certain conditions. Would it be more practical to make these validators variable-length and prefix them?

image

@davxy
Copy link
Contributor Author

davxy commented Sep 13, 2024

@boymaas The reasons are design decisions, so you should eventually raise your concerns in the graypaper.
The test vectors just follows what prescribed by the paper.

In any case, IMHO, it is better to maintain a uniform design for u16/u32 encoding in the protocol rather than saving 1 byte here and there by introducing pitfalls related to variable-length encoding.

@xlc
Copy link
Contributor

xlc commented Sep 16, 2024

This should be good to merge?

@davxy
Copy link
Contributor Author

davxy commented Sep 16, 2024

@xlc As far as I'm concerned, yes, I have nothing to add to this PR, and we haven't received any pushback so far.
However, I do not have the permissions to perform the merge myself.
I will proceed with regenerating the other vectors using the new compact codec.

@davxy
Copy link
Contributor Author

davxy commented Sep 16, 2024

Now is ready :-)

@gavofyork gavofyork merged commit 7a96598 into w3f:master Sep 18, 2024
1 check passed
@davxy davxy deleted the codec-vectors branch September 18, 2024 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants