Skip to content
This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

SMIP: network id fixed-size parameter, miner id to contain network id #26

Closed
moshababo opened this issue Aug 31, 2020 · 18 comments
Closed
Assignees

Comments

@moshababo
Copy link

moshababo commented Aug 31, 2020

Motivation

  1. The network id is currently defined by 1 byte value. This is limiting the range of possible values to 255, which can cause unintentional collisions between different networks. This should be changed to 4 bytes, so that network_id=hash(network params).
    TBD: the hash function and the encoding format for the input

  2. The miner id is currently a public key of ED25519 key pair. Multiple networks can potentially have a similar miner id. To avoid various issues associated with that, such as the replay attack, the network id should be part of the miner id, so that miner_id=hash(public key, network id), sized with 20 bytes.
    TBD: the hash function

@avive
Copy link
Contributor

avive commented Aug 31, 2020

Q: is it really necessary to add a hash of genesis params to achieve this goal of avoiding collisions?

Isn't it satisfactory just to add 1 byte to the current notion of an integer net id to avoid unintentional collisions in a 65,536 unique ids space? I think that @lrettig and @noamnelke proposed this - there might be an issue about it open as well. Since the network id is likely to be added to p2p messages to avoid peers from different networks to talk with each other, it is desirable to keep them small and this is simpler to use, for example in transaction signing (see #23 where I recently added net id as part of what is signed and verified per tx w/o adding to tx binary payload size). This is also much easier to debug as you can easily output the net id from a p2p message without having to compare hashes to figure out what network is this message from. If I see a p2p message with a hash then I can't tell what is the net id of the network of this message w/o having the genesis params or their hash of all networks and I might not have that info - how do I know about all running networks w/o a centralized table of some sorts?

We can still keep the net id 2 bytes int and have the miner id be set to hash of (pub_key, net_id) to individualize it per network or if there's a security concern have it equal to hash(pub_key, net_id, hash_of_geneiss_params).

@lrettig
Copy link
Member

lrettig commented Aug 31, 2020

See also spacemeshos/go-spacemesh#2113.

This should be changed to 32 bytes, so that network_id=hash(network params).

I like this, especially if network params includes genesis time. It seems safe - at the risk of increasing the size of every tx that includes it.

@avive
Copy link
Contributor

avive commented Aug 31, 2020

it doesn't have to increase the transaction size - Please see #23 - we can do it without adding the net id to the tx binary payload. It is @noamnelke idea which I put down into words in the smip but it is added to every p2p message... Curious to hear some of the lessons of only using an int from eth 1.0... @lrettig

@moshababo
Copy link
Author

I got confused between 32 bits and 32 bytes. My suggestion is to use 4 bytes, and have the (partial) hash output to create a "magic value", similarly to Bitcoin. I guess reducing it to 2 bytes would be fine, if it's really necessary.

@avive
Copy link
Contributor

avive commented Sep 1, 2020

Either way - let's please try to finalize this quickly so we can finalize the tx signing and verification spec for 0.2.

@avive
Copy link
Contributor

avive commented Sep 1, 2020

So to formalize the proposal a bit. How about something like this?

  • bytes genesis_params - bson binary serialization of genesis params json data (data from a genesis params json config file for a specific network).
  • bytes[4] magic_value = head(4, sha256(genesis_params))))
  • bytes[32] miner_id = sha256(miner_pub_key, magic_value)

So p2p messages include the magic value so receivers of messages in full nodes can drop them if they are not on the node's network. The 4 bytes magic value will also be added to transaction binary payload before signing transactions and will be added by transaction signature verifier.

Does this mean that we don't need the notion of an integer net id in network params? It might be useful to address a network in a unique way instead of a random long number but perhaps adding a nickname string in the params file will be sufficient for this?

Also - is it okay to use bson to create binary network params or some other method is better? I think we want the hash to be a hash of the content of a config file so it can be easily verified and computed. Maybe bson is not needed and it is sufficient to provide a reference json file for a network and hash the utf-8 encoding of its text?

@tal-m
Copy link

tal-m commented Sep 1, 2020

I don't understand why we want such a small netid -- a full-length id (20 bytes is fine) that is the hash of the genesis parameters (including genesis timestamp) sounds much more robust, and I don't see any downsides. On the other hand, even if we can't think of specific attacks with a short id, it seems like it's asking for trouble.

I don't think we want the hash to be of an ascii representation, again, there's room for weird errors (such as whitespace, etc.) --- How about we use an XDR encoding of the parameters as they are parsed by the node? (we settled on XDR as the canonical serialization format for cryptographic verification, right?)

Regarding magic numbers -- why do we need a magic number for every message? we're using the netid in the signature verification in any case, so we'll detect a peer using a different netid when the signature verification fails. The magic number doesn't help against DoS attacks since a malicious user can always use the "right" magic number and force the honest party to perform the signature check. Is there another reason for a magic number?

@avive
Copy link
Contributor

avive commented Sep 1, 2020

I don't think we want the hash to be of an ascii representation, again, there's room for weird errors (such as whitespace, etc.) --- How about we use an XDR encoding of the parameters as they are parsed by the node? (we settled on XDR as the canonical serialization format for cryptographic verification, right?)

this is why I proposed bson above which is binary json serialization spec, so white spaces are ignored and it is much simpler to encode / decode. I think that this is simpler than having to rely on the node to use xdr encoding on the parsed data - I don't think it is parsed in one place like that but I may be wrong.

Regarding magic numbers -- why do we need a magic number for every message? we're using the netid in the signature verification in any case, so we'll detect a peer using a different netid when the signature verification fails. The magic number doesn't help against DoS attacks since a malicious user can always use the "right" magic number and force the honest party to perform the signature check. Is there another reason for a magic number

I think we discussed this previously and said we want it to make sure that messages don't accidentally cross networks which can cause lots of issues and bugs. It is just a safety layer. I'm not sure if signature verification is good enough. I know @noamnelke and @y0sher talked about this in the past with @iddo333 involved as well. @y0sher - what do you think?

@tal-m
Copy link

tal-m commented Sep 1, 2020

Regarding bson vs XDR -- I think it's a good idea to have a single serialization format for input to cryptographic operations. Basically, I want to be able to have a simple rule of thumb of the sort "if you're verifying a signature, the data is first serialized using format X and then hashed using algorithm Y". Since we're already using XDR everywhere else, why not use it here too?

If "accidental" cross network traffic causes issues, it indicates our code is currently too brittle. Our system needs to tolerate malicious traffic, so a "patch" that only deals with honest misconfigurations (like a one or two byte netid) will just hide underlying security/dos bugs.

@avive avive closed this as completed Sep 1, 2020
@avive avive reopened this Sep 1, 2020
@lrettig
Copy link
Member

lrettig commented Sep 1, 2020

Does this mean that we don't need the notion of an integer net id in network params? It might be useful to address a network in a unique way instead of a random long number but perhaps adding a nickname string in the params file will be sufficient for this?

I don't see what value an integer net id gives us if we already have something more robust, as we're discussing here. People are free to call their networks by whatever name or number they want. The hash is the real, unique identifier. There is of course no way for us to enforce that every network has a unique ID and that no two networks use the same ID, but I think including genesis time in the hash gets us pretty close.

I agree with baking cross-network replay protection into every message using this ID. This makes forks much less dire, which is definitely a design goal that we should aim for. I'm agnostic as to whether this happens as part of the signature verification, or external to it.

I also agree with @tal-m's concerns regarding serialization. I think there is a single, top-level config object that we could pretty easily serialize using XDR.

@lrettig
Copy link
Member

lrettig commented Sep 1, 2020

If I see a p2p message with a hash then I can't tell what is the net id of the network of this message w/o having the genesis params or their hash of all networks and I might not have that info - how do I know about all running networks w/o a centralized table of some sorts?

I don't think this matters. It suffices that you can tell whether a message is intended for the network that you're connected to, or not.

I assume this will also be a part of the P2P handshake process - is that schema documented somewhere?

@moshababo
Copy link
Author

Following our research meeting yesterday:

  • The network id hash input should contain only the genesis params, and not the entire network params, which contains configurable params as well.
  • network id of 4 bytes would make it easy to work with as numeric value, but @tal-m suggested to use 20 bytes for better security.
  • The network id should be advertised by nodes during the handshake process whenever establishing a secure communication channel. It should not be added to the following p2p messages.
  • The network id should be added to the transaction signing input as defined in SMIP: Spacemesh Transactions Syntax and Binary Encoding #23. It should be added at the beginning of the input and not at the end, so that potential optimization of reusing the hash function state would be possible.
  • The genesis params hash input binary presentation should be encoded via XDR. JSON config files can still be used to allow maximum flexibility, and should be parsed into the in-memory presentation before being encoded via XDR.
  • @iddo333 should provide the recommended size for the miner id.

@avive
Copy link
Contributor

avive commented Sep 9, 2020

Following our research meeting yesterday:

  • The network id hash input should contain only the genesis params, and not the entire network params, which contains configurable params as well.
  • network id of 4 bytes would make it easy to work with as numeric value, but @tal-m suggested to use 20 bytes for better security.
  • The network id should be advertised by nodes during the handshake process whenever establishing a secure communication channel. It should not be added to the following p2p messages.
  • The network id should be added to the transaction signing input as defined in SMIP: Spacemesh Transactions Syntax and Binary Encoding #23. It should be added at the beginning of the input and not at the end, so that potential optimization of reusing the hash function state would be possible.
  • The genesis params hash input binary presentation should be encoded via XDR. JSON config files can still be used to allow maximum flexibility, and should be parsed into the in-memory presentation before being encoded via XDR.
  • @iddo333 should provide the recommended size for the miner id.

Good summary!

  • Can we also finalize miner_id. e.g. bytes[32] miner_id = sha256(miner_pub_key, net_id) ?
  • I will update SMIP: Spacemesh Transactions Syntax and Binary Encoding #23 specs to use net-id as prefix and not suffix.
  • I think we said that genesis params should include at least genesis time-stamp and genesis vaults data.

@moshababo
Copy link
Author

  • Can we also finalize miner_id. e.g. bytes[32] miner_id = sha256(miner_pub_key, net_id) ?

We'll wait for @iddo333 input on this.

@lrettig
Copy link
Member

lrettig commented Sep 9, 2020

should be parsed into the in-memory presentation before being encoded via XDR.

Or we could require that the params be compiled into binary format (using a separate tool) before launching, as @tal-m proposed.

@tal-m suggested to use 20 bytes for better security.

If I understood correctly, Tal's reasoning here was to make intentional collisions harder, although that feels a bit like overkill to me

The network id hash input should contain only the genesis params, and not the entire network params, which contains configurable params as well.

I think we should split the existing node config object into two sections to facilitate this: genesis params (fixed for all time for a given network) and configurable params (e.g., log levels)

@avive
Copy link
Contributor

avive commented Sep 10, 2020

Yes, there should be 1 genesis config file which includes - genesis time, genesis accounts and vaults, friendly network name, e.g. TweedleDee, and network semantic version (useful for wallets to choose which network to connect to based on this config file and we start using semantic versioning for testnets and devnets) and all immutable network param. e.g. hare committee size. The genesis file should be sufficient to connect to a network without having to provide a config file. The config file is optional and just overrides values of configurable node params which are mutable for a network such as log levels. We should provide reasonable defaults for these so most users will be able to join by providing only the genesis config file.

I propose to call the first file 'network params file' and the second file 'configuration file'.

I detail all of these as this is highly relevant to new wallet flows we are building into smapp such as being able to switch between networks (something we want to do), prepare a wallet for mainent pre genesis (important use case for vesting wallets) and ,more....

@avive
Copy link
Contributor

avive commented Sep 12, 2020

It is useful to be able to use same compiled node binary for different networks by just providing a different network params file and without having to build a node specifically for a network because this is much more modular and configurable. Our current flow of having devnets and testnets from a binary node release with a specific set of feature is using this. I also don't understand what benefits (if any) computing the net id hash at build time and hard-coding it to a node binary has or how this is related to hash collisions prevention as anyone can build a node with different network params - it is just a bit more cumbersome. What are the benefits of hard-coding a net-id into a binary over doing this at runtime via config?

@lrettig
Copy link
Member

lrettig commented Jan 9, 2022

Closing in favor of #75

@lrettig lrettig closed this as completed Jan 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants