Use of indices may leak information #29
@manicprogrammer, thanks for raising it. I had identified this before; however, it slipped my mind to add a note in the privacy considerations section of this specification. I would probably further clarify the issue you raise as follows: the feasibility of a verifier discerning additional information about the un-revealed statements in a proof, based on the revealed indices and the total statement count, relies on the following assumptions:

1. The verifier knows the form (e.g. the schema) of the original data.
2. The information is represented as an ordered array of statements.
3. Knowing a statement's position, together with the schema, is enough to infer something about it, even though an index alone does not reveal the content at that index.

In summary, it would be great to add this to the privacy considerations section of the specification. Also, because what you have cited only occurs in particular cases (i.e. the criteria outlined above, something we should continue to clarify), do you mind updating the title of the issue to "use of indices MAY leak information"? It is not a foregone conclusion or absolute that the use of indices will always leak information.
Of course, done.

Again, if this is outside the privacy guarantees of the scheme, that is fine, but it seems it would be highly limiting to the utility of the scheme. If the scheme is not meant to be robust against potentially leaking non-disclosed metadata, then there is no need to read further; every scheme has its strengths, its weaknesses, and its bounds of operation. But if it is:

As for 1): if not leaking data relies on the verifier not knowing the form of the original data, that is a problem for many domains of use. It may be that this scheme is simply not a candidate for those domains, but I see a world where the verifier almost certainly knows the potential shape of the original data, because it must know what data to request and the context around it.

For 2): it certainly affects more than arrays of information. Well-shaped data can minimize leakage, but relying on a flawless data shape, regardless of how it is selectively disclosed, will lead to repeated failures by implementers, especially if the implementer must satisfy both an unknown shape (no. 1) and a maximally safe shape (no. 2). People will get it wrong; I know I will. Protecting against this is hard even if you don't provide the indices, due to potential dependencies between statements, but those occur between related statements, whereas shape analysis can expose fully unrelated data.

In regards to 3): obviously an index itself doesn't reveal the content at that index. But knowledge that a statement exists at a defined position can reveal a lot of information someone did not wish to disclose, and in the simple example it provides significant data about something they actively chose not to disclose. My example was specific in order to show the outcome. There are many data contexts and shapes that will not leak additional data, but there are many where they will.
As long as the indices are provided, you are guaranteed to reveal, beyond the disclosed data, the existence of non-disclosed data and its positioning, and you may leak other significant data that can be derived from that (even more so when looking at a large set of claims from a common issuer). Again, if this is outside the privacy guarantees of the scheme, that is fine; I just wanted to ensure it was either recognized and found not pertinent, or made visible if pertinent.
I feel like I might be missing something, but would any of this be resolved by making all fields of schemas using this kind of proof mandatory? Is there something about that which is semantically incorrect? Would it help here?
@manicprogrammer, thanks, and also to clarify: I did not mean my comment to give the impression that the point you raise is in any way insignificant or invalid. It is important for us to make sure these limitations and considerations are known and highlighted in the spec.
To an extent, as is the case with any technology, there is a recognised set of boundaries within which it can provide guarantees, and I think eliminating all possible unwanted information disclosure through the mechanism you describe would be difficult to achieve.
I would say that to discern information as in your example requires deeper knowledge than just the schema of the information (which I agree would probably be quite trivial to obtain in many cases). Instead, much of the ordering of the resulting
Yes, I would agree that similar possible leakages could occur for representations other than arrays, for example purely knowing whether a value is present in an assertion or not; however, I think arrays are the more interesting case.
I think perhaps this point was a little lost; what I meant to say is that not all proofs on the example you gave would leak the information. For example, say I only revealed my
@NickDarvey you could; however, this would cause a bloat issue and add complexity to the issuance and presentation protocols, to the point where the complexity would defeat the value being pursued. For example, say you are signing an array: what number of elements do you choose to sign as a default, such that when the array is populated in other assertions it doesn't lead to the leakage outlined?
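The "make every slot mandatory" mitigation under discussion can be sketched as follows. This is a hypothetical illustration, not part of the spec: `MAX_DEPENDENTS` and the pad value are invented for the sketch, and the hard part, as noted above, is that every issuer must agree on a cap up front.

```python
# Hypothetical sketch of the padding mitigation: pad every array to a fixed
# maximum before signing so the statement count and indices never vary.
MAX_DEPENDENTS = 16  # invented cap; choosing a universal value is the hard part

def pad_dependents(dependents, pad_value="<absent>"):
    """Pad an array to a fixed length so its statement count never varies."""
    if len(dependents) > MAX_DEPENDENTS:
        # A real credential exceeding the agreed cap simply cannot be issued.
        raise ValueError("more dependents than the agreed maximum")
    return list(dependents) + [pad_value] * (MAX_DEPENDENTS - len(dependents))

# Subjects with 2 and 0 dependents now produce identically shaped statement
# lists, but every credential carries 16 dependent slots regardless of need.
print(len(pad_dependents(["dep1", "dep2"])))  # 16
print(len(pad_dependents([])))                # 16
```

The bloat objection is visible directly: every credential pays for the worst case, and any subject above the cap cannot be represented at all.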
Ahh, I didn't realize you could selectively disclose items in an array. Understood!
All good. My goal in this issue is to ensure that certain limitations, assumptions, and boundaries are recognized, so they can be mitigated or documented for the scheme. The goal of the issue is not to ensure they don't exist. No scheme is perfect in every respect; there are always trade-offs, which is why different schemes are needed for different purposes.
Is the resolution action for this issue, then, to document the assumptions (such as that explicitly providing shape information and the existence of non-disclosed data is within the implementer's data-handling constraints) and the warnings/limitations (whatever word is proper) about how leakage can occur?
To clarify, just in case it is not clear: when I say "The goal of the issue is not to ensure they don't exist," that is indeed not the goal, but I clearly think this is a significant limitation in privacy assurances if the scheme is formed this way, and it could mute any ability for the scheme to be adopted for anything but the most narrow scopes.
@manicprogrammer I'm more than open to suggestions that would help improve the scheme in this respect, provided they do not impose an exponential increase in its complexity. I do think this issue highlights, at the very least, that a section of the spec must be devoted to describing a set of considerations around this topic and the potential mitigations that can be used, such as how the information is represented. Do you have any potential solutions, or are you aware of any other selective-disclosure approaches that do not suffer from this problem?
Also, as a side note for those interested in this issue, you may have noticed #30 removing the fields
@tplooker I'll look at the code, which I have not dug into fully. I suspect you would already recognize any reasonable means of doing that, and it's just a trade-off choice between size, complexity, and other parameters. So, no, I don't have a ready proposal for a solution, or even know if there is one that would meet the other requirements; it would almost certainly be more verbose and produce a much larger proof. Other redacted-signature schemes I have looked at that address this scenario have a witness/proof per statement, so you get a linear size increase with each revealed statement; you could easily add 1200 bits per revealed statement, which is not what you are looking for here. I was hoping that what I would find here, through an implementation and the use of bilinear-curve/pairing signatures, was the best of both worlds: a succinct single proof that didn't expose the existence of redacted statements. There might be a mechanism to do that in this scheme with some tweaks, and there might not; but if the purpose of the scheme is not to be concerned with that, then there is no need to try to derive it. I am not wanting to push this scheme toward a different purpose.
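The linear-growth trade-off mentioned above can be made concrete with a back-of-envelope calculation. The 1200-bits-per-statement figure is the commenter's estimate for per-statement witness schemes, not a measured value:

```python
# Back-of-envelope comparison: per-statement-witness schemes grow linearly
# with the number of revealed statements, while a single succinct proof
# stays roughly constant regardless of the reveal count.
WITNESS_BITS = 1200  # assumed per-statement witness size cited in the comment

def witness_scheme_bits(revealed_statements: int) -> int:
    """Total witness size for a scheme carrying one witness per revealed statement."""
    return revealed_statements * WITNESS_BITS

# Revealing 10 statements already costs 12000 bits of witnesses.
print(witness_scheme_bits(10))  # 12000
```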
As a final follow-up on this, I did review the pertinent parts of the code, and also the Rust BBS crate documentation and the papers it is based on. As you know, it is inherent in the BBS+ implementation in use that key construction takes a value for the discrete number of messages that key will sign, and the implementation further requires the verifier to know the message indices to perform verification. So, as @tplooker mentioned, short of something really major, that potential leakage is just the nature of this specific approach. We can't have it all. :-)
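The structural point here, that the total message count is fixed at key generation and the revealed indices are public inputs to verification, can be sketched conceptually. This is not the Rust BBS crate's actual API; the names and the elided pairing check are illustrative only.

```python
# Conceptual sketch, NOT the real BBS+ API. It shows why the total message
# count L and the revealed indices are necessarily public inputs: the key
# commits to L (one generator per message), and each revealed message must be
# bound to its original index for the proof to verify.
from dataclasses import dataclass

@dataclass
class PublicKey:
    message_count: int  # L is baked into the key at generation time

def structurally_valid(pk: PublicKey, revealed: dict) -> bool:
    """Checks a verifier must make before any cryptographic verification:
    every revealed message carries an index within [0, L)."""
    return all(0 <= i < pk.message_count for i in revealed)

pk = PublicKey(message_count=4)
print(structurally_valid(pk, {0: "full-time", 3: "married"}))  # True
print(structurally_valid(pk, {5: "out of range"}))             # False
```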
I suggest we close this and focus on adding security consideration / guidance, as I noted in #60 |
Pending close for 1 week; no objections, so closing.
The use of indices for Revealed Indices and the TotalStatements value leaks information.
This leakage removes or undermines any zero-knowledge claims. It may not be the intent of this specification to minimize leakage; if so, this issue can be disregarded.
The use of the Revealed Indices leaks information: it proves not only that information was not disclosed, but in some cases which information was not disclosed, and in more significant cases it may leak implicit values.
Take, for instance, the following two snippets of an attestation on two different subjects:
Subject A
Subject B
If each subject wished to reveal only maritalStatus and employmentType, the below is how each subject's proof might look:
Subject A
Subject B
I just leaked, in a common structure, that Subject A has 2 dependents and Subject B has 0 dependents.
If I know, by convention or strict schema, that employmentType is followed by dependents (if they exist), then followed by maritalStatus, for a type of credential or a credential from a given issuer, then I know from the indices the number of dependents, even though the subject did not wish to reveal that information.
Even if the dependents statements were a deeper set of other statements normalized into some variable set of statements, I can still differentiate between subjects with 0 dependents and subjects with > 0 dependents.
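The scenario above can be simulated end to end. This is a hypothetical sketch: the field names, the flattening order (employmentType, then one statement per dependent, then maritalStatus), and the reveal format are all assumed for illustration.

```python
# Hypothetical simulation of the index leak. The flattening convention is
# assumed: employmentType, then one statement per dependent, then maritalStatus.

def flatten(credential):
    """Flatten a credential into the ordered statement list that gets signed."""
    statements = [("employmentType", credential["employmentType"])]
    for i, dep in enumerate(credential.get("dependents", [])):
        statements.append((f"dependents[{i}]", dep))
    statements.append(("maritalStatus", credential["maritalStatus"]))
    return statements

def reveal(statements, fields):
    """Selective disclosure: the proof exposes the total statement count and
    the original index of every revealed statement."""
    total = len(statements)
    revealed = {i: v for i, (name, v) in enumerate(statements) if name in fields}
    return total, revealed

subject_a = {"employmentType": "full-time", "dependents": ["dep1", "dep2"],
             "maritalStatus": "married"}
subject_b = {"employmentType": "full-time", "dependents": [],
             "maritalStatus": "single"}

fields = {"employmentType", "maritalStatus"}
total_a, revealed_a = reveal(flatten(subject_a), fields)
total_b, revealed_b = reveal(flatten(subject_b), fields)

# Both subjects revealed exactly the same two fields, yet a verifier who knows
# the flattening convention recovers the hidden dependent count from the gap
# between the revealed indices and the total statement count.
print(total_a, sorted(revealed_a))  # 4 [0, 3] -> 2 hidden statements = 2 dependents
print(total_b, sorted(revealed_b))  # 2 [0, 1] -> 0 hidden statements = 0 dependents
```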
There are a multitude of ways, and variations on this theme, in which an expected or known ordinal listing will leak undesired information.
If the intent is not to leak undisclosed information, then the proof must be structured so that not only is the redacted data not disclosed, but it is also not shown that any other data existed.