
Add a document describing key revocation #47

Merged: 9 commits merged into notaryproject:main on Jun 28, 2021


@mnm678 (Contributor) commented Mar 2, 2021

This document lays out some of the options for key revocation that have been discussed. These options might eventually fit better as part of the key management document, but are posted separately for the sake of discussion.

This document may eventually be part of the key management
requirements. It describes a few common mechanisms for
key revocation.

Signed-off-by: Marina Moore <[email protected]>

One of the goals of Notary v2 is to build in solutions for key revocation that are easy to use and ensure that users will always use non-compromised keys. This document discusses some potential mechanisms for key revocation.

In existing systems, there are three main approaches to key revocation: automatic revocation through key expiration, key revocation lists, and distribution of trusted keys. I discuss some of the benefits and pitfalls of each of these techniques, and how some of these techniques are combined to provide a wholistic approach to key revocation in TUF.

Suggested change:
In existing systems, there are three main approaches to key revocation: automatic revocation through key expiration, key revocation lists, and distribution of trusted keys. I discuss some of the benefits and pitfalls of each of these techniques, and how some of these techniques are combined to provide a holistic approach to key revocation in TUF.


## Distribution of trusted keys

Instead of distributing untrusted keys, this method distributes a list of currently trusted keys. If a key needs to be revoked, it is removed from the list of trusted keys. This technique as the added benefit of ensuring that users have access to the new trusted key as soon as they learn of a revocation.

Suggested change:
Instead of distributing untrusted keys, this method distributes a list of currently trusted keys. If a key needs to be revoked, it is removed from the list of trusted keys. This technique has the added benefit of ensuring that users have access to the new trusted key as soon as they learn of a revocation.
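A minimal sketch of the allowlist idea, assuming a hypothetical `TrustedKeyList` type that carries a version number and the complete set of trusted key IDs (verifying the list's own signature is left out). Revocation is simply absence from the list, and the replacement key arrives in the same update that drops the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedKeyList:
    """Hypothetical allowlist: the complete set of key IDs a client should trust."""
    version: int
    key_ids: frozenset


def rotate(current: TrustedKeyList, revoked: str, replacement: str) -> TrustedKeyList:
    """Revoke a key by publishing a new list without it; the replacement ships in the same update."""
    return TrustedKeyList(
        version=current.version + 1,
        key_ids=(current.key_ids - {revoked}) | {replacement},
    )


def is_trusted(key_list: TrustedKeyList, key_id: str) -> bool:
    # Anything not explicitly listed is untrusted, so removal from the list is revocation.
    return key_id in key_list.key_ids


v1 = TrustedKeyList(version=1, key_ids=frozenset({"key-a", "key-b"}))
v2 = rotate(v1, revoked="key-a", replacement="key-c")
print(is_trusted(v2, "key-a"), is_trusted(v2, "key-c"))  # False True
```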

@sudo-bmitch (Contributor) left a comment

Each of these options has pros and cons. Thinking through them:

Key Expiration: this has the advantage of being automatically enforced, even in disconnected environments. I'd question whether this means everything the key has previously signed would need to be re-signed. I'm fairly certain the answer is yes (otherwise an attacker who breaches a key with only 2 hours left before its certificate expires could sign a malicious image that stays valid for 10 years). That would result in lots of re-signing of old images for every key rotation. Perhaps that could be made easier by having a single signature for a list of images (digests), rather than a separate signature for each individual image.

Revocation lists: while it's convenient that this has an immediate effect when the revocation is published, I'm seeing multiple downsides. In disconnected environments, the revocation query may fail, or it may be sent to a mirror server that the client in that disconnected environment is told to trust. A stale mirror in the disconnected environment could be used to serve malicious images, though in those cases it's the client intentionally indicating that it wants to trust a mirror that shouldn't have been trusted. The scenario that concerns me more is devices with access to the public internet that use the upstream revocation list. What do we do when access to that revocation list goes down? Do we fail insecure, potentially allowing a vulnerability, or fail secure and cause an outage? Last year's Apple incident showed we can get the worst of both, where the revocation server was extremely slow before eventually timing out.
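The fail-insecure vs. fail-secure choice can be made explicit in client code. This is a small sketch, assuming a hypothetical `fetch_revoked` callable that enforces its own short timeout and raises `RevocationUnavailable` when the list can't be retrieved; the bounded timeout is what keeps a slow server from stalling every verification.

```python
from typing import Callable


class RevocationUnavailable(Exception):
    """Raised by the (hypothetical) fetcher when the list can't be retrieved in time."""


def check_key(key_id: str, fetch_revoked: Callable[[], frozenset], fail_closed: bool) -> bool:
    """Return True if the key may be used.

    fail_closed=True rejects every key while the list is unreachable (outage risk);
    fail_closed=False accepts keys it could not check (risk of trusting a revoked key).
    """
    try:
        revoked = fetch_revoked()
    except RevocationUnavailable:
        return not fail_closed
    return key_id not in revoked


def unreachable() -> frozenset:
    raise RevocationUnavailable("revocation server timed out")


print(check_key("key-a", unreachable, fail_closed=True))   # False: fail secure, outage
print(check_key("key-a", unreachable, fail_closed=False))  # True: fail insecure
```

Neither branch is free: the flag only decides which failure mode an outage produces.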

For the TUF scenario, I think we want to explore what it would look like for the root key to eventually expire (with a relatively long lifetime). And with short lifetimes on the timestamp signatures, what does that mean for mirrors and popular registries that want to push as much as possible out to CDNs?

It's bigger than this document's scope, but we also need to explore what key distribution looks like with v2. If we are avoiding TOFU by having clients explicitly trust a root key for the organization, how is that root key first deployed, how does it get rotated, and, if we do in-band rotation, can we trust a chain of rotated root keys that leads back to one or more now-expired keys?
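One way to reason about the last question is a sketch loosely modeled on TUF's root rotation: each new root is signed by both the previous and the new root key, and only the newest root in the chain must be unexpired, so expired intermediate roots are still usable as links. This is an illustration under stated assumptions, not the TUF implementation: real TUF uses thresholds of keys, and HMAC stands in for asymmetric signatures only so the example runs on the standard library. All names (`RootRecord`, `walk_chain`, the key values) are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Assumption: HMAC stands in for a real asymmetric signature so that this sketch
# runs on the standard library alone; a real client would hold only public keys.


def sign(key: bytes, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: dict, sig: str) -> bool:
    return hmac.compare_digest(sign(key, payload), sig)


@dataclass
class RootRecord:
    payload: dict         # {"version": int, "key": hex string, "expires": Unix time}
    sig_by_previous: str  # signature by the root key being replaced
    sig_by_new: str       # signature by the incoming root key


def walk_chain(trusted: dict, chain: list, now: int) -> dict:
    """Follow rotations from the initially trusted root to the newest one.

    Intermediate roots may already have expired; only the final root must be current.
    """
    current = trusted
    for record in chain:
        new = record.payload
        if new["version"] != current["version"] + 1:
            raise ValueError("root versions must increase by exactly one")
        if not verify(bytes.fromhex(current["key"]), new, record.sig_by_previous):
            raise ValueError("rotation not signed by the previous root")
        if not verify(bytes.fromhex(new["key"]), new, record.sig_by_new):
            raise ValueError("rotation not signed by the new root")
        current = new
    if current["expires"] <= now:
        raise ValueError("newest root has expired")
    return current


def make_root(version: int, key: bytes, expires: int) -> dict:
    return {"version": version, "key": key.hex(), "expires": expires}


k1, k2, k3 = b"root-1-secret", b"root-2-secret", b"root-3-secret"
r1, r2, r3 = make_root(1, k1, 100), make_root(2, k2, 200), make_root(3, k3, 1_000)
chain = [RootRecord(r2, sign(k1, r2), sign(k2, r2)), RootRecord(r3, sign(k2, r3), sign(k3, r3))]
print(walk_chain(r1, chain, now=500)["version"])  # 3, even though roots 1 and 2 have expired
```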


However, the user must be able to ensure that the key revocation list is accurate and up to date. If an attacker is able to replay an old revocation list, the user may continue to trust compromised keys. Therefore the distribution of the key revocation list must allow the user to verify authenticity and timeliness.

Also, for security reasons, keys cannot be removed from a key revocation list, so the list will grow larger and larger over time and may eventually have a noticeable bandwidth impact.
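A minimal sketch of the "authenticity and timeliness" requirement, assuming a hypothetical `RevocationList` whose signature has already been verified and a client that persists the highest version it has accepted. The version check catches replay of an older list; the expiry check catches an attacker who simply keeps serving the same list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevocationList:
    """Hypothetical deny list; its signature is assumed to be verified before this point."""
    version: int
    expires: int  # Unix time after which the list must not be relied on
    revoked: frozenset


class StaleListError(Exception):
    pass


def accept_list(new: RevocationList, last_seen_version: int, now: int) -> RevocationList:
    # Rollback/replay: an older list may predate the revocation of a compromised key.
    if new.version < last_seen_version:
        raise StaleListError("older than a list this client has already seen")
    # Freeze: a list that is never refreshed would hide new revocations indefinitely.
    if new.expires <= now:
        raise StaleListError("list has expired; fetch a fresh copy")
    return new


old = RevocationList(version=4, expires=2_000, revoked=frozenset())
new = RevocationList(version=5, expires=2_000, revoked=frozenset({"key-a"}))
current = accept_list(new, last_seen_version=4, now=1_000)
print("key-a" in current.revoked)  # True
```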
Contributor commented:

To mitigate the risk of an ever-growing revocation list, the two techniques can be combined: give revocable keys a fairly long expiration time, and remove a revoked key from the list once it has expired.
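The pruning step could look like the sketch below, assuming hypothetical `RevokedEntry` records that carry the revoked key's own expiry. It is only safe when every client strictly enforces key expiration; the optional `grace` parameter is an assumption added to tolerate clock skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevokedEntry:
    key_id: str
    key_expires: int  # Unix time at which the revoked key expires anyway


def prune(entries: list, now: int, grace: int = 0) -> list:
    """Drop entries for keys that every client already rejects as expired.

    Only safe if clients strictly enforce key expiration; `grace` keeps entries a
    little longer to tolerate clock skew.
    """
    return [e for e in entries if e.key_expires + grace > now]


entries = [RevokedEntry("key-a", key_expires=1_000), RevokedEntry("key-b", key_expires=5_000)]
print([e.key_id for e in prune(entries, now=2_000)])  # ['key-b']
```

With this rule the list's size is bounded by the number of keys revoked within one key lifetime, rather than growing forever.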

@sudo-bmitch (Contributor)

I could also see a use case for a middle ground, where there are several types of keys used by an organization: one to identify "this content was produced by this organization" and another that claims "we believe this content is currently secure". That would allow Docker to use a longer-lifetime key on something like a 4-year-old Ubuntu image that we know has vulnerabilities but still want to show that Docker signed. And since that image isn't changing, it could go through a more controlled offline signing process with a longer-lived key. Meanwhile, an Ubuntu image that was just updated yesterday, and that we think is secure, could be signed with a key that expires in a week.
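A sketch of how a client policy might combine the two key types, assuming hypothetical `Signature` records whose cryptographic checks have already passed; the key IDs `org-release` and `org-security` and the policy strings are illustrative only. A consumer that only cares about provenance sets `require_current=False`; one that wants "currently believed secure" content sets it to `True`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    key_id: str
    key_expires: int  # expiry (Unix time) of the key that produced this signature


def evaluate(provenance: Optional[Signature], security: Optional[Signature],
             now: int, require_current: bool) -> str:
    """Hypothetical policy: a long-lived "produced by this org" signature plus an
    optional short-lived "believed currently secure" signature."""
    if provenance is None or provenance.key_expires <= now:
        return "reject: no valid provenance signature"
    if security is not None and security.key_expires > now:
        return "accept: produced by the org and currently believed secure"
    if require_current:
        return "reject: no current security signature"
    return "accept: produced by the org, not vouched for as current"


YEAR, WEEK = 365 * 86_400, 7 * 86_400
old_image = evaluate(Signature("org-release", 4 * YEAR), None, now=YEAR, require_current=False)
new_image = evaluate(Signature("org-release", 4 * YEAR), Signature("org-security", WEEK),
                     now=WEEK // 2, require_current=True)
print(old_image)
print(new_image)
```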

@dlorenc commented Mar 3, 2021

I haven't been looking at the problem as broadly as all of you - on purpose - but I'm strongly in favor of expiration approaches rather than revocation ones.

Transparency logs + timestamps are better than revocation IMO - and can also be made compatible with most air-gapped scenarios.

So, overall strong +1 to this entire document @mnm678 :)

Signed-off-by: Marina Moore <[email protected]>
Add an initial list of pros and cons for each technique
and add a few clarifications

Signed-off-by: Marina Moore <[email protected]>
@dlorenc commented Mar 21, 2021

This LGTM!

@SteveLasker (Contributor)

I started writing up some thoughts and concerns about the scaling problem of maintaining all the public and private content we expect to see today and in the coming years.
Before getting into the tradeoffs of short-lived vs. long-lived keys, how we might continually update those, or how we might manage revocations or allow lists, I figured I'd outline a sense of scale.
Please see: Scaling Public & Private Registries, and perhaps we can have a discussion there.

My hope is we can have a baseline of scale, and then the various approaches may start to become more obvious.

@dlorenc commented Mar 23, 2021

To be honest those are still tiny numbers compared to other public-key crypto key management systems in the wild today. Could you be more specific on what you think will be a scaling issue here?

@SteveLasker (Contributor)

@mnm678, I can see this doc being used in one of two ways:

  1. A good generalization of the key revocation scenarios, as general background reference we consider through our designs.
  2. A discussion of how TUF solves some of the problems (not trying to judge quality, just noting that it's an opinionated approach)

If you can strip this down to general background on the scenarios, we can merge it into the repo as a good overview for the reader, regardless of implementation. As we develop the key management specs of Notary v2, we can reference this, and update it to reflect how Notary v2 solves these problems.

Or, we can transfer this as a discussion, capturing an opinionated view on TUF.

The difference is that (1) lets us merge content even though we haven't settled on a direction, whereas (2) is opinionated content about a specific solution we haven't committed to yet.

@sudo-bmitch (Contributor)

I think TUF offers a different take on the challenge, but I agree that this document doesn't need to specifically call it out as requiring TUF. I'd keep the last section but rephrase it so that it doesn't name TUF and instead describes some of the qualities of an intermediate solution: one that compresses the short-expiration certificates into a single signature on a collection of artifacts, separate from the individual signatures within that collection.

@mnm678 (Contributor, Author) commented Apr 2, 2021

Thanks @SteveLasker and @sudo-bmitch. I updated the description of TUF's approach to key revocation to more generally describe combining explicit and implicit revocation.

@SteveLasker (Contributor)

Thanks @mnm678,
The description has references to TUF's implementation, so I think you're asking for this to move to a discussion for how TUF solves these problems, as opposed to the general background, not referencing specific implementations.
We can transfer "issues" to "discussions", but we can't transfer PRs to "discussions".
Can you create a discussion titled something like "Describing Key Revocation with TUF", link this PR and close this one?

@mnm678 (Contributor, Author) commented Apr 3, 2021

I guess I wasn't clear. I re-worded the final option to explain how implicit and explicit key revocation can be used in general so that it can be discussed in the context of other approaches to key revocation. The technique of combining implicit and explicit key revocation was introduced in this paper by @JustinCappos and others and refined in this paper by @trishankatdatadog and others. It has only been used widely in TUF and related projects, so I think a mention of TUF is necessary to understand the technique and know where to look for more information.

More fundamentally, when talking about key revocation the specific implementation is important, because like other security systems, it is only as secure as its weakest link. That's the benefit of using existing, well tested security mechanisms instead of attempting to build them from the ground up.

@trishankatdatadog

> The description has references to TUF's implementation, so I think you're asking for this to move to a discussion for how TUF solves these problems, as opposed to the general background, not referencing specific implementations.

Steve, I'm hard-pressed to see how you could discuss a comprehensive background without discussing specific implementations. That's like citing papers without naming their authors.

> Can you create a discussion titled something like "Describing Key Revocation with TUF", link this PR and close this one?

Is there a good reason to close this PR and open a discussion instead, other than citing specific implementations?

@sudo-bmitch (Contributor)

For the last option, while TUF may be the only solution we know of that takes this approach, we've done such a good job keeping the other sections abstract and not listing the various implementations of each technique that naming TUF in the last section comes across as a sales pitch.

Here's my own attempt to reword this:

Combining explicit and implicit revocation

By using a hierarchical combination of keys, a trusted root key can delegate signing to various keys that expire. Additionally, artifacts may be signed by more than one key, allowing automated tooling to provide short-lived signatures that verify the signer and artifact have not been revoked. Clients then verify that the necessary collection of signatures is present on the artifact.

This method allows signers to have relatively long-lived keys, simplifying their workflow and avoiding the need to re-sign the artifacts themselves, while enabling timely revocation of the signer key or of a single artifact signature.

For efficiency, a meta-artifact can be created and maintained that contains references to a collection of currently signed artifacts. The short-lived signature can then be created for this single meta-artifact, rather than for every artifact individually.

Pros:

  • Keys may be quickly revoked
  • Individual artifact signatures may be quickly revoked
  • Signers do not need to frequently re-sign all artifacts
  • Verifiers only need to trust the root key; all delegated keys can be verified against it

Cons:

  • Requires maintenance of an automated system to refresh short-lived signatures
  • A root key compromise requires updating all signers, clients, and signatures on the artifacts
  • Signatures in disconnected environments and on artifact copies may quickly become stale
  • Updating short-lived signatures on a large number of artifacts may encounter scaling challenges and loses some of the caching efficiencies of content-addressable storage in registries
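A sketch of the verification side of this combined approach, assuming hypothetical types and that each signature's cryptographic check has already been done; `alice` and the digest `sha256:abc` are placeholders. Revoking the signer key means the root stops delegating to it, and revoking a single artifact signature means the automated tooling stops refreshing its short-lived signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """Root-signed statement that `key_id` may sign artifacts until `expires`."""
    key_id: str
    expires: int


@dataclass(frozen=True)
class ArtifactSignature:
    digest: str
    key_id: str


@dataclass(frozen=True)
class FreshnessSignature:
    """Short-lived signature from automated tooling vouching for (signer, artifact)."""
    digest: str
    signer_key_id: str
    expires: int


def verify_artifact(digest: str, sig: ArtifactSignature, fresh: FreshnessSignature,
                    delegations: dict, now: int) -> bool:
    # 1. The signer key must be delegated by the root and not yet expired.
    delegation = delegations.get(sig.key_id)
    if delegation is None or delegation.expires <= now or sig.digest != digest:
        return False
    # 2. A current short-lived signature must cover this signer and artifact.
    #    Revoking either one just means the tooling stops refreshing it.
    return (fresh.digest == digest and fresh.signer_key_id == sig.key_id
            and fresh.expires > now)


delegations = {"alice": Delegation("alice", expires=10_000)}
sig = ArtifactSignature("sha256:abc", "alice")
fresh = FreshnessSignature("sha256:abc", "alice", expires=1_100)
print(verify_artifact("sha256:abc", sig, fresh, delegations, now=1_000))  # True
print(verify_artifact("sha256:abc", sig, fresh, delegations, now=1_200))  # False: not refreshed
```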

@mnm678 (Contributor, Author) commented Apr 27, 2021

Thanks @sudo-bmitch. I updated the PR.


## Key Expiration

Adding an expiration time to every key allows keys to automatically be revoked after a certain period of time. The expiration time is usually included with the key so that it is easy for users to find. This technique does not require any action from the key holder, and ensures that users will have to refresh their trusted keys before those keys expire.
Contributor commented:

It might help to clarify that the key expiry (metadata) needs to be signed by an issuing key that the client trusts.
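To illustrate why the expiry must be covered by the issuer's signature, here is a minimal sketch in which the expiry lives inside the signed metadata, so editing it invalidates the signature. HMAC with an issuer secret stands in for a real signature by the issuing key, and the metadata layout and function names are assumptions for illustration.

```python
import hashlib
import hmac
import json

# Assumption: HMAC with the issuer's secret stands in for a signature by the
# issuing key; the metadata layout is illustrative only.


def _mac(secret: bytes, body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def issue_key_metadata(issuer_secret: bytes, key_id: str, expires: int) -> dict:
    body = {"key_id": key_id, "expires": expires}
    return {"signed": body, "sig": _mac(issuer_secret, body)}


def key_is_valid(issuer_secret: bytes, meta: dict, now: int) -> bool:
    if not hmac.compare_digest(_mac(issuer_secret, meta["signed"]), meta["sig"]):
        return False  # expiry was altered, or the trusted issuer never issued it
    return now < meta["signed"]["expires"]


issuer = b"issuer-secret"
meta = issue_key_metadata(issuer, "key-a", expires=1_000)
print(key_is_valid(issuer, meta, now=500))  # True
meta["signed"]["expires"] = 10**9           # an attacker tries to extend the lifetime
print(key_is_valid(issuer, meta, now=500))  # False
```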


Cons:
* Keys can't be revoked before expiration
* Artifacts must be re-signed after expiration
Contributor commented:

Timestamping is an option that allows use of signed artifacts after the key expires.
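A sketch of the timestamping rule, assuming the timestamp comes from an already verified trusted-timestamp token (for example an RFC 3161 Time-Stamp Protocol response) rather than from the signer's own claim. The optional `revoked_at` field is a hypothetical addition showing how a known compromise time could still cut off later signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SigningKey:
    key_id: str
    not_before: int                   # start of the key's validity window (Unix time)
    expires: int                      # end of the validity window
    revoked_at: Optional[int] = None  # hypothetical: time of a known compromise


def signature_acceptable(key: SigningKey, timestamp: int) -> bool:
    """`timestamp` must come from an already verified trusted-timestamp token
    (for example an RFC 3161 response), never from the signer's own claim."""
    if not key.not_before <= timestamp < key.expires:
        return False
    # Signatures timestamped before a known compromise can stay valid.
    return key.revoked_at is None or timestamp < key.revoked_at


key = SigningKey("key-a", not_before=0, expires=1_000)
print(signature_acceptable(key, timestamp=500))    # True, even if checked long after expiry
print(signature_acceptable(key, timestamp=1_500))  # False: made after the key expired
```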



## Key revocation lists (Deny lists)
Contributor commented:

This section and the next are similar to "Allowlist and Denylist in Repository"; the challenge is synchronizing deny lists in multi-registry scenarios. Also, this doc doesn't cover artifact-level revocation, nor does it specify where allow/deny lists will be stored.

Contributor (Author) commented:

I added a note about synchronization. I purposefully left artifact-level revocation out of this document for now, but we can combine those discussions in a later draft.
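One possible synchronization rule for the multi-registry case, as a sketch: the client collects the deny-list copies served by several registries, keeps the newest version, and treats two different contents under the same version number as evidence of a forked view. `DenyListCopy`, `reconcile`, and the registry names are hypothetical, and signature checks on each copy are assumed to have passed already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyListCopy:
    registry: str  # hypothetical registry name serving this copy
    version: int
    revoked: frozenset


def reconcile(copies: list) -> frozenset:
    """Return the newest deny list seen across registries (signatures assumed verified).

    Two registries serving different contents under the same version number
    suggests one of them is presenting a forked view, so that is treated as an error.
    """
    if not copies:
        raise ValueError("no deny list available")
    by_version = {}
    for entry in copies:
        seen = by_version.setdefault(entry.version, entry.revoked)
        if seen != entry.revoked:
            raise ValueError(f"conflicting deny lists for version {entry.version}")
    return by_version[max(by_version)]


copies = [
    DenyListCopy("registry-a.example", 3, frozenset({"key-a"})),
    DenyListCopy("registry-b.example", 4, frozenset({"key-a", "key-b"})),
]
print(sorted(reconcile(copies)))  # ['key-a', 'key-b']
```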

threatmodel.md (outdated)
@@ -10,3 +10,10 @@ It is assumed that an attacker may perform one or more the following actions:

While it is not always possible to protect against all scenarios, the system should to the extent possible mitigate and/or reduce the damage caused by a successful attack, detect the occurrence of an attack and notify appropriate parties, yet remain usable for parties operating the system. Furthermore, the system should recover from successful attacks in a way that presents low operational overhead and risk to users.

Attacker Goals:
Contributor commented:

Should these be reworded to use less generic terminology and be targeted at artifact registries and consumers? Also, is this intended to be an initial version? I'd expect the final threat model and analysis to be more detailed.

Contributor (Author) commented:

It looks like these are duplicated in #35, and they are a bit out of scope for this PR, so I'll remove them here and try to address your comment there.

@mnm678 (Contributor, Author) commented May 27, 2021

Thanks for the review @gokarnm! I updated the document and responded inline to a couple of comments.

@mnm678 (Contributor, Author) commented Jun 6, 2021

@sudo-bmitch @gokarnm I updated the intro as discussed in the meeting. Can I get a review/approval to merge?

@sudo-bmitch (Contributor)

Just noticed the DCO validation error. Not sure if that will block the ability to merge.

Still LGTM and hope we can merge and iterate forward. Thanks for driving this @mnm678 !

@gokarnm (Contributor) commented Jun 6, 2021

@mnm678 LGTM!

@SteveLasker (Contributor) left a comment

Could we add a con to the meta-artifact section addressing the multi-registry challenge of moving individual artifacts?


However, the user must be able to ensure that the key revocation list is accurate and up to date. If an attacker is able to replay an old revocation list, or show different versions to different registries, the user may continue to trust compromised keys. Therefore the distribution of the key revocation list must allow the user to verify authenticity and timeliness.

Also, for security reasons, keys cannot be removed from a key revocation list, so the list will grow larger and larger over time and may eventually have a noticeable bandwidth impact, although this can be mitigated by combining key revocation lists with keys that expire.
Contributor commented:

Is this true, that once in a list, it's never removed? Or, can keys that are known to have expired be removed at a later date? Perhaps 50% longer than the life of the key or something. It does seem like a non-scalable solution that needs mitigation.

Contributor (Author) commented:

This could be combined with strongly enforced key expiration to allow the keys to eventually be deleted.


This method allows signers to have relatively long-lived keys, simplifying their workflow and avoiding the need to re-sign the artifacts themselves, while enabling timely revocation of the signing key or of a single artifact signature.

For efficiency, a meta-artifact can be created and maintained that contains references to a collection of currently signed artifacts. The short-lived signature can then be created for this single meta-artifact, rather than for every artifact individually.
Contributor commented:

How would this interact with individual artifacts moving within and across registries?

Contributor (Author) commented:

It depends on the implementation, but the meta-artifact would have to be updated as the collection changes.
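A sketch of that update cycle, assuming a hypothetical `MetaArtifact` that lists content digests rather than registry or repository paths. Any change to the collection produces a new version that the automated signer must re-sign, and because lookups are by digest, the same bytes match no matter which registry they were pulled from.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaArtifact:
    """Hypothetical collection manifest; one short-lived signature covers every entry."""
    version: int
    expires: int        # Unix time; the automated signer refreshes this regularly
    digests: frozenset  # content digests, deliberately not registry/repository paths


def add_artifact(meta: MetaArtifact, digest: str, new_expires: int) -> MetaArtifact:
    # Any change to the collection yields a new version that must be re-signed.
    return MetaArtifact(meta.version + 1, new_expires, meta.digests | {digest})


def covered(meta: MetaArtifact, artifact_bytes: bytes, now: int) -> bool:
    # Content addressing means the same bytes match wherever they were pulled from.
    digest = "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()
    return meta.expires > now and digest in meta.digests


blob = b"example artifact contents"
meta = MetaArtifact(version=1, expires=100, digests=frozenset())
meta = add_artifact(meta, "sha256:" + hashlib.sha256(blob).hexdigest(), new_expires=200)
print(meta.version, covered(meta, blob, now=150))  # 2 True
```

The multi-registry concern raised in this thread still applies: copying a single artifact elsewhere only helps if the current meta-artifact and its signature are reachable from the destination too.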

Cons:
* Requires maintenance of an automated system to refresh short-lived signatures
* A root key compromise requires updating all signers, clients, and signatures on the artifacts
* Updating short-lived signatures on a large number of artifacts may encounter scaling challenges and loses some of the caching efficiencies of content-addressable storage in registries
Contributor commented:

Can we add a con here noting that the meta-artifact's collection of keys somehow needs to be parseable for individual artifact movement within and across registries?

Contributor (Author) commented:

The meta-artifact is an optional efficiency feature, so I added the con to the paragraph describing the feature.

@SteveLasker (Contributor)

@mnm678, can you also solve the DCO issue?

Update the combined key revocation option to remove references
to TUF and more generically describe the way it allows for both
explicit and implicit key revocation.

Thanks to @sudo-bmitch for wording suggestions.

Signed-off-by: Marina Moore <[email protected]>
Signed-off-by: Marina Moore <[email protected]>
@dlorenc commented Jun 25, 2021

This looks great to me, we can definitely keep iterating after merge.

@hallyn commented Jun 25, 2021

+1 on merge

@SteveLasker (Contributor) left a comment

LGTM

@SteveLasker merged commit ab8fd3a into notaryproject:main on Jun 28, 2021
8 participants