-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support external OCSP cache #546
Comments
Good idea. Any specific recommendations for a first implementation? This was partly planned for via the Updateable interface, but with no configuration mechanism. It's currently hard-coded here: amppackager/packager/certcache/certcache.go Line 144 in 6a1c7af
|
I guess I don't know what is shared between multiple processes. I examined the cache file on one of our pods, it's only a few hundred bytes. I also don't understand the relationship to the lock file. It seems like only one process can do things with the cache at a time? I guess I'm just not sure what the cache is doing enough to know how things would interact with a tens to hundreds of service pods. |
For the OCSP, what is shared between multiple processes is an OCSP response from the CA (a signed 7-day validity period for the cert). Can be examined with I used https://gist.github.com/sleevi/5efe9ef98961ecfb4da8 as my design guide. It can be summarized as "don't be a jerk and hammer the server when things are bad". It is updated every ~3.5 days, per item 3. The disk cache is to solve item 1. We don't want to fetch a new OCSP response whenever an individual AMP Packager pod restarts (which could happen accidentally often if there is some /healthz issue), nor do we want to fetch it 1000 times if 1000 new pods are added to the config. I think any persistence mechanism would be OK here -- even an in-memory mechanism as long as it's The lock file is to solve item 6: request coalescing. When it's time to update the OCSP cache entry, we don't want all 1000 pods to issue a request at the same time. It looks like there are some client libraries that add transactions to memcached, but Redis has built-in transaction support? Anyway, it doesn't even need to be fully atomic; something heuristic like:
Close enough to transactional because the time between steps 1 & 2 is short (no external dependencies). Oof, sorry for the length of this comment. |
How about this groupcache fork? If it's tiny, in memory, only needs to be fetched on expire, this might work with the right cache key. It also looks like it may be possible to do the updates via Kubernetes secrets volumes. I'm trying to figure out if you can write to a secretes volume and it will get updated in the API server. |
That groupcache fork works for me. The lack of a need for a separate server is clever. IIUC, it doesn't provide a mechanism for peer discovery: the list of peers would need to be provided via flag or config file. Is that right? I'm not too familiar with Kubernetes, but I'll note that the cached OCSP response is not a secret; it is included in the HTTP response from certcache. So maybe a more generic volume could work? |
Yea, a flag for the groupcache peer list would probably be the easiest. But looking at how this works, we would need a dynamic updater. Digging around, I found autocache, which is basically groupcache + peer discovery. This way in Kubernetes you would point the flag like this: Using the cache like this is a bit fragile, because if they all restart, the cache key is lost. But that doesn't seem like the end of world for this use case. The reason I picked the secrets volume is because it's backed by a tmpfs and managed by Kubernetes. When you push a new secret to the Kubernetes API, it pushes out the file to all pods. Making it a good enough updater. It's also persistent in the API server. Most of the other generic volume modes can not be shared between pods. Only one pod can access them at a time. Having put some thought into the Kubernetes option, I think this is probably not what we want here anyway. It means having to use the Kubernetes API directly in the service, which means granting access to the API. This is not really something we want to do from a security perspective. Sadly, Kubernetes doesn't provide direct access to etcd for services to use. As in a generic key-value/lock system. At least not in the same way Chubby is used. :-( |
Yeah, autocache seems like it'd fit this case. Agreed that losing the cache key in that rare event is not the end of the world; request coalescing should ensure that'll only be ~1 fetch afterwards. We could also relax a constraint if it makes things simpler: We don't need every replica to have write access to the cache. We could designate (via flag) one pod to be the "OCSP renewer" that fetches & writes ( This would require more extensive code change to certcache.go to make it work, but it might be worth it. Using that model, perhaps emptyDir would do the job, since it persists across crashes. (I'm assuming non-crash restarts of the pod are rare.) |
Another random idea: use something like gokv to allow a configurable KV store (including those provided by AWS / Azure / Google Cloud). Is this too heavyweight for this one value? (In the future, two values. I'd like to remove the persistent volume dependency for amppkg's ACME client too.) |
Sorry if I'm late to the party. If we're considering removing the persistent volume dependency for ACME client as well, please remember the following requirements:
|
@banaag Definitely not removing, just adding another method to manage. |
For large Kubernetes deployments, it's not recommended to use NFS mounts.
Having this cache be handled externally by a service would make supporting large environments easier.
Ideas:
The text was updated successfully, but these errors were encountered: