When and How CC Updates the Runtime
CC updates the runtime (Diego or Eirini) through multiple paths, some of which are hidden and circuitous. As a result, it can be difficult to reason about whether an API interaction will update the runtime.
This document aims to be descriptive of the current world rather than prescriptive of a future world. We will change and improve things, so this document may drift out of alignment with reality. Also, this stuff is complicated, so this document will probably get some things wrong.
The two main methods for updating the runtime are the ProcessObserver class and the Diego::Sync job. The ProcessObserver triggers when certain fields are updated on processes and immediately updates the runtime. The Diego::Sync job runs periodically and will update the runtime if certain fields have changed on the process.
In general, fields on the Cloud Controller API are desired state and do not necessarily reflect the actual state of the runtime. By default, changing a field on the API will not automatically update the runtime unless you restart your application (for example: security groups, environment variables, disk). That said, there are numerous exceptions, which we will explore below.
To understand whether a change to a process will automatically propagate to the runtime, one must first understand process versioning. When a process's state, memory, health_check_type, health_check_http_endpoint, or ports fields are updated, the process's version is set to a new random guid. Changing a process's version then results in the process's LRP changing, as will be demonstrated in the coming sections.
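The versioning rule above can be sketched in plain Ruby. This is a minimal illustration, not Cloud Controller's actual model code: the SketchProcess class and its update method are hypothetical, and only the list of version-triggering fields comes from the text.

```ruby
require 'securerandom'

# Hypothetical stand-in for a CC process; not the real model class.
class SketchProcess
  # Fields that, per the text above, cause a new version when they change.
  VERSIONED_FIELDS = %i[state memory health_check_type health_check_http_endpoint ports].freeze

  attr_reader :version, :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
    @version = SecureRandom.uuid
  end

  # Apply the changes, and regenerate the version only if one of the
  # versioned fields actually received a different value.
  def update(changes)
    bump = changes.any? do |field, value|
      VERSIONED_FIELDS.include?(field) && @attributes[field] != value
    end
    @attributes.merge!(changes)
    @version = SecureRandom.uuid if bump
    self
  end
end

process = SketchProcess.new(state: 'STARTED', memory: 256, instances: 1)
original_version = process.version

process.update(instances: 3)                 # instances is not a versioned field
puts process.version == original_version     # version is unchanged

process.update(memory: 512)                  # memory is a versioned field
puts process.version == original_version     # version has been regenerated
```

In this sketch, scaling alone leaves the version untouched, while a memory change yields a fresh guid.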
Server-side manifests are special-cased to skip updating process versions if the memory field is provided. This means that as long as the process memory is set in the manifest, the process version will not change, regardless of what other fields are changed on the process.
Open Question: Why does this not apply to other manifest fields like health_check_type and health_check_http_endpoint?
ProcessObserver triggers when changes to processes are committed to the database. It is invoked from after_commit hooks registered in the model's after_save and after_destroy hooks.
Note: Because most Cloud Controller unit tests depend on database transactions, these after_commit hooks don't trigger in tests unless you switch the test's database isolation to :truncation.
If the process's state, diego, enable_ssh, or ports fields are updated, the ProcessObserver will start or update the process. Note that the name of the Runner.start method is a bit confusing because, as we will see, it also updates running processes.
Focusing on the Diego case, Diego::Runner#start forwards directly to Diego::Messenger#send_desire_request, which in turn forwards to Diego::DesireAppHandler.create_or_update_app, where it is finally revealed that we will also be updating an existing process, not just creating new processes.
Here is where the process version comes into play. The Diego::DesireAppHandler checks to see if the process has a matching LRP. To do this, it uses Diego::BbsAppsClient#get_app, which in turn uses Diego::ProcessGuid. Here we see that the LRP is identified using a combination of the process's guid and its version. This means that if the process's version has changed since Diego was last updated, Diego::BbsAppsClient will not find a matching LRP for the process.
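To make the lookup concrete, here is a small sketch of how an identifier built from guid and version behaves. The SketchProcessGuid module and the hyphen separator are assumptions for illustration; the real Diego::ProcessGuid may use different names and a different format.

```ruby
# Hypothetical helper modeled on the description above. The "-" separator
# is an assumption, not taken from the source.
module SketchProcessGuid
  def self.from(guid, version)
    "#{guid}-#{version}"
  end
end

# Same process guid, but the version changed between the two lookups.
old_lrp_id = SketchProcessGuid.from('process-guid', 'version-1')
new_lrp_id = SketchProcessGuid.from('process-guid', 'version-2')

# The identifiers differ, so a lookup with the new identifier cannot
# find the LRP that was created under the old version.
puts old_lrp_id == new_lrp_id
```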
If a matching LRP is found (i.e. the process's version hasn't changed), then Diego::BbsAppsClient#update_app is called. Note that only instances, updated_at, and routes can be updated on an LRP. Changing other fields requires creating a new LRP.
If a matching LRP is NOT found (i.e. the process is new or the version has changed), then Diego::BbsAppsClient#desire_app is called to create a new LRP for the process.
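The create-or-update branch described above can be sketched against an in-memory stand-in for the BBS. Everything here is hypothetical (the FakeBbs class, the hash-based process, and the identifier format); only the branching and the list of updatable fields follow the text.

```ruby
# In-memory stand-in for the BBS; it records which calls were made.
class FakeBbs
  attr_reader :lrps, :calls

  def initialize
    @lrps = {}
    @calls = []
  end

  def get_app(lrp_id)
    @lrps[lrp_id]
  end

  def desire_app(lrp_id, process)
    @calls << :desire_app
    @lrps[lrp_id] = process.dup
  end

  # Only the fields the text lists as updatable are copied onto the LRP.
  def update_app(lrp_id, process)
    @calls << :update_app
    %i[instances updated_at routes].each { |field| @lrps[lrp_id][field] = process[field] }
  end
end

# Update the LRP if one matches guid + version, otherwise create one.
def create_or_update_app(bbs, process)
  lrp_id = "#{process[:guid]}-#{process[:version]}" # assumed id format
  if bbs.get_app(lrp_id)
    bbs.update_app(lrp_id, process)
  else
    bbs.desire_app(lrp_id, process)
  end
end

bbs = FakeBbs.new
process = { guid: 'p1', version: 'v1', instances: 1, routes: [], updated_at: 1, memory: 256 }

create_or_update_app(bbs, process)                                   # no LRP yet
create_or_update_app(bbs, process.merge(instances: 2))               # same version
create_or_update_app(bbs, process.merge(version: 'v2', memory: 512)) # new version

puts bbs.calls.inspect # prints [:desire_app, :update_app, :desire_app]
```

Note that the LRP created under the old version is still present in the fake store after the last call; the sketch does not model its cleanup.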
Open Question: What happens to the old LRP? Do we wait for the sync job to sweep it up?
If none of the above fields are updated and the instances field is updated, then the process will be scaled via Diego::Runner#scale.
If the process's package is currently pending (calculating this is a whole can of worms), then nothing happens. In this case the sync job will be responsible for eventually scaling the process.
Open Question: In the v3 world, does this flow make sense? Theoretically uploading a new package should be independent from scaling the process.
If the process's package is NOT pending, it will call Diego::Messenger#send_desire_request and re-join the flow above.
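The ProcessObserver's decision, as described in this section, can be summarized as a small function. This is a sketch under stated assumptions: the function name, its arguments, and the returned symbols are all hypothetical, and only the field list and the order of the checks come from the text.

```ruby
# Fields that, per the text, make the observer start or update the process.
START_OR_UPDATE_FIELDS = %i[state diego enable_ssh ports].freeze

# changed_fields:  the fields changed by the committed save (hypothetical input)
# package_pending: whether the process's package is currently pending
def observer_action(changed_fields, package_pending:)
  # Any of the start/update fields takes precedence over scaling.
  return :start_or_update if (changed_fields & START_OR_UPDATE_FIELDS).any?
  # Without an instances change there is nothing for the observer to do.
  return :nothing unless changed_fields.include?(:instances)

  # Scaling path: a pending package defers the work to the sync job.
  package_pending ? :leave_to_sync_job : :send_desire_request
end

puts observer_action(%i[state instances], package_pending: false)
puts observer_action(%i[instances], package_pending: true)
puts observer_action(%i[instances], package_pending: false)
puts observer_action(%i[memory], package_pending: false)
```

The last call returns :nothing because, in this sketch, a memory change by itself is not one of the observer's trigger fields, even though it does change the process version.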
The Diego sync job runs every 30 seconds (by default) on the Cloud Controller Clock. When the sync job runs, it checks to make sure that CC's processes match the LRPs in Diego using Diego::ProcessesSync.
Diego::ProcessesSync loops over all the processes in the ccdb and checks if there is a corresponding LRP in Diego. To do this comparison, it again uses Diego::ProcessGuid. Remember that this class identifies processes using a combination of the process's guid and its version.
At this point, the sync job behaves very similarly to the ProcessObserver. If a matching LRP is found and the process's updated_at differs from the LRP's, then it calls Diego::BbsAppsClient#update_app. If a matching LRP is NOT found, then it calls Diego::BbsAppsClient#desire_app.
Finally, the sync job deletes all remaining LRPs that haven't been matched to a process.
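The three outcomes of the sync job (update, desire, delete) can be sketched as a pure reconciliation function. This is an illustration only: sync_plan, the hash shapes, and the identifier format are hypothetical, and the real Diego::ProcessesSync talks to the BBS rather than returning a plan.

```ruby
# processes: array of hashes with :guid, :version, :updated_at (CC's desired state)
# lrps:      hash of lrp_id => { updated_at: ... } (what Diego currently has)
def sync_plan(processes, lrps)
  unmatched = lrps.keys # LRPs not yet matched to any process
  plan = { update: [], desire: [], delete: [] }

  processes.each do |process|
    lrp_id = "#{process[:guid]}-#{process[:version]}" # assumed id format
    lrp = lrps[lrp_id]
    if lrp.nil?
      plan[:desire] << lrp_id
    else
      unmatched.delete(lrp_id)
      plan[:update] << lrp_id if lrp[:updated_at] != process[:updated_at]
    end
  end

  # Whatever was never matched to a process gets deleted.
  plan[:delete] = unmatched
  plan
end

processes = [
  { guid: 'p1', version: 'v2', updated_at: 10 }, # version changed since the last sync
  { guid: 'p2', version: 'v1', updated_at: 20 }, # LRP exists but is stale
  { guid: 'p3', version: 'v1', updated_at: 30 }  # LRP exists and is current
]
lrps = {
  'p1-v1' => { updated_at: 5 },
  'p2-v1' => { updated_at: 15 },
  'p3-v1' => { updated_at: 30 }
}

plan = sync_plan(processes, lrps)
puts plan.inspect
```

In this sketch, the LRP left over from p1's previous version ends up in the delete list because no process matches it any longer.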
The AppRestart action is a special way to reach into the runtime without going through the ProcessObserver or the Diego::Sync job.
The ProcessObserver will not fail if there are difficulties communicating with Diego (since the sync job will be around to clean up later). This means that stopping an app will not guarantee that the app is actually stopped in the runtime. AppRestart aims to make sure that the app's processes are actually stopped before starting them again.
To do this, the AppRestart action calls ProcessRestart.restart. This in turn calls Diego::Runner#stop, Diego::Messenger#send_stop_app_request, and then Diego::BbsAppsClient#stop_app, which deletes the LRP. It then goes through a similar call stack to create a new LRP for the process.
One interesting eccentricity of bypassing the ProcessObserver is that the updated_at timestamps won't match between the process and the LRP, so the clock will come through and do an unnecessary, albeit harmless, update on the LRP.
Coming Soon
Coming Soon