When and How CC Updates the Runtime

Introduction

Cloud Controller (CC) updates the runtime (Diego or Eirini) through multiple paths, and those paths can be hidden and circuitous. As a result, it can be difficult to reason about whether a given API interaction will update the runtime.

This document aims to be descriptive of the current world rather than prescriptive of a future world. We will change and improve things, so this document may drift out of alignment with reality. Also, this stuff is complicated, so this document will probably get some things wrong.

Overview

The two main mechanisms for updating the runtime are the ProcessObserver class and the Diego::Sync job. The ProcessObserver triggers when certain fields on a process are updated and immediately updates the runtime. The Diego::Sync job runs periodically and updates the runtime if certain fields on the process have changed.

Details

Base Case

In general, fields on the Cloud Controller API are desired state and do not necessarily reflect the actual state of the runtime. By default, changing a field on the API will not automatically update the runtime unless you restart your application (for example: security groups, environment variables, disk). That said, there are numerous exceptions, which we will explore below.

Process Version

To understand whether a change to a process will automatically propagate to the runtime, one must first understand process versioning. When a process's state, memory, health_check_type, health_check_http_endpoint, or ports field is updated, the process's version is set to a new random guid. Changing a process's version in turn causes the process's LRP to change, as the following sections show.
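The versioning rule can be captured in a few lines. The Python sketch below is a simplified, hypothetical model, not CC's Ruby code: the `Process` class, its `update` method, and the `VERSION_FIELDS` set are illustrative names, and it assumes the version changes only when one of the listed fields actually takes a new value.

```python
import uuid
from dataclasses import dataclass, field

# Fields whose change produces a new process version (from the list above).
VERSION_FIELDS = {"state", "memory", "health_check_type",
                  "health_check_http_endpoint", "ports"}


@dataclass
class Process:
    """Hypothetical stand-in for a CC process row."""
    guid: str
    attrs: dict
    version: str = field(default_factory=lambda: str(uuid.uuid4()))

    def update(self, **changes) -> set:
        """Apply changes and return the names of fields whose values changed."""
        changed = {k for k, v in changes.items() if self.attrs.get(k) != v}
        self.attrs.update(changes)
        if changed & VERSION_FIELDS:
            # A versioned field changed: assign a new random guid as the version.
            self.version = str(uuid.uuid4())
        return changed


p = Process("proc-1", {"state": "STARTED", "memory": 256, "instances": 1})
v0 = p.version
p.update(instances=3)   # not a versioned field: version is unchanged
v1 = p.version
p.update(memory=512)    # versioned field: version becomes a new random guid
v2 = p.version
```

Scaling instances leaves the version alone, while a memory change produces a new one; that distinction drives everything in the sections that follow.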

Manifest Bypassing Version

Server-side manifests are special-cased to skip updating the process version if the memory field is provided. As long as the process memory is set in the manifest, the process version will not change, regardless of which other fields on the process are changed.
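A minimal sketch of the bypass, assuming a process can be modeled as a plain dict. `apply_changes` and its `from_manifest` flag are hypothetical names used only to illustrate the rule described above; they are not CC's manifest code.

```python
import uuid

VERSION_FIELDS = {"state", "memory", "health_check_type",
                  "health_check_http_endpoint", "ports"}


def apply_changes(process: dict, changes: dict, from_manifest: bool = False) -> None:
    """Apply changes to a process dict, bumping its version unless bypassed."""
    changed = {k for k, v in changes.items() if process.get(k) != v}
    process.update(changes)
    # Modeled special case: a server-side manifest that includes memory
    # skips the version bump, whatever else it changes.
    bypass = from_manifest and "memory" in changes
    if changed & VERSION_FIELDS and not bypass:
        process["version"] = str(uuid.uuid4())


proc = {"memory": 256, "health_check_type": "port", "version": "v0"}

# Manifest sets memory (unchanged) and changes health_check_type: no bump.
apply_changes(proc, {"memory": 256, "health_check_type": "http"}, from_manifest=True)
after_manifest = proc["version"]

# Manifest without memory changing the same field: version is bumped.
apply_changes(proc, {"health_check_type": "port"}, from_manifest=True)
after_no_memory = proc["version"]
```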

Open Question: Why does this not apply to other manifest fields like health_check_type and health_check_http_endpoint?

Process Observer

ProcessObserver triggers when changes to processes are committed to the database. It is invoked through after_commit hooks registered in the after_save and after_destroy model hooks.

Note: Because most Cloud Controller unit tests run inside database transactions that are never committed, these after_commit hooks don't fire in tests unless you switch the test's database isolation to :truncation.
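The timing behind that note can be illustrated with a toy transaction. This is a hypothetical model, not Sequel's API: `Transaction`, `after_commit`, and `save_process` are illustrative names. It shows only that callbacks deferred to commit time run on commit and are discarded on rollback, which is why a rolled-back test transaction never reaches the observer.

```python
from typing import Callable, List


class Transaction:
    """Toy transaction: callbacks registered via after_commit run only on commit."""

    def __init__(self) -> None:
        self._after_commit: List[Callable[[], None]] = []

    def after_commit(self, fn: Callable[[], None]) -> None:
        self._after_commit.append(fn)

    def commit(self) -> None:
        for fn in self._after_commit:
            fn()
        self._after_commit.clear()

    def rollback(self) -> None:
        # Deferred callbacks are discarded without running.
        self._after_commit.clear()


observed: List[str] = []


def save_process(txn: Transaction, guid: str) -> None:
    # Analogous to an after_save hook that defers the observer to after_commit.
    txn.after_commit(lambda: observed.append(guid))


# Production-like: the save is committed, so the observer fires.
t1 = Transaction()
save_process(t1, "p-1")
t1.commit()

# Transactional test: the wrapping transaction is rolled back, so it never fires.
t2 = Transaction()
save_process(t2, "p-2")
t2.rollback()
```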

If the process's state, diego, enable_ssh, or ports fields are updated, the ProcessObserver will start or update the process. Note that the start method name is a bit confusing because, as we will see, Runner.start also updates processes that are already running.
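The observer's dispatch can be sketched as follows. `RecordingRunner` and `on_process_committed` are hypothetical names; the runner records calls instead of talking to Diego. The sketch assumes the field check is the only condition. The real observer applies further conditions that are omitted here, and the scale branch is described in the Instances section.

```python
# Fields that make the observer start (or update) the process, per the text above.
START_FIELDS = {"state", "diego", "enable_ssh", "ports"}


class RecordingRunner:
    """Hypothetical stand-in for Diego::Runner that records calls."""

    def __init__(self) -> None:
        self.calls = []

    def start(self) -> None:
        self.calls.append("start")

    def scale(self) -> None:
        self.calls.append("scale")


def on_process_committed(changed_fields: set, runner: RecordingRunner) -> None:
    """Dispatch on which fields changed (simplified)."""
    if changed_fields & START_FIELDS:
        runner.start()      # also used to update an already-running process
    elif "instances" in changed_fields:
        runner.scale()      # only reached when no start field changed
    # Any other change (e.g. disk_quota) leaves the runtime untouched here.


r = RecordingRunner()
on_process_committed({"ports", "instances"}, r)
on_process_committed({"instances"}, r)
on_process_committed({"disk_quota"}, r)
```

The last call reflects the base case: a change outside these fields does nothing until the application is restarted.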

Focusing on the Diego case, Diego::Runner#start forwards directly to Diego::Messenger#send_desire_request, which in turn forwards to Diego::DesireAppHandler.create_or_update_app. Here it finally becomes clear that this path updates existing processes as well as creating new ones.

This is where the process version comes into play. Diego::DesireAppHandler checks whether the process has a matching LRP. To do this, it uses Diego::BbsAppsClient#get_app, which in turn uses Diego::ProcessGuid. Here we see that the LRP is identified by a combination of the process's guid and its version. This means that if the process's version has changed since Diego was last updated, Diego::BbsAppsClient will not find a matching LRP for the process.
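A minimal sketch of the lookup, assuming the guid and version are joined with a hyphen to form the LRP identifier. `process_guid` and `get_app` are hypothetical Python analogues of Diego::ProcessGuid and Diego::BbsAppsClient#get_app, with Diego modeled as a dict of LRPs.

```python
from typing import Dict, Optional


def process_guid(guid: str, version: str) -> str:
    """Build an LRP identifier from a process guid and version (assumed hyphen-joined)."""
    return f"{guid}-{version}"


def get_app(lrps: Dict[str, dict], guid: str, version: str) -> Optional[dict]:
    """Hypothetical lookup analogous to Diego::BbsAppsClient#get_app."""
    return lrps.get(process_guid(guid, version))


lrps = {process_guid("proc-1", "v1"): {"instances": 2}}
found = get_app(lrps, "proc-1", "v1")     # same version: LRP is found
missing = get_app(lrps, "proc-1", "v2")   # version changed: no match
```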

If a matching LRP is found (i.e., the process's version hasn't changed), Diego::BbsAppsClient#update_app is called. Note that only instances, updated_at, and routes can be updated on an existing LRP; changing any other field requires creating a new LRP.

If a matching LRP is NOT found (e.g. the process is new or its version has changed), Diego::BbsAppsClient#desire_app is called to create a new LRP for the process.
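The update-or-create decision can be modeled like this. `create_or_update_app` is a hypothetical analogue of Diego::DesireAppHandler.create_or_update_app that operates on a dict of LRPs and assumes the hyphen-joined identifier from the previous sketch.

```python
from typing import Dict

# Per the text, only these fields can be changed on an existing LRP.
UPDATABLE_FIELDS = {"instances", "updated_at", "routes"}


def create_or_update_app(lrps: Dict[str, dict], guid: str, version: str,
                         desired: dict) -> str:
    """Update a matching LRP in place, or desire a new one; return which path ran."""
    key = f"{guid}-{version}"  # assumed ProcessGuid format
    lrp = lrps.get(key)
    if lrp is not None:
        # update_app path: copy over only the fields an LRP allows changing.
        for name in UPDATABLE_FIELDS.intersection(desired):
            lrp[name] = desired[name]
        return "update_app"
    # desire_app path: create a brand-new LRP under the new key.
    lrps[key] = dict(desired)
    return "desire_app"


lrps: Dict[str, dict] = {}
first = create_or_update_app(lrps, "proc-1", "v1", {"instances": 1, "memory": 256})
second = create_or_update_app(lrps, "proc-1", "v1", {"instances": 3, "memory": 256})
third = create_or_update_app(lrps, "proc-1", "v2", {"instances": 3, "memory": 512})
```

In this toy model the old LRP (`proc-1-v1`) is simply left behind after the version changes. What the real system does with it is the open question that follows.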

Open Question: What happens to the old LRP? Do we wait for the sync job to sweep it up?

Instances

If none of the above fields are updated and the instances field is updated, then the process will be scaled via Diego::Runner#scale.

If the process's package is currently pending (calculating this is a whole can of worms), then nothing happens. In this case the sync job will be responsible for eventually scaling the process.

Open Question: In the v3 world, does this flow make sense? Theoretically uploading a new package should be independent from scaling the process.

If the process's package is NOT pending, the scale path calls Diego::Messenger#send_desire_request and rejoins the flow above.
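The instances path reduces to a single branch. In this hypothetical sketch, `package_pending` is a boolean standing in for the (complicated) pending-package calculation, and `RecordingMessenger` is an illustrative stand-in for Diego::Messenger.

```python
class RecordingMessenger:
    """Hypothetical stand-in for Diego::Messenger."""

    def __init__(self) -> None:
        self.sent = []

    def send_desire_request(self, process_guid: str) -> None:
        self.sent.append(process_guid)


def scale(process: dict, messenger: RecordingMessenger) -> bool:
    """Return True if a desire request was sent."""
    if process["package_pending"]:
        # Do nothing now; the sync job is expected to scale the process later.
        return False
    # Otherwise rejoin the start/update flow via a desire request.
    messenger.send_desire_request(process["guid"])
    return True


m = RecordingMessenger()
pending = scale({"guid": "proc-1", "package_pending": True}, m)
ready = scale({"guid": "proc-2", "package_pending": False}, m)
```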

Sync Job

The Diego sync job runs every 30 seconds (by default) on the Cloud Controller Clock. When the sync job runs, it checks to make sure that CC's processes match the LRPs in Diego using Diego::ProcessesSync.

Diego::ProcessesSync loops over all the processes in the ccdb and checks whether each has a corresponding LRP in Diego. To do this comparison, it again uses Diego::ProcessGuid. Remember that this class identifies processes by a combination of the process's guid and its version.

At this point, the sync job behaves much like the ProcessObserver. If a matching LRP is found and the process's updated_at differs from the LRP's, it calls Diego::BbsAppsClient#update_app. If a matching LRP is NOT found, it calls Diego::BbsAppsClient#desire_app.

Finally, the sync job deletes all remaining LRPs that haven't been matched to a process.
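One pass of the sync job can be sketched as a reconciliation loop. This is a simplified, hypothetical model of Diego::ProcessesSync: processes and LRPs are plain dicts, updated_at is an integer, the identifier is assumed to be hyphen-joined, and the real job's scheduling, batching, and error handling are omitted.

```python
from typing import Dict, List, Tuple


def sync(processes: List[dict], lrps: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (action, lrp_key) pairs; mutates lrps to reflect them."""
    actions = []
    matched = set()
    existing = set(lrps)  # snapshot of LRP keys before this run
    for p in processes:
        key = f"{p['guid']}-{p['version']}"  # assumed ProcessGuid format
        if key in existing:
            matched.add(key)
            if lrps[key]["updated_at"] != p["updated_at"]:
                lrps[key]["updated_at"] = p["updated_at"]
                actions.append(("update_app", key))
        else:
            lrps[key] = {"updated_at": p["updated_at"]}
            actions.append(("desire_app", key))
    # Finally, delete every pre-existing LRP that no process matched.
    for key in sorted(existing - matched):
        del lrps[key]
        actions.append(("delete", key))
    return actions


processes = [
    {"guid": "a", "version": "1", "updated_at": 10},  # in sync
    {"guid": "b", "version": "1", "updated_at": 20},  # stale updated_at
    {"guid": "c", "version": "2", "updated_at": 30},  # version changed
]
lrps = {"a-1": {"updated_at": 10}, "b-1": {"updated_at": 15}, "c-1": {"updated_at": 25}}
actions = sync(processes, lrps)
```

Process `c` shows the full lifecycle of a version change: a new LRP is desired under `c-2`, and the orphaned `c-1` is swept up at the end of the same pass.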

Restart Action

The AppRestart action is a special way to reach into the runtime without going through the ProcessObserver or the Diego::Sync job.

The ProcessObserver will not fail if there are difficulties communicating with Diego (since the sync job will be around to clean up later). This means that stopping an app will not guarantee that the app is actually stopped in the runtime. AppRestart aims to make sure that the app's processes are actually stopped before starting them again.

To do this, the AppRestart action calls ProcessRestart.restart. This in turn goes through Diego::Runner#stop and Diego::Messenger#send_stop_app_request to Diego::BbsAppsClient#stop_app, which deletes the LRP. It then goes through a similar call stack to create a new LRP for the process.
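The contrast between the observer and the restart action can be sketched with an in-memory BBS. All names here (`FakeBbs`, `BbsError`, `observer_stop`, `restart`) are hypothetical, and the sketch assumes, for illustration, that a failed stop during a restart surfaces as an error to the caller rather than being swallowed.

```python
class BbsError(Exception):
    """Hypothetical error for a failed call to Diego's BBS."""


class FakeBbs:
    """Hypothetical in-memory BBS holding LRPs keyed by process guid."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable
        self.lrps = {}

    def _check(self) -> None:
        if not self.reachable:
            raise BbsError("cannot reach Diego")

    def stop_app(self, key: str) -> None:
        self._check()
        self.lrps.pop(key, None)           # stopping deletes the LRP

    def desire_app(self, key: str) -> None:
        self._check()
        self.lrps[key] = {"state": "new"}  # a fresh LRP replaces the old one


def observer_stop(bbs: FakeBbs, key: str) -> None:
    """Observer-style stop: swallow errors and leave cleanup to the sync job."""
    try:
        bbs.stop_app(key)
    except BbsError:
        pass


def restart(bbs: FakeBbs, key: str) -> None:
    """Restart-style: a failed stop raises, so the start is never attempted."""
    bbs.stop_app(key)
    bbs.desire_app(key)


healthy = FakeBbs()
healthy.lrps["proc-1"] = {"state": "old"}
restart(healthy, "proc-1")

down = FakeBbs(reachable=False)
down.lrps["proc-1"] = {"state": "old"}
observer_stop(down, "proc-1")  # returns quietly; the LRP is still there
```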

One interesting eccentricity of bypassing the ProcessObserver is that the updated_at timestamps on the process and the LRP won't match, so the sync job on the clock will later perform an unnecessary, albeit harmless, update on the LRP.

Tasks

Coming Soon

Deployments