Gateway

Summary

The Gateway is the central point of entry for the Opni system.

Architecture

The Opni Gateway is a multi-faceted API server that manages connections and communication with agents. It comprises several servers, each with a specific role; a sketch of this layout follows the list below.

API Servers

  • Public gRPC API Server: This is the only publicly accessible server, offering a minimal set of APIs necessary for agents to authenticate and connect to the gateway. Most other interactions with the gateway occur through a long-lived bidirectional stream, which agents initiate by connecting to a service on this endpoint.

  • Internal Management Server: This server exposes RESTful APIs for managing core internal resources such as clusters, bootstrap tokens, RBAC objects, and capabilities. Additionally, it enables API extensions, allowing plugins to expose custom gRPC services at the same endpoint as the core management API. These endpoints aren't accessible outside the cluster.

  • Internal HTTP Server: This server handles the /metrics endpoint and the admin dashboard. The dashboard is a single-page app served from static web assets embedded into the binary at build time. The HTTP server also supports API extensions, allowing plugins to register custom routes. As with the management server, these endpoints aren't accessible outside the cluster.

  • Local HTTP Server: This server, only accessible within the gateway pod, handles the /debug/pprof endpoint for diagnostics and the /healthz endpoint for Kubelet health checks.
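
The layout above can be pictured as one process binding four separate listeners. The following sketch is illustrative only; the addresses, ports, and handler wiring are assumptions, not Opni's actual configuration.

```go
package main

import (
	"net"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux

	"google.golang.org/grpc"
)

func main() {
	// Public gRPC API server: the only listener meant to be reachable from
	// outside the cluster. The agent-facing services (bootstrap, stream)
	// would be registered here.
	publicLis, err := net.Listen("tcp", ":9090") // illustrative port
	if err != nil {
		panic(err)
	}
	publicSrv := grpc.NewServer()
	go publicSrv.Serve(publicLis)

	// Internal management server (shown here as gRPC only; the real server
	// also exposes RESTful routes): clusters, tokens, RBAC, capabilities,
	// plus any plugin-provided management API extensions.
	mgmtLis, err := net.Listen("tcp", "127.0.0.1:11090") // illustrative address
	if err != nil {
		panic(err)
	}
	mgmtSrv := grpc.NewServer()
	go mgmtSrv.Serve(mgmtLis)

	// Internal HTTP server: /metrics, the embedded admin dashboard, and any
	// plugin-provided HTTP routes.
	internalMux := http.NewServeMux()
	internalMux.Handle("/metrics", http.NotFoundHandler()) // stand-in for a real metrics handler
	go http.ListenAndServe("127.0.0.1:8080", internalMux)

	// Local HTTP server: diagnostics and health checks, reachable only from
	// within the gateway pod.
	localMux := http.NewServeMux()
	localMux.Handle("/debug/pprof/", http.DefaultServeMux)
	localMux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe("127.0.0.1:8086", localMux)
}
```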

Plugins

The gateway uses the hashicorp/go-plugin library to manage plugins. A fixed set of interfaces is available for plugins to implement; these interfaces let plugins interact with different systems within the gateway. A single plugin binary can implement many of these interfaces, and none of them are required.

Plugins contain the majority of the implementation details and logic for the "capabilities" of Opni, such as Monitoring and Logging. They may also contain other APIs that aren't part of the core gateway.
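
As a rough illustration of the go-plugin model, a plugin binary might look like the sketch below. The handshake values, plugin name, and commented-out service registrations are placeholders; the real definitions live in Opni's plugin SDK.

```go
package main

import (
	"context"

	"github.com/hashicorp/go-plugin"
	"google.golang.org/grpc"
)

// examplePlugin is a hypothetical capability implementation. A real plugin
// would register its generated gRPC services inside GRPCServer.
type examplePlugin struct {
	plugin.NetRPCUnsupportedPlugin
}

func (p *examplePlugin) GRPCServer(broker *plugin.GRPCBroker, s *grpc.Server) error {
	// e.g. examplev1.RegisterExampleServer(s, &exampleServer{})
	return nil
}

func (p *examplePlugin) GRPCClient(ctx context.Context, broker *plugin.GRPCBroker, c *grpc.ClientConn) (interface{}, error) {
	// e.g. return examplev1.NewExampleClient(c), nil
	return nil, nil
}

func main() {
	plugin.Serve(&plugin.ServeConfig{
		// The handshake values are placeholders; the gateway and its plugins
		// must agree on the real ones.
		HandshakeConfig: plugin.HandshakeConfig{
			ProtocolVersion:  1,
			MagicCookieKey:   "EXAMPLE_COOKIE",
			MagicCookieValue: "example",
		},
		Plugins: plugin.PluginSet{
			"example": &examplePlugin{},
		},
		GRPCServer: plugin.DefaultGRPCServer,
	})
}
```

Because a single binary can serve several named plugins this way, one plugin binary can provide implementations for multiple gateway interfaces at once.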

Plugin Loader

The gateway uses a plugin loader to load plugin binaries from disk. Plugins are loaded once, in parallel, at startup, and are never unloaded or restarted. Plugins can't have dependencies on other plugins, but they can, for example, use the management API to query for the existence of specific API extensions.

Compiling Plugins

The plugins/ directory stores all plugin code. Each subdirectory has a main module, which compiles to a separate plugin binary, named plugin_<subdirectory>. For example, the plugins/example directory contains a main module that compiles to plugin_example. The plugin loader will only load binaries prefixed with plugin_.
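
A minimal sketch of the discovery and parallel-launch behavior described above, assuming an illustrative plugin directory; a real loader would start a go-plugin client for each binary rather than just printing its path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"
)

// discoverPlugins returns the paths of plugin binaries in dir, keeping only
// files whose names start with "plugin_", mirroring the loader's naming rule.
func discoverPlugins(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, e := range entries {
		if !e.IsDir() && strings.HasPrefix(e.Name(), "plugin_") {
			paths = append(paths, filepath.Join(dir, e.Name()))
		}
	}
	return paths, nil
}

func main() {
	paths, err := discoverPlugins("/var/lib/opni/plugins") // illustrative path
	if err != nil {
		panic(err)
	}
	// Launch every plugin at once; none is unloaded or restarted afterwards.
	var wg sync.WaitGroup
	for _, p := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			fmt.Println("loading", path) // a real loader would start a go-plugin client here
		}(p)
	}
	wg.Wait()
}
```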

API Extensions

API Extensions are the mechanism that lets a plugin communicate with other systems in the central Opni cluster: with agents, with other plugins, and with the admin dashboard and CLI. The gateway transparently routes the relevant API requests to each plugin, which keeps implementing an API extension straightforward. There are three kinds of extensions (a registration sketch follows the list below):

  • Management API Extensions

  • HTTP API Extensions

  • Stream API Extensions
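
As an illustration of the management case, the sketch below models an extension as a hypothetical interface that advertises gRPC services for the gateway to serve alongside the core management API; the real interface in Opni's plugin SDK may differ in name and shape.

```go
package extension

import "google.golang.org/grpc"

// ManagementAPIExtension is a hypothetical stand-in for the interface a plugin
// implements to expose extra gRPC services on the management endpoint.
type ManagementAPIExtension interface {
	// ManagementServices returns each service the plugin wants served on the
	// management endpoint, paired with its implementation.
	ManagementServices() map[*grpc.ServiceDesc]interface{}
}

// RegisterAll shows the gateway side of the contract: every advertised service
// is added to the shared management server, so requests for those services are
// routed to the plugin that provided them. In the real gateway this wiring
// goes over each plugin's go-plugin connection rather than in-process.
func RegisterAll(srv *grpc.Server, exts []ManagementAPIExtension) {
	for _, ext := range exts {
		for desc, impl := range ext.ManagementServices() {
			srv.RegisterService(desc, impl)
		}
	}
}
```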

Scale and performance

Scaling the gateway is not currently supported: most of its components are stateless, but a couple of stateful components need to be upgraded before it can scale out. This is a work in progress.

The gateway should be able to handle a large number of clusters before scaling becomes necessary. Load testing is still in progress, tracked at https://github.com/rancher/opni/issues/275.

Security

The gateway is the only component of Opni designed to be exposed to the internet, so its security is critical. There are two major areas of concern: communication between the gateway and agents, and communication between the gateway and browsers, such as through Grafana or the admin dashboard.

  • Communication between the gateway and agents

    The agent bootstrap process is the first step in securely connecting agents to the gateway.

    The bootstrap process is as follows:

    1. The server generates a self-signed keypair and a bootstrap token.
    2. The client is given the bootstrap token and one or more fingerprints of public keys in the server's certificate chain ("pinned" public keys). It first sends an empty request to the server's /bootstrap/join endpoint. The client cannot yet trust the server's self-signed certificate, so it does not send any important data in the request.
    3. During the TLS handshake, the client computes the fingerprints of the public keys in the server's offered certificates, and compares them to its pinned fingerprints. If any of the fingerprints match, and the server's certificate chain is valid (i.e. each certificate is signed by the next certificate in the chain), the client trusts the server and completes the TLS handshake. The client will save the server's root certificate in its keyring for later on.
    4. The server responds with several JWS messages with detached payloads (one for each active bootstrap token).
    5. The client finds the JWS with the matching bootstrap token ID, fills in the detached payload (the bootstrap token), and sends it back to the server's /bootstrap/join endpoint along with the client's own unique identifier it wishes to use (typically the client's kube-system namespace resource UID) and an ephemeral x25519 public key. This step requires the client to trust the server, because the complete JWS (which includes the bootstrap token) is a secret.
    6. The server verifies the reconstructed JWS. If it is correct, the server can now trust the client. The server responds with its own ephemeral x25519 public key.
    7. Both the client and server use their ephemeral keypair and their peer's public key to generate a shared secret using Diffie-Hellman. This secret is then passed through a KDF to create two static ed25519 keys: one is used to generate and verify MACs for client->server messages, and the other for server->client messages. The key exchange and KDF algorithms are based on the libsodium key exchange API (a key-derivation sketch follows this list).
    8. These keys are added to identical keyrings on both the client and server, and saved to persistent storage.
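
    A rough sketch of step 7, assuming X25519 for the Diffie-Hellman exchange and using blake2b as a stand-in KDF (the gateway's actual derivation follows the libsodium key exchange construction; all names here are illustrative):

    ```go
    package main

    import (
    	"bytes"
    	"crypto/rand"
    	"fmt"

    	"golang.org/x/crypto/blake2b"
    	"golang.org/x/crypto/curve25519"
    )

    // deriveSessionKeys sketches step 7: an X25519 Diffie-Hellman exchange whose
    // shared secret is run through a KDF and split into one key per direction.
    func deriveSessionKeys(ourPriv, ourPub, peerPub []byte, isClient bool) (clientToServer, serverToClient []byte, err error) {
    	shared, err := curve25519.X25519(ourPriv, peerPub)
    	if err != nil {
    		return nil, nil, err
    	}
    	// Mix in both public keys in a fixed order so each side derives the same
    	// pair of keys regardless of which role it played.
    	clientPub, serverPub := ourPub, peerPub
    	if !isClient {
    		clientPub, serverPub = peerPub, ourPub
    	}
    	h, _ := blake2b.New512(nil)
    	h.Write(shared)
    	h.Write(clientPub)
    	h.Write(serverPub)
    	okm := h.Sum(nil)              // 64 bytes of key material
    	return okm[:32], okm[32:], nil // client->server key, server->client key
    }

    func main() {
    	// Ephemeral X25519 keypairs exchanged during bootstrap (steps 5 and 6).
    	clientPriv, serverPriv := make([]byte, 32), make([]byte, 32)
    	rand.Read(clientPriv)
    	rand.Read(serverPriv)
    	clientPub, _ := curve25519.X25519(clientPriv, curve25519.Basepoint)
    	serverPub, _ := curve25519.X25519(serverPriv, curve25519.Basepoint)

    	c2sClient, s2cClient, _ := deriveSessionKeys(clientPriv, clientPub, serverPub, true)
    	c2sServer, s2cServer, _ := deriveSessionKeys(serverPriv, serverPub, clientPub, false)
    	// Both sides now hold identical keyring entries.
    	fmt.Println(bytes.Equal(c2sClient, c2sServer), bytes.Equal(s2cClient, s2cServer)) // true true
    }
    ```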

    In subsequent connections to the gateway, the agent must use this keyring to authenticate itself as follows:

    1. The client sends a connection request to the gateway to attempt to establish a long-lived bidirectional stream. The same TLS handshake is performed to establish transport security.
    2. The server replies to the client with a challenge by sending a gRPC header with the key X-Challenge and a UUID as the value.
    3. The client concatenates its cluster ID, the challenge UUID, and the gRPC method (in this case, /stream.Stream/Connect), and signs the combined payload with the client key from its keyring to produce a MAC (message authentication code). The client responds to the server with this MAC.
    4. The server generates a MAC in the same way using its copy of the keyring for the client with the given ID. If and only if both MACs are identical, the client is successfully authenticated and the connection request can proceed (a MAC sketch follows this list).
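
    The challenge/response in steps 2-4 can be sketched as follows; keyed blake2b stands in for the actual MAC construction, and the key, cluster ID, and challenge values are placeholders:

    ```go
    package main

    import (
    	"crypto/subtle"
    	"encoding/hex"
    	"fmt"

    	"golang.org/x/crypto/blake2b"
    )

    // challengeMAC computes a MAC over the cluster ID, the challenge value, and
    // the gRPC method name using a shared keyring key.
    func challengeMAC(key []byte, clusterID, challenge, method string) []byte {
    	mac, err := blake2b.New512(key)
    	if err != nil {
    		panic(err)
    	}
    	mac.Write([]byte(clusterID))
    	mac.Write([]byte(challenge))
    	mac.Write([]byte(method))
    	return mac.Sum(nil)
    }

    func main() {
    	key := make([]byte, 32) // shared key from the bootstrap keyring (all zeros here for illustration)

    	// Client: answer the X-Challenge header.
    	clientMAC := challengeMAC(key, "example-cluster-id", "challenge-uuid-from-header", "/stream.Stream/Connect")

    	// Server: recompute with its own copy of the keyring and compare.
    	serverMAC := challengeMAC(key, "example-cluster-id", "challenge-uuid-from-header", "/stream.Stream/Connect")
    	fmt.Println(hex.EncodeToString(clientMAC[:8]), subtle.ConstantTimeCompare(clientMAC, serverMAC) == 1)
    }
    ```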

    Addendum: preventing timing attacks

    It is important that, when client authentication fails, the client is not able to ascertain why exactly it failed. In particular, it should not be able to distinguish between a failure caused by an incorrect MAC and a failure caused by the requested cluster ID not existing at all. If requests for IDs that exist take longer on average than requests for IDs that don't (a reasonable assumption), a client could measure average request durations over time and theoretically obtain this information.

    A common mitigation is for the server to sleep for a random duration before completing the request, but this is not effective enough on its own: given a large enough sample set, the client can average the added noise out of its measurements. The gateway instead ensures that all requests genuinely perform the same amount of work and never short-circuit or "fail fast". This requires some careful code structure, but the result is that all timing measurements the client could take are statistically indistinguishable from one another (see the sketch below).
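
    A sketch of this "same amount of work" approach, with illustrative names: when the requested cluster ID is unknown, the server substitutes a throwaway key and still computes and compares a full MAC before failing.

    ```go
    package main

    import (
    	"crypto/rand"
    	"crypto/subtle"
    	"fmt"

    	"golang.org/x/crypto/blake2b"
    )

    // keyring maps cluster IDs to their shared MAC keys (illustrative only).
    type keyring map[string][]byte

    // verify does the same amount of work whether or not the cluster ID is
    // known, so a client cannot distinguish "unknown cluster" from "wrong MAC"
    // by timing.
    func verify(keys keyring, clusterID, challenge, method string, presentedMAC []byte) bool {
    	key, found := keys[clusterID]
    	if !found {
    		// Do not return early; substitute a random key so the MAC computation
    		// below runs exactly as it would for a known cluster.
    		key = make([]byte, 32)
    		rand.Read(key)
    	}
    	mac, _ := blake2b.New512(key)
    	mac.Write([]byte(clusterID))
    	mac.Write([]byte(challenge))
    	mac.Write([]byte(method))
    	expected := mac.Sum(nil)

    	ok := subtle.ConstantTimeCompare(expected, presentedMAC) == 1
    	return ok && found
    }

    func main() {
    	keys := keyring{"known-cluster": make([]byte, 32)}
    	// Both calls perform the same work; both fail, for different reasons.
    	fmt.Println(verify(keys, "unknown-cluster", "challenge", "/stream.Stream/Connect", nil))
    	fmt.Println(verify(keys, "known-cluster", "challenge", "/stream.Stream/Connect", nil))
    }
    ```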

  • Communication between the gateway and browsers

TODO: fill this in

High availability

TODO: fill this in

Testing

TODO: fill this in
