This is not an officially supported Google product.
For network emulation, there are many approaches that use VMs to emulate a hardware router. Arista, Cisco, Juniper, and Nokia each have multiple implementations of their network operating systems and various generations of hardware emulation. These systems work well for most validation of vendor control plane implementations and, for limited certifications, of the data plane. The idea of this project is to provide a standard "interface" so that vendors can produce a standard container implementation that can be used to build complex topologies.
- Have standard lifecycle management infrastructure for allowing multiple vendor device emulations to be present in a single "topology"
- Allow for control plane access via standard k8s networking
- Provide a common networking interface for the forwarding plane between network pods.
  - Data plane wires between pods
  - Control plane wires between topology manager
- Define service implementation for allowing interaction with the topology manager service.
  - Topology manager is the public API for allowing external users to manipulate the link state in the topology.
  - The topology manager will run as a service in k8s environment.
  - It will provide a gRPC interface for tests to interact with
  - It will listen to CRDs published via the network device pods for discovery
- Data plane connections for connectivity between pods must be a public transport mechanism
  - This can't be implemented as just exposing "x eth devices on the pod" because Linux doesn't understand the associated control messages which are needed to make this work like a wire.
    - Transceiver state, optical characteristics, wire state, packet filtering / shaping / drops
    - LACP or other port aggregation protocols or APS cannot be simulated correctly
- The topology manager will start a topology agent on each host for the pod to directly interact with.
- The topology agent will provide the connectivity between nodes
- Define how pods boot an initial configuration
  - Ideally, this method would allow for dynamic
- Define how pods express services for use in-cluster as well as external services
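
To make the idea of "manipulating the link state in the topology" more concrete, here is a minimal in-memory sketch in Go. It is an illustration only, not KNE's actual API: the names `Topology`, `Endpoint`, `Link`, `LinkState`, `AddLink`, `SetLinkState`, and `StateOf` are all hypothetical, and a real topology manager would expose equivalent operations over gRPC and act on real pod interfaces rather than a map.

```go
package main

import (
	"errors"
	"fmt"
)

// LinkState is a hypothetical enum for the state of an emulated wire.
type LinkState int

const (
	LinkUp LinkState = iota
	LinkDown
)

// Endpoint identifies one interface on one pod (hypothetical naming).
type Endpoint struct {
	Pod  string
	Intf string
}

// Link is a point-to-point wire between two endpoints.
type Link struct {
	A, Z  Endpoint
	State LinkState
}

// Topology indexes every link by both of its endpoints so that the
// state of a wire can be looked up from either side.
type Topology struct {
	links map[Endpoint]*Link
}

// NewTopology returns an empty topology.
func NewTopology() *Topology {
	return &Topology{links: make(map[Endpoint]*Link)}
}

// AddLink wires two endpoints together. An endpoint can carry only one wire.
func (t *Topology) AddLink(a, z Endpoint) error {
	if a == z {
		return errors.New("link endpoints must differ")
	}
	for _, ep := range []Endpoint{a, z} {
		if _, ok := t.links[ep]; ok {
			return fmt.Errorf("endpoint %v is already wired", ep)
		}
	}
	l := &Link{A: a, Z: z, State: LinkUp}
	t.links[a] = l
	t.links[z] = l
	return nil
}

// SetLinkState changes the state of the wire attached to ep.
func (t *Topology) SetLinkState(ep Endpoint, s LinkState) error {
	l, ok := t.links[ep]
	if !ok {
		return fmt.Errorf("no link at endpoint %v", ep)
	}
	l.State = s
	return nil
}

// StateOf reports the state of the wire attached to ep.
func (t *Topology) StateOf(ep Endpoint) (LinkState, error) {
	l, ok := t.links[ep]
	if !ok {
		return LinkDown, fmt.Errorf("no link at endpoint %v", ep)
	}
	return l.State, nil
}

func main() {
	topo := NewTopology()
	r1 := Endpoint{Pod: "r1", Intf: "eth1"}
	r2 := Endpoint{Pod: "r2", Intf: "eth1"}
	if err := topo.AddLink(r1, r2); err != nil {
		fmt.Println("add link:", err)
		return
	}
	// Take the wire down from the r1 side, then observe it from r2.
	if err := topo.SetLinkState(r1, LinkDown); err != nil {
		fmt.Println("set state:", err)
		return
	}
	s, _ := topo.StateOf(r2)
	fmt.Println("r2:eth1 link down:", s == LinkDown)
}
```

Indexing the same `*Link` under both endpoints is what makes the link behave like a wire in this sketch: a state change made from one side is visible from the other.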
The main use case of this infrastructure is for the development of tests to validate control plane / configuration of network devices without needing real hardware.
The main use case we are interested in is the ability to bring up arbitrary topologies to represent a production topology. This would require multiple vendors as well as traffic generation and end hosts.
To support testing, we need to provide every tester, engineer, and continuous automated run with a set of environments for validating the test scenarios used in production. These environments can also be used to pre-validate hardware testing. This can reduce cycle time, because the virtual testbed does not suffer the contention that a hardware testbed does. It also allows for "unit testing" the integration test.
For the development of new services, or for offering a better environment to developers of existing services, virtual testbeds would allow for better scaling of resources and for easier-to-use testbeds customized to a team's needs. Specifically, workflow automation struggles to obtain physical representations of the metros that need to be validated for workflows. A virtual testbed would allow the majority of workflows to be validated against any number of production topologies.
See the collection of docs for in-depth guides on how to use Kubernetes Network Emulation (KNE).
The KNE CLI optionally collects anonymous usage metrics. This is turned OFF
by default. We use the metrics to gauge the health and performance of various
KNE operations (e.g. cluster deployment, topology creation) on an opt-in
basis. There is a global flag, `--report_usage`, that when provided shares
anonymous details about certain KNE CLI commands. Collected data can be seen in
the event proto definition. Usage metrics are NOT shared
by default. Additionally, the PubSub project and topic that the events are
published to are configurable. If you want to track your own private metrics
about your KNE usage, you can do so by providing a Cloud PubSub project/topic of
your choosing. Full details about how and when usage events are published can be
found in the codebase. We appreciate usage metric
reporting, as it helps us develop a better KNE experience for all of our users.
For example, it can help detect an abnormally high number of cluster deployment
failures caused by an upgrade to an underlying dependency introduced by a new
commit, or a bug where the failure rate for topologies
with more than n links is far greater than for those with n-1 links. Usage metric
reporting is a helpful tool for the KNE developers.
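
The opt-in behaviour described above can be sketched in Go as follows. This is a hedged illustration under stated assumptions, not the KNE implementation: `UsageEvent`, `Publisher`, `Reporter`, and `memPublisher` are hypothetical names, the event fields are invented for the example and do not mirror the real event proto, and the in-memory publisher stands in for a Cloud PubSub client.

```go
package main

import "fmt"

// UsageEvent is a hypothetical stand-in for an anonymous usage event.
// Its fields are invented for this sketch.
type UsageEvent struct {
	Command string
	Success bool
}

// Publisher abstracts the destination so that the project and topic
// remain configurable by the caller.
type Publisher interface {
	Publish(project, topic string, e UsageEvent) error
}

// Reporter gates publishing on an explicit opt-in. The zero value has
// Enabled == false, so nothing is shared by default.
type Reporter struct {
	Enabled bool   // would be set from a flag such as --report_usage
	Project string // configurable destination project
	Topic   string // configurable destination topic
	Pub     Publisher
}

// Report publishes e only when reporting is enabled. The boolean result
// says whether the event was actually sent.
func (r *Reporter) Report(e UsageEvent) (bool, error) {
	if r == nil || !r.Enabled || r.Pub == nil {
		return false, nil
	}
	if err := r.Pub.Publish(r.Project, r.Topic, e); err != nil {
		return false, err
	}
	return true, nil
}

// memPublisher records events in memory instead of sending them anywhere.
type memPublisher struct {
	sent []string
}

func (m *memPublisher) Publish(project, topic string, e UsageEvent) error {
	m.sent = append(m.sent, fmt.Sprintf("%s/%s:%s", project, topic, e.Command))
	return nil
}

func main() {
	pub := &memPublisher{}
	event := UsageEvent{Command: "create", Success: true}

	// Default: reporting is off, so the event is dropped.
	off := &Reporter{Project: "my-project", Topic: "my-topic", Pub: pub}
	sent, _ := off.Report(event)
	fmt.Println("sent while disabled:", sent)

	// Opt-in: the same event is published to the configured destination.
	on := &Reporter{Enabled: true, Project: "my-project", Topic: "my-topic", Pub: pub}
	sent, _ = on.Report(event)
	fmt.Println("sent while enabled:", sent)
}
```

Relying on the zero value of `Enabled` keeps the default safe: a caller has to set the field explicitly before any event leaves the process.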
This project is mainly based on k8s-topo (github.com/networkop/k8s-topo) and the meshnet-cni plugin (github.com/networkop/meshnet-cni).