News [Dec 2023]: Check out Machnet for fast and easy DPDK-based messaging. It supports many types of cloud VMs and bare-metal NICs, multiple application processes, language bindings, etc.
eRPC is a fast and general-purpose RPC library for datacenter networks.
Our NSDI 2019 paper describes the system in detail. Documentation can be generated by running `doxygen`.
Some highlights:
- Multiple supported networks: Ethernet, InfiniBand, and RoCE
- Low latency: 2.3 microseconds round-trip RPC latency with UDP over Ethernet
- Performance for small 32-byte RPCs: ~10M RPCs/sec with one CPU core, 60–80M RPCs/sec with one NIC
- Bandwidth for large RPCs: 75 Gbps on one connection (one CPU core at the server and client) for 8 MB RPCs
- Scalability: 20000 RPC sessions per server
- End-to-end congestion control that tolerates 100-way incasts
- Nested RPCs and long-running background RPCs
- A port of Raft as an example. Our 3-way replication latency is 5.3 microseconds with traditional UDP over Ethernet.
- NICs: Fast (10 GbE+) NICs are needed for good performance. eRPC works best with Mellanox Ethernet and InfiniBand NICs. Any DPDK-capable NICs also work well.
- System configuration:
  - At least 1024 huge pages on every NUMA node, and unlimited SHM limits
  - On a machine with `n` eRPC processes, eRPC uses kernel UDP ports `{31850, ..., 31850 + n - 1}`. These ports should be open on the management network. See `scripts/firewalld/erpc_firewall.sh` for systems running `firewalld`.
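  A minimal sketch of applying these settings on a Linux machine (the SHM limit values and the four-process port range are illustrative assumptions; `firewall-cmd` is only available on systems running `firewalld`):

  ```shell
  # Reserve 1024 huge pages on NUMA node 0; repeat for each NUMA node.
  sudo bash -c "echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages"

  # Effectively remove the SHM limits.
  sudo sysctl -w kernel.shmmax=18446744073692774399
  sudo sysctl -w kernel.shmall=18446744073692774399

  # Open the management UDP ports for n = 4 eRPC processes (31850..31853).
  for port in $(seq 31850 31853); do sudo firewall-cmd --add-port=${port}/udp; done
  ```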
- Build and run the test suite: `cmake . -DPERF=OFF -DTRANSPORT=dpdk; make -j; sudo ctest`
  - `-DPERF=OFF` enables debugging, which greatly reduces performance. Set `-DPERF=ON` for good performance.
  - Here, `dpdk` should be replaced with `infiniband` for InfiniBand NICs.
  - A machine with two ports is needed to run the unit tests if DPDK is chosen. Run `scripts/run-tests-dpdk.sh` instead of `ctest`.
- Run the `hello_world` application:
  - `cd hello_world`
  - Edit the server and client hostnames in `common.h`
  - Based on the transport that eRPC was compiled for, compile `hello_world` using `make dpdk` or `make infiniband`
  - Run `./server` at the server, and `./client` at the client
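  For convenience, a sketch of the steps above as one shell sequence (`make dpdk` assumes eRPC was built with `-DTRANSPORT=dpdk`):

  ```shell
  cd hello_world
  # Edit the server and client hostnames in common.h first.
  make dpdk      # or `make infiniband`, matching eRPC's -DTRANSPORT
  ./server       # on the server machine
  ./client       # on the client machine
  ```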
- Generate the documentation: `doxygen`
- Ethernet/UDP mode:
  - DPDK-enabled bare-metal NICs: Use `-DTRANSPORT=dpdk`. We have primarily tested Mellanox CX3–CX5 NICs.
  - DPDK-enabled NICs on Microsoft Azure: Use `-DTRANSPORT=dpdk -DAZURE=on`
- RDMA (InfiniBand/RoCE) NICs: Use `-DTRANSPORT=infiniband`. Add `-DROCE=on` if using RoCE.
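For reference, a sketch of the full cmake invocations for each mode (release builds shown; use `-DPERF=OFF` while debugging):

```shell
cmake . -DPERF=ON -DTRANSPORT=dpdk                  # DPDK, bare metal
cmake . -DPERF=ON -DTRANSPORT=dpdk -DAZURE=on       # DPDK on Azure
cmake . -DPERF=ON -DTRANSPORT=infiniband            # InfiniBand
cmake . -DPERF=ON -DTRANSPORT=infiniband -DROCE=on  # RoCE
```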
- eRPC works well on Azure VMs with accelerated networking.
- Configure two Ubuntu 18.04 VMs as below. Use the same resource group and availability zone for both VMs.
  - Uncheck "Accelerated Networking" when launching each VM from the Azure portal (e.g., F32s-v2). For now, this VM should have just the control network (i.e., `eth0`) and `lo` interfaces.
  - Add a NIC to Azure via the Azure CLI:

    ```
    az network nic create --resource-group <your resource group> --name <a name for the NIC> --vnet-name <name of the VMs' virtual network> --subnet default --accelerated-networking true --subscription <Azure subscription, if any> --location <the VM's availability zone>
    ```
  - Stop the VM launched earlier, and attach the NIC created in the previous step to the VM (i.e., in "Networking" -> "Attach network interface").
  - Restart the VM. It should have a new interface called `eth1`, which eRPC will use for DPDK traffic.
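  The stop/attach/restart steps can also be scripted with the Azure CLI; a sketch with placeholder names (`az vm nic add` attaches an existing NIC to a deallocated VM):

  ```shell
  az vm deallocate --resource-group <your resource group> --name <VM name>
  az vm nic add --resource-group <your resource group> --vm-name <VM name> --nics <the NIC created above>
  az vm start --resource-group <your resource group> --name <VM name>
  ```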
- Prepare DPDK 21.11:
  - rdma-core must be installed from source. We recommend the tag `stable-v40`. First, install its dependencies listed in rdma-core's README. Then, in the `rdma-core` directory:

    ```
    cmake .
    sudo make install
    ```
  - Install upstream pre-requisite libraries and modules:

    ```
    sudo apt install make cmake g++ gcc libnuma-dev libgflags-dev numactl
    sudo modprobe ib_uverbs
    sudo modprobe mlx4_ib
    ```
  - Only DPDK 21.11 is supported; other DPDK versions are not. DPDK 21.11 uses the meson build system, so the legacy `config/common_base` edits (setting `CONFIG_RTE_LIBRTE_MLX5_PMD` and `CONFIG_RTE_LIBRTE_MLX4_PMD` to `y`) required by older DPDK releases do not apply; with rdma-core installed, the Mellanox PMDs are built automatically.
  - Build and locally install DPDK:
    ```
    export RTE_SDK=<some dpdk directory>
    git clone --depth 1 --branch 'v21.11' https://github.com/DPDK/dpdk.git "${RTE_SDK}"
    cd "${RTE_SDK}"
    meson build -Dexamples='' -Denable_kmods=false -Dtests=false -Ddisable_drivers='raw/*,crypto/*,baseband/*,dma/*'
    cd build/
    DESTDIR="${RTE_SDK}/build/install" ninja install
    ```
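    As a sanity check (an assumption, not an official step), the Mellanox PMDs should appear in the meson build tree if rdma-core was detected:

    ```shell
    # Expect librte_net_mlx4/librte_net_mlx5 artifacts if the PMDs were built.
    ls "${RTE_SDK}/build/drivers" | grep -i mlx
    ```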
- Create hugepages:

  ```
  sudo bash -c "echo 2048 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages"
  sudo mkdir /mnt/huge
  sudo mount -t hugetlbfs nodev /mnt/huge
  ```
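  These settings do not persist across reboots. One way to make them persistent (a sketch, not part of the official setup):

  ```shell
  # Reserve huge pages at boot and auto-mount the hugetlbfs filesystem.
  echo "vm.nr_hugepages = 2048" | sudo tee -a /etc/sysctl.conf
  echo "nodev /mnt/huge hugetlbfs defaults 0 0" | sudo tee -a /etc/fstab
  ```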
- Build eRPC's library and latency benchmark:

  ```
  cmake . -DTRANSPORT=dpdk -DAZURE=on
  make
  make latency
  ```
- Create the file `scripts/autorun_process_file` like below. Here, do not use the IP addresses of the accelerated NIC (i.e., not of `eth1`):

  ```
  <Public IPv4 address of VM #1> 31850 0
  <Public IPv4 address of VM #2> 31850 0
  ```
- Run the eRPC application (the latency benchmark by default):
  - At VM #1: `./scripts/do.sh 0 0`
  - At VM #2: `./scripts/do.sh 1 0`
- The `apps` directory contains a suite of benchmarks and examples. The instructions below are for this suite of applications. eRPC can also simply be linked as a library instead (see `hello_world/` for an example).
- To build an application, create `scripts/autorun_app_file` and change its contents to one of the available directory names in `apps/`. See `scripts/example_autorun_app_file` for an example. Then generate a Makefile using `cmake . -DTRANSPORT=dpdk/infiniband`.
- Each application directory in `apps/` contains a config file that must specify all flags defined in `apps/apps_common.h`. For example, `num_processes` specifies the total number of eRPC processes in the cluster.
- The URIs of eRPC processes in the cluster are specified in `scripts/autorun_process_file`. Each line in this file must be `<hostname> <management udp port> <numa_node>`.
- Run `scripts/do.sh` for each process (see the worked example after this list):
  - With single-CPU machines: `num_processes` machines are needed. Run `scripts/do.sh <i> 0` on machine `i` in `{0, ..., num_processes - 1}`.
  - With dual-CPU machines: `num_machines = ceil(num_processes / 2)` machines are needed. Run `scripts/do.sh <i> <i % 2>` on machine `i` in `{0, ..., num_machines - 1}`.
- To automatically run an application at all processes in `scripts/autorun_process_file`, run `scripts/run-all.sh`. For some applications, statistics generated in a run can be collected and processed using `scripts/proc-out.sh`.
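A worked example tying the pieces together, for two single-CPU machines over DPDK (hostnames are placeholders, and `latency` is assumed to be one of the directory names in `apps/`):

```shell
# Select the app and describe the two-process cluster.
echo "latency" > scripts/autorun_app_file
cat > scripts/autorun_process_file <<EOF
node-0.example.com 31850 0
node-1.example.com 31850 0
EOF

# Build the app.
cmake . -DPERF=ON -DTRANSPORT=dpdk && make latency

# Launch process 0 on node-0 and process 1 on node-1 (both on NUMA node 0):
./scripts/do.sh 0 0   # on node-0.example.com
./scripts/do.sh 1 0   # on node-1.example.com
```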
- GitHub issues are preferred over email. Please include the following information in the issue:
  - NIC model
  - rdma-core version and DPDK version
  - Operating system
Anuj Kalia
Copyright 2018, Carnegie Mellon University
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.