merger
helight committed Jul 7, 2022
2 parents 09bd9ca + 763c0dc commit c156857
Showing 31 changed files with 7,061 additions and 4 deletions.
78 changes: 75 additions & 3 deletions .github/workflows/ci.yml
@@ -14,9 +14,8 @@ on:
pull_request_review:
workflow_dispatch:


jobs:
build:
lmp-build:
runs-on: ubuntu-18.04

steps:
@@ -30,5 +29,78 @@ jobs:

- name: make lmp
run: |
go build -o lmp main.go
cd eBPF_Visualization/eBPF_server
make all
sidecar-build-and-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.18

- name: Build
run: |
cd eBPF_Supermarket/sidecar/
go build -v ./...
- name: Test
run: |
cd eBPF_Supermarket/sidecar/
go test -v ./...
sidecar-integration-test-with-minikube-and-istio:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.18

- name: Start minikube
id: minikube
uses: medyagh/setup-minikube@master

- name: Set Up istio
run : |
# https://istio.io/latest/docs/setup/getting-started/
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.14.1
export PATH=$PWD/bin:$PATH
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
- name: Wait for istio
run: |
kubectl get nodes
kubectl get pods -owide -A
kubectl get services
test_pod() {
while [[ $(kubectl get pods -l $1 -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}') != "True" ]]
do
echo "waiting for pod"
sleep 5
kubectl get pods -owide -A;
done
return 0
}
test_pod app=ratings
echo "ratings is done"
test_pod app=productpage
echo "productpage is done"
kubectl exec "$(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}')" -c ratings -- curl -sS productpage:9080/productpage | grep -o "<title>.*</title>"
- name: Run
run: |
export MINIKUBE_STARTED=TRUE
eval $(minikube -p minikube docker-env)
env
docker ps -a
ps axjf
cd eBPF_Supermarket/sidecar/
go run main.go
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "eBPF_Supermarket/sidecar/libbpf"]
path = eBPF_Supermarket/sidecar/libbpf
url = https://github.com/libbpf/libbpf
4 changes: 3 additions & 1 deletion eBPF_Supermarket/README.md
@@ -17,4 +17,6 @@
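| eBPF data collector | eBPF_data_collector | eBPF data collector | 赵晨雨 |
| eBPF-based network congestion observation and troubleshooting | Network congestion observation and troubleshooting based on eBPF | Helps troubleshoot network problems by monitoring network latency jitter and congestion state machine transitions in the kernel | 董旭 |
| eBPF playground for beginners | tryebpf | Provides an online environment with beginner-friendly tools such as bpftrace and BCC | 白宇宣 |
| Interrupt Exception | Interrupt_exception | Collects information about abnormal interrupts on Linux, including the interrupt type number and the function call stack | 张翔哲 |
| Interrupt Exception | Interrupt_exception | Collects information about abnormal interrupts on Linux, including the interrupt type number and the function call stack | 张翔哲 |
| eBPF-based Pod performance monitoring in cloud-native scenarios | cilium_ebpf_probe | Provides fine-grained, process-level monitoring of Pods in cloud-native environments | 汪雨薇 |
| eBPF-based sidecar performance monitoring in cloud-native scenarios | sidecar | Observes and optimizes, at the kernel level, how sidecars work in cloud-native scenarios | [@ESWZY](https://github.com/ESWZY) |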
108 changes: 108 additions & 0 deletions eBPF_Supermarket/cilium_ebpf_probe/README.md
@@ -0,0 +1,108 @@
# eBPF-Based Pod Performance Monitoring in Cloud-Native Scenarios

---

**Table of Contents**

[toc]

---

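# 1. Project Background

Kubernetes is the container orchestration system of the cloud-native ecosystem; its internals are complex but transparent to its users. The Pod is the atomic scheduling unit of Kubernetes, so container-level monitoring is an essential part of operating a Kubernetes cluster. When a container consumes too many resources or its performance degrades, the metrics that existing APM tools provide are mostly aggregate statistics, such as the number of packets transmitted by a Pod's processes or the overall network state, and they cannot pinpoint the stage at which the problem occurs. In addition, tools such as Prometheus and Zabbix obtain most of their data by mining `/proc`, which is inefficient and performs poorly. Meanwhile, the results and events of certain kernel functions carry valuable information and are worth mining: for example, an error returned by the `do_fork` kernel function can indicate that the kernel has run out of available PIDs, and an error returned by the `count_mounts` kernel function can indicate that the number of mounts exceeds the kernel limit.

For example, because of how work is divided, many current cloud architectures end up with a large number of services and complex relationships between them. It then becomes hard to tell whether the downstream dependencies of a given service are healthy, or whether connectivity between applications is correct. Observability data therefore needs to be collected with the container at the center: the associated Kubernetes observability data, plus, going downward, system and network data for the container's processes and, going upward, performance data for the container's applications. Linking these through their relationships yields end-to-end observability coverage and makes it possible to follow a trace along the whole path.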

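# 2. Solution

1. Extract the golden signals (latency, traffic, errors, saturation) of the network stack from system calls, for example traffic volume, retransmissions, RTT and packet loss rate, as well as the number of errors, slow calls, requests, half-open connections and fully established connections.
1. Where necessary, extract the functions called by a process in the network stack to determine the resource cost and latency of each part.
1. In user space, match the probe results with application-level information such as Docker containers and Pods.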

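# 3. Design

## 3.1 Kernel network tracing metrics

Packets are passed through the kernel in the `sk_buff` structure. A network socket is defined by the `sock` structure, which is embedded at the start of each protocol-specific structure such as `tcp_sock`; the network protocol is attached through `tcp_prot`, `udp_prot` and similar. The `socket` structure being inspected must be in the full-socket state for the data read from it to be valid.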

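### 3.1.1 Basic metrics available for tracing

1. Network latency can be measured from several angles: DNS latency, connection latency, time to first byte, latency between layers of the software stack, and so on. Measure these both **under load and on an idle network** so the results can be compared.
2. Events related to TCP connection creation: trace the `sock:inet_sock_set_state` tracepoint (among others) and watch for the transition from `TCP_CLOSE` to `TCP_SYN_SENT` to track the establishment of new TCP connections and how long it takes.
3. Passive TCP connections: trace the `inet_csk_accept` kernel function to detect TCP listen-queue overflow.
4. TCP connection lifetime: based on `sock:inet_sock_set_state`, the information can be captured when the state changes to `TCP_CLOSE`.
5. TCP retransmissions and other TCP events, traced through `tcp_drop`, the `skb:kfree_skb` tracepoint, and similar.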
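The state-transition bookkeeping in items 2 and 4 can be sketched in user space as follows. This is a minimal illustration, not the project's implementation: `StateEvent`, `Measurement` and `Tracker` are hypothetical names, and the events are assumed to have already been copied out of a `sock:inet_sock_set_state` probe. Only the three state constants come from the kernel (`include/net/tcp_states.h`).

```go
package main

import "fmt"

// TCP state values from the kernel's include/net/tcp_states.h.
const (
	TCPEstablished = 1
	TCPSynSent     = 2
	TCPClose       = 7
)

// StateEvent is a hypothetical user-space copy of one
// sock:inet_sock_set_state record.
type StateEvent struct {
	Sock     uint64 // kernel address of the sock, used only as a map key
	OldState int
	NewState int
	TsNs     uint64 // timestamp in nanoseconds
}

// Measurement is one duration derived from a pair of state changes.
type Measurement struct {
	Kind       string // "connect" or "lifetime"
	Sock       uint64
	DurationNs uint64
}

// Tracker remembers, per socket, when SYN_SENT and ESTABLISHED were entered.
type Tracker struct {
	synSent     map[uint64]uint64
	established map[uint64]uint64
}

func NewTracker() *Tracker {
	return &Tracker{
		synSent:     make(map[uint64]uint64),
		established: make(map[uint64]uint64),
	}
}

// Observe consumes one event and reports a measurement when one completes.
func (tk *Tracker) Observe(ev StateEvent) (Measurement, bool) {
	switch {
	case ev.OldState == TCPClose && ev.NewState == TCPSynSent:
		// Active open started.
		tk.synSent[ev.Sock] = ev.TsNs
	case ev.NewState == TCPEstablished:
		tk.established[ev.Sock] = ev.TsNs
		if start, ok := tk.synSent[ev.Sock]; ok {
			delete(tk.synSent, ev.Sock)
			return Measurement{"connect", ev.Sock, ev.TsNs - start}, true
		}
	case ev.NewState == TCPClose:
		// Connection finished (or the connect attempt failed).
		delete(tk.synSent, ev.Sock)
		if start, ok := tk.established[ev.Sock]; ok {
			delete(tk.established, ev.Sock)
			return Measurement{"lifetime", ev.Sock, ev.TsNs - start}, true
		}
	}
	return Measurement{}, false
}

func main() {
	tracker := NewTracker()
	events := []StateEvent{
		{Sock: 0xabc, OldState: TCPClose, NewState: TCPSynSent, TsNs: 1_000},
		{Sock: 0xabc, OldState: TCPSynSent, NewState: TCPEstablished, TsNs: 251_000},
		{Sock: 0xabc, OldState: TCPEstablished, NewState: TCPClose, TsNs: 9_251_000},
	}
	for _, ev := range events {
		if m, ok := tracker.Observe(ev); ok {
			fmt.Printf("%s sock=%#x duration=%dns\n", m.Kind, m.Sock, m.DurationNs)
		}
	}
}
```

Keying on the socket address keeps the sketch short; a real collector would probably also record the PID and the address/port pair at the time of the event.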

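### 3.1.2 Other information

1. Sample kernel call stacks to analyze the share of time spent in network-related code paths.
2. When something looks abnormal, monitor the lifetime of `skb` structures for a short period to show whether there is latency or lock contention in the kernel network stack. Also collect statistics on network device transmit latency by measuring the time between the `net:net_dev_start_xmit` tracepoint (packet handed to the device queue) and the `skb:consume_skb` tracepoint (device finished sending).
3. Use high-frequency CPU profiling to capture kernel call stacks and quantify how CPU resources are split between network protocols and drivers, including off-CPU time, the impact of cgroup limits, page faults, and so on.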

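### 3.1.3 HTTP request collection

HTTP requests are collected on top of TCP, in three steps: data collection, request/response correlation, and request/response parsing.

#### (1) Data collection

When an HTTP service receives a request, it executes functions such as accept/read/write/close, which end up as kernel system calls. For example, a single request is received through read, and write is used both for log output and for returning the HTTP result. Tracing `ssize_t read(int fd, void *buf, size_t count)` and `ssize_t write(int fd, const void *buf, size_t count)` therefore lets each call be converted into an event.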

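#### (2) Request/response correlation

An ordinary TCP request uses the same fd for the whole exchange, so a request and its response can be correlated by process ID and fd alone. The read and write system calls on the same socket fd give the time the application spent handling the request on that socket, and the request and its response are then wrapped into a troubleshooting trace. Business-level semantics such as latency and return code determine whether each eBPF troubleshooting trace is abnormal.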
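A minimal sketch of this correlation, assuming read/write events have already been delivered to user space. `SyscallEvent`, `ConnKey`, `Trace` and `Correlator` are hypothetical names chosen for illustration; the only idea taken from the text is pairing a read with the next write on the same (PID, fd).

```go
package main

import (
	"fmt"
	"time"
)

// SyscallEvent is a hypothetical record produced by a read/write probe.
type SyscallEvent struct {
	PID     uint32
	FD      int32
	IsWrite bool          // false: read (request side), true: write (response side)
	At      time.Duration // monotonic timestamp of the call
}

// ConnKey identifies one socket inside one process.
type ConnKey struct {
	PID uint32
	FD  int32
}

// Trace pairs a request with its response.
type Trace struct {
	Key     ConnKey
	Latency time.Duration
}

// Correlator remembers the pending read for each (PID, fd).
type Correlator struct {
	pending map[ConnKey]time.Duration
}

func NewCorrelator() *Correlator {
	return &Correlator{pending: make(map[ConnKey]time.Duration)}
}

// Observe consumes one event and returns a Trace when a write
// completes a pending read on the same socket.
func (co *Correlator) Observe(ev SyscallEvent) (Trace, bool) {
	key := ConnKey{PID: ev.PID, FD: ev.FD}
	if !ev.IsWrite {
		// Keep the first read of a request; later reads belonging to
		// the same request must not reset the start time.
		if _, seen := co.pending[key]; !seen {
			co.pending[key] = ev.At
		}
		return Trace{}, false
	}
	start, seen := co.pending[key]
	if !seen {
		// A write with no preceding read, e.g. log output on another fd.
		return Trace{}, false
	}
	delete(co.pending, key)
	return Trace{Key: key, Latency: ev.At - start}, true
}

func main() {
	co := NewCorrelator()
	co.Observe(SyscallEvent{PID: 100, FD: 7, At: 10 * time.Millisecond})
	resp := SyscallEvent{PID: 100, FD: 7, IsWrite: true, At: 25 * time.Millisecond}
	if trace, ok := co.Observe(resp); ok {
		fmt.Printf("pid=%d fd=%d latency=%v\n", trace.Key.PID, trace.Key.FD, trace.Latency)
	}
}
```

The sketch assumes one request in flight per fd; pipelined or multiplexed protocols would need a protocol-level request identifier instead.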

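#### (3) Request/response parsing

This is done through **protocol identification** and **protocol parsing**.

Protocol identification quickly matches a protocol by signature or keyword. For HTTP, the version string (HTTP/1.0 or HTTP/1.1) identifies the protocol quickly.

Protocol parsing produces the metrics used for later analysis; the payload has to be parsed according to the protocol's own format.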
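The two steps could look roughly like this for HTTP/1.x. It is a sketch under assumptions: the buffer is taken to hold the start of a captured read/write payload, and `HTTPMessage`, `IsHTTP` and `ParseHTTP` are illustrative names, not part of the project. Only the first line is inspected.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// HTTPMessage holds the fields taken from an HTTP/1.x start line.
type HTTPMessage struct {
	IsResponse bool
	Method     string // requests only
	Path       string // requests only
	Status     int    // responses only
}

var httpVersions = [][]byte{[]byte("HTTP/1.0"), []byte("HTTP/1.1")}

// firstLine returns the payload up to the first CRLF.
func firstLine(buf []byte) []byte {
	if i := bytes.Index(buf, []byte("\r\n")); i >= 0 {
		return buf[:i]
	}
	return buf
}

// IsHTTP is the identification step: it looks for the version
// string in the first line of the payload.
func IsHTTP(buf []byte) bool {
	line := firstLine(buf)
	for _, v := range httpVersions {
		if bytes.Contains(line, v) {
			return true
		}
	}
	return false
}

// ParseHTTP is the parsing step: it splits the start line into a
// request line ("GET /path HTTP/1.1") or a status line ("HTTP/1.1 200 OK").
func ParseHTTP(buf []byte) (HTTPMessage, bool) {
	if !IsHTTP(buf) {
		return HTTPMessage{}, false
	}
	fields := bytes.Fields(firstLine(buf))
	if len(fields) < 2 {
		return HTTPMessage{}, false
	}
	if bytes.HasPrefix(fields[0], []byte("HTTP/")) {
		code, err := strconv.Atoi(string(fields[1]))
		if err != nil {
			return HTTPMessage{}, false
		}
		return HTTPMessage{IsResponse: true, Status: code}, true
	}
	if len(fields) != 3 || !bytes.HasPrefix(fields[2], []byte("HTTP/")) {
		return HTTPMessage{}, false
	}
	return HTTPMessage{Method: string(fields[0]), Path: string(fields[1])}, true
}

func main() {
	payloads := [][]byte{
		[]byte("GET /productpage HTTP/1.1\r\nHost: productpage:9080\r\n\r\n"),
		[]byte("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"),
	}
	for _, p := range payloads {
		if msg, ok := ParseHTTP(p); ok {
			fmt.Printf("%+v\n", msg)
		}
	}
}
```

Matching on the version string is cheap enough to run on every captured buffer, which is why it works as a first filter before any fuller parsing.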

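## 3.2 Application-layer correlation

Containers are operating-system-level virtualization. On Linux they rely on namespaces to partition the system, usually combined with cgroups for resource control. Docker containers and Pods both depend on the process model and on cgroups as the resource-limiting mechanism. This project therefore starts from the process model and cgroups of Pods and Docker containers to study the working state of the work units of a container orchestration platform.

For Docker:

1. Kernel tracepoints for cgroups, including `cgroup:cgroup_setup_root` and `cgroup:cgroup_attach_task`, help debug container startup. The `BPF_PROG_TYPE_CGROUP_SKB` program type can also be attached to cgroup ingress and egress points to process network packets.

2. Containers can be distinguished by PID namespace: `pid_ns_for_children` in the `nsproxy` structure matches the PID namespace that the `/proc/PID/ns/pid_for_children` symbolic link points to.

3. Containers can be distinguished by UTS namespace: `uts_ns` in the `nsproxy` structure matches the container's nodename.

For Pods:

1. A Kubernetes Pod can be identified by its network namespace, since containers in the same Pod share one network namespace.
2. This can be verified through the Kubernetes API server: PodName gives PodStatus, which gives the Container IDs of the Pod; under Docker, each Container ID gives the corresponding PID and the child PIDs of the container's processes.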
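From user space, the namespace comparison described above can be approximated by reading the `/proc/<pid>/ns/*` symbolic links, whose targets look like `net:[4026531992]`. The sketch below is only an illustration of that idea on Linux; `ParseNsLink`, `NetNsInode` and `GroupByNetNs` are hypothetical helpers, not part of the project.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ParseNsLink splits a namespace link target such as "net:[4026531992]"
// into its type ("net") and inode number (4026531992).
func ParseNsLink(target string) (string, uint64, error) {
	kind, rest, found := strings.Cut(target, ":[")
	if !found || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad inode in %q: %w", target, err)
	}
	return kind, inode, nil
}

// NetNsInode returns the network namespace inode of a process (Linux only).
func NetNsInode(pid int) (uint64, error) {
	target, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/net", pid))
	if err != nil {
		return 0, err
	}
	_, inode, err := ParseNsLink(target)
	return inode, err
}

// GroupByNetNs buckets PIDs by network namespace. PIDs whose link
// cannot be read (no permission, process gone) are skipped.
func GroupByNetNs(pids []int) map[uint64][]int {
	groups := make(map[uint64][]int)
	for _, pid := range pids {
		inode, err := NetNsInode(pid)
		if err != nil {
			continue
		}
		groups[inode] = append(groups[inode], pid)
	}
	return groups
}

func main() {
	for inode, pids := range GroupByNetNs([]int{os.Getpid(), 1}) {
		fmt.Printf("netns %d: pids %v\n", inode, pids)
	}
}
```

Processes that share the host network namespace, for example Pods started with `hostNetwork: true`, all land in one bucket, so this grouping narrows down candidates and still needs the API server check from item 2.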

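Once this correlation is complete, events can be linked to the relevant Kubernetes resources, and the four-tuple of the network events performed on each fd can be extracted.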

## 3.3 Data collection and processing

Export the data in a format that is easy to process, such as JSON or CSV.
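As an illustration of the export step, the following sketch writes the same records as JSON Lines and as CSV using only the Go standard library. The `Record` type and its fields (`pod`, `pid`, `latency_ns`) are assumptions made for the example; the project's actual schema is not specified here.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Record is a hypothetical exported measurement.
type Record struct {
	Pod       string `json:"pod"`
	PID       uint32 `json:"pid"`
	LatencyNs int64  `json:"latency_ns"`
}

// WriteJSONLines writes one JSON object per line.
func WriteJSONLines(w io.Writer, recs []Record) error {
	enc := json.NewEncoder(w)
	for _, r := range recs {
		if err := enc.Encode(r); err != nil {
			return err
		}
	}
	return nil
}

// WriteCSV writes a header row followed by one row per record.
func WriteCSV(w io.Writer, recs []Record) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"pod", "pid", "latency_ns"}); err != nil {
		return err
	}
	for _, r := range recs {
		row := []string{
			r.Pod,
			strconv.FormatUint(uint64(r.PID), 10),
			strconv.FormatInt(r.LatencyNs, 10),
		}
		if err := cw.Write(row); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// Render runs one of the writers against an in-memory buffer.
func Render(write func(io.Writer, []Record) error, recs []Record) (string, error) {
	var sb strings.Builder
	if err := write(&sb, recs); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	recs := []Record{{Pod: "ratings-v1", PID: 4242, LatencyNs: 1500000}}
	for _, write := range []func(io.Writer, []Record) error{WriteJSONLines, WriteCSV} {
		out, err := Render(write, recs)
		if err != nil {
			panic(err)
		}
		fmt.Print(out)
	}
}
```

Both writers take an `io.Writer`, so the same code can target a file, standard output or an HTTP response.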

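# 4. Schedule

Preliminary research (June 15 - June 30)

* [ ] Learn BPF development and the cilium/ebpf library and framework;
* [ ] Learn how to call the Kubernetes client;
* [ ] Learn the basic call paths and flow of the Linux network stack;

Development phase 1 (July 1 - July 31)

* [ ] Collect HTTP-layer data with BPF: based on Golang's net/http library, use uprobes to obtain HTTP information; for HTTP/2, capture the plaintext data before HPACK compression;
* [ ] Through the Kubernetes client, correlate metrics with Docker containers and Pods on a single node (application-layer correlation);

Development phase 2 (August 1 - August 31)

* [ ] Trace the basic metrics with BPF;
* [ ] Under added load, or when the basic metrics are abnormal, trace the advanced metrics with BPF;

Development phase 3 (September 1 - September 30)

* [ ] Explore correlating real network requests and analyze the data for deeper causes of latency (for example, secondary effects of CPU- or memory-related kernel calls);
* [ ] Prepare development documentation, data visualization interfaces, and so on.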

10 changes: 10 additions & 0 deletions eBPF_Supermarket/cilium_ebpf_probe/go.mod
@@ -0,0 +1,10 @@
module github.com/WYuei/cilium_ebpf_probe

go 1.18

require (
github.com/cilium/ebpf v0.9.0
github.com/gorilla/mux v1.8.0
)

require golang.org/x/sys v0.0.0-20210906170528-6f6e22806c34 // indirect
12 changes: 12 additions & 0 deletions eBPF_Supermarket/cilium_ebpf_probe/go.sum
@@ -0,0 +1,12 @@
github.com/cilium/ebpf v0.9.0 h1:ldiV+FscPCQ/p3mNEV4O02EPbUZJFsoEtHvIr9xLTvk=
github.com/cilium/ebpf v0.9.0/go.mod h1:+OhNOIXx/Fnu1IE8bJz2dzOA+VSfyTfdNUVdlQnxUFY=
github.com/frankban/quicktest v1.14.0 h1:+cqqvzZV87b4adx/5ayVOaYZ2CrvM4ejQvUdBzPPUss=
github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ=
github.com/gorilla/mux v1.8.0 h1:i40aqfkR1h2SlN9hojwV5ZA91wcXFOvkdNIeFDP5koI=
github.com/gorilla/mux v1.8.0/go.mod h1:DVbg23sWSpFRCP0SfiEN6jmj59UnW/n46BH5rLB71So=
github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/rogpeppe/go-internal v1.6.1 h1:/FiVV8dS/e+YqF2JvO3yXRFbBLTIuSDkuC7aBOAvL+k=
golang.org/x/sys v0.0.0-20210906170528-6f6e22806c34 h1:GkvMjFtXUmahfDtashnc1mnrCtuBVcwse5QV2lUk/tI=
golang.org/x/sys v0.0.0-20210906170528-6f6e22806c34/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4=
32 changes: 32 additions & 0 deletions eBPF_Supermarket/cilium_ebpf_probe/headers/LICENSE.BSD-2-Clause
@@ -0,0 +1,32 @@
Valid-License-Identifier: BSD-2-Clause
SPDX-URL: https://spdx.org/licenses/BSD-2-Clause.html
Usage-Guide:
To use the BSD 2-clause "Simplified" License put the following SPDX
tag/value pair into a comment according to the placement guidelines in
the licensing rules documentation:
SPDX-License-Identifier: BSD-2-Clause
License-Text:

Copyright (c) <year> <owner> . All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
99 changes: 99 additions & 0 deletions eBPF_Supermarket/cilium_ebpf_probe/headers/bpf_endian.h
@@ -0,0 +1,99 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __BPF_ENDIAN__
#define __BPF_ENDIAN__

/*
* Isolate byte #n and put it into byte #m, for __u##b type.
* E.g., moving byte #6 (nnnnnnnn) into byte #1 (mmmmmmmm) for __u64:
* 1) xxxxxxxx nnnnnnnn xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx mmmmmmmm xxxxxxxx
* 2) nnnnnnnn xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx mmmmmmmm xxxxxxxx 00000000
* 3) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 nnnnnnnn
* 4) 00000000 00000000 00000000 00000000 00000000 00000000 nnnnnnnn 00000000
*/
#define ___bpf_mvb(x, b, n, m) ((__u##b)(x) << (b-(n+1)*8) >> (b-8) << (m*8))

#define ___bpf_swab16(x) ((__u16)( \
___bpf_mvb(x, 16, 0, 1) | \
___bpf_mvb(x, 16, 1, 0)))

#define ___bpf_swab32(x) ((__u32)( \
___bpf_mvb(x, 32, 0, 3) | \
___bpf_mvb(x, 32, 1, 2) | \
___bpf_mvb(x, 32, 2, 1) | \
___bpf_mvb(x, 32, 3, 0)))

#define ___bpf_swab64(x) ((__u64)( \
___bpf_mvb(x, 64, 0, 7) | \
___bpf_mvb(x, 64, 1, 6) | \
___bpf_mvb(x, 64, 2, 5) | \
___bpf_mvb(x, 64, 3, 4) | \
___bpf_mvb(x, 64, 4, 3) | \
___bpf_mvb(x, 64, 5, 2) | \
___bpf_mvb(x, 64, 6, 1) | \
___bpf_mvb(x, 64, 7, 0)))

/* LLVM's BPF target selects the endianness of the CPU
* it compiles on, or the user specifies (bpfel/bpfeb),
* respectively. The used __BYTE_ORDER__ is defined by
* the compiler, we cannot rely on __BYTE_ORDER from
* libc headers, since it doesn't reflect the actual
* requested byte order.
*
* Note, LLVM's BPF target has different __builtin_bswapX()
* semantics. It does map to BPF_ALU | BPF_END | BPF_TO_BE
* in bpfel and bpfeb case, which means below, that we map
* to cpu_to_be16(). We could use it unconditionally in BPF
* case, but better not rely on it, so that this header here
* can be used from application and BPF program side, which
* use different targets.
*/
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define __bpf_ntohs(x) __builtin_bswap16(x)
# define __bpf_htons(x) __builtin_bswap16(x)
# define __bpf_constant_ntohs(x) ___bpf_swab16(x)
# define __bpf_constant_htons(x) ___bpf_swab16(x)
# define __bpf_ntohl(x) __builtin_bswap32(x)
# define __bpf_htonl(x) __builtin_bswap32(x)
# define __bpf_constant_ntohl(x) ___bpf_swab32(x)
# define __bpf_constant_htonl(x) ___bpf_swab32(x)
# define __bpf_be64_to_cpu(x) __builtin_bswap64(x)
# define __bpf_cpu_to_be64(x) __builtin_bswap64(x)
# define __bpf_constant_be64_to_cpu(x) ___bpf_swab64(x)
# define __bpf_constant_cpu_to_be64(x) ___bpf_swab64(x)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define __bpf_ntohs(x) (x)
# define __bpf_htons(x) (x)
# define __bpf_constant_ntohs(x) (x)
# define __bpf_constant_htons(x) (x)
# define __bpf_ntohl(x) (x)
# define __bpf_htonl(x) (x)
# define __bpf_constant_ntohl(x) (x)
# define __bpf_constant_htonl(x) (x)
# define __bpf_be64_to_cpu(x) (x)
# define __bpf_cpu_to_be64(x) (x)
# define __bpf_constant_be64_to_cpu(x) (x)
# define __bpf_constant_cpu_to_be64(x) (x)
#else
# error "Fix your compiler's __BYTE_ORDER__?!"
#endif

#define bpf_htons(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_htons(x) : __bpf_htons(x))
#define bpf_ntohs(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_ntohs(x) : __bpf_ntohs(x))
#define bpf_htonl(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_htonl(x) : __bpf_htonl(x))
#define bpf_ntohl(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_ntohl(x) : __bpf_ntohl(x))
#define bpf_cpu_to_be64(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_cpu_to_be64(x) : __bpf_cpu_to_be64(x))
#define bpf_be64_to_cpu(x) \
(__builtin_constant_p(x) ? \
__bpf_constant_be64_to_cpu(x) : __bpf_be64_to_cpu(x))

#endif /* __BPF_ENDIAN__ */