cpu_watcher: add tool documentation, update runtime environment #762

Merged
merged 7 commits into from
Apr 16, 2024
1 change: 1 addition & 0 deletions eBPF_Supermarket/CPU_Subsystem/blazesym
Submodule blazesym added at 90eb4e
1 change: 1 addition & 0 deletions eBPF_Supermarket/CPU_Subsystem/bpftool
Submodule bpftool added at 06c61e
11 changes: 9 additions & 2 deletions eBPF_Supermarket/CPU_Subsystem/cpu_watcher/README.md
@@ -89,6 +89,10 @@ kworker/u256:1 15144 13516 1131
node 14221 2589 3355
```

How it works:

[Analysis of preemptive scheduling](https://github.com/vvzxy/lmp/blob/develop/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/preempt_time.md)

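### 3.**Measuring scheduling latency:**

Analyzes the scheduling latency of processes in the system and reports statistics: the maximum, minimum, and average scheduling latency of the current system.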
@@ -107,6 +111,9 @@ node 14221 2589 3355
17:31:35 362.039000 217053.545000 6.462000
17:31:36 373.751000 217053.545000 6.462000
```
How it works:

[Analysis of scheduling latency](https://github.com/vvzxy/lmp/blob/develop/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/schedule_delay.md)

### 4.**Measuring system call response time:**

@@ -142,7 +149,7 @@ Time Pid comm syscall_id delay/us

How it works:

[lmp/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/mq_delay功能介绍.md at develop · albertxu216/lmp (github.com)](https://github.com/albertxu216/lmp/blob/develop/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/mq_delay功能介绍.md)
[Analysis of message queue latency](https://github.com/albertxu216/lmp/blob/develop/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/mq_delay功能介绍.md)

### 6.Measuring the execution time of the kernel function schedule()

@@ -222,4 +229,4 @@ per_len = 1000

If you are also interested in cpu_watcher or eBPF, you are welcome to join us in developing the cpu_watcher tool. We hope we can grow together.

**cpu_watcher maintainers:**[email protected] [email protected] [email protected]
**cpu_watcher maintainers:** [email protected] [email protected] [email protected]
44 changes: 44 additions & 0 deletions eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/preempt_time.md
@@ -0,0 +1,44 @@
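## Introduction to the preempt_time tool

preempt_time measures the time taken by each preemptive scheduling event in the system.

### How it works

A BTF raw tracepoint is used to monitor every scheduling event in the kernel: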

```c
SEC("tp_btf/sched_switch")
```

The main difference between a BTF raw tracepoint and a regular raw tracepoint is that the BTF version can access kernel memory directly from the eBPF program, whereas a regular raw tracepoint has to go through helpers such as `bpf_core_read` or `bpf_probe_read_kernel`.

```c
int BPF_PROG(sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next)
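```

This event provides the preemption-related argument `preempt`; by checking its value we decide whether to record the current scheduling event.

The second attach point is kprobe:finish_task_switch, the function that performs the cleanup work once the context switch has completed. At that point, subtracting the timestamp previously recorded in the eBPF map from the current time gives the duration of this preemptive switch: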

```c
SEC("kprobe/finish_task_switch")
```
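
The bookkeeping described above can be sketched in plain userspace C. This is not the tool's eBPF source: the per-CPU array stands in for the eBPF map, and the names `pending`, `on_sched_switch`, `on_finish_task_switch`, and `MAX_CPUS`, as well as the choice to key the record by CPU, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 8 /* hypothetical bound for this sketch */

/* Stands in for the eBPF map: one pending preemption record per CPU. */
struct preempt_record {
    bool     valid;
    int      prev_pid;  /* task that was preempted */
    int      next_pid;  /* task that preempted it */
    uint64_t start_ns;  /* timestamp taken at sched_switch */
};

static struct preempt_record pending[MAX_CPUS];

/* Runs where the tp_btf/sched_switch program would run. */
void on_sched_switch(int cpu, bool preempt, int prev_pid, int next_pid,
                     uint64_t now_ns)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (!preempt)
        return; /* only preemptive switches are recorded */

    pending[cpu].valid    = true;
    pending[cpu].prev_pid = prev_pid;
    pending[cpu].next_pid = next_pid;
    pending[cpu].start_ns = now_ns;
}

/* Runs where the kprobe/finish_task_switch program would run.
 * Returns the preemption duration in ns, or 0 if nothing was pending. */
uint64_t on_finish_task_switch(int cpu, uint64_t now_ns)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    struct preempt_record *rec = &pending[cpu];

    if (!rec->valid)
        return 0;

    rec->valid = false; /* consume the record so it is reported once */
    return now_ns - rec->start_ns;
}
```

In this sketch a non-preemptive switch leaves no record, so the matching `on_finish_task_switch` call reports nothing; only switches flagged with `preempt` produce a duration.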

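### Sample output

The output shows the `pid` and name of the preempting process, the `pid` of the preempted process, and the duration of the preemption in nanoseconds.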

```
COMM prev_pid next_pid duration_ns
node 14221 2589 3014
kworker/u256:1 15144 13516 1277
node 14221 2589 3115
kworker/u256:1 15144 13516 1125
kworker/u256:1 15144 13516 974
node 14221 2589 2560
kworker/u256:1 15144 13516 1132
node 14221 2589 2717
kworker/u256:1 15144 13516 1206
kworker/u256:1 15144 13516 1131
node 14221 2589 3355
```
@@ -0,0 +1,57 @@
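## Introduction to the schedule_delay tool

The schedule_delay tool measures the system's current scheduling latency, that is, the time from when a task becomes runnable until it actually executes (is given the CPU).

Observing this metric in real time helps us understand the current load on the operating system.

### How it works

We only need to consider when a task is added to the run queue to wait for execution. The kernel provides two functions for this:

- A newly created process is added to the runqueue by a call to `wake_up_new_task` and waits to be scheduled.
- When a process is woken from sleep, `ttwu_do_wakeup` is called and the process enters the runqueue to wait to be scheduled.

The kernel provides a corresponding `tracepoint` for each of these two functions:

| Kernel function | Corresponding `tracepoint` |
| :--------------: | :--------------------: |
| wake_up_new_task | sched:sched_wakeup_new |
| ttwu_do_wakeup | sched:sched_wakeup |

When either tracepoint fires, we record the process's information and the time it entered the run queue.

In addition, when a process is **forced off the CPU**, its state is still `TASK_RUNNING`, so at schedule time we must also check whether that process should be recorded.

| Kernel function | Corresponding `tracepoint` |
| :------: | :----------------: |
| schedule | sched:sched_switch |

When this tracepoint fires, we record the process that is about to take the CPU and subtract the run-queue entry time stored in the eBPF map from the current time; the difference is the scheduling latency. Here we also need to check whether the previous process should be recorded.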

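```c
if (prev_state == TASK_RUNNING) // check the state of the process leaving the CPU
```

Finally, to keep the map from filling up, we also need to delete a process's entry from the map when the process exits.

| Kernel function | Corresponding `tracepoint` |
| :------: | :----------------------: |
| do_exit | sched:sched_process_exit |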
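
The four hooks described in this section can be sketched as a small userspace model. This is an illustration only, not the tool's eBPF code: the fixed-size array `enqueue_ns` stands in for the eBPF hash map, and the names `on_wakeup`, `on_switch`, `on_exit_task`, and `MAX_PID` are hypothetical. The sketch also assumes timestamps are never 0, because 0 is used to mean "no entry".

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PID      32768 /* hypothetical table size for this sketch */
#define TASK_RUNNING 0     /* same value as the kernel's TASK_RUNNING */

/* Stands in for the eBPF hash map: pid -> time the task joined the
 * runqueue. A value of 0 means "no entry". */
static uint64_t enqueue_ns[MAX_PID];

/* sched_wakeup / sched_wakeup_new: the task becomes runnable. */
void on_wakeup(int pid, uint64_t now_ns)
{
    assert(pid >= 0 && pid < MAX_PID);
    enqueue_ns[pid] = now_ns;
}

/* sched_switch: returns the scheduling latency of next_pid in ns,
 * or 0 if no enqueue time was recorded for it. */
uint64_t on_switch(int prev_pid, long prev_state, int next_pid,
                   uint64_t now_ns)
{
    assert(prev_pid >= 0 && prev_pid < MAX_PID);
    assert(next_pid >= 0 && next_pid < MAX_PID);

    /* A task forced off the CPU is still runnable, so it starts
     * waiting in the runqueue again from this moment. */
    if (prev_state == TASK_RUNNING)
        enqueue_ns[prev_pid] = now_ns;

    uint64_t start = enqueue_ns[next_pid];
    if (start == 0)
        return 0;

    enqueue_ns[next_pid] = 0; /* consume the entry */
    return now_ns - start;
}

/* sched_process_exit: drop the entry so stale pids do not pile up. */
void on_exit_task(int pid)
{
    assert(pid >= 0 && pid < MAX_PID);
    enqueue_ns[pid] = 0;
}
```

A task that wakes at time 1000 and is switched in at time 1400 yields a latency of 400 ns. A task that is switched out while still `TASK_RUNNING` is timed again from the moment it left the CPU.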

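### Sample output

We can observe the system's average, maximum, and minimum scheduling latency from the moment the eBPF program is loaded until now: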

```
TIME avg_delay/μs max_delay/μs min_delay/μs
17:31:28 35.005000 97.663000 9.399000
17:31:29 326.518000 12618.465000 7.994000
17:31:30 455.837000 217053.545000 6.462000
17:31:31 422.582000 217053.545000 6.462000
17:31:32 382.627000 217053.545000 6.462000
17:31:33 360.499000 217053.545000 6.462000
17:31:34 364.805000 217053.545000 6.462000
17:31:35 362.039000 217053.545000 6.462000
17:31:36 373.751000 217053.545000 6.462000
```
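
Because the statistics cover the whole period since the program was loaded, the maximum can only grow and the minimum can only shrink, which matches the constant `max_delay` and `min_delay` columns in the later rows above. A minimal sketch of such a running accumulator follows; the struct and function names are hypothetical, and the tool's actual aggregation code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Running statistics over every latency sample seen so far. */
struct delay_stats {
    uint64_t count;
    uint64_t sum_ns;
    uint64_t max_ns;
    uint64_t min_ns;
};

/* Fold one latency sample (in ns) into the running statistics. */
void stats_update(struct delay_stats *s, uint64_t delay_ns)
{
    assert(s);
    if (s->count == 0 || delay_ns < s->min_ns)
        s->min_ns = delay_ns;
    if (delay_ns > s->max_ns)
        s->max_ns = delay_ns;
    s->sum_ns += delay_ns;
    s->count++;
}

/* Average latency in microseconds, or 0.0 before the first sample. */
double stats_avg_us(const struct delay_stats *s)
{
    assert(s);
    if (s->count == 0)
        return 0.0;
    return (double)s->sum_ns / (double)s->count / 1000.0;
}
```

Starting from a zero-initialized `struct delay_stats`, each latency computed at `sched_switch` would be passed to `stats_update`, and the reporting loop would print `stats_avg_us` together with the stored maximum and minimum once per second.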
