Use netdata to collect interface statistics from a remote Linux system.
I wrote this tool because I wanted interface statistics with one-second granularity on EdgeRouter.
- Run ifstat.py on the Linux system whose interface statistics we want to collect
  - Modify INTERFACES, LISTEN and PORT according to your needs
- Deploy the netdata plugin and config to a running netdata server
  - Copy netdata/ifstat.chart.py to netdata's plugins.d/python.d.plugin directory
  - Edit netdata/ifstat.conf and then copy it to the etc/netdata/python.d directory
- ifstat.py starts a simple socket server on the remote Linux system
  - Upon receiving a request, it reads various files under /sys/class/net/<itf>/statistics/, combines them into CSV format, and sends the result back
- netdata/ifstat.chart.py is a netdata plugin written in Python
  - It sends requests to the above socket server defined in netdata/ifstat.conf, parses the response, and then generates output for netdata
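The server side described above can be sketched roughly as follows. This is a minimal illustration, not the actual ifstat.py: the values of INTERFACES, LISTEN and PORT, the COUNTERS list, the CSV column layout, and the helper names read_stats, StatsHandler and serve are all assumptions made for the example.

```python
import os
import socketserver

# Assumed values; the README only names INTERFACES, LISTEN and PORT.
INTERFACES = ["eth0", "eth1"]
LISTEN = "0.0.0.0"
PORT = 9999
# Hypothetical choice of counters; the real tool may read other files.
COUNTERS = ["rx_bytes", "tx_bytes", "rx_packets", "tx_packets"]


def read_stats(interfaces, counters=COUNTERS, root="/sys/class/net"):
    """Return one CSV line per interface: name,counter1,counter2,..."""
    lines = []
    for itf in interfaces:
        values = []
        for counter in counters:
            path = os.path.join(root, itf, "statistics", counter)
            with open(path) as f:
                values.append(f.read().strip())
        lines.append(",".join([itf] + values))
    return "\n".join(lines) + "\n"


class StatsHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Any incoming data counts as a request; reply with fresh counters.
        self.request.recv(64)
        self.request.sendall(read_stats(INTERFACES).encode())


class ReusableTCPServer(socketserver.TCPServer):
    # Allow quick restarts without waiting for the old socket to time out.
    allow_reuse_address = True


def serve():
    """Serve requests until interrupted; one reply per connection."""
    with ReusableTCPServer((LISTEN, PORT), StatsHandler) as server:
        server.serve_forever()
```

Calling serve() on the remote system starts the listener. Because the counters are read only when a request arrives, every reply reflects the state at the moment netdata asked for it.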
Because data collection is triggered by netdata, we always get the most current data. This avoids getting data that is not aligned with the collection frequency. We can also stop collection simply by removing the socket server from the config, without touching the remote Linux system.
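Seen from the collector side, the pull model could look something like the sketch below. It is only an illustration under assumptions: the request payload, the "name,value,value" CSV layout, the expectation that the server closes the connection after replying, and the function names fetch_stats and parse_stats are all hypothetical and not taken from netdata/ifstat.chart.py.

```python
import socket


def fetch_stats(host, port, timeout=1.0):
    """Send one request and return the raw CSV reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"get\n")  # hypothetical request payload
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def parse_stats(text):
    """Parse 'itf,v1,v2,...' lines into {itf: [int, ...]}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, *values = line.split(",")
        result[name] = [int(v) for v in values]
    return result
```

Each collection cycle would call fetch_stats once and feed the reply to parse_stats, so the remote system does no work unless netdata asks for data.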
CPU usage, measured by watching the output of top:
- For ER-X, collecting 4 interfaces takes less than 1.3% CPU usage
- For ER-12, collecting 8 interfaces takes 1% ~ 2% CPU usage
Memory (RES) used is less than 4800 KiB.
When hardware offloading is enabled, rx/tx byte and packet statistics may be far below the actual values. My guess is that packets are forwarded directly by hardware and are therefore not counted on some interfaces.
For me, the 50 us reduction in NAT+forwarding latency is more valuable than correct statistics, so I keep offloading enabled and disable it only when I need correct statistics.
After deploying the Python plugin, run the following command:
bash plugins.d/python.d.plugin ifstat debug trace
If everything works, we will see lines starting with BEGIN, SET, etc., which are the output intended for netdata.
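For orientation, the BEGIN and SET lines belong to netdata's text protocol for external plugins, where each update of a chart is wrapped in BEGIN ... END with one SET line per dimension. The sketch below only shows the general shape of such an update; the chart id ifstat.eth0_bytes and the dimension names are hypothetical, and in practice the python.d framework prints these lines on the module's behalf.

```python
def format_update(chart_id, values):
    """Build one BEGIN/SET/END update block for a single chart."""
    lines = ["BEGIN {}".format(chart_id)]
    for dimension, value in values.items():
        lines.append("SET {} = {}".format(dimension, value))
    lines.append("END")
    return "\n".join(lines)


# Hypothetical chart id and dimension names, for illustration only.
update = format_update("ifstat.eth0_bytes", {"rx_bytes": 1000, "tx_bytes": 2000})
print(update)
```

If the debug run prints blocks of this shape every second, the plugin is reaching the socket server and parsing its replies.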
The approach I first used turned out to be wrong. In that approach:
- The remote device generates data in a while loop, sleeping a little less than 1 second between successive generations
- The netdata plugin collects data every 1 second
The key problem is that the generated data does not align with the netdata collection interval, which may lead to wrong statistics values. Consider the following scenario (suppose time starts at 0 seconds):
- data generated at 0.01 second
- netdata collects data at 1.00 second
- data generated at 1.01 seconds
- netdata collects data at 2.02 seconds
In this case, the collection at 1.00 second reads the sample generated at 0.01 second, while the collection at 2.02 seconds can already read the sample generated at about 2.01 seconds. The difference between the two collected values then represents a 2-second interval instead of 1 second, so we could see spurious spikes in the chart.
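The effect can be reproduced with a few lines of arithmetic. The sketch below assumes a constant traffic rate of 1000 bytes per second and uses the timestamps from the scenario, extended with assumed generator runs at 2.01 and 3.01 seconds and an assumed third collection at 3.02 seconds; times are in milliseconds to keep the arithmetic exact.

```python
RATE = 1000  # assumed constant traffic, in bytes per second


def latest_sample(gen_times_ms, t_ms):
    """Counter value of the newest sample generated at or before t_ms."""
    newest = max(g for g in gen_times_ms if g <= t_ms)
    return RATE * newest // 1000


gen_times_ms = [10, 1010, 2010, 3010]  # generator runs (2010, 3010 assumed)
collect_times_ms = [1000, 2020, 3020]  # netdata collections (3020 assumed)

collected = [latest_sample(gen_times_ms, t) for t in collect_times_ms]
# Per-collection increase, which netdata would chart as bytes per second.
deltas = [b - a for a, b in zip(collected, collected[1:])]
print(deltas)  # → [2000, 1000]
```

The first delta is twice the true rate even though the traffic never changed, which is exactly the kind of false spike described above.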
The proper way to do this is described in the documentation. The core idea is to collect data at an exactly constant rate.
So it is better to let the netdata plugin's collection trigger data generation. That is why I moved to a Python socket server for monitoring EdgeRouter interfaces. I do not use SNMP because of the update latency of snmpd.
For reference, obselect/ifstat.sh is kept as an example of collecting data separately.