Add the ethtool counters related to RDMA/ROCE #3137

Open
gangxie112 opened this issue Sep 29, 2024 · 4 comments

Comments

@gangxie112

Hi,

It seems that some important ethtool metrics related to RDMA/RoCE are not supported, such as tx.pause.ctrl.phy, rx.prio5.pause, etc. Those counters are very important in RoCE networks and are included in the physical/priority port counters.

So, do we have any plan to support them?

@discordianfish
Member

Dunno how ethtool retrieves them, but if there is a way to retrieve them without requiring privileges, we're open to a PR for that.

@dswarbrick
Contributor

dswarbrick commented Oct 1, 2024

Is tx_pause_ctrl_phy vendor- or model-specific? The only reference to it I can find is for the Mellanox ConnectX series of NICs, which use the mlx5 driver: https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/mellanox/mlx5/counters.html

In addition to the basic set of ethtool counters which are mature and implemented by pretty much every NIC, there are also quite a few vendor-specific ethtool stats / options.
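
For illustration, here is a minimal sketch of how vendor-specific counters could be picked out of `ethtool -S <iface>` style output. The sample text follows the usual `name: value` layout, but the values are made up, and the function names and the substring filter on "pause" are assumptions for this example, not part of node_exporter.

```python
# Illustrative sample only: the layout follows `ethtool -S <iface>` output,
# but the counter values below are invented for the example.
SAMPLE = """\
NIC statistics:
     rx_packets: 1024
     tx_pause_ctrl_phy: 7
     rx_prio5_pause: 3
"""


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style text into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so the value is always the final field.
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything non-numeric.
        if not sep or not name or not value.isdigit():
            continue
        stats[name.strip()] = int(value)
    return stats


def pause_counters(stats):
    """Keep only counters whose name mentions 'pause' (hypothetical filter)."""
    return {name: value for name, value in stats.items() if "pause" in name}


if __name__ == "__main__":
    for name, value in sorted(pause_counters(parse_ethtool_stats(SAMPLE)).items()):
        print(name, value)
```

Which names show up at all depends on the driver, so a real filter would probably need to be configurable per NIC rather than hard-coded.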

@gangxie112
Author

Yes, those metrics are proprietary to specific NIC vendors. But since some NICs are widely used, we should at least consider some other way to support them, such as adding a plugin framework. At this time, users have to develop an agent to gather and push the metrics. This is the typical approach adopted by many cloud providers, as far as I know.

@dswarbrick
Contributor

The textfile collector feature is arguably the "plugin framework" in node_exporter.
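
As a rough sketch of that approach: a small script can render already-collected counters in the Prometheus text format and write them to a `.prom` file in the directory passed to `--collector.textfile.directory`. The metric name `ethtool_vendor_counter`, the label names, and the function names are hypothetical choices for this example; the counter values are made up.

```python
import os
import tempfile


def render_prom(counters, device, metric="ethtool_vendor_counter"):
    """Render {counter_name: value} as Prometheus text exposition lines.

    The metric and label names are assumptions made for this sketch.
    """
    lines = [
        f"# HELP {metric} Vendor-specific ethtool counter.",
        f"# TYPE {metric} counter",
    ]
    for name in sorted(counters):
        lines.append(
            f'{metric}{{device="{device}",counter="{name}"}} {counters[name]}'
        )
    return "\n".join(lines) + "\n"


def write_prom(text, path):
    """Write atomically so the collector never reads a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # mkstemp creates the file as 0600; relax it so another user can read it.
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


if __name__ == "__main__":
    # Made-up values; in practice these would come from `ethtool -S`.
    sample = {"tx_pause_ctrl_phy": 7, "rx_prio5_pause": 3}
    print(render_prom(sample, "eth0"), end="")
```

Running something like this from cron or a systemd timer would keep the file fresh. The temporary file uses a `.tmp` suffix on purpose, since the textfile collector only picks up files ending in `.prom`.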

Implementing support natively for vendor- / hardware-specific counters is tricky without having access to said hardware for testing. I would suggest either attempting to implement this yourself (assuming that you have access to such hardware, and are a reasonably proficient Go developer), or loan some hardware to a developer who is willing to do the work.
