Handling peer down/up events seems harder to parse in scaling tests bouncing bgp peers #74

Open
3fr61n opened this issue Jul 20, 2018 · 1 comment

3fr61n commented Jul 20, 2018

Hi @TimEvens

First of all, excellent project! Quite interesting.

I was doing some scaling tests and noticed that, for a router handling ~100 peers with ~3000 routes per peer, when I bounce all BGP sessions (or restart the BMP collector), it takes the collector a long time (~40-50 min) to dump all the information (BGP peer down/up events and BGP route updates) to Kafka.

Checking the logs on the OpenBMP collector, it seems that BGP peer down/up events take a lot longer to process than BGP route updates. Is this expected?

For instance, the following log entries are 10 seconds apart, which suggests that it takes 10 seconds to process each peer down event...

2018-07-20T11:07:24.445246 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.5: BGP peer down notification with reason code: 1
2018-07-20T11:07:34.456889 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.7: BGP peer down notification with reason code: 1
2018-07-20T11:07:44.464163 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.37: BGP peer down notification with reason code: 1
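
The spacing between these entries can be measured directly from the timestamps. Below is a minimal Python sketch, assuming the pipe-separated log layout shown above and IPv4 peer addresses; `peer_down_gaps` is a hypothetical helper written for illustration and is not part of OpenBMP.

```python
from datetime import datetime

# The three collector log lines quoted above.
LOG_LINES = """\
2018-07-20T11:07:24.445246 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.5: BGP peer down notification with reason code: 1
2018-07-20T11:07:34.456889 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.7: BGP peer down notification with reason code: 1
2018-07-20T11:07:44.464163 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.37: BGP peer down notification with reason code: 1
""".splitlines()


def peer_down_gaps(lines):
    """Return (peer, seconds since the previous peer-down line) pairs.

    Assumes the layout "timestamp | level | function | message" and an
    IPv4 peer address in the message (an IPv6 address would need a
    different split, since it contains colons).
    """
    events = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4 or fields[2] != "parsePeerDownEventHdr":
            continue
        ts = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%f")
        # Message looks like "sock=16 : 10.0.0.5: BGP peer down ..."
        peer = fields[3].split(" : ", 1)[1].split(":", 1)[0]
        events.append((peer, ts))
    # Pair each event with its predecessor and take the time difference.
    return [
        (peer, (ts - prev_ts).total_seconds())
        for (_, prev_ts), (peer, ts) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for peer, gap in peer_down_gaps(LOG_LINES):
        print(f"{peer}: {gap:.3f} s after the previous peer down")
```

Run over a full collector log, this shows whether the ~10 s spacing is uniform across all peers or limited to a few of them.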

Meanwhile, route updates go quite fast...

Checking logs and counters on the router side, the router dumps all BMP events in just ~3-4 min. However, the collector does not begin processing any route updates until all peer down/up events have been processed. Is this also expected?

Thanks in advance, and regards

3fr61n changed the title from "Handling peer down/up events seems in scaling tests bouncing bgp peers" to "Handling peer down/up events seems harder to parse in scaling tests bouncing bgp peers" on Jul 20, 2018

TimEvens (Contributor) commented

The 10-second gap between peer down events must be on the router side. The collector does not cache or store anything (e.g. it does not maintain a RIB). The collector is just a real-time pass-through of BMP/BGP messages. Any delay we would see is on the consumer side (e.g. a DB such as Postgres or MySQL). The OpenBMP log messages indicating a 10-second gap must be caused by the router/sender. Which router/version are you using? Can you send me a pcap trace at [email protected]?
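
One way to check where the gap originates, alongside a pcap trace, is to compare the timestamp the router writes into each BMP per-peer header with the time the message reaches the collector. The sketch below is an illustration only, assuming BMP version 3 framing as laid out in RFC 7854 (6-byte common header followed by a 42-byte per-peer header); `parse_bmp_headers` and the constant names are hypothetical and are not OpenBMP code.

```python
import ipaddress
import struct
from datetime import datetime, timedelta, timezone

# Message type codes from RFC 7854.
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

# Common header: version (1 byte), message length (4), message type (1).
COMMON_HDR = struct.Struct("!BIB")
# Per-peer header: peer type, flags, distinguisher (8), address (16),
# peer AS (4), peer BGP ID (4), timestamp seconds (4), microseconds (4).
PER_PEER_HDR = struct.Struct("!BB8s16sI4sII")

V_FLAG = 0x80  # set in the peer flags when the peer address is IPv6


def parse_bmp_headers(data):
    """Decode the common header and, where present, the per-peer header.

    Raises struct.error if `data` is shorter than the headers it claims.
    """
    version, length, msg_type = COMMON_HDR.unpack_from(data, 0)
    info = {
        "version": version,
        "length": length,
        "type": BMP_MSG_TYPES.get(msg_type, "Unknown"),
    }
    if msg_type in (4, 5):
        # Initiation and Termination carry no per-peer header.
        return info

    (_peer_type, flags, _dist, addr, peer_as, _bgp_id,
     secs, usecs) = PER_PEER_HDR.unpack_from(data, COMMON_HDR.size)
    if flags & V_FLAG:
        peer = str(ipaddress.IPv6Address(addr))
    else:
        # IPv4 addresses occupy the last 4 bytes of the 16-byte field.
        peer = str(ipaddress.IPv4Address(addr[12:]))
    info.update(
        peer=peer,
        peer_as=peer_as,
        # Router-side timestamp; it may be zero if the router does not set it.
        timestamp=datetime.fromtimestamp(secs, tz=timezone.utc)
        + timedelta(microseconds=usecs),
    )
    return info
```

If consecutive Peer Down Notification messages already carry `timestamp` values about 10 seconds apart, the spacing is present when the router generates them; if the router timestamps are close together but the arrival times are not, the delay is introduced later in the path.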
