Before anything, excellent project! Quite interesting.
I was doing some scaling tests and noticed that for a router handling ~100 peers with ~3000 routes per peer, when I bounced all BGP sessions (or restarted the BMP collector), it takes a long time (~40-50 min) for the collector to dump all the information onto Kafka (BGP peer down/up events and BGP route updates).
Checking the logs on the openbmp collector, it seems that BGP peer down/up events take much longer to process than the BGP route updates. (Is this expected?)
For instance, the following logs show that it takes ~10 seconds to process each peer down event:
2018-07-20T11:07:24.445246 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.5: BGP peer down notification with reason code: 1
2018-07-20T11:07:34.456889 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.7: BGP peer down notification with reason code: 1
2018-07-20T11:07:44.464163 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.37: BGP peer down notification with reason code: 1
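The ~10 second spacing can be confirmed directly from the log timestamps. A minimal sketch (log lines abbreviated to the fields that matter here):

```python
from datetime import datetime

# Abbreviated copies of the collector log lines above; only the
# leading ISO timestamp field is used.
LOG_LINES = [
    "2018-07-20T11:07:24.445246 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.5",
    "2018-07-20T11:07:34.456889 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.7",
    "2018-07-20T11:07:44.464163 | NOTICE | parsePeerDownEventHdr | sock=16 : 10.0.0.37",
]

def event_gaps(lines):
    """Return the gap in seconds between consecutive log timestamps."""
    stamps = [datetime.fromisoformat(line.split(" | ")[0]) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

print(event_gaps(LOG_LINES))  # two gaps, each just over 10 seconds
```

With ~100 peers, gaps like these alone account for roughly 100 × 10 s ≈ 17 min of the observed delay.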
Meanwhile, route updates go quite fast...
Checking the router side using logs and counters, the router dumps all BMP events in just ~3-4 min; however, until all peer down/up events are processed, the collector does not begin any route update processing. (Is this also expected?)
Thanks in advance and regards
The 10-second gap between peer down events must be on the router side. The collector does not cache or store anything (e.g. maintain a RIB); it is just a real-time pass-through of BMP/BGP messages. Any delay we would normally see is on the consumer side (e.g. a DB such as Postgres or MySQL). The openbmp log messages indicating a 10-second gap must be caused by the router/sender. Which router/version are you using? Can you send me a pcap trace at [email protected]?
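To check from a pcap whether the gaps are already present in the sender's byte stream, it helps that the BMP v3 common header (RFC 7854) is just version (1 byte), total length (4 bytes), and message type (1 byte), so the stream can be split into messages without decoding any BGP. A minimal sketch, not the collector's actual code:

```python
import struct

# BMP v3 message types from RFC 7854
BMP_ROUTE_MONITORING = 0
BMP_PEER_DOWN = 2

def split_bmp_messages(stream: bytes):
    """Yield (msg_type, body) for each BMP v3 message in a byte stream.

    The 6-byte common header is version(1) + length(4) + type(1),
    where length covers the whole message including the header.
    """
    offset = 0
    while offset + 6 <= len(stream):
        version, length, msg_type = struct.unpack_from("!BIB", stream, offset)
        if version != 3 or length < 6:
            raise ValueError("not a BMP v3 message")
        yield msg_type, stream[offset + 6 : offset + length]
        offset += length

# Illustrative only: a bare Peer Down common header with an empty body.
# A real Peer Down message also carries the 42-byte per-peer header.
msg = struct.pack("!BIB", 3, 6, BMP_PEER_DOWN)
print(list(split_bmp_messages(msg)))  # [(2, b'')]
```

Combining this with the pcap's per-packet timestamps would show whether the Peer Down messages really arrive ~10 s apart on the wire, which is what the reply above predicts.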