This repository has been archived by the owner on Oct 10, 2023. It is now read-only.
Event Streams access controller loops on failure causing other clients to be disconnected #158
Labels
bug
Issue Description
Consumers connected to Event Streams 2019.N see frequent consumer group re-balances caused by heartbeat expiration, after an application using an invalid API key was introduced to the environment.
Multi-second gaps are seen in client-side logs, at trace level, between a heartbeat being sent to the Kafka Broker and the response to that heartbeat being received.
A delay is also seen between the heartbeat being sent by the client and the log line in the Kafka Broker showing the heartbeat has arrived at the broker.
This caused the client heartbeat to expire, as it was not received by the broker within the time specified by session.timeout.ms.
Similar errors were seen with the poll interval (max.poll.interval.ms) timing out.
Issue Resolution
Kafka has a number of Processor threads, which are used for processing work arriving at the broker.
Heartbeats and authorization checks were being processed on the same thread.
The custom Kafka authorizer calls out to the Event Streams Access Controller, which in turn calls out to IAM. If IAM returns an error, Access Controller retries twice more, with a 3-second gap between retries.
The processing thread was blocking on the many failing authorization calls. This delayed the processing of heartbeats queued behind them, which led to heartbeat expiration and slow responses to the client.
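The blocking described above can be estimated with some rough arithmetic. The sketch below is illustrative only: the function name is hypothetical, the 10-second session timeout is an assumed value (the issue does not state the configured session.timeout.ms), and the time IAM takes to answer each call is ignored, so the result is a lower bound.

```python
def heartbeat_delay_seconds(failing_auth_calls_ahead: int,
                            attempts: int = 3,
                            retry_gap_seconds: float = 3.0) -> float:
    """Lower bound on how long a heartbeat waits behind failing auth calls.

    Each failing call makes `attempts` tries (one initial call plus two
    retries, as described in this issue) and sleeps between consecutive tries.
    """
    blocked_per_call = (attempts - 1) * retry_gap_seconds
    return failing_auth_calls_ahead * blocked_per_call


# Assumption: a 10 s session timeout, chosen only for illustration.
ASSUMED_SESSION_TIMEOUT_SECONDS = 10.0

one_call = heartbeat_delay_seconds(1)   # 6.0 s, still within the assumed timeout
two_calls = heartbeat_delay_seconds(2)  # 12.0 s, exceeds the assumed timeout
print(one_call, one_call > ASSUMED_SESSION_TIMEOUT_SECONDS)
print(two_calls, two_calls > ASSUMED_SESSION_TIMEOUT_SECONDS)
```

Under these assumptions, two failing authorization calls queued ahead of a heartbeat on the same thread are enough to expire the session.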
The fix was to update Access Controller so that it does not retry when the returned error says the API key is invalid.
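The idea behind the fix can be sketched as retry logic that separates errors worth retrying from errors that will never succeed. This is not the Access Controller source. The exception classes, function name and parameters are hypothetical, and only the retry count and the 3-second gap come from this issue.

```python
import time
from typing import Callable


class InvalidApiKeyError(Exception):
    """Hypothetical: IAM reported that the API key itself is invalid."""


class TransientIamError(Exception):
    """Hypothetical: IAM failed in a way that might succeed on retry."""


def authorize_with_retry(check: Callable[[], bool],
                         max_attempts: int = 3,
                         retry_gap_seconds: float = 3.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run check(), retrying only on errors that may be transient."""
    for attempt in range(1, max_attempts + 1):
        try:
            return check()
        except InvalidApiKeyError:
            # Retrying cannot help, so deny at once instead of blocking
            # the calling thread for the full retry window.
            return False
        except TransientIamError:
            if attempt == max_attempts:
                return False
            sleep(retry_gap_seconds)
    return False
```

With this shape, a request using an invalid key is denied after a single IAM call, while other failures still get the original two retries.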
Workaround
Ensure that all API keys in use are valid. This prevents the retry looping that causes the delay.
OR
Increase session.timeout.ms to a larger value, so that the heartbeat does not expire
It may also be beneficial to increase max.poll.interval.ms, as the poll interval can also time out if it was previously set to a value smaller than the default.
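As an illustration of the second workaround, the sketch below builds a set of consumer overrides using the standard Kafka consumer property names. The helper name and the specific values are assumptions for illustration, not recommendations from this issue. The heartbeat interval is derived as one third of the session timeout, following the general guidance in the Kafka consumer configuration documentation.

```python
from typing import Dict, Optional


def build_consumer_overrides(session_timeout_ms: int = 30000,
                             max_poll_interval_ms: int = 300000,
                             heartbeat_interval_ms: Optional[int] = None) -> Dict[str, int]:
    """Return consumer settings with a larger session timeout.

    The default values here are illustrative assumptions; choose values
    that the broker's group session timeout limits allow.
    """
    if heartbeat_interval_ms is None:
        # Keep the heartbeat interval at one third of the session timeout.
        heartbeat_interval_ms = session_timeout_ms // 3
    if heartbeat_interval_ms >= session_timeout_ms:
        raise ValueError("heartbeat.interval.ms must be lower than session.timeout.ms")
    return {
        "session.timeout.ms": session_timeout_ms,
        "heartbeat.interval.ms": heartbeat_interval_ms,
        "max.poll.interval.ms": max_poll_interval_ms,
    }


overrides = build_consumer_overrides()
for key, value in overrides.items():
    print(f"{key}={value}")
```

The printed key=value lines can be copied into a consumer properties file, or the dictionary can be merged into the configuration passed to a client library that accepts these property names.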
Fix details
IBM Internal Issue Number - 6780
Fix target - Not yet available