This repository has been archived by the owner on Oct 10, 2023. It is now read-only.

GeoReplicator tasks stop and errors seen at source Kafka #70

Open
nictownsend opened this issue Apr 9, 2020 · 0 comments
Labels
2019.4.1 (Issues targeted to be fixed in 2019.4.1), bug (Something isn't working)

nictownsend commented Apr 9, 2020

Issue Description

The geo-replicator status shows that replicator tasks are failing and have to be restarted manually.

The replicator-deploy logs show:

[2020-02-28 15:05:08,141] ERROR WorkerSourceTask{id=<redacted>} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Unexpected error in commit: The server experienced an unexpected error when processing the request.
	at com.ibm.eventstreams.replicator.ReplicatorTask.poll(ReplicatorTask.java:188)
	at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:245)
	at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:221)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
	at java.util.concurrent.FutureTask.run(FutureTask.java:277)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:812)
[2020-02-28 15:05:08,141] ERROR WorkerSourceTask{id=<redacted>} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)

The source Kafka brokers show errors at the same time:

[2020-04-02 16:56:59,479] ERROR [KafkaApi-2] Error when handling request: clientId=<redacted>, correlationId=<redacted>, api=OFFSET_COMMIT, body={group_id=<redacted>,generation_id=39,member_id=<member-redacted>} (kafka.server.KafkaApis)
java.util.NoSuchElementException: key not found: <member-redacted>
	at scala.collection.MapLike.default(MapLike.scala:235)
	at scala.collection.MapLike.default$(MapLike.scala:234)
	at scala.collection.AbstractMap.default(Map.scala:63)
	at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
	at kafka.coordinator.group.GroupMetadata.get(GroupMetadata.scala:203)
	at kafka.coordinator.group.GroupCoordinator.$anonfun$tryCompleteHeartbeat$1(GroupCoordinator.scala:927)
	at kafka.coordinator.group.GroupCoordinator$$Lambda$1108.00000000600D0690.apply$mcZ$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
	at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:198)
	at kafka.coordinator.group.GroupCoordinator.tryCompleteHeartbeat(GroupCoordinator.scala:920)
	at kafka.coordinator.group.DelayedHeartbeat.tryComplete(DelayedHeartbeat.scala:34)
	at kafka.server.DelayedOperation.maybeTryComplete(DelayedOperation.scala:121)
	at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:388)
	at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:294)
	at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextExpiration(GroupCoordinator.scala:737)
	at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextHeartbeatExpiration(GroupCoordinator.scala:730)
	at kafka.coordinator.group.GroupCoordinator.$anonfun$handleHeartbeat$2(GroupCoordinator.scala:486)
	at kafka.coordinator.group.GroupCoordinator$$Lambda$1115.000000005C091790.apply$mcV$sp(Unknown Source)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
	at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:198)
	at kafka.coordinator.group.GroupCoordinator.handleHeartbeat(GroupCoordinator.scala:451)
	at kafka.server.KafkaApis.handleHeartbeatRequest(KafkaApis.scala:1336)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:120)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
	at java.lang.Thread.run(Thread.java:812)
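
Until the fix is installed, the killed tasks have to be restarted by hand, as the replicator log above states. The following is a minimal sketch of how that restart could be scripted, assuming the replicator's Kafka Connect worker exposes the standard Kafka Connect REST API and that it is reachable from where the script runs. Neither is confirmed in this issue. The base URL and connector name are hypothetical placeholders, and the helper names are illustrative only.

```python
import json
from typing import List
from urllib import request


def failed_task_ids(status: dict) -> List[int]:
    """Return ids of tasks reported as FAILED in a Connect status document."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def restart_urls(base_url: str, connector: str, status: dict) -> List[str]:
    """Build the task restart endpoint for every FAILED task."""
    base = base_url.rstrip("/")
    return [
        f"{base}/connectors/{connector}/tasks/{task_id}/restart"
        for task_id in failed_task_ids(status)
    ]


def restart_failed_tasks(base_url: str, connector: str) -> List[str]:
    """Fetch the connector status and POST a restart for each FAILED task.

    Assumption: base_url points at a reachable Kafka Connect REST endpoint.
    Returns the restart URLs that were called.
    """
    status_url = f"{base_url.rstrip('/')}/connectors/{connector}/status"
    with request.urlopen(status_url) as resp:
        status = json.load(resp)
    urls = restart_urls(base_url, connector, status)
    for url in urls:
        # Kafka Connect restarts a task on an empty POST to this endpoint.
        request.urlopen(request.Request(url, method="POST")).close()
    return urls
```

For example, `restart_failed_tasks("http://localhost:8083", "my-replicator")` would restart only the tasks in the `FAILED` state; both arguments are placeholders. This only recovers the tasks and does not address the broker-side error.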

Issue Resolution

The fix was provided by https://issues.apache.org/jira/browse/KAFKA-8896.

It is included in Kafka 2.3.1, which is available in IBM Event Streams 2019.4.2.

Workaround

Fix details

IBM Internal issue number - 5199
Fix target - 2019.4.2

nictownsend added the 2019.4.1 and bug labels on Apr 9, 2020
EmmaHumber changed the title from "GeoReplicator tasks are being killed by errors from source Kafka" to "GeoReplicator tasks stop and errors seen at source Kafka" on Apr 9, 2020