Recently we saw an issue where all our threads were waiting in waitForConnection and never came out of it:
```
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.waitForConnection(ConcurrentHClientPool.java:140)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:108)
```
It appears that there might be a bug in Hector that is triggered under a very specific condition:
- During high load, when the pool is fully utilized.
- During that time, if there is a network hiccup or a communication issue with the node, Hector is not able to return the HClient to the pool and throws a runtime exception. However, borrowClient increments the active client count prematurely, even though cassandraClient may be null:
```java
HClient cassandraClient = availableClientQueue.poll();
int currentActiveClients = activeClientsCount.incrementAndGet();
```
The above logic then puts it in this condition:
```java
if (currentActiveClients <= cassandraHost.getMaxActive()) {
    cassandraClient = createClient();
} else {
    // We can't grow so let's wait for a connection to become available.
    cassandraClient = waitForConnection();
}
```
This eventually blocks every thread, because availableClientQueue has no elements left.
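To make the failure mode concrete, here is a heavily simplified, hypothetical model of the accounting described above. It is not Hector's ConcurrentHClientPool: FakeClient, LeakyPool, the bounded wait, and the way releaseClient fails are assumptions chosen to mirror the reported behavior. The borrow path copies the poll-then-increment shape of the snippet above; the release path simulates a network hiccup that throws before the client is returned or the counter is decremented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the accounting described in this issue.
// NOT Hector's ConcurrentHClientPool: FakeClient, LeakyPool and the failing
// release path are illustrative assumptions.
public class LeakyPoolDemo {

    public static final class FakeClient {}

    public static final class LeakyPool {
        private final int maxActive;
        private final ArrayBlockingQueue<FakeClient> availableClientQueue;
        private final AtomicInteger activeClientsCount = new AtomicInteger();

        public LeakyPool(int maxActive) {
            this.maxActive = maxActive;
            this.availableClientQueue = new ArrayBlockingQueue<>(maxActive);
        }

        // Same shape as the snippet above: poll, then increment unconditionally.
        public FakeClient borrowClient(long waitMillis) {
            FakeClient client = availableClientQueue.poll();
            int currentActiveClients = activeClientsCount.incrementAndGet();
            if (client == null) {
                if (currentActiveClients <= maxActive) {
                    client = new FakeClient(); // stands in for createClient()
                } else {
                    client = waitForConnection(waitMillis);
                }
            }
            if (client == null) {
                activeClientsCount.decrementAndGet(); // undo this call's own increment
            }
            return client;
        }

        private FakeClient waitForConnection(long waitMillis) {
            try {
                return availableClientQueue.poll(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        // Simulates a release that fails (e.g. a network hiccup) before the client
        // goes back into the queue and before the counter is decremented.
        public void releaseClient(FakeClient client, boolean networkHiccup) {
            if (networkHiccup) {
                throw new RuntimeException("simulated communication failure");
            }
            availableClientQueue.offer(client);
            activeClientsCount.decrementAndGet();
        }

        public int activeCount() { return activeClientsCount.get(); }
        public int idleCount() { return availableClientQueue.size(); }
    }

    public static void main(String[] args) {
        LeakyPool pool = new LeakyPool(2);
        FakeClient[] borrowed = {pool.borrowClient(50), pool.borrowClient(50)};
        for (FakeClient c : borrowed) {
            try {
                pool.releaseClient(c, true);
            } catch (RuntimeException ignored) {
                // Caller survives the failure, but the count has already leaked.
            }
        }
        // Count is pinned at maxActive and the queue is empty, so this borrower
        // can only wait for a client that will never be returned.
        FakeClient next = pool.borrowClient(50);
        System.out.println("active=" + pool.activeCount()
                + " idle=" + pool.idleCount() + " gotClient=" + (next != null));
    }
}
```

Running main prints `active=2 idle=0 gotClient=false`: after two failed releases the counter is stuck at maxActive while the queue is empty, so every later borrower can only wait. In this model the wait is bounded and returns null; a caller that keeps retrying would sit in waitForConnection indefinitely, which is consistent with the stack trace in the report.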
I think the fix is to increment activeClientsCount only after confirming cassandraClient != null.
Let me know if this looks OK.
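Below is a hedged sketch of one way to apply the suggestion to count a client only once cassandraClient != null. Everything in it (SaferPool, FakeClient, the second createdClients counter) is hypothetical. Incrementing the active count only after a successful poll is not enough on its own: several threads that all find the queue empty could then create clients at the same time and overshoot maxActive. The sketch therefore bounds growth with a separate counter reserved by compareAndSet, and undoes its accounting on failure paths, with releaseClient decrementing in a finally block.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Hector code: SaferPool and FakeClient are assumptions.
public class SaferPoolDemo {

    public static final class FakeClient {}

    public static final class SaferPool {
        private final int maxActive;
        private final ArrayBlockingQueue<FakeClient> availableClientQueue;
        // Clients currently handed out; incremented only for a non-null client.
        private final AtomicInteger activeClientsCount = new AtomicInteger();
        // Clients that exist (idle + borrowed); caps growth at maxActive.
        private final AtomicInteger createdClients = new AtomicInteger();

        public SaferPool(int maxActive) {
            this.maxActive = maxActive;
            this.availableClientQueue = new ArrayBlockingQueue<>(maxActive);
        }

        public FakeClient borrowClient(long waitMillis) {
            FakeClient client = availableClientQueue.poll();
            if (client == null) {
                if (tryReserveSlot()) {
                    try {
                        client = new FakeClient(); // stands in for createClient()
                    } catch (RuntimeException e) {
                        createdClients.decrementAndGet(); // give the reserved slot back
                        throw e;
                    }
                } else {
                    client = waitForConnection(waitMillis);
                }
            }
            if (client != null) {
                activeClientsCount.incrementAndGet(); // count only clients we hold
            }
            return client;
        }

        // Atomically claims one of the maxActive slots, or fails if all are taken.
        private boolean tryReserveSlot() {
            int current;
            do {
                current = createdClients.get();
                if (current >= maxActive) return false;
            } while (!createdClients.compareAndSet(current, current + 1));
            return true;
        }

        private FakeClient waitForConnection(long waitMillis) {
            try {
                return availableClientQueue.poll(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        // A failed release discards the client and frees its slot; the active
        // count is decremented on every path thanks to finally.
        public void releaseClient(FakeClient client, boolean networkHiccup) {
            try {
                if (networkHiccup) {
                    throw new RuntimeException("simulated communication failure");
                }
                availableClientQueue.offer(client);
            } catch (RuntimeException e) {
                createdClients.decrementAndGet();
                throw e;
            } finally {
                activeClientsCount.decrementAndGet();
            }
        }

        public int activeCount() { return activeClientsCount.get(); }
        public int idleCount() { return availableClientQueue.size(); }
        public int createdCount() { return createdClients.get(); }
    }

    public static void main(String[] args) {
        SaferPool pool = new SaferPool(2);
        FakeClient[] borrowed = {pool.borrowClient(50), pool.borrowClient(50)};
        for (FakeClient c : borrowed) {
            try {
                pool.releaseClient(c, true);
            } catch (RuntimeException ignored) {
                // The failure still surfaces, but the accounting was undone.
            }
        }
        FakeClient next = pool.borrowClient(50);
        System.out.println("active=" + pool.activeCount() + " idle=" + pool.idleCount()
                + " created=" + pool.createdCount() + " gotClient=" + (next != null));
    }
}
```

With the same failing-release scenario, main prints `active=1 idle=0 created=1 gotClient=true`: the broken clients free their slots, so the next borrower creates a replacement instead of waiting.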
```java
int currentActiveClients = activeClientsCount.incrementAndGet();
```
should really be
```java
int currentActiveClients = availableClientQueue.size();
```
And then this condition makes sense:
```java
if ( cassandraClient == null ) {
    if (currentActiveClients <= cassandraHost.getMaxActive()) {
        cassandraClient = createClient();
    } else {
        // We can't grow so let's wait for a connection to become available.
        cassandraClient = waitForConnection();
    }
}
```
Currently we seem to be over-incrementing this count, which can leave the pool in a bad state that goes unnoticed until errors start happening.
We tried the change suggested by @mohitanchlia; however, we then ran into heavy lock contention. availableClientQueue is an ArrayBlockingQueue, whose operations (including size()) enter a critical section guarded by a single ReentrantLock. Under heavy database traffic, that lock can hurt a client's overall query performance. We ended up relying on the socket timeout mechanism so that a thread can exit once it has waited too long.
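The comment does not say where the socket timeout was configured, so the following is only a JDK-level illustration of the mechanism, not the poster's Hector setup. java.net.Socket.setSoTimeout limits how long a blocking read may wait; when it expires, the read throws SocketTimeoutException and the thread regains control instead of staying parked. SocketTimeoutDemo and readTimesOut are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// JDK-level illustration of a socket read timeout; not Hector configuration.
public class SocketTimeoutDemo {

    // Returns true if a read on an idle connection gave up with SocketTimeoutException.
    public static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The listener never calls accept() or writes; the OS still completes the
        // TCP handshake via the backlog, so the client's read() simply blocks.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // upper bound for each blocking read
            InputStream in = client.getInputStream();
            in.read(); // would block indefinitely without the timeout
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the thread got control back instead of staying parked
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timedOut=" + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

main reports whether the read timed out and roughly how long it took; the exact duration varies by machine.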