
Memory problem on ConcurrentHClientPool using HThriftClient with TFramedTransport #665

flefilla opened this issue Aug 28, 2014 · 3 comments


@flefilla

In version 1.1-4, when ConcurrentHClientPool releases an HClient that is still open, the client is put back into availableClientQueue.

If the HThriftClient wraps its TTransport in a TFramedTransport, the TMemoryInputTransport readBuffer_ keeps the data from the operations performed by that HClient.

This retained data, multiplied by the number of connections, can make memory usage grow quickly.

Why isn't readBuffer_ cleared when the HClient is released?

I have one additional question:
Why is the max active connection count divided by 3 to obtain the number of HClients per host?

Thanks

@zznate
Collaborator

zznate commented Aug 28, 2014

Why isn't readBuffer_ cleared when the HClient is released?

It keeps the data, but it is overwritten on the next use.

This is a "feature" of that version of Thrift. It keeps the underlying byte[] to avoid having to re-allocate/re-grow it. The problem, as you have discovered, is that the buffers will grow to accept a larger payload, but they will never shrink, and they can grow all the way out to the max message length (15 MB by default).
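
As a rough sketch of one way to bound this on the application side, assuming you control the point where a client is returned to the pool: discard (close) clients whose read buffer has grown past a threshold instead of re-queueing them, so the next checkout starts with a small buffer again. The PooledThriftClient interface, bufferCapacity() accessor, and RETAINED_BUFFER_LIMIT below are hypothetical illustrations, not Hector or Thrift API:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical wrapper around a pooled Thrift client; bufferCapacity() stands in
// for the length of the framed transport's retained read buffer. Not Hector API.
interface PooledThriftClient {
    boolean isOpen();
    long bufferCapacity();   // bytes currently held by the read buffer
    void close();            // closing drops the buffer; the next checkout reallocates it
}

final class BufferAwarePool {
    // Assumption: discard clients holding more than 1 MB rather than letting a
    // worst-case 15 MB buffer sit idle in the available queue.
    private static final long RETAINED_BUFFER_LIMIT = 1024 * 1024;

    private final Queue<PooledThriftClient> available = new ConcurrentLinkedQueue<>();

    void release(PooledThriftClient client) {
        if (client.isOpen() && client.bufferCapacity() <= RETAINED_BUFFER_LIMIT) {
            available.offer(client);   // cheap path: reuse the connection and its buffer
        } else {
            client.close();            // pay the reconnect cost once, get the memory back
        }
    }
}
```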

Why is the max active connection count divided by 3 to obtain the number of HClients per host?

We did not need all MAX_CONNECTIONS threads allocated up front, and a third seemed like a good number from empirical observation of adding a service into a running architecture. It appears to have been a good guess, since no one has yet had a big enough issue with it to want a MIN_CONNECTIONS setting or similar :)

@flefilla
Author

Thank you for the answer.

So the maximum (approximate) heap retained by ConcurrentHClientPool using HThriftClient with TFramedTransport follows this rule:
HOSTS_NUMBER * (MAX_ACTIVE_CONNECTION / 3) * MAX_MESSAGE_LENGTH ?
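
To put rough numbers on that rule (the host count and maxActive value below are invented for illustration; only the 15 MB default comes from the comment above):

```java
public class RetainedHeapEstimate {
    public static void main(String[] args) {
        // Hypothetical cluster: 6 hosts, maxActive = 30, so 30 / 3 = 10 clients per host.
        long hosts = 6;                        // HOSTS_NUMBER
        long clientsPerHost = 30 / 3;          // MAX_ACTIVE_CONNECTION / 3
        long maxMessageLength = 15L << 20;     // 15 MB, the default max message length
        long worstCase = hosts * clientsPerHost * maxMessageLength;
        System.out.println(worstCase / (1L << 20) + " MB"); // prints "900 MB"
    }
}
```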

@zznate
Collaborator

zznate commented Aug 28, 2014

Yes - exactly that.

This by itself may be a reason to also incorporate the DataStax Java Driver for simple operations in your code, while maintaining a much smaller pool of Hector connections for large batch mutates or for easier access to dynamic columns.

Further, the binary protocol for CQL uses evented IO via Netty on both the client and the server, so it is significantly more efficient resource-wise.

That said, despite what you may read elsewhere, using raw Thrift is more performant and flexible if (and that is a really big "if") you understand the underlying storage model and its limits.

There's really no reason you can't use both.
