Use recyclable & paginated buffers for search hits #99590
Comments
Pinging @elastic/es-search (Team:Search)
Those source bytes are normally compressed, so the situation gets worse when we decompress them, since that generates yet another, bigger plain `BytesReference`:
That's true, but it isn't hard to decompress a recyclable
What will be harder is that those plain bytes come all the way from Lucene. I hope we can tell Lucene to give us a reference in the shape of a `DataInput` or similar (like in this proposal: apache/lucene#12460)?
Yeah, that's certainly another place worth fixing. But the problems in this area are particularly noticeable on coordinating nodes today, so fixing this in the transport layer will already help. And the work to give
Moving towards ref counted search responses, this removes all direct references to SearchResponse from ML tests. The remaining references in ML tests are for mocked search responses. Additionally, while making this change, I noticed there are multiple places in ML production code that have the `SearchResponse searchResponse = ` pattern. Meaning, once we refCount `SearchResponse`, those places will have to be updated as well (in a future PR). Related to: #100966 Related to: #99590
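To illustrate why the `SearchResponse searchResponse = ` call sites need attention once the response is ref counted, here is a minimal, self-contained sketch. `RefCountedResponseSketch`, `Response` and `consume` are hypothetical names invented for this example and are not Elasticsearch classes; the sketch only assumes that a ref-counted response starts with one reference and frees its pooled state when the count reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: these types only imitate the idea of a ref-counted response.
final class RefCountedResponseSketch {

    /** Stand-in for a response that owns pooled buffers. */
    static final class Response {
        private final AtomicInteger refs = new AtomicInteger(1); // creator holds the first reference
        private final long totalHits;

        Response(long totalHits) {
            this.totalHits = totalHits;
        }

        /** Takes an extra reference; fails if the response was already released. */
        void incRef() {
            if (refs.getAndUpdate(r -> r > 0 ? r + 1 : r) <= 0) {
                throw new IllegalStateException("response already released");
            }
        }

        /** Drops one reference; returns true if this call released the response. */
        boolean decRef() {
            int remaining = refs.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than acquired");
            }
            return remaining == 0; // a real implementation would recycle buffers here
        }

        boolean hasReferences() {
            return refs.get() > 0;
        }

        long totalHits() {
            if (hasReferences() == false) {
                throw new IllegalStateException("response already released");
            }
            return totalHits;
        }
    }

    /** Copies out what the caller needs, then always drops the caller's reference. */
    static <T> T consume(Response response, Function<Response, T> extractor) {
        try {
            return extractor.apply(response);
        } finally {
            response.decRef();
        }
    }

    public static void main(String[] args) {
        Response response = new Response(42);
        long hits = consume(response, Response::totalHits);
        System.out.println("total hits: " + hits);
        System.out.println("still referenced: " + response.hasReferences());
    }
}
```

The point of the `consume` helper is that a plain local variable no longer suffices: every place that obtains a response has to extract the values it needs and release its reference on every code path, including failures.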
@original-brownbear can we close this?
What's left to do (credits to @original-brownbear): remove all remaining uses of `SearchHit#asUnpooled` if we wanted to be thorough, and make Lucene read sources to pooled buffers as well.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Today `SearchHit#source` is a plain `BytesReference`, which is probably inappropriate since hits are occasionally large and often quite short-lived. For instance, if fetching hits from a remote node, the coordinator will read this field into a single `new byte[]` allocated within here:

elasticsearch/server/src/main/java/org/elasticsearch/search/SearchHit.java
Line 152 in 1925712

Those allocations can be humongous and will impose quite some load on the GC. We should instead consider moving to recyclable and paginated buffers for search hits. In terms of network reads that means using `StreamInput#readReleasableBytesReference()`, and anywhere we're allocating it locally we should use the node's `BigArrays`. Unfortunately that has quite wide-reaching consequences, since it would mean that `SearchHit` must now become `Releasable`, as must all its owning classes, so that the buffers can be recycled once they're no longer in use.

This has substantial overlap with #89656 since a large part of the work involved in tracking this memory usage for circuit-breaking purposes would also require us to add such a lifecycle to `SearchHit` and its owners.
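As a rough illustration of the lifecycle described above, the following self-contained sketch reads a hit's source into fixed-size pooled pages rather than one contiguous array, and returns the pages to the pool when the hit is closed. Everything in it is an assumption made for the example: `PooledHitSketch`, `PagePool`, `Hit` and the 16 KiB `PAGE_SIZE` are hypothetical and do not correspond to Elasticsearch's `BigArrays`, `Releasable` or `SearchHit` implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Elasticsearch classes.
final class PooledHitSketch {
    static final int PAGE_SIZE = 16 * 1024; // assumed page size, for illustration only

    /** Toy recycler: hands out fixed-size pages and takes them back for reuse. */
    static final class PagePool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();

        synchronized byte[] obtain() {
            byte[] page = free.poll();
            return page != null ? page : new byte[PAGE_SIZE];
        }

        synchronized void recycle(byte[] page) {
            free.push(page);
        }

        synchronized int freePages() {
            return free.size();
        }
    }

    /** A hit whose source lives in pooled pages instead of one contiguous byte[]. */
    static final class Hit implements AutoCloseable {
        private final PagePool pool;
        private final List<byte[]> pages = new ArrayList<>();
        private final int length;
        private boolean released;

        private Hit(PagePool pool, int length) {
            this.pool = pool;
            this.length = length;
        }

        /** Reads length bytes page by page, so no single allocation exceeds PAGE_SIZE. */
        static Hit read(PagePool pool, InputStream in, int length) {
            Hit hit = new Hit(pool, length);
            try {
                for (int remaining = length; remaining > 0; ) {
                    byte[] page = pool.obtain();
                    hit.pages.add(page);
                    int want = Math.min(PAGE_SIZE, remaining);
                    if (in.readNBytes(page, 0, want) != want) {
                        throw new IOException("unexpected end of stream");
                    }
                    remaining -= want;
                }
                return hit;
            } catch (IOException e) {
                hit.close(); // hand the pages back if the read fails part-way
                throw new UncheckedIOException(e);
            }
        }

        int pageCount() {
            return pages.size();
        }

        byte sourceByte(int index) {
            if (released) {
                throw new IllegalStateException("hit already released");
            }
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return pages.get(index / PAGE_SIZE)[index % PAGE_SIZE];
        }

        /** Returns all pages to the pool; safe to call more than once. */
        @Override
        public void close() {
            if (released) {
                return;
            }
            released = true;
            pages.forEach(pool::recycle);
            pages.clear();
        }
    }

    public static void main(String[] args) {
        PagePool pool = new PagePool();
        byte[] wire = new byte[40_000]; // stands in for bytes arriving from a remote node
        try (Hit hit = Hit.read(pool, new ByteArrayInputStream(wire), wire.length)) {
            System.out.println("pages in use: " + hit.pageCount());
        }
        System.out.println("pages recycled: " + pool.freePages());
    }
}
```

The cost shows up in `close()`: once the hit owns pooled pages, every class that holds a hit also needs a well-defined point at which it releases it, which is the wide-reaching consequence mentioned above.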