Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use recyclable & paginated buffers for search hits #99590

Open
DaveCTurner opened this issue Sep 14, 2023 · 8 comments
Open

Use recyclable & paginated buffers for search hits #99590

DaveCTurner opened this issue Sep 14, 2023 · 8 comments
Assignees
Labels
>bug priority:high A label for assessing bug priority to be used by ES engineers :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 14, 2023

Today SearchHit#source is a plain BytesReference which is probably inappropriate since hits are occasionally large and often quite short-lived. For instance, if fetching hits from a remote node, the coordinator will read this field into a single new byte[] allocated within here:

Those allocations can be humongous and will impose quite some load on the GC. We should instead consider move to using recyclable and paginated buffers for search hits. In terms of network reads that means using StreamInput#readReleasableBytesReference() and anywhere we're allocating it locally we should use the node's BigArrays. Unfortunately that has quite wide-reaching consequences, since it would mean that SearchHit must now become Releasable, as must all its owning classes, so that the buffers can be recycled once they're no longer in use.

This has substantial overlap with #89656 since a large part of the work involved in tracking this memory usage for circuit-breaking purposes would also require us to add such a lifecycle to SearchHit and its owners.

@DaveCTurner DaveCTurner added >bug :Search/Search Search-related issues that do not fall into other categories labels Sep 14, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Sep 14, 2023
@iverase
Copy link
Contributor

iverase commented Sep 14, 2023

Those source bytes are normally compressed, so the situation becomes more difficult when we decompress it, as it generates yet another bigger plain BytesReference:

this.source = CompressorFactory.uncompressIfNeeded(this.source);

@DaveCTurner
Copy link
Contributor Author

That's true, but it isn't hard to decompress a recyclable BytesReference into another recyclable BytesReference too as long as we can add the plumbing to make the recycler available where it's needed.

@iverase
Copy link
Contributor

iverase commented Sep 14, 2023

What will be harder is that those plain bytes come all the way from lucene:

https://github.com/apache/lucene/blob/008a0d420635239570e09e0862a357f512781c60/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java#L243

I hope we can tell lucene to give as a reference on the shape of a DataInput or similar (like in this proposal apache/lucene#12460)?

@DaveCTurner
Copy link
Contributor Author

I hope we can tell lucene to give as a reference on the shape of a DataInput or similar

Yeah that's certainly another place worth fixing. But the problems in this area are particularly noticeable on coordinating nodes today, so fixing this in the transport layer will already help. And the work to give SearchHit a proper lifecycle will pay dividends in various other ways too.

elasticsearchmachine pushed a commit that referenced this issue Nov 10, 2023
Moving towards ref counted search responses, this removes all direct
references to SearchResponse from ML tests.

The remaining references in ML tests are for mocked search responses.

Additionally, while making this change, I noticed there are multiple
places in ML production code that have the `SearchResponse
searchResponse = ` pattern. Meaning, once we refCount `SearchResponse`,
those places will have to be updated as well (in a future PR).

Related to: #100966 Related
to: #99590
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this issue Nov 13, 2023
Moving towards ref counted search responses, this removes all direct
references to SearchResponse from ML tests.

The remaining references in ML tests are for mocked search responses.

Additionally, while making this change, I noticed there are multiple
places in ML production code that have the `SearchResponse
searchResponse = ` pattern. Meaning, once we refCount `SearchResponse`,
those places will have to be updated as well (in a future PR).

Related to: elastic#100966 Related
to: elastic#99590
@benwtrent
Copy link
Member

@original-brownbear can we close this?

@benwtrent benwtrent added the priority:high A label for assessing bug priority to be used by ES engineers label Jul 9, 2024
@original-brownbear original-brownbear self-assigned this Jul 17, 2024
@javanna
Copy link
Member

javanna commented Jul 17, 2024

What's left to do (credits to @original-brownbear ): remove all remaining uses of SearchHit#asUnpooled if we wanted to be thorough + make Lucene read sources to pooled buffers as well

@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug priority:high A label for assessing bug priority to be used by ES engineers :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

6 participants