Add tiered caching blog #3376
Conversation
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws @pajuric Editorial review complete. Please see my changes and let me know if you have any questions. Thanks!
_posts/2024-10-11-tiered-cache.md
As discussed, on-heap caches have limitations when handling larger datasets. A more effective caching mechanism is *tiered caching*, which uses multiple cache layers, starting with on-heap caching and extending to a disk-based tier. This approach balances performance and capacity, allowing you to store larger datasets without consuming valuable heap memory.

In the past, using a disk for caching raised concerns because traditional spinning hard drives were slower. However, advancements in storage technology, like modern SSD and NVMe drives, now deliver much faster performance. Although disk access is still slower than memory, the speed gap has narrowed enough that the performance trade-off is minimal and often outweighed by the advantage of increased storage capacity.
"advantage" => "benefit"?
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@pajuric Editorial comments addressed. This blog is ready to publish.
We still need to add Peter (@peteralfonsi) before we publish this blog, right?
Yes, I am waiting for his info and will add it to this PR when it's available.
Signed-off-by: Fanit Kolchina <[email protected]>
@sgup432 @peteralfonsi Peter's bio added.
Nice blog @sgup432! Minor comments to help improve the readability
_posts/2024-10-11-tiered-cache.md
## On-heap caching: A good start, but is it enough?

On-heap caching in OpenSearch provides a quick, simple, and efficient way to cache data locally on a node. It offers low-latency data retrieval and thereby provides significant performance gains. However, these advantages come with trade-offs, especially as the cache grows, which may lead to performance challenges.
However, these advantages come with trade-offs, especially as the cache grows, which may lead to performance challenges.
Maybe it is just me, but this line reads slightly incomplete or looks confusing
Reworded.
@kolchfa-aws Can we reword this to something like below? As it is more specific.
"However, these advantages come with trade-offs, especially as the cache grows in size and reaches its capacity, which may lead to performance challenges due to high evictions and misses."
Reworded version added.
_posts/2024-10-11-tiered-cache.md
## When to use tiered caching

Because tiered caching currently only applies to the request cache, it's useful when the existing on-heap request cache isn't large enough to store your datasets and you encounter frequent evictions. You can check request cache statistics using the `GET /_nodes/stats/indices/request_cache` endpoint to monitor evictions, hits, and misses. If you notice frequent evictions along with some hits, enabling tiered caching could provide a significant performance boost.
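To make the evictions-plus-hits check concrete, here's a minimal sketch of the kind of heuristic you could apply to the numbers returned by `GET /_nodes/stats/indices/request_cache`. The stats values and thresholds below are made-up placeholders (the post deliberately avoids prescribing exact ratios), not real cluster output or official guidance.

```python
# Sketch: deciding whether tiered caching might help, based on request
# cache statistics. The field names mirror the per-node request cache
# stats; the numbers are made-up sample values, not real cluster output.

sample_stats = {
    "memory_size_in_bytes": 1048576,
    "evictions": 120000,
    "hit_count": 450000,
    "miss_count": 300000,
}

def looks_like_tiered_cache_candidate(stats, min_hit_ratio=0.3, min_evictions=10000):
    """Heuristic only: frequent evictions combined with a meaningful hit
    ratio suggest the on-heap request cache is too small for the workload.
    The thresholds are arbitrary placeholders, not recommended values."""
    total = stats["hit_count"] + stats["miss_count"]
    hit_ratio = stats["hit_count"] / total if total else 0.0
    return stats["evictions"] >= min_evictions and hit_ratio >= min_hit_ratio

print(looks_like_tiered_cache_candidate(sample_stats))  # True for this sample
```

As the discussion below notes, it's hard to pin down a universal hit-ratio cutoff, so treat any such threshold as workload-specific.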
Can we roughly quantify "along with some hits"?
@sgup432 Could you address this comment?
@kolchfa-aws @jainankitk I think we can leave it as is. I deliberately left it unquantified because it's hard to say whether tiered caching will only benefit with a >50% or >30% cache hit ratio.
_posts/2024-10-11-tiered-cache.md
Tiered caching is especially beneficial in these situations:

- Your domain experiences many cache evictions and has repeatable queries. You can confirm this by using request cache statistics.
nit: repeatable -> repeating/repeated ?
_posts/2024-10-11-tiered-cache.md
Tiered caching is especially beneficial in these situations:

- Your domain experiences many cache evictions and has repeatable queries. You can confirm this by using request cache statistics.
- You're working with log analytics or read-only indexes, in which data doesn't change often, and you're encountering frequent evictions.
nit: indexes -> indices
in which data doesn't change often
looks redundant, I guess; "read-only indices" is self-explanatory.
"Indexes" is used per our style guide.
_posts/2024-10-11-tiered-cache.md
- Your domain experiences many cache evictions and has repeatable queries. You can confirm this by using request cache statistics.
- You're working with log analytics or read-only indexes, in which data doesn't change often, and you're encountering frequent evictions.

By default, the request cache only stores aggregation queries. You can enable caching for specific requests by using the `?request_cache=true` query parameter.
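As an illustration of the `?request_cache=true` parameter, here's a small sketch that builds such a search URL. The host and index name are hypothetical, and no request is actually sent; it only shows where the parameter goes.

```python
# Sketch: opting a specific search request into the request cache via the
# ?request_cache=true query parameter. Host and index are hypothetical
# examples; this builds the URL but does not contact a cluster.
from urllib.parse import urlencode

def build_search_url(host, index, request_cache=True):
    """Return a _search URL with the request_cache parameter set."""
    params = {"request_cache": str(request_cache).lower()}
    return f"{host}/{index}/_search?{urlencode(params)}"

url = build_search_url("http://localhost:9200", "logs-2024.10")
print(url)  # http://localhost:9200/logs-2024.10/_search?request_cache=true
```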
stores aggregation queries
stores aggregation query results
_posts/2024-10-11-tiered-cache.md
## What's next?

While tiered caching is a promising feature, we're actively working on further improvements. We're currently exploring ways to make tiered caching more performant. Future enhancements may include promoting frequently accessed items from the disk cache to the on-heap cache, persisting disk cache data between restarts, or integrating tiered caching with other OpenSearch cache types, such as the query cache. You can follow our progress in [this issue](https://github.com/opensearch-project/OpenSearch/issues/10024). We encourage you to try tiered caching in a non-production environment and to share your feedback to help make this feature more robust.
share your feedback to help make this feature more robust
share your feedback to help improve this feature
_posts/2024-10-11-tiered-cache.md
## What's next?

While tiered caching is a promising feature, we're actively working on further improvements. We're currently exploring ways to make tiered caching more performant. Future enhancements may include promoting frequently accessed items from the disk cache to the on-heap cache, persisting disk cache data between restarts, or integrating tiered caching with other OpenSearch cache types, such as the query cache. You can follow our progress in [this issue](https://github.com/opensearch-project/OpenSearch/issues/10024). We encourage you to try tiered caching in a non-production environment and to share your feedback to help make this feature more robust.
restarts, or integrating
restarts, and integrating
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws Just one minor change.
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
_posts/2024-10-11-tiered-cache.md
has_science_table: true
meta_keywords: tiered caching, disk-based caching, on-heap caching, OpenSearch caching performance, how tiered caching works
meta_description: Explore how OpenSearch combines on-heap and disk-based caching to handle larger datasets and improve performance. Learn about the trade-offs of tiered caching, how it works, and future developments.
---
Please update the meta with the following:
meta_keywords: tiered caching, on-heap cache, disk-based caching, how tiered caching works, OpenSearch cache optimization
meta_description: Explore the benefits of combining on-heap and disk-based caching in OpenSearch to manage large datasets. Learn how tiered caching works, when to use it, and the performance results of our testing.
_posts/2024-10-11-tiered-cache.md
- peteral
- kkhatua
- kolchfa
date: 2024-10-11
Update blog date to 2024-10-24
@kkhatua - You are approved to push this live.
…he.md Signed-off-by: Peter Zhu <[email protected]>
Push to staging.
Closes #3374
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.