SKG on elasticsearch #196
Comments
I’ve been testing the various cases from Chapter 5, and for the vibranium results, the scores are starting to change but remain quite similar overall (example attached).
As for the Star Wars content-based recommendation, I wasn’t entirely sure how to approach it. I tried incorporating different tokens from the text into the results, though that’s not ideal since I have to split tokens like ‘Princess Leia.’ Still, I’m seeing similar results (example attached). Let me know your thoughts.
Based on the Solr documentation and the code, it appears that in the Star Wars case, relatedness is scored by comparing the context filter for each token, if I understand correctly. Let me know if it’s worth exploring the Elasticsearch API further to implement the exact method for calculating relatedness.
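For readers following along, the relatedness comparison discussed above can be sketched as a z-score: how far a term's foreground count sits from what its background rate would predict. This is a minimal sketch under that assumption, not the Solr implementation. The function name `relatedness_z` and its parameter names are hypothetical, and Solr's actual scoring may apply further normalization on top of such a value.

```python
import math


def relatedness_z(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """Z-score of a term's foreground frequency against its background rate.

    fg_count: foreground documents that contain the term
    fg_size:  total documents in the foreground set
    bg_count: background documents that contain the term
    bg_size:  total documents in the background set
    (All names are illustrative, not taken from Solr or Elasticsearch.)
    """
    if fg_size == 0 or bg_size == 0:
        return 0.0  # no evidence either way
    bg_prob = bg_count / bg_size
    # Expected foreground count if the term were unrelated to the query.
    expected = fg_size * bg_prob
    # Standard deviation of a binomial with fg_size trials and rate bg_prob.
    std_dev = math.sqrt(fg_size * bg_prob * (1.0 - bg_prob))
    if std_dev == 0.0:
        return 0.0  # term is in none or all background docs; score undefined
    return (fg_count - expected) / std_dev
```

In Elasticsearch's scripted significance heuristic, the four inputs would correspond to `_subset_freq`, `_subset_size`, `_superset_freq`, and `_superset_size`, in that order. A positive value means the term is over-represented in the foreground, and a negative value means it is under-represented.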
That's really cool, @lschneidpro! I probably won't have time to review this for the next month (the book is being released and I'll be traveling to speak at a bunch of conferences), but I'll definitely add it to my list of things to review once I free up. Out of curiosity, does this (or could it conceivably) handle the multi-level traversals (like the query disambiguation examples in chapter 7)? I'd definitely be interested in getting code for this working for Elasticsearch and OpenSearch users. If you can get the multi-level aggregations working, and this works consistently between Solr and Elasticsearch/OpenSearch, I think a lot of people would be interested.
Hi @treygrainger, thanks for your feedback! I'm about to go on vacation, so no worries. Currently, the implementation doesn't support multi-level traversals. To fully understand the functionality, I'll need to dive deeper into the SKG academic paper and the Solr code. So far, I've been using Elasticsearch's Significant Terms Aggregation and Significant Text Aggregation. These compute foreground and background statistics based on the query, and I use a custom script (your SKG code) to derive a custom score. By the way, in my tests, the Solr implementation runs faster than the Elasticsearch options. I’m not an Elasticsearch expert, so I’m unsure how to implement SKG fully without developing a dedicated plugin. I can reach out to Elasticsearch support, or perhaps you or someone on your team with more Elasticsearch expertise could provide some guidance. Here are the options I see moving forward:
As for query disambiguation, I think sub-aggregations could work. The first aggregation level would target categories, while the second level would apply the classic significant text aggregation within each category bucket. I'll experiment with this when I'm back. Best,
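The two-level idea described above could be expressed as a search request body along these lines. This is only a sketch of the proposal, not a tested implementation: the helper `build_disambiguation_query`, the field names `body` and `category`, and the aggregation names are all hypothetical. The first level is assumed to be a plain `terms` aggregation, and the optional Painless score script is passed in as a string rather than written out here.

```python
import json


def build_disambiguation_query(query_text, text_field="body",
                               category_field="category", score_script=None,
                               categories=3, terms_per_category=5):
    """Build a request body: category buckets, then significant text per bucket.

    All field names and defaults are placeholders for illustration.
    """
    significant_text = {"field": text_field, "size": terms_per_category}
    if score_script is not None:
        # Custom score via the scripted significance heuristic.
        significant_text["script_heuristic"] = {
            "script": {"lang": "painless", "source": score_script}
        }
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "query": {"match": {text_field: query_text}},
        "aggs": {
            "categories": {
                # Level 1: bucket the matching documents by category.
                "terms": {"field": category_field, "size": categories},
                "aggs": {
                    # Level 2: significant text inside each category bucket.
                    "related_terms": {"significant_text": significant_text}
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder ratio built from the variables the scripted heuristic exposes.
    script = "params._subset_freq/(params._superset_freq - params._subset_freq + 1)"
    print(json.dumps(build_disambiguation_query("jaguar", score_script=script), indent=2))
```

The printed JSON could be sent as the body of a `_search` request. Whether the foreground and background sets inside each bucket match what Solr's multi-level traversal computes is an open question that would need checking against real results.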
@treygrainger any updates? Thanks
FWIW I ran this in OpenSearch and got the same results for the
Hi everyone,
I'm currently reading the book but using Elasticsearch instead of Solr. I attempted to reimplement the Semantic Knowledge Graph (SKG) on ES, and developed a custom scoring script for Elasticsearch's significant text aggregation, inspired by the original Solr code found here. So far, I've been able to achieve the same scores as those in the health dataset example. I haven't tested the other cases from the book yet, but I wanted to share my implementation to see if it aligns with the authors' intentions.
resulting in:
I appreciate any feedback—thanks!