_source field goes missing from Elassandra after nodetool rebuild_index #347
Comments
Please note the issue looks similar to #244, but it doesn't have a resolution.
Please also note that the count returned by http://localhost:9200/cmsentitydb/_count differs significantly across the 4 datacenters: 43406958, 43458440, 43451846, 35910790.
Such a situation usually happens when a row has expired at the Cassandra level but was indexed before it expired. For results with an empty _source, please check that the underlying row exists by issuing a SELECT * FROM table WHERE PK = _id.
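To run that check in bulk, one option is to collect the `_id` of every search hit that came back without a `_source` and generate the matching lookup for each. The sketch below is only an illustration: the table name `cmsentitydb.entity` and the primary-key column `id` are hypothetical, the sample response is hand-written, and it assumes a single-column text primary key whose value equals the document `_id` (compound keys are encoded differently).

```python
def ids_missing_source(search_response):
    """Return the _id of every hit that has no (or an empty) _source."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits if not hit.get("_source")]


def select_statement(table, pk_column, doc_id):
    """Build the CQL lookup for one document id (text primary key assumed)."""
    escaped = doc_id.replace("'", "''")  # CQL escapes a quote by doubling it
    return f"SELECT * FROM {table} WHERE {pk_column} = '{escaped}';"


# Hand-written sample shaped like an Elasticsearch _search response.
sample = {"hits": {"hits": [
    {"_id": "a1", "_source": {"name": "kept"}},
    {"_id": "b2"},
    {"_id": "c3", "_source": {}},
]}}

# Hypothetical table and column names; replace with the real ones.
for doc_id in ids_missing_source(sample):
    print(select_statement("cmsentitydb.entity", "id", doc_id))
```

Each printed statement can be pasted into cqlsh; an empty result confirms the document is an orphan in the index.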
@vroyer No, the record doesn't exist in the underlying Cassandra table. That's the real issue: we are getting wrong results from the Elassandra index, and there are just too many such records in it. Is there a way to get rid of all such documents from Elassandra?
In that situation, you should delete the index and re-create it so that it only indexes existing rows, or (second scenario) create a new index and switch to it using an ES index alias. Just keep in mind that Cassandra triggers a single-threaded index build when the first index is created. So, in the first scenario, if you want to rebuild quickly, you'll need to kill the single-threaded index rebuild on each node (nodetool compactionstats + nodetool stop --compaction-id xxxx) and relaunch a nodetool rebuild_index --threads 16 .... And in the second scenario, you'll need to launch the index rebuild...
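For the second scenario, the switch itself can be done in a single request to the Elasticsearch `_aliases` endpoint, which removes the alias from the old index and adds it to the new one atomically. The following is a minimal sketch, not a tested procedure for Elassandra: the alias name `cmsentity` and the new index name `cmsentitydb_v2` are hypothetical, and it assumes clients already query through the alias rather than the index name directly (an alias cannot share a name with an existing index).

```python
import json
import urllib.request


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` in one atomic request."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


def swap_alias(base_url, alias, old_index, new_index):
    """Send the swap; call this only once the new index is fully built."""
    body = json.dumps(alias_swap_actions(alias, old_index, new_index)).encode()
    request = urllib.request.Request(
        f"{base_url}/_aliases", data=body, method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Print the request body only; nothing is sent to a cluster here.
# "cmsentity" and "cmsentitydb_v2" are hypothetical names.
print(json.dumps(
    alias_swap_actions("cmsentity", "cmsentitydb", "cmsentitydb_v2"), indent=2))
```

A call such as `swap_alias("http://localhost:9200", "cmsentity", "cmsentitydb", "cmsentitydb_v2")` would perform the switch, after which the old index can be deleted.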
@vroyer - Thanks for the quick response. The first approach is not an option for us because the index is already being used in production. We'll go for the second approach. But both of these approaches are time-consuming and don't resolve the issue quickly in a production environment. It would be great if Elassandra could keep itself in sync with Cassandra deletes, so that we don't face such issues in the live environment.
Missing documents were probably removed by previous compactions.
You can enable re-index on compaction to get the behaviour you expect, but it significantly increases the cost of compaction, and it's too late right now!
(Sent by email in reply to Pankaj Yadav's comment of 11 Jun 2020, 18:10.)
How can I enable re-index on compaction? Any references would help.
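The thread does not say which option this refers to. As a hedged sketch only: the Elassandra index settings documentation lists an `index_on_compaction` option (default `false`), and the code below assumes that is the setting meant and that it is written as `index.index_on_compaction` in the index creation body. Whether it can be changed on an existing index, and whether it behaves this way in 6.8.4.3, is not confirmed here and should be checked against the documentation for the deployed version.

```python
import json


def index_settings_with_reindex_on_compaction():
    """Index creation body enabling re-index on compaction.

    Assumption: the setting name is `index.index_on_compaction`;
    verify it against the Elassandra docs for your version.
    """
    return {"settings": {"index.index_on_compaction": True}}


# Body that would be sent with PUT /<new index name> at creation time.
print(json.dumps(index_settings_with_reindex_on_compaction(), indent=2))
```

As noted earlier in the thread, this makes compactions noticeably more expensive, so it is worth trying on a non-production cluster first.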
Elassandra version: elassandra-6.8.4.3
Plugins installed: []
JVM version (java -version):
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux CMSNextDB3871 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
I have 4 Cassandra datacenters and recently migrated to Elassandra. I recently ran nodetool rebuild_index and now see many documents in Elassandra that have no corresponding record in Cassandra. None of these documents has a _source field.
Steps to reproduce:
Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.
Please provide the following information:
system.log
cluster_status.log
gossipinfo.log
keyspace.log