
Support for track_total_hits #266

Open
selimt opened this issue Aug 4, 2021 · 4 comments

Comments

@selimt

selimt commented Aug 4, 2021

We want to be able to return the exact number of matches from an Elasticsearch query. Currently, if the number of hits exceeds 10000, hits.total.value contains 10000. We understand that this is an Elasticsearch limitation.

This also makes it hard for

There is a search option called "track_total_hits": if it is set to "true", then hits.total.value contains the accurate number of hits:

https://www.elastic.co/guide/en/elasticsearch/reference/7.13/search-your-data.html#track-total-hits

Is there a way to incorporate this option into elasticsearch-dsl-drf? Granted, this makes it harder to implement pagination correctly, since paging still cannot go beyond 10000.

Alternatively, we could use the Count API in ES, but that requires running the same query twice. We would then add this additional value to the result.
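
For reference, a rough sketch of that Count API fallback via elasticsearch-dsl's Search.count() (my_app.documents and MySearchDocument are placeholder names, not part of this package):

    # Sketch of the Count API fallback: count() issues a separate _count
    # request, so the query effectively runs twice.
    from my_app.documents import MySearchDocument  # placeholder document class

    search = MySearchDocument.search().query("match", title="example")
    exact_total = search.count()       # exact match count, not capped at 10000
    response = search[:10].execute()   # the regular, capped search for one page of hits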

Thanks.

@barseghyanartur
Owner

@selimt:

At the moment, it can be solved at the ViewSet definition level as follows:

from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

class MySearchViewSet(DocumentViewSet):
    def __init__(self, *args, **kwargs):
        super(MySearchViewSet, self).__init__(*args, **kwargs)
        self.search.extra(track_total_hits=True)

@barseghyanartur
Owner

@selimt:

Did it work for you?

@selimt
Author

selimt commented Aug 25, 2021

That didn't work, but this did:

        self.search = self.search.extra(track_total_hits=True)

However, pagination doesn't seem to work with it: if I provide an offset past 10000, it fails:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/catalog_server/python/server/catalog_search/views.py", line 793, in list
    page = self.paginate_queryset(queryset)
  File "/usr/local/lib/python3.7/site-packages/rest_framework/generics.py", line 171, in paginate_queryset
    return self.paginator.paginate_queryset(queryset, self.request, view=self)
  File "/usr/local/lib/python3.7/site-packages/django_elasticsearch_dsl_drf/pagination.py", line 379, in paginate_queryset
    resp = queryset[self.offset:self.offset + self.limit].execute()
  File "/usr/local/lib/python3.7/site-packages/elasticsearch_dsl/search.py", line 715, in execute
    self, es.search(index=self._index, body=self.to_dict(), **self._params)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 1673, in search
    body=body,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 458, in perform_request
    raise e
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 426, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 277, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/base.py", line 331, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [10011]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')

Request data: {"offset": "10001"}

@barseghyanartur
Owner


Ah, yeah, sure, it should indeed be self.search = self.search.extra(track_total_hits=True).
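
Putting the two snippets together, a minimal sketch of the working override (the class name is a placeholder):

    from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

    class MySearchViewSet(DocumentViewSet):
        def __init__(self, *args, **kwargs):
            super(MySearchViewSet, self).__init__(*args, **kwargs)
            # .extra() returns a new Search object, so it has to be assigned
            # back to self.search for track_total_hits to take effect.
            self.search = self.search.extra(track_total_hits=True)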

Regarding pagination past 10,000: that is by design in Elasticsearch. I think normal pagination would fail on that one too. When you want to search beyond 10,000 results, an alternative pagination mechanism should be used (search_after).

There's an issue for it.
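
For illustration only, a rough search_after loop written against elasticsearch-dsl directly, outside this package's pagination classes (MySearchDocument, the page size, and the _id sort are assumptions, not this package's API):

    # Rough search_after sketch: keep a cursor of the last hit's sort values
    # and pass it back as search_after on the next request.
    from my_app.documents import MySearchDocument  # placeholder document class

    search = MySearchDocument.search().sort("_id")  # deterministic sort required
    last_sort = None
    while True:
        page_search = search.extra(search_after=last_sort) if last_sort else search
        page = page_search[:100].execute()
        if not page.hits:
            break
        for hit in page.hits:
            pass  # process the hit
        last_sort = list(page.hits[-1].meta.sort)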
