Query on rollup index for average aggregation metric is giving incorrect results #64

Closed
adityaj1107 opened this issue Jun 3, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@adityaj1107
Contributor

Issue by Sreevani871
Monday Apr 19, 2021 at 07:31 GMT
Originally opened as opendistro-for-elasticsearch/index-management#440


Describe the bug
The same aggregation query was run against the source index and the rollup index to compare aggregation metric values, and the results do not match. The average aggregation on the rollup index returns incorrect results.

Rollup Job Configuration
curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/rollup-test?pretty" -H "Content-Type:application/json" -d '
{
  "rollup": {
    "enabled": true,
    "schedule": { "cron": { "expression": "*/1 * * * *", "timezone": "UTC" } },
    "description": "Test rollup job",
    "source_index": "jaeger-span-2021.04.17-000103",
    "target_index": "rollup-test",
    "page_size": 5000,
    "delay": 300,
    "continuous": false,
    "dimensions": [
      { "date_histogram": { "source_field": "startTimeMillis", "fixed_interval": "1h", "timezone": "UTC" } },
      { "terms": { "source_field": "process.serviceName" } },
      { "terms": { "source_field": "process.tag.application@version" } },
      { "terms": { "source_field": "operationName" } },
      { "terms": { "source_field": "exception.type" } },
      { "terms": { "source_field": "exception.message" } }
    ],
    "metrics": [
      {
        "source_field": "duration",
        "metrics": [ { "avg": {} }, { "max": {} }, { "min": {} }, { "sum": {} }, { "value_count": {} } ]
      }
    ]
  }
}'
Query on Rollup Index
Request
curl -X GET "localhost:9200/rollup-test/_search?pretty&size=0" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [ { "terms": { "process.serviceName": [ "service-xxxxxx" ] } } ]
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" },
      "aggs": {
        "service": {
          "terms": { "field": "process.serviceName" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } },
            "max_duration": { "max": { "field": "duration" } },
            "min_duration": { "min": { "field": "duration" } },
            "count": { "value_count": { "field": "duration" } },
            "sum": { "sum": { "field": "duration" } }
          }
        }
      }
    }
  }
}'
Response
rollup-index-response.txt

Query on Source Index
Request
curl -X GET "localhost:9200/jaeger-span-2021.04.17-000103/_search?pretty&size=0" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [ { "terms": { "process.serviceName": [ "service-xxxxxx" ] } } ]
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" },
      "aggs": {
        "service": {
          "terms": { "field": "process.serviceName" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } },
            "max_duration": { "max": { "field": "duration" } },
            "min_duration": { "min": { "field": "duration" } },
            "count": { "value_count": { "field": "duration" } },
            "sum": { "sum": { "field": "duration" } }
          }
        }
      }
    }
  }
}'

Response
source-index-response.txt

Setup Details

All other metrics (SUM, VALUE_COUNT, MIN, MAX) return correct results that match the aggregation metrics of the source index. Only the average is incorrect.
Consider the following bucket taken from the response of the rollup index query:

{
  "key_as_string" : "2021-04-17T02:00:00.000Z",
  "key" : 1618624800000,
  "doc_count" : 562,
  "service" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "service-xxxxxx",
        "doc_count" : 562,
        "avg_duration" : { "value" : 754.1463076048377 },
        "count" : { "value" : 2847569 },
        "min_duration" : { "value" : 37.0 },
        "sum" : { "value" : 1.5818190941E10 },
        "max_duration" : { "value" : 2.07551568E8 }
      }
    ]
  }
}

Here the expected avg_duration is 1.5818190941E10 / 2847569 = 5554.9807365511, but the value returned in the response is avg_duration = 754.1463076048377.
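
The arithmetic above can be reproduced with a short script. This is only a sketch for checking the numbers: the variable names are illustrative, and the values are copied by hand from the rollup-index bucket quoted in this issue.

```python
# Values copied from the rollup-index bucket quoted in this issue.
sum_duration = 1.5818190941e10      # "sum" metric
value_count = 2847569               # "count" (value_count) metric
reported_avg = 754.1463076048377    # "avg_duration" returned by the rollup index

# Average recomputed from the rolled-up sum and value_count.
expected_avg = sum_duration / value_count

# How far off the reported average is, as a ratio.
ratio = expected_avg / reported_avg

print(f"expected avg: {expected_avg:.4f}")
print(f"reported avg: {reported_avg:.4f}")
print(f"expected / reported: {ratio:.2f}")
```

Because sum and value_count both match the source index, recomputing the average from those two metrics on the client side may serve as a cross-check on any bucket.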

Can anyone explain the reason behind this discrepancy?

@adityaj1107 adityaj1107 added the bug Something isn't working label Jun 3, 2021
@adityaj1107
Contributor Author

Comment by RashmiRam
Thursday Apr 22, 2021 at 12:24 GMT


This line https://github.com/opendistro-for-elasticsearch/index-management/blob/v1.12.0.0/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt#L246 should be changed to state.sums = 0L; state.counts = 0L;

Ref: https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-literals.html#integer-literals
Ref: elastic/elasticsearch#27199

Every aggregation that shows a wrong avg value behaves as if the sum were 2147483647 (the largest 32-bit signed integer) and divides that by the count, which produces the wrong result. This can be verified by multiplying each wrong avg value in the rollup search by its count: the product is 2147483647 every time.
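
The check described in this comment can be sketched as follows. The numbers are copied from the bucket quoted in the issue. The helper `saturating_int_add` is hypothetical: it only mimics how a Java-style `int += double` narrows an out-of-range result to the 32-bit maximum, and it is an assumption that the script's accumulator behaves this way when initialised with `0` instead of `0L`.

```python
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1  # 2147483647, largest 32-bit signed integer

# Values copied from the rollup-index bucket quoted in the issue.
reported_avg = 754.1463076048377
value_count = 2847569
true_sum = 1.5818190941e10


def saturating_int_add(acc: int, value: float) -> int:
    """Hypothetical helper: add a float to a 32-bit accumulator, clamping
    the result to the int range the way a Java narrowing cast does."""
    total = acc + value
    if total != total:  # NaN narrows to 0 in Java
        return 0
    clamped = max(float(INT_MIN), min(float(INT_MAX), total))
    return int(clamped)  # truncates toward zero


# Step 1: the sum implied by the wrong average.
implied_sum = reported_avg * value_count
print("implied sum:", round(implied_sum), "INT_MAX:", INT_MAX)

# Step 2: a 32-bit accumulator clamps, a wider one does not.
int_style_sum = saturating_int_add(0, true_sum)
wide_sum = int(true_sum)

print("avg with 32-bit accumulator:", int_style_sum / value_count)
print("avg with wide accumulator:  ", wide_sum / value_count)
```

Under that assumption, the clamped accumulator reproduces the wrong average and the wide accumulator reproduces the expected one, which is consistent with the proposed change to `0L`.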

@adityaj1107
Contributor Author

Comment by Sreevani871
Friday Apr 23, 2021 at 07:29 GMT


Any help here @dbbaughe ?
One more issue concerns the delay field in the rollup job configuration: when I configured the job with continuous set to true and delay set to 300000 (milliseconds), the job execution did not honour the delay.
In the code, the delay field is defined as a long. Which time unit is it interpreted in during execution?

@adityaj1107
Contributor Author

Comment by Sreevani871
Wednesday Apr 28, 2021 at 11:46 GMT


Any help here?

@RashmiRam

@aditjind Can #64 (comment) be considered the fix for this issue? If so, shall I raise a PR for it?

thalurur pushed a commit to thalurur/open-index-management that referenced this issue Oct 22, 2021
* register action hook

Signed-off-by: bowenlan-amzn <[email protected]>

3 participants