Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix the profile API returns prematurely. #340

Merged

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented Dec 23, 2020

Issue #, if available:
#339

Description of changes:
MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see #339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:

  1. Added unit tests to verify the bug fix.
  2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see opendistro-for-elasticsearch#339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:
1. Added unit tests to verify the bug fix.
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
@codecov
Copy link

codecov bot commented Dec 23, 2020

Codecov Report

Merging #340 (114c948) into master (2ae77ed) will increase coverage by 0.80%.
The diff coverage is 75.43%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #340      +/-   ##
============================================
+ Coverage     75.50%   76.30%   +0.80%     
- Complexity     2160     2224      +64     
============================================
  Files           207      209       +2     
  Lines         10030    10139     +109     
  Branches        898      902       +4     
============================================
+ Hits           7573     7737     +164     
+ Misses         2035     1991      -44     
+ Partials        422      411      -11     
Flag Coverage Δ Complexity Δ
cli 79.27% <ø> (ø) 0.00 <ø> (ø)
plugin 76.08% <75.43%> (+0.87%) 0.00 <17.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...distroforelasticsearch/ad/model/EntityProfile.java 52.38% <0.00%> (+1.36%) 12.00 <0.00> (+4.00)
.../ad/util/MultiResponsesDelegateActionListener.java 100.00% <ø> (+21.95%) 14.00 <0.00> (+1.00)
...distroforelasticsearch/ad/EntityProfileRunner.java 79.28% <73.68%> (ø) 31.00 <1.00> (ø)
...elasticsearch/ad/AnomalyDetectorProfileRunner.java 77.20% <82.35%> (+14.70%) 61.00 <16.00> (+11.00)
...elasticsearch/ad/constant/CommonErrorMessages.java 85.71% <100.00%> (+2.38%) 1.00 <0.00> (ø)
...stroforelasticsearch/ad/AnomalyDetectorPlugin.java 95.04% <0.00%> (ø) 11.00% <0.00%> (ø%)
...ticsearch/ad/settings/AnomalyDetectorSettings.java 100.00% <0.00%> (ø) 1.00% <0.00%> (ø%)
...ch/ad/common/exception/LimitExceededException.java 100.00% <0.00%> (ø) 3.00% <0.00%> (+1.00%)
...roforelasticsearch/ad/task/ADTaskCacheManager.java 100.00% <0.00%> (ø) 30.00% <0.00%> (?%)
...stroforelasticsearch/ad/task/ADBatchTaskCache.java 100.00% <0.00%> (ø) 15.00% <0.00%> (?%)
... and 8 more

@kaituo kaituo added the bug Something isn't working label Dec 24, 2020
new MultiResponsesDelegateActionListener<EntityProfile>(
listener,
totalResponsesToWait,
"Fail to fetch profile for " + entityValue + " of detector " + detectorId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may replace with FAIL_FETCH_ERR_MSG

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced

@@ -214,7 +214,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
if (modelProfile != null) {
builder.field(CommonName.MODEL, modelProfile);
}
if (state != null) {
if (state != EntityState.UNKNOWN) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for state to be null? Do we still need to keep checking if state is null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible. Added back.

listener.onFailure(new RuntimeException(CommonErrorMessages.FAIL_TO_FIND_DETECTOR_MSG + detectorId, e));
}
} else {
listener.onFailure(new RuntimeException(CommonErrorMessages.FAIL_TO_FIND_DETECTOR_MSG + detectorId));
Copy link
Contributor

@ylwu-amzn ylwu-amzn Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use AnomalyDetectionException here? Same question for other places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it. Most of transport APIs use AnomalyDetectionException (except the recent ones added by Sarat). When I reviewed Sarat's PRs, I thought about pointing it out, but didn't because changing to use AnomalyDetectionException does not add too much benefit except that we have standardized the exceptions we throw. Our public APIs do not standardize the exception it throws back to the user. Take profile API as an example, sometimes it throws IOException, sometimes NullPointerException, and sometimes RuntimeException. Do you think we should change all of public APIs and transport APIs to use AnomalyDetectionException? The benefit is that we can have a standard wrapper exception to throw. The drawback is that this might be another PR due to the large changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only catch exceptions in AD realtime job and count in failure stats. We may need to count all exceptions from other places, then we need to wrap the exception.
No so urgent. We can fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

@kaituo kaituo merged commit 0654957 into opendistro-for-elasticsearch:master Dec 28, 2020
kaituo added a commit to kaituo/anomaly-detection that referenced this pull request Dec 29, 2020
…asticsearch#340)

Besides backporting, this PR also:
First, it fixes another premature return in AnomalyDetectorProfileRunner.onInittedEver.
Second, it replaces set-env with GITHUB_ENV so that backport PR can invoke CI. The `set-env` command is disabled.

Testing done:
1. Verfied manually the new early return bug is fixed..
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
kaituo added a commit that referenced this pull request Dec 29, 2020
* Backport: Fix the profile API returns prematurely. (#340)

Besides backporting, this PR also:
First, it fixes another premature return in AnomalyDetectorProfileRunner.onInittedEver.
Second, it replaces set-env with GITHUB_ENV so that backport PR can invoke CI. The `set-env` command is disabled. It also fixes golang lint version so that CI can run.

Testing done:
1. Verfied manually the new early return bug is fixed..
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
ylwu-amzn pushed a commit that referenced this pull request Feb 24, 2021
* Fix the profile API returns prematurely.

MultiResponsesDelegateActionListener helps send multiple requests asynchronously and return one final response altogether. While waiting for all inflight requests, the method respondImmediately and failImmediately can stop waiting and return immediately. While these two methods are convenient, it is easy to misuse them and cause bugs (see #339 for example). This PR removes the method respondImmediately and failImmediately and refactor profile runner to avoid using them.

This PR also stops printing out the unknown entity state since it is not useful.

Testing done:
1. Added unit tests to verify the bug fix.
2. Manual tests to run profile calls for single-stream and multi-entity detectors for different phases of the detector lifecycle (disabled, init, running). Verified profile results make sense.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants