ci/transformers: add baseline checks for test cases #1269
Signed-off-by: Dmitry Rogozhkin <[email protected]>
LGTM
failing_cases = {
    'tests.benchmark.test_benchmark.BenchmarkTest': {
        'test_inference_encoder_decoder_with_configs': None,
        'test_inference_fp16': None,
BTW, what does the "None" mean here? Do we need to root cause and fix those test cases?
Yes, these are all test cases which fail and which need to be root caused and fixed. We require that no additional tests fail beyond this list; if any do, the workload should fail.
None here is a placeholder for a dictionary that passes additional information about a failing test. In the simplest case we don't pass anything, i.e. we pass None, since the presence of a test name in this list by itself signifies that the test fails. However, in some cases we need to mark a test as flaky. That's where this placeholder comes in handy:
failing_cases = {
    'tests.models.detr.test_image_processing_detr.DetrImageProcessingTest': {
        'test_fast_is_faster_than_slow': { 'flaky': True },
        ...
I already use this for a few flaky tests in the middle of the list. See lines 48-57.
We can also expand this idea further if needed and pass, for example, links to the known bugs or PRs associated with a failing case, and then print them in the result table.
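To illustrate the scheme described in this comment, here is a minimal sketch of how a script might look up the per-test placeholder and detect failures that are not in the baseline. Only the failing_cases layout and the 'flaky' key come from the PR; the helper names and the shape of the results input are assumptions made for this example and may not match the actual script.

```python
# Sketch only: failing_cases layout and the 'flaky' key follow the PR;
# every function name and the `results` shape below are hypothetical.
failing_cases = {
    'tests.benchmark.test_benchmark.BenchmarkTest': {
        'test_inference_fp16': None,
    },
    'tests.models.detr.test_image_processing_detr.DetrImageProcessingTest': {
        'test_fast_is_faster_than_slow': {'flaky': True},
    },
}


def get_baseline_info(classname, testname):
    """Return (in_baseline, info) where info is the metadata dict, or {} for None."""
    tests = failing_cases.get(classname, {})
    if testname not in tests:
        return False, {}
    # None means "known failure, no extra information".
    return True, tests[testname] or {}


def is_flaky(classname, testname):
    """True only if the baseline entry explicitly marks the test as flaky."""
    _, info = get_baseline_info(classname, testname)
    return bool(info.get('flaky', False))


def new_failures(results):
    """Return names of failed tests that are not in the baseline.

    `results` is assumed to be an iterable of (classname, testname, passed)
    tuples. A non-empty return value means the workload should fail.
    """
    unexpected = []
    for classname, testname, passed in results:
        in_baseline, _ = get_baseline_info(classname, testname)
        if not passed and not in_baseline:
            unexpected.append(f'{classname}.{testname}')
    return unexpected
```

Because the placeholder is an ordinary dictionary, further keys (for example a link to a known bug, as suggested above) could be added without changing the lookup logic.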
The baseline is set in a Python script. The script looks into the results and categorizes test cases as:
Test cases from the above categories are printed in the summary.