
ci/transformers: add baseline checks for test cases #1269

Merged
merged 1 commit into from
Jan 13, 2025

Conversation

Contributor

@dvrogozh dvrogozh commented Jan 9, 2025

Baseline is set in a Python script. The script inspects the test results and categorizes test cases as:

  • New failures - the workflow is marked as failed if any are detected
  • New passes - the workflow is marked as failed if any are detected (since the baseline requires an update)
  • Skipped flakies - the workflow is marked as failed if any are detected (since the baseline requires an update)
  • Known failures - tests known to fail per the baseline; the workflow is marked as passed (if nothing from the above is detected)

Test cases from the above categories are printed into the summary.
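The four categories above could be sketched roughly as follows. This is a hedged reconstruction, not the PR's actual code: the function name `categorize`, the result encoding, and the return shape are all assumptions; only the baseline dict layout mirrors the excerpt quoted in the review thread.

```python
# Hedged sketch of the baseline check described above (names are assumed).

def categorize(results, baseline):
    """Categorize test outcomes against a baseline.

    results:  {(class_path, test_name): 'passed' | 'failed' | 'skipped'}
    baseline: {class_path: {test_name: None | {'flaky': True, ...}}}
    Returns (categories, workflow_failed).
    """
    cats = {'new_failures': [], 'new_passes': [],
            'skipped_flakies': [], 'known_failures': []}
    for (klass, test), outcome in results.items():
        info = baseline.get(klass, {})
        in_baseline = test in info
        flaky = isinstance(info.get(test), dict) and info[test].get('flaky', False)
        if outcome == 'failed' and not in_baseline:
            cats['new_failures'].append((klass, test))     # not in baseline -> fail workflow
        elif outcome == 'passed' and in_baseline and not flaky:
            cats['new_passes'].append((klass, test))       # stale baseline -> fail workflow
        elif outcome == 'skipped' and flaky:
            cats['skipped_flakies'].append((klass, test))  # stale baseline -> fail workflow
        elif outcome == 'failed' and in_baseline:
            cats['known_failures'].append((klass, test))   # expected failure -> workflow may pass
    workflow_failed = any(cats[k] for k in
                          ('new_failures', 'new_passes', 'skipped_flakies'))
    return cats, workflow_failed
```

A run with no new failures, no new passes, and no skipped flakies leaves `workflow_failed` false even when known failures are present, which matches the behavior described above.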

@dvrogozh dvrogozh force-pushed the transformers2 branch 5 times, most recently from 1b7ce58 to aa79416 Compare January 9, 2025 23:07
@dvrogozh dvrogozh marked this pull request as ready for review January 10, 2025 01:46
Contributor

@chuanqi129 chuanqi129 left a comment

LGTM

failing_cases = {
    'tests.benchmark.test_benchmark.BenchmarkTest': {
        'test_inference_encoder_decoder_with_configs': None,
        'test_inference_fp16': None,
    ...
Contributor
@chuanqi129 chuanqi129 Jan 13, 2025

BTW, what does the "None" mean here? Do we need to root cause and fix those test cases?

Contributor Author

Yes, these are all test cases which fail and which need to be root caused and fixed. We require that no further tests fail on top of this list; if any do, the workflow should fail.

None here is a placeholder in the dictionary for passing additional information about a failing test. In the simplest case we don't pass anything, i.e. we pass None, since the test name in this list by itself signifies that the test fails. However, in some cases we need to mark that a test is flaky. That's where this placeholder comes in handy:

failing_cases = {
    'tests.models.detr.test_image_processing_detr.DetrImageProcessingTest': {
        'test_fast_is_faster_than_slow': { 'flaky': True },
    ...

I already use this for a few flaky tests in the middle of the list. See lines 48-57.

Actually, we can expand this idea further if needed and pass, for example, links to known bugs or PRs associated with a failing case, and print those in the result table.
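One hedged way that extension could look: the `'link'` key, the `annotate` helper, and the placeholder URL below are all hypothetical, not part of this PR; only the `'flaky'` marker and the dict layout come from the thread above.

```python
# Hypothetical extension of the baseline dict: per-test metadata such as a
# flaky flag or a link to a tracking issue, rendered for the summary table.

failing_cases = {
    'tests.models.detr.test_image_processing_detr.DetrImageProcessingTest': {
        'test_fast_is_faster_than_slow': {'flaky': True},
    },
    'tests.benchmark.test_benchmark.BenchmarkTest': {
        # 'link' is an assumed key; the URL is a placeholder, not a real issue.
        'test_inference_fp16': {'link': 'https://github.com/org/repo/issues/NNN'},
    },
}

def annotate(klass, test):
    """Render the metadata of one failing test as a short summary-table cell."""
    info = failing_cases.get(klass, {}).get(test) or {}
    notes = []
    if info.get('flaky'):
        notes.append('flaky')
    if info.get('link'):
        notes.append(info['link'])
    return ', '.join(notes) or '-'
```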

@dvrogozh dvrogozh added this pull request to the merge queue Jan 13, 2025
Merged via the queue into intel:main with commit b2560ac Jan 13, 2025
2 of 5 checks passed
@dvrogozh dvrogozh deleted the transformers2 branch January 13, 2025 15:12