-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
debug_overlay.conf: temporarily disable SOF_BOOT_TEST #8624
Conversation
- This test has failed on MTL since it was enabled. Running tests that we systematically ignore the failure of is much worse than not running them because it provides a false impression of quality. - This can cause DSP panics as seen in thesofproject#8621 Signed-off-by: Marc Herbert <[email protected]>
No DSP panic in https://sof-ci.01.org/sofpr/PR8624/build1103/devicetest/index.html, even WITH #8621 commit on top. So CONFIG_SOF_BOOT_TEST really is the smoking gun. PS: cavs https://sof-ci.01.org/sofpr/PR8624/build1104/devicetest/index.html is also all green and so is https://sof-ci.01.org/sofpr/PR8624/build1102/build zmain/LNL failure EDIT: 2nd and final test run WITHOUT #8621:
|
0769156
to
d11f724
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm against disabling it. If the test finds a non-working subsystem, we need to fix or remove that subsystem, not disable the whole test framework.
This is temporary and it's a one-line change, pure configuration change. It's not removing any code and preventing anyone from using it, testing it and fixing it.
This test does NOT find anything and will not find anything: it failed every single time on MTL since the very beginning yet we noticed only now (which cost @andyross and myself at least half a day). It's yet another "green failure". That's not a problem with the test itself (it's because we've always ignored FW errors thesofproject/sof-test#1075) but the net effect is the same.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point @lyakh but given this is the only test, this seems reasonable. If we had a test that would pass, we could only disable one test, but that's not an option now.
If we remove those (purposefully annoying) error messages from everybody's sight, the underlying problems will never get fixed.
Sorry, what do you mean "it doesn't find anything?" I think it's a valid test and it uncovers a real problem in SOF. And we knew about the failure from the moment that PR was submitted, it was just invisible in CI because the VMH that's failing the test is only available on MTL and MTL logs aren't yet publicly available in CI.
You mean you're working on a CI test that would evaluate firmware logs? I could provide a "fix," removing the part of the VMH functionality that's failing, if that's desired.
Originally this boot-time self-test was designed as an initially non-fatal advisory feature - until all found flaws are fixed. |
@lyakh We need to disable the failing boot test ASAP. The failing test has not caused side-effects before, but now it clearly has, so we can't afford to have it enabled in CI. Please ack this or submit an alternative. And can we make sure there's a bug filed for the failing case, so we can get proper priority to this. EDIT: |
@marc-hb Intel Internal CI tests crashed due to a problem with the verifier, we are in the process of repairing it, and the PR itself should pass in about 40 minutes |
That's not why it's invisible. It's invisible because no one ever clicks on a green "PASS". There are literally hundreds of PASS/FAIL results for every test run so of course people never look at "PASS" results. That's humanly impossible.
There is no such thing as "advisory" because no one looks at PASS logs.
Yes I have been (even though green failures were never a priority) but now I'm blocked because there are countless (and hopefully harmless) repetition of someFW errors that no one ever noticed - cause no one opens PASS results. Details in thesofproject/sof-test#1075 |
In 2020 someone tried to convince me that a test run accidentally missing logs should still be a PASS... |
This test has failed on MTL since it was enabled. Running tests that
we systematically ignore the failure of is much worse than not running
them because it provides a false impression of quality.
This can cause DSP panics as seen in
dai_zephyr: Silence benign warnings #8621
Signed-off-by: Marc Herbert [email protected]