Address E2E issues and shortcomings #211

eyelidlessness · 2024-09-11T16:28:02Z

While reviewing #183, I was preparing a suggested change which would require a corresponding change to the vue.test.ts E2E test. This led to issues actually getting E2E tests to complete locally at all, and most of a day debugging the issue.

Local system

I've concluded that it's highly likely part of the problem I encountered is related to the fact that I'm running macOS 13 (aka Ventura; specifically 13.5, build 22G74 if for some reason even that level of detail turns out to be pertinent).

I reached this conclusion after finding this Playwright issue and this PR fixing it

Steps to reproduce

yarn workspace @getodk/web-forms test:e2e

Observed symptomatic behavior

In WebKit, vue.test.ts consistently fails with a timeout. For clarity, since I think we should consider renaming this test module to something more descriptive of what it does: this contains one test, which:

Iterates through all demo fixtures, and for each fixture...
Loads the fixture
Attempts to fill all of the form's questions
If the form has a language menu, attempts to select the last language menu item

The failure occurs on the 01-itext-basic.xml form, timing out on step 4 (this line specifically).

Importantly, this is only a symptom of the issue.

Root cause

While it may seem unrelated, the failure actually occurs earlier in the rendering process of 01-itext-basic.xml.

In the current WebKit build, on my local macOS version: a runtime error is produced when attempting to access the form-global validation popover's hidePopover method (this line specifically).
Regardless of OS, browser:
- The E2E mechanism to detect and report errors counts error/warning logs that may be issued (notably, it does not collect any other detail about those logs, such as their text or where the logs occur). The test passes if all test steps for all form fixtures complete without stalling or throwing, and only after that there is an assertion that no such error/warning logs were issued. This assertion is not reached if any earlier aspect of the test stalls or fails.
- If any unhandled runtime error occurs at any point in rendering, the UI implementation does not currently provide any mechanism for E2E tests to detect that or invoke any assertions around it. And neither do the the E2E tests attempt to do so.

In this specific case, this is the root cause of the observed symptomatic behavior:

A runtime error occurs during a reactive effect.
Rendering scheduled past that point is either interrupted or delayed.
The test attempts to access a DOM structure which does not exist (at least at call time).
DOM access being achieved by Playwright's "wait for match and resolve" approach (also common in many other browser test runners/frameworks) never finds a match, waiting indefinitely until interrupted by test timeout.

Likely, tangentially related, flakiness observed/root cause

I also observed and debugged a confounding factor, which I believe is partly responsible for many of our flaky E2E test woes. Specifically: the test's step to load each form fixture is just a click call. Form loading itself is asynchronous, but the test does not wait for any observable condition after initiating load of each fixture by click. (This is in contrast with the logic to wait for an observable page load condition.

As I was debugging the popover/WebKit/macOS 13 issue, I also observed that flakiness improves drastically by introducing a similar observable condition check for each fixture's form load.

Followup actions to consider

Introduce programmatically detectable runtime error reporting in the UI:
- Catch-all, for unknown/unhandled cases
- Specific handling for known/knowable fallible cases (few in the current engine API, but will soon be growing; many in the UI's other API accesses, albeit currently fall more into the "knowable" subcategory)
Reporting some or all of these errors may or may not also be user-facing. But the purpose is to be able to identify, in tests, when and what errors occur at all. Reporting should accommodate known/expected points of potential failure (e.g. form load, specific fallible user interactions). Importantly reporting should also accommodate programmatically detecting unknown/unexpected points of failure. (I don't want to be too prescriptive here, but think events.)
Include assertions in tests around those programmatically detectable error reports; at least in E2E tests, possibly also some unit/integration tests where pertinent.
Since Vue and other dependencies seem to like reporting errors/warnings via console API, we should improve the aspects of the E2E test which observe and assert log activity:
- Collect more detail about the logs. At least their text, preferably also their file/URL, line number. All of these details are available from the same Playwright APIs already in use.
- Perform log assertions much more frequently:
  - At least at the end of each fixture's test steps
  - Preferably more frequently than that, e.g. after specific test steps (or even between each step!)
  - Ideally in a way that might be reported cumulatively in test run failures (e.g. expect.soft assertions, some means of deferring test interruption by collecting assertion failures and dumping them later)
Break up vue.test.ts so there is a single test per form fixture it exercises, likely as an each/table test to start. I'm not sure whether this will require substantial tooling/build change (i.e. to make the fixture list available at test definition time), but I'm hoping it's possible with just the runtime information already available (i.e. import.meta.glob, Vite define config, something along those lines).
While we're at it, let's rename vue.test.ts to something more descriptive of what it actually tests! Does fixture-form-filling.test.ts` sound good?
As I mentioned in the tangentially related flakiness concern: introduce an observable condition check for each fixture's form load. This could be as simple as a class on the form root node which is only added once the form RootNode becomes non-null (i.e. when Promise<RootNode> resolves), and a corresponding Playwright locator assertion for that class.

Consider E2E approach with more intent

Lastly, I think we may want to re-evaluate the value of these very general E2E fixture tests. In theory, many or all of the above followup actions could make them more valuable than they are right now. But without some or all of these improvements, I think the value they provide is heavily outweighed by wasting a whole lot of time (CPU time in CI, our time in debugging and mitigation).

This isn't to say that we won't get value out of having E2E tests! I just think it's worth considering whether we'll get more value out of testing more specific functionality. The fixture filling exercise is still compelling as a smoketest, but I think it's pretty far from reaching its potential.

The build-specific E2E test is valuable because it serves as a regression test against a recurring build issue.

Otherwise, I think we'll get the most value out of E2E tests which exercise distinct user-facing scenarios, and either:

Make lower level execution challenging due to e.g. interactivity
Involve browser-level features like persistence, workers, and other conventionally difficult-to-unit-test scenarios

The text was updated successfully, but these errors were encountered:

- We should consider the `optimizeDeps` change regardless. We do this in all of the other packages. - We should probably not make the changes to `showPopover` and `hidePopover` calls without deeper consideration of fallback behavior, if we are in fact supporting weird edge cases where otherwise-supported browsers are mysteriously lacking certain expected APIs Much more detail about the latter in #211

- We should consider the `optimizeDeps` change regardless. We do this in all of the other packages. - We should probably not make the changes to `showPopover` and `hidePopover` calls without deeper consideration of fallback behavior, if we are in fact supporting weird edge cases where otherwise-supported browsers are mysteriously lacking certain expected APIs Much more detail about the latter in getodk#211

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address E2E issues and shortcomings #211

Address E2E issues and shortcomings #211

eyelidlessness commented Sep 11, 2024

Address E2E issues and shortcomings #211

Address E2E issues and shortcomings #211

Comments

eyelidlessness commented Sep 11, 2024

Local system

Steps to reproduce

Observed symptomatic behavior

Root cause

Likely, tangentially related, flakiness observed/root cause

Followup actions to consider

Consider E2E approach with more intent