
Error during build when failed test storage exceeds 100MB (GitHub single file limit) #3

Open
DonaldChung-HK opened this issue Dec 28, 2022 · 2 comments

@DonaldChung-HK (Contributor) commented Dec 28, 2022

This is the likely cause of a build failure where the dashboard builds successfully but the auto-commit (saving the new JSON file) fails or times out.

If an individual file exceeds 100MB, GitHub will refuse the push. This can happen either when there is a large influx of new test failures or when the total number of unique test failures builds up over time. It is more likely to happen when the tests are Python based, as they tend to output larger stack traces.

Here are some options for dealing with this situation.

If it is just a one-off large influx of test failures:

  1. If the failures are only in the pull request pipeline, the system will return to normal after 1-2 days as the search range moves past the problematic builds.
  2. You can also manually delete the problematic builds.
  3. Manually mark is_completed as true in history/{pipeline_name}/{pipeline_name}_by_build_fail_pickle.json so that the search algorithm skips over the problematic builds (see the sketch after this list).
  4. Add a safeguard that skips saving/updating the stack traces when the number of failed tests exceeds a certain threshold (a large number of failures, e.g. > 30, would indicate a catastrophic failure that will be dealt with immediately anyway).
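
A minimal sketch of option 3, assuming the by-build file is a JSON mapping of build number to a record carrying an is_completed flag (the actual schema may differ):

```python
import json

# Hypothetical helper: mark a range of problematic builds as completed so the
# search algorithm skips over them. Adjust the key/field names to the real schema.
def skip_builds(path, first_build, last_build):
    with open(path) as f:
        history = json.load(f)

    for build, record in history.items():
        if first_build <= int(build) <= last_build:
            record["is_completed"] = True

    with open(path, "w") as f:
        json.dump(history, f)

skip_builds("history/my_pipeline/my_pipeline_by_build_fail_pickle.json", 120, 130)
```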

If the test failures build up over time:

  1. Periodically clean up the history file, as old test failures are irrelevant.
  2. Create a mechanism where each test is checked against its latest detected fail date and removed if it has not been detected for a certain period (~60-180 days); see the sketch below.
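
A rough sketch of that retirement pass; the field name latest_fail_date and its ISO date format are assumptions about the history schema, not the actual one:

```python
import json
from datetime import datetime, timedelta

# Drop tests whose latest detected fail date is older than the cutoff.
def retire_stale_tests(path, max_age_days=90):
    cutoff = datetime.now() - timedelta(days=max_age_days)
    with open(path) as f:
        history = json.load(f)

    kept = {
        name: record
        for name, record in history.items()
        if datetime.fromisoformat(record["latest_fail_date"]) >= cutoff
    }

    with open(path, "w") as f:
        json.dump(kept, f)
```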

Finally, we can add a process that splits the files which exceed 100MB and joins them again before processing. However, you might run out of memory if the files get too big, as Python is inefficient at reading JSON and uses roughly 7-8x the memory of the JSON file size.
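
One possible shape of that split/join step, sketched in Python and splitting on raw bytes so the JSON never has to be parsed just to move it around (the chunk size and the .partNN naming are illustrative assumptions, not the actual implementation):

```python
import glob
import os

CHUNK = 90 * 1024 * 1024  # stay safely below GitHub's 100MB per-file limit

def split_file(path):
    """Split a large file into path.part00, path.part01, ... and remove the original."""
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            with open(f"{path}.part{index:02d}", "wb") as dst:
                dst.write(chunk)
            index += 1
    os.remove(path)

def join_file(path):
    """Reassemble the original file from its .partNN pieces before processing."""
    parts = sorted(glob.glob(f"{path}.part*"))
    with open(path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
```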

@DonaldChung-HK (Contributor Author) commented:

I think we should create some sort of cleaning Python script, e.g. one that trims a JSON object based on size, build number, and latest detected fail date, so that we can keep the file size in check and retire old data.
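
A possible starting point for the size-based part of that script, assuming the by-build file is keyed by build number (the schema and the size budget are assumptions):

```python
import json

# Drop the oldest builds until the serialized JSON fits within a size budget.
# Re-serializing on each iteration is slow but keeps the sketch simple.
def trim_by_size(path, max_bytes=80 * 1024 * 1024):
    with open(path) as f:
        history = json.load(f)

    builds = sorted(history, key=int)  # oldest build numbers first
    while builds and len(json.dumps(history).encode()) > max_bytes:
        history.pop(builds.pop(0))

    with open(path, "w") as f:
        json.dump(history, f)
```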

@DonaldChung-HK (Contributor Author) commented:

So the problem with the high influx of failures is that, for the by-build history files, we save all the stack traces instead of just the latest one (to prevent actual data loss, since the updating mechanism is not very robust). We were flooded with 10x120 failures per pipeline, which blew the file size up to over 150MB, when it is usually only around 2MB per file.

I have implemented 2 bash scripts in #5 that break the files down if they are too big (~90MB) and join them again when running the GitHub Actions.
