Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignored exception *sometimes* at the end of a succesful build #1379

Closed
yuvipanda opened this issue Nov 28, 2024 · 12 comments · Fixed by #1380
Closed

Ignored exception *sometimes* at the end of a succesful build #1379

yuvipanda opened this issue Nov 28, 2024 · 12 comments · Fixed by #1380

Comments

@yuvipanda
Copy link
Collaborator

Sometimes at the end of a succesful build, I get the following exception:

Exception ignored in: <function Application.__del__ at 0x7f31ff7147c0>
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/traitlets/config/application.py", line 1065, in __del__
  File "/opt/venv/lib/python3.11/site-packages/traitlets/config/application.py", line 1054, in close_handlers
  File "/opt/venv/lib/python3.11/site-packages/traitlets/traitlets.py", line 687, in __get__
  File "/opt/venv/lib/python3.11/site-packages/traitlets/traitlets.py", line 666, in get
TypeError: 'NoneType' object is not callabl

I don't have a local repro yet, but gonna dig into this a little

@yuvipanda
Copy link
Collaborator Author

I could repro it with the following:

docker run -v /var/run/docker.sock:/var/run/docker.sock quay.io/jupyterhub/repo2docker:2024.07.0  jupyter-repo2docker --ref=8a6eb6b19ee02f5652a7bb345c200c34155d89df --image=quay.io/veda-binder/staging-jupyterhub-2drepo2docker-e0a968:8a6eb6b19ee02f5652a7bb345c200c34155d89df --no-clean --no-run --json-logs --user-name=jovyan --user-id=1000 https://github.com/jupyterhub/repo2docker

@yuvipanda
Copy link
Collaborator Author

Minimal reproducer:

docker run -v /var/run/docker.sock:/var/run/docker.sock quay.io/jupyterhub/repo2docker:2024.07.0  jupyter-repo2docker --no-run --user-name=jovyan --user-id=1000 --json-logs https://github.com/jupyterhub/repo2docker

--json-logs is the culprit - removing that makes the error go away

@yuvipanda
Copy link
Collaborator Author

On latest main, I can repro this with:

 jupyter-repo2docker --no-run --user-name=jovyan --user-id=1000 --json-logs https://github.com/jupyterhub/repo2docker

Again, removing --json-logs makes it go away

@yuvipanda
Copy link
Collaborator Author

sys.excepthook = self.json_excepthook
is the culprit line. If i comment that out, this goes away.

@yuvipanda
Copy link
Collaborator Author

Looks like this is a bug introduced between traitlets 5.9 and 5.10! If I run pip install --force traitlets==5.9.0, and try to repro it, I can't.

@yuvipanda
Copy link
Collaborator Author

ipython/traitlets#818 is the PR that introduced this issue.

@yuvipanda
Copy link
Collaborator Author

I tested a later version of python to see if this goes away. it does not :(

@yuvipanda
Copy link
Collaborator Author

ipython/traitlets#698 is actually perhaps related, since that seems to have led to ipython/traitlets#722

@minrk
Copy link
Member

minrk commented Nov 28, 2024

Great job tracking it down! It makes perfect sense that ipython/traitlets#698 would cause this, since __del__ has to handle objects being partially disassembled if called during process teardown, which people often miss since the situation can ~never happen in tests. Approximately everything (even imported modules!) can be None during __del__. The fix is usually: don't do anything you don't have to in __del__, and try/except everything. I'll try to see what the right fix is in traitlets, it might just be suppressing the error in __del__.

If we want/need a workaround here, I believe it would be to call app.close_handlers() explicitly before letting it be garbage collected, e.g. in a finally here.

@minrk
Copy link
Member

minrk commented Nov 28, 2024

tracking it all the way down, the NoneType is actually the stdlib module function typing.cast, which has been de-initialized to None, which is why it takes both of those PRs you linked to create the problem, I think. 698 triggers some logic that cannot be assumed to complete in __del__, and 722 happens to introduce the logic that does not always complete if the process is being torn down.

@minrk
Copy link
Member

minrk commented Nov 28, 2024

#1380 is the workaround, plus submitted two PRs upstream that should also fix the underlying issue

@yuvipanda
Copy link
Collaborator Author

Excellent work, @minrk! Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants