-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert workaround of test_virtual_functions.py from #2746 #2797
base: master
Are you sure you want to change the base?
Revert workaround of test_virtual_functions.py from #2746 #2797
Conversation
While everyone can take a look at the valgrind report, here are the things to notice:
EDIT: See next comment. It's not a double free, it's "just" a use-after-free. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was fun. First of all, here's how to reliably repro this issue with a self-contained example:
#include <Python.h>
static void pybind11_object_dealloc(PyObject *self) {
auto type = Py_TYPE(self);
type->tp_free(self);
Py_DECREF(type);
}
static PyModuleDef pybind11_module_def_m{
.m_base = PyModuleDef_HEAD_INIT,
.m_name = "m",
.m_size = -1,
};
static PyObject *pybind11_init_impl_m() {
auto m = PyModule_Create(&pybind11_module_def_m);
PyType_Slot slots[] = {
{Py_tp_dealloc, (void*)pybind11_object_dealloc},
{0, nullptr}
};
PyType_Spec spec{"m.B", sizeof(PyObject), 0, Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HEAPTYPE, slots};
auto type = PyType_FromSpec(&spec);
PyObject_SetAttrString(m, "B", (PyObject*)type);
return m;
}
int main() {
PyImport_AppendInittab("m", pybind11_init_impl_m);
Py_InitializeEx(1);
PyRun_SimpleString("import m\n"
"def t():\n"
" class A:\n"
" class B(m.B):pass\n"
" b=B()\n"
"t()\n");
Py_Finalize();
}
A few interesting things one can point out:
- The above snippet doesn't depend on pybind11. The only thing slightly resembling pybind11 is
pybind11_object_dealloc()
. Pay attention to it. - The snippet is slightly evil - the heap type flags omit
Py_TPFLAGS_DEFAULT
and thus don't play well with stackless python and also have bad performance. Also, notp_repr
ortp_str
means you're not allowed to printm.B()
.
2.1. I was trying to be terse, so the code does just enough to "anger" valgrind.
This same snippet had ~4 of these use-after-free errors in python 3.8. In python 3.7 there were hundreds of similar errors, but back then things were different. Also, python 3.10 has this use-after-free error as well.
References for further reading:
- Test failure with Python 3.8.0 #1946
- https://bugs.python.org/issue35810
- sizmailov@3beea65
- 6cb584e
- https://docs.python.org/3.8/whatsnew/3.8.html#changes-in-the-c-api
That should explain the "things were different in days of python3.7". The suggested tp_dealloc
from link number 5 does the following:
static void
foo_dealloc(foo_struct *instance) {
PyObject *type = Py_TYPE(instance);
PyObject_GC_Del(instance);
#if PY_VERSION_HEX >= 0x03080000
// This was not needed before Python 3.8 (Python issue 35810)
Py_DECREF(type);
#endif
}
If we take a look at pybind11_object_dealloc
as it is in pybind11, we see the same approach. It really looks like we're doing everything by the book!
If anyone is wondering what happens if we just skip the Py_DECREF
in python3.8+, well, leaks happen. We start leaking references, as in #1946.
Considering that before python3.8 these use-after-free errors were plenty and that the above snippet is irreducible without hiding the error, I'm pretty sure this is a CPython bug. An obscure one, but still.
Seems like we, yet again, have to choose between leaking and (obscure) memory errors. Now who wants to bet that the actual fix for this issue will, yet again, break some part of CPython API?
As for this specific pull request, I think we shouldn't merge it until CPython actually fixes the error. Before anyone asks, I haven't submitted a report on BPO.
Issue submitted to bugs.python.org by @bstaletick, btw: https://bugs.python.org/issue42961 |
Description
We made workaround in #2746 for an issue flagged by Valgrind we didn't understand/track down. So far it's unclear whether this is a real issue (a very tricky one, then, because this small change triggers it), or an issue with Valgrind. In any case, we'd like to revert this :-)