Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert workaround of test_virtual_functions.py from #2746 #2797

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

YannickJadoul
Copy link
Collaborator

Description

We made workaround in #2746 for an issue flagged by Valgrind we didn't understand/track down. So far it's unclear whether this is a real issue (a very tricky one, then, because this small change triggers it), or an issue with Valgrind. In any case, we'd like to revert this :-)

@bstaletic
Copy link
Collaborator

bstaletic commented Jan 15, 2021

While everyone can take a look at the valgrind report, here are the things to notice:

  1. The block really was freed in pybind11_meta_dealloc.
  2. Then the interpreter starts to shut down and we get another free in free_keys_object.
  3. Both frees get passed the PyObject* corresponding to p (see this pull request's diff).
  4. If you as much as look at this test funny, the double free doesn't happen.
    4.1. I've forgotten which of the two frees doesn't happen.

 

EDIT: See next comment. It's not a double free, it's "just" a use-after-free.

Copy link
Collaborator

@bstaletic bstaletic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fun. First of all, here's how to reliably repro this issue with a self-contained example:

#include <Python.h>

static void pybind11_object_dealloc(PyObject *self) {
	auto type = Py_TYPE(self);
	type->tp_free(self);
	Py_DECREF(type);
}

static PyModuleDef pybind11_module_def_m{
	.m_base = PyModuleDef_HEAD_INIT,
	.m_name = "m",
	.m_size = -1,
};

static PyObject *pybind11_init_impl_m() {
	auto m = PyModule_Create(&pybind11_module_def_m);
	PyType_Slot slots[] = {
		{Py_tp_dealloc, (void*)pybind11_object_dealloc},
		{0, nullptr}
	};
	PyType_Spec spec{"m.B", sizeof(PyObject), 0, Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HEAPTYPE, slots};
	auto type = PyType_FromSpec(&spec);
	PyObject_SetAttrString(m, "B", (PyObject*)type);
	return m;
}

int main() {
	PyImport_AppendInittab("m", pybind11_init_impl_m);
	Py_InitializeEx(1);
	PyRun_SimpleString("import m\n"
			   "def t():\n"
			   "  class A:\n"
			   "    class B(m.B):pass\n"
			   "    b=B()\n"
			   "t()\n");
	Py_Finalize();
}

A few interesting things one can point out:

  1. The above snippet doesn't depend on pybind11. The only thing slightly resembling pybind11 is pybind11_object_dealloc(). Pay attention to it.
  2. The snippet is slightly evil - the heap type flags omit Py_TPFLAGS_DEFAULT and thus don't play well with stackless python and also have bad performance. Also, no tp_repr or tp_str means you're not allowed to print m.B().
    2.1. I was trying to be terse, so the code does just enough to "anger" valgrind.

This same snippet had ~4 of these use-after-free errors in python 3.8. In python 3.7 there were hundreds of similar errors, but back then things were different. Also, python 3.10 has this use-after-free error as well.

References for further reading:

  1. Test failure with Python 3.8.0 #1946
  2. https://bugs.python.org/issue35810
  3. sizmailov@3beea65
  4. 6cb584e
  5. https://docs.python.org/3.8/whatsnew/3.8.html#changes-in-the-c-api

That should explain the "things were different in days of python3.7". The suggested tp_dealloc from link number 5 does the following:

static void
foo_dealloc(foo_struct *instance) {
    PyObject *type = Py_TYPE(instance);
    PyObject_GC_Del(instance);
#if PY_VERSION_HEX >= 0x03080000
    // This was not needed before Python 3.8 (Python issue 35810)
    Py_DECREF(type);
#endif
}

If we take a look at pybind11_object_dealloc as it is in pybind11, we see the same approach. It really looks like we're doing everything by the book!

If anyone is wondering what happens if we just skip the Py_DECREF in python3.8+, well, leaks happen. We start leaking references, as in #1946.

Considering that before python3.8 these use-after-free errors were plenty and that the above snippet is irreducible without hiding the error, I'm pretty sure this is a CPython bug. An obscure one, but still.

Seems like we, yet again, have to choose between leaking and (obscure) memory errors. Now who wants to bet that the actual fix for this issue will, yet again, break some part of CPython API?


As for this specific pull request, I think we shouldn't merge it until CPython actually fixes the error. Before anyone asks, I haven't submitted a report on BPO.

@YannickJadoul
Copy link
Collaborator Author

Issue submitted to bugs.python.org by @bstaletick, btw: https://bugs.python.org/issue42961

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants