-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: In some situations, py::cast returns null instead of raising an exception #4099
Comments
It turns out that this can be triggered by an ODR violation. Compile in debug mode with no optimizations, and have a py::cast call with the I'm not sure what pybind11 is supposed to do here. I suppose it would be nice if, even in this situation, pybind11 raised an error rather than returning null. That's not much consolation though... |
This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations *always* be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <[email protected]> [ghstack-poisoned]
We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations *always* be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations *always* be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations *always* be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: c466358a96edcd76dacd1c4785c1ba10fdc6e819 Pull Request resolved: #82552
This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: #82548 Approved by: https://github.com/albanD
We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations *always* be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <[email protected]> Pull Request resolved: #82552 Approved by: https://github.com/albanD
@rwgk Has been looking into fixing/detecting these potential ODR violations so pinging him. |
Did you already run the full reproducer with sanitizers (valgrind, asan, msan)? It's not certain that the problem here is related to work I did, although it could be. Reading the comments here didn't trigger any great ideas on my end. If you post the instructions for the full reproducer and it's not too cumbersome, I could maybe give it a try. The holder code has strictly speaking "plenty of UB potential": #2672 (comment) Are the C-style and
The ODR guard work I did recently is very specific to detecting If you want to give it a try, git clone smart_holder instead of master and use Fixing the ODR is a different question. |
I ran the full reproducer with ASAN, but I didn't try Valgrind or MSAN. I think I've got a pretty decent handle on the root cause so I can try to make a minimal reproducer. To full reproduce, check out pytorch/pytorch@7be44f8 , do a build with |
To fix the ODR, I added a lint rule to our project to force all includes of pybind.h to go through a helper header, which ensures that all the specializations are in scope whenever we use pybind.h. This is viable for us but maybe not for a monorepo setting. |
Sounds promising, I'll wait for that. (I cannot easily run msan in a general Linux environment. It's much more practical with a reproducer that I can reasonably easily run in our corp dev environment.) |
Maybe you have covered this already: |
nice one. We don't have any uses of it in our codebase yet but I can easily add it to our lint haha |
Summary: This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at pybind/pybind11#4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: #82548 Approved by: https://github.com/albanD Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/50e8abbcadde70c6eb6ef932d3e6cc3fa26a5cd7 Reviewed By: osalpekar Differential Revision: D38322453 Pulled By: ezyang fbshipit-source-id: 63b05bd97604da85b5cae2d7a6e7ff6880cd4ae5
Required prerequisites
Problem description
I have noticed under certain situations that I can hit the error:
TypeError: Unregistered type : c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> >
when I use a py::cast on a custom smart pointer. Ordinarily, I expect this to work (and in other contexts, it does work), but sometimes it does not. Additionally, the documentation specifies that py::cast should always raise an exception upon cast failure, but I observe that it instead returns a nullptr and sets the Python error context, without actually raising an exception.I wasn't able to extract a short repro; I do have a full repro but it involves compiling a giant project, LMK if you're interested. The triggering code looks like:
where toSymIntNodeImpl returns a
c10::intrusive_ptr<c10::SymIntNodeImpl>
.py_symint
is null and a Python error is set after callingpy::cast
. Here is the backtrace at this point:Stepping through the rest of the execution, lack of type info means
type_caster_generic::cast
short circuits:but then nothing seems to detect that the handle is empty and so this null handle ends being returned all the way.
c10::intrusive_ptr
is a shared ptr like class that does intrusive refcounting. It was declared to be a holder type withI also interposed the type info registration mechanism, and observed that
SymIntNodeImpl
was registered, but notc10::intrusive_ptr<SymIntNodeImpl>
. The workaround, in this case, is to explicitly deref the intrusive ptr before passing it to cast, but this is error prone and it would be nice to root cause the issue.This is on pybind11 aa304c9
Reproducible example code
No response
The text was updated successfully, but these errors were encountered: