-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attach python lifetime to shared_ptr passed to C++ #2839
base: master
Are you sure you want to change the base?
Conversation
9955bcc
to
b25a027
Compare
b25a027
to
0612e51
Compare
0612e51
to
669d437
Compare
Hi, @virtuald. Thanks for trying this out in more detail than just the slightly crazy "hack" suggested on Gitter! :-) I'm not sure I meant for this hack to be the default way
Starting from these two concerns and thinking further, I see a number of alternative approaches we could think about:
This PR doesn't really suffer from the specific reference cycles I mentioned to you on Gitter. The reference cycles are an issue when the Python object keeps a C++ object alive (which it does) while the C++ would keep the Python object alive. In this case, that's not really an issue, because there's no reference from the C++ object to the Python object; the I also know that @bstaletic was interested in pushing pybind11 towards using full-on garbage-collected heap types (as CPython is apparently pushing people away from non-GC heap types). A radically different solution, which would also work more generally, would be to do something similar to @EricCousineau-TRI's I'll also tag @EricCousineau-TRI, since he will be the one who has been thinking about this by far the longest of all of us. |
Also, I had forgotten about this, but I think I found where I got the original idea. This issue is a very interesting read: #1049 (comment) |
I believe you get an error if you pass something with a unique_ptr holder to something with a shared_ptr argument? Or at least, I tried it and I got an error, will have to check again tonight. If there was an automatic unique -> shared conversion going on somewhere, we could add this there as well. I agree that it would be ideal if this could only be done for instances of trampoline classes. Any idea how to determine that? |
Yep, you're right, that's true; currently, you can't take away ownership from Python (though there's some work happening on that).
Good question. I don't have time for a thorough search, currently, but this might be harder than I thought. It might be that all this information is lost after declaring a class and its |
Currently I don't have the free bandwidth to fully page in here, but some quick remarks:
|
Nonono, let's not go that way. Completely agreed! The GC-type solution (similar to @EricCousineau-TRI's current solution) would solve this independently of holders, though (so also for non- |
Unfortunately, gil_scoped_acquire is in pybind11.h, so I can't use it here. There's also a note there about how the GIL shouldn't be acquired if the interpreter is shutting down, but it's not clear to me how to avoid that problem here. |
Regarding grabbing the GIL in the deleter, it appears that the std::function wrapper does this already so it's not without precedent. |
I think this is worth doing by default, and I'm -1 on requiring a user to enable it via a tag or other hack. The current behavior of potentially losing the python half of an object when passing it to a C++ function is (while rare) clearly incorrect, unexpected, and very difficult to root cause without digging into the C++ code. Evidence that users agree with me: #1120, #1145, #1333, #1389, #1552, #1546 (which has a similar implementation as this as a workaround!), #1566, #1774, #1902, #1937, and probably others. I've looked for a bit, and as far as I can tell there is not currently a mechanism to determine whether a class was constructed by python or not. To record that information:
Even then, that just tells you there's an alias class associated with the python type, not whether the constructed python type was created from an alias. You might be able to add a check for that in |
Looking at this more, you could enable upgrading a unique holder to shared_ptr by providing an implementation of Edit: ah, this is what rwgk was referring to, I think |
669d437
to
d216a0f
Compare
347870a
to
2b93ea7
Compare
Added the gil release, and a test inspired by #2844 |
5a11a72
to
ed477fe
Compare
This has been updated to only use the special deleter if an alias class is used. I found a place to store the alias indicator that isn't currently covered by the Anyways, the last commit here changes init_instance to vary based on whether an alias class was used. If an alias class is used, a version of init_instance is used which sets a flag on the py::instance. This flag is used to determine whether the special deleter that keeps the python portion alive should be used or not. |
Sorry I'm a bit late to the party and haven't read the above fully, but I think this does a similar thing to my (now quite stale) PR #1566 . I never worked out how to detect if it was an alias though, and in my version I did the shared_ptr dance a bit differently, haven't had a close look so don't know which is better, but I suspect yours. Would be great to get this problem fixed once and for all! |
@cdyson37 yep, this solves the same problem yours does, but takes care of only doing this for aliased shared_ptrs. |
This is an awesome PR! My only significant concern is exactly this question. I tested this PR in the Google environment with all sanitizers (ASAN, MSAN, TSAN, UBSAN) and didn't find any issues (except a couple known TSAN, UBSAN failures that already exist on master). I think the suggested changelog is a bit unclear. It doesn't mention alias classes. Is the following true? "For classes wrapped with (A) shared_ptr as the holder, and (B) having a trampoline (aka alias class), shared_ptrs passed from Python to C++ keep the Python object alive." |
Thanks for catching that, I'll update the changelog entry. The alias class thing was added later. |
Hi Dustin, I will merge #2841 in a minute, which will generate a merge conflict here I think. Sorry, but it's probably easy to resolve. If it wasn't for the pesky In the meantime, what do you think about splitting out a separate PR, just for factoring out |
Already did that :) #2845 |
@rwgk I'd be happy to help review your smart_holder branch (especially because it would fix so many long standing bugs / issues). As long as it's ABI and API compatible it seems like a good idea to integrate within pybind11. I think it would actually be good to have both progressive and conservative mode available, with an eventual goal of switching everything to progressive by default (which can be done at the next Python version release when everyone is forced to recompile the world anyways). |
A quick big thank you, that's awesome!
I'm turning this over in my mind still. Will get back soon. |
I'd really like to remove the Progressive mode. The main reason: I think if we have to support it, it could be more difficult to move to smart_holder as the default, in case we need subtle behavior tweaks. I don't know if that's actually the case, but I would be more surprised than not if there aren't any. But, is there anyone already relying on the Progressive mode? — I don't know about any use case. @virtuald @rhaschke? (I know of one or two people that tried but wound up using the Conservative mode.) How could I find out more systematically? |
I guess "Progressive mode" refers to |
Hi, we compile with PYBIND11_USE_SMART_HOLDER_AS_DEFAULT and it works great! But I have no problem dropping that for classic mode, if it means getting it in master! |
I also use But either way, I would like to also see smart_holder be merged to master, so I don't care how it gets there. I use my own fork of pybind11, so I can tweak my generator code as necessary. |
Yes.
Thanks! |
Currently, creating CTCDecoder object by passing a language model to `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11, ( pybind/pybind11#4013 and pybind/pybind11#2839 ) this is due to Python object garbage-collected by the time it's used by code implemented in C++. It attempts to call methods defined in Python, which overrides the base pure virtual function, but the object which provides this override gets deleted by garbage collrector, as the original object is not reference counted. This commit fixes this by simply assiging the given `lm` object as an attribute of CTCDecoder class. Address #3218
Currently, creating CTCDecoder object by passing a language model to `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11, ( pybind/pybind11#4013 and pybind/pybind11#2839 ) this is due to Python object garbage-collected by the time it's used by code implemented in C++. It attempts to call methods defined in Python, which overrides the base pure virtual function, but the object which provides this override gets deleted by garbage collrector, as the original object is not reference counted. This commit fixes this by simply assiging the given `lm` object as an attribute of CTCDecoder class. Address #3218
Summary: Currently, creating CTCDecoder object by passing a language model to `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11, ( pybind/pybind11#4013 and pybind/pybind11#2839 ) this is due to Python object garbage-collected by the time it's used by code implemented in C++. It attempts to call methods defined in Python, which overrides the base pure virtual function, but the object which provides this override gets deleted by garbage collrector, as the original object is not reference counted. This commit fixes this by simply assiging the given `lm` object as an attribute of CTCDecoder class. Address #3218 Pull Request resolved: #3230 Reviewed By: hwangjeff Differential Revision: D44642989 Pulled By: mthrok fbshipit-source-id: a90af828c7c576bc0eb505164327365ebaadc471
Summary: Currently, creating CTCDecoder object by passing a language model to `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11, ( pybind/pybind11#4013 and pybind/pybind11#2839 ) this is due to Python object garbage-collected by the time it's used by code implemented in C++. It attempts to call methods defined in Python, which overrides the base pure virtual function, but the object which provides this override gets deleted by garbage collrector, as the original object is not reference counted. This commit fixes this by simply assiging the given `lm` object as an attribute of CTCDecoder class. Address #3218 Pull Request resolved: #3230 Reviewed By: hwangjeff Differential Revision: D44642989 Pulled By: mthrok fbshipit-source-id: a90af828c7c576bc0eb505164327365ebaadc471
) Summary: Currently, creating CTCDecoder object by passing a language model to `lm` argument without assigning it to a variable elsewhere causes `RuntimeError: Tried to call pure virtual function "LM::start"`. According to discussions on PyBind11, ( pybind/pybind11#4013 and pybind/pybind11#2839 ) this is due to Python object garbage-collected by the time it's used by code implemented in C++. It attempts to call methods defined in Python, which overrides the base pure virtual function, but the object which provides this override gets deleted by garbage collrector, as the original object is not reference counted. This commit fixes this by simply assiging the given `lm` object as an attribute of CTCDecoder class. Address #3218 Pull Request resolved: #3230 Reviewed By: hwangjeff Differential Revision: D44642989 Pulled By: mthrok fbshipit-source-id: a90af828c7c576bc0eb505164327365ebaadc471
Regarding the merging With the latest version of both branches, I am observing a 1.31x compilation time increase and a 1.48x binary size increase in a release build of a class binding benchmark (this compares |
Thanks for sharing those results. The measurements are larger than I would have guessed, especially the binary size increase is really surprising. For my particular situation:
People have a choice, that's good. |
Patching pybind11 to fix premature deletion of python objects created and wrapped in c++. This patch is based off an open PR for pybind11. pybind/pybind11#2839
Patching pybind11 to fix premature deletion of python objects created and wrapped in c++. This patch is based off an open PR for pybind11. pybind/pybind11#2839
@rwgk do you have an update regarding the potential merging of Thanks! |
Sorry not sure still. — The project I wrote the smart_holder for is at just a little over 1 failure in 100,000 tests, but I still cannot know for sure when we're all the way through, and that's the highest priority. I'll continue merging master into smart_holder with just a few days delay at most. If someone wants to help that would be great:
|
Patching pybind11 to fix premature deletion of python objects created and wrapped in c++. This patch is based off an open PR for pybind11. pybind/pybind11#2839
Patching pybind11 to fix premature deletion of python objects created and wrapped in c++. This patch is based off an open PR for pybind11. pybind/pybind11#2839
I have also run into this problem. Is there any ETA yet for merging in the |
See #5339. I requested feedback five days ago (2024-08-26) but haven't gotten any. I want to encourage everyone to chime in. |
Description
Modifies the base type caster to ensure that python instances don't die when passed to C++.
The shared_ptr is always given to C++ code, so we construct a new shared_ptr that is given a custom deleter. The custom deleter increments the python reference count to bind the python instance lifetime with the lifetime of the shared_ptr. This behavior is only applied to instances that have an associated trampoline class
This enables things like passing the last python reference of a subclass to a C++ function without the python reference dying -- which means custom virtual function overrides will still work.
Reference cycles will cause a leak, but this is a limitation of shared_ptr
Thanks to @YannickJadoul for the inspiration putting me on this path. Fixes #1333 and #1120, #1145, #1389, #1552, #1546, #1566, #1774, #1902, #1937
Other inspiration from Boost::Python. Quoting from @rwgk at #2646:
Downsides:
py::wraper<>
trampoline shim with GC garbage collection (to prevent inheritance slicing) #2757 discusses this topic a bitPotential improvements:
Right now this does this for all shared_ptr, might make sense to only do this for things that are python instances of python classes?Now only does this for things that have trampolinesSuggested changelog entry: