Add shortcut dict::get and optimize item access for dict #2779
base: master
First reaction: we already have Second reaction: don't use that raw C API, as we already have that Is this really the case? Is Discovery: So, should we 1) extend this PR to also have an efficient |
Note, |
Just to be clear, given the existing interfaces on other |
A
It should be, although I'm not absolutely certain... Yes I think
Personally, yes for 1) and no for 2), because
Will address this shortly. |
Huh? What do you mean? I'm talking about implementing |
Sorry I misunderstood your suggestion... I just pushed a prototype that specializes |
Argh, apologies, @lqf96. I thought this would be a lot more straightforward, after I looked how small Let me give you another quick review. |
Nice, thanks for making the extra effort and adding operator []! Almost there, I think :-)
@YannickJadoul Pushed an update with your suggestions... I also have one question on whether I should |
I don't think so. (C)Python increases the reference count in |
Ok i'll then remove it... |
detail::dict_iterator begin() const { return {*this, 0}; }
detail::dict_iterator end() const { return {}; }
void clear() const { PyDict_Clear(ptr()); }
template <typename T> bool contains(T &&key) const {
    return PyDict_Contains(m_ptr, detail::object_or_cast(std::forward<T>(key)).ptr()) == 1;
}

object get(handle key, handle default_ = none()) const {
Other reviewers: should this be moved out of the class definition? I think it's on the edge of being too long, but good enough and maybe nicer to keep it just as-is than to split it up into two parts.
// NULL without an exception means the key wasn’t present
if (!PyErr_Occurred())
    // Synthesize a KeyError with the key
    PyErr_SetObject(PyExc_KeyError, key.ptr());
Is this the right error to set? Should we talk about __hash__ or __eq__?
Should we even care about throwing from these two? In C++, a throw from std::hash<T> is just plain insane. What's the state of std::unordered_map after a throw from std::hash or std::equal?
As noted, there are two kinds of errors:
- NULL returned but not PyErr_Occurred(): key is not present (we have to set an error ourselves).
- NULL returned and PyErr_Occurred(): exception occurred during __hash__ or __eq__ (and the error is already set).
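For comparison, here is a minimal pure-Python sketch of the same two failure cases when subscripting a dict. It is not code from this PR; BadKey and lookup are hypothetical names introduced only for illustration.

```python
class BadKey:
    """Hypothetical key whose __hash__ always fails (illustration only)."""

    def __hash__(self):
        raise RuntimeError("hash failed")


def lookup(mapping, key):
    """Classify the outcome of mapping[key]."""
    try:
        return ("found", mapping[key])
    except KeyError as exc:
        # Case 1: key absent; the KeyError carries the key itself.
        return ("missing", exc.args[0])
    except RuntimeError as exc:
        # Case 2: the exception raised inside __hash__ propagates unchanged.
        return ("hash error", str(exc))


d = {"a": 1}
results = [lookup(d, "a"), lookup(d, "b"), lookup(d, BadKey())]
```

In Python the interpreter raises the KeyError itself; in the C++ snippet under review the wrapper has to synthesize it because the C API call only returns NULL for a missing key.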
Right. My bad. Though the second part is important too.
Should we even care about throwing from these two? In C++, a throw from std::hash is just plain insane. What's the state of std::unordered_map after a throw from std::hash or std::equal?
The problem is not in inserting a single element and throwing. The problem is rehashing the dictionary and throwing half-way during the operation. At that point, basically anything goes. Thus, should we care about
NULL returned but not PyErr_Occurred(): key is not present
Right, I kind of missed that. But anyway, yes, Python throws an exception, so we need to propagate this as a C++ exception. We're not going to change how Python deals with this? I think the current implementation follows accessing dicts in pure Python.
// NULL with an exception means exception occurred when calling
// "__hash__" or "__eq__" on the key
if (PyErr_Occurred())
    throw error_already_set();
Throw or return default?
See #2779 (comment)
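On the "throw or return default" question, a short pure-Python sketch of what dict.get itself does may help. It is only an illustration of Python's behaviour, not of the C++ code in this PR; EqFails is a hypothetical class made up for the example.

```python
class EqFails:
    """Hypothetical key: constant hash, __eq__ always raises."""

    def __hash__(self):
        return 1

    def __eq__(self, other):
        raise ValueError("eq failed")


d = {"a": 1}

# Missing key: the default is returned (None when no default is given).
missing_default = d.get("b", 0)
missing_none = d.get("b")

# Unhashable key: hashing raises TypeError, and the default is not used.
try:
    d.get([], 0)
    hash_outcome = "default"
except TypeError:
    hash_outcome = "raised"

# Same hash, different object: the lookup calls __eq__, which raises.
collide = {EqFails(): 1}
try:
    collide.get(EqFails(), 0)
    eq_outcome = "default"
except ValueError:
    eq_outcome = "raised"
```

So the default only covers the "key not present" case; errors from __hash__ or __eq__ still surface as exceptions, which is consistent with throwing error_already_set in the C++ version.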
I'm NOT opposed to this PR, but optimizing in this situation makes no sense to me. Is this really worth adding code? If runtime performance matters: use a C++ type (e.g. std::unordered_map), push the operations on the type to C++ in its entirety, wrap that. — That's exactly why pybind11 is so important: it's super easy to wrap another truly optimized function. If you are dealing with Python objects: you implicitly admit that performance doesn't really matter. Optimizations are expensive in terms of human effort and added code complexity (also human effort, really, for maintenance), with probably insignificant benefit in the big picture. |
OK, sounds good, I support this PR.
On Mon, Sep 13, 2021 at 9:17 AM Aaron Gokaslan wrote:
@rwgk After looking through the codebase, I think it's definitely worth optimizing since we often call the Python attributes throughout the C++ code which is backed by a py::dict. I think this functionality is worth including and optimizing and this is not the first time this PR has been requested.
Two quick high-level observations:
|
@rwgk I haven't been looking at this PR for a long time... Will try to follow up in a few days. |
@lqf96 Yeah, apologies for that. We are cleaning up old PRs. I am fine with the current complexity of this PR; in the worst case we can always reduce it to a shim that calls "attr()".
Description
Add shortcut method dict::get that mirrors dict.get in Python.
dict::get has four versions: two of them take py::handle as key and call PyDict_GetItem, while two others take const char* as key and call PyDict_GetItemString. Like dict.get, if the default value is not provided it will be None.
Also optimize item access for dict by specializing dict::operator [] that returns a dict_accessor.
Suggested changelog entry: