Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shortcut dict::get and optimize item access for dict #2779

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

lqf96
Copy link
Contributor

@lqf96 lqf96 commented Jan 9, 2021

Description

Add shortcut method dict::get that mirrors dict.get in Python. dict::get has four versions: two of them take py::handle as key and call PyDict_GetItem, while two others take const char* as key and call PyDict_GetItemString. Like dict.get if the default value is not provided it will be None. Also optimize item access for dict by specializing dict::operator [] that returns a dict_accessor.

Suggested changelog entry:

* Add shortcut method `dict::get` that mirrors `dict.get` in Python
* Optimize item access for `dict` with `dict_accessor` and specialized `dict::operator []`

@YannickJadoul
Copy link
Collaborator

First reaction: we already have item_accessor object_api<T>::operator [](handle key) const and item_accessor object_api<T>::operator [](const char *key) const. But yes, this doesn't contain get, so OK, maybe.

Second reaction: don't use that raw C API, as we already have that operator [] for all py::objects. But ah, that's getattr, so probably (a lot?) less efficient than PyDict_GetItem. So, right. But why don't we have a fast operator [] for dicts, then?

Is this really the case? Is PyDict_GetItem a lot faster than PyObject_GetAttribute?

Discovery: list and tuple have this detail::list_accessor and detail::tuple_accessor, which "specialize" operator [] for list and tuple.

So, should we 1) extend this PR to also have an efficient operator [], and 2) reuse the accessor functionality as it has been used for other classes?

@YannickJadoul
Copy link
Collaborator

Note, PyDict_GetItemString is a shallow wrapper (the string is converted to a PyObject anyway), and can probably just be implemented similarly to the item_accessor object_api<D>::operator[](const char *key) overload.

@YannickJadoul
Copy link
Collaborator

Just to be clear, given the existing interfaces on other py::object subtypes, I think this PR is a good idea, but I'd like to make it slightly more complete/consistent with the other classes :-)

@lqf96
Copy link
Contributor Author

lqf96 commented Jan 10, 2021

First reaction: we already have item_accessor object_api<T>::operator [](handle key) const and item_accessor object_api<T>::operator [](const char *key) const. But yes, this doesn't contain get, so OK, maybe.

A get function is necessary if a fallback value is desired when the key is not found. Yes you can do it by calling operator [] and catching KeyError but that seems awkward.

Second reaction: don't use that raw C API, as we already have that operator [] for all py::objects. But ah, that's getattr, so probably (a lot?) less efficient than PyDict_GetItem. So, right. But why don't we have a fast operator [] for dicts, then?

Is this really the case? Is PyDict_GetItem a lot faster than PyObject_GetAttribute?

Discovery: list and tuple have this detail::list_accessor and detail::tuple_accessor, which "specialize" operator [] for list and tuple.

It should be, although I'm not absolutely certain... Yes I think operator [] should be specialized for dict so that they enjoy the same benefits of get.

So, should we 1) extend this PR to also have an efficient operator [], and 2) reuse the accessor functionality as it has been used for other classes?

Personally, yes for 1) and no for 2), because get is just about getting some from the dictionary. The set side should be instead handled by setdefault (or PyDict_SetDefault). If you want I can also address it in this PR.

Note, PyDict_GetItemString is a shallow wrapper (the string is converted to a PyObject anyway), and can probably just be implemented similarly to the item_accessor object_api<D>::operator[](const char *key) overload.

Will address this shortly.

@YannickJadoul
Copy link
Collaborator

A get function is necessary if a fallback value is desired when the key is not found. Yes you can do it by calling operator [] and catching KeyError but that seems awkward.

operator [] does not throw a key error until trying to access it. Have a look at the accessor<Policy> template. But operator bool seems to unfortunately be deprecated.

2), because get is just about getting some from the dictionary.

Huh? What do you mean? I'm talking about implementing dict::operator [] in the same fashion as e.g. list::operator []. Why would that not be a good idea?

@lqf96
Copy link
Contributor Author

lqf96 commented Jan 11, 2021

Huh? What do you mean? I'm talking about implementing dict::operator [] in the same fashion as e.g. list::operator []. Why would that not be a good idea?

Sorry I misunderstood your suggestion... I just pushed a prototype that specializes dict::operator [] for faster item access. Due to some problems around exception handling I have to use PyDict_GetItemWithError for Python 3 and fall back to generic_item::get for Python 2. I also found out I can't reuse the implementation of dict::operator [] for get again because of exception handling.

@lqf96 lqf96 changed the title Add dict::get that mirrors dict.get in Python Add shortcut dict::get and optimize item access for dict Jan 11, 2021
@YannickJadoul
Copy link
Collaborator

Argh, apologies, @lqf96. I thought this would be a lot more straightforward, after I looked how small list::operator [] is :-(
Let's see. If this becomes to big, we revert to your original PR, but now that you've done the work, it would be nice to get this in :-)

Let me give you another quick review.

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for doing the extra effort and adding operator []! Almost there, I think :-)

include/pybind11/pytypes.h Outdated Show resolved Hide resolved
include/pybind11/pytypes.h Show resolved Hide resolved
include/pybind11/pytypes.h Outdated Show resolved Hide resolved
include/pybind11/pytypes.h Show resolved Hide resolved
include/pybind11/pytypes.h Outdated Show resolved Hide resolved
include/pybind11/pytypes.h Show resolved Hide resolved
include/pybind11/pytypes.h Show resolved Hide resolved
@lqf96
Copy link
Contributor Author

lqf96 commented Jan 11, 2021

@YannickJadoul Pushed an update with your suggestions... I also have one question on whether I should inc_ref the key before calling PyErr_SetObject...

@YannickJadoul
Copy link
Collaborator

I also have one question on whether I should inc_ref the key before calling PyErr_SetObject...

I don't think so. (C)Python increases the reference count in _PyErr_SetObject and other callers in that file decrease the refcount after calling _PyErr_SetObject, so we're good like this.

@lqf96
Copy link
Contributor Author

lqf96 commented Jan 11, 2021

I don't think so. (C)Python increases the reference count in _PyErr_SetObject and other callers in that file decrease the refcount after calling _PyErr_SetObject, so we're good like this.

Ok i'll then remove it...

Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @lqf96. Nothing more from me. I think this can perfectly go in with the next minor release, so I'll add it to that milestone!

The test failure is fine (at least if it's only 3.10-dev); see #2774.

detail::dict_iterator begin() const { return {*this, 0}; }
detail::dict_iterator end() const { return {}; }
void clear() const { PyDict_Clear(ptr()); }
template <typename T> bool contains(T &&key) const {
return PyDict_Contains(m_ptr, detail::object_or_cast(std::forward<T>(key)).ptr()) == 1;
}

object get(handle key, handle default_ = none()) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers: should this be moved out of the class definition? I think it's on the edge of being too long, but good enough and maybe nicer to keep it just as-is than to split it up into two parts.

@YannickJadoul YannickJadoul added this to the v2.7 milestone Jan 11, 2021
// NULL without an exception means the key wasn’t present
if (!PyErr_Occurred())
// Synthesize a KeyError with the key
PyErr_SetObject(PyExc_KeyError, key.ptr());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right error to set? Should we talk about __hash__ or __eq__?

Should we even care about throwing from these two? In C++, a throw from std::hash<T> is just plain insane. What's the state of std::unordered_map after a throw from std::hash or std::equal?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, there's two kinds of errors:

  1. NULL returned but not PyErr_Occurred(): key is not present (we have to set an error ourselves)
  2. NULL returned and PyErr_Occurred(): exception occurred during __hash__ or __eq__ (and the error is already set).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. My bad. Though the second part is important too.

Should we even care about throwing from these two? In C++, a throw from std::hash is just plain insane. What's the state of std::unordered_map after a throw from std::hash or std::equal?

The problem is not in inserting a single element and throwing. The problem is rehashing the dictionary and throwing half-way during the operation. At that point, basically anything goes. Thus, should we care about

  1. NULL returned but not PyErr_Occurred(): key is not present

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rgiht, I kind of missed that. But anyway, yes, Python throws an exception so we need to propagate this as C++ exception. We're not going to change how Python deals with this? I think the current implementation follows accessing dicts in pure Python.

// NULL with an exception means exception occurred when calling
// "__hash__" or "__eq__" on the key
if (PyErr_Occurred())
throw error_already_set();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw or return default?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwgk
Copy link
Collaborator

rwgk commented Jul 15, 2021

I'm NOT opposed to this PR, but optimizing in this situation makes no sense to me. Is this really worth adding code?

If runtime performance matters: use a C++ type (e.g. std::unordered_map), push the operations on the type to C++ in its entirety, wrap that. — That's exactly why pybind11 is so important: it's super easy to wrap another truly optimized function.

If you are dealing with Python objects: you implicitly admit that performance doesn't really matter. Optimizations are expensive in terms of human effort and added code complexity (also human effort, really, for maintenance), with probably insignificant benefit in the big picture.

@henryiii henryiii modified the milestones: v2.7, v2.8 Jul 16, 2021
@rwgk
Copy link
Collaborator

rwgk commented Sep 13, 2021 via email

@rwgk
Copy link
Collaborator

rwgk commented Sep 13, 2021

Two quick high-level observations:

  • clang-tidy failures need to be fixed.
  • I see two throw error_already_set(); in newly added code but those don't seem to be exercised in the added tests. Are they covered somewhere else already?

@lqf96
Copy link
Contributor Author

lqf96 commented Sep 13, 2021

@rwgk I haven't been looking at this PR for a long time... Will try to follow up in a few days.

@Skylion007
Copy link
Collaborator

@lqf96 Yeah. apologies for that. We are cleaning up old PRs. I am fine with the current complexity of this PR, we can always reduce it to a shim that calls "attr()" at worse case.

@henryiii henryiii modified the milestones: v2.8, v2.9 Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants