Although Python's str type is immutable from Python space, it is a C-mutable object, i.e. it can be modified from C code. PyUnicodeObject has several writeable members. In fact, the payload of a PyUnicodeObject is itself writeable from C code while Py_REFCNT(o) == 1. @larryhastings and I agree that a per-object lock for str is too costly. Instead, we would like to go with an optimistic global unicode lock (GUL).
Disclaimer: I don't fully understand the details of the current implementation and PEP 393.
Py_hash_t hash
The hash member caches the value of hash('somestring'). It is computed only on demand. Since caching the hash doesn't allocate any memory, no locking is required. At worst, two threads compute the same hash value and overwrite each other's identical result.
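Here is a minimal C sketch of this benign race, under some stated assumptions:

- demo_str, demo_hash and demo_compute_hash are hypothetical names.
- The hash function is FNV-1a, chosen for brevity. CPython uses SipHash.
- -1 marks an uncomputed hash, which is the convention CPython's str uses.
- The cached field is declared _Atomic only so that the unlocked read and write are well-defined in C11.

Whichever thread stores last writes the same value, and no buffer is allocated, so nothing can leak.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a str object with a lazily cached hash.
   -1 means "not computed yet". */
typedef struct {
    const char *data;
    size_t length;
    _Atomic int64_t hash;   /* assumption: atomic so the unlocked race is well-defined */
} demo_str;

/* Deterministic hash (FNV-1a, illustration only). Masking the top bit keeps
   the result non-negative, so it can never collide with the -1 sentinel. */
static int64_t demo_compute_hash(const char *s, size_t n) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return (int64_t)(h & INT64_MAX);
}

static int64_t demo_hash(demo_str *o) {
    int64_t h = atomic_load_explicit(&o->hash, memory_order_relaxed);
    if (h != -1)
        return h;                       /* already cached */
    h = demo_compute_hash(o->data, o->length);
    /* No lock: a racing thread computes the identical value, so whichever
       store wins, the cached result is the same and nothing was allocated. */
    atomic_store_explicit(&o->hash, h, memory_order_relaxed);
    return h;
}
```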
writeable data members
wchar_t *wstr (PyASCIIObject)
char *utf8 (PyCompactUnicodeObject)
PyUnicodeObject.data
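To make the nesting of these members concrete, here is a heavily simplified C mirror of the PEP 393 layout, as it was in CPython 3.3 through 3.11. The demo_* struct names are hypothetical, and most real fields are omitted.

The point is that each pointer member owns its own heap buffer. Whoever replaces one of these pointers must therefore free the old buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Simplified, hypothetical mirror of the PEP 393 structs; only the
   members discussed above are kept. */
typedef struct {
    size_t length;
    wchar_t *wstr;          /* lazily allocated wchar_t copy (PyASCIIObject) */
} demo_ascii;

typedef struct {
    demo_ascii base;
    size_t utf8_length;
    char *utf8;             /* lazily allocated UTF-8 copy (PyCompactUnicodeObject) */
} demo_compact;

typedef struct {
    demo_compact base;
    void *data;             /* separately allocated payload (PyUnicodeObject.data) */
} demo_unicode;

/* Each non-NULL pointer owns a distinct heap buffer, so each is freed
   on its own and reset to NULL. */
static void demo_unicode_clear(demo_unicode *o) {
    free(o->base.base.wstr);
    o->base.base.wstr = NULL;
    free(o->base.utf8);
    o->base.utf8 = NULL;
    free(o->data);
    o->data = NULL;
}
```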
Write access to any C-mutable member that involves memory allocation must be synchronized by the GUL. Otherwise two threads may set the same pointer, which results in a memory leak of one of the allocated buffers. My gut feeling tells me that conflicts are rare, so optimistic locking should perform better here.
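The leak can be replayed deterministically in a single thread. In the sketch below, all names are hypothetical, and the allocation counter exists purely for illustration. The sketch performs the bad interleaving step by step:

1. Both "threads" see utf8 == NULL.
2. Both allocate a buffer.
3. The second store silently drops the first buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts buffers that are currently allocated, so the leak is observable. */
static int race_live_buffers = 0;

static char *race_alloc_utf8(const char *src) {
    size_t n = strlen(src) + 1;
    char *buf = malloc(n);
    if (buf != NULL) {
        memcpy(buf, src, n);
        race_live_buffers++;
    }
    return buf;
}

static void race_free_utf8(char *buf) {
    if (buf != NULL) {
        free(buf);
        race_live_buffers--;
    }
}

/* Replays one bad interleaving of two threads filling the same utf8 member
   without the GUL. `lost` receives the overwritten buffer only so this demo
   can clean up; a real object would keep no reference to it. */
static char *race_unsynchronized(const char *src, char **lost) {
    char *utf8 = NULL;                     /* the shared member */
    char *from_a = NULL, *from_b = NULL;

    if (utf8 == NULL)                      /* thread A: member not set */
        from_a = race_alloc_utf8(src);     /* thread A: compute value */
    if (utf8 == NULL)                      /* thread B: A has not stored yet */
        from_b = race_alloc_utf8(src);     /* thread B: compute value */

    utf8 = from_a;                         /* thread A: store */
    *lost = utf8;                          /* demo bookkeeping only */
    utf8 = from_b;                         /* thread B: overwrite, A's buffer leaks */
    return utf8;
}
```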
Check if the utf8 member is already set.
If the utf8 member is not set, compute the UTF-8 value.
Acquire the GUL.
Check again whether another thread has set the utf8 member in the meantime.
If the utf8 member is still NULL, set the member.
If the utf8 member has been set by another thread, discard and free the UTF-8 value.
Release the GUL.
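The steps above map onto a double-checked locking pattern. Here is a minimal sketch, under these assumptions:

- A single pthread mutex stands in for the GUL.
- The utf8 pointer is a C11 atomic, so the unlocked first check is not a data race. CPython's real member is a plain char *.
- gul_str, gul_as_utf8 and gul_encode_utf8 are hypothetical names, and the "encoder" just copies ASCII input.

There is one deliberate deviation from the steps: the losing thread frees its copy after releasing the GUL rather than before.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global unicode lock, shared by all str objects. */
static pthread_mutex_t demo_gul = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    const char *data;        /* stand-in for the canonical representation */
    _Atomic(char *) utf8;    /* lazily created UTF-8 copy, NULL until set */
} gul_str;

/* Stand-in for the UTF-8 encoder: the demo data is ASCII, so it just copies. */
static char *gul_encode_utf8(const char *src) {
    size_t n = strlen(src) + 1;
    char *buf = malloc(n);
    if (buf != NULL)
        memcpy(buf, src, n);
    return buf;
}

static const char *gul_as_utf8(gul_str *o) {
    /* Fast path: the member is already set, no lock needed. */
    char *cached = atomic_load_explicit(&o->utf8, memory_order_acquire);
    if (cached != NULL)
        return cached;

    /* Compute the UTF-8 value outside the lock (the expensive part). */
    char *fresh = gul_encode_utf8(o->data);
    if (fresh == NULL)
        return NULL;

    pthread_mutex_lock(&demo_gul);
    /* Check again: another thread may have set the member meanwhile. */
    cached = atomic_load_explicit(&o->utf8, memory_order_relaxed);
    if (cached == NULL) {
        /* Still NULL: publish our buffer and keep ownership in the object. */
        atomic_store_explicit(&o->utf8, fresh, memory_order_release);
        cached = fresh;
        fresh = NULL;
    }
    pthread_mutex_unlock(&demo_gul);

    /* If another thread won, discard our copy (free(NULL) is a no-op). */
    free(fresh);
    return cached;
}
```

Because the encoding runs outside the lock, the GUL is held only for a pointer comparison and a store. This is why an optimistic scheme should stay cheap when conflicts are rare.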
special casing of Py_REFCNT() == 1
Python's str uses a special case to optimize string concatenation, and _PyUnicodeWriter uses it as well. As far as I can figure out, _PyUnicodeWriter requires the special case to work. I'm not yet sure how to handle it. I have been considering a new flag, constructable, which would be set if and only if a PyUnicodeObject is used exclusively in C API calls within a single thread. struct state has 24 unused bits left.
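Here is a sketch of how the proposed constructable flag could fit into the spare bits. The bitfield mirrors the PEP 393 state struct: interned, kind, compact, ascii, ready, then 24 unused bits.

The following are hypothetical:

- the constructable bit;
- demo_obj;
- both helper functions.

refcnt stands in for Py_REFCNT(o). In-place mutation is allowed only while the object has a single reference and has not escaped its creating thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified copy of the PEP 393 `state` bitfield, with one of the 24
   unused bits turned into the hypothetical `constructable` flag. */
typedef struct {
    unsigned int interned:2;
    unsigned int kind:3;
    unsigned int compact:1;
    unsigned int ascii:1;
    unsigned int ready:1;
    unsigned int constructable:1;   /* hypothetical: object is private to one thread's C API calls */
    unsigned int :23;               /* remaining unused bits */
} demo_state;

typedef struct {
    size_t refcnt;                  /* stand-in for Py_REFCNT(o) */
    demo_state state;
} demo_obj;

/* In-place mutation (e.g. resizing during concatenation) is only allowed
   when the object is unshared AND has not escaped its creating thread. */
static bool demo_may_mutate_in_place(const demo_obj *o) {
    return o->refcnt == 1 && o->state.constructable;
}

/* Hypothetical hook: call when the object is handed to Python code or to
   another thread; afterwards it must be treated as immutable. */
static void demo_publish(demo_obj *o) {
    o->state.constructable = 0;
}
```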
No, it's not easily possible. AFAIK PyPy does not implement the trick where a single-reference PyUnicodeObject is mutable as long as it has not escaped into Python space. CPython's PyUnicodeObject is mutable in more ways. For instance, each PyUnicodeObject can hold multiple optional representations of its data, e.g. an additional UTF-8 representation. This case is explained in the section "writeable data members". We can't get rid of the additional members without a major rewrite, API breakage and a performance decrease.
PEP 393 specification: https://www.python.org/dev/peps/pep-0393/#specification
WIP branch
I started a branch but gave up after a couple of hours: https://github.com/tiran/gilectomy/tree/gul