-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
complete dummy in lock-free - but this looks helpful? #18
Comments
and here is bounty for anyone who succesfully merges anything lock-free: |
this looks even more promising set of lock-free data structures (not sure if C89): |
liblfds: First reaction: Windows is a pretty important platform for us, even Win32. But if it's a great library maybe we can contribute to it if it needs help. (They aren't sure about their Win32 support.) Second reaction: we can't use a linked-list implementation for lists, and if "Hash (add-only)" means what I think it means we can't use that either. And the other data structures are of secondary importance. Maybe they'll be interesting later down the road, but not right now. concurrencykit: they seem to enforce requirements that are incompatible with us. e.g. on their hash table, they key is a region of memory up to 64k long. it sounds like they own it. Maybe we can learn things by reading their source code / algorithms / approaches, but I don't think we can use either of these libraries. Since I want to try and keep the issues area clean, I'm closing this. I think it's fine to bring these things up, but maybe we can use the wiki for that? |
Wiki sounds good! Thanks for quick analysis, which makes sense to me. On Saturday, June 4, 2016, larryhastings [email protected] wrote:
|
@larryhastings I'm more worried about the needs to initialize structs on each CPU core separately. All in all, this does not sound worth integrating to me, either. |
related #29 |
Hi. I am the author of liblfds. I'm puzzled by a sentence in one of the earlier comments - "we can't use a linked-list implementation for lists". I don't understand - on the face of it, that seems to make no sense? "we can't use a linked list for lists"? Regarding Win32, the platform will be supported on an ongoing basis, in part because of existing users, in part because of future possible users, and in part because of the very high value of the platform in providing a very different platform upon which to compile (and so to find bugs and mistakes). What is in question however is the provision of Visual Studio solution files. They are problematic in terms of workload and so challenging to support. Finally, regarding a concern from JustAMan (clearly a Faith No More fan) - "I'm more worried about the needs to initialize structs on each CPU core separately". This is as far as I know absolutely and unequivocably unavoidable. Each thread MUST issue a load memory barrier so that it is guaranteed to see the initialization work performed by the initializing thread on the data structure state. The same work is being done in fact when normal locks are used (for it is equally as necessary) - for each lock, when it is taken, issues a load barrier. However, as with all things, there may be steps that can be taken to address or ameilorate the problem. In general, there is a very strong need for a O(log n) or O(1) data structure in the library which supports full add and delete - there are so many uses cases which require this and cannot be met without it. |
What Python names a "list" is actually an "array" as it occupies a continuous chunk of memory and offers Thanks for explaining that macro, I was misunderstanding it. It makes full sense now. As for Win32 support - are we talking about the same thing? I think @larryhastings was worried about 32-bit version, as your site states that 64-bit version is tested while 32-bit "should work". P.S. My nickname was invited by myself, and first time I heard about "Faith No More" was from you today :) So it's not related to that. |
Howdy liblfds. I appreciate your willingness to help. But AFAICT the data structures provided by your library aren't applicable to CPython + gilectomy right now. JustAMan's summary about the list is correct. Python's term for the data structure is "list", so naturally that's what I call it, but it really is an array. |
JustAMan wrote:
Ah, Python lists - all is clear. But, as I recall, Python lists can grow. Is the list reallocated each time, or is there a higher level data structure holding each of the allocations which form the list?
Right - exactly. One of the problems of being too close to your own code for too long - always in documentation you try to explain things to the outsider, but you can progressively lose the ability to be in the mind that outsider.
Ah - well spotted. I was indeed thinking Larry was concerned about Windows support, as opposed to 32 bit support. For all platforms, including Windows, both 32 and 64 bit platforms will be supported on an ongoing basis - the problem however is that I currently lack a 32 bit Windows platform to compile upon :-/ I have a Windows XP installer disk, but it's in storage, in a distant country!
Good. Solution files for me are just... OMG death. Something like ten thousand settings, each of which has to be set by hand using a mouse and a GUI.
nmake is I also believe natively provided. I'll have a look.
grin https://www.youtube.com/watch?v=7g4L47kEcS0 Comment - "this song is on my funeral playlist" =-) |
Howdy, Larry.
Library development is ongoing. What is it that you would find useful? |
Python lists are effectively realloc'd when they run out of space. Same with hash tables. I don't know of anything specifically that we need WRT concurrent-friendly data structures. So far we're managing with fine-grained locking. We have much bigger performance worries than the data structures right now. |
What happens if the user attempts to insert an element between two existing elements? the right hand-side of the array is copied up one element, to make room? I always had the impression Python lists were really something much more under the hood - a tree or a hash - and the list-like interface was something presented to the user. Have you come across counted trees? (maybe you already use them?) the key for an element is its position in the tree, and inserting a new element automatically increases the position of following elements without having to visit them. So you would imagine a tree with say 100 elements in, 0 to 99, and then the user inserts a new element at location 50. It's O(log n) to insert (the tree is balanced), and that gives you elements now 0 to 100, exactly as you want them, with the new element at 50 and all the elements after automatically bumped up by one (i.e. at zero cost).
The hash tables are array backed, rather than backed by a dynamic data structure, such that they can run out of space?
It'd be hard to write something which would make me more curious :-) what kind of problems are you facing? |
Indeed. Python list is realloc'ed if it doesn't have enough room (they try to reallocate it in, I think, exponential way - like make it twice bigger each time or something like that). So if you try to insert something in the middle it's about copying right part one place righter and then placing an entry to the vacant space.
Yes and no. They're backed with an array, but when this array gets around 2/3 filled they're realloc'd to get more room (and stuff is copied over to make access faster), so in a way they cannot run out of space. Here's a link with good enough explanation how Python hash tables (dictionaries, in Python language) work: http://www.laurentluce.com/posts/python-dictionary-implementation/ |
Thankyou for the link. A considered design, chosen to maximize cache locality and optimized for consequecutive access cases. The realloc is a major roadbump when it happens, but given the array only increases in size, the initial realloc and copies should usually occur early and so the system will come to a stable state and then perform. The downside is resource usage will always be at the high-water mark, until the hash is destroyed. In a multi-threaded scenario, how is the hash protected? Larry mentioned fine-grained locking? I'm not an expert in locking-based solutions, but in the straightforward solutions (one lock per data structure) the scaling factor is negative, i.e. you go fastest with a single core, no matter how many cores you have (although here I think of data structure accesses only - i.e. n threads simply hammering the data structures; in practise, in real code, where the threads usually have a bunch of single-threaded work to do but then need a data structure access or two, actual overall system throughput rises as the core count rises, until the shared data structure accesses become so slow, from contention, that their cost is greater than the single-threaded work being performed). |
I've been reading this; https://lwn.net/SubscriberLink/689548/4328423f85a47679/ I'm not really famiilar with conventional garbage collection, or how it is used in practise, so I am most likely completely wrong, but I have an implementation (not yet published - next release probably, I have debugging and testing to finish) of a type of lock-free garbage collection, and although it's quite limited, quite specialized, there may be a very small chance it might be of some relevance. Its advantage is that there is no need to stop-the-world, which I believe is usually what has to happen with garbage collection. |
The problem with
In default Python implementation everything is protected by GIL (one global lock per process). In this gilectomy thing dictionaries are so far protected with a per-structure recursive lock, and there's some specifics about locking when one considers R/W locks. I've tried to briefly outline those at #30, I hope I didn't miss anything important there. |
When you insert in the middle, the elements after are memcpy'd up one slot to make room. They're grown by 1.5x. Dicts are stored in a hash table, which is an array. It used to be simpler, but now the keys can be (always are?) stored separately for the "key-sharing dicts" feature merged in 3.2. In the future it would be more straightforward if you'd answer your own questions about Python's builtin types. They're all implemented in C in this directory: https://hg.python.org/cpython/file/tip/Objects Dicts (hash tables) are in dictobject, lists (arrays) are in listobject. |
http://www.liblfds.org/
portable, license-free, lock-free data structure library written in C89
The text was updated successfully, but these errors were encountered: