Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: document that Backends are not thread safe, and provide multi-thread recipes/mechanics #10346

Open
1 task done
hottwaj opened this issue Oct 21, 2024 · 3 comments
Open
1 task done
Labels
feature Features or general enhancements

Comments

@hottwaj
Copy link

hottwaj commented Oct 21, 2024

Is your feature request related to a problem?

ibis Backend instances are not thread safe (which makes sense as DBs generally have a concept of "connections" which can't be shared by multiple threads simultaneously).

Current docs don't explain this so its easy to cause a hard crash if you're experimenting, and an API to workaround it would be very helpful

See unit test below which uses memtable in multiple threads. Running this code either causes a segfault or deadlock.

import ibis
import concurrent.futures
import pandas

class TestMultiThreadedMemtable(TestCase):
    def test_multi_threaded_memtable(self):
        """
        Test that ibis.memtable can be called from multiple threads without issue
        """
        def make_table():
            return ibis.memtable(pandas.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                                                   'age': [1, 2, 3]})).to_pandas()
        
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = [executor.submit(make_table) for _ in range(100)]
            tables = [f.result() for f in futures]
            
        for table in tables:
            pandas.testing.assert_frame_equal(table, make_table(),
                                              check_like=True)

What is the motivation behind your request?

No response

Describe the solution you'd like

While its not possible to share connections between threads, it would be great if:

  1. docs pointed out that Backends are not thread safe
  2. ibis provided some API to make it easy to build multiple Backends for use in a multi-threaded environment. this could be i) an API for a connection pool/copying an existing connection or ii) some kind of thread-safe wrapper around a Backend that uses a different underlying copy of a Backend (i.e. different connection) for each thread

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

duckdb

Code of Conduct

  • I agree to follow this project's Code of Conduct
@hottwaj hottwaj added the feature Features or general enhancements label Oct 21, 2024
@hottwaj
Copy link
Author

hottwaj commented Oct 21, 2024

Also ibis.memtable has no arg to allow users to specify the connection to be used, which means it is can't really be used in a thread safe way at the moment

It either needs a "backend/connection" arg added or all table creation needs to be done using con.register or con.read_xxx type methods

@cpcloud
Copy link
Member

cpcloud commented Oct 21, 2024

You can set the backend that will be used to execute memtable expressions by calling ibis.set_backend(con).

Also, you can execute memtables against any existing connection, like any other expression, using con.execute(my_memtable).

It's very unlikely we'll have capacity to add thread-safety of any kind to Ibis in the foreseeable future.

As you allude to, connection state thread-safety is not something Ibis generally has control over and bolting on thread-safety after the fact has a number of disadvantages.

@hottwaj
Copy link
Author

hottwaj commented Oct 21, 2024

Thanks I understand & that makes sense, but I think some docs and maybe a simple example are needed to clarify?

Using con.execute(memtable) seems a workable way to build more complex expressions on a memtable and ultimately execute those on different connections in a thread-safe way - its basically somewhat similar to the "unbound table" approach. Thanks for pointing it out, I wouldn't have thought of it

ibis.set_backend probably sets global state though so is also not thread safe and I don't think can help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Status: backlog
Development

No branches or pull requests

2 participants