How to use celery pools=prefork with HashEncoder multiprocessing #407

aihiangng · 2024-07-30T06:13:04Z

I am trying to run Celery's prefork (multiprocessing) pool with HashEncoder from category_encoders, which does its own multiprocessing. However, when a Celery task calls HashEncoder's .transform(), I get "celery: daemonic processes are not allowed to have children". This happens because Celery's pool is built on billiard, while HashEncoder itself uses multiprocessing.

billiard and multiprocessing are different libraries: billiard is the Celery project's own fork of the standard-library multiprocessing module.
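
For anyone who wants to see the error in isolation, here is a minimal sketch that triggers the same message with the standard library alone, without Celery or category_encoders. It assumes a POSIX system where the "fork" start method is available, and the function names (`_noop`, `_try_to_start_child`, `reproduce`) are hypothetical, made up for this illustration. It only shows the check in multiprocessing that rejects children of a daemonic process; it is not a workaround.

```python
import multiprocessing as mp


def _noop():
    """Placeholder target for the grandchild process (never actually runs)."""


def _try_to_start_child(queue):
    """Runs inside a daemonic process and tries to start a child of its own."""
    try:
        child = mp.get_context("fork").Process(target=_noop)
        child.start()  # multiprocessing refuses this inside a daemonic process
        child.join()
        queue.put("child started")
    except AssertionError as exc:
        # Report the error text back to the parent process.
        queue.put(str(exc))


def reproduce():
    """Start a daemonic process and return what happened when it tried to
    start a child. Assumes the "fork" start method is available (POSIX)."""
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    worker = ctx.Process(target=_try_to_start_child, args=(queue,), daemon=True)
    worker.start()
    message = queue.get(timeout=30)
    worker.join()
    return message


if __name__ == "__main__":
    print(reproduce())
```

Calling `reproduce()` returns the text "daemonic processes are not allowed to have children", which matches the message seen when HashEncoder's .transform() runs inside a prefork worker process.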

Several solutions were suggested by the community, but none of them worked for me:

1. Monkey-patch HashEncoder to use billiard instead of the original multiprocessing. However, HashEncoder also depends on sklearn, which makes it hard to monkey-patch both libraries (category_encoders and sklearn) and could introduce instability.
2. Run Celery with the threads pool (pool=threads). However, this runs multiple threads on only 1 core, which does not allow true parallelisation, even with concurrency=3.
3. Run Celery without daemonic worker processes. This is not advised.
4. Replace HashEncoder with another encoder. This is not a good solution because my data has high dimensionality, for which a hash encoder is the better fit.

I really need my workers to run in parallel because performance is critical, and I would prefer to keep the prefork pool. Is there any workaround that allows Celery's prefork pool to work with HashEncoder's multiprocessing?

Thanks in advance!

Replies: 0 comments
