-
Hi, I want to ask the BentoML team about two things.
-
Hi @ebbunim. The rough idea for your case now is to use a custom pipeline (https://huggingface.co/docs/transformers/add_new_pipeline). For now, users have to register the custom task to …
-
For Q2, to be clear: most ML frameworks have an internal mechanism for using multiple CPU cores. For example, TensorFlow users can set intra_op_parallelism_threads/inter_op_parallelism_threads to control the number of threads its operations use; by default, that number equals the CPU core count. So in many cases, N instances of model workers mean N × CPU-count threads on the system, and the context-switching overhead drags down throughput.
-
About memory sharing between gunicorn workers with the --preload option: in general we do not recommend this approach. It is not really memory sharing, but simply preloading the model in Python before forking the process. It may work in some cases, but it is tightly coupled with the extension implementation and may not be the most efficient way of accessing a shared model (as @bojiang explained above). I'd recommend going with the 1.0 release. The runner design in BentoML 1.0 is going to solve the "memory sharing" issue you were looking for and avoid the OOM issue. There is currently an issue regarding the transformers custom pipeline; we are working on a fix. See #2534
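For readers unfamiliar with what --preload actually does, here is a minimal POSIX-only sketch of the preload-then-fork pattern, with a stand-in dict instead of a real model. The child sees the parent's memory via copy-on-write, which is why it looks like sharing; but CPython's reference counting writes to every touched object's header, so pages are gradually copied per process, matching the "not really memory sharing" caveat above.

```python
import os

# "Preload": build the model in the parent before forking, exactly as
# gunicorn --preload imports the app (and thus the model) pre-fork.
MODEL = {"weights": list(range(1000))}  # stand-in for a real model

pid = os.fork()  # POSIX only; child gets a copy-on-write view of MODEL
if pid == 0:
    # Worker process: can read MODEL without reloading it from disk.
    # Note: merely touching the object bumps its refcount, dirtying
    # the page, so memory slowly diverges between processes.
    assert len(MODEL["weights"]) == 1000
    os._exit(0)

os.waitpid(pid, 0)  # parent waits for the worker to finish
```

The 1.0 runner design avoids this per-process page duplication by serving the model from dedicated runner processes instead of forked copies.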
-
@ebbunnim