
Feature Request: Support for Multiple GPU Usage with device_map="auto" #117

Open
6643789wsx opened this issue Jan 8, 2025 · 1 comment · May be fixed by #139

@6643789wsx
1. Summary

Currently, in the TransformersModel class of smolagents, the model is allocated to a device using the .to(device) method. This approach restricts usage to a single CUDA device. I would like to request an option to use device_map="auto" to enable the utilization of multiple GPUs.

2. Problem Description

  1. Limited GPU Utilization: With the current implementation, smolagents can only make use of a single CUDA device. This is a significant limitation, especially when dealing with large models that could benefit from parallel processing across multiple GPUs. For example, running a large language model for text generation tasks would be much faster if multiple GPUs could be used.
  2. Lack of Scalability: As the size of models and the complexity of tasks grow, the ability to scale by using multiple GPUs becomes crucial. The current .to(device) method does not provide this scalability.

Here is the error I encountered. Although I have multiple CUDA devices, the model only runs on cuda:0.

OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 14.06 MiB is free. Including non-PyTorch memory, this process has 23.63 GiB memory in use. Of the allocated memory 23.12 GiB is allocated by PyTorch, and 60.91 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
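As a quick diagnostic (a minimal sketch, not part of the original report), you can confirm how many GPUs PyTorch actually sees and how much memory is free on each; with the current single-device behavior, GPU 0 is nearly full while the others sit idle:

```python
import torch

# List the CUDA devices visible to PyTorch and their free/total memory.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```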

3. Proposed Solution

  1. Add device_map="auto" Option: In the TransformersModel class, when initializing the model, add an option to use device_map="auto" instead of just .to(device). This would allow the model to be distributed automatically across the available GPUs, improving performance. For example, the code could be updated as follows:
```python
try:
    self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
except Exception as e:
    # Handle the exception as before
    pass
```
  2. Configuration for Users: Provide a way for users to easily configure this option, either through a configuration file or an additional parameter when initializing the relevant classes in smolagents (a sketch of what such a parameter could look like follows this list).
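A minimal sketch of how the TransformersModel initialization could expose this, assuming a `device_map` keyword argument and a fallback to the existing `.to(device)` path (the parameter name and structure here are illustrative assumptions, not the actual smolagents API):

```python
from typing import Optional

from transformers import AutoModelForCausalLM, AutoTokenizer


class TransformersModel:
    # Hypothetical constructor: `device_map` is an illustrative parameter,
    # not necessarily the real smolagents signature.
    def __init__(self, model_id: str, device: str = "cuda", device_map: Optional[str] = None):
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        if device_map is not None:
            # Let accelerate shard the model across all visible GPUs.
            self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
        else:
            # Previous behavior: place the whole model on a single device.
            self.model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
```

Note that loading with device_map="auto" requires the accelerate package to be installed.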

4. Benefits

  1. Performance Improvement: Using multiple GPUs will significantly improve the inference speed of the models in smolagents. This will lead to faster response times, especially for computationally intensive tasks.
  2. Scalability: It will make smolagents more suitable for large-scale projects and research, where the ability to scale computing resources is essential.

Thank you for considering this feature request. I believe it will greatly enhance the capabilities of smolagents.

@aymeric-roucher
Collaborator

I think this is a great idea! Don't hesitate to open a PR for it, will be happy to review this! 🤗

6643789wsx added a commit to 6643789wsx/smolagents that referenced this issue Jan 10, 2025
**Summary**:
Previously, in the `TransformersModel` class of smolagents, the model was allocated to a device using `.to(device)`, which limited usage to only one CUDA card. This commit addresses that limitation by introducing the option to use `device_map='auto'` for better utilization of multiple GPUs.

**Problem Addressed**:
- **Limited GPU Utilization**: smolagents could only use a single CUDA device before, restricting performance for large models that could benefit from parallel processing across multiple GPUs. For instance, running large language models for text generation tasks was slower than it could be with multiple GPUs.
- **Lack of Scalability**: As model sizes and task complexity grew, the existing `.to(device)` method didn't offer the necessary scalability. Additionally, an `OutOfMemoryError` was encountered when working with a single GPU (cuda:0) as memory was exhausted despite available additional CUDA resources.

**Solution Implemented**:
- **Added `device_map='auto'` Option**: In the `TransformersModel` class during model initialization, the code now allows for using `device_map='auto'` instead of just `.to(device)`. This enables the model to automatically distribute across available GPUs, as demonstrated by the updated code snippet:
```python
try:
    self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
except Exception as e:
    # Handle the exception as before
    pass
```
- **User Configuration**: Created a way for users to configure this option easily, either through a configuration file or an additional parameter when initializing relevant smolagents classes (a usage sketch follows below).
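A hedged usage sketch, assuming the option is exposed as a `device_map` keyword argument on `TransformersModel` and that the wrapper stores the Hugging Face model on a `.model` attribute as in the snippet above (both are assumptions for illustration, not confirmed by this commit):

```python
from smolagents import TransformersModel

# Assumed interface: pass device_map="auto" so the underlying model is sharded
# across all visible GPUs instead of being placed on a single device.
model = TransformersModel(
    model_id="Qwen/Qwen2.5-7B-Instruct",  # example checkpoint; any causal LM works
    device_map="auto",
)

# With device_map="auto", transformers records the layer-to-device placement
# on the loaded model; printing it shows how the weights were distributed.
print(model.model.hf_device_map)
```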

**Benefits**:
- **Performance Improvement**: Multiple GPU usage will enhance the inference speed of smolagents' models, resulting in faster response times for computationally intensive tasks.
- **Scalability**: Makes smolagents more suitable for large-scale projects and research where scaling computing resources is vital.

Closes huggingface#117
6643789wsx linked a pull request Jan 10, 2025 that will close this issue.