
Feature Request: Support for Multiple GPU Usage with device_map="auto" #117

Open
6643789wsx opened this issue Jan 8, 2025 · 1 comment · May be fixed by #139

@6643789wsx
1. Summary

Currently, in the TransformersModel class of smolagents, the model is allocated to a device using the .to(device) method. This approach restricts usage to a single CUDA device. I would like to request an option to use device_map="auto" to enable the utilization of multiple GPUs.

2. Problem Description

  1. Limited GPU Utilization: With the current implementation, smolagents can only make use of a single CUDA device. This is a significant limitation, especially when dealing with large models that could benefit from parallel processing across multiple GPUs. For example, running a large language model for text generation tasks would be much faster if multiple GPUs could be used.
  2. Lack of Scalability: As the size of models and the complexity of tasks grow, the ability to scale by using multiple GPUs becomes crucial. The current .to(device) method does not provide this scalability.

Here is the error I encountered. Although I have multiple CUDA devices, the model only runs on cuda:0.

OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 14.06 MiB is free. Including non-PyTorch memory, this process has 23.63 GiB memory in use. Of the allocated memory 23.12 GiB is allocated by PyTorch, and 60.91 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
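As a quick diagnostic (a minimal sketch, not part of the original report), you can confirm how many GPUs PyTorch actually sees and how much memory is free on each; with the current single-device behavior, GPU 0 is nearly full while the others sit idle:

```python
import torch

# List the CUDA devices visible to PyTorch and their free/total memory.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```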

3. Proposed Solution

  1. Add device_map="auto" Option: In the TransformersModel class, when initializing the model, add an option to use device_map="auto" instead of just .to(device). This would allow the model to be distributed automatically across the available GPUs, improving performance. For example, the code could be updated as follows:
```python
try:
    self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
except Exception as e:
    # Handle the exception as before
    pass
```
  2. Configuration for Users: Provide a way for users to easily configure this option, either through a configuration file or an additional parameter when initializing the relevant classes in smolagents (a sketch of what such a parameter could look like follows this list).
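A minimal sketch of how the TransformersModel initialization could expose this, assuming a `device_map` keyword argument and a fallback to the existing `.to(device)` path (the parameter name and structure here are illustrative assumptions, not the actual smolagents API):

```python
from typing import Optional

from transformers import AutoModelForCausalLM, AutoTokenizer


class TransformersModel:
    # Hypothetical constructor: `device_map` is an illustrative parameter,
    # not necessarily the real smolagents signature.
    def __init__(self, model_id: str, device: str = "cuda", device_map: Optional[str] = None):
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        if device_map is not None:
            # Let accelerate shard the model across all visible GPUs.
            self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
        else:
            # Previous behavior: place the whole model on a single device.
            self.model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
```

Note that loading with device_map="auto" requires the accelerate package to be installed.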

4. Benefits

  1. Performance Improvement: Using multiple GPUs will significantly improve the inference speed of the models in smolagents. This will lead to faster response times, especially for computationally intensive tasks.
  2. Scalability: It will make smolagents more suitable for large-scale projects and research, where the ability to scale computing resources is essential.

Thank you for considering this feature request. I believe it will greatly enhance the capabilities of smolagents.

@aymeric-roucher
Collaborator

I think this is a great idea! Don't hesitate to open a PR for it, will be happy to review this! 🤗

6643789wsx added a commit to 6643789wsx/smolagents that referenced this issue Jan 10, 2025
**Summary**:
Previously, in the `TransformersModel` class of smolagents, the model was allocated to a device using `.to(device)`, which limited usage to only one CUDA card. This commit addresses that limitation by introducing the option to use `device_map='auto'` for better utilization of multiple GPUs.

**Problem Addressed**:
- **Limited GPU Utilization**: smolagents could only use a single CUDA device before, restricting performance for large models that could benefit from parallel processing across multiple GPUs. For instance, running large language models for text generation tasks was slower than it could be with multiple GPUs.
- **Lack of Scalability**: As model sizes and task complexity grew, the existing `.to(device)` method didn't offer the necessary scalability. Additionally, an `OutOfMemoryError` was encountered when working with a single GPU (cuda:0) as memory was exhausted despite available additional CUDA resources.

**Solution Implemented**:
- **Added `device_map='auto'` Option**: In the `TransformersModel` class during model initialization, the code now allows for using `device_map='auto'` instead of just `.to(device)`. This enables the model to automatically distribute across available GPUs, as demonstrated by the updated code snippet:
```python
try:
    self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
except Exception as e:
    # Handle the exception as before
    pass
```
- **User Configuration**: Created a way for users to configure this option easily, either through a configuration file or an additional parameter when initializing relevant smolagents classes (a usage sketch follows below).
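A hedged usage sketch, assuming the option is exposed as a `device_map` keyword argument on `TransformersModel` and that the wrapper stores the Hugging Face model on a `.model` attribute as in the snippet above (both are assumptions for illustration, not confirmed by this commit):

```python
from smolagents import TransformersModel

# Assumed interface: pass device_map="auto" so the underlying model is sharded
# across all visible GPUs instead of being placed on a single device.
model = TransformersModel(
    model_id="Qwen/Qwen2.5-7B-Instruct",  # example checkpoint; any causal LM works
    device_map="auto",
)

# With device_map="auto", transformers records the layer-to-device placement
# on the loaded model; printing it shows how the weights were distributed.
print(model.model.hf_device_map)
```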

**Benefits**:
- **Performance Improvement**: Multiple GPU usage will enhance the inference speed of smolagents' models, resulting in faster response times for computationally intensive tasks.
- **Scalability**: Makes smolagents more suitable for large-scale projects and research where scaling computing resources is vital.

Closes huggingface#117
6643789wsx linked a pull request Jan 10, 2025 that will close this issue.