Feature Request: Support for Multiple GPU Usage with device_map="auto"
#117
Comments
I think this is a great idea! Don't hesitate to open a PR for it, will be happy to review this! 🤗
6643789wsx added a commit to 6643789wsx/smolagents that referenced this issue on Jan 10, 2025:
**Summary**: Previously, in the `TransformersModel` class of smolagents, the model was allocated to a device using `.to(device)`, which limited usage to only one CUDA card. This commit addresses that limitation by introducing the option to use `device_map='auto'` for better utilization of multiple GPUs.

**Problem Addressed**:
- **Limited GPU Utilization**: smolagents could only use a single CUDA device before, restricting performance for large models that could benefit from parallel processing across multiple GPUs. For instance, running large language models for text generation tasks was slower than it could be with multiple GPUs.
- **Lack of Scalability**: As model sizes and task complexity grew, the existing `.to(device)` method didn't offer the necessary scalability. Additionally, an `OutOfMemoryError` was encountered when working with a single GPU (cuda:0), as memory was exhausted despite additional CUDA resources being available.

**Solution Implemented**:
- **Added `device_map='auto'` Option**: In the `TransformersModel` class, during model initialization, the code now allows using `device_map='auto'` instead of just `.to(device)`. This enables the model to automatically distribute across available GPUs, as demonstrated by the updated code snippet:

  ```python
  try:
      self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
  except Exception as e:
      # Handle the exception as before
      pass
  ```

- **User Configuration**: Created a way for users to configure this option easily, either through a configuration file or an additional parameter when initializing the relevant smolagents classes.

**Benefits**:
- **Performance Improvement**: Multiple-GPU usage will enhance the inference speed of smolagents' models, resulting in faster response times for computationally intensive tasks.
- **Scalability**: Makes smolagents more suitable for large-scale projects and research where scaling computing resources is vital.

Closes huggingface#117
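A hypothetical usage sketch of the user-facing configuration described above. The `device_map` parameter name on `TransformersModel` and the model ID are assumptions for illustration, not confirmed by this thread:

```python
from smolagents import CodeAgent, TransformersModel

# Hypothetical usage: the device_map parameter name is an assumption based on
# the commit description; "auto" asks Accelerate to shard the model across
# all visible GPUs instead of pinning it to a single device.
model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    device_map="auto",
)
agent = CodeAgent(tools=[], model=model)
agent.run("Summarize the main idea of multi-GPU inference in one sentence.")
```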
1. Summary
Currently, in the `TransformersModel` class of smolagents, the model is being allocated to a device using the `.to(device)` method. This approach restricts usage to only one CUDA card. I would like to request the addition of an option to use `device_map="auto"` to enable the utilization of multiple GPUs.

2. Problem Description

As model sizes and task complexity grow, the existing `.to(device)` method does not provide the necessary scalability. Here is the error I ran into: although I have many CUDA devices, it only works on cuda:0.
3. Proposed Solution

Add a `device_map="auto"` option: in the `TransformersModel` class, when initializing the model, add an option to use `device_map="auto"` instead of just `.to(device)`. This would allow the model to automatically distribute itself across the available GPUs, optimizing performance. For example, the code could be updated along the lines of the sketch below.
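A minimal sketch of one way the initialization could look, assuming a hypothetical `device_map` argument and a fallback branch; this illustrates the idea rather than the actual smolagents implementation:

```python
from typing import Optional

from transformers import AutoModelForCausalLM, AutoTokenizer


class TransformersModel:
    def __init__(self, model_id: str, device: str = "cuda", device_map: Optional[str] = None):
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        if device_map is not None:
            # e.g. device_map="auto" lets Accelerate shard the weights across
            # every visible GPU (offloading to CPU/disk only if needed).
            self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
        else:
            # Previous behaviour: load the whole model onto a single device.
            self.model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
```

With `device_map=None` the behaviour stays identical to the current single-device path, so the option would be backwards compatible.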
4. Benefits

Enabling multiple GPUs would improve inference speed for large models and make smolagents more suitable for large-scale projects where scaling compute resources is essential.

Thank you for considering this feature request. I believe it will greatly enhance the capabilities of smolagents.