feat: Add multi-GPU support for TransformersModel
**Summary**: Previously, the `TransformersModel` class in smolagents allocated the model to a single device via `.to(device)`, which limited usage to one CUDA card. This commit addresses that limitation by introducing the option to load the model with `device_map="auto"`, so it can be sharded across multiple GPUs.

**Problem Addressed**:
- **Limited GPU Utilization**: smolagents could previously use only a single CUDA device, which capped performance for large models that benefit from being split across multiple GPUs. For instance, running large language models for text-generation tasks was slower than it could be with several GPUs available.
- **Lack of Scalability**: As model sizes and task complexity grow, the existing `.to(device)` approach does not scale. In practice, an `OutOfMemoryError` was raised when a single GPU (`cuda:0`) exhausted its memory even though additional CUDA devices sat idle.

**Solution Implemented**:
- **Added `device_map="auto"` option**: During model initialization in `TransformersModel`, the code now supports `device_map="auto"` instead of only `.to(device)`. This lets the model be distributed automatically across the available GPUs, as in the updated snippet:

  ```python
  from transformers import AutoModelForCausalLM

  try:
      # device_map="auto" lets Accelerate shard the model across
      # all visible GPUs instead of pinning it to a single device.
      self.model = AutoModelForCausalLM.from_pretrained(
          model_id,
          device_map="auto",
      )
  except Exception as e:
      # Handle the exception as before
      pass
  ```

- **User Configuration**: Users can enable the option easily, either through a configuration file or via an additional parameter when initializing the relevant smolagents classes; see the sketch after this list.
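A minimal usage sketch follows, assuming the option is exposed as a `device_map` keyword on `TransformersModel` that is forwarded to `from_pretrained`; the parameter name and the model id are illustrative, not confirmed by this commit:

```python
from smolagents import TransformersModel

# Assumed interface: device_map is passed through to
# AutoModelForCausalLM.from_pretrained, so Accelerate shards the
# weights across every visible GPU instead of using cuda:0 alone.
model = TransformersModel(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model id
    device_map="auto",
)
```

After loading, the resulting layer placement can be inspected through the underlying transformers model's `hf_device_map` attribute, which transformers populates whenever a model is loaded with a `device_map`.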
**Benefits**:
- **Performance Improvement**: Using multiple GPUs speeds up inference for smolagents models, resulting in faster response times for computationally intensive tasks.
- **Scalability**: Makes smolagents better suited for large-scale projects and research where scaling compute resources is essential.

Closes huggingface#117