This directory contains sample Jupyter Notebooks demonstrating tensor parallel inference for various PyTorch large language models (LLMs) on AWS Inferentia (Inf2) and AWS Trainium (Trn1) instances.
For additional information on these inference samples, refer to the tutorials in the official Inferentia and Trainium documentation.
The following samples are available for LLM tensor parallel inference:

Name | Instance type |
---|---|
facebook/opt-13b | Inf2 & Trn1 |
facebook/opt-30b | Inf2 & Trn1 |
facebook/opt-66b | Inf2 |
meta-llama/Llama-2-13b | Inf2 & Trn1 |
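
For readers new to the term, tensor parallelism splits a layer's weight matrix across several workers so that each one computes only a slice of the output. The sketch below is a pure-Python illustration of that idea for a single linear layer, assuming a column-wise split; it is not the code used in the notebooks or by the Neuron SDK, and the names `matmul`, `split_columns`, `column_parallel_linear`, and `tp_degree` are hypothetical, chosen for this example only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(weight, tp_degree):
    """Split a weight matrix into tp_degree equal column shards (one per worker)."""
    n_cols = len(weight[0])
    if n_cols % tp_degree != 0:
        raise ValueError("column count must be divisible by tp_degree")
    width = n_cols // tp_degree
    return [
        [row[rank * width:(rank + 1) * width] for row in weight]
        for rank in range(tp_degree)
    ]


def column_parallel_linear(x, weight, tp_degree):
    """Compute x @ weight by having each simulated worker multiply by its shard."""
    shards = split_columns(weight, tp_degree)
    # Each "worker" produces a slice of the output columns.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenate the slices row by row, in worker order (an all-gather in practice).
    return [sum((partial[i] for partial in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
weight = [[1, 0, 2, 1], [0, 1, 1, 3]]

sharded = column_parallel_linear(x, weight, tp_degree=2)
reference = matmul(x, weight)
print(sharded == reference)
```

On real hardware each shard would live on a separate accelerator core and the final concatenation would be a collective communication step; here everything runs in one process purely to show that the sharded result matches the unsharded one.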