Made in Vancouver, Canada by Picovoice
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models. picoLLM Inference Engine is:
- Accurate; picoLLM Compression improves on GPTQ by significant margins
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
- Free for open-weight models
Compatibility: Android 5.0 (SDK 21+)
picoLLM Inference Engine supports the following open-weight models. The models are available for download on the Picovoice Console.
- Gemma
  - `gemma-2b`
  - `gemma-2b-it`
  - `gemma-7b`
  - `gemma-7b-it`
- Llama-2
  - `llama-2-7b`
  - `llama-2-7b-chat`
  - `llama-2-13b`
  - `llama-2-13b-chat`
  - `llama-2-70b`
  - `llama-2-70b-chat`
- Llama-3
  - `llama-3-8b`
  - `llama-3-8b-instruct`
  - `llama-3-70b`
  - `llama-3-70b-instruct`
- Llama-3.2
  - `llama3.2-1b-instruct`
  - `llama3.2-3b-instruct`
- Mistral
  - `mistral-7b-v0.1`
  - `mistral-7b-instruct-v0.1`
  - `mistral-7b-instruct-v0.2`
- Mixtral
  - `mixtral-8x7b-v0.1`
  - `mixtral-8x7b-instruct-v0.1`
- Phi-2
  - `phi2`
AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. You must keep your AccessKey secret. You will need internet connectivity to validate your AccessKey with Picovoice license servers even though the LLM inference is running 100% offline and completely free for open-weight models. Everyone who signs up for Picovoice Console receives a unique AccessKey.
Download your desired model file from the Picovoice Console. If you do not download the file directly from your Android device, you will need to upload it to the device to use it with the demos. To upload the model, connect your Android device to your computer via USB or launch an emulator. Ensure it's recognized by the Android Debug Bridge (ADB) using the `adb devices` command. If it is not recognized, you may have to enable USB debugging.
Use ADB to upload the file to the `Downloads` directory of your device:
adb push model.pllm /sdcard/Downloads
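The upload steps above can be wrapped in a small shell function. This is a minimal sketch and not part of the demos: the function name `push_model` is hypothetical, and it assumes `adb` is on your `PATH` and that the destination is the same `/sdcard/Downloads` path used in the command above.

```shell
# Hypothetical helper: push a model file to the device only if ADB sees a usable device.
push_model() {
    model_file="$1"

    # The local model file must exist before we try to push it.
    if [ ! -f "$model_file" ]; then
        echo "model file not found: $model_file" >&2
        return 1
    fi

    # `adb devices` prints a header line, then one "<serial> <state>" line per device.
    # Only the state "device" is usable; "unauthorized" and "offline" are not.
    if ! adb devices | awk 'NR > 1 && $2 == "device" { found = 1 } END { exit !found }'; then
        echo "no usable device found; check the USB connection and USB debugging" >&2
        return 1
    fi

    # Same destination as the manual command above.
    adb push "$model_file" /sdcard/Downloads
}
```

Call it as `push_model model.pllm`. Checking the device state first gives a clearer error than a failed push when the device is listed as `unauthorized` or `offline`.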
There are two demos available: completion and chat. The completion demo accepts a prompt and a set of optional parameters and generates a single completion. It can run all models, whether instruction-tuned or not. The chat demo can run instruction-tuned (chat) models such as `llama-3-8b-instruct`, `phi2`, etc. The chat demo enables a back-and-forth conversation with the LLM, similar to ChatGPT.
To run the completion demo:
- Replace `"${YOUR_ACCESS_KEY_HERE}"` in MainActivity.java with your AccessKey obtained from Picovoice Console.
- Open the Completion Demo in Android Studio, then build and run the project.
- Press the `Load Model` button and load the model file from your external storage. If you can't see your model file in the file navigator, ensure the `Large files` toggle at the top is enabled.
- Enter a prompt that you want a completion for, e.g. "roses are red".
- Experiment with the optional parameters by pressing the menu button on the top left.
To run the chat demo:
- Replace `"${YOUR_ACCESS_KEY_HERE}"` in MainActivity.java with your AccessKey obtained from Picovoice Console.
- Open the Chat Demo in Android Studio, then build and run the project.
- Press the `Load Model` button and load the model file from your external storage. If you can't see your model file in the file navigator, ensure the `Large files` toggle at the top is enabled.
- Chat back and forth with the LLM using the text box at the bottom.
- Use the clear button in the lower right-hand corner of the text box to reset the chat and start a new one.
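
Both demos only work once the `"${YOUR_ACCESS_KEY_HERE}"` placeholder has been replaced. As a quick check before building, the sketch below searches a source file for the literal placeholder. The function name `check_access_key` is hypothetical, and the location of `MainActivity.java` depends on where you checked out the demo, so the path is passed as an argument.

```shell
# Hypothetical helper: fail if the AccessKey placeholder is still present in a source file.
# Returns 0 if the placeholder is gone, 1 if it is still there, 2 if the file is missing.
check_access_key() {
    src_file="$1"

    if [ ! -f "$src_file" ]; then
        echo "file not found: $src_file" >&2
        return 2
    fi

    # -F matches the placeholder literally, so "$", "{" and "}" are not treated as regex syntax.
    # Single quotes keep the shell from expanding ${YOUR_ACCESS_KEY_HERE} as a variable.
    if grep -qF '${YOUR_ACCESS_KEY_HERE}' "$src_file"; then
        echo "AccessKey placeholder still present in $src_file" >&2
        return 1
    fi

    return 0
}
```

For example, `check_access_key path/to/MainActivity.java && echo "placeholder replaced"`, where `path/to` stands in for the demo's actual source directory. The check only confirms that the placeholder is gone; it does not validate the AccessKey itself.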