Rama (Rust-Llama)

How simple can we make llama with reasonable performance?

Experimental project for local inference, intended as a testbed for improving on-device inference performance. Llama.cpp is a huge codebase with many features; Rama has a single implementation, running in fp32.
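The README doesn't include code, but to illustrate what a single straightforward fp32 path looks like, here is a minimal sketch of a naive f32 matrix-vector product, the operation that dominates llama-style inference. `matvec_f32` and its row-major layout are assumptions for illustration, not Rama's actual API.

```rust
/// Hypothetical sketch (not from the repo): a naive fp32 matrix-vector
/// product. `w` is a row-major (rows x cols) weight matrix and `x` is the
/// activation vector of length `cols`.
fn matvec_f32(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    let mut out = vec![0.0f32; rows];
    for r in 0..rows {
        let row = &w[r * cols..(r + 1) * cols];
        // Dot product of one weight row with the activations.
        out[r] = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
    out
}

fn main() {
    // 2x3 weight matrix times a length-3 vector.
    let w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0, 0.5, 2.0];
    let y = matvec_f32(&w, &x, 2, 3);
    println!("{y:?}"); // [8.0, 18.5]
}
```

A loop this simple leaves most of the hardware idle; the usual next steps are blocking for cache reuse, SIMD, and multithreading, which is the kind of optimization work a testbed like this is for.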

Current performance on an M3 Pro for Llama 2 7B (Oct 2024): 0.2 tok/s.
