
LLM Distillation

This repo provides a framework for seamlessly integrating LLM distillation into your existing LLM pipelines. LLM distillation is the process of training small, less costly models on the outputs of larger models and serving them alongside those larger models to save on costs. Here is a great resource that provides in-depth detail on the process.

How to use

As an example, the repo contains openai_distillation.py, which uses this framework to distill OpenAI models. Here's how you can do it yourself:

  • First, add your API keys inside /env. This repo uses Qdrant as the embeddings store
  • Extend the LLMProvider inside llm_services.py to work with your LLM
  • Extend the DatasetProvider inside dataset_services.py to format the collected data for your LLM
  • Finally, extend the FineTuningProvider inside llm_services.py to create the relevant fine-tuning jobs (a minimal sketch follows this list)
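
As a rough illustration, here is what those extensions might look like. This is a minimal sketch: the base classes come from the repo, but the method names (generate, format, create_job) and signatures are assumptions, so check llm_services.py and dataset_services.py for the real interfaces.

```python
# Hypothetical provider extensions. The base classes exist in the repo,
# but these method names and signatures are illustrative assumptions.
from llm_services import LLMProvider, FineTuningProvider
from dataset_services import DatasetProvider


class MyLLMProvider(LLMProvider):
    def generate(self, prompt: str) -> str:
        # Call your model's API here and return the completion text.
        ...


class MyDatasetProvider(DatasetProvider):
    def format(self, records: list[dict]) -> list[dict]:
        # Turn collected (request, response) pairs into your model's
        # fine-tuning format, e.g. chat-style {"messages": [...]} rows.
        return [
            {"messages": [
                {"role": "user", "content": r["request"]},
                {"role": "assistant", "content": r["response"]},
            ]}
            for r in records
        ]


class MyFineTuningProvider(FineTuningProvider):
    def create_job(self, training_file: str) -> str:
        # Start a fine-tuning job with your provider and return its ID.
        ...
```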

How it works

The Data Collection Part

  • Each text-generation request and its generated response are collected and stored in an embeddings store
  • A tag of indexed: False is added to each newly created embedding (see the sketch after this list)
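
A minimal sketch of this step, assuming Qdrant (which the repo uses); the collection name and the vector argument are placeholders:

```python
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://localhost:6333")  # assumed local Qdrant


def collect(request: str, response: str, vector: list[float]) -> None:
    # Store the request/response pair alongside its embedding, tagged as
    # not yet used for fine-tuning (indexed: False).
    client.upsert(
        collection_name="llm_distillation",  # collection name is assumed
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=vector,
                payload={
                    "request": request,
                    "response": response,
                    "indexed": False,
                },
            )
        ],
    )
```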

The Fine Tuning Part

  • Inside a scheduled job, every embedding tagged indexed: False is collected
  • The data is formatted and used to fine-tune a small model
  • Each embedding's tag is then updated to indexed: True (see the sketch after this list)
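
A sketch of that scheduled job, under the same Qdrant assumptions as above; format_and_fine_tune stands in for the repo's DatasetProvider and FineTuningProvider pipeline:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")


def format_and_fine_tune(records: list[dict]) -> None:
    # Stand-in for the repo's DatasetProvider + FineTuningProvider steps.
    ...


def fine_tune_pass() -> None:
    # Collect every point that has not been used for fine-tuning yet.
    points, _ = client.scroll(
        collection_name="llm_distillation",
        scroll_filter=Filter(
            must=[FieldCondition(key="indexed", match=MatchValue(value=False))]
        ),
        limit=1000,
    )
    if not points:
        return

    format_and_fine_tune([p.payload for p in points])

    # Mark the consumed points so the next pass skips them.
    client.set_payload(
        collection_name="llm_distillation",
        payload={"indexed": True},
        points=[p.id for p in points],
    )
```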

The Distillation Part

  • On each request, an AI request router is used
  • The user's query is embedded and searched for in the embeddings store
  • If an entry with indexed: True is found above a certain similarity threshold, the request is routed to the distilled model
  • Otherwise, the request falls back to a main model (OpenAI, Mixtral...) and the data is collected for later fine-tuning (see the sketch below)
[Figure: Distillation architecture as proposed by Recursal.ai]
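
A routing sketch under the same assumptions; distilled_model and main_model stand in for two LLMProvider instances, collect is the function from the data-collection sketch above, and the threshold value is a placeholder to tune for your workload:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
SIMILARITY_THRESHOLD = 0.85  # assumed value; tune for your workload


def route(query: str, query_vector: list[float], distilled_model, main_model) -> str:
    # Look for a sufficiently similar query that the distilled model has
    # already been fine-tuned on (indexed: True).
    hits = client.search(
        collection_name="llm_distillation",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="indexed", match=MatchValue(value=True))]
        ),
        score_threshold=SIMILARITY_THRESHOLD,
        limit=1,
    )
    if hits:
        return distilled_model.generate(query)

    # Fall back to the main model and collect the pair for later tuning.
    response = main_model.generate(query)
    collect(query, response, query_vector)  # from the data-collection sketch
    return response
```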

Acknowledgements

This technique was originally proposed by Recursal.ai in their post 🏘️ Run over 120+ NPCs, in a tiny AI town with RWKV. This repo extracts, extends, and generalizes their approach for production use.
