⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
Large-scale LLM inference engine
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
A scalable and robust tree-based speculative decoding algorithm
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
REST: Retrieval-Based Speculative Decoding, NAACL 2024
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023)
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
A minimal C implementation of speculative decoding, based on llama2.c
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Dynasurge: Dynamic Tree Speculation for Prompt-Specific Decoding
A faster inference implementation for large language models
Verification of the effectiveness of speculative decoding on Japanese text
Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" (a minimal sketch of the accept/reject loop appears after this list)
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
Coupling without Communication and Drafter-Invariant Speculative Decoding
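
For orientation, here is a minimal, self-contained sketch of the core speculative sampling loop shared by many of the repositories above: a cheap draft model proposes a few tokens, and the target model accepts or rejects them so that the output distribution exactly matches target-only sampling. The toy distributions and all names here (`draft_model`, `target_model`, `speculative_step`, `GAMMA`) are illustrative assumptions, not taken from any listed project.

```python
# Minimal sketch of speculative sampling (accept/reject verification).
# The "models" are deterministic toy distributions so the example runs
# without real weights; everything here is an illustrative assumption.
import random

VOCAB = list(range(8))   # toy vocabulary of 8 token ids
GAMMA = 4                # number of tokens the draft model proposes per step

def _toy_dist(prefix, seed):
    # Deterministic pseudo-random next-token distribution over VOCAB.
    rng = random.Random(hash((tuple(prefix), seed)))
    weights = [rng.random() + 0.01 for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def draft_model(prefix):
    return _toy_dist(prefix, seed=1)   # cheap proposer q(. | prefix)

def target_model(prefix):
    return _toy_dist(prefix, seed=2)   # expensive verifier p(. | prefix)

def sample(dist, rng):
    return rng.choices(VOCAB, weights=dist, k=1)[0]

def speculative_step(prefix, rng):
    """Propose GAMMA draft tokens, then verify them against the target.

    Accepted tokens are distributed exactly as if sampled from the target
    model alone, which is why speculative decoding is lossless."""
    drafts, q_dists, ctx = [], [], list(prefix)
    for _ in range(GAMMA):
        q = draft_model(ctx)
        token = sample(q, rng)
        drafts.append(token)
        q_dists.append(q)
        ctx.append(token)

    # A real engine scores all GAMMA + 1 positions in one batched target
    # forward pass; this sketch just calls the target per position.
    accepted = list(prefix)
    for token, q in zip(drafts, q_dists):
        p = target_model(accepted)
        if rng.random() < min(1.0, p[token] / q[token]):
            accepted.append(token)               # draft token accepted
        else:
            # Rejected: resample from the residual max(p - q, 0),
            # renormalized, then stop verifying this step.
            residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
            z = sum(residual)
            if z > 0:
                accepted.append(sample([r / z for r in residual], rng))
            else:
                accepted.append(sample(p, rng))  # p == q: sample p directly
            return accepted
    # Every draft was accepted: take one bonus token from the target for free.
    accepted.append(sample(target_model(accepted), rng))
    return accepted

rng = random.Random(0)
sequence = [0]
for _ in range(3):
    sequence = speculative_step(sequence, rng)
print(sequence)   # grows by up to GAMMA + 1 tokens per step
```

The projects above differ mainly in how they produce the drafts (a smaller model, retrieval, early exits, or tree-structured candidates) and in how verification is batched; the accept/reject rule sketched here is the common lossless core.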