Prerequisites
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Here are the key points:
The Problem: When LLMs generate text, they typically use either greedy decoding (always picking the most likely token) or temperature sampling. Current sampling methods often struggle to balance diversity with accuracy, especially for reasoning tasks.
The Innovation: The authors discovered that when LLMs generate tokens, the logits (pre-softmax scores) naturally separate into two regions:
A "noisy" region following a Gaussian distribution (background noise)
An "informative" region containing the actually relevant tokens
The Solution: Top-nσ works by:
Identifying the maximum logit value
Selecting tokens that are within n standard deviations (σ) of this maximum
Only sampling from these selected tokens
Using temperature to control sampling within this filtered set (see the code sketch after this list)
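To make the steps concrete, here is a minimal standalone C++ sketch of the algorithm as I read it from the paper: keep tokens with logit within n standard deviations of the maximum, then sample from a temperature softmax over the survivors. This is not llama.cpp's actual sampler API; the function name top_n_sigma_sample and the plain std::vector<float> interface are illustrative only.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Sample one token id from raw logits with top-nσ filtering:
// keep only tokens whose logit is within n standard deviations of the
// maximum logit, then sample from a temperature softmax over the survivors.
int top_n_sigma_sample(const std::vector<float> & logits,
                       float n, float temperature, std::mt19937 & rng) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    // Mean and standard deviation of the full logit vector.
    const float mean = std::accumulate(logits.begin(), logits.end(), 0.0f)
                       / (float) logits.size();
    float var = 0.0f;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = std::sqrt(var / (float) logits.size());

    // Keep tokens with logit >= max - n*sigma.
    const float threshold = max_logit - n * sigma;
    std::vector<int>   ids;
    std::vector<float> weights;
    for (int i = 0; i < (int) logits.size(); ++i) {
        if (logits[i] >= threshold) {
            ids.push_back(i);
            // Shift by max_logit for numerical stability; softmax
            // normalization is handled by discrete_distribution below.
            weights.push_back(std::exp((logits[i] - max_logit) / temperature));
        }
    }
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return ids[dist(rng)];
}
```

Note that as n → 0 the filter keeps only the argmax token, so the method degenerates to greedy decoding; larger n admits more candidates.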
Key Benefits:
Maintains consistent performance even at high temperatures, unlike other methods (see the short derivation after this list)
Computationally efficient as it operates directly on logits
Outperforms both existing sampling methods and greedy decoding on reasoning tasks
Works particularly well for tasks requiring careful reasoning
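A quick way to see why the high-temperature stability holds (my paraphrase of the paper's argument, not a quote): dividing all logits by T scales both the maximum M and the standard deviation σ by the same factor 1/T, so the nσ filter selects exactly the same token set at every temperature, and temperature only reshapes the probabilities within that set:

$$\frac{l_i}{T} \;\ge\; \frac{M}{T} - n\,\frac{\sigma}{T} \quad\Longleftrightarrow\quad l_i \;\ge\; M - n\sigma$$

By contrast, top-p and top-k operate on the post-softmax probabilities, which flatten as T grows, so their selected sets widen and quality degrades.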
Results: The method was tested on four reasoning-focused datasets and showed superior performance, especially at higher temperatures where other methods typically fail.
The paper essentially shows that by being more selective about which tokens to sample from based on their statistical properties, you can get better and more reliable results from language models, particularly for tasks that require careful reasoning.
Motivation
Looks to be the best sampler yet, and will be a clear differentiator for llama.cpp
Possible Implementation
See white paper: "Top-nσ: Not All Logits Are You Need"
Top-nσ shows very promising results in the paper! And it's cool to see a sampler maintain a stable sampling space even at high temperatures. I'm currently working on implementing this paper. However, since this sampling method isn't widely adopted and is still "in the early release phase," I'm not sure how likely it is to be accepted by the llama.cpp maintainers. Regardless, I'll still put out an implementation for others to check out.