Q: When is the softmax operation used in LLMs and RAG?

A: Let me explain when softmax is used in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG):

Softmax in LLMs

  1. Output Layer

    • Most commonly used in the final layer of LLMs to convert raw logits into probability distributions over the vocabulary
    • Helps determine which token is most likely to come next in the sequence
    • Ensures all probabilities sum to 1.0 (a minimal sketch of this follows the list)
  2. Attention Mechanism

    • Used in the attention computation to create attention weights
    • Converts attention scores into probabilities that determine how much focus to put on different parts of the input sequence
    • Formula: attention_weights = softmax(QKᵀ / √d_k), where d_k is the key dimension (a sketch of this computation also follows the list)
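
To make the output-layer use concrete, here is a minimal sketch in NumPy. The toy vocabulary and logit values are invented for illustration; a real model would produce logits over tens of thousands of tokens.

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exp_x / exp_x.sum()

# Hypothetical final-layer logits over a toy 4-token vocabulary
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 3.5, 0.3, 2.1])

probs = softmax(logits)
print(dict(zip(vocab, np.round(probs, 3))))  # probabilities sum to 1.0
print("most likely next token:", vocab[int(np.argmax(probs))])
```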
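
Similarly, a minimal sketch of softmax inside scaled dot-product attention, using tiny random Q, K, and V matrices as stand-ins for real learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))  # stability
    return exp_x / exp_x.sum(axis=axis, keepdims=True)

# Toy Q, K, V: 3 positions, key dimension d_k = 4 (random stand-ins)
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))

d_k = Q.shape[-1]
scores = Q @ K.T / np.sqrt(d_k)       # raw attention scores
attention_weights = softmax(scores)   # row-wise probabilities
output = attention_weights @ V        # weighted sum of value vectors

print(attention_weights.sum(axis=-1))  # each row sums to 1.0
```

Note that softmax is applied row-wise here, so each query's attention weights over all keys sum to 1.0.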

Softmax in RAG

  1. Retrieval Scoring

    • Used to normalize relevance scores when ranking retrieved documents
    • Converts raw similarity scores into probabilities for document selection
    • Helps in choosing the most relevant documents to include in the context (a sketch follows this list)
  2. Reranking

    • When multiple retrieval models are used, softmax can normalize scores across different scoring systems
    • Enables fair comparison between different retrieval methods
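
As a sketch of the retrieval-scoring idea, assuming hypothetical similarity scores; in practice these would come from an embedding model or retriever:

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exp_x / exp_x.sum()

# Hypothetical similarity scores for four retrieved documents
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
similarities = np.array([0.82, 0.75, 0.40, 0.15])

selection_probs = softmax(similarities)
for doc, p in sorted(zip(docs, selection_probs), key=lambda pair: -pair[1]):
    print(f"{doc}: {p:.3f}")  # normalized selection probabilities
```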

Example of Softmax in Python

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # Subtract max for numerical stability
    return exp_x / exp_x.sum()

# Example logits
logits = np.array([2.0, 1.0, 0.1])
probabilities = softmax(logits)
print(probabilities)  # Output: approximately [0.659 0.242 0.099]
```
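Subtracting the maximum before exponentiating works because softmax is invariant to adding a constant to every input, so the trick prevents overflow in np.exp without changing the result.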

The key benefit of softmax in both LLMs and RAG is that it converts raw scores into interpretable probabilities while preserving the relative ordering of the original values.