
Scaling Down Semantic Leakage

Overview

This is the repository for the "Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models" project, completed as part of the Understanding LLMs course at the University of Tübingen.

The project investigates the concept of semantic leakage in Qwen2.5 language models, particularly focusing on how color-related prompts influence generated outputs. Semantic leakage occurs when unintended associations within a language model lead to contextually unexpected or inappropriate outputs. For instance, prompts like "The dinner was served on pink plates. Today’s dish was..." might result in unexpected completions such as "rose petal soup."
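Completions like this can be elicited directly with the Hugging Face `transformers` library. Below is a minimal sketch; the checkpoint name points to the smallest Qwen2.5 model on the Hub, and the generation settings are illustrative rather than the project's exact configuration:

```python
from transformers import pipeline

# Smallest model in the Qwen2.5 family; the project also prompts 1.5B, 3B and 7B.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B")

prompt = "The dinner was served on pink plates. Today's dish was"
out = generator(prompt, max_new_tokens=15, do_sample=True, temperature=0.8)

# The pipeline returns the prompt plus the sampled continuation.
completion = out[0]["generated_text"][len(prompt):]
print(completion)
```

Sampling several continuations per prompt (as the project's notebooks do) makes leakage easier to spot than a single greedy completion.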

You may find the full text of the project paper here.

Repository Structure

The repository consists of two folders. Note that each folder has its own README file with further details.

1. Color-related Leakage-Provoking Prompts Dataset and Generations Datasets

This folder contains datasets for color-related prompts and the model generations:

  • Color-related Prompt Dataset: A carefully constructed dataset expanding on the work of Gonen et al. (2024), including 720 test prompts across three categories and 40 control prompts. The prompts are designed to evaluate semantic leakage in various color-related contexts.

  • Generation Results: Outputs generated by Qwen2.5 models of varying sizes (0.5B, 1.5B, 3B, and 7B parameters) are stored in this folder. Each file contains the generations organized in the same format as the original dataset, with five additional columns for model responses. An additional subfolder includes initial generations based on the dataset from Gonen et al. (2024).

2. Notebooks for Obtaining Model Output and Leakage Analysis

This folder contains Jupyter notebooks designed for generating and analyzing language model responses to color-related prompts, with a focus on semantic leakage:

  • Prompting Qwen2.5 models: A notebook to configure and prompt transformer-based language models (Qwen2.5 family) using the color-related dataset. The notebook generates responses, resulting in datasets with multiple outputs for each prompt.

  • Evaluating Mean Leak-Rate of the models: This notebook calculates semantic leakage rates using similarity metrics such as BERTScore and SentenceBERT. The notebook produces Mean Leak-Rates across different categories and models, allowing for a cross-comparison of semantic leakage behavior.
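The Mean Leak-Rate of Gonen et al. (2024) counts how often the generation for a leakage-provoking test prompt is more similar to the leaked concept than the matching control generation is, with ties counted as half. A minimal sketch of that computation is shown below with a stand-in token-overlap similarity; the notebook itself uses BERTScore and SentenceBERT embeddings as the similarity metric:

```python
def token_overlap(a: str, b: str) -> float:
    # Stand-in similarity: Jaccard overlap of lowercased tokens.
    # The notebook uses BERTScore / SentenceBERT embeddings instead.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mean_leak_rate(pairs, sim=token_overlap):
    """pairs: list of (concept, test_generation, control_generation).

    Counts a leak when the test generation is closer to the leaked
    concept than the control generation; ties count as 0.5.
    Returns a percentage.
    """
    score = 0.0
    for concept, test_gen, control_gen in pairs:
        s_test, s_ctrl = sim(concept, test_gen), sim(concept, control_gen)
        if s_test > s_ctrl:
            score += 1.0
        elif s_test == s_ctrl:
            score += 0.5
    return 100.0 * score / len(pairs)

# Toy illustration: one clear leak ("pink" surfaces in the dish) and one tie.
pairs = [
    ("pink", "rose petal soup on pink plates", "grilled chicken with rice"),
    ("green", "a fresh garden salad", "a bowl of tomato soup"),
]
print(mean_leak_rate(pairs))  # → 75.0
```

Swapping `sim` for an embedding-based similarity reproduces the structure of the notebook's evaluation while keeping the leak-rate definition itself fixed.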

References

  1. Gonen, H., Blevins, T., Liu, A., Zettlemoyer, L., & Smith, N. A. (2024). Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. arXiv preprint arXiv:2408.06518.
