
Scaling Down Semantic Leakage

Overview

This is the repository for the "Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models" project, completed as part of the Understanding LLMs course at the University of Tübingen.

The project investigates the concept of semantic leakage in Qwen2.5 language models, particularly focusing on how color-related prompts influence generated outputs. Semantic leakage occurs when unintended associations within a language model lead to contextually unexpected or inappropriate outputs. For instance, prompts like "The dinner was served on pink plates. Today’s dish was..." might result in unexpected completions such as "rose petal soup."
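Completions like this can be elicited directly with the Hugging Face `transformers` library. Below is a minimal sketch; the checkpoint name points to the smallest Qwen2.5 model on the Hub, and the generation settings are illustrative rather than the project's exact configuration:

```python
from transformers import pipeline

# Smallest model in the Qwen2.5 family; the project also prompts 1.5B, 3B and 7B.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B")

prompt = "The dinner was served on pink plates. Today's dish was"
out = generator(prompt, max_new_tokens=15, do_sample=True, temperature=0.8)

# The pipeline returns the prompt plus the sampled continuation.
completion = out[0]["generated_text"][len(prompt):]
print(completion)
```

Sampling several continuations per prompt (as the project's notebooks do) makes leakage easier to spot than a single greedy completion.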

You may find the full text of the project paper here.

Repository Structure

The repository consists of two folders. Note that each folder has its own README file with further details.

1. Color-related Leakage-Provoking Prompts Dataset and Generations Datasets

This folder contains datasets for color-related prompts and the model generations:

  • Color-related Prompt Dataset: A carefully constructed dataset expanding on the work of Gonen et al. (2024), including 720 test prompts across three categories and 40 control prompts. The prompts are designed to evaluate semantic leakage in various color-related contexts.

  • Generation Results: Outputs generated by Qwen2.5 models of varying sizes (0.5B, 1.5B, 3B, and 7B parameters) are stored in this folder. Each file contains the generations organized in the same format as the original dataset, with five additional columns for model responses. An additional subfolder includes initial generations based on the dataset from Gonen et al. (2024).

2. Notebooks for Obtaining Model Output and Leakage Analysis

This folder contains Jupyter notebooks designed for generating and analyzing language model responses to color-related prompts, with a focus on semantic leakage:

  • Prompting Qwen2.5 models: A notebook to configure and prompt transformer-based language models (Qwen2.5 family) using the color-related dataset. The notebook generates responses, resulting in datasets with multiple outputs for each prompt.

  • Evaluating Mean Leak-Rate of the models: This notebook calculates semantic leakage rates using similarity metrics such as BERTScore and SentenceBERT. The notebook produces Mean Leak-Rates across different categories and models, allowing for a cross-comparison of semantic leakage behavior.
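The Mean Leak-Rate of Gonen et al. (2024) counts how often the generation for a leakage-provoking test prompt is more similar to the leaked concept than the matching control generation is, with ties counted as half. A minimal sketch of that computation is shown below with a stand-in token-overlap similarity; the notebook itself uses BERTScore and SentenceBERT embeddings as the similarity metric:

```python
def token_overlap(a: str, b: str) -> float:
    # Stand-in similarity: Jaccard overlap of lowercased tokens.
    # The notebook uses BERTScore / SentenceBERT embeddings instead.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mean_leak_rate(pairs, sim=token_overlap):
    """pairs: list of (concept, test_generation, control_generation).

    Counts a leak when the test generation is closer to the leaked
    concept than the control generation; ties count as 0.5.
    Returns a percentage.
    """
    score = 0.0
    for concept, test_gen, control_gen in pairs:
        s_test, s_ctrl = sim(concept, test_gen), sim(concept, control_gen)
        if s_test > s_ctrl:
            score += 1.0
        elif s_test == s_ctrl:
            score += 0.5
    return 100.0 * score / len(pairs)

# Toy illustration: one clear leak ("pink" surfaces in the dish) and one tie.
pairs = [
    ("pink", "rose petal soup on pink plates", "grilled chicken with rice"),
    ("green", "a fresh garden salad", "a bowl of tomato soup"),
]
print(mean_leak_rate(pairs))  # → 75.0
```

Swapping `sim` for an embedding-based similarity reproduces the structure of the notebook's evaluation while keeping the leak-rate definition itself fixed.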

References

  1. Gonen, H., Blevins, T., Liu, A., Zettlemoyer, L., & Smith, N. A. (2024). Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. arXiv preprint arXiv:2408.06518.
