This repository is the official implementation of our paper "Neuron-based Personality Trait Induction in Large Language Models".
To better identify personality-related neurons, we first constructed the PersonalityBench dataset, which comprises 180,000 open-ended questions tailored to capture distinct personality traits based on the Big Five personality theory. Specifically, we use the descriptions from the IPIP-NEO-300 questionnaire and common real-world topics introduced in UltraChat to generate the situational questions in PersonalityBench. The dataset is available in NPTI/datset.
Within a given layer, the FFN module can be expressed as:
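A minimal sketch of this expression, assuming the gated FFN used in LLaMA-family models. The symbols $W_{\text{gate}}$, $W_{\text{up}}$, $W_{\text{down}}$ and $\mathrm{act\_fn}$ are assumptions chosen for illustration and may differ from the paper's notation:

```latex
\mathrm{FFN}(x) = \bigl(\mathrm{act\_fn}(x\, W_{\text{gate}}) \odot (x\, W_{\text{up}})\bigr)\, W_{\text{down}}
```

Here $x$ is the hidden state entering the FFN, $\odot$ is element-wise multiplication, and $\mathrm{act\_fn}$ is the activation function (SiLU in LLaMA-family models). Under this reading, a neuron corresponds to one coordinate of the intermediate activation $\mathrm{act\_fn}(x\, W_{\text{gate}})$.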
The identified neurons are stored in the NPTI/neuron_results directory. Each neuron is represented as a tuple whose elements are, in order: (layer number, neuron index, absolute value of activation probability difference, activation probability, 95th percentile value).
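The tuple layout above can be made easier to work with by giving each position a name. The following is a minimal sketch, not code from this repository: the `NeuronRecord` class, the `top_neurons` helper, and the example values are all hypothetical, and the on-disk format of the files in NPTI/neuron_results is not assumed here (the records are built in memory).

```python
from typing import List, NamedTuple


class NeuronRecord(NamedTuple):
    """One neuron entry, following the tuple order described above (hypothetical wrapper)."""
    layer: int               # layer number
    index: int               # neuron index within the layer
    abs_prob_diff: float     # absolute value of activation probability difference
    activation_prob: float   # activation probability
    percentile_95: float     # 95th percentile value


def top_neurons(records: List[tuple], k: int) -> List[NeuronRecord]:
    """Return the k neurons with the largest absolute probability difference."""
    parsed = [NeuronRecord(*r) for r in records]
    return sorted(parsed, key=lambda n: n.abs_prob_diff, reverse=True)[:k]


# Made-up values for illustration only; not taken from the repository.
example = [
    (3, 1024, 0.12, 0.40, 1.8),
    (17, 55, 0.47, 0.81, 2.6),
    (30, 9001, 0.31, 0.66, 0.9),
]

best = top_neurons(example, k=2)
print([(n.layer, n.index) for n in best])  # prints [(17, 55), (30, 9001)]
```

Ranking by the third element is only one possible way to inspect the results; the actual selection criterion is defined by the search script.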
To find personality-related neurons in LLaMA-8B-Instruct, you can execute:
bash NPTI/code/search_neuron.sh
To modify a specific personality trait of an LLM using NPTI, you can execute:
bash NPTI/code/answer_question_change_neuron.sh
To use ChatGPT to automatically assess how strongly each response expresses a specific personality trait, as well as its fluency, you can execute:
python NPTI/code/gpt4_score.py