Kenneth Li (likenneth)

othello_world Public

Emergent world representations: Exploring a sequence model trained on a synthetic task

Jupyter Notebook, 168 stars, 40 forks

honest_llama Public

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python, 464 stars, 37 forks

persona_drift Public

Measuring and Controlling Persona Drift in Language Model Dialogs

Python, 12 stars, 3 forks

q_probe Public

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Jupyter Notebook, 37 stars, 1 fork

dialogue_action_token Public

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

Python, 15 stars, 1 fork