A repository for awesome resources in mechanistic interpretability
- Neel Nanda's opinionated list of readings: Neel's favourite papers on the topic
- The Interpretability Playground: A large resources list for safety-minded interpretability research
- AI Safety Ideas mechanistic interpretability research list: A list of research ideas in mechanistic interpretability
- TransformerLens: A Library for Mechanistic Interpretability of Generative Language Models (Colab)
- Unseal: Mechanistic Interpretability for Transformers
- BertViz: An interactive tool for visualizing attention in Transformer language models such as BERT, GPT-2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Hugging Face models. BertViz extends the Tensor2Tensor visualization tool by Llion Jones with multiple views, each offering a unique lens into the attention mechanism.