Awesome Mechanistic Interpretability

A repository for awesome resources in mechanistic interpretability

Mechanistic Interpretability lists

Libraries

  • TransformerLens: A Library for Mechanistic Interpretability of Generative Language Models (Colab). See the short usage sketch after this list.
  • Unseal: Mechanistic Interpretability for Transformers
  • BertViz: An interactive tool for visualizing attention in Transformer language models such as BERT, GPT-2, or T5. It can be run inside a Jupyter or Colab notebook through a simple Python API that supports most Hugging Face models, and it extends the Tensor2Tensor visualization tool by Llion Jones with multiple views, each offering a unique lens into the attention mechanism.
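
As a quick illustration of the library workflow, here is a minimal sketch of loading a model with TransformerLens and caching its internal activations. The model name ("gpt2"), the prompt, and the specific hook inspected are illustrative assumptions, not recommendations from this list.

```python
# Minimal TransformerLens sketch: load a model and cache its activations.
# The model name, prompt, and hook inspected are illustrative assumptions.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache("Mechanistic interpretability studies model internals.")

# The cache maps hook names to activation tensors; e.g. layer-0 attention
# patterns have shape [batch, n_heads, query_pos, key_pos].
print(logits.shape)
print(cache["blocks.0.attn.hook_pattern"].shape)
```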

Tools

  • Lexoscope: Covers 6 models with a page per neuron, displaying the top 20 maximum-activating dataset examples (a sketch of this computation follows the list).
  • exBert: Visual Analysis of Transformer Models (click through the safety popup)
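
To make concrete what "maximum-activating dataset examples" means, the sketch below ranks a few texts by how strongly they activate a single MLP neuron, using TransformerLens. The model, layer, neuron index, and toy text "dataset" are assumptions for illustration only, not how Lexoscope itself is built.

```python
# Sketch: rank texts by how strongly they activate one MLP neuron, which is
# the computation behind "top maximum-activating dataset examples" pages.
# The model, layer, neuron index, and toy dataset are illustrative assumptions.
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
layer, neuron = 5, 123
texts = [
    "The Eiffel Tower is in Paris.",
    "def add(a, b): return a + b",
    "Stock prices fell sharply on Monday.",
]

records = []
for text in texts:
    _, cache = model.run_with_cache(model.to_tokens(text))
    # MLP post-activations for this layer: shape [batch, position, d_mlp]
    acts = cache[utils.get_act_name("post", layer)][0, :, neuron]
    pos = int(acts.argmax())
    records.append((float(acts[pos]), model.to_str_tokens(text)[pos], text))

# Highest-activating example first, with the token where the peak occurred
for activation, token, text in sorted(records, reverse=True):
    print(f"{activation:.3f}  token={token!r}  text={text!r}")
```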

Videos

Core readings
