Implementation of Anthropic's "A Mathematical Framework for Transformer Circuits" paper
https://transformer-circuits.pub/2021/framework/index.html
https://slavachalnev.github.io/circuits/analysis/potter.html
circuits
|-- analysis Reproduce graphs and tables
| |-- assets Images of plots
| | |-- head_11_pos.png Positional attention head
| | |-- induction_heads.png QKV composition
| | |-- one_layer_eigen.png Eigenvalue plot
| | |-- two_layer_eigen.png Eigenvalue positivity
| |-- tests Test analysis tools
| | |-- tests.py Test weight extractor
| |-- one_layer.md One layer results
| |-- one_layer.py One layer circuits, skip-trigrams, eigenvalues
| |-- potter.html Harry Potter analysis
| |-- two_layer.md Two layer results
| |-- two_layer.py q, k, v composition, attention visualisation
| |-- utils.py Tools to extract head weights
|-- circuits Trains attn-only models
| |-- models Define models
| | |-- model.py Base attn-only model
| | |-- one_attn_layer.py One layer transformer
| | |-- two_attn_layer.py Two layer transformer
| |-- train Train and data prep
| | |-- openwebtext.py Prep webtext dataset
| | |-- train_one_layer.py One layer training script
| | |-- train_two_layer.py Two layer training script
| | |-- trainer.py Training loop
| | |-- utils.py Set up logging and seed
|-- .gitignore
|-- LICENCE MIT license
|-- README.md
|-- requirements.txt
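The eigenvalue analysis in `analysis/one_layer.py` follows the paper's one-layer test for copying heads: for each head, form the full OV circuit from the embedding, value, output, and unembedding weights, and check how positive its eigenvalues are. Below is a minimal sketch of that computation, assuming the per-head weight matrices have already been extracted (for example by `analysis/utils.py`) as NumPy arrays; the shapes and the positivity summary are illustrative, not necessarily what the script itself uses.

```python
import numpy as np

def ov_eigenvalues(W_E, W_V, W_O, W_U):
    """Eigenvalue test for a copying head (one-layer analysis in the paper).

    Assumed shapes (illustrative):
        W_E: (d_vocab, d_model)   token embedding
        W_V: (d_model, d_head)    value projection of one head
        W_O: (d_head, d_model)    output projection of one head
        W_U: (d_model, d_vocab)   unembedding

    The full OV circuit W_E @ W_V @ W_O @ W_U is d_vocab x d_vocab, but its
    nonzero eigenvalues equal those of the small d_head x d_head matrix
    W_O @ W_U @ W_E @ W_V (eigenvalues of AB and BA agree up to zeros),
    so only the small matrix is ever materialised.
    """
    small = W_O @ W_U @ W_E @ W_V                 # (d_head, d_head)
    eigvals = np.linalg.eigvals(small)
    # Positivity summary in [-1, 1]: values near 1 suggest a copying head.
    positivity = eigvals.real.sum() / np.abs(eigvals).sum()
    return eigvals, positivity
```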
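Similarly, `analysis/two_layer.py` looks for Q-, K-, and V-composition between first-layer and second-layer heads (K-composition is the signature of induction heads). The paper summarises composition with a Frobenius-norm ratio; here is a rough sketch of that measure, with generic argument names rather than the repo's actual ones.

```python
import numpy as np

def composition_score(W_ov_prev, W_next):
    """Frobenius-norm composition score between two heads.

    W_ov_prev: the earlier head's OV matrix (d_model x d_model).
    W_next:    the later head's QK matrix for Q-/K-composition, or its OV
               matrix for V-composition (also d_model x d_model).

    Returns ||W_next @ W_ov_prev||_F / (||W_next||_F * ||W_ov_prev||_F);
    larger values mean the later head reads more strongly from the earlier
    head's output subspace.
    """
    composed = W_next @ W_ov_prev
    return np.linalg.norm(composed) / (np.linalg.norm(W_next) * np.linalg.norm(W_ov_prev))
```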
- Tokenizer: I use the GPT-2 tokenizer. The paper does not say which tokenizer it uses, but it is clearly a different one.
- Dataset: I use the OpenWebText dataset, whereas the paper uses a mix of "Common Crawl data and internet books, along with a number of smaller distributions, including about 10% python code data". A rough sketch of the tokenization step is shown below.
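For reference, a minimal sketch of how the dataset might be tokenized with the GPT-2 tokenizer, assuming the `datasets` and `tiktoken` packages; the actual `circuits/train/openwebtext.py` may shard and store the tokens differently.

```python
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")                  # GPT-2 BPE tokenizer
dataset = load_dataset("openwebtext", split="train")

def tokenize(example):
    # Append the end-of-text token so documents stay separated once concatenated.
    ids = enc.encode_ordinary(example["text"]) + [enc.eot_token]
    return {"ids": ids, "len": len(ids)}

tokenized = dataset.map(tokenize, remove_columns=["text"], num_proc=8)
```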
- Complete attention heads dump (OV/QK values for every token); one possible approach is sketched after this list
- Term importance analysis
- Publish checkpoints
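For the attention-heads dump, one possible approach under the paper's framing: for each head, the QK circuit scores every (query token, key token) pair and the OV circuit gives the logit effect of attending to each token, so the dump can be built one token row at a time instead of materialising the full vocab-by-vocab matrices. The names and shapes below are illustrative.

```python
import numpy as np

def head_row_for_token(token_id, W_E, W_Q, W_K, W_V, W_O, W_U):
    """QK and OV circuit rows of one head for a single token.

    Assumed shapes (illustrative):
        W_E: (d_vocab, d_model)    W_Q, W_K, W_V: (d_model, d_head)
        W_O: (d_head, d_model)     W_U: (d_model, d_vocab)
    """
    # QK circuit row: attention score from this query token to every key token.
    q = W_E[token_id] @ W_Q                        # (d_head,)
    qk_row = q @ (W_E @ W_K).T                     # (d_vocab,)

    # OV circuit row: logit change on every output token if this token is attended to.
    ov_row = (W_E[token_id] @ W_V) @ W_O @ W_U     # (d_vocab,)
    return qk_row, ov_row
```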