A small repo to experiment with Transformer (and more) architectures.

Datta0/nanoformer

NanoFormer

NanoFormer is a lightweight transformer model implementation designed for efficient training and inference. It features grouped query attention (GQA) and various architectural optimizations.

Features

  • Configurable transformer architecture with GQA support
  • Dynamic batch size handling with efficient padding
  • Mixed precision training (bfloat16)
  • Gradient checkpointing for memory efficiency
  • Gradient accumulation support
  • Wandb integration for experiment tracking
  • Automatic model checkpointing
  • Custom training loop with validation
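To illustrate the GQA item in the list above: in grouped query attention, several query heads share a single key/value head. The sketch below shows only the head-to-head mapping in plain Python. The function name and the head counts are hypothetical and are not taken from this repository's code.

```python
def kv_head_for(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head shared by a given query head.

    Hypothetical helper for illustration; not part of the NanoFormer code.
    """
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads  # query heads per key/value head
    return query_head // group_size


# Assumed example: 8 query heads sharing 2 key/value heads (groups of 4).
mapping = [kv_head_for(q, num_heads=8, num_kv_heads=2) for q in range(8)]
print(mapping)
```

With num_kv_heads equal to num_heads this reduces to standard multi-head attention, and with num_kv_heads equal to 1 it reduces to multi-query attention.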

Installation

git clone https://github.com/Datta0/nanoformer.git
cd nanoformer

Usage

Training

To train the model with default parameters:

python train.py \
    --dataset "imdatta0/wikipedia_en_sample" \
    --batch_size 8 \
    --gradient_accumulation_steps 16 \
    --num_epochs 1 \
    --lr 5e-4 \
    --hidden_dim 256 \
    --num_hidden_layers 8
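
With gradient accumulation, the optimizer sees a larger effective batch than --batch_size alone suggests. The sketch below works through the arithmetic for the flags above, assuming a single device and assuming that a final, partially filled accumulation window still triggers an update (this may differ from what train.py does). The helper names are hypothetical.

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Sequences contributing to each optimizer update (single device assumed)."""
    return batch_size * gradient_accumulation_steps


def optimizer_steps(num_batches: int, gradient_accumulation_steps: int) -> int:
    """Optimizer updates per epoch.

    Assumption: a trailing partial accumulation window still produces an update.
    """
    return math.ceil(num_batches / gradient_accumulation_steps)


# Values from the command above: --batch_size 8, --gradient_accumulation_steps 16
print(effective_batch_size(8, 16))  # 128
```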

To estimate the number of tokens in a dataset and the model's parameter count for a given config, add the --estimate flag. (This currently creates the model in order to count its parameters; a refactor to avoid that is planned.)

python train.py \
    --dataset "imdatta0/wikipedia_en_sample" \
    --batch_size 8 \
    --gradient_accumulation_steps 16 \
    --num_epochs 1 \
    --lr 5e-4 \
    --hidden_dim 256 \
    --num_hidden_layers 8 \
    --estimate
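
As a rough idea of how the parameter count could be estimated without creating the model, the sketch below computes it in closed form. It assumes a Llama-style decoder block (GQA attention without biases, a gated MLP with three projections, two norm weight vectors per layer) and tied input/output embeddings. Those architectural details, and every value other than hidden_dim and num_hidden_layers, are assumptions for illustration and may not match NanoFormer's actual layers.

```python
def estimate_params(
    hidden_dim: int,
    num_hidden_layers: int,
    num_heads: int,
    num_kv_heads: int,
    intermediate_dim: int,
    vocab_size: int,
    tie_embeddings: bool = True,
) -> int:
    """Closed-form parameter count for an assumed Llama-style decoder."""
    head_dim = hidden_dim // num_heads
    kv_dim = num_kv_heads * head_dim

    # Attention: Q and O are hidden x hidden; K and V are hidden x kv_dim (GQA).
    attention = 2 * hidden_dim * hidden_dim + 2 * hidden_dim * kv_dim
    # Gated MLP: gate, up and down projections (assumed, no biases).
    mlp = 3 * hidden_dim * intermediate_dim
    # Two norm weight vectors per layer (assumed RMSNorm-style, no bias).
    norms = 2 * hidden_dim
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_dim
    lm_head = 0 if tie_embeddings else vocab_size * hidden_dim
    final_norm = hidden_dim
    return num_hidden_layers * per_layer + embeddings + lm_head + final_norm


# hidden_dim and num_hidden_layers come from the command above;
# all other values are hypothetical placeholders.
total = estimate_params(
    hidden_dim=256,
    num_hidden_layers=8,
    num_heads=8,
    num_kv_heads=2,
    intermediate_dim=1024,
    vocab_size=32000,
)
print(f"{total:,} parameters")
```

At this scale the embedding table accounts for roughly half of the total, so the vocabulary size matters as much as the layer count.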

TODO

  • Implement Differential Transformer
  • Implement nGPT
  • Implement custom optimisers such as Shampoo and SOAP
  • Add support for Sliding Window Attention
  • Modify configs to be closer to Chinchilla Optimal Ratios
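
For the Chinchilla item above, a commonly quoted rule of thumb is roughly 20 training tokens per model parameter. The sketch below applies that approximation. The ratio is a heuristic rather than an exact constant, and the helper name and example parameter count are hypothetical.

```python
def chinchilla_token_budget(num_params: int, tokens_per_param: float = 20.0) -> int:
    """Approximate compute-optimal training tokens for a given parameter count.

    Uses the widely cited ~20 tokens/parameter heuristic; treat it as a rough guide.
    """
    return int(num_params * tokens_per_param)


# Hypothetical example: a 15M-parameter model.
budget = chinchilla_token_budget(15_000_000)
print(f"{budget:,} tokens")
```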
