- Add gradient checkpointing, FullyShardedDataParallel
- Model releases
- (CLIP ViT-L-14 / MPT-1B)
- (CLIP ViT-L-14 / MPT-1B Dolly)
- (CLIP ViT-L-14 / RedPajama-3B)
- (CLIP ViT-L-14 / RedPajama-3B Instruct)
- (CLIP ViT-L-14 / MPT-7B)
- Remove color jitter when training
- Fix cross-attention bug when calling generate()
- Initial code release
- Early model release (CLIP ViT-L-14 / LLaMA-7B)