OpenAI CLIP converted to TensorFlow 2/Keras
Official Repository: https://github.com/openai/CLIP
$ python convert_clip.py --help
USAGE: convert_clip.py [flags]
flags:
convert_clip.py:
--[no]all: Export all versions. (will use output location if image_output or
text_output are not present)
(default: 'false')
--image_output: Image encoder Keras SavedModel output destination (optional)
--model: <RN50|RN101|RN50x4|ViT-B/32>: CLIP model architecture to convert
(default: 'RN50')
--output: CLIP Keras SavedModel Output destination
(default: 'models/CLIP_{model}')
--text_output: Text encoder Keras SavedModel output destination (optional)
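The default for --output above contains a {model} placeholder. How convert_clip.py fills it in is not shown here; the sketch below assumes plain Python str.format semantics, and expand_output is a hypothetical helper, not a function from this repository. Note that bash passes a single-word brace such as {model} through unchanged, so the placeholder reaches the script literally.

```python
# Hypothetical helper for illustration; not part of convert_clip.py.
DEFAULT_OUTPUT = "models/CLIP_{model}"  # default value shown in the help text


def expand_output(template: str, model: str) -> str:
    """Fill the {model} placeholder, assuming str.format semantics."""
    return template.format(model=model)


print(expand_output(DEFAULT_OUTPUT, "RN50"))
# A model name containing "/" would add a directory level under this assumption.
print(expand_output(DEFAULT_OUTPUT, "ViT-B/32"))
```

Under this assumption RN50 maps to models/CLIP_RN50, while ViT-B/32 would map to a nested path; whether the script rewrites the slash is not stated in the help text.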
Example:
$ python convert_clip.py --model RN50 --output models/CLIP_{model}
Output:
Copying weights: 100%|██████████| 482/482 [00:00<00:00, 674.13it/s]
I0523 18:18:40.867926 4600192512 builder_impl.py:774] Assets written to: CLIP_RN50/assets
Model: "clip"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
visual (ModifiedResNet)      multiple                  38370144
_________________________________________________________________
transformer (Transformer)    multiple                  37828608
_________________________________________________________________
ln_final (LayerNorm)         multiple                  1024
=================================================================
Total params: 102,060,385
Trainable params: 102,007,137
Non-trainable params: 53,248
_________________________________________________________________
Classify image: https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
Text options: ['a diagram', 'a dog', 'a cat', 'a neural network']
Pytorch: [[0.24351287 0.00320374 0.00082513 0.7524583 ]]
Tensorflow: [[0.24351244 0.00320391 0.0008252 0.7524584 ]]
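The two probability rows above can be compared numerically to confirm that the converted model tracks the original. The sketch below uses only the values printed in the output; the 1e-6 tolerance and the helper names are assumptions chosen for illustration, not taken from the repository.

```python
# Values copied from the conversion output above.
labels = ["a diagram", "a dog", "a cat", "a neural network"]
pytorch_probs = [0.24351287, 0.00320374, 0.00082513, 0.7524583]
tensorflow_probs = [0.24351244, 0.00320391, 0.0008252, 0.7524584]

TOLERANCE = 1e-6  # assumed threshold for "numerically equivalent"


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def top_label(probs, names):
    """Name whose probability is highest."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best]


diff = max_abs_diff(pytorch_probs, tensorflow_probs)
print(f"max abs diff: {diff:.2e}")
print("within tolerance:", diff < TOLERANCE)
print("top label (PyTorch):   ", top_label(pytorch_probs, labels))
print("top label (TensorFlow):", top_label(tensorflow_probs, labels))
```

Both rows pick 'a neural network' as the top label, and the largest gap between them is below the assumed tolerance.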
Exporting standalone encoders:
Image encoder:
$ python convert_clip.py --model RN50 --image_output models/CLIP_image_{model}
Text encoder:
$ python convert_clip.py --model RN50 --text_output models/CLIP_text_{model}
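When exporting encoders for several models, the commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in the help text; build_command is a hypothetical helper, the output paths are illustrative, and passing --image_output and --text_output in a single invocation is an assumption that this README does not confirm.

```python
import shlex


def build_command(model, output=None, image_output=None, text_output=None):
    """Assemble an argv list for convert_clip.py from flags in its help text."""
    argv = ["python", "convert_clip.py", "--model", model]
    optional_flags = (
        ("--output", output),
        ("--image_output", image_output),
        ("--text_output", text_output),
    )
    for flag, value in optional_flags:
        if value is not None:  # omit flags the caller did not set
            argv += [flag, value]
    return argv


# Illustrative destinations; pick paths that suit your project layout.
cmd = build_command(
    "RN50",
    image_output="models/CLIP_image_{model}",
    text_output="models/CLIP_text_{model}",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building an argv list instead of a shell string avoids quoting problems with model names such as ViT-B/32.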
Models:
- RN50
- RN101
- RN50x4
- RN50x16
- RN50x64
- ViT-B/32
- ViT-B/16
- ViT-L/14
- ViT-L/14@336px
Tasks:
- Convert PyTorch to TensorFlow model (RN)
- Export as TensorFlow SavedModel
- ViT conversion
- Export standalone image and text encoders
- Installable pip package
- Improve API: loading model, usage
- Float16 support
- Make PyTorch dependency optional (only for updating model from official weights)
- Implement training