Code to perform zero-shot segmentation using CLIP's encoder and a U-Net decoder, conditioned on text prompts and trained on the PhraseCut dataset.
- Clone the PhraseCut dataset repository.
- Download the dataset images by running `python download_dataset.py` inside the PhraseCut directory.
- Run the generator to split the data.
- Run `Zero_Shot_Segmentation_Clip`.
- The model accepts a text prompt and an image as input.
- CLIP encodes the text prompt and the input image into embeddings.
- A U-Net decoder, trained on the PhraseCut dataset, produces a binary segmentation map from those embeddings.
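The pipeline above can be sketched in PyTorch. This is an illustrative toy, not the repository's actual model: the embedding dimension, the fusion by elementwise product, and the `TinyDecoder` architecture are all assumptions standing in for CLIP's encoders and the trained U-Net decoder.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy U-Net-style upsampling decoder conditioned on a joint embedding.

    Stands in for the trained decoder; sizes are illustrative only.
    """
    def __init__(self, embed_dim=512):
        super().__init__()
        # Project the fused embedding onto a small spatial feature map
        self.project = nn.Linear(embed_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 32 -> 64
        )

    def forward(self, text_emb, image_emb):
        # Fuse the two modalities (elementwise product; an assumption here)
        joint = text_emb * image_emb
        x = self.project(joint).view(-1, 128, 8, 8)
        return torch.sigmoid(self.up(x))  # per-pixel foreground probability

# Stand-in embeddings; a real run would take these from CLIP's
# text and image encoders instead of random tensors.
text_emb = torch.randn(1, 512)
image_emb = torch.randn(1, 512)
probs = TinyDecoder()(text_emb, image_emb)          # shape (1, 1, 64, 64)
binary_mask = (probs > 0.5).float()                 # binary segmentation map
```

Thresholding the sigmoid output at 0.5 is what turns the per-pixel probabilities into the binary segmentation map described above.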
Basic code to perform segmentation using CLIPSeg can be found here.
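For comparison, a minimal CLIPSeg inference sketch using the Hugging Face `transformers` library is shown below. The checkpoint name `CIDAS/clipseg-rd64-refined` is the published CLIPSeg checkpoint; the blank placeholder image and the example prompts are assumptions, so substitute a real image in practice.

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Load the published CLIPSeg checkpoint
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

# Placeholder image for the sketch; replace with any RGB image of interest
image = Image.new("RGB", (352, 352), color="white")
prompts = ["a cat", "a dog"]

inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-pixel probabilities for each prompt; threshold for a binary mask
probs = torch.sigmoid(outputs.logits)
binary = (probs > 0.5).float()
```

Each prompt gets its own probability map, so segmenting several phrases against one image only needs the prompt list changed.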