- Support get_tokenizer in clip back inf
- Update more deps (fire, pyarrow, pandas, torch)
- Update deps
- Update scipy requirement from <1.9.2 to <1.11.5
- catch and skip images that fail to load (thanks @heyalexchoi)
- Handle images in multiple folders for the files reader and handle uppercase extensions (thanks @BIGBALLON)
- Add support for the full open clip model name format: ViT-B-32/laion2b_s34b_b79k (thanks @mehdidc @barinov274)
- Add DeepSparse backend for CLIP inference (thanks @mgoin)
- fix parquet to arrow script failing when the number of samples is small (thanks @luke-han)
- Integration with Hugging Face ClipModel (thanks @Sofianel5)
- Add webp to list of supported files in reader.
- Remove version constraint of fsspec.
- Update versions to fix pex and npm build
- Improve errors for empty input folders.
- Default context to fix a bug where some requests returned 404
- Fix truncate
- Make jit=False the default in clip inference
- update webdataset and fsspec
- Add H14 NSFW detector
- Support get tokenizer in clip back (thanks @nousr)
- enable filtering by image with clip-retrieval filter
- update key toggles in inf.main (thanks @nousr)
- Slurm distributor (thanks @nousr)
- Autocast for openclip
- support openclip in clip back
- Read image data from path in case "image_path" is present
- Make the file image reader in clip inference faster
- Make it possible to use an embedding as a query to the back
- add clip-client module for querying the backend remotely (thanks @afiaka87)
- use better mclip from https://github.com/FreddeFrallan/Multilingual-CLIP
- add clearer way to disable aesthetic scoring in front
- aesthetic option
- Log error for unsupported input_format (thanks @dmvaldman)
- Add open_clip support (thanks @cat-state)
- fix mclip in clip back
- add violence detector to clip back
- add the ability to pass options in a config file
- safety model for ViT-B/32
- replace safety heuristic with safety model
- re-enable dedup of images
- temporarily turn off image dedup by default
- fix range search use
- add back node build in publish
- new arrow provider in clip back
- index combiner script
- parquet to arrow script
- deduplication of results feature
- one more fix for text only
- fix image_tensor_count vs text_counter count in runner
- fix file count check for input format files
- going back to autofaiss main
- switch to fork of autofaiss
- properly close the wandb run at the end
- fix pex building
- fix version ranges
- fix sample_count == 0 issue in logger and handle no text sample properly in main
- improve logger by checking that the file exists before reading
- use zero padding for output file names
- add proper multi gpu support in pyspark distributor
- improve printing of error in logger
- fix another small issue with logger reporting
- small fix in logger computation
- Fix race condition when using mkdir in writer
- Refactor clip inference and make it support distributed inference
- add use_jit option to back and inference (now True by default); add clip_model option to back
- mclip support in clip back and front
- replace null bytes while transforming parquet to hdf5
- Use collate_fn to skip corrupt images without using recursion (thanks @afiaka87)
- truncate text inputs in clip back
- fix url column option bug
- add url column option
- use torch no grad to fix a memleak in clip back
- add default backend url in clip back
- add option in clip end2end to avoid running the back
- update for autofaiss
- add missing front building in python publish
- clip retrieval end2end
- minor bug fix about missing .npy extension in output of clip inference
- mclip support
- use fsspec to make it possible to output to any fs
- add deduplication of indices in the output of clip back
- use the npy mapping in all cases for ivf reordering since it's fast enough
- save ivf_old_to_new_mapping for the text index to use
- implement ivf re-ordering for much faster metadata fetching
- add download button in front
- fix filterDuplicateUrls issue when there is no url, only images
- fix default columns_to_return
- add a simple filter ipynb notebook
- implement infinite scroll feature
- fix limiting of results in clip back
- fix absence of caption in clip front
- fix an issue in clip front handling of default
- limit the number of results to the number available in clip back
- add compression by default when creating the hdf5 cache file
- add columns_to_return in clip back
- safe mode in front
- fix metrics sorting in metrics summary
- add download url time and descriptions in metrics summary endpoint
- add prometheus endpoint in clip back
- properly display errors in clip index
- add nb cores option in clip index
- add folder name option and catch errors in clip index
- package front in npm
- implement image url search in clip back
- add memory mapping option in clip back: 0 memory usage to load an index!
- add copy metadata option to clip index
- allow controlling the amount of RAM used during index creation
- add logs in clip back to inform when each thing is loaded
- fix PIL call (thanks @pvl)
- expose max_index_memory_usage
- --wds_image_key, --wds_caption_key options (thanks @afiaka87)
- implement h5py caching in clip back
- fix clip back and filter to use sorted metadata
- fix finding the last batch number (continuing output)
- add warn and continue handler to avoid crashing
- add missing webdataset dep
- webdataset input format
- save in batches
- test files in tests folder
- save metadata as parquet
- use autofaiss in a new clip index
- remove indexing from clip batch and rename to clip inference
- fixes
- it works