Commit
Add notebook with instructions to host fondant-datacomp-small index
1 parent ee0931f, commit fc6e9f0
Showing 1 changed file with 195 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d8728dbd-b542-4a4b-b3db-6cc8c9a18a36", | ||
"metadata": {}, | ||
"source": [ | ||
"# Run the clip-retrieval backend with the fondant-ai/datacomp-small-clip index" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "5708ba35-9ed3-4492-96aa-cd07f31cb8c0", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create virtual environment" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "0f5908fa-fe50-432a-8475-e926d4e566be", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!python3 -m venv .env\n", | ||
"# `!source .env/bin/activate` would only affect a throwaway subshell; activate the venv before launching Jupyter instead." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1812814f-6193-4d19-abd8-8a64c4c714d0", | ||
"metadata": {}, | ||
"source": [ | ||
"## Download index and metadata" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b0893795-2670-4ea4-9035-83eb2709ebfc", | ||
"metadata": {}, | ||
"source": [ | ||
"### Install requirements" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1cdfd076-f2a4-4809-aaf6-44868662e6ec", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install \"dask[dataframe]\" huggingface_hub" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "41bd6c70-fbec-4e83-93bd-d2887ab48e99", | ||
"metadata": {}, | ||
"source": [ | ||
"### Create the index folder" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fec09d8d-3aa7-43ea-b356-db93ecd5384c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!mkdir -p datacomp_small" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "64117004-eb0e-4b07-a0f0-a7fffb9552d8", | ||
"metadata": {}, | ||
"source": [ | ||
"### Download the index" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b4806e99-805f-4d25-b69d-0799fee7bf76", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!wget -O datacomp_small/image.index 'https://huggingface.co/datasets/fondant-ai/datacomp-small-clip/resolve/main/faiss?download=true' -q --show-progress" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "429ea232-95da-4c48-898f-3170a1bc74e4", | ||
"metadata": {}, | ||
"source": [ | ||
"### Download the metadata" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "61d44529-5609-4612-852c-d18cf4560075", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import dask.dataframe as dd\n", | ||
"from dask.diagnostics import ProgressBar\n", | ||
"\n", | ||
"ddf = dd.read_parquet(\"hf://datasets/fondant-ai/datacomp-small-clip/id_mapping\")\n", | ||
"ddf = ddf.rename(columns={\"image_path\": \"url\"})\n", | ||
"ddf = ddf.repartition(npartitions=1)\n", | ||
"\n", | ||
"with ProgressBar():\n", | ||
"    ddf.to_parquet(\"datacomp_small/metadata\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "872de5af-5c66-4249-a5db-aa028f5bca58", | ||
"metadata": {}, | ||
"source": [ | ||
"## Run clip-retrieval backend" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "302b6927-45b8-4977-926e-8d3f709d6e60", | ||
"metadata": {}, | ||
"source": [ | ||
"### Install requirements" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b4d1729f-4812-4a7d-9945-16615ecbd51f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install clip-retrieval" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "580cc9e6-1a2b-4432-98ac-c1e20664bfe1", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%writefile indices.json\n", | ||
"{\n", | ||
" \"fondant_datacomp_small\": {\n", | ||
" \"indice_folder\": \"datacomp_small\",\n", | ||
" \"columns_to_return\": [\"url\"],\n", | ||
" \"clip_model\": \"open_clip:ViT-B-32/laion2b_s34b_b79k\",\n", | ||
" \"enable_mclip_option\": false,\n", | ||
" \"provide_aesthetic_embeddings\": false\n", | ||
" }\n", | ||
"}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9fbf3533-3864-4abb-86a0-d5816e4deb14", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!clip-retrieval back --port 1234 --indices-paths indices.json --clip_model open_clip:ViT-B-32/laion2b_s34b_b79k" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
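
The notebook above puts `image.index` and a `metadata` directory of parquet files under `datacomp_small`, then points `indices.json` at that folder. As a rough sketch (not part of the commit), the layout could be checked before starting the backend, and the same configuration written from Python. The helper names `check_index_folder` and `write_indices_config` are hypothetical, and the check assumes the backend only needs `image.index` plus at least one `.parquet` file under `metadata/`.

```python
import json
from pathlib import Path


def check_index_folder(folder):
    """Return a list of problems found in the index folder (empty if none).

    Assumption: the folder should contain `image.index` and a `metadata/`
    directory with at least one parquet file, as produced by the notebook.
    """
    folder = Path(folder)
    problems = []
    if not (folder / "image.index").is_file():
        problems.append("missing image.index")
    metadata = folder / "metadata"
    if not metadata.is_dir():
        problems.append("missing metadata/ directory")
    elif not any(metadata.glob("*.parquet")):
        problems.append("metadata/ contains no .parquet files")
    return problems


def write_indices_config(path, index_name, folder, clip_model):
    """Write an indices.json using the same keys as the notebook's config cell."""
    config = {
        index_name: {
            "indice_folder": str(folder),
            "columns_to_return": ["url"],
            "clip_model": clip_model,
            "enable_mclip_option": False,
            "provide_aesthetic_embeddings": False,
        }
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Only reports on the folder; it does not download or modify anything.
    issues = check_index_folder("datacomp_small")
    print("index folder looks complete" if not issues else issues)
```

Running the check after the download cells and before `clip-retrieval back` gives an early, readable error if either download was skipped or interrupted.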