Skip to content

Commit

Permalink
Add notebook with instructions to host fondant-datacomp-small index
Browse files Browse the repository at this point in the history
  • Loading branch information
RobbeSneyders committed Mar 15, 2024
1 parent ee0931f commit fc6e9f0
Showing 1 changed file with 195 additions and 0 deletions.
195 changes: 195 additions & 0 deletions docs/datacomp12M_back.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d8728dbd-b542-4a4b-b3db-6cc8c9a18a36",
"metadata": {},
"source": [
"# Run clip-retrieval back with fondant-ai/datacomp-small-clip index"
]
},
{
"cell_type": "markdown",
"id": "5708ba35-9ed3-4492-96aa-cd07f31cb8c0",
"metadata": {},
"source": [
"### Create virtual environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f5908fa-fe50-432a-8475-e926d4e566be",
"metadata": {},
"outputs": [],
"source": [
"!python3 -m venv .env\n",
"!source .env/bin/activate"
]
},
{
"cell_type": "markdown",
"id": "1812814f-6193-4d19-abd8-8a64c4c714d0",
"metadata": {},
"source": [
"## Download index and metadata"
]
},
{
"cell_type": "markdown",
"id": "b0893795-2670-4ea4-9035-83eb2709ebfc",
"metadata": {},
"source": [
"### Install requirements"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1cdfd076-f2a4-4809-aaf6-44868662e6ec",
"metadata": {},
"outputs": [],
"source": [
"!pip install dask[dataframe] huggingface_hub"
]
},
{
"cell_type": "markdown",
"id": "41bd6c70-fbec-4e83-93bd-d2887ab48e99",
"metadata": {},
"source": [
"### Create the index folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fec09d8d-3aa7-43ea-b356-db93ecd5384c",
"metadata": {},
"outputs": [],
"source": [
"!mkdir datacomp_small"
]
},
{
"cell_type": "markdown",
"id": "64117004-eb0e-4b07-a0f0-a7fffb9552d8",
"metadata": {},
"source": [
"### Download the index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4806e99-805f-4d25-b69d-0799fee7bf76",
"metadata": {},
"outputs": [],
"source": [
"!wget -O datacomp_small/image.index https://huggingface.co/datasets/fondant-ai/datacomp-small-clip/resolve/main/faiss?download=true -q --show-progress"
]
},
{
"cell_type": "markdown",
"id": "429ea232-95da-4c48-898f-3170a1bc74e4",
"metadata": {},
"source": [
"### Download the metadata"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61d44529-5609-4612-852c-d18cf4560075",
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe as dd\n",
"from dask.diagnostics import ProgressBar\n",
"\n",
"ddf = dd.read_parquet(\"hf://datasets/fondant-ai/datacomp-small-clip/id_mapping\")\n",
"ddf = ddf.rename(columns={\"image_path\": \"url\"})\n",
"ddf = ddf.repartition(npartitions=1)\n",
"\n",
"with ProgressBar():\n",
" ddf.to_parquet(\"datacomp_small/metadata\")"
]
},
{
"cell_type": "markdown",
"id": "872de5af-5c66-4249-a5db-aa028f5bca58",
"metadata": {},
"source": [
"## Run clip-retrieval backend"
]
},
{
"cell_type": "markdown",
"id": "302b6927-45b8-4977-926e-8d3f709d6e60",
"metadata": {},
"source": [
"### Install requirements"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4d1729f-4812-4a7d-9945-16615ecbd51f",
"metadata": {},
"outputs": [],
"source": [
"!pip install clip-retrieval"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "580cc9e6-1a2b-4432-98ac-c1e20664bfe1",
"metadata": {},
"outputs": [],
"source": [
"%%writefile indices.json\n",
"{\n",
" \"fondant_datacomp_small\": {\n",
" \"indice_folder\": \"datacomp_small\",\n",
" \"columns_to_return\": [\"url\"],\n",
" \"clip_model\": \"open_clip:ViT-B-32/laion2b_s34b_b79k\",\n",
" \"enable_mclip_option\": false,\n",
" \"provide_aesthetic_embeddings\": false\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fbf3533-3864-4abb-86a0-d5816e4deb14",
"metadata": {},
"outputs": [],
"source": [
"!clip-retrieval back --port 1234 --indices-paths indices.json --clip_model open_clip:ViT-B-32/laion2b_s34b_b79k"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

0 comments on commit fc6e9f0

Please sign in to comment.