Commit

Merge pull request #279 from PrashantDixit0/main
restructuring: recommendation sys notebooks
PrashantDixit0 authored Dec 28, 2024
2 parents 5771038 + ef125f1 commit eb5a4fd
Showing 8 changed files with 1,313 additions and 1,263 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -162,7 +162,7 @@ Create a recommender system application with LanceDB for efficient vector-based
| [Movie Recommender](/examples/movie-recommender/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/movie-recommender/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/movie-recommender/main.py) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| |
| [Product Recommender](./examples/product-recommender/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/product-recommender/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/product-recommender/main.py) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| |
| [Arxiv paper recommender](/examples/arxiv-recommender) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/arxiv-recommender/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/arxiv-recommender/main.py) [![LLM](https://img.shields.io/badge/local-llm-green)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| |
| [Music Recommender](/applications/Music_Recommandation/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/applications/Music_Recommandation/lancedb_music_recommandation.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| |||||
| [Music Recommender](/applications/Music_Recommendation/) | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./applications/Music_Recommendation/app_music.py) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| |||||
||||

### Concepts
File renamed without changes.
2,444 changes: 1,218 additions & 1,226 deletions examples/arxiv-recommender/main.ipynb

Large diffs are not rendered by default.

49 changes: 40 additions & 9 deletions examples/movie-recommender/main.ipynb

Large diffs are not rendered by default.

81 changes: 54 additions & 27 deletions examples/product-recommender/main.ipynb
@@ -8,22 +8,39 @@
"source": [
"# Product Recommender using Collaborative Filtering and LanceDB\n",
"\n",
"We are going to use **LanceDB** and **Collaborative Filtering** to recommend products based on a user's past buying history. We used the <a href=\"https://www.kaggle.com/datasets/yasserh/instacart-online-grocery-basket-analysis-dataset\">**Instacart dataset**</a> as our data for this example.\n",
"Collaborative filtering recommends products by analyzing user preferences and finding patterns in what users buy. For example:\n",
"\n",
"1. **User-based filtering**: If two users have similar tastes, products bought by one can be suggested to the other.\n",
"\n",
"2. **Item-based filtering**: If two products are often bought together, a user who buys one can be recommended the other.\n",
"\n",
"This approach uses past data, such as purchase history, to predict what someone might want next.\n",
"\n",
"\n",
"![picture](https://daxg39y63pxwu.cloudfront.net/images/blog/product-recommendation-system-projects/Product_Recommendation_System_Project_Ideas_and_Examples.png)"
]
},
{
"cell_type": "markdown",
"source": [
"In this example, we’ll use **LanceDB** and **Collaborative Filtering** to recommend products based on a user's purchase history. The data comes from the <a href=\"https://www.kaggle.com/datasets/yasserh/instacart-online-grocery-basket-analysis-dataset\">**Instacart dataset**</a>."
],
"metadata": {
"id": "aExXL-0eFCC8"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "lXd46ecEt5G7"
},
"source": [
"To downloading dataset in this example, you must have a Kaggle account.\n",
"## Download dataset from Kaggle\n",
"\n",
"To download the dataset in this example, you must have a Kaggle account.\n",
"To get the Kaggle API credentials,\n",
"\n",
"Go to the Your Profile -> Settings -> Create Token\n",
"Go to ***Your Profile -> Settings -> Create Token***\n",
"\n",
"This will download `kaggle.json`, a file containing your API credentials.\n",
"\n",
@@ -33,6 +50,7 @@
{
"cell_type": "code",
"source": [
"# install and copy credentials for downloading dataset\n",
"! pip install kaggle\n",
"! mkdir -p ~/.kaggle\n",
"! cp kaggle.json ~/.kaggle/\n",
@@ -45,7 +63,7 @@
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 1,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -79,7 +97,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -167,7 +185,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {
"id": "emp_MSXZt5G8"
},
@@ -197,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -231,7 +249,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {
"id": "f3g296nL4_zZ"
},
@@ -261,7 +279,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {
"id": "cBbbR7Rut5G_"
},
@@ -294,7 +312,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"id": "ZjRh7RYpt5HB"
},
@@ -325,7 +343,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -649,7 +667,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {
"id": "v2_2R7zmt5HE"
},
@@ -693,14 +711,14 @@
"id": "JDwIxGMnqNx8"
},
"source": [
"# Difference between colloborative and content filtering\n",
"# Difference between Collaborative and Content-based filtering\n",
"\n",
"![picture](https://miro.medium.com/v2/resize:fit:1400/0*R8qw_CXxCc4600bQ.png)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -776,7 +794,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -845,7 +863,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -922,7 +940,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -1037,13 +1055,13 @@
"id": "38rssdYCBR4E"
},
"source": [
"## Let's save the data and create a empty LanceDB Table using a Pydantic model.\n",
"### Let's save the data and create an empty LanceDB Table using a Pydantic model\n",
"A Table is designed to store large numbers of columns and huge quantities of data! For those interested, LanceDB is columnar and stores its data in Lance, an open data format."
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {
"id": "3_ykVLT6t5HH"
},
@@ -1054,7 +1072,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {
"id": "ufHsF0o4t5HI"
},
@@ -1082,7 +1100,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {
"id": "NOOPF9zOt5HJ"
},
@@ -1104,12 +1122,12 @@
"id": "j3aU4z-tSbWE"
},
"source": [
"## Let's create an ANN index in order to speed up retrieval. This might take a while."
"### Let's create an ANN index in order to speed up retrieval. This might take a while."
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {
"id": "H8HyvjCFSeaz"
},
@@ -1130,7 +1148,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {
"id": "Uzgk5Od0t5HK"
},
@@ -1164,7 +1182,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {
"id": "Wwl7yFKTt5HK"
},
@@ -1180,12 +1198,12 @@
"id": "wTh61ou3t5HL"
},
"source": [
"## Let's now query LanceDB to retrieve recommendations."
"### Let's now query LanceDB to retrieve recommendations."
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -2408,6 +2426,15 @@
" display(results)\n",
" display(products_bought_by_user_in_the_past(id, top=15))"
]
},
{
"cell_type": "markdown",
"source": [
"## Tada! Your first product recommendation system is live"
],
"metadata": {
"id": "SMjD6-nIFt9F"
}
}
],
"metadata": {
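
Note on the notebook intro: the item-based filtering it describes can be illustrated with a small standard-library sketch. This is not the notebook's code, which is not visible in these hunks. The toy `purchases` data and the helper names `item_vectors`, `cosine`, and `similar_items` are hypothetical, chosen only for illustration, and the exhaustive scan stands in for the vector search the notebook delegates to LanceDB.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy purchase history: user -> set of products bought.
purchases = {
    "u1": {"milk", "bread", "eggs"},
    "u2": {"milk", "bread"},
    "u3": {"bread", "eggs"},
    "u4": {"milk", "coffee"},
}


def item_vectors(purchases):
    """Represent each product as a sparse vector over users (1.0 if bought)."""
    vectors = defaultdict(dict)
    for user, items in purchases.items():
        for item in items:
            vectors[item][user] = 1.0
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[user] for user, weight in a.items() if user in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(item, vectors, k=2):
    """Return the k products most similar to `item` by exhaustive comparison."""
    scores = [
        (other, cosine(vectors[item], vec))
        for other, vec in vectors.items()
        if other != item
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    vectors = item_vectors(purchases)
    for name, score in similar_items("milk", vectors, k=2):
        print(f"{name}: {score:.3f}")
```

An exhaustive scan like this compares the query against every product, which is fine for a toy example. The notebook instead stores its vectors in a LanceDB table and builds an ANN index, which, as its heading says, is there to speed up retrieval.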
