diff --git a/README.md b/README.md index be5e72d..f8b1e2b 100644 --- a/README.md +++ b/README.md @@ -191,6 +191,7 @@ These are ready to use applications built using LanceDB serverless vector databa |-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------| | [Writing assistant](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/lanchain_writing_assistant) | Writing assistant app using lanchain.js with LanceDB, allows you to get real time relevant suggestions and facts based on you written text to help you with your writing. | ![Writing assistant](https://github.com/user-attachments/assets/87354e93-df4d-40ad-922b-abcbb62d667c) | | [Sentence auto complete](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/sentance_auto_complete) | Sentance auto complete app using lanchain.js with LanceDB, allows you to get real time relevant auto complete suggestions and facts based on you written text to help you with your writing.You can also upload your data source in the form of a pdf file.You can switch between gpt models to get faster results. | ![Sentance auto complete](https://github.com/lancedb/assets/blob/main/recipes/sentance_Auto_complete.gif) | +| [Article Recommendation](https://github.com/lancedb/vectordb-recipes/tree/main/applications/node/article_recommender) | Article recommender app using LangChain.js with LanceDB, lets you search a large dataset of news articles and get context-aware recommendations in real time. You can swap in your own CSV dataset; useful for research, content curation, and staying informed. 
| ![Article Recommendation](https://github.com/lancedb/assets/blob/main/recipes/article_recommendation_engine.gif) | |||| | Project Name | Description | Screenshot | diff --git a/applications/node/article_recommender/.gitignore b/applications/node/article_recommender/.gitignore new file mode 100644 index 0000000..a547bf3 --- /dev/null +++ b/applications/node/article_recommender/.gitignore @@ -0,0 +1,24 @@ +# Logs +logs +*.log +npm-debug.log* +yarn-debug.log* +yarn-error.log* +pnpm-debug.log* +lerna-debug.log* + +node_modules +dist +dist-ssr +*.local + +# Editor directories and files +.vscode/* +!.vscode/extensions.json +.idea +.DS_Store +*.suo +*.ntvs* +*.njsproj +*.sln +*.sw? diff --git a/applications/node/article_recommender/README.md b/applications/node/article_recommender/README.md new file mode 100644 index 0000000..2cb497e --- /dev/null +++ b/applications/node/article_recommender/README.md @@ -0,0 +1,161 @@ +**AI-Powered Article Recommendation System** +============================================ + +An **AI-driven article recommendation engine** designed to retrieve **relevant articles** from a dataset of over **2 million articles**. Given a keyword or phrase, it returns real-time, **context-aware article suggestions** using **vector search** over text embeddings. + +**Demo** +-------- + +![Article Recommendation Demo](https://github.com/lancedb/assets/blob/main/recipes/article_recommendation_engine.gif) + + +* * * * * + +**Features** +------------ + +- 🔍 **Keyword-Based Search**: Enter any keyword or phrase and get the **top 10 relevant articles**. +- 🌐 **Large Dataset Support**: Designed for datasets of **over 2 million articles**. Note that `langchainProcessor.mjs` currently embeds only the first 1,000 chunks; raise or remove that limit to index a full dataset. +- 📈 **Semantic Ranking**: Articles are ranked by the similarity between their OpenAI embeddings and the embedding of your query.
+- 🧠 **AI-Powered Relevance**: Built with **LangChain.js** (text splitting and embeddings) and **LanceDB** (vector storage and similarity search). + +* * * * * + +**How It Works** +---------------- + +1. **Data Preprocessing**: Each CSV row becomes one document (its fields joined into a single text), which is then split into chunks using **RecursiveCharacterTextSplitter**.\ + Configuration used by the backend: + + `const splitter = new RecursiveCharacterTextSplitter({ + chunkSize: 25000, // Maximum chunk length, in characters + chunkOverlap: 1, // Characters shared between consecutive chunks + });` + +2. **Vector Embedding**: Each chunk is embedded using **OpenAIEmbeddings**. +3. **Efficient Storage**: The embedding vectors are stored in **LanceDB**, which is optimized for fast similarity search. +4. **Query and Retrieval**: The user's query is embedded the same way and compared against the stored vectors to retrieve the **top 10 semantically similar articles**. + +* * * * * + +**Technical Highlights** +------------------------ + +- **Advanced Vector Search**: Uses LanceDB for fast, scalable similarity search across large article collections. +- **Real-Time Results**: Each query needs only one embedding request and one vector search, which keeps response times short. +- **Customizable Dataset**: Replace the default dataset with your own `.csv` file (the backend currently reads CSV files only). + +* * * * * + +**Use Cases** +------------- + +- **Research and Academic Work**: Find articles that are most relevant to your research topic. +- **Content Curation**: Discover the best content for blogs, newsletters, or social media. +- **Media Monitoring**: Track trends and news articles efficiently. +- **Educational Insights**: Access curated learning material on any subject. + +* * * * * + +**Getting Started** +------------------- + +### **1\. Prerequisites** + +- **Node.js** version **20+** +- A valid [OpenAI API Key](https://platform.openai.com/signup) + +### **2\. Installation** + +Clone the repository and install dependencies: + + +`git clone +cd +npm install` + +### **3\. 
Configure API Key** + +Add your OpenAI API key to a `.env` file in the project root: + +`OPENAI_API_KEY=your_openai_key` + +The backend reads the key from the `OPENAI_API_KEY` environment variable, so the file has to be loaded into the server process, for example with `node --env-file=.env` (supported since Node.js 20.6). + +* * * * * + + +### **4\. Add your data source** + +Save your dataset as `src/Backend/dataSourceFiles/news.csv`. +If you use a different file name, update `csvFilePath` in `src/Backend/langchainProcessor.mjs`. + +* * * * * + +### **5\. Running the System** + +Use Node.js 20.6 or later and install dependencies: + +`npm install` + +#### Run Backend Server: + +`npm run server` + +#### Run Frontend: + +`npm run dev` + +#### Run Backend and Frontend Together: + +`npm start` + +The backend embeds the dataset on startup; wait for `Data stored in LanceDB successfully.` in its log before searching. + +Access the app at: + +`http://localhost:5173` + +* * * * * + +**Customizing the Dataset** +--------------------------- + +You can replace the dataset to get recommendations from your own articles: + +1. Navigate to `src/Backend/dataSourceFiles`. +2. Replace the existing `news.csv` file with your dataset (CSV only). +3. Restart the backend server to load and embed the new dataset. + +Example datasets:\ +[The ~180 MB dataset used to build this app](https://components.one/datasets/above-the-fold)\ +[All the News 2 Dataset](https://components.one/datasets/all-the-news-2-news-articles-dataset) + +* * * * * + +**API Overview** +---------------- + +**Endpoint**: `/api/articles` (the backend listens on `http://localhost:5300`)\ +**Method**: `POST`\ +**Request Body**: + +`{ + "text": "Your keyword here" +}` + +**Response**: each item in `result` is the metadata of a matching CSV row, so its fields follow your CSV's column names. + +`{ + "result": [ + { + "title": "Sample Title", + "author": "Author Name", + "content": "Snippet of the article..." + } + ] +}` + +* * * * * + +**Future Enhancements** +----------------------- + +- **Support for Multi-Modal Datasets**: Images, PDFs, and multimedia support. +- **Interactive Filters**: Filter results by date, author, or publication. +- **Deployable Cloud Versions**: Ready-to-deploy solutions for AWS, Vercel, and Netlify.
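Step 4 of **How It Works** ("top 10 semantically similar articles") comes down to a nearest-neighbour search over embedding vectors. The following sketch illustrates that idea in plain JavaScript with no dependencies; it is not how LanceDB is implemented. It assumes each article already has an embedding (in the app, `OpenAIEmbeddings` produces these and LanceDB stores and searches them), ranks by cosine similarity (LanceDB's actual distance metric depends on how the table is configured), and uses hypothetical helper names (`cosineSimilarity`, `topK`) with toy 3-dimensional vectors.

```javascript
// Illustrative nearest-neighbour search by cosine similarity (not LanceDB's implementation).
// Assumption: every article already carries an embedding vector; in the app,
// OpenAIEmbeddings produces these and LanceDB stores and searches them.
// cosineSimilarity and topK are hypothetical helper names.

function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every article against the query vector and keep the k best,
// returning each article's metadata plus its similarity score.
function topK(queryVector, articles, k = 10) {
  return articles
    .map((article) => ({ article, score: cosineSimilarity(queryVector, article.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ article, score }) => ({ metadata: article.metadata, score }));
}

// Toy 3-dimensional vectors; real OpenAI embeddings have far more dimensions.
const articles = [
  { vector: [0.9, 0.1, 0.0], metadata: { title: "Climate talks resume" } },
  { vector: [0.1, 0.9, 0.0], metadata: { title: "Markets rally on rate cut" } },
  { vector: [0.8, 0.2, 0.1], metadata: { title: "Heatwave breaks records" } },
];

const results = topK([1, 0, 0], articles, 2);
console.log(results.map((r) => r.metadata.title));
// → [ 'Climate talks resume', 'Heatwave breaks records' ]
```

A brute-force scan like this compares the query with every stored vector, which is fine for a handful of items; a vector database such as LanceDB exists to keep this search fast as the collection grows.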
\ No newline at end of file diff --git a/applications/node/article_recommender/eslint.config.js b/applications/node/article_recommender/eslint.config.js new file mode 100644 index 0000000..238d2e4 --- /dev/null +++ b/applications/node/article_recommender/eslint.config.js @@ -0,0 +1,38 @@ +import js from '@eslint/js' +import globals from 'globals' +import react from 'eslint-plugin-react' +import reactHooks from 'eslint-plugin-react-hooks' +import reactRefresh from 'eslint-plugin-react-refresh' + +export default [ + { ignores: ['dist'] }, + { + files: ['**/*.{js,jsx}'], + languageOptions: { + ecmaVersion: 2020, + globals: globals.browser, + parserOptions: { + ecmaVersion: 'latest', + ecmaFeatures: { jsx: true }, + sourceType: 'module', + }, + }, + settings: { react: { version: '18.3' } }, + plugins: { + react, + 'react-hooks': reactHooks, + 'react-refresh': reactRefresh, + }, + rules: { + ...js.configs.recommended.rules, + ...react.configs.recommended.rules, + ...react.configs['jsx-runtime'].rules, + ...reactHooks.configs.recommended.rules, + 'react/jsx-no-target-blank': 'off', + 'react-refresh/only-export-components': [ + 'warn', + { allowConstantExport: true }, + ], + }, + }, +] diff --git a/applications/node/article_recommender/index.html b/applications/node/article_recommender/index.html new file mode 100644 index 0000000..eb48de3 --- /dev/null +++ b/applications/node/article_recommender/index.html @@ -0,0 +1,12 @@ + + + + + + Article + + +
+ + + diff --git a/applications/node/article_recommender/package.json b/applications/node/article_recommender/package.json new file mode 100644 index 0000000..a4890b5 --- /dev/null +++ b/applications/node/article_recommender/package.json @@ -0,0 +1,54 @@ +{ + "name": "article-recommender", + "private": true, + "version": "0.0.0", + "type": "module", + "scripts": { + "dev": "vite", + "build": "vite build", + "lint": "eslint .", + "preview": "vite preview", + "server": "node --env-file=.env src/Backend/server.mjs", + "start": "npm-run-all --parallel server dev" + }, + "dependencies": { + "@heroicons/react": "^2.2.0", + "@lancedb/lancedb": "^0.12.0", + "@langchain/community": "^0.3.1", + "@langchain/openai": "^0.3.14", + "@phosphor-icons/react": "^2.1.7", + "@testing-library/jest-dom": "^5.17.0", + "@testing-library/react": "^13.4.0", + "@testing-library/user-event": "^13.5.0", + "axios": "^1.7.0", + "body-parser": "^1.20.3", + "cors": "^2.8.5", + "csv-parser": "^3.0.0", + "express": "^4.21.2", + "langchain": "^0.3.7", + "multer": "^1.4.5-lts.1", + "phosphor-react": "^1.4.1", + "react": "^18.3.1", + "react-dom": "^18.3.1", + "react-quill": "^2.0.0", + "react-scripts": "5.0.1", + "vectordb": "^0.1.19", + "web-vitals": "^2.1.4" + }, + "devDependencies": { + "@eslint/js": "^9.15.0", + "@types/react": "^18.3.12", + "@types/react-dom": "^18.3.1", + "@vitejs/plugin-react": "^4.3.4", + "autoprefixer": "^10.4.20", + "eslint": "^9.15.0", + "eslint-plugin-react": "^7.37.2", + "eslint-plugin-react-hooks": "^5.0.0", + "eslint-plugin-react-refresh": "^0.4.14", + "globals": "^15.12.0", + "npm-run-all": "^4.1.5", + "postcss": "^8.4.49", + "tailwindcss": "^3.4.16", + "vite": "^6.0.1" + } +} diff --git a/applications/node/article_recommender/postcss.config.js b/applications/node/article_recommender/postcss.config.js new file mode 100644 index 0000000..2e7af2b --- /dev/null +++ b/applications/node/article_recommender/postcss.config.js @@ -0,0 +1,6 @@ +export default { + plugins: { + tailwindcss:
{}, + autoprefixer: {}, + }, +} diff --git a/applications/node/article_recommender/public/assets/logo.svg b/applications/node/article_recommender/public/assets/logo.svg new file mode 100644 index 0000000..ac51620 --- /dev/null +++ b/applications/node/article_recommender/public/assets/logo.svg @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/applications/node/article_recommender/src/App.css b/applications/node/article_recommender/src/App.css new file mode 100644 index 0000000..e69de29 diff --git a/applications/node/article_recommender/src/App.jsx b/applications/node/article_recommender/src/App.jsx new file mode 100644 index 0000000..4c49662 --- /dev/null +++ b/applications/node/article_recommender/src/App.jsx @@ -0,0 +1,81 @@ +import { useState, useEffect } from 'react'; +import axios from 'axios'; + +// Components +import SearchBar from './components/SearchBar'; +import PillBar from './components/PillBar'; +import ArticleCard from './components/ArticleCard'; +import Loader from './components/Loader'; + +// Styles +import './App.css'; + +function App() { + const [searchQuery, setSearchQuery] = useState(''); + const [articles, setArticles] = useState([]); + const [isLoading, setIsLoading] = useState(false); + + useEffect(() => { + const fetchArticles = async () => { + if (searchQuery.trim() !== '') { + setIsLoading(true); + try { + const response = await axios.post('http://localhost:5300/api/articles', { + text: searchQuery, // Send the search query in the body + }); + setArticles(response.data.result || []); // Assuming the server returns `result` in the response + } catch (error) { + console.error('Error fetching articles:', error); + } + setIsLoading(false); + } + }; + + fetchArticles(); + }, [searchQuery]); // Fetch articles when `searchQuery` changes + + return ( +
+
+ Flowbite Logo +
+
+

+
+ + + +
+
Article Recommender
+

+
+ + + {isLoading ? ( + + ) : ( +
+ {articles.map((article) => ( + + ))} +
+ )} +
+
+
+ ); +} + +export default App; diff --git a/applications/node/article_recommender/src/Backend/langchainProcessor.mjs b/applications/node/article_recommender/src/Backend/langchainProcessor.mjs new file mode 100644 index 0000000..8bb105a --- /dev/null +++ b/applications/node/article_recommender/src/Backend/langchainProcessor.mjs @@ -0,0 +1,83 @@ +import fs from 'fs'; +import csvParser from 'csv-parser'; +import { + RecursiveCharacterTextSplitter +} from 'langchain/text_splitter'; +import { + OpenAIEmbeddings +} from '@langchain/openai'; +import { + LanceDB +} from '@langchain/community/vectorstores/lancedb'; + +let retriever; + +// Function to read CSV and return structured data +async function readCSV(filePath) { + return new Promise((resolve, reject) => { + const rows = []; + fs.createReadStream(filePath) + .pipe(csvParser()) + .on('data', (data) => rows.push(data)) + .on('end', () => resolve(rows)) + .on('error', (error) => reject(error)); + }); +} + +// Function to process and store CSV data in LanceDB +export async function processCSVToVectorDB(filePath) { + try { + // Read CSV data + const csvData = await readCSV(filePath); + console.log('CSV Data Loaded:', csvData.length, 'rows'); + + // Prepare data for vectorization + const documents = csvData.map((row) => ({ + pageContent: Object.values(row).join(' '), // Combine row fields into a single text + metadata: row, // Preserve the row metadata for retrieval + })); + + // Split data into chunks of up to 25,000 characters, overlapping by 1 character + const splitter = new RecursiveCharacterTextSplitter({ + chunkSize: 25000, + chunkOverlap: 1, + }); + const docs = await splitter.splitDocuments(documents); + + // Embed and store only the first 1,000 chunks in LanceDB (raise or remove this limit to index the full dataset) + const vectorStore = await LanceDB.fromDocuments(docs.slice(0, 1000), new OpenAIEmbeddings()); + retriever = vectorStore.asRetriever(10); // The retriever returns the 10 most similar chunks + console.log('Data stored in LanceDB successfully.'); + } catch (error) { + console.error('Error processing CSV to vector DB:', error); + throw error; + } +} + +// Function to
retrieve top 10 results based on a query +export async function getTopResults(query) { + if (!retriever) { + throw new Error('Retriever is not initialized. Please process the data first.'); + } + + try { + // The number of results (10) is fixed by asRetriever(10) in processCSVToVectorDB + const results = await retriever.invoke(query); + const output = results.map((result) => result.metadata); // Extract row metadata + return output; + } catch (error) { + console.error('Error retrieving results:', error); + throw error; + } +} + +// Load the default CSV dataset into LanceDB; called once when the server starts +export async function initializationOfDB() { + const csvFilePath = 'src/Backend/dataSourceFiles/news.csv'; // Path to your CSV file + try { + await processCSVToVectorDB(csvFilePath); // Process CSV and store in LanceDB + } catch (error) { + console.error('Error initializing the vector store:', error); + throw error; // Let the caller (startServer) report the failure and exit + } +}; \ No newline at end of file diff --git a/applications/node/article_recommender/src/Backend/server.mjs b/applications/node/article_recommender/src/Backend/server.mjs new file mode 100644 index 0000000..2292c13 --- /dev/null +++ b/applications/node/article_recommender/src/Backend/server.mjs @@ -0,0 +1,69 @@ +// server.mjs +import express from "express"; +import cors from "cors"; +import bodyParser from "body-parser"; +import { + initializationOfDB, + getTopResults +} from "./langchainProcessor.mjs"; +import { + TextLoader +} from 'langchain/document_loaders/fs/text'; +import path from 'path'; +import multer from "multer"; +const app = express(); +const port = 5300; +const allowedOrigin = "http://localhost:5173"; // Replace with your client-side application URL + +app.use( + cors({ + origin: allowedOrigin, + methods: "GET, POST", // Specify allowed methods + credentials: true, // Allow cookies for authenticated requests (if applicable) + allowedHeaders: ["Content-Type"], + }) +); +app.use(bodyParser.json()); + +async function loadDocument(filePath, fileType) { + let loader; + if (fileType === 'txt') { + loader = new
TextLoader(filePath); + } + return await loader.load(); +} + +// Return the articles most similar to the submitted text +app.post("/api/articles", async (req, res) => { + const text = req.body?.text; + if (typeof text !== "string" || text.trim() === "") { + return res.status(400).send('Request body must include a non-empty "text" string'); + } + try { + const result = await getTopResults(text); + res.json({ + result, + }); + } catch (error) { + console.error("Error processing /api/articles request:", error); + res.status(500).send("Error processing the text"); + } +}); + +app.listen(port, () => { + console.log(`Server is running on http://localhost:${port}`); +}); + +async function startServer() { + try { + console.log('Starting LangChain process...'); + await initializationOfDB(); // Initialize the process here + console.log('LangChain process initialized successfully.'); + } catch (error) { + console.error('Error initializing LangChain process:', error); + process.exit(1); + } +} + +// Call the startServer function +startServer(); \ No newline at end of file diff --git a/applications/node/article_recommender/src/components/ArticleCard.jsx b/applications/node/article_recommender/src/components/ArticleCard.jsx new file mode 100644 index 0000000..5e47504 --- /dev/null +++ b/applications/node/article_recommender/src/components/ArticleCard.jsx @@ -0,0 +1,53 @@ +import React, { useState } from 'react'; + +const ArticleCard = ({ title, subtitle, date, category, publisher, author, url, content = '' }) => { + const [isExpanded, setIsExpanded] = useState(false); + + const toggleContent = () => { + setIsExpanded(!isExpanded); + }; + + return (
+
+ {/* Title with hover and click functionality */} +

window.open(url, '_blank', 'noopener,noreferrer')} + > + {title} +

+ {/* Content with "Read more" functionality */} +

+ {isExpanded ? content : `${content.slice(0, 100)}...`} + {!isExpanded && ( + + )} +

+ {isExpanded && ( + + )} +

{subtitle}

+
+
+

{date}

+
+ {/* Author Name */} +

by {author}

+
+
+
+ ); +}; + +export default ArticleCard; diff --git a/applications/node/article_recommender/src/components/Loader.jsx b/applications/node/article_recommender/src/components/Loader.jsx new file mode 100644 index 0000000..6e372f5 --- /dev/null +++ b/applications/node/article_recommender/src/components/Loader.jsx @@ -0,0 +1,11 @@ +import React from 'react'; + +const Loader = () => { + return ( +
+
+
+ ); +}; + +export default Loader; \ No newline at end of file diff --git a/applications/node/article_recommender/src/components/PillBar.jsx b/applications/node/article_recommender/src/components/PillBar.jsx new file mode 100644 index 0000000..ba450b1 --- /dev/null +++ b/applications/node/article_recommender/src/components/PillBar.jsx @@ -0,0 +1,21 @@ +import React from 'react'; + +const PillBar = ({ onCategoryClick }) => { + const categories = ['Global war', 'climate', 'economy', 'india', 'environment', 'finance']; + + return ( +
+ {categories.map((category, index) => ( + + ))} +
+ ); +}; + +export default PillBar; diff --git a/applications/node/article_recommender/src/components/SearchBar.jsx b/applications/node/article_recommender/src/components/SearchBar.jsx new file mode 100644 index 0000000..54100b4 --- /dev/null +++ b/applications/node/article_recommender/src/components/SearchBar.jsx @@ -0,0 +1,36 @@ +import React, { useState, useEffect } from 'react'; + +const SearchBar = ({ onSearch, externalQuery }) => { + const [query, setQuery] = useState(''); + + useEffect(() => { + if (externalQuery) { + setQuery(externalQuery); + } + }, [externalQuery]); + + const handleSearch = (e) => { + e.preventDefault(); + onSearch(query); + }; + + return ( +
+ setQuery(e.target.value)} + className="flex-1 border-none focus:outline-none text-gray-700 placeholder-gray-400" + /> + +
+ ); +}; + +export default SearchBar; diff --git a/applications/node/article_recommender/src/index.css b/applications/node/article_recommender/src/index.css new file mode 100644 index 0000000..0d6a3e2 --- /dev/null +++ b/applications/node/article_recommender/src/index.css @@ -0,0 +1,6 @@ +@tailwind base; +@tailwind components; +@tailwind utilities; + +:root { +} diff --git a/applications/node/article_recommender/src/main.jsx b/applications/node/article_recommender/src/main.jsx new file mode 100644 index 0000000..b9a1a6d --- /dev/null +++ b/applications/node/article_recommender/src/main.jsx @@ -0,0 +1,10 @@ +import { StrictMode } from 'react' +import { createRoot } from 'react-dom/client' +import './index.css' +import App from './App.jsx' + +createRoot(document.getElementById('root')).render( + + + , +) diff --git a/applications/node/article_recommender/tailwind.config.js b/applications/node/article_recommender/tailwind.config.js new file mode 100644 index 0000000..89a305e --- /dev/null +++ b/applications/node/article_recommender/tailwind.config.js @@ -0,0 +1,11 @@ +/** @type {import('tailwindcss').Config} */ +export default { + content: [ + "./index.html", + "./src/**/*.{js,ts,jsx,tsx}", + ], + theme: { + extend: {}, + }, + plugins: [], +} \ No newline at end of file diff --git a/applications/node/article_recommender/vite.config.js b/applications/node/article_recommender/vite.config.js new file mode 100644 index 0000000..8b0f57b --- /dev/null +++ b/applications/node/article_recommender/vite.config.js @@ -0,0 +1,7 @@ +import { defineConfig } from 'vite' +import react from '@vitejs/plugin-react' + +// https://vite.dev/config/ +export default defineConfig({ + plugins: [react()], +})
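The splitter settings in `langchainProcessor.mjs` (`chunkSize: 25000`, `chunkOverlap: 1`) are measured in characters by default. To make the two parameters concrete, here is a simplified fixed-window splitter. It is an illustration only, not LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph breaks, newlines, and spaces; `splitFixed` is a hypothetical helper name.

```javascript
// Simplified fixed-window splitter, used only to illustrate chunkSize and chunkOverlap.
// NOT LangChain's RecursiveCharacterTextSplitter, which first tries to break on
// separators such as "\n\n", "\n" and " ". splitFixed is a hypothetical helper.

function splitFixed(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap; // how far each new window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end of the text
  }
  return chunks;
}

// With an overlap of 1, consecutive chunks share a single character,
// which carries almost no context across the boundary.
console.log(splitFixed("abcdefghij", 4, 1)); // → [ 'abcd', 'defg', 'ghij' ]

// A larger overlap repeats more text at each boundary.
console.log(splitFixed("abcdefghij", 4, 2)); // → [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

With `chunkSize: 25000`, rows shorter than 25,000 characters are generally kept as a single chunk, and an overlap of 1 character shares essentially no context between neighbouring chunks; a larger overlap trades some duplicated text for better continuity.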