From 7dd903e33a2e939d8cd4911404f1845e486191e3 Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Tue, 28 May 2024 21:26:37 +0200
Subject: [PATCH 1/8] doc(faq): add first set of faq questions

---
 docs/faq.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 docs/faq.md

diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 00000000..ddd8ddfe
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,30 @@
+---
+layout: default
+title: Frequently Asked Questions
+nav_order: 12
+permalink: /faq
+---
+# Frequently Asked Questions (FAQ)
+
+
+### How can I set the chunk size?
+#### "I want to parse my documents into smaller chunks"
+
+
+### How can I set the embedding store?
+#### "I want to use a specific embedding store"
+
+
+### How can I set the data store?
+#### "I want to use a specific data store"
+
+
+### How can I retrieve more context?
+#### "I want to retrieve more context from a query"
+
+
+### How can I set the Large Language Model?
+#### "I want to use a different LLM"
+
+### How can I set the embedding model?
+#### "I want to use a different embedding model"

From 23e1bd5e5025f17db4d772dad17c31922d69202b Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sat, 1 Jun 2024 12:04:07 +0200
Subject: [PATCH 2/8] doc(faq): add answer to chunk size question

---
 docs/faq.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index ddd8ddfe..a33d49d8 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -9,7 +9,24 @@ permalink: /faq
 ### How can I set the chunk size?
 #### "I want to parse my documents into smaller chunks"
+You can set the chunk size with the ``chunk_size`` parameter of the ``add_files`` method from the ``Library`` class.
+The method also has a ``max_chunk_size`` parameter that controls the maximum chunk size.
+Both parameters are passed on to the ``Parser`` class.
+In the following example, we add the same files to the library ``chunk_size_example`` twice, each time with a different chunk size.
+```python
+from pathlib import Path
+
+from llmware.library import Library
+
+
+# expanduser() resolves the leading '~' to the user's home directory
+path_to_my_library_files = Path('~/llmware_data/sample_files/Agreements').expanduser()
+
+my_library = Library().create_new_library(library_name='chunk_size_example')
+my_library.add_files(input_folder_path=path_to_my_library_files, chunk_size=400)
+my_library.add_files(input_folder_path=path_to_my_library_files, chunk_size=600)
+```

 ### How can I set the embedding store?
 #### "I want to use a specific embedding store"

From da683dcadbc5456ab3698b8444849da3b31c5e83 Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sat, 1 Jun 2024 16:47:27 +0200
Subject: [PATCH 3/8] doc(faq): add answer to collection store question

---
 docs/faq.md | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index a33d49d8..f0e59139 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -32,8 +32,28 @@ my_library.add_files(input_folder_path=path_to_my_library_files, chunk_size=600)
 #### "I want to use a specific embedding store"

-### How can I set the data store?
-#### "I want to use a specific data store"
+
+### How can I set the collection store?
+#### "I want to use a specific collection store"
+You can set the collection store with the ``set_active_db`` method of the ``LLMWareConfig`` class.
+
+At the time of writing, **LLMWare** supports three collection stores: *MongoDB*, *Postgres*, and *SQLite*, which is the default.
+You can retrieve the supported collection stores with the method ``get_supported_collection_db``.
+In the example below, we first log the currently active collection store and the supported collection stores, before we switch to *Postgres*.
+
+```python
+import logging
+
+from llmware.configs import LLMWareConfig
+
+
+logging.info(f'Currently active collection store: {LLMWareConfig.get_active_db()}')
+logging.info(f'Currently supported collection stores: {LLMWareConfig().get_supported_collection_db()}')
+
+LLMWareConfig.set_active_db("postgres")
+logging.info(f'Currently active collection store: {LLMWareConfig.get_active_db()}')
+```

 ### How can I retrieve more context?

From 68d1256eda83cd5b2b3829bf560718c0110556d7 Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sat, 1 Jun 2024 17:55:44 +0200
Subject: [PATCH 4/8] doc(faq): add answer to embedding store question

---
 docs/faq.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index f0e59139..d83b8ed2 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -30,8 +30,28 @@ my_library.add_files(input_folder_path=path_to_my_library_files, chunk_size=600)
 ### How can I set the embedding store?
 #### "I want to use a specific embedding store"
+You can set the embedding store with the ``vector_db`` parameter of the ``install_new_embedding`` method, which you call on a ``Library`` object each time you want to create an embedding for a *library*.
+At the time of writing, *LLMWare* supports the embedding stores [chromadb](https://github.com/chroma-core/chroma), [neo4j](https://github.com/neo4j/neo4j), [milvus](https://github.com/milvus-io/milvus), [pg_vector](https://github.com/pgvector/pgvector), [postgres](https://github.com/postgres/postgres), [redis](https://github.com/redis/redis), [pinecone](https://www.pinecone.io/), [faiss](https://github.com/facebookresearch/faiss), [qdrant](https://github.com/qdrant/qdrant), [mongo atlas](https://www.mongodb.com/products/platform/atlas-database), and [lancedb](https://github.com/lancedb/lancedb).
+In the following example, we create the same embedding three times for the same library, but store it in three different embedding stores.
+```python
+import logging
+from pathlib import Path
+
+from llmware.configs import LLMWareConfig
+from llmware.library import Library
+
+
+logging.info(f'Currently supported embedding stores: {LLMWareConfig().get_supported_vector_db()}')
+
+library = Library().create_new_library(library_name='embedding_store_example')
+library.add_files(input_folder_path=Path('~/llmware_data/sample_files/Agreements').expanduser())
+
+library.install_new_embedding(vector_db="pg_vector")
+library.install_new_embedding(vector_db="milvus")
+library.install_new_embedding(vector_db="faiss")
+```

 ### How can I set the collection store?
 #### "I want to use a specific collection store"

From a19e752bb2c8909d19296a7a5cce534fd170f478 Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sat, 1 Jun 2024 20:44:11 +0200
Subject: [PATCH 5/8] doc(faq): add answer to context size question

---
 docs/faq.md | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index d83b8ed2..21bc1fa9 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -78,7 +78,45 @@ logging.info(f'Currently active collection store: {LLMWareConfig.get_active_db()
 ### How can I retrieve more context?
 #### "I want to retrieve more context from a query"
+One way to retrieve more context is to increase the ``result_count`` parameter of the ``query``, ``text_query``, and ``semantic_query`` methods from the ``Query`` class: the more results are retrieved, the more context is available.
+On a side note, ``query`` is a wrapper function for ``text_query`` and ``semantic_query``.
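The routing such a wrapper performs can be sketched in plain Python. This is a standalone illustration with made-up placeholder functions, not llmware's actual implementation; only the pass-through of ``result_count`` mirrors the behavior described here.

```python
# Standalone sketch of a query wrapper that dispatches to a text or a
# semantic back end. The two placeholder functions below stand in for
# real searches against the collection store and the embedding store.

def text_query(query_text: str, result_count: int) -> list:
    # Placeholder for a keyword search against the collection store.
    return [f"text hit {i} for '{query_text}'" for i in range(result_count)]

def semantic_query(query_text: str, result_count: int) -> list:
    # Placeholder for a vector similarity search against the embedding store.
    return [f"semantic hit {i} for '{query_text}'" for i in range(result_count)]

def query(query_text: str, result_count: int = 10, query_type: str = "text") -> list:
    # The wrapper only routes the call; result_count is passed through unchanged.
    if query_type == "semantic":
        return semantic_query(query_text, result_count)
    return text_query(query_text, result_count)

print(len(query("salary", result_count=3)))                         # -> 3
print(len(query("salary", result_count=6, query_type="semantic")))  # -> 6
```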
+The value of ``result_count`` is passed on to the queried embedding store to control the number of retrieved results.
+For example, for *pgvector*, ``result_count`` becomes the value after the ``LIMIT`` keyword.
+In the ``SQL`` example below, you can see the query that ``LLMWare`` generates if ``result_count=10``, the collection is named ``agreements``, and the query vector is ``[1, 2, 3]``.
+```sql
+SELECT
+    id,
+    block_mongo_id,
+    embedding <-> '[1, 2, 3]' AS distance,
+    text
+FROM agreements
+ORDER BY distance
+LIMIT 10;
+```
+In the following example, we execute the same query against a library twice, but change the number of retrieved results from ``3`` to ``6``.
+```python
+import logging
+from pathlib import Path
+
+from llmware.configs import LLMWareConfig
+from llmware.library import Library
+from llmware.retrieval import Query
+
+
+logging.info(f'Currently supported embedding stores: {LLMWareConfig().get_supported_vector_db()}')
+
+library = Library().create_new_library(library_name='context_size_example')
+library.add_files(input_folder_path=Path('~/llmware_data/sample_files/Agreements').expanduser())
+library.install_new_embedding(vector_db="pg_vector")
+
+query = Query(library)
+
+query_results = query.semantic_query(query='salary', result_count=3, results_only=True)
+logging.info(f'Number of results: {len(query_results)}')
+
+query_results = query.semantic_query(query='salary', result_count=6, results_only=True)
+logging.info(f'Number of results: {len(query_results)}')
+```

 ### How can I set the Large Language Model?
 #### "I want to use a different LLM"

From 1ed4d4e18ac1ac187b04da6ac9b815e690192a3a Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sun, 2 Jun 2024 12:58:06 +0200
Subject: [PATCH 6/8] doc(faq): add answer to llm question

---
 docs/faq.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index 21bc1fa9..f745464c 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -120,6 +120,35 @@ logging.info(f'Number of results: {len(query_results)}')
 ### How can I set the Large Language Model?
 #### "I want to use a different LLM"
+You can set the Large Language Model (LLM) with the ``gen_model`` parameter of the ``load_model`` method from the ``Prompt`` class.
+
+The ``gen_model`` parameter is passed on to the ``ModelCatalog`` class, which loads the LLM either from HuggingFace or from another source.
+The ``ModelCatalog`` also allows you to **list all available models** with the method ``list_generative_models``, just the local models with ``list_generative_local_models``, or just the open source models with ``list_open_source_models``.
+In the example below, we log all available LLMs, the locally available LLMs, and the open source LLMs, and then create three prompters.
+Each prompter uses a different LLM from our [BLING model series](https://llmware.ai/about), which you can also find on [HuggingFace](https://huggingface.co/collections/llmware/bling-models-6553c718f51185088be4c91a).
+
+```python
+import logging
+
+from llmware.models import ModelCatalog
+from llmware.prompts import Prompt
+
+
+llm_gen = ModelCatalog().list_generative_models()
+logging.info(f'List of all LLMs: {llm_gen}')
+
+llm_gen_local = ModelCatalog().list_generative_local_models()
+logging.info(f'List of all local LLMs: {llm_gen_local}')
+
+llm_gen_open_source = ModelCatalog().list_open_source_models()
+logging.info(f'List of all open source LLMs: {llm_gen_open_source}')
+
+
+prompter_bling_1b = Prompt().load_model(gen_model='llmware/bling-1b-0.1')
+prompter_bling_tiny_llama = Prompt().load_model(gen_model='llmware/bling-tiny-llama-v0')
+prompter_bling_falcon_1b = Prompt().load_model(gen_model='llmware/bling-falcon-1b-0.1')
+```

 ### How can I set the embedding model?
 #### "I want to use a different embedding model"

From 0a61ffff2dde179b53589a750d87792337b002fc Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sun, 2 Jun 2024 14:53:51 +0200
Subject: [PATCH 7/8] doc(faq): add answer to embedding model question

---
 docs/faq.md | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/faq.md b/docs/faq.md
index f745464c..e972b0cb 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -152,3 +152,26 @@ prompter_bling_falcon_1b = Prompt().load_model(gen_model='llmware/bling-falcon-1
 ### How can I set the embedding model?
 #### "I want to use a different embedding model"
+You can set the embedding model with the ``embedding_model_name`` parameter of the ``install_new_embedding`` method from the ``Library`` class.
+
+The ``ModelCatalog`` class allows you to **list all available embedding models** with the ``list_embedding_models`` method.
+In the following example, we list all available embedding models, create a library with the name ``embedding_models_example``, and then embed it twice, once with the embedding model ``'mini-lm-sbert'`` and once with ``'industry-bert-contracts'``.
+
+```python
+import logging
+from pathlib import Path
+
+from llmware.library import Library
+from llmware.models import ModelCatalog
+
+
+embedding_models = ModelCatalog().list_embedding_models()
+logging.info(f'List of embedding models: {embedding_models}')
+
+
+library = Library().create_new_library(library_name='embedding_models_example')
+library.add_files(input_folder_path=Path('~/llmware_data/sample_files/Agreements').expanduser())
+
+library.install_new_embedding(embedding_model_name='mini-lm-sbert')
+library.install_new_embedding(embedding_model_name='industry-bert-contracts')
+```

From 87990e6bf76ecc94eaf15e01636827dbb72dd61e Mon Sep 17 00:00:00 2001
From: Stefan Bachhofner
Date: Sun, 2 Jun 2024 15:09:40 +0200
Subject: [PATCH 8/8] doc(docs): update locked gemfile

---
 docs/Gemfile.lock | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock
index ebad891e..0f92b15b 100644
--- a/docs/Gemfile.lock
+++ b/docs/Gemfile.lock
@@ -79,9 +79,10 @@ GEM
     rouge (4.1.3)
     ruby2_keywords (0.0.5)
     safe_yaml (1.0.5)
-    sass-embedded (1.69.5-arm64-darwin)
+    sass-embedded (1.69.5)
       google-protobuf (~> 3.23)
-    sass-embedded (1.69.5-x86_64-linux-gnu)
+      rake (>= 13.0.0)
+    sass-embedded (1.69.5-arm64-darwin)
       google-protobuf (~> 3.23)
     sawyer (0.9.2)
       addressable (>= 2.3.5)