diff --git a/docs/docs/01-datasets/_category_.json b/docs/docs/01-datasets/_category_.json
new file mode 100644
index 0000000..56fe235
--- /dev/null
+++ b/docs/docs/01-datasets/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Ingestion and Retrieval Flows"
+}
+
diff --git a/docs/docs/02-flows/02-usage.md b/docs/docs/02-flows/02-usage.md
new file mode 100644
index 0000000..2f6f414
--- /dev/null
+++ b/docs/docs/02-flows/02-usage.md
@@ -0,0 +1,48 @@
+---
+title: Usage
+---
+
+# Using Knowledge
+
+The knowledge tool has two modes of operation: Standalone Mode and Server Mode. See the sections below to learn more about them.
+
+Both modes are configured the same way, via environment variables or command-line flags:
+
+## Configuration
+
+### Embedding Model Provider (required)
+
+The model provider is the provider of the embedding model that is used to encode ingested documents.
+Currently, we only support **OpenAI** and **Azure OpenAI** via the following flags / environment variables:
+
+```bash
+--openai-api-base string           OpenAI API base ($OPENAI_BASE_URL) (default "https://api.openai.com/v1")
+--openai-api-key string            OpenAI API key ($OPENAI_API_KEY) (default "sk-foo")
+--openai-api-type string           OpenAI API type (OPEN_AI, AZURE, AZURE_AD) ($OPENAI_API_TYPE) (default "OPEN_AI")
+--openai-api-version string        OpenAI API version (for Azure) ($OPENAI_API_VERSION) (default "2024-02-01")
+--openai-azure-deployment string   Azure OpenAI deployment name (overrides openai-embedding-model, if set) ($OPENAI_AZURE_DEPLOYMENT)
+--openai-embedding-model string    OpenAI Embedding model ($OPENAI_EMBEDDING_MODEL) (default "text-embedding-ada-002")
+```
+
+These are persistent flags, so they can be set on any knowledge subcommand.
+
+
+## 1. Standalone Mode (Default)
+
+In standalone mode, Knowledge uses an embedded database and an embedded vector database, which the client connects to directly.
+This is the default and simplest mode of operation; it is useful for local usage and integrates well with GPTScript.
+
+### Run the Client
+
+Any `knowledge` command (except `knowledge server`) uses the standalone client mode if the `KNOW_SERVER_URL` environment variable is not set.
+
+## 2. Server Mode
+
+In server mode, Knowledge uses a separate server for the Vector Database and the Document Database.
+This mode is useful when you want to share the data with multiple clients or when you want to use a more powerful server for the Vector Database.
+
+### Run the Server
+
+```bash
+knowledge server
+```
\ No newline at end of file
diff --git a/docs/docs/02-flows/03-architecture.md b/docs/docs/02-flows/03-architecture.md
new file mode 100644
index 0000000..94ed335
--- /dev/null
+++ b/docs/docs/02-flows/03-architecture.md
@@ -0,0 +1,33 @@
+---
+title: Architecture
+---
+
+# Knowledge Architecture
+
+![Knowledge Architecture](/img/knowledge_architecture.png)
+
+Knowledge consists of the following components:
+
+## 1. Knowledge Client
+
+The Knowledge Client is the main interface to interact with your knowledge bases.
+In standalone mode, it makes direct use of embedded databases and runs fully locally.
+It's also the default entrypoint for the CLI.
+
+## 2. Knowledge Server [Optional]
+
+The Knowledge Server is a REST API server that can be used to provide a (shared) HTTP endpoint for your knowledge bases.
+You can use it from the CLI by setting the `KNOW_SERVER_URL` environment variable for all client commands.
+
+## 3. Index Database
+
+The index database is an additional (relational) metadata database which keeps track of all datasets and ingested files and their relationships.
+It enables some extra convenience features but does not store the actual data (embeddings).
+The current implementation uses **SQLite**.
+It's fully embedded and does not require any additional setup.
+
+## 4. Vector Database
+
+The vector database is the main storage for the embeddings of the ingested documents along with some metadata (e.g. source file information).
+The current implementation uses [**chromem-go**](https://github.com/philippgille/chromem-go).
+It's fully embedded and does not require any additional setup.
\ No newline at end of file
diff --git a/docs/docs/03-cmd/_category_.json b/docs/docs/99-cmd/_category_.json
similarity index 100%
rename from docs/docs/03-cmd/_category_.json
rename to docs/docs/99-cmd/_category_.json
diff --git a/docs/docs/03-cmd/knowledge.md b/docs/docs/99-cmd/knowledge.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge.md
rename to docs/docs/99-cmd/knowledge.md
diff --git a/docs/docs/03-cmd/knowledge_askdir.md b/docs/docs/99-cmd/knowledge_askdir.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_askdir.md
rename to docs/docs/99-cmd/knowledge_askdir.md
diff --git a/docs/docs/03-cmd/knowledge_create-dataset.md b/docs/docs/99-cmd/knowledge_create-dataset.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_create-dataset.md
rename to docs/docs/99-cmd/knowledge_create-dataset.md
diff --git a/docs/docs/03-cmd/knowledge_delete-dataset.md b/docs/docs/99-cmd/knowledge_delete-dataset.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_delete-dataset.md
rename to docs/docs/99-cmd/knowledge_delete-dataset.md
diff --git a/docs/docs/03-cmd/knowledge_edit-dataset.md b/docs/docs/99-cmd/knowledge_edit-dataset.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_edit-dataset.md
rename to docs/docs/99-cmd/knowledge_edit-dataset.md
diff --git a/docs/docs/03-cmd/knowledge_export.md b/docs/docs/99-cmd/knowledge_export.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_export.md
rename to docs/docs/99-cmd/knowledge_export.md
diff --git a/docs/docs/03-cmd/knowledge_get-dataset.md b/docs/docs/99-cmd/knowledge_get-dataset.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_get-dataset.md
rename to docs/docs/99-cmd/knowledge_get-dataset.md
diff --git a/docs/docs/03-cmd/knowledge_import.md b/docs/docs/99-cmd/knowledge_import.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_import.md
rename to docs/docs/99-cmd/knowledge_import.md
diff --git a/docs/docs/03-cmd/knowledge_ingest.md b/docs/docs/99-cmd/knowledge_ingest.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_ingest.md
rename to docs/docs/99-cmd/knowledge_ingest.md
diff --git a/docs/docs/03-cmd/knowledge_list-datasets.md b/docs/docs/99-cmd/knowledge_list-datasets.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_list-datasets.md
rename to docs/docs/99-cmd/knowledge_list-datasets.md
diff --git a/docs/docs/03-cmd/knowledge_retrieve.md b/docs/docs/99-cmd/knowledge_retrieve.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_retrieve.md
rename to docs/docs/99-cmd/knowledge_retrieve.md
diff --git a/docs/docs/03-cmd/knowledge_server.md b/docs/docs/99-cmd/knowledge_server.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_server.md
rename to docs/docs/99-cmd/knowledge_server.md
diff --git a/docs/docs/03-cmd/knowledge_version.md b/docs/docs/99-cmd/knowledge_version.md
similarity index 100%
rename from docs/docs/03-cmd/knowledge_version.md
rename to docs/docs/99-cmd/knowledge_version.md
diff --git a/docs/gendocs/main.go b/docs/gendocs/main.go
index bb9eb00..70fd27d 100644
--- a/docs/gendocs/main.go
+++ b/docs/gendocs/main.go
@@ -31,7 +31,7 @@ func main() {
 		}
 	}
 
-	err = doc.GenMarkdownTreeCustom(cmd, "docs/docs/03-cmd", filePrepender, linkHandler)
+	err = doc.GenMarkdownTreeCustom(cmd, "docs/docs/99-cmd", filePrepender, linkHandler)
 	if err != nil {
 		log.Fatal(err)
 	}
diff --git a/docs/static/img/knowledge_architecture.png b/docs/static/img/knowledge_architecture.png
new file mode 100644
index 0000000..cb09163
Binary files /dev/null and b/docs/static/img/knowledge_architecture.png differ