This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

add: docs around new config file format
iwilltry42 committed Jul 26, 2024
1 parent b258ee8 commit f339ace
Showing 4 changed files with 25 additions and 14 deletions.
33 changes: 21 additions & 12 deletions docs/docs/04-configfile.md
@@ -34,20 +34,29 @@ Here we try to capture all supported configuration items in one example.

 ```yaml
 embeddings:
-  provider: vertex # this selects one of the below providers
-  cohere:
-    apiKey: "${COHERE_API_KEY}" # environment variables are expanded when reading the config file
-    model: "embed-english-v2.0"
-  openai:
-    apiKey: "${OPENAI_API_KEY}"
-    embeddingEndpoint: "/some-custom-endpoint" # anything that's not the default /embeddings
-  vertex:
-    apiKey: "${GOOGLE_API_KEY}"
-    project: "acorn-io"
-    model: "text-embedding-004"
+  providers:
+    - name: my-cohere
+      type: cohere
+      config:
+        apiKey: "${COHERE_API_KEY}" # environment variables are expanded when reading the config file
+        model: "embed-english-v2.0"
+    - name: myopenai
+      type: openai
+      config:
+        apiKey: "${OPENAI_API_KEY}"
+        embeddingEndpoint: "/some-custom-endpoint" # anything that's not the default /embeddings
+    - name: foobar
+      type: vertex
+      config:
+        apiKey: "${GOOGLE_API_KEY}"
+        project: "acorn-io"
+        # apiEndpoint: https://us-central1-aiplatform.googleapis.com
+        model: "text-embedding-004"
 ```
 ### Sections
 - `embeddings`: See [Embedding Models](05-embedding_models.md) for more details.
-  - `provider`: May as well be set using the command line flag `--embedding-model-provider` or the environment variable `KNOW_EMBEDDING_MODEL_PROVIDER` (default: `openai`).
+  - Select a provider using the command line flag `--embedding-model-provider` or the environment variable `KNOW_EMBEDDING_MODEL_PROVIDER` (default: `openai`).
+  - **Note**: If a provider is selected but not specified in the config file, we'll assume that it's a standard provider configured via standard environment variables.
+    - E.g. you select `vertex`, but that name is not configured, so we default to `type=vertex` and use the `VERTEX_*` environment variables to configure a standard Google Vertex AI provider.
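
Reviewer sketch (not part of the commit): the selection and fallback behaviour described in the added doc lines could look roughly like the Go below. The `ProviderConfig` struct, `resolveProvider`, and `expandConfig` are hypothetical names made up for illustration and are not taken from the repository. The sketch assumes that a selected name with no matching `providers` entry is treated as a provider type of the same name, and that `${VAR}` references are expanded with the standard library's `os.Expand`.

```go
package main

import (
	"fmt"
	"os"
)

// ProviderConfig mirrors one entry of the `providers` list in the new config
// format. The struct and its field names are hypothetical.
type ProviderConfig struct {
	Name   string
	Type   string
	Config map[string]string
}

// resolveProvider returns the configured provider whose name matches the
// selection. If nothing matches, it falls back to an unconfigured provider
// whose type equals the selected name, leaving the actual settings to the
// standard environment variables of that provider type.
func resolveProvider(selected string, providers []ProviderConfig) ProviderConfig {
	for _, p := range providers {
		if p.Name == selected {
			return p
		}
	}
	return ProviderConfig{Name: selected, Type: selected, Config: map[string]string{}}
}

// expandConfig replaces ${VAR} references in every config value, resolving
// variable names through the given lookup function (os.Getenv in real use).
func expandConfig(cfg map[string]string, lookup func(string) string) map[string]string {
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		out[k] = os.Expand(v, lookup)
	}
	return out
}

func main() {
	providers := []ProviderConfig{
		{Name: "foobar", Type: "vertex", Config: map[string]string{
			"apiKey":  "${GOOGLE_API_KEY}",
			"project": "acorn-io",
		}},
	}

	// "foobar" is configured, so its entry is used as-is.
	p := resolveProvider("foobar", providers)
	fmt.Println(p.Name, p.Type, expandConfig(p.Config, os.Getenv)["project"])

	// "vertex" is not a configured name, so it becomes a bare type=vertex provider.
	fallback := resolveProvider("vertex", providers)
	fmt.Println(fallback.Name, fallback.Type, len(fallback.Config))
}
```

Passing the lookup function in explicitly keeps the expansion testable without touching the process environment.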
3 changes: 2 additions & 1 deletion docs/docs/10-datasets/02-sharing.md
@@ -42,7 +42,8 @@ knowledge export my-dataset --output my-dataset.zip

 Importing a Dataset works just fine, but there's a culprit when you want to **ingest additional content into an imported dataset**: You'll have to use the exact same embedding function as the original dataset.
 The Embedding function is part of the Vector Database implementation and defines how the content is transformed into a vector representation.
-Currently, this is defined solely based on the model provider configuration, so it's fairly simple to replicate - you just have to use the same model (`$OPENAI_EMBEDDING_MODEL`) for it to work.
+Currently, this is defined solely based on the model provider configuration, so it's fairly simple to replicate - you just have to use the exact same model usually.
+When you try ingesting into an imported dataset with a differing embedding model provider config, the tool will error out if there is a mismatch in a required config field, so you can adjust.
 
 :::
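
Reviewer sketch (not part of the commit): the mismatch check described in the added sentence could be approximated as below. This is not the implementation of `embeddings.CompareRequiredFields`, whose body is not shown in this diff. The function name `compareRequiredFields`, the flat `map[string]string` config shape, and the choice of `model` as the only required field are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// compareRequiredFields returns an error when any field named in required has
// a different value in the active provider config than in the config attached
// to the dataset. Fields not listed as required are ignored.
// Hypothetical helper; the real check in the repository may differ.
func compareRequiredFields(required []string, active, dataset map[string]string) error {
	var mismatched []string
	for _, field := range required {
		if active[field] != dataset[field] {
			mismatched = append(mismatched, field)
		}
	}
	if len(mismatched) > 0 {
		sort.Strings(mismatched) // stable error text regardless of input order
		return fmt.Errorf("embedding provider config mismatch in required fields: %v", mismatched)
	}
	return nil
}

func main() {
	dataset := map[string]string{"model": "text-embedding-004", "apiKey": "key-a"}
	active := map[string]string{"model": "embed-english-v2.0", "apiKey": "key-b"}

	// Only the model is treated as required here (an assumption): API keys may
	// differ between the exporting and the importing machine.
	if err := compareRequiredFields([]string{"model"}, active, dataset); err != nil {
		fmt.Println("cannot ingest:", err)
	}
}
```

Restricting the comparison to required fields is what lets credentials differ across machines while still rejecting a different embedding model.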

1 change: 0 additions & 1 deletion examples/configfiles/embedding_provider.yaml
@@ -1,5 +1,4 @@
 embeddings:
-  provider: foobar # this selects one of the below
   providers:
     - name: my-cohere
       type: cohere
2 changes: 2 additions & 0 deletions pkg/datastore/ingest.go
@@ -74,6 +74,8 @@ func (s *Datastore) Ingest(ctx context.Context, datasetID string, content []byte
 	if err != nil {
 		return nil, fmt.Errorf("failed to get embeddings model provider: %w", err)
 	}
+
+	// TODO: Use the dataset-provided config (merge with override)
 	err = embeddings.CompareRequiredFields(s.EmbeddingModelProvider.Config(), dsEmbeddingProvider.Config())
 	if err != nil {
 		slog.Info("Dataset has attached embeddings provider config", "config", ds.EmbeddingsProviderConfig)
