This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

New Pinecone Client (#246)
* Move to serverless client (#195)

* Fix tests

* Change todo and erase notebooks

* Fix comments

* Fix tests

* Fix tests

* Add example

* Make unit tests independent

* Fix docs

* Fix

* Fix CI

* Update CI

* Remove connect

* Fix lint

* Remove comment

* Change readme

* Fix comments

* Fix comments

* Fix comments

---------

Co-authored-by: Izel  Levy <izellevy@Izels-MacBook-Pro-2.local>

* Bump version to 0.4.0a0

* Update CI and tests

* Fix comments

* Fix api key

* Fix tests

* Fix lint

* Update client

* Disable pods for now

* Fix lint

---------

Co-authored-by: Izel  Levy <izellevy@Izels-MacBook-Pro-2.local>
Co-authored-by: ilai <ilai@pinecone.io>
Co-authored-by: GitHub Action <relevance@pinecone.io>
4 people authored Jan 16, 2024
1 parent a52a022 commit dec8f94
Showing 16 changed files with 219 additions and 219 deletions.
1 change: 0 additions & 1 deletion .env.example
@@ -1,5 +1,4 @@
PINECONE_API_KEY="<PINECONE_API_KEY>"
- PINECONE_ENVIRONMENT="<PINECONE_ENVIRONMENT>"
OPENAI_API_KEY="<OPENAI_API_KEY>"
INDEX_NAME="<INDEX_NAME>"
CANOPY_CONFIG_FILE="config/config.yaml"
20 changes: 7 additions & 13 deletions .github/workflows/PR.yml
@@ -21,7 +21,7 @@ jobs:

steps:
- uses: actions/checkout@v3
- - name: Set up Python 3.9
+ - name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
@@ -42,7 +42,6 @@ jobs:
strategy:
matrix:
python-version: [3.9, '3.10', 3.11]
- pinecone-plan: ["paid", "starter"]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
@@ -60,18 +59,16 @@ jobs:
if: github.event_name == 'merge_group'
id: gen_suffix
run: |
- RAW_SUFFIX="${{ matrix.python-version }}-${{ matrix.pinecone-plan }}"
+ RAW_SUFFIX="${{ matrix.python-version }}"
SUFFIX="${RAW_SUFFIX//./-}"
echo "${SUFFIX}"
echo "INDEX_NAME_SUFFIX=${SUFFIX}" >> $GITHUB_OUTPUT
- name: Run system tests
id: system_tests
if: github.event_name == 'merge_group'
- continue-on-error: ${{ matrix.pinecone-plan == 'starter' }}
env:
INDEX_NAME: system-${{ steps.gen_suffix.outputs.INDEX_NAME_SUFFIX }}
- PINECONE_ENVIRONMENT: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_ENVIRONMENT_3 || secrets.PINECONE_ENVIRONMENT_4 }}
- PINECONE_API_KEY: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_API_KEY_3 || secrets.PINECONE_API_KEY_4 }}
+ PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANYSCALE_API_KEY: ${{ secrets.ANYSCALE_API_KEY }}
CO_API_KEY: ${{ secrets.CO_API_KEY }}
@@ -87,11 +84,9 @@ jobs:
- name: Run e2e tests
id: e2e_tests
if: github.event_name == 'merge_group'
- continue-on-error: ${{ matrix.pinecone-plan == 'starter' }}
env:
INDEX_NAME: e2e-${{ steps.gen_suffix.outputs.INDEX_NAME_SUFFIX }}
- PINECONE_ENVIRONMENT: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_ENVIRONMENT_3 || secrets.PINECONE_ENVIRONMENT_4 }}
- PINECONE_API_KEY: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_API_KEY_3 || secrets.PINECONE_API_KEY_4 }}
+ PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANYSCALE_API_KEY: ${{ secrets.ANYSCALE_API_KEY }}
CO_API_KEY: ${{ secrets.CO_API_KEY }}
@@ -108,8 +103,7 @@ jobs:
- name: Cleanup indexes
if: (cancelled() || failure()) && github.event_name == 'merge_group'
env:
- PINECONE_ENVIRONMENT: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_ENVIRONMENT_3 || secrets.PINECONE_ENVIRONMENT_4 }}
- PINECONE_API_KEY: ${{ matrix.pinecone-plan == 'paid' && secrets.PINECONE_API_KEY_3 || secrets.PINECONE_API_KEY_4 }}
+ PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
run: |
export PYTHONPATH=.
poetry run python scripts/cleanup_indexes.py "${{ steps.e2e_tests.outputs.run_id }}"
@@ -118,11 +112,11 @@ jobs:
uses: actions/upload-artifact@v3
if: always()
with:
- name: pytest-report-py${{ matrix.python-version }}-${{ matrix.pinecone-plan }}
+ name: pytest-report-py${{ matrix.python-version }}
path: report*.html
- name: upload e2e test log
uses: actions/upload-artifact@v3
if: failure() && github.event_name == 'merge_group'
with:
- name: e2e-log-failure-report-py${{ matrix.python-version }}-${{ matrix.pinecone-plan }}
+ name: e2e-log-failure-report-py${{ matrix.python-version }}
path: e2e.log
6 changes: 5 additions & 1 deletion .github/workflows/release.yml
@@ -17,6 +17,10 @@ on:
- 'patch' # bug fixes
- 'minor' # new features, backwards compatible
- 'major' # breaking changes
+ - 'prepatch' # bumps the patch and moves to a prerelease (1.0.2 -> 1.0.3a0)
+ - 'preminor' # bumps the minor and moves to a prerelease (1.0.2 -> 1.1.0a0)
+ - 'premajor' # bumps the major and moves to a prerelease (1.0.2 -> 2.0.0a0)
+ - 'prerelease' # bumps the prerelease version (1.0.3a0 -> 1.0.3a1)
generateBranch:
description: 'Whether to generate a release branch'
required: true
@@ -28,7 +32,6 @@ on:
type: boolean
default: false


jobs:
build:
name: Build
@@ -134,6 +137,7 @@ jobs:
name: ${{ needs.build.outputs.new_version }}
artifacts: "dist/*"
bodyFile: "CHANGELOG.md"
+ prerelease: ${{ startsWith(inputs.versionLevel, 'pre') }}

- name: Publish to test pypi
if: ${{ inputs.useTestPyPI == true }}
2 changes: 0 additions & 2 deletions README.md
@@ -62,7 +62,6 @@ pip install canopy-sdk

```bash
export PINECONE_API_KEY="<PINECONE_API_KEY>"
- export PINECONE_ENVIRONMENT="<PINECONE_ENVIRONMENT>"
export OPENAI_API_KEY="<OPENAI_API_KEY>"
export INDEX_NAME="<INDEX_NAME>"
```
@@ -76,7 +75,6 @@ export INDEX_NAME="<INDEX_NAME>"
| Name | Description | How to get it? |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `PINECONE_API_KEY` | The API key for Pinecone. Used to authenticate to Pinecone services to create indexes and to insert, delete and search data | Register or log into your Pinecone account in the [console](https://app.pinecone.io/). You can access your API key from the "API Keys" section in the sidebar of your dashboard |
- | `PINECONE_ENVIRONMENT`| Determines the Pinecone service cloud environment of your index e.g `west1-gcp`, `us-east-1-aws`, etc | You can find the Pinecone environment next to the API key in [console](https://app.pinecone.io/) |
| `OPENAI_API_KEY` | API key for OpenAI. Used to authenticate to OpenAI's services for embedding and chat API | You can find your OpenAI API key [here](https://platform.openai.com/account/api-keys). You might need to login or register to OpenAI services |
| `ANYSCALE_API_KEY` | API key for Anyscale. Used to authenticate to Anyscale Endpoints for open source LLMs | You can register Anyscale Endpoints and find your API key [here](https://app.endpoints.anyscale.com/)
| `CO_API_KEY` | API key for Cohere. Used to authenticate to Cohere services for embedding | You can find more information on registering to Cohere [here](https://cohere.com/pricing)
25 changes: 20 additions & 5 deletions config/config.yaml
@@ -92,10 +92,6 @@ chat_engine:
# -----------------------------------------------------------------------------------------------------------
params:
default_top_k: 5 # The default number of document chunks to retrieve for each query
- # index_params: # Optional - index creation parameters for `create_canopy_index()` or `canopy new`
- # metric: cosine
- # pod_type: p1

chunker:
# --------------------------------------------------------------------------
# Configuration for the Chunker subcomponent of the knowledge base.
@@ -116,4 +112,23 @@ chat_engine:
params:
model_name: # The name of the model to use for encoding
text-embedding-ada-002
- batch_size: 400 # The number of document chunks to encode in each call to the encoding model
+ batch_size: 400 # The number of document chunks to encode in each call to the encoding model
+
+ create_index_params:
+ # -------------------------------------------------------------------------------------------
+ # Initialization parameters to be passed to create a canopy index. These parameters will
+ # be used when running "canopy new".
+ # -------------------------------------------------------------------------------------------
+ metric: cosine
+ spec:
+ serverless:
+ cloud: aws
+ region: us-west-2
+ # For pod indexes you can pass the spec with the key "pod" instead of "serverless"
+ # See the example below:
+ # pod:
+ # environment: eu-west1-gcp
+ # pod_type: p1.x1
+ # # Here you can specify replicas, shards, pods, metadata_config if needed.
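For reference, the new `create_index_params` block lines up with the index-creation call of pinecone-client v3, which this PR adopts. Below is a minimal sketch of the equivalent direct client call; the index name and dimension are placeholders for illustration, not values taken from this diff.

```python
import os

from pinecone import Pinecone, ServerlessSpec  # pinecone-client >= 3.0.0

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])  # no environment / pinecone.init() needed

# "canopy-example" and dimension 1536 are placeholder values, not part of this diff.
if "canopy-example" not in pc.list_indexes().names():
    pc.create_index(
        name="canopy-example",
        dimension=1536,
        metric="cosine",  # corresponds to create_index_params.metric
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),  # corresponds to the serverless spec above
    )
```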


1 change: 0 additions & 1 deletion docs/deployment-gcp.md
@@ -65,7 +65,6 @@ in [README.md](https://github.com/pinecone-io/canopy/blob/main/README.md).
```text
OPENAI_API_KEY={open-api-key}
PINECONE_API_KEY={pinecone-api-key}
- PINECONE_ENVIRONMENT={pinecone-environment}
INDEX_NAME={index-name}
# Other necessary environment variables if needed
2 changes: 0 additions & 2 deletions docs/library.md
@@ -39,7 +39,6 @@ pip install canopy-sdk
import os

os.environ["PINECONE_API_KEY"] = "<PINECONE_API_KEY>"
os.environ["PINECONE_ENVIRONMENT"] = "<PINECONE_ENVIRONMENT>"
os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"
```

@@ -52,7 +51,6 @@ os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"
| Name | Description | How to get it? |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `PINECONE_API_KEY` | The API key for Pinecone. Used to authenticate to Pinecone services to create indexes and to insert, delete and search data | Register or log into your Pinecone account in the [console](https://app.pinecone.io/). You can access your API key from the "API Keys" section in the sidebar of your dashboard |
- | `PINECONE_ENVIRONMENT`| Determines the Pinecone service cloud environment of your index e.g `west1-gcp`, `us-east-1-aws`, etc | You can find the Pinecone environment next to the API key in [console](https://app.pinecone.io/) |
| `OPENAI_API_KEY` | API key for OpenAI. Used to authenticate to OpenAI's services for embedding and chat API | You can find your OpenAI API key [here](https://platform.openai.com/account/api-keys). You might need to login or register to OpenAI services |
</details>

17 changes: 6 additions & 11 deletions examples/canopy-lib-quickstart.ipynb
@@ -32,8 +32,8 @@
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.2.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip available: \u001B[0m\u001B[31;49m22.2.2\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.3.1\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n"
]
}
],
@@ -63,7 +63,6 @@
"import os\n",
"\n",
"os.environ[\"PINECONE_API_KEY\"] = os.environ.get('PINECONE_API_KEY') or 'YOUR_PINECONE_API_KEY'\n",
"os.environ[\"PINECONE_ENVIRONMENT\"] = os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENVIRONMENT'\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get('OPENAI_API_KEY') or 'OPENAI_API_KEY'"
]
},
@@ -310,9 +309,8 @@
"outputs": [],
"source": [
"from canopy.knowledge_base import list_canopy_indexes\n",
"\n",
"if not any(name.endswith(INDEX_NAME) for name in list_canopy_indexes()):\n",
" kb.create_canopy_index(indexed_fields=[\"title\"])"
" kb.create_canopy_index()"
]
},
{
@@ -500,8 +498,7 @@
"document: ## Number of vectors\\n\\n\\nThe most important consideration in sizing is the [number of vectors](/docs/insert-data/) you plan on working with. As a rule of thumb, a single p1 pod can store approximately 1M vectors, while a s1 pod can store 5M vectors. However, this can be affected by other factors, such as dimensionality and metadata, which are explained below.\n",
"title: choosing-index-type-and-size\n",
"source: https://docs.pinecone.io/docs/choosing-index-type-and-size\n",
"score: 0.828785\n",
"\n"
"score: 0.828785\n"
]
}
],
@@ -539,8 +536,7 @@
"document: ## Retention\\n\\n\\nIn general, indexes on the Starter (free) plan are archived as collections and deleted after 7 days of inactivity; for indexes created by certain open source projects such as AutoGPT, indexes are archived and deleted after 1 day of inactivity. To prevent this, you can send any API request to Pinecone and the counter will reset.\\n\\nUpdated about 1 month ago \\n\\n\\n\\n---\\n\\n* [Table of Contents](#)\\n* + [Upserts](#upserts)\\n\t+ [Queries](#queries)\\n\t+ [Fetch and Delete](#fetch-and-delete)\\n\t+ [Namespaces](#namespaces)\\n\t+ [Pod storage capacity](#pod-storage-capacity)\\n\t+ [Metadata](#metadata)\\n\t+ [Retention](#retention)\n",
"title: limits\n",
"source: https://docs.pinecone.io/docs/limits\n",
"score: 0.71726948\n",
"\n"
"score: 0.71726948\n"
]
}
],
@@ -850,8 +846,7 @@
"document: This is first line\n",
"title: newline\n",
"source: example\n",
"score: 0.887627542\n",
"\n"
"score: 0.887627542\n"
]
}
],
9 changes: 6 additions & 3 deletions pyproject.toml
@@ -10,12 +10,11 @@ packages = [{include = "canopy", from = "src"},
{include = "canopy_server", from = "src"},]

[tool.poetry.dependencies]
python = "^3.9"
pinecone-client = "^2.2.2"
python = ">=3.9,<3.13"
pinecone-client = "^3.0.0"
python-dotenv = "^1.0.0"
openai = "^1.2.3"
tiktoken = "^0.3.3"
- pinecone-datasets = "^0.6.2"
pydantic = "^1.10.7"
pandas-stubs = "^2.0.3.230814"
fastapi = ">=0.92.0, <1.0.0"
@@ -33,11 +32,14 @@ pinecone-text = "^0.7.2"
tokenizers = "^0.15.0"
transformers = "^4.35.2"
sentencepiece = "^0.1.99"
pandas = "2.0.0"
pyarrow = "^14.0.1"
cohere = { version = ">=4.37", optional = true }

[tool.poetry.extras]
cohere = ["cohere"]


[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"
pytest = "^7.3.2"
@@ -52,6 +54,7 @@ pytest-xdist = "^3.3.1"
types-requests = "^2.31.0.2"
httpx = "^0.25.0"
pydoclint = "^0.3.8"
+ pytest-dotenv = "^0.5.2"

[build-system]
requires = ["poetry-core"]
1 change: 0 additions & 1 deletion src/canopy/knowledge_base/__init__.py
@@ -1,3 +1,2 @@
- from .knowledge_base import connect_to_pinecone
from .knowledge_base import list_canopy_indexes
from .knowledge_base import KnowledgeBase
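With `connect_to_pinecone` dropped from the public exports, index discovery goes through `list_canopy_indexes()` and index creation through the `KnowledgeBase`, as the updated quickstart notebook shows. A short sketch along those lines; the index name and the `Tokenizer`/`KnowledgeBase(index_name=...)` setup follow the library quickstart and are assumptions, not part of this diff:

```python
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase, list_canopy_indexes

Tokenizer.initialize()  # canopy components expect the tokenizer singleton to be initialized first

INDEX_NAME = "my-index"  # placeholder; Canopy-managed index names carry a prefix, hence endswith() below

kb = KnowledgeBase(index_name=INDEX_NAME)  # assumed constructor, mirroring the quickstart
if not any(name.endswith(INDEX_NAME) for name in list_canopy_indexes()):
    kb.create_canopy_index()
```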