This folder houses the initial set of sample code for illustrating the basic usages of the Haystack-AstraDB integration
Details of the examples are described in this DataStax article
File name | Description |
---|---|
haystack_astra_retrieve.py | The example that illustrates the Q&A usage scenario of how to retieve documents from Astra DB using the Haystack pipeline |
haystack_astra_store.py | The example that illustrates how to store documents into Astra DB using the Haystack pipeline |
haystack_astra_utils.py | A utility file that primarily takes care of setting up the environment |
requirements.txt | List out the dependencies for this set of example files |
README.md | This file |
Ensure that you have the following required dependency packages installed
pip install -U astra-haystack sentence-transformers
Ensure also that you have 'cmake' installed, and if not, one of the ways to install it is to use Homebrew, as follows:
brew install cmake
Set the following environment variables - e.g. on MacOS and most Linux shell variants:
export ASTRA_DB_ID=<Your Astra DB ID>
export ASTRA_API_ENDPOINT=<Your database API endpoint>
export ASTRA_DB_KEYSPACE_NAME=<Your database keyspace name>
export ASTRA_DB_APPLICATION_TOKEN=<Your application token>
export ASTRA_DB_COLLECTION_NAME=<The database collection name>
export OPENAI_API_KEY=<Your OpenAI API KEY>
Clone this repo
git clone https://github.com/mgrygles-demo/haystack_astra_blog_example
Go to the directory
cd ./haystack_astra_blog_example
First, store the documents into Astra DB by executing the following on the command line:
python3 haystack_astra_store.py
When you examine the output in the console, you should see that 3 documents have been successfully added:
....
INFO:haystack.core.pipeline.pipeline:Pipeline executed successfully.
3
Next, retrieve the document from Astra DB by running the following example script, which would be prompting with a question that asks about the number of languages that are spoken in the world today, and you should see the console output as well as the response document coming back in dictionary format, similar to this:
INFO:__main__:{'answer_builder': {'answers': [GeneratedAnswer(data='There are over 7,000 languages spoken around the world today.', query='How many languages are there in the world today?', documents=[Document(id=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: 'There are over 7,000 languages spoken around the world today.', score: 0.9267925, embedding: vector of size 384)], meta={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 14, 'prompt_tokens': 54, 'total_tokens': 68}})]}}
{'answer_builder': {'answers': [GeneratedAnswer(data='There '
'are '
'over '
'7,000 '
'languages '
'spoken '
'around '
'the '
'world '
'today.',
query='How '
'many '
'languages '
'are '
'there '
'in '
'the '
'world '
'today?',
documents=[Document(id=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: 'There are over 7,000 languages spoken around the world today.', score: 0.9267925, embedding: vector of size 384)],
meta={'finish_reason': 'stop',
'index': 0,
'model': 'gpt-3.5-turbo-0613',
'usage': {'completion_tokens': 14,
'prompt_tokens': 54,
'total_tokens': 68}})]}}
Hint
If you encounter a runtime error, such as an ImportError that indicates an unreachable haystack library, try setting up a venv, which should then be able to isolate your particular environment from the "base", and eliminate any potential conflict in not being able to locate the library. Here's a suggestion as to how you can set your venv up in a few steps:
-
Create your venv (in a .venv sub-folder inside your project folder)
python3 -m <your_project_folder>/.venv
-
Activate your venv (bash/zsh)
cd <your_project_folder>/.venv source bin/activate
Other shell environments have the following usages:
Platform | Shell | Command to activate virtual environment |
---|---|---|
POSIX | bash/zsh | $ source <your_venv_dir>/bin/activate |
fish | $ source <your_venv_dir>/bin/activate.fish | |
csh/tcsh | $ source <your_venv_dir>/bin/activate.csh | |
PowerShell | $ <your_venv_dir>/bin/Activate.ps1 | |
Windows | cmd.exe | C:\> <your_venv_dir>\Scripts\activate.bat |
PowerShell | PS C:\> <your_venv_dir>\Scripts\Activate.ps1 |
-
Deactivate your venv on the command-line (**When you are done with your work)
deactivate
-
Remove your venv (**Caution: Only when you no longer need your work)
rm -rf <your_venv_dir>