PG Vector not working #659

Open
edlouth opened this issue Oct 1, 2024 · 11 comments · Fixed by #660
Labels
bug Something isn't working

Comments

@edlouth
Contributor

edlouth commented Oct 1, 2024

Describe the bug
I have set up pg_vector like so:

from vanna.pgvector import PG_VectorStore
from vanna.openai import OpenAI_Chat

class CustomVanna(PG_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        PG_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = CustomVanna(config={
    "api_key": openai_api_key,
    "connection_string": connection_string
})

# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM `...demo.INFORMATION_SCHEMA.COLUMNS`;")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)

vn.train(plan=plan)

I get the following error:

AttributeError: 'CustomVanna' object has no attribute 'documentation_collection'

If I run:

vn.ask(question="How many users are there?")

I get the error: object of type 'coroutine' has no len()

@edlouth edlouth added the bug Something isn't working label Oct 1, 2024
@edlouth
Contributor Author

edlouth commented Oct 1, 2024

@andreped I thought I'd mention you as it looks like you are the author of these changes.

@andreped
Contributor

andreped commented Oct 1, 2024

Hello, @edlouth :] Thank you for reporting the bug!

I was surprised the PR I made was merged so quickly. I don't think we did thorough testing on it, especially not integration tests.
We should aim to fix this before the new release is out.

But the "coroutine has no len()" error is likely caused by a missing await, or by accidentally making something async that shouldn't be.

I can draft a PR on this today :]
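
To illustrate the failure mode described above, here is a minimal sketch (not the actual vanna code; `fetch_rows` is a hypothetical stand-in for any helper that was accidentally made async). Calling it without awaiting returns a coroutine object, and `len()` on that object raises the reported TypeError:

```python
import asyncio


async def fetch_rows():
    # Hypothetical async helper standing in for a vector-store lookup.
    return ["row1", "row2"]


def count_rows_broken():
    rows = fetch_rows()  # no await: this is a coroutine object, not a list
    try:
        return len(rows)
    except TypeError as exc:
        return str(exc)
    finally:
        rows.close()  # suppress the "coroutine was never awaited" warning


def count_rows_fixed():
    # In sync code, run the coroutine to completion before using the result.
    return len(asyncio.run(fetch_rows()))


print(count_rows_broken())
print(count_rows_fixed())
```

The other fix, and the one that suits a code path that has to stay synchronous, is to make the helper a plain `def` again.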

@edlouth edlouth mentioned this issue Oct 1, 2024
@edlouth
Contributor Author

edlouth commented Oct 1, 2024

@andreped thanks for getting back so soon.

I have opened a PR #660 which I think has the changes in question.

@andreped
Contributor

andreped commented Oct 1, 2024

> I have opened a PR #660 which I think has the changes in question.

OK, great! Then I can review and test it on my local setup :]


EDIT: Yeah, your proposed changes make sense. I have an async implementation of this for another project, and I think I just mixed the two up, as this implementation currently has to remain sync.

@edlouth
Contributor Author

edlouth commented Oct 1, 2024

While we are here, I have also had some issues with

self.embedding_function = SentenceTransformer("sentence-transformers/all-MiniLM-l6-v2")

I think this works

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Define a custom embedding class with the necessary methods
class CustomEmbeddingFunction:
    def __init__(self, model):
        self.model = model

    def embed_documents(self, texts):
        # Return one embedding per document
        return self.model.encode(texts, convert_to_tensor=False)

    def embed_query(self, text):
        # Return the embedding for a single query
        return self.model.encode([text], convert_to_tensor=False)[0]

# Then, inside PG_VectorStore.__init__, in place of the line above:
# self.embedding_function = CustomEmbeddingFunction(model)

@andreped
Contributor

andreped commented Oct 1, 2024

> I think this works

Again, we do exactly this for another project :P But if I recall correctly, you should be able to provide your own custom embedding function through the config, so we shouldn't need any code changes for that, right?
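
A rough sketch of what supplying the embedding function through config could look like. Everything here is an assumption for illustration: `DemoVectorStore` is a hypothetical stand-in and not vanna's PG_VectorStore, the `"embedding_function"` key name is a guess, and `ToyEmbeddingFunction` replaces the SentenceTransformer-backed class so the example runs without extra dependencies:

```python
class ToyEmbeddingFunction:
    """Stand-in embedder exposing embed_documents / embed_query."""

    def embed_documents(self, texts):
        # One single-element vector per text: its length as a float.
        return [[float(len(text))] for text in texts]

    def embed_query(self, text):
        return self.embed_documents([text])[0]


class DemoVectorStore:
    """Hypothetical store, NOT vanna's PG_VectorStore."""

    def __init__(self, config=None):
        config = config or {}
        # Assumed key name: use the caller's embedder if given, else a default.
        self.embedding_function = (
            config.get("embedding_function") or ToyEmbeddingFunction()
        )


custom_embedder = ToyEmbeddingFunction()
store = DemoVectorStore(config={"embedding_function": custom_embedder})
query_vector = store.embedding_function.embed_query("How many users are there?")
print(query_vector)
```

With this pattern the store never needs to know which model produced the vectors, which is why a config hook would avoid code changes in the library itself.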

@VirendraSttl

Hi @edlouth, could you please guide me on how to utilize vanna.pgvector? I attempted to install Vanna with pgvector using
pip install 'vanna[pgvector]'
but encountered an issue where Vanna 0.7.3 does not offer the extra 'pgvector'.
I intended to utilize my local database (pgvector) for vector storage.

@andreped
Contributor

andreped commented Oct 8, 2024

> Hi @edlouth, could you please guide me on how to utilize vanna.pgvector? I attempted to install Vanna with pgvector using pip install 'vanna[pgvector]' but encountered an issue where Vanna 0.7.3 does not offer the extra 'pgvector'. I intended to utilize my local database (pgvector) for vector storage.

Hello, @VirendraSttl :]

No release including the new pgvector support has been made yet, so you can't install it like this.

One way to install the latest code could be something like this:

pip install git+https://github.com/vanna-ai/vanna.git#egg=vanna[pgvector]

Then again, the pgvector code on main seems broken right now, and the fix in the PR by @edlouth has yet to be merged, so I would install Vanna from that branch to get proper pgvector support:

pip install git+https://github.com/edlouth/vanna.git@pgvector_fixes#egg=vanna[pgvector]

At least something like that should work.

@isaacdalmarco

Hi @andreped, I am trying to use pgvector as a custom vector database, and I have copied the pgvector code to my project.
But when it starts, I get this error:

../.pyenv/versions/3.12.6/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 106, in _get_embedding_collection_store
from pgvector.sqlalchemy import Vector # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pgvector.sqlalchemy'; 'pgvector' is not a package

I tried to run in python 3.12 and 3.11, changed pgvector and langchain_postgres versions, but none worked.

Which version of python are you using? Do you have any other suggestion?
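
One thing worth checking here: the message "'pgvector' is not a package" generally means Python resolved the name pgvector to a single module file rather than to the installed package directory, which can happen when a local file such as pgvector.py sits earlier on the import path. Whether that is the cause in this case is an assumption. A small sketch to see what the name resolves to (`describe_module` is a hypothetical helper, not part of any library):

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module name resolves and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Packages have submodule search locations; plain .py modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


print(describe_module("pgvector"))
```

If `origin` points into the project directory instead of site-packages and `is_package` is False, renaming the local file would be the first thing to try.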

@andreped
Contributor

andreped commented Oct 11, 2024

@isaacdalmarco I think we should wait till PR #660 is merged, as the code in the main branch for the pgvector implementation is broken.

Then we can try to see how to fix this issue of yours, if it is still an issue after merge.

@zainhoda zainhoda linked a pull request Oct 23, 2024 that will close this issue
@andreped
Contributor

@edlouth, @isaacdalmarco vanna==0.7.4 was just released (see here), which should resolve the original issue and maybe the one @isaacdalmarco is having.

Could you upgrade to the latest version and check whether it works for you? :]
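
For anyone verifying the upgrade, a quick sketch to confirm which version is actually installed in the active environment. The helper names are hypothetical, and the comparison is deliberately simplified: it only looks at the leading digits of each dotted part, so suffixes such as "rc1" are ignored:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(version_string, minimum):
    """Compare leading numeric parts against a tuple such as (0, 7, 4)."""
    parts = []
    for piece in version_string.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts) >= tuple(minimum)


found = installed_version("vanna")
if found is None:
    print("vanna is not installed in this environment")
elif is_at_least(found, (0, 7, 4)):
    print(f"vanna {found} is 0.7.4 or newer")
else:
    print(f"vanna {found} is older than 0.7.4")
```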

4 participants