Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: 🚑 Fix connection to the db failing when db_reset=True #34

Merged
merged 6 commits into from
Aug 5, 2024

Conversation

albireox
Copy link
Member

@albireox albireox commented Aug 1, 2024

Attempt at fixing #32. The problem seems to be related to the ContextVar for _state and the fact that the previous change removed the fastapi dependency.

@albireox albireox requested a review from havok2063 as a code owner August 1, 2024 03:18
@havok2063
Copy link
Contributor

havok2063 commented Aug 1, 2024

This does seem to fix it. I've tested a few valis routes, the cone search and db info search, and they worked. I also tested the zora search submission, which works. However the target page from zora doesn't work, and throws the same error. This may be related to the concurrent queries from the Target page. I'd like to preserve that if possible since it speeds up page loading.

As for the ContextVar thing, we may not need this anymore. I was originally following this FastAPI guide on how to implement peewee with async. Since then, they've deprecated support for peewee since it doesn't play well with async, and suggest using sqlalchemy. https://fastapi.tiangolo.com/how-to/sql-databases-peewee/?h=peewee

Comment on lines +22 to 30
""" Sub-dependency for get_db that resets the context connection state """

from valis.main import settings

if settings.db_reset:
pdb._state._state.set(db_state_default.copy())
pdb._state.reset()


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in development mode, wrapping this back into a Dependency makes the first query slow since that's when it's first establishing the connection, but then all subsequent queries are fast. I think this is ok.

@havok2063
Copy link
Contributor

havok2063 commented Aug 1, 2024

Another question is should we just make the default be db_reset: false? This would also fix the concurrent issue, but would, I think, essentially create a global db connection, which I'm not sure is what we want. FastAPI recommends creating a new connection per request, thus wrapping it up into a Dependency, but maybe we don't care about that?

@albireox
Copy link
Member Author

albireox commented Aug 1, 2024

Ok, I'll have a look at that issue. I think right now if db_reset=True the code should be almost identical to before the original change, but I must have missed something.

I think by default db_reset should always be True, see

db_reset: bool = True
We can discuss whether we want to change that. I think it's safer to create a new connection for each request; the only problem is that creating a new connection triggers a new reflection of the catalogdb tables, which causes a non-negligible overhead. That could make quick queries to take several times what they should due to the reflection, which anyway is not needed. But I think there are ways to prevent that (or we can add options to sdssdb).

@albireox
Copy link
Member Author

albireox commented Aug 3, 2024

OK, can you give this one more try after pulling? Note that I have also update sdssdb to 0.12.3. This is not strictly necessary for this issue but 0.12.2 would cause reflection of catalogdb in pipelines to be extremely slow because of all the missing tables.

I cannot say I fully understand the issue yet, but there seemed to be two additional problems:

  • The reset_db_state depend does need to be an async function or the ContextVar doesn't seem to work properly.
  • When db_reset=True, if Zora requests multiple routes concurrently when the DB has not yet been initialised there would be a race condition error because multiple instances would try to initialise the DB at the same time. So I added an asyncio.Lock to prevent multiple connection attempts at the same time. This works fine with uvicorn in development mode but not with gunicorn; I suspect it's a limitation of the UvicornWorker for gunicorn. Following this I added a timeout to the wsgi configuration.

With these changes the Zora target page works for me in all cases (uvicorn with and without db_reset, and the same with gunicorn in production).

But this is looking like a serious hack at some point. In a different issue we should consider alternatives, but I'll write a couple ideas here for now:

  • We can remove the use of peewee if it doesn't play well with FastAPI and async and replace it with SQLAlchemy 2 (or maybe SQLModel if we want to stay in the ecosystem). This may be less of an issue now that catalogdb is quite stable and maybe we don't need reflection.
  • If we continue using peewee we should consider a persistent DB connection (like db_reset=False does). I think that would be ok since peewee reconnects if the connection has been lost. We could also have the DB connect to the server during the app initialisation so that at that point in which a request is processed there are no more issues with multiple connections because there is a single instance of the connection.

@albireox
Copy link
Member Author

albireox commented Aug 5, 2024

The tests are failing right now but I think that will be fixed by merging #36 first.

Copy link
Contributor

@havok2063 havok2063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me testing with prod gunicorn and dev uvicorn. I think this is good for now for the betas. I agree that this could be hacky, and we can see how it performs during the beta and performance testing.

I like the idea of exploring switching to sqlalchemy2. In sdssdb, there's some general migration work needed to bump sqlalchemy >2, plus testing with the new query syntax. And new work to implement the Async options. I started a branch of that a while back but it needs revisiting.

I played around with SQLModel a bit early on, but haven't come back to it lately. My main issue with it was there didn't seem to be a way to wrap existing SQLA Models and convert them to SQLModels. I didn't want to redefine what we have in sdssdb. I like its idea of combining the FastAPI response model with the db model, since currently I have to define new very similar classes.

@albireox
Copy link
Member Author

albireox commented Aug 5, 2024

Sounds good, let's merge this for now and look at long-term solutions.

@albireox albireox merged commit 5fbd882 into main Aug 5, 2024
2 checks passed
@albireox albireox deleted the albireox/fix-reset-state branch August 5, 2024 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants