Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database configuration and max connections #17

Closed
scottyhq opened this issue Jul 13, 2021 · 8 comments
Closed

Database configuration and max connections #17

scottyhq opened this issue Jul 13, 2021 · 8 comments

Comments

@scottyhq
Copy link
Member

scottyhq commented Jul 13, 2021

It seems like we can currently only have a limited number of simultaneous connections (not sure exactly how many or where this configuration lives). cc @micahjohnson150 @jomey @lsetiawan if you want to dig in.

the EC2 config is here https://github.com/snowex-hackweek/jupyterhub/blob/main/terraform/eks/ec2_postgres.tf and the actual database setup is probably documented over in https://snowexsql.readthedocs.io/en/latest/

from snowexsql.db import get_db
db_name = 'snow:[email protected]/snowex'
engine, session = get_db(db_name)
engine.table_names()
OperationalError: (psycopg2.OperationalError) FATAL:  remaining connection slots are reserved for non-replication superuser connections

(Background on this error at: http://sqlalche.me/e/13/e3q8)
Full Traceback
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
   2335         try:
-> 2336             return fn()
   2337         except dialect.dbapi.Error as e:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in connect(self)
    363         if not self._use_threadlocal:
--> 364             return _ConnectionFairy._checkout(self)
    365 

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _checkout(cls, pool, threadconns, fairy)
    777         if not fairy:
--> 778             fairy = _ConnectionRecord.checkout(pool)
    779 

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in checkout(cls, pool)
    494     def checkout(cls, pool):
--> 495         rec = pool._do_get()
    496         try:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    139                 with util.safe_reraise():
--> 140                     self._dec_overflow()
    141         else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     67             if not self.warn_only:
---> 68                 compat.raise_(
     69                     exc_value,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    181         try:
--> 182             raise exception
    183         finally:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    136             try:
--> 137                 return self._create_connection()
    138             except:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _create_connection(self)
    308 
--> 309         return _ConnectionRecord(self)
    310 

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __init__(self, pool, connect)
    439         if connect:
--> 440             self.__connect(first_connect_check=True)
    441         self.finalize_callback = deque()

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    660             with util.safe_reraise():
--> 661                 pool.logger.debug("Error on connect(): %s", e)
    662         else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     67             if not self.warn_only:
---> 68                 compat.raise_(
     69                     exc_value,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    181         try:
--> 182             raise exception
    183         finally:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    655             self.starttime = time.time()
--> 656             connection = pool._invoke_creator(self)
    657             pool.logger.debug("Created new connection %r", connection)

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py in connect(connection_record)
    113                             return connection
--> 114                 return dialect.connect(*cargs, **cparams)
    115 

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/default.py in connect(self, *cargs, **cparams)
    507         # inherits the docstring from interfaces.Dialect.connect
--> 508         return self.dbapi.connect(*cargs, **cparams)
    509 

/srv/conda/envs/notebook/lib/python3.8/site-packages/psycopg2/__init__.py in connect(dsn, connection_factory, cursor_factory, **kwargs)
    121     dsn = _ext.make_dsn(dsn, **kwargs)
--> 122     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
    123     if cursor_factory is not None:

OperationalError: FATAL:  remaining connection slots are reserved for non-replication superuser connections


The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
<ipython-input-2-cf50d0573950> in <module>
      1 # Output the list of tables in the database
----> 2 engine.table_names()

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in table_names(self, schema, connection)
   2314         """
   2315 
-> 2316         with self._optional_conn_ctx_manager(connection) as conn:
   2317             return self.dialect.get_table_names(conn, schema)
   2318 

/srv/conda/envs/notebook/lib/python3.8/contextlib.py in __enter__(self)
    111         del self.args, self.kwds, self.func
    112         try:
--> 113             return next(self.gen)
    114         except StopIteration:
    115             raise RuntimeError("generator didn't yield") from None

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _optional_conn_ctx_manager(self, connection)
   2084     def _optional_conn_ctx_manager(self, connection=None):
   2085         if connection is None:
-> 2086             with self._contextual_connect() as conn:
   2087                 yield conn
   2088         else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _contextual_connect(self, close_with_result, **kwargs)
   2300         return self._connection_cls(
   2301             self,
-> 2302             self._wrap_pool_connect(self.pool.connect, None),
   2303             close_with_result=close_with_result,
   2304             **kwargs

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
   2337         except dialect.dbapi.Error as e:
   2338             if connection is None:
-> 2339                 Connection._handle_dbapi_exception_noconnection(
   2340                     e, dialect, self
   2341                 )

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception_noconnection(cls, e, dialect, engine)
   1581             util.raise_(newraise, with_traceback=exc_info[2], from_=e)
   1582         elif should_wrap:
-> 1583             util.raise_(
   1584                 sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   1585             )

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    180 
    181         try:
--> 182             raise exception
    183         finally:
    184             # credit to

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
   2334         dialect = self.dialect
   2335         try:
-> 2336             return fn()
   2337         except dialect.dbapi.Error as e:
   2338             if connection is None:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in connect(self)
    362         """
    363         if not self._use_threadlocal:
--> 364             return _ConnectionFairy._checkout(self)
    365 
    366         try:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _checkout(cls, pool, threadconns, fairy)
    776     def _checkout(cls, pool, threadconns=None, fairy=None):
    777         if not fairy:
--> 778             fairy = _ConnectionRecord.checkout(pool)
    779 
    780             fairy._pool = pool

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in checkout(cls, pool)
    493     @classmethod
    494     def checkout(cls, pool):
--> 495         rec = pool._do_get()
    496         try:
    497             dbapi_connection = rec.get_connection()

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    138             except:
    139                 with util.safe_reraise():
--> 140                     self._dec_overflow()
    141         else:
    142             return self._do_get()

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     66             self._exc_info = None  # remove potential circular references
     67             if not self.warn_only:
---> 68                 compat.raise_(
     69                     exc_value,
     70                     with_traceback=exc_tb,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    180 
    181         try:
--> 182             raise exception
    183         finally:
    184             # credit to

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    135         if self._inc_overflow():
    136             try:
--> 137                 return self._create_connection()
    138             except:
    139                 with util.safe_reraise():

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _create_connection(self)
    307         """Called by subclasses to create a new ConnectionRecord."""
    308 
--> 309         return _ConnectionRecord(self)
    310 
    311     def _invalidate(self, connection, exception=None, _checkin=True):

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __init__(self, pool, connect)
    438         self.__pool = pool
    439         if connect:
--> 440             self.__connect(first_connect_check=True)
    441         self.finalize_callback = deque()
    442 

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    659         except Exception as e:
    660             with util.safe_reraise():
--> 661                 pool.logger.debug("Error on connect(): %s", e)
    662         else:
    663             if first_connect_check:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     66             self._exc_info = None  # remove potential circular references
     67             if not self.warn_only:
---> 68                 compat.raise_(
     69                     exc_value,
     70                     with_traceback=exc_tb,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    180 
    181         try:
--> 182             raise exception
    183         finally:
    184             # credit to

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    654         try:
    655             self.starttime = time.time()
--> 656             connection = pool._invoke_creator(self)
    657             pool.logger.debug("Created new connection %r", connection)
    658             self.connection = connection

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py in connect(connection_record)
    112                         if connection is not None:
    113                             return connection
--> 114                 return dialect.connect(*cargs, **cparams)
    115 
    116             creator = pop_kwarg("creator", connect)

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/default.py in connect(self, *cargs, **cparams)
    506     def connect(self, *cargs, **cparams):
    507         # inherits the docstring from interfaces.Dialect.connect
--> 508         return self.dbapi.connect(*cargs, **cparams)
    509 
    510     def create_connect_args(self, url):

/srv/conda/envs/notebook/lib/python3.8/site-packages/psycopg2/__init__.py in connect(dsn, connection_factory, cursor_factory, **kwargs)
    120 
    121     dsn = _ext.make_dsn(dsn, **kwargs)
--> 122     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
    123     if cursor_factory is not None:
    124         conn.cursor_factory = cursor_factory

OperationalError: (psycopg2.OperationalError) FATAL:  remaining connection slots are reserved for non-replication superuser connections

(Background on this error at: http://sqlalche.me/e/13/e3q8)
@jomey
Copy link
Member

jomey commented Jul 13, 2021

A starting point is the rolconnlimit in the pg_roles table
https://www.postgresql.org/docs/current/view-pg-roles.html

Check what value is currently set in there.

@scottyhq
Copy link
Member Author

Not sure how the JupyterHub on k8s affects things here. Note that each node has a unique internal IP (hostname -I | awk '{print $1}' --> 172.AA.AAA.AA) and each pod has a unique internal IP (172.BB.BBB.BB), but everyone seems to have the same external IP (curl https://ipinfo.io/ip --> 44.CCC.CC.CCC) regardless of where their server is running.

@micahjohnson150
Copy link
Member

I think what @jomey has pointed out here is what is going one. Something I didn't think of.

@micahjohnson150
Copy link
Member

Well I checked what the user had for max connections it was unlimited.

As in the name of this issue @scottyhq made, the postgres config has a max_connection of.... 100 people. I just changed it to 500

@jomey
Copy link
Member

jomey commented Jul 13, 2021

I think the connections != people.

It's a notebook that establishes a connection, so if you have 70 working on 4 parallel notebooks, then you will have 280 connections unless they shut down and restart their kernel

@jomey
Copy link
Member

jomey commented Jul 13, 2021

I am not certain on this one and I doubt it exists, but maybe there is a 'idle_timeout' at which point SQLAlchemy shuts down the connection and re-connects if it is used again

@micahjohnson150
Copy link
Member

Yep per our convo, I just set the tcp_keepalives_idle to 900 seconds (15 minutes).

@jomey
Copy link
Member

jomey commented Jul 14, 2021

I am closing this issue in favor of continuing with the logged SnowExSql issue

@jomey jomey closed this as completed Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants