Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enabling TCP keepalive to avoid connections getting closed #205

Open
traffetseder opened this issue May 23, 2018 · 12 comments
Open

enabling TCP keepalive to avoid connections getting closed #205

traffetseder opened this issue May 23, 2018 · 12 comments

Comments

@traffetseder
Copy link

traffetseder commented May 23, 2018

We ran into an issue with NAT gateway closing connections for long running queries that do not stream data back to the client. To fix this, we enabled TCP keepalive on the socket.

I haven't created a PR because this is linux specific code, but wanted to add it here to see if the community finds it useful.

See diff below.

-Thomas

--- a/vertica_python/vertica/connection.py
+++ b/vertica_python/vertica/connection.py
@@ -111,6 +112,20 @@ class Connection(object):
         self.transaction_status = None
         self.socket = None

+    def set_keepalive_linux(self, sock, after_idle_sec=60, interval_sec=60, max_fails=10):
+        """Set TCP keepalive on an open socket.
+        It activates after after_idle_sec of idleness,
+        then sends a keepalive ping once every interval_sec,
+        and closes the connection after max_fails failed ping ()
+        """
+        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)
+
     def _socket(self):
         if self.socket is not None:
             return self.socket
@@ -121,9 +136,13 @@ class Connection(object):
         raw_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
         raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
         if connection_timeout is not None:
             raw_socket.settimeout(connection_timeout)
         raw_socket.connect((host, port))

+        self.set_keepalive_linux(sock=raw_socket, after_idle_sec=60, interval_sec=60, max_fails=10)
+
         ssl_options = self.options.get('ssl')
         if ssl_options is not None and ssl_options is not False:
             from ssl import CertificateError, SSLError
@ericwoodall
Copy link

ericwoodall commented Feb 18, 2021

I think I am having exactly this issue but I cannot tell. I have a long running query and after ~5 minutes, it stops with a failure:
Unhandled error while executing query
[Errno 32] Broken pipe
Using vertica-python in --> dbt-vertica package.

@sitingren
Copy link
Member

@ericwoodall It's hard to tell the problem from your description, here are some of directions for debug:

The latest vertica-python already set TCP keepalive as default: Source code, except you cannot config keepalive ping parameters (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) like the code above. Therefore TCP keepalive should not be the root of your problem if you are using latest version of vertica-python. In addition, vertica-python provides connection_timeout as a connection parameter to control socket operations timeout (see README), queries can be affected by this. Please note that dbt-vertica is a third-party library that we don't support. If you want to set connection_timeout, you may need to change your libraries source code.

On the other hand, a Broken Pipe error could mean a lot of things. It is possible that your Vertica server closes this connection. So I think it's necessary to check the query itself as well.

@ericwoodall
Copy link

@sitingren thanks for the reply! I feel confident that it is not the query because I can execute the same query in DBViz and it returns data. I am going to install the latest version of both vertica-python and dbt-vertica and see if this fixes it. Will report back. Thanks again!

@ericwoodall
Copy link

@sitingren I installed the latest version of vertica-python (1.0.1) and I am still getting the same broken pipe error. Also, I verified that it is not the query, as I did a table insert [using the same query] and it succeeded.

@sitingren
Copy link
Member

@ericwoodall Is DBViz also using vertica-python? Have you toggled connection_timeout setting in dbt-vertica? Since there is no traceback message or log to look at, I think running the query on vertica-python directly might narrow down the problem to be whether on dbt-vertica or vertica-python side.

@ericwoodall
Copy link

@sitingren , no DBViz is not using vertica-python but the query returns fine so it is not a server misconfiguration. I am in the process of creating a test python script that uses vertica-python directly. Will report back.

@C-h-e-r-r-y
Copy link

+1 for Connection reset and [Errno 32] Broken pipe - same issue for 5+ minutes request to vertica from python code (not Dbt). Really irritating issue. Same code with different SQL works.

I am not expert but connection_timeout uses different method:

raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
raw_socket.settimeout(connection_timeout)

I have a code that have being broken for 1 year due to this error! Any ideas how set socket.SO_KEEPALIVE?

@sitingren
Copy link
Member

@C-h-e-r-r-y What vertica-python version are you using? Have you checked out your server configs?
In https://docs.vertica.com/12.0.x/en/sql-reference/config-parameters/general-parameters/, you can see parameters such as KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval.

@C-h-e-r-r-y
Copy link

@sitingren
Many thanks for quick reply!!!! I have tried different versions: 0.9.2, 1.2+ and 1.3.0

Vertica sever version is Vertica Analytic Database v9.2.0-0

SHOW CURRENT ALL; give a lot of values but KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval - are not present. :(

About serverside configs - good idea but I do not know what to try :(

@C-h-e-r-r-y
Copy link

C-h-e-r-r-y commented Apr 22, 2023

@sitingren
Pleae note that connection for same SQL request works from Intellij Idea. E.g. looks like something wrong with python library or settings. Intellij Idea is run from Windows while python from Docker

@sitingren
Copy link
Member

@C-h-e-r-r-y Please use the latest vertica-python version if possible.
Have you set 'connection_timeout' in your connection options?

You can inspect the open socket once you create a Connection object:

import vertica_python
conn_info = {'host': 'xxx', 'user': 'xxx', ..., 'connection_timeout': 10}

with vertica_python.connect(**conn_info) as conn:
  print(conn.socket.gettimeout())  # set by 'connection_timeout'
  print(conn.socket.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # should be 1
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))

@C-h-e-r-r-y
Copy link

C-h-e-r-r-y commented Apr 25, 2023

My steps and result of investigations:

  1. I have tried the 1.3.0 but connection_timeout or read_timeout did not help me.
  2. I have setup odbc and result was the same.
  3. I have setup jdbc via jaydebeapi and got more or less same behavior with timeout errors.
  4. Next I run python code (vertica-python version) in Docker with --sysctl net.ipv4.tcp_keepalive_time=120 --sysctl net.ipv4.tcp_keepalive_intvl=60 --sysctl net.ipv4.tcp_keepalive_probes=6 --sysctl net.ipv4.tcp_fin_timeout=30 and after that I got my code working.

I have not yet investigated this in detail. The only what I can say now is connection_timeout and read_timeout are "ignored" and to set up timeouts you have to set them on OS level.

Your post about KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval - gave me right direction but set them from client's side. Again many thanks for this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants