Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS Handshake failures with Go 1.23 #2296

Open
dan-lind opened this issue Sep 12, 2024 · 22 comments
Open

TLS Handshake failures with Go 1.23 #2296

dan-lind opened this issue Sep 12, 2024 · 22 comments
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@dan-lind
Copy link

dan-lind commented Sep 12, 2024

Bug Description

When attempting to connect through the proxy, I'm getting connection reset by peer

Starting the proxy is fine

2024/09/12 08:58:14 Authorizing with Application Default Credentials
...
2024/09/12 08:58:15 [redacted-prod:europe-west1:redacted-live-773ed67c] Listening on /tmp/redacted-prod:europe-west1:redacted-live-773ed67c
2024/09/12 08:58:15 [redacted-prod:europe-west1:redacted-live-1c3b6da9] Listening on /tmp/redacted-prod:europe-west1:redacted-live-1c3b6da9
2024/09/12 08:58:15 [redacted-prod:europe-west1:redacted-live-6aab2946] Listening on /tmp/redacted-prod:europe-west1:redacted-live-6aab2946
2024/09/12 08:58:15 [redacted-prod:europe-west1:redacted-live-23a55252] Listening on /tmp/redacted-prod:europe-west1:redacted-live-23a55252
2024/09/12 08:58:15 The proxy has started successfully and is ready for new connections!
2024/09/12 08:58:47 [redacted-prod:europe-west1:redacted-live-156c5af9] Accepted connection from

but when I attempt to make the actual connection, I get

2024/09/12 08:58:47 [redacted-up-prod:europe-west1:redacted-live-6aab2946] failed to connect to instance: Dial error: handshake failed (connection name = "redacted-up-prod:europe-west1:redacted-live-6aab2946"): read tcp 172.22.164.170:53244->redacted:3307: read: connection reset by peer

In the client (Datagrip 2024.2.1) I see the error Socket is not connected.
On the CloudSQL side, I can also see a warning log in Logs explorer like 1893634 [Warning] [MY-000000] [Server] Failed to connect to mysql: 1893634. Server-side error number: 1159

Stacktrace

Output from tcpdump


tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
08:58:47.153440 IP 172.22.164.170.53244 > redacted.3307: Flags [S], seq 2094967633, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 1730041868 ecr 0,sackOK,eol], length 0
08:58:47.216068 IP redacted.3307 > 172.22.164.170.53244: Flags [S.], seq 542659392, ack 2094967634, win 32384, options [mss 1332,sackOK,TS val 3082097024 ecr 1730041868,nop,wscale 7], length 0
08:58:47.216124 IP 172.22.164.170.53244 > redacted.3307: Flags [.], ack 1, win 2062, options [nop,nop,TS val 1730041931 ecr 3082097024], length 0
08:58:47.216806 IP 172.22.164.170.53244 > redacted.3307: Flags [.], seq 1:1321, ack 1, win 2062, options [nop,nop,TS val 1730041931 ecr 3082097024], length 1320
08:58:47.216810 IP 172.22.164.170.53244 > redacted.3307: Flags [P.], seq 1321:1486, ack 1, win 2062, options [nop,nop,TS val 1730041931 ecr 3082097024], length 165
08:58:47.264582 IP redacted.3307 > 172.22.164.170.53244: Flags [R.], seq 1, ack 1321, win 2062, length 0

Steps to reproduce?

This happens every time I try connecting though the terminal

Environment

  1. OS type and version: macos 14.6.1
  2. Cloud SQL Proxy version 2.13
  3. Proxy invocation command: cloud-sql-proxy --config-file ~/cloudsql/cloudsql-config.toml

Content of cloudsql-config.toml

#CloudSQL proxy config
unix-socket = "/tmp"
private-ip = true

...
instance-connection-name-30 = "redacted-prod:europe-west1:redacted-live-156c5af9"
instance-connection-name-31 = "redacted-prod:europe-west1:redacted-live-773ed67c"
instance-connection-name-32 = "redacted-prod:europe-west1:redacted-live-1c3b6da9"
instance-connection-name-33 = "redacted-prod:europe-west1:redacted-live-6aab2946"
instance-connection-name-34 = "redacted-prod:europe-west1:redacted-live-23a55252"

Additional Details

This happens only when I setup the proxy using the terminal. If I use Intellij/Datagrip, and use the built in MySQL CloudSql proxy driver, I can connect without issues.

@dan-lind dan-lind added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Sep 12, 2024
@jackwotherspoon jackwotherspoon added the priority: p3 Desirable enhancement or fix. May not be included in next release. label Sep 12, 2024
@jackwotherspoon
Copy link
Collaborator

jackwotherspoon commented Sep 12, 2024

Hi @dan-lind thanks for raising an issue on the Cloud SQL Proxy!

The Cloud SQL Proxy forwards local traffic to port 3307 of your Cloud SQL instance. Because of this you will need to update your VPC network firewall rules potentially, they should allow TCP egress connection on port 3307 of your Cloud SQL instance private IP address. Can you see if this is maybe the cause of your error?

From our docs:

While the Cloud SQL Auth Proxy can listen on any port, it creates outgoing or egress connections to your Cloud SQL instance IP address on port 3307. If your client machine has an outbound firewall policy, make sure it allows outgoing connections to port 3307 on your Cloud SQL instance's IP.

Let me know as this is a common cause for the issue you are seeing 😄

@enocom
Copy link
Member

enocom commented Sep 12, 2024

Am I reading your config correctly that you're connecting to 34 instances?

@dan-lind
Copy link
Author

Hi @dan-lind thanks for raising an issue on the Cloud SQL Proxy!

The Cloud SQL Proxy forwards local traffic to port 3307 of your Cloud SQL instance. Because of this you will need to update your VPC network firewall rules potentially, they should allow TCP egress connection on port 3307 of your Cloud SQL instance private IP address. Can you see if this is maybe the cause of your error?

From our docs:

While the Cloud SQL Auth Proxy can listen on any port, it creates outgoing or egress connections to your Cloud SQL instance IP address on port 3307. If your client machine has an outbound firewall policy, make sure it allows outgoing connections to port 3307 on your Cloud SQL instance's IP.

Let me know as this is a common cause for the issue you are seeing 😄

Hi! Like I wrote in "Additional details", connecting using the MySQL CloudSQL proxy in Datagrip works without any issues. Another tcpdump on that traffic shows that this connection also happens on 3307.

tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
18:44:01.730197 IP 172.22.164.170.52213 > redacted.3307: Flags [S], seq 308421121, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 1740779703 ecr 0,sackOK,eol], length 0
18:44:01.776055 IP redacted.3307 > 172.22.164.170.52213: Flags [S.], seq 2473669167, ack 308421122, win 32384, options [mss 1332,sackOK,TS val 3117211483 ecr 1740779703,nop,wscale 7], length 0
18:44:01.776115 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1, win 2062, options [nop,nop,TS val 1740779749 ecr 3117211483], length 0
18:44:01.777291 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1:441, ack 1, win 2062, options [nop,nop,TS val 1740779750 ecr 3117211483], length 440
18:44:01.834037 IP redacted.3307 > 172.22.164.170.52213: Flags [.], ack 441, win 250, options [nop,nop,TS val 3117211536 ecr 1740779750], length 0
18:44:01.834064 IP redacted.3307 > 172.22.164.170.52213: Flags [.], seq 1:1321, ack 441, win 250, options [nop,nop,TS val 3117211538 ecr 1740779750], length 1320
18:44:01.834065 IP redacted.3307 > 172.22.164.170.52213: Flags [P.], seq 1321:1763, ack 441, win 250, options [nop,nop,TS val 3117211538 ecr 1740779750], length 442
18:44:01.834087 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1321, win 2041, options [nop,nop,TS val 1740779807 ecr 3117211538], length 0
18:44:01.834101 IP 172.22.164.170.52213 > redacted.3307: Flags [.], ack 1763, win 2034, options [nop,nop,TS val 1740779807 ecr 3117211538], length 0
18:44:01.835368 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 441:447, ack 1763, win 2034, options [nop,nop,TS val 1740779808 ecr 3117211538], length 6
18:44:01.838264 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 447:1422, ack 1763, win 2048, options [nop,nop,TS val 1740779811 ecr 3117211538], length 975
18:44:01.840919 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1422:1724, ack 1763, win 2048, options [nop,nop,TS val 1740779814 ecr 3117211538], length 302
18:44:01.841028 IP 172.22.164.170.52213 > redacted.3307: Flags [P.], seq 1724:1798, ack 1763, win 2048, options [nop,nop,TS val 1740779814 ecr 3117211538], length 74
18:44:01.894001 IP redacted.3307 > 172.22.164.170.52213: Flags [.], ack 1724, win 249, options [nop,nop,TS val 3117211600 ecr 1740779808], length 0
18:44:01.894003 IP redacted.3307 > 172.22.164.170.52213: Flags [P.], seq 1763:2836, ack 1724, win 249, options [nop,nop,TS val 3117211601 ecr 1740779808], length 1073

@dan-lind
Copy link
Author

Am I reading your config correctly that you're connecting to 34 instances?

This config has 34 entries, yes. However I've tried reducing it down to a single entry as well, it doesn't change anything.

@enocom
Copy link
Member

enocom commented Sep 12, 2024

This happens only when I setup the proxy using the terminal. If I use Intellij/Datagrip, and use the built in MySQL CloudSql proxy driver, I can connect without issues.

That's very interesting, because the built-in driver (which uses the Java Connector) does effectively exactly the same thing.

Have you tried connecting through the Proxy using a CLI tool, or something other than Datagrip to determine if Datagrip is involved?

@dan-lind
Copy link
Author

Yes I have. I've tried "dumbing things down" as much as possible, for example to rule out any issues with the unix sockets.

$ cloud-sql-proxy --private-ip --port 3306 redacted-prod:europe-west1:redacted-live-156c5af9

2024/09/12 19:53:52 Authorizing with Application Default Credentials
2024/09/12 19:53:52 [redacted-prod:europe-west1:redacted-156c5af9] Listening on 127.0.0.1:3306
2024/09/12 19:53:52 The proxy has started successfully and is ready for new connections!

then using the mysql client (v8.4)

$ mysql -u redacted -p --host 127.0.0.1 --port 3306
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2

yields the same result

2024/09/12 19:54:13 [redacted-prod:europe-west1:redacted-156c5af9] Accepted connection from 127.0.0.1:53534
2024/09/12 19:54:13 [redacted-prod:europe-west1:redacted-156c5af9] failed to connect to instance: Dial error: handshake failed (connection name = "redacted-prod:europe-west1:redacted-156c5af9"): read tcp 172.19.50.151:53535->redacted:3307: read: connection reset by peer

tcpdump looks similar

$ sudo tcpdump --interface any -nn port 3307                                                                                                                                               

tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type PKTAP (Apple DLT_PKTAP), snapshot length 524288 bytes
19:54:13.730494 IP 172.19.50.151.53535 > redacted.3307: Flags [S], seq 1648980533, win 65535, options [mss 1360,nop,wscale 6,nop,nop,TS val 2312443875 ecr 0,sackOK,eol], length 0
19:54:13.796034 IP redacted.3307 > 172.19.50.151.53535: Flags [S.], seq 1731921490, ack 1648980534, win 32384, options [mss 1332,sackOK,TS val 1671508660 ecr 2312443875,nop,wscale 7], length 0
19:54:13.796091 IP 172.19.50.151.53535 > redacted.3307: Flags [.], ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 0
19:54:13.796688 IP 172.19.50.151.53535 > redacted.3307: Flags [.], seq 1:1321, ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 1320
19:54:13.796695 IP 172.19.50.151.53535 > redacted.3307: Flags [P.], seq 1321:1486, ack 1, win 2062, options [nop,nop,TS val 2312443941 ecr 1671508660], length 165
19:54:13.843445 IP redacted.3307 > 172.19.50.151.53535: Flags [R.], seq 1, ack 1321, win 2062, length 0

@enocom
Copy link
Member

enocom commented Sep 12, 2024

Thanks for the report. In that case I think we need to look into this -- that's wholly unexpected and something we definitely don't see in our automated testing.

Out of curiosity do you see the same behavior across instances?

@dan-lind
Copy link
Author

dan-lind commented Sep 12, 2024 via email

@jackwotherspoon
Copy link
Collaborator

Hi @dan-lind this looks like an issue on the server side.

Do you happen to have a support contract with Google? If so you could open a case and they will have access to check your server logs.

Let me know, otherwise I can try and figure out a way to take a look deeper for you.

@dan-lind
Copy link
Author

We do yes, I've reached out to them referring to this issue.

@jackwotherspoon
Copy link
Collaborator

We do yes, I've reached out to them referring to this issue.

Perfect, that is great to hear.

@TokieSan
Copy link

TokieSan commented Oct 6, 2024

I'm experiencing the same issue when trying to connect through the proxy. I receive a "connection reset by peer" error during the handshake process. Here’s a snippet from my logs:

failed to connect to instance: Dial error: handshake failed (connection name = "[redacted]-prod:europe-west1:[redacted]-live-6aab2946"): read tcp [ip]:[port]->[redacted]:3307: read: connection reset by peer

This issue occurs consistently (direct access & via python's connectors), I have tried reinitializing from my main device & my alt but nothing seems to work. Other users (on same project, with the same permissions) are not facing this issue even though they are following the same connection procedure & running the same OS with the latest updates.

UPDATE: Downgrading to v2.11.3 seems to have solved the issue for now as a work around, will try to identify the version where the issue started

UPDATE 2: After some git bisecting the commit that started the issue was fd6bd49242c884508f641c754eb5cec5d28ac665 with the bump to go 1.23, reverting the version solves the issue

@jackwotherspoon
Copy link
Collaborator

UPDATE: Downgrading to v2.11.3 seems to have solved the issue for now as a work around, will try to identify the version where the issue started

UPDATE 2: After some git bisecting the commit that started the issue was fd6bd49242c884508f641c754eb5cec5d28ac665 with the bump to go 1.23, reverting the version solves the issue

@TokieSan Thanks for the detailed analysis and extra data points! I would recommend pinning to an older version for now, in the meantime I will investigate this further to see if there is maybe something on the server side that is causing issues with the latest Go 1.23 TLS changes.

@dan-lind
Copy link
Author

dan-lind commented Oct 7, 2024

Confirming that downgrading to v2.11.3 solve the issue for me as well!

@jackwotherspoon
Copy link
Collaborator

Confirming that downgrading to v2.11.3 solve the issue for me as well!

Thanks for confirming @dan-lind , appreciate it.

I am just curious if you were able to successfully open up a support case for this issue? If so I can escalate it to get folks to look at the server side and see if we can pinpoint the issue.

@dan-lind
Copy link
Author

dan-lind commented Oct 7, 2024

I am just curious if you were able to successfully open up a support case for this issue? If so I can escalate it to get folks to look at the server side and see if we can pinpoint the issue.

Yes, I have a support case open!

EDIT: @jackwotherspoon I don't know how this usually works, but I've asked in my support case for them to reach out to you.

@jackwotherspoon
Copy link
Collaborator

EDIT: @jackwotherspoon I don't know how this usually works, but I've asked in my support case for them to reach out to you.

That is perfect, thanks

@jackwotherspoon
Copy link
Collaborator

@dan-lind @TokieSan I am taking a look at the server side to see if there are any issues with the use of Go 1.23

In the meantime, I wanted to point out #2294 which was also an issue for 2.13.0 when connecting to a Private IP Cloud SQL instance via a VPN...

looking at the changelog of go version 1.22..1.23, I got this change: experimental post-quantum key exchange mechanism X25519Kyber768Draft00 is now enabled by default when Config.CurvePreferences is nil. The default can be reverted by adding tlskyber=0 to the GODEBUG environment variable.

disabling it, using GODEBUG=tlskyber=0 cloud-sql-proxy-2.13.0 ... make cloud-sql-proxy works again inside the vpn.

I wonder if one of you could try starting the v2.13.0 Proxy with GODEBUG=tlskyber=0 cloud-sql-proxy ... and see if maybe this error is also a symptom of the same issue?

@dan-lind
Copy link
Author

dan-lind commented Oct 9, 2024

I wonder if one of you could try starting the v2.13.0 Proxy with GODEBUG=tlskyber=0 cloud-sql-proxy ... and see if maybe this error is also a symptom of the same issue?

Yes! GODEBUG=tlskyber=0 indeed works with 2.13, i.e. the error goes away and I can connect just as when downgrading to v2.11.3.

@jackwotherspoon
Copy link
Collaborator

Yes! GODEBUG=tlskyber=0 indeed works with 2.13, i.e. the error goes away and I can connect just as when downgrading to v2.11.3.

@dan-lind Great to hear it!

I am going to look at open Go issue's and see if I need to create one as it seems like the new default preferences implemented in golang/go#67061 are not playing nicely with VPNs

@enocom
Copy link
Member

enocom commented Oct 11, 2024

I think we have a similar problem with SOCKS proxies too. GoogleCloudPlatform/alloydb-auth-proxy#714

@jackwotherspoon jackwotherspoon changed the title Connection reset by peer when connecting through the terminal proxy TLS Handshake failures with Go 1.23 Oct 11, 2024
@enocom enocom assigned jackwotherspoon and unassigned hessjcg Oct 11, 2024
@jackwotherspoon jackwotherspoon added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Oct 11, 2024
@jackwotherspoon jackwotherspoon removed the priority: p3 Desirable enhancement or fix. May not be included in next release. label Oct 11, 2024
@enocom
Copy link
Member

enocom commented Nov 6, 2024

FYI GoogleCloudPlatform/alloydb-auth-proxy#714 (comment)

We should probably make this a feature request, since the root cause of these TLS handshake failures are buggy TLS servers.

@jackwotherspoon jackwotherspoon added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants