WIP: Feature/double hbone #1429
base: master
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from e27a4a8 to 40bbcd0
Force-pushed from 66db8ae to 99d622f
@@ -107,7 +112,7 @@ impl Outbound {
debug!(component="outbound", dur=?start.elapsed(), "connection completed");
}).instrument(span);

assertions::size_between_ref(1000, 1750, &serve_outbound_connection);
How did we get these numbers?
By looking at the current size and adding a small amount of buffer.
Force-pushed from 99d622f to 96bb4de
@@ -83,6 +83,26 @@ struct ConnSpawner {

// Does nothing but spawn new conns when asked
impl ConnSpawner {
    async fn new_unpooled_conn(
Anything here we can do higher up, I think, but things might change if we decide to implement pooling in this PR.
Yeah, if we want double-hbone conns to be unpooled and thus need ~none of this surrounding machinery, then I'd be inclined to just start proxy/double-hbone.rs and use that directly, rather than complicating the purpose of this file. (Could also just have a common HboneConnMgr trait or something too.)
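For illustration, a minimal sketch of what such a trait could look like. Apart from the name HboneConnMgr (taken from the comment above), the method, associated types, and Dest placeholder are assumptions, not code from this PR:

```rust
// Hedged sketch only: everything except the trait name `HboneConnMgr`
// is a placeholder invented for illustration.
struct Dest; // stand-in for whatever keys an upstream (e.g. a WorkloadKey)

#[allow(async_fn_in_trait)]
trait HboneConnMgr {
    type Conn;
    type Error;

    // Hand back a connection for `dest`, whether it is checked out of a pool
    // (pooled single HBONE) or freshly created each time (unpooled double HBONE).
    async fn get_conn(&mut self, dest: &Dest) -> Result<Self::Conn, Self::Error>;
}
```

Both the pooled path and an unpooled double-HBONE spawner could then implement the same trait, keeping the outbound code agnostic to which one it gets.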
src/proxy/outbound.rs
Outdated
// This always drops ungracefully
// drop(conn_client);
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
// drain_tx.send(true).unwrap();
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
drain_tx.send(true).unwrap();
let _ = driver_task.await;
// this sleep is important, so we have a race condition somewhere
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
res
Does anybody have any info on how to properly drop/terminate H2 connections over streams with nontrivial drops (e.g. shutting down TLS over HTTP/2 CONNECT)? Right now, I'm just dropping things/aborting tasks randomly until something works.
Are you asking about how to clean up after, for example, a RST_STREAM to the inner tunnel? Or something else?
Kinda. I mostly mean the outer TLS stream because that's what I've looked at. It seems like if I drop conn_client before driver_task terminates, the TCP connection will close without sending close notifies. So yes, I'm asking if there is a way to explicitly do cleanup rather than relying on implicit drops.
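For what it's worth, a minimal sketch of one explicit shutdown ordering, assuming the driver hands the underlying TLS stream back when it finishes (which is not what the current code does). drain_tx and driver_task mirror the names in the diff; everything else is illustrative:

```rust
use tokio::io::{AsyncWrite, AsyncWriteExt};
use tokio::sync::watch;
use tokio::task::JoinHandle;

// Sketch only: the point is the ordering, not the exact API.
async fn shutdown_in_order<S: AsyncWrite + Unpin>(
    drain_tx: watch::Sender<bool>,
    driver_task: JoinHandle<S>,
) -> std::io::Result<()> {
    // 1. Signal the connection driver to stop taking new streams.
    let _ = drain_tx.send(true);
    // 2. Wait for it to flush in-flight HTTP/2 frames (GOAWAY, trailers, ...).
    let mut stream = driver_task
        .await
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
    // 3. Only now shut the stream down explicitly; for a TLS stream this is
    //    what sends close_notify instead of an abrupt TCP close.
    stream.shutdown().await
}
```

The key property is that nothing is torn down by an implicit drop mid-flight: the driver finishes before the stream itself is shut down.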
I see the code changed; do you still need help figuring this out?
I'm still not confident in it. It works (on my machine), but I couldn't find any docs on proper connection termination/dropping.
@@ -217,12 +267,12 @@ impl OutboundConnection {
copy::copy_bidirectional(copy::TcpStreamSplitter(stream), upgraded, connection_stats).await
}

async fn send_hbone_request(
fn create_hbone_request(
Git merge is getting confused here
@@ -70,7 +70,7 @@ const IPV6_ENABLED: &str = "IPV6_ENABLED";

const UNSTABLE_ENABLE_SOCKS5: &str = "UNSTABLE_ENABLE_SOCKS5";

const DEFAULT_WORKER_THREADS: u16 = 2;
const DEFAULT_WORKER_THREADS: u16 = 40;
I may have missed it in the description, but why the change here?
I was hoping it would make debugging async Rust easier (it didn't).
src/proxy/outbound.rs
Outdated
// This always drops ungracefully | ||
// drop(conn_client); | ||
// tokio::time::sleep(std::time::Duration::from_secs(1)).await; | ||
// drain_tx.send(true).unwrap(); | ||
// tokio::time::sleep(std::time::Duration::from_secs(1)).await; | ||
drain_tx.send(true).unwrap(); | ||
let _ = driver_task.await; | ||
// this sleep is important, so we have a race condition somewhere | ||
// tokio::time::sleep(std::time::Duration::from_secs(1)).await; | ||
res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking about how to cleanup after, for example, a RST_STREAM to the inner tunnel? Or something else
Force-pushed from 1ea75fb to f1cc535
// Inner HBONE
let upgraded = TokioH2Stream::new(upgraded);
// TODO: dst should take a hostname? and upstream_sans currently contains E/W Gateway certs
let inner_workload = pool::WorkloadKey {
Will reorganize later.
Force-pushed from a8856a4 to 565f41f
Protocol::TCP => None,
};
let (upstream_sans, final_sans) = match us.workload.protocol {
My understanding from talking to @keithmattix is that Upstream.service_sans will be repurposed to contain the identities of remote pods/waypoints, so I should change the logic of the other protocols to only use us.workload.identity instead of us.workload_and_services_san.
Yes, I think this is correct; only the double HBONE codepath needs to be added/changed, because there are two SAN sets being considered: the E/W gateway SAN and the SANs of the backends. So what you have looks right to me.
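To make the split concrete, a rough sketch of the two SAN sets. The types and field names below are placeholders for illustration, not ztunnel's real ones:

```rust
// Illustrative only: placeholder types standing in for the real upstream/identity types.
enum Protocol {
    Tcp,
    Hbone,
    DoubleHbone,
}

struct Upstream {
    protocol: Protocol,
    workload_identity: String, // identity of the destination workload
    gateway_sans: Vec<String>, // E/W gateway identities (outer TLS handshake)
    backend_sans: Vec<String>, // remote pod/waypoint identities (inner TLS handshake)
}

// Returns (SANs verified on the first/outer handshake, SANs verified on the final/inner one).
fn select_sans(us: &Upstream) -> (Vec<String>, Vec<String>) {
    match us.protocol {
        // Double HBONE: trust the gateway on the outside, the backends on the inside.
        Protocol::DoubleHbone => (us.gateway_sans.clone(), us.backend_sans.clone()),
        // Single HBONE / TCP: only the destination workload identity is checked.
        _ => (vec![us.workload_identity.clone()], Vec::new()),
    }
}
```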
@@ -511,10 +578,10 @@ impl WorkloadHBONEPool {
#[derive(Debug, Clone)]
// A sort of faux-client, that represents a single checked-out 'request sender' which might
// send requests over some underlying stream using some underlying http/2 client
struct ConnClient {
    sender: H2ConnectClient,
pub struct ConnClient {
fixme
Initial double HBONE implementation
Right now, the inner HBONE will only hold one CONNECT tunnel. Once the inner tunnel terminates, so will the outer tunnel (but not the outer HBONE connection). So when ztunnel receives its first connection to a double HBONE host (E/W gateway), it will perform two TLS handshakes; subsequent connections to the same host will perform one TLS handshake.
This behavior is not great, but if we put the inner HBONE in the connection pool, we would pin ourselves to a single pod in the remote cluster: ztunnel would pool the inner connection without being aware of the E/W gateway's routing decision.
That being said, I think this is a good place to stop and think about control plane implementation and get some feedback on how I'm approaching this.
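For reference, a conceptual sketch of the layering described above. These marker types are purely illustrative, not ztunnel's real types:

```rust
// Illustrative marker types only.
struct Tcp;               // raw TCP connection to the E/W gateway
struct Tls<T>(T);         // (m)TLS session layered on T
struct H2Connect<T>(T);   // HTTP/2 CONNECT stream tunneled over T

// Single HBONE: one TLS handshake, one CONNECT stream.
type Hbone = H2Connect<Tls<Tcp>>;

// Double HBONE: the inner HBONE rides inside the outer CONNECT stream.
// A cold start therefore pays two TLS handshakes (outer + inner); once the
// outer connection is pooled, later connections only pay the inner one.
type DoubleHbone = H2Connect<Tls<H2Connect<Tls<Tcp>>>>;
```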
NOTE: The TLS/certificate-related code changes are just for my own testing.
Tasks:
Some open questions:
- N inner HBONE connections per E/W or per remote cluster.

References: