Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid shutdown race condition on service startup #369

Closed
wants to merge 177 commits into from

Conversation

craigfe
Copy link
Member

@craigfe craigfe commented Dec 4, 2020

Fix #352. The lwt and async service loops previously took condition variables that were used to signal when to shutdown. This resulted in a race condition: the shutdown signal can be broadcast before the server is waiting on it.

Fixed by having the service loops take switches as configuration instead:

val Conduit_lwt.serve   : ?stop:Lwt_switch.t    -> ... -> cfg -> unit Lwt.t
val Conduit_async.serve : ?stop:unit Deferred.t -> ... -> cfg -> unit Deferred.t

The most closely corresponding definitions would probably have Conduit_async.serve taking an Ivar.t instead, but from a glance at similar service loops in the ecosystem it seems to be more common to use a Deferred.t instead. Happy to change that if you think it's more appropriate.

dinosaure added 30 commits June 16, 2020 13:59
The core library gives only abstractions needed further. It lets
the end-user to choose:
- the type `input`
- the type `output`
- the type `+'a s` (the scheduler)

Definition of them will create an internal hashtbl which prospects
available protocols.

The core library provides without application of functors a local
map `resolvers` which is needed to start a connection (eg.
`Conduit.flow`). `resolvers` represents the global process to
resolve a domain-name to a `flow` used by `conduit`.

More technically:
- E1 is the local map `resolvers`
- E0 is the hidden global hashtbl of `conduit`
- Sigs contains signatures needed by `conduit`

This commit adds a README.md which explains how to use `conduit`.
…ism of the TLS layer with another protocol

`ocaml-tls` is a new package which provides internally a _functor_
to compose a Conduit.Sigs.PROTOCOL with TLS. It gives an example
of composition of protocols in `conduit`.

It requires an implementation of `conduit` which, at least, uses
`Cstruct.t` as `input` and `output`.

A realistic example is the composition of tcp/ip protocols provided by:
- `conduit-lwt-unix`
- `conduit-async`
- `conduit-mirage`

Each of them are differents (about type and implementation) but they
use this package to provide a TLS layer on top of them. At the end,
composition should be less than 10 lines of code.

Due to the non ability to use the scheduler, provided implementation
is not protected against data-race condition. The documentation tells
you more about that.
… helper to start a server

`conduit-lwt` is roughly an application of `conduit` with:
- type input = Cstruct.t
- type output = Cstruct.t
- type +'a s = 'a Lwt.t

Due to the ability to play with the scheduler, `conduit-lwt` provides:
- an implementation of the interface `mirage-flow.2.0.1` (deprecated)
- an helper to start a server
- conduit-tls with lwt

The first is deprecated due to the difference between `Conduit.recv`
and `Mirage_flow.recv`. The documentation tells you more about that.

The second is what `conduit` provided before, a common loop to start
the server with an _handler_. It is used by `ocaml-cohttp`. This helper
is not restricted to a specific protocol.

The third is an application of `conduit-tls` with `conduit-lwt`.
…l layers

This is the first real implementation of `conduit` with some protocols:
- tcp/ip with `Lwt_unix`
- tls with `conduit-tls`
- ssl with `Lwt_ssl`

Due to the ability to use `Lwt_io`, this package provides an helper to
give `Lwt_io.channel` from an abstracted `Conduit_lwt{_unix}.flow`.

It provides TLS + provided tcp/ip protocol, a possible composition
with SSL layer (with `Lwt_ssl`) and SSL + provided tcp/ip protocol.

It provides a `resolv_conf` function as the main DNS resolver. It's a
call of `gethostbyname` and finally trusts on your `resolv.conf`.
`conduit-lwt` with the tcp/ip stack provided by `mirage` and
a composition of it with the TLS layer provided by `conduit-tls`.

It exposes a `mirage-flow` implementation as `conduit-lwt`
(deprecated).
…p, tls and ssl layers

`conduit` with `async` and:
- tcp/ip layer provided by `Async.Tcp`
- ssl layer with `Async_ssl` and its implementation with the given
  tcp/ip stack
- tls layer with `conduit-tls` and its implementation with the given
  tcp/ip stack
- An helper to get `Reader.t`/`Writer.t` from an abstracted
  `Conduit_async.flow`
- a `resolv_conf` resolver which trusts on your `resolv.conf`
- We split `conduit` into 2 parts:
    + A client part to connect/recv/send/close
    + A server part to make/accept/close
- Deletion of the type witness value `'a key`

  The old design has 2 sets, a set of `'a key` and a set of `'a
  Witness.protocol`. The first represents the type of the required value
  to _initialize_ the connection. The second represents the type of the
  created `flow`. With such design, the end-user had to do the link by
  himself between `'a key` and `'a protocol` when he calls
  `register_protocol`.

  However, the user must keep the link by himself when he wants to
  extract the implementation (eg. `impl_of_protocol`).

  It appears that, from an usability point-of-view, a `'a protocol` has
  only one and uniq key. So this commit merges `'a key` and `'a
  protocol` into one and uniq type witness value `('edn, 'flow)
  protocol`.

  The commit does the same about service (with `'cfg`).
- Expose extensible variant
  This commit de-abstracts `type flow` to `type flow = private ..`

  So we still must use `register` to extend `flow`, but we have the
  ability to _pattern-match_ on the type `flow` as long as the protocol
  implementer exposes `type Conduit.flow += T of t`. The protocol
  implementer can do that with the new function `Conduit.repr` - an
  example exists on `conduit-lwt-unix.tcp` and `conduit-lwt-unix.tls`.

  However, `Conduit.is` still exists to allow the deconstruction of the
  type `flow` with a given `('edn, 'flow) protocol`.
- Rename `Conduit.flow` to `Conduit.connect`
- `protocol` given by `Conduit.Service.serve` is wrapped into a GADT to
  hidden the _endpoint_ type. We still continue to be able to abstract
  the `'flow` given by `accept` but a step is added to abstract it to
  the type `Conduit.flow`.
- And we remove the required protocol to register an new service
dinosaure and others added 18 commits December 2, 2020 13:35
core: expand aliases exposed in _intf file
Re-export Conduit.Refl constructor
This is a regression test for the issue introduced by
mirage#357 and fixed by
mirage#361.
The `opam/` pattern doesn't apply to local switches created with `opam
link` (which are not directories).
Add a test that Service.equal is reflexive
Remove unused Conduit.S.resolvers type
@craigfe craigfe force-pushed the fix--shutdown-race-condition branch from c6ac3af to 2a0d559 Compare December 4, 2020 14:04
Fix mirage#352.

The `lwt` and `async` service loops previously took condition variables
that were used as a shutdown signal. This resulted in a race condition:
the shutdown signal can be broadcast before the server is waiting on it.
Fixed by having the service loops take switches as configuration
instead.
@craigfe craigfe force-pushed the fix--shutdown-race-condition branch from 2a0d559 to e427423 Compare December 4, 2020 14:10
@dinosaure
Copy link
Member

/cc @talex5 who is more expert than me.

Copy link
Member

@dinosaure dinosaure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be, in your case in Irmin, it can be interesting to add a new argument launched which signals that the server was properly launched after that? Note that it could useful for us, in our tests, when we need to sleep a bit with async before to connect clients.

@craigfe
Copy link
Member Author

craigfe commented Dec 7, 2020

I spent a while thinking about ways to know that the server is ready to receive messages, and I'm not sure it can be done with Lwt. As you say, we would want to emit a signal after the first time that accept blocks, but without knowing how accept is implemented we can't pull any tricks with the scheduler to get a signal out after that thread sleeps (e.g. it might contain several non-blocking portions, and we don't know how many).

I'd be very happy to be proven wrong on this though; currently this seems like a pretty annoying limitation of Lwt and of concurrency monads more generally.

@dinosaure
Copy link
Member

dinosaure commented Dec 7, 2020

I spent a while thinking about ways to know that the server is ready to receive messages, and I'm not sure it can be done with Lwt. As you say, we would want to emit a signal after the first time that accept blocks, but without knowing how accept is implemented we can't pull any tricks with the scheduler to get a signal out after that thread sleeps (e.g. it might contain several non-blocking portions, and we don't know how many).

Service.init should be enough. The main error is when a client wants to connect to a specific port which is not yet bound. In a certain sense, Service.init allocates the ressource and initialise it - so, about Conduit_lwt.TCP for example, it allocates the socket and bind it. At this stage, you can launch a client cooperatively.

@craigfe
Copy link
Member Author

craigfe commented Dec 7, 2020

Good point. I've done as you suggest in #371.

@samoht
Copy link
Member

samoht commented Mar 8, 2021

I've just switched the master branch and the v3 branch. If you are still interested by experimenting on the v3 API, would you mind re-opening a PR vs the v3 branch?

@samoht samoht closed this Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Lwt_switch instead of Lwt_condition to stop the server loop
6 participants