Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OneForAll strategy #11

Open
adinapoli opened this issue Feb 1, 2016 · 6 comments
Open

OneForAll strategy #11

adinapoli opened this issue Feb 1, 2016 · 6 comments
Assignees

Comments

@adinapoli
Copy link
Owner

No description provided.

@srijs
Copy link
Collaborator

srijs commented Feb 2, 2016

So I've been looking into implementing this, and what gives me trouble is that RestartStrategy bundles both the RetryStatus and the actual strategy (e.g. OneForOne, etc.). As a consequence, if I understand this correctly (please correct me if I'm wrong), the strategy on how to restart is actually linked to the individual child process.

If you look at how Erlang implements it, you'll see that the strategy with which to restart children is on a supervisor level, and retry state is maintained on a per-child basis. i.e. if a child terminates, we want to be looking at what the strategy of the supervisor is to deal with this, rather than what strategy is linked to the child that died.

The changes necessary to implement this (from my perspective) would be that Supervisor_ would carry the RestartStrategy, instead of Child_. I would suggest to move the RetryStatus from RestartStrategy into Child_ as well.

Since those would be breaking changes (and we would necessarily also need to change the way a supervisor is constructed), could you have a look @adinapoli and tell me if there's a simpler option in your mind?

@adinapoli
Copy link
Owner Author

Hey @srijs ! I will look at this during off hours (it's quite unfortunate that during our timezones your hacking time matches my working hours 😉 ).

What I can tell you, on top of my mind, after months of not touching this code, is (I will refine this later once I refresh my brain):

  • Surely it would be nice to stick to what Erlang does, as it's a proved model. I'm not an Erlang programmer and I needed only one kind of strategy, so I implemented only what I needed based on a superficial understanding of how Erlang Supervisors work.
  • I'm not too much concerned about breaking changes; I'm much more compelled with having the library in a healthier, more active, more maintained, more used state. Nobody really cares about breaking changes if nobody is using your library 😉
  • I will add an extra comment once I read the code on what I think about the issue you mentioned, and what was my rational behind the original implementation.

@srijs
Copy link
Collaborator

srijs commented Feb 2, 2016

Thanks for taking the time to comment. I like your attitude :)

I'm looking forward to hearing more from you later. In the meantime, I'll try to open a PR with the changes as I would propose them. It is almost always better to be able to talk about something that is concrete :)

@adinapoli
Copy link
Owner Author

No problem at all! Thanks for contributing, it's really nice to see this library moving forward 😉

Later!

@srijs
Copy link
Collaborator

srijs commented Feb 7, 2016

Thinking further regarding this feature, I think we might need to introduce something like "epochs" for the children of a supervisor and the events that are emitted from it. Here's the problem that it would be solving:

  • Suppose we have a supervisor that has child process which rely on each other. Think of two coroutines that send each other messages, but fails if they can't reach each other.
  • Now one of those children crashes for some reason, and following the OneForAll strategy, the supervisor proceeds to terminate all other processes.
  • In the meantime, the other child was not able to send a message to the child that crashed, and thus terminates as well. When it does, a new DeadLetter is queued up.
  • The supervisor has terminated all processes, and now proceeds to bring them up again.
  • After all children are restarted, the next iteration of handleEvents starts, picking up on the DeadLetter that has happened before the restart.
  • All children are terminated again (not what we want).

By introducing epochs, each supervisor would track an increasing number (the epoch) for each child. Whenever it decides to restart (or terminate) a child, it increases the epoch for it. DeadLetter messages with an epoch lower than the currently tracked one are ignored.

What do you think, @adinapoli?

@adinapoli
Copy link
Owner Author

Hey @srijs !

I think that what you are proposing makes a great deal of sense, but just to play the devil's advocate a bit, are you aware if the Erlang OTP does something similar or it's a novel approach? Generally speaking might be good to see what they do to rely on proved, battle-tested models!

Said that, I have also jotted down a couple of caffeine-deprived thoughts on the shape of the library here:

#15 (comment)

Please chime in 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants