Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby phase 2 - Workflows #96

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

cretz
Copy link
Member

@cretz cretz commented Oct 24, 2024

Summary

  • This is the proposal for the second phase of the Ruby SDK about workflows. Please review and leave comments!
  • See it rendered.
  • Much of this is inspired by the previous Temporal Ruby SDK (and its proposals which were moved to a sub directory) and our other SDKs.
  • The design of workflows and their handlers is an early one and we welcome any suggestions for alternative designs here. Nothing is concrete.
  • There are plenty of outstanding questions on the proposal, some of which need more implementation and performance measurements to decide.
  • Use this PR for comments/discussion. Don't be afraid of making too many comments; if it gets too noisy, I'll just create another PR.
  • Any points of contention will be discussed in-person by the SDK team and a decision will be made.

For general discussion beyond comments here, please also join us on #ruby-sdk in Slack.

@cretz cretz requested a review from a team October 24, 2024 12:18
* `Temporalio::Workflow` is both the class to extend and the class with class methods for calling statically to do
things.
* The `workflow_` class methods on the `Temporalio::Workflow` can only be called at class definition time and cannot
be called on `Temporalio::Workflow` directly. These are for adjusting the definition.
Copy link

@dandavison dandavison Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user writing client code writes something like

supported_languages = wf_handle.query(
  GreetingWorkflow.<TAB>

will their IDE show them all the "framework" methods (including the unavailable methods starting workflow_)? It is nice that in other languages the user's IDE only offers the methods from their own workflow interface. I think we would like to have the same be true for Ruby if possible.

Copy link
Member Author

@cretz cretz Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I am afraid this is only available at runtime (via method_missing). Ruby does not allow you to access an instance method at the class level.

I think we would like to have the same be true for Ruby if possible.

Completely agree, but I cannot think of a way to do this without some form of proxying which were trying to avoid for a few reasons (mostly because we'd have to lie about what the type actually is). And even if we did do proxying, Ruby is often not expressive enough to represent this well statically. Here may be some other options, but all have more tradeoffs than the current approach I think.

class Workflow
  # @return [self]
  def self.stub
    StubDelegator.new(self)
  end
end

# This if you want to keep accurate with typing, but no clear way to provide
# handle and args
my_wf = MyWorkflow.stub
my_wf.update('my arg') 

# This if you want better options, but you'd fail a type checker
my_wf_handle.update(MyWorkflow.stub.update, 'my arg')

# You could use blocks, but it gets confusing for users too
my_wf_handle.update(wait_for_stage: whatever) { |stub| stub.update('my arg') }

Open to suggestions here, but I fear Ruby just cannot statically express things to the IDE like this. Ruby developers are used to having to use symbols or runtime methods that are not in their IDE. Of course a user can provide YARD macros or Sorbet/RBI types to support this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've probably already implicitly explained this, but in Python this is done by implementing static things like @query decorators as functions on a workflow module. Can you (re-)summarize why an approach like that doesn't work for Ruby?

Copy link
Member Author

@cretz cretz Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python you can reference an instance method via Class.instance_method, but in Ruby you cannot. You must have an instance of a class to reference an instance method on it. FooWorkflow.my_query_method will give a NoMethodError. You can do hacky things like FooWorkflow.instance_method(:my_query_method) to get an unbounde method, but you're into reflection there not IDE help.

We will support FooWorkflow.my_query_method via method_missing approaches, but that is runtime, not definition time like IDEs need. But I'm open to ideas here. There are concepts of "stubs" we have used in Java, but they have many of their own problems.

I am investigating https://github.com/Shopify/tapioca#writing-custom-dsl-compilers to see if we may be able to write a custom DSL compiler/extension to help here.

french: 'Bonjour, monde',
hindi: 'नमस्ते दुनिया',
portuguese: 'Olá mundo',
spanish: '¡Hola mundo'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bugged me:

Suggested change
spanish: '¡Hola mundo'
spanish: '¡Hola mundo!'

None of the others have an exclamation mark though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! temporalio/samples-python#149

I removed the exclamation marks from all of them after I realized I had it on the wrong end for Arabic... but apparently I missed the leading one in Spanish.

Comment on lines +321 to +322
* Notice lack of `timeout` on `wait_condition`, we will support Ruby `Timeout` via the fiber scheduler.
* Notice lack of `sleep`, we will support Ruby `sleep` via the fiber scheduler.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree with these. I thought we were gaining some consensus that the "magic" patched APIs aren't terribly helpful long-run and end up confusing users rather than teaching them that workflows have special constraints. Inevitably users need to learn this, so why delay that? If the main counter argument is that you can re-use some existing code, I know I've heard agreement from you and others before that that's very rare to actually be viable.

I think it'd be good to instead error on these calls and suggest our provided alternatives instead.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm. Ruby explicitly delegates these two things to the scheduler: https://docs.ruby-lang.org/en/master/Fiber/Scheduler.html#method-i-timeout_after and https://docs.ruby-lang.org/en/master/Fiber/Scheduler.html#method-i-kernel_sleep.

The concern with erroring on these is that we are probably going to recommend use of the https://github.com/socketry/async library (and definitely heavily test for its use) because Ruby doesn't have good async primitives and that library leans on existing fibers. So users expect to be able to call https://socketry.github.io/async/source/Async/Task/index.html#Async::Task#with_timeout and https://socketry.github.io/async/source/Async/Task/index.html#Async::Task#sleep (now deprecated because they too encourage stdlib sleep).

This is the same type of thing with Queue where Ruby built in support for customized async inside it so people can use it without concern. I don't expect us to say "queue.pop confuses users and we should have our own queue" either.

In this case, I don't think these are considered "magic patched", I think it's expected by users that they work.

If the main counter argument is that you can re-use some existing code, I know I've heard agreement from you and others before that that's very rare to actually be viable.

I think that's true elsewhere, but in Ruby, the lack of async primitives means we have to either write a ton of them, or allow use of the common async library built on the same fiber abstraction we're building on (almost all of that library's primitives will work based on research so far). But open to options here. I think we need to pick between:

  • Allow/encourage use of https://github.com/socketry/async
  • Write our own primitives (e.g. Mutex and Notification and such) and disallow/discourage use of that library

We already have the basic future primitive, so technically the second option is viable, but the proposal currently is expecting the first. (pardon the long response)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yeah I can buy the argument about socketry using these under the hood.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #96 (comment), maybe we don't want to encourage this async lib after all. It does put too much trust in their deterministic fiber scheduler use (even across versions).

Comment on lines +331 to +332
* ❓ Is this acceptable? This is a consequence of us wanting to share workflow context class with workflow base
class. Basically at the top of these calls there's a `if self != Workflow` check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty much fine, but... why not just have a WorkflowContext class and be explicit? Or, have that be Workflow and the other be WorkflowDefinition

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is definitely a good question and subject to debate and is the reason I put "Temporalio::Workflow is both the class to extend and the class with class methods for calling statically to do things." at the top of the controversial decisions.

So basically you have these options:

Option 1 - Together

class MyWorkflow < Temporalio::Workflow
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

Option 2 - Context class named separate

class MyWorkflow < Temporalio::Workflow
  def execute
    Temporalio::Workflow::Context.wait_condition { false }
  end
end

One benefit to this is that we do have Temporalio::Activity::Context that houses all of the activity things, but if you look at some other SDKs like Java and .NET, the class name is "activity context" but not "workflow context", so that disparity does exist elsewhere.

Option 3 - Workflow class named separate

class MyWorkflow < Temporalio::WorkflowImpl
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

I am using WorkflowImpl here because there will be a Temporalio::Workflow::Definition that represents the static definition built from the impl (and signal, query, and definition classes, same as there is already Temporalio::Activity::Definition). Open to a better name, I just struggled to come up with one.

Option 4 - Don't even require class extension

class MyWorkflow
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

You aren't buying that much with a class extension, but it is clear and is common in Ruby and is what we do for activities.

Thoughts/preference? Any other/better options here? Mixins would suffer the same class-for-two-purposes reuse issue.

Copy link
Member

@Sushisource Sushisource Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like option 3. Clear separation, just as terse as option 1 for the vast majority of the code..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, which is basically the same as option 2 just different on who gets the "Workflow" part. I will say I like "Context" for users more than I like "WorkflowImpl" for extenders, but I do like callers of workflow utilities to only need Temporalio::Workflow instead of Temporalio::Workflow::Context (and what drove my original approach). Either way, I am starting to lean towards separating what you extend from what you invoke. Any ideas for other names though besides either "Context" for workflow utilities or "WorkflowImpl" for the extension class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really lol. Those are pretty obvious choices. Maybe WorkflowDefinition like I suggested instead of impl. I would err on making that longer since you type it way less.

* 💭 Why? It is helpful for callers/client-side to be able to reference something more explicit than a symbol, and it
also helps with making sure if the name is changed we use that name instead of the method name. Symbols can always
be used though.
* ❓ Is this acceptable or too much magic?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of magic is largely fine because it's not really pretending some behavior is different, it's just advanced sugar for fetching the definition in a nice way. Works for me. The only sort of magic bit I suppose is that, if you call that, it's not invoking the method, but it would seem fairly obviously undesirable to do that anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It is not very common in Ruby to have class methods and instance methods of the same name with different behavior, though it does happen here and there. But I think our use case uniquely justifies it.

The bigger struggle is type safety integration, but Ruby type safety is in such poor state I 1) cannot find a good way to support caller-side type safety in RBS/Sorbet (unless relying on something like this which doesn't work in IDEs anyways), and 2) do not want to degrade ease of use by the majority of users using this as normal Ruby without type safety checks. Users can of course write their own signatures for these things, e.g. in RBS you might have:

class MyWorkflow < Temporalio::Workflow
  def self.my_update: -> Temporalio::Workflow::Definition::Update[MyReturnType, MyArgType]

  def my_update: (MyArgType arg) -> MyReturnType
end

I am open to any/all designs here. Requirements for others reading:

  • Workflow handlers for signal, query, and update must be easily implemented as instance methods
  • Workflow handlers for signal, query, and update must be easily usable by clients without having an instance of the workflow (we don't want fake/proxy objects where method invocation does different things)

Comment on lines +348 to +349
* Create a new `Temporalio::Converters::RawValue` object that is a pass-through object for raw values (still subject to
payload codecs), and a `payload_converter` is available in workflow/activity context to convert it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Python and .NET have these and they are very valuable (even for people not using dynamic, but just want to defer/control conversion for whatever reason)

Comment on lines +396 to +399
def self.any_of_no_raise(*futures)
# Returns future whose result is set to nil when all futures complete
# regardless of whether any future failed.
def self.all_of_no_raise(*futures)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are confusing. The wait and wait_no_raise method accomplishes the waiting & maybe raising part just fine.

From what I understand these are really more like different conditions for when any/all resolve. The first is more like any_success and the second is really all_of, where the current all_of is really try_all_of or something. IE: It looks like the originals short circuit on any failure, and these versions do not.

That or I'm not understanding them, because what's the difference between any_of_no_raise(...).wait and any_of_no_raise(...).wait_no_raise? The latter reads particularly strangely. Either way if I'm wrong or right, I think it says the names are a bit confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because what's the difference between any_of_no_raise(...).wait and any_of_no_raise(...).wait_no_raise?

No big difference if you have both of these calls together. But they have value independently (the first one returns a future the second you have chosen not to even be able to look at the failure) and have value as consistent shortcuts. I can't imagine anyone would call the latter combo, wait_no_raise will more often be used for people using singular futures they don't care about.

Most languages have forms of these. But I am open to better names though don't want to get too inconsistent with each other and don't want these to be arguments since they change the return type. Basic requirements I think are:

  • Ability to wait for first of multiple futures or raise on first failure
  • Ability to wait for first of multiple futures but do not raise on first failure
  • Ability to wait on all of multiple futures or raise on first failure
  • Ability to wait on all of multiple futures but do not raise on first failure
  • Ability to wait on a future or raise on failure
  • Ability to wait on a future and not raise on failure

This was my best attempt after many permutations to get those 6 methods in. Definitely open to clearer suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would then change the names a-la my suggestions - the most important part I think is just not to use no_raise in these names since it's not really about not raising, it's about not short-circuiting in the event one fails.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would then change the names a-la my suggestions

Do you have a specific suggestion on names? So instead of:

  • any_of
  • any_of_no_raise
  • all_of
  • all_of_no_raise
  • wait
  • wait_no_raise

reading comment above, something like:

  • any_of
  • any_of_success - says success but includes failure?
  • try_all_of - fails future if any fail?
  • all_of - does not fail future if any fail, so one "x_of" doesn't ail the future and one does?
  • wait
  • wait_no_raise

Maybe something like try instead of _no_raise, e.g.

  • any_of
  • try_any_of
  • all_of
  • try_all_of
  • wait
  • try_wait

Looking for symmetry in naming between messages that fail their things and ones that don't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the try names, but I think they should actually be the short-circuiting versions. IE: Existing any_of and all_of become the try_xxx versions, and the current _no_raise versions just become any_of and all_of. Since "try" implies "try until something breaks". This also matches the Rust naming scheme for these, FWIW.

I would probably keep wait_no_raise though rather than try there just because that's more obvious. And, I think, to my point, that is actually about raising or not where the other stuff is about short-circuiting.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, to me "try" means failure is captured or doesn't fail as it otherwise would, e.g. .NET int.Parse vs int.TryParse. One fails as normal, another captures failure (same as the actual try statement in many languages).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try here is like "try them, but fail if any fails soon" whereas without it is "definitely do all the futures even if some of them fail".

But, I wouldn't cry about having it the other way around, I think that can work too w/ clear docs. I do think wait and wait_no_raise should stay that way though.

* ❓ Is this an acceptable default? Python uses `[Etc.nprocessors, 4].max` as the max threads, but we figure it can
be unbounded here and remain bounded by the Core `max_concurrent_workflow_tasks` (TODO: expose this, it was left
off in `main`).
* ❓ Should the default share the same thread pool as activity executor?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, specifically because of some of the workflow isolation stuff that may or may not need to happen in them. Seems easier and cleaner to keep them separate without much downside.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference between thread pool and executor in this case. One benefit of reusing the thread pool (but still different executors of course) is that it is an "unbounded caching thread pool" which means the same threads could be reused across workflows and activities. The threads, once returned back to the pool, should have no side effects from its use. By having separate unbounded thread pools here, both pools will have idle threads that the other could be using unnecessarily. Definitely not an easy/hard thing to share vs keep separate because we will have lazily created global defaults for these (and already do for activity executor).

So the only argument against I can think of is maybe some kind of concern for user clarity, but my hope is that users never touch this stuff for the most part.

is that not only are Ractors not stable and may have bugs, but they also give off a warning. A workflow instance will be
in a Ractor and Core activations will be communicated.

* ❓ Should we enable this by default, disable this by default, or somehow force the user to make a choice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anyone cares about choosing between these two really. I think we need to figure out what has acceptable performance and correctness and do that. The warning / experimental status for Ractors seems like a nonstarter, though.

But, other than that, I can't really imagine why someone is coming along and deciding "Oh I really need to use Ractors and not TracePoint" for some reason. The only thing they care about is that the workflows run fast & errors get caught.

That is, of course, excepting some other weird interactions that aren't mentioned here or we don't know about.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning / experimental status for Ractors seems like a nonstarter, though.

I am not sure this is true. The Ractor gives so much benefit to ensuring safe determinism, it's basically a language-supported sandbox. And if we don't enable them by default (or force the user to choose whether to use them for now), we never can enable them by default. And they are quite decent for now for limited uses from my understanding.

I can't really imagine why someone is coming along and deciding "Oh I really need to use Ractors and not TracePoint" for some reason

This section is purely for Ractors, I think we will turn on TracePoint by default.

It is a very tough call here on whether to enable Ractors by default, because Ractors are a perfect fit for this use case and we would support turning them off (at the worker level), and if we have a non-Ractor default now, we will never be able to change to the safer Ractor default if we want to later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never can enable them by default.

So why is that true?

I think the warning isn't so much the problem as the content of the warning that basically says "these are buggy". If that's not actually true, then cool, I think we can be fine with it, but, we should know that for sure.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why is that true?

To clarify I mean "if we want this as default, we have to do the more restrictive default now because we can't later" because you can't go from a less-restrictive default to a more-restrictive default without breaking people. Ractors prevent state sharing (e.g. a global is different in each Ractor). Also, things that are shared across Ractors must be marked as such via https://docs.ruby-lang.org/en/master/Ractor.html#method-c-make_shareable (which is an awesome feature and we already require this be done with payload and failure converters knowing they may be used inside sandbox). So turning on Ractors by default when they weren't before would break a lot of people that were, say, using their logger or OTel impl or other things people sometimes opt out of sandboxes for.

I think the warning isn't so much the problem as the content of the warning that basically says "these are buggy". If that's not actually true, then cool, I think we can be fine with it, but, we should know that for sure.

We definitely have to test here, and they may be buggy for some advanced uses and maybe not our own. But even if that's the case, defaulting to experimental thing is rough and it spits out a warning, even though plenty of Ruby users do use Ractors today. Even if we default, we have to make it very easy to change and document it very clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see. But arguably they already need to not be doing all that stuff for proper workflows anyway?

I think, if we find that we're not running into implementation bugs, it can be an OK default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see. But arguably they already need to not be doing all that stuff for proper workflows anyway?

Usually, yes, but some things people accept they need to cheat (e.g. interceptor sending exception to sentry when not replaying, or a logger that writes to disk but is not the logger we offer)

I think, if we find that we're not running into implementation bugs, it can be an OK default.

I'm leaning there too. But it will be super loud that we use them and how not to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to force users to explicitly specify whether to use Ractors or not.

If we think we might want to change the default in the future, I think it's fine to require users to explicitly specify it now and remove the requirement later (so long as we're not asking them to do that for too many things). This enables us to migrate to the more restrictive default for future new users without breaking existing users.

We did something similar for Update IIRC.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, no SDK requires extra options for running workflows. I think if we choose Ractors we can change the default to be less restrictive if we ever had to (though I don't suspect we ever would).

We did something similar for Update IIRC.

This is not comparable I don't think because we know we will provide a default in the future but it is not yet implemented. Here, we don't know, so this isn't necessarily temporary. I think we might as well decide now, because in 5 years I don't think our decision would change.

I'm leaning towards the safer-though-technically-warns-as-experimental default that is easy to opt out of. Obviously it will be a heavily tested default.

Copy link

@josh-berry josh-berry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general non-blocking comments, no need to wait for my approval.

Comment on lines +373 to +375
* ❓ Why not just take a dependency on `async` gem? It does have a deterministic scheduler and does a decent job of
everything we want, but not only do we want to avoid transitive dependencies in our library (for versioning and
other reasons), we cannot rely on its logic into the future.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is asyncs deterministic behavior actually part of its contract or no? (I'm thinking of how we got burned by Python's change in behavior for asyncio; I imagine you are too)

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is asyncs deterministic behavior actually part of its contract or no?

Not explicitly, but most of the things it does are delegated to the underlying fiber scheduler. The other problem is not just that it is deterministic, but that it also doesn't make changes that would be incompatible (e.g. schedule two fibers in a different order in a newer version of the library).

Concerning determinism, they do say as a comment inside https://github.com/socketry/async/blob/72fae62658b68e85d53d4c6adf13d3b34108cfe8/lib/async/scheduler.rb#L428-L435 that they intentionally try to preserve determinism, but obviously a code comment is not a clear contract (and that's their scheduler, we'll have our own).

It's a tough call, because it's a full featured library with lots of utilities that leans on the deterministic fiber scheduler, and we surely don't want to recreate every construct they have already created. But maybe we don't want to encourage use of this project? Java SDK gets along just fine with its primitive Promise support. We may still want to lean on traditional Ruby things like queues, timeouts, and sleep I think, but maybe not (see #96 (comment)). Regardless, we'd have to write our own mutex (we're already writing our own future).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would be a much easier call if determinism were an explicit design goal, but since it's not as such, I think it's probably safer to avoid the dependency for now. Perhaps we could align our own API surface with its API so we could switch later if there is user demand? That might also help reduce the cognitive overhead of learning Temporal if the user already knows async.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably safer to avoid the dependency for now

To clarify, we would never have a dependency, just that it would happen to work and we could recommend its use (and we'd test that it does in our tests). Having said that, I think I agree maybe we should not even recommend its use or discuss it.

Perhaps we could align our own API surface with its API so we could switch later if there is user demand

It's too large. I think we can have Future (what they call Task) and Mutex manually implemented, and then decide whether to explicitly implement sleep and Timeout and Queue or allow reuse of the standard library ones (knowing in Ruby this isn't the same as leaning on a large set of async primitives, these are 3 very limited low-level common Ruby things that will not change behavior).

I think I'm leaning towards no async library recommendation, Future + Mutex implemented, sleep + Timeout + Queue just work from the standard library. We haven't really needed more than this in any other language.

Comment on lines +490 to +495
* `Temporalio::Worker::WorkflowTaskExecutor::ThreadPool` that accepts a thread pool which defaults to max threads
unbounded.
* There is a `default` class method with a lazily created global default.
* ❓ Is this an acceptable default? Python uses `[Etc.nprocessors, 4].max` as the max threads, but we figure it can
be unbounded here and remain bounded by the Core `max_concurrent_workflow_tasks` (TODO: expose this, it was left
off in `main`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhmmm, unbounded thread creation seems scary to me. I think I'm OK leaving it bounded by max_concurrent_workflow_tasks as long as it's actually designed that way intentionally and it's not just something that happens by accident (e.g. it should be explicit in our code/code comments that this is the intention).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bounded by that from Core. But that is per worker and the thread pool is shared across workers for obvious benefits. So yes, Core ensures it cannot be more than the sum of max concurrent tasks across all workers using the thread pool.

is that not only are Ractors not stable and may have bugs, but they also give off a warning. A workflow instance will be
in a Ractor and Core activations will be communicated.

* ❓ Should we enable this by default, disable this by default, or somehow force the user to make a choice

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to force users to explicitly specify whether to use Ractors or not.

If we think we might want to change the default in the future, I think it's fine to require users to explicitly specify it now and remove the requirement later (so long as we're not asking them to do that for too many things). This enables us to migrate to the more restrictive default for future new users without breaking existing users.

We did something similar for Update IIRC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants