Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add running_workflows to ScheduleListInfo, for ListSchedules #464

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lina-temporal
Copy link
Contributor

@lina-temporal lina-temporal commented Oct 16, 2024

What changed?

  • A new field RunningWorkflows was added to ScheduleListInfo, matching the field in ScheduleInfo.

Why?

  • Customers would like to be able to drill down into running workflow executions without having to follow a ListSchedule call with fan-outs to DescribeSchedule).

Breaking changes

  • None here

Server PR

// Running workflows returned here are eventually consistent, and their
// status may be out-of-date. For a strongly consistent view of a schedule's
// running workflows, use the `DescribeSchedule` API.
repeated temporal.api.common.v1.WorkflowExecution running_workflows = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This goes in the memo so I think we should keep it as short as possible.
  2. Part of the request was to include the close state of recent workflows, right? Is that going to be another change?

I'm thinking that for the close state, we can put it in ScheduleActionResult and fill it in when we know it.

For running workflows... since we don't track allowall anymore, there will be 0 or 1 running workflows, and if it's 1, it'll be the most recent action. So instead of repeating the wf id and run id from recent_actions here, we can just use a flag in the ScheduleActionResult for whether it's running or not? But actually the close state enum is just that: if it's not set yet, then it's still running, if it's set to completed/failed/timedout/cancelled/terminated, it's not running.

(If we do track running workflows from allowall, then that's not enough since there could be an old one still running. But we have no plans to do that right now so maybe we can wait until we do.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the request was to include the close state of recent workflows, right? Is that going to be another change?

Discussed during standup, I'll do this as part of this change.

I'm thinking that for the close state, we can put it in ScheduleActionResult and fill it in when we know it.

Sounds good, will do!

So instead of repeating the wf id and run id from recent_actions here, we can just use a flag in the ScheduleActionResult for whether it's running or not?

Sounds good in terms of state, I'll update. For the API, however, that would make the ListSchedulesResult distinct compared to DescribeSchedule (which has the separate running_workflows field), which doesn't seem great from a UX perspective; what do you think about populating a running_workflows field in API responses based on the flag in our mutable state memo's ScheduleActionResult?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the reverse question, the presence of the new field in Describe: If we just add the extra field to the action result and fill it in in the state, it'll automatically come out in Describe (query) as well as List, and I think that's good, it's slightly more information than is there now (and doesn't need any versioning in the workflow).

For your actual question, about a running_workflows list in List: I'm concerned that we won't always have enough information to fill it in. E.g. if we use the state machine impl and have it track a limited number of concurrent workflows that aren't the latest (from AllowAll), and supply them in Describe, that's great, but we might not be able to fit them in the memo (state machines don't help there at all), so we'd still be inconsistent. Except then also misleading.

Or we might put a bunch in the memo, but only as int timestamps (to save space) and then reconstruct the workflow ids. But we wouldn't have the run ids.

So I'm leaning towards no, just the enum in recents in list results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For running workflows... since we don't track allowall anymore, there will be 0 or 1 running workflows, and if it's 1, it'll be the most recent action.

For the state machine rewrite this may not be true anymore but the current structure would still work for this, we'd just need to ensure we limit the tracked action count.

This all should have probably gone in a oneof to prepare for when we have other scheduled actions and for consistency with ScheduleAction but that may be too late.

Here's how I think it should have been done:

message ScheduleActionResult {
    // Time that the action was taken (according to the schedule, including jitter).
    google.protobuf.Timestamp schedule_time = 1;
    // Time that the action was taken (real time).
    google.protobuf.Timestamp actual_time = 2;

    message Workflow {
        temporal.api.common.v1.WorkflowExecution start_workflow_result = 1;
        temporal.api.enums.v1.WorkflowExecutionStatus status = 2;
    }
    
    oneof variant {
        Workflow workflow = 3;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergundy I like the structure suggestion. I think it's probably too late for the existing API, but maybe we could consider introducing a V2 API when we support different schedule action types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. I approved the PR already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed the oneof is cleaner, maybe it's not too late though... we could do:

message ScheduleActionResult {
    // Time that the action was taken (according to the schedule, including jitter).
    google.protobuf.Timestamp schedule_time = 1;
    // Time that the action was taken (real time).
    google.protobuf.Timestamp actual_time = 2;

    message WorkflowExecutionWithStatus {  // superset of common.WorkflowExecution
        string workflow_id = 1;
        string run_id = 2;
        temporal.api.enums.v1.WorkflowExecutionStatus status = 3;
    }

    oneof variant {
        // If action was start_workflow:
        WorkflowExecutionWithStatus workflow = 11;
    }
}

That's fully backwards compatible at the proto level, though source code will have to change.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this latest solution of just adding the status much better

// Running workflows returned here are eventually consistent, and their
// status may be out-of-date. For a strongly consistent view of a schedule's
// running workflows, use the `DescribeSchedule` API.
repeated temporal.api.common.v1.WorkflowExecution running_workflows = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed the oneof is cleaner, maybe it's not too late though... we could do:

message ScheduleActionResult {
    // Time that the action was taken (according to the schedule, including jitter).
    google.protobuf.Timestamp schedule_time = 1;
    // Time that the action was taken (real time).
    google.protobuf.Timestamp actual_time = 2;

    message WorkflowExecutionWithStatus {  // superset of common.WorkflowExecution
        string workflow_id = 1;
        string run_id = 2;
        temporal.api.enums.v1.WorkflowExecutionStatus status = 3;
    }

    oneof variant {
        // If action was start_workflow:
        WorkflowExecutionWithStatus workflow = 11;
    }
}

That's fully backwards compatible at the proto level, though source code will have to change.


// If the action was start_workflow, this field will reflect an
// eventually-consistent view of the started workflow's status.
temporal.api.enums.v1.WorkflowExecutionStatus status = 12;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do with this extra field and not an embedded message, we should name it workflow_status or started_workflow_status so it's clear it's just for workflows, if there are more action types in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants