Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched: fix deadline init value and add a test case #69429

Closed
wants to merge 3 commits into from

Conversation

TaiJuWu
Copy link
Member

@TaiJuWu TaiJuWu commented Feb 25, 2024

The init value of deadline scheduler algorithm should be INT_MAX.
If this value set to 0, it means normal task is going to run before deadline task.

@TaiJuWu TaiJuWu changed the title sched: fix deadline init value and a test case sched: fix deadline init value and add a test case Feb 26, 2024
@TaiJuWu TaiJuWu force-pushed the fix_deadline branch 2 times, most recently from eacf7cd to 9cb3cc9 Compare February 26, 2024 07:12
@TaiJuWu TaiJuWu marked this pull request as draft February 26, 2024 07:41
@TaiJuWu TaiJuWu marked this pull request as ready for review February 26, 2024 07:53
@TaiJuWu
Copy link
Member Author

TaiJuWu commented Feb 27, 2024

Hi @andyross @peter-mitsis @cfriedt, Could you take a look?
I think this is a critical bug about deadline scheduler if I'm right.

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TaiJuWu - the provided test definitely needs to be elaborated a bit more and it looks like there are still some races.

I haven't looked to see if we already have sufficient coverage for the deadline scheduler but it might be worth looking at how some of the other scheduler experiments have been set up.

Note: the argument to k_thread_deadline_set() has the same units as the value returned from k_cycle_get_32(), so that can be used to form the experiment in terms of time durations, which is maybe easier to deal with in terms of setting up the right conditions.

From what I understand, the lower the deadline, the earlier a thread must be scheduled.

If we're lacking a kind of stress test for the deadline scheduler it might be useful to hash that out before revising.

@TaiJuWu
Copy link
Member Author

TaiJuWu commented Feb 27, 2024

@TaiJuWu - the provided test definitely needs to be elaborated a bit more and it looks like there are still some races.

The test environment is 1 cpu so it's not race.
You can see here.

I haven't looked to see if we already have sufficient coverage for the deadline scheduler but it might be worth looking at how some of the other scheduler experiments have been set up.

Note: the argument to k_thread_deadline_set() has the same units as the value returned from k_cycle_get_32(), so that can be used to form the experiment in terms of time durations, which is maybe easier to deal with in terms of setting up the right conditions.

From what I understand, the lower the deadline, the earlier a thread must be scheduled.

You are right, I have same idea but the init valuse is 0.
here is the place which user set deadline and always larger than 0.
The code compare priority is here so you can find the normal task will run first.

If we're lacking a kind of stress test for the deadline scheduler it might be useful to hash that out before revising.

@peter-mitsis
Copy link
Collaborator

@TaiJuWu - After an initial reading, I think that you are on to something here. However, my poor brain is going to need some more time to parse through this.

@TaiJuWu
Copy link
Member Author

TaiJuWu commented Feb 27, 2024

@TaiJuWu - After an initial reading, I think that you are on to something here. However, my poor brain is going to need some more time to parse through this.

Thanks for your response.
If I’m wrong, please tell me.
Maybe this patch is a bad solution or I missed something.

Copy link
Collaborator

@peter-mitsis peter-mitsis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

  1. Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.
  2. The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

If normal and deadline task coexist, the deadline task should execute
first.

Signed-off-by: TaiJu Wu <[email protected]>
@TaiJuWu
Copy link
Member Author

TaiJuWu commented Feb 28, 2024

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

  1. Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.
  2. The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

Thanks for your answers. I need more time to think this issue more deep.

And I want to check a thing which is normal task should run after deadline task, right?
In my new testcase, the normal task will run first and deadline task start to run. It seems not our expection.

The init value of deadline scheduler algorithm should be INT_MIN.

Case study 1: both tasks are deadline task.
These condition already include test/kernel/sched/deadline

In below, I will use 4-bit to replace 32-bit operation.

Case study 2: T1 is deadline task, T2 is also deadline task and  will make overflow
Example:
If T1's deadline is 0100 -> both signed and unsingend is 4.
If T2's deadline is 1001 (uint4_t + int4_t) -> signed is -7 and unsigned is 9.
(uint4_t)T2-(uint4_t)T1 = 9 -2 = 7 -> 0111 > 0. T1 execute first.

Case study 3: T1 is normal task and T2 is deadline task.
Example:
If normal task deadline is INT_MIN(1000 in this case.)
If deadline is 0100->both signed and unsigned is 4.
(uint4_t)T2 - (uint4_t)T1 = 0100 - 1000 = -4 < 0. T2 execute first.

Signed-off-by: TaiJu Wu <[email protected]>
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the way deadline comparisons work. The units of the deadline field are in units of cycles (c.f. k_cycle_get_32()) with expected rollover. They aren't absolute. Picking a different number just means that the scheduling vs. threads that don't set their deadlines will be arbitrarily different.

Really the kernel doesn't support mixing deadline and non-deadline threads in the same priority. If you have EDF enabled, you're expected to set deadlines for threads that want to share a priority, or else to leave them unset (and thus equal, which then falls through to "traditional" round robin ordering).

FWIW, one thing that's been suggested in the past is having a way to "clear" a deadline (i.e. to recover equality/round-robin scheduling behavior), which the kernel doesn't provide. IIRC I had suggested at one point doing this when the priority changes, but you could add a separate call instead. If you're worried about this area of the code, that might be more productive.

@TaiJuWu
Copy link
Member Author

TaiJuWu commented Feb 29, 2024

This isn't the way deadline comparisons work. The units of the deadline field are in units of cycles (c.f. k_cycle_get_32()) with expected rollover. They aren't absolute. Picking a different number just means that the scheduling vs. threads that don't set their deadlines will be arbitrarily different.

Really the kernel doesn't support mixing deadline and non-deadline threads in the same priority. If you have EDF enabled, you're expected to set deadlines for threads that want to share a priority, or else to leave them unset (and thus equal, which then falls through to "traditional" round robin ordering).

FWIW, one thing that's been suggested in the past is having a way to "clear" a deadline (i.e. to recover equality/round-robin scheduling behavior), which the kernel doesn't provide. IIRC I had suggested at one point doing this when the priority changes, but you could add a separate call instead. If you're worried about this area of the code, that might be more productive.

I want to summarize your response.
In zephyr CONFIG_DEADLINE enabled, all threads should set deadline?
If not set, the thread will run first, right?
So all behavior is working as your expect?

IMHO, if users does not set deadline, a possible is they hope this task start running until deadline task finished.

@TaiJuWu
Copy link
Member Author

TaiJuWu commented Mar 2, 2024

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

  1. Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.
  2. The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

I notice a truth which is if k_cycle_get_32() exceeds 0x80000000 and the deadline is smaller than 0 (This is allowed because we don't check the parameter of deadline).
We can't distinguish the job should be done in future or in past.

Copy link

github-actions bot commented May 2, 2024

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label May 2, 2024
@github-actions github-actions bot closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants