sched: fix deadline init value and add a test case #69429

TaiJuWu · 2024-02-25T17:04:53Z

The init value of deadline scheduler algorithm should be INT_MAX.
If this value set to 0, it means normal task is going to run before deadline task.

TaiJuWu · 2024-02-27T07:21:17Z

Hi @andyross @peter-mitsis @cfriedt, Could you take a look?
I think this is a critical bug about deadline scheduler if I'm right.

cfriedt

@TaiJuWu - the provided test definitely needs to be elaborated a bit more and it looks like there are still some races.

I haven't looked to see if we already have sufficient coverage for the deadline scheduler but it might be worth looking at how some of the other scheduler experiments have been set up.

Note: the argument to k_thread_deadline_set() has the same units as the value returned from k_cycle_get_32(), so that can be used to form the experiment in terms of time durations, which is maybe easier to deal with in terms of setting up the right conditions.

From what I understand, the lower the deadline, the earlier a thread must be scheduled.

If we're lacking a kind of stress test for the deadline scheduler it might be useful to hash that out before revising.

TaiJuWu · 2024-02-27T08:49:04Z

@TaiJuWu - the provided test definitely needs to be elaborated a bit more and it looks like there are still some races.

The test environment is 1 cpu so it's not race.
You can see here.

I haven't looked to see if we already have sufficient coverage for the deadline scheduler but it might be worth looking at how some of the other scheduler experiments have been set up.

Note: the argument to k_thread_deadline_set() has the same units as the value returned from k_cycle_get_32(), so that can be used to form the experiment in terms of time durations, which is maybe easier to deal with in terms of setting up the right conditions.

From what I understand, the lower the deadline, the earlier a thread must be scheduled.

You are right, I have same idea but the init valuse is 0.
here is the place which user set deadline and always larger than 0.
The code compare priority is here so you can find the normal task will run first.

If we're lacking a kind of stress test for the deadline scheduler it might be useful to hash that out before revising.

peter-mitsis · 2024-02-27T16:14:09Z

@TaiJuWu - After an initial reading, I think that you are on to something here. However, my poor brain is going to need some more time to parse through this.

TaiJuWu · 2024-02-27T17:00:52Z

@TaiJuWu - After an initial reading, I think that you are on to something here. However, my poor brain is going to need some more time to parse through this.

Thanks for your response.
If I’m wrong, please tell me.
Maybe this patch is a bad solution or I missed something.

peter-mitsis

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.
The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

If normal and deadline task coexist, the deadline task should execute first. Signed-off-by: TaiJu Wu <[email protected]>

TaiJuWu · 2024-02-28T18:12:50Z

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.

The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

Thanks for your answers. I need more time to think this issue more deep.

And I want to check a thing which is normal task should run after deadline task, right?
In my new testcase, the normal task will run first and deadline task start to run. It seems not our expection.

The init value of deadline scheduler algorithm should be INT_MIN. Case study 1: both tasks are deadline task. These condition already include test/kernel/sched/deadline In below, I will use 4-bit to replace 32-bit operation. Case study 2: T1 is deadline task, T2 is also deadline task and will make overflow Example: If T1's deadline is 0100 -> both signed and unsingend is 4. If T2's deadline is 1001 (uint4_t + int4_t) -> signed is -7 and unsigned is 9. (uint4_t)T2-(uint4_t)T1 = 9 -2 = 7 -> 0111 > 0. T1 execute first. Case study 3: T1 is normal task and T2 is deadline task. Example: If normal task deadline is INT_MIN(1000 in this case.) If deadline is 0100->both signed and unsigned is 4. (uint4_t)T2 - (uint4_t)T1 = 0100 - 1000 = -4 < 0. T2 execute first. Signed-off-by: TaiJu Wu <[email protected]>

andyross

This isn't the way deadline comparisons work. The units of the deadline field are in units of cycles (c.f. k_cycle_get_32()) with expected rollover. They aren't absolute. Picking a different number just means that the scheduling vs. threads that don't set their deadlines will be arbitrarily different.

Really the kernel doesn't support mixing deadline and non-deadline threads in the same priority. If you have EDF enabled, you're expected to set deadlines for threads that want to share a priority, or else to leave them unset (and thus equal, which then falls through to "traditional" round robin ordering).

FWIW, one thing that's been suggested in the past is having a way to "clear" a deadline (i.e. to recover equality/round-robin scheduling behavior), which the kernel doesn't provide. IIRC I had suggested at one point doing this when the priority changes, but you could add a separate call instead. If you're worried about this area of the code, that might be more productive.

TaiJuWu · 2024-02-29T00:51:16Z

This isn't the way deadline comparisons work. The units of the deadline field are in units of cycles (c.f. k_cycle_get_32()) with expected rollover. They aren't absolute. Picking a different number just means that the scheduling vs. threads that don't set their deadlines will be arbitrarily different.

Really the kernel doesn't support mixing deadline and non-deadline threads in the same priority. If you have EDF enabled, you're expected to set deadlines for threads that want to share a priority, or else to leave them unset (and thus equal, which then falls through to "traditional" round robin ordering).

FWIW, one thing that's been suggested in the past is having a way to "clear" a deadline (i.e. to recover equality/round-robin scheduling behavior), which the kernel doesn't provide. IIRC I had suggested at one point doing this when the priority changes, but you could add a separate call instead. If you're worried about this area of the code, that might be more productive.

I want to summarize your response.
In zephyr CONFIG_DEADLINE enabled, all threads should set deadline?
If not set, the thread will run first, right?
So all behavior is working as your expect?

IMHO, if users does not set deadline, a possible is they hope this task start running until deadline task finished.

TaiJuWu · 2024-03-02T14:26:52Z

After digging into it more, I think the current code is correct for the model that it uses. There are two different time scales at play here which I will call T(x) and t(x). T(x) represents the lower 32-bits of absolute time (k_cycle_get_32()). t(x) represents the sliding window of relative time such that t(0) = T(now), or t(0) = T(k_cycle_get_32()). When we call k_thread_deadline_set(), the deadline is specified in relative time. Internally we convert it to T(x) absolute time using deadline + now.

Two other important things to note ...

Unlike T(x) which is 32 bits unsigned, t(x) is 32 bits signed.

The number of cycles between the “first” deadline in the scheduler queue and the “last” deadline must be less than 2^31

Combining this altogether, we see that when k_cycle_get_32() exceeds 0x80000000, this means that that T(0) effectively becomes a future point in time, and z_sched_prio_cmp() will treat it as such.

I notice a truth which is if k_cycle_get_32() exceeds 0x80000000 and the deadline is smaller than 0 (This is allowed because we don't check the parameter of deadline).
We can't distinguish the job should be done in future or in past.

github-actions · 2024-05-02T00:29:24Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

zephyrbot added the area: Kernel label Feb 25, 2024

zephyrbot requested review from andyross, ceolin, cfriedt, dcpleung, nashif, npitre and peter-mitsis February 25, 2024 17:05

zephyrbot assigned andyross Feb 25, 2024

TaiJuWu changed the title ~~sched: fix deadline init value and a test case~~ sched: fix deadline init value and add a test case Feb 26, 2024

TaiJuWu force-pushed the fix_deadline branch 2 times, most recently from eacf7cd to 9cb3cc9 Compare February 26, 2024 07:12

TaiJuWu marked this pull request as draft February 26, 2024 07:41

TaiJuWu marked this pull request as ready for review February 26, 2024 07:53

cfriedt reviewed Feb 27, 2024

View reviewed changes

peter-mitsis requested changes Feb 28, 2024

View reviewed changes

test: mix deadline and normal task

9e4e5da

If normal and deadline task coexist, the deadline task should execute first. Signed-off-by: TaiJu Wu <[email protected]>

TaiJuWu force-pushed the fix_deadline branch from 9cb3cc9 to 6b3dcb6 Compare February 28, 2024 21:07

TaiJuWu force-pushed the fix_deadline branch from 6b3dcb6 to 4764049 Compare February 28, 2024 21:11

TaiJuWu requested a review from peter-mitsis February 28, 2024 21:11

andyross requested changes Feb 28, 2024

View reviewed changes

allow empty

f4215c3

github-actions bot added the Stale label May 2, 2024

github-actions bot closed this May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sched: fix deadline init value and add a test case #69429

sched: fix deadline init value and add a test case #69429

TaiJuWu commented Feb 25, 2024

TaiJuWu commented Feb 27, 2024

cfriedt left a comment

TaiJuWu commented Feb 27, 2024 •

edited

Loading

peter-mitsis commented Feb 27, 2024

TaiJuWu commented Feb 27, 2024 •

edited

Loading

peter-mitsis left a comment

TaiJuWu commented Feb 28, 2024 •

edited

Loading

andyross left a comment

TaiJuWu commented Feb 29, 2024 •

edited

Loading

TaiJuWu commented Mar 2, 2024

github-actions bot commented May 2, 2024

sched: fix deadline init value and add a test case #69429

sched: fix deadline init value and add a test case #69429

Conversation

TaiJuWu commented Feb 25, 2024

TaiJuWu commented Feb 27, 2024

cfriedt left a comment

Choose a reason for hiding this comment

TaiJuWu commented Feb 27, 2024 • edited Loading

peter-mitsis commented Feb 27, 2024

TaiJuWu commented Feb 27, 2024 • edited Loading

peter-mitsis left a comment

Choose a reason for hiding this comment

TaiJuWu commented Feb 28, 2024 • edited Loading

andyross left a comment

Choose a reason for hiding this comment

TaiJuWu commented Feb 29, 2024 • edited Loading

TaiJuWu commented Mar 2, 2024

github-actions bot commented May 2, 2024

TaiJuWu commented Feb 27, 2024 •

edited

Loading

TaiJuWu commented Feb 27, 2024 •

edited

Loading

TaiJuWu commented Feb 28, 2024 •

edited

Loading

TaiJuWu commented Feb 29, 2024 •

edited

Loading