Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new API to pause and resume the CPUs and handle it properly #6608

Merged
merged 3 commits into from
Jan 13, 2025

Conversation

neel-samsung
Copy link
Contributor

No description provided.

os/arch/arm/src/armv7-a/arm_assert.c Outdated Show resolved Hide resolved
Comment on lines 628 to 633
#ifdef CONFIG_SMP
/* If SMP is enabled then we need to resume all the other cpu's
* which we paused earlier.
*/
up_cpu_resume_all();
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right to resume all cpus before recovering the fault?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there will be two cases:

  1. If it is kernel fault, then the board will reboot. Resuming all CPUs won't affect this case.
  2. If the crash is in app binary, then we will be pausing all the CPUs inside recovery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Even we call reboot, why do we need to resume?
  2. Who(where) pauses the all cpus in recovery?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yeah, I agree. There is no point in resuming here for this case.
  2. As, we are not pausing the CPUs in recovery as of now, let me revert this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. When app fault is done in recovery, all the cpus should be resumed. You should add resuming at somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check it properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, we don't support recovery with SMP, let us add it separately while supporting recovery

os/arch/arm/src/armv7-a/arm_cpupause.c Outdated Show resolved Hide resolved
Comment on lines 427 to 448
* Returned Value:
* Zero on success; a negated errno value on failure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always returns OK. Need the return?

Copy link
Contributor Author

@neel-samsung neel-samsung Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we can make return type void.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You changed the return type to void, but this line is not changed.

os/arch/arm/src/armv7-a/arm_cpupause.c Outdated Show resolved Hide resolved
os/arch/arm/src/armv7-a/arm_cpupause.c Show resolved Hide resolved
os/arch/arm/src/armv7-a/arm_assert.c Outdated Show resolved Hide resolved
@@ -90,6 +90,14 @@ void mm_manage_alloc_fail(struct mm_heap_s *heap, int startidx, int endidx, size
{
irqstate_t flags = enter_critical_section();

#ifdef CONFIG_SMP
/* If SMP is enabled then we need to pause all the other cpu's immediately
* because the kernel state might be invalid at this point. If we don't pause
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think the kernel state will be invalid. Let us just mention that we pause the other cpu to avoid mixed logs.

This patch adds new API to pause and resume all the cpus other than current cpu.

Signed-off-by: neel-samsung <[email protected]>
…nt log mixup

This patch adds code to pause all the CPUs while printing the memory allocation
failure logs. It immediately resume the paused cores after printing it.

Signed-off-by: neel-samsung <[email protected]>
@sunghan-chang
Copy link
Contributor

@r-prabu @kishore-sn please review and merge this.

/* If SMP is enabled and there is a crash, then we need to pause all
* the other cpu's immediately because the kernel state might be
* invalid at this point. If we dont pause other cpu's then it might
* lead to multiple asserts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is not valid. Let us change as follows:
"Pause all other cpus to avoid mix up of logs while printing asssert logs"

Earlier, we pause the CPUs only when the crash is in kernel space.
But now, regardless of the assert space, we pause all the other CPUs.
We also resume the CPUs just after the printing of assert logs.

Signed-off-by: neel-samsung <[email protected]>
@sunghan-chang
Copy link
Contributor

@ewoodev Could you let us know how to reproduce the issue?
@neel-samsung Once @ewoodev shares the way, could you confirm this PR?
@r-prabu And then let's merge this.

@ewoodev
Copy link
Contributor

ewoodev commented Jan 7, 2025

I havn't reproduct this issue.
but I think, you can make log mixing env with adding printing log cpu1 and making cpu0 alloc fail.

@sunghan-chang
Copy link
Contributor

I havn't reproduct this issue. but I think, you can make log mixing env with adding printing log cpu1 and making cpu0 alloc fail.

@neel-samsung Could you confirm this PR with the way @ewoodev suggested?

@neel-samsung
Copy link
Contributor Author

I havn't reproduct this issue. but I think, you can make log mixing env with adding printing log cpu1 and making cpu0 alloc fail.

@neel-samsung Could you confirm this PR with the way @ewoodev suggested?

yes, I have made cpu0 to mem alloc fail and on the other hand created new thread to run on the core1 to print the log.
There is no log mixup. Hence, the PR seems to be tested and ready to merge.

@sunghan-chang
Copy link
Contributor

@kishore-sn @r-prabu Let's merge

@kishore-sn kishore-sn merged commit 2dec2fe into Samsung:master Jan 13, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants