Destroy forcibly forked test using SIGKILL #211

Open
wants to merge 3 commits into
base: master

Conversation

tkowalcz
Contributor

@tkowalcz tkowalcz commented Aug 2, 2024

If the JVM of the monitored forked test hangs, Process.destroy might not be effective. On Unix systems it sends SIGTERM, which a hung JVM may never act on. An alternative is to call Process.destroyForcibly, which sends SIGKILL.

Using SIGKILL does not allow the monitored process to shut down cleanly, but if it has already timed out, it will probably never do any cleanup anyway.

Alternatively, the watchdog could try destroy first and then fall back to destroyForcibly.
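
To make the "destroy first, then destroyForcibly" alternative concrete, here is a minimal sketch using only the JDK `Process` API. It is not the code in this PR and not Ant's ExecuteWatchdog; the class name `GracefulThenForcibleKill`, the method `terminate`, and the grace period are hypothetical, and the `main` method assumes a Unix-like system with a `sleep` command.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only (not part of Ant).
public final class GracefulThenForcibleKill {

    /**
     * Requests normal termination, waits up to graceMillis for the process
     * to exit, and only then escalates to forcible termination.
     *
     * @return true if the forcible kill was needed, false otherwise
     */
    public static boolean terminate(Process process, long graceMillis)
            throws InterruptedException {
        // Polite request first, so a healthy JVM can still run its shutdown hooks.
        process.destroy();
        if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return false; // exited within the grace period
        }
        // Still alive: escalate. destroyForcibly() returns the same Process.
        process.destroyForcibly().waitFor();
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a Unix-like system where "sleep" is on the PATH.
        Process p = new ProcessBuilder("sleep", "60").start();
        boolean forced = terminate(p, 2_000);
        System.out.println("forced=" + forced + ", alive=" + p.isAlive());
    }
}
```

The trade-off is that a hung fork costs one extra grace period before it is killed, while a responsive fork still gets the chance to shut down cleanly.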

Comments?
Thanks!

@jaikiran
Member

jaikiran commented Aug 3, 2024

Hello Tomasz, can you tell us the details of the issue you are running into? That might help understand what change needs to be done.

@tkowalcz
Contributor Author

tkowalcz commented Aug 4, 2024

> Hello Tomasz, can you tell us the details of the issue you are running into? That might help understand what change needs to be done.

Absolutely. I just wanted to get the conversation started. Thanks for taking the time to reply.

When using junitlauncher with a timeout:

<junitlauncher
    taskname="JUnit5"
    haltonfailure="${junit.haltonfailure}"
    failureproperty="junit.failures"
    printsummary="false">
    ...
    <fork timeout="${junit.timeout}">
        ...
    </fork>
</junitlauncher>

it sets up an ExecuteWatchdog that terminates the forked process once the timeout elapses. If the forked JVM is very busy (e.g. running back-to-back GCs), it does not terminate: the signal handler is installed, but the JVM fails to act on the signal. The only option then is to send SIGKILL.
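
As a rough illustration of the timeout-then-kill behaviour described above, the sketch below runs a child process with a deadline and kills it forcibly when the deadline passes. It uses only the JDK `Process` API and is not how ExecuteWatchdog is implemented; `ForkTimeoutSketch` and `runWithTimeout` are hypothetical names.

```java
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, for illustration only (not Ant's ExecuteWatchdog).
public final class ForkTimeoutSketch {

    /**
     * Starts the process and waits up to timeoutMillis for it to finish.
     *
     * @return the exit code, or an empty OptionalInt if the process timed out
     *         and had to be killed forcibly
     */
    public static OptionalInt runWithTimeout(ProcessBuilder builder, long timeoutMillis)
            throws Exception {
        Process process = builder.start();
        if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return OptionalInt.of(process.exitValue());
        }
        // Timed out: do not rely on the child reacting to a polite signal.
        process.destroyForcibly().waitFor();
        return OptionalInt.empty();
    }
}
```

A caller could treat the empty result the way junitlauncher treats a timed-out fork, for example by recording a failure and moving on.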

In our case the JVM got stuck for a yet-to-be-discovered reason. Tools like jstack were unable to attach to it unless -Force option was used. The CI job running the test suite got stuck waiting for the Ant task to time out, but it never did. Eventually the job-level timeout in Jenkins kicked in and terminated the parent process.

I was able to verify the following: sending SIGINT to the forked JVM did not shut it down. Sending SIGKILL did, and junitlauncher then proceeded properly: it set failureproperty and continued if haltonfailure was set to false.

Since the test in question got stuck consistently, I was able to verify that after this change ExecuteWatchdog correctly terminates the forked process.

@jaikiran
Member

Hello Tomasz,

> Tools like jstack were unable to attach to it unless -Force option was used.

Were you able to get hold of a thread dump with -F?

The code dealing with process termination resides in a core layer of Ant and has been around for a long time. So having as many details as possible about what's causing this issue will help us understand whether this code deserves a change or whether we should address this in a different manner (perhaps in the junitlauncher task-specific code).
