Suggestion of multiprocess mechanism in MultiThreadedAugmenter #100

Open
SeanCho1996 opened this issue Jul 8, 2022 · 0 comments


Hi, I noticed that the finish procedure in MultiThreadedAugmenter ends its child processes by calling the terminate() method of Process, which sends SIGTERM.

if len(self._processes) != 0:
    logging.debug("MultiThreadedGenerator: shutting down workers...")
    [i.terminate() for i in self._processes]

In my project, the main process installs a sigterm-handler, which is meant to stop the process via SIGTERM at the end of training, as shown below:

def _sigterm_handler(_signo, _stack_frame):
    # logger.warn is a deprecated alias; logger.warning is the supported name
    logger.warning("Terminal signal received: %s, %s" % (_signo, _stack_frame))
    stop_worker()
    exit(0)

However, this conflicts with MultiThreadedAugmenter's use of terminate(). When a child process is forked, it inherits the signal handlers of the main process, including my sigterm-handler. As a result, the SIGTERM that MultiThreadedAugmenter sends to shut down its workers is caught by that handler, which directly ends my training process.

A temporary workaround I came up with is to reset SIGTERM handling in the child process to the default, signal.SIG_DFL, so that the SIGTERM sent to the child does not trigger the sigterm-handler inherited from the main process. This only requires adding one line at the beginning of the producer() function:

def producer(queue, data_loader, transform, thread_id, seed, abort_event, wait_time: float = 0.02):
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    ...
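
To make the behaviour easier to reproduce outside of batchgenerators, here is a minimal standalone sketch. It is not the library's code: worker(), exitcode_after_terminate() and HANDLER_EXIT_CODE are hypothetical names used only for illustration, and it assumes a POSIX system where the "fork" start method is available. The handler exits with a marker code so that the child's exit code shows whether the inherited handler ran.

```python
import multiprocessing as mp
import signal
import sys
import time

# Hypothetical marker value: lets us see from the exit code that the handler ran.
HANDLER_EXIT_CODE = 7


def _parent_sigterm_handler(signo, frame):
    # Stand-in for the main process's sigterm-handler described above.
    sys.exit(HANDLER_EXIT_CODE)


def worker(ready, reset_sigterm):
    # Stand-in for producer(): optionally restore the default SIGTERM action first.
    if reset_sigterm:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
    ready.set()  # tell the parent that setup is done
    while True:
        time.sleep(0.05)


def exitcode_after_terminate(reset_sigterm):
    # Fork a worker while the handler is installed, terminate it, report its exit code.
    ctx = mp.get_context("fork")  # assumption: fork is available (not on Windows)
    previous = signal.signal(signal.SIGTERM, _parent_sigterm_handler)
    try:
        ready = ctx.Event()
        proc = ctx.Process(target=worker, args=(ready, reset_sigterm))
        proc.start()
        ready.wait(timeout=10)
        proc.terminate()  # sends SIGTERM to the child
        proc.join(timeout=10)
        return proc.exitcode
    finally:
        # Put back whatever SIGTERM handler the caller had before.
        signal.signal(signal.SIGTERM, previous)
```

Calling exitcode_after_terminate(False) should report the handler's marker code, because the forked child ran the inherited handler. Calling exitcode_after_terminate(True) should report a negative exit code, which is how multiprocessing indicates that the child was killed by the signal's default action.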

Could a similar change be added to the source code, so that signals sent to the child processes do not affect the main process?

Thank you
