Commit

bugfix: pcontext.pids() returns a dict with the rank as key and the pid as value, not a set of pids, so we need to call values() to get all the worker pids
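A minimal illustration of the bug described above. It uses a made-up dict in place of the real pids() result, shaped as the commit message describes (rank as key, pid as value). Iterating a Python dict yields its keys, so a loop such as "for pid in self._pcontext.pids()" receives ranks rather than process ids. Rank 0 may be where the "process 0" that the old code had to skip came from, although the commit does not say so.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for PContext.pids(): local rank -> worker pid.
# The pid numbers are invented for illustration.
pids: Dict[int, int] = {0: 4101, 1: 4102, 2: 4103}

# Old loop shape: iterating a dict yields its keys, i.e. the ranks.
iterated: List[int] = [pid for pid in pids]

# Fixed shape from the commit: take the values and deduplicate with set().
worker_pids: Set[int] = set(pids.values())

print(iterated)             # [0, 1, 2]
print(sorted(worker_pids))  # [4101, 4102, 4103]
```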
majieyue committed Oct 5, 2024
1 parent df662a4 commit 7951837
Showing 1 changed file with 4 additions and 5 deletions.
dlrover/python/elastic_agent/torch/training.py
@@ -783,11 +783,10 @@ def _stop_workers(
         stop all the children processes before shutdown the workers
         """
         if self._pcontext is not None:
-            for pid in self._pcontext.pids():
-                logger.info(f"kill process {pid} and its sub processes")
-                if pid == 0:
-                    logger.info("skip invalid process 0")
-                    continue
+            pc_pids = set(self._pcontext.pids().values())
+            logger.info(f"try to kill child processes of %s", pc_pids)
+            for pid in pc_pids:
+                logger.info(f"kill child processes of process {pid}")
                 try:
                     pp = psutil.Process(pid)
                     cp = pp.children()
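A stdlib-only sketch of the fixed loop shape, assuming a POSIX system. It starts three sleeping Python processes as stand-ins for workers, builds a hypothetical rank-to-pid dict shaped like the one pids() is described as returning, and sends SIGTERM to each pid taken from values(). The real _stop_workers goes further and walks each worker's child processes with psutil, which this sketch deliberately leaves out.

```python
import os
import signal
import subprocess
import sys
from typing import Dict, List

# Stand-ins for worker processes: each child just sleeps.
procs = [
    subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    for _ in range(3)
]

# Hypothetical rank -> pid mapping, shaped like PContext.pids().
pids: Dict[int, int] = {rank: p.pid for rank, p in enumerate(procs)}

# Fixed loop shape: iterate over the pid values, never the rank keys.
for pid in set(pids.values()):
    print(f"stop process {pid}")
    os.kill(pid, signal.SIGTERM)

# Reap the children so no zombies are left behind; on POSIX a process
# killed by a signal reports a negative return code.
returncodes: List[int] = [p.wait(timeout=10) for p in procs]
print(returncodes)
```

Because every pid comes from values(), the loop never targets a rank number such as 0 or 1 by mistake.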