job stuck in an infinite loop #59

Closed
mheily opened this issue Apr 16, 2016 · 5 comments

mheily commented Apr 16, 2016

sysadm goes to 100% CPU when launched by launchd. At first I thought it was signal-related, but now I suspect a stray file descriptor. According to truss(1), it's in an infinite loop polling a set of file descriptors. This line is repeated:

poll({ 7/POLLIN 11/POLLIN },2,0)                 = 0 (0x0)
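
For context, the last argument in the trace is a timeout of 0, which makes poll(2) return immediately when no descriptor is ready; a loop around such a call never sleeps, which would explain the 100% CPU. The following is a minimal Python sketch of that pattern, not sysadm's actual code. The pipe and the `count_idle_polls` name are made up for illustration.

```python
import os
import select


def count_idle_polls(fd, timeout_ms, iterations):
    """Poll `fd` repeatedly and count how many calls found nothing ready."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    idle = 0
    for _ in range(iterations):
        # poll() returns an empty list when no registered descriptor is ready.
        if not poller.poll(timeout_ms):
            idle += 1
    return idle


# Illustrative descriptor: a pipe that nobody ever writes to.
read_fd, write_fd = os.pipe()

# A timeout of 0 means "do not wait": every call returns at once, like the
# repeated poll(...,2,0) = 0 lines in the truss output, so the loop spins.
idle_calls = count_idle_polls(read_fd, 0, 1000)

os.close(read_fd)
os.close(write_fd)
```

Passing a positive timeout, or `None` to block until a descriptor becomes ready, would let the process sleep between wakeups instead of spinning.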

Here are the files it has open:

% sudo procstat -f 4839
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
 4839 sysadm-binary     text v r r-------   -       - -   /usr/local/bin/sysadm-binary
 4839 sysadm-binary      cwd v d r-------   -       - -   /                 
 4839 sysadm-binary     root v d r-------   -       - -   /                 
 4839 sysadm-binary        0 v c r-------   1       0 -   /dev/null         
 4839 sysadm-binary        1 v c rw------   4       0 -   /dev/null         
 4839 sysadm-binary        2 v c rw------   4       0 -   /dev/null         
 4839 sysadm-binary        3 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        4 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        5 v r -wa-----   1  498659 -   -                 
 4839 sysadm-binary        6 s - rw---n--   1       0 TCP ::.12150 ::.0
 4839 sysadm-binary        7 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        8 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        9 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary       10 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary       11 k - rw------   1       0 -   -                 
 4839 sysadm-binary       12 v r r-------   2       0 -   /var/log/lpreserver/lpreserver.log
 4839 sysadm-binary       13 v r r-------   2       0 -   /var/log/lpreserver/lastrep-send-log

This is possibly related to bug #54

@mheily mheily self-assigned this Apr 16, 2016
@mheily mheily added the bug label Apr 16, 2016

mheily commented Apr 16, 2016

The job launches sysadm-server, which then launches a child named sysadm-binary. It is the child process (sysadm-binary) that exhibits the bad behavior.

It appears that sysadm-server dies, and this causes sysadm-binary to spin trying to talk to it.

Here is how sysadm-server was being spawned by rc(8):

sudo -i daemon -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server

When spawned like this, it opens the following files:

  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
16719 daemon            text v r r-------   -       - -   /usr/sbin/daemon  
16719 daemon             cwd v d r-------   -       - -   /root             
16719 daemon            root v d r-------   -       - -   /                 
16719 daemon               0 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               1 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               2 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               3 v r -w---n-l   1       0 -   /var/run/sysadm.pid
16719 daemon               4 v r -w---n-l   1       0 -   /var/run/sysadm-daemon.pid

I suspect that it is unhappy that launchd opens stdin/out/err to /dev/null, rather than /dev/pts/0.


mheily commented Apr 16, 2016

Here is the procstat output for volmand(8), which shows that it redirects its stdio descriptors to /dev/null:

  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
  949 daemon            text v r r-------   -       - -   /usr/sbin/daemon  
  949 daemon             cwd v d r-------   -       - -   /                 
  949 daemon            root v d r-------   -       - -   /                 
  949 daemon               0 v c rw------   8       0 -   /dev/null         
  949 daemon               1 v c rw------   8       0 -   /dev/null         
  949 daemon               2 v c rw------   8       0 -   /dev/null         
  949 daemon               3 v r -w---n-l   1       0 -   /var/run/volmand.pid

There is a "-f" flag to daemon(8) that will do the standard I/O redirection, so I think it is normal and proper for daemons to work this way.
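
As a rough illustration of what that kind of redirection involves, here is a Python sketch that points a set of descriptors at /dev/null with `dup2`. It is not the daemon(8) source; the helper names `redirect_to_devnull` and `points_at_devnull` are hypothetical.

```python
import os
import stat


def redirect_to_devnull(fds):
    """Point each descriptor in `fds` at /dev/null, as a daemon would for stdio."""
    devnull = os.open(os.devnull, os.O_RDWR)
    try:
        for fd in fds:
            # dup2 closes `fd` if it is open and makes it a copy of devnull.
            os.dup2(devnull, fd)
    finally:
        # Keep the copies; drop the temporary descriptor unless it is a target.
        if devnull not in fds:
            os.close(devnull)


def points_at_devnull(fd):
    """Return True if `fd` refers to the same character device as /dev/null."""
    fd_stat = os.fstat(fd)
    null_stat = os.stat(os.devnull)
    return stat.S_ISCHR(fd_stat.st_mode) and fd_stat.st_rdev == null_stat.st_rdev
```

A daemon would call `redirect_to_devnull([0, 1, 2])` after forking; afterwards procstat would show descriptors 0, 1, and 2 as /dev/null, as in the volmand listing.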


mheily commented Apr 16, 2016

Turns out /dev/null isn't the issue. I was able to run sysadm-server with stdio redirected to null via this command:

sudo -i daemon -f -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server

Note that "-f" was added to the original command.

Maybe it's environment variables?


mheily commented Apr 16, 2016

Here are the environment variables when started under daemon(8):

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin PWD=/ HOME=/

If I modify the job manifest to set these variables, the job works fine.

{
        "Label": "org.pcbsd.sysadm-rest",
        "EnvironmentVariables": {
                "PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin",
                "PWD": "/",
                "HOME": "/"
        },
        "ProgramArguments": ["/usr/local/bin/sysadm-server"],
        "RunAtLoad": true
}
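
To show the effect of handing a child an explicit environment, here is a small Python sketch that runs a process with exactly the three variables quoted above. It is only an illustration: launchd does not do this in Python, and `run_with_env` and the one-line child program are made up for the example.

```python
import subprocess
import sys

# Values copied from the daemon(8) environment quoted above.
DAEMON_ENV = {
    "PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin",
    "PWD": "/",
    "HOME": "/",
}


def run_with_env(argv, env):
    """Run `argv` with exactly `env` as its environment and return its stdout."""
    # Passing env= replaces the inherited environment instead of extending it.
    result = subprocess.run(
        argv, env=dict(env), capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative child: a Python one-liner that reports the HOME it was given.
child = [sys.executable, "-c", "import os; print(os.environ.get('HOME'))"]
home_seen_by_child = run_with_env(child, DAEMON_ENV).strip()
```

Because the environment is replaced rather than merged, the child sees none of the caller's other variables, which is the same isolation the manifest's "EnvironmentVariables" key is used for here.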


mheily commented Apr 16, 2016

Here are the default variables currently set by launchd(8) for a different job:

2981 syscache-daemon  LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin SHELL=/bin/csh TMPDIR=/tmp

I think for daemons it would be wise to make launchd emit the same variables as daemon(8).
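
To make the difference concrete, this Python sketch compares the two environments quoted in this thread. The helper names `parse_env` and `diff_env` are hypothetical, and the launchd values come from a different job (syscache-daemon), so treat the result as indicative only.

```python
def parse_env(line):
    """Split a space-separated KEY=VALUE line into a dict."""
    return dict(item.split("=", 1) for item in line.split())


def diff_env(current, wanted):
    """Return (missing, extra, changed) keys of `current` relative to `wanted`."""
    missing = sorted(k for k in wanted if k not in current)
    extra = sorted(k for k in current if k not in wanted)
    changed = sorted(k for k in wanted if k in current and current[k] != wanted[k])
    return missing, extra, changed


# Environment launchd(8) gave the syscache-daemon job, as quoted above.
launchd_env = parse_env(
    "LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin "
    "SHELL=/bin/csh TMPDIR=/tmp"
)
# Environment seen under daemon(8), as quoted in the previous comment.
daemon_env = parse_env(
    "PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin PWD=/ HOME=/"
)

missing, extra, changed = diff_env(launchd_env, daemon_env)
```

Here `missing` is `['PWD']`, `extra` is `['LOGNAME', 'SHELL', 'TMPDIR', 'USER']`, and `changed` is `['HOME', 'PATH']`. The thread does not establish which of these differences sysadm-server actually depends on.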

mheily added a commit that referenced this issue Apr 16, 2016
…on(8)

  does it on FreeBSD. (Fixes bug #59)

Also, stop ignoring SIGCHLD, and restore signal handlers using signal(2).
@mheily mheily closed this as completed Apr 16, 2016
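
On the signal part of that commit message: an ignored signal disposition survives exec, so a job started while SIGCHLD is ignored inherits that setting unless it is reset first. The commit itself is not shown in this thread, so the following Python sketch only illustrates the general idea of resetting dispositions; `restore_default_handlers` is a hypothetical name.

```python
import signal


def restore_default_handlers(signums):
    """Reset each signal in `signums` to its default disposition."""
    previous = {}
    for signum in signums:
        # signal.signal() returns the handler that was installed before.
        previous[signum] = signal.signal(signum, signal.SIG_DFL)
    return previous


# Simulate a supervisor that had been ignoring SIGCHLD ...
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
# ... and reset it the way a child might be prepared before exec.
previous = restore_default_handlers([signal.SIGCHLD])
```

The returned mapping records the old handlers, so a caller could put them back later if needed.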