[collect] Local node does not finalize_sos_cmd with same arguments as remote nodes #3828

Open
TrevorBenson opened this issue Oct 31, 2024 · 7 comments

@TrevorBenson
Member

TrevorBenson commented Oct 31, 2024

Executing:

sudo sos collect --nopasswd-sudo --ssh-key /etc/project/id_edd2519 --ssh-user project-user --nodes 172.31.100.121 -o block

The sos_logs/sos.log shows that the final command run on the local system does not use the same options as the one run on the remote node:

2024-10-31 21:00:00,656 INFO: [system2:finalize_sos_cmd] Final sos command set to /usr/sbin/sos report --batch  --chroot auto --only-plugins=block
2024-10-31 21:00:00,656 INFO: [system1:finalize_sos_cmd] Final sos command set to /usr/sbin/sos report --batch  --chroot auto

The --only-plugins=block option is applied only to remote nodes. This happens regardless of whether the argument is passed to sos collect on the command line or set in /etc/sos/sos.conf under globals, report, collect, or plugin_options.

  • OS: Enterprise Linux 8 (Rocky Linux 8.10 in this case)
  • Version of sos: sos-4.7.2-2.el8_10.noarch
@jcastill
Member

jcastill commented Nov 1, 2024

I managed to reproduce this in RHEL 8 only, going to RHEL 8 and RHEL 9 nodes. But I cannot repro in RHEL 9, going to 8 and 9 nodes, that's really interesting. I'll see if I can find where the problem is, but if anybody else knows what's going on please comment.

Also, regarding this from #3827 that I imagine happens here as well:

[192.168.124.191:run_command] Running command sudo -S sos report -l
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range
[192.168.124.191:_regex_sos_help] Error parsing sos help: list index out of range

The problem is that 'sos report -l' output contains empty lines, so a split on the line (or rather, access to an element of that split()) was failing. I've sent a PR that solves the issue.
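
A minimal sketch of that failure mode, assuming a parser that takes the first whitespace-separated field of each line. This is not the actual _regex_sos_help code; the function name and the sample text are hypothetical.

```python
def first_fields(output):
    """Return the first whitespace-separated field of each non-blank line."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        # A blank line splits to [], so fields[0] would raise
        # IndexError: list index out of range. Skip such lines.
        if not fields:
            continue
        names.append(fields[0])
    return names


# Made-up sample, not real `sos report -l` output.
sample = " block   example description\n\n cups    example description\n"
print(first_fields(sample))
```

Without the `if not fields` guard, the blank middle line would raise the same "list index out of range" error seen in the log above.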

@TrevorBenson
Member Author

> I managed to reproduce this in RHEL 8 only, going to RHEL 8 and RHEL 9 nodes. But I cannot repro in RHEL 9, going to 8 and 9 nodes, that's really interesting. I'll see if I can find where the problem is, but if anybody else knows what's going on please comment.

Interesting. Was this running the sos-4.7.2-3.el9 RPM?

@jcastill
Member

jcastill commented Nov 1, 2024

> Interesting. Was this running the sos-4.7.2-3.el9 RPM?

Yes. We only had 4.7.2-2 in RHEL 8, and sos-4.7.2-3 only in RHEL9, but both versions (-2, -3) are basically the same.

I think the issue is on sudo here. If you run it as root directly, we get no problems. When it fails in my case, I get an error saying that the plugin doesn't exist:

[192.168.124.156:finalize_sos_cmd] Requested plugins ['block'] were requested to be enabled but do not exist

The list of plugins (both 'enabled' and 'disabled') is empty, as well as other elements that we should be populating in sos_info:

{'version': '4.7.2-3', 'enabled': [], 'disabled': [], 'options': [], 'presets': [], 'sos_cmd': 'sosreport --batch '}

Interestingly, in my case it's the remote node that fails to run '-o block'. I'm going to test now with RHEL 8 as the local host, and RHEL 9 and another RHEL 8 as remote nodes, and see what happens.

@TrevorBenson
Member Author

TrevorBenson commented Nov 2, 2024

> I think the issue is on sudo here. If you run it as root directly, we get no problems. When it fails in my case, I get an error saying that the plugin doesn't exist:

Maybe partly? As reported in my original ticket, sudo sos report works perfectly fine. I just double-checked to be 100% sure, and sudo sos report -o block also works perfectly fine:

https://gist.github.com/TrevorBenson/682f6df6645ef3f81010471835afa507#file-gistfile1-txt-L24-L34

For good measure, I tried sos collect directly from the root account to compare with the sudo sos collect. In all cases sos collect (both with and without sudo to escalate privileges) using 4.7.2-2 under an EL8 distro leads to an issue identifying plugins and a difference between the local and remote nodes.

So I only observe this issue when using sos collect.

@TrevorBenson
Member Author

I do notice that whether I run sudo sos collect --nopasswd-sudo --ssh-user project-user or sos collect --nopasswd-sudo --ssh-user project-user, sudo still times out waiting for a password on the local/collector host.


Current password: 
sudo: timed out reading password

sudo: unable to change expired password: Authentication token manipulation error

sudo: a password is required
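
One way to check, without blocking on a prompt, whether sudo on a host can run non-interactively is sudo's -n (non-interactive) flag. The sketch below is only a diagnostic aid and is not part of sos; the function name and the injectable `run` parameter are hypothetical, added so the check can be exercised without calling sudo.

```python
import subprocess


def sudo_needs_password(run=subprocess.run):
    """Return True if `sudo -n true` fails, i.e. sudo cannot run unattended.

    With -n, sudo exits with an error instead of prompting, so this
    returns quickly rather than waiting for a password timeout.
    Raises FileNotFoundError if sudo is not installed.
    """
    result = run(
        ["sudo", "-n", "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0
```

Calling `sudo_needs_password()` as the user that runs sos collect on the collector host should show whether that account can use sudo unattended. An expired password, as in the output above, would also make the check fail.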

I don't have an EL9 host in the mix, but under EL8 it is the local node that seems to be ignoring the --nopasswd-sudo option and timing out waiting on a sudo password.

I suspect this cascades into differing finalized commands due to no list of plugins for the local node.
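
A simplified model of that suspected cascade, based only on the log messages in this thread and not on the real finalize_sos_cmd implementation; the function name and arguments are hypothetical.

```python
def finalize_cmd(base_cmd, requested, known_plugins):
    """Append --only-plugins for requested plugins the node reported as known."""
    usable = [p for p in requested if p in known_plugins]
    missing = [p for p in requested if p not in known_plugins]
    cmd = base_cmd
    if usable:
        cmd += " --only-plugins=" + ",".join(usable)
    return cmd, missing


base = "sos report --batch"
# Remote node: the plugin listing succeeded, so 'block' is known.
print(finalize_cmd(base, ["block"], ["block", "cups"]))
# Local node: the listing failed, so no plugins are known and
# 'block' is dropped from the final command.
print(finalize_cmd(base, ["block"], []))
```

If the plugin listing on the local node comes back empty, every requested plugin looks nonexistent, which would match the differing final commands in sos.log.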

@jcastill
Member

jcastill commented Nov 3, 2024

This is really curious. Can you see anything interesting if you run collect with -vvv? I get this on the RHEL 8 local node:

[localhost:run_command] Running command hostname
[localhost:_get_hostname] Hostname set to localhost.localdomain
[localhost:determine_host_policy] using local policy Red Hat Enterprise Linux
[localhost:_load_sos_info] sos version is 4.7.2-2
[localhost:run_command] Running command sudo -S sos report -l
[localhost:run_command] Running command sudo -S sos report --list-presets

That is what I see when I try to reproduce with all the different commands you've tried.

@TrevorBenson
Member Author

When run by collect, sudo -S sos report -l takes around 60 seconds, and sudo -S sos report --list-presets also takes around 60 seconds.

The complete output (minus the banner messages from --batch usage) is:

$ sudo sos collect --batch --nopasswd-sudo --debug -vvv --ssh-key /etc/project/pki/salt-bootstrap --ssh-user project-user --nodes 172.31.100.121 -o block
[sos_collector:__init__] Executing /usr/sbin/sos collect --batch --nopasswd-sudo --debug -vvv --ssh-key /etc/project/pki/salt-bootstrap --ssh-user project-user --nodes 172.31.100.121 -o block
[sos_collector:__init__] Found cluster profiles: dict_keys(['ceph', 'jbon', 'juju', 'kubernetes', 'ocp', 'rhosp', 'ovirt', 'rhhi_virt', 'rhv', 'pacemaker', 'saltstack', 'satellite'])

sos-collector (version 4.7.2)

[...]

[localhost:run_command] Running command hostname
[system1:_get_hostname] Hostname set to system1
[localhost:determine_host_policy] using local policy Rocky Linux
[system1:_load_sos_info] sos version is 4.7.2-2
[system1:run_command] Running command sudo -S sos report -l
[system1:run_command] Running command sudo -S sos report --list-presets
[ocp] oc base command set to oc
[system1:run_command] Running command oc whoami
[system1:_run_command_with_pexpect] The command was not found or was not executable: oc.
[system1:run_command] Running command sudo -S stat /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
Cluster type could not be determined, but --nodes is provided. Attempting to continue using JBON cluster type and the node list
[sos_collector:get_nodes_from_cluster] Node list: []
[sos_collector:get_nodes] Force adding 172.31.100.121 to node list
[sos_collector:reduce_node_list] Node list reduced to ['172.31.100.121']

The following is a list of nodes to collect from:
	system1      
	172.31.100.121

[archive:TarFileArchive] initialised empty FileCacheArchive at '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'
[archive:TarFileArchive] created directory at 'sos_logs' in FileCacheArchive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'

Connecting to nodes...
[172.31.100.121:_connect] Opening SSH session to create control socket
[172.31.100.121:_connect] Successfully created control socket at /var/tmp/sos.j4rz03uc/.sos-collector-172.31.100.121
[172.31.100.121:run_command] Running command hostname
[system2:_get_hostname] Hostname set to system2
[172.31.100.121:read_file] Reading file /etc/os-release
[system2:read_file] Reading file /etc/os-release
[system2:run_command] Running command cat /etc/os-release
[system2:run_command] Running command rpm -qa --queryformat "%{NAME}|%{VERSION}|%{RELEASE}\n"
[system2:run_command] Running command flatpak list --columns=name,version,branch
[172.31.100.121:determine_host_policy] loaded policy Rocky Linux for host
[system2:_load_sos_info] sos version is 4.7.2-2
[system2:run_command] Running command sudo -S sos report -l
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:_regex_sos_help] Error parsing sos help: list index out of range
[system2:run_command] Running command sudo -S sos report --list-presets

Beginning collection of sosreports from 2 nodes, collecting a maximum of 4 concurrently

[system2:finalize_sos_cmd] Final sos command set to /usr/sbin/sos report --batch  --chroot auto --only-plugins=block
[system1:finalize_sos_cmd] Requested plugins ['block'] were requested to be enabled but do not exist
[system1:finalize_sos_cmd] Final sos command set to /usr/sbin/sos report --batch  --chroot auto
system1  : Generating sos report...
system2        : Generating sos report...
[system1:run_command] Running command sudo -S /usr/sbin/sos report --batch  --chroot auto
[system2:run_command] Running command sudo -S /usr/sbin/sos report --batch  --chroot auto --only-plugins=block
[system2:run_command] Shell requested, command is now /bin/bash -c 'sudo -S /usr/sbin/sos report --batch  --chroot auto --only-plugins=block'
[system1:run_command] Shell requested, command is now /bin/bash -c 'sudo -S /usr/sbin/sos report --batch  --chroot auto'
[system2:finalize_sos_path] Final sos path: /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:run_command] Running command sudo -S chmod o+r /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:retrieve_sosreport] Retrieving sos report from 172.31.100.121
system2        : Retrieving sos report...
[system2:run_command] Running command stat /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:retrieve_file] Copying remote /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz to local /var/tmp/sos.j4rz03uc/
system2        : Successfully collected sos report
[system2:run_command] Running command stat /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:remove_file] Removing file /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:run_command] Running command sudo -S rm -f /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz
[system2:run_command] Running command stat /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz.sha256
[system2:remove_file] Removing file /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz.sha256
[system2:run_command] Running command sudo -S rm -f /var/tmp/sosreport-system2-2024-11-04-gatwlpx.tar.xz.sha256
[system1:execute_sos_command] Error running sos report. rc = 1 msg = sudo: Account or password is expired, reset your password and try again
Current password: 
sudo: timed out reading password
sudo: unable to change expired password: Authentication token manipulation error
sudo: a password is required

[system1:execute_sos_command] Exception during sos report execution: sudo attempt failed
system1  : Error running sos report: sudo attempt failed
[system1:sosreport] Error during sos execution: sudo attempt failed

Successfully captured 1 of 2 sosreports
[sos_collector:close_all_connections] Closing connection to localhost
[system1:disconnect] Successfully disconnected from node
[sos_collector:close_all_connections] Closing connection to 172.31.100.121
[system2:disconnect] Successfully disconnected from node
Creating archive of sosreports...
[archive:TarFileArchive] added '/var/tmp/sos.j4rz03uc/sosreport-system2-2024-11-04-gatwlpx.tar.xz' to FileCacheArchive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'
[archive:TarFileArchive] Making leading paths for sos_reports
[archive:TarFileArchive] Making path /var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw/sos_reports
[archive:TarFileArchive] Making directory /var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw/sos_reports
[archive:TarFileArchive] added string at 'sos_reports/manifest.json' to FileCacheArchive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw'
[archive:TarFileArchive] finalizing archive '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw' using method 'auto'
[archive:TarFileArchive] built archive at '/var/tmp/sos.j4rz03uc/sos-collector-2024-11-04-rvgxw.tar.xz' (size=36864)
Archive created as /var/tmp/sos-collector-2024-11-04-rvgxw.tar.xz

The following archive has been created. Please provide it to your support team.
	/var/tmp/sos-collector-2024-11-04-rvgxw.tar.xz

[sos_collector:close_all_connections] Closing connection to localhost
[system1:disconnect] Successfully disconnected from node

To check, I ran sudo sos report -l directly, which completes all output in under 10 seconds; --list-presets also completes in under 10 seconds.

I then ran sudo sos report --batch --debug -vvv -o cups. While block runs to completion, it creates nearly 2000 lines of output due to the number of attached block devices, so I figured a short example using cups would be enough to show that sudo works fine when sos report is not run by collect.

$ sudo sos report --batch --debug -vvv -o cups

sosreport (version 4.7.2)

set sysroot to '/' (default)
[sos.report:setup] executing 'sos report --batch --debug -vvv -o cups'
[sos.report:setup] using 'none' preset defaults ()
[sos.report:setup] effective options now: --batch --debug --only-plugins cups -vvv
This command will collect system configuration and diagnostic
information from this Rocky Linux system.

For more information on Rocky Enterprise Software Foundation visit:

        Distribution Website : https://rockylinux.org
        Vendor Website       : https://resf.org

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.



 Setting up archive ...
[archive:TarFileArchive] initialised empty FileCacheArchive at '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] created directory at 'sos_commands' in FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] created directory at 'sos_logs' in FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] created directory at 'sos_reports' in FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
 Setting up plugins ...
[plugin:cups] packed command: binary=False, changes=False, chroot=True, cmd=lpstat -t, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] added cmd output 'lpstat -t'
[plugin:cups] packed command: binary=False, changes=False, chroot=True, cmd=lpstat -s, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] added cmd output 'lpstat -s'
[plugin:cups] packed command: binary=False, changes=False, chroot=True, cmd=lpstat -d, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] added cmd output 'lpstat -d'
 Running plugins. Please wait ...

  Starting 1/1   cups            [Running: cups]
[plugin:cups] unpacked command: binary=False, changes=False, chroot=True, cmd=lpstat -t, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] collecting output of 'lpstat -t'
[plugin:cups] could not run 'lpstat -t': command not found
[plugin:cups] unpacked command: binary=False, changes=False, chroot=True, cmd=lpstat -s, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] collecting output of 'lpstat -s'
[plugin:cups] could not run 'lpstat -s': command not found
[plugin:cups] unpacked command: binary=False, changes=False, chroot=True, cmd=lpstat -d, cmd_as_tag=False, container_cmd=None, env=None, foreground=False, priority=10, root_symlink=None, runas=None, runat=None, sizelimit=25, stderr=True, subdir=None, suggest_filename=None, tags=[], timeout=None, to_file=False
[plugin:cups] collecting output of 'lpstat -d'
[plugin:cups] could not run 'lpstat -d': command not found
[plugin:cups] collected plugin 'cups' in 0.04128909111022949

  Finished running plugins

[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added string at 'version.txt' to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added open file to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
[archive:TarFileArchive] added string at 'sos_reports/manifest.json' to FileCacheArchive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy'
Creating compressed archive...
[archive:TarFileArchive] finalizing archive '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy' using method 'auto'
[archive:TarFileArchive] built archive at '/var/tmp/sos.pvm77oov/sosreport-system1-2024-11-04-yjyaeyy.tar.xz' (size=6304)

Your sosreport has been generated and saved in:
	/var/tmp/sosreport-system1-2024-11-04-yjyaeyy.tar.xz

 Size	6.16KiB
 Owner	root
 sha256	d445d0cec018537a476499a0e2ea3e80a691a7db49e9707a623e82b359f5585c

Please send this file to your support representative.
