[Bug]: [gnmi] test_gnoi_killprocess.py fails to run on kvm testbed #16238

Closed
yutongzhang-microsoft opened this issue Dec 26, 2024 · 10 comments · Fixed by #16303

@yutongzhang-microsoft
Contributor

Issue Description

test_gnoi_killprocess.py fails on the KVM testbed because the log analyzer catches the errors shown below.

https://elastictest.org/scheduler/testplan/676bcc4d98ec838ee83d592b?testcase=gnmi%2Ftest_gnoi_killprocess.py&type=console

Results you see

E               Failed: Processes "['analyze_logs--<MultiAsicSonicHost vlab-01>']" failed with exit code "1"
E               Exception:
E               match: 2
E               expected_match: 0
E               expected_missing_match: 0
E               
E               Match Messages:
E               2024 Dec 25 09:49:18.306689 vlab-01 ERR sonic-db-cli: :- guard: RedisReply catches system_error: command: *9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n, reason: ERR wrong number of arguments for 'hset' command: Input/output error
E               
E               2024 Dec 25 09:49:18.308198 vlab-01 INFO snmp.sh[69266]: RedisReply catches system_error: command: *9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n, reason: ERR wrong number of arguments for 'hset' command: Input/output error: Input/output error
E               
E               Traceback:
E               Traceback (most recent call last):
E                 File "/var/src/sonic-mgmt/tests/common/helpers/parallel.py", line 35, in run
E                   Process.run(self)
E                 File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
E                   self._target(*self._args, **self._kwargs)
E                 File "/var/src/sonic-mgmt/tests/common/helpers/parallel.py", line 245, in wrapper
E                   target(*args, **kwargs)
E                 File "/var/src/sonic-mgmt/tests/common/plugins/loganalyzer/__init__.py", line 45, in analyze_logs
E                   dut_analyzer.analyze(markers[node.hostname], fail_test, store_la_logs=store_la_logs)
E                 File "/var/src/sonic-mgmt/tests/common/plugins/loganalyzer/loganalyzer.py", line 409, in analyze
E                   self._verify_log(analyzer_summary)
E                 File "/var/src/sonic-mgmt/tests/common/plugins/loganalyzer/loganalyzer.py", line 140, in _verify_log
E                   raise LogAnalyzerError(result_str)
E               tests.common.plugins.loganalyzer.loganalyzer.LogAnalyzerError: match: 2
E               expected_match: 0
E               expected_missing_match: 0
E               
E               Match Messages:
E               2024 Dec 25 09:49:18.306689 vlab-01 ERR sonic-db-cli: :- guard: RedisReply catches system_error: command: *9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n, reason: ERR wrong number of arguments for 'hset' command: Input/output error
E               
E               2024 Dec 25 09:49:18.308198 vlab-01 INFO snmp.sh[69266]: RedisReply catches system_error: command: *9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n, reason: ERR wrong number of arguments for 'hset' command: Input/output error: Input/output error
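
The matched messages embed the Redis command in its wire format (RESP), which is hard to read in escaped form. As a reading aid, here is a minimal sketch that decodes the logged command into its argument list. The function name `decode_resp_command` is hypothetical and not part of sonic-mgmt; the input string is copied from the log above.

```python
# Escaped RESP command, copied verbatim from the log message above.
LOGGED = (
    r"*9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n"
    r"$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n"
    r"$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n"
)


def decode_resp_command(escaped):
    """Decode an escaped RESP array of bulk strings into a list of arguments.

    Hypothetical helper for illustration only.
    """
    # The log prints CR/LF as the literal characters backslash-r backslash-n.
    raw = escaped.replace("\\r\\n", "\r\n")
    lines = raw.split("\r\n")
    count = int(lines[0][1:])  # "*9" -> 9 array elements
    args = []
    index = 1
    for _ in range(count):
        length = int(lines[index][1:])  # "$4" -> 4 bytes follow
        value = lines[index + 1]
        if len(value) != length:
            raise ValueError(f"length mismatch for {value!r}")
        args.append(value)
        index += 2
    return args


args = decode_resp_command(LOGGED)
for position, arg in enumerate(args):
    print(position, arg)
```

The decoded list has nine entries: HSET, the key, the field name, and then six separate words where a single value would be expected.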

Results you expected to see

Please confirm whether this script can run on KVM.

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response

wangxin pushed a commit that referenced this issue Dec 27, 2024
An issue has been identified with test_gnoi_killprocess.py on the KVM testbed, and we have raised issue #16238 to track it. In this PR, we temporarily skip this script in PR testing by using the conditional mark.
@hdwhdw
Contributor

hdwhdw commented Dec 30, 2024

My kvm run fails but with a slightly different message suggesting dbus doesn't support gnmi.

ERROR gnmi/test_gnoi_killprocess.py::test_gnoi_killprocess_then_restart[gnmi-False-Dbus does not support gnmi service management] - Failed: Failed to start gnmi server
ERROR gnmi/test_gnoi_killprocess.py::test_gnoi_killprocess_then_restart[nonexistent-False-Dbus does not support nonexistent service management] - Failed: Failed to start gnmi server
ERROR gnmi/test_gnoi_killprocess.py::test_gnoi_killprocess_then_restart[-False-Dbus stop_service called with no service specified] - Failed: Failed to start gnmi server

Looks like the test isn't supported on kvm. This is a real test gap and I should fix it.

@hdwhdw
Contributor

hdwhdw commented Jan 2, 2025

The (set of) tests has multiple problems:

2025 Jan  2 12:48:10.699687 vlab-01 ERR swss#orchagent: :- isAutoNegEnabled: Failed to get port AutoNeg status for port pid:1000000000021

Not sure of the cause, but we can disable these tests for now.

@hdwhdw
Contributor

hdwhdw commented Jan 6, 2025

Across multiple runs, killing and restarting pmon and snmp consistently produces the same error; rsyslogd is flaky (with the same error).

2025 Jan  6 15:22:37.489359 vlab-01 ERR swss#orchagent: :- isAutoNegEnabled: Failed to get port AutoNeg status for port pid:1000000000018

@congh-nvidia
Contributor

Hi @yutongzhang-microsoft , I have also opened a bug regarding this test, could you please take a look?
#15507

@yutongzhang-microsoft
Contributor Author

Hi @yutongzhang-microsoft , I have also opened a bug regarding this test, could you please take a look? #15507

@hdwhdw Please take a look.

@hdwhdw
Contributor

hdwhdw commented Jan 15, 2025

@yutongzhang-microsoft this is incredibly useful insight. Let me update my fix.

@hdwhdw hdwhdw closed this as completed Jan 16, 2025
@hdwhdw
Contributor

hdwhdw commented Jan 16, 2025

Closing this as a duplicate; I am going to track it in #15507.

@yejianquan
Collaborator

Reopening this issue, since it is referenced by the condition that skips this test in PR testing:

- "asic_type in ['vs'] and https://github.com/sonic-net/sonic-mgmt/issues/16238"

If we close this issue, the test runs again and blocks PR testing. Sample:
https://elastictest.org/scheduler/testplan/678986a046e46d15f748815b?testcase=gnmi%2Ftest_gnoi_killprocess.py%7C%7C%7C2&type=console

@hdwhdw
Please raise a PR to migrate the skip condition to issue #15507.

Then we can close this duplicate issue.
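
The behavior described in this comment (closing the issue re-enables the test) can be illustrated with a small sketch. It assumes the skip condition is evaluated by replacing the issue URL with a boolean that is true only while the issue is open; `evaluate_skip_condition` and the `open_issues` set are hypothetical names for illustration, not the actual sonic-mgmt implementation.

```python
import re

# Matches GitHub issue URLs such as the one in the skip condition above.
ISSUE_URL = re.compile(r"https?://github\.com/\S+/issues/\d+")


def evaluate_skip_condition(condition, facts, open_issues):
    """Return True when the test should be skipped.

    Hypothetical sketch: each issue URL becomes True if the issue is open,
    False otherwise, and the resulting expression is evaluated against the
    testbed facts.
    """
    def issue_state(match):
        return str(match.group(0) in open_issues)

    expression = ISSUE_URL.sub(issue_state, condition)
    # Builtins are disabled so only the supplied facts are visible.
    return bool(eval(expression, {"__builtins__": {}}, dict(facts)))


condition = (
    "asic_type in ['vs'] and "
    "https://github.com/sonic-net/sonic-mgmt/issues/16238"
)
facts = {"asic_type": "vs"}
issue = "https://github.com/sonic-net/sonic-mgmt/issues/16238"

skip_while_open = evaluate_skip_condition(condition, facts, {issue})
skip_after_close = evaluate_skip_condition(condition, facts, set())
print(skip_while_open, skip_after_close)
```

Under that assumption, the condition stops matching as soon as the issue is closed, so the test runs on vs again.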

@yejianquan yejianquan reopened this Jan 17, 2025
@hdwhdw
Contributor

hdwhdw commented Jan 17, 2025

The snmp error

2024 Dec 25 09:49:18.308198 vlab-01 INFO snmp.sh[69266]: RedisReply catches system_error: command: *9\r\n$4\r\nHSET\r\n$25\r\nDEVICE_METADATA|localhost\r\n$21\r\nchassis_serial_number\r\n$6\r\nFailed\r\n$2\r\nto\r\n$4\r\nread\r\n$6\r\nsystem\r\n$6\r\nEEPROM\r\n$4\r\ninfo\r\n, reason: ERR wrong number of arguments for 'hset' command: Input/output error: Input/output error

is likely coming from this line https://github.com/sonic-net/sonic-buildimage/blob/39e2131a7b76f6c3d5257b7e02c540dd33a24d5b/files/build_templates/docker_image_ctl.j2#L114

{%- elif docker_container_name == "snmp" %}
    $SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s)

I ran sudo decode-syseeprom -s on the virtual switch (vs), and the result aligns:

sudo decode-syseeprom -s
Failed to read system EEPROM info

Someone should verify whether this line is okay on virtual switch.

Until then, I will disable the test case for snmp.
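
One way to read the "wrong number of arguments" error: the command substitution in the template line is unquoted, so the six-word failure message is presumably split by the shell into six separate arguments. A minimal sketch of the resulting argument counts follows. The helper `hset_arity_ok` is hypothetical, and the quoted variant is shown only for comparison, not as the confirmed fix.

```python
# Output of `decode-syseeprom -s` on the virtual switch, as reported above.
output = "Failed to read system EEPROM info"

# Unquoted substitution: the shell splits the output on whitespace.
unquoted_argv = [
    "HSET", "DEVICE_METADATA|localhost", "chassis_serial_number",
    *output.split(),
]

# Quoted substitution (for comparison): the output stays one argument.
quoted_argv = [
    "HSET", "DEVICE_METADATA|localhost", "chassis_serial_number",
    output,
]


def hset_arity_ok(argv):
    """Check the shape HSET key field value [field value ...].

    Hypothetical helper: after the command name and key, the remaining
    arguments must form complete field/value pairs.
    """
    return len(argv) >= 4 and (len(argv) - 2) % 2 == 0


print(len(unquoted_argv), hset_arity_ok(unquoted_argv))
print(len(quoted_argv), hset_arity_ok(quoted_argv))
```

The unquoted form yields nine arguments, which matches the `*9` prefix in the logged command and leaves an incomplete field/value pair.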

qiluo-msft pushed a commit that referenced this issue Jan 24, 2025
Description of PR
Summary:

Gracefully skip the killprocess test if the process does not exist, instead of failing it. Some processes, such as telemetry, are not always present on the KVM testbed, and the validation of killprocess should not depend on them.
Avoid killing the swss and snmp processes, as doing so leads to errors on the KVM testbed: isAutoNegEnabled: Failed to get port AutoNeg status for port pid:1000000000021
Fixes #16238

Approach
The test tries to kill a process and then restart it, but the API throws an exception if the process does not exist.
Here we gracefully catch the exception and skip the test.
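
The approach above can be sketched roughly as follows. This is not the code from the PR: `ProcessNotFoundError`, `kill_then_restart`, and `run_killprocess_case` are hypothetical stand-ins, and the sketch raises the standard library's `unittest.SkipTest` to stay dependency-free, where a pytest-based suite would more likely call `pytest.skip`.

```python
import unittest


class ProcessNotFoundError(Exception):
    """Hypothetical error raised when the target process is absent."""


def kill_then_restart(running, name):
    """Hypothetical stand-in for the kill-and-restart API call."""
    if name not in running:
        raise ProcessNotFoundError(name)
    return f"{name} restarted"


def run_killprocess_case(running, name):
    """Run one kill/restart case, skipping when the process is missing."""
    try:
        return kill_then_restart(running, name)
    except ProcessNotFoundError as err:
        # A missing process is an environment gap, not a test failure.
        raise unittest.SkipTest(
            f"process {err} not present on this testbed"
        ) from err


running_on_kvm = {"gnmi", "pmon"}  # assumed process set, for illustration
print(run_killprocess_case(running_on_kvm, "gnmi"))
```

With this shape, a case for a process such as telemetry that is absent from the testbed is reported as skipped and does not fail the run.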

What is the motivation for this PR?
Fix the test for KVM.