Fix multiple failures in KillProcess test on KVM. #16303

Open
wants to merge 8 commits into
base: master
from

Conversation

hdwhdw
Contributor

@hdwhdw hdwhdw commented Jan 2, 2025

Description of PR

Summary:

  • Gracefully skip the killprocess test if the process does not exist, instead of failing it. Some processes, such as telemetry, are not always present on the KVM testbed, and the validation of killprocess should not depend on them.
  • Avoid killing the swss and snmp processes, as doing so leads to errors on the KVM testbed: isAutoNegEnabled: Failed to get port AutoNeg status for port pid:1000000000021

Fixes #16238

Type of change

  • [x] Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

The test tries to kill a process and then restart it, but the API throws an exception if the process does not exist.
Here we catch that exception and skip the test instead of failing it.
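
The approach above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProcessNotFound`, `kill_process`, and `kill_or_skip` are hypothetical names, and `unittest.SkipTest` stands in for `pytest.skip` so that the sketch needs only the standard library.

```python
import unittest


class ProcessNotFound(Exception):
    """Hypothetical error raised when the target process is not running."""


def kill_process(name, running):
    """Hypothetical stand-in for the kill API: fails if the process is absent."""
    if name not in running:
        raise ProcessNotFound(name)
    running.remove(name)


def kill_or_skip(name, running):
    """Kill `name`, converting 'process does not exist' into a test skip."""
    try:
        kill_process(name, running)
    except ProcessNotFound:
        # A missing process (e.g. telemetry on some KVM testbeds) says nothing
        # about KillProcess itself, so skip rather than fail.
        raise unittest.SkipTest(f"process {name!r} is not running; skipping")
```

In a pytest-based suite, the `except` branch would more typically call `pytest.skip(...)` with the same message.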

What is the motivation for this PR?

Fix the test for KVM.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@hdwhdw changed the title from "Gracefully skip the test for killprocess if the process does not exists." to "Fix multiple failures in KillProcess test." on Jan 6, 2025
@yutongzhang-microsoft
Contributor

Have you tested locally?

@hdwhdw
Contributor Author

hdwhdw commented Jan 7, 2025

@yutongzhang-microsoft Yes.

============================= 11 passed, 3 skipped, 1 warning in 219.49s (0:03:39) ======================

@yutongzhang-microsoft
Contributor

Can we consider issue #16238 fixed? If so, you can close it, and let's monitor the results in the PR test.

@hdwhdw
Contributor Author

hdwhdw commented Jan 8, 2025

Thanks, I will close the bug once this is merged.

@hdwhdw changed the title from "Fix multiple failures in KillProcess test." to "Fix multiple failures in KillProcess test on KVM." on Jan 8, 2025
@@ -15,23 +15,20 @@
("gnmi", False, "Dbus does not support gnmi service management"),
("nonexistent", False, "Dbus does not support nonexistent service management"),
("", False, "Dbus stop_service called with no service specified"),
("snmp", True, ""),
Contributor

snmp

Killing the snmp container should be a valid requirement. Why remove it?

Contributor Author

It is indeed valid. However, restarting the snmp container produces an error log because of this line:
https://github.com/sonic-net/sonic-buildimage/blob/39e2131a7b76f6c3d5257b7e02c540dd33a24d5b/files/build_templates/docker_image_ctl.j2#L114

{%- elif docker_container_name == "snmp" %}
    $SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s)

Because

sudo decode-syseeprom -s
Failed to read system EEPROM info

I think this is known:

# For kvm testbed, command `show platform syseeprom` will return the expected Error

This also causes a similar issue when killing pmon (I think this is due to "missing sonic_platform module").

So for now, let's just skip these two on the vs platform. I don't think this affects our ability to qualify the KillProcess implementation.
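
One way to express "skip these two on the vs platform" is to filter the service list before the test is parametrized. The sketch below is only an illustration under assumptions: the `SKIP_ON_VS` constant, the `services_to_test` helper, and the use of the string `"vs"` to identify the virtual platform are hypothetical and are not taken from this PR.

```python
# Services whose restart logs errors on the virtual (vs) platform, per the
# discussion above. Names here are hypothetical.
SKIP_ON_VS = {"snmp", "pmon"}


def services_to_test(all_services, platform):
    """Return the services to exercise, dropping known-noisy ones on vs."""
    if platform != "vs":
        return list(all_services)
    return [name for name in all_services if name not in SKIP_ON_VS]
```

On any other platform the list is returned unchanged, so coverage there is not reduced.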

("dhcp_relay", True, ""),
("radv", True, ""),
("restapi", True, ""),
("lldp", True, ""),
("sshd", True, ""),
("swss", True, ""),
Contributor

swss

What is wrong with killing swss?

Contributor Author

It turns out the test wasn't written correctly: we need to explicitly wait for critical processes to start after killing swss. Killing and restarting swss appears to make many other processes restart, and if we start the next test case immediately without waiting, swss errors are generated during that next test case.

Contributor Author

Fixed after adding back the wait for critical processes.
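
The wait described in this thread amounts to polling a readiness check until it succeeds or a deadline passes. The following is a generic sketch of that pattern, not the helper used in the test suite: `wait_for` and its parameters are hypothetical, and `condition` stands in for whatever check reports that all critical processes are running again.

```python
import time


def wait_for(condition, timeout, interval):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would call this after restarting swss and fail (or retry) if it returns False, so the next test case does not start while dependent processes are still coming back up.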

Successfully merging this pull request may close these issues.

[Bug]: [gnmi] test_gnoi_killprocess.py fails to run on kvm testbed
5 participants