[Bug]: [T2][Chassis] - TSA/TSB - Intermittent issues seen with neighbors' route verification for traffic-shift tests #16577

Open
sanjair-git opened this issue Jan 17, 2025 · 3 comments
@sanjair-git

Issue Description

Traffic shift test cases such as 'test_traffic_shift.py', 'test_startup_tsa_tsb_service.py', and 'test_reliable_tsa.py' fail intermittently, either during 'verify_current_routes_announced_to_neighs' or during 'verify_only_loopback_routes_are_announced_to_neighs', mostly on the upstream T2 line card with the max configuration.

This looks like a timing issue in both cases.

Results you see

            # Verify that all routes advertised to neighbor at the start of the test
            if not wait_until(300, 3, 0, verify_current_routes_announced_to_neighs, linecard, dut_nbrhosts[linecard],
                              orig_v4_routes[linecard], cur_v4_routes, 4):
                if not check_and_log_routes_diff(linecard, dut_nbrhosts[linecard],
                                                 orig_v4_routes[linecard], cur_v4_routes, 4):
>                   pytest.fail("Not all ipv4 routes are announced to neighbors")
E                   Failed: Not all ipv4 routes are announced to neighbors

            pytest_assert(verify_only_loopback_routes_are_announced_to_neighs(
                    duthosts, linecard, dut_nbrhosts[linecard], traffic_shift_community),
                    "Failed to verify routes on nbr in TSA")
E                   Failed: Failed to verify routes on nbr in TSA

Results you expected to see

All traffic shift test cases, such as 'test_traffic_shift.py', 'test_startup_tsa_tsb_service.py', and 'test_reliable_tsa.py', should pass without these intermittent failures.

Is it platform specific

generic

Relevant log output

Output of show version

SONiC Software Version: SONiC.20240532.02
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: b3bd0a2dc2
Build date: Fri Jan 10 11:30:50 UTC 2025
Built by: azureuser@329539e8c000000

Attach files (if any)

No response

@Javier-Tan

Hi @sanjair-git, we currently have a PR for test_reliable_tsa (#16523). I will look across all the tests and see whether adding wait_until helps there.

@Javier-Tan Javier-Tan self-assigned this Jan 18, 2025
@sanjair-git

Hi @Javier-Tan, thanks for adding wait_until for reliable_tsa. We likely need to add the same to the 'startup-tsa-tsb' and 'traffic-shift' test suites as well. I see similar 'verify_only_loopback_routes_are_announced_to_neighs' failures in those suites too.

@Javier-Tan

Javier-Tan commented Jan 21, 2025

@sanjair-git I have updated #16523 to include wait_until / assert for all tests that use verify_only_loopback_routes_are_announced_to_neighs. I will create one more PR for verify_current_routes_announced_to_neighs before closing this issue.

yejianquan pushed a commit that referenced this issue Jan 29, 2025
…_until (#16523)

Description of PR
Summary:
Partially tackles #16577

Approach
What is the motivation for this PR?
Tests are flaky, sometimes failing on verify_only_loopback_routes_are_announced_to_neighs with "Failed to verify routes on nbr in TSA".

How did you do it?
Added a function that wraps verify_only_loopback_routes_are_announced_to_neighs in wait_until and asserts on the result, giving the neighbors time to update their routes.
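
The wrapping approach can be sketched roughly as below. This is only an illustration, not the code from the PR: the `wait_until` here is a simplified local stand-in that mirrors the `(timeout, interval, delay, condition, *args)` call shape seen in the traceback above, and `assert_only_loopback_routes_announced` plus its `timeout` and `interval` defaults are hypothetical names and values chosen for the example.

```python
import time


def wait_until(timeout, interval, delay, condition, *args, **kwargs):
    """Simplified stand-in for a polling helper (assumption, not the real one).

    Calls condition(*args, **kwargs) every `interval` seconds until it
    returns a truthy value or `timeout` seconds have elapsed.
    """
    if delay > 0:
        time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        # Stop if another sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


def assert_only_loopback_routes_announced(verify_fn, *args, timeout=300, interval=10):
    """Hypothetical wrapper: poll verify_fn and fail only if it never succeeds."""
    ok = wait_until(timeout, interval, 0, verify_fn, *args)
    assert ok, "Failed to verify routes on nbr in TSA"
```

In a test, `verify_fn` would be the existing check (for example `verify_only_loopback_routes_are_announced_to_neighs`) and `*args` its usual arguments, so a neighbor that has not yet converged on the first poll causes a retry rather than an immediate failure.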

How did you verify/test it?
The following logs were seen in passing tests, showing that wait_until helps avoid false negatives:

17/01/2025 05:42:18 utilities.wait_until                     L0153 DEBUG  | verify_only_loopback_routes_are_announced_to_neighs is False, wait 10 seconds and check again

05:42:12 route_checker.verify_loopback_route_with L0014 INFO   | Verifying only loopback routes are announced to bgp neighbors
05:42:18 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71002> terminated with exit code None
05:42:18 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71003> terminated with exit code None
05:42:18 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71001> terminated with exit code None
05:42:18 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71000> terminated with exit code None
05:42:18 parallel.parallel_run                    L0221 INFO   | Completed running processes for target "parse_routes_process" in 0:00:03.470030 seconds
05:42:18 route_checker.verify_loopback_route_with L0035 INFO   | Verifying only loopback routes(ipv4) are announced to ARISTA04T3
05:42:18 route_checker.verify_loopback_route_with L0047 WARNING| missing loopback address or some other routes present on neighbor
05:42:28 route_checker.verify_loopback_route_with L0014 INFO   | Verifying only loopback routes are announced to bgp neighbors
05:42:34 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71002> terminated with exit code None
05:42:34 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71000> terminated with exit code None
05:42:34 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71003> terminated with exit code None
05:42:34 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71001> terminated with exit code None
05:42:34 parallel.parallel_run                    L0221 INFO   | Completed running processes for target "parse_routes_process" in 0:00:03.332572 seconds
05:42:34 route_checker.verify_loopback_route_with L0035 INFO   | Verifying only loopback routes(ipv4) are announced to ARISTA04T3
05:42:34 route_checker.verify_loopback_route_with L0035 INFO   | Verifying only loopback routes(ipv4) are announced to ARISTA01T3
05:42:34 route_checker.verify_loopback_route_with L0035 INFO   | Verifying only loopback routes(ipv4) are announced to ARISTA06T3
05:42:34 route_checker.verify_loopback_route_with L0035 INFO   | Verifying only loopback routes(ipv4) are announced to ARISTA03T3
05:42:34 route_checker.verify_loopback_route_with L0014 INFO   | Verifying only loopback routes are announced to bgp neighbors
05:42:39 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71000> terminated with exit code None
05:42:39 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71002> terminated with exit code None
05:42:39 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71001> terminated with exit code None
05:42:39 parallel.on_terminate                    L0085 INFO   | process parse_routes_process--<EosHost VM71003> terminated with exit code None
05:42:39 parallel.parallel_run                    L0221 INFO   | Completed running processes for target "parse_routes_process" in 0:00:02.681303 seconds
All affected test suites were run

Signed-off-by: Javier Tan [email protected]