3850 Improve speed #2

Open
edurguti opened this issue Jun 28, 2019 · 11 comments

Comments

@edurguti

edurguti commented Jun 28, 2019

I have found that HTTP transfers on 3850s (and probably 3650s) running Denali and later take forever; mine takes almost 6 hours.
Increasing the TCP window size helps tremendously:

ip tcp window-size 1073741823
ip tcp path-mtu-discovery

I would try a few window sizes. The value above is the maximum window size allowed, so one option is a try/except that decreases it if there's an I/O error. I tried it on a 2960X and it failed a few times, so I'll need to do further testing.
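The try/except idea could be sketched roughly as below. This is only an illustration, not code from this project: `attempt` is a hypothetical callable that would set `ip tcp window-size` to the given value and run the copy, raising `IOError` on failure, and the halving step and the 65535 floor are arbitrary assumptions.

```python
from typing import Callable, Iterator, Optional

MAX_WINDOW = 1073741823  # maximum window size quoted in this thread


def candidate_windows(start: int = MAX_WINDOW, floor: int = 65535) -> Iterator[int]:
    """Yield window sizes from `start` downward, halving each time.

    The halving step and the floor are assumptions for illustration.
    """
    size = start
    while size >= floor:
        yield size
        size //= 2


def transfer_with_backoff(
    attempt: Callable[[int], None],
    start: int = MAX_WINDOW,
    floor: int = 65535,
) -> Optional[int]:
    """Return the first window size for which `attempt` succeeds, else None.

    `attempt` is a hypothetical hook: it should configure the window size,
    run the transfer, and raise IOError if the transfer fails.
    """
    for size in candidate_windows(start, floor):
        try:
            attempt(size)
        except IOError:
            continue  # transfer failed at this size, try a smaller window
        return size
    return None
```

Starting from the maximum and working down keeps the first successful size as large as possible, at the cost of a few failed attempts on platforms that reject large windows.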

Below is an example of a file transfer across the Atlantic. It took only 291.601 seconds; before this change it took 5+ hours.

-CSWT02#copy http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.08.01a.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin
bytes copied in 291.601 secs (1460654 bytes/sec)

@austind
Owner

austind commented Jul 1, 2019

Thanks for the feedback.

Does ip tcp window-size apply to management traffic, or all traffic?

@edurguti
Author

edurguti commented Jul 1, 2019

As far as I know it's only affecting traffic destined to the switch itself.

@austind
Owner

austind commented Jul 2, 2019

The TCP window size is ideally set to the bandwidth-delay product (link bitrate x link latency). Setting the window size too high can cause memory issues, and any packet loss will then cause longer delays.
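As a rough sketch of that rule of thumb, the calculation looks like this. The function name `bdp_bytes` and the example figures (a 100 Mbit/s path with a 72 ms round trip) are assumptions for illustration only; the cap is the maximum `ip tcp window-size` value quoted earlier in this thread.

```python
MAX_WINDOW = 1073741823  # maximum window size quoted in this thread


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, capped at MAX_WINDOW.

    bits/second x milliseconds / 8000 = bytes in flight for one round trip.
    """
    bdp = round(bandwidth_bps * rtt_ms / 8000)
    return min(bdp, MAX_WINDOW)


# Assumed example values: 100 Mbit/s of usable bandwidth, 72 ms round trip.
print(bdp_bytes(100_000_000, 72))
```

The hard part remains the bandwidth input: this only helps once the usable end-to-end rate is known or estimated.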

I can programmatically determine latency with a ping test, but link bandwidth is tricky. I can't infer the line rate of the attached interface.

Will you run some tests and see what works well for your bandwidth and latency?

@austind austind closed this as completed May 28, 2020
@ktbyers

ktbyers commented Jun 12, 2020

In the past, adjusting the TCP window size hasn't consistently worked. This is in the context of SCP transfers using Netmiko.

Some related discussion here:

ktbyers/netmiko#491

I also recently did an upgrade of a couple of IOS-XE devices from 16.08.01 to 16.12.03 using Ansible's net_put and the TCP window size did not make a performance difference (IIRC). It was taking over 3 hours to transfer a 542MB file to the device.

ip ssh bulk-mode in IOS-XE 17 looks interesting (this is from @edurguti).

@ktbyers

ktbyers commented Jun 12, 2020

It would be interesting to get more data on some of these performance problems and workarounds as the default transfer speeds are crazy slow...

@austind austind reopened this Jun 15, 2020
@austind
Owner

austind commented Jun 15, 2020

@ktbyers My testing on the cat3k series suggests the limit is the control plane CPU. Even plain FTP pegs the CPU at <500Kbps, only a marginal improvement over TFTP.

I looked for a way to force a weaker (and therefore less CPU-intensive) cipher suite for SCP, but didn't turn up anything.

@edurguti
Author

Hi, I just tested again with http:
ip tcp window-size 1073741823
ip tcp path-mtu-discovery

NASGH3850-10-121-CSWT02#copy http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.09.05.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin
bytes copied in 1268.110 secs (355082 bytes/sec)

NASGH3850-10-121-CSWT02#dir flash: | i 16.09
64643 -rw- 450283034 Jun 16 2020 07:30:46 -05:00 cat3k_caa-universalk9.16.09.05.SPA.bin

I have another one running without adjusting the TCP window size, and it's still going.
Below is the latency:

NASGH3850-10-121-CSWT02#ping 10.95.238.50 source vlan 101
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.95.238.50, timeout is 2 seconds:
Packet sent with a source address of 10.89.228.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 70/72/80 ms
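For determining latency programmatically, the round-trip figures can be pulled out of ping output like the above with a small parser. This is a minimal sketch: `parse_rtt_ms` is a hypothetical helper name, and the regular expression assumes the summary line has exactly the `round-trip min/avg/max = a/b/c ms` form shown in this thread.

```python
import re
from typing import Optional, Tuple

# Matches the summary line as it appears in the ping output in this thread.
RTT_RE = re.compile(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms")


def parse_rtt_ms(ping_output: str) -> Optional[Tuple[int, int, int]]:
    """Return (min, avg, max) round-trip time in ms, or None if not found."""
    match = RTT_RE.search(ping_output)
    if match is None:
        return None  # e.g. 0 percent success, so no round-trip summary
    rtt_min, rtt_avg, rtt_max = (int(group) for group in match.groups())
    return rtt_min, rtt_avg, rtt_max


summary = "Success rate is 100 percent (5/5), round-trip min/avg/max = 70/72/80 ms"
print(parse_rtt_ms(summary))  # (70, 72, 80)
```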

@ktbyers

ktbyers commented Jun 16, 2020

@edurguti I guess the broader question is--is there any pattern when we can expect changing the TCP window size will work versus when it won't (as it seems to work in some cases, but not in others)?

And I don't really have a good explanation as to why except for some vague "maybe something platform or OS-version specific".

Let me see if I can test on my ISR2 router(s) and see whether your exact change works there or not.

@edurguti
Author

I have never had it fail on a 3850/3650, but it does fail on the 2960X.
Here's how long it took on the one where I didn't change the window size:
NASGH3850-10-120-CSW(config)#end
NASGH3850-10-120-CSWT01#copy http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.09.05.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin
bytes copied in **10066.090** secs (44733 bytes/sec)

@austind
Owner

austind commented Jun 16, 2020

@ktbyers @edurguti can either of you post the output of sh proc cpu sort during the transfer?

@edurguti
Author

edurguti commented Jun 16, 2020 via email
