-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipinfusion OcNOS 6.4.1-37 not working #177
Comments
well tried putting in further debug statements in launch.py thinking maybe the startup passed the waiting for the > prompt. However, it never passes that state. The prompt is correct if I run the qcow under kvm. This places the issue back into vrnetlab.py. I have also bumped up the cores and memory which are hard coded in this lab but unfortunately that makes no difference. I have had the VM boot once in containerlab.....subsequent restart of the lab and it failed again at the same place. So this appears to be some potential race condition in vrnetlab. I've rebooted under kvm about 30 times now and the vm has never failed to start....so something explicitly in vrnetlab for this vm unfortunately.....when the VM did boot it ran successfully and ssh worked etc....the docker container itself is running and functioning so something definitely in vrnetlab and the serial prompt connectivity by the looks. Running out of time to try and resolve this unfortunately. |
i'm assuming that the vm is dying after login attempt for some reason.....i have tried leaving the container for over an hour but no luck....from the container I can't local ssh to the forwarded vm port...so assuming that the vm is not running properly inside of the container. |
well it is interesting....the problem is associated with the boot of the vm in the container first time around. I can monitor the docker logs and the first boot sits and waits at the waiting for > prompt. If i exec into the container and kill the qemu instance it automatically restarts and runs properly from thern....so now a bit lost as to ow to get this going. |
Looking at dmesg inside ocnos launcher container and seems its doing a stack dump. This only occurs with the launcher container by the looks. I dont think that its to do with my system because i can run up xrv9k adjacent to this without a problem. starting to get passed me. Maybe there is some extra qmeu config or something that may make the image reliable inside a container and run under qemu but appears to be unstable. If the container does not dump then the image appears to run fine...this unfortunately is a rare occurrence. [ 527.653384] ------------[ cut here ]------------ |
even without the dump the ocnos boot does fail and everything in dmesg looks ok. basically to get to run you need to exec into the container and kill qemu process. it will restart automatically and then run fine. Something with vrnetlab/hellt and maybe upstream... |
stack dump comes from having a second nic in the system that was not connected to anything. removed that and now no more dumps but the launcher od always stops at waiting for a > prompt until you kill the qemu process works post that. |
appears that there is a race condition with launch.py and the VM itself. The VM gives the login prompt, launch.py hits it before the VM is fully ready resulting in the VM appearing to hang. The answer here is to patch the launch.py file and add a delay post detecting the login prompt. Currently I just made this 15 seconds and haven't tried to tune it to a minimum. I'd rather the VM just start than save a couple of seconds. https://github.com/hellt/vrnetlab/blob/master/ocnos/docker/launch.py Change to the following after importing time. |
I shad already changed the makefile pattern match. The image appears to start up and run but never progresses past login. From docker logs it does appear to login but then never gets pass the waiting for > prompt. I'm still trying to debug the launcher to get sme more information. If I spin the qcow up in kvm the prompt post login is OcNOS> so think that if it is pattern matching on > that should work but it appears to hang. Additional there is no ssh accessibility even after 10 minutes. In kvm it take less than a minute to become accessible. Just thought I'd log the issue in case I never solve it.
2024-03-16 00:05:02,643: launch TRACE OUTPUT: Starting agent_daemon service...
[ OK ] Started agent_daemon service.
2024-03-16 00:05:07,974: launch DEBUG matched login prompt
2024-03-16 00:05:07,974: launch DEBUG trying to log in with 'ocnos'
2024-03-16 00:05:07,974: vrnetlab DEBUG writing to serial console: 'ocnos'
2024-03-16 00:05:07,974: vrnetlab TRACE waiting for 'Password:' on serial console
2024-03-16 00:05:08,023: vrnetlab TRACE read from serial console: ' ocnos
Password:'
2024-03-16 00:05:08,023: vrnetlab DEBUG writing to serial console: 'ocnos'
2024-03-16 00:05:08,023: launch INFO applying bootstrap configuration
2024-03-16 00:05:08,023: vrnetlab DEBUG writing to serial console: ''
2024-03-16 00:05:08,023: vrnetlab TRACE waiting for '>' on serial console
The text was updated successfully, but these errors were encountered: