Auto IP function duplicates IP address #1139
Comments
Hey, thank you for describing the issue you discovered. Do you use only one supernode, or a federated supernode setup? We do not use real DHCP but a simpler algorithm. As you suspected, the name string is exactly what we hash into an IP address, and if that address is already taken we step up or down from it. This does not work in a distributed setup yet.
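A minimal sketch of the scheme described above, for illustration only: it hashes the edge's description string (the -I value) into a host offset within the community subnet and, if that address is already occupied, steps downwards (wrapping around) until it finds a free one. The permutation table, the 16-bit hash width, the downward probing order and the names `pearson_hash_16` and `auto_assign` are assumptions, not n2n's actual code.

```python
import ipaddress
from typing import Set

# Hypothetical Pearson permutation table: any permutation of 0..255 works.
# (i * 167 + 13) % 256 is a bijection on 0..255 because 167 is odd.
PEARSON_TABLE = [(i * 167 + 13) % 256 for i in range(256)]


def pearson_hash_16(data: bytes) -> int:
    """16-bit Pearson-style hash: two 8-bit passes with different seeds."""
    result = 0
    for seed in range(2):
        h = seed
        for byte in data:
            h = PEARSON_TABLE[h ^ byte]
        result = (result << 8) | h
    return result


def auto_assign(dev_desc: str, subnet: str,
                taken: Set[ipaddress.IPv4Address]) -> ipaddress.IPv4Address:
    """Pick an address for an edge from its description string."""
    hosts = list(ipaddress.IPv4Network(subnet).hosts())  # skips network/broadcast
    start = pearson_hash_16(dev_desc.encode()) % len(hosts)
    # Assumed collision handling: count downwards, wrapping around the subnet.
    for step in range(len(hosts)):
        candidate = hosts[(start - step) % len(hosts)]
        if candidate not in taken:
            return candidate
    raise RuntimeError(f"no free address left in {subnet}")


if __name__ == "__main__":
    first = auto_assign("XXX", "10.0.10.0/24", set())
    second = auto_assign("XXX", "10.0.10.0/24", {first})
    print(first, second)  # second is one below first (modulo wrap-around)
```

Because the starting point depends only on the description string, changing the -I value moves an edge to a different starting address. The sketch also shows why the supernode's view of occupied addresses matters: if `taken` is stale, an address another edge is actually using can look free.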
No, I'm using a single supernode, but with multiple communities, and as I said, every community defines its subnet in community.list. This is the first of more than 100 communities to show the problem. Yes, I did go through the PRs and the code and saw it's a Pearson hash, but I didn't really have any idea how to debug it properly, since this runs in production and I can't simply take it offline for debugging. I've also tried verbose debug output on the client, but couldn't find anything out of the ordinary. If you need more info, let me know.
I am currently a bit far away from the code, so I am guessing here until I get another chance to take a deeper look. But what comes to mind is the MAC address. As far as I remember, the MAC address is assigned randomly if not specified by -m. Maybe some other computer remembers the same IP address with the previous MAC and reports a duplicate or so? (I am thinking of some ARP issue, but again, I am not sure here.) So one thing I would try for testing is a fixed MAC address at that edge using -m.
Note to self: also check the code for gratuitous ARP sent by edges with auto-IP on.
I will try with a fixed MAC address, but keep in mind I have already removed all TAP adapters multiple times and reinstalled the TAP driver, which changed the MAC address of the adapter in use each time.
I don't think it's an ARP problem.
Okay, thanks for the report, then. I will have a look when I get to it, more likely in weeks than days. Did I understand correctly that it would normally be assigned .15, and that .16 is already the counted-up address?
I think it's actually the counted-down address.
Good news: I was able to reproduce it 100% of the time on a local machine.
Conclusion: the supernode does not take changes of IP address into account in its list of connected edges, which leads to wrong information on the management port and duplicate IPs on edge clients. You will also find verbose logs from the supernode and both edge clients (first and second start) attached in the zip file.
Thank you very much for the detailed description! It will be an extremely good starting point for debugging. It could be related to the edge not de-registering properly (does it work better if you wait to restart until the edge data is purged at the supernode, around 90 to 120 seconds after you quit the edge?), but it definitely also involves an additional underlying bug. I will take care of it as soon as I can, though not within the next three weeks.
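As a rough illustration of the purge mentioned above: a supernode can drop edges it has not heard from within a timeout, which frees their addresses for later assignment. The 90-second value comes from the range quoted in the comment; the `EdgeRecord` type and the `purge_stale` function are hypothetical names, not n2n's API.

```python
from dataclasses import dataclass
from typing import Dict

PURGE_TIMEOUT_S = 90.0  # assumed; the comment above quotes roughly 90 to 120 s


@dataclass
class EdgeRecord:
    dev_addr: str     # VPN address the edge registered with
    last_seen: float  # time of last registration/keep-alive, in seconds


def purge_stale(edges: Dict[str, EdgeRecord], now: float,
                timeout: float = PURGE_TIMEOUT_S) -> int:
    """Remove edges (keyed by MAC) not seen within `timeout`; return the count."""
    stale = [mac for mac, rec in edges.items() if now - rec.last_seen > timeout]
    for mac in stale:
        del edges[mac]
    return len(stale)
```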
Currently, when an existing edge is matched by MAC address, there is an additional check whether its socket has changed; if it has, the existing edge is updated with the new socket info. Following this logic, I think it's safe in this case to update its device address as well.
I tested again: no duplicate IPs, and the management port displays valid info.
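The change described above can be sketched as follows. This is a simplified model, not n2n's C code: the supernode keeps registered edges in a table keyed by MAC address; on re-registration of a known MAC it already refreshes the socket, and the proposed fix refreshes the device (VPN) address too, so the set of occupied addresses used by auto-IP stays accurate. `EdgeTable`, `register`, `occupied`, the `update_dev_addr` switch (there only to compare old and new behavior) and the "skip empty address" check are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Socket = Tuple[str, int]  # (public IP, UDP port) the edge registers from


@dataclass
class Peer:
    sock: Socket
    dev_addr: Optional[str]  # VPN address reported by the edge, e.g. "10.0.10.16"


class EdgeTable:
    def __init__(self, update_dev_addr: bool = True) -> None:
        self.peers: Dict[str, Peer] = {}        # keyed by MAC address
        self.update_dev_addr = update_dev_addr  # False models the old behavior

    def register(self, mac: str, sock: Socket, dev_addr: Optional[str]) -> None:
        peer = self.peers.get(mac)
        if peer is None:
            # New edge: every field is recorded.
            self.peers[mac] = Peer(sock, dev_addr)
            return
        if peer.sock != sock:
            # Existing behavior: refresh the socket when it changed.
            peer.sock = sock
        # Proposed fix: also refresh the device address. Skipping an empty
        # address is one possible "check before copying" (an assumption).
        if self.update_dev_addr and dev_addr is not None and peer.dev_addr != dev_addr:
            peer.dev_addr = dev_addr

    def occupied(self) -> Set[str]:
        """Addresses the auto-IP logic should treat as taken."""
        return {p.dev_addr for p in self.peers.values() if p.dev_addr}


if __name__ == "__main__":
    for fixed in (False, True):
        table = EdgeTable(update_dev_addr=fixed)
        table.register("02:00:00:00:00:0b", ("192.0.2.7", 50001), "10.0.10.17")
        # Same MAC comes back with a different VPN address.
        table.register("02:00:00:00:00:0b", ("192.0.2.7", 50002), "10.0.10.16")
        print(fixed, sorted(table.occupied()))
```

With the old behavior the table keeps reporting 10.0.10.17 while the edge actually uses 10.0.10.16, which matches the mismatch seen on the management port, and 10.0.10.16 then looks free to the auto-IP logic.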
Looks extremely good to me. Updating the IP address will help the auto-IP logic always have an up-to-date list of actually occupied IP addresses. I currently do not remember how we transmit and handle IP addresses that were not assigned automatically, and I hope it does not conflict, but yes, your approach looks extremely reasonable. The only thing: I'm not sure whether it needs some checks before copying. We obviously never thought of this case (a node's IP address changing) in the update path. Good catch!
Do you want to test it for a while and then provide a pull request?
I just tried to pull the latest changes to make a PR, and it seems you fixed this 2 years ago.
So the bug has already been fixed? Does the revised version work for you?
Nah, I was looking in the wrong place: the IP address info was updated in the part of the code that runs when a new edge connects.
Thank you for actively contributing to development!
Thanks from me too. @Logan007, should we backport this bugfix to the stable branch?
Sounds reasonable; as far as I can see, it does not break compatibility.
Fixed with #1142.
I have my supernode up and running nicely on Windows, with about 100 communities in the community list.
Each community has a subnet defined in the community list, e.g.:
Each community has one server available 24/7 on a predefined IP address set by the edge client (e.g. 10.0.10.1).
Other clients are dynamic and don't have an IP specified; they get one from the supernode in the range specified in community.list.
Each edge sends its machine name with the -I option.
It had all been working perfectly until a recent problem: one of the edge clients, let's call it XXX, in one specific community would get an IP from the supernode (e.g. 10.0.10.15), but the connection would not work, and
ipconfig /all
would show the problematic IP address marked as DUPLICATE. Looking at the output of the supernode management port, no client has that IP.
If I keep disconnecting and reconnecting, the edge client eventually gets some other IP and everything works fine.
For now, I have worked around it by randomizing the information sent with the -I option.
Now, this is where things get really interesting.
On the same subnet/community I have another client, let's call it YYY, that usually gets an IP allocated really close to the one of the problematic client (e.g. 10.0.10.16 or 10.0.10.17). I believe this is the origin of the problem.
Whenever I check the management port, it shows YYY connected, but NOT with the IP address it actually uses in the community subnet.
For example, right now it shows 10.0.10.17, while the actual address on YYY's TAP adapter is 10.0.10.16.
I can't ping 10.0.10.17 from any edge client, but 10.0.10.16 works fine.
Keep in mind that YYY is connected every morning, has no issues at all, and works with the server on 10.0.10.1 just fine all day.
And I can see it on the management port, connected all the time, but with an invalid IP address (usually off by one in either direction).
Conclusion: there is a list of potential problems: