Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getServersCopy() - NoSuchElementException #645

Open
Sneakometer opened this issue May 22, 2021 · 12 comments
Open

getServersCopy() - NoSuchElementException #645

Sneakometer opened this issue May 22, 2021 · 12 comments

Comments

@Sneakometer
Copy link

Sneakometer commented May 22, 2021

Hey, i'm developing for Waterfall and need to retrieve the server list every now and then.
I'm using the getServersCopy() method provided by Waterfall in the ProxyServer instance, as the other one is marked as deprecated.
However, i started to notice some weird behaviour lately, throwing the following exception everytime the getServersCopy() method is called:

[21:43:23] [pool-10-thread-5/ERROR]: java.util.NoSuchElementException
[21:43:23] [pool-10-thread-5/ERROR]:     at gnu.trove.impl.hash.THashIterator.moveToNextIndex(THashIterator.java:139)
[21:43:23] [pool-10-thread-5/ERROR]:     at gnu.trove.impl.hash.THashIterator.next(THashIterator.java:91)
[21:43:23] [pool-10-thread-5/ERROR]:     at gnu.trove.map.hash.TCustomHashMap$MapBackedView.toArray(TCustomHashMap.java:729)
[21:43:23] [pool-10-thread-5/ERROR]:     at com.google.common.collect.Iterables.toArray(Iterables.java:295)
[21:43:23] [pool-10-thread-5/ERROR]:     at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:406)
[21:43:23] [pool-10-thread-5/ERROR]:     at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:391)
[21:43:23] [pool-10-thread-5/ERROR]:     at net.md_5.bungee.conf.Configuration.getServersCopy(Configuration.java:189)
[21:43:23] [pool-10-thread-5/ERROR]:     at net.md_5.bungee.BungeeCord.getServersCopy(BungeeCord.java:648)

Restarting the proxy fixed this for some time. The issue will however come back after some hours to a few days.
I'm also using the Cloudnet v3.4 orchestration to dynamically create servers at runtime. First i suspected the problem there, but a developer told me it's not caused by cloudnet.

I'm running the build from @Janmm14 (#609) for a while but only recently noticed this error. So i thought it's unlikely to be related to the dos mitigations.

Java 8
Debian 10

I'm not sure how to reproduce this other than just lettings the proxy run for a while.
I am willing to investigate furthe,r but i'm kinda out of ideas now.

//Edit the code that triggers the exception is:

for (ServerInfo server : ProxyServer.getInstance().getServersCopy().values()) {
...
}
@electronicboy
Copy link
Member

Something mutated the map while you where tryna create a copy of it seemingly

@Sneakometer
Copy link
Author

How to avoid this? Is there a thread safe way modifying or cloning it?

@electronicboy
Copy link
Member

electronicboy commented May 26, 2021

modifying the servers list is best done with the remove/add methods on the ProxyServer class, we already lock the collection in many cases but there is a limitation due to getServers itself returning a mutable list, part of the ideal setup would be to make this immutable, but, shamefully plugins get in the way of that, but, I question if the better alternative might be to override the relevant methods to direct calls to remove, etc, to the synchronized methods for this, but it is pretty gross

@Sneakometer Sneakometer changed the title gerServersCopy() - NoSuchElementException getServersCopy() - NoSuchElementException May 26, 2021
@Sneakometer
Copy link
Author

Sneakometer commented May 26, 2021

modifying the servers list is best done with the remove/add methods on the ProxyServer class,

Found them in the Configuration class, thanks. They synchronize on the serversLock object. Is this a feature from Waterfall, as i can't find this in BungeeCord itself?
Also, i found that there are still many cases where the map is accessed without synchronizing on the serversLock object in BungeeCord. This can eventually cause the same issue, or am i wrong?

What is the reason for using troves tmap in the first place? Why not use a ConcurrentHashMap, which would fix most if not all of the concurrency issues?

@Janmm14
Copy link
Contributor

Janmm14 commented May 26, 2021

@Sneakometer The idea is probably that for most servers the server map is not changing, that's probably why it is using a fast map and manual synchronization

@astei
Copy link
Contributor

astei commented May 27, 2021

Sadly, the broken BungeeCord API in question can't be fixed. If a plugin only uses the Waterfall methods, then everything will work, but CloudNet presumably needs to support upstream BungeeCord too, thus you get the issue.

I'm going to sound like a broken record, but that's just one reason why @electronicboy encouraged me to go on my own instead.

@Sneakometer
Copy link
Author

Sneakometer commented May 27, 2021

@Sneakometer The idea is probably that for most servers the server map is not changing, that's probably why it is using a fast map and manual synchronization

I'm not a java expert, but i can't believe that troves map is that much faster for those few entries (<100) in the servers map.

I'm going to sound like a broken record, but that's just one reason why @electronicboy encouraged me to go on my own instead.

I already thought about switching to velocity at some point, but it requires quite a lot of hassle for me adding the support to all of my self made plugins. Some of them (at my fault) not using any abstraction to bungeecord at all. I was fine with bungeecord before using cloudnet for years, so i'm kinda hoping there is another solution to this :/

@astei
Copy link
Contributor

astei commented May 27, 2021

There really isn't, unless we break BungeeCord API compatibility. And at that point, you might as well just use Velocity.

@electronicboy
Copy link
Member

electronicboy commented May 27, 2021

using a CHM as-is is not viable as we'd need to override a good chunk of its functionality in order to work properly
The performance is not really a consideration here at all, it's the API which is the issue

@Janmm14
Copy link
Contributor

Janmm14 commented May 27, 2021

You could ask CloudNet to use Waterfall's functions if they're found.

@Packsolite
Copy link

Packsolite commented Dec 3, 2023

So after years this issue still persists, even with the suggested fix by cloudnet in place.
The trove map can get into a "corrupted" state from which it will not recover.
This code will then always procude the mentioned exception:

for (ServerInfo server : ProxyServer.getInstance()
		.getServersCopy()
		.values()) {
	server.sendData("abc", data);
}

Only temporary "fix" is to restart the proxy.

@electronicboy
Copy link
Member

This is likely a won't fix as the underlying issue is generally the unsafe collection that upstream has decided to use here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants