Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourcesPlugin: Multithreaded lazy start #1219

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jukzi
Copy link
Contributor

@jukzi jukzi commented Feb 29, 2024

Reads all projects in parallel. As this was done during ResourcePlugin.start() it had to be deferred as multithreaded classloading during BundleActivator#start(BundleContext) is not supported.

All that happens while splash screen still shown.

Copy link
Contributor

github-actions bot commented Feb 29, 2024

Test Results

  198 files   -   438    198 suites   - 438   39m 12s ⏱️ -42s
2 910 tests  - 1 035  2 887 ✅  - 1 035   23 💤 ±0  0 ❌ ±0 
9 336 runs   - 3 105  9 172 ✅  - 3 105  164 💤 ±0  0 ❌ ±0 

Results for commit 35c4169. ± Comparison against base commit adb8427.

This pull request removes 1035 tests.
AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testNoHandler
AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testOneHandler
AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testTwoHandlers
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testBackwardDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testBadURL
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testChoiceNoChild
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testCircularDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testCompositeNoName
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testDependencyWithInvalidId
…

♻️ This comment has been updated with latest results.

@jukzi
Copy link
Contributor Author

jukzi commented Feb 29, 2024

org.opentest4j.AssertionFailedError: 

Expecting actual:
  ["instance", "configuration", "user", "default"]
to contain exactly (and in same order):
  ["project", "instance", "configuration", "user", "default"]
but could not find the following elements:
  ["project"]

	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at org.eclipse.core.tests.internal.preferences.PreferencesServiceTest.testLookupOrder(PreferencesServiceTest.java:236)

Copy link
Contributor

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't bind the workspace to static methods for its initialization.

@@ -475,7 +483,12 @@ private static String getSystemEncoding() {
* @return the single instance of this plug-in runtime class
*/
public static ResourcesPlugin getPlugin() {
return plugin;
ResourcesPlugin p = plugin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong as it will prevent the workspace from become available unless someone calls this static method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added new Thread(ResourcesPlugin::getPlugin, "Async ResourcesPlugin start").start(); to start() - ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really why do you do the init in this method and not in the activate that makes no sense to me at all...if you want to defer the tracking to happen in a different thread (for whatever reason) why not simply wrap the open call in a new thread right there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other plugins cal the static as soon as the Resourceplugin is started and a synchronize needs to wait for completion to finish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is most likely not correct, it is the other way round. A plugin is starting, calling the static method and this triggers the activator to run and this then is automatically synchronized by the classloader lock already, so why one wants to break this behavior and instead add another synchronization.
These static access is actually bad and has lead to numerous problem in the past, so now requiring this for the startup of the plugin is wrong in many ways and you probably want to better sync on the project loading if projects are accessed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show me that THIS implementation makes an error, not yours. It does not even be a junit. just put breakpoints such that it fails if executed in a predefined order of your choice. But it boils down to that synchronize() just works: only one thread will do the init without waiting and all others wait.
The problem is simple that Eclipse on Windows needs >10 sec to start and >2 of them are caused by loading the projects.
image
loading them in parallel speeds that up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show me that THIS implementation makes an error, not yours.

If you show me that MY implementation makes an error and not YOURS I might be convinced... longer loading times are not an error, you can for sure make the loading IO of MarkerManager parallel, but not by changing the Activator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizing loading time of IDE is a valid and desired topic. Giving only spurious arguments does not help and contradict our code of conduct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claiming other wasting your time and their implementation is failed is also "contradict our code of conduct" .. I gave you several insight and ways to archive your goal differently so please don't claim any contradiction anytime someone does not agree with you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jukzi : startup performance and functional integrity of code are two different things.

Independently how fast this PR will make startup for some use case, if it will be broken for other use case, it is not acceptable.

And as you propose the change, it is your duty to proof it is safe and works, assuming current state is not broken (which is the case).

@@ -542,7 +562,7 @@ public void start(BundleContext context) throws Exception {
workspaceInitCustomizer);
plugin = this; // must before open the tracker, as this can cause the registration of the
// workspace and this might trigger code that calls the static method then.
instanceLocationTracker.open();
new Thread(this::initialized, "Async ResourcesPlugin start").start(); //$NON-NLS-1$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still wrong, the contract of the Resources plugin is that it tracks the workspace in the activator, as the activation usually (but that's not guaranteed!) happens by a lazy-activation and in this case must be guarded by the classloader lock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Cristoph. The code expected to have workspace initialized after activator is done, not asynchronously after that.

Please put breakpoint at org.eclipse.core.internal.resources.Workspace.open(IProgressMonitor) and check that this is executed in context of org.eclipse.core.resources.ResourcesPlugin.start(BundleContext) call.

Initialization of ResourcesPlugin should be changed very carefully, there are different actors playing here and not everything can be validated via automated tests.

"Error with project " + project.getName(), e); //$NON-NLS-1$
}
return null;
}).filter(Objects::nonNull).toArray(IStatus[]::new)).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must not call get() here as it blocks, and you don't want to be blocked. Instead you should either return a Stream<..> with an onClose( ... ), a Future<..> or similar or even better a CompletableFuture<..> then you can use that in an event driven non blocking way.

@@ -542,7 +562,7 @@ public void start(BundleContext context) throws Exception {
workspaceInitCustomizer);
plugin = this; // must before open the tracker, as this can cause the registration of the
// workspace and this might trigger code that calls the static method then.
instanceLocationTracker.open();
new Thread(this::initialized, "Async ResourcesPlugin start").start(); //$NON-NLS-1$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Cristoph. The code expected to have workspace initialized after activator is done, not asynchronously after that.

Please put breakpoint at org.eclipse.core.internal.resources.Workspace.open(IProgressMonitor) and check that this is executed in context of org.eclipse.core.resources.ResourcesPlugin.start(BundleContext) call.

Initialization of ResourcesPlugin should be changed very carefully, there are different actors playing here and not everything can be validated via automated tests.

@laeubi
Copy link
Contributor

laeubi commented Mar 4, 2024

I agree with Cristoph. The code expected to have workspace initialized after activator is done, not asynchronously after that.

Yes because of that any code that calls into resource plugin while the tracker.open is runs is not an indication of something is blocking, it means that the synchronization is actually effective, synchronize after that anything can happen (including it works, deadlock, get an exception) and you maybe just notice that randomly.

If one want to improve that there are some ways (and that was already performed by many components but still work in progress):

  1. Do not call ResourcePlugin methods but pass the objects by reference, as in this special case of init is is allowed for internal code to "early access" these objects (carefully).
  2. delay the access e.g do not call any code in static initializers of the class but on first access, this can be combined with (1)
  3. decouple the code, e.g. by react to the workspace registered (one example is CheckMissingNaturesListener) and performing nay action the free of loader locks.
  4. Create proxy object that themself only call when they are called the "real" object.
  5. ... maybe more ...

@jukzi
Copy link
Contributor Author

jukzi commented Mar 4, 2024

Yes because of that any code that calls into resource plugin while the tracker.open is runs is not an indication of something is blocking, it means that the synchronization is actually effective, synchronize after that anything can happen (including it works, deadlock, get an exception) and you maybe just notice that randomly.

It is not clear what you try say here. ||: Please give a concrete example how it can go wrong :||

Reads all projects in parallel. As this was done during
ResourcePlugin.start() it had to be deferred as multithreaded
classloading during BundleActivator#start(BundleContext) is not
supported.

All that happens while splash screen still shown.
@vogella
Copy link
Contributor

vogella commented Mar 7, 2024

To continue here, you could ensure that the parallel reading finishes before the Activator is left @jukzi. This would still read in parallel but keep the contract.

@jukzi
Copy link
Contributor Author

jukzi commented Mar 7, 2024

To continue here, you could ensure that the parallel reading finishes before the Activator is left

I would love todo, and i had no plan to not so, but it turns out it is not possible, because using the resourceplugin classes in another thread before the activator finished can (and will very often) cause a deadlock.
And i still see no reason why the workspace should have been fully read when plugin is started. Every method that can access that data will just wait on it and it works.
More over the behaviour of ResourcesPlulugin.start() to take that long not only violates the contract of org.osgi.framework.BundleActivator.start(BundleContext) This method must complete and return to its caller in a timely manner. but actually the Classloader does already detects that misbehavior with a timeout and does already skip to further wait on it. So in fact the current implementation is broken to not wait on loading finished, while this PR fixes that.

@laeubi
Copy link
Contributor

laeubi commented Mar 7, 2024

So in fact the current implementation is broken to not wait on loading finished, while this PR fixes that.

This PR does not fix that it simply breaks out the classloader synchronization, another thread is not allowed to load additional classes (because that's what lazy-activation is ensuring) that does not mean that the current thread is not allowed to offload some work to other threads.

And that's what I have already written here you can:

  1. perform some work (e.g. I/O) in parallel threads from within the activator
  2. refactor the caller of any static ResourcesPlugin#getXXXXX to something that either uses DI (e.g. SCR with reference to IWorkspaceService) or by passing it as a parameter instead.
  3. do not load the data on startup but when it is actually accessed (e.g. markers can be loaded once code actually want to access the first marker)

both things will let you escape out of your classlaoderlock,improves the code and do not require to break the lazy-activation.

@jukzi
Copy link
Contributor Author

jukzi commented Mar 7, 2024

BTW the timeout (after 5 or 30sec) looks like

org.osgi.framework.BundleException: Unable to acquire the state change lock for the module: osgi.identity; type="osgi.bundle"; version:Version="3.20.200.qualifier"; osgi.identity="org.eclipse.core.resources"; singleton:="true" [id=110] STARTED [STARTED]
	at org.eclipse.osgi.container.Module.lockStateChange(Module.java:373)

Can be found for other plugins on stackoverflow aswell and the solution there was for example:

https://git.eclipse.org/r/c/sourceediting/webtools.sourceediting/+/154548/2/core/bundles/org.eclipse.wst.sse.core/src/org/eclipse/wst/sse/core/internal/SSECorePlugin.java

i.e starting a asynchronus thread in the BundleActivator.start() and now guess who create that commit?
So @iloveeclipse please explain why you think it is vaild there but not here.

@iloveeclipse
Copy link
Member

please explain why you think it is vaild there but not here.

One should not compare apples with oranges.

Here we are talking about code contract that is as old as Eclipse is, in a bundle that one of the basic building blocks of the SDK with lot of consumers that relied on the existing behavior, implicitly or explicitly.

Trying to gain some (not even quantified) startup improvement by ignoring possible compatibility break is not an option.

@laeubi
Copy link
Contributor

laeubi commented Mar 7, 2024

@jukzi As mentioned before you can fork a job for any loading in any component (if it is guaranteed that before the component access the results is joining on the job) and for example CheckMissingNaturesListener does exactly that, but here you do fork the whole workspace init into an arbitrary thread and because of that sync on arbitrary other threads.

Ans especially the case where lazy-activation thru calls to these methods happens because on classlaoding is a special case of activation, that is even problematic because it could result in "activation rush", but this is not to be fixed on the Activator side, it must be fixed on the Caller site!

So again, if any code calls ResourcePlugin.getXXX it is expected that before it can do so (that is where the clasloading starts) the activator is first run, and if an instance location is available it is initialized before the classlaoding call returns, that is because otherwise if the instance area is NOT set this is an error and must throw an exception. If one does not throw an exception it could happen that there are deadlocks of threads or even UI lockdowns, because of that it is discouraged to call the static methods.

@jukzi
Copy link
Contributor Author

jukzi commented Mar 7, 2024

Here we are talking about code contract that is as old as Eclipse is

The instanceLocationTracker was introduced just 2 year ago. I Ionly added the the async initialisation thread to make that still work even though barely nobody uses it but only the static method are used. For example Eclipse IDE and all its tests work without it, because the Resourceplugin is started in the UI thread as one of the first plugins anyway.

possible compatibility break

So what exactly do you think is breaking here?

@jukzi
Copy link
Contributor Author

jukzi commented Mar 7, 2024

Resourceplugin is started in the UI thread

More exactly it is implicitly started in IDEApplication.start(IApplicationContext) as the IDEWorkbenchAdvisor uses IContainer from org.eclipse.core.resources

@laeubi
Copy link
Contributor

laeubi commented Mar 7, 2024

The instanceLocationTracker was introduced just 2 year ago.

Please look how the code was before it simply FAILED, but it never did anything async... thats the point and as said you have two options:

  1. static access -> slow, discouraged, will run into classlaoder lock
  2. "async" / service oriented, will not lock

More exactly it is implicitly started in IDEApplication.start(IApplicationContext) as the IDEWorkbenchAdvisor uses IContainer from org.eclipse.core.resources

that's not true, this is one possible entry point it could be started earlier, e.g. if there is a Instance Area saved/set already set before and a bundle wants a workspace it will be started before the IDE shows up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants