-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Bake faster #728
base: master
Are you sure you want to change the base?
WIP: Bake faster #728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick findings. I wonder if it would be better to use a global ExSvc instead of shutting new ones down all the time (see comments about waiting).
private DocumentList<DocumentModel> query(String sql) { | ||
private synchronized DocumentList<DocumentModel> query(String sql) { | ||
activateOnCurrentThread(); | ||
OResultSet results = db.query(sql); | ||
return DocumentList.wrap(results); | ||
} | ||
|
||
private DocumentList<DocumentModel> query(String sql, Object... args) { | ||
private synchronized DocumentList<DocumentModel> query(String sql, Object... args) { | ||
activateOnCurrentThread(); | ||
OResultSet results = db.command(sql, args); | ||
return DocumentList.wrap(results); | ||
} | ||
|
||
private void executeCommand(String query, Object... args) { | ||
private synchronized void executeCommand(String query, Object... args) { | ||
activateOnCurrentThread(); | ||
db.getTransaction().begin(); | ||
db.command(query, args); | ||
db.getTransaction().commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use synchronize here? Instead, you could catch OConcurrentModificationException
and retry.
See: https://orientdb.org/docs/3.0.x/general/Concurrency.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That could be done. But currently we have only this one Contentstore that is passed around everywhere, so all Threads use the same db session, which can lead to a deadlock especially when adding new documents.
I think It would be better to restructure the contentstore to have one single orient instance and each thread uses it's own database session. Like they recommend in the docs.
That's a bit more work that needs to be done. So I decided to keep it simple first and synchronize the potential critical parts.
try { | ||
try (Writer out = createWriter(outputFile)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this double-nesting needed? I do not see a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup mistake. Can be merged together. Thanks for the hint.
utensils.getRenderer().shutdown(); | ||
errors.addAll(asset.getErrors()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutdown() does not wait for the ES to run all threads until finished (ie non-blocking operation). You need for all threads to finish, or you will not get all asset errors.
} catch (InterruptedException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Re-Interrupt
- use logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. That's right. A log message would be better. And thinking about it we also miss a shutdown handler to shutdown all executors directly.
@@ -216,7 +217,7 @@ private void resetDocumentTypesAndExtractors() { | |||
/** | |||
* Load {@link RenderingTool} instances and delegate rendering of documents to them | |||
*/ | |||
private void renderContent() { | |||
private void renderContent() throws InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method never checks if it got interrupted. The for loop would be a sane choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm....my thought was....if an interrupt happens, better stop processing and fail fast....but yes. could be handled in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there could be an interrupt in a method call inside the loop. But what happens if the interrupt happens on the "jump back" instruction? I am by no means an expert with Threading and Interrupts, but I think adding a simple check in each iteration would be a good idea:
if (Thread.isInterrupted()) { /* LOG, then */ throw new InterruptedException(); }
public void shutdown() throws InterruptedException { | ||
this.excutor.shutdown(); | ||
this.excutor.awaitTermination(10, TimeUnit.MINUTES); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you made it synchronous. Why wait here? Better name it "shutdownAndWait" then, because usually the shutdown method does not wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will rename it to make the intention clearer. cooldown would be another idea to name it to keep the oven analogy.
Like at the end of the baking process you let the oven cooldown and afterwards cleanup the dirt that may have happened.
This is a Proof of Concept to speed up the baking process using parallel execution.
It speeds up the following three core activities:
This is just a quick sketch to start a discussion on possible solutions on how to get more performance out of the baking process.
I kept it as simple as possible to just get it working.
But the code needs to be cleaned up/improved a little bit. And there are missing tests regarding possible side effects.
I think it would be better to walk the file tree instead of using a parallel stream with recursive calls for example.
But first I would be happy to know what others think about the current approach.