Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Bake faster #728

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

ancho
Copy link
Member

@ancho ancho commented Oct 3, 2021

This is a Proof of Concept to speed up the baking process using parallel execution.

It speeds up the following three core activities:

  • crawling
  • rendering
  • asset management (copy stuff around)

This is just a quick sketch to start a discussion on possible solutions on how to get more performance out of the baking process.

I kept it as simple as possible to just get it working.
But the code needs to be cleaned up/improved a little bit. And there are missing tests regarding possible side effects.
I think it would be better to walk the file tree instead of using a parallel stream with recursive calls for example.

But first I would be happy to know what others think about the current approach.

Copy link

@bmarwell bmarwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick findings. I wonder if it would be better to use a global ExSvc instead of shutting new ones down all the time (see comments about waiting).

Comment on lines -293 to +302
private DocumentList<DocumentModel> query(String sql) {
private synchronized DocumentList<DocumentModel> query(String sql) {
activateOnCurrentThread();
OResultSet results = db.query(sql);
return DocumentList.wrap(results);
}

private DocumentList<DocumentModel> query(String sql, Object... args) {
private synchronized DocumentList<DocumentModel> query(String sql, Object... args) {
activateOnCurrentThread();
OResultSet results = db.command(sql, args);
return DocumentList.wrap(results);
}

private void executeCommand(String query, Object... args) {
private synchronized void executeCommand(String query, Object... args) {
activateOnCurrentThread();
db.getTransaction().begin();
db.command(query, args);
db.getTransaction().commit();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use synchronize here? Instead, you could catch OConcurrentModificationException and retry.

See: https://orientdb.org/docs/3.0.x/general/Concurrency.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That could be done. But currently we have only this one Contentstore that is passed around everywhere, so all Threads use the same db session, which can lead to a deadlock especially when adding new documents.

I think It would be better to restructure the contentstore to have one single orient instance and each thread uses it's own database session. Like they recommend in the docs.

That's a bit more work that needs to be done. So I decided to keep it simple first and synchronize the potential critical parts.

Comment on lines +51 to +52
try {
try (Writer out = createWriter(outputFile)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this double-nesting needed? I do not see a reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup mistake. Can be merged together. Thanks for the hint.

Comment on lines +177 to 178
utensils.getRenderer().shutdown();
errors.addAll(asset.getErrors());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown() does not wait for the ES to run all threads until finished (ie non-blocking operation). You need for all threads to finish, or you will not get all asset errors.

Comment on lines +186 to +187
} catch (InterruptedException e) {
e.printStackTrace();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Re-Interrupt
  • use logger

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That's right. A log message would be better. And thinking about it we also miss a shutdown handler to shutdown all executors directly.

@@ -216,7 +217,7 @@ private void resetDocumentTypesAndExtractors() {
/**
* Load {@link RenderingTool} instances and delegate rendering of documents to them
*/
private void renderContent() {
private void renderContent() throws InterruptedException {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method never checks if it got interrupted. The for loop would be a sane choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm....my thought was....if an interrupt happens, better stop processing and fail fast....but yes. could be handled in the loop.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there could be an interrupt in a method call inside the loop. But what happens if the interrupt happens on the "jump back" instruction? I am by no means an expert with Threading and Interrupts, but I think adding a simple check in each iteration would be a good idea:

if (Thread.isInterrupted()) { /* LOG, then */ throw new InterruptedException(); }

Comment on lines +304 to +307
public void shutdown() throws InterruptedException {
this.excutor.shutdown();
this.excutor.awaitTermination(10, TimeUnit.MINUTES);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you made it synchronous. Why wait here? Better name it "shutdownAndWait" then, because usually the shutdown method does not wait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename it to make the intention clearer. cooldown would be another idea to name it to keep the oven analogy.
Like at the end of the baking process you let the oven cooldown and afterwards cleanup the dirt that may have happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants