From time to time we hit connection issues with our MariaDB instance. This is an intrinsic problem of a system made of multiple components: networks, databases and, in general, software fail.
The latest connectivity episode was a crash while executing `_fire_queries` in blocking.py: MariaDB was rejecting connections because it was overloaded. Soweego relies heavily on multithreading and mix-n-match works on the same database, so maxing out the connection pool can happen frequently.
A pipeline that takes weeks to compute its results simply cannot fail because of a temporary malfunction.
I propose a new design approach for our code, to be tried first in the blocking code cited above.
The main idea is to have a queue of queries shared among the threads: each thread draws one query and runs it.
Running a query has three possible outcomes (see the sketch after this list):
- Success: everything goes on as designed.
- Failure: something went wrong, but only because of a temporary issue; connectivity problems belong to this case.
- Fatal error: the query itself is malformed and must be thrown away.
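To make the three cases concrete, here is a minimal sketch of how they could be told apart, under the assumption that the queries are fired through SQLAlchemy; the exact exception classes are an assumption and would change with a different driver.

```python
# Minimal sketch of outcome classification (assumes SQLAlchemy fires the queries).
from enum import Enum, auto

from sqlalchemy.exc import OperationalError, ProgrammingError


class Outcome(Enum):
    SUCCESS = auto()
    FAILURE = auto()  # temporary issue, e.g., lost connection: retry later
    FATAL = auto()    # malformed query: throw it away


def run_query(session, query) -> Outcome:
    try:
        session.execute(query)
        session.commit()
        return Outcome.SUCCESS
    except OperationalError:  # connection refused, server overloaded, ...
        session.rollback()
        return Outcome.FAILURE
    except ProgrammingError:  # bad SQL: no point in retrying
        session.rollback()
        return Outcome.FATAL
```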
After a failure, we re-enqueue the failed query; after a timeout, we dequeue the next one.
This timeout should grow when failures are consecutive and reset to the default value as soon as a success happens.
It would be great to build a reusable component that acts as a black box for developers.
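As a rough sketch of what such a black box could look like: `RetryQueue`, its parameters, and the backoff values below are hypothetical, not existing soweego code.

```python
# Hypothetical reusable component: a queue of queries shared among worker
# threads, with re-enqueueing on temporary failures and exponential backoff.
import queue
import threading
import time
from enum import Enum, auto


class Outcome(Enum):  # same three cases as in the sketch above
    SUCCESS = auto()
    FAILURE = auto()
    FATAL = auto()


class RetryQueue:
    def __init__(self, run, workers=4, base_timeout=1.0, max_timeout=300.0):
        self._queue = queue.Queue()
        self._run = run  # callable(query) -> Outcome
        self._workers = workers
        self._base = base_timeout
        self._max = max_timeout

    def put(self, query):
        self._queue.put(query)

    def _worker(self):
        timeout = self._base  # backoff is tracked per worker in this sketch
        while True:
            try:
                query = self._queue.get(timeout=5)
            except queue.Empty:
                return  # queue drained: this worker is done
            outcome = self._run(query)
            if outcome is Outcome.SUCCESS:
                timeout = self._base  # reset the backoff to its default
            elif outcome is Outcome.FAILURE:
                self._queue.put(query)  # temporary failure: retry later
                time.sleep(timeout)  # wait before dequeuing the next query
                timeout = min(timeout * 2, self._max)  # consecutive failures grow it
            # Outcome.FATAL: malformed query, just drop it
            self._queue.task_done()

    def process(self):
        threads = [threading.Thread(target=self._worker)
                   for _ in range(self._workers)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
```

Plugging something like the `run_query` from the previous sketch into `run` would give the full picture; whether the backoff should be per worker or shared across the pool is one of the implementation choices to settle.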
This is the overall design idea, but I'm sure the implementation will pose some interesting challenges.