Improve performance of the latest_of_each method of the manager #1360
@bagababaoglu did you happen to also test this on a MySQL database?
@bagababaoglu did you compare the Subquery approach that's not vendor specific to this approach?
@tim-schilling yes, I tested it for all databases using
Or do you mean whether I have tested the performance on a MySQL database? In that case, no, I haven't, as I only had the data for PostgreSQL. Do you think the performance improvement would be different for MySQL?
@tim-schilling I did now. Here are the results: This is the current version with not exists subquery
This is for the id__in filter and id subquery
Although the difference is not that big, the not exists subquery still performs better. IMO, it is better to avoid
Thank you! I'm in favor of this. @ddabble thoughts?
Ooh, that's a satisfying simplification! I'll check it out more closely sometime this week :)
This makes the method available for use in other test cases.
Looks nice, great job! 😄
@bagababaoglu Do you have any feedback on the changes I made? 🙂 Did I misrepresent anything in the comments/descriptions?
Thank you for your additions 🙏 They look fine to me.
Description
The `latest_of_each` method of the manager currently collects the `pk`s of all the latest records and uses a `pk__in` filter to determine the final results. However, the `in` filter performs badly, especially when there are a lot of rows in the database, because every row has to be compared against this long list. This can be avoided by using another subquery that annotates each row with whether a later record exists; the query can then be filtered on this annotation to get the latest record of each item. This also removes the need for database-specific implementations, as the resulting query should work on every backend.
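The difference between the two query shapes can be sketched in plain SQL. The snippet below is only an illustration run through Python's `sqlite3` module, not the code from this PR: the `poll_history` table, its columns, and the sample rows are hypothetical. In the Django ORM, the second shape would typically be built with `Exists` and `OuterRef`.

```python
import sqlite3

# Hypothetical history table: `id` is the pk of the tracked object,
# `history_id` is the pk of the history row itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poll_history ("
    " history_id INTEGER PRIMARY KEY,"
    " id INTEGER NOT NULL,"
    " history_date INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO poll_history (history_id, id, history_date) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 2, 15), (4, 3, 5), (5, 3, 30), (6, 3, 25)],
)

# Shape 1: compute the pks of the latest rows, then filter with IN.
IN_QUERY = """
SELECT h.history_id
FROM poll_history h
WHERE h.history_id IN (
    SELECT h2.history_id
    FROM poll_history h2
    WHERE h2.history_date = (
        SELECT MAX(h3.history_date) FROM poll_history h3 WHERE h3.id = h2.id
    )
)
ORDER BY h.history_id
"""

# Shape 2: keep a row only if no later row exists for the same object.
NOT_EXISTS_QUERY = """
SELECT h.history_id
FROM poll_history h
WHERE NOT EXISTS (
    SELECT 1
    FROM poll_history newer
    WHERE newer.id = h.id AND newer.history_date > h.history_date
)
ORDER BY h.history_id
"""


def history_ids(sql):
    """Run a query and return the selected history_id values as a list."""
    return [row[0] for row in conn.execute(sql)]


latest_via_in = history_ids(IN_QUERY)
latest_via_not_exists = history_ids(NOT_EXISTS_QUERY)
print(latest_via_in, latest_via_not_exists)
```

Both queries select the same rows on this sample data; which one is faster on a real table depends on the database, its indexes, and the data volume, so the figures reported in this PR should not be read off this toy example.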
Related Issue
Currently there are no related issues.
Motivation and Context
Improves performance of the `latest_of_each` method, which is necessary for history tables that hold a lot of data.
How Has This Been Tested?
The existing tests for the method have been used.
Results from a query analysis run on PostgreSQL. The table has 1.8 million entries.
This is the cost analysis for the improved query
This is the cost analysis for the normal query
According to these results, there is an improvement of about 84.73%.
Checklist:
`pre-commit run` command to format and lint.
AUTHORS.rst
CHANGES.rst