[patch] Escape database queries #1735
base: main
Conversation
Queries containing underscores take significantly longer when used in LIKE SQL queries; often the underscores come from file names or directory paths and are not intended as wildcards. If the sqlite backend is not used, no escaping is done.
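For context, escaping a LIKE pattern means prefixing the wildcard characters (`%`, `_`) and the escape character itself so they match literally. A minimal sketch of the idea, not the actual patch:

```python
# Illustrative only -- not the pyiron implementation from this PR.
def escape_like(pattern: str, escape_char: str = "\\") -> str:
    """Escape SQL LIKE wildcards so they are matched literally."""
    # The escape character itself must be escaped first.
    for c in (escape_char, "%", "_"):
        pattern = pattern.replace(c, escape_char + c)
    return pattern

# "job_1%" -> "job\_1\%", to be used with e.g.
#   ... WHERE project LIKE :pat ESCAPE '\'
```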
```diff
@@ -880,10 +900,10 @@ def get_items_dict(
     self.conn.connection.create_function("like", 2, self.regexp)

     result = self.conn.execute(query)
-    row = result.fetchall()
+    results = [row._asdict() for row in result.fetchall()]
```
A slightly faster (~10%) alternative would be

```diff
-results = [row._asdict() for row in result.fetchall()]
+results = result.mappings().all()
```

but the result is then a list of sqlalchemy objects that behave like dicts, not standard dicts. One can also feed `result.mappings()` directly into a DataFrame, but that would change the API more dramatically.
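A rough sketch of the two variants side by side (the engine URL and table name are placeholders, not from the PR):

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///example.db")  # placeholder database

with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM jobs"))  # placeholder query

    # variant used in the PR: list of plain dicts
    # rows = [row._asdict() for row in result.fetchall()]

    # suggested variant: list of dict-like RowMapping objects
    rows = result.mappings().all()

# either variant can be handed to pandas
df = pd.DataFrame(rows)
```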
Just to add some numbers for a query on a project with ~2M jobs:
- ~10s for the plain sqlalchemy query
- ~20s for a call to get_items_dict
- ~30s for a call to job_table
The database reports around 3s for your `2024%` queries. The rest seems to be sqlalchemy and pyiron.
Fixes #1725, but the performance gain is mainly eaten by the need to transform the query result into a list of dicts and then to a DataFrame.
```python
if c in s:
    s = s.replace(escape_char + c, c)
```
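For reference, this branch only fires when the string already contains the escape sequence; a purely hypothetical illustration (values made up, not from pyiron):

```python
escape_char = "\\"

s = "my\\_job"  # a pattern that was already escaped elsewhere
for c in ("%", "_"):
    if c in s:
        s = s.replace(escape_char + c, c)

print(s)  # -> "my_job"
```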
I understand why you are doing it, but I wonder if this case ever happens. Do you have an example where this is the case?
No, I had just asked ChatGPT to make sure it handles that case and it seemed to work in my testing.
Looks good to me