Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch] Escape database queries #1735

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

[patch] Escape database queries #1735

wants to merge 4 commits into from

Conversation

pmrv
Copy link
Contributor

@pmrv pmrv commented Jan 15, 2025

Queries containing underscores take significantly longer when used in LIKE sql queries, often this is the result of filenames or directory paths containing them and not intended.

If the sqlite backend is not used, no escaping is done.

pmrv and others added 2 commits January 15, 2025 14:12
Queries containing underscores take significantly longer when used in
LIKE sql queries, often this is the result of filenames or directory
paths containing them and not intended.

If the sqlite backend is not used, no escaping is done.
@jan-janssen jan-janssen added the format_black reformat the code using the black standard label Jan 15, 2025
@@ -880,10 +900,10 @@ def get_items_dict(
self.conn.connection.create_function("like", 2, self.regexp)

result = self.conn.execute(query)
row = result.fetchall()
results = [row._asdict() for row in result.fetchall()]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly faster (~10%) alternative would be

Suggested change
results = [row._asdict() for row in result.fetchall()]
results = result.mappings().all()

but the result is then a list of sqlalchemy objects that behave like dicts, not standard dicts. One can also feed result.mappings() directly into a DataFrame directly, but that would change api more dramatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add some numbers for a query on a project with ~2M jobs:

  1. ~10s for the plain sqlalchemy query
  2. ~20s for a call to get_items_dict
  3. ~30s for a call to job_table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database reports around 3s for your 2024% queries. The rest seems to be sqlalchemy and pyiron.

@pmrv
Copy link
Contributor Author

pmrv commented Jan 15, 2025

Fixes #1725, but the performance gain is mainly eaten by the need to transform the query result into a list of dicts and then to a DataFrame.

@pmrv pmrv added integration Start the integration tests with pyiron_atomistics/contrib for this PR patch backward compatible bug fixes labels Jan 15, 2025
@pmrv pmrv requested review from jan-janssen and XzzX January 15, 2025 14:42
@github-actions github-actions bot changed the title Escape database queries [patch] Escape database queries Jan 15, 2025
@pmrv pmrv marked this pull request as ready for review January 15, 2025 14:42
Comment on lines +325 to +326
if c in s:
s = s.replace(escape_char + c, c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you are doing it, but I wonder if this case ever happens. Do you have an example when this is the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I had just asked ChatGPT to make sure it handles that case and it seemed to work in my testing.

Copy link
Member

@jan-janssen jan-janssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@niklassiemer niklassiemer removed the format_black reformat the code using the black standard label Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration Start the integration tests with pyiron_atomistics/contrib for this PR patch backward compatible bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants