♻️ Replace need dicts/lists with views (with fast filtering) #1281

Merged
chrisjsewell merged 7 commits into master from typing-needs-parts-filter-views
Oct 1, 2024

@chrisjsewell chrisjsewell commented Sep 5, 2024

This PR addresses issues in performance scalability of needs filtering:

  • In a known user project, with ~40,000 needs, there are 5316 individual filter calls, across directives such as needarch, needtable, needpie, etc. Each takes on average 0.33 seconds, cumulatively totalling 1751 seconds
  • Applying this PR reduces the average time to 0.033 seconds, cumulatively totalling 181 seconds

The main issue with filtering (a.k.a. querying) needs is that it requires looping over every need and executing a Python evaluation of the filter string (e.g. id == "xxx"), so each filter call takes O(N) time.
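
To make the cost concrete, here is a minimal sketch of that row-scan approach. It is not the sphinx-needs implementation: `filter_needs_by_scan` and the sample needs are hypothetical, and needs are assumed to be plain dicts whose keys are exposed as variable names to the filter string.

```python
from __future__ import annotations

from typing import Any


def filter_needs_by_scan(
    needs: list[dict[str, Any]], filter_string: str
) -> list[dict[str, Any]]:
    """Return the needs for which the filter string evaluates truthy.

    Hypothetical helper: every call visits every need, so the cost is O(N).
    """
    # Compile once, but the evaluation still runs once per need.
    code = compile(filter_string, "<filter>", "eval")
    found = []
    for need in needs:
        # The need's fields are passed as local names, so `id` and `status`
        # in the filter string resolve to this need's values.
        if eval(code, {"__builtins__": {}}, dict(need)):
            found.append(need)
    return found


# Illustrative data only.
needs = [
    {"id": f"REQ_{i}", "status": "open" if i % 2 else "closed"} for i in range(5)
]
matches = filter_needs_by_scan(needs, 'id == "REQ_3"')
```

Even though at most one need can match `id == "REQ_3"`, the loop still evaluates the expression against all five.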

To reduce this to O(1) time, for common patterns, we look to do two things:

  1. More tightly control access to needs data across its lifecycle (see Better abstraction for accessing needs data (to improve scaling) #1264):
    By making needs data strictly immutable/read-only during the write/analysis phase (after it has been fully collected and post-processed), and moving access behind an abstract interface (as opposed to a standard dictionary),
    we can perform indexing on standard need fields, making value lookups O(1)

  2. Analysing the filter string for common patterns (by parsing/analysing the Python AST), we can then utilise index lookups rather than row scans in these cases (akin to SQL query plans).
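
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `IndexedNeeds`, `match_simple_equality` and `filter_needs` are hypothetical names, only the single pattern `<field> == "<string>"` is recognised, and anything else falls back to the full scan.

```python
import ast
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Tuple

Need = Dict[str, Any]


class IndexedNeeds:
    """Hypothetical read-only container that indexes a few standard fields."""

    INDEXED_FIELDS = ("id", "status")

    def __init__(self, needs: Iterable[Need]) -> None:
        self._needs: List[Need] = list(needs)
        # One value -> needs index per field; safe because the data is
        # assumed not to change after construction.
        self._index: Dict[str, Dict[Any, List[Need]]] = {
            field: defaultdict(list) for field in self.INDEXED_FIELDS
        }
        for need in self._needs:
            for field in self.INDEXED_FIELDS:
                self._index[field][need[field]].append(need)

    def lookup(self, field: str, value: Any) -> List[Need]:
        """Index lookup: no loop over all needs."""
        return list(self._index[field].get(value, []))

    def scan(self, filter_string: str) -> List[Need]:
        """Fallback: evaluate the filter string against every need."""
        code = compile(filter_string, "<filter>", "eval")
        return [
            need
            for need in self._needs
            if eval(code, {"__builtins__": {}}, dict(need))
        ]


def match_simple_equality(
    filter_string: str, fields: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Return (field, value) if the filter is exactly `field == "value"`."""
    try:
        node = ast.parse(filter_string, mode="eval").body
    except SyntaxError:
        return None
    if (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and isinstance(node.ops[0], ast.Eq)
        and isinstance(node.left, ast.Name)
        and node.left.id in fields
        and isinstance(node.comparators[0], ast.Constant)
        and isinstance(node.comparators[0].value, str)
    ):
        return node.left.id, node.comparators[0].value
    return None


def filter_needs(needs: IndexedNeeds, filter_string: str) -> List[Need]:
    """Use the index when the pattern is recognised, else scan every need."""
    plan = match_simple_equality(filter_string, IndexedNeeds.INDEXED_FIELDS)
    if plan is not None:
        return needs.lookup(*plan)
    return needs.scan(filter_string)
```

The important property is that both paths return the same result for a recognised filter; the analysis only changes how the answer is found.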


One complication in creating the indexes and abstract interfaces, is that within the code base there are essentially two ways of accessing needs data:

  • As a mapping of need ID to need data (now NeedsView)
  • As a list of both needs and "expanded" parts (now NeedsAndPartsListView)

A filter such as id == "xxx" therefore has different meanings in the two cases: the former matches only need IDs, whereas the latter matches both need IDs and part IDs.
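
A small sketch of why the same filter gives different answers. The data and the `expand_needs_and_parts` helper are hypothetical, and the extra fields (`id_parent`, `is_need`, `is_part`) are only loosely modelled on how expanded parts are described here; the real views may differ in detail.

```python
from typing import Any, Dict, List

# Illustrative data: one need with one part.
needs_by_id: Dict[str, Dict[str, Any]] = {
    "REQ_1": {
        "id": "REQ_1",
        "title": "Parent need",
        "parts": {"p1": {"id": "p1", "content": "First part"}},
    },
}


def expand_needs_and_parts(needs: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten needs into a list containing each need followed by its parts."""
    items: List[Dict[str, Any]] = []
    for need in needs.values():
        items.append({**need, "is_need": True, "is_part": False})
        for part in need.get("parts", {}).values():
            # The part inherits the need's fields, but its own "id" wins.
            items.append(
                {**need, **part, "id_parent": need["id"], "is_need": False, "is_part": True}
            )
    return items


# Mapping-style access: only need IDs are considered.
needs_only = [need for need in needs_by_id.values() if need["id"] == "p1"]

# List-style access: part IDs are considered as well.
needs_and_parts = [
    item for item in expand_needs_and_parts(needs_by_id) if item["id"] == "p1"
]
```

Here `needs_only` is empty, while `needs_and_parts` contains the expanded part, so an index built for one access style cannot simply be reused for the other.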


It is of note that this will be breaking for any project that currently attempts to mutate needs data within the write phase.
This has been mitigated by the addition of two sphinx events, which give more precise control for this use case:

  • needs-before-post-processing: callbacks func(app, needs) are called just before the needs are post-processed (e.g. processing dynamic functions and back links)
  • needs-before-sealing: callbacks func(app, needs) are called just after post-processing, before the needs are changed to read-only

Additionally, env.needs_all_needs has been replaced with env._needs_all_needs, since this "raw" data-structure should not be accessed directly.

The get_needs_view function has been added to the sphinx_needs.api to mitigate this, and it should perhaps be made clearer that users should NOT be accessing any sphinx_needs API outside this module.
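
A hedged sketch of how user code might read needs through this function instead of touching the environment directly. It assumes `get_needs_view` takes the Sphinx application and returns a read-only mapping of need ID to need data; check the released API for the exact signature. `count_needs_by_status` is a hypothetical helper.

```python
from typing import Any, Dict


def count_needs_by_status(app: Any) -> Dict[str, int]:
    """Hypothetical helper: tally needs by status using the read-only view."""
    # Imported lazily so this module can be loaded without sphinx-needs;
    # the call signature get_needs_view(app) is an assumption.
    from sphinx_needs.api import get_needs_view

    counts: Dict[str, int] = {}
    for need in get_needs_view(app).values():
        status = need["status"] or "unset"
        counts[status] = counts.get(status, 0) + 1
    return counts
```

Because the view is read-only, a helper like this can only inspect needs; any changes belong in one of the two events described above.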


Because filter strings are (Turing complete) Python code, it is fundamentally impossible to fully process all such strings via AST parsing.
The new analysis code could be improved to recognise more patterns in the future, but really it would be ideal to move to a more "well-defined" filter syntax, such as the SQLite boolean expressions.

Additionally, it may be ultimately beneficial to move the storage of needs data from an in-memory Python structure, to something like an sqlite database.
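
To give a feel for that direction, here is a small sketch using Python's built-in sqlite3 module. It is purely illustrative and not something this PR implements: the table layout, the index name and `query_need_ids` are all assumptions.

```python
import sqlite3
from typing import List, Sequence

# In-memory database standing in for the needs store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE needs (id TEXT PRIMARY KEY, type TEXT, status TEXT)")
# An index on a commonly filtered field; the query planner decides when to use it.
conn.execute("CREATE INDEX idx_needs_status ON needs (status)")
conn.executemany(
    "INSERT INTO needs VALUES (?, ?, ?)",
    [
        ("REQ_1", "req", "open"),
        ("REQ_2", "req", "closed"),
        ("SPEC_1", "spec", "open"),
    ],
)


def query_need_ids(
    connection: sqlite3.Connection, where: str, params: Sequence[str] = ()
) -> List[str]:
    """Return need IDs matching a SQL boolean expression.

    The expression is interpolated as-is, so it must come from trusted
    project configuration; values are still passed as bound parameters.
    """
    sql = f"SELECT id FROM needs WHERE {where} ORDER BY id"
    return [row[0] for row in connection.execute(sql, params)]


open_reqs = query_need_ids(conn, "type = ? AND status = ?", ("req", "open"))
```

Unlike arbitrary Python, an expression such as `type = ? AND status = ?` belongs to a bounded grammar, so choosing between an index lookup and a scan is left to the database rather than to ad hoc AST analysis.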


Note, a follow-up PR will look to add configuration for warning about particularly long-running filtering code, so that it may be assessed for improvements.

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 90.54441% with 33 lines in your changes missing coverage. Please review.

Project coverage is 87.08%. Comparing base (4e10030) to head (abed2c1).
Report is 93 commits behind head on master.

Files with missing lines Patch % Lines
sphinx_needs/views.py 92.30% 14 Missing ⚠️
sphinx_needs/filter_common.py 88.50% 10 Missing ⚠️
sphinx_needs/data.py 80.00% 4 Missing ⚠️
sphinx_needs/functions/common.py 50.00% 2 Missing ⚠️
sphinx_needs/api/need.py 66.66% 1 Missing ⚠️
sphinx_needs/directives/needflow/_shared.py 88.88% 1 Missing ⚠️
sphinx_needs/roles/need_count.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1281      +/-   ##
==========================================
+ Coverage   86.87%   87.08%   +0.20%     
==========================================
  Files          56       61       +5     
  Lines        6532     7224     +692     
==========================================
+ Hits         5675     6291     +616     
- Misses        857      933      +76     
Flag Coverage Δ
pytests 87.08% <90.54%> (+0.20%) ⬆️

@chrisjsewell chrisjsewell changed the title ♻️ Replace need dicts/lists with views ♻️ Replace need dicts/lists with views (with fast filtering) Sep 5, 2024
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch from 62590a8 to 98e1a3c Compare September 5, 2024 22:16
@ubmarco ubmarco left a comment

I pretty much like the idea of this. Users need to be informed about these capabilities in the docs, so they don't end up looping over dict(SphinxNeedsData(app.env).get_needs_view()) recursively to filter things.
needs_all_needs has to be deprecated and then removed.

sphinx_needs/views.py (outdated)
return self._needs


class NeedsAndPartsListView:
Member

How would I get all parts of a certain need?

Member Author

well it would be via;

def iter_need_parts(need: NeedsInfoType) -> Iterable[NeedsInfoType]:

although to note, this is not currently part of the "public API" (and never has been)

to note also, this NeedsAndPartsListView view is added separately so as not to be back-breaking for current user code (see #1264),
if it was not for this, I would probably have just added an iter method to NeedsView

@chrisjsewell
Member Author

> Users need to be informed about these capabilities in the docs

yep, I've already done this a little in this PR, although obviously want to clarify the PR/API before bothering to add more docs.

> needs_all_needs has to be deprecated and then removed.

I would note that sphinx.BuildEnvironment.needs_all_needs was never part of the "public API" 😅,
i.e. not in the sphinx_needs.api module,
so I would question how much it really needs to be "deprecated" (PACE/bosch have already agreed that it is "their problem" if they are using anything like this)

The needs, at least for now, will still be stored on the environment, but perhaps changing it to sphinx.BuildEnvironment._needs_all_needs, with the _, will make it clearer that this is not for public access

@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch from dd79aa3 to fb77726 Compare September 6, 2024 20:43
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch 3 times, most recently from f1b6fa1 to ee2c72c Compare September 26, 2024 18:51
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch 3 times, most recently from 4909bf3 to 81f4789 Compare September 26, 2024 20:54
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch 3 times, most recently from 1c2533f to 0fb692c Compare September 28, 2024 01:05
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch 2 times, most recently from 64a6d92 to 8a863c1 Compare September 30, 2024 22:06
@chrisjsewell chrisjsewell force-pushed the typing-needs-parts-filter-views branch from 8a863c1 to ec62d7f Compare September 30, 2024 22:59
@chrisjsewell chrisjsewell marked this pull request as ready for review October 1, 2024 10:09
@chrisjsewell chrisjsewell requested review from danwos and ubmarco October 1, 2024 10:18
@ubmarco ubmarco left a comment

Just an awesome PR, approved!

@chrisjsewell chrisjsewell merged commit cb03029 into master Oct 1, 2024
20 checks passed
@chrisjsewell chrisjsewell deleted the typing-needs-parts-filter-views branch October 1, 2024 18:10
2 participants