-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
♻️ Replace need dicts/lists with views (with fast filtering) #1281
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1281 +/- ##
==========================================
+ Coverage 86.87% 87.08% +0.20%
==========================================
Files 56 61 +5
Lines 6532 7224 +692
==========================================
+ Hits 5675 6291 +616
- Misses 857 933 +76
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
62590a8
to
98e1a3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pretty much like the idea of this. Users need to be informed about these capabilities in the docs, so they don't end up looping over dict(SphinxNeedsData(app.env).get_needs_view())
recursively to filter things.
needs_all_needs
has to be deprecated and then removed.
return self._needs | ||
|
||
|
||
class NeedsAndPartsListView: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I get all parts of a certain need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it would be via;
def iter_need_parts(need: NeedsInfoType) -> Iterable[NeedsInfoType]: |
although to note, this is not currently part of the "public API" (and never has been)
to note also, this NeedsAndPartsListView
view is added separately so as not to be back-breaking for current user code (see #1264),
if it was not for this, I would probably have just added an iter method to NeedsView
yep, I've already done this is a little in this PR, although obviously want to clarify the PR/API before bothering to add more docs.
I would note that The needs, at least fo now, will still be stored on the environment, but perhaps changing it to |
dd79aa3
to
fb77726
Compare
f1b6fa1
to
ee2c72c
Compare
4909bf3
to
81f4789
Compare
1c2533f
to
0fb692c
Compare
64a6d92
to
8a863c1
Compare
8a863c1
to
ec62d7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an awesome PR, approved!
This PR addresses issues in performance scalability of needs filtering:
needarch
,needtable
,needpie
, etc. Each takes on average 0.33 seconds, cumulative totalling 1751 secondsThe main issue with filtering (a.k.a. querying) needs, is that it requires looping over every need and executing a Python evaluation of the filter string (e.g.
id == "xxx"
), which scales withO(N)
time.To reduce this to
O(1)
time, for common patterns, we look to do two things:More tightly control access to needs data across its lifecycle (see Better abstraction for accessing needs data (to improve scaling) #1264):
By making needs data strictly immutable/read-only during the write/analysis phase (after it has been fully collected and post-processed), and moving access behind an abstract interface (as opposed to a standard dictionary),
we can perform indexing on standard need fields, making value lookups
O(1)
Analysing the filter string for common patterns (by parsing/analysing the Python AST), we can then utilise index lookups rather than row scans in these cases (akin to SQL query plans).
One complication in creating the indexes and abstract interfaces, is that within the code base there are essentially two ways of accessing needs data:
NeedsView
)NeedsAndPartsListView
)Particularly when it comes to e.g.
id == "xxx"
, this then has different meanings, as the former is only filtering for need IDs and the latter is filtering for both Need IDs and part IDsIt is of note that this will be breaking for any projects that currently attempts to mutate needs data within the write phase.
This has been mitigated by the addition of two sphinx events, which give more precise control for this use case:
needs-before-post-processing
: callbacksfunc(app, needs)
are called just before the needs are post-processed (e.g. processing dynamic functions and back links)needs-before-sealing
: callbacksfunc(app, needs)
just after post-processing, before the needs are changed to read-onlyAdditionally,
env.needs_all_needs
has been replaced withenv._needs_all_needs
, since this "raw" data-structure should not be accessed directly.The
get_needs_view
function has been added to thesphinx_needs.api
to mitigate this, and it should perhaps be made clearer that users should NOT be accessing anysphinx_needs
API outside this module.Because filter strings are (Turing complete) Python code, it is fundamentally impossible to fully process all such strings via AST parsing.
The new analysis code could be improved to recognise more patterns in the future, but really it would be ideal to move to a more "well-defined" filter syntax, such as the SQLite boolean expressions.
Additionally, it may be ultimately beneficial to move the storage of needs data from an in-memory Python structure, to something like an sqlite database.
Note, a follow-up PR will look to add configuration for warning about particularly long-running filtering code, so that it may be accessed for improvements.