Speeds up module.add_debug_info and add_metadata by 2x #887

Open · bslatkin wants to merge 11 commits into main

Conversation

@bslatkin (Contributor) commented Oct 31, 2022

  • Only generates the key for module._metadatacache a single time instead of twice on cache miss
  • Avoids extra assignments in NameScope._basenamemap when there are name conflicts
  • Caches computed hashes for MDValue and DIValue instances, while ensuring they'll still pickle (sketched below)
  • Adds a test to measure performance of this path so regressions will be easy to spot
  • Adds a test to make sure pickling behavior continues to work

In my own compiler's codebase, this change reduces the time it takes for my debug info regression test to complete from 180 seconds down to 20 seconds. The more nesting of metadata there is, the larger the effect.
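For context, a minimal sketch of the hash-caching pattern described in the third bullet (illustrative only - simplified from the actual MDValue/DIValue changes, and using the underscored attribute name that comes up later in review):

    class CachedHashNode:
        """Illustrative stand-in for MDValue/DIValue, not the PR's code."""

        def __init__(self, operands):
            self.operands = tuple(operands)
            self._hash_cache = None

        def __hash__(self):
            # Hashing the operand tuple is the expensive part for deeply
            # nested metadata, so compute it at most once per instance.
            if self._hash_cache is None:
                self._hash_cache = hash(self.operands)
            return self._hash_cache

        def __getstate__(self):
            # Drop the cached value when pickling: hash randomization means
            # a hash computed in one process is wrong in another.
            state = self.__dict__.copy()
            state['_hash_cache'] = None
            return state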

@bslatkin (Contributor Author) commented:

Looks like I need to run flake8 and address some style issues. Will do that shortly.

@gmarkall (Member) commented Nov 1, 2022

Thanks for the PR! I'm curious about

> In my own compiler's codebase, ...

Is there any public info about your compiler?

@esc added this to the v0.40.0 milestone (Nov 1, 2022)
@bslatkin (Contributor Author) commented Nov 1, 2022

It's not public yet! Hopefully early next year I'll publish it for people to see. There are a couple of sneak peeks here:

https://twitter.com/haxor/status/1580959518984241158
https://twitter.com/haxor/status/1582878447134572544

I will also share the private project with you, Graham, so you can take a look for context.

The part that this PR specifically improves is how I assign builder.debug_metadata to a new value on nearly every token in the source file during code generation (see the file src/code_generator.py in the private project). Doing this provides excellent debug information when using tools like lldb, but it slows down deduplicating DIValues.
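For illustration, here is a rough, self-contained sketch of that per-token pattern (hypothetical names and shapes - the real code generator lives in the private project):

    import llvmlite.ir as ir

    module = ir.Module(name='example')
    func = ir.Function(module, ir.FunctionType(ir.VoidType(), []), name='f')
    builder = ir.IRBuilder(func.append_basic_block('entry'))

    di_file = module.add_debug_info(
        'DIFile', {'filename': 'example.src', 'directory': '.'})
    di_sub = module.add_debug_info(
        'DISubprogram', {'name': 'f', 'file': di_file, 'line': 1},
        is_distinct=True)

    # A fresh DILocation before nearly every instruction: this is the
    # hot path through add_debug_info and module._metadatacache.
    i32 = ir.IntType(32)
    for line, column in [(1, 1), (1, 5), (2, 1)]:
        builder.debug_metadata = module.add_debug_info(
            'DILocation',
            {'line': line, 'column': column, 'scope': di_sub})
        builder.add(ir.Constant(i32, 1), ir.Constant(i32, 2))
    builder.ret_void()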

Thank you for looking, and for your attention on this PR!

@bslatkin (Contributor Author) commented Nov 7, 2022

Is there anything else you need from me on this PR? Just making sure. Thank you for the reviews!

@gmarkall (Member) commented Nov 7, 2022

Thanks for checking in - it's taking me a little while to review as I'm unfamiliar with most of the parts that are changed in this PR - apologies for the delay. I do have some review notes drafted but I'd like to finish the review before I post them.

@gmarkall (Member) left a review:


Many thanks for the PR, and apologies for the delay in reviewing.

The performance measurement reported by test_debug_info_caching is more than doubled (execution time is less than half) when using this patch on my machine, but I can see how it has even more performance benefit for real-world use cases (as you report in the description). I think this is a good change overall.

I have a few questions / comments, which are mainly on the diff.

In addition, I think the debug metadata performance measurement is valuable to have because it'll enable us to work out where a performance regression occurred (if we notice one), but I don't think we can keep it in the unit test suite itself - generally the test suite should not output anything, and the test more than doubles the testsuite execution time on my system. Could you please:

  • Copy the code from test_debug_info_caching into a standalone example in the examples directory, complete with timing and output?
  • Remove the timing and printing from test_debug_info_metadata, and have the test code only execute a single time (as opposed to 10) so that the test takes less time? I think it should be possible to simplify the test into much more "straight-line" code, since separate functions won't be needed for timeit in the context of the test (a rough sketch follows).
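Something along these lines, perhaps (a hypothetical sketch of the straight-line version - the PR's actual setup code may differ):

    def test_debug_info_caching(self):
        mod = ir.Module(name='test')
        di_file = mod.add_debug_info(
            'DIFile', {'filename': 't.src', 'directory': '.'})
        for line in range(100):
            mod.add_debug_info(
                'DILocation',
                {'line': line, 'column': 1, 'scope': di_file})
        # 100 distinct locations plus the DIFile itself.
        self.assertEqual(101, len(mod._metadatacache))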

I hope the comments are clear and helpful - please let me know if any clarification or additional information would be helpful.

@@ -8,7 +7,7 @@ class DuplicatedNameError(NameError):
 class NameScope(object):
     def __init__(self):
         self._useset = set([''])
-        self._basenamemap = defaultdict(int)
+        self._basenamemap = {}
@gmarkall (Member) commented on the diff:

Is a defaultdict slower than a normal dict? It seems that making this change here requires you to then emulate a default dict below - is there a reason not to keep it as a defaultdict?
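(A quick, self-contained way to check - not code from the PR - comparing steady-state lookups of a defaultdict against the try/except emulation:)

    import timeit
    from collections import defaultdict

    dd = defaultdict(int)
    d = {}

    def with_defaultdict():
        return dd['basename']  # inserts 0 on the first miss, then plain hits

    def with_try_except():
        try:
            return d['basename']
        except KeyError:
            return 0  # raised on every call here: the worst case

    print('defaultdict:', timeit.timeit(with_defaultdict, number=10**6))
    print('try/except: ', timeit.timeit(with_try_except, number=10**6))

One behavioural difference worth keeping in mind: a defaultdict inserts the default value on a missed lookup, whereas the try/except form leaves the dict unchanged.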

@gmarkall (Member) commented:

(Follow-up - I tried reverting the changes in this file and it seemed to make no performance difference in test_debug_info_caching)

Comment on lines +26 to +29:

        try:
            ident = self._basenamemap[basename]
        except KeyError:
            ident = 0
@gmarkall (Member) commented:

This is the emulation of a default dict I mentioned in the comment above - could it just be

Suggested change:

-        try:
-            ident = self._basenamemap[basename]
-        except KeyError:
-            ident = 0
+        ident = self._basenamemap[basename]

with _basenamemap remaining a defaultdict?

@@ -667,6 +667,7 @@ def __init__(self, parent, values, name):
                                       types.MetaDataType(),
                                       name=name)
         self.operands = tuple(values)
+        self.hash_cache = None
@gmarkall (Member) commented:

It feels like hash_cache is an implementation detail rather than an attribute we'd want to expose - should this be _hash_cache instead?

        # Ensure that the hash is not cached between Python invocations
        # due to pickling or other serialization. The hash seed changes,
        # which will cause these not to match.
        self.hash_cache = None
@gmarkall (Member) commented:

It seems a bit odd to mutate state inside __getstate__ - do you see any downside to removing the hash cache from a copy of the dict?

Comment on lines +702 to +703:

        self.hash_cache = None
        return self.__dict__
@gmarkall (Member) commented:

e.g.

Suggested change:

-        self.hash_cache = None
-        return self.__dict__
+        state = self.__dict__.copy()
+        state['hash_cache'] = None
+        return state

@@ -725,6 +735,7 @@ def __init__(self, parent, is_distinct, kind, operands, name):
         self.is_distinct = is_distinct
         self.kind = kind
         self.operands = tuple(operands)
+        self.hash_cache = None
@gmarkall (Member) commented:

Similar comment on the name to above, i.e. should it be _hash_cache?

        # Ensure that the hash is not cached between Python invocations
        # due to pickling or other serialization. The hash seed changes,
        # which will cause these not to match.
        self.hash_cache = None
@gmarkall (Member) commented:

Also should this invalidate the hash cache on a copy?
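(For the record, a quick demonstration - not PR code - of why the cached hash cannot survive serialization: with hash randomization enabled, which is Python's default, the same value hashes differently in separate interpreter processes.)

    import subprocess
    import sys

    cmd = [sys.executable, '-c', "print(hash('llvmlite'))"]
    h1 = subprocess.check_output(cmd)
    h2 = subprocess.check_output(cmd)
    # Almost always different, so a pickled cached hash would be stale.
    print(h1.strip(), h2.strip(), h1 != h2)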


        print('test_debug_info_performance took', total_time, 'to finish')

        self.assertEqual(100004, len(mod._metadatacache))
@gmarkall (Member) commented:

I think the expected result could do with a comment explaining its derivation - it took me a while to work out that it's 100000 locations, plus one file, one compile unit, one subprogram, and [di_subprogram, di_subprogram].
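For instance (a hypothetical comment spelling out that derivation):

    # 100,000 DILocation nodes, plus one DIFile, one DICompileUnit,
    # one DISubprogram, and the [di_subprogram, di_subprogram] list:
    self.assertEqual(100000 + 4, len(mod._metadatacache))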

        md = self._metadatacache[key]

        fixed_operands = self._fix_metadata_operands(operands)
        key = hash(tuple(fixed_operands))
@gmarkall (Member) commented:

In the PR description, I see

> Only generates the key for module._metadatacache a single time instead of twice on cache miss

I'm not sure I understand where it was generated twice before - am I missing something?

In any case, I do think the changes here are an improvement - reverting the specific changes to this file does slow down the performance test in test_debug_info_caching below.
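For reference, a minimal self-contained sketch of the compute-the-key-once pattern (illustrative names, not llvmlite's actual implementation):

    class MetadataCache:
        def __init__(self):
            self._cache = {}

        def _fix_operands(self, operands):
            # Stand-in for llvmlite's operand normalization.
            return tuple(operands)

        def add_metadata(self, operands):
            fixed = self._fix_operands(operands)
            key = hash(fixed)          # computed exactly once
            node = self._cache.get(key)
            if node is None:           # a cache miss reuses the same key
                node = ('md', fixed)   # stand-in for an MDValue
                self._cache[key] = node
            return node

    cache = MetadataCache()
    assert cache.add_metadata([1, 2]) is cache.add_metadata([1, 2])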

@bslatkin (Contributor Author) commented Feb 4, 2023

Hi @gmarkall! My apologies for the delay. This is great feedback. I'll start working through it and update the PR once I've addressed everything. Thank you!

@gmarkall modified the milestones: v0.40.0, v0.41.0 (Mar 28, 2023)
@esc (Member) commented Jul 10, 2023

@bslatkin Hi! Thank you for your initial work on this. I am the release manager for 0.41.x and am checking on the status of this PR, which I see has been in progress for quite some time. 0.41.x is now scheduled to be tagged during the week of 7th August 2023, four weeks from now. What do you think - will this PR make it by then?

@bslatkin (Contributor Author) commented:

Thank you for the encouragement! I'm behind where I wanted to be on this, obviously :) I'll try to turn it around before then, but I'm not sure how much additional review it might take, especially over the summer. So perhaps I should aim for the release after that?

@esc (Member) commented Jul 10, 2023

> Thank you for the encouragement! I'm behind where I wanted to be on this, obviously :) I'll try to turn it around before then, but I'm not sure how much additional review it might take, especially over the summer. So perhaps I should aim for the release after that?

It's up to you, just let me know what you prefer. Reviewers are the most scarce resource in the Numba ecosystem so I share your gut feeling that perhaps the 0.42 release is a more realistic goal.

Here is the Gantt chart for the next few months, which shows the approximate tag dates for Numba 0.59 and 0.60 -- these will coincide with llvmlite releases, as the two libraries are released in lockstep:

numba/numba#8971

Effectively, the next releases will be in the Nov 2023 (0.42.x) and Feb 2024 (0.43.x) time-frame.

@esc removed this from the v0.41.0rc1 milestone (Jul 13, 2023)
@gmarkall (Member) commented:

@bslatkin Hope you are well - it's been a while! I'm having a bit of a tidy-up, and checking in - do you see a future for this PR?
