Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up of compare_values and has_value methods #101

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RinkeHoekstra
Copy link

I noticed very slow performance on the to_rdf procedure for a JSON-LD file with several tens of thousands of typed object values for a single property.

Running cProfiles, it turned out that there was an inordinate amount of type spent in the methods compare_values and has_value.

This pull request introduces the following changes:

  • Re-implemented compare_values with exception handling rather than if-then statements. Also changed the ordering and removed the boolean comparison between primitive values (It may need to return, but I couldn't understand the reason behind it)
  • Re-implemented the has_value method (which called compare_values so very frequently) to perform checks only once, and only compare values of the same type.

There's also a question on the way the has_value is implemented: it seems that if the value parameter that's passed is an array, it is completely ignored. Is that the correct behavior?

@davidlehn
Copy link
Member

Thanks, I'll take a look when I get a chance. May need to finish getting the test suite up-to-date so we can check if the changes are ok.

Can you give a brief example of the form of data that was slow? Just one or two properties is fine, not 10000+ :-) I've started setting up a benchmarking system and issues like this are useful target for auto-generating test data inputs.

@RinkeHoekstra
Copy link
Author

RinkeHoekstra commented Oct 22, 2020

I'd like to bump this as it fixes major issues we're facing internally, with slow framing on large JSON-LD files.

I will try to generate some artificial data (large amount of data, few properties).

@dvsrepo
Copy link

dvsrepo commented Dec 3, 2020

Hi! Any plans to include this? We are also facing issues with framing large JSON-LD files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants