Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dictionary hashing with pydantic models as keys #404

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

apost71
Copy link

@apost71 apost71 commented Dec 13, 2024

PR to address issue #392

for key in context_dict:
if isinstance(key, BaseModel):
context_dict[get_qualified_name(key)] = context_dict[key]
del context_dict[key]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this change. This is not idempotent (its changing the context dict without the user knowing, and not changing it back) -- tbh, probably better to just do hash(str(context_dict) + hash_secret) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, updated and added a test

@coveralls
Copy link

coveralls commented Dec 14, 2024

Coverage Status

coverage: 73.778% (+0.06%) from 73.723%
when pulling 049dd98 on apost71:main
into 5c3b425 on run-llama:main.

@masci
Copy link
Member

masci commented Dec 16, 2024

I included in this PR the fix for the rabbitmq e2e tests, so the ones failing now are actual failures related to this changeset Please ignore, I see the same failure on main, having a look now.

@apost71
Copy link
Author

apost71 commented Dec 17, 2024

I included in this PR the fix for the rabbitmq e2e tests, so the ones failing now are actual failures related to this changeset Please ignore, I see the same failure on main, having a look now.

I pushed a change that I think fixed the tests but I don't think I fixed the underlying issue - it seems like the hash is being computed by some other process that I can't find. The model_validator should look like this:

    @model_validator(mode="before")
    @classmethod
    def validate_hash(cls, data: Any) -> Any:
        if isinstance(data, dict):
            if "hash" not in data and "state" in data:
                data["hash"] = hash(str(data["state"]) + hash_secret)
        return data

but the tests fail if I allow that hash to be provided

@apost71
Copy link
Author

apost71 commented Dec 18, 2024

@masci I discovered the issue is that the pydantic dict validation can make some small changes to the structure, specifically it was changing a tuple to a sequence that was breaking the hash checks. By making the hash a computed field in the pydantic model we can ensure that the hash is always computed based on the validated dict. I also changed the hashing to use zlib.adler32 because the vanilla hash method is based on memory locations which could cause issues if the hashes are ever computed in different processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants