Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrapped validates_schema method is not called when .load is provided with a generator. #1898

Open
ziplokk1 opened this issue Oct 28, 2021 · 5 comments

Comments

@ziplokk1
Copy link

Issue

When passing a generator to .load, the wrapped validates_schema method is never called.

Expected

When passing a generator to .load, it should behave the same as passing a list or tuple and call the method wrapped by validates_schema.

Environment

python v3.9.5
marshmallow 3.13.0

Example

>>> from marshmallow import validates_schema, Schema, fields
>>> class MySchema(Schema):
...     fld = fields.String()
...     @validates_schema()
...     def _validate(self, data, **kwargs):
...         print('validate schema called')
>>> MySchema().load(({'fld': 'abc'}, {'fld': 'def'}), many=True)
validate schema called
validate schema called
[{'fld': 'abc'}, {'fld': 'def'}]
>>> def gen(): yield from ({'fld': 'abc'}, {'fld': 'def'})
>>> MySchema().load(gen(), many=True)
[{'fld': 'abc'}, {'fld': 'def'}]
@deckar01 deckar01 added the bug label Nov 11, 2021
@deckar01
Copy link
Member

I don't think this argument is intended to accept generator iterators. The Iterable annotation appears to be a documentation regression introduced in #1488. The annotation should have be Sequence.

@ziplokk1
Copy link
Author

That still poses the risk of the validator accepting a generator at runtime and not behaving as one would expect. Is it viable to implement an assertion or a warning if a generator is provided so that there is some form of runtime guard?

@deckar01
Copy link
Member

deckar01 commented Nov 11, 2021

Oh, there is a validator and it does explicitly allow any iterable.

def is_iterable_but_not_string(obj) -> bool:
"""Return True if ``obj`` is an iterable object that isn't a string."""
return (hasattr(obj, "__iter__") and not hasattr(obj, "strip")) or is_generator(obj)
def is_collection(obj) -> bool:
"""Return True if ``obj`` is a collection type, e.g list, tuple, queryset."""
return is_iterable_but_not_string(obj) and not isinstance(obj, Mapping)

I suspect we would want to wrap it in list rather than teeing it. It looks like dump() has the same issue.

@sirosen
Copy link
Contributor

sirosen commented May 26, 2022

I started on a patch to do the list() invocation, but I'm hesitating to submit it. The following change passes all tests:

diff --git a/src/marshmallow/schema.py b/src/marshmallow/schema.py
index ff119a4..449a45c 100644
--- a/src/marshmallow/schema.py
+++ b/src/marshmallow/schema.py
@@ -836,11 +836,22 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
         unknown = unknown or self.unknown
         if partial is None:
             partial = self.partial
+        # before preprocessors, convert data to a list if it is an iterable and many=True
+        # this ensures that if a generator is used, it will be iterated only once
+        processed_data: typing.Mapping[str, typing.Any] | list[
+            typing.Mapping[str, typing.Any]
+        ] = data
+        if is_collection(data) and many:
+            processed_data = list(data)
         # Run preprocessors
         if self._has_processors(PRE_LOAD):
             try:
                 processed_data = self._invoke_load_processors(
-                    PRE_LOAD, data, many=many, original_data=data, partial=partial
+                    PRE_LOAD,
+                    processed_data,
+                    many=many,
+                    original_data=data,
+                    partial=partial,
                 )
             except ValidationError as err:
                 errors = err.normalized_messages()

But this means that if I have a pre_load(many=True) hook which is meant to consume a special collection type, you'll get a list instead. So it's a potentially breaking change in order to support a usage which I don't think was really meant to be supported.

I would prefer updating the annotation to take Sequence, tweak any docs appropriately, and treat this as an annotation inaccuracy. No runtime behavior changes.

As for checking and rejecting generators as inputs... we could, but you can also use other types which implement Iterator but not Sequence. I'm not convinced that an explicit is_generator check adds much value, even though it's easy to add.

@sloria
Copy link
Member

sloria commented Jan 19, 2025

#2795 addresses the incorrect annotation as well as the run-time type-checking in the code (i.e. checking isinstance(data, Sequence) instead of is_iterable_but_not_string which was too permissive). as a consequence a ValidationError is raised when a generator is passed.

@sloria sloria removed the bug label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants