exception: "ancestor argument should match namespace" during /_ah/pipeline/output #42
For now I'm forcing `use_barrier_indexes = False` so that it uses the legacy code path; then I can at least get things deployed.
Good call using `use_barrier_indexes = False`. If we move to fix this, it may be surprising behavior for others. With all of that noted, I want to make sure I understand the problem: the datastore is complaining because you're using an ancestor query where the namespace of the ancestor does not match the namespace passed to the query (which defaults to the empty string). (https://cloud.google.com/appengine/docs/python/ndb/queryclass) I believe the fix may be as simple as changing one line.
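For reference, a minimal repro sketch of that mismatch, assuming the old `db` API; the model class and the kind/ID names are illustrative stand-ins, not the library's actual code:

```python
from google.appengine.api import namespace_manager
from google.appengine.ext import db


class BarrierIndexStandIn(db.Model):
    """Hypothetical stand-in for the barrier-index model."""
    pass


# Ancestor key from the root pipeline, which lives in the default ('') namespace.
ancestor = db.Key.from_path('RootPipeline', 'root-id', 'Slot', 'slot-id',
                            namespace='')

# The /output task runs with the child pipeline's namespace active, so any
# query built here picks up namespace '1' by default.
namespace_manager.set_namespace('1')

# Ancestor namespace ('') != query namespace ('1'); running the query raises:
#   BadArgumentError: ancestor argument should match namespace ...
query = BarrierIndexStandIn.all(keys_only=True).ancestor(ancestor)
query.fetch(limit=10)
```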
When you say "defaults to empty", I don't think that's what is happening... instead I believe the query is defaulting to the namespace of the task that called it (the child task, which was in namespace "1", called /output, which triggered BarrierHandler), but the ancestor key is from the root pipeline's namespace (which is '')... that's how it looks from the last line printed in the trace.
When I grab the string keys from the headers and build datastore Keys out of them, it appears the ancestor has the root pipeline ID in it, but that ID doesn't exist in the current namespace (even in the console, if you open the barrier record detail you can't click on the ancestor because it doesn't exist). My guess, given how you explained you normally use namespaces, is that the Key path is just getting generated from a list of kind/id pairs on the assumption that they're all in the same namespace... but in my case the top-level kind/id isn't, which makes that key technically invalid. I can probably re-engineer our stuff to kick off a pipeline per namespace... but let me know if you think you can get it working! Thanks!
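To illustrate that guess concretely (kind/ID names here are hypothetical), rebuilding a key purely from kind/id pairs while a child namespace is active silently puts the whole path in that namespace:

```python
from google.appengine.api import namespace_manager
from google.appengine.ext import db

# A child task is executing, so its namespace is the current one.
namespace_manager.set_namespace('1')

# Rebuilding the key from kind/id pairs alone inherits namespace '1', even
# though the root pipeline record actually lives in the default namespace ('').
rebuilt = db.Key.from_path('RootPipeline', 'root-id', 'Slot', 'slot-id')

assert rebuilt.namespace() == '1'  # points at an entity that doesn't exist here
```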
Obviously, long term you don't want to kick off one pipeline per namespace permanently: you lose the major benefits of pipeline fan-out, abortion, and success handling if you aren't yielding child pipelines. You are also correct about the default namespace - it would be the namespace of the process that created the task, for any of the pipeline tasks.

A much more comfortable solution, now that I'm seeing a larger scope here, would involve keeping ALL pipeline entities in the namespace of the root pipeline. Then, if you need namespace switching, you would explicitly achieve this per pipeline. This also simplifies the answer to "How do I find a pipeline with a given ID?"

One paradigm I use a lot is class inheritance for pipelines with a common setup function, to prepare any common variables. This is good for your larger pipeline chains, like I believe you may have.

```python
from google.appengine.api import namespace_manager
from pipeline import Pipeline


class MyRootPipeline(Pipeline):

    def setup(self, **kwargs):
        """Perform setup for MyRootPipeline and all derivative pipes."""
        # Get pipeline information, and do setup.
        self.namespace = kwargs.get('namespace', None)
        self.kwargs = kwargs.copy()  # changes to kwargs shouldn't affect local copy
        if self.namespace:
            # Sets the namespace for the current HTTP request.
            namespace_manager.set_namespace(self.namespace)

    def run(self, **kwargs):
        # Do your setup.
        self.setup(**kwargs)
        # Do stuff.
        # Yield child with same kwargs, and any additional args.
        kwargs['namespace'] = "other_namespace"
        yield ChildPipeline(child_wants_candy=True, **kwargs)


class ChildPipeline(MyRootPipeline):
    """Subclassed from MyRootPipeline for common setup procedure."""

    def run(self, child_wants_candy, **kwargs):
        # Performs setup, switches namespace, ...
        self.setup(**kwargs)
        # Stuff and things in the new namespace.
```

This won't work now, until the pipeline knows to explicitly use the namespace for the yielded child pipeline and callback tasks, etc. - but that's the support I'd consider targeting for this issue.

I hope this all makes sense!!! 🐻
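For concreteness, a hypothetical kickoff for the sketch above, assuming the standard `Pipeline.start()` API; the queue name and keyword values are illustrative:

```python
# Start the root in the default namespace; each child switches itself using
# the namespace it receives in kwargs.
stage = MyRootPipeline(namespace='')
stage.start(queue_name='default')

# With the proposed change, the root record and all of its children's records
# would live in the root pipeline's namespace.
print('Started root pipeline: %s' % stage.pipeline_id)
```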
Funny... that's pretty much exactly our pipeline subclass paradigm as well.
As for our existing structure, we don't need to run a single pipeline across namespaces; we just had it set up that way. We could definitely restructure it to be one per namespace... but as you stated, long term it would be better to not have to think about it... and it would indeed be easier to figure out where the pipeline records are (and clear out the old ones) if they were always stored in the default namespace. So yeah, this all makes sense... and I really appreciate your input!
The root of this particular pipeline starts out in the default namespace, then its immediate children each change to a different namespace (one for each of our customers), and those continue to spawn other pipeline children within that namespace. Something is happening when the pipeline is trying to finalize and roll up to the top... it looks like the `slot_key` being used as the ancestor in the BarrierIndex query has the default namespace ("", from our root pipeline), but the BarrierIndex query is using the namespace from the child pipeline ("1", because new tasks, by default, take on the namespace of the task that spawned them)... and thus we see the BadArgumentError. This is my best guess at what is happening... and I don't understand the details of the code well enough to try and render a fix... though I'm going to try...
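If it helps frame a fix, here is one hedged sketch of the direction (a stand-in, not the actual BarrierIndex code): run the ancestor query in the ancestor key's own namespace instead of whatever namespace the current task happens to have.

```python
from google.appengine.api import namespace_manager
from google.appengine.ext import db


def barrier_index_keys_for(slot_key, limit=20):
    """Stand-in example: kindless keys-only ancestor query pinned to the
    slot key's namespace, so it no longer depends on the task's namespace."""
    saved = namespace_manager.get_namespace()
    try:
        namespace_manager.set_namespace(slot_key.namespace())
        query = db.Query(keys_only=True)
        query.ancestor(slot_key)
        return query.fetch(limit=limit)
    finally:
        namespace_manager.set_namespace(saved)
```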