-
-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query builder seems unable to produce a fully working left join #658
Comments
From what you describe this would indeed seem to be a bug in the QueryConverter class. A first step would be to add a functional test for this scenario and get the fix in before the upcoming 1.3 release. |
I added a very simple example of the scenario. It uses the existing joins functional test to demonstrate that we have a The actual query generated is currently: SELECT * FROM [nt:unstructured] AS a
LEFT OUTER JOIN [nt:unstructured] AS b
ON a.username=b.username
WHERE ((a.username = 'anonymous'
AND (a.[phpcr:class] = 'Doctrine\Tests\Models\CMS\CmsUser' OR a.[phpcr:classparents] = 'Doctrine\Tests\Models\CMS\CmsUser'))
AND (b.[phpcr:class] = 'Doctrine\Tests\Models\CMS\CmsAuditItem' OR b.[phpcr:classparents] = 'Doctrine\Tests\Models\CMS\CmsAuditItem'))
ORDER BY a.username ASC I think it should be something like: SELECT a.* FROM [nt:unstructured] AS a
LEFT OUTER JOIN [nt:unstructured] AS b
ON a.username=b.username
WHERE ((a.username = 'anonymous'
AND (a.[phpcr:class] = 'Doctrine\Tests\Models\CMS\CmsUser' OR a.[phpcr:classparents] = 'Doctrine\Tests\Models\CMS\CmsUser'))
AND (b.[phpcr:class] = 'Doctrine\Tests\Models\CMS\CmsAuditItem' OR b.[phpcr:classparents] = 'Doctrine\Tests\Models\CMS\CmsAuditItem' OR b.[phpcr:class] IS NULL))
ORDER BY a.username ASC This one would actually work with a
That of course should be another test, but I wanted to illustrate this with something simple to confirm it is actually a bug. |
I was able to make a few changes to the The obvious solution is to use a Let me know if this could be more clear; I also posed the JCR-SQL2 question on StackOverflow. |
Thanks for the effort. Using the But I will also try and look at this and see if we can't find a workaround here when I have some spare time. |
…g its class or classparents in the query -- doctrine#658.
If you could review my most recent commit above, I'd be curious to hear if this could be a valid strategy that could be a candidate for a PR. What this does is add a third optional parameter to the This solves the problem I was struggling with in my own code base by allowing such a join, provided I migrate the affected nodes to a new type. This could still use a few more tests and a documentation update, which I'm happy to provide if this is an acceptable idea. |
thanks for working on this @sarcher ! i do think this is a bug and not intended. does the "The query that I actually want is this:" work as it should? if so, to solve the original problem, we would need the query converter to handle left (and right?) join specifically by adding the prefix to SELECT and make the pd.class pd.classparent in a way that its also allowed to not exist. something like if we can manage this, it probably makes more sense than adding a new type of query. though using types could be an interesting thing regardless of this bug. |
That is actually the problem; if the query includes the That was definitely my first approach though, just to see if we could add the I was motivated to make something work in the short term to unblock my project, which is now working with the commit above, though I am certainly interested in a long term/non-fork solution. Maybe if it is a Jackalope problem, we can go back to simply adding the |
well, then lets wrap up the case for the select by type first if that is
already working for you.
|
Instead of adding a parameter, could we not automatically use the PHPCR primary type IF it has been defined? I can't really think of any reason why that wouldn't be the optimal strategy in most if not all cases. |
My thought around that was that there are two problems:
As it relates to my use case, I am fine with either direction and could implement either one. Thanks for helping me find a solution here! |
Reading metadata will/should not incur a significant overhead, metadata is cached and we read from it hundreds of times per-request. Also we could cache queries. For the second point, the current behavior is wrong, so it's a bug fix - I would be OK with this /cc @dbu On 19 September 2015 03:57:17 BST, Shane Archer [email protected] wrote:
Sent from my Android device with K-9 Mail. Please excuse my brevity. |
i am not sure if the overhead is that small. reading metadata of a loaded document is easy. but finding the metadata of all mapped documents could be more work and might not even be possible, i am not sure. doing it for any specific type sounds dangerous. one option instead of an argument could be to add something to the metadata instead of on query. then you only do this once and gain performance with it in queries. (it could be used not only for join but for all selects) regarding BC breaks: every bugfix of something that was behaving incorrectly has the potential to screw up somebody who relied on the incorrect behaviour. doing such a fix in a minor version, with a note in the changelog seems good to me. if they wanted a full join they should not have used a left join, sorry. |
Why would we need to load all the metadatas? We only need the metadata for documents being selected from in the query (if I have understood), and this metadata is already loaded for each joined source (I believe from a quick scan of the code). My opinion here is that adding "workarounds" to the interface (API or metadata) is bad, but would prefer the explicit metadata approach to the API change and would prefer the implicit "workaround" with the metadata to the explicit metadata approach (if it has no significant performance penalty). And finally I would prefer that it just worked to all of the above :) |
Why would we need to load all the metadatas?
if we want to know whether a phpcr type is unique, we need to check all
documents to see if any other uses the same type.
|
Yea, that is correct. If we just looked to see "does this have a type other than unstructured" then it would lead to some very weird and bad situations if somebody happened to map two documents to the same type (which is otherwise a very possible scenario). Perhaps the best solution would be to add a new class metadata property, calculated at the time class metadata is cached, which indicates whether or not a type is unique in the space of all known, mapped documents. I am not entirely sure how to do this, though, since it looks like metadata is calculated and cached on a class-by-class basis (and that is one area of Doctrine I have not really played around with). That would certainly make the user experience better, since all you need to do is map something to a unique type and then everything just works as expected. |
in production mode, there is a cache warmer that does all the work upfront and looks in all known places for documents. i don't know exactly how, but here it could work. in debug mode, the metadata and proxies are done on the fly, when a specific class is needed. i fear that it could be really expensive as each time any class changed we would have to re-scan everything. or we would need to build some sort of cache that we then use to check which classes/metadata changed since the cache was built. if we ask the user to annotate documents he thinks have a unique type and have a validation command for that information that we use when cache warming and can be triggered manually. that way, the information is available fast and without too much overhead. this would have the added benefit of being explicit that something is supposed to be unique. this will prevent weird issues when somebody copy-pastes a document to do a new document and is confused why his left joins stop working... |
So to summarize, it sounds like we would:
If that is a reasonable solution, I can begin to take a stab at it. |
sounds good to me. great that you want to work on this! and thanks for including the doc in this list, very important ! the validator should be a class that can be used independently, and then we provide a command that triggers this validator. the cache warming (not sure if that is here or in the bundle) then uses a service, not the other command. apart from this your list seems pretty complete to me. adding tests would be nice - ideally also one where validation fails, but that will be interesting to do without breaking all other tests :-) can you create a new issue with this? or directly a pull request. use |
Thanks for the info -- I will create the PR once I have some code laid down, I have one other project that got in front of this but hope to get to it soon. |
great!
|
…g its class or classparents in the query -- doctrine#658.
I squashed everything, rebased, and opened #667 with the changes to this repository. Will move forward with the other work in |
Work is now complete in #667 and ready for review. Thanks for the guidance! |
with #667, it is now possible to outer join based on the node type, when using unique node types. with the fixes on jackalope, i think the OR NULL should now work as well: #658 (comment) - the exception you describe is exactly what we get rid of in jackalope/jackalope-jackrabbit#127 / jackalope/jackalope-doctrine-dbal#308 (see the test phpcr/phpcr-api-tests#174 that is now working) if you could have a look at that, would be awesome! then outer join would work even without the unique node types. |
I am attempting to use a
LEFT JOIN
behavior on child documents, but it seems to be impossible based on what I've seen.I have the following situation:
Product
ProductDetail
Take the following query builder as an example:
Now, PHPCR-ODM produces a JCR-SQL2 query like:
This all works as expected, if and only if the
Product
nodes contain at least oneProductDetail
child -- in other words, it seems to behave as anINNER JOIN
. AnyProduct
node that has no such child will not be included in the results of the query, because of two things:SELECT *
rather thanSELECT p.*
, which is requiring a result for thepd
aliasAND
clause at the end, which will only match if the aliaspd
is present and of the appropriate classThe query that I actually want is this:
This works as one would expect a
LEFT JOIN
to, delivering results whose node names match "12345" regardless of whether or not they contain one or moreProductDetail
children.What I cannot figure out is if it is possible to get to my desired query; is it? This seems to be a bug, but perhaps I am misunderstanding the intent here.
If this is intended, is there a workaround to achieve what I am trying to do? I realize I could just pass the JCR-SQL2 query to the session (my environment uses Jackrabbit) and then manually invoke hydration, but that seems a bit ugly. :)
The text was updated successfully, but these errors were encountered: