Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apoc.create.virtual.fromNode should wrap the existing node while keeping it's node-id and elementId but it doesn't #570

Open
jexp opened this issue Jan 12, 2024 · 4 comments

Comments

@jexp
Copy link
Member

jexp commented Jan 12, 2024

e.g. apoc.create.virtual.fromNode(movie, ["title","released","url"])

returns virtual nodes with negative ids but it shouldn't in this case.

Seems like an implementation bug that somehow crept in.

Need it for excluding node embeddings and large text properties for returning in neo4j tools

@Lojjs
Copy link
Contributor

Lojjs commented Jan 16, 2024

@jexp Do you want us in Cypher surface to look into this? In that case could you provide us with a setup so we can reproduce?

@jexp
Copy link
Member Author

jexp commented Jan 16, 2024

you can test it on https://demo.neo4jlabs.com:7473/browser/
username/password/database is recommendations

The original idea was to have an easy way to select which properties to return from a node to a client UI while keeping the functionality that you have when returning regular nodes.
This is meant to exclude heavy or irrelevant properties like large texts, embeddings, large number of properties and focus on the relevant properties for captions and querying so that graphs can be visualized without overloading the UI.

neo4j-contrib/neo4j-apoc-procedures#148

So wrapping a node .i.e. keeping it's identity and then limiting which properties are available.

As I recently discovered the "keeping the identity" part was not implemented, so that the wrapped nodes are not tied to the original node or its relationships, which would have to be awkwardly reconstructed.

Performance comparison for the query below, which returns roughly 34 nodes and 37 rels is 10 seconds vs 10ms for the approach excluding the huge properties.
It fetches two embedding and one large text property per node.

// 10s+
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match (person:Person)-[r1]->(movie)-[r2:IN_GENRE]->(genre)
return *

Here is the example of how it should have worked:

// doesn't work, nodes are disconnected from relationships
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match (person:Person)-[r1]->(movie)-[r2:IN_GENRE]->(genre)

return genre, r1, r2, apoc.create.virtual.fromNode(movie, ["title","released","url"]) as movie,
 apoc.create.virtual.fromNode(person, ["name","born","url"]) as person

and what you actually have to do now to make it work

// 10ms
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match path = (person:Person)-[rp]->(movie)-[rg:IN_GENRE]->(genre)

with collect(path) as paths
call apoc.graph.fromPaths(paths,"results",{}) yield graph
with graph.nodes as nodes, graph.relationships as rels
with rels, apoc.map.fromPairs([n in nodes | [coalesce(n.tmdbId, n.name), apoc.create.vNode(labels(n),n {.*, plotEmbedding:null, posterEmbedding:null, plot:null, bio:null })]]) as nodes
return nodes, [r in rels | apoc.create.vRelationship(nodes[coalesce(startNode(r).tmdbId,startNode(r).name)], type(r), properties(r), nodes[coalesce(endNode(r).tmdbId,endNode(r).name)])] as rels

@Lojjs
Copy link
Contributor

Lojjs commented Jan 17, 2024

@gem-neo4j did some digging for me and it seems like this was implemented with positive node ids earlier but changed because of a bug where changes to the virtual node were updating the real one: neo4j-contrib/neo4j-apoc-procedures#2252

@jexp
Copy link
Member Author

jexp commented Jan 17, 2024

Yeah but that was intentional that it is a wrapper of the real node so all operations on it should pass through.
Perhaps the naming with "virtual" was not that good it should have been "wrapped".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants