Dataset delete does not work #57

Open
mmmenno opened this issue Dec 14, 2015 · 16 comments

@mmmenno

mmmenno commented Dec 14, 2015

Someone accidentally uploaded the rm dataset with wrong IDs, I think. Suddenly there were IDs like rm/8007.0 instead of rm/8007, as it should have been.

Deleting the entire dataset and uploading it again with the proper IDs should do the trick, one imagines. Not so. While https://api.histograph.io/datasets/rm/pits gives you a "Dataset 'rm' not found", https://api.histograph.io/search?q=rm/8007.0 keeps returning pits.

Maybe deleting pits / relations from ES doesn't work? Very strange and a bit alarming. Any ideas, @ocataco?

@bertspaan
Member

This is caused by a known bug: graphmalizer/graphmalizer-core#18.

Graphmalizer's Cypher queries need to be changed, so that these VACANT nodes are deleted.

There is an easy fix which would probably also work:

Change https://github.com/histograph/neo4j-plugin and make sure it only returns non-VACANT nodes. This can be done here, probably: https://github.com/histograph/neo4j-plugin/blob/master/src/main/java/org/waag/histograph/plugins/ExpandConcepts.java#L33
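
A minimal sketch of the filtering idea, written in Python rather than the plugin's Java. The `Node` class, the `non_vacant` helper and the `PIT` label are hypothetical stand-ins, not the neo4j-plugin's actual types; the `_VACANT` label name is assumed to be the one used in the Cypher queries in this thread.

```python
from dataclasses import dataclass, field

# Assumption: vacant nodes carry this label, as in the Cypher queries in this thread.
VACANT = "_VACANT"


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node; not the plugin's real type."""
    id: str
    labels: frozenset = field(default_factory=frozenset)


def non_vacant(nodes):
    """Return only the nodes that do not carry the vacant label."""
    return [n for n in nodes if VACANT not in n.labels]


nodes = [
    Node("rm/8007", frozenset({"PIT"})),      # "PIT" is an illustrative label
    Node("rm/8007.0", frozenset({VACANT})),   # leftover vacant node
]
visible = non_vacant(nodes)
```

With a filter like this in place, a leftover vacant node such as rm/8007.0 would still exist in the graph but would no longer be returned to callers.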

@wires
Contributor

wires commented Jan 3, 2016

sorry guys, missed this issue! I'll schedule some time to dive into graphmalizer/graphmalizer-core#18 again

sbocconi added the ready label Jan 14, 2016

@sbocconi

There seems to be a solution for this issue, as suggested by @bertspaan. Issue labeled as ready (to be solved) then.

@tomdemeyer

Are VACANT nodes created on delete? That does not seem right; how do we fix it?
Or is this a separate issue?

After importing dataset pdc:
match(n {dataset: "pdc" } ) return count(n)
--> 5402
match(n:_VACANT {dataset: "pdc" } ) return count(n)
--> 0
After deleting the dataset:
match(n {dataset: "pdc" } ) return count(n)
--> 1707
match(n:_VACANT {dataset: "pdc" } ) return count(n)
--> 1707

@wires
Contributor

wires commented Jan 29, 2016

Probably related; I haven't had time to look at this. Somewhere in queries.yml I forget to clear the vacant nodes.

@sbocconi

sbocconi commented Feb 1, 2016

VACANT: does this mean that if you add a relation between A and B, but PITs A and B do not exist (yet), the relation is added, and A and B as well, but A and B are then VACANT?

There are several issues here, but can we say that adding a dataset and then removing it should leave the repository in an unchanged state? Or is this not true? This could be part of the tests.
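
The add-then-remove property could be turned into a test along these lines. This is a sketch over a hypothetical in-memory repository (a dict mapping PIT IDs to dataset names); `add_dataset`, `remove_dataset` and the dataset name `example` are made up for illustration and are not Histograph's API. A real test would call the actual import and delete paths and compare the graph and the ES index before and after.

```python
import copy


def add_dataset(repo, dataset, pit_ids):
    """Hypothetical import: register each PIT ID under the dataset name."""
    for pit_id in pit_ids:
        repo["pits"][pit_id] = dataset


def remove_dataset(repo, dataset):
    """Hypothetical delete: drop every PIT that belongs to the dataset."""
    repo["pits"] = {
        pit_id: owner
        for pit_id, owner in repo["pits"].items()
        if owner != dataset
    }


def roundtrip_leaves_repo_unchanged(repo, dataset, pit_ids):
    """Add a dataset, remove it again, and compare with a snapshot taken before."""
    before = copy.deepcopy(repo)
    add_dataset(repo, dataset, pit_ids)
    remove_dataset(repo, dataset)
    return repo == before


repo = {"pits": {"example/1": "example"}}
unchanged = roundtrip_leaves_repo_unchanged(repo, "pdc", ["pdc/1", "pdc/2"])
```

The comparison only holds if the dataset being added does not already exist in the repository, so a real test would start from a known state.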

@sbocconi

sbocconi commented Feb 1, 2016

Another comment from @wires on Slack:

I think it is the situation [+] b -> c ; [+] a == b ; [+] c == d ; [-] b -> c that causes problems
so by [+] … I mean adding

@mmmenno
Author

mmmenno commented Feb 1, 2016

I only experienced this problem with the faulty rm dataset that had IDs like rm/1641.0, so maybe it had something to do with those strange IDs. @tomdemeyer deleted the faulty set manually, so for now we got rid of the problem.

With the VACANT node diagnosis, we should've seen this problem more often and we should be able to reproduce the problem, right?

@sbocconi

sbocconi commented Feb 3, 2016

From a discussion with @bertspaan:

Three problems:

1. IO does not delete the ES index, because this caused problems with Amazon's ES service (if you delete an index that has already been deleted, ES recreates the index, but with the wrong mapping. Or something like that)
2. neo4j-plugin grabs *all* PITs from the graph, including those with the label `VACANT`
3. Graphmalizer does not (or does not always) remove `VACANT` in the right way

Number 3 is related to graphmalizer/graphmalizer-core#18

@wires
Contributor

wires commented Feb 4, 2016

Regarding nr 3 above / issue 18 ; the guarantee graphmalizer gives you (when it's not bugged) is that

there will never appear a "dangling vacant node", that is, for all (n:_VACANT), degree(n) > 0.

So they should only appear when there is some edge referring to them.

What we notice in #18 is that a situation appears to arise where a single _VACANT node remains connected to a 'concept node' (the representative node of an equivalence class; the collection of equivalent things).

In any case, I can run the tests again, so I will dive into this.

ps. It is a little bit subtle as I try to retain some "mathematical" properties of the program. For instance, we -kind of- have the following guarantee on concept nodes:

there will never appear "a dangling concept node", that is, for all (n:'='), degree(n) > 1

This can be either useful or annoying or both or neither :-) it also doesn't exclude the case ()--<>--() where you have two "dangling vacant nodes" connected to a concept node. This shouldn't be allowed, etc.
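
The two guarantees above can be written down as a small checker. This is a sketch in Python over a hypothetical in-memory graph (a dict of node labels plus a list of undirected edges), not graphmalizer's own code; the `violations` helper and the node names are invented for illustration, while the `_VACANT` and `=` labels are taken from the statements above.

```python
from collections import Counter

VACANT = "_VACANT"  # label of vacant nodes, as written above
CONCEPT = "="       # label of concept nodes, as written above


def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def violations(labels, edges):
    """labels: node id -> label; edges: list of (node, node) pairs.

    Returns (node, reason) pairs for every node breaking a guarantee:
    vacant nodes need degree > 0, concept nodes need degree > 1.
    """
    deg = degrees(edges)
    bad = []
    for node, label in labels.items():
        if label == VACANT and deg[node] == 0:
            bad.append((node, "dangling vacant node"))
        if label == CONCEPT and deg[node] <= 1:
            bad.append((node, "dangling concept node"))
    return bad


# One vacant node still attached to a concept node that has no other neighbours.
leftover = violations({"v": VACANT, "c": CONCEPT}, [("v", "c")])

# The ()--<>--() case: two vacant nodes attached to one concept node.
two_vacant = violations(
    {"v1": VACANT, "v2": VACANT, "c": CONCEPT},
    [("v1", "c"), ("v2", "c")],
)
```

In this toy model, a single vacant node hanging off an otherwise unconnected concept node breaks the concept-node guarantee rather than the vacant-node one, and the ()--<>--() case passes both checks, which is why it would need a separate rule.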

anyway, enough talk! COMPUTEREN!

@tomdemeyer

What we notice in #18 is that a situation appears to arise where a single _VACANT node remains connected to a 'concept node' (the representative node of an equivalence class; the collection of equivalent things).

This is exactly the case; 100% repeatable by just deleting a dataset.
For me it is not clear if the connected concept node is ever also still connected to other nodes.

I am now manually (in neo4j) deleting the vacant nodes and relations (of a particular dataset), leaving the concept nodes alone.
Will investigate further.

Tom Demeyer
Waag Society
http://waag.org

@wires
Contributor

wires commented Feb 5, 2016

Reproducible, you say? I'm having trouble reproducing it... @tomdemeyer @mmmenno could you send me this rm dataset? I started rewriting the Cypher queries that make up graphmalizer and am trying to speed everything up as well...

@tomdemeyer

I was reproducing it with a TNL dataset a couple of times.
I can send it, but your schemas and everything don't match.


@wires
Contributor

wires commented Feb 5, 2016 via email

@tomdemeyer

I just tested with a small dataset, deletes everything a-ok.
pff.

except, of course, the ES index…


On 05 Feb 2016, at 11:16, Jelle Herold wrote:

If you have a small dataset that does the job, the smaller the better, I'm interested!
Types and schemas don't matter; I will turn the dataset into a test.

@mmmenno
Author

mmmenno commented Feb 5, 2016

@wires you could use (a small subset of) https://api.histograph.io/datasets/rm/pits. To really reproduce this issue, change the IDs from id to id.0. Oh, and please leave the rm dataset on production as it is now, since I'll be giving a presentation at RCE, proud owners of rm, next Wednesday!
