Dataset delete does not work #57

mmmenno · 2015-12-14T13:40:00Z

Someone accidentally uploaded the rm dataset with the wrong ids, I think. Suddenly there were ids like rm/8007.0 instead of the rm/8007 they should have been.

Deleting the entire dataset and uploading it again with the proper ids should do the trick, one imagines. Not so. While https://api.histograph.io/datasets/rm/pits gives you a "Dataset 'rm' not found", https://api.histograph.io/search?q=rm/8007.0 keeps returning pits.

Maybe deleting pits / relations from ES doesn't work? Very strange & a bit alarming. Any ideas, @ocataco ?

bertspaan · 2015-12-14T15:24:38Z

This is caused by a known bug graphmalizer/graphmalizer-core#18.

Graphmalizer's Cypher queries need to be changed, so that these VACANT nodes are deleted.

There is an easy fix which would probably also work:

Change https://github.com/histograph/neo4j-plugin and make sure it only returns non-VACANT nodes. This can be done here, probably: https://github.com/histograph/neo4j-plugin/blob/master/src/main/java/org/waag/histograph/plugins/ExpandConcepts.java#L33

wires · 2016-01-03T20:52:36Z

sorry guys, missed this issue! I'll schedule some time to dive into graphmalizer/graphmalizer-core#18 again

sbocconi · 2016-01-14T15:25:21Z

There seems to be a solution for this issue, as suggested by @bertspaan. Issue labeled as ready (to be solved) then.

tomdemeyer · 2016-01-29T11:54:30Z

Are VACANT nodes created on delete? That does not seem right; how do we fix it?
Or is this a separate issue?

After importing dataset pdc:
match(n {dataset: "pdc" } ) return count(n)
--> 5402
match(n:_VACANT {dataset: "pdc" } ) return count(n)
--> 0
After deleting the dataset:
match(n {dataset: "pdc" } ) return count(n)
--> 1707
match(n:_VACANT {dataset: "pdc" } ) return count(n)
--> 1707

wires · 2016-01-29T14:56:09Z

Probably related. I didn't have time to look at this yet; somewhere in queries.yml I forgot to clear the vacant nodes.

sbocconi · 2016-02-01T10:26:24Z

VACANT: does this mean that if you add a relation between A and B while PITs A and B do not exist (yet), the relation is added, and A and B are added as well, but A and B are then VACANT?

There are several issues here, but can we say that adding a dataset and then removing it should leave the repository in an unchanged state? Or is that not true? This could be part of the tests.
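
A minimal sketch of what such a round-trip test could look like. `ToyStore` is a hypothetical in-memory stand-in, not the Histograph or Graphmalizer API; a real test would run the same comparison against Neo4j and Elasticsearch.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for the graph store (assumption)."""

    def __init__(self):
        self.nodes = {}      # node id -> name of the dataset that owns it
        self.edges = set()   # (source id, target id, dataset name)

    def add_dataset(self, name, pits, relations):
        for pit_id in pits:
            self.nodes[pit_id] = name
        for src, dst in relations:
            self.edges.add((src, dst, name))

    def remove_dataset(self, name):
        # Drop every node and edge that belongs to the dataset.
        self.nodes = {k: v for k, v in self.nodes.items() if v != name}
        self.edges = {e for e in self.edges if e[2] != name}

    def snapshot(self):
        # Immutable copy of the current state, for comparison.
        return (tuple(sorted(self.nodes.items())), frozenset(self.edges))


def roundtrip_leaves_store_unchanged(store, name, pits, relations):
    """Add a dataset, remove it again, and compare with the state before."""
    before = store.snapshot()
    store.add_dataset(name, pits, relations)
    store.remove_dataset(name)
    return store.snapshot() == before


store = ToyStore()
store.add_dataset("other", ["other/1"], [])
ok = roundtrip_leaves_store_unchanged(
    store, "pdc", ["pdc/1", "pdc/2"], [("pdc/1", "pdc/2")]
)
```

The property being checked is only "state before equals state after"; it says nothing about which component leaves residue behind when the check fails.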

sbocconi · 2016-02-01T10:35:42Z

Another comment from @wires on Slack:

I think it is the situation [+] b -> c ; [+] a == b ; [+] c == d ; [-] b -> c that causes problems
(by [+] … I mean adding)

mmmenno · 2016-02-01T11:16:50Z

I only experienced this problem with the faulty rm dataset that had ids like rm/1641.0, so maybe it had something to do with those strange ids. @tomdemeyer deleted the faulty set manually, so for now we got rid of the problem.

With the VACANT node diagnosis, we should've seen this problem more often and we should be able to reproduce the problem, right?

sbocconi · 2016-02-03T16:31:59Z

From a discussion with @bertspaan:

Three problems:

1. IO does not delete the ES index, because this caused problems with Amazon's ES service (if you delete an index that has already been deleted, ES recreates the index, but with the wrong mapping. Or something like that)
2. neo4j-plugin grabs *all* PITs from the graph, including those with the label `VACANT`
3. Graphmalizer does not delete `VACANT` nodes properly (or not always)

Number 3 is related to graphmalizer/graphmalizer-core#18
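
For problem 2, the idea of returning only non-VACANT nodes can be sketched as a plain filter. This is an illustration only: the node shape (a dict with `id` and `labels`) is an assumption, the real change would be made in the Java plugin, and the label name `_VACANT` is taken from the Cypher queries earlier in this thread.

```python
VACANT_LABEL = "_VACANT"  # label name as used in the Cypher queries in this thread


def non_vacant(nodes):
    """Keep only nodes that do not carry the vacant label.

    Each node is assumed to be a dict with an "id" and a list of "labels".
    """
    return [n for n in nodes if VACANT_LABEL not in n["labels"]]


nodes = [
    {"id": "pdc/1", "labels": ["_"]},
    {"id": "pdc/2", "labels": ["_", "_VACANT"]},
]
kept_ids = [n["id"] for n in non_vacant(nodes)]
```

A filter like this would hide leftover vacant nodes from results, but it would not remove them from the graph, so problem 3 would still need its own fix.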

wires · 2016-02-04T10:37:17Z

Regarding nr 3 above / issue 18 ; the guarantee graphmalizer gives you (when it's not bugged) is that

there will never appear a "dangling vacant node", that is, for all (n:_VACANT), degree(n) > 0.

So they should only appear when there is some edge referring to them.

What we notice in #18 is that a situation seems to arise where a single _VACANT node remains connected to a 'concept node' (the representative node of an equivalence class; the collection of equivalent things).

In any case, I can run the tests again, so I will dive into this.

ps. It is a little bit subtle as I try to preserve some "mathematical" properties of the program. For instance, we -kind of- have the following guarantee on concept nodes:

there will never appear "a dangling concept node", that is, for all (n:'='), degree(n) > 1

This can be either useful or annoying or both or neither :-) it also doesn't exclude the case ()--<>--() where you have two "dangling vacant nodes" connected to a concept node. This shouldn't be allowed, etc.
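
Both guarantees can be written down as a small checker. The sketch below works on a toy in-memory graph, not on Neo4j; the data layout (a dict of label sets plus a list of undirected edges) and the function names are assumptions made for illustration.

```python
from collections import Counter


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def violations(nodes, edges):
    """Return (dangling vacant nodes, dangling concept nodes).

    nodes: dict mapping node id -> set of labels
    edges: list of (node id, node id) pairs
    """
    deg = degrees(edges)
    # Guarantee 1: every _VACANT node has degree > 0.
    dangling_vacant = sorted(
        n for n, labels in nodes.items() if "_VACANT" in labels and deg[n] == 0
    )
    # Guarantee 2: every concept node ('=') has degree > 1.
    dangling_concept = sorted(
        n for n, labels in nodes.items() if "=" in labels and deg[n] <= 1
    )
    return dangling_vacant, dangling_concept


# A single vacant node attached to a concept node, plus one isolated vacant node.
nodes = {"a": {"_VACANT"}, "b": {"_VACANT"}, "c": {"="}}
edges = [("a", "c")]
result = violations(nodes, edges)
```

In this example `b` breaks the first guarantee and `c` breaks the second. Node `a` satisfies the first guarantee even though it is only attached to a concept node with no other neighbours, which is the kind of leftover the two checks catch only in combination.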

anyway, enough talk! Time to compute!

tomdemeyer · 2016-02-05T09:36:20Z

What we notice in #18 is that a situation seems to arise where a single _VACANT node remains connected to a 'concept node' (the representative node of an equivalence class; the collection of equivalent things).

This is exactly the case; 100% repeatable by just deleting a dataset.
For me it is not clear if the connected concept node is ever also still connected to other nodes.

I am now manually (in neo4j) deleting the vacant nodes and relations (of a particular dataset), leaving the concept nodes alone.
Will investigate further.

Tom Demeyer
Waag Society
http://waag.org

wires · 2016-02-05T09:59:54Z

Reproducible, you say? I'm having trouble reproducing it... @tomdemeyer @mmmenno could you send me this rm dataset? I started rewriting the Cypher queries that make up graphmalizer and am trying to speed everything up as well...

tomdemeyer · 2016-02-05T10:02:22Z

I reproduced it with a TNL dataset a couple of times.
I can send it, but your schemas and everything else don't match.


wires · 2016-02-05T10:16:03Z

If you have a small dataset that does the job, the smaller the better, I'm interested! Types and schemas don't matter; I will turn the dataset into a test.

tomdemeyer · 2016-02-05T10:50:13Z

I just tested with a small dataset, deletes everything a-ok.
pff.

except, of course, the ES index…


mmmenno · 2016-02-05T11:54:25Z

@wires you could use (a small subset of) https://api.histograph.io/datasets/rm/pits. To really reproduce this issue, change the ids from id to id.0. Oh, and please leave the rm dataset on production as it is now, since I'll be giving a presentation at RCE, proud owners of rm, next Wednesday!
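
A minimal sketch of that id change, assuming each PIT is a dict with an `id` field (the field name and the helper `with_float_suffix` are assumptions, not taken from the Histograph data format):

```python
def with_float_suffix(pit):
    """Return a copy of the PIT whose id has ".0" appended, e.g. rm/8007 -> rm/8007.0."""
    changed = dict(pit)
    changed["id"] = pit["id"] + ".0"
    return changed


pits = [{"id": "rm/8007", "name": "example"}]
faulty = [with_float_suffix(p) for p in pits]
```

The original list is left untouched, so the same subset can be used for both the clean and the faulty upload.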

sbocconi added the ready label Jan 14, 2016

mmmenno mentioned this issue Jan 15, 2016

Pop from queue tail, instead of from head? #61

Open

jobspierings added the bug label Jan 15, 2016

wires self-assigned this Jan 15, 2016

jobspierings assigned sbocconi and unassigned wires Jan 19, 2016
