Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subtree matching, not all operations are returned #53

Open
BramVanroy opened this issue Jan 27, 2020 · 5 comments
Open

Subtree matching, not all operations are returned #53

BramVanroy opened this issue Jan 27, 2020 · 5 comments

Comments

@BramVanroy
Copy link

BramVanroy commented Jan 27, 2020

I'm not sure whether this is expected behaviour or not. I have two trees. One tree is basically the subtree of another where some nodes are dropped (B, C, E) and where one node has shifted a level(L).

from zss import simple_distance, Node

OPERATIONS = {
    0: 'remove',
    1: 'insert',
    2: 'update',
    3: 'match'
}

A = (
    Node("A")
        .addkid(Node("B"))
        .addkid(Node("C"))
        .addkid(Node("D")
            .addkid(Node("E"))
            .addkid(Node("F"))
            .addkid(Node("G")
                .addkid(Node("H"))
                .addkid(Node("I")))
            .addkid(Node("J")
                .addkid(Node("K"))))
        .addkid(Node("L"))
    )
B = (
    Node("D")
        .addkid(Node("F"))
        .addkid(Node("G")
            .addkid(Node("H"))
            .addkid(Node("I")))
        .addkid(Node("J")
            .addkid(Node("K")))
        .addkid(Node("L"))
    )


dist, opts = simple_distance(A, B, return_operations=True)
print(dist)
# expected to have the same number operations as the max number of items in the tree
print(len(opts))
for opt in opts:
    s = OPERATIONS[opt.type]
    if opt.arg1 is not None:
        s += f"\t{opt.arg1.label}"
    if opt.arg2 is not None:
        s += f"\t{opt.arg2.label}"
    print(s)

Printed output:

5.0
10
remove	E
match	F	F
match	H	H
match	I	I
match	G	G
match	K	K
match	J	J
remove	D
match	L	L
update	A	D

When looking at the operations, though, I don't see any operations done on the labels 'B' and 'C'. Is that specific to the algorithm or the implementation?

As a comparison, here is the output of the same tree comparison with APTED:

from apted import APTED, helpers

src = "{A{B}{C}{D{E}{F}{G{H}{I}}{J{K}}}{L}}"
tgt = "{D{F}{G{H}{I}}{J{K}}{L}}"

tree1 = helpers.Tree.from_text(src)
tree2 = helpers.Tree.from_text(tgt)

apted = APTED(tree1, tree2)
ted = apted.compute_edit_distance()
maps = apted.compute_edit_mapping()

print(f"distance: {ted}")

for edit_map in maps:
    print(' -> '.join(map(str, edit_map)))

Output of apted:

distance: 5
{A{B}{C}{D{E}{F}{G{H}{I}}{J{K}}}{L}} -> {D{F}{G{H}{I}}{J{K}}{L}}
{D{E}{F}{G{H}{I}}{J{K}}} -> None
{E} -> None
{C} -> None
{B} -> None
{F} -> {F}
{G{H}{I}} -> {G{H}{I}}
{H} -> {H}
{I} -> {I}
{J{K}} -> {J{K}}
{K} -> {K}
{L} -> {L}

Thanks in advance!

@BramVanroy
Copy link
Author

There was an error in my initial code of the APTED example. Now they do behave the same. The only difference seems to be that zss doesn't return all operations, unless I am missing something.

@BramVanroy BramVanroy changed the title Subtree matching, nodes are ignored Subtree matching, not all operations are returned Jan 27, 2020
@timtadh
Copy link
Owner

timtadh commented Jan 29, 2020

The operations code was a external feature contribution (and I have to admit I haven't used it). There could very well be a bug in it -- and it certainly looks like B and D should have been removed. Since the distance being returned is correct my guess is the operations mapping is not correct.

@BramVanroy
Copy link
Author

The operations code was a external feature contribution (and I have to admit I haven't used it). There could very well be a bug in it -- and it certainly looks like B and D should have been removed. Since the distance being returned is correct my guess is the operations mapping is not correct.

I think so, too. Unfortunately I don't have the time to look further into this. As per your suggestion over email, I will use APTED in the future.

@illeatmyhat
Copy link

illeatmyhat commented Oct 4, 2024

I ran into this problem earlier. Even simple trees seem to fail, for example

from zss import Node, simple_distance
bar = Node("f")
foo = Node("f")\
    .addkid(Node("a"))\
    .addkid(Node("b"))\
    .addkid(Node("c"))\
    .addkid(Node("d"))
print(simple_distance(bar, foo, return_operations=True))
# (4.0, [<Operation Insert>, <Operation Match>])

@illeatmyhat
Copy link

Not really sure what the problem is or how to fix it, but comparing to implementations in other languages, there's a step that this library doesn't perform here:
https://github.com/kaby76/ZhangShashaCSharp/blob/main/Tree.cs#L295-L296
but fd and partial_ops can't be indexed with i and j so can't be done here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants