How best to return and view actual diffs? #121

Closed
arkadianriver opened this issue Oct 16, 2023 · 2 comments

arkadianriver commented Oct 16, 2023

The docs say that if the patch listing doesn't successfully recreate the document, it's a bug, but they also say there's no guarantee of arriving there the same way each time. How can I best use the resulting listing for comparison? When I removed a leaf node and compared before and after, I expected a simple DeleteNode. Instead, the patch listing first moved a second-level node in an unexpected way, which forced it to rebuild the entire tree with moves, inserts, and updates all over the place. It recreated the doc beautifully, but what can I do with that edit list to easily see the differences between the before and after?

My goal is to capture when text nodes are updated or attributes are changed, and to insert an attribute in the result to indicate the change, a revised="yes" of sorts. With the behavior I described, that doesn't seem possible: if the entire tree might be recreated, searching for UpdateTextIn actions will yield far more updates than necessary.

Thanks for any help.

regebro (Contributor) commented Oct 16, 2023

It's a common problem when making test data that your nodes are too similar to each other, so xmldiff can't tell whether you moved a node or changed it. Real XML data is usually more complex and doesn't have that problem as much. Trying out the different modes to see which best fits your data is helpful there.

The xmldiff xml-formatter does something similar to what you want: it adds information about the actions to the leaves, like diff:rename="br" showing that this node was called "br" in the first file, diff:insert="" if it's new, and diff:delete="" if it was removed.

regebro closed this as completed Jan 4, 2024
arkadianriver (Author) commented

Sorry for not getting back to you sooner. I'm not sure what you meant by test data; I was diffing an old and a new version of a file that was edited the way an author might edit it. I think I just deleted a paragraph from a section.

Anyway, I took a look at the xml-formatter output, and its results are indeed more promising than the patch listing. It marked changes only in the areas where there actually were changes. I did find, however, that it deleted a node and re-added it. Maybe I can detect those instances somehow and ignore them.

For example, in the context of the entire file diff, the xml-formatted result was this. Notice that the <myValue> node is the same but was deleted and re-inserted. I double-checked, and the whitespace is the same in both files.

            <myItem rev="v1.1" diff:add-attr="rev">
                <myCrit>
                    <p>Lorem ipsum</p>
                </myCrit>
                <myValue diff:delete="">&gt; <myLabel
                        keyref="KEY_TWO" />
                </myValue>
                <myValue diff:insert="">&gt; <myLabel
                        keyref="KEY_TWO" diff:insert="" />
                </myValue>
            </myItem>
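
One possible way to detect those delete/re-insert pairs is sketched below, using only the standard library on the formatter output. It assumes the spurious pair always appears as adjacent siblings, and that two nodes count as "the same" when they serialize identically once every diff:* attribute is stripped and whitespace is collapsed. The helper names `canonical` and `spurious_pairs` are hypothetical, and the xmlns:diff declaration is added to the snippet so it parses on its own.

```python
import copy
import xml.etree.ElementTree as ET

DIFF_NS = "http://namespaces.shoobx.com/diff"
DIFF_PREFIX = "{%s}" % DIFF_NS


def canonical(elem):
    """Serialize elem without diff:* attributes, with whitespace collapsed."""
    clone = copy.deepcopy(elem)
    clone.tail = None  # ignore text that follows the element itself
    for node in clone.iter():
        for name in [a for a in node.attrib if a.startswith(DIFF_PREFIX)]:
            del node.attrib[name]
    return " ".join(ET.tostring(clone, encoding="unicode").split())


def spurious_pairs(root):
    """Return (deleted, inserted) adjacent siblings that are otherwise identical."""
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for first, second in zip(children, children[1:]):
            if (
                DIFF_PREFIX + "delete" in first.attrib
                and DIFF_PREFIX + "insert" in second.attrib
                and canonical(first) == canonical(second)
            ):
                pairs.append((first, second))
    return pairs


# The snippet from above, with the namespace declared so it parses standalone.
SAMPLE = """\
<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
    <myCrit>
        <p>Lorem ipsum</p>
    </myCrit>
    <myValue diff:delete="">&gt; <myLabel
            keyref="KEY_TWO" />
    </myValue>
    <myValue diff:insert="">&gt; <myLabel
            keyref="KEY_TWO" diff:insert="" />
    </myValue>
</myItem>
"""

root = ET.fromstring(SAMPLE)
pairs = spurious_pairs(root)
for deleted, inserted in pairs:
    print("unchanged despite delete/insert markers:", deleted.tag)
```

Collapsing whitespace is a deliberate simplification here; if whitespace differences matter for your documents, compare the serialized strings without the `split`/`join` step.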

But if I pull out just that section, with only the added rev, the xml-formatted diff works just fine.

f1.xml

<myItem>
  <myCrit>
      <p>Lorem ipsum</p>
  </myCrit>
  <myValue>&gt; <myLabel
          keyref="KEY_TWO" />
  </myValue>
</myItem>

f2.xml

<myItem rev="v1.1">
  <myCrit>
      <p>Lorem ipsum</p>
  </myCrit>
  <myValue>&gt; <myLabel
          keyref="KEY_TWO" />
  </myValue>
</myItem>

result

<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
<myCrit><p>Lorem ipsum</p></myCrit><myValue>&gt; <myLabel keyref="KEY_TWO"/>
  </myValue></myItem>
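
From a result like this, the revised="yes" marking could be added in a post-processing pass. The sketch below is one way to do that with the standard library. It assumes every change of interest shows up as an attribute in the diff namespace on the affected element; the function name `mark_revised` and the attribute name `revised` are made up for illustration. It does not handle text changes that the formatter may mark with inline elements rather than attributes, and it leaves nodes marked as deleted in the tree.

```python
import xml.etree.ElementTree as ET

DIFF_NS = "http://namespaces.shoobx.com/diff"
DIFF_PREFIX = "{%s}" % DIFF_NS


def mark_revised(root, attr="revised"):
    """Swap diff:* attributes for a plain marker attribute, in place.

    Returns the tags of the elements that were marked.
    """
    changed = []
    for elem in root.iter():
        diff_attrs = [a for a in elem.attrib if a.startswith(DIFF_PREFIX)]
        if not diff_attrs:
            continue
        for name in diff_attrs:
            del elem.attrib[name]
        elem.set(attr, "yes")
        changed.append(elem.tag)
    return changed


# The formatter result quoted above.
RESULT = """\
<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
<myCrit><p>Lorem ipsum</p></myCrit><myValue>&gt; <myLabel keyref="KEY_TWO"/>
  </myValue></myItem>"""

root = ET.fromstring(RESULT)
changed = mark_revised(root)
print(changed)
print(ET.tostring(root, encoding="unicode"))
```

Running the spurious-pair check before this pass would keep nodes that were deleted and re-inserted unchanged from being flagged as revised.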
