jhgbrt edited this page Oct 4, 2020 · 6 revisions

This page describes the API for the CSharp port of the google-diff-match-patch library. For further examples, see the test harness.

Computing the difference between two pieces of text

Diff.Compute(text1, text2) → diffs

Computes an array of differences that describes the transformation of text1 into text2. Each difference is a Diff object, which holds the type of operation (insertion (1), deletion (-1) or equality (0)) and the affected text.

var diffs = Diff.Compute("Good dog", "Bad dog"); // -> [(-1, "Goo"), (1, "Ba"), (0, "d dog")]
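To make the "transformation of text1 into text2" concrete, here is a minimal sketch that rebuilds both texts from a diff list. It does not use the library's Diff type: the (Op, Text) tuples and the DiffSketch class are hypothetical stand-ins introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the library. Each diff is modelled as a tuple:
// Op is -1 (deletion), 0 (equality) or 1 (insertion); Text is the affected text.
public static class DiffSketch
{
    // text1 consists of everything that was kept or deleted.
    public static string SourceText(IEnumerable<(int Op, string Text)> diffs) =>
        string.Concat(diffs.Where(d => d.Op != 1).Select(d => d.Text));

    // text2 consists of everything that was kept or inserted.
    public static string TargetText(IEnumerable<(int Op, string Text)> diffs) =>
        string.Concat(diffs.Where(d => d.Op != -1).Select(d => d.Text));
}
```

For the diff shown above, SourceText yields "Good dog" and TargetText yields "Bad dog".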

Despite the large number of optimisations used in this function, a diff can take a while to compute. Diff.Compute has an optional timeout parameter that specifies how many seconds any diff's exploration phase may take. The default value is 0.0, which disables the timeout and lets the diff run to completion. Should the diff time out, the return value will still be a valid difference, though probably a non-optimal one.

Extension method: OptimizeForReadability() → diffs

This function processes an input list of Diff objects and returns a modified list of Diffs, optimised for readability.

A diff of two unrelated texts can be filled with coincidental matches. For example, the diff of "mouse" and "sofas" is:

var diffs = Diff.Compute("mouse", "sofas"); // -> [(-1, "m"), (1, "s"), (0, "o"), (-1, "u"), (1, "fa"), (0, "s"), (-1, "e")]

While this is the optimum diff, it is difficult for humans to understand. The OptimizeForReadability() extension method on a list of Diffs will use semantic cleanup to rewrite the list, expanding it into a more intelligible format. The above example would become:

var diffs = Diff.Compute("mouse", "sofas").OptimizeForReadability(); // -> [(-1, "mouse"), (1, "sofas")]

If a diff is to be human-readable, it should be passed to OptimizeForReadability().

Extension method: OptimizeForMachineProcessing() → diffs

This function uses another approach, optimising for 'efficiency' instead of 'readability'. The results of both cleanup types are often the same.

This efficiency cleanup is based on the observation that a diff made up of a large number of small edits may take longer to process (in downstream applications), or take more capacity to store or transmit, than a smaller number of larger edits. The diffEditCost parameter sets the cost of handling a new edit in terms of handling extra characters in an existing edit. The default value is 4, which means that if expanding the length of a diff by three characters can eliminate one edit, then that optimisation will reduce the total cost.

Extension method: Levenshtein(diffs) → int

Given a list of diffs, measures its Levenshtein distance in terms of the number of inserted, deleted or substituted characters. The minimum distance is 0, which means equality; the maximum distance is the length of the longer string.
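The following sketch shows one way such a distance can be derived from a diff list, assuming this port counts edits the same way as the original diff-match-patch library: within each run of edits between two equalities, a deletion paired with an insertion counts as a substitution, so the run costs the larger of the two lengths. The (Op, Text) tuples and the LevenshteinSketch class are hypothetical stand-ins, not the library's own types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the library.
public static class LevenshteinSketch
{
    public static int Distance(IEnumerable<(int Op, string Text)> diffs)
    {
        int total = 0, insertions = 0, deletions = 0;
        foreach (var (op, text) in diffs)
        {
            switch (op)
            {
                case 1:  // insertion
                    insertions += text.Length;
                    break;
                case -1: // deletion
                    deletions += text.Length;
                    break;
                default: // equality closes the current run of edits
                    total += Math.Max(insertions, deletions);
                    insertions = 0;
                    deletions = 0;
                    break;
            }
        }
        // Account for a trailing run of edits that no equality closed.
        return total + Math.Max(insertions, deletions);
    }
}
```

Under this counting, the "Good dog" diff above gives 3. The raw "mouse"/"sofas" diff gives 4, while its readability-optimised form gives 5, so the result depends on the diff list passed in and not only on the two texts.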

Extension method: PrettyHtml() → html

Takes a list of diffs and returns a pretty HTML sequence.

Finding matching text patterns

Extension method: [string.]FindBestMatchingIndex(pattern, loc) → location

Given a text to search, a pattern to search for and an expected location in the text near which to find the pattern, return the location which matches closest. The function will search for the best match based on both the number of character errors between the pattern and the potential match, as well as the distance between the expected location and the potential match.

The following example is a classic dilemma. There are two potential matches: one is close to the expected location but contains a one-character error; the other is far from the expected location but is exactly the pattern sought:

"abc12345678901234567890abbc".FindBestMatchingIndex("abc", 26)

Which result is returned (0 or 24) is determined by the MatchSettings parameter, which has a MatchThreshold and a MatchDistance property.

MatchSettings.MatchDistance determines how far from the expected location a match may be. An exact match that is MatchDistance characters away from the expected location scores as a complete mismatch. For example, a distance of 0 requires the match to be at the exact location specified, whereas a distance of 1000 requires a perfect match to be within 800 characters of the expected location to be found with a threshold of 0.8 (see below). The larger MatchDistance is, the longer this function may take to compute. The match distance defaults to 1000.

The other property, MatchSettings.MatchThreshold, determines the cut-off value for a valid match. The closer MatchThreshold is to 0, the stricter the accuracy requirement. The closer it is to 1, the more likely it is that a match will be found. The larger MatchThreshold is, the longer this function may take to compute. The threshold defaults to 0.5. If no match is found, the function returns -1.
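The interplay between the two settings can be sketched with the scoring rule used by the original diff-match-patch library, which this port is assumed (not confirmed here) to follow: a candidate's score is its error ratio plus its distance from the expected location divided by the match distance, and a candidate is valid only if its score does not exceed the threshold. The MatchScoreSketch class and its method names are hypothetical, introduced only for illustration.

```csharp
using System;

// Hypothetical helper, not part of the library.
public static class MatchScoreSketch
{
    // Lower is better: 0.0 is a perfect match at the expected location.
    public static double Score(int errors, int patternLength, int expectedLoc, int matchLoc, int matchDistance)
    {
        double accuracy = (double)errors / patternLength;
        int proximity = Math.Abs(expectedLoc - matchLoc);
        if (matchDistance == 0)
        {
            // A distance of 0 only tolerates a match at the exact location.
            return proximity == 0 ? accuracy : 1.0;
        }
        return accuracy + (double)proximity / matchDistance;
    }

    // A candidate is valid only if its score does not exceed the threshold.
    public static bool IsAcceptable(double score, double matchThreshold) => score <= matchThreshold;
}
```

Applied to the dilemma above (pattern "abc", expected location 26) with the default distance of 1000, the exact match at 0 scores 26/1000 = 0.026 and the one-error match at 24 scores 1/3 + 2/1000, roughly 0.335, so the exact match wins. With a hypothetical distance of 50, the exact match scores 0.52 and falls outside the default 0.5 threshold, while the one-error match scores roughly 0.373 and becomes the better candidate.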

Patch functions

Patch.Compute(text1, text2) → patches

Computes a set of patches to transform text1 into text2.

Patch.FromDiffs(diffs) → patches

Computes a set of patches from a list of diffs.

Patch.Compute(text1, diffs) → patches

Given an input text and a set of diffs, compute a set of patches. This form (text1, diffs) is preferred; use it if you happen to have that data available.

Extension method: PatchList.ToText(patches) → text

Reduces an array of patch objects to a block of text which looks extremely similar to the standard GNU diff/patch format. This text may be stored or transmitted.

Patch.Parse(text) → patches

Parses a block of text (presumably created by PatchList.ToText()) and returns an array of patch objects.

PatchList.Apply(text1, patches) → [text2, results]

Applies a list of patches to text1. The first element of the return value is the newly patched text. The second element is an array of true/false values indicating which of the patches were successfully applied. [Note that this second element is not too useful since large patches may get broken up internally, resulting in a longer results list than the input with no way to figure out which patch succeeded or failed. A more informative API is in development.]

The previously mentioned MatchSettings parameters (threshold and distance) are used to evaluate patch application on text which does not match exactly. In addition, the PatchSettings.DeleteTreshold property determines how closely the text within a major (~64 character) delete needs to match the expected text. If this threshold is closer to 0, then the deleted text must match the expected text more closely. If it is closer to 1, then the deleted text may contain anything. In most use cases PatchSettings.DeleteTreshold should just be set to the same value as MatchSettings.MatchThreshold.

Hello World

Here's a minimal example of a diff in C#:

using DiffMatchPatch;
using System;
using System.Collections.Generic;

public class Program
{
    public static void Main(string[] args)
    {
        var diffs = Diff.Compute("Hello World.", "Goodbye World.");
        // Result: [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]
        diffs = diffs.OptimizeForReadability();
        // Result: [(-1, "Hello"), (1, "Goodbye"), (0, " World.")]
        foreach (var diff in diffs)
        {
            // Print each Diff in turn.
            Console.WriteLine(diff);
        }
    }
}