Intial commit - Release 3.2 Official

nusnlp · Apr 8, 2016 · 2122ffd · 2122ffd
commit 2122ffd
Show file tree

Hide file tree

Showing 14 changed files with 2,330 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
diff --git a/README b/README
@@ -0,0 +1,244 @@
+Release 3.2
+Revision: 22 April 2014
+
+This README file describes the NUS MaxMatch (M^2) scorer.
+Copyright (C) 2013 Daniel Dahlmeier, Hwee Tou Ng, Christian Hadiwinoto,
+and Raymond Hendy Susanto
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+If you are using the NUS M^2 scorer in your work, please include a
+citation of the following paper:
+
+Daniel Dahlmeier and Hwee Tou Ng. 2012. Better Evaluation for
+Grammatical Error Correction. In Proceedings of the 2012 Conference of
+the North American Chapter of the Association for Computational
+Linguistics: Human Language Technologies (NAACL 2012).
+
+Any questions regarding the NUS M^2 scorer should be directed to
+Hwee Tou Ng ([email protected]).
+
+
+Contents
+========
+0. Quickstart
+1. Pre-requisites
+2. Using the scorer
+  2.1 System output format
+  2.2 Scorer's gold standard format
+3. Converting the CoNLL-2014 data format
+4. Revisions
+   4.1 Alternative edits
+   4.2 F-beta measure
+   4.3 Handling of insertion edits
+   4.4 Bug fix for scoring against multiple sets of gold edits, and
+       dealing with sequences of insertion/deletion edits
+
+
+0. Quickstart
+=============
+ ./m2scorer [-v] SYSTEM SOURCE_GOLD 
+
+SYSTEM = the system output in sentence-per-line plain text.
+SOURCE_GOLD = the source sentences with gold edits.
+
+
+1. Pre-requisites
+=================
+The following dependencies have to be installed to use the M^2 scorer.
+
++ Python (>= 2.6.4, < 3.0, older versions might work but are not tested)
++ nltk (http://www.nltk.org, needed for sentence splitting) 
+
+
+2. Using the scorer
+===================
+Usage: m2scorer [OPTIONS] SYSTEM SOURCE_GOLD
+where
+ SYSTEM          -   system output, one sentence per line
+ SOURCE_GOLD     -   source sentences with gold token edits
+
+OPTIONS
+  -v    --verbose             -  print verbose output
+  --very_verbose              -  print lots of verbose output
+  --max_unchanged_words N     -  Maximum unchanged words when extracting edits. Default = 2.
+  --ignore_whitespace_casing  -  Ignore edits that only affect whitespace and casing. Default no.
+  --beta                      -  Set the ratio of recall importance against precision. Default = 0.5.
+
+
+2.1 System output format
+========================
+SYSTEM = File that contains the output of the error correction
+system. The sentences should be in tokenized plain text, sentence-per-line
+format.
+
+Format:
+<tokenized system output for sentence 1>
+<tokenized system output for sentence 2>
+ ...
+
+Examples of tokenization:
+-------------------------
+ Original  : He said, "We shouldn't go to the place. It'll kill one of us."
+ Tokenized : He said , " We should n't go to the place . It 'll kill one of us . "
+
+Note: Tokenization in the CoNLL-2014 shared task uses NLTK word tokenizer.
+
+Sample output:
+--------------
+===> system <===
+A cat sat on the mat .
+The Dog .
+
+
+2.2 Scorer's gold standard format
+=================================
+SOURCE_GOLD = source sentences (i.e. input to the error correction
+system) and the gold annotation in TOKEN offsets (starting from zero). 
+
+Format:
+S <tokenized system output for sentence 1>
+A <token start offset> <token end offset>|||<error type>|||<correction1>||<correction2||..||correctionN|||<required>|||<comment>|||<annotator id>
+A <token start offset> <token end offset>|||<error type>|||<correction1>||<correction2||..||correctionN|||<required>|||<comment>|||<annotator id>
+
+S <tokenized system output for sentence 2>
+A <token start offset> <token end offset>|||<error type>|||<correction1>||<correction2||..||correctionN|||<required>|||<comment>|||<annotator id>
+
+
+Notes:
+------
+ - Each source sentence should appear on a single line starting with "S "
+ - Each source sentence is followed by zero or more annotations.
+ - Each annotation is on a separate line starting with "A ".
+ - Sentences are separated by one or more empty lines.
+ - The source sentences need to be tokenized in the same way as the system output.
+ - Start and end offset for annotations are in token offsets (starting from zero).
+ - The gold edits can include one or more possible correction strings. Multiple corrections should be separate by '||'.
+ - The error type, required field, and comment are not used for scoring at the moment. You can put dummy values there.
+ - The annotator ID is used to identify a distinct annotation set by which system edits will be evaluated.
+   - Each distinct annotation set, identified by an annotator ID, is an alternative
+   - If one sentence has multiple annotator IDs, score will be computed for each annotator.
+   - If one of the multiple annotation alternatives is no edit at all, an edit with type 'noop' or with offsets '-1 -1' must be specified.
+   - The final score for the sentence will use the set of edits by an annotation set maximizing the score.
+
+
+Example:
+--------
+===> source_gold <===
+S The cat sat at mat .
+A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
+A 4 4|||ArtOrDet|||the||a|||REQUIRED|||-NONE-|||0
+
+S The dog .
+A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0
+A -1 -1|||noop|||-NONE-|||-NONE-|||-NONE-|||1
+
+S Giant otters is an apex predator .
+A 2 3|||SVA|||are|||REQUIRED|||-NONE-|||0
+A 3 4|||ArtOrDet|||-NONE-|||REQUIRED|||-NONE-|||0
+A 5 6|||NN|||predators|||REQUIRED|||-NONE-|||0
+A 1 2|||NN|||otter|||REQUIRED|||-NONE-|||1
+
+
+
+===> system <===
+A cat sat on the mat .
+The dog .
+Giant otters are apex predator .
+
+./m2scorer  system source_gold 
+Precision   : 0.8
+Recall      : 0.8
+F_0.5       : 0.8
+
+For sentence #1, the system makes two valid edits {(at-> on),
+(\epsilon -> the)} and one unnecessary edit (The -> A).
+
+For sentence #2, despite missing one gold edit (dog -> dogs) according
+to annotation set 0, the system misses nothing according to set 1.
+
+For sentence #3, according to annotation set 0, the system makes two
+valid edits {(is -> are), (an -> \epsilon)} and misses one edit
+(predator -> predators); however according to set 1, the system makes
+two unnecessary edits {(is -> are), (an -> \epsilon)} and misses one
+edit (otters -> otter).
+
+By the case above, there are four valid edits, one unnecessary edit,
+and one missing edit. Therefore precision is 4/5 = 0.8. Similarly for
+recall. In the above example, the beta value for the F-measure is 0.5
+(the default value).
+
+
+3. Converting the CoNLL-2014 data format
+========================================
+The data format used in the M^2 scorer differs from the format used in
+the CoNLL-2014 shared task (http://www.comp.nus.edu.sg/~nlp/conll14st.html)
+in two aspects:
+ - sentence-level edits
+ - token edit offsets
+
+To convert source files and gold edits from the CoNLL-2014 format into
+the M^2 format, run the preprocessing script bundled with the CoNLL-2014
+training data.
+
+
+4. Revisions
+============
+
+4.1 Alternative edits
+
+In this release, there is a major modification which enables scoring
+with multiple sets of gold edits. For every sentence, the system
+output will be scored against every available set of gold edits for
+the sentence, and the set of gold edits that maximizes the F score of
+the sentence is chosen.
+
+This modification was carried out by Christian Hadiwinoto, 2013.
+
+
+4.2 F-beta measure
+
+While the previous release always uses the F1 measure, i.e. beta =
+1.0, this release supports any value for beta. The default value for
+beta for this version is 0.5.
+
+This modification was carried out by Raymond Hendy Susanto, 2013.
+
+
+4.3 Handling of insertion edits
+
+Multiple insertion edits (starting and ending at the same offset) that
+match a gold edit were counted repeatedly, leading to erroneous and
+inflated scores. A fix has been made to handle this. The order of
+insertion edits, which was not handled in the previous version, is now
+enforced.
+
+This modification was jointly carried out by Raymond Hendy Susanto and
+Christian Hadiwinoto, 2014.
+
+
+4.4 Bug fix for scoring against multiple sets of gold edits, and
+dealing with sequences of insertion/deletion edits
+
+Fixed a bug in the M2 scorer arising from scoring against gold edits
+from multiple annotators. Specifically, the bug sometimes caused
+incorrect scores to be reported when scoring against the gold edits of
+subsequent annotators (other than the first annotator).
+
+Fixed a bug in the M2 scorer that caused erroneous scores to be
+reported when dealing with insertion edits followed by deletion edits
+(or vice versa).
+
+The above modifications were carried out by Christian Hadiwinoto,
+2014.
diff --git a/example/README b/example/README
@@ -0,0 +1,6 @@
+(execute these examples from the m2scorer top-level directory)
+
+
+./m2scorer example/system_output.txt example/source_gold
+
+
diff --git a/example/source_gold b/example/source_gold
@@ -0,0 +1,14 @@
+S The cat sat at mat .
+A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
+A 4 4|||ArtOrDet|||the||a|||REQUIRED|||-NONE-|||0
+
+S The dog .
+A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0
+A -1 -1|||noop|||-NONE-|||-NONE-|||-NONE-|||1
+
+S Giant otters is an apex predator .
+A 2 3|||SVA|||are|||REQUIRED|||-NONE-|||0
+A 3 4|||ArtOrDet|||-NONE-|||REQUIRED|||-NONE-|||0
+A 5 6|||NN|||predators|||REQUIRED|||-NONE-|||0
+A 1 2|||NN|||otter|||REQUIRED|||-NONE-|||1
+
diff --git a/example/system b/example/system
@@ -0,0 +1,3 @@
+A cat sat on the mat .
+The dog .
+Giant otters are apex predator .
diff --git a/example/system2 b/example/system2
@@ -0,0 +1,3 @@
+A cat sat on mat .
+The dog .
+Giant otters are apex predator .
diff --git a/m2scorer b/m2scorer
@@ -0,0 +1 @@
+./scripts/m2scorer.py