The research code in this directory implements reference aggregation, an efficiency method for minimum Bayes risk (MBR) decoding that uses aggregate reference representations for faster utility estimation.
We apply reference aggregation to two metrics: ChrF and COMET.
Unlike the mbr package, the code in this directory is purely research-oriented (i.e., it aims to reproduce the tables and figures in our paper) and is not optimized for usability.
Installation
Requires Python >= 3.9 and PyTorch.
pip install -r requirements.txt
Reproducing the experiments
Creating the samples
Warning: The following code downloads a large translation model from PyTorch Hub (if not already present) and generates 1024 samples per segment, which will take some time.
Samples will be stored in a JSON lines file in the directory samples/.
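As a quick sanity check, the samples file can be inspected with a few lines of Python. This is only a minimal sketch: the file name and the field names ("source" and "samples" below) are assumptions and may differ from what the sampling script actually writes.

```python
import json
from pathlib import Path

# Hypothetical file name and field names; adjust to the actual output of the sampling script.
samples_path = Path("samples") / "samples.en-de.jsonl"

with samples_path.open() as f:
    for line in f:
        segment = json.loads(line)
        print("Source:", segment["source"])              # assumed field name
        print("Num samples:", len(segment["samples"]))   # assumed field name
        break  # only look at the first segment
```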
Performing this analysis is computationally heavy because we run it for many different values of s (x-axis of Figure 1).
A single script runs N-by-N MBR, N-by-S MBR and reference aggregation for all values of s, so that the embedding part of COMET only needs to be run once.
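To illustrate the idea behind reference aggregation (this is a conceptual sketch, not the repository's implementation): for an embedding-based utility, the per-reference embeddings can be averaged into a single aggregate reference, so each candidate is scored once against the aggregate instead of once against every reference. The sketch below uses random vectors and cosine similarity as stand-ins for real COMET embeddings and scores.

```python
import torch

torch.manual_seed(0)

num_candidates, num_references, dim = 8, 1024, 4     # toy sizes
candidates = torch.randn(num_candidates, dim)        # stand-ins for candidate embeddings
references = torch.randn(num_references, dim)        # stand-ins for (pseudo-)reference embeddings

def utility(cand: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Toy utility: cosine similarity between candidate and reference embeddings.
    return torch.nn.functional.cosine_similarity(cand, ref, dim=-1)

# Pairwise MBR: score every candidate against every reference (quadratic number of utility calls).
pairwise_scores = utility(candidates.unsqueeze(1), references.unsqueeze(0)).mean(dim=1)

# Reference aggregation: average the reference embeddings once,
# then score every candidate against the aggregate (linear number of utility calls).
aggregate_reference = references.mean(dim=0, keepdim=True)
aggregate_scores = utility(candidates, aggregate_reference)

print("Pairwise best:", pairwise_scores.argmax().item())
print("Aggregate best:", aggregate_scores.argmax().item())
```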
The results are stored in a JSON lines file in the directory validation_output/. Each line describes the output for one method and one value of s.
In addition, the top translations will be stored in text files (one translation per line) in the translations/ directory, to allow for easy evaluation.
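Because the translations are plain text with one segment per line, they can be scored with standard tools. For example, using the sacrebleu Python API (the file names below are placeholders):

```python
import sacrebleu

# Placeholder file names; one translation / reference per line.
with open("translations/example_system.txt") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("references.txt") as f:
    references = [line.rstrip("\n") for line in f]

chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(chrf)
```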
The utility metric is one of "chrf", "cometinho", or "comet22".
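These names plausibly map to the scorers sketched below; the model identifiers ("Unbabel/eamt22-cometinho-da" for the distilled Cometinho model and "Unbabel/wmt22-comet-da" for COMET-22) are assumptions based on common usage and may differ from what the scripts load.

```python
import sacrebleu
from comet import download_model, load_from_checkpoint

def load_utility(name: str):
    """Return a scorer for the given utility metric name (assumed mapping)."""
    if name == "chrf":
        return sacrebleu.CHRF()
    if name == "cometinho":
        # Assumed model identifier for Cometinho.
        return load_from_checkpoint(download_model("Unbabel/eamt22-cometinho-da"))
    if name == "comet22":
        # Assumed model identifier for COMET-22.
        return load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    raise ValueError(f"Unknown utility metric: {name}")
```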
In the test results table, we compare the translation quality of beam search, epsilon sampling, standard (pairwise) MBR, and reference aggregation. We also experiment with aggregate-to-fine MBR.
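Aggregate-to-fine MBR first prunes the candidate list with the cheap aggregate utility and then runs standard pairwise MBR only on the surviving candidates. The following is a conceptual sketch with a toy utility over embedding vectors, not the repository's implementation:

```python
import torch

def aggregate_to_fine_mbr(candidates: torch.Tensor, references: torch.Tensor, top_k: int = 20) -> int:
    """Toy aggregate-to-fine MBR over embedding vectors; returns the index of the selected candidate."""
    cos = torch.nn.functional.cosine_similarity

    # Coarse step: score all candidates against the averaged reference embedding.
    aggregate = references.mean(dim=0, keepdim=True)
    coarse_scores = cos(candidates, aggregate, dim=-1)
    top_indices = coarse_scores.topk(min(top_k, len(candidates))).indices

    # Fine step: standard pairwise MBR restricted to the pruned candidates.
    pruned = candidates[top_indices]
    pairwise_scores = cos(pruned.unsqueeze(1), references.unsqueeze(0), dim=-1).mean(dim=1)
    return top_indices[pairwise_scores.argmax()].item()

# Toy usage with random embeddings standing in for real metric representations.
torch.manual_seed(0)
best = aggregate_to_fine_mbr(torch.randn(128, 4), torch.randn(128, 4))
print("Selected candidate index:", best)
```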
The following scripts create the translations and store them in the translations/ directory.