Model2Vec is evaluated on MTEB, as well as two additional tasks: [PEARL](https://github.com/tigerchen52/PEARL) (a phrase representation task) and WordSim (a collection of _word_ similarity tasks). The results are shown in the table below.

| Model                  | Avg (All) | Avg (MTEB) | Class | Clust | PairClass | Rank  | Ret   | STS   | Sum   | PEARL | WordSim |
|------------------------|-----------|------------|-------|-------|-----------|-------|-------|-------|-------|-------|---------|
| all-MiniLM-L6-v2       | 56.08     | 56.09      | 62.62 | 41.94 | 82.37     | 58.04 | 41.95 | 78.90 | 30.81 | 60.83 | 49.91   |
| M2V_base_glove_subword | 49.06     | 46.69      | 61.27 | 30.03 | 74.71     | 49.15 | 27.16 | 69.09 | 30.08 | 56.82 | 57.99   |
| M2V_base_glove         | 48.58     | 47.60      | 61.35 | 30.52 | 75.34     | 48.50 | 29.26 | 70.31 | 31.50 | 50.28 | 54.29   |
| M2V_base_output        | 46.79     | 45.34      | 61.25 | 25.58 | 74.90     | 47.63 | 26.14 | 68.58 | 29.20 | 54.02 | 49.18   |
| GloVe_300d             | 42.84     | 42.36      | 57.31 | 27.66 | 72.48     | 43.30 | 22.78 | 61.90 | 28.81 | 45.65 | 43.05   |
| WL256*                 | 48.88     | 49.36      | 58.98 | 33.34 | 74.00     | 52.03 | 33.12 | 73.34 | 29.05 | 48.81 | 45.16   |


<details>
<summary> Task Abbreviations </summary>
For readability, the MTEB task names are abbreviated as follows:
- Class: Classification
- Clust: Clustering
- PairClass: Pair Classification
- Rank: Reranking
- Ret: Retrieval
- STS: Semantic Textual Similarity
- Sum: Summarization
</details>
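
A minimal sketch of how one of these MTEB runs might be reproduced is shown below. It is not necessarily the exact harness used to produce the table: the checkpoint name, the single illustrative task, and the output folder are assumptions, and it relies on `StaticModel.encode` being accepted by MTEB as-is.

```python
import mteb
from model2vec import StaticModel

# Load a distilled static model from the Hugging Face Hub
# (any Model2Vec checkpoint loads the same way).
model = StaticModel.from_pretrained("minishlab/M2V_base_output")

# MTEB only needs an object exposing an `encode(sentences) -> embeddings`
# method, which StaticModel provides. A single task is used here for brevity;
# the full benchmark runs many tasks per category.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/M2V_base_output")
print(results)
```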

In addition to the MTEB evaluation, we evaluate Model2Vec on a number of classification datasets. These provide additional evidence that the models do not simply overfit to MTEB, and also serve as a speed benchmark, since encoding is the only model-dependent step. The results are shown in the table below.

| Model | Average | SST2 | IMDB | TREC | AG News |
|:-----------------------|:-------:|:------:|:-----:|:------:|:-------:|
| bge-base-en-v1.5 | 90.00 | 91.54 | 91.88 | 85.16 | 91.45 |
| all-MiniLM-L6-v2 | 84.10 | 83.95 | 81.36 | 81.31 | 89.77 |
| M2V_base_output | 82.23 | 80.92 | 84.56 | 75.27 | 88.17 |
| M2V_base_glove_subword | 81.95 | 82.84 | 85.96 | 70.51 | 88.49 |
| M2V_base_glove | 80.76 | 83.07 | 85.24 | 66.12 | 88.61 |
| WL256 | 78.48 | 76.88 | 80.12 | 69.23 | 87.68 |
| GloVe_300d | 77.77 | 81.68 | 84.00 | 55.67 | 89.71 |

As the table shows, the Model2Vec models outperform the GloVe and WL256 baselines on average, and are competitive with all-MiniLM-L6-v2, while being much faster.
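
The sketch below shows how such a frozen-embedding classification run might look: encode a dataset once with a static model, then fit a lightweight classifier on top. The dataset, checkpoint, and classifier choice here are illustrative assumptions, not the exact benchmark setup.

```python
import time

from datasets import load_dataset
from model2vec import StaticModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = StaticModel.from_pretrained("minishlab/M2V_base_output")
ds = load_dataset("sst2")

# Encoding is the only model-dependent step, so timing it is a
# reasonable proxy for the speed comparison.
start = time.perf_counter()
X_train = model.encode(ds["train"]["sentence"])
X_val = model.encode(ds["validation"]["sentence"])
print(f"Encoded in {time.perf_counter() - start:.2f}s")

# Fit a simple classifier on the frozen embeddings and score it.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, ds["train"]["label"])
preds = clf.predict(X_val)
print(f"Validation accuracy: {accuracy_score(ds['validation']['label'], preds):.4f}")
```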
