Model2Vec is evaluated on MTEB, as well as two additional tasks: [PEARL](https://github.com/tigerchen52/PEARL) (a phrase representation task) and WordSim (a collection of _word_ similarity tasks). The results are shown in the table below.

| Model                  | Avg (All) | Avg (MTEB) | Class | Clust | PairClass | Rank  | Ret   | STS   | Sum   | PEARL | WordSim |
|------------------------|-----------|------------|-------|-------|-----------|-------|-------|-------|-------|-------|---------|
| all-MiniLM-L6-v2       | 56.08     | 56.09      | 62.62 | 41.94 | 82.37     | 58.04 | 41.95 | 78.90 | 30.81 | 60.83 | 49.91   |
| M2V_base_glove_subword | 49.06     | 46.69      | 61.27 | 30.03 | 74.71     | 49.15 | 27.16 | 69.09 | 30.08 | 56.82 | 57.99   |
| M2V_base_glove         | 48.58     | 47.60      | 61.35 | 30.52 | 75.34     | 48.50 | 29.26 | 70.31 | 31.50 | 50.28 | 54.29   |
| M2V_base_output        | 46.79     | 45.34      | 61.25 | 25.58 | 74.90     | 47.63 | 26.14 | 68.58 | 29.20 | 54.02 | 49.18   |
| GloVe_300d             | 42.84     | 42.36      | 57.31 | 27.66 | 72.48     | 43.30 | 22.78 | 61.90 | 28.81 | 45.65 | 43.05   |
| WL256*                 | 48.88     | 49.36      | 58.98 | 33.34 | 74.00     | 52.03 | 33.12 | 73.34 | 29.05 | 48.81 | 45.16   |


<details>
<summary> Task Abbreviations </summary>
For readability, the MTEB task names are abbreviated as follows:
- Class: Classification
- Clust: Clustering
- PairClass: Pair Classification
- Rank: Reranking
- Ret: Retrieval
- STS: Semantic Textual Similarity
- Sum: Summarization
</details>
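
A minimal sketch of how one of these MTEB runs might be reproduced is shown below. It is not necessarily the exact harness used to produce the table: the checkpoint name, the single illustrative task, and the output folder are assumptions, and it relies on `StaticModel.encode` being accepted by MTEB as-is.

```python
import mteb
from model2vec import StaticModel

# Load a distilled static model from the Hugging Face Hub
# (any Model2Vec checkpoint loads the same way).
model = StaticModel.from_pretrained("minishlab/M2V_base_output")

# MTEB only needs an object exposing an `encode(sentences) -> embeddings`
# method, which StaticModel provides. A single task is used here for brevity;
# the full benchmark runs many tasks per category.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/M2V_base_output")
print(results)
```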

In addition to the MTEB evaluation, we evaluate Model2Vec on a number of classification datasets. These provide additional evidence that the models do not simply overfit to MTEB, and also serve as a speed benchmark, since encoding is the only model-dependent step. The results are shown in the table below.

| Model | Average | SST2 | IMDB | TREC | AG News |
|:-----------------------|:-------:|:------:|:-----:|:------:|:-------:|
| bge-base-en-v1.5 | 90.00 | 91.54 | 91.88 | 85.16 | 91.45 |
| all-MiniLM-L6-v2 | 84.10 | 83.95 | 81.36 | 81.31 | 89.77 |
| M2V_base_output | 82.23 | 80.92 | 84.56 | 75.27 | 88.17 |
| M2V_base_glove_subword | 81.95 | 82.84 | 85.96 | 70.51 | 88.49 |
| M2V_base_glove | 80.76 | 83.07 | 85.24 | 66.12 | 88.61 |
| WL256 | 78.48 | 76.88 | 80.12 | 69.23 | 87.68 |
| GloVe_300d | 77.77 | 81.68 | 84.00 | 55.67 | 89.71 |

As the table shows, the Model2Vec models outperform the GloVe and WL256 baselines on average, and are competitive with all-MiniLM-L6-v2, while being much faster.
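
The sketch below shows how such a frozen-embedding classification run might look: encode a dataset once with a static model, then fit a lightweight classifier on top. The dataset, checkpoint, and classifier choice here are illustrative assumptions, not the exact benchmark setup.

```python
import time

from datasets import load_dataset
from model2vec import StaticModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = StaticModel.from_pretrained("minishlab/M2V_base_output")
ds = load_dataset("sst2")

# Encoding is the only model-dependent step, so timing it is a
# reasonable proxy for the speed comparison.
start = time.perf_counter()
X_train = model.encode(ds["train"]["sentence"])
X_val = model.encode(ds["validation"]["sentence"])
print(f"Encoded in {time.perf_counter() - start:.2f}s")

# Fit a simple classifier on the frozen embeddings and score it.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, ds["train"]["label"])
preds = clf.predict(X_val)
print(f"Validation accuracy: {accuracy_score(ds['validation']['label'], preds):.4f}")
```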
