Update README.md

Q-Future · Oct 30, 2023 · 80043f2
1 parent d2eb4a1
commit 80043f2
Showing 1 changed file with 19 additions and 16 deletions.
diff --git a/leaderboards/README.md b/leaderboards/README.md
@@ -63,28 +63,31 @@ Results of [GPT-4V](https://chat.openai.com) and non-expert human:
 
 |**Participant Name** | yes-or-no | what | how | distortion | others | in-context distortion | in-context others | overall |
 | - | - | - | - | - | - | -| - | -| 
-| GPT-4V | 0.7792 | 0.7918 | **0.6268** | 0.7058 | **0.7303** | 0.7466 | **0.7795** | **0.7336** (+0.1142 to best open-source)  |
-| human-1 | **0.8248** | **0.7939** | 0.6029 | **0.7562** | 0.7208 | **0.7637** | 0.7300 | **0.7431** (+0.0095 to GPT-4V)  |
+| GPT-4V (Close-Source Model)   | 0.7792  | 0.7918  | 0.6268  | 0.7058  | 0.7303  | 0.7466  | 0.7795  | 0.7336  |
+| Junior-level Human            | 0.8248  | 0.7939  | 0.6029  | 0.7562  | 0.7208  | 0.7637  | 0.7300  | 0.7431  |
+| Senior-level Human            | 0.8431  | 0.8894  | 0.7202  | 0.7965  | 0.7947  | 0.8390  | 0.8707  | 0.8174  |
 
-GPT-4V is primarily human-level.
+GPT-4V is primarily a Junior-level Human.
 
 Results of Open-source models:
 
 |**Model Name** | yes-or-no | what | how | distortion | others | in-context distortion | in-context others | overall |
 | - | - | - | - | - | - | -| - | -| 
-| idefics | 0.6004 | 0.4642 | 0.4671 | 0.4038 | 0.5990 | 0.4726 | 0.6477 | 0.5151 |
-| instructblip_t5 | 0.6953 | 0.5900 | 0.5617 | 0.5731 | 0.6563 | 0.5651 | 0.7121 | **0.6194** (rank 1 on open-source) |
-| instructblip_vicuna | 0.7099 | 0.5141 | 0.4300 | 0.4500 | 0.6301 | 0.5719 | 0.6439 | 0.5585 |
-| kosmos_2 | 0.6058 | 0.3124 | 0.3539 | 0.3865 | 0.4654 | 0.4349 | 0.4735 | 0.4334 |
-| llama_adapter_v2 | 0.6661 | 0.5466 | 0.5165 | 0.5615 | 0.6181 | 0.5925 | 0.5455 | 0.5806 |
-| llava_v1.5 | 0.6734 | 0.6334 | 0.5412 | 0.5278 | 0.6802 | 0.5856 | 0.7338 | 0.6181 |
-| llava_v1 | 0.5712 | 0.5488 | 0.5185 | 0.4558 | 0.5800 | 0.5719 | 0.6477 | 0.5472 |
-| minigpt4_13b | 0.6077 | 0.5033 | 0.4300 | 0.4558 | 0.5251 | 0.5342 | 0.6098 | 0.5177 |
-| mplug_owl | 0.7245 | 0.5488 | 0.4753 | 0.4962 | 0.6301 | 0.6267 | 0.6667 | 0.5893 |
-| otter_v1 | 0.5766 | 0.3970 | 0.4259 | 0.4212 | 0.4893 | 0.4760 | 0.5417 | 0.4722 |
-| qwen_vl | 0.6533 | 0.6074 | 0.5844 | 0.5413 | 0.6635 | 0.5822 | 0.7300 | 0.6167 |
-| shikra | 0.6909 | 0.4793 | 0.4671 | 0.4731 | 0.6086 | 0.5308 | 0.6477 | 0.5532 |
-| visualglm | 0.6131 | 0.5358 | 0.4403 | 0.4856 | 0.5489 | 0.5548 | 0.5779 | 0.5331 |
+| random guess              | 0.5000  | 0.2848  | 0.3330  | 0.3724  | 0.3850  | 0.3913  | 0.3710  | 0.3794  |
+| LLaVA-v1.5 (Vicuna-v1.5-7B) | 0.6460  | 0.5922  | 0.5576  | 0.4798  | 0.6730  | 0.5890  | 0.7376  | 0.6007  |
+| LLaVA-v1.5 (Vicuna-v1.5-13B) | 0.6496  | 0.6486  | 0.5412  | 0.5355  | 0.6659  | 0.5890  | 0.7148  | 0.6140  |
+| InternLM-XComposer (InternLM) | 0.6843  | 0.6204  | 0.6193  | 0.5681  | 0.7041  | 0.5753  | 0.7719  | 0.6435  |
+| IDEFICS-Instruct (LLaMA-7B) | 0.6004  | 0.4642  | 0.4671  | 0.4038  | 0.5990  | 0.4726  | 0.6477  | 0.5151  |
+| Qwen-VL (QwenLM)           | 0.6533  | 0.6074  | 0.5844  | 0.5413  | 0.6635  | 0.5822  | 0.7300  | 0.6167  |
+| Shikra(Vicuna-7B)          | 0.6909  | 0.4793  | 0.4671  | 0.4731  | 0.6086  | 0.5308  | 0.6477  | 0.5532  |
+| Otter-v1 (MPT-7B)          | 0.5766  | 0.3970  | 0.4259  | 0.4212  | 0.4893  | 0.4760  | 0.5417  | 0.4722  |
+| InstructBLIP (Flan-T5-XL)  | 0.6953  | 0.5900  | 0.5617  | 0.5731  | 0.6551  | 0.5651  | 0.7121  | 0.6194  |
+| InstructBLIP (Vicuna-7B)   | 0.7099  | 0.5141  | 0.4300  | 0.4500  | 0.6301  | 0.5719  | 0.6439  | 0.5585  |
+| VisualGLM-6B (GLM-6B)      | 0.6131  | 0.5358  | 0.4403  | 0.4856  | 0.5489  | 0.5548  | 0.5779  | 0.5331  |
+| mPLUG-Owl (LLaMA-7B)       | 0.7245  | 0.5488  | 0.4753  | 0.4962  | 0.6301  | 0.6267  | 0.6667  | 0.5893  |
+| LLaMA-Adapter-V2           | 0.6618  | 0.5466  | 0.5165  | 0.5615  | 0.6181  | 0.5925  | 0.5455  | 0.5806  |
+| LLaVA-v1 (Vicuna-13B)      | 0.5712  | 0.5488  | 0.5185  | 0.4558  | 0.5800  | 0.5719  | 0.6477  | 0.5472  |
+| MiniGPT-4 (Vicuna-13B)     | 0.6077  | 0.5033  | 0.4300  | 0.4558  | 0.5251  | 0.5342  | 0.6098  | 0.5177  |
 
 
 ### (*Additional*) PPL-based Testing Pipeline