Commit

Update README.md
zzc-1998 authored Oct 30, 2023
1 parent d2eb4a1 commit 80043f2
Showing 1 changed file with 19 additions and 16 deletions.
35 changes: 19 additions & 16 deletions leaderboards/README.md
@@ -63,28 +63,31 @@ Results of [GPT-4V](https://chat.openai.com) and non-expert human:

|**Participant Name** | yes-or-no | what | how | distortion | others | in-context distortion | in-context others | overall |
| - | - | - | - | - | - | -| - | -|
| GPT-4V | 0.7792 | 0.7918 | **0.6268** | 0.7058 | **0.7303** | 0.7466 | **0.7795** | **0.7336** (+0.1142 to best open-source) |
| human-1 | **0.8248** | **0.7939** | 0.6029 | **0.7562** | 0.7208 | **0.7637** | 0.7300 | **0.7431** (+0.0095 to GPT-4V) |
| GPT-4V (Close-Source Model) | 0.7792 | 0.7918 | 0.6268 | 0.7058 | 0.7303 | 0.7466 | 0.7795 | 0.7336 |
| Junior-level Human | 0.8248 | 0.7939 | 0.6029 | 0.7562 | 0.7208 | 0.7637 | 0.7300 | 0.7431 |
| Senior-level Human | 0.8431 | 0.8894 | 0.7202 | 0.7965 | 0.7947 | 0.8390 | 0.8707 | 0.8174 |

GPT-4V is primarily human-level.
GPT-4V is primarily a Junior-level Human.

Results of Open-source models:

|**Model Name** | yes-or-no | what | how | distortion | others | in-context distortion | in-context others | overall |
| - | - | - | - | - | - | -| - | -|
| idefics | 0.6004 | 0.4642 | 0.4671 | 0.4038 | 0.5990 | 0.4726 | 0.6477 | 0.5151 |
| instructblip_t5 | 0.6953 | 0.5900 | 0.5617 | 0.5731 | 0.6563 | 0.5651 | 0.7121 | **0.6194** (rank 1 on open-source) |
| instructblip_vicuna | 0.7099 | 0.5141 | 0.4300 | 0.4500 | 0.6301 | 0.5719 | 0.6439 | 0.5585 |
| kosmos_2 | 0.6058 | 0.3124 | 0.3539 | 0.3865 | 0.4654 | 0.4349 | 0.4735 | 0.4334 |
| llama_adapter_v2 | 0.6661 | 0.5466 | 0.5165 | 0.5615 | 0.6181 | 0.5925 | 0.5455 | 0.5806 |
| llava_v1.5 | 0.6734 | 0.6334 | 0.5412 | 0.5278 | 0.6802 | 0.5856 | 0.7338 | 0.6181 |
| llava_v1 | 0.5712 | 0.5488 | 0.5185 | 0.4558 | 0.5800 | 0.5719 | 0.6477 | 0.5472 |
| minigpt4_13b | 0.6077 | 0.5033 | 0.4300 | 0.4558 | 0.5251 | 0.5342 | 0.6098 | 0.5177 |
| mplug_owl | 0.7245 | 0.5488 | 0.4753 | 0.4962 | 0.6301 | 0.6267 | 0.6667 | 0.5893 |
| otter_v1 | 0.5766 | 0.3970 | 0.4259 | 0.4212 | 0.4893 | 0.4760 | 0.5417 | 0.4722 |
| qwen_vl | 0.6533 | 0.6074 | 0.5844 | 0.5413 | 0.6635 | 0.5822 | 0.7300 | 0.6167 |
| shikra | 0.6909 | 0.4793 | 0.4671 | 0.4731 | 0.6086 | 0.5308 | 0.6477 | 0.5532 |
| visualglm | 0.6131 | 0.5358 | 0.4403 | 0.4856 | 0.5489 | 0.5548 | 0.5779 | 0.5331 |
| random guess | 0.5000 | 0.2848 | 0.3330 | 0.3724 | 0.3850 | 0.3913 | 0.3710 | 0.3794 |
| LLaVA-v1.5 (Vicuna-v1.5-7B) | 0.6460 | 0.5922 | 0.5576 | 0.4798 | 0.6730 | 0.5890 | 0.7376 | 0.6007 |
| LLaVA-v1.5 (Vicuna-v1.5-13B) | 0.6496 | 0.6486 | 0.5412 | 0.5355 | 0.6659 | 0.5890 | 0.7148 | 0.6140 |
| InternLM-XComposer (InternLM) | 0.6843 | 0.6204 | 0.6193 | 0.5681 | 0.7041 | 0.5753 | 0.7719 | 0.6435 |
| IDEFICS-Instruct (LLaMA-7B) | 0.6004 | 0.4642 | 0.4671 | 0.4038 | 0.5990 | 0.4726 | 0.6477 | 0.5151 |
| Qwen-VL (QwenLM) | 0.6533 | 0.6074 | 0.5844 | 0.5413 | 0.6635 | 0.5822 | 0.7300 | 0.6167 |
| Shikra(Vicuna-7B) | 0.6909 | 0.4793 | 0.4671 | 0.4731 | 0.6086 | 0.5308 | 0.6477 | 0.5532 |
| Otter-v1 (MPT-7B) | 0.5766 | 0.3970 | 0.4259 | 0.4212 | 0.4893 | 0.4760 | 0.5417 | 0.4722 |
| InstructBLIP (Flan-T5-XL) | 0.6953 | 0.5900 | 0.5617 | 0.5731 | 0.6551 | 0.5651 | 0.7121 | 0.6194 |
| InstructBLIP (Vicuna-7B) | 0.7099 | 0.5141 | 0.4300 | 0.4500 | 0.6301 | 0.5719 | 0.6439 | 0.5585 |
| VisualGLM-6B (GLM-6B) | 0.6131 | 0.5358 | 0.4403 | 0.4856 | 0.5489 | 0.5548 | 0.5779 | 0.5331 |
| mPLUG-Owl (LLaMA-7B) | 0.7245 | 0.5488 | 0.4753 | 0.4962 | 0.6301 | 0.6267 | 0.6667 | 0.5893 |
| LLaMA-Adapter-V2 | 0.6618 | 0.5466 | 0.5165 | 0.5615 | 0.6181 | 0.5925 | 0.5455 | 0.5806 |
| LLaVA-v1 (Vicuna-13B) | 0.5712 | 0.5488 | 0.5185 | 0.4558 | 0.5800 | 0.5719 | 0.6477 | 0.5472 |
| MiniGPT-4 (Vicuna-13B) | 0.6077 | 0.5033 | 0.4300 | 0.4558 | 0.5251 | 0.5342 | 0.6098 | 0.5177 |
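
The earlier rows annotated margins such as "+0.1142 to best open-source" and "+0.0095 to GPT-4V". As a minimal sketch (not part of this commit), the same margins can be recomputed from the updated `overall` column. The dictionary and function names below are hypothetical and chosen only for illustration; the values are copied from the tables above.

```python
# Overall accuracies copied from the updated tables above.
# The names OPEN_SOURCE_OVERALL, REFERENCE_OVERALL, best_entry and margin
# are illustrative assumptions, not identifiers from the repository.
OPEN_SOURCE_OVERALL = {
    "LLaVA-v1.5 (Vicuna-v1.5-7B)": 0.6007,
    "LLaVA-v1.5 (Vicuna-v1.5-13B)": 0.6140,
    "InternLM-XComposer (InternLM)": 0.6435,
    "IDEFICS-Instruct (LLaMA-7B)": 0.5151,
    "Qwen-VL (QwenLM)": 0.6167,
    "Shikra(Vicuna-7B)": 0.5532,
    "Otter-v1 (MPT-7B)": 0.4722,
    "InstructBLIP (Flan-T5-XL)": 0.6194,
    "InstructBLIP (Vicuna-7B)": 0.5585,
    "VisualGLM-6B (GLM-6B)": 0.5331,
    "mPLUG-Owl (LLaMA-7B)": 0.5893,
    "LLaMA-Adapter-V2": 0.5806,
    "LLaVA-v1 (Vicuna-13B)": 0.5472,
    "MiniGPT-4 (Vicuna-13B)": 0.5177,
}

REFERENCE_OVERALL = {
    "GPT-4V": 0.7336,
    "Junior-level Human": 0.7431,
    "Senior-level Human": 0.8174,
}


def best_entry(scores):
    """Return (name, score) for the highest overall accuracy."""
    name = max(scores, key=scores.get)
    return name, scores[name]


def margin(higher, lower):
    """Difference between two accuracies, rounded to the table's 4 decimals."""
    return round(higher - lower, 4)


if __name__ == "__main__":
    best_name, best_score = best_entry(OPEN_SOURCE_OVERALL)
    gpt4v = REFERENCE_OVERALL["GPT-4V"]
    print(f"Best open-source: {best_name} ({best_score:.4f})")
    print(f"GPT-4V margin over best open-source: +{margin(gpt4v, best_score):.4f}")
    for name in ("Junior-level Human", "Senior-level Human"):
        print(f"{name} margin over GPT-4V: +{margin(REFERENCE_OVERALL[name], gpt4v):.4f}")
```

With the updated rows, the best open-source entry changes from InstructBLIP (Flan-T5-XL) to InternLM-XComposer, so the GPT-4V margin shrinks relative to the figure in the earlier annotation.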


### (*Additional*) PPL-based Testing Pipeline
