For the first two benchmarks, we take the Spearman correlation between the model's output and the human ratings, averaged over all evaluation aspects, as the indicator.
For GenAI-Bench and VBench, which include human preference data among two or more videos,
we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
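The two indicators above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the repository's actual evaluation code: the helper names, the per-aspect score layout, and the `+1`/`-1` preference encoding are all assumptions made for the example, and the Spearman formula used is the no-ties variant.

```python
from itertools import combinations

def spearman(x, y):
    """Spearman rank correlation via the no-ties formula (pure Python)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def aspect_averaged_spearman(model_scores, human_scores):
    """Spearman correlation per evaluation aspect, averaged over aspects.

    Both arguments map aspect name -> list of per-video scores
    (a hypothetical layout, for illustration only).
    """
    rhos = [spearman(model_scores[a], human_scores[a]) for a in model_scores]
    return sum(rhos) / len(rhos)

def pairwise_accuracy(model_scores, human_prefs):
    """Fraction of video pairs whose model-score ordering matches the
    human preference (+1 if the first video of the pair is preferred,
    -1 otherwise; pairs are enumerated in index order)."""
    pairs = list(combinations(range(len(model_scores)), 2))
    correct = 0
    for (i, j), pref in zip(pairs, human_prefs):
        pred = 1 if model_scores[i] > model_scores[j] else -1
        correct += (pred == pref)
    return correct / len(pairs)
```

For a group of three videos there are three pairs — (0, 1), (0, 2), (1, 2) — so `human_prefs` supplies one `+1`/`-1` label per pair, and the accuracy is the fraction of pairs the model orders the same way as the annotators.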
| metric            | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
|-------------------|:---------------:|:--------------:|:-----------:|:-----------:|:------:|
| MantisScore (reg) | **278.3**       | 75.7           | **51.1**    | **78.5**    | **73.0** |
| MantisScore (gen) | 222.4           | **77.1**       | 27.6        | 59.0        | 58.7   |
| Gemini-1.5-Pro    | <u>158.8</u>    | 22.1           | 22.9        | 60.9        | 52.9   |
| Gemini-1.5-Flash  | 157.5           | 20.8           | 17.3        | <u>67.1</u> | 52.3   |
| GPT-4o            | 155.4           | <u>23.1</u>    | 28.7        | 52.0        | 51.7   |
| CLIP-sim          | 126.8           | 8.9            | <u>36.2</u> | 34.2        | 47.4   |
| DINO-sim          | 121.3           | 7.5            | 32.1        | 38.5        | 43.3   |
| SSIM-sim          | 118.0           | 13.4           | 26.9        | 34.1        | 43.5   |
| CLIP-Score        | 114.4           | -7.2           | 21.7        | 45.0        | 54.9   |
| LLaVA-1.5-7B      | 108.3           | 8.5            | 10.5        | 49.9        | 39.4   |
| LLaVA-1.6-7B      | 93.3            | -3.1           | 13.2        | 44.5        | 38.7   |
| X-CLIP-Score      | 92.9            | -1.9           | 13.3        | 41.4        | 40.1   |
| PIQE              | 78.3            | -10.1          | -1.2        | 34.5        | <u>55.1</u> |
| BRISQUE           | 75.9            | -20.3          | 3.9         | 38.5        | 53.7   |
| Idefics2          | 73.0            | 6.5            | 0.3         | 34.6        | 31.7   |
| SSIM-dyn          | 42.5            | -5.5           | -17.0       | 28.4        | 36.5   |
| MES-dyn           | 36.7            | -12.9          | -26.4       | 31.4        | 44.5   |

The best score in the MantisScore series is in **bold**; the best among the baselines is <u>underlined</u>.

## Usage

### Installation