# MJ-Bench Team: Align

## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!

<p align="center">
  <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>

---

## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)

- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)

Text-to-image models such as DALL-E 3 and Stable Diffusion are proliferating rapidly, yet they often suffer from hallucination, bias, and unsafe or low-quality output. To address these issues effectively, it is crucial to align these models with desired behaviors using feedback from a **multimodal judge**.
However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**

We evaluate a wide range of multimodal judges, including:

- 6 smaller-sized CLIP-based scoring models
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)

<p align="center">
  <img src="https://github.com/MJ-Bench/MJ-Bench.github.io/blob/main/static/images/dataset_overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**

You are welcome to submit your multimodal judge’s evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
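For reference, a common way to score a judge on pairwise preference data is preference accuracy: the fraction of pairs where the judge rates the human-preferred output strictly above the rejected one. Below is a minimal illustrative sketch; the field names `score_chosen` and `score_rejected` are hypothetical placeholders, not the dataset's actual schema.

```python
def preference_accuracy(pairs):
    """Fraction of preference pairs where the judge ranks the chosen
    output strictly above the rejected one.

    `pairs` is a list of dicts with hypothetical keys
    'score_chosen' and 'score_rejected' (judge scores as floats).
    """
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(p["score_chosen"] > p["score_rejected"] for p in pairs)
    return correct / len(pairs)

# Toy example with made-up judge scores:
toy = [
    {"score_chosen": 0.9, "score_rejected": 0.4},  # judge agrees with the label
    {"score_chosen": 0.2, "score_rejected": 0.7},  # judge disagrees
]
print(preference_accuracy(toy))  # 0.5
```

How the leaderboard aggregates scores across the four dimensions may differ; treat this only as a sanity check before submitting results.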