---

## Recent News

- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).
- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!

---

## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!
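
If you want to try the reward model locally, the checkpoints can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming `huggingface_hub` is installed; the exact repository id under the MJ-Bench organization is an assumption, so browse [https://huggingface.co/MJ-Bench](https://huggingface.co/MJ-Bench) for the real one:

```python
# Minimal sketch: fetch MJ-Video checkpoints from the Hugging Face Hub.
# NOTE: "MJ-Bench/MJ-Video" is an assumed repo id -- check
# https://huggingface.co/MJ-Bench for the actual model repositories.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="MJ-Bench/MJ-Video")
print(f"Checkpoints downloaded to {local_dir}")
```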

Text-to-image models like DALL·E 3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it’s crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.

<p align="center">
<img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**

We evaluate a wide range of multimodal judges, including:

- 6 smaller-sized CLIP-based scoring models
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
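
To make the evaluation protocol concrete, here is a minimal sketch of how any of these judges can be scored on pairwise preference data. Everything in it is illustrative: `judge_score` stands in for an arbitrary judge, and the field names are placeholders rather than MJ-Bench's actual schema:

```python
# Illustrative sketch of pairwise preference accuracy for a multimodal judge.
# `judge_score` is a placeholder for any judge (scoring model or VLM) that maps
# an (instruction, image) pair to a scalar score; field names are assumptions.
from typing import Callable

def preference_accuracy(samples: list[dict],
                        judge_score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the judge prefers the human-chosen image."""
    correct = 0
    for s in samples:
        score_chosen = judge_score(s["caption"], s["chosen_image"])
        score_rejected = judge_score(s["caption"], s["rejected_image"])
        correct += int(score_chosen > score_rejected)
    return correct / len(samples)
```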

🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
You are welcome to submit your multimodal judge’s evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
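
For reference, loading the preference data is a one-liner with the `datasets` library. The repo id below comes from the dataset link above, while the config and split names are assumptions; check the dataset card for the real ones:

```python
# Minimal sketch: load the MJ-Bench preference dataset from the Hugging Face Hub.
# "MJ-Bench/MJ-Bench" matches the dataset link above; the config name "alignment"
# and split "train" are assumptions -- verify them on the dataset card.
from datasets import load_dataset

dataset = load_dataset("MJ-Bench/MJ-Bench", name="alignment", split="train")
print(dataset[0])
```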