Zhaorun committed
Commit 83020f3 · verified · 1 parent: 8f971a0

Update README.md

Files changed (1): README.md (+8 −4)
README.md CHANGED
```diff
@@ -32,13 +32,16 @@ pinned: false
 ---
 
 ## Recent News
-- 🎉 **MJ-PreferGen** is **accepted by ICLR25**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu)
+- 🎉 **MJ-PreferGen** is **accepted by ICLR25**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).
 - 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
 
 ---
 
 ## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
 
+- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
+- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)
+
 We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!
 
 <p align="center">
@@ -54,6 +57,10 @@ We release **MJ-Bench-Video**, a comprehensive fine-grained video preference ben
 
 Text-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it’s crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.
 
+<p align="center">
+<img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
+</p>
+
 However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:
 
 1. **Alignment**
@@ -66,9 +73,6 @@ We evaluate a wide range of multimodal judges, including:
 - 11 open-source VLMs (e.g., the LLaVA family)
 - 4 closed-source VLMs (e.g., GPT-4, Claude 3)
 
-<p align="center">
-<img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
-</p>
 
 🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
 You are welcome to submit your multimodal judge’s evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
```