MMRB2 evaluation results
- README.md +4 -4
- all_results.json +0 -0
- all_results.jsonl +0 -0
- task_edit_mj1-checkpoint.json +0 -0
- task_interleaved_mj1-checkpoint.json +0 -0
- task_reasoning_mj1-checkpoint.json +0 -0
README.md
CHANGED

@@ -24,11 +24,11 @@ The MMRB2 benchmark evaluates reward models on multimodal understanding and inte
 | Task | Description | Count |
 |------|-------------|-------|
 | Text-to-Image (image) | Evaluating image generation quality | 1000 |
-| Image Editing (edit) | Evaluating edit instruction adherence |
-| Interleaved | Mixed text+image generation |
-| Visual Reasoning (reasoning) | Step-by-step reasoning quality |
+| Image Editing (edit) | Evaluating edit instruction adherence | 1000 |
+| Interleaved | Mixed text+image generation | 1000 |
+| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 1000 |
 
-**Total pairs evaluated:**
+**Total pairs evaluated:** 4000
 
 ## Files
 

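The updated README table fills in a count of 1000 for each of the four tasks and states a total of 4000 pairs. A minimal Python sketch that checks this arithmetic follows. The short task keys are labels chosen here for illustration (three come from the parenthesized names in the table, "interleaved" from the checkpoint filename), and `TASK_COUNTS` and `total_pairs` are hypothetical names, not part of the MMRB2 code.

```python
# Per-task pair counts as listed in the updated README table.
# Keys are illustrative labels, not identifiers from the MMRB2 codebase.
TASK_COUNTS: dict[str, int] = {
    "image": 1000,
    "edit": 1000,
    "interleaved": 1000,
    "reasoning": 1000,
}


def total_pairs(counts: dict[str, int]) -> int:
    """Return the total number of evaluated pairs summed over all tasks."""
    return sum(counts.values())


if __name__ == "__main__":
    # Should agree with the "Total pairs evaluated" line in the README.
    print(f"Total pairs evaluated: {total_pairs(TASK_COUNTS)}")
```

If a task's count changes in a later revision, updating the dictionary and rerunning the check would catch a stale total.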
all_results.json
CHANGED
The diff for this file is too large to render.

all_results.jsonl
CHANGED
The diff for this file is too large to render.

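Because the all_results.jsonl diff is not rendered, the file has to be inspected locally. JSON Lines stores one JSON value per line, so the standard library is enough to tally its records. The record schema is not shown in this commit; the sketch below assumes, hypothetically, that each record carries a `task` field, and `count_by_task` is an illustrative helper, not part of the benchmark's tooling.

```python
import json
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def count_by_task(lines: Iterable[str], key: str = "task") -> Counter:
    """Count JSONL records per task.

    `key` is a hypothetical field name; the real schema of
    all_results.jsonl is not shown in this commit.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    path = Path("all_results.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for task, n in sorted(count_by_task(fh).items()):
                print(task, n)
```

If the records do use a per-task field, the printed tallies could be compared against the per-task counts in the README table.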
task_edit_mj1-checkpoint.json
ADDED
The diff for this file is too large to render.

task_interleaved_mj1-checkpoint.json
ADDED
The diff for this file is too large to render.

task_reasoning_mj1-checkpoint.json
ADDED
The diff for this file is too large to render.