MMRB2 evaluation results

Files changed (6)

README.md CHANGED

@@ -24,11 +24,11 @@ The MMRB2 benchmark evaluates reward models on multimodal understanding and inte
 | Task | Description | Count |
 |------|-------------|-------|
 | Text-to-Image (image) | Evaluating image generation quality | 1000 |
-| Image Editing (edit) | Evaluating edit instruction adherence | 0 |
-| Interleaved | Mixed text+image generation | 0 |
-| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 0 |
-**Total pairs evaluated:** 1000
+| Image Editing (edit) | Evaluating edit instruction adherence | 1000 |
+| Interleaved | Mixed text+image generation | 1000 |
+| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 1000 |
+**Total pairs evaluated:** 4000
 ## Files

all_results.json CHANGED

The diff for this file is too large to render.

all_results.jsonl CHANGED

The diff for this file is too large to render.

task_edit_mj1-checkpoint.json ADDED

The diff for this file is too large to render.

task_interleaved_mj1-checkpoint.json ADDED

The diff for this file is too large to render.

task_reasoning_mj1-checkpoint.json ADDED

The diff for this file is too large to render.
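
Since the result files are too large to render here, the counts in the updated README table can be cross-checked locally. The following is a minimal sketch, not part of this change. It assumes that each line of `all_results.jsonl` is a JSON object with a `task` field whose values are `image`, `edit`, `interleaved`, and `reasoning`. Both the field name and the values are guesses based on the README labels and the checkpoint file names, because the actual schema is not shown in this diff. The helper names `count_pairs_by_task` and `check_counts` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Per-task counts as stated in the updated README table.
# The keys are assumed task identifiers, not confirmed by the diff.
EXPECTED = {"image": 1000, "edit": 1000, "interleaved": 1000, "reasoning": 1000}


def count_pairs_by_task(jsonl_path, task_field="task"):
    """Count records per task in a JSONL file (one JSON object per line)."""
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Records without the assumed field are grouped under "<missing>".
            counts[record.get(task_field, "<missing>")] += 1
    return counts


def check_counts(counts, expected=EXPECTED):
    """Return (task, expected, actual) for every mismatch; empty means consistent."""
    return [
        (task, want, counts.get(task, 0))
        for task, want in expected.items()
        if counts.get(task, 0) != want
    ]


if __name__ == "__main__" and Path("all_results.jsonl").exists():
    counts = count_pairs_by_task("all_results.jsonl")
    print(dict(counts), "total:", sum(counts.values()))
    print("mismatches:", check_counts(counts))
```

If the file uses a different key for the task name, pass it as `task_field`. An empty mismatch list together with a total of 4000 would agree with the README figures.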