Bhavkumar21 committed on
Commit 88f55c2 · verified · 1 parent: 014a2d5

MMRB2 evaluation results

README.md CHANGED
@@ -24,11 +24,11 @@ The MMRB2 benchmark evaluates reward models on multimodal understanding and inte
 | Task | Description | Count |
 |------|-------------|-------|
 | Text-to-Image (image) | Evaluating image generation quality | 1000 |
-| Image Editing (edit) | Evaluating edit instruction adherence | 0 |
-| Interleaved | Mixed text+image generation | 0 |
-| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 0 |
+| Image Editing (edit) | Evaluating edit instruction adherence | 1000 |
+| Interleaved | Mixed text+image generation | 1000 |
+| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 1000 |
 
-**Total pairs evaluated:** 1000
+**Total pairs evaluated:** 4000
 
 ## Files
 
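
As a quick consistency check on the updated README table, the Python sketch below sums the per-task Count column and compares it with the reported "Total pairs evaluated" line (4 tasks × 1000 pairs = 4000). The helper names `task_counts` and `reported_total` are hypothetical and not part of the MMRB2 repository; the parser assumes the simple pipe-table layout shown in this README excerpt and would need adjusting for other table formats.

```python
import re

# README excerpt after this commit, copied from the diff.
README_EXCERPT = """\
| Task | Description | Count |
|------|-------------|-------|
| Text-to-Image (image) | Evaluating image generation quality | 1000 |
| Image Editing (edit) | Evaluating edit instruction adherence | 1000 |
| Interleaved | Mixed text+image generation | 1000 |
| Visual Reasoning (reasoning) | Step-by-step reasoning quality | 1000 |

**Total pairs evaluated:** 4000
"""


# Hypothetical helper (not from the MMRB2 repo): assumes a 3-column pipe table.
def task_counts(markdown: str) -> dict[str, int]:
    """Map each task name to its Count value, skipping header and separator rows."""
    counts = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Only data rows have exactly 3 cells with a numeric last cell.
        if len(cells) == 3 and cells[2].isdigit():
            counts[cells[0]] = int(cells[2])
    return counts


# Hypothetical helper: reads the bolded total line from the same excerpt.
def reported_total(markdown: str) -> int:
    """Extract the number after 'Total pairs evaluated:'."""
    match = re.search(r"Total pairs evaluated:\*\*\s*(\d+)", markdown)
    if match is None:
        raise ValueError("no 'Total pairs evaluated' line found")
    return int(match.group(1))


counts = task_counts(README_EXCERPT)
# The per-task counts should add up to the reported total.
assert sum(counts.values()) == reported_total(README_EXCERPT)
print(sum(counts.values()))
```

Running the same check against the pre-commit table (1000 + 0 + 0 + 0) also agrees with the old total of 1000, so both versions of the README are internally consistent.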
 
all_results.json CHANGED
The diff for this file is too large to render.

all_results.jsonl CHANGED
The diff for this file is too large to render.

task_edit_mj1-checkpoint.json ADDED
The diff for this file is too large to render.

task_interleaved_mj1-checkpoint.json ADDED
The diff for this file is too large to render.

task_reasoning_mj1-checkpoint.json ADDED
The diff for this file is too large to render.