HY-Wan committed on
Commit b3988e8 · verified · 1 Parent(s): 73d328e

Update README.md

Files changed (1)
  1. README.md +4 -106
README.md CHANGED
@@ -21,110 +21,8 @@ pipeline_tag: image-to-video
 
  # VR-Bench: A Multimodal Video Reasoning Benchmark
 
- ## Dataset Description
+ ## 🧠 Models
 
- This is a multimodal dataset containing video demonstrations of game-playing scenarios across different game types including mazes, 3D mazes, Sokoban puzzles, and trap fields. The dataset is designed for training AI models on visual reasoning, planning, and sequential decision-making tasks.
-
- ## Dataset Structure
-
- The dataset is organized into three main directories:
-
- - `train_data/`: Training data with subdirectories for each game type and difficulty level
- - `test_data/`: Test data with the same structure as training data
- - `test_data_merge/`: Merged test data organized by game type (without difficulty separation)
-
- ### Game Types
-
- 1. **Maze**: Classic 2D maze navigation
- 2. **Irregular Maze**: Non-standard maze layouts
- 3. **Maze3D**: Three-dimensional maze navigation
- 4. **Sokoban**: Box-pushing puzzle game
- 5. **Trapfield**: Navigation with obstacles and traps
-
- ### Difficulty Levels
-
- Each game type has three difficulty levels:
- - `easy`: Simple layouts with shorter solution paths
- - `medium`: Moderate complexity
- - `hard`: Complex layouts requiring advanced planning
-
- ## File Format
-
- Each data sample consists of:
- - **Video file** (`.mp4`): Demonstration of gameplay
- - **Image file** (`.png`): Initial state screenshot
- - **JSON file** (`.json`): Game state metadata including:
-   - Grid layout and dimensions
-   - Entity positions (player, goal, boxes)
-   - Bounding box information
-   - Render parameters
-
- ### JSON Structure
-
- ```json
- {
-   "version": "1.0",
-   "game_type": "maze",
-   "entities": {
-     "player": {
-       "pixel_pos": {"x": 165, "y": 45},
-       "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
-       "grid_pos": {"row": 1, "col": 5}
-     },
-     "goal": {
-       "pixel_pos": {"x": 105, "y": 165},
-       "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
-       "grid_pos": {"row": 5, "col": 3}
-     }
-   },
-   "grid": {
-     "data": [[1,1,1,...], [1,0,0,...], ...],
-     "height": 7,
-     "width": 7
-   },
-   "render": {
-     "cell_size": 30,
-     "image_width": 210,
-     "image_height": 210
-   }
- }
- ```
-
- ### Metadata CSV
-
- Each subdirectory contains a `metadata.csv` file with columns:
- - `video`: Video filename
- - `prompt`: Associated text prompt (currently empty)
- - `input_image`: Initial state image filename
-
- ## Usage
-
- This dataset can be used for:
- - **Visual Planning**: Learning to plan sequences of actions from visual input
- - **Multimodal Learning**: Combining video, image, and structured data
- - **Reinforcement Learning**: Training agents on game environments
- - **Video Understanding**: Learning temporal patterns in sequential decision-making
-
- ## Dataset Statistics
-
- - **Total Games**: 5 game types
- - **Difficulty Levels**: 3 per game type
- - **Data Splits**: Training and test sets
- - **File Types**: Video (.mp4), Images (.png), Metadata (.json), Index (.csv)
-
- ## Citation
-
- If you use this dataset in your research, please cite:
-
- ```bibtex
- @dataset{vr_bench_2025,
-   title={VR-Bench: A Multimodal Video Reasoning Benchmark},
-   author={[Author Name]},
-   year={2025},
-   url={https://huggingface.co/datasets/[username]/VR-Bench}
- }
- ```
-
- ## License
-
- This dataset is released under the MIT License.
+ | Models | Download Links | Description |
+ |---------------------------|----------------|-----------------------------------------------------------------------------|
+ | MiniVeo3-Reasoner-Maze-5B | 🤗 [HuggingFace](https://huggingface.co/你的模型地址) | Fine-tuned LoRA for [Maze](https://example.com/maze-docs) tasks (3×3 to 6×6 sizes) from the base model [Wan2.2-TI2V-5B](https://huggingface.co/你的base模型地址) |
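
The JSON structure that this commit removes from the README stores three views of each entity's position: `grid_pos`, `bbox`, and `pixel_pos`. In the sample shown there, the values look consistent with a simple mapping: the bounding box starts at `col * cell_size`, `row * cell_size`, and the pixel position is the centre of that cell. The sketch below checks that reading against the sample values. The mapping is inferred from that single example and is not documented by the authors, and `cell_geometry` is a hypothetical helper name, not part of any VR-Bench tooling.

```python
import json

# Sample values copied from the JSON structure in the removed README section.
# The elided grid rows are omitted; only the fields used below are kept.
sample = json.loads("""
{
  "entities": {
    "player": {
      "pixel_pos": {"x": 165, "y": 45},
      "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
      "grid_pos": {"row": 1, "col": 5}
    },
    "goal": {
      "pixel_pos": {"x": 105, "y": 165},
      "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
      "grid_pos": {"row": 5, "col": 3}
    }
  },
  "render": {"cell_size": 30, "image_width": 210, "image_height": 210}
}
""")


def cell_geometry(grid_pos, cell_size):
    """Hypothetical helper: derive (bbox, pixel_pos) from a grid cell.

    Assumes square cells laid out from the top-left corner of the image,
    which is an inference from the sample, not a documented guarantee.
    """
    left = grid_pos["col"] * cell_size
    top = grid_pos["row"] * cell_size
    bbox = {"x": left, "y": top, "width": cell_size, "height": cell_size}
    pixel_pos = {"x": left + cell_size // 2, "y": top + cell_size // 2}
    return bbox, pixel_pos


# Compare the derived geometry with the values stored in the sample.
cell_size = sample["render"]["cell_size"]
for name, entity in sample["entities"].items():
    bbox, pixel_pos = cell_geometry(entity["grid_pos"], cell_size)
    assert bbox == entity["bbox"], name
    assert pixel_pos == entity["pixel_pos"], name
```

If the assumption holds for a given file, the same helper can be used to sanity-check metadata before training; a failed assertion would indicate a different rendering layout rather than corrupt data.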