HY-Wan/Wan-R1

@@ -21,110 +21,8 @@ pipeline_tag: image-to-video
 # VR-Bench: A Multimodal Video Reasoning Benchmark
-## Dataset Description
-This is a multimodal dataset containing video demonstrations of game-playing scenarios across different game types including mazes, 3D mazes, Sokoban puzzles, and trap fields. The dataset is designed for training AI models on visual reasoning, planning, and sequential decision-making tasks.
-## Dataset Structure
-The dataset is organized into three main directories:
-- `train_data/`: Training data with subdirectories for each game type and difficulty level
-- `test_data/`: Test data with the same structure as training data
-- `test_data_merge/`: Merged test data organized by game type (without difficulty separation)
-### Game Types
-1. **Maze**: Classic 2D maze navigation
-2. **Irregular Maze**: Non-standard maze layouts
-3. **Maze3D**: Three-dimensional maze navigation
-4. **Sokoban**: Box-pushing puzzle game
-5. **Trapfield**: Navigation with obstacles and traps
-### Difficulty Levels
-Each game type has three difficulty levels:
-- `easy`: Simple layouts with shorter solution paths
-- `medium`: Moderate complexity
-- `hard`: Complex layouts requiring advanced planning
-## File Format
-Each data sample consists of:
-- **Video file** (`.mp4`): Demonstration of gameplay
-- **Image file** (`.png`): Initial state screenshot
-- **JSON file** (`.json`): Game state metadata including:
-  - Grid layout and dimensions
-  - Entity positions (player, goal, boxes)
-  - Bounding box information
-  - Render parameters
-### JSON Structure
-```json
-{
-  "version": "1.0",
-  "game_type": "maze",
-  "entities": {
-    "player": {
-      "pixel_pos": {"x": 165, "y": 45},
-      "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
-      "grid_pos": {"row": 1, "col": 5}
-    },
-    "goal": {
-      "pixel_pos": {"x": 105, "y": 165},
-      "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
-      "grid_pos": {"row": 5, "col": 3}
-    }
-  },
-  "grid": {
-    "data": [[1,1,1,...], [1,0,0,...], ...],
-    "height": 7,
-    "width": 7
-  },
-  "render": {
-    "cell_size": 30,
-    "image_width": 210,
-    "image_height": 210
-  }
-}
-```
-### Metadata CSV
-Each subdirectory contains a `metadata.csv` file with columns:
-- `video`: Video filename
-- `prompt`: Associated text prompt (currently empty)
-- `input_image`: Initial state image filename
-## Usage
-This dataset can be used for:
-- **Visual Planning**: Learning to plan sequences of actions from visual input
-- **Multimodal Learning**: Combining video, image, and structured data
-- **Reinforcement Learning**: Training agents on game environments
-- **Video Understanding**: Learning temporal patterns in sequential decision-making
-## Dataset Statistics
-- **Total Games**: 5 game types
-- **Difficulty Levels**: 3 per game type
-- **Data Splits**: Training and test sets
-- **File Types**: Video (.mp4), Images (.png), Metadata (.json), Index (.csv)
-## Citation
-If you use this dataset in your research, please cite:
-```bibtex
-@dataset{vr_bench_2025,
-  title={VR-Bench: A Multimodal Video Reasoning Benchmark},
-  author={[Author Name]},
-  year={2025},
-  url={https://huggingface.co/datasets/[username]/VR-Bench}
-}
-```
-## License
-This dataset is released under the MIT License.

 # VR-Bench: A Multimodal Video Reasoning Benchmark
+## 🧠 Models
+| Model | Download Link | Description |
+|-------|---------------|-------------|
+| MiniVeo3-Reasoner-Maze-5B | 🤗 [HuggingFace](https://huggingface.co/你的模型地址) | LoRA fine-tuned from the base model [Wan2.2-TI2V-5B](https://huggingface.co/你的base模型地址) for [Maze](https://example.com/maze-docs) tasks (grid sizes 3×3 to 6×6) |
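
The JSON sample in the removed "JSON Structure" section suggests a simple relation between grid and pixel coordinates: each entity's `bbox` looks like the square of its grid cell, and `pixel_pos` looks like the centre of that square. The README does not state this rule, so the Python sketch below treats it as an assumption and only checks it against the sample values. The helper names `cell_geometry` and `check_entities` are hypothetical, and the elided `grid.data` array is left out.

```python
import json

# Values copied from the sample in the removed "JSON Structure" section.
# The "grid" block is omitted because its "data" array is elided in the README.
SAMPLE = json.loads("""
{
  "entities": {
    "player": {
      "pixel_pos": {"x": 165, "y": 45},
      "bbox": {"x": 150, "y": 30, "width": 30, "height": 30},
      "grid_pos": {"row": 1, "col": 5}
    },
    "goal": {
      "pixel_pos": {"x": 105, "y": 165},
      "bbox": {"x": 90, "y": 150, "width": 30, "height": 30},
      "grid_pos": {"row": 5, "col": 3}
    }
  },
  "render": {"cell_size": 30, "image_width": 210, "image_height": 210}
}
""")


def cell_geometry(row, col, cell_size):
    """Assumed mapping (not stated in the README): the bbox is the grid
    cell's square and pixel_pos is the centre of that square."""
    x, y = col * cell_size, row * cell_size
    return {
        "pixel_pos": {"x": x + cell_size // 2, "y": y + cell_size // 2},
        "bbox": {"x": x, "y": y, "width": cell_size, "height": cell_size},
    }


def check_entities(meta):
    """Return names of entities whose stored geometry differs from the
    assumed grid-to-pixel mapping."""
    cell_size = meta["render"]["cell_size"]
    mismatches = []
    for name, entity in meta["entities"].items():
        grid = entity["grid_pos"]
        expected = cell_geometry(grid["row"], grid["col"], cell_size)
        if (entity["pixel_pos"] != expected["pixel_pos"]
                or entity["bbox"] != expected["bbox"]):
            mismatches.append(name)
    return mismatches


print(check_entities(SAMPLE))  # prints [] for the sample above
```

An empty list means the sample is consistent with the assumed mapping. It does not show that every file in the dataset follows the same rule.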