create read-only mc eval app
- .gitattributes +2 -0
- README.md +9 -7
- arena/.DS_Store +0 -0
- arena/README.md +100 -0
- arena/__init__.py +1 -0
- arena/actions.py +211 -0
- arena/app.py +437 -0
- arena/build_manifest.py +248 -0
- arena/dataset.py +125 -0
- arena/dataset_notes.md +72 -0
- arena/manifest.json +0 -0
- arena/result_logger.py +18 -0
- arena/results/.gitkeep +1 -0
- arena/results/annotations.jsonl +1 -0
- data_subset/.DS_Store +0 -0
- data_subset/1_wasd_only/01.jpg +3 -0
- data_subset/1_wasd_only/01.mp4 +3 -0
- data_subset/1_wasd_only/01_action.npy +3 -0
- data_subset/1_wasd_only/01_wangame.mp4 +3 -0
- data_subset/1_wasd_only/02.jpg +3 -0
- data_subset/1_wasd_only/02.mp4 +3 -0
- data_subset/1_wasd_only/02_action.npy +3 -0
- data_subset/1_wasd_only/02_wangame.mp4 +3 -0
- data_subset/1_wasd_only/03.jpg +3 -0
- data_subset/1_wasd_only/03.mp4 +3 -0
- data_subset/1_wasd_only/03_action.npy +3 -0
- data_subset/1_wasd_only/03_wangame.mp4 +3 -0
- data_subset/1_wasd_only/04.jpg +3 -0
- data_subset/1_wasd_only/04.mp4 +3 -0
- data_subset/1_wasd_only/04_action.npy +3 -0
- data_subset/1_wasd_only/04_wangame.mp4 +3 -0
- data_subset/1_wasd_only/05.jpg +3 -0
- data_subset/1_wasd_only/05.mp4 +3 -0
- data_subset/1_wasd_only/05_action.npy +3 -0
- data_subset/1_wasd_only/05_wangame.mp4 +3 -0
- data_subset/1_wasd_only/06.jpg +3 -0
- data_subset/1_wasd_only/06.mp4 +3 -0
- data_subset/1_wasd_only/06_action.npy +3 -0
- data_subset/1_wasd_only/06_wangame.mp4 +3 -0
- data_subset/1_wasd_only/07.jpg +3 -0
- data_subset/1_wasd_only/07.mp4 +3 -0
- data_subset/1_wasd_only/07_action.npy +3 -0
- data_subset/1_wasd_only/07_wangame.mp4 +3 -0
- data_subset/1_wasd_only/08.jpg +3 -0
- data_subset/1_wasd_only/08.mp4 +3 -0
- data_subset/1_wasd_only/08_action.npy +3 -0
- data_subset/1_wasd_only/08_wangame.mp4 +3 -0
- data_subset/1_wasd_only/09.jpg +3 -0
- data_subset/1_wasd_only/09.mp4 +3 -0
- data_subset/1_wasd_only/09_action.npy +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,12 +1,14 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: Minecraft LM-Arena Baseline
+emoji: 🎮
+colorFrom: blue
+colorTo: green
 sdk: gradio
 sdk_version: 6.9.0
-
-
+python_version: 3.10
+app_file: arena/app.py
+fullWidth: true
 ---

-
+Minecraft LM-Arena baseline for paired Minecraft video
+evaluation.
arena/.DS_Store
ADDED
Binary file (6.15 kB)
arena/README.md
ADDED
@@ -0,0 +1,100 @@
# Minecraft LM-Arena Baseline

This app is a small local Gradio baseline for reviewing paired Minecraft videos from `data_subset/`.
It follows the current dataset shape first and does not add physics or causality tags yet.

## Inferred dataset format

- Each scenario folder in `data_subset/` contains 10 cases: `01` through `10`.
- Each case is paired by exact case id inside one scenario folder.
- The pairing used here is:
  - left: `{case_id}.mp4`
  - right: `{case_id}_wangame.mp4`
  - actions: `{case_id}_action.npy`
  - preview still: `{case_id}.jpg`
- `ptlflow/run_all_eval.py` and `ptlflow/visualize_results.py` both treat `{id}.mp4` as the reference / GT video and `{id}_wangame.mp4` as the generated WanGame output. The app follows that same convention.

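The pairing rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not the code in `build_manifest.py` (which is not shown here); the helper name `find_pairs`, the returned dictionary keys, and the choice to skip incomplete cases are all assumptions.

```python
from pathlib import Path


def find_pairs(scenario_dir: Path) -> list[dict[str, Path]]:
    """Pair files by exact case id inside one scenario folder (hypothetical helper)."""
    suffix = "_wangame.mp4"
    pairs = []
    for generated in sorted(scenario_dir.glob(f"*{suffix}")):
        case_id = generated.name[: -len(suffix)]
        reference = scenario_dir / f"{case_id}.mp4"
        actions = scenario_dir / f"{case_id}_action.npy"
        # Assumption: skip incomplete cases rather than guessing at a partner file.
        if not (reference.exists() and actions.exists()):
            continue
        pairs.append(
            {
                "left": reference,    # reference / GT video
                "right": generated,   # WanGame output
                "actions": actions,
                "preview": scenario_dir / f"{case_id}.jpg",  # may be missing
            }
        )
    return pairs
```

Calling `find_pairs(Path("data_subset/1_wasd_only"))` would return one dictionary per complete case in that scenario folder.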
## App behavior

- Loads one paired sample at a time from `arena/manifest.json`.
- Shows reference video on the left and WanGame output on the right.
- Displays a formatted action summary derived from `*_action.npy`.
- Collects three votes:
  - action following
  - visual quality
  - temporal consistency
- Each vote is `Left better`, `Right better`, or `Tie / unsure`.
- Includes a `Tie all / unsure` shortcut.
- Includes a manual `Flag artifact` flow:
  - pause the player
  - read the native video timestamp
  - type seconds into the artifact field
  - click `Flag artifact`
- Saves annotations to `arena/results/annotations.jsonl`.

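For readers unfamiliar with JSONL logging, the save step amounts to appending one JSON object per line. The sketch below is a hedged illustration: `result_logger.py` is not shown here, so the helper names `append_jsonl` and `read_jsonl` and the example record are hypothetical, and the real schema may carry more fields.

```python
import json
from pathlib import Path
from typing import Any


def append_jsonl(results_dir: Path, record: dict[str, Any]) -> Path:
    """Append one record as a single JSON line (hypothetical helper)."""
    results_dir.mkdir(parents=True, exist_ok=True)
    output_path = results_dir / "annotations.jsonl"
    with output_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return output_path


def read_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read every non-empty line back as a dictionary."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Example record; the field names here are illustrative only.
example_record = {
    "sample_id": "1_wasd_only/01",
    "votes": {
        "action_following": "Left better",
        "visual_quality": "Tie / unsure",
        "temporal_consistency": "Right better",
    },
}
```

Appending keeps earlier annotations intact, so the same sample can be annotated more than once and deduplicated later during analysis.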
## Files

- `app.py`: Gradio UI
- `build_manifest.py`: dataset scanner and manifest writer
- `dataset.py`: manifest loading and path resolution
- `actions.py`: action parsing and formatting
- `result_logger.py`: JSONL logging

## How to run

Install the minimal dependencies in your Python environment:

```bash
python -m pip install gradio numpy
```

Build or rebuild the manifest:

```bash
python arena/build_manifest.py
```

Run the app:

```bash
python arena/app.py
```

Optional flags:

```bash
python arena/app.py --rebuild-manifest --port 7861
```

Read-only mode for public demos:

```bash
python arena/app.py --disable-writes
```

Or with an environment variable:

```bash
ARENA_DISABLE_WRITES=1 python arena/app.py
```

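One plausible way to combine the `--disable-writes` flag with the `ARENA_DISABLE_WRITES` variable is sketched below. The part of `app.py` that parses them is not shown here, so this is an assumption rather than the app's actual logic: the function name `writes_enabled` is hypothetical, and treating `1`, `true`, and `yes` as "disabled" is a guess.

```python
import argparse


def writes_enabled(argv: list[str], env: dict[str, str]) -> bool:
    """Return False when either the flag or the environment variable disables writes."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--disable-writes", action="store_true")
    # parse_known_args ignores other options such as --port or --rebuild-manifest.
    args, _unknown = parser.parse_known_args(argv)
    env_value = env.get("ARENA_DISABLE_WRITES", "").strip().lower()
    # Assumption: these values count as "disable"; anything else leaves writes on.
    env_disabled = env_value in {"1", "true", "yes"}
    return not (args.disable_writes or env_disabled)
```

In a real entry point this would be called as `writes_enabled(sys.argv[1:], dict(os.environ))`. Passing the arguments and environment in explicitly keeps the function easy to test.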
## Limitations and ambiguities

- The current dataset naturally supports a fixed reference-vs-WanGame A/B pair, not a blinded model-vs-model arena.
- `.jpg` files look like aligned preview stills, but none of the relevant `ptlflow` evaluation scripts consume them. The app surfaces them only as metadata.
- `*_action.npy` contains `keyboard` `(T, 6)` and `mouse` `(T, 2)` arrays. The keyboard order is inferred from `ptlflow/action_flow_score.py` as `[W, S, A, D, left, right]`, and mouse order as `[pitch, yaw]`.
- In this subset, the `left` and `right` keyboard channels exist in the format but appear unused.
- Gradio’s stock video components do not provide a reliable cross-player live timestamp callback, so artifact flagging uses a documented manual timestamp fallback.
- The two video players are independent and not synchronized.
- If you deploy to Hugging Face Spaces, free storage is ephemeral. Local JSONL annotations are fine for local runs, but not a durable collection backend for a public deployment.

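To make the `*_action.npy` layout concrete, the sketch below writes a tiny synthetic file in the same shape and reads it back. The values are made up for illustration, and the `0.5` threshold for "key pressed" is an assumption chosen to match how `arena/actions.py` thresholds keyboard channels.

```python
import tempfile
from pathlib import Path

import numpy as np

KEY_NAMES = ["W", "S", "A", "D", "left", "right"]  # inferred order, see above

n_steps = 4
keyboard = np.zeros((n_steps, 6), dtype=np.float32)
keyboard[:2, 0] = 1.0  # hold W for the first two steps
mouse = np.zeros((n_steps, 2), dtype=np.float32)
mouse[2:, 1] = 0.5  # yaw (second column) for the last two steps

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "01_action.npy"
    # Saving a dict stores a 0-d object array, so loading needs allow_pickle=True.
    np.save(path, {"keyboard": keyboard, "mouse": mouse}, allow_pickle=True)
    payload = np.load(path, allow_pickle=True).item()

# Channels that are pressed at least once in the clip.
used_keys = [KEY_NAMES[i] for i in range(len(KEY_NAMES)) if np.any(payload["keyboard"][:, i] > 0.5)]
print(used_keys)
```

Because the payload is a pickled dictionary, only load action files from sources you trust: `allow_pickle=True` can execute arbitrary code on untrusted input.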
## If physics / causality tags are added later

- Extend the JSONL schema in `result_logger.py` with new tag fields.
- Add new controls in `app.py`; the manifest format does not need to change for simple extra labels.
- If the future setup compares multiple generated videos instead of reference vs generated, change the manifest schema first so samples can carry arbitrary candidate lists instead of the current fixed left/right pair.

## Spaces note

- `app.py` reads `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT`, so it is safe to run on Hugging Face Spaces.
- If you want the published app to be review-only for now, set `ARENA_DISABLE_WRITES=1`.
arena/__init__.py
ADDED
@@ -0,0 +1 @@
"""Baseline human-eval app for Minecraft video comparisons."""
arena/actions.py
ADDED
@@ -0,0 +1,211 @@
from __future__ import annotations

from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any

import numpy as np


KEY_NAMES = ["W", "S", "A", "D", "left", "right"]


@dataclass(frozen=True)
class ActionSegment:
    start_frame: int
    end_frame: int
    label: str


@dataclass(frozen=True)
class ActionSummary:
    n_frames: int
    fps: float | None
    duration_s: float | None
    used_keys: list[str]
    mouse_pitch_values: list[float]
    mouse_yaw_values: list[float]
    control_mode: str
    segments: list[ActionSegment]
    markdown: str


def load_action_file(action_path: str | Path) -> tuple[np.ndarray, np.ndarray]:
    payload = np.load(Path(action_path), allow_pickle=True).item()
    if not isinstance(payload, dict):
        raise ValueError(f"Expected dict payload in {action_path}, found {type(payload)!r}")
    if "keyboard" not in payload or "mouse" not in payload:
        raise ValueError(f"Missing keyboard/mouse arrays in {action_path}")

    keyboard = np.asarray(payload["keyboard"], dtype=np.float32)
    mouse = np.asarray(payload["mouse"], dtype=np.float32)
    if keyboard.ndim != 2 or keyboard.shape[1] != len(KEY_NAMES):
        raise ValueError(f"Unexpected keyboard shape for {action_path}: {keyboard.shape}")
    if mouse.ndim != 2 or mouse.shape[1] < 2:
        raise ValueError(f"Unexpected mouse shape for {action_path}: {mouse.shape}")
    if keyboard.shape[0] != mouse.shape[0]:
        raise ValueError(
            f"Keyboard/mouse length mismatch for {action_path}: "
            f"{keyboard.shape[0]} vs {mouse.shape[0]}"
        )

    return keyboard, mouse[:, :2]


def build_action_summary(
    action_path: str | Path,
    fps: float | None = None,
    max_segments: int = 18,
) -> ActionSummary:
    keyboard, mouse = load_action_file(action_path)
    n_frames = int(keyboard.shape[0])
    duration_s = (n_frames / fps) if fps else None

    used_keys = [KEY_NAMES[i] for i in range(len(KEY_NAMES)) if np.any(keyboard[:, i] > 0.5)]
    mouse_pitch_values = _rounded_unique(mouse[:, 0])
    mouse_yaw_values = _rounded_unique(mouse[:, 1])
    control_mode = _infer_control_mode(keyboard, mouse)
    segments = _collapse_segments(keyboard, mouse)
    markdown = _format_markdown(
        n_frames=n_frames,
        fps=fps,
        duration_s=duration_s,
        used_keys=used_keys,
        mouse_pitch_values=mouse_pitch_values,
        mouse_yaw_values=mouse_yaw_values,
        control_mode=control_mode,
        segments=segments,
        max_segments=max_segments,
    )

    return ActionSummary(
        n_frames=n_frames,
        fps=fps,
        duration_s=duration_s,
        used_keys=used_keys,
        mouse_pitch_values=mouse_pitch_values,
        mouse_yaw_values=mouse_yaw_values,
        control_mode=control_mode,
        segments=segments,
        markdown=markdown,
    )


def summary_to_manifest_dict(summary: ActionSummary) -> dict[str, Any]:
    return {
        "n_frames": summary.n_frames,
        "fps": summary.fps,
        "duration_s": summary.duration_s,
        "used_keys": summary.used_keys,
        "mouse_pitch_values": summary.mouse_pitch_values,
        "mouse_yaw_values": summary.mouse_yaw_values,
        "control_mode": summary.control_mode,
        "segments": [asdict(segment) for segment in summary.segments],
        "markdown": summary.markdown,
    }


def _rounded_unique(values: np.ndarray) -> list[float]:
    rounded = {round(float(value), 3) for value in values.tolist()}
    return sorted(rounded)


def _infer_control_mode(keyboard: np.ndarray, mouse: np.ndarray) -> str:
    has_keyboard = bool(np.any(keyboard > 0.5))
    has_mouse = bool(np.any(np.abs(mouse) > 1e-6))
    if has_keyboard and has_mouse:
        return "keyboard + camera"
    if has_keyboard:
        return "keyboard-only"
    if has_mouse:
        return "camera-only"
    return "idle / unclear"


def _collapse_segments(keyboard: np.ndarray, mouse: np.ndarray) -> list[ActionSegment]:
    if keyboard.shape[0] == 0:
        return []

    labels = [_describe_step(keyboard[idx], mouse[idx]) for idx in range(keyboard.shape[0])]
    segments: list[ActionSegment] = []
    start = 0
    current = labels[0]
    for idx in range(1, len(labels)):
        if labels[idx] != current:
            segments.append(ActionSegment(start_frame=start, end_frame=idx - 1, label=current))
            start = idx
            current = labels[idx]
    segments.append(ActionSegment(start_frame=start, end_frame=len(labels) - 1, label=current))
    return segments


def _describe_step(keyboard_row: np.ndarray, mouse_row: np.ndarray) -> str:
    pressed_keys = [KEY_NAMES[idx] for idx, value in enumerate(keyboard_row) if value > 0.5]
    pitch = float(mouse_row[0]) if len(mouse_row) >= 1 else 0.0
    yaw = float(mouse_row[1]) if len(mouse_row) >= 2 else 0.0
    has_mouse = abs(pitch) > 1e-6 or abs(yaw) > 1e-6

    key_label = "+".join(pressed_keys) if pressed_keys else ""
    mouse_label = ""
    if has_mouse:
        mouse_label = f"mouse(pitch={pitch:+.1f}, yaw={yaw:+.1f})"

    if key_label and mouse_label:
        return f"{key_label} + {mouse_label}"
    if key_label:
        return key_label
    if mouse_label:
        return mouse_label
    return "idle"


def _format_markdown(
    n_frames: int,
    fps: float | None,
    duration_s: float | None,
    used_keys: list[str],
    mouse_pitch_values: list[float],
    mouse_yaw_values: list[float],
    control_mode: str,
    segments: list[ActionSegment],
    max_segments: int,
) -> str:
    timing_bits = [f"{n_frames} action steps"]
    if fps:
        timing_bits.append(f"{fps:.2f} FPS")
    if duration_s is not None:
        timing_bits.append(f"~{duration_s:.2f}s")

    lines = [
        f"**Action summary:** {' | '.join(timing_bits)}",
        f"**Inferred control mode:** {control_mode}",
        f"**Keys used:** {', '.join(used_keys) if used_keys else 'none'}",
        (
            "**Mouse values:** "
            f"pitch={_format_values(mouse_pitch_values)} | "
            f"yaw={_format_values(mouse_yaw_values)}"
        ),
        "",
        "**Timeline**",
    ]

    for segment in segments[:max_segments]:
        if fps:
            start_s = segment.start_frame / fps
            end_s = (segment.end_frame + 1) / fps
            prefix = f"`{start_s:.2f}s-{end_s:.2f}s`"
        else:
            prefix = f"`frames {segment.start_frame}-{segment.end_frame}`"
        lines.append(f"- {prefix}: {segment.label}")

    remaining = len(segments) - max_segments
    if remaining > 0:
        lines.append(f"- ... {remaining} more segments omitted for readability")

    return "\n".join(lines)


def _format_values(values: list[float]) -> str:
    if not values:
        return "[]"
    return "[" + ", ".join(f"{value:+.1f}" for value in values) + "]"
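The timeline in `arena/actions.py` is built by run-length encoding per-frame labels into segments with inclusive frame ranges. The same idea can be sketched with `itertools.groupby`. This is an illustrative alternative rather than the code in this commit: `collapse_labels` is a hypothetical name, and it returns plain tuples instead of `ActionSegment` objects.

```python
from itertools import groupby


def collapse_labels(labels: list[str]) -> list[tuple[int, int, str]]:
    """Run-length encode labels into (start_frame, end_frame, label), both ends inclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, label))
        start += length
    return segments


def segment_seconds(start_frame: int, end_frame: int, fps: float) -> tuple[float, float]:
    """Convert an inclusive frame range to a half-open time span in seconds."""
    return start_frame / fps, (end_frame + 1) / fps
```

The `+ 1` in `segment_seconds` mirrors the timeline formatting in `_format_markdown`: a segment that ends on frame `k` lasts until the start of frame `k + 1`.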
arena/app.py
ADDED
@@ -0,0 +1,437 @@
from __future__ import annotations

import argparse
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

import gradio as gr

try:
    from .dataset import DatasetManifest, Sample, ensure_manifest, load_manifest
    from .result_logger import append_annotation
except ImportError:
    from dataset import DatasetManifest, Sample, ensure_manifest, load_manifest
    from result_logger import append_annotation


VOTE_CHOICES = ["Left better", "Tie / unsure", "Right better"]
FLAG_HELP = (
    "No artifact flags recorded yet. Pause a player, read the native timestamp, "
    "type it below, and click `Flag artifact`."
)


def build_app(
    manifest: DatasetManifest,
    results_dir: Path,
    writes_enabled: bool = True,
) -> gr.Blocks:
    if not manifest.samples:
        raise ValueError("Manifest contains no samples.")

    first_sample = manifest.samples[0]
    first_title = _sample_title(first_sample, 0, len(manifest.samples))
    first_metadata = _sample_metadata(first_sample)
    first_status = _status_message(
        f"Loaded `{first_sample.sample_id}`. Save an annotation, then move to the next sample."
    )

    with gr.Blocks(title="Minecraft LM-Arena Baseline") as demo:
        current_index = gr.State(0)
        artifact_flags = gr.State([])

        gr.Markdown("# Minecraft LM-Arena Baseline")
        gr.Markdown(
            "Left is the reference `.mp4`; right is the paired WanGame `_wangame.mp4` output. "
            "Players are independent. The artifact button uses a manual timestamp fallback because "
            "plain Gradio does not reliably expose live `currentTime` from both video widgets."
        )
        if not writes_enabled:
            gr.Markdown(
                "**Read-only mode:** annotation writes are disabled. "
                "Use this for public demo review until the final eval schema is settled."
            )

        sample_title = gr.Markdown(first_title)
        sample_metadata = gr.Markdown(first_metadata)

        with gr.Row():
            left_video = gr.Video(
                value=str(first_sample.reference_video),
                label=first_sample.left_label,
            )
            right_video = gr.Video(
                value=str(first_sample.generated_video),
                label=first_sample.right_label,
            )

        action_markdown = gr.Markdown(first_sample.action_markdown)

        with gr.Row():
            action_following = gr.Radio(
                choices=VOTE_CHOICES,
                label="Action following",
            )
            visual_quality = gr.Radio(
                choices=VOTE_CHOICES,
                label="Visual quality",
            )
            temporal_consistency = gr.Radio(
                choices=VOTE_CHOICES,
                label="Temporal consistency",
            )

        tie_all = gr.Button("Tie all / unsure")
        tie_all.click(
            fn=lambda: ("Tie / unsure", "Tie / unsure", "Tie / unsure"),
            outputs=[action_following, visual_quality, temporal_consistency],
        )

        gr.Markdown(
            "Artifact flagging fallback: enter the paused player time in seconds, then record it."
        )
        with gr.Row():
            artifact_time_input = gr.Textbox(
                label="Artifact timestamp (seconds)",
                placeholder="Example: 1.24",
            )
            flag_artifact = gr.Button("Flag artifact")
            clear_artifacts = gr.Button("Clear artifact flags")

        artifact_markdown = gr.Markdown(FLAG_HELP)
        note = gr.Textbox(lines=3, label="Optional note")

        with gr.Row():
            save_button = gr.Button(
                "Save annotation",
                variant="primary",
                interactive=writes_enabled,
            )
            prev_button = gr.Button("Previous sample")
            next_button = gr.Button("Next sample")

        status = gr.Markdown(first_status)

        flag_artifact.click(
            fn=record_artifact_flag,
            inputs=[artifact_time_input, artifact_flags],
            outputs=[artifact_flags, artifact_markdown, artifact_time_input, status],
        )
        clear_artifacts.click(
            fn=lambda: ([], FLAG_HELP, "", _status_message("Cleared artifact flags.")),
            outputs=[artifact_flags, artifact_markdown, artifact_time_input, status],
        )
        save_button.click(
            fn=lambda index, flags, action_vote, visual_vote, temporal_vote, note_text: save_annotation(
                manifest=manifest,
                results_dir=results_dir,
                sample_index=index,
                flags=flags,
                action_vote=action_vote,
                visual_vote=visual_vote,
                temporal_vote=temporal_vote,
                note_text=note_text,
                writes_enabled=writes_enabled,
            ),
            inputs=[
                current_index,
                artifact_flags,
                action_following,
                visual_quality,
                temporal_consistency,
                note,
            ],
            outputs=[status],
        )
        prev_button.click(
            fn=lambda index: navigate_sample(manifest, index - 1),
            inputs=[current_index],
            outputs=_sample_outputs(
                sample_title,
                sample_metadata,
                left_video,
                right_video,
                action_markdown,
                action_following,
                visual_quality,
                temporal_consistency,
                artifact_time_input,
                artifact_markdown,
                note,
                status,
                current_index,
                artifact_flags,
            ),
        )
        next_button.click(
            fn=lambda index: navigate_sample(manifest, index + 1),
            inputs=[current_index],
            outputs=_sample_outputs(
                sample_title,
                sample_metadata,
                left_video,
                right_video,
                action_markdown,
                action_following,
                visual_quality,
                temporal_consistency,
                artifact_time_input,
                artifact_markdown,
                note,
                status,
                current_index,
                artifact_flags,
            ),
        )

    return demo


def navigate_sample(manifest: DatasetManifest, requested_index: int) -> tuple[Any, ...]:
    sample_count = len(manifest.samples)
    sample_index = max(0, min(requested_index, sample_count - 1))
    sample = manifest.samples[sample_index]
    status = _status_message(f"Loaded `{sample.sample_id}`.")
    return (
        _sample_title(sample, sample_index, sample_count),
        _sample_metadata(sample),
        str(sample.reference_video),
        str(sample.generated_video),
        sample.action_markdown,
        None,
        None,
        None,
        "",
        FLAG_HELP,
        "",
        status,
        sample_index,
        [],
    )


def record_artifact_flag(
    artifact_time_text: str,
    existing_flags: list[dict[str, Any]] | None,
) -> tuple[list[dict[str, Any]], str, str, str]:
    existing_flags = list(existing_flags or [])
    try:
        timestamp_s = round(float(artifact_time_text.strip()), 3)
    except (AttributeError, ValueError):
        return (
            existing_flags,
            _artifact_markdown(existing_flags),
            artifact_time_text,
            _status_message("Enter a numeric timestamp before flagging an artifact."),
        )

    if timestamp_s < 0:
        return (
            existing_flags,
            _artifact_markdown(existing_flags),
            artifact_time_text,
            _status_message("Artifact timestamps must be zero or positive."),
        )

    existing_flags.append(
        {
            "timestamp_s": timestamp_s,
            "source": "manual_text_entry",
            "recorded_at": _utc_now(),
        }
    )
    return (
        existing_flags,
        _artifact_markdown(existing_flags),
        "",
        _status_message(f"Flagged artifact at {timestamp_s:.3f}s."),
    )


def save_annotation(
    manifest: DatasetManifest,
    results_dir: Path,
    sample_index: int,
    flags: list[dict[str, Any]] | None,
    action_vote: str | None,
    visual_vote: str | None,
    temporal_vote: str | None,
    note_text: str,
    writes_enabled: bool,
) -> str:
    if not writes_enabled:
        return _status_message(
            "Annotation writes are disabled in this deployment. "
            "Set `ARENA_DISABLE_WRITES=0` or omit `--disable-writes` to enable saving."
        )

    missing = [
        label
        for label, value in (
            ("action following", action_vote),
            ("visual quality", visual_vote),
            ("temporal consistency", temporal_vote),
        )
        if not value
    ]
    if missing:
        return _status_message(f"Select votes for: {', '.join(missing)}.")

    sample = manifest.samples[sample_index]
    flags = list(flags or [])
    record = {
        "annotated_at": _utc_now(),
        "sample_id": sample.sample_id,
        "scenario": sample.scenario,
        "case_id": sample.case_id,
        "pair_mode": sample.pair_mode,
        "left_label": sample.left_label,
        "right_label": sample.right_label,
        "reference_video": sample.reference_video_relative,
        "generated_video": sample.generated_video_relative,
        "preview_image": sample.preview_image_relative,
        "action_path": sample.action_path_relative,
        "votes": {
            "action_following": action_vote,
            "visual_quality": visual_vote,
            "temporal_consistency": temporal_vote,
        },
        "artifact_flags": flags,
        "artifact_latest_s": flags[-1]["timestamp_s"] if flags else None,
        "note": note_text.strip(),
    }
    output_path = append_annotation(results_dir=results_dir, record=record)
    return _status_message(
        f"Saved `{sample.sample_id}` to `{_display_path(output_path)}`. "
        "Use Next sample to continue."
    )


def _sample_outputs(*components: Any) -> list[Any]:
    return list(components)


def _sample_title(sample: Sample, sample_index: int, sample_count: int) -> str:
    return (
        f"## Sample {sample_index + 1} / {sample_count}\n"
        f"`{sample.sample_id}`"
    )


def _sample_metadata(sample: Sample) -> str:
    reference_meta = sample.reference_video_meta or {}
    generated_meta = sample.generated_video_meta or {}
    width = generated_meta.get("width") or reference_meta.get("width")
    height = generated_meta.get("height") or reference_meta.get("height")
    fps = generated_meta.get("fps") or reference_meta.get("fps")
    duration_s = generated_meta.get("duration_s") or reference_meta.get("duration_s")
    control_mode = sample.action_summary.get("control_mode", "unknown")

    parts = [
        f"**Scenario:** `{sample.scenario}`",
        f"**Case ID:** `{sample.case_id}`",
        f"**Pairing:** left=`{sample.reference_video_relative}` | right=`{sample.generated_video_relative}`",
        f"**Action file:** `{sample.action_path_relative}`",
        f"**Preview still:** `{sample.preview_image_relative or 'missing'}`",
        f"**Inferred control regime:** {control_mode}",
    ]

    if width and height:
        parts.append(f"**Resolution:** {width}x{height}")
    if fps:
        parts.append(f"**FPS:** {fps:.2f}")
    if duration_s:
        parts.append(f"**Duration:** {duration_s:.2f}s")
|
| 347 |
+
|
| 348 |
+
return " | ".join(parts)
|
| 349 |
+
|
| 350 |
+
|
| 351 |
+
def _artifact_markdown(flags: list[dict[str, Any]]) -> str:
|
| 352 |
+
if not flags:
|
| 353 |
+
return FLAG_HELP
|
| 354 |
+
|
| 355 |
+
lines = ["**Flagged artifact times**"]
|
| 356 |
+
for index, flag in enumerate(flags, start=1):
|
| 357 |
+
lines.append(f"- {index}. `{flag['timestamp_s']:.3f}s` via {flag['source']}")
|
| 358 |
+
return "\n".join(lines)
|
| 359 |
+
|
| 360 |
+
|
| 361 |
+
def _status_message(message: str) -> str:
|
| 362 |
+
return f"**Status:** {message}"
|
| 363 |
+
|
| 364 |
+
|
| 365 |
+
def _display_path(path: Path) -> str:
|
| 366 |
+
try:
|
| 367 |
+
return path.relative_to(Path.cwd()).as_posix()
|
| 368 |
+
except ValueError:
|
| 369 |
+
return str(path)
|
| 370 |
+
|
| 371 |
+
|
| 372 |
+
def _utc_now() -> str:
|
| 373 |
+
return datetime.now(timezone.utc).isoformat()
|
| 374 |
+
|
| 375 |
+
|
| 376 |
+
def parse_args() -> argparse.Namespace:
|
| 377 |
+
repo_root = Path(__file__).resolve().parents[1]
|
| 378 |
+
parser = argparse.ArgumentParser(description="Run the Minecraft LM-Arena baseline app.")
|
| 379 |
+
parser.add_argument(
|
| 380 |
+
"--manifest",
|
| 381 |
+
type=Path,
|
| 382 |
+
default=repo_root / "arena" / "manifest.json",
|
| 383 |
+
help="Path to the normalized manifest JSON file.",
|
| 384 |
+
)
|
| 385 |
+
parser.add_argument(
|
| 386 |
+
"--results-dir",
|
| 387 |
+
type=Path,
|
| 388 |
+
default=repo_root / "arena" / "results",
|
| 389 |
+
help="Directory for JSONL annotation logs.",
|
| 390 |
+
)
|
| 391 |
+
parser.add_argument(
|
| 392 |
+
"--rebuild-manifest",
|
| 393 |
+
action="store_true",
|
| 394 |
+
help="Re-scan data_subset and rebuild the manifest before launch.",
|
| 395 |
+
)
|
| 396 |
+
parser.add_argument(
|
| 397 |
+
"--host",
|
| 398 |
+
type=str,
|
| 399 |
+
default=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
|
| 400 |
+
help="Host interface for Gradio.",
|
| 401 |
+
)
|
| 402 |
+
parser.add_argument(
|
| 403 |
+
"--port",
|
| 404 |
+
type=int,
|
| 405 |
+
default=int(os.getenv("GRADIO_SERVER_PORT", "7860")),
|
| 406 |
+
help="Port for Gradio.",
|
| 407 |
+
)
|
| 408 |
+
parser.add_argument(
|
| 409 |
+
"--disable-writes",
|
| 410 |
+
action="store_true",
|
| 411 |
+
default=_env_flag("ARENA_DISABLE_WRITES", False),
|
| 412 |
+
help="Disable writing annotations to disk.",
|
| 413 |
+
)
|
| 414 |
+
return parser.parse_args()
|
| 415 |
+
|
| 416 |
+
|
| 417 |
+
def main() -> None:
|
| 418 |
+
args = parse_args()
|
| 419 |
+
manifest_path = ensure_manifest(manifest_path=args.manifest, rebuild=args.rebuild_manifest)
|
| 420 |
+
manifest = load_manifest(manifest_path)
|
| 421 |
+
demo = build_app(
|
| 422 |
+
manifest=manifest,
|
| 423 |
+
results_dir=args.results_dir,
|
| 424 |
+
writes_enabled=not args.disable_writes,
|
| 425 |
+
)
|
| 426 |
+
demo.launch(server_name=args.host, server_port=args.port)
|
| 427 |
+
|
| 428 |
+
|
| 429 |
+
def _env_flag(name: str, default: bool) -> bool:
|
| 430 |
+
raw = os.getenv(name)
|
| 431 |
+
if raw is None:
|
| 432 |
+
return default
|
| 433 |
+
return raw.strip().lower() in {"1", "true", "yes", "on"}
|
| 434 |
+
|
| 435 |
+
|
| 436 |
+
if __name__ == "__main__":
|
| 437 |
+
main()
|
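A note on the `--disable-writes` flag defined in `parse_args` above: it combines `action="store_true"` with a default read from `ARENA_DISABLE_WRITES`, so once the environment variable disables writes, no command-line argument can re-enable them. The sketch below shows one possible alternative, assuming Python 3.9+ and `argparse.BooleanOptionalAction`. It is not what this commit does, and `env_flag` and `build_parser` are hypothetical names used only for illustration.

```python
import argparse
import os


def env_flag(name: str, default: bool) -> bool:
    """Same truthiness rule as the commit's _env_flag helper."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper, not part of the commit.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable-writes",
        # BooleanOptionalAction also registers --no-disable-writes,
        # so the CLI can override the environment in both directions.
        action=argparse.BooleanOptionalAction,
        default=env_flag("ARENA_DISABLE_WRITES", False),
        help="Disable writing annotations to disk.",
    )
    return parser


if __name__ == "__main__":
    os.environ["ARENA_DISABLE_WRITES"] = "1"
    parser = build_parser()
    print(parser.parse_args([]).disable_writes)
    print(parser.parse_args(["--no-disable-writes"]).disable_writes)
```

With this variant the environment variable still sets the default, while an explicit `--disable-writes` or `--no-disable-writes` on the command line takes precedence.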
arena/build_manifest.py
ADDED
|
@@ -0,0 +1,248 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import argparse
|
| 4 |
+
import json
|
| 5 |
+
import subprocess
|
| 6 |
+
from datetime import datetime, timezone
|
| 7 |
+
from fractions import Fraction
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
from typing import Any
|
| 10 |
+
|
| 11 |
+
try:
|
| 12 |
+
from .actions import build_action_summary, summary_to_manifest_dict
|
| 13 |
+
except ImportError:
|
| 14 |
+
from actions import build_action_summary, summary_to_manifest_dict
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
GENERATED_SUFFIX = "_wangame.mp4"
|
| 18 |
+
ACTION_SUFFIX = "_action.npy"
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
def build_manifest(dataset_root: Path, repo_root: Path | None = None) -> dict[str, Any]:
|
| 22 |
+
repo_root = repo_root or Path(__file__).resolve().parents[1]
|
| 23 |
+
dataset_root = dataset_root.resolve()
|
| 24 |
+
|
| 25 |
+
samples: list[dict[str, Any]] = []
|
| 26 |
+
warnings: list[str] = []
|
| 27 |
+
scenario_summaries: list[dict[str, Any]] = []
|
| 28 |
+
|
| 29 |
+
for scenario_dir in sorted(path for path in dataset_root.iterdir() if path.is_dir()):
|
| 30 |
+
indexed_cases = _index_scenario_cases(scenario_dir)
|
| 31 |
+
valid_case_ids: list[str] = []
|
| 32 |
+
|
| 33 |
+
for case_id in sorted(indexed_cases):
|
| 34 |
+
entry = indexed_cases[case_id]
|
| 35 |
+
missing = [
|
| 36 |
+
field
|
| 37 |
+
for field in ("reference_video", "generated_video", "action_path")
|
| 38 |
+
if field not in entry
|
| 39 |
+
]
|
| 40 |
+
if missing:
|
| 41 |
+
warnings.append(
|
| 42 |
+
f"Skipping {scenario_dir.name}/{case_id}: missing {', '.join(sorted(missing))}"
|
| 43 |
+
)
|
| 44 |
+
continue
|
| 45 |
+
|
| 46 |
+
reference_video = entry["reference_video"]
|
| 47 |
+
generated_video = entry["generated_video"]
|
| 48 |
+
action_path = entry["action_path"]
|
| 49 |
+
preview_image = entry.get("preview_image")
|
| 50 |
+
|
| 51 |
+
reference_meta = probe_video(reference_video)
|
| 52 |
+
generated_meta = probe_video(generated_video)
|
| 53 |
+
fps = (
|
| 54 |
+
generated_meta.get("fps")
|
| 55 |
+
or reference_meta.get("fps")
|
| 56 |
+
or generated_meta.get("avg_frame_rate")
|
| 57 |
+
or reference_meta.get("avg_frame_rate")
|
| 58 |
+
)
|
| 59 |
+
action_summary = build_action_summary(action_path, fps=fps)
|
| 60 |
+
|
| 61 |
+
sample = {
|
| 62 |
+
"sample_id": f"{scenario_dir.name}/{case_id}",
|
| 63 |
+
"scenario": scenario_dir.name,
|
| 64 |
+
"case_id": case_id,
|
| 65 |
+
"pair_mode": "reference_vs_wangame",
|
| 66 |
+
"left_label": "Reference (.mp4)",
|
| 67 |
+
"right_label": "Generated (WanGame)",
|
| 68 |
+
"reference_video": _path_for_manifest(reference_video, repo_root),
|
| 69 |
+
"generated_video": _path_for_manifest(generated_video, repo_root),
|
| 70 |
+
"preview_image": _path_for_manifest(preview_image, repo_root) if preview_image else None,
|
| 71 |
+
"action_path": _path_for_manifest(action_path, repo_root),
|
| 72 |
+
"reference_video_meta": reference_meta,
|
| 73 |
+
"generated_video_meta": generated_meta,
|
| 74 |
+
"action_summary": summary_to_manifest_dict(action_summary),
|
| 75 |
+
"action_markdown": action_summary.markdown,
|
| 76 |
+
}
|
| 77 |
+
samples.append(sample)
|
| 78 |
+
valid_case_ids.append(case_id)
|
| 79 |
+
|
| 80 |
+
scenario_summaries.append(
|
| 81 |
+
{
|
| 82 |
+
"scenario": scenario_dir.name,
|
| 83 |
+
"n_samples": len(valid_case_ids),
|
| 84 |
+
"case_ids": valid_case_ids,
|
| 85 |
+
}
|
| 86 |
+
)
|
| 87 |
+
|
| 88 |
+
return {
|
| 89 |
+
"manifest_version": 1,
|
| 90 |
+
"created_at": _utc_now(),
|
| 91 |
+
"repo_root": _path_for_manifest(repo_root.resolve(), repo_root),
|
| 92 |
+
"dataset_root": _path_for_manifest(dataset_root, repo_root),
|
| 93 |
+
"pair_mode": "reference_vs_wangame",
|
| 94 |
+
"sample_count": len(samples),
|
| 95 |
+
"scenario_summaries": scenario_summaries,
|
| 96 |
+
"samples": samples,
|
| 97 |
+
"warnings": warnings,
|
| 98 |
+
}
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
def write_manifest(dataset_root: Path, manifest_path: Path, repo_root: Path | None = None) -> Path:
|
| 102 |
+
manifest = build_manifest(dataset_root=dataset_root, repo_root=repo_root)
|
| 103 |
+
manifest_path.parent.mkdir(parents=True, exist_ok=True)
|
| 104 |
+
with manifest_path.open("w", encoding="utf-8") as handle:
|
| 105 |
+
json.dump(manifest, handle, indent=2)
|
| 106 |
+
handle.write("\n")
|
| 107 |
+
return manifest_path
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
def probe_video(video_path: Path) -> dict[str, Any]:
|
| 111 |
+
command = [
|
| 112 |
+
"ffprobe",
|
| 113 |
+
"-v",
|
| 114 |
+
"error",
|
| 115 |
+
"-select_streams",
|
| 116 |
+
"v:0",
|
| 117 |
+
"-show_entries",
|
| 118 |
+
"stream=width,height,avg_frame_rate,nb_frames,duration",
|
| 119 |
+
"-of",
|
| 120 |
+
"json",
|
| 121 |
+
str(video_path),
|
| 122 |
+
]
|
| 123 |
+
try:
|
| 124 |
+
result = subprocess.run(
|
| 125 |
+
command,
|
| 126 |
+
check=True,
|
| 127 |
+
capture_output=True,
|
| 128 |
+
text=True,
|
| 129 |
+
)
|
| 130 |
+
except FileNotFoundError:
|
| 131 |
+
return {}
|
| 132 |
+
except subprocess.CalledProcessError:
|
| 133 |
+
return {}
|
| 134 |
+
|
| 135 |
+
try:
|
| 136 |
+
payload = json.loads(result.stdout)
|
| 137 |
+
stream = payload["streams"][0]
|
| 138 |
+
except (json.JSONDecodeError, KeyError, IndexError):
|
| 139 |
+
return {}
|
| 140 |
+
|
| 141 |
+
fps_text = stream.get("avg_frame_rate")
|
| 142 |
+
fps = _parse_fraction(fps_text)
|
| 143 |
+
duration = _parse_float(stream.get("duration"))
|
| 144 |
+
nb_frames = _parse_int(stream.get("nb_frames"))
|
| 145 |
+
|
| 146 |
+
return {
|
| 147 |
+
"width": _parse_int(stream.get("width")),
|
| 148 |
+
"height": _parse_int(stream.get("height")),
|
| 149 |
+
"avg_frame_rate": fps,
|
| 150 |
+
"fps": fps,
|
| 151 |
+
"duration_s": duration,
|
| 152 |
+
"nb_frames": nb_frames,
|
| 153 |
+
}
|
| 154 |
+
|
| 155 |
+
|
| 156 |
+
def _index_scenario_cases(scenario_dir: Path) -> dict[str, dict[str, Path]]:
|
| 157 |
+
indexed: dict[str, dict[str, Path]] = {}
|
| 158 |
+
for path in sorted(candidate for candidate in scenario_dir.iterdir() if candidate.is_file()):
|
| 159 |
+
case_id: str | None = None
|
| 160 |
+
field: str | None = None
|
| 161 |
+
if path.name.endswith(ACTION_SUFFIX):
|
| 162 |
+
case_id = path.name[: -len(ACTION_SUFFIX)]
|
| 163 |
+
field = "action_path"
|
| 164 |
+
elif path.name.endswith(GENERATED_SUFFIX):
|
| 165 |
+
case_id = path.name[: -len(GENERATED_SUFFIX)]
|
| 166 |
+
field = "generated_video"
|
| 167 |
+
elif path.suffix.lower() == ".mp4":
|
| 168 |
+
case_id = path.stem
|
| 169 |
+
field = "reference_video"
|
| 170 |
+
elif path.suffix.lower() == ".jpg":
|
| 171 |
+
case_id = path.stem
|
| 172 |
+
field = "preview_image"
|
| 173 |
+
|
| 174 |
+
if case_id and field:
|
| 175 |
+
indexed.setdefault(case_id, {})[field] = path.resolve()
|
| 176 |
+
return indexed
|
| 177 |
+
|
| 178 |
+
|
| 179 |
+
def _path_for_manifest(path: Path, repo_root: Path) -> str:
|
| 180 |
+
resolved = path.resolve()
|
| 181 |
+
try:
|
| 182 |
+
return resolved.relative_to(repo_root.resolve()).as_posix()
|
| 183 |
+
except ValueError:
|
| 184 |
+
return str(resolved)
|
| 185 |
+
|
| 186 |
+
|
| 187 |
+
def _parse_fraction(value: Any) -> float | None:
|
| 188 |
+
if not value:
|
| 189 |
+
return None
|
| 190 |
+
try:
|
| 191 |
+
return float(Fraction(str(value)))
|
| 192 |
+
except (ZeroDivisionError, ValueError):
|
| 193 |
+
return None
|
| 194 |
+
|
| 195 |
+
|
| 196 |
+
def _parse_float(value: Any) -> float | None:
|
| 197 |
+
if value in (None, ""):
|
| 198 |
+
return None
|
| 199 |
+
try:
|
| 200 |
+
return float(value)
|
| 201 |
+
except (TypeError, ValueError):
|
| 202 |
+
return None
|
| 203 |
+
|
| 204 |
+
|
| 205 |
+
def _parse_int(value: Any) -> int | None:
|
| 206 |
+
if value in (None, ""):
|
| 207 |
+
return None
|
| 208 |
+
try:
|
| 209 |
+
return int(value)
|
| 210 |
+
except (TypeError, ValueError):
|
| 211 |
+
return None
|
| 212 |
+
|
| 213 |
+
|
| 214 |
+
def _utc_now() -> str:
|
| 215 |
+
return datetime.now(timezone.utc).isoformat()
|
| 216 |
+
|
| 217 |
+
|
| 218 |
+
def _default_repo_root() -> Path:
|
| 219 |
+
return Path(__file__).resolve().parents[1]
|
| 220 |
+
|
| 221 |
+
|
| 222 |
+
def main() -> None:
|
| 223 |
+
repo_root = _default_repo_root()
|
| 224 |
+
parser = argparse.ArgumentParser(description="Build a normalized dataset manifest for arena.")
|
| 225 |
+
parser.add_argument(
|
| 226 |
+
"--dataset-root",
|
| 227 |
+
type=Path,
|
| 228 |
+
default=repo_root / "data_subset",
|
| 229 |
+
help="Path to the dataset root (default: repo_root/data_subset)",
|
| 230 |
+
)
|
| 231 |
+
parser.add_argument(
|
| 232 |
+
"--manifest",
|
| 233 |
+
type=Path,
|
| 234 |
+
default=repo_root / "arena" / "manifest.json",
|
| 235 |
+
help="Path to the manifest JSON file to write",
|
| 236 |
+
)
|
| 237 |
+
args = parser.parse_args()
|
| 238 |
+
|
| 239 |
+
manifest_path = write_manifest(
|
| 240 |
+
dataset_root=args.dataset_root,
|
| 241 |
+
manifest_path=args.manifest,
|
| 242 |
+
repo_root=repo_root,
|
| 243 |
+
)
|
| 244 |
+
print(f"Wrote manifest to {manifest_path}")
|
| 245 |
+
|
| 246 |
+
|
| 247 |
+
if __name__ == "__main__":
|
| 248 |
+
main()
|
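The order of the suffix checks in `_index_scenario_cases` above matters: `01_wangame.mp4` also ends in `.mp4`, so the generated-video test has to run before the generic reference-video test. The following is a minimal sketch of the same classification on bare file names, assuming no filesystem access. `classify` and `index_names` are hypothetical names, not functions from the commit.

```python
from __future__ import annotations

from pathlib import PurePath

GENERATED_SUFFIX = "_wangame.mp4"
ACTION_SUFFIX = "_action.npy"


def classify(filename: str) -> tuple[str, str] | None:
    """Return (case_id, field) for a dataset file name, or None if unrecognized."""
    path = PurePath(filename)
    if filename.endswith(ACTION_SUFFIX):
        return filename[: -len(ACTION_SUFFIX)], "action_path"
    # Checked before the generic ".mp4" branch; otherwise "01_wangame.mp4"
    # would be indexed as a reference video with case id "01_wangame".
    if filename.endswith(GENERATED_SUFFIX):
        return filename[: -len(GENERATED_SUFFIX)], "generated_video"
    if path.suffix.lower() == ".mp4":
        return path.stem, "reference_video"
    if path.suffix.lower() == ".jpg":
        return path.stem, "preview_image"
    return None


def index_names(filenames: list[str]) -> dict[str, dict[str, str]]:
    """Group recognized files by case id, mirroring the nested-dict layout."""
    indexed: dict[str, dict[str, str]] = {}
    for name in sorted(filenames):
        hit = classify(name)
        if hit:
            case_id, field = hit
            indexed.setdefault(case_id, {})[field] = name
    return indexed


if __name__ == "__main__":
    names = ["01.mp4", "01_wangame.mp4", "01_action.npy", "01.jpg", ".DS_Store"]
    print(index_names(names))
```

Files such as `.DS_Store` fall through every branch and are ignored, which matches how the manifest builder skips unrecognized entries.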
arena/dataset.py
ADDED
|
@@ -0,0 +1,125 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import json
|
| 4 |
+
from dataclasses import dataclass
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
from typing import Any
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
@dataclass(frozen=True)
|
| 10 |
+
class Sample:
|
| 11 |
+
sample_id: str
|
| 12 |
+
scenario: str
|
| 13 |
+
case_id: str
|
| 14 |
+
pair_mode: str
|
| 15 |
+
left_label: str
|
| 16 |
+
right_label: str
|
| 17 |
+
reference_video_relative: str
|
| 18 |
+
generated_video_relative: str
|
| 19 |
+
action_path_relative: str
|
| 20 |
+
preview_image_relative: str | None
|
| 21 |
+
reference_video: Path
|
| 22 |
+
generated_video: Path
|
| 23 |
+
action_path: Path
|
| 24 |
+
preview_image: Path | None
|
| 25 |
+
reference_video_meta: dict[str, Any]
|
| 26 |
+
generated_video_meta: dict[str, Any]
|
| 27 |
+
action_summary: dict[str, Any]
|
| 28 |
+
action_markdown: str
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
@dataclass(frozen=True)
|
| 32 |
+
class DatasetManifest:
|
| 33 |
+
manifest_path: Path
|
| 34 |
+
dataset_root: Path
|
| 35 |
+
pair_mode: str
|
| 36 |
+
sample_count: int
|
| 37 |
+
scenario_summaries: list[dict[str, Any]]
|
| 38 |
+
warnings: list[str]
|
| 39 |
+
samples: list[Sample]
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
def ensure_manifest(manifest_path: Path | None = None, rebuild: bool = False) -> Path:
|
| 43 |
+
manifest_path = manifest_path or default_manifest_path()
|
| 44 |
+
if rebuild or not manifest_path.exists():
|
| 45 |
+
build_manifest_module = _import_build_manifest()
|
| 46 |
+
build_manifest_module.write_manifest(
|
| 47 |
+
dataset_root=default_dataset_root(),
|
| 48 |
+
manifest_path=manifest_path,
|
| 49 |
+
repo_root=repo_root(),
|
| 50 |
+
)
|
| 51 |
+
return manifest_path
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def load_manifest(manifest_path: Path | None = None) -> DatasetManifest:
|
| 55 |
+
manifest_path = manifest_path or default_manifest_path()
|
| 56 |
+
with manifest_path.open("r", encoding="utf-8") as handle:
|
| 57 |
+
payload = json.load(handle)
|
| 58 |
+
|
| 59 |
+
root = repo_root()
|
| 60 |
+
samples = [
|
| 61 |
+
Sample(
|
| 62 |
+
sample_id=item["sample_id"],
|
| 63 |
+
scenario=item["scenario"],
|
| 64 |
+
case_id=item["case_id"],
|
| 65 |
+
pair_mode=item.get("pair_mode", "reference_vs_wangame"),
|
| 66 |
+
left_label=item.get("left_label", "Left"),
|
| 67 |
+
right_label=item.get("right_label", "Right"),
|
| 68 |
+
reference_video_relative=item["reference_video"],
|
| 69 |
+
generated_video_relative=item["generated_video"],
|
| 70 |
+
action_path_relative=item["action_path"],
|
| 71 |
+
preview_image_relative=item.get("preview_image"),
|
| 72 |
+
reference_video=_resolve_repo_path(root, item["reference_video"]),
|
| 73 |
+
generated_video=_resolve_repo_path(root, item["generated_video"]),
|
| 74 |
+
action_path=_resolve_repo_path(root, item["action_path"]),
|
| 75 |
+
preview_image=_resolve_repo_path(root, item["preview_image"]) if item.get("preview_image") else None,
|
| 76 |
+
reference_video_meta=item.get("reference_video_meta", {}),
|
| 77 |
+
generated_video_meta=item.get("generated_video_meta", {}),
|
| 78 |
+
action_summary=item.get("action_summary", {}),
|
| 79 |
+
action_markdown=item.get("action_markdown", "Action summary unavailable."),
|
| 80 |
+
)
|
| 81 |
+
for item in payload.get("samples", [])
|
| 82 |
+
]
|
| 83 |
+
|
| 84 |
+
dataset_root_value = payload.get("dataset_root")
|
| 85 |
+
dataset_root = _resolve_repo_path(root, dataset_root_value) if dataset_root_value else default_dataset_root()
|
| 86 |
+
|
| 87 |
+
return DatasetManifest(
|
| 88 |
+
manifest_path=manifest_path,
|
| 89 |
+
dataset_root=dataset_root,
|
| 90 |
+
pair_mode=payload.get("pair_mode", "reference_vs_wangame"),
|
| 91 |
+
sample_count=int(payload.get("sample_count", len(samples))),
|
| 92 |
+
scenario_summaries=list(payload.get("scenario_summaries", [])),
|
| 93 |
+
warnings=list(payload.get("warnings", [])),
|
| 94 |
+
samples=samples,
|
| 95 |
+
)
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
def default_manifest_path() -> Path:
|
| 99 |
+
return repo_root() / "arena" / "manifest.json"
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
def default_dataset_root() -> Path:
|
| 103 |
+
return repo_root() / "data_subset"
|
| 104 |
+
|
| 105 |
+
|
| 106 |
+
def repo_root() -> Path:
|
| 107 |
+
return Path(__file__).resolve().parents[1]
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
def _resolve_repo_path(root: Path, value: str | None) -> Path:
|
| 111 |
+
if not value:
|
| 112 |
+
return root
|
| 113 |
+
path = Path(value)
|
| 114 |
+
if path.is_absolute():
|
| 115 |
+
return path
|
| 116 |
+
return root / path
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
def _import_build_manifest():
|
| 120 |
+
try:
|
| 121 |
+
from . import build_manifest as build_manifest_module
|
| 122 |
+
except ImportError:
|
| 123 |
+
import build_manifest as build_manifest_module
|
| 124 |
+
|
| 125 |
+
return build_manifest_module
|
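`_path_for_manifest` (in `build_manifest.py`) and `_resolve_repo_path` (above) form a round trip: paths are stored relative to the repo root when possible and rejoined to the current root at load time, which keeps the manifest portable across clones. A minimal sketch of that round trip, assuming POSIX-style paths and omitting the `.resolve()` calls so it runs without touching the filesystem. The function names are illustrative copies, not imports from the commit.

```python
from pathlib import PurePosixPath


def path_for_manifest(path: PurePosixPath, repo_root: PurePosixPath) -> str:
    """Store a path relative to the repo root when possible, else as absolute."""
    try:
        return path.relative_to(repo_root).as_posix()
    except ValueError:
        # The file lives outside the repo; keep the absolute path.
        return str(path)


def resolve_repo_path(root: PurePosixPath, value: str) -> PurePosixPath:
    """Load-time inverse: absolute values pass through, relative ones join root."""
    path = PurePosixPath(value)
    return path if path.is_absolute() else root / path


if __name__ == "__main__":
    root = PurePosixPath("/srv/repo")
    video = PurePosixPath("/srv/repo/data_subset/1_wasd_only/01.mp4")
    stored = path_for_manifest(video, root)
    print(stored)
    print(resolve_repo_path(PurePosixPath("/home/user/clone"), stored))
```

The second print illustrates the portability point: a relative entry written on one machine resolves under whatever root the manifest is loaded from.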
arena/dataset_notes.md
ADDED
|
@@ -0,0 +1,72 @@
|
| 1 |
+
# Dataset Notes
|
| 2 |
+
|
| 3 |
+
## Short assumptions
|
| 4 |
+
|
| 5 |
+
- Each folder under `data_subset/` is a scenario family, likely grouped by control regime or prompt generation regime rather than by evaluator split.
|
| 6 |
+
- Each scenario folder currently contains 10 complete cases: `01` through `10`.
|
| 7 |
+
- A complete case consists of:
|
| 8 |
+
- `{id}.mp4`
|
| 9 |
+
- `{id}_wangame.mp4`
|
| 10 |
+
- `{id}_action.npy`
|
| 11 |
+
- `{id}.jpg`
|
| 12 |
+
|
| 13 |
+
## What the files likely mean
|
| 14 |
+
|
| 15 |
+
- `{id}.mp4`
|
| 16 |
+
- Most likely the reference / ground-truth video for that case.
|
| 17 |
+
- This is not inferred from the filename alone: `ptlflow/run_all_eval.py` and `ptlflow/visualize_results.py` explicitly pair `{id}.mp4` with `{id}_wangame.mp4` as reference vs generated.
|
| 18 |
+
- `{id}_wangame.mp4`
|
| 19 |
+
- Most likely the WanGame-generated output for the same case.
|
| 20 |
+
- `{id}_action.npy`
|
| 21 |
+
- A pickled dict with two arrays:
|
| 22 |
+
- `keyboard`: shape `(77, 6)`
|
| 23 |
+
- `mouse`: shape `(77, 2)`
|
| 24 |
+
- From `ptlflow/action_flow_score.py`, the keyboard order is `[W, S, A, D, left, right]`.
|
| 25 |
+
- From the same script, the mouse order is `[pitch, yaw]`.
|
| 26 |
+
- The subset appears aligned at 77 frames per case, with videos observed at 25 FPS and about 3.08s duration.
|
| 27 |
+
- `{id}.jpg`
|
| 28 |
+
- Likely a preview still or initial frame.
|
| 29 |
+
- It visually matches the opening scene for at least one checked sample.
|
| 30 |
+
- Relevant `ptlflow` scripts do not appear to use it for evaluation, so its exact role remains somewhat ambiguous.
|
| 31 |
+
|
| 32 |
+
## What each scenario folder likely represents
|
| 33 |
+
|
| 34 |
+
- `camera`
|
| 35 |
+
- Inferred camera-only regime: no keyboard activity, nonzero mouse yaw throughout sampled files.
|
| 36 |
+
- `camera4hold_alpha1`
|
| 37 |
+
- Inferred camera-only regime with held pitch/yaw steps.
|
| 38 |
+
- `1_wasd_only`
|
| 39 |
+
- Inferred keyboard-only regime with no mouse input.
|
| 40 |
+
- `wasdonly_alpha1`
|
| 41 |
+
- Another keyboard-only regime with no mouse input.
|
| 42 |
+
- `fully_random`
|
| 43 |
+
- Mixed keyboard + mouse regime.
|
| 44 |
+
- `wasd4holdrandview_simple_1key1mouse1`
|
| 45 |
+
- Mixed keyboard + mouse regime; folder name suggests sparse held inputs, which matches the action arrays broadly.
|
| 46 |
+
|
| 47 |
+
These scenario names were not documented elsewhere in the repo, so the descriptions above are inferred from folder names plus action statistics.
|
| 48 |
+
|
| 49 |
+
## Pairing logic
|
| 50 |
+
|
| 51 |
+
- Pair samples only within the same scenario folder.
|
| 52 |
+
- Pair by exact case id:
|
| 53 |
+
- `scenario/01.mp4`
|
| 54 |
+
- `scenario/01_wangame.mp4`
|
| 55 |
+
- `scenario/01_action.npy`
|
| 56 |
+
- Do not pair across scenario folders even when the case ids match.
|
| 57 |
+
|
| 58 |
+
## Baseline UI choice
|
| 59 |
+
|
| 60 |
+
- The baseline app should be side-by-side A/B, not single-video scoring.
|
| 61 |
+
- Reason:
|
| 62 |
+
- the dataset has a natural two-video pair per case
|
| 63 |
+
- `ptlflow` already treats that pair as the main eval unit
|
| 64 |
+
- the user requested an LM-Arena-style baseline first
|
| 65 |
+
|
| 66 |
+
## Important ambiguity
|
| 67 |
+
|
| 68 |
+
- This is an A/B comparison, but it is asymmetric:
|
| 69 |
+
- left is a reference video
|
| 70 |
+
- right is a generated WanGame output
|
| 71 |
+
- That means the UI is "arena-shaped" but not a blinded model-vs-model arena.
|
| 72 |
+
- A stricter single-video scoring flow would also be coherent, but the current repo structure supports paired comparison more directly, so the baseline chooses A/B.
|
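The scenario descriptions in these notes are inferred from which input channels are active in the action arrays. The sketch below illustrates that kind of inference on plain Python lists, assuming the column orders stated above (`[W, S, A, D, left, right]` and `[pitch, yaw]`). The function names and the returned labels are hypothetical; the actual logic in `arena/actions.py` is not shown in this view and may differ.

```python
from __future__ import annotations

KEYBOARD_ORDER = ["W", "S", "A", "D", "left", "right"]  # order stated in the notes
MOUSE_ORDER = ["pitch", "yaw"]  # order stated in the notes


def infer_control_mode(keyboard: list[list[float]], mouse: list[list[float]]) -> str:
    """Label a case by which input channels are ever nonzero (labels are illustrative)."""
    any_keys = any(value != 0 for row in keyboard for value in row)
    any_mouse = any(value != 0 for row in mouse for value in row)
    if any_keys and any_mouse:
        return "keyboard+mouse"
    if any_keys:
        return "keyboard-only"
    if any_mouse:
        return "camera-only"
    return "idle"


def clip_duration_s(n_frames: int, fps: float) -> float:
    """Duration implied by the frame count, e.g. 77 frames at 25 FPS."""
    return n_frames / fps


if __name__ == "__main__":
    n_frames = 77
    keyboard = [[1, 0, 0, 0, 0, 0] for _ in range(n_frames)]  # W held throughout
    mouse = [[0.0, 0.0] for _ in range(n_frames)]  # no camera motion
    print(infer_control_mode(keyboard, mouse))
    print(round(clip_duration_s(n_frames, 25.0), 2))
```

The duration helper reproduces the arithmetic behind the "about 3.08s" figure: 77 frames divided by 25 frames per second.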
arena/manifest.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
arena/result_logger.py
ADDED
|
@@ -0,0 +1,18 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import json
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
from typing import Any
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
def annotations_path(results_dir: Path) -> Path:
|
| 9 |
+
return results_dir / "annotations.jsonl"
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def append_annotation(results_dir: Path, record: dict[str, Any]) -> Path:
|
| 13 |
+
results_dir.mkdir(parents=True, exist_ok=True)
|
| 14 |
+
output_path = annotations_path(results_dir)
|
| 15 |
+
with output_path.open("a", encoding="utf-8") as handle:
|
| 16 |
+
handle.write(json.dumps(record, ensure_ascii=True))
|
| 17 |
+
handle.write("\n")
|
| 18 |
+
return output_path
|
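Because `append_annotation` writes one JSON object per line, the log can be read back with a simple line-by-line loop. A minimal sketch of tallying votes per criterion, assuming records shaped like the `annotations.jsonl` entry added in this commit (a `votes` dict keyed by criterion). `tally_votes` is a hypothetical helper, not part of the commit.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def tally_votes(jsonl_path: Path) -> dict[str, Counter]:
    """Count vote labels per criterion across all annotation records."""
    totals: dict[str, Counter] = {}
    with jsonl_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            for criterion, vote in record.get("votes", {}).items():
                totals.setdefault(criterion, Counter())[vote] += 1
    return totals


if __name__ == "__main__":
    log_path = Path("arena/results/annotations.jsonl")
    if log_path.exists():
        for criterion, counts in tally_votes(log_path).items():
            print(criterion, dict(counts))
```

Since the log is append-only, the same sample may appear more than once; a real analysis would likely need to decide whether to keep the latest record per `sample_id` or count them all.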
arena/results/.gitkeep
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
|
arena/results/annotations.jsonl
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{"annotated_at": "2026-03-10T04:48:33.348392+00:00", "sample_id": "1_wasd_only/01", "scenario": "1_wasd_only", "case_id": "01", "pair_mode": "reference_vs_wangame", "left_label": "Reference (.mp4)", "right_label": "Generated (WanGame)", "reference_video": "data_subset/1_wasd_only/01.mp4", "generated_video": "data_subset/1_wasd_only/01_wangame.mp4", "preview_image": "data_subset/1_wasd_only/01.jpg", "action_path": "data_subset/1_wasd_only/01_action.npy", "votes": {"action_following": "Left better", "visual_quality": "Tie / unsure", "temporal_consistency": "Left better"}, "artifact_flags": [], "artifact_latest_s": null, "note": ""}
|
data_subset/.DS_Store
ADDED
|
Binary file (6.15 kB). View file
|
|
|
data_subset/1_wasd_only/01.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/01.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:855b25f93f4f165acf93cee4b37f341501fc59e9810c807ebd5837b5f12fe186
|
| 3 |
+
size 78786
|
data_subset/1_wasd_only/01_action.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a580fa0e2f5f783e7113886f0447e580819e249bc993083aeff074a3233afcd7
|
| 3 |
+
size 2902
|
data_subset/1_wasd_only/01_wangame.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e60938e6a77a1811216efee750e281eddc8eb38b8d2d0f133084e83465b6e967
|
| 3 |
+
size 62610
|
data_subset/1_wasd_only/02.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/02.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b0f3fbfb28afe97a9b6d9664f5d30c2f13bad54939135137db0652652de78fd0
|
| 3 |
+
size 86880
|
data_subset/1_wasd_only/02_action.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:20181eb53edb42e80a8271d99226e8b4ffd55e5d5501df73730085166977423d
|
| 3 |
+
size 2902
|
data_subset/1_wasd_only/02_wangame.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e7b3b957a3073916783a4b2c0bf320ce18b92c337814c16963432a2ae36b587f
|
| 3 |
+
size 403384
|
data_subset/1_wasd_only/03.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/03.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9e9d527912f7ce59c4f57ae5a9b771105746c87772377452a65346360e1244ae
|
| 3 |
+
size 98605
|
data_subset/1_wasd_only/03_action.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:030abdc4ec31ce9cffebae9c46b08e7e89779715e1c4f81abe16cbffbee1746b
|
| 3 |
+
size 2902
|
data_subset/1_wasd_only/03_wangame.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2d28932a5f60b96f9f855b2a4593125a0d88a492a9c023ebf9e3ff412165333c
|
| 3 |
+
size 393043
|
data_subset/1_wasd_only/04.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/04.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b7c0b55bed20a73c61b7d50b09681f6f189ff8a8942129e143b2326d91be5810
|
| 3 |
+
size 73484
|
data_subset/1_wasd_only/04_action.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e32503ed36fa7199a83548db507c3c4dfba396fc9acdf71792d86a71eb1ffa9b
|
| 3 |
+
size 2902
|
data_subset/1_wasd_only/04_wangame.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5049a9a152b2497bf877b754ab8d607ec0f25f4b92297b40703c5438257db9f1
|
| 3 |
+
size 79860
|
data_subset/1_wasd_only/05.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/05.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:96e7064faaa87f8d845517eba3695d3e60b84cfc1eb4cdd57acb27cc494b26e2
|
| 3 |
+
size 629639
|
data_subset/1_wasd_only/05_action.npy
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:20181eb53edb42e80a8271d99226e8b4ffd55e5d5501df73730085166977423d
|
| 3 |
+
size 2902
|
data_subset/1_wasd_only/05_wangame.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6819b458185b7d8ae8181852c7205ff7d6e681b56ff222e7b668ef829b80dfba
|
| 3 |
+
size 502535
|
data_subset/1_wasd_only/06.jpg
ADDED
|
Git LFS Details
|
data_subset/1_wasd_only/06.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:70c28dd861202aef52d1e6e3bead6cb7d8eed08c49cc983c8f84e03e6791e1cc
|
| 3 |
+
size 744972
|
data_subset/1_wasd_only/06_action.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:030abdc4ec31ce9cffebae9c46b08e7e89779715e1c4f81abe16cbffbee1746b
+size 2902
data_subset/1_wasd_only/06_wangame.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42b9afe180eeec4674c49113061619fe80b2708e2c40364e5c763528af72309b
+size 720365
data_subset/1_wasd_only/07.jpg
ADDED
data_subset/1_wasd_only/07.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ad07edd622a19fdd30e342328e560412ba3f8f7f63a2e140941e227a84cf3565
+size 704835
data_subset/1_wasd_only/07_action.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a580fa0e2f5f783e7113886f0447e580819e249bc993083aeff074a3233afcd7
+size 2902
data_subset/1_wasd_only/07_wangame.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8ed71ccc53d9c18de093c7852db3d8d007efadaf12b724f585045ed4e53f6b41
+size 690364
data_subset/1_wasd_only/08.jpg
ADDED
data_subset/1_wasd_only/08.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c827892a1b4fa233e5d702524ad6fc75c48d494100f74effef6c16a685a215fc
+size 777476
data_subset/1_wasd_only/08_action.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:030abdc4ec31ce9cffebae9c46b08e7e89779715e1c4f81abe16cbffbee1746b
+size 2902
data_subset/1_wasd_only/08_wangame.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a36f305aa87e043f93daeb9da57ec160c30ff1bd9381676d168e000d48cbd443
+size 696419
data_subset/1_wasd_only/09.jpg
ADDED
data_subset/1_wasd_only/09.mp4
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ab71cea35b8f75063ed4b2bf91600663eb94ee7f0954f4685e49c715a6f28ff6
+size 1048922
data_subset/1_wasd_only/09_action.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a580fa0e2f5f783e7113886f0447e580819e249bc993083aeff074a3233afcd7
+size 2902