Upload folder using huggingface_hub

Files changed (8)

CHANGES.md +215 -0
STATE_ACTION_SPEC.md +83 -0
backup/configs.py +489 -0
backup/env.py +248 -0
configs.py +8 -3
configs.py.bak +489 -0
env.py +187 -80
env.py.bak +248 -0

CHANGES.md ADDED

	@@ -0,0 +1,215 @@

+# Robocasa_Env changes: aligning with the official GR00T eval flow
+Goal: make `lerobot-eval --env.type=robocasa ...` behave the same way as the official
+`Isaac-GR00T/scripts/run_eval.py`.
+**The State/Action contract (`STATE_ACTION_SPEC.md`) is kept as-is**: lerobot policies (e.g. ACT)
+output a 12-dim concatenated action and the env exposes a 16-dim concatenated `agent_pos`, and this change does not break that agreement.
+Modified files: `env.py` and `configs.py`. (Backups: `*.bak`)
+---
+## 1. Summary of `env.py` changes
+### 1-1. Removed the self-issued `self.reset()` from `step()` (★ key change)
+**Before**
+```python
+if terminated:
+    info["final_info"] = {...}
+    self.reset()        # ← the env resets itself
+return new_obs, reward, terminated, truncated, info
+```
+**After**
+```python
+# Removed the self.reset() call.
+# gymnasium 0.29+ VectorEnv autoreset watches terminated/truncated,
+# builds final_info, and resets automatically. If the wrapper resets once more,
+# the first obs overwrites the final obs, and final_info["is_success"] in the
+# lerobot rollout becomes inconsistent.
+return new_obs, float(reward), terminated, truncated, info
+```
+- The official GR00T `simulation.py` likewise does not reset inside the wrapper; the outer loop handles it.
+- `rollout()` in `lerobot_eval.py` reads `info["final_info"][i]["is_success"]`, so
+  leaving the reset to autoreset yields an accurate success-rate (SR) tally.
+### 1-2. Expose the language condition (`task`) directly in obs
+`RoboCasaGymEnv.get_observation` already fills the ep_meta lang into obs under the
+`annotation.human.task_description` key. The user code's `_format_raw_obs` was
+discarding it.
+```python
+def _format_raw_obs(self, raw_obs):
+    ...
+    lang = raw_obs.get("annotation.human.task_description") \
+           or self._task_description or self.task
+    new_obs["task"] = str(lang)
+    self._task_description = str(lang)
+    return new_obs
+```
+- With `AsyncVectorEnv`, `env.call("task_description")` goes through the worker process, which is expensive and
+  has timing issues (the README issue *use_async_envs=True on hold: task_description missing*).
+  Putting the text directly in obs keeps it available in both sync and async modes.
+- `"task": spaces.Text(max_length=512)` was also added to `observation_space`.
+### 1-3. horizon uses the official helper `get_task_horizon`
+```python
+from robocasa.utils.dataset_registry_utils import get_task_horizon
+...
+self._max_episode_steps = int(get_task_horizon(task))
+```
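+- Same result as the previous direct lookup of `meta_info[task]['horizon']`, but it now follows the same code path as the official function.
+- `lerobot_eval.rollout()` reads `env.call("_max_episode_steps")[0]`, so
+  a separate VectorEnv is created per task so that **a batch only contains tasks with the same horizon** (existing structure kept).
+### 1-4. New `_resolve_task_list()`: handles benchmark / single task / multiple tasks
+Reproduces the following logic from the official `run_eval.py`: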
+```python
+all_env_names = []
+for task_set in task_set_list:
+    all_env_names += TASK_SET_REGISTRY[task_set]
+all_env_names = set(all_env_names)
+for env_name in all_env_names:
+    config = SimulationConfig(env_name=f"robocasa/{env_name}", split=split, ...)
+```
+`make_env` after the change:
+- A benchmark key (`atomic_seen`, `composite_unseen`, `pretrain50`, ...) is expanded into its sub-task list.
+- A single task name is used as-is.
+- Multiple values are accepted in space-separated, comma-separated, or list form.
+  - `--env.task=atomic_seen composite_unseen composite_seen` (draccus list)
+  - `--env.task="atomic_seen,composite_unseen"` (comma-separated)
+  - `--env.task=PnPCounterToCab` (single)
+### 1-5. An explicit `cfg.split` takes precedence
+**Before**
+```python
+if task_name in combined_tasks:
+    task_names = combined_tasks[task_name]
+    gym_kwargs["split"] = "target" if task_name in TARGET_TASKS else "pretrain"
+    # ← forcibly overwritten even when the user passes --env.split=pretrain
+```
+**After**
+```python
+if item in TARGET_TASKS:
+    split = explicit_split or "target"   # use the explicit value if given
+elif item in PRETRAINING_TASKS:
+    split = explicit_split or "pretrain"
+else:
+    pairs.append((item, explicit_split)) # a single task is kept as-is
+```
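+Combinations such as `--env.task=atomic_seen --env.split=pretrain` now behave as intended
+(the official `run_eval.py` also takes `--task_set` and `--split` as independent arguments).
+### 1-6. Manual GL context handling is preserved (but wrapped in try/except)
+`gl_ctx.free()` in `reset()` and `make_current()` in `step()` look like **a workaround hack of unknown purpose**, but
+they may have been added deliberately for the user's environment, so they are **preserved**.
+They are now wrapped in `try/except` so they do not break in other environments.
+### 1-7. Other cleanup
+- `convert_state` output is explicitly cast to `float32` (safe for lerobot tensor conversion)
+- `convert_action` uses `np.asarray` so it also accepts list/tuple
+- In `_format_raw_obs`, `"video." in k` → `k.startswith("video.")` (avoids false matches)
+- The info dict gets `task` and `task_description` on every step (fallback for lerobot's `add_envs_task`)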
+---
+## 2. Summary of `configs.py` changes
+```python
+@EnvConfig.register_subclass("robocasa")
+@dataclass
+class RoboCasaEnv(HubEnvConfig):
+    hub_path: str = "Whalswp/RoboCasa_Env"
+    # ★ list allowed: multiple task_sets at once, like the official run_eval
+    task: str | list[str] | None = None
+    fps: int = 20                          # ★ new (env.py references cfg.fps)
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    camera_name: str = "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"
+    observation_height: int = 256
+    observation_width: int = 256
+    split: str | None = None               # `pretrain` | `target` | `all` | None
+```
+`features` / `features_map` / `hub_path` are unchanged, which keeps the STATE_ACTION_SPEC contract.
+---
+## 3. Official GR00T eval ↔ modified lerobot-eval mapping
+| Official argument (`run_eval.py`) | lerobot-eval argument | Notes |
+|--------------------------|-------------------|------|
+| `--model_path` | `--policy.path` | lerobot must be able to load it as a `PreTrainedPolicy` |
+| `--task_set atomic_seen composite_unseen` | `--env.task=atomic_seen composite_unseen` | Equivalent after this change |
+| `--split pretrain` | `--env.split=pretrain` | Fixed so the explicit value takes precedence |
+| `--n_episodes 50` | `--eval.n_episodes=50` | Same |
+| `--n_envs 5` | `--eval.batch_size=5` | Same |
+| `--video_dir <path>` | `--output_dir <path>/videos` | lerobot writes `output_dir/videos/{task}_{id}/eval_episode_*.mp4` automatically |
+| `--n_action_steps 16` | (n/a) | lerobot policies emit a single step; lerobot policies such as ACT handle chunking internally |
+The GR00T policy itself outputs chunks (16 steps), which differs from lerobot's single-step `select_action` flow.
+That belongs to the policy adapter, so it is **out of scope for this change** (the env is only responsible for the 12-dim single-step contract between policy and environment).
+---
+## 4. Usage examples
+```bash
+# Single task (sanity check)
+lerobot-eval \
+  --policy.path=BrunoM42/act_base-robocasa_target_PickPlaceCounterToCabinet \
+  --env.type=robocasa \
+  --env.task=PickPlaceCounterToCabinet \
+  --eval.batch_size=5 --eval.n_episodes=5 \
+  --policy.device=cuda --trust_remote_code=true \
+  --env.split=pretrain \
+  --output_dir /home/seonho/clvla/benchmarks/robocasa365/bench_outputs
+# Multiple benchmarks at once (= official --task_set atomic_seen composite_unseen composite_seen)
+lerobot-eval \
+  --policy.path=<...> \
+  --env.type=robocasa \
+  --env.task="atomic_seen composite_unseen composite_seen" \
+  --eval.batch_size=5 --eval.n_episodes=50 \
+  --policy.device=cuda --trust_remote_code=true \
+  --env.split=pretrain \
+  --output_dir /home/seonho/clvla/benchmarks/robocasa365/bench_outputs
+```
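+Because the structure is `out[task][0] = VectorEnv(...)`, `per_group` in `eval_info.json` holds
+the **per-task success rate** directly and `overall` is the average over everything, which is
+easy to compare against the task_set averages produced by the official `get_eval_stats.py`.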
+---
+## 5. What was intentionally left unchanged
+- **State 16-dim / Action 12-dim concat contract**: `STATE_ACTION_SPEC.md` is kept
+- Index/key definitions of `convert_state` and `convert_action`
+- `hub_path`, `features`, `features_map`
+- Camera/action box shapes in `_create_obs_and_action_space`
+- The `gl_ctx.free()` / `make_current()` calls (purpose unknown; preserved to be safe)
+## 6. Limitations / follow-up work
+- Using the GR00T policy as-is in lerobot requires a separate **policy adapter** that handles chunking and history.
+  See `lerobot_eval_gap_analysis.md` §3.C. This change covers the environment side only.
+- A small script that converts lerobot's `eval_info.json` into the official `evals/<split>/<env>/stats.json` tree
+  would allow `gr00t/eval/get_eval_stats.py` to be reused unchanged
+  (`lerobot_eval_gap_analysis.md` §3.D).
+- When using the Hub (`Whalswp/RoboCasa_Env`), **these local changes must be pushed to the hub** for lerobot's
+  `trust_remote_code` path to pick up the new version. When importing locally, they apply directly.
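
Reviewer note on CHANGES.md §1-4 and §1-5: the task and split resolution described there can be sketched roughly as below. This is a minimal illustration and not the code shipped in `env.py`. The two registries are hypothetical stand-ins for the real `TARGET_TASKS` / `PRETRAINING_TASKS` mappings in robocasa, the `Demo*` task names are placeholders, and `split_task_arg` / `resolve_task_list` are names invented for the example.

```python
# Hypothetical stand-in registries (benchmark key -> sub-task names).
# The real mappings come from robocasa; these entries are placeholders.
TARGET_TASKS = {"atomic_seen": ["DemoTaskA", "DemoTaskB"]}
PRETRAINING_TASKS = {"pretrain50": ["DemoTaskB", "DemoTaskC"]}


def split_task_arg(task):
    """Flatten a str, a comma/space-separated str, or a list into task names."""
    if task is None:
        return []
    items = [task] if isinstance(task, str) else list(task)
    names = []
    for item in items:
        # Treat commas like whitespace so both separators are accepted.
        names.extend(str(item).replace(",", " ").split())
    return names


def resolve_task_list(task, explicit_split=None):
    """Return ordered, de-duplicated (task_name, split) pairs."""
    pairs = []
    for item in split_task_arg(task):
        if item in TARGET_TASKS:
            split = explicit_split or "target"  # explicit split wins
            pairs.extend((name, split) for name in TARGET_TASKS[item])
        elif item in PRETRAINING_TASKS:
            split = explicit_split or "pretrain"
            pairs.extend((name, split) for name in PRETRAINING_TASKS[item])
        else:
            # Not a benchmark key: treat as a single task name.
            pairs.append((item, explicit_split))
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(pairs))


print(resolve_task_list("atomic_seen", explicit_split="pretrain"))
print(resolve_task_list(["atomic_seen", "PnPCounterToCab"]))
```

Unlike the `set(all_env_names)` in the official snippet, this sketch de-duplicates while preserving order, so the per-task VectorEnvs would be created in a reproducible sequence.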

STATE_ACTION_SPEC.md ADDED

	@@ -0,0 +1,83 @@

+# RoboCasa State / Action specification
+> Source files: `env.py`, `gym_wrapper.py` (`PandaOmronKeyConverter`), `robosuite/controllers/parts/arm/osc.py`, `robosuite/controllers/config/robots/default_pandaomron.json`
+---
+## State (16 dimensions total)
+Concatenated according to `env.py: convert_state()`.
+| Index | Dims | Key | absolute / relative | Representation |
+|--------|------|----|---------------------|------|
+| 0~2 | 3 | `state.base_position` | **absolute** | xyz |
+| 3~6 | 4 | `state.base_rotation` | **absolute** | **Quaternion** (`robot0_base_quat`) |
+| 7~9 | 3 | `state.end_effector_position_relative` | **relative** (base → EE) | xyz |
+| 10~13 | 4 | `state.end_effector_rotation_relative` | **relative** (base → EE) | **Quaternion** (`robot0_base_to_eef_quat`) |
+| 14~15 | 2 | `state.gripper_qpos` | — | joint position |
+---
+## Action (12 dimensions total)
+Split according to `env.py: convert_action()`.
+| Index | Dims | Key | Description |
+|--------|------|----|------|
+| 0~3 | 4 | `action.base_motion` | Base motion (see below) |
+| 4 | 1 | `action.control_mode` | Control-mode switch (see below) |
+| 5~7 | 3 | `action.end_effector_position` | EE delta position, **in the base frame** |
+| 8~10 | 3 | `action.end_effector_rotation` | EE delta rotation, **in the base frame, axis-angle** |
+| 11 | 1 | `action.gripper_close` | Gripper close (0.5 threshold → binary) |
+### base_motion (4 dims) details
+| Index | Target | controller type | Description |
+|--------|------|-----------------|------|
+| 0~2 | `robot0_base` | `JOINT_VELOCITY` | Mobile base x velocity / y velocity / yaw velocity |
+| 3 | `robot0_torso` | `JOINT_POSITION` | Torso vertical lift joint position (≈ height) |
+### control_mode (1 dim) details
+| Value | base_mode | Behavior |
+|----|-----------|------|
+| < 0.5 | -1 | **Arm mode**: base fixed, manipulation with the arm (goal: based on `achieved`) |
+| ≥ 0.5 | +1 | **Base mode**: base moves, arm target is held (goal: based on `desired`) |
+---
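+## EE Rotation: why axis-angle is used
+The OSC controller (`osc.py`) interprets rotation input via `Rotation.from_rotvec()` → **fixed to axis-angle**.
+Reasons for using axis-angle instead of RPY (Euler angles):
+1. **No gimbal lock**: RPY hits a singularity in certain poses where two axes align and a DOF is lost. The EE rotates freely, so this is a real problem.
+2. **Natural for delta control**: "rotate by θ about this axis" is intuitive and interpolates smoothly. RPY deltas are hard to compose because of their order dependence (roll→pitch→yaw).
+3. **Magnitude = rotation amount**: the vector norm is the rotation angle, so output clipping is natural. (`output_max: [0.5, 0.5, 0.5]` rad)
+> RPY input is not supported by the code. If needed, convert in the wrapper: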
+> ```python
+> from scipy.spatial.transform import Rotation
+> axis_angle = Rotation.from_euler('xyz', rpy).as_rotvec()
+> ```
+---
+## Simulation connection flow
+```
+policy output (12-dim)
+    ↓ convert_action()  [env.py]
+action dict (base_motion, control_mode, EE_pos, EE_rot, gripper_close)
+    ↓ unmap_action()  [gym_wrapper.py]
+{
+  robot0_right:        concat(EE_pos[3], EE_rot[3])  → OSC_POSE controller
+  robot0_right_gripper: threshold(gripper_close, 0.5) → -1 or +1
+  robot0_base:         base_motion[0:3]              → JOINT_VELOCITY controller
+  robot0_torso:        base_motion[3:4]              → JOINT_POSITION controller
+  robot0_base_mode:    threshold(control_mode, 0.5)  → -1 or +1
+}
+    ↓ env.step()  [robosuite]
+MuJoCo simulation
+```
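
Reviewer note on the action table and connection flow above: the mapping from the 12-dim policy output to per-controller inputs can be sketched in plain Python as follows. This is only an illustration of the index layout in the spec, not the actual `convert_action()` / `unmap_action()` code. `ACTION_LAYOUT`, `split_action`, and `to_controller_inputs` are names invented here, and the assumption that `gripper_close >= 0.5` maps to `+1` mirrors the `control_mode` rule rather than anything the spec states for the gripper.

```python
# Index layout copied from the Action table: key -> (start, stop).
ACTION_LAYOUT = {
    "action.base_motion": (0, 4),
    "action.control_mode": (4, 5),
    "action.end_effector_position": (5, 8),
    "action.end_effector_rotation": (8, 11),
    "action.gripper_close": (11, 12),
}


def split_action(action):
    """Split a flat 12-dim action into the keyed parts from the spec."""
    values = [float(v) for v in action]
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    return {key: values[lo:hi] for key, (lo, hi) in ACTION_LAYOUT.items()}


def threshold(value, cut=0.5):
    """Binary switch: below the cut -> -1.0, otherwise +1.0."""
    return 1.0 if value >= cut else -1.0


def to_controller_inputs(action):
    """Regroup the keyed parts per controller, as in the flow diagram."""
    parts = split_action(action)
    base = parts["action.base_motion"]
    return {
        "robot0_right": parts["action.end_effector_position"]
        + parts["action.end_effector_rotation"],
        # Assumption: the +1/-1 direction for the gripper follows control_mode.
        "robot0_right_gripper": [threshold(parts["action.gripper_close"][0])],
        "robot0_base": base[0:3],
        "robot0_torso": base[3:4],
        "robot0_base_mode": [threshold(parts["action.control_mode"][0])],
    }


example = [0.1, 0.2, 0.3, 0.4, 0.9, 1, 2, 3, 4, 5, 6, 0.2]
for name, value in to_controller_inputs(example).items():
    print(name, value)
```

Keeping the index ranges in a single table makes it easy to check them against `features` in `configs.py`, where the action shape is declared as `(12,)`.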

backup/configs.py ADDED

	@@ -0,0 +1,489 @@

+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import abc
+from dataclasses import dataclass, field, fields
+from typing import Any
+import draccus
+from lerobot.configs.types import FeatureType, PolicyFeature
+from lerobot.robots import RobotConfig
+from lerobot.teleoperators.config import TeleoperatorConfig
+from lerobot.utils.constants import (
+    ACTION,
+    LIBERO_KEY_EEF_MAT,
+    LIBERO_KEY_EEF_POS,
+    LIBERO_KEY_EEF_QUAT,
+    LIBERO_KEY_GRIPPER_QPOS,
+    LIBERO_KEY_GRIPPER_QVEL,
+    LIBERO_KEY_JOINTS_POS,
+    LIBERO_KEY_JOINTS_VEL,
+    LIBERO_KEY_PIXELS_AGENTVIEW,
+    LIBERO_KEY_PIXELS_EYE_IN_HAND,
+    OBS_ENV_STATE,
+    OBS_IMAGE,
+    OBS_IMAGES,
+    OBS_STATE,
+)
+@dataclass
+class EnvConfig(draccus.ChoiceRegistry, abc.ABC):
+    task: str | None = None
+    fps: int = 30
+    features: dict[str, PolicyFeature] = field(default_factory=dict)
+    features_map: dict[str, str] = field(default_factory=dict)
+    max_parallel_tasks: int = 1
+    disable_env_checker: bool = True
+    @property
+    def type(self) -> str:
+        return self.get_choice_name(self.__class__)
+    @property
+    def package_name(self) -> str:
+        """Package name to import if environment not found in gym registry"""
+        return f"gym_{self.type}"
+    @property
+    def gym_id(self) -> str:
+        """ID string used in gym.make() to instantiate the environment"""
+        return f"{self.package_name}/{self.task}"
+    @property
+    @abc.abstractmethod
+    def gym_kwargs(self) -> dict:
+        raise NotImplementedError()
+@dataclass
+class HubEnvConfig(EnvConfig):
+    """Base class for environments that delegate creation to a hub-hosted make_env.
+    Hub environments download and execute remote code from the HF Hub.
+    The hub_path points to a repository containing an env.py with a make_env function.
+    """
+    hub_path: str | None = None  # required: e.g., "username/repo" or "username/repo@branch:file.py"
+    @property
+    def gym_kwargs(self) -> dict:
+        # Not used for hub environments - the hub's make_env handles everything
+        return {}
+@EnvConfig.register_subclass("aloha")
+@dataclass
+class AlohaEnv(EnvConfig):
+    task: str | None = "AlohaInsertion-v0"
+    fps: int = 50
+    episode_length: int = 400
+    obs_type: str = "pixels_agent_pos"
+    observation_height: int = 480
+    observation_width: int = 640
+    render_mode: str = "rgb_array"
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(14,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            "agent_pos": OBS_STATE,
+            "top": f"{OBS_IMAGE}.top",
+            "pixels/top": f"{OBS_IMAGES}.top",
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features["top"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "pixels_agent_pos":
+            self.features["agent_pos"] = PolicyFeature(type=FeatureType.STATE, shape=(14,))
+            self.features["pixels/top"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+            "max_episode_steps": self.episode_length,
+        }
+@EnvConfig.register_subclass("pusht")
+@dataclass
+class PushtEnv(EnvConfig):
+    task: str | None = "PushT-v0"
+    fps: int = 10
+    episode_length: int = 300
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    visualization_width: int = 384
+    visualization_height: int = 384
+    observation_height: int = 384
+    observation_width: int = 384
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(2,)),
+            "agent_pos": PolicyFeature(type=FeatureType.STATE, shape=(2,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            "agent_pos": OBS_STATE,
+            "environment_state": OBS_ENV_STATE,
+            "pixels": OBS_IMAGE,
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels_agent_pos":
+            self.features["pixels"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "environment_state_agent_pos":
+            self.features["environment_state"] = PolicyFeature(type=FeatureType.ENV, shape=(16,))
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+            "visualization_width": self.visualization_width,
+            "visualization_height": self.visualization_height,
+            "max_episode_steps": self.episode_length,
+        }
+@dataclass
+class ImagePreprocessingConfig:
+    crop_params_dict: dict[str, tuple[int, int, int, int]] | None = None
+    resize_size: tuple[int, int] | None = None
+@dataclass
+class RewardClassifierConfig:
+    """Configuration for reward classification."""
+    pretrained_path: str | None = None
+    success_threshold: float = 0.5
+    success_reward: float = 1.0
+@dataclass
+class InverseKinematicsConfig:
+    """Configuration for inverse kinematics processing."""
+    urdf_path: str | None = None
+    target_frame_name: str | None = None
+    end_effector_bounds: dict[str, list[float]] | None = None
+    end_effector_step_sizes: dict[str, float] | None = None
+@dataclass
+class ObservationConfig:
+    """Configuration for observation processing."""
+    add_joint_velocity_to_observation: bool = False
+    add_current_to_observation: bool = False
+    add_ee_pose_to_observation: bool = False
+    display_cameras: bool = False
+@dataclass
+class GripperConfig:
+    """Configuration for gripper control and penalties."""
+    use_gripper: bool = True
+    gripper_penalty: float = 0.0
+@dataclass
+class ResetConfig:
+    """Configuration for environment reset behavior."""
+    fixed_reset_joint_positions: Any | None = None
+    reset_time_s: float = 5.0
+    control_time_s: float = 20.0
+    terminate_on_success: bool = True
+@dataclass
+class HILSerlProcessorConfig:
+    """Configuration for environment processing pipeline."""
+    control_mode: str = "gamepad"
+    observation: ObservationConfig | None = None
+    image_preprocessing: ImagePreprocessingConfig | None = None
+    gripper: GripperConfig | None = None
+    reset: ResetConfig | None = None
+    inverse_kinematics: InverseKinematicsConfig | None = None
+    reward_classifier: RewardClassifierConfig | None = None
+    max_gripper_pos: float | None = 100.0
+@EnvConfig.register_subclass(name="gym_manipulator")
+@dataclass
+class HILSerlRobotEnvConfig(EnvConfig):
+    """Configuration for the HILSerlRobotEnv environment."""
+    robot: RobotConfig | None = None
+    teleop: TeleoperatorConfig | None = None
+    processor: HILSerlProcessorConfig = field(default_factory=HILSerlProcessorConfig)
+    name: str = "real_robot"
+    @property
+    def gym_kwargs(self) -> dict:
+        return {}
+@EnvConfig.register_subclass("libero")
+@dataclass
+class LiberoEnv(EnvConfig):
+    task: str = "libero_10"  # can also choose libero_spatial, libero_object, etc.
+    task_ids: list[int] | None = None
+    fps: int = 30
+    episode_length: int | None = None
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    camera_name: str = "agentview_image,robot0_eye_in_hand_image"
+    init_states: bool = True
+    camera_name_mapping: dict[str, str] | None = None
+    observation_height: int = 360
+    observation_width: int = 360
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            LIBERO_KEY_EEF_POS: f"{OBS_STATE}.eef_pos",
+            LIBERO_KEY_EEF_QUAT: f"{OBS_STATE}.eef_quat",
+            LIBERO_KEY_EEF_MAT: f"{OBS_STATE}.eef_mat",
+            LIBERO_KEY_GRIPPER_QPOS: f"{OBS_STATE}.gripper_qpos",
+            LIBERO_KEY_GRIPPER_QVEL: f"{OBS_STATE}.gripper_qvel",
+            LIBERO_KEY_JOINTS_POS: f"{OBS_STATE}.joint_pos",
+            LIBERO_KEY_JOINTS_VEL: f"{OBS_STATE}.joint_vel",
+            LIBERO_KEY_PIXELS_AGENTVIEW: f"{OBS_IMAGES}.image",
+            LIBERO_KEY_PIXELS_EYE_IN_HAND: f"{OBS_IMAGES}.image2",
+        }
+    )
+    control_mode: str = "relative"  # or "absolute"
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "pixels_agent_pos":
+            self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_EEF_POS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(3,),
+            )
+            self.features[LIBERO_KEY_EEF_QUAT] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(4,),
+            )
+            self.features[LIBERO_KEY_EEF_MAT] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(3, 3),
+            )
+            self.features[LIBERO_KEY_GRIPPER_QPOS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(2,),
+            )
+            self.features[LIBERO_KEY_GRIPPER_QVEL] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(2,),
+            )
+            self.features[LIBERO_KEY_JOINTS_POS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(7,),
+            )
+            self.features[LIBERO_KEY_JOINTS_VEL] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(7,),
+            )
+        else:
+            raise ValueError(f"Unsupported obs_type: {self.obs_type}")
+    @property
+    def gym_kwargs(self) -> dict:
+        kwargs: dict[str, Any] = {"obs_type": self.obs_type, "render_mode": self.render_mode}
+        if self.task_ids is not None:
+            kwargs["task_ids"] = self.task_ids
+        return kwargs
+@EnvConfig.register_subclass("metaworld")
+@dataclass
+class MetaworldEnv(EnvConfig):
+    task: str = "metaworld-push-v2"  # add all tasks
+    fps: int = 80
+    episode_length: int = 400
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    multitask_eval: bool = True
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            "action": PolicyFeature(type=FeatureType.ACTION, shape=(4,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            "action": ACTION,
+            "agent_pos": OBS_STATE,
+            "top": f"{OBS_IMAGE}",
+            "pixels/top": f"{OBS_IMAGE}",
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features["top"] = PolicyFeature(type=FeatureType.VISUAL, shape=(480, 480, 3))
+        elif self.obs_type == "pixels_agent_pos":
+            self.features["agent_pos"] = PolicyFeature(type=FeatureType.STATE, shape=(4,))
+            self.features["pixels/top"] = PolicyFeature(type=FeatureType.VISUAL, shape=(480, 480, 3))
+        else:
+            raise ValueError(f"Unsupported obs_type: {self.obs_type}")
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+        }
+@EnvConfig.register_subclass("isaaclab_arena")
+@dataclass
+class IsaaclabArenaEnv(HubEnvConfig):
+    hub_path: str = "nvidia/isaaclab-arena-envs"
+    episode_length: int = 300
+    num_envs: int = 1
+    embodiment: str | None = "gr1_pink"
+    object: str | None = "power_drill"
+    mimic: bool = False
+    teleop_device: str | None = None
+    seed: int | None = 42
+    device: str | None = "cuda:0"
+    disable_fabric: bool = False
+    enable_cameras: bool = False
+    headless: bool = False
+    enable_pinocchio: bool = True
+    environment: str | None = "gr1_microwave"
+    task: str | None = "Reach out to the microwave and open it."
+    state_dim: int = 54
+    action_dim: int = 36
+    camera_height: int = 512
+    camera_width: int = 512
+    video: bool = False
+    video_length: int = 100
+    video_interval: int = 200
+    # Comma-separated keys, e.g., "robot_joint_pos,left_eef_pos"
+    state_keys: str = "robot_joint_pos"
+    # Comma-separated keys, e.g., "robot_pov_cam_rgb,front_cam_rgb"
+    # Set to None or "" for environments without cameras
+    camera_keys: str | None = None
+    features: dict[str, PolicyFeature] = field(default_factory=dict)
+    features_map: dict[str, str] = field(default_factory=dict)
+    kwargs: dict | None = None
+    def __post_init__(self):
+        if self.kwargs:
+            # dynamically convert kwargs to fields in the dataclass
+            # NOTE! the new fields will not be seen by the dataclass repr
+            field_names = {f.name for f in fields(self)}
+            for key, value in self.kwargs.items():
+                if key not in field_names and key != "kwargs":
+                    setattr(self, key, value)
+            self.kwargs = None
+        # Set action feature
+        self.features[ACTION] = PolicyFeature(type=FeatureType.ACTION, shape=(self.action_dim,))
+        self.features_map[ACTION] = ACTION
+        # Set state feature
+        self.features[OBS_STATE] = PolicyFeature(type=FeatureType.STATE, shape=(self.state_dim,))
+        self.features_map[OBS_STATE] = OBS_STATE
+        # Add camera features for each camera key
+        if self.enable_cameras and self.camera_keys:
+            for cam_key in self.camera_keys.split(","):
+                cam_key = cam_key.strip()
+                if cam_key:
+                    self.features[cam_key] = PolicyFeature(
+                        type=FeatureType.VISUAL,
+                        shape=(self.camera_height, self.camera_width, 3),
+                    )
+                    self.features_map[cam_key] = f"{OBS_IMAGES}.{cam_key}"
+    @property
+    def gym_kwargs(self) -> dict:
+        return {}
+# ------------------------ Robocasa365 --------------------------------
+@EnvConfig.register_subclass("robocasa")
+@dataclass
+class RoboCasaEnv(HubEnvConfig):
+    hub_path: str = "Whalswp/RoboCasa_Env"
+    task: str | None = None
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    camera_name: str = "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"
+    observation_height: int = 256
+    observation_width: int = 256
+    split: str | None = None
+    # Observation & Action spec mapping for use by VLA models and the like
+    features: dict[str, PolicyFeature] = field(default_factory=lambda: {
+        ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(12,)),
+        "agent_pos": PolicyFeature(type=FeatureType.STATE, shape=(16,)),
+        "pixels/robot0_agentview_left": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+        "pixels/robot0_agentview_right": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+        "pixels/robot0_eye_in_hand": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+    })
+    features_map: dict[str, str] = field(default_factory=lambda: {
+        ACTION: ACTION,
+        "agent_pos": OBS_STATE,
+        "pixels/robot0_agentview_left": f"{OBS_IMAGES}.robot0_agentview_left",
+        "pixels/robot0_agentview_right": f"{OBS_IMAGES}.robot0_agentview_right",
+        "pixels/robot0_eye_in_hand": f"{OBS_IMAGES}.robot0_eye_in_hand",
+    })

backup/env.py ADDED

	@@ -0,0 +1,248 @@

+# env.py
+import gymnasium as gym
+from gymnasium import spaces
+import numpy as np
+from collections import defaultdict
+from collections.abc import Callable, Sequence, Mapping
+from functools import partial
+from typing import Any
+# RoboCasa-specific library imports
+from robocasa.wrappers.gym_wrapper import RoboCasaGymEnv
+from robocasa.utils.dataset_registry import ATOMIC_TASK_DATASETS, COMPOSITE_TASK_DATASETS, TARGET_TASKS, PRETRAINING_TASKS
+OBS_STATE_DIM = 16
+ACTION_DIM = 12
+ACTION_LOW = -1.0
+ACTION_HIGH = 1.0
+def convert_state(dict_state):
+    """Convert the simulator state into the form LeRobot expects."""
+    dict_state = dict_state.copy()
+    final_state = np.concatenate([
+        dict_state["state.base_position"],
+        dict_state["state.base_rotation"],
+        dict_state["state.end_effector_position_relative"],
+        dict_state["state.end_effector_rotation_relative"],
+        dict_state["state.gripper_qpos"],
+    ], axis=0)
+    return final_state
+def convert_action(action):
+    """Convert a LeRobot action into the dict form the simulator understands."""
+    action = action.copy()
+    output_action = {
+        "action.base_motion": action[0:4],
+        "action.control_mode": action[4:5],
+        "action.end_effector_position": action[5:8],
+        "action.end_effector_rotation": action[8:11],
+        "action.gripper_close": action[11:12],
+    }
+    return output_action
+def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
+    """Normalize camera names into a list."""
+    if isinstance(camera_name, str):
+        cams = [c.strip() for c in camera_name.split(",") if c.strip()]
+    elif isinstance(camera_name, (list, tuple)):
+        cams = [str(c).strip() for c in camera_name if str(c).strip()]
+    else:
+        raise TypeError(f"camera_name must be str or sequence[str], got {type(camera_name).__name__}")
+    if not cams:
+        raise ValueError("camera_name resolved to an empty list.")
+    return cams
+class RoboCasaEnv(RoboCasaGymEnv):
+    metadata = {"render_modes": ["rgb_array"], "render_fps": 20}
+    def __init__(
+        self,
+        task: str,
+        camera_name: Sequence[str] = ["robot0_agentview_left", "robot0_eye_in_hand", "robot0_agentview_right"],
+        render_mode: str = "rgb_array",
+        obs_type: str = "pixels_agent_pos",
+        observation_width: int = 256,
+        observation_height: int = 256,
+        split: str | None = None,
+        **kwargs
+    ):
+        self.obs_type = obs_type
+        self.render_mode = render_mode
+        self.split = split
+        self.task = task
+        self._task_description = ""
+        kwargs.pop("fps", None)
+        self.kwargs = kwargs
+        meta_info = {**ATOMIC_TASK_DATASETS, **COMPOSITE_TASK_DATASETS}
+        try:
+            self._max_episode_steps = meta_info[task]['horizon']
+        except KeyError:
+            raise ValueError(f"Unknown task '{task}'. Valid tasks are: {list(meta_info.keys())}")
+        super().__init__(
+            task,
+            camera_names=camera_name,
+            camera_widths=observation_width,
+            camera_heights=observation_height,
+            enable_render=(render_mode is not None),
+            split=split,
+            **kwargs
+        )
+    def _create_obs_and_action_space(self):
+        images = {}
+        for cam in self.camera_names:
+            images[cam] = spaces.Box(
+                low=0, high=255, shape=(self.camera_heights, self.camera_widths, 3), dtype=np.uint8
+            )
+        if self.obs_type == "state":
+            raise NotImplementedError("The 'state' observation type is not supported.")
+        elif self.obs_type == "pixels":
+            self.observation_space = spaces.Dict({"pixels": spaces.Dict(images)})
+        elif self.obs_type == "pixels_agent_pos":
+            self.observation_space = spaces.Dict({
+                "pixels": spaces.Dict(images),
+                "agent_pos": spaces.Box(low=-1000, high=1000, shape=(OBS_STATE_DIM,), dtype=np.float32),
+            })
+        else:
+            raise ValueError(f"Unknown obs_type: {self.obs_type}")
+        self.action_space = spaces.Box(
+            low=ACTION_LOW, high=ACTION_HIGH, shape=(int(ACTION_DIM),), dtype=np.float32
+        )
+    @property
+    def task_description(self) -> str:
+        return self._task_description
+    def reset(self, seed: int | None = None, **kwargs):
+        self.unwrapped.sim._render_context_offscreen.gl_ctx.free()
+        observation, info = super().reset(seed, **kwargs)
+        self._task_description = self.env.get_ep_meta().get("lang", self.task)
+        print(f"[RoboCasaEnv] task_description: {self._task_description!r}")
+        return self._format_raw_obs(observation), info
+    def _format_raw_obs(self, raw_obs: dict):
+        new_obs = {}
+        if self.obs_type == "pixels_agent_pos":
+            new_obs["agent_pos"] = convert_state(raw_obs)
+        new_obs["pixels"] = {}
+        for k, v in raw_obs.items():
+            if "video." in k:
+                new_obs["pixels"][k.replace("video.", "")] = v
+        return new_obs
+    def step(self, action: np.ndarray):
+        self.unwrapped.sim._render_context_offscreen.gl_ctx.make_current()
+        action_dict = convert_action(action)
+        observation, reward, done, truncated, info = super().step(action_dict)
+        new_obs = self._format_raw_obs(observation)
+        is_success = bool(info.get("success", 0))
+        terminated = done or is_success
+        info.update({"task": self.task, "done": done, "is_success": is_success})
+        if terminated:
+            info["final_info"] = {"task": self.task, "done": bool(done), "is_success": bool(is_success)}
+            self.reset()
+        return new_obs, reward, terminated, truncated, info
+    def render(self):
+        frame = super().render()
+        if frame is None:
+            return frame
+        from PIL import Image, ImageDraw, ImageFont
+        import textwrap
+        text = self._task_description or self.task
+        w = frame.shape[1]
+        try:
+            font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14)
+        except Exception:
+            font = ImageFont.load_default()
+        lines = textwrap.wrap(text, width=55)
+        line_h = 18
+        bar_h = len(lines) * line_h + 10
+        bar = Image.new("RGB", (w, bar_h), color=(30, 30, 30))
+        draw = ImageDraw.Draw(bar)
+        for i, line in enumerate(lines):
+            draw.text((8, 5 + i * line_h), line, font=font, fill=(220, 220, 220))
+        return np.concatenate([frame, np.array(bar)], axis=0)
+def _make_env_fns(task_name: str, n_envs: int, camera_names: list[str], gym_kwargs: Mapping[str, Any]):
+    def _make_env(episode_index: int, **kwargs):
+        seed = kwargs.pop("seed", episode_index)
+        return RoboCasaEnv(task=task_name, camera_name=camera_names, seed=seed, **kwargs)
+    return [partial(_make_env, i, **gym_kwargs) for i in range(n_envs)]
+# ======================================================================
+# Required LeRobot Hub entry point
+# ======================================================================
+def make_env(n_envs: int = 1, use_async_envs: bool = False, cfg=None) -> dict[str, dict[int, Any]]:
+    """
+    Main function that LeRobot calls when loading the environment from the Hub.
+    """
+    # Select the vector-env wrapper class
+    env_cls = partial(gym.vector.AsyncVectorEnv, context="spawn") if use_async_envs else gym.vector.SyncVectorEnv
+    # Extract settings (use the cfg object if present, otherwise apply defaults)
+    if cfg is not None:
+        task_name = getattr(cfg, "task", "CloseFridge")
+        fps = getattr(cfg, "fps", 20)  # extract fps
+        gym_kwargs = {
+            "obs_type": getattr(cfg, "obs_type", "pixels_agent_pos"),
+            "render_mode": getattr(cfg, "render_mode", "rgb_array"), # keep render_mode
+            "observation_width": getattr(cfg, "observation_width", 256),
+            "observation_height": getattr(cfg, "observation_height", 256),
+            "camera_name": getattr(cfg, "camera_name", "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"),
+            "split": getattr(cfg, "split", None),
+            "fps": fps,  # make sure this key argument is not dropped
+        }
+    else:
+        # Defaults when called directly without cfg
+        task_name = "CloseFridge"
+        gym_kwargs = {
+            "obs_type": "pixels_agent_pos",
+            "render_mode": "rgb_array",
+            "observation_width": 256,
+            "observation_height": 256,
+            "camera_name": "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right",
+            "split": None,
+        }
+    parsed_camera_names = _parse_camera_names(gym_kwargs.pop("camera_name"))
+    combined_tasks = {**TARGET_TASKS, **PRETRAINING_TASKS}
+    # Distinguish a benchmark key from a single task
+    if task_name in combined_tasks:
+        task_names = combined_tasks[task_name]
+        gym_kwargs["split"] = "target" if task_name in TARGET_TASKS else "pretrain"
+    else:
+        task_names = [t.strip() for t in task_name.split(",")]
+    out = defaultdict(dict)
+    # Create environments for each task
+    for task in task_names:
+        fns = _make_env_fns(
+            task_name=task,
+            n_envs=n_envs,
+            camera_names=parsed_camera_names,
+            gym_kwargs=gym_kwargs
+        )
+        out[task][0] = env_cls(fns)
+    # Return as {suite_name: {task_id: VectorEnv}}
+    #return {"robocasa": dict(out)}
+    return {suite: dict(task_map) for suite, task_map in out.items()}

configs.py CHANGED

@@ -462,14 +462,19 @@ class IsaaclabArenaEnv(HubEnvConfig):
 @dataclass
 class RoboCasaEnv(HubEnvConfig):
-    hub_path: str = "Whalswp/RoboCasa_Env"
-    task: str | None = None
     obs_type: str = "pixels_agent_pos"
     render_mode: str = "rgb_array"
     camera_name: str = "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"
     observation_height: int = 256
     observation_width: int = 256
     split: str | None = None
     # Observation & action spec mapping for use by VLA models and similar policies

 @dataclass
 class RoboCasaEnv(HubEnvConfig):
+    hub_path: str = "Whalswp/RoboCasa_Env"
+    # A single task name or a benchmark key (`atomic_seen`, `composite_unseen`, ...).
+    # A list is also accepted so that several can be passed at once, as in the official GR00T eval.
+    # Example: --env.task=atomic_seen composite_unseen composite_seen
+    task: str | list[str] | None = None
+    fps: int = 20
     obs_type: str = "pixels_agent_pos"
     render_mode: str = "rgb_array"
     camera_name: str = "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"
     observation_height: int = 256
     observation_width: int = 256
+    # `pretrain` | `target` | `all` | None (if None, inferred from task)
     split: str | None = None
     # Observation & action spec mapping for use by VLA models and similar policies

configs.py.bak ADDED

	@@ -0,0 +1,489 @@

+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import abc
+from dataclasses import dataclass, field, fields
+from typing import Any
+import draccus
+from lerobot.configs.types import FeatureType, PolicyFeature
+from lerobot.robots import RobotConfig
+from lerobot.teleoperators.config import TeleoperatorConfig
+from lerobot.utils.constants import (
+    ACTION,
+    LIBERO_KEY_EEF_MAT,
+    LIBERO_KEY_EEF_POS,
+    LIBERO_KEY_EEF_QUAT,
+    LIBERO_KEY_GRIPPER_QPOS,
+    LIBERO_KEY_GRIPPER_QVEL,
+    LIBERO_KEY_JOINTS_POS,
+    LIBERO_KEY_JOINTS_VEL,
+    LIBERO_KEY_PIXELS_AGENTVIEW,
+    LIBERO_KEY_PIXELS_EYE_IN_HAND,
+    OBS_ENV_STATE,
+    OBS_IMAGE,
+    OBS_IMAGES,
+    OBS_STATE,
+)
+@dataclass
+class EnvConfig(draccus.ChoiceRegistry, abc.ABC):
+    task: str | None = None
+    fps: int = 30
+    features: dict[str, PolicyFeature] = field(default_factory=dict)
+    features_map: dict[str, str] = field(default_factory=dict)
+    max_parallel_tasks: int = 1
+    disable_env_checker: bool = True
+    @property
+    def type(self) -> str:
+        return self.get_choice_name(self.__class__)
+    @property
+    def package_name(self) -> str:
+        """Package name to import if environment not found in gym registry"""
+        return f"gym_{self.type}"
+    @property
+    def gym_id(self) -> str:
+        """ID string used in gym.make() to instantiate the environment"""
+        return f"{self.package_name}/{self.task}"
+    @property
+    @abc.abstractmethod
+    def gym_kwargs(self) -> dict:
+        raise NotImplementedError()
+@dataclass
+class HubEnvConfig(EnvConfig):
+    """Base class for environments that delegate creation to a hub-hosted make_env.
+    Hub environments download and execute remote code from the HF Hub.
+    The hub_path points to a repository containing an env.py with a make_env function.
+    """
+    hub_path: str | None = None  # required: e.g., "username/repo" or "username/repo@branch:file.py"
+    @property
+    def gym_kwargs(self) -> dict:
+        # Not used for hub environments - the hub's make_env handles everything
+        return {}
+@EnvConfig.register_subclass("aloha")
+@dataclass
+class AlohaEnv(EnvConfig):
+    task: str | None = "AlohaInsertion-v0"
+    fps: int = 50
+    episode_length: int = 400
+    obs_type: str = "pixels_agent_pos"
+    observation_height: int = 480
+    observation_width: int = 640
+    render_mode: str = "rgb_array"
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(14,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            "agent_pos": OBS_STATE,
+            "top": f"{OBS_IMAGE}.top",
+            "pixels/top": f"{OBS_IMAGES}.top",
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features["top"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "pixels_agent_pos":
+            self.features["agent_pos"] = PolicyFeature(type=FeatureType.STATE, shape=(14,))
+            self.features["pixels/top"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+            "max_episode_steps": self.episode_length,
+        }
+@EnvConfig.register_subclass("pusht")
+@dataclass
+class PushtEnv(EnvConfig):
+    task: str | None = "PushT-v0"
+    fps: int = 10
+    episode_length: int = 300
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    visualization_width: int = 384
+    visualization_height: int = 384
+    observation_height: int = 384
+    observation_width: int = 384
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(2,)),
+            "agent_pos": PolicyFeature(type=FeatureType.STATE, shape=(2,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            "agent_pos": OBS_STATE,
+            "environment_state": OBS_ENV_STATE,
+            "pixels": OBS_IMAGE,
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels_agent_pos":
+            self.features["pixels"] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "environment_state_agent_pos":
+            self.features["environment_state"] = PolicyFeature(type=FeatureType.ENV, shape=(16,))
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+            "visualization_width": self.visualization_width,
+            "visualization_height": self.visualization_height,
+            "max_episode_steps": self.episode_length,
+        }
+@dataclass
+class ImagePreprocessingConfig:
+    crop_params_dict: dict[str, tuple[int, int, int, int]] | None = None
+    resize_size: tuple[int, int] | None = None
+@dataclass
+class RewardClassifierConfig:
+    """Configuration for reward classification."""
+    pretrained_path: str | None = None
+    success_threshold: float = 0.5
+    success_reward: float = 1.0
+@dataclass
+class InverseKinematicsConfig:
+    """Configuration for inverse kinematics processing."""
+    urdf_path: str | None = None
+    target_frame_name: str | None = None
+    end_effector_bounds: dict[str, list[float]] | None = None
+    end_effector_step_sizes: dict[str, float] | None = None
+@dataclass
+class ObservationConfig:
+    """Configuration for observation processing."""
+    add_joint_velocity_to_observation: bool = False
+    add_current_to_observation: bool = False
+    add_ee_pose_to_observation: bool = False
+    display_cameras: bool = False
+@dataclass
+class GripperConfig:
+    """Configuration for gripper control and penalties."""
+    use_gripper: bool = True
+    gripper_penalty: float = 0.0
+@dataclass
+class ResetConfig:
+    """Configuration for environment reset behavior."""
+    fixed_reset_joint_positions: Any | None = None
+    reset_time_s: float = 5.0
+    control_time_s: float = 20.0
+    terminate_on_success: bool = True
+@dataclass
+class HILSerlProcessorConfig:
+    """Configuration for environment processing pipeline."""
+    control_mode: str = "gamepad"
+    observation: ObservationConfig | None = None
+    image_preprocessing: ImagePreprocessingConfig | None = None
+    gripper: GripperConfig | None = None
+    reset: ResetConfig | None = None
+    inverse_kinematics: InverseKinematicsConfig | None = None
+    reward_classifier: RewardClassifierConfig | None = None
+    max_gripper_pos: float | None = 100.0
+@EnvConfig.register_subclass(name="gym_manipulator")
+@dataclass
+class HILSerlRobotEnvConfig(EnvConfig):
+    """Configuration for the HILSerlRobotEnv environment."""
+    robot: RobotConfig | None = None
+    teleop: TeleoperatorConfig | None = None
+    processor: HILSerlProcessorConfig = field(default_factory=HILSerlProcessorConfig)
+    name: str = "real_robot"
+    @property
+    def gym_kwargs(self) -> dict:
+        return {}
+@EnvConfig.register_subclass("libero")
+@dataclass
+class LiberoEnv(EnvConfig):
+    task: str = "libero_10"  # can also choose libero_spatial, libero_object, etc.
+    task_ids: list[int] | None = None
+    fps: int = 30
+    episode_length: int | None = None
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    camera_name: str = "agentview_image,robot0_eye_in_hand_image"
+    init_states: bool = True
+    camera_name_mapping: dict[str, str] | None = None
+    observation_height: int = 360
+    observation_width: int = 360
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(7,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            ACTION: ACTION,
+            LIBERO_KEY_EEF_POS: f"{OBS_STATE}.eef_pos",
+            LIBERO_KEY_EEF_QUAT: f"{OBS_STATE}.eef_quat",
+            LIBERO_KEY_EEF_MAT: f"{OBS_STATE}.eef_mat",
+            LIBERO_KEY_GRIPPER_QPOS: f"{OBS_STATE}.gripper_qpos",
+            LIBERO_KEY_GRIPPER_QVEL: f"{OBS_STATE}.gripper_qvel",
+            LIBERO_KEY_JOINTS_POS: f"{OBS_STATE}.joint_pos",
+            LIBERO_KEY_JOINTS_VEL: f"{OBS_STATE}.joint_vel",
+            LIBERO_KEY_PIXELS_AGENTVIEW: f"{OBS_IMAGES}.image",
+            LIBERO_KEY_PIXELS_EYE_IN_HAND: f"{OBS_IMAGES}.image2",
+        }
+    )
+    control_mode: str = "relative"  # or "absolute"
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+        elif self.obs_type == "pixels_agent_pos":
+            self.features[LIBERO_KEY_PIXELS_AGENTVIEW] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_PIXELS_EYE_IN_HAND] = PolicyFeature(
+                type=FeatureType.VISUAL, shape=(self.observation_height, self.observation_width, 3)
+            )
+            self.features[LIBERO_KEY_EEF_POS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(3,),
+            )
+            self.features[LIBERO_KEY_EEF_QUAT] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(4,),
+            )
+            self.features[LIBERO_KEY_EEF_MAT] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(3, 3),
+            )
+            self.features[LIBERO_KEY_GRIPPER_QPOS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(2,),
+            )
+            self.features[LIBERO_KEY_GRIPPER_QVEL] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(2,),
+            )
+            self.features[LIBERO_KEY_JOINTS_POS] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(7,),
+            )
+            self.features[LIBERO_KEY_JOINTS_VEL] = PolicyFeature(
+                type=FeatureType.STATE,
+                shape=(7,),
+            )
+        else:
+            raise ValueError(f"Unsupported obs_type: {self.obs_type}")
+    @property
+    def gym_kwargs(self) -> dict:
+        kwargs: dict[str, Any] = {"obs_type": self.obs_type, "render_mode": self.render_mode}
+        if self.task_ids is not None:
+            kwargs["task_ids"] = self.task_ids
+        return kwargs
+@EnvConfig.register_subclass("metaworld")
+@dataclass
+class MetaworldEnv(EnvConfig):
+    task: str = "metaworld-push-v2"  # add all tasks
+    fps: int = 80
+    episode_length: int = 400
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    multitask_eval: bool = True
+    features: dict[str, PolicyFeature] = field(
+        default_factory=lambda: {
+            "action": PolicyFeature(type=FeatureType.ACTION, shape=(4,)),
+        }
+    )
+    features_map: dict[str, str] = field(
+        default_factory=lambda: {
+            "action": ACTION,
+            "agent_pos": OBS_STATE,
+            "top": f"{OBS_IMAGE}",
+            "pixels/top": f"{OBS_IMAGE}",
+        }
+    )
+    def __post_init__(self):
+        if self.obs_type == "pixels":
+            self.features["top"] = PolicyFeature(type=FeatureType.VISUAL, shape=(480, 480, 3))
+        elif self.obs_type == "pixels_agent_pos":
+            self.features["agent_pos"] = PolicyFeature(type=FeatureType.STATE, shape=(4,))
+            self.features["pixels/top"] = PolicyFeature(type=FeatureType.VISUAL, shape=(480, 480, 3))
+        else:
+            raise ValueError(f"Unsupported obs_type: {self.obs_type}")
+    @property
+    def gym_kwargs(self) -> dict:
+        return {
+            "obs_type": self.obs_type,
+            "render_mode": self.render_mode,
+        }
+@EnvConfig.register_subclass("isaaclab_arena")
+@dataclass
+class IsaaclabArenaEnv(HubEnvConfig):
+    hub_path: str = "nvidia/isaaclab-arena-envs"
+    episode_length: int = 300
+    num_envs: int = 1
+    embodiment: str | None = "gr1_pink"
+    object: str | None = "power_drill"
+    mimic: bool = False
+    teleop_device: str | None = None
+    seed: int | None = 42
+    device: str | None = "cuda:0"
+    disable_fabric: bool = False
+    enable_cameras: bool = False
+    headless: bool = False
+    enable_pinocchio: bool = True
+    environment: str | None = "gr1_microwave"
+    task: str | None = "Reach out to the microwave and open it."
+    state_dim: int = 54
+    action_dim: int = 36
+    camera_height: int = 512
+    camera_width: int = 512
+    video: bool = False
+    video_length: int = 100
+    video_interval: int = 200
+    # Comma-separated keys, e.g., "robot_joint_pos,left_eef_pos"
+    state_keys: str = "robot_joint_pos"
+    # Comma-separated keys, e.g., "robot_pov_cam_rgb,front_cam_rgb"
+    # Set to None or "" for environments without cameras
+    camera_keys: str | None = None
+    features: dict[str, PolicyFeature] = field(default_factory=dict)
+    features_map: dict[str, str] = field(default_factory=dict)
+    kwargs: dict | None = None
+    def __post_init__(self):
+        if self.kwargs:
+            # dynamically convert kwargs to fields in the dataclass
+            # NOTE! the new fields will not be seen by the dataclass repr
+            field_names = {f.name for f in fields(self)}
+            for key, value in self.kwargs.items():
+                if key not in field_names and key != "kwargs":
+                    setattr(self, key, value)
+            self.kwargs = None
+        # Set action feature
+        self.features[ACTION] = PolicyFeature(type=FeatureType.ACTION, shape=(self.action_dim,))
+        self.features_map[ACTION] = ACTION
+        # Set state feature
+        self.features[OBS_STATE] = PolicyFeature(type=FeatureType.STATE, shape=(self.state_dim,))
+        self.features_map[OBS_STATE] = OBS_STATE
+        # Add camera features for each camera key
+        if self.enable_cameras and self.camera_keys:
+            for cam_key in self.camera_keys.split(","):
+                cam_key = cam_key.strip()
+                if cam_key:
+                    self.features[cam_key] = PolicyFeature(
+                        type=FeatureType.VISUAL,
+                        shape=(self.camera_height, self.camera_width, 3),
+                    )
+                    self.features_map[cam_key] = f"{OBS_IMAGES}.{cam_key}"
+    @property
+    def gym_kwargs(self) -> dict:
+        return {}
+# ------------------------ Robocasa365 --------------------------------
+@EnvConfig.register_subclass("robocasa")
+@dataclass
+class RoboCasaEnv(HubEnvConfig):
+    hub_path: str = "Whalswp/RoboCasa_Env"
+    task: str | None = None
+    obs_type: str = "pixels_agent_pos"
+    render_mode: str = "rgb_array"
+    camera_name: str = "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"
+    observation_height: int = 256
+    observation_width: int = 256
+    split: str | None = None
+    # Observation & action spec mapping for use by VLA models and similar policies
+    features: dict[str, PolicyFeature] = field(default_factory=lambda: {
+        ACTION: PolicyFeature(type=FeatureType.ACTION, shape=(12,)),
+        "agent_pos": PolicyFeature(type=FeatureType.STATE, shape=(16,)),
+        "pixels/robot0_agentview_left": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+        "pixels/robot0_agentview_right": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+        "pixels/robot0_eye_in_hand": PolicyFeature(type=FeatureType.VISUAL, shape=(256, 256, 3)),
+    })
+    features_map: dict[str, str] = field(default_factory=lambda: {
+        ACTION: ACTION,
+        "agent_pos": OBS_STATE,
+        "pixels/robot0_agentview_left": f"{OBS_IMAGES}.robot0_agentview_left",
+        "pixels/robot0_agentview_right": f"{OBS_IMAGES}.robot0_agentview_right",
+        "pixels/robot0_eye_in_hand": f"{OBS_IMAGES}.robot0_eye_in_hand",
+    })

env.py CHANGED

@@ -3,21 +3,30 @@ import gymnasium as gym
 from gymnasium import spaces
 import numpy as np
 from collections import defaultdict
-from collections.abc import Callable, Sequence, Mapping
 from functools import partial
 from typing import Any
 # RoboCasa-specific library imports
 from robocasa.wrappers.gym_wrapper import RoboCasaGymEnv
-from robocasa.utils.dataset_registry import ATOMIC_TASK_DATASETS, COMPOSITE_TASK_DATASETS, TARGET_TASKS, PRETRAINING_TASKS
 OBS_STATE_DIM = 16
 ACTION_DIM = 12
 ACTION_LOW = -1.0
 ACTION_HIGH = 1.0
-def convert_state(dict_state):
-    """Convert the simulator state into the form that LeRobot expects."""
     dict_state = dict_state.copy()
     final_state = np.concatenate([
         dict_state["state.base_position"],
@@ -26,11 +35,14 @@ def convert_state(dict_state):
         dict_state["state.end_effector_rotation_relative"],
         dict_state["state.gripper_qpos"],
     ], axis=0)
-    return final_state
 def convert_action(action):
-    """Convert a LeRobot action into the dict form that the simulator understands."""
-    action = action.copy()
     output_action = {
         "action.base_motion": action[0:4],
         "action.control_mode": action[4:5],
@@ -40,8 +52,8 @@ def convert_action(action):
     }
     return output_action
-def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
-    """Normalize camera names into a list."""
     if isinstance(camera_name, str):
         cams = [c.strip() for c in camera_name.split(",") if c.strip()]
     elif isinstance(camera_name, (list, tuple)):
@@ -52,45 +64,71 @@ def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
         raise ValueError("camera_name resolved to an empty list.")
     return cams
 class RoboCasaEnv(RoboCasaGymEnv):
     metadata = {"render_modes": ["rgb_array"], "render_fps": 20}
     def __init__(
         self,
         task: str,
-        camera_name: Sequence[str] = ["robot0_agentview_left", "robot0_eye_in_hand", "robot0_agentview_right"],
         render_mode: str = "rgb_array",
         obs_type: str = "pixels_agent_pos",
         observation_width: int = 256,
         observation_height: int = 256,
         split: str | None = None,
-        **kwargs
     ):
         self.obs_type = obs_type
         self.render_mode = render_mode
         self.split = split
         self.task = task
-        self._task_description = ""
         kwargs.pop("fps", None)
         self.kwargs = kwargs
-        meta_info = {**ATOMIC_TASK_DATASETS, **COMPOSITE_TASK_DATASETS}
         try:
-            self._max_episode_steps = meta_info[task]['horizon']
-        except KeyError:
-            raise ValueError(f"Unknown task '{task}'. Valid tasks are: {list(meta_info.keys())}")
         super().__init__(
             task,
-            camera_names=camera_name,
             camera_widths=observation_width,
             camera_heights=observation_height,
             enable_render=(render_mode is not None),
             split=split,
-            **kwargs
         )
     def _create_obs_and_action_space(self):
         images = {}
         for cam in self.camera_names:
@@ -100,11 +138,15 @@ class RoboCasaEnv(RoboCasaGymEnv):
         if self.obs_type == "state":
             raise NotImplementedError("The 'state' observation type is not supported.")
         elif self.obs_type == "pixels":
-            self.observation_space = spaces.Dict({"pixels": spaces.Dict(images)})
         elif self.obs_type == "pixels_agent_pos":
             self.observation_space = spaces.Dict({
                 "pixels": spaces.Dict(images),
                 "agent_pos": spaces.Box(low=-1000, high=1000, shape=(OBS_STATE_DIM,), dtype=np.float32),
             })
         else:
             raise ValueError(f"Unknown obs_type: {self.obs_type}")
@@ -117,38 +159,75 @@ class RoboCasaEnv(RoboCasaGymEnv):
     def task_description(self) -> str:
         return self._task_description
-    def reset(self, seed: int | None = None, **kwargs):
-        self.unwrapped.sim._render_context_offscreen.gl_ctx.free()
-        observation, info = super().reset(seed, **kwargs)
-        self._task_description = self.env.get_ep_meta().get("lang", self.task)
-        print(f"[RoboCasaEnv] task_description: {self._task_description!r}")
-        return self._format_raw_obs(observation), info
-    def _format_raw_obs(self, raw_obs: dict):
-        new_obs = {}
         if self.obs_type == "pixels_agent_pos":
             new_obs["agent_pos"] = convert_state(raw_obs)
         new_obs["pixels"] = {}
         for k, v in raw_obs.items():
-            if "video." in k:
                 new_obs["pixels"][k.replace("video.", "")] = v
         return new_obs
     def step(self, action: np.ndarray):
-        self.unwrapped.sim._render_context_offscreen.gl_ctx.make_current()
         action_dict = convert_action(action)
         observation, reward, done, truncated, info = super().step(action_dict)
         new_obs = self._format_raw_obs(observation)
         is_success = bool(info.get("success", 0))
-        terminated = done or is_success
-        info.update({"task": self.task, "done": done, "is_success": is_success})
-        if terminated:
-            info["final_info"] = {"task": self.task, "done": bool(done), "is_success": bool(is_success)}
-            self.reset()
-        return new_obs, reward, terminated, truncated, info
     def render(self):
         frame = super().render()
@@ -185,71 +264,99 @@ def _make_env_fns(task_name: str, n_envs: int, camera_names: list[str], gym_kwar
     return [partial(_make_env, i, **gym_kwargs) for i in range(n_envs)]
 # ======================================================================
 # Required LeRobot Hub entry point
 # ======================================================================
 def make_env(n_envs: int = 1, use_async_envs: bool = False, cfg=None) -> dict[str, dict[int, Any]]:
     """
-    Main function that LeRobot calls when loading the environment from the Hub.
-    """
-    # Select the vector-env wrapper class
-    env_cls = partial(gym.vector.AsyncVectorEnv, context="spawn") if use_async_envs else gym.vector.SyncVectorEnv
-    # Extract settings (use the cfg object if present, otherwise apply defaults)
     if cfg is not None:
-        task_name = getattr(cfg, "task", "CloseFridge")
-        fps = getattr(cfg, "fps", 20)  # extract fps
         gym_kwargs = {
             "obs_type": getattr(cfg, "obs_type", "pixels_agent_pos"),
-            "render_mode": getattr(cfg, "render_mode", "rgb_array"), # keep render_mode
             "observation_width": getattr(cfg, "observation_width", 256),
             "observation_height": getattr(cfg, "observation_height", 256),
-            "camera_name": getattr(cfg, "camera_name", "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"),
-            "split": getattr(cfg, "split", None),
-            "fps": fps,  # make sure this key argument is not dropped
         }
     else:
-        # Defaults when called directly without cfg
-        task_name = "CloseFridge"
         gym_kwargs = {
             "obs_type": "pixels_agent_pos",
             "render_mode": "rgb_array",
             "observation_width": 256,
             "observation_height": 256,
             "camera_name": "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right",
-            "split": None,
         }
     parsed_camera_names = _parse_camera_names(gym_kwargs.pop("camera_name"))
-    combined_tasks = {**TARGET_TASKS, **PRETRAINING_TASKS}
-    # Distinguish a benchmark key from a single task
-    parts = [t.strip() for t in task_name.split(",")]
-    if len(parts) == 1 and parts[0] in combined_tasks:
-        task_names = combined_tasks[parts[0]]
-        if gym_kwargs.get("split") is None:
-            gym_kwargs["split"] = "target" if parts[0] in TARGET_TASKS else "pretrain"
-    else:
-        task_names = []
-        for part in parts:
-            if part in combined_tasks:
-                task_names.extend(combined_tasks[part])
-            else:
-                task_names.append(part)
-    out = defaultdict(dict)
-    # Create the environments task by task
-    for task in task_names:
         fns = _make_env_fns(
             task_name=task,
             n_envs=n_envs,
             camera_names=parsed_camera_names,
-            gym_kwargs=gym_kwargs
         )
         out[task][0] = env_cls(fns)
-    # Return in the form {suite_name: {task_id: VectorEnv}}
-    #return {"robocasa": dict(out)}
-    return {suite: dict(task_map) for suite, task_map in out.items()}

 from gymnasium import spaces
 import numpy as np
 from collections import defaultdict
+from collections.abc import Sequence, Mapping
 from functools import partial
 from typing import Any
 # RoboCasa-specific imports
 from robocasa.wrappers.gym_wrapper import RoboCasaGymEnv
+from robocasa.utils.dataset_registry import (
+    ATOMIC_TASK_DATASETS,
+    COMPOSITE_TASK_DATASETS,
+    TARGET_TASKS,
+    PRETRAINING_TASKS,
+)
+from robocasa.utils.dataset_registry_utils import get_task_horizon
 OBS_STATE_DIM = 16
 ACTION_DIM = 12
 ACTION_LOW = -1.0
 ACTION_HIGH = 1.0
+def convert_state(dict_state):
+    """Convert the simulator state into the 16-dim concatenated form that LeRobot expects.
+    See STATE_ACTION_SPEC.md for the index definitions.
+    """
     dict_state = dict_state.copy()
     final_state = np.concatenate([
         dict_state["state.base_position"],
         dict_state["state.end_effector_rotation_relative"],
         dict_state["state.gripper_qpos"],
     ], axis=0)
+    return final_state.astype(np.float32)
 def convert_action(action):
+    """Convert LeRobot's 12-dim action into the dict form the simulator understands.
+    See STATE_ACTION_SPEC.md for the index definitions.
+    """
+    action = np.asarray(action).copy()
     output_action = {
         "action.base_motion": action[0:4],
         "action.control_mode": action[4:5],
     }
     return output_action
+def _parse_camera_names(camera_name) -> list[str]:
     if isinstance(camera_name, str):
         cams = [c.strip() for c in camera_name.split(",") if c.strip()]
     elif isinstance(camera_name, (list, tuple)):
         raise ValueError("camera_name resolved to an empty list.")
     return cams
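+def _normalize_task_arg(task) -> list[str]:
+    """Normalize the various forms that `--env.task=atomic_seen composite_unseen ...`
+    can arrive in into a list[str].
+    - A list already parsed by draccus: each element is normalized recursively
+    - A single string: split on whitespace or commas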
+    """
+    if task is None:
+        raise ValueError("task is required")
+    if isinstance(task, (list, tuple)):
+        items = []
+        for t in task:
+            items.extend(_normalize_task_arg(t))
+        return items
+    s = str(task).strip()
+    if not s:
+        return []
+    # Accept both whitespace and commas as separators
+    parts = []
+    for chunk in s.replace(",", " ").split():
+        if chunk:
+            parts.append(chunk)
+    return parts
 class RoboCasaEnv(RoboCasaGymEnv):
     metadata = {"render_modes": ["rgb_array"], "render_fps": 20}
     def __init__(
         self,
         task: str,
+        camera_name: Sequence[str] = ("robot0_agentview_left", "robot0_eye_in_hand", "robot0_agentview_right"),
         render_mode: str = "rgb_array",
         obs_type: str = "pixels_agent_pos",
         observation_width: int = 256,
         observation_height: int = 256,
         split: str | None = None,
+        **kwargs,
     ):
         self.obs_type = obs_type
         self.render_mode = render_mode
         self.split = split
         self.task = task
+        self._task_description: str = task
         kwargs.pop("fps", None)
         self.kwargs = kwargs
+        # Use the official helper for the horizon; unregistered tasks raise an explicit error.
         try:
+            self._max_episode_steps = int(get_task_horizon(task))
+        except Exception as e:
+            valid = list({**ATOMIC_TASK_DATASETS, **COMPOSITE_TASK_DATASETS}.keys())
+            raise ValueError(f"Unknown task '{task}'. Valid tasks: {valid[:10]}... ({len(valid)} total)") from e
         super().__init__(
             task,
+            camera_names=list(camera_name),
             camera_widths=observation_width,
             camera_heights=observation_height,
             enable_render=(render_mode is not None),
             split=split,
+            **kwargs,
         )
     def _create_obs_and_action_space(self):
         images = {}
         for cam in self.camera_names:
         if self.obs_type == "state":
             raise NotImplementedError("The 'state' observation type is not supported.")
         elif self.obs_type == "pixels":
+            self.observation_space = spaces.Dict({
+                "pixels": spaces.Dict(images),
+                "task": spaces.Text(max_length=512),
+            })
         elif self.obs_type == "pixels_agent_pos":
             self.observation_space = spaces.Dict({
                 "pixels": spaces.Dict(images),
                 "agent_pos": spaces.Box(low=-1000, high=1000, shape=(OBS_STATE_DIM,), dtype=np.float32),
+                "task": spaces.Text(max_length=512),
             })
         else:
             raise ValueError(f"Unknown obs_type: {self.obs_type}")
     def task_description(self) -> str:
         return self._task_description
+    def _format_raw_obs(self, raw_obs: dict) -> dict:
+        new_obs: dict[str, Any] = {}
         if self.obs_type == "pixels_agent_pos":
             new_obs["agent_pos"] = convert_state(raw_obs)
         new_obs["pixels"] = {}
         for k, v in raw_obs.items():
+            if k.startswith("video."):
                 new_obs["pixels"][k.replace("video.", "")] = v
+        # Language conditioning: expose it directly in obs so it also survives AsyncVectorEnv.
+        # Prefer the annotation that RoboCasaGymEnv fills into obs, when present.
+        lang = raw_obs.get("annotation.human.task_description")
+        if not lang:
+            lang = self._task_description or self.task
+        new_obs["task"] = str(lang)
+        self._task_description = str(lang)
         return new_obs
+    def reset(self, seed: int | None = None, **kwargs):
+        # Workaround for setups where the mujoco offscreen GL context goes stale
+        # at reset time. (Kept because the user's environment needs it.)
+        try:
+            self.unwrapped.sim._render_context_offscreen.gl_ctx.free()
+        except Exception:
+            pass
+        observation, info = super().reset(seed=seed, **kwargs)
+        # The ep meta "lang" field gives a richer sentence, so use it when available
+        try:
+            ep_lang = self.env.get_ep_meta().get("lang", None)
+            if ep_lang:
+                self._task_description = str(ep_lang)
+        except Exception:
+            pass
+        formatted = self._format_raw_obs(observation)
+        info = dict(info or {})
+        info["task"] = self.task
+        info["task_description"] = self._task_description
+        return formatted, info
     def step(self, action: np.ndarray):
+        try:
+            self.unwrapped.sim._render_context_offscreen.gl_ctx.make_current()
+        except Exception:
+            pass
         action_dict = convert_action(action)
         observation, reward, done, truncated, info = super().step(action_dict)
         new_obs = self._format_raw_obs(observation)
         is_success = bool(info.get("success", 0))
+        # Success is OR-ed into `terminated` below, so a successful episode ends right away.
+        # gymnasium's VectorEnv autoreset then reacts to terminated/truncated and builds final_info.
+        terminated = bool(done) or is_success
+        info = dict(info or {})
+        info.update({
+            "task": self.task,
+            "task_description": self._task_description,
+            "is_success": is_success,
+            "success": is_success,
+            "done": bool(done),
+        })
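+        # NOTE: the wrapper's own self.reset() call has been removed.
+        #   - gymnasium 0.29+ VectorEnv resets automatically on terminated/truncated and
+        #     fills in final_info. If the wrapper resets once more, the first obs of the
+        #     new episode overwrites the final obs, and final_info["is_success"] in the
+        #     lerobot rollout becomes inconsistent.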
+        return new_obs, float(reward), terminated, truncated, info
     def render(self):
         frame = super().render()
     return [partial(_make_env, i, **gym_kwargs) for i in range(n_envs)]
+def _resolve_task_list(task_arg, explicit_split: str | None):
+    """Build the list of (task_name, split) pairs to launch, based on the `task`
+    argument (string or list) and the split the user set explicitly.
+    As in the official run_eval.py:
+      - Benchmark keys (`atomic_seen`, `composite_unseen`, ...) are expanded.
+      - If explicit_split is given, it takes priority.
+      - Otherwise the split is inferred from the TARGET_TASKS / PRETRAINING_TASKS registries.
+      - A single task name is used as is.
+    """
+    items = _normalize_task_arg(task_arg)
+    pairs: list[tuple[str, str | None]] = []
+    for item in items:
+        if item in TARGET_TASKS:
+            split = explicit_split or "target"
+            for sub in TARGET_TASKS[item]:
+                pairs.append((sub, split))
+        elif item in PRETRAINING_TASKS:
+            split = explicit_split or "pretrain"
+            for sub in PRETRAINING_TASKS[item]:
+                pairs.append((sub, split))
+        else:
+            # Single task name
+            pairs.append((item, explicit_split))
+    # Remove duplicates (order preserved)
+    seen = set()
+    uniq: list[tuple[str, str | None]] = []
+    for p in pairs:
+        if p in seen:
+            continue
+        seen.add(p)
+        uniq.append(p)
+    return uniq
 # ======================================================================
 # Required LeRobot Hub entry point
 # ======================================================================
 def make_env(n_envs: int = 1, use_async_envs: bool = False, cfg=None) -> dict[str, dict[int, Any]]:
+    """Main function that LeRobot calls when it loads the environment from the Hub.
+    Same as the official GR00T eval (`Isaac-GR00T/scripts/run_eval.py`):
+      - Benchmark keys (`atomic_seen`, `composite_unseen`, ...) are expanded into sub-task lists.
+      - If `cfg.split` is set, it takes priority.
+      - Each sub-task gets its own VectorEnv with its own horizon.
     """
+    env_cls = (
+        partial(gym.vector.AsyncVectorEnv, context="spawn") if use_async_envs else gym.vector.SyncVectorEnv
+    )
     if cfg is not None:
+        task_arg = getattr(cfg, "task", None)
+        explicit_split = getattr(cfg, "split", None)
         gym_kwargs = {
             "obs_type": getattr(cfg, "obs_type", "pixels_agent_pos"),
+            "render_mode": getattr(cfg, "render_mode", "rgb_array"),
             "observation_width": getattr(cfg, "observation_width", 256),
             "observation_height": getattr(cfg, "observation_height", 256),
+            "camera_name": getattr(
+                cfg, "camera_name",
+                "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right",
+            ),
+            "fps": getattr(cfg, "fps", 20),
         }
     else:
+        task_arg = "CloseFridge"
+        explicit_split = None
         gym_kwargs = {
             "obs_type": "pixels_agent_pos",
             "render_mode": "rgb_array",
             "observation_width": 256,
             "observation_height": 256,
             "camera_name": "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right",
+            "fps": 20,
         }
     parsed_camera_names = _parse_camera_names(gym_kwargs.pop("camera_name"))
+    task_split_pairs = _resolve_task_list(task_arg, explicit_split)
+    if not task_split_pairs:
+        raise ValueError(f"No tasks resolved from task={task_arg!r}, split={explicit_split!r}")
+    out: dict[str, dict[int, Any]] = defaultdict(dict)
+    for idx, (task, split) in enumerate(task_split_pairs):
+        per_task_kwargs = dict(gym_kwargs)
+        per_task_kwargs["split"] = split
         fns = _make_env_fns(
             task_name=task,
             n_envs=n_envs,
             camera_names=parsed_camera_names,
+            gym_kwargs=per_task_kwargs,
         )
+        # `{suite: {task_id: VectorEnv}}` structure: lerobot_eval aggregates per group/task,
+        # so the task name itself is used as the suite key to expose per-task SR directly.
         out[task][0] = env_cls(fns)
+    return {suite: dict(task_map) for suite, task_map in out.items()}
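
The task-resolution rules that `_normalize_task_arg` and `_resolve_task_list` implement can be tried in isolation. The following is a minimal sketch, assuming stand-in registries: the real `TARGET_TASKS` and `PRETRAINING_TASKS` come from `robocasa.utils.dataset_registry`, while the suite keys `demo_target_suite` and `demo_pretrain_suite` and the task names other than `CloseFridge` are hypothetical. The function names drop the leading underscore to mark them as illustrative copies, not the committed code.

```python
# Stand-in registries. The real ones live in robocasa.utils.dataset_registry;
# the suite keys and most task names below are hypothetical.
TARGET_TASKS = {"demo_target_suite": ["CloseFridge", "DemoTaskA"]}
PRETRAINING_TASKS = {"demo_pretrain_suite": ["DemoTaskB"]}


def normalize_task_arg(task) -> list[str]:
    """Flatten a str / list / tuple task argument into a flat list of names."""
    if task is None:
        raise ValueError("task is required")
    if isinstance(task, (list, tuple)):
        return [name for t in task for name in normalize_task_arg(t)]
    # Commas and whitespace are both accepted as separators.
    return str(task).replace(",", " ").split()


def resolve_task_list(task_arg, explicit_split=None):
    """Expand benchmark keys into (task_name, split) pairs, dropping duplicates."""
    pairs = []
    for item in normalize_task_arg(task_arg):
        if item in TARGET_TASKS:
            split = explicit_split or "target"
            pairs += [(sub, split) for sub in TARGET_TASKS[item]]
        elif item in PRETRAINING_TASKS:
            split = explicit_split or "pretrain"
            pairs += [(sub, split) for sub in PRETRAINING_TASKS[item]]
        else:
            pairs.append((item, explicit_split))  # plain task name
    return list(dict.fromkeys(pairs))  # order-preserving de-duplication


pairs = resolve_task_list("demo_target_suite, CloseFridge demo_pretrain_suite")
for task_name, split in pairs:
    print(task_name, split)
```

One consequence worth noticing: de-duplication works on `(task, split)` pairs, so `CloseFridge` shows up twice here, once with split `"target"` from the suite expansion and once with split `None` from the bare task name. Because `make_env` keys its output dict by task name only, the later pair would overwrite the earlier one in `out[task][0]`.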

env.py.bak ADDED Viewed

	@@ -0,0 +1,248 @@

+# env.py
+import gymnasium as gym
+from gymnasium import spaces
+import numpy as np
+from collections import defaultdict
+from collections.abc import Callable, Sequence, Mapping
+from functools import partial
+from typing import Any
+# RoboCasa-specific imports
+from robocasa.wrappers.gym_wrapper import RoboCasaGymEnv
+from robocasa.utils.dataset_registry import ATOMIC_TASK_DATASETS, COMPOSITE_TASK_DATASETS, TARGET_TASKS, PRETRAINING_TASKS
+OBS_STATE_DIM = 16
+ACTION_DIM = 12
+ACTION_LOW = -1.0
+ACTION_HIGH = 1.0
+def convert_state(dict_state):
+    """Convert the simulator state into the form that LeRobot expects."""
+    dict_state = dict_state.copy()
+    final_state = np.concatenate([
+        dict_state["state.base_position"],
+        dict_state["state.base_rotation"],
+        dict_state["state.end_effector_position_relative"],
+        dict_state["state.end_effector_rotation_relative"],
+        dict_state["state.gripper_qpos"],
+    ], axis=0)
+    return final_state
+def convert_action(action):
+    """Convert a LeRobot action into the dict form the simulator understands."""
+    action = action.copy()
+    output_action = {
+        "action.base_motion": action[0:4],
+        "action.control_mode": action[4:5],
+        "action.end_effector_position": action[5:8],
+        "action.end_effector_rotation": action[8:11],
+        "action.gripper_close": action[11:12],
+    }
+    return output_action
+def _parse_camera_names(camera_name: str | Sequence[str]) -> list[str]:
+    """Normalize camera names into a list."""
+    if isinstance(camera_name, str):
+        cams = [c.strip() for c in camera_name.split(",") if c.strip()]
+    elif isinstance(camera_name, (list, tuple)):
+        cams = [str(c).strip() for c in camera_name if str(c).strip()]
+    else:
+        raise TypeError(f"camera_name must be str or sequence[str], got {type(camera_name).__name__}")
+    if not cams:
+        raise ValueError("camera_name resolved to an empty list.")
+    return cams
+class RoboCasaEnv(RoboCasaGymEnv):
+    metadata = {"render_modes": ["rgb_array"], "render_fps": 20}
+    def __init__(
+        self,
+        task: str,
+        camera_name: Sequence[str] = ["robot0_agentview_left", "robot0_eye_in_hand", "robot0_agentview_right"],
+        render_mode: str = "rgb_array",
+        obs_type: str = "pixels_agent_pos",
+        observation_width: int = 256,
+        observation_height: int = 256,
+        split: str | None = None,
+        **kwargs
+    ):
+        self.obs_type = obs_type
+        self.render_mode = render_mode
+        self.split = split
+        self.task = task
+        self._task_description = ""
+        kwargs.pop("fps", None)
+        self.kwargs = kwargs
+        meta_info = {**ATOMIC_TASK_DATASETS, **COMPOSITE_TASK_DATASETS}
+        try:
+            self._max_episode_steps = meta_info[task]['horizon']
+        except KeyError:
+            raise ValueError(f"Unknown task '{task}'. Valid tasks are: {list(meta_info.keys())}")
+        super().__init__(
+            task,
+            camera_names=camera_name,
+            camera_widths=observation_width,
+            camera_heights=observation_height,
+            enable_render=(render_mode is not None),
+            split=split,
+            **kwargs
+        )
+    def _create_obs_and_action_space(self):
+        images = {}
+        for cam in self.camera_names:
+            images[cam] = spaces.Box(
+                low=0, high=255, shape=(self.camera_heights, self.camera_widths, 3), dtype=np.uint8
+            )
+        if self.obs_type == "state":
+            raise NotImplementedError("The 'state' observation type is not supported.")
+        elif self.obs_type == "pixels":
+            self.observation_space = spaces.Dict({"pixels": spaces.Dict(images)})
+        elif self.obs_type == "pixels_agent_pos":
+            self.observation_space = spaces.Dict({
+                "pixels": spaces.Dict(images),
+                "agent_pos": spaces.Box(low=-1000, high=1000, shape=(OBS_STATE_DIM,), dtype=np.float32),
+            })
+        else:
+            raise ValueError(f"Unknown obs_type: {self.obs_type}")
+        self.action_space = spaces.Box(
+            low=ACTION_LOW, high=ACTION_HIGH, shape=(int(ACTION_DIM),), dtype=np.float32
+        )
+    @property
+    def task_description(self) -> str:
+        return self._task_description
+    def reset(self, seed: int | None = None, **kwargs):
+        self.unwrapped.sim._render_context_offscreen.gl_ctx.free()
+        observation, info = super().reset(seed, **kwargs)
+        self._task_description = self.env.get_ep_meta().get("lang", self.task)
+        print(f"[RoboCasaEnv] task_description: {self._task_description!r}")
+        return self._format_raw_obs(observation), info
+    def _format_raw_obs(self, raw_obs: dict):
+        new_obs = {}
+        if self.obs_type == "pixels_agent_pos":
+            new_obs["agent_pos"] = convert_state(raw_obs)
+        new_obs["pixels"] = {}
+        for k, v in raw_obs.items():
+            if "video." in k:
+                new_obs["pixels"][k.replace("video.", "")] = v
+        return new_obs
+    def step(self, action: np.ndarray):
+        self.unwrapped.sim._render_context_offscreen.gl_ctx.make_current()
+        action_dict = convert_action(action)
+        observation, reward, done, truncated, info = super().step(action_dict)
+        new_obs = self._format_raw_obs(observation)
+        is_success = bool(info.get("success", 0))
+        terminated = done or is_success
+        info.update({"task": self.task, "done": done, "is_success": is_success})
+        if terminated:
+            info["final_info"] = {"task": self.task, "done": bool(done), "is_success": bool(is_success)}
+            self.reset()
+        return new_obs, reward, terminated, truncated, info
+    def render(self):
+        frame = super().render()
+        if frame is None:
+            return frame
+        from PIL import Image, ImageDraw, ImageFont
+        import textwrap
+        text = self._task_description or self.task
+        w = frame.shape[1]
+        try:
+            font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14)
+        except Exception:
+            font = ImageFont.load_default()
+        lines = textwrap.wrap(text, width=55)
+        line_h = 18
+        bar_h = len(lines) * line_h + 10
+        bar = Image.new("RGB", (w, bar_h), color=(30, 30, 30))
+        draw = ImageDraw.Draw(bar)
+        for i, line in enumerate(lines):
+            draw.text((8, 5 + i * line_h), line, font=font, fill=(220, 220, 220))
+        return np.concatenate([frame, np.array(bar)], axis=0)
+def _make_env_fns(task_name: str, n_envs: int, camera_names: list[str], gym_kwargs: Mapping[str, Any]):
+    def _make_env(episode_index: int, **kwargs):
+        seed = kwargs.pop("seed", episode_index)
+        return RoboCasaEnv(task=task_name, camera_name=camera_names, seed=seed, **kwargs)
+    return [partial(_make_env, i, **gym_kwargs) for i in range(n_envs)]
+# ======================================================================
+# Required LeRobot Hub entry point
+# ======================================================================
+def make_env(n_envs: int = 1, use_async_envs: bool = False, cfg=None) -> dict[str, dict[int, Any]]:
+    """
+    Main function that LeRobot calls when it loads the environment from the Hub.
+    """
+    # Select the vector-env wrapper class
+    env_cls = partial(gym.vector.AsyncVectorEnv, context="spawn") if use_async_envs else gym.vector.SyncVectorEnv
+    # Extract settings (use the cfg object if given, otherwise apply defaults)
+    if cfg is not None:
+        task_name = getattr(cfg, "task", "CloseFridge")
+        fps = getattr(cfg, "fps", 20)  # read fps from cfg
+        gym_kwargs = {
+            "obs_type": getattr(cfg, "obs_type", "pixels_agent_pos"),
+            "render_mode": getattr(cfg, "render_mode", "rgb_array"),  # keep render_mode
+            "observation_width": getattr(cfg, "observation_width", 256),
+            "observation_height": getattr(cfg, "observation_height", 256),
+            "camera_name": getattr(cfg, "camera_name", "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right"),
+            "split": getattr(cfg, "split", None),
+            "fps": fps,  # keep this key argument from being dropped
+        }
+    else:
+        # Defaults used when called directly without a cfg
+        task_name = "CloseFridge"
+        gym_kwargs = {
+            "obs_type": "pixels_agent_pos",
+            "render_mode": "rgb_array",
+            "observation_width": 256,
+            "observation_height": 256,
+            "camera_name": "robot0_agentview_left,robot0_eye_in_hand,robot0_agentview_right",
+            "split": None,
+        }
+    parsed_camera_names = _parse_camera_names(gym_kwargs.pop("camera_name"))
+    combined_tasks = {**TARGET_TASKS, **PRETRAINING_TASKS}
+    # Decide whether this is a benchmark key or a single task
+    if task_name in combined_tasks:
+        task_names = combined_tasks[task_name]
+        gym_kwargs["split"] = "target" if task_name in TARGET_TASKS else "pretrain"
+    else:
+        task_names = [t.strip() for t in task_name.split(",")]
+    out = defaultdict(dict)
+    # Create the environments task by task
+    for task in task_names:
+        fns = _make_env_fns(
+            task_name=task,
+            n_envs=n_envs,
+            camera_names=parsed_camera_names,
+            gym_kwargs=gym_kwargs
+        )
+        out[task][0] = env_cls(fns)
+    # Return in the form {suite_name: {task_id: VectorEnv}}
+    #return {"robocasa": dict(out)}
+    return {suite: dict(task_map) for suite, task_map in out.items()}
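
The 12-dim action layout used by `convert_action` in the backup file can be written down as a table of slices. The sketch below copies the slice boundaries from that function; the helper name `split_action`, the `ACTION_LAYOUT` table, and the length check are assumptions added for illustration and are not part of the commit. It uses plain Python lists so it runs without NumPy.

```python
# Slice boundaries copied from convert_action in env.py.bak.
# ACTION_LAYOUT and split_action are hypothetical names used only in this sketch.
ACTION_DIM = 12
ACTION_LAYOUT = {
    "action.base_motion": (0, 4),
    "action.control_mode": (4, 5),
    "action.end_effector_position": (5, 8),
    "action.end_effector_rotation": (8, 11),
    "action.gripper_close": (11, 12),
}


def split_action(action):
    """Split a flat 12-element action into the per-key dict the simulator takes."""
    if len(action) != ACTION_DIM:
        # The committed convert_action does not validate the length;
        # this check is an addition for the sketch.
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    return {key: list(action[lo:hi]) for key, (lo, hi) in ACTION_LAYOUT.items()}


parts = split_action(list(range(ACTION_DIM)))
for key, values in parts.items():
    print(key, values)
```

Keeping the layout in one table makes it easy to confirm that the slices are contiguous and cover all 12 entries, which is the contract that `STATE_ACTION_SPEC.md` is said to fix.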