Sushruth21 committed on
Commit c7e8ea1 · verified · 1 Parent(s): aaaafca

Upload folder using huggingface_hub
Dockerfile.simple ADDED
@@ -0,0 +1,28 @@
+ # Simple Dockerfile for Energy & Memory RAM Optimization Environment
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy project files
+ COPY pyproject.toml uv.lock ./
+ COPY . .
+
+ # Install uv if not available
+ RUN pip install uv
+
+ # Install dependencies
+ RUN uv sync --frozen --no-install-project
+
+ # Install the project itself
+ RUN uv pip install -e .
+
+ # Expose port
+ EXPOSE 8000
+
+ # Run the server
+ CMD ["uv", "run", "server"]
README.md CHANGED
@@ -1,157 +1,157 @@
- ---
- title: Energy & Memory RAM Optimization Environment
- emoji: ⚡
- colorFrom: blue
- colorTo: green
- sdk: docker
- pinned: false
- app_port: 8000
- base_path: /web
- tags:
- - openenv
- - reinforcement-learning
- - energy-optimization
- - resource-management
- ---
-
- # Energy & Memory RAM Optimization RL Environment
-
- An OpenEnv-based reinforcement learning environment for training AI agents to optimize energy consumption and RAM usage in computer systems. The environment features tasks of increasing difficulty, automated graders for task completion verification, and sophisticated reward logic.
-
- ## Features
-
- ### AI Agent Capabilities
- - **Resource Detection**: Real-time monitoring of RAM usage and energy consumption
- - **Optimization Strategies**: Multiple action types for different optimization approaches
- - **Adaptive Learning**: Agents learn to balance competing objectives (RAM vs energy efficiency)
-
- ### Task Progression
- Tasks increase in difficulty from basic resource reduction to advanced multi-objective optimization:
-
- 1. **Basic RAM Reduction**: Reduce RAM usage below 70%
- 2. **Energy Optimization**: Reduce energy consumption below 6 kWh while maintaining RAM below 75%
- 3. **Balanced Optimization**: Balance RAM below 60% and energy below 5 kWh
- 4. **Advanced Efficiency**: Achieve RAM below 50% and energy below 4 kWh
- 5. **Expert Optimization**: Master level: RAM below 40% and energy below 3 kWh
-
- ### Automated Graders
- - **Task Completion Verification**: Automatic checking of optimization targets
- - **Performance Metrics**: Efficiency scores and progress tracking
- - **Reward Validation**: Ensures fair scoring based on actual improvements
-
- ### Reward Logic
- - **Action Effectiveness**: Rewards based on actual resource reductions achieved
- - **Task Completion Bonuses**: Significant rewards for meeting task objectives
- - **Efficiency Incentives**: Bonuses for overall system optimization
- - **Penalty System**: Penalties for aggressive actions that may cause system instability
-
- ## Quick Start
-
- ### Installation
- ```bash
- # Install dependencies
- pip install -r requirements.txt
-
- # Or using uv (recommended)
- uv sync
- ```
-
- ### Running the Environment
- ```bash
- # Start the OpenEnv server
- uv run server
-
- # The server will be available at http://localhost:8000
- ```
-
- ### Training an Agent
- ```python
- from stable_baselines3 import PPO
- from openenv.client import OpenEnvClient
-
- # Connect to the environment
- client = OpenEnvClient("http://localhost:8000")
-
- # Create and train agent
- model = PPO("MlpPolicy", client, verbose=1)
- model.learn(total_timesteps=10000)
-
- # Evaluate the trained agent
- obs = client.reset()
- total_reward = 0
- while not obs.done:
-     action, _ = model.predict(obs)
-     obs = client.step(action)
-     total_reward += obs.reward
-     print(f"Step reward: {obs.reward:.2f}, Total: {total_reward:.2f}")
- ```
-
- ## Docker
-
- ```bash
- # Build the container
- docker build -t energy-optimization-rl .
-
- # Run the environment
- docker run --rm -p 8000:8000 energy-optimization-rl
- ```
-
- ## Environment Details
-
- ### State Space
- - RAM usage percentage (0-100%)
- - Energy consumption in kWh
- - System load (0-1)
- - Current task information
- - Task completion progress
- - Efficiency scores
-
- ### Action Space
- - `reduce_ram`: Focus on RAM optimization with configurable intensity (0.0-1.0)
- - `optimize_energy`: Focus on energy reduction with configurable intensity (0.0-1.0)
- - `balance_resources`: Balanced approach to both resources
- - `monitor_system`: Gather system information and slight load reduction
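The four action types above can be exercised without the server; the sketch below is a hypothetical rule-based baseline that uses plain dicts in place of `EnergyOptimizationAction`, with illustrative thresholds that are not taken from the environment.

```python
# Hypothetical rule-based baseline over the four action types listed above.
# Thresholds are illustrative only, not the environment's values.
def choose_action(ram_usage: float, energy_consumption: float) -> dict:
    if ram_usage > 75.0:
        return {"action_type": "reduce_ram", "intensity": 0.8}
    if energy_consumption > 6.0:
        return {"action_type": "optimize_energy", "intensity": 0.7}
    if ram_usage > 60.0 and energy_consumption > 5.0:
        return {"action_type": "balance_resources", "intensity": 0.5}
    # Nothing urgent: gather information at low intensity.
    return {"action_type": "monitor_system", "intensity": 0.3}
```

A policy like this is only a baseline; the point of the environment is for a trained agent to beat it.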
-
- ### Reward Structure
- - Base rewards for resource reductions
- - Task completion bonuses (difficulty × 10 points)
- - Efficiency improvement bonuses
- - Penalties for system instability from aggressive actions
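Combined, these components might look like the sketch below. Only the completion bonus (difficulty × 10) is stated above; the base coefficients and the instability penalty value are illustrative assumptions.

```python
def sketch_reward(ram_reduction: float, energy_reduction: float,
                  task_completed: bool, difficulty: int,
                  intensity: float) -> float:
    # Base reward proportional to the actual resource reductions achieved
    # (unit coefficients are an assumption, not the environment's values).
    reward = ram_reduction + energy_reduction
    # Task completion bonus: difficulty x 10 points, as described above.
    if task_completed:
        reward += difficulty * 10
    # Illustrative penalty for overly aggressive actions (instability risk).
    if intensity > 0.9:
        reward -= 2.0
    return reward
```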
-
- ## API Endpoints
-
- - `POST /reset`: Reset the environment
- - `POST /step`: Execute an optimization action
- - `GET /state`: Get current environment state
- - `GET /schema`: Get action/observation schemas
- - `WS /ws`: WebSocket endpoint for persistent sessions
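The body of a `POST /step` request presumably mirrors the client's `_step_payload` shown later in this diff: an `action_type` plus an `intensity`. A minimal sketch of serializing such a payload:

```python
import json

# Payload shape taken from _step_payload in client.py: action_type + intensity.
payload = json.dumps({"action_type": "reduce_ram", "intensity": 0.8})
decoded = json.loads(payload)
```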
-
- ## Development
-
- ### Project Structure
- ```
- he_demo/
- ├── models.py                  # Action and observation definitions
- ├── server/
- │   ├── app.py                 # FastAPI server application
- │   └── he_demo_environment.py # Environment implementation
- ├── client.py                  # Example client code
- ├── inference.py               # Training and inference scripts
- ├── Dockerfile                 # Container configuration
- ├── pyproject.toml             # Project dependencies
- └── README.md                  # This file
- ```
-
- ### Adding New Tasks
- Tasks are defined in the `_create_tasks()` method of `EnergyOptimizationEnvironment`. Each task includes:
- - Name and description
- - Difficulty level
- - RAM and energy targets
- - Maximum steps allowed
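Based on the `Task` model in `models.py` later in this diff, a new entry could look like the sketch below; the name and targets are invented for illustration.

```python
# Hypothetical task entry mirroring the Task model's fields.
new_task = {
    "name": "aggressive_ram_trim",  # invented example name
    "description": "Push RAM usage below 45% quickly",
    "difficulty": 4,
    "ram_target": 45.0,
    "energy_target": 4.5,
    "max_steps": 12,
}
```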
-
- ### Customizing Reward Logic
- Modify the `_calculate_reward()` method to implement custom reward strategies based on your specific optimization goals.
-
- ## License
-
- This project is licensed under a BSD-style license. See the LICENSE file for details.
 
__init__.py CHANGED
@@ -1,17 +1,17 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """Energy & Memory RAM Optimization Environment."""
-
- from .client import EnergyOptimizationEnv
- from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task
-
- __all__ = [
-     "EnergyOptimizationAction",
-     "EnergyOptimizationObservation",
-     "Task",
-     "EnergyOptimizationEnv",
- ]
 
client.py CHANGED
@@ -1,120 +1,123 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """He Demo Environment Client."""
-
- from typing import Dict
-
- from openenv.core import EnvClient
- from openenv.core.client_types import StepResult
- from openenv.core.env_server.types import State
-
- from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task
-
-
- class EnergyOptimizationEnv(
-     EnvClient[EnergyOptimizationAction, EnergyOptimizationObservation, State]
- ):
-     """
-     Client for the Energy & Memory RAM Optimization Environment.
-
-     This client maintains a persistent WebSocket connection to the environment server,
-     enabling efficient multi-step interactions with lower latency.
-     Each client instance has its own dedicated environment session on the server.
-
-     Example:
-         >>> # Connect to a running server
-         >>> with EnergyOptimizationEnv(base_url="http://localhost:8000") as client:
-         ...     result = client.reset()
-         ...     print(f"RAM: {result.observation.ram_usage:.1f}%, Energy: {result.observation.energy_consumption:.1f} kWh")
-         ...
-         ...     result = client.step(EnergyOptimizationAction(action_type="reduce_ram", intensity=0.8))
-         ...     print(f"Task: {result.observation.current_task.name if result.observation.current_task else 'None'}")
-
-     Example with Docker:
-         >>> # Automatically start container and connect
-         >>> client = EnergyOptimizationEnv.from_docker_image("energy-optimization-env:latest")
-         >>> try:
-         ...     result = client.reset()
-         ...     result = client.step(EnergyOptimizationAction(action_type="balance_resources", intensity=0.6))
-         ... finally:
-         ...     client.close()
-     """
-
-     def _step_payload(self, action: EnergyOptimizationAction) -> Dict:
-         """
-         Convert EnergyOptimizationAction to JSON payload for step message.
-
-         Args:
-             action: EnergyOptimizationAction instance
-
-         Returns:
-             Dictionary representation suitable for JSON encoding
-         """
-         return {
-             "action_type": action.action_type,
-             "intensity": action.intensity,
-         }
-
-     def _parse_result(self, payload: Dict) -> StepResult[EnergyOptimizationObservation]:
-         """
-         Parse server response into StepResult[EnergyOptimizationObservation].
-
-         Args:
-             payload: JSON response data from server
-
-         Returns:
-             StepResult with EnergyOptimizationObservation
-         """
-         obs_data = payload.get("observation", {})
-
-         # Parse current task if present
-         current_task = None
-         if obs_data.get("current_task"):
-             task_data = obs_data["current_task"]
-             current_task = Task(
-                 name=task_data.get("name", ""),
-                 description=task_data.get("description", ""),
-                 difficulty=task_data.get("difficulty", 1),
-                 ram_target=task_data.get("ram_target", 100.0),
-                 energy_target=task_data.get("energy_target", 10.0),
-                 max_steps=task_data.get("max_steps", 10)
-             )
-
-         observation = EnergyOptimizationObservation(
-             ram_usage=obs_data.get("ram_usage", 0.0),
-             energy_consumption=obs_data.get("energy_consumption", 0.0),
-             system_load=obs_data.get("system_load", 0.0),
-             current_task=current_task,
-             tasks_completed=obs_data.get("tasks_completed", []),
-             steps_taken=obs_data.get("steps_taken", 0),
-             task_progress=obs_data.get("task_progress", 0.0),
-             efficiency_score=obs_data.get("efficiency_score", 0.0),
-             done=payload.get("done", False),
-             reward=payload.get("reward"),
-             metadata=obs_data.get("metadata", {}),
-         )
-
-         return StepResult(
-             observation=observation,
-             reward=payload.get("reward"),
-             done=payload.get("done", False),
-         )
-
-     def _parse_state(self, payload: Dict) -> State:
-         """
-         Parse server response into State object.
-
-         Args:
-             payload: JSON response from state request
-
-         Returns:
-             State object with episode_id and step_count
-         """
-         return State(
-             episode_id=payload.get("episode_id"),
-             step_count=payload.get("step_count", 0),
-         )
 
 
 
 
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """He Demo Environment Client."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task, TaskSummary
+
+
+ class EnergyOptimizationEnv(
+     EnvClient[EnergyOptimizationAction, EnergyOptimizationObservation, State]
+ ):
+     """
+     Client for the Energy & Memory RAM Optimization Environment.
+
+     This client maintains a persistent WebSocket connection to the environment server,
+     enabling efficient multi-step interactions with lower latency.
+     Each client instance has its own dedicated environment session on the server.
+
+     Example:
+         >>> # Connect to a running server
+         >>> with EnergyOptimizationEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(f"RAM: {result.observation.ram_usage:.1f}%, Energy: {result.observation.energy_consumption:.1f} kWh")
+         ...
+         ...     result = client.step(EnergyOptimizationAction(action_type="reduce_ram", intensity=0.8))
+         ...     print(f"Task: {result.observation.current_task.name if result.observation.current_task else 'None'}")
+
+     Example with Docker:
+         >>> # Automatically start container and connect
+         >>> client = EnergyOptimizationEnv.from_docker_image("energy-optimization-env:latest")
+         >>> try:
+         ...     result = client.reset()
+         ...     result = client.step(EnergyOptimizationAction(action_type="balance_resources", intensity=0.6))
+         ... finally:
+         ...     client.close()
+     """
+
+     def _step_payload(self, action: EnergyOptimizationAction) -> Dict:
+         """
+         Convert EnergyOptimizationAction to JSON payload for step message.
+
+         Args:
+             action: EnergyOptimizationAction instance
+
+         Returns:
+             Dictionary representation suitable for JSON encoding
+         """
+         return {
+             "action_type": action.action_type,
+             "intensity": action.intensity,
+         }
+
+     def _parse_result(self, payload: Dict) -> StepResult[EnergyOptimizationObservation]:
+         """
+         Parse server response into StepResult[EnergyOptimizationObservation].
+
+         Args:
+             payload: JSON response data from server
+
+         Returns:
+             StepResult with EnergyOptimizationObservation
+         """
+         obs_data = payload.get("observation", {})
+
+         # Parse current task if present
+         current_task = None
+         if obs_data.get("current_task"):
+             task_data = obs_data["current_task"]
+             current_task = TaskSummary(
+                 name=task_data.get("name", ""),
+                 description=task_data.get("description", ""),
+                 difficulty=task_data.get("difficulty", 1),
+                 ram_target=task_data.get("ram_target", 100.0),
+                 energy_target=task_data.get("energy_target", 10.0),
+                 max_steps=task_data.get("max_steps", 10),
+                 completed=task_data.get("completed", False),
+                 remaining_steps=task_data.get("remaining_steps"),
+                 progress=task_data.get("progress", 0.0)
+             )
+
+         observation = EnergyOptimizationObservation(
+             ram_usage=obs_data.get("ram_usage", 0.0),
+             energy_consumption=obs_data.get("energy_consumption", 0.0),
+             system_load=obs_data.get("system_load", 0.0),
+             current_task=current_task,
+             tasks_completed=obs_data.get("tasks_completed", []),
+             steps_taken=obs_data.get("steps_taken", 0),
+             task_progress=obs_data.get("task_progress", 0.0),
+             efficiency_score=obs_data.get("efficiency_score", 0.0),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """
+         Parse server response into State object.
+
+         Args:
+             payload: JSON response from state request
+
+         Returns:
+             State object with episode_id and step_count
+         """
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
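Note how `_parse_result` tolerates missing keys by falling back to defaults via `dict.get`; a stripped-down sketch of that same pattern on plain dicts:

```python
def parse_observation(payload: dict) -> dict:
    obs = payload.get("observation", {})
    # Absent keys become safe defaults, mirroring _parse_result above.
    return {
        "ram_usage": obs.get("ram_usage", 0.0),
        "energy_consumption": obs.get("energy_consumption", 0.0),
        "done": payload.get("done", False),
        "reward": payload.get("reward"),
    }
```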
inference.py CHANGED
@@ -5,22 +5,42 @@ This script demonstrates how an AI agent can learn to optimize energy consumption
  and RAM usage through reinforcement learning in the Energy Optimization Environment.
 
  The agent uses an LLM to make strategic decisions about resource optimization actions.
+
+ Required Environment Variables:
+ - API_BASE_URL: The API endpoint for the LLM (for Hugging Face router, use https://router.huggingface.co/v1)
+ - MODEL_NAME: The model identifier to use for inference
+ - HF_TOKEN: Your Hugging Face API key with inference permissions
+ - LOCAL_IMAGE_NAME: The name of the local image to use for the environment (optional)
+
+ Example setup:
+ export API_BASE_URL="https://router.huggingface.co/v1"
+ export MODEL_NAME="OpenAssistant/oasst-sft-1-pythia-12b"
+ export HF_TOKEN="hf_..."
+ export LOCAL_IMAGE_NAME="your-docker-image"  # Optional
  """
 
+ import asyncio
  import os
+ import subprocess
  import textwrap
  from typing import List, Optional
 
- from openai import OpenAI
+ from openai import OpenAI, OpenAIError
 
  from he_demo.client import EnergyOptimizationEnv
  from he_demo.models import EnergyOptimizationAction
 
- IMAGE_NAME = os.getenv("IMAGE_NAME")
- API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
- API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
- MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ # Environment configuration variables
+ # Default endpoint uses Hugging Face's router; set API_BASE_URL explicitly if needed.
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
+ LOCAL_SERVER_URL = os.getenv("LOCAL_SERVER_URL", "http://localhost:8000")
+
+ # Use HF_TOKEN as API key for OpenAI client
+ API_KEY = HF_TOKEN
  TASK_NAME = os.getenv("ENERGY_TASK", "energy_optimization")
  BENCHMARK = os.getenv("ENERGY_BENCHMARK", "energy_optimization")
  MAX_STEPS = 50  # More steps for complex optimization tasks
@@ -156,19 +176,60 @@ def get_model_action(
          )
          action_text = (completion.choices[0].message.content or "").strip()
          return parse_action(action_text)
+     except OpenAIError as exc:
+         error_text = str(exc)
+         print(f"[DEBUG] Model request failed: {error_text}", flush=True)
+         status_code = getattr(exc, 'status_code', None)
+
+         if status_code == 403 or "403" in error_text or "insufficient permissions" in error_text.lower():
+             raise RuntimeError(
+                 "Hugging Face authentication failed: your token does not have sufficient inference permissions. "
+                 "Use a token with inference access or switch to an active model/endpoint you are authorized for. "
+                 "If you are using the Hugging Face router, ensure HF_TOKEN has the `inference` scope and that MODEL_NAME is accessible."
+             ) from exc
+
+         return EnergyOptimizationAction(action_type="monitor_system", intensity=0.5)
      except Exception as exc:
-         print(f"[DEBUG] Model request failed: {exc}", flush=True)
+         print(f"[DEBUG] Unexpected model request failure: {exc}", flush=True)
          return EnergyOptimizationAction(action_type="monitor_system", intensity=0.5)
 
 
- def main() -> None:
-     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
-
-     env = (
-         EnergyOptimizationEnv.from_docker_image(IMAGE_NAME)
-         if IMAGE_NAME
-         else EnergyOptimizationEnv(base_url="http://localhost:8000")
-     )
+ async def main() -> None:
+     # Validate required environment variables
+     if not API_BASE_URL or API_BASE_URL == "<your-active-endpoint>":
+         raise ValueError("API_BASE_URL environment variable must be set to your active LLM endpoint")
+
+     if not MODEL_NAME or MODEL_NAME == "<your-active-model>":
+         raise ValueError("MODEL_NAME environment variable must be set to your active model identifier")
+
+     if not HF_TOKEN:
+         raise ValueError("HF_TOKEN environment variable must be set to your Hugging Face API key")
+
+     client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+
+     async def local_image_exists(image_name: str) -> bool:
+         try:
+             result = subprocess.run(
+                 ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
+                 capture_output=True,
+                 text=True,
+                 check=True,
+             )
+             return image_name in result.stdout.splitlines()
+         except Exception:
+             return False
+
+     if LOCAL_IMAGE_NAME:
+         if await local_image_exists(LOCAL_IMAGE_NAME):
+             env = await EnergyOptimizationEnv.from_docker_image(LOCAL_IMAGE_NAME)
+         else:
+             print(
+                 f"[WARN] Docker image '{LOCAL_IMAGE_NAME}' not found locally. Falling back to local server at {LOCAL_SERVER_URL}",
+                 flush=True,
+             )
+             env = EnergyOptimizationEnv(base_url=LOCAL_SERVER_URL)
+     else:
+         env = EnergyOptimizationEnv(base_url=LOCAL_SERVER_URL)
 
      history: List[str] = []
      rewards: List[float] = []
@@ -179,7 +240,7 @@ def main() -> None:
      log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
 
      try:
-         result = env.reset()
+         result = await env.reset()
          last_reward = 0.0
 
          for step in range(1, MAX_STEPS + 1):
@@ -190,7 +251,7 @@ def main() -> None:
              action = get_model_action(client, step, result.observation, last_reward, history)
 
              # Execute action
-             result = env.step(action)
+             result = await env.step(action)
              obs = result.observation
 
              reward = result.reward or 0.0
@@ -224,11 +285,11 @@ def main() -> None:
 
      finally:
          try:
-             env.close()
+             await env.close()
          except Exception as e:
              print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
          log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
 
 
  if __name__ == "__main__":
-     main()
+     asyncio.run(main())
models.py CHANGED
@@ -1,74 +1,74 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """
- Data models for the Energy & Memory RAM Optimization Environment.
-
- This environment simulates system resource optimization tasks where an AI agent
- must optimize RAM usage and energy consumption through various actions.
- """
-
- from typing import List, Optional
- from openenv.core.env_server.types import Action, Observation
- from pydantic import BaseModel, Field
-
-
- class EnergyOptimizationAction(Action):
-     """Action for the Energy & Memory RAM Optimization environment."""
-
-     action_type: str = Field(
-         ...,
-         description="Type of optimization action: 'reduce_ram', 'optimize_energy', 'balance_resources', 'monitor_system'"
-     )
-     intensity: float = Field(
-         1.0,
-         description="Intensity of the action (0.0 to 1.0), affects effectiveness and potential side effects"
-     )
-
-
- class Task(BaseModel):
-     """Represents an optimization task with difficulty and requirements."""
-
-     name: str = Field(..., description="Unique name of the task")
-     description: str = Field(..., description="Human-readable description of the task")
-     difficulty: int = Field(..., description="Difficulty level (1-5)")
-     ram_target: float = Field(..., description="Target RAM usage percentage (lower is better)")
-     energy_target: float = Field(..., description="Target energy consumption (lower is better)")
-     max_steps: int = Field(..., description="Maximum steps allowed to complete the task")
-     completed: bool = Field(default=False, description="Whether the task has been completed")
-
-     def check_completion(self, ram_usage: float, energy_consumption: float, steps_taken: int) -> bool:
-         """Check if the task is completed based on current system state."""
-         if steps_taken > self.max_steps:
-             return False
-         return ram_usage <= self.ram_target and energy_consumption <= self.energy_target
-
-
- class TaskSummary(BaseModel):
-     """Serializable task summary exposed in observations."""
-
-     name: str = Field(..., description="Task identifier")
-     description: str = Field(..., description="Task description")
-     difficulty: int = Field(..., description="Task difficulty level")
-     ram_target: float = Field(..., description="RAM usage target percentage")
-     energy_target: float = Field(..., description="Energy consumption target in kWh")
-     max_steps: int = Field(..., description="Maximum allowed steps for the task")
-     completed: bool = Field(False, description="Whether the task is completed")
-     remaining_steps: Optional[int] = Field(None, description="Remaining steps before the task deadline")
-     progress: float = Field(..., description="Estimated progress toward task completion (0-1)")
-
-
- class EnergyOptimizationObservation(Observation):
-     """Observation from the Energy & Memory RAM Optimization environment."""
-
-     ram_usage: float = Field(..., description="Current RAM usage percentage (0-100)")
-     energy_consumption: float = Field(..., description="Current energy consumption in kWh")
-     system_load: float = Field(..., description="Overall system load (0-1)")
-     current_task: Optional[TaskSummary] = Field(None, description="Current optimization task")
-     tasks_completed: List[str] = Field(default_factory=list, description="List of completed task names")
-     steps_taken: int = Field(..., description="Number of steps taken in current episode")
-     task_progress: float = Field(..., description="Progress towards current task completion (0-1)")
-     efficiency_score: float = Field(..., description="Overall efficiency score based on optimization")
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the Energy & Memory RAM Optimization Environment.
9
+
10
+ This environment simulates system resource optimization tasks where an AI agent
11
+ must optimize RAM usage and energy consumption through various actions.
12
+ """
13
+
14
+ from typing import List, Optional
15
+ from openenv.core.env_server.types import Action, Observation
16
+ from pydantic import BaseModel, Field
17
+
18
+
19
+ class EnergyOptimizationAction(Action):
20
+ """Action for the Energy & Memory RAM Optimization environment."""
21
+
22
+ action_type: str = Field(
23
+ ...,
24
+ description="Type of optimization action: 'reduce_ram', 'optimize_energy', 'balance_resources', 'monitor_system'"
25
+ )
26
+ intensity: float = Field(
27
+ 1.0,
28
+ description="Intensity of the action (0.0 to 1.0), affects effectiveness and potential side effects"
29
+ )
30
+
31
+
32
+ class Task(BaseModel):
33
+ """Represents an optimization task with difficulty and requirements."""
34
+
35
+ name: str = Field(..., description="Unique name of the task")
36
+ description: str = Field(..., description="Human-readable description of the task")
37
+ difficulty: int = Field(..., description="Difficulty level (1-5)")
38
+ ram_target: float = Field(..., description="Target RAM usage percentage (lower is better)")
39
+ energy_target: float = Field(..., description="Target energy consumption (lower is better)")
40
+ max_steps: int = Field(..., description="Maximum steps allowed to complete the task")
41
+ completed: bool = Field(default=False, description="Whether the task has been completed")
42
+
43
+ def check_completion(self, ram_usage: float, energy_consumption: float, steps_taken: int) -> bool:
44
+ """Check if the task is completed based on current system state."""
45
+ if steps_taken > self.max_steps:
46
+ return False
47
+ return ram_usage <= self.ram_target and energy_consumption <= self.energy_target
48
+
49
+
50
+ class TaskSummary(BaseModel):
51
+ """Serializable task summary exposed in observations."""
52
+
53
+ name: str = Field(..., description="Task identifier")
54
+ description: str = Field(..., description="Task description")
55
+ difficulty: int = Field(..., description="Task difficulty level")
56
+ ram_target: float = Field(..., description="RAM usage target percentage")
57
+ energy_target: float = Field(..., description="Energy consumption target in kWh")
58
+ max_steps: int = Field(..., description="Maximum allowed steps for the task")
59
+ completed: bool = Field(False, description="Whether the task is completed")
60
+ remaining_steps: Optional[int] = Field(None, description="Remaining steps before the task deadline")
61
+ progress: float = Field(..., description="Estimated progress toward task completion (0-1)")
62
+
63
+
64
+ class EnergyOptimizationObservation(Observation):
65
+ """Observation from the Energy & Memory RAM Optimization environment."""
66
+
67
+ ram_usage: float = Field(..., description="Current RAM usage percentage (0-100)")
68
+ energy_consumption: float = Field(..., description="Current energy consumption in kWh")
69
+ system_load: float = Field(..., description="Overall system load (0-1)")
70
+ current_task: Optional[TaskSummary] = Field(None, description="Current optimization task")
71
+ tasks_completed: List[str] = Field(default_factory=list, description="List of completed task names")
72
+ steps_taken: int = Field(..., description="Number of steps taken in current episode")
73
+ task_progress: float = Field(..., description="Progress towards current task completion (0-1)")
74
+ efficiency_score: float = Field(..., description="Overall efficiency score based on optimization")
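The completion rule in `Task.check_completion` can be exercised without installing `openenv` or `pydantic`; a minimal plain-Python sketch of the same thresholds (the function below is a standalone mirror, not the model method itself):

```python
def check_completion(ram_target, energy_target, max_steps,
                     ram_usage, energy_consumption, steps_taken):
    """Mirror of Task.check_completion: both targets must be met within the step budget."""
    if steps_taken > max_steps:
        return False
    return ram_usage <= ram_target and energy_consumption <= energy_target

# The basic_ram_reduction task: ram_target=70.0, energy_target=7.5, max_steps=10
print(check_completion(70.0, 7.5, 10, 68.0, 7.2, 8))   # True: both targets hit in time
print(check_completion(70.0, 7.5, 10, 68.0, 7.2, 11))  # False: over the step budget
print(check_completion(70.0, 7.5, 10, 72.0, 7.2, 8))   # False: RAM target missed
```

Note that exceeding `max_steps` only blocks completion on that check; it does not end the episode, which is handled separately in the environment's `step` method.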
openenv.yaml CHANGED
spec_version: 1
name: energy_optimization
type: space
runtime: fastapi
app: server.app:app
port: 8000
openenv_he_demo.egg-info/SOURCES.txt CHANGED
README.md
__init__.py
client.py
inference.py
models.py
pyproject.toml
test_environment.py
validate.py
./__init__.py
./client.py
./gym_wrapper.py
./inference.py
./models.py
./test_environment.py
./train_agent.py
./validate.py
openenv_he_demo.egg-info/PKG-INFO
openenv_he_demo.egg-info/SOURCES.txt
openenv_he_demo.egg-info/dependency_links.txt
openenv_he_demo.egg-info/entry_points.txt
openenv_he_demo.egg-info/requires.txt
openenv_he_demo.egg-info/top_level.txt
server/__init__.py
server/app.py
server/he_demo_environment.py
openenv_he_demo.egg-info/dependency_links.txt CHANGED
openenv_he_demo.egg-info/entry_points.txt CHANGED
[console_scripts]
server = he_demo.server.app:main
openenv_he_demo.egg-info/requires.txt CHANGED
openenv-core[core]>=0.2.2
numpy>=1.19.0
pandas>=1.3.0
gymnasium>=0.29.0
stable-baselines3>=2.0.0
torch>=2.0.0

[dev]
pytest>=8.0.0
pytest-cov>=4.0.0
openenv_he_demo.egg-info/top_level.txt CHANGED
he_demo
pyproject.toml CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-he_demo"
version = "0.1.0"
description = "He Demo environment for OpenEnv"
requires-python = ">=3.10"
dependencies = [
    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
    # To install from GitHub instead:
    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
    "openenv-core[core]>=0.2.2",
    # Environment-specific dependencies
    "numpy>=1.19.0",
    "pandas>=1.3.0",
    "gymnasium>=0.29.0",
    "stable-baselines3>=2.0.0",
    "torch>=2.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.0.0",
]

[project.scripts]
# Server entry point - enables running via: uv run --project . server
# or: python -m he_demo.server.app
server = "he_demo.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["he_demo", "he_demo.server"]
package-dir = { "he_demo" = ".", "he_demo.server" = "server" }
server/__init__.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Energy & Memory RAM Optimization environment server components."""

from .he_demo_environment import EnergyOptimizationEnvironment

__all__ = ["EnergyOptimizationEnvironment"]
server/app.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
FastAPI application for the Energy & Memory RAM Optimization Environment.

This module creates an HTTP server that exposes the EnergyOptimizationEnvironment
over HTTP and WebSocket endpoints, compatible with EnvClient.

Endpoints:
- POST /reset: Reset the environment
- POST /step: Execute an action
- GET /state: Get current environment state
- GET /schema: Get action/observation schemas
- WS /ws: WebSocket endpoint for persistent sessions

Usage:
    # Development (with auto-reload):
    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

    # Production:
    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

    # Or run directly:
    python -m server.app
"""

try:
    from openenv.core.env_server.http_server import create_app
except Exception as e:  # pragma: no cover
    raise ImportError(
        "openenv is required for the web interface. Install dependencies with 'uv sync'."
    ) from e

from he_demo.models import EnergyOptimizationAction, EnergyOptimizationObservation
from he_demo.server.he_demo_environment import EnergyOptimizationEnvironment


# Create the app with web interface and README integration
app = create_app(
    EnergyOptimizationEnvironment,
    EnergyOptimizationAction,
    EnergyOptimizationObservation,
    env_name="energy_optimization",
    max_concurrent_envs=1,  # increase to allow more concurrent WebSocket sessions
)


def main(host: str = "0.0.0.0", port: int = 8000):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m he_demo.server.app

    Args:
        host: Host address to bind to (default: "0.0.0.0")
        port: Port number to listen on (default: 8000)

    For production deployments, consider using uvicorn directly with
    multiple workers:
        uvicorn he_demo.server.app:app --workers 4
    """
    import uvicorn

    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()
    main(host=args.host, port=args.port)
server/he_demo_environment.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
Energy & Memory RAM Optimization Environment Implementation.

An RL environment for training AI agents to optimize system resources including
RAM usage and energy consumption through various optimization strategies.
"""

import random
from typing import List
from uuid import uuid4

from openenv.core.env_server.interfaces import Environment
from openenv.core.env_server.types import State

from he_demo.models import EnergyOptimizationAction, EnergyOptimizationObservation, Task, TaskSummary


class EnergyOptimizationEnvironment(Environment):
    """
    Energy & Memory RAM Optimization Environment.

    This environment simulates a computer system where an AI agent must optimize
    RAM usage and energy consumption. The agent faces tasks of increasing difficulty
    and receives rewards based on optimization efficiency.

    Tasks include:
    - Basic RAM reduction
    - Energy optimization
    - Resource balancing
    - Advanced multi-objective optimization

    The environment includes automated graders that verify task completion and
    provide detailed feedback on optimization performance.
    """

    SUPPORTS_CONCURRENT_SESSIONS: bool = True

    def __init__(self):
        """Initialize the energy optimization environment."""
        self._state = State(episode_id=str(uuid4()), step_count=0)
        self._reset_count = 0

        # System state
        self.ram_usage = 80.0  # Starting RAM usage %
        self.energy_consumption = 8.0  # Starting energy consumption in kWh
        self.system_load = 0.7  # Starting system load

        # Task management
        self.tasks = self._create_tasks()
        self.current_task_index = 0
        self.tasks_completed = []

        # Performance tracking
        self.baseline_ram = self.ram_usage
        self.baseline_energy = self.energy_consumption

    def _create_tasks(self) -> List[Task]:
        """Create tasks with increasing difficulty."""
        return [
            Task(
                name="basic_ram_reduction",
                description="Reduce RAM usage below 70%",
                difficulty=1,
                ram_target=70.0,
                energy_target=7.5,  # Slightly below the initial 8.0
                max_steps=10
            ),
            Task(
                name="energy_optimization",
                description="Reduce energy consumption below 6 kWh while maintaining RAM below 75%",
                difficulty=2,
                ram_target=75.0,
                energy_target=6.0,
                max_steps=15
            ),
            Task(
                name="balanced_optimization",
                description="Balance RAM below 60% and energy below 5 kWh",
                difficulty=3,
                ram_target=60.0,
                energy_target=5.0,
                max_steps=20
            ),
            Task(
                name="advanced_efficiency",
                description="Achieve RAM below 50% and energy below 4 kWh",
                difficulty=4,
                ram_target=50.0,
                energy_target=4.0,
                max_steps=25
            ),
            Task(
                name="expert_optimization",
                description="Master level: RAM below 40% and energy below 3 kWh",
                difficulty=5,
                ram_target=40.0,
                energy_target=3.0,
                max_steps=30
            )
        ]

    def _get_current_task(self) -> Task:
        """Get the current task, cycling through available tasks."""
        if self.current_task_index >= len(self.tasks):
            self.current_task_index = 0
        return self.tasks[self.current_task_index]

    def _calculate_reward(self, action: EnergyOptimizationAction) -> float:
        """Calculate reward based on action effectiveness and task progress."""
        base_reward = 0.0

        # Action effectiveness rewards
        if action.action_type == "reduce_ram":
            ram_reduction = min(5.0 * action.intensity, self.ram_usage * 0.1)
            self.ram_usage = max(0.0, self.ram_usage - ram_reduction)
            base_reward += ram_reduction * 0.5  # Reward for RAM reduction

            # Penalty for excessive RAM reduction (system instability)
            if action.intensity > 0.8:
                base_reward -= 2.0

        elif action.action_type == "optimize_energy":
            energy_reduction = min(1.0 * action.intensity, self.energy_consumption * 0.15)
            self.energy_consumption = max(0.0, self.energy_consumption - energy_reduction)
            base_reward += energy_reduction * 2.0  # Higher reward for energy savings

            # Penalty for aggressive energy optimization (performance impact)
            if action.intensity > 0.9:
                self.system_load = min(1.0, self.system_load + 0.1)
                base_reward -= 1.0

        elif action.action_type == "balance_resources":
            # Balanced approach: moderate improvements to both
            ram_reduction = min(2.0 * action.intensity, self.ram_usage * 0.05)
            energy_reduction = min(0.5 * action.intensity, self.energy_consumption * 0.1)

            self.ram_usage = max(0.0, self.ram_usage - ram_reduction)
            self.energy_consumption = max(0.0, self.energy_consumption - energy_reduction)

            base_reward += (ram_reduction * 0.3 + energy_reduction * 1.5)

        elif action.action_type == "monitor_system":
            # Monitoring action: small reward for gathering information
            base_reward += 0.1
            # Slight natural system load reduction from monitoring
            self.system_load = max(0.0, self.system_load - 0.02)

        # Natural system changes (simulate real system behavior)
        self._apply_system_dynamics()

        # Task completion bonus
        current_task = self._get_current_task()
        if not current_task.completed and current_task.check_completion(
            self.ram_usage, self.energy_consumption, self._state.step_count
        ):
            current_task.completed = True
            self.tasks_completed.append(current_task.name)
            base_reward += current_task.difficulty * 10.0  # Bonus for task completion
            self.current_task_index += 1  # Move to the next task

        # Efficiency bonus
        efficiency_improvement = (
            (self.baseline_ram - self.ram_usage) / self.baseline_ram +
            (self.baseline_energy - self.energy_consumption) / self.baseline_energy
        ) * 0.5
        base_reward += efficiency_improvement

        return base_reward

    def _apply_system_dynamics(self):
        """Apply natural system dynamics and external factors."""
        # Random external load changes
        if random.random() < 0.1:  # 10% chance each step
            load_change = random.uniform(-0.05, 0.05)
            self.system_load = max(0.0, min(1.0, self.system_load + load_change))

            # Load affects RAM and energy
            ram_impact = load_change * 10.0
            energy_impact = load_change * 0.5

            self.ram_usage = max(0.0, min(100.0, self.ram_usage + ram_impact))
            self.energy_consumption = max(0.0, self.energy_consumption + energy_impact)

    def _calculate_task_progress(self) -> float:
        """Calculate progress towards current task completion."""
        current_task = self._get_current_task()
        if current_task.completed:
            return 1.0

        # RAM progress (0-1 scale)
        ram_progress = max(0.0, min(1.0, (100.0 - self.ram_usage) / (100.0 - current_task.ram_target)))

        # Energy progress (0-1 scale)
        energy_range = 10.0 - current_task.energy_target  # Total possible energy reduction
        if energy_range > 0:
            energy_progress = max(0.0, min(1.0, (8.0 - self.energy_consumption) / energy_range))
        else:
            energy_progress = 1.0 if self.energy_consumption <= current_task.energy_target else 0.0

        return min(1.0, (ram_progress + energy_progress) / 2.0)

    def _calculate_efficiency_score(self) -> float:
        """Calculate overall efficiency score."""
        ram_efficiency = max(0.0, (100.0 - self.ram_usage) / 100.0)
        energy_efficiency = max(0.0, (10.0 - self.energy_consumption) / 10.0)
        return (ram_efficiency + energy_efficiency) / 2.0

    def _task_to_summary(self, task: Task, steps_taken: int) -> TaskSummary:
        """Convert a Task to a TaskSummary for observations."""
        remaining_steps = max(0, task.max_steps - steps_taken) if not task.completed else 0
        progress = self._calculate_task_progress() if not task.completed else 1.0

        return TaskSummary(
            name=task.name,
            description=task.description,
            difficulty=task.difficulty,
            ram_target=task.ram_target,
            energy_target=task.energy_target,
            max_steps=task.max_steps,
            completed=task.completed,
            remaining_steps=remaining_steps,
            progress=progress
        )

    def reset(self) -> EnergyOptimizationObservation:
        """
        Reset the environment to its initial state.

        Returns:
            EnergyOptimizationObservation with the initial system state
        """
        self._state = State(episode_id=str(uuid4()), step_count=0)
        self._reset_count += 1

        # Reset system state
        self.ram_usage = 80.0
        self.energy_consumption = 8.0
        self.system_load = 0.7

        # Reset tasks
        for task in self.tasks:
            task.completed = False
        self.current_task_index = 0
        self.tasks_completed = []

        # Reset baselines
        self.baseline_ram = self.ram_usage
        self.baseline_energy = self.energy_consumption

        current_task = self._get_current_task()

        return EnergyOptimizationObservation(
            ram_usage=self.ram_usage,
            energy_consumption=self.energy_consumption,
            system_load=self.system_load,
            current_task=self._task_to_summary(current_task, 0) if current_task else None,
            tasks_completed=self.tasks_completed.copy(),
            steps_taken=0,
            task_progress=self._calculate_task_progress(),
            efficiency_score=self._calculate_efficiency_score(),
            done=False,
            reward=0.0,
        )

    def step(self, action: EnergyOptimizationAction) -> EnergyOptimizationObservation:
        """
        Execute an optimization action in the environment.

        Args:
            action: EnergyOptimizationAction containing the optimization strategy

        Returns:
            EnergyOptimizationObservation with the updated system state and reward
        """
        self._state.step_count += 1

        # Calculate reward for the action
        reward = self._calculate_reward(action)

        # Check if the episode should end
        done = self._state.step_count >= 100 or self.current_task_index >= len(self.tasks)

        current_task = self._get_current_task()

        return EnergyOptimizationObservation(
            ram_usage=self.ram_usage,
            energy_consumption=self.energy_consumption,
            system_load=self.system_load,
            current_task=self._task_to_summary(current_task, self._state.step_count) if current_task else None,
            tasks_completed=self.tasks_completed.copy(),
            steps_taken=self._state.step_count,
            task_progress=self._calculate_task_progress(),
            efficiency_score=self._calculate_efficiency_score(),
            done=done,
            reward=reward,
            metadata={
                "action_taken": action.action_type,
                "action_intensity": action.intensity,
                "episode_step": self._state.step_count,
                "current_task_name": current_task.name if current_task else None
            },
        )

    @property
    def state(self) -> State:
        """
        Get the current environment state.

        Returns:
            Current State with episode_id and step_count
        """
        return self._state
149
+ # Monitoring action: small reward for gathering information
150
+ base_reward += 0.1
151
+ # Slight natural system load reduction from monitoring
152
+ self.system_load = max(0.0, self.system_load - 0.02)
153
+
154
+ # Natural system changes (simulate real system behavior)
155
+ self._apply_system_dynamics()
156
+
157
+ # Task completion bonus
158
+ current_task = self._get_current_task()
159
+ if not current_task.completed and current_task.check_completion(
160
+ self.ram_usage, self.energy_consumption, self._state.step_count
161
+ ):
162
+ current_task.completed = True
163
+ self.tasks_completed.append(current_task.name)
164
+ base_reward += current_task.difficulty * 10.0 # Bonus for task completion
165
+ self.current_task_index += 1 # Move to next task
166
+
167
+ # Efficiency bonus
168
+ efficiency_improvement = (
169
+ (self.baseline_ram - self.ram_usage) / self.baseline_ram +
170
+ (self.baseline_energy - self.energy_consumption) / self.baseline_energy
171
+ ) * 0.5
172
+ base_reward += efficiency_improvement
173
+
174
+ return base_reward
175
+
176
+ def _apply_system_dynamics(self):
177
+ """Apply natural system dynamics and external factors."""
178
+ # Random external load changes
179
+ if random.random() < 0.1: # 10% chance each step
180
+ load_change = random.uniform(-0.05, 0.05)
181
+ self.system_load = max(0.0, min(1.0, self.system_load + load_change))
182
+
183
+ # Load affects RAM and energy
184
+ ram_impact = load_change * 10.0
185
+ energy_impact = load_change * 0.5
186
+
187
+ self.ram_usage = max(0.0, min(100.0, self.ram_usage + ram_impact))
188
+ self.energy_consumption = max(0.0, self.energy_consumption + energy_impact)
189
+
190
+ def _calculate_task_progress(self) -> float:
191
+ """Calculate progress towards current task completion."""
192
+ current_task = self._get_current_task()
193
+ if current_task.completed:
194
+ return 1.0
195
+
196
+ # Calculate RAM progress (0-1 scale)
197
+ ram_progress = max(0.0, min(1.0, (100.0 - self.ram_usage) / (100.0 - current_task.ram_target)))
198
+
199
+ # Calculate energy progress (0-1 scale)
200
+ energy_range = 10.0 - current_task.energy_target # Total possible energy reduction
201
+ if energy_range > 0:
202
+ energy_progress = max(0.0, min(1.0, (8.0 - self.energy_consumption) / energy_range))
203
+ else:
204
+ energy_progress = 1.0 if self.energy_consumption <= current_task.energy_target else 0.0
205
+
206
+ return min(1.0, (ram_progress + energy_progress) / 2.0)
207
+
208
+ def _calculate_efficiency_score(self) -> float:
209
+ """Calculate overall efficiency score."""
210
+ ram_efficiency = max(0.0, (100.0 - self.ram_usage) / 100.0)
211
+ energy_efficiency = max(0.0, (10.0 - self.energy_consumption) / 10.0)
212
+ return (ram_efficiency + energy_efficiency) / 2.0
213
+
214
+ def _task_to_summary(self, task: Task, steps_taken: int) -> TaskSummary:
215
+ """Convert a Task to a TaskSummary for observations."""
216
+ remaining_steps = max(0, task.max_steps - steps_taken) if not task.completed else 0
217
+ progress = self._calculate_task_progress() if not task.completed else 1.0
218
+
219
+ return TaskSummary(
220
+ name=task.name,
221
+ description=task.description,
222
+ difficulty=task.difficulty,
223
+ ram_target=task.ram_target,
224
+ energy_target=task.energy_target,
225
+ max_steps=task.max_steps,
226
+ completed=task.completed,
227
+ remaining_steps=remaining_steps,
228
+ progress=progress
229
+ )
230
+
231
+ def reset(self) -> EnergyOptimizationObservation:
232
+ """
233
+ Reset the environment to initial state.
234
+
235
+ Returns:
236
+ EnergyOptimizationObservation with initial system state
237
+ """
238
+ self._state = State(episode_id=str(uuid4()), step_count=0)
239
+ self._reset_count += 1
240
+
241
+ # Reset system state
242
+ self.ram_usage = 80.0
243
+ self.energy_consumption = 8.0
244
+ self.system_load = 0.7
245
+
246
+ # Reset tasks
247
+ for task in self.tasks:
248
+ task.completed = False
249
+ self.current_task_index = 0
250
+ self.tasks_completed = []
251
+
252
+ # Reset baselines
253
+ self.baseline_ram = self.ram_usage
254
+ self.baseline_energy = self.energy_consumption
255
+
256
+ current_task = self._get_current_task()
257
+
258
+ return EnergyOptimizationObservation(
259
+ ram_usage=self.ram_usage,
260
+ energy_consumption=self.energy_consumption,
261
+ system_load=self.system_load,
262
+ current_task=self._task_to_summary(current_task, 0) if current_task else None,
263
+ tasks_completed=self.tasks_completed.copy(),
264
+ steps_taken=0,
265
+ task_progress=self._calculate_task_progress(),
266
+ efficiency_score=self._calculate_efficiency_score(),
267
+ done=False,
268
+ reward=0.0,
269
+ )
270
+
271
+ def step(self, action: EnergyOptimizationAction) -> EnergyOptimizationObservation:
272
+ """
273
+ Execute an optimization action in the environment.
274
+
275
+ Args:
276
+ action: EnergyOptimizationAction containing the optimization strategy
277
+
278
+ Returns:
279
+ EnergyOptimizationObservation with updated system state and reward
280
+ """
281
+ self._state.step_count += 1
282
+
283
+ # Calculate reward for the action
284
+ reward = self._calculate_reward(action)
285
+
286
+ # Check if episode should end
287
+ done = self._state.step_count >= 100 or self.current_task_index >= len(self.tasks)
288
+
289
+ current_task = self._get_current_task()
290
+
291
+ return EnergyOptimizationObservation(
292
+ ram_usage=self.ram_usage,
293
+ energy_consumption=self.energy_consumption,
294
+ system_load=self.system_load,
295
+ current_task=self._task_to_summary(current_task, self._state.step_count) if current_task else None,
296
+ tasks_completed=self.tasks_completed.copy(),
297
+ steps_taken=self._state.step_count,
298
+ task_progress=self._calculate_task_progress(),
299
+ efficiency_score=self._calculate_efficiency_score(),
300
+ done=done,
301
+ reward=reward,
302
+ metadata={
303
+ "action_taken": action.action_type,
304
+ "action_intensity": action.intensity,
305
+ "episode_step": self._state.step_count,
306
+ "current_task_name": current_task.name if current_task else None
307
+ },
308
+ )
309
+
310
+ @property
311
+ def state(self) -> State:
312
+ """
313
+ Get the current environment state.
314
+
315
+ Returns:
316
+ Current State with episode_id and step_count
317
+ """
318
+ return self._state
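
As a quick sanity check of the reward shaping above, the `reduce_ram` branch of `_calculate_reward` can be exercised in isolation. The sketch below mirrors that branch as a standalone function (the helper name `reduce_ram_step` is illustrative, not part of the environment's API):

```python
def reduce_ram_step(ram_usage: float, intensity: float) -> tuple[float, float]:
    """Mirror of the 'reduce_ram' branch: returns (new_ram_usage, reward)."""
    # Reduction is capped at 10% of current usage, scaled by intensity
    ram_reduction = min(5.0 * intensity, ram_usage * 0.1)
    new_ram = max(0.0, ram_usage - ram_reduction)
    reward = ram_reduction * 0.5  # Reward proportional to RAM freed
    if intensity > 0.8:  # Instability penalty for overly aggressive reduction
        reward -= 2.0
    return new_ram, reward

# From the initial 80% RAM usage, a moderate action yields a small positive reward
print(reduce_ram_step(80.0, 0.5))  # (77.5, 1.25)
# At full intensity, the larger reduction is mostly offset by the penalty
print(reduce_ram_step(80.0, 1.0))  # (75.0, 0.5)
```

This illustrates why the environment discourages always acting at maximum intensity: past the 0.8 threshold, the instability penalty erodes most of the gain from the extra reduction.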
server/requirements.txt CHANGED
@@ -1,6 +1,6 @@
- openenv[core]>=0.2.0
- fastapi>=0.115.0
- uvicorn>=0.24.0
-
-
-

+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
uv.lock CHANGED
The diff for this file is too large to render. See raw diff