Sushruth21 committed on
Commit c7e8ea1 · verified · 1 Parent(s): aaaafca

Upload folder using huggingface_hub
Dockerfile.simple ADDED
@@ -0,0 +1,28 @@
+ # Simple Dockerfile for Energy & Memory RAM Optimization Environment
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy project files
+ COPY pyproject.toml uv.lock ./
+ COPY . .
+
+ # Install uv if not available
+ RUN pip install uv
+
+ # Install dependencies
+ RUN uv sync --frozen --no-install-project
+
+ # Install the project itself
+ RUN uv pip install -e .
+
+ # Expose port
+ EXPOSE 8000
+
+ # Run the server
+ CMD ["uv", "run", "server"]
README.md CHANGED
@@ -1,157 +1,157 @@
- ---
- title: Energy & Memory RAM Optimization Environment
- emoji: ⚡
- colorFrom: blue
- colorTo: green
- sdk: docker
- pinned: false
- app_port: 8000
- base_path: /web
- tags:
- - openenv
- - reinforcement-learning
- - energy-optimization
- - resource-management
- ---
-
- # Energy & Memory RAM Optimization RL Environment
-
- An OpenEnv-based reinforcement learning environment for training AI agents to optimize energy consumption and RAM usage in computer systems. The environment features tasks of increasing difficulty, automated graders for task completion verification, and sophisticated reward logic.
-
- ## Features
-
- ### AI Agent Capabilities
- - **Resource Detection**: Real-time monitoring of RAM usage and energy consumption
- - **Optimization Strategies**: Multiple action types for different optimization approaches
- - **Adaptive Learning**: Agents learn to balance competing objectives (RAM vs energy efficiency)
-
- ### Task Progression
- Tasks increase in difficulty from basic resource reduction to advanced multi-objective optimization:
-
- 1. **Basic RAM Reduction**: Reduce RAM usage below 70%
- 2. **Energy Optimization**: Reduce energy consumption below 6 kWh while maintaining RAM below 75%
- 3. **Balanced Optimization**: Balance RAM below 60% and energy below 5 kWh
- 4. **Advanced Efficiency**: Achieve RAM below 50% and energy below 4 kWh
- 5. **Expert Optimization**: Master level: RAM below 40% and energy below 3 kWh
-
- ### Automated Graders
- - **Task Completion Verification**: Automatic checking of optimization targets
- - **Performance Metrics**: Efficiency scores and progress tracking
- - **Reward Validation**: Ensures fair scoring based on actual improvements
-
- ### Reward Logic
- - **Action Effectiveness**: Rewards based on actual resource reductions achieved
- - **Task Completion Bonuses**: Significant rewards for meeting task objectives
- - **Efficiency Incentives**: Bonuses for overall system optimization
- - **Penalty System**: Penalties for aggressive actions that may cause system instability
-
- ## Quick Start
-
- ### Installation
- ```bash
- # Install dependencies
- pip install -r requirements.txt
-
- # Or using uv (recommended)
- uv sync
- ```
-
- ### Running the Environment
- ```bash
- # Start the OpenEnv server
- uv run server
-
- # The server will be available at http://localhost:8000
- ```
-
- ### Training an Agent
- ```python
- from stable_baselines3 import PPO
- from openenv.client import OpenEnvClient
-
- # Connect to the environment
- client = OpenEnvClient("http://localhost:8000")
-
- # Create and train agent
- model = PPO("MlpPolicy", client, verbose=1)
- model.learn(total_timesteps=10000)
-
- # Evaluate the trained agent
- obs = client.reset()
- total_reward = 0
- while not obs.done:
-     action, _ = model.predict(obs)
-     obs = client.step(action)
-     total_reward += obs.reward
-     print(f"Step reward: {obs.reward:.2f}, Total: {total_reward:.2f}")
- ```
-
- ## Docker
-
- ```bash
- # Build the container
- docker build -t energy-optimization-rl .
-
- # Run the environment
- docker run --rm -p 8000:8000 energy-optimization-rl
- ```
-
- ## Environment Details
-
- ### State Space
- - RAM usage percentage (0-100%)
- - Energy consumption in kWh
- - System load (0-1)
- - Current task information
- - Task completion progress
- - Efficiency scores
-
- ### Action Space
- - `reduce_ram`: Focus on RAM optimization with configurable intensity (0.0-1.0)
- - `optimize_energy`: Focus on energy reduction with configurable intensity (0.0-1.0)
- - `balance_resources`: Balanced approach to both resources
- - `monitor_system`: Gather system information and slight load reduction
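The four action types above can be exercised without the server; the sketch below is a hypothetical rule-based baseline that uses plain dicts in place of `EnergyOptimizationAction`, with illustrative thresholds that are not taken from the environment.

```python
# Hypothetical rule-based baseline over the four action types listed above.
# Thresholds are illustrative only, not the environment's values.
def choose_action(ram_usage: float, energy_consumption: float) -> dict:
    if ram_usage > 75.0:
        return {"action_type": "reduce_ram", "intensity": 0.8}
    if energy_consumption > 6.0:
        return {"action_type": "optimize_energy", "intensity": 0.7}
    if ram_usage > 60.0 and energy_consumption > 5.0:
        return {"action_type": "balance_resources", "intensity": 0.5}
    # Nothing urgent: gather information at low intensity.
    return {"action_type": "monitor_system", "intensity": 0.3}
```

A policy like this is only a baseline; the point of the environment is for a trained agent to beat it.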
-
- ### Reward Structure
- - Base rewards for resource reductions
- - Task completion bonuses (difficulty × 10 points)
- - Efficiency improvement bonuses
- - Penalties for system instability from aggressive actions
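Combined, these components might look like the sketch below. Only the completion bonus (difficulty × 10) is stated above; the base coefficients and the instability penalty value are illustrative assumptions.

```python
def sketch_reward(ram_reduction: float, energy_reduction: float,
                  task_completed: bool, difficulty: int,
                  intensity: float) -> float:
    # Base reward proportional to the actual resource reductions achieved
    # (unit coefficients are an assumption, not the environment's values).
    reward = ram_reduction + energy_reduction
    # Task completion bonus: difficulty x 10 points, as described above.
    if task_completed:
        reward += difficulty * 10
    # Illustrative penalty for overly aggressive actions (instability risk).
    if intensity > 0.9:
        reward -= 2.0
    return reward
```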
-
- ## API Endpoints
-
- - `POST /reset`: Reset the environment
- - `POST /step`: Execute an optimization action
- - `GET /state`: Get current environment state
- - `GET /schema`: Get action/observation schemas
- - `WS /ws`: WebSocket endpoint for persistent sessions
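The body of a `POST /step` request presumably mirrors the client's `_step_payload` shown later in this diff: an `action_type` plus an `intensity`. A minimal sketch of serializing such a payload:

```python
import json

# Payload shape taken from _step_payload in client.py: action_type + intensity.
payload = json.dumps({"action_type": "reduce_ram", "intensity": 0.8})
decoded = json.loads(payload)
```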
-
- ## Development
-
- ### Project Structure
- ```
- he_demo/
- ├── models.py                  # Action and observation definitions
- ├── server/
- │   ├── app.py                 # FastAPI server application
- │   └── he_demo_environment.py # Environment implementation
- ├── client.py                  # Example client code
- ├── inference.py               # Training and inference scripts
- ├── Dockerfile                 # Container configuration
- ├── pyproject.toml             # Project dependencies
- └── README.md                  # This file
- ```
-
- ### Adding New Tasks
- Tasks are defined in the `_create_tasks()` method of `EnergyOptimizationEnvironment`. Each task includes:
- - Name and description
- - Difficulty level
- - RAM and energy targets
- - Maximum steps allowed
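Based on the `Task` model in `models.py` later in this diff, a new entry could look like the sketch below; the name and targets are invented for illustration.

```python
# Hypothetical task entry mirroring the Task model's fields.
new_task = {
    "name": "aggressive_ram_trim",  # invented example name
    "description": "Push RAM usage below 45% quickly",
    "difficulty": 4,
    "ram_target": 45.0,
    "energy_target": 4.5,
    "max_steps": 12,
}
```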
-
- ### Customizing Reward Logic
- Modify the `_calculate_reward()` method to implement custom reward strategies based on your specific optimization goals.
-
- ## License
-
- This project is licensed under a BSD-style license. See the LICENSE file for details.
 
__init__.py CHANGED
@@ -1,17 +1,17 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """Energy & Memory RAM Optimization Environment."""
-
- from .client import EnergyOptimizationEnv
- from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task
-
- __all__ = [
-     "EnergyOptimizationAction",
-     "EnergyOptimizationObservation",
-     "Task",
-     "EnergyOptimizationEnv",
- ]
 
client.py CHANGED
@@ -1,120 +1,123 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """He Demo Environment Client."""
-
- from typing import Dict
-
- from openenv.core import EnvClient
- from openenv.core.client_types import StepResult
- from openenv.core.env_server.types import State
-
- from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task
-
-
- class EnergyOptimizationEnv(
-     EnvClient[EnergyOptimizationAction, EnergyOptimizationObservation, State]
- ):
-     """
-     Client for the Energy & Memory RAM Optimization Environment.
-
-     This client maintains a persistent WebSocket connection to the environment server,
-     enabling efficient multi-step interactions with lower latency.
-     Each client instance has its own dedicated environment session on the server.
-
-     Example:
-         >>> # Connect to a running server
-         >>> with EnergyOptimizationEnv(base_url="http://localhost:8000") as client:
-         ...     result = client.reset()
-         ...     print(f"RAM: {result.observation.ram_usage:.1f}%, Energy: {result.observation.energy_consumption:.1f} kWh")
-         ...
-         ...     result = client.step(EnergyOptimizationAction(action_type="reduce_ram", intensity=0.8))
-         ...     print(f"Task: {result.observation.current_task.name if result.observation.current_task else 'None'}")
-
-     Example with Docker:
-         >>> # Automatically start container and connect
-         >>> client = EnergyOptimizationEnv.from_docker_image("energy-optimization-env:latest")
-         >>> try:
-         ...     result = client.reset()
-         ...     result = client.step(EnergyOptimizationAction(action_type="balance_resources", intensity=0.6))
-         ... finally:
-         ...     client.close()
-     """
-
-     def _step_payload(self, action: EnergyOptimizationAction) -> Dict:
-         """
-         Convert EnergyOptimizationAction to JSON payload for step message.
-
-         Args:
-             action: EnergyOptimizationAction instance
-
-         Returns:
-             Dictionary representation suitable for JSON encoding
-         """
-         return {
-             "action_type": action.action_type,
-             "intensity": action.intensity,
-         }
-
-     def _parse_result(self, payload: Dict) -> StepResult[EnergyOptimizationObservation]:
-         """
-         Parse server response into StepResult[EnergyOptimizationObservation].
-
-         Args:
-             payload: JSON response data from server
-
-         Returns:
-             StepResult with EnergyOptimizationObservation
-         """
-         obs_data = payload.get("observation", {})
-
-         # Parse current task if present
-         current_task = None
-         if obs_data.get("current_task"):
-             task_data = obs_data["current_task"]
-             current_task = Task(
-                 name=task_data.get("name", ""),
-                 description=task_data.get("description", ""),
-                 difficulty=task_data.get("difficulty", 1),
-                 ram_target=task_data.get("ram_target", 100.0),
-                 energy_target=task_data.get("energy_target", 10.0),
-                 max_steps=task_data.get("max_steps", 10)
-             )
-
-         observation = EnergyOptimizationObservation(
-             ram_usage=obs_data.get("ram_usage", 0.0),
-             energy_consumption=obs_data.get("energy_consumption", 0.0),
-             system_load=obs_data.get("system_load", 0.0),
-             current_task=current_task,
-             tasks_completed=obs_data.get("tasks_completed", []),
-             steps_taken=obs_data.get("steps_taken", 0),
-             task_progress=obs_data.get("task_progress", 0.0),
-             efficiency_score=obs_data.get("efficiency_score", 0.0),
-             done=payload.get("done", False),
-             reward=payload.get("reward"),
-             metadata=obs_data.get("metadata", {}),
-         )
-
-         return StepResult(
-             observation=observation,
-             reward=payload.get("reward"),
-             done=payload.get("done", False),
-         )
-
-     def _parse_state(self, payload: Dict) -> State:
-         """
-         Parse server response into State object.
-
-         Args:
-             payload: JSON response from state request
-
-         Returns:
-             State object with episode_id and step_count
-         """
-         return State(
-             episode_id=payload.get("episode_id"),
-             step_count=payload.get("step_count", 0),
-         )
 
 
 
 
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """He Demo Environment Client."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import EnergyOptimizationAction, EnergyOptimizationObservation, Task, TaskSummary
+
+
+ class EnergyOptimizationEnv(
+     EnvClient[EnergyOptimizationAction, EnergyOptimizationObservation, State]
+ ):
+     """
+     Client for the Energy & Memory RAM Optimization Environment.
+
+     This client maintains a persistent WebSocket connection to the environment server,
+     enabling efficient multi-step interactions with lower latency.
+     Each client instance has its own dedicated environment session on the server.
+
+     Example:
+         >>> # Connect to a running server
+         >>> with EnergyOptimizationEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(f"RAM: {result.observation.ram_usage:.1f}%, Energy: {result.observation.energy_consumption:.1f} kWh")
+         ...
+         ...     result = client.step(EnergyOptimizationAction(action_type="reduce_ram", intensity=0.8))
+         ...     print(f"Task: {result.observation.current_task.name if result.observation.current_task else 'None'}")
+
+     Example with Docker:
+         >>> # Automatically start container and connect
+         >>> client = EnergyOptimizationEnv.from_docker_image("energy-optimization-env:latest")
+         >>> try:
+         ...     result = client.reset()
+         ...     result = client.step(EnergyOptimizationAction(action_type="balance_resources", intensity=0.6))
+         ... finally:
+         ...     client.close()
+     """
+
+     def _step_payload(self, action: EnergyOptimizationAction) -> Dict:
+         """
+         Convert EnergyOptimizationAction to JSON payload for step message.
+
+         Args:
+             action: EnergyOptimizationAction instance
+
+         Returns:
+             Dictionary representation suitable for JSON encoding
+         """
+         return {
+             "action_type": action.action_type,
+             "intensity": action.intensity,
+         }
+
+     def _parse_result(self, payload: Dict) -> StepResult[EnergyOptimizationObservation]:
+         """
+         Parse server response into StepResult[EnergyOptimizationObservation].
+
+         Args:
+             payload: JSON response data from server
+
+         Returns:
+             StepResult with EnergyOptimizationObservation
+         """
+         obs_data = payload.get("observation", {})
+
+         # Parse current task if present
+         current_task = None
+         if obs_data.get("current_task"):
+             task_data = obs_data["current_task"]
+             current_task = TaskSummary(
+                 name=task_data.get("name", ""),
+                 description=task_data.get("description", ""),
+                 difficulty=task_data.get("difficulty", 1),
+                 ram_target=task_data.get("ram_target", 100.0),
+                 energy_target=task_data.get("energy_target", 10.0),
+                 max_steps=task_data.get("max_steps", 10),
+                 completed=task_data.get("completed", False),
+                 remaining_steps=task_data.get("remaining_steps"),
+                 progress=task_data.get("progress", 0.0)
+             )
+
+         observation = EnergyOptimizationObservation(
+             ram_usage=obs_data.get("ram_usage", 0.0),
+             energy_consumption=obs_data.get("energy_consumption", 0.0),
+             system_load=obs_data.get("system_load", 0.0),
+             current_task=current_task,
+             tasks_completed=obs_data.get("tasks_completed", []),
+             steps_taken=obs_data.get("steps_taken", 0),
+             task_progress=obs_data.get("task_progress", 0.0),
+             efficiency_score=obs_data.get("efficiency_score", 0.0),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """
+         Parse server response into State object.
+
+         Args:
+             payload: JSON response from state request
+
+         Returns:
+             State object with episode_id and step_count
+         """
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
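Note how `_parse_result` tolerates missing keys by falling back to defaults via `dict.get`; a stripped-down sketch of that same pattern on plain dicts:

```python
def parse_observation(payload: dict) -> dict:
    obs = payload.get("observation", {})
    # Absent keys become safe defaults, mirroring _parse_result above.
    return {
        "ram_usage": obs.get("ram_usage", 0.0),
        "energy_consumption": obs.get("energy_consumption", 0.0),
        "done": payload.get("done", False),
        "reward": payload.get("reward"),
    }
```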
inference.py CHANGED
@@ -5,22 +5,42 @@ This script demonstrates how an AI agent can learn to optimize energy consumption
  and RAM usage through reinforcement learning in the Energy Optimization Environment.
 
  The agent uses an LLM to make strategic decisions about resource optimization actions.
+
+ Required Environment Variables:
+ - API_BASE_URL: The API endpoint for the LLM (for Hugging Face router, use https://router.huggingface.co/v1)
+ - MODEL_NAME: The model identifier to use for inference
+ - HF_TOKEN: Your Hugging Face API key with inference permissions
+ - LOCAL_IMAGE_NAME: The name of the local image to use for the environment (optional)
+
+ Example setup:
+ export API_BASE_URL="https://router.huggingface.co/v1"
+ export MODEL_NAME="OpenAssistant/oasst-sft-1-pythia-12b"
+ export HF_TOKEN="hf_..."
+ export LOCAL_IMAGE_NAME="your-docker-image"  # Optional
  """
 
+ import asyncio
  import os
+ import subprocess
  import textwrap
  from typing import List, Optional
 
- from openai import OpenAI
+ from openai import OpenAI, OpenAIError
 
  from he_demo.client import EnergyOptimizationEnv
  from he_demo.models import EnergyOptimizationAction
 
- IMAGE_NAME = os.getenv("IMAGE_NAME")
- API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
- API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
- MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ # Environment configuration variables
+ # Default endpoint uses Hugging Face's router; set API_BASE_URL explicitly if needed.
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
+ LOCAL_SERVER_URL = os.getenv("LOCAL_SERVER_URL", "http://localhost:8000")
+
+ # Use HF_TOKEN as API key for OpenAI client
+ API_KEY = HF_TOKEN
  TASK_NAME = os.getenv("ENERGY_TASK", "energy_optimization")
  BENCHMARK = os.getenv("ENERGY_BENCHMARK", "energy_optimization")
  MAX_STEPS = 50  # More steps for complex optimization tasks
@@ -156,19 +176,60 @@ def get_model_action(
          )
          action_text = (completion.choices[0].message.content or "").strip()
          return parse_action(action_text)
+     except OpenAIError as exc:
+         error_text = str(exc)
+         print(f"[DEBUG] Model request failed: {error_text}", flush=True)
+         status_code = getattr(exc, 'status_code', None)
+
+         if status_code == 403 or "403" in error_text or "insufficient permissions" in error_text.lower():
+             raise RuntimeError(
+                 "Hugging Face authentication failed: your token does not have sufficient inference permissions. "
+                 "Use a token with inference access or switch to an active model/endpoint you are authorized for. "
+                 "If you are using the Hugging Face router, ensure HF_TOKEN has the `inference` scope and that MODEL_NAME is accessible."
+             ) from exc
+
+         return EnergyOptimizationAction(action_type="monitor_system", intensity=0.5)
      except Exception as exc:
-         print(f"[DEBUG] Model request failed: {exc}", flush=True)
+         print(f"[DEBUG] Unexpected model request failure: {exc}", flush=True)
          return EnergyOptimizationAction(action_type="monitor_system", intensity=0.5)
 
 
- def main() -> None:
-     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
-
-     env = (
-         EnergyOptimizationEnv.from_docker_image(IMAGE_NAME)
-         if IMAGE_NAME
-         else EnergyOptimizationEnv(base_url="http://localhost:8000")
-     )
+ async def main() -> None:
+     # Validate required environment variables
+     if not API_BASE_URL or API_BASE_URL == "<your-active-endpoint>":
+         raise ValueError("API_BASE_URL environment variable must be set to your active LLM endpoint")
+
+     if not MODEL_NAME or MODEL_NAME == "<your-active-model>":
+         raise ValueError("MODEL_NAME environment variable must be set to your active model identifier")
+
+     if not HF_TOKEN:
+         raise ValueError("HF_TOKEN environment variable must be set to your Hugging Face API key")
+
+     client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+
+     async def local_image_exists(image_name: str) -> bool:
+         try:
+             result = subprocess.run(
+                 ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
+                 capture_output=True,
+                 text=True,
+                 check=True,
+             )
+             return image_name in result.stdout.splitlines()
+         except Exception:
+             return False
+
+     if LOCAL_IMAGE_NAME:
+         if await local_image_exists(LOCAL_IMAGE_NAME):
+             env = await EnergyOptimizationEnv.from_docker_image(LOCAL_IMAGE_NAME)
+         else:
+             print(
+                 f"[WARN] Docker image '{LOCAL_IMAGE_NAME}' not found locally. Falling back to local server at {LOCAL_SERVER_URL}",
+                 flush=True,
+             )
+             env = EnergyOptimizationEnv(base_url=LOCAL_SERVER_URL)
+     else:
+         env = EnergyOptimizationEnv(base_url=LOCAL_SERVER_URL)
 
      history: List[str] = []
      rewards: List[float] = []
@@ -179,7 +240,7 @@ def main() -> None:
      log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
 
      try:
-         result = env.reset()
+         result = await env.reset()
          last_reward = 0.0
 
          for step in range(1, MAX_STEPS + 1):
@@ -190,7 +251,7 @@ def main() -> None:
              action = get_model_action(client, step, result.observation, last_reward, history)
 
              # Execute action
-             result = env.step(action)
+             result = await env.step(action)
              obs = result.observation
 
              reward = result.reward or 0.0
@@ -224,11 +285,11 @@ def main() -> None:
 
      finally:
          try:
-             env.close()
+             await env.close()
          except Exception as e:
              print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
          log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
 
 
  if __name__ == "__main__":
-     main()
+     asyncio.run(main())
models.py CHANGED
@@ -1,74 +1,74 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the BSD-style license found in the
- # LICENSE file in the root directory of this source tree.
-
- """
- Data models for the Energy & Memory RAM Optimization Environment.
-
- This environment simulates system resource optimization tasks where an AI agent
- must optimize RAM usage and energy consumption through various actions.
- """
-
- from typing import List, Optional
- from openenv.core.env_server.types import Action, Observation
- from pydantic import BaseModel, Field
-
-
- class EnergyOptimizationAction(Action):
-     """Action for the Energy & Memory RAM Optimization environment."""
-
-     action_type: str = Field(
-         ...,
-         description="Type of optimization action: 'reduce_ram', 'optimize_energy', 'balance_resources', 'monitor_system'"
-     )
-     intensity: float = Field(
-         1.0,
-         description="Intensity of the action (0.0 to 1.0), affects effectiveness and potential side effects"
-     )
-
-
- class Task(BaseModel):
-     """Represents an optimization task with difficulty and requirements."""
-
-     name: str = Field(..., description="Unique name of the task")
-     description: str = Field(..., description="Human-readable description of the task")
-     difficulty: int = Field(..., description="Difficulty level (1-5)")
-     ram_target: float = Field(..., description="Target RAM usage percentage (lower is better)")
-     energy_target: float = Field(..., description="Target energy consumption (lower is better)")
-     max_steps: int = Field(..., description="Maximum steps allowed to complete the task")
-     completed: bool = Field(default=False, description="Whether the task has been completed")
-
-     def check_completion(self, ram_usage: float, energy_consumption: float, steps_taken: int) -> bool:
-         """Check if the task is completed based on current system state."""
-         if steps_taken > self.max_steps:
-             return False
-         return ram_usage <= self.ram_target and energy_consumption <= self.energy_target
-
-
- class TaskSummary(BaseModel):
-     """Serializable task summary exposed in observations."""
-
-     name: str = Field(..., description="Task identifier")
-     description: str = Field(..., description="Task description")
-     difficulty: int = Field(..., description="Task difficulty level")
-     ram_target: float = Field(..., description="RAM usage target percentage")
-     energy_target: float = Field(..., description="Energy consumption target in kWh")
-     max_steps: int = Field(..., description="Maximum allowed steps for the task")
-     completed: bool = Field(False, description="Whether the task is completed")
-     remaining_steps: Optional[int] = Field(None, description="Remaining steps before the task deadline")
-     progress: float = Field(..., description="Estimated progress toward task completion (0-1)")
-
-
- class EnergyOptimizationObservation(Observation):
-     """Observation from the Energy & Memory RAM Optimization environment."""
-
-     ram_usage: float = Field(..., description="Current RAM usage percentage (0-100)")
-     energy_consumption: float = Field(..., description="Current energy consumption in kWh")
-     system_load: float = Field(..., description="Overall system load (0-1)")
-     current_task: Optional[TaskSummary] = Field(None, description="Current optimization task")
-     tasks_completed: List[str] = Field(default_factory=list, description="List of completed task names")
-     steps_taken: int = Field(..., description="Number of steps taken in current episode")
-     task_progress: float = Field(..., description="Progress towards current task completion (0-1)")
-     efficiency_score: float = Field(..., description="Overall efficiency score based on optimization")
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the Energy & Memory RAM Optimization Environment.
9
+
10
+ This environment simulates system resource optimization tasks where an AI agent
11
+ must optimize RAM usage and energy consumption through various actions.
12
+ """
13
+
14
+ from typing import List, Optional
15
+ from openenv.core.env_server.types import Action, Observation
16
+ from pydantic import BaseModel, Field
17
+
18
+
19
+ class EnergyOptimizationAction(Action):
20
+ """Action for the Energy & Memory RAM Optimization environment."""
21
+
22
+ action_type: str = Field(
23
+ ...,
24
+ description="Type of optimization action: 'reduce_ram', 'optimize_energy', 'balance_resources', 'monitor_system'"
25
+ )
26
+ intensity: float = Field(
27
+ 1.0,
28
+ description="Intensity of the action (0.0 to 1.0), affects effectiveness and potential side effects"
29
+ )
30
+
31
+
32
+ class Task(BaseModel):
33
+ """Represents an optimization task with difficulty and requirements."""
34
+
35
+ name: str = Field(..., description="Unique name of the task")
36
+ description: str = Field(..., description="Human-readable description of the task")
37
+ difficulty: int = Field(..., description="Difficulty level (1-5)")
38
+ ram_target: float = Field(..., description="Target RAM usage percentage (lower is better)")
39
+ energy_target: float = Field(..., description="Target energy consumption (lower is better)")
40
+ max_steps: int = Field(..., description="Maximum steps allowed to complete the task")
41
+ completed: bool = Field(default=False, description="Whether the task has been completed")
42
+
43
+ def check_completion(self, ram_usage: float, energy_consumption: float, steps_taken: int) -> bool:
44
+ """Check if the task is completed based on current system state."""
45
+ if steps_taken > self.max_steps:
46
+ return False
47
+ return ram_usage <= self.ram_target and energy_consumption <= self.energy_target
48
+
49
+
50
+ class TaskSummary(BaseModel):
51
+ """Serializable task summary exposed in observations."""
52
+
53
+ name: str = Field(..., description="Task identifier")
54
+ description: str = Field(..., description="Task description")
55
+ difficulty: int = Field(..., description="Task difficulty level")
56
+ ram_target: float = Field(..., description="RAM usage target percentage")
57
+ energy_target: float = Field(..., description="Energy consumption target in kWh")
58
+ max_steps: int = Field(..., description="Maximum allowed steps for the task")
59
+ completed: bool = Field(False, description="Whether the task is completed")
60
+ remaining_steps: Optional[int] = Field(None, description="Remaining steps before the task deadline")
61
+ progress: float = Field(..., description="Estimated progress toward task completion (0-1)")
62
+
63
+
64
+ class EnergyOptimizationObservation(Observation):
65
+ """Observation from the Energy & Memory RAM Optimization environment."""
66
+
67
+ ram_usage: float = Field(..., description="Current RAM usage percentage (0-100)")
68
+ energy_consumption: float = Field(..., description="Current energy consumption in kWh")
69
+ system_load: float = Field(..., description="Overall system load (0-1)")
70
+ current_task: Optional[TaskSummary] = Field(None, description="Current optimization task")
71
+ tasks_completed: List[str] = Field(default_factory=list, description="List of completed task names")
72
+ steps_taken: int = Field(..., description="Number of steps taken in current episode")
73
+ task_progress: float = Field(..., description="Progress towards current task completion (0-1)")
74
+ efficiency_score: float = Field(..., description="Overall efficiency score based on optimization")
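The completion rule in `Task.check_completion` can be exercised without installing `openenv` or `pydantic`; a minimal plain-Python sketch of the same thresholds (the function below is a standalone mirror, not the model method itself):

```python
def check_completion(ram_target, energy_target, max_steps,
                     ram_usage, energy_consumption, steps_taken):
    """Mirror of Task.check_completion: both targets must be met within the step budget."""
    if steps_taken > max_steps:
        return False
    return ram_usage <= ram_target and energy_consumption <= energy_target

# The basic_ram_reduction task: ram_target=70.0, energy_target=7.5, max_steps=10
print(check_completion(70.0, 7.5, 10, 68.0, 7.2, 8))   # True: both targets hit in time
print(check_completion(70.0, 7.5, 10, 68.0, 7.2, 11))  # False: over the step budget
print(check_completion(70.0, 7.5, 10, 72.0, 7.2, 8))   # False: RAM target missed
```

Note that exceeding `max_steps` only blocks completion on that check; it does not end the episode, which is handled separately in the environment's `step` method.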
openenv.yaml CHANGED
spec_version: 1
name: energy_optimization
type: space
runtime: fastapi
app: server.app:app
port: 8000
openenv_he_demo.egg-info/SOURCES.txt CHANGED
README.md
__init__.py
client.py
inference.py
models.py
pyproject.toml
test_environment.py
validate.py
./__init__.py
./client.py
./gym_wrapper.py
./inference.py
./models.py
./test_environment.py
./train_agent.py
./validate.py
openenv_he_demo.egg-info/PKG-INFO
openenv_he_demo.egg-info/SOURCES.txt
openenv_he_demo.egg-info/dependency_links.txt
openenv_he_demo.egg-info/entry_points.txt
openenv_he_demo.egg-info/requires.txt
openenv_he_demo.egg-info/top_level.txt
server/__init__.py
server/app.py
server/he_demo_environment.py
openenv_he_demo.egg-info/dependency_links.txt CHANGED
openenv_he_demo.egg-info/entry_points.txt CHANGED
[console_scripts]
server = he_demo.server.app:main
openenv_he_demo.egg-info/requires.txt CHANGED
openenv-core[core]>=0.2.2
numpy>=1.19.0
pandas>=1.3.0
gymnasium>=0.29.0
stable-baselines3>=2.0.0
torch>=2.0.0

[dev]
pytest>=8.0.0
pytest-cov>=4.0.0
openenv_he_demo.egg-info/top_level.txt CHANGED
he_demo
pyproject.toml CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-he_demo"
version = "0.1.0"
description = "He Demo environment for OpenEnv"
requires-python = ">=3.10"
dependencies = [
    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
    # To install from GitHub instead:
    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
    "openenv-core[core]>=0.2.2",
    # Environment-specific dependencies
    "numpy>=1.19.0",
    "pandas>=1.3.0",
    "gymnasium>=0.29.0",
    "stable-baselines3>=2.0.0",
    "torch>=2.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.0.0",
]

[project.scripts]
# Server entry point - enables running via: uv run --project . server
# or: python -m he_demo.server.app
server = "he_demo.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["he_demo", "he_demo.server"]
package-dir = { "he_demo" = ".", "he_demo.server" = "server" }
server/__init__.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Energy & Memory RAM Optimization environment server components."""

from .he_demo_environment import EnergyOptimizationEnvironment

__all__ = ["EnergyOptimizationEnvironment"]
server/app.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
FastAPI application for the Energy & Memory RAM Optimization Environment.

This module creates an HTTP server that exposes the EnergyOptimizationEnvironment
over HTTP and WebSocket endpoints, compatible with EnvClient.

Endpoints:
- POST /reset: Reset the environment
- POST /step: Execute an action
- GET /state: Get current environment state
- GET /schema: Get action/observation schemas
- WS /ws: WebSocket endpoint for persistent sessions

Usage:
    # Development (with auto-reload):
    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

    # Production:
    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

    # Or run directly:
    python -m server.app
"""

try:
    from openenv.core.env_server.http_server import create_app
except Exception as e:  # pragma: no cover
    raise ImportError(
        "openenv is required for the web interface. Install dependencies with 'uv sync'."
    ) from e

from he_demo.models import EnergyOptimizationAction, EnergyOptimizationObservation
from he_demo.server.he_demo_environment import EnergyOptimizationEnvironment


# Create the app with web interface and README integration
app = create_app(
    EnergyOptimizationEnvironment,
    EnergyOptimizationAction,
    EnergyOptimizationObservation,
    env_name="energy_optimization",
    max_concurrent_envs=1,  # increase to allow more concurrent WebSocket sessions
)


def main(host: str = "0.0.0.0", port: int = 8000):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m he_demo.server.app

    Args:
        host: Host address to bind to (default: "0.0.0.0")
        port: Port number to listen on (default: 8000)

    For production deployments, consider using uvicorn directly with
    multiple workers:
        uvicorn he_demo.server.app:app --workers 4
    """
    import uvicorn

    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()
    main(host=args.host, port=args.port)
server/he_demo_environment.py CHANGED
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
Energy & Memory RAM Optimization Environment Implementation.

An RL environment for training AI agents to optimize system resources including
RAM usage and energy consumption through various optimization strategies.
"""

import random
from typing import List
from uuid import uuid4

from openenv.core.env_server.interfaces import Environment
from openenv.core.env_server.types import State

from he_demo.models import EnergyOptimizationAction, EnergyOptimizationObservation, Task, TaskSummary


class EnergyOptimizationEnvironment(Environment):
    """
    Energy & Memory RAM Optimization Environment.

    This environment simulates a computer system where an AI agent must optimize
    RAM usage and energy consumption. The agent faces tasks of increasing difficulty
    and receives rewards based on optimization efficiency.

    Tasks include:
    - Basic RAM reduction
    - Energy optimization
    - Resource balancing
    - Advanced multi-objective optimization

    The environment includes automated graders that verify task completion and
    provide detailed feedback on optimization performance.
    """

    SUPPORTS_CONCURRENT_SESSIONS: bool = True

    def __init__(self):
        """Initialize the energy optimization environment."""
        self._state = State(episode_id=str(uuid4()), step_count=0)
        self._reset_count = 0

        # System state
        self.ram_usage = 80.0  # Starting RAM usage %
        self.energy_consumption = 8.0  # Starting energy consumption in kWh
        self.system_load = 0.7  # Starting system load

        # Task management
        self.tasks = self._create_tasks()
        self.current_task_index = 0
        self.tasks_completed = []

        # Performance tracking
        self.baseline_ram = self.ram_usage
        self.baseline_energy = self.energy_consumption

    def _create_tasks(self) -> List[Task]:
        """Create tasks with increasing difficulty."""
        return [
            Task(
                name="basic_ram_reduction",
                description="Reduce RAM usage below 70%",
                difficulty=1,
                ram_target=70.0,
                energy_target=7.5,  # Slightly below the initial 8.0
                max_steps=10
            ),
            Task(
                name="energy_optimization",
                description="Reduce energy consumption below 6 kWh while maintaining RAM below 75%",
                difficulty=2,
                ram_target=75.0,
                energy_target=6.0,
                max_steps=15
            ),
            Task(
                name="balanced_optimization",
                description="Balance RAM below 60% and energy below 5 kWh",
                difficulty=3,
                ram_target=60.0,
                energy_target=5.0,
                max_steps=20
            ),
            Task(
                name="advanced_efficiency",
                description="Achieve RAM below 50% and energy below 4 kWh",
                difficulty=4,
                ram_target=50.0,
                energy_target=4.0,
                max_steps=25
            ),
            Task(
                name="expert_optimization",
                description="Master level: RAM below 40% and energy below 3 kWh",
                difficulty=5,
                ram_target=40.0,
                energy_target=3.0,
                max_steps=30
            )
        ]

    def _get_current_task(self) -> Task:
        """Get the current task, cycling through available tasks."""
        if self.current_task_index >= len(self.tasks):
            self.current_task_index = 0
        return self.tasks[self.current_task_index]

    def _calculate_reward(self, action: EnergyOptimizationAction) -> float:
        """Calculate reward based on action effectiveness and task progress."""
        base_reward = 0.0

        # Action effectiveness rewards
        if action.action_type == "reduce_ram":
            ram_reduction = min(5.0 * action.intensity, self.ram_usage * 0.1)
            self.ram_usage = max(0.0, self.ram_usage - ram_reduction)
            base_reward += ram_reduction * 0.5  # Reward for RAM reduction

            # Penalty for excessive RAM reduction (system instability)
            if action.intensity > 0.8:
                base_reward -= 2.0

        elif action.action_type == "optimize_energy":
            energy_reduction = min(1.0 * action.intensity, self.energy_consumption * 0.15)
            self.energy_consumption = max(0.0, self.energy_consumption - energy_reduction)
            base_reward += energy_reduction * 2.0  # Higher reward for energy savings

            # Penalty for aggressive energy optimization (performance impact)
            if action.intensity > 0.9:
                self.system_load = min(1.0, self.system_load + 0.1)
                base_reward -= 1.0

        elif action.action_type == "balance_resources":
            # Balanced approach: moderate improvements to both
            ram_reduction = min(2.0 * action.intensity, self.ram_usage * 0.05)
            energy_reduction = min(0.5 * action.intensity, self.energy_consumption * 0.1)

            self.ram_usage = max(0.0, self.ram_usage - ram_reduction)
            self.energy_consumption = max(0.0, self.energy_consumption - energy_reduction)

            base_reward += (ram_reduction * 0.3 + energy_reduction * 1.5)

        elif action.action_type == "monitor_system":
            # Monitoring action: small reward for gathering information
            base_reward += 0.1
            # Slight natural system load reduction from monitoring
            self.system_load = max(0.0, self.system_load - 0.02)

        # Natural system changes (simulate real system behavior)
        self._apply_system_dynamics()

        # Task completion bonus
        current_task = self._get_current_task()
        if not current_task.completed and current_task.check_completion(
            self.ram_usage, self.energy_consumption, self._state.step_count
        ):
            current_task.completed = True
            self.tasks_completed.append(current_task.name)
            base_reward += current_task.difficulty * 10.0  # Bonus for task completion
            self.current_task_index += 1  # Move to the next task

        # Efficiency bonus
        efficiency_improvement = (
            (self.baseline_ram - self.ram_usage) / self.baseline_ram +
            (self.baseline_energy - self.energy_consumption) / self.baseline_energy
        ) * 0.5
        base_reward += efficiency_improvement

        return base_reward

    def _apply_system_dynamics(self):
        """Apply natural system dynamics and external factors."""
        # Random external load changes
        if random.random() < 0.1:  # 10% chance each step
            load_change = random.uniform(-0.05, 0.05)
            self.system_load = max(0.0, min(1.0, self.system_load + load_change))

            # Load affects RAM and energy
            ram_impact = load_change * 10.0
            energy_impact = load_change * 0.5

            self.ram_usage = max(0.0, min(100.0, self.ram_usage + ram_impact))
            self.energy_consumption = max(0.0, self.energy_consumption + energy_impact)

    def _calculate_task_progress(self) -> float:
        """Calculate progress towards current task completion."""
        current_task = self._get_current_task()
        if current_task.completed:
            return 1.0

        # RAM progress (0-1 scale)
        ram_progress = max(0.0, min(1.0, (100.0 - self.ram_usage) / (100.0 - current_task.ram_target)))

        # Energy progress (0-1 scale)
        energy_range = 10.0 - current_task.energy_target  # Total possible energy reduction
        if energy_range > 0:
            energy_progress = max(0.0, min(1.0, (8.0 - self.energy_consumption) / energy_range))
        else:
            energy_progress = 1.0 if self.energy_consumption <= current_task.energy_target else 0.0

        return min(1.0, (ram_progress + energy_progress) / 2.0)

    def _calculate_efficiency_score(self) -> float:
        """Calculate overall efficiency score."""
        ram_efficiency = max(0.0, (100.0 - self.ram_usage) / 100.0)
        energy_efficiency = max(0.0, (10.0 - self.energy_consumption) / 10.0)
        return (ram_efficiency + energy_efficiency) / 2.0

    def _task_to_summary(self, task: Task, steps_taken: int) -> TaskSummary:
        """Convert a Task to a TaskSummary for observations."""
        remaining_steps = max(0, task.max_steps - steps_taken) if not task.completed else 0
        progress = self._calculate_task_progress() if not task.completed else 1.0

        return TaskSummary(
            name=task.name,
            description=task.description,
            difficulty=task.difficulty,
            ram_target=task.ram_target,
            energy_target=task.energy_target,
            max_steps=task.max_steps,
            completed=task.completed,
            remaining_steps=remaining_steps,
            progress=progress
        )

    def reset(self) -> EnergyOptimizationObservation:
        """
        Reset the environment to its initial state.

        Returns:
            EnergyOptimizationObservation with the initial system state
        """
        self._state = State(episode_id=str(uuid4()), step_count=0)
        self._reset_count += 1

        # Reset system state
        self.ram_usage = 80.0
        self.energy_consumption = 8.0
        self.system_load = 0.7

        # Reset tasks
        for task in self.tasks:
            task.completed = False
        self.current_task_index = 0
        self.tasks_completed = []

        # Reset baselines
        self.baseline_ram = self.ram_usage
        self.baseline_energy = self.energy_consumption

        current_task = self._get_current_task()

        return EnergyOptimizationObservation(
            ram_usage=self.ram_usage,
            energy_consumption=self.energy_consumption,
            system_load=self.system_load,
            current_task=self._task_to_summary(current_task, 0) if current_task else None,
            tasks_completed=self.tasks_completed.copy(),
            steps_taken=0,
            task_progress=self._calculate_task_progress(),
            efficiency_score=self._calculate_efficiency_score(),
            done=False,
            reward=0.0,
        )

    def step(self, action: EnergyOptimizationAction) -> EnergyOptimizationObservation:
        """
        Execute an optimization action in the environment.

        Args:
            action: EnergyOptimizationAction containing the optimization strategy

        Returns:
            EnergyOptimizationObservation with the updated system state and reward
        """
        self._state.step_count += 1

        # Calculate reward for the action
        reward = self._calculate_reward(action)

        # Check if the episode should end
        done = self._state.step_count >= 100 or self.current_task_index >= len(self.tasks)

        current_task = self._get_current_task()

        return EnergyOptimizationObservation(
            ram_usage=self.ram_usage,
            energy_consumption=self.energy_consumption,
            system_load=self.system_load,
            current_task=self._task_to_summary(current_task, self._state.step_count) if current_task else None,
            tasks_completed=self.tasks_completed.copy(),
            steps_taken=self._state.step_count,
            task_progress=self._calculate_task_progress(),
            efficiency_score=self._calculate_efficiency_score(),
            done=done,
            reward=reward,
            metadata={
                "action_taken": action.action_type,
                "action_intensity": action.intensity,
                "episode_step": self._state.step_count,
                "current_task_name": current_task.name if current_task else None
            },
        )

    @property
    def state(self) -> State:
        """
        Get the current environment state.

        Returns:
            Current State with episode_id and step_count
        """
        return self._state
149
+ # Monitoring action: small reward for gathering information
150
+ base_reward += 0.1
151
+ # Slight natural system load reduction from monitoring
152
+ self.system_load = max(0.0, self.system_load - 0.02)
153
+
154
+ # Natural system changes (simulate real system behavior)
155
+ self._apply_system_dynamics()
156
+
157
+ # Task completion bonus
158
+ current_task = self._get_current_task()
159
+ if not current_task.completed and current_task.check_completion(
160
+ self.ram_usage, self.energy_consumption, self._state.step_count
161
+ ):
162
+ current_task.completed = True
163
+ self.tasks_completed.append(current_task.name)
164
+ base_reward += current_task.difficulty * 10.0 # Bonus for task completion
165
+ self.current_task_index += 1 # Move to next task
166
+
167
+ # Efficiency bonus
168
+ efficiency_improvement = (
169
+ (self.baseline_ram - self.ram_usage) / self.baseline_ram +
170
+ (self.baseline_energy - self.energy_consumption) / self.baseline_energy
171
+ ) * 0.5
172
+ base_reward += efficiency_improvement
173
+
174
+ return base_reward
175
+
176
+ def _apply_system_dynamics(self):
177
+ """Apply natural system dynamics and external factors."""
178
+ # Random external load changes
179
+ if random.random() < 0.1: # 10% chance each step
180
+ load_change = random.uniform(-0.05, 0.05)
181
+ self.system_load = max(0.0, min(1.0, self.system_load + load_change))
182
+
183
+ # Load affects RAM and energy
184
+ ram_impact = load_change * 10.0
185
+ energy_impact = load_change * 0.5
186
+
187
+ self.ram_usage = max(0.0, min(100.0, self.ram_usage + ram_impact))
188
+ self.energy_consumption = max(0.0, self.energy_consumption + energy_impact)
189
+
190
+ def _calculate_task_progress(self) -> float:
191
+ """Calculate progress towards current task completion."""
192
+ current_task = self._get_current_task()
193
+ if current_task.completed:
194
+ return 1.0
195
+
196
+ # Calculate RAM progress (0-1 scale)
197
+ ram_progress = max(0.0, min(1.0, (100.0 - self.ram_usage) / (100.0 - current_task.ram_target)))
198
+
199
+ # Calculate energy progress (0-1 scale)
200
+ energy_range = 10.0 - current_task.energy_target # Total possible energy reduction
201
+ if energy_range > 0:
202
+ energy_progress = max(0.0, min(1.0, (8.0 - self.energy_consumption) / energy_range))
203
+ else:
204
+ energy_progress = 1.0 if self.energy_consumption <= current_task.energy_target else 0.0
205
+
206
+ return min(1.0, (ram_progress + energy_progress) / 2.0)
207
+
208
+ def _calculate_efficiency_score(self) -> float:
209
+ """Calculate overall efficiency score."""
210
+ ram_efficiency = max(0.0, (100.0 - self.ram_usage) / 100.0)
211
+ energy_efficiency = max(0.0, (10.0 - self.energy_consumption) / 10.0)
212
+ return (ram_efficiency + energy_efficiency) / 2.0
213
+
214
+ def _task_to_summary(self, task: Task, steps_taken: int) -> TaskSummary:
215
+ """Convert a Task to a TaskSummary for observations."""
216
+ remaining_steps = max(0, task.max_steps - steps_taken) if not task.completed else 0
217
+ progress = self._calculate_task_progress() if not task.completed else 1.0
218
+
219
+ return TaskSummary(
220
+ name=task.name,
221
+ description=task.description,
222
+ difficulty=task.difficulty,
223
+ ram_target=task.ram_target,
224
+ energy_target=task.energy_target,
225
+ max_steps=task.max_steps,
226
+ completed=task.completed,
227
+ remaining_steps=remaining_steps,
228
+ progress=progress
229
+ )
230
+
231
+ def reset(self) -> EnergyOptimizationObservation:
232
+ """
233
+ Reset the environment to initial state.
234
+
235
+ Returns:
236
+ EnergyOptimizationObservation with initial system state
237
+ """
238
+ self._state = State(episode_id=str(uuid4()), step_count=0)
239
+ self._reset_count += 1
240
+
241
+ # Reset system state
242
+ self.ram_usage = 80.0
243
+ self.energy_consumption = 8.0
244
+ self.system_load = 0.7
245
+
246
+ # Reset tasks
247
+ for task in self.tasks:
248
+ task.completed = False
249
+ self.current_task_index = 0
250
+ self.tasks_completed = []
251
+
252
+ # Reset baselines
253
+ self.baseline_ram = self.ram_usage
254
+ self.baseline_energy = self.energy_consumption
255
+
256
+ current_task = self._get_current_task()
257
+
258
+ return EnergyOptimizationObservation(
259
+ ram_usage=self.ram_usage,
260
+ energy_consumption=self.energy_consumption,
261
+ system_load=self.system_load,
262
+ current_task=self._task_to_summary(current_task, 0) if current_task else None,
263
+ tasks_completed=self.tasks_completed.copy(),
264
+ steps_taken=0,
265
+ task_progress=self._calculate_task_progress(),
266
+ efficiency_score=self._calculate_efficiency_score(),
267
+ done=False,
268
+ reward=0.0,
269
+ )
270
+
271
+ def step(self, action: EnergyOptimizationAction) -> EnergyOptimizationObservation:
272
+ """
273
+ Execute an optimization action in the environment.
274
+
275
+ Args:
276
+ action: EnergyOptimizationAction containing the optimization strategy
277
+
278
+ Returns:
279
+ EnergyOptimizationObservation with updated system state and reward
280
+ """
281
+ self._state.step_count += 1
282
+
283
+ # Calculate reward for the action
284
+ reward = self._calculate_reward(action)
285
+
286
+ # Check if episode should end
287
+ done = self._state.step_count >= 100 or self.current_task_index >= len(self.tasks)
288
+
289
+ current_task = self._get_current_task()
290
+
291
+ return EnergyOptimizationObservation(
292
+ ram_usage=self.ram_usage,
293
+ energy_consumption=self.energy_consumption,
294
+ system_load=self.system_load,
295
+ current_task=self._task_to_summary(current_task, self._state.step_count) if current_task else None,
296
+ tasks_completed=self.tasks_completed.copy(),
297
+ steps_taken=self._state.step_count,
298
+ task_progress=self._calculate_task_progress(),
299
+ efficiency_score=self._calculate_efficiency_score(),
300
+ done=done,
301
+ reward=reward,
302
+ metadata={
303
+ "action_taken": action.action_type,
304
+ "action_intensity": action.intensity,
305
+ "episode_step": self._state.step_count,
306
+ "current_task_name": current_task.name if current_task else None
307
+ },
308
+ )
309
+
310
+ @property
311
+ def state(self) -> State:
312
+ """
313
+ Get the current environment state.
314
+
315
+ Returns:
316
+ Current State with episode_id and step_count
317
+ """
318
+ return self._state
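
As a quick sanity check of the reward shaping above, the `reduce_ram` branch of `_calculate_reward` can be exercised in isolation. The sketch below mirrors that branch as a standalone function (the helper name `reduce_ram_step` is illustrative, not part of the environment's API):

```python
def reduce_ram_step(ram_usage: float, intensity: float) -> tuple[float, float]:
    """Mirror of the 'reduce_ram' branch: returns (new_ram_usage, reward)."""
    # Reduction is capped at 10% of current usage, scaled by intensity
    ram_reduction = min(5.0 * intensity, ram_usage * 0.1)
    new_ram = max(0.0, ram_usage - ram_reduction)
    reward = ram_reduction * 0.5  # Reward proportional to RAM freed
    if intensity > 0.8:  # Instability penalty for overly aggressive reduction
        reward -= 2.0
    return new_ram, reward

# From the initial 80% RAM usage, a moderate action yields a small positive reward
print(reduce_ram_step(80.0, 0.5))  # (77.5, 1.25)
# At full intensity, the larger reduction is mostly offset by the penalty
print(reduce_ram_step(80.0, 1.0))  # (75.0, 0.5)
```

This illustrates why the environment discourages always acting at maximum intensity: past the 0.8 threshold, the instability penalty erodes most of the gain from the extra reduction.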
server/requirements.txt CHANGED
@@ -1,6 +1,6 @@
- openenv[core]>=0.2.0
- fastapi>=0.115.0
- uvicorn>=0.24.0
-
-
-

+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
uv.lock CHANGED
The diff for this file is too large to render. See raw diff