# FastAPI server exposing the Data Centre OpenEnv environment (EnvClient-compatible).
from fastapi.responses import HTMLResponse
from openenv.core.env_server.http_server import create_app

from .environment import DCEnvironment
from .models import DCAction, DCObservation

app = create_app(
    DCEnvironment,
    DCAction,
    DCObservation,
    env_name="datacenter_env",
    max_concurrent_envs=1,
)


@app.get("/", response_class=HTMLResponse)
async def root():
    return """
RL Environment for Datacenter Cooling and Operations

The Problem

A shared AI compute cluster has a hard 900 kW power budget. Two research teams compete in every scheduling window. Team A is honest: it reports true priority, accurate deadlines, and genuine carbon preferences. Team B games the system by inflating priority by 1–2 levels, always claiming urgent deadlines, and hiding its carbon flexibility 60% of the time.
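The reporting behaviour of the two teams can be simulated in a few lines. This is a minimal sketch, assuming a hypothetical `JobClaim` record and a 1 to 5 priority scale capped at 5; neither comes from the environment's actual models.

```python
import random
from dataclasses import dataclass, replace

MAX_PRIORITY = 5  # assumption: priorities run from 1 (low) to 5 (critical)


@dataclass(frozen=True)
class JobClaim:
    # Hypothetical job record; field names are illustrative only.
    priority: int
    urgent: bool           # deadline reported as urgent
    carbon_flexible: bool  # job could run in a later, lower-carbon window


def team_a_report(truth: JobClaim, rng: random.Random) -> JobClaim:
    # Team A is honest: the stated claim is the ground truth.
    return truth


def team_b_report(truth: JobClaim, rng: random.Random) -> JobClaim:
    # Inflate priority by 1-2 levels, capped at the top of the scale.
    inflated = min(MAX_PRIORITY, truth.priority + rng.randint(1, 2))
    # Hide genuine carbon flexibility with probability 0.6.
    hide = truth.carbon_flexible and rng.random() < 0.6
    return replace(
        truth,
        priority=inflated,
        urgent=True,  # always claims an urgent deadline
        carbon_flexible=truth.carbon_flexible and not hide,
    )
```

Sampling many Team B reports for the same ground truth recovers roughly the 60% hiding rate, while no single report reveals it.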

A naive scheduler that trusts stated claims over-allocates to Team B, crowds out legitimate work, and misses carbon-deferral opportunities. The goal is to train an LLM scheduler that learns, from environment reward alone, to detect and discount systematic misrepresentation.

This environment bridges Round 1 (physics-based datacenter cooling, evaluated zero-shot) with the Finale (operational scheduling layer built on the same physics engine, trained end-to-end via GRPO).

Architecture at a Glance

8 negotiation windows / episode
18 physical steps / window
900 kW hard power budget
Qwen2.5-3B · GRPO-trained scheduler
SB3 PPO cooling controller (pre-trained)
Information asymmetry · Team B gaming
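The nesting of windows and physical steps can be sketched as a plain loop. The three callables are hypothetical stand-ins for the scheduler, the cooling controller, and the oversight monitor; only the counts (8 windows of 18 steps, so 144 physical steps per episode) come from this page.

```python
from typing import Callable, List

WINDOWS_PER_EPISODE = 8
STEPS_PER_WINDOW = 18


def run_episode(
    schedule: Callable[[int, List[str]], None],  # hypothetical LLM scheduler hook
    cool: Callable[[int, int], None],            # hypothetical PPO controller hook
    oversee: Callable[[int], List[str]],         # hypothetical oversight hook
) -> int:
    # Returns the total number of physical steps simulated.
    flags: List[str] = []  # no oversight flags exist before the first window
    steps = 0
    for window in range(WINDOWS_PER_EPISODE):
        schedule(window, flags)        # scheduler acts once per window
        for step in range(STEPS_PER_WINDOW):
            cool(window, step)         # cooling controller acts every physical step
            steps += 1
        flags = oversee(window)        # flags reach the scheduler in the next window
    return steps
```

Passing the previous window's flags into `schedule` mirrors the one-window delay between detection and the scheduler seeing a flag.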

🧠 LLM Scheduler (GRPO)

Qwen2.5-3B-Instruct, 4-bit, LoRA r=16. Acts once per window. Reads stated job metadata, team history, oversight flags, power headroom, and carbon forecast. Issues ACCEPT / REJECT / DEFER per job request.
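As a concrete picture of the ACCEPT / REJECT / DEFER interface, the following rule-based sketch (not the GRPO-trained policy) discounts stated priority by a team's oversight-flag history, defers carbon-flexible jobs in a high-carbon window, and admits the rest greedily within the power headroom. The `Request` type, the flag counts, and the 0.5-per-flag discount are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    # Hypothetical job request as seen by the scheduler (stated, not true, values).
    job_id: str
    team: str
    stated_priority: int
    power_kw: float
    carbon_flexible: bool


def decide(
    requests: List[Request],
    headroom_kw: float,
    flag_counts: Dict[str, int],     # hypothetical: oversight flags per team so far
    carbon_high: bool,               # hypothetical: forecast marks this window high-carbon
    discount_per_flag: float = 0.5,  # hypothetical trust discount
) -> Dict[str, str]:
    def effective_priority(r: Request) -> float:
        # Trust a team's stated priority less the more often it has been flagged.
        return r.stated_priority - discount_per_flag * flag_counts.get(r.team, 0)

    decisions: Dict[str, str] = {}
    remaining = headroom_kw
    for r in sorted(requests, key=effective_priority, reverse=True):
        if carbon_high and r.carbon_flexible:
            decisions[r.job_id] = "DEFER"   # move flexible work to a cleaner window
        elif r.power_kw <= remaining:
            decisions[r.job_id] = "ACCEPT"
            remaining -= r.power_kw         # never admit past the power headroom
        else:
            decisions[r.job_id] = "REJECT"
    return decisions
```

With no flags, an inflated Team B request outranks an honest Team A request; after a few flags the ordering flips, which is the behaviour the trained scheduler is meant to learn without being given the rule.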

🤖 PPO Cooling Controller

SB3 MLP policy, pre-trained across all three cooling scenarios including mid-episode chiller failure. Runs 18 steps per window, controlling fan speeds (0–100%) and chiller setpoint (6–15 °C). Invisible to the LLM scheduler.
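The controller's action ranges map naturally onto a normalized continuous action space. A minimal sketch, assuming the policy emits values in [-1, 1] (the symmetric normalization the Stable-Baselines3 docs recommend for continuous actions) and a hypothetical layout of one entry per fan followed by one chiller entry:

```python
from typing import List, Tuple

FAN_MIN, FAN_MAX = 0.0, 100.0           # fan speed, percent
SETPOINT_MIN, SETPOINT_MAX = 6.0, 15.0  # chiller setpoint, degrees C


def _rescale(a: float, lo: float, hi: float) -> float:
    # Clip a normalized action to [-1, 1], then map it linearly onto [lo, hi].
    a = max(-1.0, min(1.0, a))
    return lo + (a + 1.0) * 0.5 * (hi - lo)


def to_setpoints(action: List[float]) -> Tuple[List[float], float]:
    # Assumed layout: fan entries first, chiller setpoint entry last.
    *fans, chiller = action
    fan_pct = [_rescale(a, FAN_MIN, FAN_MAX) for a in fans]
    return fan_pct, _rescale(chiller, SETPOINT_MIN, SETPOINT_MAX)
```

Clipping before rescaling keeps out-of-range policy outputs inside the physical limits instead of commanding a 120% fan or a 3 °C setpoint.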

🔍 Oversight Monitor

Four rule-based detectors run after every window, using ground-truth job metadata that is hidden from the scheduler: priority inflation (confidence 0.62–0.97), deadline compression, carbon gaming, and pattern escalation (≥3 windows). Their flags are injected into the next observation.
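Two of the four detectors can be sketched in simplified form. The `OversightSketch` class is hypothetical: it flags priority inflation whenever a stated priority exceeds ground truth, and escalates once a team has been flagged in 3 windows (assumed cumulative, not necessarily consecutive). It does not reproduce the 0.62–0.97 confidence scores.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-window input: team -> list of (stated, true) priority pairs.
WindowJobs = Dict[str, List[Tuple[int, int]]]


class OversightSketch:
    def __init__(self, escalation_windows: int = 3) -> None:
        self.escalation_windows = escalation_windows
        self.flagged_windows: Dict[str, int] = defaultdict(int)

    def check_window(self, jobs: WindowJobs) -> List[str]:
        flags: List[str] = []
        for team, pairs in jobs.items():
            # Priority inflation: any job stated above its ground-truth priority.
            if any(stated > true for stated, true in pairs):
                flags.append(f"priority_inflation:{team}")
                self.flagged_windows[team] += 1
                # Pattern escalation once the team has enough flagged windows.
                if self.flagged_windows[team] >= self.escalation_windows:
                    flags.append(f"pattern_escalation:{team}")
        return flags
```

Because the detector compares against ground truth, it can be exact; the scheduler only ever sees its flags, never the true metadata.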

🏭 Physics Engine

Thermal mass model per zone: ΔT = (heat_in − heat_out) / thermal_mass. Chiller COP degrades with outside temperature. Optional chiller fault at window 5. Grid carbon intensity follows a low→high→low schedule across the 8-window episode.
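The per-zone update is a one-line energy balance. A minimal sketch, treating heat_in and heat_out as energy per step (kJ) and thermal_mass as kJ/K; the linear COP-degradation curve and its constants are hypothetical, chosen only to show COP falling as outside temperature rises.

```python
def zone_step(temp_c: float, heat_in_kj: float, heat_out_kj: float,
              thermal_mass_kj_per_k: float) -> float:
    # dT = (heat_in - heat_out) / thermal_mass, applied once per physical step.
    return temp_c + (heat_in_kj - heat_out_kj) / thermal_mass_kj_per_k


def chiller_cop(outside_c: float, cop_nominal: float = 5.0, ref_c: float = 20.0,
                slope: float = 0.1, cop_min: float = 1.5) -> float:
    # Hypothetical linear degradation above ref_c, floored at cop_min.
    return max(cop_min, cop_nominal - slope * max(0.0, outside_c - ref_c))


def chiller_power_kw(cooling_kw: float, outside_c: float) -> float:
    # Electrical power needed to remove cooling_kw of heat at the current COP.
    return cooling_kw / chiller_cop(outside_c)
```

Lower COP on hot days means the same cooling load draws more electrical power, which eats into the 900 kW budget the scheduler is allocating.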

Reward Function

R_window = 0.50 × throughput
         + 0.35 × thermal_penalty  (−1.0 if 900 kW violated, else 0)
         + 0.15 × carbon_efficiency

Range per window: [−0.35, +0.65]  ·  Rule-based baseline: +0.28
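The reward can be written directly as a function. This sketch assumes throughput and carbon_efficiency are normalized to [0, 1] and that a violation means power strictly above 900 kW; under those assumptions the per-window range is exactly [−0.35, +0.65]. The argument name `peak_power_kw` is hypothetical.

```python
POWER_BUDGET_KW = 900.0


def window_reward(throughput: float, carbon_efficiency: float,
                  peak_power_kw: float) -> float:
    # Assumption: throughput and carbon_efficiency each lie in [0, 1].
    thermal_penalty = -1.0 if peak_power_kw > POWER_BUDGET_KW else 0.0
    return 0.50 * throughput + 0.35 * thermal_penalty + 0.15 * carbon_efficiency
```

For example, full throughput with half carbon efficiency inside the budget gives 0.50 + 0.075 = 0.575, while any budget violation costs 0.35 regardless of the other two terms.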

Training Results

Run                   Hardware   Iterations   Peak Reward      Parse Fails
Colab notebook        T4 GPU     30           +0.1937          0% by iter 5
HF Space              L40S GPU   50           +0.2406          0% from iter 25, final 26 iters
Rule-based baseline   -          -            +0.28 (target)   -
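The Parse Fails column counts model outputs that could not be turned into valid decisions. The sketch below shows one plausible way to measure that; the `job_id: ACTION` output format is hypothetical, not the prompt format actually used in training.

```python
from typing import Dict, List, Optional

VALID = {"ACCEPT", "REJECT", "DEFER"}


def parse_decisions(text: str, job_ids: List[str]) -> Optional[Dict[str, str]]:
    # Hypothetical format: one "job_id: ACTION" pair per line.
    decisions: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip prose the model adds around its answer
        job_id, _, action = line.partition(":")
        job_id, action = job_id.strip(), action.strip().upper()
        if job_id in job_ids and action in VALID:
            decisions[job_id] = action
    # A parse failure unless every requested job received a valid decision.
    return decisions if set(decisions) == set(job_ids) else None


def parse_fail_rate(outputs: List[str], job_ids: List[str]) -> float:
    if not outputs:
        return 0.0
    fails = sum(parse_decisions(o, job_ids) is None for o in outputs)
    return fails / len(outputs)
```

Treating a partial answer as a failure is a strict choice; a lenient variant could instead default missing jobs to REJECT.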

OpenEnv HTTP API

POST /reset    ← start a new episode → returns WindowState observation
POST /step     ← submit admission decisions → returns (WindowState, reward, done, info)
GET  /state    ← current environment state (no side effects)
GET  /health   ← liveness probe
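For clients other than `EnvClient`, the endpoints can be called over plain HTTP. A minimal standard-library sketch: `BASE_URL` is the Space URL from the Quick Start, but the JSON payload shapes (for example `{"seed": 42}` for /reset) are assumptions, not a documented schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://mephisto2412-datacenter-env.hf.space"


def build_request(method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    # Attach a JSON body only when a payload is given.
    data = None
    headers: Dict[str, str] = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, under the same payload assumption: `call(build_request("POST", "/reset", {"seed": 42}))`, then `call(build_request("GET", "/state"))` to inspect state without side effects.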

Quick Start

from openenv import EnvClient
from server.agents.baseline_scheduler import priority_weighted_threshold

client = EnvClient("https://mephisto2412-datacenter-env.hf.space")
obs    = client.reset(seed=42)

for window in range(8):
    decisions = priority_weighted_threshold(obs)   # or your trained agent
    obs, reward, done, info = client.step(decisions)
    print(f"Window {window}  reward={reward:+.4f}  flags={len(obs.oversight_flags)}")
    if done:
        break

Links

""" def main(host: str = "0.0.0.0", port: int = 8000) -> None: """Run the server locally: python -m datacenter_env.server.app or uv run server.""" import uvicorn uvicorn.run(app, host=host, port=port) if __name__ == "__main__": import argparse parser = argparse.ArgumentParser() parser.add_argument("--port", type=int, default=8000) args = parser.parse_args() # openenv validate checks for the substring "main()" in this module main(port=args.port) # entry: main()