nikita200 committed · Commit 4c8efe2 · 0 Parent(s)

first commit
.gitignore ADDED
.venv
Dockerfile ADDED
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (layer cache)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# HuggingFace Spaces requires port 7860
EXPOSE 7860

# Healthcheck so orchestrators know when the app is ready
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:7860/health')"

CMD ["uvicorn", "environment:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
---
title: Adaptive Traffic Controller
emoji: 🚦
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - traffic-control
  - llm-agent
license: mit
---

# Adaptive Backend Traffic Controller

An **OpenEnv**-compatible reinforcement learning environment where an LLM agent learns to prevent backend server crashes by intelligently throttling incoming traffic in real time.

Built for the **Scaler × Meta PyTorch Hackathon**.

---

## Overview

The environment simulates a backend server receiving variable traffic. The agent observes system metrics at every time step and chooses a throttling action to keep the server healthy. The server's physics are modelled realistically: CPU and memory track load linearly, latency spikes superlinearly, and sustained overload causes crashes.
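The actual physics live in `simulator.py` (not shown here). As an illustration of the "latency spikes superlinearly" claim, here is a toy queueing-style model of the same shape — the function name and constants are invented for illustration, not the repo's simulator:

```python
def toy_latency_ms(utilization: float, base_ms: float = 50.0) -> float:
    """Queueing-style latency: roughly flat when idle, exploding near saturation.

    `utilization` is offered load / capacity, clamped below 1.0 so the
    M/M/1-style factor 1 / (1 - u) stays finite.
    """
    u = min(max(utilization, 0.0), 0.99)
    return base_ms / (1.0 - u)

# Latency grows superlinearly with load:
# at u=0.5 it merely doubles, while at u=0.9 it is 10x the base.
```

This is why a small amount of throttling near saturation buys a disproportionate latency improvement.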
---

## Observation Space

| Field | Type | Range | Description |
|-------|------|-------|-------------|
| `cpu_usage` | float | 0.0 – 1.0 | CPU utilization fraction |
| `memory_usage` | float | 0.0 – 1.0 | Memory utilization fraction |
| `request_rate` | float | ≥ 0 | Incoming requests per second |
| `queue_length` | int | 0 – 500 | Pending requests in backlog |
| `avg_latency` | float | ≥ 0 | Average response latency (ms) |
| `step` | int | ≥ 0 | Current episode step |
| `crashed` | bool | — | Whether the server crashed this step |

---

## Action Space

| Action | Accept Rate | Description |
|--------|-------------|-------------|
| `allow_all` | 100% | Safe load — accept all requests |
| `throttle_70` | 70% | Moderate load — drop 30% |
| `throttle_40` | 40% | High load — drop 60% |
| `drop_aggressive` | 20% | Imminent crash — drop 80% |
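Throttling is multiplicative: the admitted rate is the incoming rate times the action's accept rate. A quick sketch of that arithmetic, mirroring the table above (the mapping matches `ACTION_ACCEPT_RATE` in `models.py`):

```python
# Accept rates per action, as in the table above
ACTION_ACCEPT_RATE = {
    "allow_all": 1.0,
    "throttle_70": 0.7,
    "throttle_40": 0.4,
    "drop_aggressive": 0.2,
}

def allowed_rate(incoming_rps: float, action: str) -> float:
    """Requests per second actually admitted to the backend."""
    return incoming_rps * ACTION_ACCEPT_RATE[action]

# During the easy task's 160 req/s spike:
# throttle_40 admits 64 req/s, drop_aggressive only 32 req/s.
```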
---

## Tasks

### Task Easy — Single Spike
- Traffic: 40 req/s baseline → 160 req/s spike at step 10 for 5 steps → back to 40
- Episode length: 30 steps
- Scoring:
  - `1.0` — no crash AND avg latency < 300 ms
  - `0.5` — no crash, but avg latency ≥ 300 ms
  - `0.0` — any crash

### Task Medium — Multiple Spikes
- Traffic: 50 req/s baseline with 3 spikes of 150 req/s at steps 5, 15, 25 (3 steps each)
- Episode length: 40 steps
- Scoring: `(steps_without_crash / total_steps) × latency_factor`
  - `latency_factor` = 1.0 at ≤ 200 ms, 0.5 at ≥ 600 ms, linear in between

### Task Hard — Sustained Overload
- Traffic: ramps 60 → 200 req/s over 20 steps, stays at 200 for 20 steps, drops to 80
- Episode length: 50 steps
- Scoring: `throughput_ratio × 0.7 + queue_factor × 0.3`
  - `throughput_ratio` = total allowed / total incoming
  - `queue_factor` = fraction of steps with queue < 100
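As a concrete check of the medium-task formula, the latency factor and final score can be computed directly from the definition above (plain arithmetic, stated here independently of the repo's `graders.py`):

```python
def latency_factor(avg_latency_ms: float, low: float = 200.0, high: float = 600.0) -> float:
    """1.0 at or below 200 ms, 0.5 at or above 600 ms, linear in between."""
    if avg_latency_ms <= low:
        return 1.0
    if avg_latency_ms >= high:
        return 0.5
    return 1.0 - 0.5 * (avg_latency_ms - low) / (high - low)

def medium_score(steps_without_crash: int, total_steps: int, avg_latency_ms: float) -> float:
    """score = (steps_without_crash / total_steps) * latency_factor."""
    return (steps_without_crash / total_steps) * latency_factor(avg_latency_ms)

# A crash-free 40-step episode averaging 400 ms latency:
# latency_factor = 0.75, so score = 1.0 * 0.75 = 0.75
```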
---

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset` | Reset environment, returns initial state |
| `POST` | `/step` | Execute action, returns state/reward/done/info |
| `GET` | `/state` | Current server state |
| `GET` | `/tasks` | List all 3 tasks |
| `GET` | `/openenv.yaml` | OpenEnv specification |
| `GET` | `/health` | Liveness probe |

---

## Setup

### Local (Python)

```bash
pip install -r requirements.txt

# Start the environment server
uvicorn environment:app --host 0.0.0.0 --port 7860

# In another terminal, run a quick smoke test
curl -s localhost:7860/health
curl -s -X POST localhost:7860/reset -H "Content-Type: application/json" \
  -d '{"task_id": "task_easy"}' | python -m json.tool
curl -s -X POST localhost:7860/step -H "Content-Type: application/json" \
  -d '{"action": "throttle_70"}' | python -m json.tool
curl -s localhost:7860/tasks | python -m json.tool
curl -s localhost:7860/openenv.yaml
```

### Docker

```bash
docker build -t traffic-controller .
docker run -p 7860:7860 traffic-controller

# Same smoke tests work on localhost:7860
```

---

## Running Inference

Set the three required environment variables, then run `inference.py`:

```bash
export API_BASE_URL="https://api-inference.huggingface.co/models/<your-model>/v1"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export HF_TOKEN="hf_..."
export ENV_URL="http://localhost:7860"  # optional, defaults to this

python inference.py
```

Expected output:

```
Environment URL : http://localhost:7860
Model           : meta-llama/Llama-3.1-8B-Instruct
API base        : https://api-inference.huggingface.co/...

Health check OK

=== TASK_EASY ===
  Starting task_easy (max_steps=30)
  step=  1 action=allow_all  reward=+0.950 latency=  56.5ms queue=   0 cpu=0.54
  ...
  task_easy done — total_reward=27.3, score=1.000

=== RESULTS ===
  task_easy   : 1.000
  task_medium : 0.875
  task_hard   : 0.623
  Overall     : 0.833
```

---

## Baseline Scores

Measured on the deterministic simulator. Scores are in **0.0 – 1.0**.

| Agent | task_easy | task_medium | task_hard | Overall |
|-------|-----------|-------------|-----------|---------|
| **Always allow_all** (naive) | 0.000 💥 | 0.833 | 0.300 💥 | 0.378 |
| **Always drop_aggressive** (conservative) | 1.000 | 1.000 | 0.440 | 0.813 |
| **Rule-based heuristic** | 1.000 | 1.000 | 0.500 | 0.833 |
| **LLM agent** (target) | ≥ 0.9 | ≥ 0.9 | ≥ 0.6 | ≥ 0.8 |

💥 = server crash occurred during the episode

**Key insight:** The hard task is the differentiator — naive and conservative agents score ≤ 0.44 on it because sustained 200 req/s overload requires balancing throughput (don't drop too much) against stability (don't let load crash the server). A smart LLM agent should outperform all rule-based baselines here.
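The rule-based baseline in the table can be approximated with a small threshold policy. This sketch reuses the danger thresholds from the system prompt in `inference.py`, tightened into AND-conditions; it is illustrative, not necessarily the exact heuristic behind the 0.833 row:

```python
def heuristic_action(cpu: float, latency_ms: float, queue: int) -> str:
    """Escalate throttling as the load indicators worsen."""
    if cpu < 0.6 and latency_ms < 200 and queue < 50:
        return "allow_all"
    if cpu < 0.75 and latency_ms < 300:
        return "throttle_70"
    if cpu < 0.9 and latency_ms < 500 and queue < 150:
        return "throttle_40"
    return "drop_aggressive"

# A quiet server admits everything; a saturated one sheds 80% of traffic.
```

Because this policy is purely reactive, it cannot anticipate the sustained ramp in the hard task — which is exactly the gap an LLM agent is expected to close.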
---

## Infrastructure

- Port: **7860** (HuggingFace Spaces)
- CPU: 2 vCPU
- Memory: 8 GB
- GPU: not required
- Inference timeout: < 20 minutes total

---

## Project Structure

```
.
├── environment.py     # FastAPI app + episode logic
├── tasks.py           # Traffic patterns + task metadata
├── graders.py         # Per-task scoring functions
├── simulator.py       # Backend physics (latency, CPU, memory, crash)
├── models.py          # Pydantic models (state, action, request/response)
├── inference.py       # LLM agent runner
├── openenv.yaml       # OpenEnv spec
├── Dockerfile
├── requirements.txt
└── README.md
```
__pycache__/environment.cpython-310.pyc ADDED (binary, 5.5 kB)
__pycache__/graders.cpython-310.pyc ADDED (binary, 3.59 kB)
__pycache__/models.cpython-310.pyc ADDED (binary, 3.19 kB)
__pycache__/simulator.cpython-310.pyc ADDED (binary, 2.05 kB)
__pycache__/tasks.cpython-310.pyc ADDED (binary, 2.4 kB)
environment.py ADDED
"""
Adaptive Traffic Controller — OpenEnv-compatible FastAPI environment.

Endpoints
---------
POST /reset         reset env, return initial state
POST /step          take action, return (state, reward, done, info)
GET  /state         current state
GET  /tasks         list all tasks
GET  /openenv.yaml  OpenEnv spec
GET  /health        liveness probe
"""

from __future__ import annotations

from contextlib import asynccontextmanager
from pathlib import Path
from typing import Any

from fastapi import FastAPI, HTTPException
from fastapi.responses import PlainTextResponse

from graders import grade
from models import (
    Action,
    ACTION_ACCEPT_RATE,
    EpisodeStep,
    HealthResponse,
    ResetRequest,
    ResetResponse,
    ServerState,
    StepRequest,
    StepResponse,
    TaskListResponse,
)
from simulator import compute_next_state, initial_state
from tasks import EPISODE_LENGTHS, TASK_METADATA, TRAFFIC_PATTERNS

# ---------------------------------------------------------------------------
# In-memory session state
# ---------------------------------------------------------------------------

class EnvSession:
    def __init__(self) -> None:
        self.task_id: str = "task_easy"
        self.state: ServerState = initial_state()
        self.step: int = 0
        self.done: bool = False
        self.history: list[EpisodeStep] = []

    def reset(self, task_id: str) -> ServerState:
        traffic_fn = TRAFFIC_PATTERNS[task_id]
        first_incoming = traffic_fn(0)
        self.task_id = task_id
        self.state = initial_state(first_incoming)
        self.step = 0
        self.done = False
        self.history = []
        return self.state

    def step_env(self, action: Action) -> tuple[ServerState, float, bool, dict[str, Any]]:
        if self.done:
            raise ValueError("Episode is done. Call /reset to start a new episode.")

        task_id = self.task_id
        traffic_fn = TRAFFIC_PATTERNS[task_id]
        max_steps = EPISODE_LENGTHS[task_id]

        incoming = traffic_fn(self.step)
        accept_rate = ACTION_ACCEPT_RATE[action]
        allowed = incoming * accept_rate

        next_state, crashed = compute_next_state(self.state, allowed, incoming)
        next_state.step = self.step + 1

        # --- Reward shaping ---
        reward = _compute_reward(
            incoming=incoming,
            allowed=allowed,
            latency=next_state.avg_latency,
            crashed=crashed,
            queue=next_state.queue_length,
        )

        ep_step = EpisodeStep(
            step=self.step,
            state=next_state,
            action=action,
            reward=reward,
            incoming_requests=incoming,
            allowed_requests=allowed,
            crashed=crashed,
        )
        self.history.append(ep_step)

        self.step += 1
        self.state = next_state
        self.done = crashed or (self.step >= max_steps)

        # Expose the *upcoming* incoming rate so the agent can react proactively.
        # This mirrors real monitoring: you see current traffic flow before deciding
        # the next throttle level.
        if not self.done:
            upcoming = traffic_fn(self.step)
            self.state.request_rate = round(upcoming, 2)

        info: dict[str, Any] = {
            "incoming_requests": incoming,
            "allowed_requests": allowed,
            "accept_rate": accept_rate,
            "crashed": crashed,
            "episode_step": self.step,
            "max_steps": max_steps,
        }

        if self.done:
            final_score = grade(task_id, self.history)
            info["final_score"] = final_score
            info["episode_done"] = True

        return next_state, reward, self.done, info


def _compute_reward(
    incoming: float,
    allowed: float,
    latency: float,
    crashed: bool,
    queue: int,
) -> float:
    if crashed:
        return -10.0

    # Throughput reward: prefer allowing more traffic (normalised to [0, 1])
    throughput_reward = allowed / max(incoming, 1.0)

    # Latency penalty: smooth penalty starting at 200 ms
    latency_penalty = max(0.0, (latency - 200.0) / 800.0)  # 0 at 200 ms, 1 at 1000 ms

    # Queue penalty
    queue_penalty = min(1.0, queue / 500.0)

    reward = throughput_reward - latency_penalty * 0.5 - queue_penalty * 0.3
    return round(reward, 4)


# ---------------------------------------------------------------------------
# App lifecycle
# ---------------------------------------------------------------------------

SESSION = EnvSession()


@asynccontextmanager
async def lifespan(app: FastAPI):
    SESSION.reset("task_easy")
    yield


app = FastAPI(
    title="Adaptive Traffic Controller",
    description="OpenEnv environment for LLM-based backend traffic control",
    version="1.0.0",
    lifespan=lifespan,
)


# ---------------------------------------------------------------------------
# Endpoints
# ---------------------------------------------------------------------------

@app.get("/health", response_model=HealthResponse)
async def health() -> HealthResponse:
    return HealthResponse(status="ok")


@app.post("/reset", response_model=ResetResponse)
async def reset(body: ResetRequest = ResetRequest()) -> ResetResponse:
    if body.task_id not in TRAFFIC_PATTERNS:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id {body.task_id!r}. "
                   f"Valid: {list(TRAFFIC_PATTERNS.keys())}",
        )
    state = SESSION.reset(body.task_id)
    return ResetResponse(
        state=state,
        task_id=body.task_id,
        max_steps=EPISODE_LENGTHS[body.task_id],
    )


@app.post("/step", response_model=StepResponse)
async def step(body: StepRequest) -> StepResponse:
    try:
        state, reward, done, info = SESSION.step_env(body.action)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc))
    return StepResponse(state=state, reward=reward, done=done, info=info)


@app.get("/state", response_model=ServerState)
async def get_state() -> ServerState:
    return SESSION.state


@app.get("/tasks", response_model=TaskListResponse)
async def list_tasks() -> TaskListResponse:
    return TaskListResponse(tasks=TASK_METADATA)


@app.get("/openenv.yaml", response_class=PlainTextResponse)
async def get_openenv_yaml() -> str:
    yaml_path = Path(__file__).parent / "openenv.yaml"
    if not yaml_path.exists():
        raise HTTPException(status_code=404, detail="openenv.yaml not found")
    return yaml_path.read_text()
graders.py ADDED
"""
Graders — deterministic scoring for each task.

Each grader receives the full episode history and returns a float in [0.0, 1.0].
"""

from __future__ import annotations

from models import EpisodeStep


# ---------------------------------------------------------------------------
# Task Easy — Single Spike
# ---------------------------------------------------------------------------

def grade_task_easy(history: list[EpisodeStep]) -> float:
    """
    Score:
        1.0 → no crash AND avg latency across all steps < 300 ms
        0.5 → no crash but avg latency >= 300 ms
        0.0 → any crash occurred
    """
    if not history:
        return 0.0

    crashed = any(s.crashed for s in history)
    if crashed:
        return 0.0

    avg_latency = sum(s.state.avg_latency for s in history) / len(history)
    if avg_latency < 300.0:
        return 1.0
    return 0.5


# ---------------------------------------------------------------------------
# Task Medium — Multiple Spikes
# ---------------------------------------------------------------------------

def grade_task_medium(history: list[EpisodeStep]) -> float:
    """
    Score:
        base = steps_without_crash / total_steps
        penalty factor for high latency: multiplied by latency_factor in [0.5, 1.0]
            latency_factor = 1.0 if avg_latency <= 200 ms
            latency_factor = 0.5 if avg_latency >= 600 ms
            linear interpolation in between
    """
    if not history:
        return 0.0

    total = len(history)
    crash_steps = sum(1 for s in history if s.crashed)
    base = (total - crash_steps) / total

    avg_latency = sum(s.state.avg_latency for s in history) / total

    low, high = 200.0, 600.0
    if avg_latency <= low:
        latency_factor = 1.0
    elif avg_latency >= high:
        latency_factor = 0.5
    else:
        latency_factor = 1.0 - 0.5 * (avg_latency - low) / (high - low)

    return round(base * latency_factor, 4)


# ---------------------------------------------------------------------------
# Task Hard — Sustained Overload
# ---------------------------------------------------------------------------

def grade_task_hard(history: list[EpisodeStep]) -> float:
    """
    Score (no crash)  = throughput_ratio * 0.7 + queue_factor * 0.3
    Score (any crash) = throughput_ratio * 0.3 * queue_factor (partial credit)

    throughput_ratio = sum(allowed) / sum(incoming) — maximize allowed traffic
    queue_factor     = fraction of steps where queue_length < 100
    """
    if not history:
        return 0.0

    total_incoming = sum(s.incoming_requests for s in history)
    total_allowed = sum(s.allowed_requests for s in history)

    if total_incoming == 0:
        throughput_ratio = 0.0
    else:
        throughput_ratio = min(1.0, total_allowed / total_incoming)

    crashed = any(s.crashed for s in history)
    stability_bonus = 0.0 if crashed else 1.0

    # Partial credit for keeping the queue under control
    low_queue_steps = sum(1 for s in history if s.state.queue_length < 100)
    queue_factor = low_queue_steps / len(history)

    # Combine: throughput matters most, stability is a binary gate,
    # queue is a tie-breaker bonus
    if stability_bonus == 0.0:
        # Still give partial credit for throughput management even with a crash
        score = throughput_ratio * 0.3 * queue_factor
    else:
        score = throughput_ratio * 0.7 + queue_factor * 0.3

    return round(min(1.0, max(0.0, score)), 4)


# ---------------------------------------------------------------------------
# Dispatcher
# ---------------------------------------------------------------------------

GRADERS = {
    "task_easy": grade_task_easy,
    "task_medium": grade_task_medium,
    "task_hard": grade_task_hard,
}


def grade(task_id: str, history: list[EpisodeStep]) -> float:
    grader = GRADERS.get(task_id)
    if grader is None:
        raise ValueError(f"Unknown task_id: {task_id!r}")
    return grader(history)
inference.py ADDED
"""
LLM agent runner for the Adaptive Traffic Controller environment.

Usage:
    API_BASE_URL=<url> MODEL_NAME=<model> HF_TOKEN=<token> python inference.py

Environment variables:
    API_BASE_URL — OpenAI-compatible base URL (e.g. HuggingFace TGI endpoint)
    MODEL_NAME   — Model identifier
    HF_TOKEN     — API key / HuggingFace token
    ENV_URL      — (optional) Traffic controller environment URL, default http://localhost:7860
"""

from __future__ import annotations

import os
import sys
import time

import httpx
from openai import OpenAI

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

API_BASE_URL: str = os.environ["API_BASE_URL"]
MODEL_NAME: str = os.environ["MODEL_NAME"]
HF_TOKEN: str = os.environ["HF_TOKEN"]
ENV_URL: str = os.environ.get("ENV_URL", "http://localhost:7860")

VALID_ACTIONS = {"allow_all", "throttle_70", "throttle_40", "drop_aggressive"}
DEFAULT_ACTION = "throttle_70"
MAX_RETRIES = 3

client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

# ---------------------------------------------------------------------------
# Prompts
# ---------------------------------------------------------------------------

SYSTEM_PROMPT = """You are a backend traffic controller agent.
Your goal: prevent server crashes while maximizing throughput.

Server state fields:
  cpu_usage     — fraction 0.0–1.0 (danger above 0.8)
  memory_usage  — fraction 0.0–1.0 (danger above 0.8)
  request_rate  — incoming requests per second
  queue_length  — pending requests (danger above 200)
  avg_latency   — milliseconds (danger above 400ms)

Available actions (choose exactly one):
  allow_all       — accept 100% of requests (use when load is safe)
  throttle_70     — accept 70%, drop 30% (use when load is moderate)
  throttle_40     — accept 40%, drop 60% (use when load is high)
  drop_aggressive — accept 20%, drop 80% (use when crash is imminent)

Decision heuristics:
  - cpu < 0.6 AND latency < 200ms AND queue < 50 → allow_all
  - cpu < 0.75 OR latency < 300ms → throttle_70
  - cpu < 0.9 OR latency < 500ms OR queue < 150 → throttle_40
  - otherwise → drop_aggressive

Respond with ONLY the action name, nothing else. No punctuation, no explanation."""


def _format_state(state: dict) -> str:
    return (
        f"cpu_usage={state['cpu_usage']:.3f} "
        f"memory_usage={state['memory_usage']:.3f} "
        f"request_rate={state['request_rate']:.1f} req/s "
        f"queue_length={state['queue_length']} "
        f"avg_latency={state['avg_latency']:.1f}ms "
        f"step={state.get('step', '?')}"
    )


# ---------------------------------------------------------------------------
# LLM interaction
# ---------------------------------------------------------------------------

def get_action(state: dict) -> str:
    """Query the LLM for a throttling action given the current server state."""
    user_msg = f"Current server state: {_format_state(state)}\nChoose action:"

    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": user_msg},
                ],
                max_tokens=20,
                temperature=0.0,
            )
            # content may be None on some providers; guard before stripping
            raw = (response.choices[0].message.content or "").strip().lower()
            # Normalise: strip punctuation, take first token
            action = raw.split()[0].rstrip(".,;:!") if raw.split() else ""
            if action in VALID_ACTIONS:
                return action
            print(f"  [warn] LLM returned invalid action {raw!r}, attempt {attempt}/{MAX_RETRIES}")
        except Exception as exc:
            print(f"  [warn] LLM call failed ({exc}), attempt {attempt}/{MAX_RETRIES}")
            time.sleep(1)

    print(f"  [warn] falling back to default action: {DEFAULT_ACTION}")
    return DEFAULT_ACTION


# ---------------------------------------------------------------------------
# Episode runner
# ---------------------------------------------------------------------------

def run_task(task_id: str, env_url: str) -> float:
    """Run one full episode for task_id and return the final graded score."""
    http = httpx.Client(base_url=env_url, timeout=30.0)

    # Reset environment
    reset_resp = http.post("/reset", json={"task_id": task_id})
    reset_resp.raise_for_status()
    data = reset_resp.json()
    state = data["state"]
    max_steps = data["max_steps"]

    print(f"  Starting {task_id} (max_steps={max_steps})")

    total_reward = 0.0
    final_score = 0.0
    step = 0

    while True:
        action = get_action(state)
        step_resp = http.post("/step", json={"action": action})
        step_resp.raise_for_status()
        result = step_resp.json()

        state = result["state"]
        reward = result["reward"]
        done = result["done"]
        info = result["info"]

        total_reward += reward
        step += 1

        crashed = info.get("crashed", False)
        print(
            f"  step={step:3d} action={action:<18s} "
            f"reward={reward:+.3f} latency={state['avg_latency']:6.1f}ms "
            f"queue={state['queue_length']:4d} cpu={state['cpu_usage']:.2f}"
            + (" [CRASH]" if crashed else "")
        )

        if done:
            final_score = info.get("final_score", 0.0)
            break

    print(f"  {task_id} done — total_reward={total_reward:.3f}, score={final_score:.3f}")
    http.close()
    return final_score


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

def main() -> None:
    env_url = ENV_URL
    print(f"Environment URL : {env_url}")
    print(f"Model           : {MODEL_NAME}")
    print(f"API base        : {API_BASE_URL}")
    print()

    # Quick health check
    try:
        resp = httpx.get(f"{env_url}/health", timeout=10.0)
        resp.raise_for_status()
        print("Health check OK\n")
    except Exception as exc:
        print(f"[ERROR] Environment not reachable at {env_url}: {exc}")
        sys.exit(1)

    results: dict[str, float] = {}
    for task_id in ["task_easy", "task_medium", "task_hard"]:
        print(f"=== {task_id.upper()} ===")
        score = run_task(task_id, env_url)
        results[task_id] = score
        print()

    print("=== RESULTS ===")
    for task_id, score in results.items():
        print(f"  {task_id:<15s}: {score:.3f}")
    overall = sum(results.values()) / len(results)
    print(f"  {'Overall':<15s}: {overall:.3f}")


if __name__ == "__main__":
    main()
models.py ADDED
from __future__ import annotations

from enum import Enum
from typing import Any

from pydantic import BaseModel, Field


class Action(str, Enum):
    allow_all = "allow_all"
    throttle_70 = "throttle_70"
    throttle_40 = "throttle_40"
    drop_aggressive = "drop_aggressive"


ACTION_ACCEPT_RATE: dict[Action, float] = {
    Action.allow_all: 1.0,
    Action.throttle_70: 0.7,
    Action.throttle_40: 0.4,
    Action.drop_aggressive: 0.2,
}


class ServerState(BaseModel):
    cpu_usage: float = Field(..., ge=0.0, le=1.0, description="CPU utilization 0–1")
    memory_usage: float = Field(..., ge=0.0, le=1.0, description="Memory utilization 0–1")
    request_rate: float = Field(..., ge=0.0, description="Incoming requests per second")
    queue_length: int = Field(..., ge=0, description="Pending requests in queue")
    avg_latency: float = Field(..., ge=0.0, description="Average latency in milliseconds")
    step: int = Field(default=0, description="Current episode step")
    crashed: bool = Field(default=False, description="Whether server has crashed")


class ResetRequest(BaseModel):
    task_id: str = Field(default="task_easy", description="Task to run")


class ResetResponse(BaseModel):
    state: ServerState
    task_id: str
    max_steps: int


class StepRequest(BaseModel):
    action: Action


class StepResponse(BaseModel):
    state: ServerState
    reward: float
    done: bool
    info: dict[str, Any] = Field(default_factory=dict)


class EpisodeStep(BaseModel):
    step: int
    state: ServerState
    action: Action
    reward: float
    incoming_requests: float
    allowed_requests: float
    crashed: bool


class TaskInfo(BaseModel):
    id: str
    description: str
    episode_length: int
    difficulty: str


class TaskListResponse(BaseModel):
    tasks: list[TaskInfo]


class HealthResponse(BaseModel):
    status: str = "ok"
openenv.yaml ADDED
name: adaptive-traffic-controller
version: "1.0.0"
description: >
  LLM agent controls backend traffic throttling to prevent server crashes.
  The agent observes real-time server metrics and chooses a throttling action
  each step to keep CPU, memory, and latency within safe bounds.

observation_space:
  cpu_usage:
    type: float
    range: [0.0, 1.0]
    description: CPU utilization as a fraction of total capacity
  memory_usage:
    type: float
    range: [0.0, 1.0]
    description: Memory utilization as a fraction of total capacity
  request_rate:
    type: float
    unit: requests/sec
    description: Current incoming request rate
  queue_length:
    type: int
    range: [0, 500]
    description: Number of pending requests waiting to be processed
  avg_latency:
    type: float
    unit: milliseconds
    description: Average response latency for processed requests
  step:
    type: int
    description: Current step index within the episode
  crashed:
    type: bool
    description: Whether the server has crashed this step

action_space:
  type: discrete
  actions:
    - id: allow_all
      accept_rate: 1.0
      description: Accept 100% of incoming requests
    - id: throttle_70
      accept_rate: 0.7
      description: Accept 70%, drop 30% of incoming requests
    - id: throttle_40
      accept_rate: 0.4
      description: Accept 40%, drop 60% of incoming requests
    - id: drop_aggressive
      accept_rate: 0.2
      description: Accept 20%, drop 80% of incoming requests

tasks:
  - id: task_easy
    difficulty: easy
    episode_length: 30
    description: >
      Single spike: baseline 40 req/s, spike to 160 req/s at step 10 for
      5 steps, return to 40. Agent must detect the spike, throttle, and recover.
    grading:
      full_score: "no crash AND avg_latency < 300ms"
      partial_score: "no crash but avg_latency >= 300ms → 0.5"
      zero_score: "any crash → 0.0"

  - id: task_medium
    difficulty: medium
    episode_length: 40
    description: >
      Three traffic spikes of 150 req/s at steps 5, 15, 25 (3 steps each),
      baseline 50 req/s. Agent must handle repeated bursts.
    grading:
      formula: "score = (steps_without_crash / total_steps) * latency_factor"
      latency_factor: "1.0 at <=200ms, 0.5 at >=600ms, linear between"

  - id: task_hard
    difficulty: hard
    episode_length: 50
    description: >
      Sustained overload: traffic ramps 60→200 req/s over 20 steps, stays
      at 200 for 20 steps, then drops to 80. Agent must balance throughput
      vs. stability under prolonged high load.
    grading:
      formula: "score = throughput_ratio * 0.7 + queue_factor * 0.3"
      throughput_ratio: "total_allowed / total_incoming"
      stability_bonus: "crash zeroes out primary score (partial credit * 0.3)"
      queue_factor: "fraction of steps with queue_length < 100"

endpoints:
  reset:
    method: POST
    path: /reset
    description: Reset environment, returns initial state
  step:
    method: POST
    path: /step
    description: Execute action, returns next state, reward, done flag, and info
  state:
    method: GET
    path: /state
    description: Get current server state
  tasks:
    method: GET
    path: /tasks
    description: List all available tasks
  spec:
    method: GET
    path: /openenv.yaml
    description: This OpenEnv specification file
  health:
    method: GET
    path: /health
    description: Liveness probe
112
+
113
+ infrastructure:
114
+ port: 7860
115
+ cpu: 2
116
+ memory_gb: 8
117
+ gpu_required: false
118
+ max_inference_minutes: 20
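The medium task's `latency_factor` rule above can be sketched as a small helper. This is a hypothetical illustration of the grading spec, not code from the repo; the function name and signature are assumptions:

```python
def latency_factor(avg_latency_ms: float) -> float:
    """Map average latency to a score multiplier per the task_medium grading spec:
    1.0 at <=200 ms, 0.5 at >=600 ms, linearly interpolated in between."""
    if avg_latency_ms <= 200.0:
        return 1.0
    if avg_latency_ms >= 600.0:
        return 0.5
    # Linear interpolation over the 200-600 ms band
    return 1.0 - 0.5 * (avg_latency_ms - 200.0) / 400.0


print(latency_factor(150.0))  # 1.0
print(latency_factor(400.0))  # 0.75 (midpoint of the band)
print(latency_factor(700.0))  # 0.5
```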
requirements.txt ADDED
```text
fastapi>=0.111.0
uvicorn[standard]>=0.29.0
pydantic>=2.7.0
openai>=1.30.0
httpx>=0.27.0
numpy>=1.26.0
pyyaml>=6.0.1
```
simulator.py ADDED
```python
"""Backend simulation math — models how a real server responds to load."""

from __future__ import annotations

from models import ServerState

MAX_CAPACITY = 100.0  # requests/sec the backend can handle at full health
BASE_LATENCY = 50.0   # milliseconds at zero load
MAX_QUEUE = 500
CRASH_LOAD_RATIO = 1.3  # server crashes when 30% or more over capacity


def compute_next_state(
    current_state: ServerState,
    allowed_requests: float,
    incoming_requests: float,
) -> tuple[ServerState, bool]:
    """
    Compute the next server state after one time step.

    Returns (next_state, crashed).

    The environment exposes the *upcoming* request_rate in the observation so
    the agent can react before overload happens (see environment.py).
    Crash fires when allowed traffic exceeds 130% of capacity in a single step.
    """
    load_ratio = allowed_requests / MAX_CAPACITY

    # Latency spikes superlinearly under load: quadratic below capacity,
    # cubic above it
    if load_ratio <= 1.0:
        latency = BASE_LATENCY * (1.0 + load_ratio ** 2)
    else:
        latency = BASE_LATENCY * (1.0 + load_ratio ** 3)

    # Queue builds when allowed requests exceed capacity
    queue_delta = max(0.0, allowed_requests - MAX_CAPACITY)
    # Queue drains when load is under capacity (servers catch up)
    queue_drain = max(0.0, (MAX_CAPACITY - allowed_requests) * 0.3)
    new_queue = current_state.queue_length + queue_delta - queue_drain
    queue_length = int(min(MAX_QUEUE, max(0.0, new_queue)))

    # Crash if load exceeds 130% of capacity
    crashed = load_ratio > CRASH_LOAD_RATIO

    # Latency grows with queue backlog
    latency += queue_length * 0.5

    # CPU and memory track load
    cpu = min(1.0, 0.3 + load_ratio * 0.6)
    memory = min(1.0, 0.2 + load_ratio * 0.4)

    next_state = ServerState(
        cpu_usage=round(cpu, 4),
        memory_usage=round(memory, 4),
        request_rate=round(incoming_requests, 2),
        queue_length=queue_length,
        avg_latency=round(latency, 2),
        step=current_state.step + 1,
        crashed=crashed,
    )
    return next_state, crashed


def initial_state(incoming_requests: float = 40.0) -> ServerState:
    """Return a clean initial server state."""
    load_ratio = incoming_requests / MAX_CAPACITY
    latency = BASE_LATENCY * (1.0 + load_ratio ** 2)
    cpu = min(1.0, 0.3 + load_ratio * 0.6)
    memory = min(1.0, 0.2 + load_ratio * 0.4)
    return ServerState(
        cpu_usage=round(cpu, 4),
        memory_usage=round(memory, 4),
        request_rate=round(incoming_requests, 2),
        queue_length=0,
        avg_latency=round(latency, 2),
        step=0,
        crashed=False,
    )
```
tasks.py ADDED
```python
"""Task definitions — each describes a traffic pattern and episode parameters."""

from __future__ import annotations

from collections.abc import Callable

from models import TaskInfo


# ---------------------------------------------------------------------------
# Traffic pattern generators
# Each returns incoming request rate (req/s) for a given step index (0-based).
# ---------------------------------------------------------------------------

def traffic_easy(step: int) -> float:
    """
    Task Easy — Single Spike
    Baseline 40 req/s, spike to 160 at step 10 for 5 steps, back to 40.
    """
    if 10 <= step < 15:
        return 160.0
    return 40.0


def traffic_medium(step: int) -> float:
    """
    Task Medium — Multiple Spikes
    Baseline 50 req/s, spikes of 150 req/s at steps 5-7, 15-17, 25-27.
    """
    if 5 <= step < 8:
        return 150.0
    if 15 <= step < 18:
        return 150.0
    if 25 <= step < 28:
        return 150.0
    return 50.0


def traffic_hard(step: int) -> float:
    """
    Task Hard — Sustained Overload
    Ramps from 60 → 200 req/s over 20 steps, stays at 200 for 20 more steps,
    then drops back to 80 for the final 10 steps.
    """
    if step < 20:
        # linear ramp 60 → 200
        return 60.0 + (200.0 - 60.0) * (step / 19.0)
    if step < 40:
        return 200.0
    return 80.0


TRAFFIC_PATTERNS: dict[str, Callable[[int], float]] = {
    "task_easy": traffic_easy,
    "task_medium": traffic_medium,
    "task_hard": traffic_hard,
}

EPISODE_LENGTHS: dict[str, int] = {
    "task_easy": 30,
    "task_medium": 40,
    "task_hard": 50,
}

TASK_METADATA: list[TaskInfo] = [
    TaskInfo(
        id="task_easy",
        description=(
            "Single traffic spike: baseline 40 req/s rising to 160 req/s at step 10 "
            "for 5 steps, then back to 40. Agent must detect and throttle the spike "
            "without crashing the server."
        ),
        episode_length=30,
        difficulty="easy",
    ),
    TaskInfo(
        id="task_medium",
        description=(
            "Three traffic spikes of 150 req/s at steps 5, 15, and 25 (3 steps each), "
            "baseline 50 req/s. Agent must handle repeated bursts while maintaining "
            "throughput between spikes."
        ),
        episode_length=40,
        difficulty="medium",
    ),
    TaskInfo(
        id="task_hard",
        description=(
            "Sustained overload: traffic ramps from 60 → 200 req/s over 20 steps, "
            "stays at 200 for 20 more steps, then drops to 80. Agent must balance "
            "throughput vs. stability under prolonged high load."
        ),
        episode_length=50,
        difficulty="hard",
    ),
]
```
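To see how the action space interacts with these patterns, consider the easy task's 160 req/s spike against the crash threshold. This is an illustrative helper, not part of the repo; the accept rates and constants are copied from the spec and simulator:

```python
MAX_CAPACITY = 100.0    # from simulator.py
CRASH_LOAD_RATIO = 1.3  # from simulator.py

# accept_rate per action, from the openenv.yaml action space
ACCEPT_RATES = {
    "allow_all": 1.0,
    "throttle_70": 0.7,
    "throttle_40": 0.4,
    "drop_aggressive": 0.2,
}


def survives(action: str, incoming: float) -> bool:
    """True if the allowed traffic stays at or below the crash threshold."""
    allowed = incoming * ACCEPT_RATES[action]
    return allowed / MAX_CAPACITY <= CRASH_LOAD_RATIO


# During the easy task's 160 req/s spike:
print(survives("allow_all", 160.0))    # False: 160 allowed, load ratio 1.6
print(survives("throttle_70", 160.0))  # True: 112 allowed, load ratio 1.12
```

So even the mildest throttle is enough to survive the easy spike, though `throttle_70` still runs above capacity and lets the queue build; choosing when to throttle harder is the agent's actual problem.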