Upload 16 files
Browse files- Dockerfile +14 -0
- README.md +252 -3
- __init__.cpython-313.pyc +0 -0
- __init__.py +0 -0
- app.cpython-313.pyc +0 -0
- app.py +19 -0
- client.cpython-313.pyc +0 -0
- graders.cpython-313.pyc +0 -0
- inference.cpython-313.pyc +0 -0
- inference.py +275 -0
- models.cpython-313.pyc +0 -0
- openenv.yaml +8 -0
- pyproject.toml +23 -0
- requirements.txt +4 -0
- tasks.cpython-313.pyc +0 -0
- uv.lock +0 -0
Dockerfile
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Lightweight runtime image for the support-triage OpenEnv server.
FROM python:3.11-slim

# Do not write .pyc files and do not buffer stdout/stderr (container logs).
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Port the server listens on; must match openenv.yaml and the Space config.
ENV PORT=8000

WORKDIR /app

# Install dependencies first so Docker layer caching survives code-only edits.
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY . /app

# Entrypoint module (server/app.py) starts uvicorn via its main().
CMD ["python", "-m", "server.app"]
|
README.md
CHANGED
|
@@ -1,3 +1,252 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: support-triage-openenv
|
| 3 |
+
sdk: docker
|
| 4 |
+
app_port: 8000
|
| 5 |
+
tags:
|
| 6 |
+
- openenv
|
| 7 |
+
- reinforcement-learning
|
| 8 |
+
- customer-support
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# Support Triage OpenEnv
|
| 12 |
+
|
| 13 |
+
A real-world OpenEnv environment where an agent performs customer support triage: prioritization, routing, tagging, information gathering, and response drafting.
|
| 14 |
+
|
| 15 |
+
This project is designed for a Round 1-style hackathon evaluation:
|
| 16 |
+
- Full typed OpenEnv models
|
| 17 |
+
- `reset()` / `step()` / `state()` API
|
| 18 |
+
- 3 deterministic graded tasks (easy/medium/hard)
|
| 19 |
+
- Dense reward shaping with partial progress
|
| 20 |
+
- Baseline `inference.py` using OpenAI client and required env vars
|
| 21 |
+
- Docker + Hugging Face Spaces deployment files
|
| 22 |
+
|
| 23 |
+
## Why This Environment Has Real Utility
|
| 24 |
+
|
| 25 |
+
Teams actually do this workflow in support operations and trust/safety queues. This environment evaluates whether an agent can:
|
| 26 |
+
- classify urgency
|
| 27 |
+
- route to the right team
|
| 28 |
+
- attach relevant operational tags
|
| 29 |
+
- ask for required evidence
|
| 30 |
+
- draft safe and useful customer responses
|
| 31 |
+
- close only when resolution criteria are met
|
| 32 |
+
|
| 33 |
+
## Module-Aligned Build Guide (From Your Course)
|
| 34 |
+
|
| 35 |
+
### Module 1: Why OpenEnv?
|
| 36 |
+
- We treat the environment as a service with typed contracts.
|
| 37 |
+
- Core loop follows RL structure: observe -> act -> reward.
|
| 38 |
+
|
| 39 |
+
### Module 2: Using Existing Environments
|
| 40 |
+
- `support_triage_env/models.py` defines typed `Action`, `Observation`, `State`.
|
| 41 |
+
- `support_triage_env/client.py` gives a reusable typed client.
|
| 42 |
+
|
| 43 |
+
### Module 3: Deploying Environments
|
| 44 |
+
- `server/app.py` is the OpenEnv validator-compatible entrypoint (`main()` + callable script).
|
| 45 |
+
- `server/Dockerfile` provides reproducible container runtime.
|
| 46 |
+
- `openenv.yaml` defines deployment metadata.
|
| 47 |
+
|
| 48 |
+
### Module 4: Building Your Own Environment
|
| 49 |
+
- `support_triage_env/server/environment.py` implements task simulation.
|
| 50 |
+
- `support_triage_env/tasks.py` defines deterministic fixtures.
|
| 51 |
+
- `support_triage_env/graders.py` implements 0.0-1.0 grading.
|
| 52 |
+
|
| 53 |
+
### Module 5: Training with OpenEnv + Reward Signals
|
| 54 |
+
- Reward shaping is dense and trajectory-aware.
|
| 55 |
+
- `inference.py` runs model-based episodes and exports reproducible baseline scores.
|
| 56 |
+
|
| 57 |
+
## Action Space
|
| 58 |
+
|
| 59 |
+
Action model: `SupportTriageAction`
|
| 60 |
+
|
| 61 |
+
```text
|
| 62 |
+
set_priority(value)
|
| 63 |
+
route_team(value)
|
| 64 |
+
add_tag(value)
|
| 65 |
+
draft_reply(value)
|
| 66 |
+
request_info(value)
|
| 67 |
+
close_ticket()
|
| 68 |
+
noop()
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
Valid priorities: `low | medium | high | urgent`
|
| 72 |
+
|
| 73 |
+
Valid teams: `billing | technical | account | trust_safety | shipping`
|
| 74 |
+
|
| 75 |
+
## Observation Space
|
| 76 |
+
|
| 77 |
+
Observation model: `SupportTriageObservation`
|
| 78 |
+
|
| 79 |
+
Key fields:
|
| 80 |
+
- `task_id`, `difficulty`, `objective`
|
| 81 |
+
- `title`, `customer_tier`, `customer_message`
|
| 82 |
+
- current working state: `priority`, `routed_team`, `tags`, `draft_reply`, `info_requested`
|
| 83 |
+
- `steps_remaining`, `last_feedback`, `allowed_actions`
|
| 84 |
+
- inherited `reward`, `done`
|
| 85 |
+
|
| 86 |
+
## State Space
|
| 87 |
+
|
| 88 |
+
State model: `SupportTriageState`
|
| 89 |
+
|
| 90 |
+
Contains episode metadata and full workflow state:
|
| 91 |
+
- `episode_id`, `step_count`
|
| 92 |
+
- `task_id`, `difficulty`, `objective`, `max_steps`
|
| 93 |
+
- `priority`, `routed_team`, `tags`
|
| 94 |
+
- `info_requested`, `closed`, `close_valid`
|
| 95 |
+
- `history`
|
| 96 |
+
|
| 97 |
+
## Tasks and Graders
|
| 98 |
+
|
| 99 |
+
### Easy: `easy_password_reset`
|
| 100 |
+
- Scenario: login token failure after password reset
|
| 101 |
+
- Expected routing: `account`
|
| 102 |
+
- Expected priority: `medium`
|
| 103 |
+
- Required tags: `password-reset`, `login`
|
| 104 |
+
|
| 105 |
+
### Medium: `medium_double_charge`
|
| 106 |
+
- Scenario: premium customer charged twice
|
| 107 |
+
- Expected routing: `billing`
|
| 108 |
+
- Expected priority: `high`
|
| 109 |
+
- Required tags: `refund`, `double-charge`, `vip`
|
| 110 |
+
- Needs additional evidence request
|
| 111 |
+
|
| 112 |
+
### Hard: `hard_account_takeover`
|
| 113 |
+
- Scenario: possible account takeover + fraud + abusive content
|
| 114 |
+
- Expected routing: `trust_safety`
|
| 115 |
+
- Expected priority: `urgent`
|
| 116 |
+
- Required tags: `security`, `account-takeover`, `fraud`, `content-abuse`
|
| 117 |
+
- Needs security-safe communication and evidence collection
|
| 118 |
+
|
| 119 |
+
### Grading Design
|
| 120 |
+
|
| 121 |
+
`support_triage_env/graders.py` computes deterministic component scores:
|
| 122 |
+
- priority correctness
|
| 123 |
+
- routing correctness
|
| 124 |
+
- required tags coverage
|
| 125 |
+
- reply quality (required/forbidden phrase logic)
|
| 126 |
+
- process quality (info request + closure quality + efficiency)
|
| 127 |
+
|
| 128 |
+
Final score is normalized to `[0.0, 1.0]`.
|
| 129 |
+
|
| 130 |
+
## Reward Function
|
| 131 |
+
|
| 132 |
+
The environment provides dense rewards at each step:
|
| 133 |
+
- positive reward for correct priority/routing/tagging
|
| 134 |
+
- incremental reward for improving draft response quality
|
| 135 |
+
- positive signal for meaningful information requests when required
|
| 136 |
+
- strong bonus for valid close
|
| 137 |
+
- penalties for invalid actions, repeated loops, no-op behavior, or premature close
|
| 138 |
+
- small per-step cost to discourage inefficient trajectories
|
| 139 |
+
|
| 140 |
+
## Windows Setup
|
| 141 |
+
|
| 142 |
+
```powershell
|
| 143 |
+
py -3.11 -m venv .venv
|
| 144 |
+
.\.venv\Scripts\Activate.ps1
|
| 145 |
+
python -m pip install -U pip
|
| 146 |
+
pip install -r requirements.txt
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
Optional: if `openenv` command is not found, use:
|
| 150 |
+
|
| 151 |
+
```powershell
|
| 152 |
+
& "$env:APPDATA\Python\Python311\Scripts\openenv.exe" --help  # adjust the PythonXXX folder to match your interpreter version
|
| 153 |
+
```
|
| 154 |
+
|
| 155 |
+
## Run Locally
|
| 156 |
+
|
| 157 |
+
### Start API server
|
| 158 |
+
|
| 159 |
+
```powershell
|
| 160 |
+
python -m uvicorn support_triage_env.server.app:app --host 0.0.0.0 --port 8000 --reload
|
| 161 |
+
```
|
| 162 |
+
|
| 163 |
+
### Validate with OpenEnv tooling
|
| 164 |
+
|
| 165 |
+
```powershell
|
| 166 |
+
openenv validate --verbose
|
| 167 |
+
openenv validate --url http://localhost:8000
|
| 168 |
+
```
|
| 169 |
+
|
| 170 |
+
## Baseline Inference
|
| 171 |
+
|
| 172 |
+
`inference.py` is at project root as required.
|
| 173 |
+
|
| 174 |
+
Set env vars first:
|
| 175 |
+
|
| 176 |
+
```powershell
|
| 177 |
+
$env:API_BASE_URL = "https://router.huggingface.co/v1"
|
| 178 |
+
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
|
| 179 |
+
$env:HF_TOKEN = "<your_hf_token>"
|
| 180 |
+
```
|
| 181 |
+
|
| 182 |
+
Run:
|
| 183 |
+
|
| 184 |
+
```powershell
|
| 185 |
+
python .\inference.py
|
| 186 |
+
```
|
| 187 |
+
|
| 188 |
+
Output:
|
| 189 |
+
- per-task scores
|
| 190 |
+
- average score
|
| 191 |
+
- `baseline_scores.json`
|
| 192 |
+
|
| 193 |
+
## Docker
|
| 194 |
+
|
| 195 |
+
Build:
|
| 196 |
+
|
| 197 |
+
```powershell
|
| 198 |
+
docker build -t support-triage-openenv:latest -f server/Dockerfile .
|
| 199 |
+
```
|
| 200 |
+
|
| 201 |
+
Run:
|
| 202 |
+
|
| 203 |
+
```powershell
|
| 204 |
+
docker run --rm -p 8000:8000 support-triage-openenv:latest
|
| 205 |
+
```
|
| 206 |
+
|
| 207 |
+
## Deploy to Hugging Face Spaces
|
| 208 |
+
|
| 209 |
+
```powershell
|
| 210 |
+
openenv push --repo-id <your-username>/support-triage-openenv
|
| 211 |
+
```
|
| 212 |
+
|
| 213 |
+
Then set in Space settings:
|
| 214 |
+
- `API_BASE_URL`
|
| 215 |
+
- `MODEL_NAME`
|
| 216 |
+
- `HF_TOKEN`
|
| 217 |
+
|
| 218 |
+
## Suggested Baseline Reporting Format
|
| 219 |
+
|
| 220 |
+
Include in submission:
|
| 221 |
+
- model name
|
| 222 |
+
- per-task score table
|
| 223 |
+
- average score
|
| 224 |
+
- runtime in minutes
|
| 225 |
+
- commit hash
|
| 226 |
+
|
| 227 |
+
## Project Structure
|
| 228 |
+
|
| 229 |
+
```text
|
| 230 |
+
support-triage-openenv/
|
| 231 |
+
|- server/
|
| 232 |
+
| |- __init__.py
|
| 233 |
+
| |- app.py
|
| 234 |
+
| |- Dockerfile
|
| 235 |
+
|- support_triage_env/
|
| 236 |
+
| |- __init__.py
|
| 237 |
+
| |- models.py
|
| 238 |
+
| |- client.py
|
| 239 |
+
| |- tasks.py
|
| 240 |
+
| |- graders.py
|
| 241 |
+
| |- server/
|
| 242 |
+
| |- __init__.py
|
| 243 |
+
| |- app.py
|
| 244 |
+
| |- environment.py
|
| 245 |
+
| |- Dockerfile
|
| 246 |
+
|- inference.py
|
| 247 |
+
|- openenv.yaml
|
| 248 |
+
|- pyproject.toml
|
| 249 |
+
|- requirements.txt
|
| 250 |
+
|- uv.lock
|
| 251 |
+
|- README.md
|
| 252 |
+
```
|
__init__.cpython-313.pyc
ADDED
|
Binary file (115 Bytes). View file
|
|
|
__init__.py
ADDED
|
File without changes
|
app.cpython-313.pyc
ADDED
|
Binary file (795 Bytes). View file
|
|
|
app.py
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Root server entrypoint expected by OpenEnv validator."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import os
|
| 6 |
+
|
| 7 |
+
import uvicorn
|
| 8 |
+
|
| 9 |
+
from support_triage_env.server.app import app
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def main() -> None:
|
| 13 |
+
host = os.getenv("HOST", "0.0.0.0")
|
| 14 |
+
port = int(os.getenv("PORT", "8000"))
|
| 15 |
+
uvicorn.run("server.app:app", host=host, port=port)
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
if __name__ == "__main__":
|
| 19 |
+
main()
|
client.cpython-313.pyc
ADDED
|
Binary file (4.58 kB). View file
|
|
|
graders.cpython-313.pyc
ADDED
|
Binary file (5.66 kB). View file
|
|
|
inference.cpython-313.pyc
ADDED
|
Binary file (12.1 kB). View file
|
|
|
inference.py
ADDED
|
@@ -0,0 +1,275 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
Baseline inference for support-triage-openenv.

Required environment variables before submission:
- API_BASE_URL
- MODEL_NAME
- HF_TOKEN
"""

from __future__ import annotations

import json
import os
import re
from dataclasses import asdict, dataclass
from typing import Dict, Optional

from openai import OpenAI

from support_triage_env.models import SupportTriageAction, SupportTriageObservation
from support_triage_env.server.environment import SupportTriageEnvironment
from support_triage_env.tasks import TASK_ORDER

# OpenAI-compatible endpoint; defaults to the Hugging Face router.
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
# HF_TOKEN is the primary credential; OPENAI_API_KEY is accepted as a fallback.
API_KEY = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME")

# Hard cap on agent steps per episode (client-side safety net).
MAX_STEPS = 14
# Low temperature keeps baseline runs near-deterministic.
TEMPERATURE = 0.1
MAX_TOKENS = 220

# Closed set of action types accepted by the environment; used to validate
# parsed model output before constructing a SupportTriageAction.
ACTION_TYPES = {
    "set_priority",
    "route_team",
    "add_tag",
    "draft_reply",
    "request_info",
    "close_ticket",
    "noop",
}

SYSTEM_PROMPT = (
    "You are a customer support triage agent operating in an RL environment. "
    "Return exactly one JSON object with keys action_type and value. "
    "Valid action_type values are: set_priority, route_team, add_tag, "
    "draft_reply, request_info, close_ticket, noop. "
    "Do not include markdown, explanations, or extra text."
)
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
@dataclass
class EpisodeReport:
    """Summary of one graded episode, produced by run_task()."""

    task_id: str  # task fixture id, e.g. "easy_password_reset"
    steps: int  # number of environment steps the episode took
    score: float  # final grader score, normalized to [0.0, 1.0]
    breakdown: Dict[str, float]  # per-component grader scores
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
def build_user_prompt(step: int, obs: SupportTriageObservation) -> str:
    """Render the current observation as a plain-text prompt for the model."""
    lines = [
        f"Step: {step}",
        f"Task: {obs.task_id} ({obs.difficulty})",
        f"Objective: {obs.objective}",
        f"Title: {obs.title}",
        f"Customer tier: {obs.customer_tier}",
        f"Customer message: {obs.customer_message}",
        f"Current priority: {obs.priority}",
        f"Current team: {obs.routed_team}",
        f"Current tags: {obs.tags}",
        f"Info requested: {obs.info_requested}",
        f"Current draft reply: {obs.draft_reply}",
        f"Steps remaining: {obs.steps_remaining}",
        f"Last feedback: {obs.last_feedback}",
        f"Allowed actions: {obs.allowed_actions}",
        "Respond with JSON only.",
    ]
    return "\n".join(lines)
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
def _extract_json(text: str) -> Optional[Dict[str, object]]:
|
| 80 |
+
text = (text or "").strip()
|
| 81 |
+
if not text:
|
| 82 |
+
return None
|
| 83 |
+
|
| 84 |
+
try:
|
| 85 |
+
parsed = json.loads(text)
|
| 86 |
+
if isinstance(parsed, dict):
|
| 87 |
+
return parsed
|
| 88 |
+
except json.JSONDecodeError:
|
| 89 |
+
pass
|
| 90 |
+
|
| 91 |
+
match = re.search(r"\{.*\}", text, re.DOTALL)
|
| 92 |
+
if not match:
|
| 93 |
+
return None
|
| 94 |
+
|
| 95 |
+
try:
|
| 96 |
+
parsed = json.loads(match.group(0))
|
| 97 |
+
except json.JSONDecodeError:
|
| 98 |
+
return None
|
| 99 |
+
|
| 100 |
+
return parsed if isinstance(parsed, dict) else None
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
# Deterministic per-task fixtures used by the fallback policy. Keeping them
# in data tables (instead of the previous copy-pasted per-task if-chains)
# makes adding a task a one-line-per-table change and keeps the action
# ordering logic in a single place.
_FALLBACK_PRIORITIES = {
    "easy_password_reset": "medium",
    "medium_double_charge": "high",
    "hard_account_takeover": "urgent",
}
_FALLBACK_TEAMS = {
    "easy_password_reset": "account",
    "medium_double_charge": "billing",
    "hard_account_takeover": "trust_safety",
}
# Tags are applied in listed order, one per call.
_FALLBACK_TAGS = {
    "easy_password_reset": ["password-reset", "login"],
    "medium_double_charge": ["refund", "double-charge", "vip"],
    "hard_account_takeover": ["security", "account-takeover", "fraud", "content-abuse"],
}
# Tasks that require an evidence request before drafting a reply.
_FALLBACK_INFO_REQUESTS = {
    "medium_double_charge": "Please share the transaction ID and last 4 digits of the charged card.",
    "hard_account_takeover": "Please share screenshot evidence, timestamps, and the suspicious order ID.",
}
_FALLBACK_REPLIES = {
    "easy_password_reset": (
        "Sorry for the login trouble. Please use the reset link again and "
        "enable 2FA after login. If this continues, support can verify your token."
    ),
    "medium_double_charge": (
        "Sorry for this frustration. Our billing team will investigate the "
        "double charge and process any eligible refund after verification."
    ),
    "hard_account_takeover": (
        "We have escalated this security case. Please secure your account, reset "
        "password now, and enable two-factor authentication immediately."
    ),
}


def fallback_action(obs: SupportTriageObservation) -> SupportTriageAction:
    """Deterministic heuristic policy used when model output is malformed.

    Emits exactly one next action per call, in a fixed order:
    set_priority -> route_team -> missing tags (in table order) ->
    request_info (tasks that need evidence) -> draft_reply -> close_ticket.
    Keeps baseline runs reproducible.
    """
    task = obs.task_id

    if not obs.priority:
        return SupportTriageAction(
            action_type="set_priority",
            value=_FALLBACK_PRIORITIES.get(task, "medium"),
        )

    if not obs.routed_team:
        return SupportTriageAction(
            action_type="route_team",
            value=_FALLBACK_TEAMS.get(task, "technical"),
        )

    for tag in _FALLBACK_TAGS.get(task, []):
        if tag not in obs.tags:
            return SupportTriageAction(action_type="add_tag", value=tag)

    # Evidence request comes before the draft for the tasks that need one
    # (the easy task has no entry here, so it drafts immediately).
    info_prompt = _FALLBACK_INFO_REQUESTS.get(task)
    if info_prompt and not obs.info_requested:
        return SupportTriageAction(action_type="request_info", value=info_prompt)

    reply = _FALLBACK_REPLIES.get(task)
    if reply and not obs.draft_reply:
        return SupportTriageAction(action_type="draft_reply", value=reply)

    # Everything the fixtures require is in place: close out the ticket.
    return SupportTriageAction(action_type="close_ticket", value="")
|
| 182 |
+
|
| 183 |
+
|
| 184 |
+
def parse_action(response_text: str, obs: SupportTriageObservation) -> SupportTriageAction:
    """Turn raw model text into a validated action, falling back heuristically.

    Any payload that cannot be parsed, or whose action_type is not in
    ACTION_TYPES, is replaced by the deterministic fallback policy.
    """
    payload = _extract_json(response_text)
    if not payload:
        return fallback_action(obs)

    kind = str(payload.get("action_type", "noop")).strip()
    if kind not in ACTION_TYPES:
        return fallback_action(obs)

    raw_value = payload.get("value")
    return SupportTriageAction(
        action_type=kind,
        value="" if raw_value is None else str(raw_value),
    )
|
| 197 |
+
|
| 198 |
+
|
| 199 |
+
def run_task(client: OpenAI, task_id: str) -> EpisodeReport:
    """Play one full episode of *task_id* and return its graded report.

    Each step, the model proposes one JSON action; malformed output or a
    failed API call falls back to the deterministic heuristic, so runs
    stay reproducible end to end.
    """
    env = SupportTriageEnvironment()
    obs = env.reset(task_id=task_id)

    step = 0
    while step < MAX_STEPS and not obs.done:
        step += 1
        prompt = build_user_prompt(step, obs)

        try:
            completion = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": prompt},
                ],
                temperature=TEMPERATURE,
                max_tokens=MAX_TOKENS,
                stream=False,
            )
            raw_reply = completion.choices[0].message.content or ""
        except Exception as exc:  # noqa: BLE001
            print(f"Model call failed on {task_id} step {step}: {exc}. Falling back to heuristic.")
            raw_reply = ""

        action = parse_action(raw_reply, obs)
        obs = env.step(action)

        print(
            f"[{task_id}] step={step} action={action.action_type}:{action.value} "
            f"reward={obs.reward:+.3f} done={obs.done}"
        )

    final = env.evaluate()
    return EpisodeReport(
        task_id=task_id,
        steps=int(final["steps"]),
        score=float(final["score"]),
        breakdown=dict(final["breakdown"]),
    )
|
| 243 |
+
|
| 244 |
+
|
| 245 |
+
def main() -> None:
    """Run the baseline over every task and write baseline_scores.json.

    Raises RuntimeError if the required credentials or model name are not
    configured in the environment.
    """
    if not API_KEY:
        raise RuntimeError("Missing HF_TOKEN (or OPENAI_API_KEY fallback) environment variable.")
    if not MODEL_NAME:
        raise RuntimeError("Missing MODEL_NAME environment variable.")

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    reports = []
    for task_id in TASK_ORDER:
        reports.append(run_task(client, task_id))

    avg_score = sum(r.score for r in reports) / len(reports)
    print("\n=== Baseline Scores ===")
    for r in reports:
        print(f"{r.task_id}: score={r.score:.4f} steps={r.steps}")
    print(f"Average score: {avg_score:.4f}")

    payload = {
        "model": MODEL_NAME,
        "api_base_url": API_BASE_URL,
        "average_score": round(avg_score, 4),
        "tasks": [asdict(r) for r in reports],
    }

    with open("baseline_scores.json", "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)

    print("Saved baseline_scores.json")


if __name__ == "__main__":
    main()
|
models.cpython-313.pyc
ADDED
|
Binary file (2.67 kB). View file
|
|
|
openenv.yaml
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# OpenEnv deployment metadata (consumed by `openenv validate` / `openenv push`).
spec_version: 1
name: support-triage-openenv
version: "0.1.0"
description: "Customer support triage environment for OpenEnv"
# Deployed as a Hugging Face Space running a FastAPI app.
type: space
runtime: fastapi
# Import path of the ASGI app object; must stay in sync with server/app.py.
app: server.app:app
# Must match the Dockerfile PORT env and the Space's app_port.
port: 8000
|
pyproject.toml
ADDED
|
@@ -0,0 +1,23 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Packaging metadata for the support-triage OpenEnv environment.
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "support-triage-openenv"
version = "0.1.0"
description = "OpenEnv environment for customer support triage and response drafting"
readme = "README.md"
requires-python = ">=3.10"
authors = [{ name = "Hackathon Team" }]
# Keep in sync with requirements.txt.
dependencies = [
    "openenv-core>=0.2.0",
    "openai>=1.35.0",
    "pydantic>=2.7.0",
    "uvicorn>=0.30.0",
]

# `server` console script starts the API via server/app.py's main().
[project.scripts]
server = "server.app:main"

[tool.setuptools.packages.find]
include = ["support_triage_env*", "server*"]
|
requirements.txt
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Runtime dependencies (keep in sync with pyproject.toml [project].dependencies).
openenv-core>=0.2.0
openai>=1.35.0
pydantic>=2.7.0
uvicorn>=0.30.0
|
tasks.cpython-313.pyc
ADDED
|
Binary file (3.81 kB). View file
|
|
|
uv.lock
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|