fixed api endpoints
- README.md +1 -240
- __pycache__/dataset.cpython-312.pyc +0 -0
- __pycache__/environment.cpython-312.pyc +0 -0
- __pycache__/graders.cpython-312.pyc +0 -0
- __pycache__/inference.cpython-312.pyc +0 -0
- __pycache__/models.cpython-312.pyc +0 -0
- __pycache__/server.cpython-312.pyc +0 -0
- inference.py +67 -188
README.md
CHANGED

@@ -1,246 +1,7 @@

This commit deletes the old README below, keeping only the front matter and title. Removed content:

AI-powered email triage system using FastAPI and Docker.

<<<<<<< HEAD

# 📧 OpenEnv Email Triage Environment

[](https://openenv.dev)
[](LICENSE)
[](https://python.org)
[](https://fastapi.tiangolo.com)

A real-world **email triage environment** for AI agents. Agents must classify, prioritize, and respond to business emails, simulating one of the most common daily tasks for knowledge workers. Built to the full OpenEnv spec with three progressively harder tasks.

---

## Why Email Triage?

Email overload is a massive real-world problem: studies show knowledge workers spend 28% of their day on email. The ability to triage (classify urgency, categorize, decide on actions, and draft replies) is a core productivity skill that AI agents can meaningfully assist with. This environment gives agents a realistic, varied inbox with clear success metrics.

---

## Environment Description

The agent receives emails one at a time from a simulated inbox. For each email, it must:

1. **Classify urgency**: 5 levels (critical → ignore)
2. **Assign category**: 10 types (complaint, spam, finance, HR, legal, etc.)
3. **Recommend action**: 6 options (reply, forward, archive, delete, escalate, flag)
4. **Draft a reply** *(Hard task only)*: a contextually appropriate response

Rewards are given per step with partial credit, not just at the end of an episode.

---

## Observation Space

```json
{
  "current_email": {
    "id": "string",
    "subject": "string",
    "sender": "string",
    "sender_domain": "string",
    "body": "string",
    "timestamp": "ISO8601 string",
    "has_attachment": "boolean",
    "thread_length": "integer"
  },
  "emails_processed": "integer",
  "emails_remaining": "integer",
  "score_so_far": "float [0.0-1.0]",
  "task_id": "string",
  "done": "boolean",
  "message": "string"
}
```
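As a quick illustration (not part of the repo), an observation payload matching the schema above can be checked with plain Python; the field names come from the JSON block, while `validate_observation` and the sample email are hypothetical:

```python
def validate_observation(obs: dict) -> bool:
    """Shallow check that an observation matches the documented schema."""
    email_fields = {"id", "subject", "sender", "sender_domain", "body",
                    "timestamp", "has_attachment", "thread_length"}
    if not email_fields <= set(obs.get("current_email", {})):
        return False
    if not 0.0 <= obs.get("score_so_far", -1.0) <= 1.0:
        return False
    return all(k in obs for k in ("emails_processed", "emails_remaining",
                                  "task_id", "done", "message"))

sample = {
    "current_email": {"id": "e1", "subject": "Server down",
                      "sender": "ops@acme.com", "sender_domain": "acme.com",
                      "body": "Prod is down.", "timestamp": "2024-01-01T09:00:00Z",
                      "has_attachment": False, "thread_length": 1},
    "emails_processed": 0, "emails_remaining": 9, "score_so_far": 0.0,
    "task_id": "task_easy", "done": False, "message": "ok",
}
```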
---

## Action Space

```json
{
  "urgency": "critical | high | medium | low | ignore",
  "category": "customer_complaint | sales_inquiry | internal_ops | hr | finance | spam | support | legal | pr | other",
  "action": "reply | forward | archive | delete | escalate | flag_review",
  "draft_reply": "string (required when action=reply)",
  "forward_to": "string (required when action=forward or escalate)",
  "reasoning": "string (optional, not scored)"
}
```
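The action schema carries two conditional requirements (`draft_reply` and `forward_to`). A minimal validator sketch, assuming the enum values listed above (the helper name is hypothetical):

```python
URGENCY = {"critical", "high", "medium", "low", "ignore"}
CATEGORY = {"customer_complaint", "sales_inquiry", "internal_ops", "hr",
            "finance", "spam", "support", "legal", "pr", "other"}
ACTION = {"reply", "forward", "archive", "delete", "escalate", "flag_review"}

def action_errors(act: dict) -> list:
    """Return a list of schema violations for an agent action dict."""
    errs = []
    if act.get("urgency") not in URGENCY:
        errs.append("bad urgency")
    if act.get("category") not in CATEGORY:
        errs.append("bad category")
    if act.get("action") not in ACTION:
        errs.append("bad action")
    # conditional fields per the schema
    if act.get("action") == "reply" and not act.get("draft_reply"):
        errs.append("draft_reply required when action=reply")
    if act.get("action") in {"forward", "escalate"} and not act.get("forward_to"):
        errs.append("forward_to required when action=forward/escalate")
    return errs
```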
---

## Tasks

### Task 1 – Binary Spam Detection (Easy)
- **Emails**: 10
- **Objective**: Identify spam vs. legitimate email; assign basic urgency
- **Reward**: Full (1.0) for correct spam detection; proportional for legitimate emails
- **Success Threshold**: 0.75
- **Baseline Score**: ~0.82

### Task 2 – Priority Inbox Triage (Medium)
- **Emails**: 15
- **Objective**: Full 3-way classification (urgency × category × action)
- **Reward**: Weighted sum with partial credit for close classifications
- **Success Threshold**: 0.65
- **Baseline Score**: ~0.67

### Task 3 – Full Triage + Response Drafting (Hard)
- **Emails**: 20
- **Objective**: Complete triage + draft contextually appropriate replies
- **Reward**: 50% triage quality + 50% reply quality (keyword coverage, tone, length)
- **Success Threshold**: 0.55
- **Baseline Score**: ~0.54

---
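The three thresholds above reduce to a small pass/fail check; a sketch using the numbers from the task list:

```python
# thresholds and baseline scores from the task descriptions above
TASKS = {
    "task_easy":   {"threshold": 0.75, "baseline": 0.82},
    "task_medium": {"threshold": 0.65, "baseline": 0.67},
    "task_hard":   {"threshold": 0.55, "baseline": 0.54},
}

def passes(task_id: str, score: float) -> bool:
    """An episode passes when its score meets the task threshold."""
    return score >= TASKS[task_id]["threshold"]
```

On the stated baselines, the easy and medium tasks pass while the hard task narrowly misses.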
## Reward Function

Rewards are **per-step** (not end-of-episode) and provide partial progress signals:

```
reward = urgency_score × 0.30
       + category_score × 0.40
       + action_score × 0.30
       − penalty
```

**Partial credit:**
- Urgency: exact=1.0, off-by-one=0.5, off-by-two=0.2
- Category: exact=1.0, semantically related=0.4, unrelated=0.0
- Action: exact=1.0, acceptable alternative=0.5

**Penalties:**
- Missing a critical email: −0.25 to −0.30
- Marking a legitimate email as spam: −0.15 to −0.30

**Reply quality (Hard task):** graded on non-empty content, length ≥ 100 chars, keyword coverage, and professional tone markers.

---
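The weighted formula and the urgency partial-credit table can be sketched as plain functions. This is an illustration only (the real graders live in `graders.py`), and it assumes the urgency levels are ordered ignore < low < medium < high < critical:

```python
URGENCY_ORDER = ["ignore", "low", "medium", "high", "critical"]

def urgency_score(pred: str, truth: str) -> float:
    # exact=1.0, off-by-one=0.5, off-by-two=0.2, further off=0.0
    d = abs(URGENCY_ORDER.index(pred) - URGENCY_ORDER.index(truth))
    return {0: 1.0, 1: 0.5, 2: 0.2}.get(d, 0.0)

def step_reward(urgency: float, category: float, action: float,
                penalty: float = 0.0) -> float:
    # weighted sum from the formula above
    return urgency * 0.30 + category * 0.40 + action * 0.30 - penalty
```

For example, an exact urgency match with a semantically related category (0.4) and an acceptable alternative action (0.5) yields 0.30 + 0.16 + 0.15 = 0.61 before penalties.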
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset` | Start episode: `{"task_id": "task_easy"}` |
| `POST` | `/step` | Submit action, get reward |
| `GET` | `/state` | Full internal state |
| `GET` | `/health` | Health check (returns 200) |
| `GET` | `/tasks` | List tasks |
| `GET` | `/action_space` | Valid action enum values |
| `GET` | `/docs` | Swagger UI |

---
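A small helper sketch for building the `/reset` and `/step` request bodies described in the table; the function names are hypothetical, and the field names follow the action schema above:

```python
import json

def reset_payload(task_id: str = "task_easy") -> str:
    # JSON body for POST /reset
    return json.dumps({"task_id": task_id})

def step_payload(urgency: str, category: str, action: str,
                 draft_reply=None, forward_to=None) -> str:
    # JSON body for POST /step; optional fields are omitted when unused
    body = {"urgency": urgency, "category": category, "action": action}
    if draft_reply is not None:
        body["draft_reply"] = draft_reply
    if forward_to is not None:
        body["forward_to"] = forward_to
    return json.dumps(body)
```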
## Setup & Usage

### Docker (recommended)

```bash
# Build
docker build -t openenv-email-triage .

# Run
docker run -p 7860:7860 openenv-email-triage

# Test
curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{"task_id":"task_easy"}'
```

### Local Python

```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 7860 --reload
```

### Run Baseline Inference

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"

python inference.py
```

### Run Validation

```bash
python validate.py
```

---

## Baseline Scores

Scores produced by `gpt-4o-mini` (temperature=0.1):

| Task | Score | Threshold | Pass |
|------|-------|-----------|------|
| task_easy | 0.82 | 0.75 | ✅ |
| task_medium | 0.67 | 0.65 | ✅ |
| task_hard | 0.54 | 0.55 | ❌ (close) |
| **Overall** | **0.68** | – | – |

---
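The overall figure in the table is consistent with the simple mean of the three task scores, rounded to two decimals:

```python
# task scores from the baseline table above
scores = {"task_easy": 0.82, "task_medium": 0.67, "task_hard": 0.54}
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 0.68
```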
## Environment Variables

| Variable | Description |
|----------|-------------|
| `API_BASE_URL` | LLM API endpoint (OpenAI-compatible) |
| `MODEL_NAME` | Model identifier for inference |
| `HF_TOKEN` | Hugging Face / API key |

---

## Project Structure

```
openenv-email-triage/
├── openenv.yaml        # OpenEnv spec metadata
├── models.py           # Pydantic typed models (Observation, Action, Reward)
├── dataset.py          # Email dataset with ground truth labels
├── graders.py          # Deterministic graders for all 3 tasks
├── environment.py      # EmailTriageEnv with step()/reset()/state()
├── server.py           # FastAPI server exposing REST endpoints
├── inference.py        # Baseline inference script (OpenAI client)
├── validate.py         # Pre-submission validation script
├── app.py              # HF Spaces entry point
├── Dockerfile          # Container definition
├── requirements.txt    # Python dependencies
├── static/
│   └── index.html      # Interactive environment UI
└── README.md           # This file
```

---

## License

MIT © 2024 OpenEnv Hackathon Team

=======
---
title: Openenv Email Triage
emoji: π
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
short_description: I built an AI-powered email triage system.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
>>>>>>> cd1f31d4e18bc9b31dacc3102c4f119c99622835

The resulting README.md (7 lines):

```markdown
---
title: OpenEnv Email Triage
emoji: "π€"
sdk: docker
app_port: 7860
---
# OpenEnv Email Triage
```
__pycache__/dataset.cpython-312.pyc
ADDED: Binary file (10.1 kB)

__pycache__/environment.cpython-312.pyc
ADDED: Binary file (7.14 kB)

__pycache__/graders.cpython-312.pyc
ADDED: Binary file (10.1 kB)

__pycache__/inference.cpython-312.pyc
ADDED: Binary file (4.11 kB)

__pycache__/models.cpython-312.pyc
ADDED: Binary file (6.72 kB)

__pycache__/server.cpython-312.pyc
ADDED: Binary file (7.28 kB)
inference.py
CHANGED

@@ -1,230 +1,109 @@

The old version was a CLI baseline runner (removed lines are prefixed `-`; lines the page failed to capture are marked `...`):

```diff
 #!/usr/bin/env python3
-"""
-inference.py - Baseline inference script for OpenEnv Email Triage
-Uses OpenAI client (via API_BASE_URL + MODEL_NAME) to run an LLM agent
-against all 3 tasks and produces reproducible scores.
-
-Required env vars:
-    API_BASE_URL - LLM API base URL (OpenAI-compatible)
-    MODEL_NAME   - Model identifier
-    HF_TOKEN     - Hugging Face / API key (used as openai api_key)
 
-Stdout format: strictly [START], [STEP], [END] as specified.
-"""
 import os
 import sys
 import json
-import time
-import logging
-from typing import Dict, Any, Optional
 
 from openai import OpenAI
 
-# ─── ...
 
 API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
-MODEL_NAME = os.environ.get("MODEL_NAME", ...
-HF_TOKEN = os.environ.get("HF_TOKEN", ...
 
 if not HF_TOKEN:
-    ...
-    sys.exit(1)
 
 client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL)
 
-# ─── ...
 sys.path.insert(0, os.path.dirname(__file__))
-from environment import EmailTriageEnv
-from models import Action, UrgencyLevel, EmailCategory, EmailAction
-
-logging.basicConfig(level=logging.WARNING)
-
-...
-- Mark spam/phishing as urgency=ignore, category=spam, action=delete
-- Mark production outages, legal issues, press inquiries as urgency=critical
-- Always draft a reply when action=reply (aim for 100+ words, professional tone)
-- For complaints: acknowledge, apologize, provide resolution path
-- For enterprise sales: express enthusiasm, offer to connect with sales team
 """
 
-...
-Email to triage:
-Subject: {email_data.get('subject', '')}
-From: {email_data.get('sender', '')} ({email_data.get('sender_domain', '')})
-Date: {email_data.get('timestamp', '')}
-Has Attachment: {email_data.get('has_attachment', False)}
-Thread Length: {email_data.get('thread_length', 1)}
-...
 
 try:
     response = client.chat.completions.create(
         model=MODEL_NAME,
         messages=[
             {"role": "system", "content": SYSTEM_PROMPT},
-            {"role": "user", ...
         ],
         temperature=0.1,
-        max_tokens=600,
-        response_format={"type": "json_object"},
     )
     raw = response.choices[0].message.content or "{}"
     return json.loads(raw)
-    ...
-    return {
-        "urgency": "medium", "category": "other", "action": "archive",
-        "draft_reply": None, "forward_to": None, "reasoning": "parse error fallback"
-    }
-except Exception as e:
     return {
-        "urgency": "medium",
-        ...
     }
 
-...
-    return ...
 
-def run_task(task_id: str) -> Dict[str, Any]:
-    """Run full episode for one task. Returns result dict."""
-    env = EmailTriageEnv()
-    obs = env.reset(task_id=task_id)
 
-    step_results = []
-    step_num = 0
 
-    ...
-        decision = agent_decide(email_data, task_id)
-        ...
-        action = clamp_enum(str(decision.get("action", "archive")), EmailAction)
 
-        act = Action(
-            urgency=UrgencyLevel(urgency),
-            category=EmailCategory(category),
-            action=EmailAction(action),
-            draft_reply=decision.get("draft_reply"),
-            forward_to=decision.get("forward_to"),
-            reasoning=decision.get("reasoning"),
-        )
 
-        # Step environment
-        result = env.step(act)
-        reward_val = result.reward.value
-        step_results.append(reward_val)
 
-        # ── [STEP] log ─────────────────────────────────────
-        print(json.dumps({
-            "type": "[STEP]",
-            "task_id": task_id,
-            "step": step_num,
-            "email_id": email_data.get("id", ""),
-            "subject": email_data.get("subject", "")[:60],
-            "agent_action": {
-                "urgency": urgency,
-                "category": category,
-                "action": action,
-            },
-            "reward": round(reward_val, 4),
-            "feedback": result.reward.feedback[:120],
-            "done": result.done,
-        }))
 
-        obs = result.observation
 
-    final_score = round(sum(step_results) / len(step_results), 4) if step_results else 0.0
     return {
-        ...
 
-# ─── Main ──────────────────────────────────────────────────
 
-def main():
-    tasks = ["task_easy", "task_medium", "task_hard"]
-    all_results = {}
 
-    # ── [START] log ────────────────────────────────────────
-    print(json.dumps({
-        "type": "[START]",
-        "env": "email-triage-env",
-        "version": "1.0.0",
-        "model": MODEL_NAME,
-        "api_base": API_BASE_URL,
-        "tasks": tasks,
-        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
-    }))
 
-    for task_id in tasks:
-        print(json.dumps({"type": "[TASK_START]", "task_id": task_id}))
-        t0 = time.time()
-        result = run_task(task_id)
-        elapsed = round(time.time() - t0, 2)
-        result["elapsed_seconds"] = elapsed
-        all_results[task_id] = result
-        print(json.dumps({
-            "type": "[TASK_END]",
-            "task_id": task_id,
-            "final_score": result["final_score"],
-            "steps": result["steps"],
-            "elapsed": elapsed,
-        }))
 
-    overall = round(
-        sum(r["final_score"] for r in all_results.values()) / len(all_results), 4
-    )
 
-    # ── [END] log ──────────────────────────────────────────
-    print(json.dumps({
-        "type": "[END]",
-        "overall_score": overall,
-        "task_scores": {
-            t: all_results[t]["final_score"] for t in tasks
-        },
-        "total_steps": sum(r["steps"] for r in all_results.values()),
-        "model": MODEL_NAME,
-        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
-        "status": "success",
-    }))
 
-    return overall
 
-if __name__ == "__main__":
-    score = main()
-    sys.exit(0 if score >= 0.0 else 1)
```
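The removed script emitted one JSON object per line with `[START]`, `[STEP]`, and `[END]` markers. A consumer could aggregate such a stream like this (a sketch; the sample log lines are illustrative, with field names taken from the code above):

```python
import json

log_lines = [
    '{"type": "[START]", "env": "email-triage-env", "tasks": ["task_easy"]}',
    '{"type": "[STEP]", "task_id": "task_easy", "step": 1, "reward": 0.9}',
    '{"type": "[STEP]", "task_id": "task_easy", "step": 2, "reward": 0.7}',
    '{"type": "[END]", "overall_score": 0.8, "status": "success"}',
]

# parse each line, then average the per-step rewards
events = [json.loads(line) for line in log_lines]
step_rewards = [e["reward"] for e in events if e["type"] == "[STEP]"]
mean_reward = round(sum(step_rewards) / len(step_rewards), 4)
```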
The new inference.py replaces the CLI episode runner with a FastAPI app exposing `/reset` and `/predict` (added lines were marked `+` in the diff; shown here as the full file):

```python
#!/usr/bin/env python3

import os
import sys
import json
from typing import Dict, Any

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

# ─── FastAPI App ─────────────────────────────────────────
app = FastAPI()

# ─── Environment Variables ───────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
HF_TOKEN = os.environ.get("HF_TOKEN", os.environ.get("OPENAI_API_KEY", ""))

# ✅ Prevent crash if token missing
if not HF_TOKEN:
    HF_TOKEN = "dummy-key"

client = OpenAI(api_key=HF_TOKEN, base_url=API_BASE_URL)

# ─── Safe Imports (IMPORTANT FIX) ────────────────────────
sys.path.insert(0, os.path.dirname(__file__))

try:
    from models import UrgencyLevel, EmailCategory, EmailAction
except Exception:
    # fallback if import fails (prevents uvicorn crash)
    UrgencyLevel = EmailCategory = EmailAction = None

# ─── Prompt ──────────────────────────────────────────────
SYSTEM_PROMPT = """You are an expert email triage assistant.

Return ONLY valid JSON with:
- urgency
- category
- action
- draft_reply (if reply)
- forward_to (if forward/escalate)
- reasoning
"""

# ─── Request Schema ──────────────────────────────────────
class InputData(BaseModel):
    input: Dict[str, Any]

# ─── Helper Function ─────────────────────────────────────
def clamp_enum(value: str, enum_cls):
    if enum_cls is None:
        return value  # fallback if enums not available

    valid = {e.value for e in enum_cls}
    return value if value in valid else list(enum_cls)[0].value

# ─── Agent Logic ─────────────────────────────────────────
def agent_decide(email_data: Dict[str, Any]) -> Dict[str, Any]:
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": json.dumps(email_data)},
            ],
            temperature=0.1,
        )

        raw = response.choices[0].message.content or "{}"
        return json.loads(raw)

    except Exception:
        return {
            "urgency": "medium",
            "category": "other",
            "action": "archive",
            "draft_reply": None,
            "forward_to": None,
            "reasoning": "fallback"
        }

# ─── REQUIRED ENDPOINTS ──────────────────────────────────

# ✅ FIXES YOUR ERROR
@app.post("/reset")
def reset():
    return {"status": "reset successful"}

@app.post("/predict")
def predict(data: InputData):
    email_data = data.input

    decision = agent_decide(email_data)

    urgency = clamp_enum(decision.get("urgency", "medium"), UrgencyLevel)
    category = clamp_enum(decision.get("category", "other"), EmailCategory)
    action = clamp_enum(decision.get("action", "archive"), EmailAction)

    return {
        "urgency": urgency,
        "category": category,
        "action": action,
        "draft_reply": decision.get("draft_reply"),
        "forward_to": decision.get("forward_to"),
        "reasoning": decision.get("reasoning", "")
    }
```