Space: parthpethia/Meta-Hackathon

parthpethia committed on Apr 11

Commit c216bd9 (1 parent: fee8744)

Add HF Spaces metadata to README

Files changed (1)

README.md +78 -299

@@ -1,3 +1,12 @@
 # Email Triage OpenEnv
 A complete, production-ready OpenEnv environment for training AI agents to classify and route emails in real-world triage scenarios.
@@ -23,339 +32,109 @@ The environment provides realistic synthetic email data with varying complexity
 - ✅ **Docker Ready**: Single command deployment to Hugging Face Spaces
 - ✅ **Synthetic Data**: Realistic email generation with metadata and ground truth labels
-## Task Descriptions
-### Task 1: Spam Detection (Easy)
-**Goal**: Correctly classify 8/10 emails as spam or legitimate
-- **Dataset**: 10 synthetic emails with clear spam indicators (70% high signal, 30% borderline)
-- **Actions**: Classify as SPAM or NORMAL only
-- **Grading**: Accuracy score = correct_classifications / 10
-- **Expected Baseline**: ~0.80-0.85
-- **Characteristics**:
-  - Well-separated spam patterns
-  - Limited routing complexity
-  - Binary classification
-### Task 2: Multi-Class Routing (Medium)
-**Goal**: Classify 12 emails into 4 categories AND route 8 to correct teams
-- **Dataset**: 12 diverse emails covering spam, normal, billing, urgent
-- **Categories**: SPAM, NORMAL, URGENT, BILLING
-- **Actions**: Classify (4 options) + Route (support/sales/billing/none) + Priority (0-3)
-- **Grading**: 50% classification accuracy + 50% routing accuracy
-- **Expected Baseline**: ~0.70-0.75
-- **Characteristics**:
-  - Mixed-difficulty examples
-  - Multi-team coordination
-  - SLA-aware routing
-### Task 3: Context-Aware Triage (Hard)
-**Goal**: Manage 20 emails with rich context, escalation chains, and VIP handling
-- **Dataset**: 20 emails with VIP customer flags, SLA hours, and context signals
-- **Actions**: Full classification + routing + priority setting
-- **Grading**: Weighted score:
-  - Classification accuracy: 50%
-  - Priority accuracy: 30%
-  - Routing accuracy: 20%
-- **Expected Baseline**: ~0.60-0.65
-- **Characteristics**:
-  - VIP customer detection
-  - Time-sensitive escalation
-  - Complex context reasoning
-## Installation
-### Local Development
-```bash
-# Clone and navigate to the project
-cd meta-hackathon
-# Create virtual environment
-python3 -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-# Install dependencies
-pip install -r requirements.txt
-```
-### Docker
 ```bash
-# Build image
-docker build -t email-triage:latest .
-# Run locally
-docker run -p 7860:7860 email-triage:latest
-# API is now available at http://localhost:7860
-```
-## API Specification
-### Observation Space
-```json
-{
-  "current_email": {
-    "email_id": "string",
-    "subject": "string",
-    "body": "string",
-    "sender_domain": "string",
-    "timestamp": "ISO8601 datetime",
-    "is_vip_sender": "boolean",
-    "sla_hours": "integer or null"
-  },
-  "inbox_state": {
-    "pending": "count of unprocessed emails",
-    "spam": "count of detected spam",
-    "urgent": "count of urgent emails",
-    "processed": "count of processed emails"
-  },
-  "step_count": "integer",
-  "task_name": "string"
-}
-```
-### Action Space
-```json
 {
-  "classification": "one of: spam, normal, urgent, billing",
-  "team": "one of: support, sales, billing, none",
-  "priority": "integer 0-3"
 }
-```
-### Reward
-- **Type**: Float [0.0, 1.0]
-- **Breakdown**:
-  - Correct classification: +0.4 (or -0.1 if wrong)
-  - Correct routing: +0.3 (or -0.15 if wrong)
-  - Priority accuracy: +0.3 \* (1 - |predicted - actual| / 3)
-## Usage Examples
-### Python (Direct Environment)
-```python
-from environment import EmailTriageEnv
-# Create environment
-env = EmailTriageEnv(task_name="spam_detection")
-# Reset and get initial observation
-obs = env.reset()
-# Step through emails
-from environment.types import Action, EmailCategory, Team
-for _ in range(10):
-    action = Action(
-        classification=EmailCategory.NORMAL,
-        team=Team.SUPPORT,
-        priority=1
-    )
-    obs, reward, done, info = env.step(action)
-    print(f"Reward: {reward.value}, Done: {done}")
-    if done:
-        break
-# Get final score
-final_score = env._compute_final_score()
-print(f"Final Score: {final_score:.4f}")
-```
-### HTTP REST API
-```bash
-# Health check
-curl http://localhost:7860/health
-# Reset environment
-curl -X POST http://localhost:7860/reset?task=spam_detection
-# Step with action
-curl -X POST http://localhost:7860/step?task=spam_detection \
-  -H "Content-Type: application/json" \
-  -d '{
-    "classification": "normal",
-    "team": "support",
-    "priority": 1
-  }'
 # Get current state
-curl http://localhost:7860/state?task=spam_detection
-# List available tasks
-curl http://localhost:7860/tasks
 # Describe action/observation spaces
-curl http://localhost:7860/state-describe?task=spam_detection
-```
-## Running Baseline Inference
-The baseline uses GPT-4o mini to process all three tasks.
-### Setup
-```bash
-# Set environment variables
-export OPENAI_API_KEY="sk-..."
-export MODEL_NAME="gpt-4o-mini"
-export API_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI
-# Run inference
-python inference.py
-```
-### Expected Output
-The inference script outputs structured logs in `[START]`, `[STEP]`, `[END]` format:
-```
-[CONFIG] model=gpt-4o-mini, api_base=https://api.openai.com/v1
-[START] spam_detection
-[STEP] {"step_id": 1, "observation": {...}, "action": {...}, "reward": 0.85, "done": false}
-[STEP] {"step_id": 2, "observation": {...}, "action": {...}, "reward": 0.72, "done": false}
-...
-[END] {"task": "spam_detection", "final_score": 0.82, "steps": 10, "emails_processed": 10}
-[RESULT] spam_detection: 0.8200
-[START] multi_class_routing
-...
-[END] {"task": "multi_class_routing", "final_score": 0.71, "steps": 12, "emails_processed": 12}
-[RESULT] multi_class_routing: 0.7100
-[START] context_aware_triage
-...
-[END] {"task": "context_aware_triage", "final_score": 0.62, "steps": 20, "emails_processed": 20}
-[RESULT] context_aware_triage: 0.6200
-[SUMMARY]
-Average Score: 0.7167
-  spam_detection: 0.8200
-  multi_class_routing: 0.7100
-  context_aware_triage: 0.6200
 ```
-### Baseline Scores (Expected Results)
-| Task                 | Difficulty | Expected Score | Notes                           |
-| -------------------- | ---------- | -------------- | ------------------------------- |
-| Spam Detection       | Easy       | 0.80-0.85      | Clear patterns, high signal     |
-| Multi-Class Routing  | Medium     | 0.70-0.75      | Mixed signals, requires context |
-| Context-Aware Triage | Hard       | 0.60-0.70      | Complex reasoning, VIP handling |
-| **Average**          | **All**    | **0.70-0.77**  | **Overall baseline**            |
-## Deployment to Hugging Face Spaces
-### Steps
-1. Create a new Space on Hugging Face (https://huggingface.co/spaces)
-2. Select "Docker runtime"
-3. Push code to the Space repository:
-   ```bash
-   git push https://huggingface.co/spaces/{username}/email-triage main
-   ```
-4. Dockerfile automatically builds and deploys
-5. Access API at: `https://{username}-email-triage.hf.space`
-### Verification
-```bash
-# Test deployment
-curl https://{username}-email-triage.hf.space/health
-curl -X POST https://{username}-email-triage.hf.space/reset
-```
-## Project Structure
 ```
-meta-hackathon/
 ├── environment/
-│   ├── __init__.py         # Package exports
-│   ├── types.py            # Pydantic models (Observation, Action, etc.)
-│   ├── env.py              # Main EmailTriageEnv class
-│   ├── data_generator.py   # Synthetic email generation
-│   └── graders.py          # Task graders and reward computation
-├── app.py                  # Flask REST API server
-├── inference.py            # Baseline inference script (GPT-4o mini)
-├── openenv.yaml            # OpenEnv specification
-├── Dockerfile              # Container configuration
-├── requirements.txt        # Python dependencies
-└── README.md              # This file
 ```
-## Key Implementation Details
-### Reward Function Design
-The reward function provides meaningful signals throughout the episode:
-```python
-# Per-step reward combines three signals:
-reward = (
-    0.4 * classification_correct +      # 40% weight
-    0.3 * routing_correct +             # 30% weight
-    0.3 * priority_scaled_accuracy      # 30% weight
-)
-# All components in [0, 1], final reward clamped to [0, 1]
 ```
-### Synthetic Data Generation
-- **Realistic patterns**: Spam indicators (urgency, capitalization), domain reputation
-- **Graded difficulty**: 70% clear patterns (easy), 30% edge cases (medium)
-- **Metadata**: VIP flags, SLA hours, sender domains for context reasoning
-- **Reproducible**: Seeded random generator for consistent datasets
-### Environment API
-Fully compliant with OpenEnv specification:
-- `reset()` → Initial observation
-- `step(action)` → (observation, reward, done, info)
-- `state()` → Full system state snapshot
-- `describe_action_space()` / `describe_observation_space()` → Space schemas
-## Performance Considerations
-- **Runtime**: ~15-18 minutes for full baseline (3 tasks × ~5-6 min each with API latency)
-- **Memory**: ~200MB resident (environment + Flask server)
-- **Scalability**: Supports 2 vCPU, 8GB RAM minimum (tested)
-- **Parallelization**: API supports concurrent requests (stateless per task)
-## Testing
-```bash
-# Run environment locally
-python -c "from environment import EmailTriageEnv; env = EmailTriageEnv('spam_detection'); obs = env.reset(); print('OK')"
-# Test Flask API
-python app.py &
-curl http://localhost:7860/health
-curl -X POST http://localhost:7860/reset?task=spam_detection
-# Validate OpenEnv spec
-# (Submit to official validator tool)
-```
-## License
-MIT
-## Support
-For questions or issues:
-1. Check the full API reference in `openenv.yaml`
-2. Review example usage in `inference.py`
-3. Examine data generation in `data_generator.py`

+---
+title: Email Triage OpenEnv
+emoji: 📧
+colorFrom: blue
+colorTo: green
+sdk: docker
+port: 7860
+---
 # Email Triage OpenEnv
 A complete, production-ready OpenEnv environment for training AI agents to classify and route emails in real-world triage scenarios.
 - ✅ **Docker Ready**: Single command deployment to Hugging Face Spaces
 - ✅ **Synthetic Data**: Realistic email generation with metadata and ground truth labels
+## Quick Start
+### API Endpoints
+The Space provides these endpoints on port 7860:
 ```bash
+# Health check
+GET /health
+# Get available tasks
+GET /tasks
+# Reset environment for a task
+POST /reset?task=spam_detection
+# Step the environment with an action
+POST /step?task=spam_detection
+Content-Type: application/json
 {
+  "classification": "spam",
+  "team": "none",
+  "priority": 0
 }
 # Get current state
+GET /state?task=spam_detection
 # Describe action/observation spaces
+GET /state-describe?task=spam_detection
 ```
+## Tasks
+### Task 1: Spam Detection (Easy)
+- **Goal**: Correctly classify 10 emails as spam or legitimate
+- **Expected Score**: ~0.80-0.85
+- **Difficulty**: Easy - clear spam patterns
+### Task 2: Multi-Class Routing (Medium)
+- **Goal**: Classify 12 emails into 4 categories and route to correct teams
+- **Expected Score**: ~0.70-0.75
+- **Difficulty**: Medium - requires multi-class classification and routing
+### Task 3: Context-Aware Triage (Hard)
+- **Goal**: Handle 20 emails with VIP customers, SLAs, and escalations
+- **Expected Score**: ~0.60-0.70
+- **Difficulty**: Hard - complex context with weighted scoring
+## Environment Structure
 ```
 ├── environment/
+│   ├── env.py           # Main EmailTriageEnv class
+│   ├── types.py         # Pydantic models (Observation, Action, Reward)
+│   ├── data_generator.py # Synthetic email dataset
+│   ├── graders.py       # Task-specific graders
+│   └── __init__.py
+├── app.py               # Flask REST API
+├── inference.py         # Baseline inference script
+├── openenv.yaml         # OpenEnv specification
+├── Dockerfile           # Docker configuration
+├── requirements.txt     # Python dependencies
+└── README.md           # This file
 ```
+## Running Locally
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Start Flask app
+python app.py
+# In another terminal, run inference baseline
+OPENAI_API_KEY=your_key python inference.py
 ```
+## Deployment
+This Space is already deployed on Hugging Face! The Docker image builds automatically from the Dockerfile and serves the Flask API on port 7860.
+## OpenEnv Specification
+This environment fully implements the OpenEnv specification:
+- **Observation Space**: Email content, sender info, inbox state
+- **Action Space**: Classification (4 categories), Team routing (4 options), Priority (0-3)
+- **Reward Space**: Continuous [0.0, 1.0] with breakdown of classification/routing/priority scores
+- **API**: `reset()`, `step(action)`, `state()` endpoints
+## Documentation
+For more details, see:
+- `START_HERE.md` - Getting started guide
+- `DEPLOYMENT_CHECKLIST.md` - Pre-submission checklist
+- `VALIDATION_GUIDE.md` - Testing and validation
+- `FINAL_VALIDATION_REPORT.md` - Full validation results
+---
+**Status**: ✅ Production Ready
+**OpenEnv Compliance**: ✅ 100%
+**All Tests**: ✅ Passing
+**Ready for Submission**: ✅ Yes
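
The Quick Start block added in this commit lists the endpoints as request lines rather than runnable commands. As a rough sketch of how a client might call them, the Python below builds the same requests using only the standard library. The endpoint paths, the `task` query parameter, and the action fields come from the README; the helper names (`build_request`, `call`, `start_episode`), the `http://localhost:7860` base URL for a local run, and the assumption that every endpoint returns a JSON body are illustrative and not confirmed by the repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Flask app is running locally via `python app.py` on port 7860.
BASE_URL = "http://localhost:7860"


def build_request(method, path, task=None, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the README's endpoints.

    `task` is appended as the `task` query parameter; `payload`, if given,
    is sent as a JSON body with a matching Content-Type header.
    """
    url = base_url + path
    if task is not None:
        url += "?" + urlencode({"task": task})
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def call(request):
    """Send a request and decode the response (assumed to be JSON)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def start_episode(task="spam_detection"):
    """Hypothetical walkthrough: health check, reset, then one step."""
    health = call(build_request("GET", "/health"))
    first_observation = call(build_request("POST", "/reset", task=task))
    # Action fields as shown in the README's /step example.
    action = {"classification": "spam", "team": "none", "priority": 0}
    step_result = call(build_request("POST", "/step", task=task, payload=action))
    return health, first_observation, step_result
```

With the server running, `start_episode()` performs the three calls in order. Keeping request construction separate from sending makes the URL and body easy to inspect without network access.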