---
title: DataDetective
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective: Business Incident Investigation Environment
An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI
agents investigate real-world business incidents by querying a SQL database,
analysing patterns, and submitting root-cause findings.
## What It Does
The agent is given a realistic company database (TechMart, a mid-size B2B and B2C
electronics retailer) and a business problem to investigate. It can execute
SQL queries to explore the data, then submit a final written analysis. The
environment grades the analysis automatically according to which key findings
it identifies. Each task has 5 grading criteria worth 0.20 each, so partial
credit reflects how many findings the agent uncovered.
## Tasks (Easy to Hard)
| # | Task ID | Difficulty | Scenario |
|---|---------|-----------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down: multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
Each task is scored from 0.0 to 1.0 based on the specific findings the agent must discover.
## Action / Observation Spaces
### Action (`DataDetectiveAction`)
| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
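As a rough illustration of the two action fields, here is a minimal sketch that models `DataDetectiveAction` as a plain dataclass with basic validation. It is a hypothetical stand-in: the real class may subclass an OpenEnv base type and may not validate its inputs this way.

```python
from dataclasses import dataclass

# Hypothetical stand-in for DataDetectiveAction; the real class may subclass
# an OpenEnv base type and carry extra metadata.
VALID_ACTION_TYPES = {"query", "answer"}


@dataclass
class DataDetectiveAction:
    action_type: str  # "query" to run SQL, "answer" to submit findings
    content: str      # SQL query string or final analysis text

    def __post_init__(self) -> None:
        # Reject anything other than the two documented action types.
        if self.action_type not in VALID_ACTION_TYPES:
            raise ValueError(f"action_type must be one of {sorted(VALID_ACTION_TYPES)}")
        if not self.content.strip():
            raise ValueError("content must be a non-empty string")


# Usage: one exploratory query, then a final answer.
probe = DataDetectiveAction("query", "SELECT region, COUNT(*) FROM customers GROUP BY region")
final = DataDetectiveAction("answer", "Order volume fell sharply once the promotion ended.")
```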
### Observation (`DataDetectiveObservation`)
| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
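Because every query consumes a step out of `max_steps`, an agent needs to keep room for its final answer. The sketch below mirrors the observation fields as a hypothetical dataclass and adds an agent-side heuristic (`steps_remaining` and `should_answer` are illustrative names, not part of the environment). It assumes `step_number` counts the steps already taken; the server's actual indexing may differ.

```python
from dataclasses import dataclass


@dataclass
class DataDetectiveObservation:
    # Field names mirror the table above; defaults are illustrative only.
    output: str = ""
    task_description: str = ""
    schema_info: str = ""
    step_number: int = 0
    max_steps: int = 30
    message: str = ""


def steps_remaining(obs: DataDetectiveObservation) -> int:
    # Assumes step_number counts steps already taken in the episode.
    return max(obs.max_steps - obs.step_number, 0)


def should_answer(obs: DataDetectiveObservation, reserve: int = 1) -> bool:
    # Agent-side heuristic: switch to an "answer" action once only `reserve`
    # steps are left, so the budget is never spent entirely on queries.
    return steps_remaining(obs) <= reserve
```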
## Database Schema (11 Tables)
The TechMart database includes:
| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |
All data is synthetic, generated in-memory (no external databases required).
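To show the kind of SQL an agent might send, here is a sketch built on Python's `sqlite3` module with an in-memory database. It is an assumption that the environment uses SQLite at all, and the three-table slice below uses invented column names and toy rows purely for illustration; the real TechMart schema is the one returned in `schema_info`.

```python
import sqlite3

# Hypothetical, heavily reduced slice of the TechMart schema; real column
# names, types, and data will differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT, segment TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE returns (return_id INTEGER PRIMARY KEY, order_id INTEGER, reason TEXT);
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "West", "B2C"), (2, "East", "B2B"), (3, "West", "B2B")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 120.0), (11, 2, 80.0), (12, 3, 200.0), (13, 1, 55.0)])
conn.executemany("INSERT INTO returns VALUES (?, ?, ?)",
                 [(100, 10, "defective"), (101, 12, "defective")])

# A returns_spike-style probe: order and return counts per region.
rows = conn.execute(
    """
    SELECT c.region,
           COUNT(DISTINCT o.order_id) AS orders,
           COUNT(r.return_id) AS returns
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    LEFT JOIN returns r ON r.order_id = o.order_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
```

The `LEFT JOIN` keeps orders without returns, so `COUNT(r.return_id)` counts only actual returns while `COUNT(DISTINCT o.order_id)` still counts every order in the region.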
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Server
```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
### 3. Health Check
```bash
curl http://localhost:7860/health
```
### 4. Run the Baseline Agent
```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```
### 5. Docker
```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables
| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
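A client script could load these variables and fail fast when a required one is missing. The following is a minimal sketch under that assumption; `load_config` is a hypothetical helper, and `inference.py` may read its configuration differently.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
DEFAULT_ENV_URL = "http://localhost:7860"


def load_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the variables from the table above, failing fast on missing ones."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    config = {name: env[name] for name in REQUIRED_VARS}
    # ENV_URL is optional and falls back to the documented default.
    config["ENV_URL"] = env.get("ENV_URL", DEFAULT_ENV_URL)
    return config
```

Accepting an optional mapping instead of always reading `os.environ` keeps the helper easy to exercise without touching the process environment.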
## How Grading Works
Each task has an automated grader that checks the agent's final answer for
specific key findings (keywords, patterns, named entities). There are 5
criteria per task, each worth 0.20, for a maximum score of 1.0; partial credit
is awarded for every finding the answer contains.
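The scoring rule can be sketched as a keyword/regex matcher in which each criterion earns 0.20 if any of its patterns appears in the answer. This is an illustrative sketch only: `grade` and `EXAMPLE_CRITERIA` are hypothetical, and the criteria below are invented for an `orders_drop`-style answer rather than taken from the real graders.

```python
import re
from typing import Sequence

# A criterion is a list of regex alternatives; matching any one earns credit.
Criterion = Sequence[str]


def grade(answer: str, criteria: Sequence[Criterion], weight: float = 0.20) -> float:
    text = answer.lower()
    hits = sum(
        1 for patterns in criteria
        if any(re.search(p, text) for p in patterns)
    )
    # Cap at 1.0 and round away floating-point noise.
    return round(min(hits * weight, 1.0), 2)


# Invented criteria for illustration; not the environment's actual grader.
EXAMPLE_CRITERIA = [
    [r"\bpromo(tion)?\b"],
    [r"\bended\b", r"\bexpired\b"],
    [r"\bdrop(ped)?\b", r"\bdecline\b", r"\bfell\b"],
    [r"\bb2c\b"],
    [r"\b\d+(\.\d+)?\s*%"],
]

score = grade("Order volume dropped after the promotion ended; B2C was hit hardest.",
              EXAMPLE_CRITERIA)
```

Here the answer matches four of the five criteria (it never quantifies the drop as a percentage), so it earns 0.8.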
## Setup Requirements
- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)