---
title: DataDetective
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective – Business Incident Investigation Environment
An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI
agents investigate realistic business incidents by querying a SQL database,
analysing patterns, and submitting root-cause findings.
## What It Does
The agent is given a realistic company database (TechMart, a mid-size B2B+B2C
electronics retailer) and a business problem to investigate. It can execute
SQL queries to explore the data, then submit a final written analysis. The
environment automatically grades the analysis based on whether key findings
were identified. Each task has 5 grading criteria worth 0.20 each, enabling
meaningful partial credit.
## Tasks (Easy → Hard)
| # | Task ID | Difficulty | Scenario |
|---|---------|-----------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down: multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
Each task is scored 0.0–1.0 based on specific findings the agent must discover.
## Action / Observation Spaces
### Action (`DataDetectiveAction`)
| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
### Observation (`DataDetectiveObservation`)
| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
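
The fields above map onto a simple request/response loop. The sketch below talks to the server over plain HTTP; the `/reset` and `/step` routes and the flat JSON payload shape are assumptions about the OpenEnv server wiring, while the field names (`action_type`, `content`, `output`, `message`, `task_description`, `schema_info`) come from the tables above.

```python
# Minimal interaction sketch over plain HTTP. The /reset and /step endpoint
# names and the request/response shapes are assumptions; only the action and
# observation field names are taken from this README.
import requests

ENV_URL = "http://localhost:7860"

# Start a new investigation and read the task and schema.
obs = requests.post(f"{ENV_URL}/reset", json={}).json()
print(obs.get("task_description"))
print(obs.get("schema_info"))

# Run an exploratory SQL query.
step = requests.post(
    f"{ENV_URL}/step",
    json={"action_type": "query", "content": "SELECT COUNT(*) FROM orders;"},
).json()
print(step.get("output"))

# Submit the final written analysis.
final = requests.post(
    f"{ENV_URL}/step",
    json={"action_type": "answer", "content": "Order volume dropped because ..."},
).json()
print(final.get("message"))
```
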
## Database Schema (11 Tables)
The TechMart database includes:
| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |
All data is synthetic, generated in-memory (no external databases required).
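
For the `orders_drop` task, for example, an agent's early exploration might aggregate daily order counts and cross-reference promotion windows. The column names below are illustrative guesses based on the table descriptions above, not the actual schema; the agent should rely on the `schema_info` returned at reset.

```python
# Illustrative exploration queries for the orders_drop task. Column names are
# guesses from the table descriptions above; consult `schema_info` first.
daily_orders_sql = """
SELECT DATE(order_date) AS day, COUNT(*) AS n_orders
FROM orders
GROUP BY DATE(order_date)
ORDER BY day;
"""

promo_windows_sql = """
SELECT promotion_id, start_date, end_date, discount_pct
FROM promotions
ORDER BY start_date;
"""

# Each string would be sent as a single step:
# {"action_type": "query", "content": daily_orders_sql}
```
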
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Server
```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
### 3. Health Check
```bash
curl http://localhost:7860/health
```
### 4. Run the Baseline Agent
```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```
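
The baseline agent reads the three variables above and talks to an OpenAI-compatible endpoint. `inference.py` itself is not reproduced here; the snippet below is only a sketch of how such a client is typically wired up with the `openai` package, assuming the router exposes a chat-completions API.

```python
# Sketch of wiring an OpenAI-compatible client from the same environment
# variables used above; this is not the actual contents of inference.py.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["API_BASE_URL"],  # e.g. https://router.huggingface.co/v1
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[
        {"role": "system", "content": "You are a data analyst investigating a business incident."},
        {"role": "user", "content": "Here is the task description and schema ..."},
    ],
)
print(response.choices[0].message.content)
```
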
### 5. Docker
```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables
| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
## How Grading Works
Each task has an automated grader that checks the agent's final answer for
specific key findings (keywords, patterns, named entities). There are 5
grading criteria per task, each worth 0.20, for a maximum score of 1.0, and
partial credit is awarded for each finding discovered.
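
The grader implementation is not shown here; a minimal sketch of keyword-style criterion matching, assuming each criterion is satisfied when any of its accepted phrases appears in the answer, looks like this:

```python
# Minimal grading sketch: each criterion is a list of accepted phrases worth
# 0.20. The real graders may use richer pattern matching than this.
def grade_answer(answer: str, criteria: list[list[str]]) -> float:
    answer_lower = answer.lower()
    hits = sum(
        1 for accepted in criteria
        if any(phrase.lower() in answer_lower for phrase in accepted)
    )
    return round(hits * 0.20, 2)

# Hypothetical criteria for a returns-spike style task.
criteria = [
    ["west region", "western region"],
    ["defective", "quality defect"],
    ["sku"],
    ["supplier"],
    ["refund"],
]
print(grade_answer("Returns spiked in the West region due to a defective SKU.", criteria))  # 0.6
```
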
## Setup Requirements
- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)