---
title: DataDetective
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective – Business Incident Investigation Environment

An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI
agents investigate realistic business incidents by querying a SQL database,
analysing patterns, and submitting root-cause findings.
## What It Does

The agent is given a realistic company database (TechMart – a mid-size B2B+B2C
electronics retailer) and a business problem to investigate. It can execute
SQL queries to explore the data, then submit a final written analysis. The
environment automatically grades the analysis based on whether key findings
were identified. Each task has 5 grading criteria worth 0.20 each, enabling
meaningful partial credit.
## Tasks (Easy → Hard)

| # | Task ID | Difficulty | Scenario |
|---|---------|-----------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down – multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |

Each task is scored 0.0–1.0 based on specific findings the agent must discover.
## Action / Observation Spaces

### Action (`DataDetectiveAction`)

| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |

### Observation (`DataDetectiveObservation`)

| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
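The tables above can be pictured as two plain dataclasses. This is an illustrative sketch only – the real class definitions live in the environment package and may carry extra fields:

```python
from dataclasses import dataclass


@dataclass
class DataDetectiveAction:
    action_type: str  # "query" to run SQL, "answer" to submit findings
    content: str      # SQL query string or final analysis text


@dataclass
class DataDetectiveObservation:
    output: str            # query results (formatted table) or feedback
    task_description: str  # the investigation brief
    schema_info: str       # database schema (populated at reset)
    step_number: int       # current step
    max_steps: int         # 30 by default
    message: str           # status message


# A query action as an agent might construct it:
action = DataDetectiveAction(
    action_type="query",
    content="SELECT COUNT(*) FROM orders",
)
```

An episode alternates such actions with observations until the agent submits an `"answer"` action or hits the step limit.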
## Database Schema (11 Tables)

The TechMart database includes:

| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |

All data is synthetic, generated in-memory (no external databases required).
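The schema lends itself to standard join-based exploration. As a rough illustration of the kind of query an agent might run – against a toy in-memory SQLite database here, not the actual TechMart generator, whose tables and column names may differ:

```python
import sqlite3

# Toy in-memory database mimicking two of the tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL);
CREATE TABLE returns (return_id INTEGER, order_id INTEGER, reason TEXT);
INSERT INTO orders VALUES (1, 10, 99.0), (2, 11, 45.0), (3, 10, 150.0);
INSERT INTO returns VALUES (1, 2, 'defective');
""")

# Example investigative query: which orders were returned, and why?
rows = conn.execute("""
    SELECT o.order_id, o.total, r.reason
    FROM orders o
    JOIN returns r ON r.order_id = o.order_id
""").fetchall()
print(rows)  # [(2, 45.0, 'defective')]
```

An investigation typically chains many such queries, narrowing from aggregate trends down to the specific products, regions, or customers driving an anomaly.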
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### 3. Health Check

```bash
curl http://localhost:7860/health
```

### 4. Run the Baseline Agent

```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```

### 5. Docker

```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables

| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
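A script like `inference.py` might resolve these settings as follows. This is a hedged sketch using the variable names from the table; the actual code in `inference.py` may differ:

```python
def load_config(env: dict) -> dict:
    """Resolve settings per the table above; missing required keys raise KeyError."""
    return {
        "api_base_url": env["API_BASE_URL"],  # required
        "model_name": env["MODEL_NAME"],      # required
        "hf_token": env["HF_TOKEN"],          # required
        # Optional, with the documented default:
        "env_url": env.get("ENV_URL", "http://localhost:7860"),
    }


# In practice you would pass os.environ; a literal dict keeps the example self-contained.
cfg = load_config({
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "gpt-4.1-mini",
    "HF_TOKEN": "hf_...",
})
```

Failing fast on a missing required variable gives a clearer error than a request that dies mid-run.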
## How Grading Works

Each task has an automated grader that checks the agent's final answer for
specific key findings (keywords, patterns, named entities). Each task has 5
grading criteria worth 0.20 each, for a maximum score of 1.0. Partial credit
is awarded for each finding discovered.
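The mechanics can be sketched as a simple keyword check. The criteria below are made up for illustration – the real graders also match patterns and named entities:

```python
def grade(answer: str, criteria: list) -> float:
    """Score an analysis: each criterion is a list of acceptable keywords,
    worth 0.20 if any of them appears in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(any(kw in text for kw in group) for group in criteria)
    return round(hits * 0.20, 2)


# Hypothetical criteria for a returns-spike task (5 criteria, 0.20 each).
criteria = [
    ["west region", "west"],
    ["defective", "defect"],
    ["sku-4412"],            # the offending SKU (made-up ID)
    ["supplier"],
    ["refund"],
]
score = grade(
    "Returns spiked in the West region due to a defective SKU.",
    criteria,
)
print(score)  # 0.4 – two of the five criteria matched
```

Because each criterion is scored independently, an agent that identifies only some of the root causes still earns proportional credit.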
## Setup Requirements

- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)