---
title: DataDetective
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective – Business Incident Investigation Environment
An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI
agents investigate realistic business incidents by querying a SQL database,
analysing patterns, and submitting root-cause findings.
## What It Does
The agent is given a realistic company database (TechMart, a mid-size B2B+B2C
electronics retailer) and a business problem to investigate. It can execute
SQL queries to explore the data, then submit a final written analysis. The
environment automatically grades the analysis based on whether key findings
were identified. Each task has 5 grading criteria worth 0.20 each, enabling
meaningful partial credit.
## Tasks (Easy → Hard)
| # | Task ID | Difficulty | Scenario |
|---|---------|-----------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down: multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
Each task is scored 0.0–1.0 based on specific findings the agent must discover.
## Action / Observation Spaces
### Action (`DataDetectiveAction`)
| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
### Observation (`DataDetectiveObservation`)
| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
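
The fields above map onto a simple request/response loop. The sketch below talks to the server over plain HTTP; the `/reset` and `/step` routes and the flat JSON payload shape are assumptions about the OpenEnv server wiring, while the field names (`action_type`, `content`, `output`, `message`, `task_description`, `schema_info`) come from the tables above.

```python
# Minimal interaction sketch over plain HTTP. The /reset and /step endpoint
# names and the request/response shapes are assumptions; only the action and
# observation field names are taken from this README.
import requests

ENV_URL = "http://localhost:7860"

# Start a new investigation and read the task and schema.
obs = requests.post(f"{ENV_URL}/reset", json={}).json()
print(obs.get("task_description"))
print(obs.get("schema_info"))

# Run an exploratory SQL query.
step = requests.post(
    f"{ENV_URL}/step",
    json={"action_type": "query", "content": "SELECT COUNT(*) FROM orders;"},
).json()
print(step.get("output"))

# Submit the final written analysis.
final = requests.post(
    f"{ENV_URL}/step",
    json={"action_type": "answer", "content": "Order volume dropped because ..."},
).json()
print(final.get("message"))
```
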
## Database Schema (11 Tables)
The TechMart database includes:
| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |
All data is synthetic, generated in-memory (no external databases required).
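
For the `orders_drop` task, for example, an agent's early exploration might aggregate daily order counts and cross-reference promotion windows. The column names below are illustrative guesses based on the table descriptions above, not the actual schema; the agent should rely on the `schema_info` returned at reset.

```python
# Illustrative exploration queries for the orders_drop task. Column names are
# guesses from the table descriptions above; consult `schema_info` first.
daily_orders_sql = """
SELECT DATE(order_date) AS day, COUNT(*) AS n_orders
FROM orders
GROUP BY DATE(order_date)
ORDER BY day;
"""

promo_windows_sql = """
SELECT promotion_id, start_date, end_date, discount_pct
FROM promotions
ORDER BY start_date;
"""

# Each string would be sent as a single step:
# {"action_type": "query", "content": daily_orders_sql}
```
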
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Server
```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
### 3. Health Check
```bash
curl http://localhost:7860/health
```
### 4. Run the Baseline Agent
```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```
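
The baseline agent reads the three variables above and talks to an OpenAI-compatible endpoint. `inference.py` itself is not reproduced here; the snippet below is only a sketch of how such a client is typically wired up with the `openai` package, assuming the router exposes a chat-completions API.

```python
# Sketch of wiring an OpenAI-compatible client from the same environment
# variables used above; this is not the actual contents of inference.py.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["API_BASE_URL"],  # e.g. https://router.huggingface.co/v1
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[
        {"role": "system", "content": "You are a data analyst investigating a business incident."},
        {"role": "user", "content": "Here is the task description and schema ..."},
    ],
)
print(response.choices[0].message.content)
```
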
### 5. Docker
```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables
| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
## How Grading Works
Each task has an automated grader that checks the agent's final answer for
specific key findings (keywords, patterns, named entities). There are 5
grading criteria per task, each worth 0.20, for a maximum score of 1.0, and
partial credit is awarded for each finding discovered.
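
The grader implementation is not shown here; a minimal sketch of keyword-style criterion matching, assuming each criterion is satisfied when any of its accepted phrases appears in the answer, looks like this:

```python
# Minimal grading sketch: each criterion is a list of accepted phrases worth
# 0.20. The real graders may use richer pattern matching than this.
def grade_answer(answer: str, criteria: list[list[str]]) -> float:
    answer_lower = answer.lower()
    hits = sum(
        1 for accepted in criteria
        if any(phrase.lower() in answer_lower for phrase in accepted)
    )
    return round(hits * 0.20, 2)

# Hypothetical criteria for a returns-spike style task.
criteria = [
    ["west region", "western region"],
    ["defective", "quality defect"],
    ["sku"],
    ["supplier"],
    ["refund"],
]
print(grade_answer("Returns spiked in the West region due to a defective SKU.", criteria))  # 0.6
```
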
## Setup Requirements
- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)