
	---
	title: Data Analysis Agent Environment
	emoji: 📊
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: false
	---

	# Data Analysis Agent Environment

	An OpenEnv-compliant RL environment for training and evaluating data analysis agents. Agents execute pandas code against a business dataset to answer analytical questions, and their submitted answers are scored by deterministic programmatic graders.

	## Motivation

	Data analysis is a common real-world task: most businesses rely on analysts who can query datasets, compute metrics, and extract insights. This environment lets RL agents practice that workflow with automatic scoring: explore a dataset with code, then submit a precise answer.

	## Action & Observation Spaces

	### Action (`DataAction`)

	| Field | Type | Description |
	|---|---|---|
	| `action_type` | `"execute_code"` or `"submit_answer"` | What the agent wants to do |
	| `code` | `str` (optional) | Python/pandas code to execute |
	| `answer` | `str` (optional) | Final answer to submit for grading |
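
The table suggests that `code` accompanies `execute_code` and `answer` accompanies `submit_answer`. A minimal sketch of that pairing follows, using a stand-in dataclass. The real `DataAction` lives in `models.py` and may validate differently; `ActionSketch` and `validate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = ("execute_code", "submit_answer")


@dataclass
class ActionSketch:
    """Hypothetical stand-in mirroring the DataAction fields in the table."""
    action_type: str
    code: Optional[str] = None
    answer: Optional[str] = None


def validate(action: ActionSketch) -> list[str]:
    """Return a list of problems; an empty list means the action looks well formed."""
    problems = []
    if action.action_type not in VALID_TYPES:
        problems.append(f"unknown action_type: {action.action_type!r}")
    if action.action_type == "execute_code" and not action.code:
        problems.append("execute_code needs a non-empty code field")
    if action.action_type == "submit_answer" and not action.answer:
        problems.append("submit_answer needs a non-empty answer field")
    return problems
```

Checking an action before sending it can save a step, since the environment caps episodes at 20 steps.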

	### Observation (`DataObservation`)

	| Field | Type | Description |
	|---|---|---|
	| `output` | `str` | Stdout from code execution or environment messages |
	| `success` | `bool` | Whether the action succeeded |
	| `error` | `str` (optional) | Error message if action failed |
	| `task_description` | `str` | The question to answer (set on reset) |
	| `dataset_info` | `str` | Dataset schema summary (set on reset) |
	| `done` | `bool` | Whether the episode is over |
	| `reward` | `float` | Step reward |

	### State (`DataState`)

	| Field | Type | Description |
	|---|---|---|
	| `episode_id` | `str` | Unique episode identifier |
	| `step_count` | `int` | Current step number |
	| `task_id` | `int` | Active task (1–6) |
	| `answer_submitted` | `bool` | Whether final answer was submitted |
	| `final_score` | `float` | Graded score after submission |

	## Tasks

	Tasks use two data sources:
	- `df` — synthetic e-commerce sales CSV (~2000 orders): `order_id`, `customer_id`, `product_name`, `category`, `quantity`, `unit_price`, `total_price`, `order_date`, `city`, `country`
	- SQLite DB (`store_data.db`) — additional tables for cross-source tasks: `customer_profiles` (300 rows), `product_catalog` (25 rows)

	### Task 1 — Easy: Top Revenue Category
	- Question: What is the top-selling product category by total revenue?
	- Grading: Containment match (case-insensitive) → 1.0 or 0.0
	- Expected difficulty: Single groupby + sum + argmax
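
As an illustration of the expected approach, the sketch below runs the groupby on a few toy rows and then applies a containment check. The toy data and the `grade_containment` helper are assumptions for illustration; the actual grader in `tasks/task_easy.py` may differ.

```python
import pandas as pd

# Toy rows; the real df has ~2000 orders with the same column names.
df = pd.DataFrame({
    "category": ["Electronics", "Books", "Electronics", "Home"],
    "total_price": [300.0, 40.0, 150.0, 90.0],
})

# Sum revenue per category, then take the label of the largest total.
revenue_by_category = df.groupby("category")["total_price"].sum()
top_category = revenue_by_category.idxmax()


def grade_containment(answer: str, expected: str) -> float:
    """Hypothetical grader: 1.0 if expected appears in the answer, ignoring case."""
    return 1.0 if expected.lower() in answer.lower() else 0.0


score = grade_containment(f"The top category is {top_category}.", "electronics")
```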

	### Task 2 — Medium: City Revenue Share
	- Question: Which city generates the most revenue? What percentage of total revenue does it represent?
	- Grading: 0.5 for correct city + 0.5 for percentage within ±0.1%
	- Expected difficulty: Groupby + percentage calculation + formatting
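
The partial-credit scheme above could be sketched as follows. This is not the grader from `tasks/task_medium.py`: `grade_city_share` is a hypothetical helper, the city name and numbers in the usage line are made up, and reading the tolerance as 0.1 percentage points is an assumption.

```python
import re


def grade_city_share(answer: str, expected_city: str, expected_pct: float,
                     tol: float = 0.1) -> float:
    """Hypothetical grader: 0.5 for the city, 0.5 for a percentage within tol points."""
    score = 0.0
    if expected_city.lower() in answer.lower():
        score += 0.5
    # Accept any number in the answer that lands within the tolerance.
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", answer)]
    if any(abs(n - expected_pct) <= tol for n in numbers):
        score += 0.5
    return score


# Made-up expected values, for illustration only.
full_credit = grade_city_share("Springfield, 23.45%", "Springfield", 23.4)
```

An answer with the right city but a percentage outside the tolerance would earn 0.5 under this sketch.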

	### Task 3 — Medium: Repeat Customer Cohort Analysis
	- Question: How many unique customers ordered in both January and December? Compare their average order value to all other customers.
	- Grading: 0.33 per correct field (count, cohort AOV, other AOV)
	- Expected difficulty: Temporal filtering, set intersection, conditional aggregation
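
One way to combine temporal filtering, set intersection, and conditional aggregation is sketched below on toy rows. It assumes average order value means the mean `total_price` per order; the task's own definition may differ.

```python
import pandas as pd

# Toy rows using the same column names as the sales CSV.
df = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C3", "C4"],
    "order_date": ["2024-01-05", "2024-12-10", "2024-01-20",
                   "2024-01-15", "2024-12-02", "2024-06-30"],
    "total_price": [100.0, 200.0, 50.0, 80.0, 120.0, 60.0],
})
df["order_date"] = pd.to_datetime(df["order_date"])

# Customers who ordered in January and customers who ordered in December.
month = df["order_date"].dt.month
january = set(df.loc[month == 1, "customer_id"])
december = set(df.loc[month == 12, "customer_id"])
cohort = january & december

# Assumed AOV definition: mean total_price over each group's orders.
in_cohort = df["customer_id"].isin(cohort)
cohort_aov = df.loc[in_cohort, "total_price"].mean()
other_aov = df.loc[~in_cohort, "total_price"].mean()
```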

	### Task 4 — Hard: Monthly Revenue Ratio
	- Question: Which month had the highest vs. lowest total revenue? What is the ratio between them?
	- Grading: 0.33 for best month + 0.33 for worst month + 0.34 for ratio within ±0.01
	- Expected difficulty: Monthly resample/groupby, min/max comparison, ratio formatting
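
A minimal sketch of the monthly aggregation on toy data follows. It assumes the ratio is highest month divided by lowest month, rounded to two decimals; the exact answer format the grader expects is not specified here.

```python
import pandas as pd

# Toy rows; the real df holds a full year of orders.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-11", "2024-03-03", "2024-03-28",
    ]),
    "total_price": [100.0, 300.0, 100.0, 150.0, 50.0],
})

# Group orders by calendar month and sum revenue.
monthly = df.groupby(df["order_date"].dt.to_period("M"))["total_price"].sum()

best_month = monthly.idxmax()
worst_month = monthly.idxmin()
# Assumed definition: highest monthly revenue divided by lowest.
ratio = round(monthly.max() / monthly.min(), 2)
```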

	### Task 5 — Hard: Customer Loyalty Tier Revenue (cross-source)
	- Question: Which customer loyalty tier generates the highest total revenue and what percentage does it represent?
	- Data: Requires joining `df` with `customer_profiles` table from SQLite on `customer_id`
	- Grading: 0.33 for tier name + 0.33 for revenue within ±0.5% + 0.34 for percentage within ±0.1
	- Expected difficulty: SQLite query → pandas merge → groupby aggregation
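
The SQLite query, merge, and groupby steps could look like the sketch below. It uses an in-memory database as a stand-in for `store_data.db`, and the `loyalty_tier` column name is an assumption, since the schema of `customer_profiles` is not listed here.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for store_data.db; the loyalty_tier column name is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_profiles (customer_id TEXT, loyalty_tier TEXT)")
conn.executemany(
    "INSERT INTO customer_profiles VALUES (?, ?)",
    [("C1", "Gold"), ("C2", "Silver"), ("C3", "Gold")],
)
conn.commit()

# Toy sales rows with the same join key as the real df.
df = pd.DataFrame({
    "customer_id": ["C1", "C2", "C2", "C3"],
    "total_price": [100.0, 250.0, 50.0, 100.0],
})

# Step 1: pull the profile table into pandas.
profiles = pd.read_sql_query(
    "SELECT customer_id, loyalty_tier FROM customer_profiles", conn
)
conn.close()

# Step 2: join on customer_id. Step 3: aggregate revenue per tier.
merged = df.merge(profiles, on="customer_id", how="left")
tier_revenue = merged.groupby("loyalty_tier")["total_price"].sum()

top_tier = tier_revenue.idxmax()
top_revenue = tier_revenue.max()
share_pct = round(100 * top_revenue / tier_revenue.sum(), 1)
```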

	### Task 6 — Hard: Supplier Profitability (cross-source)
	- Question: Which supplier has the highest total profit? What is their average profit margin?
	- Data: Requires joining `df` with `product_catalog` table from SQLite on `product_name`
	- Grading: 0.33 for supplier name + 0.34 for total profit within ±0.5% + 0.33 for avg margin within ±0.1
	- Expected difficulty: SQLite query → pandas merge → per-order profit/margin calculation → group aggregation
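
The per-order profit and margin calculation might be sketched as follows. Here `catalog` stands in for the `product_catalog` table after loading it from SQLite, and the `supplier` and `unit_cost` column names are assumptions. The sketch also assumes that average profit margin means the mean of per-order margins, which the task may define differently.

```python
import pandas as pd

# Toy sales rows with the same column names as the sales CSV.
df = pd.DataFrame({
    "product_name": ["A", "B", "A", "C"],
    "quantity": [2, 1, 1, 4],
    "total_price": [100.0, 80.0, 50.0, 200.0],
})

# Stand-in for product_catalog; supplier and unit_cost are assumed column names.
catalog = pd.DataFrame({
    "product_name": ["A", "B", "C"],
    "supplier": ["S1", "S2", "S1"],
    "unit_cost": [30.0, 20.0, 40.0],
})

merged = df.merge(catalog, on="product_name", how="left")

# Per-order profit and margin.
merged["profit"] = merged["total_price"] - merged["quantity"] * merged["unit_cost"]
merged["margin"] = merged["profit"] / merged["total_price"]

# Aggregate per supplier: total profit and mean per-order margin.
by_supplier = merged.groupby("supplier").agg(
    total_profit=("profit", "sum"),
    avg_margin=("margin", "mean"),
)

top_supplier = by_supplier["total_profit"].idxmax()
top_profit = by_supplier.loc[top_supplier, "total_profit"]
top_margin_pct = round(100 * by_supplier.loc[top_supplier, "avg_margin"], 2)
```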

	## Reward Function

	| Event | Reward |
	|---|---|
	| Successful code execution | +0.05 |
	| Code execution error | -0.05 |
	| Final answer (graded) | 0.0 to 1.0 based on task grader |
	| Max steps (20) exceeded | 0.0 |
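
To make the table concrete, the sketch below maps each event to its step reward and tallies a short example episode. The event names and the `step_reward` helper are hypothetical, and summing step rewards is shown only as an illustration; how the environment or a training loop accumulates them is not stated here.

```python
# Hypothetical event names; the values come from the reward table.
STEP_REWARDS = {"code_ok": 0.05, "code_error": -0.05}


def step_reward(event: str, graded_score: float = 0.0) -> float:
    """Return the step reward for one event, following the table above."""
    if event == "submit_answer":
        # Grader scores are bounded to the 0.0 to 1.0 range.
        return min(max(graded_score, 0.0), 1.0)
    if event == "max_steps_exceeded":
        return 0.0
    return STEP_REWARDS[event]


# Example episode: two successful runs, one error, then an answer graded 0.67.
rewards = [
    step_reward("code_ok"),
    step_reward("code_error"),
    step_reward("code_ok"),
    step_reward("submit_answer", 0.67),
]
total = sum(rewards)
```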

	## Setup & Usage

	### Prerequisites
	- Python 3.13+
	- [uv](https://docs.astral.sh/uv/) package manager

	### Install
	```bash
	uv sync
	```

	### Run the server
	```bash
	uv run uvicorn server.app:app --host 0.0.0.0 --port 8000
	```

	### Run the inference
	- First export the environment variables listed in `.env.example`, then run the command below:
	```bash
	uv run python inference.py
	```

	### Run the baseline
	```bash
	OPENAI_API_KEY=sk-... uv run python baseline.py
	# Against a deployed HF Space:
	OPENAI_API_KEY=sk-... uv run python baseline.py --base-url https://<your-username>-<space-name>.hf.space
	```

	### Docker (local)
	```bash
	docker build -t data-analysis-env .
	docker run -p 7860:7860 data-analysis-env
	```


	### Client usage (Python)
	```python
	import asyncio

	from client import DataAnalysisClient
	from models import DataAction


	# Async: `async with` must run inside a coroutine
	async def main():
	    async with DataAnalysisClient(base_url="http://localhost:8000") as client:
	        result = await client.reset(task_id=1)
	        result = await client.step(DataAction(action_type="execute_code", code="print(df.head())"))
	        result = await client.step(DataAction(action_type="submit_answer", answer="Electronics"))

	asyncio.run(main())

	# Sync
	with DataAnalysisClient(base_url="http://localhost:8000").sync() as client:
	    result = client.reset(task_id=2)
	    result = client.step(DataAction(action_type="execute_code", code="print(df.groupby('city')['total_price'].sum())"))
	```

	## Project Structure

	```
	├── models.py                  # DataAction, DataObservation, DataState
	├── client.py                  # DataAnalysisClient (EnvClient subclass)
	├── inference.py               # HF inference script (uses HF Inference API)
	├── baseline.py                # OpenAI baseline inference script
	├── helpers/
	│   └── response_parser.py     # Robust LLM JSON response parser
	├── tasks/
	│   ├── base_task.py           # Task ABC with grade() interface
	│   ├── task_easy.py           # Task 1 (Easy): Top revenue category
	│   ├── task_medium.py         # Task 2 (Medium): City revenue share
	│   ├── task_medium_2.py       # Task 4 (Hard): Monthly revenue ratio
	│   ├── task_hard.py           # Task 3 (Medium): Repeat customer cohort
	│   ├── task_hard_2.py         # Task 5 (Hard): Customer loyalty tier revenue
	│   └── task_hard_3.py         # Task 6 (Hard): Supplier profitability
	├── datasets/
	│   ├── sales.csv              # Synthetic e-commerce sales dataset
	│   └── store_data.db          # SQLite DB: customer_profiles, product_catalog
	├── server/
	│   ├── app.py                 # FastAPI app entry point
	│   └── data_analysis_env.py   # Environment implementation
	├── Dockerfile                 # HF Spaces Docker build (port 7860)
	├── openenv.yaml               # OpenEnv spec metadata
	└── pyproject.toml             # Dependencies and project config
	```