Spaces:

Mohammed-Altaf
/

DataAnalysis_Env

Sleeping

App Files Files Community

DataAnalysis_Env / README.md

Mohammed-Altaf

updated readme and baseline.py

7fd7757 about 1 month ago

preview code

raw

history blame contribute delete

7.51 kB

metadata

title: Data Analysis Agent Environment
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

Data Analysis Agent Environment

An OpenEnv-compliant RL environment for training and evaluating data analysis agents. Agents execute pandas code against a business dataset to answer analytical questions, graded by deterministic programmatic graders.

Motivation

Data analysis is a universal real-world task. Every business needs analysts who can query datasets, compute metrics, and extract insights. This environment lets RL agents practice that exact workflow — explore a dataset with code, then submit a precise answer — with automatic scoring.

Action & Observation Spaces

Action (`DataAction`)

Field	Type	Description
`action_type`	`"execute_code"` or `"submit_answer"`	What the agent wants to do
`code`	`str` (optional)	Python/pandas code to execute
`answer`	`str` (optional)	Final answer to submit for grading

Observation (`DataObservation`)

Field	Type	Description
`output`	`str`	Stdout from code execution or environment messages
`success`	`bool`	Whether the action succeeded
`error`	`str` (optional)	Error message if action failed
`task_description`	`str`	The question to answer (set on reset)
`dataset_info`	`str`	Dataset schema summary (set on reset)
`done`	`bool`	Whether the episode is over
`reward`	`float`	Step reward

State (`DataState`)

Field	Type	Description
`episode_id`	`str`	Unique episode identifier
`step_count`	`int`	Current step number
`task_id`	`int`	Active task (1–6)
`answer_submitted`	`bool`	Whether final answer was submitted
`final_score`	`float`	Graded score after submission

Tasks

Tasks use two data sources:

df — synthetic e-commerce sales CSV (~2000 orders): order_id, customer_id, product_name, category, quantity, unit_price, total_price, order_date, city, country
SQLite DB (store_data.db) — additional tables for cross-source tasks: customer_profiles (300 rows), product_catalog (25 rows)

Task 1 — Easy: Top Revenue Category

Question: What is the top-selling product category by total revenue?
Grading: Containment match (case-insensitive) → 1.0 or 0.0
Expected difficulty: Single groupby + sum + argmax

Task 2 — Medium: City Revenue Share

Question: Which city generates the most revenue? What percentage of total revenue does it represent?
Grading: 0.5 for correct city + 0.5 for percentage within ±0.1%
Expected difficulty: Groupby + percentage calculation + formatting

Task 3 — Medium: Repeat Customer Cohort Analysis

Question: How many unique customers ordered in both January and December? Compare their average order value to all other customers.
Grading: 0.33 per correct field (count, cohort AOV, other AOV)
Expected difficulty: Temporal filtering, set intersection, conditional aggregation

Task 4 — Hard: Monthly Revenue Ratio

Question: Which month had the highest vs. lowest total revenue? What is the ratio between them?
Grading: 0.33 for best month + 0.33 for worst month + 0.34 for ratio within ±0.01
Expected difficulty: Monthly resample/groupby, min/max comparison, ratio formatting

Task 5 — Hard: Customer Loyalty Tier Revenue (cross-source)

Question: Which customer loyalty tier generates the highest total revenue and what percentage does it represent?
Data: Requires joining df with customer_profiles table from SQLite on customer_id
Grading: 0.33 for tier name + 0.33 for revenue within ±0.5% + 0.34 for percentage within ±0.1
Expected difficulty: SQLite query → pandas merge → groupby aggregation

Task 6 — Hard: Supplier Profitability (cross-source)

Question: Which supplier has the highest total profit? What is their average profit margin?
Data: Requires joining df with product_catalog table from SQLite on product_name
Grading: 0.33 for supplier name + 0.34 for total profit within ±0.5% + 0.33 for avg margin within ±0.1
Expected difficulty: SQLite query → pandas merge → per-order profit/margin calculation → group aggregation

Reward Function

Event	Reward
Successful code execution	+0.05
Code execution error	-0.05
Final answer (graded)	0.0 — 1.0 based on task grader
Max steps (20) exceeded	0.0

Setup & Usage

Prerequisites

Python 3.13+
uv package manager

Install

uv sync

Run the server

uv run uvicorn server.app:app --host 0.0.0.0 --port 8000

Run the inference

First export all the required env variables mentioned in the .env.example. Then run below command

uv run python inference.py

Run the baseline

OPENAI_API_KEY=sk-... uv run python baseline.py
# Against a deployed HF Space:
OPENAI_API_KEY=sk-... uv run python baseline.py --base-url https://<your-username>-<space-name>.hf.space

Docker (local)

docker build -t data-analysis-env .
docker run -p 7860:7860 data-analysis-env

Client usage (Python)

from client import DataAnalysisClient
from models import DataAction

# Async
async with DataAnalysisClient(base_url="http://localhost:8000") as client:
    result = await client.reset(task_id=1)
    result = await client.step(DataAction(action_type="execute_code", code="print(df.head())"))
    result = await client.step(DataAction(action_type="submit_answer", answer="Electronics"))

# Sync
with DataAnalysisClient(base_url="http://localhost:8000").sync() as client:
    result = client.reset(task_id=2)
    result = client.step(DataAction(action_type="execute_code", code="print(df.groupby('city')['total_price'].sum())"))

Project Structure

├── models.py                  # DataAction, DataObservation, DataState
├── client.py                  # DataAnalysisClient (EnvClient subclass)
├── inference.py               # HF inference script (uses HF Inference API)
├── baseline.py                # OpenAI baseline inference script
├── helpers/
│   └── response_parser.py     # Robust LLM JSON response parser
├── tasks/
│   ├── base_task.py           # Task ABC with grade() interface
│   ├── task_easy.py           # Task 1 (Easy): Top revenue category
│   ├── task_medium.py         # Task 2 (Medium): City revenue share
│   ├── task_medium_2.py       # Task 4 (Hard): Monthly revenue ratio
│   ├── task_hard.py           # Task 3 (Medium): Repeat customer cohort
│   ├── task_hard_2.py         # Task 5 (Hard): Customer loyalty tier revenue
│   └── task_hard_3.py         # Task 6 (Hard): Supplier profitability
├── datasets/
│   ├── sales.csv              # Synthetic e-commerce sales dataset
│   └── store_data.db          # SQLite DB: customer_profiles, product_catalog
├── server/
│   ├── app.py                 # FastAPI app entry point
│   └── data_analysis_env.py   # Environment implementation
├── Dockerfile                 # HF Spaces Docker build (port 7860)
├── openenv.yaml               # OpenEnv spec metadata
└── pyproject.toml             # Dependencies and project config