Spaces:

HimanshuSardana2
/

data_analysis_env


HimanshuSardana2 committed 9 days ago

Commit

abb357f

verified ·

1 Parent(s): 8c8a964

Upload folder using huggingface_hub

Files changed (17)

Dockerfile +18 -0
README.md +150 -4
__init__.py +49 -0
client.py +66 -0
inference.py +244 -0
models.py +70 -0
openenv.yaml +20 -0
pyproject.toml +29 -0
server/__init__.py +1 -0
server/app.py +30 -0
server/data/dirty.csv +18 -0
server/data/products.csv +5 -0
server/data/sales.csv +13 -0
server/data/simple.csv +11 -0
server/quantum_openenv_env_environment.py +533 -0
server/requirements.txt +4 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,18 @@

+FROM python:3.10-slim
+WORKDIR /app
+RUN pip install --no-cache-dir uv
+COPY server/requirements.txt /tmp/requirements.txt
+RUN pip install --no-cache-dir -r /tmp/requirements.txt
+COPY . /app/
+ENV PYTHONPATH=/app
+ENV DATA_DIR=/app/server/data
+EXPOSE 8000
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED

@@ -1,10 +1,156 @@
 ---
 title: Data Analysis Env
-emoji: 🐨
-colorFrom: gray
 colorTo: green
 sdk: docker
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: Data Analysis Env
+emoji: 📊
+colorFrom: blue
 colorTo: green
 sdk: docker
+app_file: inference.py
+pytorch: false
+python_version: "3.10"
+tags:
+- data-analysis
+- pandas
+- openenv
+- ai-agents
+license: mit
+base_path: /web
 ---
+# Data Analysis OpenEnv Environment
+A real-world OpenEnv environment for training and evaluating AI agents on pandas data analysis tasks.
+## Environment Description
+This environment simulates real-world data analysis workflows that humans perform daily:
+- Loading and exploring CSV data
+- Cleaning dirty data (handling missing values, removing duplicates)
+- Transforming data (filtering, sorting, selecting columns)
+- Merging multiple datasets
+- Computing statistics and aggregations
+## Task Descriptions
+### Task 1: Basic Statistics (Easy)
+- **Objective**: Load `simple.csv` and calculate the mean of the `price` column
+- **Difficulty**: Easy
+- **Expected Score**: 0.7+ for correct mean calculation
+### Task 2: Data Cleaning (Medium)
+- **Objective**: Load `dirty.csv`, fill missing values (mean), remove duplicates, calculate median of `age`
+- **Difficulty**: Medium
+- **Expected Score**: 0.7+ for correct cleaning and median calculation
+### Task 3: Multi-table Analysis (Hard)
+- **Objective**: Load `sales.csv` and `products.csv`, merge on product_id, calculate total sales per category
+- **Difficulty**: Hard
+- **Expected Score**: 0.7+ for correct merge and aggregation
+## Action Space
+```python
+DataAnalysisAction(
+    tool: str,           # Tool name: load_csv, show_data, show_columns, fill_missing,
+                         # remove_duplicates, filter_rows, select_columns, group_by,
+                         # calculate, sort_by, get_result, merge_datasets
+    parameters: dict     # Tool parameters
+)
+```
+## Observation Space
+```python
+DataAnalysisObservation(
+    done: bool,                              # Episode done flag
+    reward: float,                           # Reward (0.0-1.0)
+    success: bool,                           # Tool executed successfully
+    output: str,                             # Tool output
+    data_shape: Optional[tuple[int, int]],   # (rows, columns); None before data is loaded
+    columns: list[str],                      # Column names
+    tools_used: list[str],                   # History of tools called
+    error: Optional[str]                     # Error message if any
+)
+```
+## Reward Function
+- **+0.1**: Each successful tool execution
+- **+0.5 × score**: Final result grading (score based on accuracy)
+- **-0.1**: Failed tool execution or invalid tool
+- **0.0**: Episode ends without meaningful progress
+## Setup Instructions
+### Local Development
+```bash
+# Install dependencies
+cd data_analysis_env
+pip install -r server/requirements.txt
+# Run the server
+python -m server.app
+# Or use uvicorn
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+```
+### Docker
+```bash
+# Build the image
+docker build -t data_analysis_env .
+# Run the container
+docker run -p 8000:8000 data_analysis_env
+```
+### Running Inference
+```bash
+# Set environment variables
+export HF_TOKEN=your_token
+export API_BASE_URL=https://router.huggingface.co/v1
+export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
+export ENV_URL=http://localhost:8000
+# Run inference
+python inference.py
+```
+## Baseline Scores
+| Task | Expected Score |
+|------|--------------|
+| task_1 (Easy) | 0.7-1.0 |
+| task_2 (Medium) | 0.5-0.8 |
+| task_3 (Hard) | 0.3-0.7 |
+## API Endpoints
+- `POST /reset` - Reset environment with task name
+- `POST /step` - Execute action
+- `GET /state` - Get current state
+## Files
+```
+data_analysis_env/
+├── __init__.py          # Package init
+├── models.py            # Pydantic models
+├── client.py             # Client implementation
+├── inference.py         # Inference script
+├── openenv.yaml         # OpenEnv spec
+├── Dockerfile           # Docker configuration
+├── server/
+│   ├── __init__.py      # Server package
+│   ├── app.py           # FastAPI app
+│   ├── quantum_openenv_env_environment.py  # Environment implementation
+│   ├── requirements.txt
+│   └── data/
+│       ├── simple.csv
+│       ├── dirty.csv
+│       ├── sales.csv
+│       └── products.csv
+└── README.md
+```
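
Note on the README's Reward Function section: the per-step rules (+0.1 for a successful tool call, -0.1 for a failed or unknown one) can be sketched as a running total clamped to the 0.0-1.0 range given in the Observation Space. The helper names `apply_step_reward` and `episode_reward` are hypothetical and not part of this commit. The final-grading term (+0.5 × score) is left out because its implementation is not visible in this excerpt.

```python
def apply_step_reward(total: float, success: bool) -> float:
    """Return the running reward after one tool call (hypothetical helper)."""
    if success:
        # Successful tool execution: +0.1, capped at 1.0.
        return min(1.0, total + 0.1)
    # Failed or unknown tool: -0.1, floored at 0.0.
    return max(0.0, total - 0.1)


def episode_reward(outcomes: list[bool]) -> float:
    """Fold a sequence of per-step success flags into one running reward."""
    total = 0.0
    for ok in outcomes:
        total = apply_step_reward(total, ok)
    return total
```

Under these assumptions, two successes followed by one failure leave a running reward of about 0.1, and a failure on the very first step cannot push the reward below zero.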

__init__.py ADDED

	@@ -0,0 +1,49 @@

+from .models import (
+    DataAnalysisAction,
+    DataAnalysisObservation,
+    DataAnalysisState,
+    AVAILABLE_TOOLS,
+)
+from .client import DataAnalysisEnv
+__all__ = [
+    "DataAnalysisAction",
+    "DataAnalysisObservation",
+    "DataAnalysisState",
+    "DataAnalysisEnv",
+    "AVAILABLE_TOOLS",
+]
+TASKS = {
+    "task_1": {
+        "name": "Basic Statistics",
+        "description": "Load simple.csv and calculate the mean of the 'price' column",
+        "datafile": "simple.csv",
+        "target_column": "price",
+        "target_operation": "mean",
+        "expected_answer": None,
+        "difficulty": "easy",
+    },
+    "task_2": {
+        "name": "Data Cleaning",
+        "description": "Load dirty.csv, fill missing values, remove duplicates, then calculate median of 'age'",
+        "datafile": "dirty.csv",
+        "target_column": "age",
+        "target_operation": "median",
+        "expected_answer": None,
+        "difficulty": "medium",
+    },
+    "task_3": {
+        "name": "Multi-table Analysis",
+        "description": "Load sales.csv and products.csv, merge on product_id, calculate total sales per category",
+        "datafile": "sales.csv",
+        "secondary_datafile": "products.csv",
+        "target_column": "sales",
+        "group_by_column": "category",
+        "target_operation": "sum",
+        "expected_answer": None,
+        "difficulty": "hard",
+    },
+}

client.py ADDED

	@@ -0,0 +1,66 @@

+from typing import Optional
+import httpx
+from openenv.core.env_client import EnvClient, StepResult
+from models import DataAnalysisAction, DataAnalysisObservation, DataAnalysisState
+class DataAnalysisEnv(EnvClient):
+    def __init__(self, base_url: str = "http://localhost:8000"):
+        self._base_url = base_url.rstrip("/")
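+        if self._base_url.startswith("ws://"):
+            self._base_url = self._base_url.replace("ws://", "http://")
+        elif self._base_url.startswith("wss://"):
+            self._base_url = self._base_url.replace("wss://", "https://")
+        elif not self._base_url.startswith(("http://", "https://")):
+            # Bare host:port. An https:// URL must not receive an http:// prefix.
+            self._base_url = "http://" + self._base_url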
+        self._client: Optional[httpx.AsyncClient] = None
+    def _get_client(self) -> httpx.AsyncClient:
+        if self._client is None:
+            self._client = httpx.AsyncClient(base_url=self._base_url, timeout=60.0)
+        return self._client
+    async def reset(self, task: str = "task_1", **kwargs) -> StepResult:
+        client = self._get_client()
+        response = await client.post("/reset", json={"task": task})
+        response.raise_for_status()
+        data = response.json()
+        return self._parse_result(data)
+    async def step(self, action: DataAnalysisAction) -> StepResult:
+        payload = {
+            "action": {
+                "tool": action.tool,
+                "parameters": action.parameters,
+            }
+        }
+        client = self._get_client()
+        response = await client.post("/step", json=payload)
+        response.raise_for_status()
+        data = response.json()
+        return self._parse_result(data)
+    async def state(self) -> DataAnalysisState:
+        client = self._get_client()
+        response = await client.get("/state")
+        response.raise_for_status()
+        data = response.json()
+        return DataAnalysisState(**data)
+    async def close(self):
+        if self._client:
+            await self._client.aclose()
+            self._client = None
+    @staticmethod
+    def _parse_result(payload: dict) -> StepResult:
+        obs = DataAnalysisObservation(**payload.get("observation", {}))
+        return StepResult(
+            observation=obs,
+            reward=payload.get("reward", 0.0),
+            done=payload.get("done", False),
+        )
+    @staticmethod
+    def _parse_state(payload: dict) -> DataAnalysisState:
+        return DataAnalysisState(**payload)

inference.py ADDED

	@@ -0,0 +1,244 @@

+import asyncio
+import os
+import sys
+import textwrap
+from typing import List, Optional
+from openai import OpenAI
+from openenv.core.env_client import StepResult
+API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+BENCHMARK = "data_analysis_env"
+MAX_STEPS = 20
+SUCCESS_SCORE_THRESHOLD = 0.7
+TASK_INSTRUCTIONS = {
+    "task_1": textwrap.dedent(
+        """You are a data analysis assistant. Your task is to: 1. Load the CSV file 'simple.csv' 2. Calculate the mean of the 'price' column. Available tools: load_csv(filename='filename.csv'), show_data(), show_columns(), calculate(column='column_name', operation='mean|median|sum|count|std|min|max'). Start by loading the data, then calculate the mean of the price column."""
+    ),
+    "task_2": textwrap.dedent(
+        """You are a data analysis assistant. Your task is to: 1. Load the CSV file 'dirty.csv' 2. Fill missing values (use mean) 3. Remove duplicate rows 4. Calculate the median of the 'age' column. Available tools: load_csv(filename='filename.csv'), fill_missing(value='mean|median|zero|value'), remove_duplicates(), show_data(), show_columns(), calculate(column='column_name', operation='mean|median|sum|count|std|min|max'). Start by loading the data, then clean it, then calculate the median."""
+    ),
+    "task_3": textwrap.dedent(
+        """You are a data analysis assistant. Your task is to: 1. Load 'sales.csv' and 'products.csv' 2. Merge them on 'product_id' 3. Group by 'category' and sum the 'sales' column 4. Get the final result. Available tools: load_csv(filename='filename.csv'), merge_datasets(filename='filename.csv', on='column_name'), show_data(), show_columns(), group_by(group_column='column_name', agg_column='column_name', operation='sum|mean|count'), calculate(column='column_name', operation='sum|mean|count'), get_result(). Start by loading both files, then merge, then group and aggregate."""
+    ),
+}
+def get_action_from_response(response: str):
+    from data_analysis_env import DataAnalysisAction
+    response = response.strip()
+    if response.lower() in ["done", "get_result()"]:
+        return DataAnalysisAction(tool="get_result", parameters={})
+    if "(" not in response or ")" not in response:
+        return None
+    try:
+        tool_name = response.split("(")[0].strip()
+        params_str = response.split("(")[1].split(")")[0].strip()
+        parameters = {}
+        if params_str:
+            for param in params_str.split(","):
+                param = param.strip()
+                if "=" in param:
+                    key, value = param.split("=", 1)
+                    key = key.strip()
+                    value = value.strip().strip("'\"")
+                    if value.lower() == "none":
+                        value = None
+                    elif value.lower() == "true":
+                        value = True
+                    elif value.lower() == "false":
+                        value = False
+                    else:
+                        try:
+                            if "." in value:
+                                value = float(value)
+                            else:
+                                value = int(value)
+                        except ValueError:
+                            pass
+                    parameters[key] = value
+        return DataAnalysisAction(tool=tool_name, parameters=parameters)
+    except Exception as e:
+        print(f"Error parsing action: {e}", file=sys.stderr)
+        return None
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(
+    step: int, action: str, reward: float, done: bool, error: Optional[str]
+) -> None:
+    error_val = error if error else "null"
+    done_val = str(done).lower()
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+async def run_task(client: OpenAI, env, task_name: str):
+    from data_analysis_env import DataAnalysisAction
+    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
+    instruction = TASK_INSTRUCTIONS.get(task_name, "")
+    messages = [
+        {"role": "system", "content": instruction},
+        {"role": "user", "content": "Begin the analysis task."},
+    ]
+    step = 0
+    rewards = []
+    last_error = None
+    result = await env.reset(task=task_name)
+    obs = result.observation
+    reward_val = obs.reward if obs.reward is not None else 0.0
+    log_step(step=step, action="reset", reward=reward_val, done=result.done, error=None)
+    while not result.done and step < MAX_STEPS:
+        step += 1
+        response = (
+            client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=messages
+                + [{"role": "assistant", "content": f"Previous output: {obs.output}"}],
+                temperature=0.1,
+                max_tokens=500,
+            )
+            .choices[0]
+            .message.content
+        )
+        # The completion content can be None; treat that as an unparseable reply.
+        action = get_action_from_response(response or "")
+        if action is None:
+            last_error = "Could not parse action"
+            log_step(step=step, action=repr(response), reward=reward_val, done=False, error=last_error)
+            messages.append(
+                {
+                    "role": "user",
+                    "content": f"Invalid action format. Please use tool_name(param1=value1, param2=value2). Error: {last_error}",
+                }
+            )
+            continue
+        result = await env.step(action)
+        obs = result.observation
+        reward_val = obs.reward if obs.reward is not None else 0.0
+        rewards.append(reward_val)
+        log_step(
+            step=step,
+            action=f"{action.tool}({action.parameters})",
+            reward=reward_val,
+            done=result.done,
+            error=obs.error,
+        )
+        if obs.error:
+            last_error = obs.error
+            messages.append(
+                {
+                    "role": "user",
+                    "content": f"Error: {obs.error}. Please try a different tool or correct parameters.",
+                }
+            )
+        else:
+            messages.append(
+                {
+                    "role": "user",
+                    "content": f"Tool executed successfully. Output: {obs.output}",
+                }
+            )
+        if result.done:
+            break
+    score = obs.reward if obs.reward is not None else 0.0
+    success = score >= SUCCESS_SCORE_THRESHOLD
+    log_end(success=success, steps=step, score=score, rewards=rewards)
+    return {
+        "task": task_name,
+        "success": success,
+        "steps": step,
+        "score": score,
+        "rewards": rewards,
+    }
+async def main():
+    from data_analysis_env import DataAnalysisEnv
+    if not API_KEY:
+        print(
+            "Error: HF_TOKEN or API_KEY environment variable not set", file=sys.stderr
+        )
+        sys.exit(1)
+    client = OpenAI(api_key=API_KEY, base_url=API_BASE_URL)
+    base_url = os.getenv("ENV_URL", "http://localhost:8000")
+    env = DataAnalysisEnv(base_url=base_url)
+    results = []
+    for task_name in ["task_1", "task_2", "task_3"]:
+        try:
+            result = await run_task(client, env, task_name)
+            results.append(result)
+        except Exception as e:
+            print(f"Error running {task_name}: {e}", file=sys.stderr)
+            results.append(
+                {
+                    "task": task_name,
+                    "success": False,
+                    "steps": 0,
+                    "score": 0.0,
+                    "rewards": [],
+                }
+            )
+    await env.close()
+    avg_score = sum(r["score"] for r in results) / len(results)
+    print("\n=== Summary ===")
+    print(f"Average Score: {avg_score:.2f}")
+    for r in results:
+        print(f"  {r['task']}: {r['score']:.2f} ({'PASS' if r['success'] else 'FAIL'})")
+if __name__ == "__main__":
+    asyncio.run(main())
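
Note on `get_action_from_response`: it splits the reply on `(`, `)` and `,`, so a quoted value that itself contains a comma or a parenthesis would be cut in the wrong place. One possible alternative, sketched here as an assumption rather than as part of this commit, is to parse the reply as a Python call expression with the standard `ast` module. The function name `parse_tool_call` is hypothetical, and it returns a plain `(tool, parameters)` tuple instead of a `DataAnalysisAction` so that the sketch stays self-contained.

```python
import ast
from typing import Any, Optional


def parse_tool_call(text: str) -> Optional[tuple[str, dict[str, Any]]]:
    """Parse 'tool(key=value, ...)' into (tool_name, parameters), or None."""
    try:
        tree = ast.parse(text.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return None
    call = tree.body
    # Only a bare name followed by a call is accepted, e.g. load_csv(...).
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return None
    if call.args:
        # The tool format uses keyword arguments only.
        return None
    params: dict[str, Any] = {}
    for kw in call.keywords:
        if kw.arg is None:
            # Reject **kwargs expansion.
            return None
        try:
            # literal_eval only accepts literals (str, numbers, None, ...),
            # so no arbitrary code from the model reply is evaluated.
            params[kw.arg] = ast.literal_eval(kw.value)
        except (ValueError, TypeError):
            return None
    return call.func.id, params
```

The trade-off is strictness: unquoted strings such as `filename=simple.csv` and lowercase `true` or `none` are rejected by this sketch, whereas the committed parser accepts them.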

models.py ADDED

	@@ -0,0 +1,70 @@

+from typing import Any, Literal, Optional
+from pydantic import BaseModel, Field, field_validator
+class DataAnalysisAction(BaseModel):
+    tool: str = Field(..., description="Tool name to execute")
+    parameters: dict[str, Any] = Field(
+        default_factory=dict, description="Tool parameters"
+    )
+    @field_validator("tool", mode="before")
+    @classmethod
+    def _coerce_tool(cls, value):
+        if isinstance(value, dict):
+            return value.get("tool", "")
+        return str(value)
+class DataAnalysisObservation(BaseModel):
+    done: bool = Field(default=False, description="Whether episode is done")
+    reward: float = Field(default=0.0, description="Reward for this step")
+    success: bool = Field(
+        default=True, description="Whether tool executed successfully"
+    )
+    output: str = Field(default="", description="Tool output or error message")
+    data_shape: Optional[tuple[int, int]] = Field(
+        default=None, description="(rows, columns) of current data"
+    )
+    columns: list[str] = Field(
+        default_factory=list, description="Column names of current data"
+    )
+    tools_used: list[str] = Field(
+        default_factory=list, description="History of tools called"
+    )
+    error: Optional[str] = Field(
+        default=None, description="Error message if tool failed"
+    )
+    @field_validator("data_shape", mode="before")
+    @classmethod
+    def _coerce_shape(cls, value):
+        if isinstance(value, list) and len(value) == 2:
+            return tuple(value)
+        return value
+class DataAnalysisState(BaseModel):
+    episode_id: Optional[str] = Field(
+        default=None, description="Unique episode identifier"
+    )
+    task_name: str = Field(default="", description="Current task name")
+    step_count: int = Field(default=0, description="Number of steps taken")
+    max_steps: int = Field(default=20, description="Maximum steps allowed per episode")
+    data_loaded: bool = Field(default=False, description="Whether data has been loaded")
+AVAILABLE_TOOLS = [
+    "load_csv",
+    "show_data",
+    "show_columns",
+    "fill_missing",
+    "remove_duplicates",
+    "filter_rows",
+    "select_columns",
+    "group_by",
+    "calculate",
+    "sort_by",
+    "get_result",
+    "merge_datasets",
+]

openenv.yaml ADDED

	@@ -0,0 +1,20 @@

+spec_version: 1
+name: data_analysis_env
+type: environment
+runtime: fastapi
+app: server.app:app
+port: 8000
+metadata:
+  title: Data Analysis Env
+  description: Real-world data analysis tasks using pandas - load, clean, transform, and analyze CSV data
+  difficulty:
+    - easy
+    - medium
+    - hard
+  tags:
+    - data-analysis
+    - pandas
+    - openenv
+    - ai-agents
+  author: Meta Hackathon
+  version: "1.0.0"

pyproject.toml ADDED

	@@ -0,0 +1,29 @@

+[project]
+name = "data_analysis_env"
+version = "0.1.0"
+description = "Data Analysis Environment for OpenEnv - A real-world RL task for teaching agents pandas data analysis"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core>=0.1.0",
+    "pandas>=2.0.0",
+    "fastapi>=0.100.0",
+    "uvicorn>=0.23.0",
+]
+[project.scripts]
+data_analysis_env = "server.app:main"
+[project.optional-dependencies]
+dev = [
+    "pytest>=7.0.0",
+    "black>=23.0.0",
+    "mypy>=1.0.0",
+]
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.build.targets.wheel]
+packages = ["data_analysis_env"]

server/__init__.py ADDED

	@@ -0,0 +1 @@


+# Server package

server/app.py ADDED

	@@ -0,0 +1,30 @@

+import os
+from pathlib import Path
+from openenv.core.env_server import create_app
+from server.quantum_openenv_env_environment import DataAnalysisEnvironment
+from models import DataAnalysisAction, DataAnalysisObservation
+def create_data_analysis_environment():
+    # Fall back to the data directory that ships next to this module.
+    data_dir = os.getenv("DATA_DIR", str(Path(__file__).parent / "data"))
+    return DataAnalysisEnvironment(data_dir=data_dir)
+app = create_app(
+    create_data_analysis_environment,
+    DataAnalysisAction,
+    DataAnalysisObservation,
+    env_name="data_analysis_env",
+)
+def main():
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()

server/data/dirty.csv ADDED

	@@ -0,0 +1,18 @@

+name,age,salary,city
+John,25,50000,New York
+Jane,30,60000,Los Angeles
+Bob,,55000,Chicago
+Alice,28,52000,Houston
+John,25,50000,New York
+Charlie,35,70000,Phoenix
+Jane,30,60000,Los Angeles
+David,,58000,San Diego
+Eve,32,,Philadelphia
+Frank,29,54000,Dallas
+Bob,35,55000,Chicago
+Grace,27,51000,Austin
+Henry,,62000,Seattle
+Ivy,31,56000,Denver
+John,25,50000,New York
+Jack,33,59000,Boston
+Kelly,26,,Portland
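
Note on `dirty.csv` and Task 2: the fill strategy matters for this file, because filling missing numeric values with the column mean and filling them with the column median lead to different medians of `age` after deduplication. The stdlib-only sketch below mirrors, as an assumption, a pandas `fillna` followed by `drop_duplicates` on whole rows. The helper names `load_rows` and `clean_and_median` are hypothetical.

```python
import csv
import io
import statistics

DIRTY_CSV = """name,age,salary,city
John,25,50000,New York
Jane,30,60000,Los Angeles
Bob,,55000,Chicago
Alice,28,52000,Houston
John,25,50000,New York
Charlie,35,70000,Phoenix
Jane,30,60000,Los Angeles
David,,58000,San Diego
Eve,32,,Philadelphia
Frank,29,54000,Dallas
Bob,35,55000,Chicago
Grace,27,51000,Austin
Henry,,62000,Seattle
Ivy,31,56000,Denver
John,25,50000,New York
Jack,33,59000,Boston
Kelly,26,,Portland
"""


def load_rows(text):
    """Read the CSV, turning empty numeric cells into None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": rec["name"],
            "age": float(rec["age"]) if rec["age"] else None,
            "salary": float(rec["salary"]) if rec["salary"] else None,
            "city": rec["city"],
        })
    return rows


def clean_and_median(rows, fill):
    """Fill numeric gaps with fill(column), drop duplicate rows, return median age."""
    filled = [dict(r) for r in rows]
    for col in ("age", "salary"):
        present = [r[col] for r in rows if r[col] is not None]
        value = fill(present)
        for r in filled:
            if r[col] is None:
                r[col] = value
    seen = set()
    unique = []
    for r in filled:
        key = tuple(r.values())  # whole-row comparison
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return statistics.median(r["age"] for r in unique)


rows = load_rows(DIRTY_CSV)
mean_fill_median = clean_and_median(rows, statistics.fmean)
median_fill_median = clean_and_median(rows, statistics.median)
```

With this data the mean-fill path gives a median age of 411/14 (about 29.36), while the median-fill path gives 29.5, so a grader has to use the same fill strategy that the task instructions ask the agent to use.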

server/data/products.csv ADDED

	@@ -0,0 +1,5 @@

+product_id,product_name,category,unit_price
+P001,Widget Alpha,Electronics,50.00
+P002,Widget Beta,Electronics,50.00
+P003,Widget Gamma,Home,50.00
+P004,Widget Delta,Home,50.00

server/data/sales.csv ADDED

	@@ -0,0 +1,13 @@

+transaction_id,product_id,quantity,sales,date
+1,P001,5,250.00,2024-01-15
+2,P002,3,150.00,2024-01-16
+3,P001,2,100.00,2024-01-17
+4,P003,4,200.00,2024-01-18
+5,P002,6,300.00,2024-01-19
+6,P001,3,150.00,2024-01-20
+7,P003,2,100.00,2024-01-21
+8,P002,5,250.00,2024-01-22
+9,P001,4,200.00,2024-01-23
+10,P003,3,150.00,2024-01-24
+11,P002,2,100.00,2024-01-25
+12,P001,5,250.00,2024-01-26
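
Note on `sales.csv` and `products.csv` (Task 3): the merge-then-aggregate step can be checked by hand with a stdlib-only sketch. It assumes inner-join semantics on `product_id`, so products without sales (here `P004`) contribute nothing. The function name `sales_per_category` is hypothetical.

```python
import csv
import io
from collections import defaultdict

PRODUCTS_CSV = """product_id,product_name,category,unit_price
P001,Widget Alpha,Electronics,50.00
P002,Widget Beta,Electronics,50.00
P003,Widget Gamma,Home,50.00
P004,Widget Delta,Home,50.00
"""

SALES_CSV = """transaction_id,product_id,quantity,sales,date
1,P001,5,250.00,2024-01-15
2,P002,3,150.00,2024-01-16
3,P001,2,100.00,2024-01-17
4,P003,4,200.00,2024-01-18
5,P002,6,300.00,2024-01-19
6,P001,3,150.00,2024-01-20
7,P003,2,100.00,2024-01-21
8,P002,5,250.00,2024-01-22
9,P001,4,200.00,2024-01-23
10,P003,3,150.00,2024-01-24
11,P002,2,100.00,2024-01-25
12,P001,5,250.00,2024-01-26
"""


def sales_per_category(sales_text, products_text):
    """Join sales to products on product_id and sum 'sales' per category."""
    category_of = {
        rec["product_id"]: rec["category"]
        for rec in csv.DictReader(io.StringIO(products_text))
    }
    totals = defaultdict(float)
    for rec in csv.DictReader(io.StringIO(sales_text)):
        category = category_of.get(rec["product_id"])
        if category is None:
            # Inner join: a sale with an unknown product is dropped.
            continue
        totals[category] += float(rec["sales"])
    return dict(totals)


totals = sales_per_category(SALES_CSV, PRODUCTS_CSV)
```

For the files in this commit the totals come out as 1750.0 for Electronics and 450.0 for Home, which together account for all 2200.0 in sales.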

server/data/simple.csv ADDED

	@@ -0,0 +1,11 @@

+product,price,category
+Widget A,29.99,Electronics
+Widget B,49.99,Electronics
+Widget C,19.99,Electronics
+Widget D,39.99,Electronics
+Widget E,59.99,Electronics
+Widget F,24.99,Electronics
+Widget G,34.99,Electronics
+Widget H,44.99,Electronics
+Widget I,54.99,Electronics
+Widget J,64.99,Electronics
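
Note on `simple.csv` (Task 1): the target value is the plain arithmetic mean of the `price` column. A stdlib-only check, with the hypothetical helper name `mean_price`, might look like this:

```python
import csv
import io
import statistics

SIMPLE_CSV = """product,price,category
Widget A,29.99,Electronics
Widget B,49.99,Electronics
Widget C,19.99,Electronics
Widget D,39.99,Electronics
Widget E,59.99,Electronics
Widget F,24.99,Electronics
Widget G,34.99,Electronics
Widget H,44.99,Electronics
Widget I,54.99,Electronics
Widget J,64.99,Electronics
"""


def mean_price(text: str) -> float:
    """Return the mean of the 'price' column."""
    prices = [float(rec["price"]) for rec in csv.DictReader(io.StringIO(text))]
    return statistics.fmean(prices)


task_1_mean = mean_price(SIMPLE_CSV)
```

The ten prices sum to 424.90, so the mean is 42.49 up to floating-point rounding.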

server/quantum_openenv_env_environment.py ADDED

	@@ -0,0 +1,533 @@

+import os
+from pathlib import Path
+from typing import Any, Optional
+import pandas as pd
+import uuid
+from openenv.core.env_server import Environment
+from models import (
+    DataAnalysisAction,
+    DataAnalysisObservation,
+    DataAnalysisState,
+    AVAILABLE_TOOLS,
+)
+TASKS = {
+    "task_1": {
+        "name": "Basic Statistics",
+        "description": "Load simple.csv and calculate the mean of the 'price' column",
+        "datafile": "simple.csv",
+        "target_column": "price",
+        "target_operation": "mean",
+        "expected_answer": None,
+        "difficulty": "easy",
+    },
+    "task_2": {
+        "name": "Data Cleaning",
+        "description": "Load dirty.csv, fill missing values, remove duplicates, then calculate median of 'age'",
+        "datafile": "dirty.csv",
+        "target_column": "age",
+        "target_operation": "median",
+        "expected_answer": None,
+        "difficulty": "medium",
+    },
+    "task_3": {
+        "name": "Multi-table Analysis",
+        "description": "Load sales.csv and products.csv, merge on product_id, calculate total sales per category",
+        "datafile": "sales.csv",
+        "secondary_datafile": "products.csv",
+        "target_column": "sales",
+        "group_by_column": "category",
+        "target_operation": "sum",
+        "expected_answer": None,
+        "difficulty": "hard",
+    },
+}
+class DataAnalysisEnvironment(Environment):
+    def __init__(self, data_dir: Optional[str] = None):
+        super().__init__()
+        self._data_dir = data_dir or str(Path(__file__).parent / "data")
+        self._state = DataAnalysisState()
+        self._df: Optional[pd.DataFrame] = None
+        self._secondary_df: Optional[pd.DataFrame] = None
+        self._last_result: Any = None
+        self._reward = 0.0
+        self._tools_used: list[str] = []
+    def reset(
+        self, seed: Optional[int] = None, episode_id: Optional[str] = None, **kwargs
+    ) -> DataAnalysisObservation:
+        task_name = kwargs.get("task", "task_1")
+        self._state = DataAnalysisState(
+            episode_id=episode_id or str(uuid.uuid4()),
+            task_name=task_name,
+            step_count=0,
+            max_steps=20,
+            data_loaded=False,
+        )
+        self._df = None
+        self._secondary_df = None
+        self._last_result = None
+        self._reward = 0.0
+        self._tools_used = []
+        task = TASKS.get(task_name, TASKS["task_1"])
+        datafile = os.path.join(self._data_dir, task.get("datafile", "simple.csv"))
+        if os.path.exists(datafile):
+            self._df = pd.read_csv(datafile)
+            self._state.data_loaded = True
+            if task_name == "task_1":
+                task["expected_answer"] = float(self._df[task["target_column"]].mean())
+            elif task_name == "task_2":
+                # Fill with the column mean, matching the task instructions ("use mean").
+                df_clean = self._df.fillna(
+                    self._df.mean(numeric_only=True)
+                ).drop_duplicates()
+                task["expected_answer"] = float(
+                    df_clean[task["target_column"]].median()
+                )
+            elif task_name == "task_3":
+                secondary = os.path.join(
+                    self._data_dir, task.get("secondary_datafile", "products.csv")
+                )
+                if os.path.exists(secondary):
+                    self._secondary_df = pd.read_csv(secondary)
+                    merged = self._df.merge(self._secondary_df, on="product_id")
+                    task["expected_answer"] = (
+                        merged.groupby(task["group_by_column"])[task["target_column"]]
+                        .sum()
+                        .to_dict()
+                    )
+        return DataAnalysisObservation(
+            done=False,
+            reward=0.0,
+            success=True,
+            output=f"Ready. Task: {task['name']}. {task['description']}",
+            data_shape=tuple(self._df.shape) if self._df is not None else None,
+            columns=list(self._df.columns) if self._df is not None else [],
+            tools_used=[],
+        )
+    def step(self, action: DataAnalysisAction) -> DataAnalysisObservation:
+        self._state.step_count += 1
+        tool = action.tool
+        params = action.parameters
+        self._tools_used.append(f"{tool}({params})")
+        if tool not in AVAILABLE_TOOLS:
+            self._reward = max(0, self._reward - 0.1)
+            return DataAnalysisObservation(
+                done=False,
+                reward=self._reward,
+                success=False,
+                output=f"Unknown tool: {tool}",
+                data_shape=tuple(self._df.shape) if self._df is not None else None,
+                columns=list(self._df.columns) if self._df is not None else [],
+                tools_used=self._tools_used,
+                error=f"Tool '{tool}' not found. Available: {AVAILABLE_TOOLS}",
+            )
+        try:
+            result = self._execute_tool(tool, params)
+            if result["success"]:
+                self._reward = min(1.0, self._reward + 0.1)
+            else:
+                self._reward = max(0, self._reward - 0.1)
+            done = self._state.step_count >= self._state.max_steps
+            if done and self._reward < 0.5:
+                self._reward = 0.0
+            return DataAnalysisObservation(
+                done=done,
+                reward=self._reward,
+                success=result["success"],
+                output=result["output"],
+                data_shape=tuple(self._df.shape) if self._df is not None else None,
+                columns=list(self._df.columns) if self._df is not None else [],
+                tools_used=self._tools_used,
+                error=result.get("error"),
+            )
+        except Exception as e:
+            self._reward = max(0, self._reward - 0.1)
+            return DataAnalysisObservation(
+                done=False,
+                reward=self._reward,
+                success=False,
+                output=f"Error executing {tool}: {str(e)}",
+                data_shape=tuple(self._df.shape) if self._df is not None else None,
+                columns=list(self._df.columns) if self._df is not None else [],
+                tools_used=self._tools_used,
+                error=str(e),
+            )
+    def _execute_tool(self, tool: str, params: dict) -> dict:
+        if tool == "load_csv":
+            return self._tool_load_csv(params)
+        elif tool == "show_data":
+            return self._tool_show_data(params)
+        elif tool == "show_columns":
+            return self._tool_show_columns(params)
+        elif tool == "fill_missing":
+            return self._tool_fill_missing(params)
+        elif tool == "remove_duplicates":
+            return self._tool_remove_duplicates(params)
+        elif tool == "filter_rows":
+            return self._tool_filter_rows(params)
+        elif tool == "select_columns":
+            return self._tool_select_columns(params)
+        elif tool == "group_by":
+            return self._tool_group_by(params)
+        elif tool == "calculate":
+            return self._tool_calculate(params)
+        elif tool == "sort_by":
+            return self._tool_sort_by(params)
+        elif tool == "get_result":
+            return self._tool_get_result(params)
+        elif tool == "merge_datasets":
+            return self._tool_merge_datasets(params)
+        return {"success": False, "output": f"Unknown tool: {tool}"}
+    def _tool_load_csv(self, params: dict) -> dict:
+        filename = params.get("filename", "")
+        filepath = os.path.join(self._data_dir, filename)
+        if not os.path.exists(filepath):
+            return {
+                "success": False,
+                "output": f"File not found: {filename}",
+                "error": "FileNotFound",
+            }
+        self._df = pd.read_csv(filepath)
+        self._state.data_loaded = True
+        return {
+            "success": True,
+            "output": f"Loaded {filename}: {self._df.shape[0]} rows, {self._df.shape[1]} columns. Columns: {list(self._df.columns)}",
+        }
+    def _tool_show_data(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        n = params.get("n", 5)
+        head = self._df.head(n).to_string()
+        return {
+            "success": True,
+            "output": f"Data shape: {self._df.shape}\n{head}",
+        }
+    def _tool_show_columns(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        cols = [(col, str(self._df[col].dtype)) for col in self._df.columns]
+        output = "Columns:\n" + "\n".join([f"  {c}: {t}" for c, t in cols])
+        return {"success": True, "output": output}
+    def _tool_fill_missing(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        method = params.get("value", "mean")
+        if method == "mean":
+            self._df = self._df.fillna(self._df.mean(numeric_only=True))
+        elif method == "median":
+            self._df = self._df.fillna(self._df.median(numeric_only=True))
+        elif method == "zero":
+            self._df = self._df.fillna(0)
+        else:
+            self._df = self._df.fillna(method)
+        return {
+            "success": True,
+            "output": f"Filled missing values with {method}. Shape: {self._df.shape}",
+        }
+    def _tool_remove_duplicates(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        before = len(self._df)
+        self._df = self._df.drop_duplicates()
+        removed = before - len(self._df)
+        return {
+            "success": True,
+            "output": f"Removed {removed} duplicate rows. Remaining: {len(self._df)} rows",
+        }
+    def _tool_filter_rows(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        column = params.get("column", "")
+        operator = params.get("operator", "==")
+        value = params.get("value", None)
+        if column not in self._df.columns:
+            return {
+                "success": False,
+                "output": f"Column not found: {column}",
+                "error": "ColumnNotFound",
+            }
+        try:
+            if operator == "==":
+                mask = self._df[column] == value
+            elif operator == "!=":
+                mask = self._df[column] != value
+            elif operator == ">":
+                mask = self._df[column] > value
+            elif operator == ">=":
+                mask = self._df[column] >= value
+            elif operator == "<":
+                mask = self._df[column] < value
+            elif operator == "<=":
+                mask = self._df[column] <= value
+            else:
+                return {
+                    "success": False,
+                    "output": f"Unknown operator: {operator}",
+                    "error": "InvalidOperator",
+                }
+            self._df = self._df[mask]
+            return {"success": True, "output": f"Filtered to {len(self._df)} rows"}
+        except Exception as e:
+            return {
+                "success": False,
+                "output": f"Filter error: {str(e)}",
+                "error": str(e),
+            }
+    def _tool_select_columns(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        columns = params.get("columns", [])
+        missing = [c for c in columns if c not in self._df.columns]
+        if missing:
+            return {
+                "success": False,
+                "output": f"Columns not found: {missing}",
+                "error": "ColumnNotFound",
+            }
+        self._df = self._df[columns]
+        return {
+            "success": True,
+            "output": f"Selected columns: {columns}. Shape: {self._df.shape}",
+        }
+    def _tool_group_by(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        group_column = params.get("group_column", "")
+        agg_column = params.get("agg_column", "")
+        operation = params.get("operation", "mean")
+        # Report exactly which of the two requested columns is missing.
+        missing = [c for c in (group_column, agg_column) if c not in self._df.columns]
+        if missing:
+            return {
+                "success": False,
+                "output": f"Columns not found: {missing}",
+                "error": "ColumnNotFound",
+            }
+        result = self._df.groupby(group_column)[agg_column].agg(operation)
+        self._last_result = result.to_dict()
+        return {
+            "success": True,
+            "output": f"Grouped by {group_column}, aggregated {agg_column} with {operation}:\n{result.to_string()}",
+        }
+    def _tool_calculate(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        column = params.get("column", "")
+        operation = params.get("operation", "mean")
+        if column not in self._df.columns:
+            return {
+                "success": False,
+                "output": f"Column not found: {column}",
+                "error": "ColumnNotFound",
+            }
+        try:
+            if operation == "mean":
+                result = self._df[column].mean()
+            elif operation == "median":
+                result = self._df[column].median()
+            elif operation == "sum":
+                result = self._df[column].sum()
+            elif operation == "count":
+                result = self._df[column].count()
+            elif operation == "std":
+                result = self._df[column].std()
+            elif operation == "min":
+                result = self._df[column].min()
+            elif operation == "max":
+                result = self._df[column].max()
+            else:
+                return {
+                    "success": False,
+                    "output": f"Unknown operation: {operation}",
+                    "error": "InvalidOperation",
+                }
+            self._last_result = float(result)
+            return {"success": True, "output": f"{operation}({column}) = {result}"}
+        except Exception as e:
+            return {
+                "success": False,
+                "output": f"Calculation error: {str(e)}",
+                "error": str(e),
+            }
+    def _tool_sort_by(self, params: dict) -> dict:
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}
+        column = params.get("column", "")
+        ascending = params.get("ascending", True)
+        if column not in self._df.columns:
+            return {
+                "success": False,
+                "output": f"Column not found: {column}",
+                "error": "ColumnNotFound",
+            }
+        self._df = self._df.sort_values(by=column, ascending=ascending)
+        return {
+            "success": True,
+            "output": f"Sorted by {column} (ascending={ascending})",
+        }
+    def _tool_get_result(self, params: dict) -> dict:
+        task = TASKS.get(self._state.task_name, TASKS["task_1"])
+        if self._last_result is not None:
+            score = self._grade_result(self._last_result, task)
+            self._reward = min(1.0, self._reward + 0.5 * score)
+            return {
+                "success": True,
+                "output": f"Final result: {self._last_result}",
+                "score": score,
+            }
+        return {"success": False, "output": "No result available", "error": "NoResult"}
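+    def _tool_merge_datasets(self, params: dict) -> dict:
+        # Merging needs a base dataset; without this guard self._df.columns
+        # raises AttributeError when nothing has been loaded yet.
+        if self._df is None:
+            return {"success": False, "output": "No data loaded", "error": "NoData"}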
+        filename = params.get("filename", "")
+        on = params.get("on", "")
+        filepath = os.path.join(self._data_dir, filename)
+        # Use isfile rather than exists so a directory path is rejected here.
+        if not os.path.isfile(filepath):
+            return {
+                "success": False,
+                "output": f"File not found: {filename}",
+                "error": "FileNotFound",
+            }
+        other_df = pd.read_csv(filepath)
+        if on not in self._df.columns or on not in other_df.columns:
+            return {
+                "success": False,
+                "output": f"Merge column not found: {on}",
+                "error": "ColumnNotFound",
+            }
+        self._df = self._df.merge(other_df, on=on)
+        return {
+            "success": True,
+            "output": f"Merged with {filename} on {on}. Shape: {self._df.shape}",
+        }
+    def _grade_result(self, result: Any, task: dict) -> float:
+        task_name = self._state.task_name
+        if task_name in ("task_1", "task_2"):
+            # Both scalar tasks share the same tolerance tiers.
+            expected = task.get("expected_answer", 0)
+            if expected is None:
+                return 0.0
+            try:
+                actual = float(result)
+                if abs(actual - expected) < 0.01:
+                    return 1.0
+                elif abs(actual - expected) < abs(expected) * 0.1:
+                    return 0.7
+                else:
+                    return 0.3
+            except Exception:
+                # Non-numeric results cannot be graded as a scalar.
+                return 0.0
+        elif task_name == "task_3":
+            expected = task.get("expected_answer", {})
+            if expected is None or not isinstance(expected, dict):
+                return 0.0
+            try:
+                actual = dict(result) if hasattr(result, "items") else result
+                if isinstance(actual, dict) and isinstance(expected, dict):
+                    if set(actual.keys()) == set(expected.keys()):
+                        total_error = sum(
+                            abs(actual.get(k, 0) - expected.get(k, 0)) for k in expected
+                        )
+                        if total_error < 0.01:
+                            return 1.0
+                        elif total_error < 50:
+                            return 0.7
+                        else:
+                            return 0.3
+                return 0.5
+            except Exception:
+                return 0.2
+        return 0.0
+    @property
+    def state(self) -> DataAnalysisState:
+        return self._state
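
The scalar grading tiers used for `task_1` and `task_2` in `_grade_result` can be exercised in isolation. The following is a minimal sketch, not the environment's own code: `grade_numeric` is a hypothetical standalone helper that assumes the expected answer is a plain float and mirrors the same thresholds (absolute error below 0.01, then within 10% of the expected value).

```python
from typing import Any


def grade_numeric(result: Any, expected: float) -> float:
    """Score a numeric answer with the tiers used for the scalar tasks.

    Hypothetical helper for illustration; it is not part of the commit.
    """
    try:
        actual = float(result)
    except (TypeError, ValueError):
        # Non-numeric results (None, a dict, arbitrary text) score zero.
        return 0.0

    error = abs(actual - expected)
    if error < 0.01:
        # Effectively exact.
        return 1.0
    if error < abs(expected) * 0.1:
        # Within 10% of the expected value.
        return 0.7
    # Numeric, but far from the expected value.
    return 0.3


if __name__ == "__main__":
    for candidate in (100.0, 105, 150, None):
        print(candidate, grade_numeric(candidate, 100.0))
```

Note that when the expected value is 0 the 10% band has zero width, so any answer outside the 0.01 absolute tolerance drops straight to the lowest numeric tier.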

server/requirements.txt ADDED

	@@ -0,0 +1,4 @@

+openenv-core
+pandas
+fastapi
+uvicorn

uv.lock ADDED

The diff for this file is too large to render.