---
title: CodeDark Environment Server
emoji: 📊
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
license: mit
tags:
- openenv
- reinforcement-learning
- data-analytics
- agents
- benchmark
---
# CodeDark: Data Analytics Environment for RL Agents
**OpenEnv-compatible multi-turn environment for training AI agents on real business analytics tasks.**
## Overview
CodeDark is the first data analytics environment in the OpenEnv ecosystem. Agents analyze CSV datasets with Python and pandas, which tests whether they can reason like data scientists rather than merely execute code.
### Key Features
- **Real Business Tasks**: Bank marketing and road safety datasets with genuine analytical questions
- **Multi-Turn Interaction**: Agents explore data, save notes, ask clarifications, and submit answers
- **Shaped Rewards**: 80% correctness + 10% efficiency + 10% token cost
- **Pre-Benchmarked**: 25 curated L5-L6 difficulty tasks validated on 11+ models
## Quick Start
### Connect to the Environment
```python
from openenv import EnvClient
# Connect to this Space
env = EnvClient.from_hub("openenv/codedark")
# Reset for a new task
obs = env.reset()
print(f"Task: {obs['question']}")
# Execute Python code
obs = env.step({"tool": "run_python", "args": "<code>result = df.shape</code>"})
print(f"Result: {obs['stdout']}")
# Submit answer
obs = env.step({"tool": "submit_answer", "args": "<answer>42.5</answer>"})
print(f"Reward: {obs['reward']}")
```
### Available Tools
| Tool | Description |
| --------------- | -------------------------------------------------------------- |
| `run_python` | Execute Python/pandas code. Store result in `result` variable. |
| `read_notes` | Read saved notes from previous turns. |
| `save_note` | Save observations for later recall. |
| `clarify` | Ask clarifying questions (max 2 per episode). |
| `submit_answer` | Submit final answer. Ends episode. |
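
The action format in the Quick Start wraps `run_python` code in `<code>` tags and the final answer in `<answer>` tags. A minimal sketch of client-side helpers for building those actions, plus a counter for the two-clarification limit, might look like the following. The helper names and the `ClarifyBudget` class are hypothetical and not part of the environment; the argument format for `save_note`, `read_notes`, and `clarify` is not documented here, so those tools are left out.

```python
# Hypothetical client-side helpers; only the dict shape and the
# <code>/<answer> tags are taken from the Quick Start example.

MAX_CLARIFICATIONS = 2  # limit stated in the tool table


def run_python_action(code: str) -> dict:
    """Build a run_python action with the code wrapped in <code> tags."""
    return {"tool": "run_python", "args": f"<code>{code}</code>"}


def submit_answer_action(answer) -> dict:
    """Build a submit_answer action with the answer wrapped in <answer> tags."""
    return {"tool": "submit_answer", "args": f"<answer>{answer}</answer>"}


class ClarifyBudget:
    """Track how many clarifying questions remain in the current episode."""

    def __init__(self, limit: int = MAX_CLARIFICATIONS):
        self.remaining = limit

    def use(self) -> bool:
        """Consume one clarification; return False once the budget is spent."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


action = run_python_action("result = df.shape")
final = submit_answer_action(42.5)
```

Tracking the clarification budget on the client is optional, since the environment presumably enforces the limit itself, but it lets an agent avoid spending a turn on a request that would be refused.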
## Datasets
### Bank Marketing (750K rows)
- **Target**: Term deposit subscription prediction
- **Features**: age, job, marital, education, balance, housing, loan, contact, day, month, duration, campaign
### Road Safety (500K rows)
- **Target**: Accident risk assessment
- **Features**: road_type, num_lanes, curvature, speed_limit, lighting, weather, time_of_day
## Task Difficulty
| Level | Complexity | Example |
| ----- | --------------- | -------------------------------------------- |
| L4 | Quartile/binned | "Subscription rate in Q1 balance?" |
| L5 | Multi-condition | "Rate for month='may' AND job='management'?" |
| L6 | Nested extrema | "In lowest subscription month, avg day?" |
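
To make the three levels concrete, here is a sketch of how each example question could be answered in pandas on a tiny made-up DataFrame. The rows are invented for illustration, and the target column name `y` with a 0/1 encoding is an assumption; the real dataset schema may differ.

```python
import pandas as pd

# Toy stand-in for the bank marketing data (invented rows).
# Assumption: the subscription target is a 0/1 column named `y`.
df = pd.DataFrame({
    "balance": [100, 200, 300, 400, 500, 600, 700, 800],
    "month": ["may", "may", "may", "jun", "jun", "jun", "jul", "jul"],
    "job": ["management", "management", "admin", "admin",
            "management", "admin", "admin", "management"],
    "day": [5, 10, 15, 20, 25, 3, 12, 18],
    "y": [1, 0, 0, 0, 0, 0, 1, 1],
})

# L4 (quartile/binned): subscription rate in the lowest balance quartile.
quartile = pd.qcut(df["balance"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
l4_rate = df.loc[quartile == "Q1", "y"].mean()

# L5 (multi-condition): rate for month == "may" AND job == "management".
mask = (df["month"] == "may") & (df["job"] == "management")
l5_rate = df.loc[mask, "y"].mean()

# L6 (nested extrema): find the month with the lowest subscription rate,
# then average `day` within that month.
lowest_month = df.groupby("month")["y"].mean().idxmin()
l6_avg_day = df.loc[df["month"] == lowest_month, "day"].mean()

result = {"L4": l4_rate, "L5": l5_rate, "L6": l6_avg_day}
```

The L6 pattern is the one that separates levels: the answer depends on an intermediate result (`lowest_month`) that the agent has to compute before it can filter.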
## Reward Structure
| Component | Weight | Description |
| ----------- | ------ | ----------------------------------------------- |
| Correctness | 80% | Binary correct/incorrect with numeric tolerance |
| Efficiency | 10% | Fewer turns = better score |
| Token Cost | 10% | Lower token usage = better score |
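
The weights above combine into a single scalar per episode. The sketch below shows one way that could work. Only the 80/10/10 weights and the binary-with-tolerance correctness check come from the table; the linear decay for efficiency and token cost, the `max_turns` and `token_budget` parameters, and the 1% relative tolerance are all assumptions, not the environment's actual scoring.

```python
import math


def shaped_reward(answer, expected, turns, tokens,
                  max_turns=10, token_budget=10_000, rel_tol=0.01):
    """Illustrative shaped reward: 0.8 correctness + 0.1 efficiency + 0.1 token cost.

    Assumptions (not from the environment): linear decay for the efficiency
    and token terms, and a relative tolerance for numeric correctness.
    """
    # Binary correctness with numeric tolerance.
    correct = 1.0 if math.isclose(answer, expected, rel_tol=rel_tol) else 0.0
    # Fewer turns score higher; a single-turn episode scores 1.0.
    efficiency = max(0.0, 1.0 - (turns - 1) / max_turns)
    # Lower token usage scores higher; zero tokens scores 1.0.
    token_score = max(0.0, 1.0 - tokens / token_budget)
    return 0.8 * correct + 0.1 * efficiency + 0.1 * token_score


best_case = shaped_reward(42.5, 42.5, turns=1, tokens=0)
wrong_but_cheap = shaped_reward(0.0, 42.5, turns=1, tokens=0)
```

Under these assumptions a wrong answer can earn at most 0.2, so correctness dominates while the remaining 20% still distinguishes efficient agents from wasteful ones.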
## API Endpoints
| Endpoint | Method | Description |
| ----------- | ------ | --------------------- |
| `/health` | GET | Health check |
| `/reset` | POST | Reset for new episode |
| `/step` | POST | Execute action |
| `/state` | GET | Current state |
| `/metadata` | GET | Environment metadata |
| `/schema` | GET | Type schemas |
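
For clients that do not use the `openenv` package, the endpoints can in principle be called over plain HTTP. The sketch below builds requests with the standard library only. The base URL is a placeholder, and the JSON body for `/step` is assumed to mirror the action dict from the Quick Start; the server's actual request schema is available from `/schema` and may differ.

```python
import json
import urllib.request

# Assumption: placeholder address for a locally running container.
BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the endpoints above."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method=method)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(req, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


health_req = build_request("GET", "/health")
reset_req = build_request("POST", "/reset")
# Assumed body shape: the same action dict used with EnvClient.step().
step_req = build_request(
    "POST", "/step",
    {"tool": "run_python", "args": "<code>result = df.shape</code>"},
)
```

Calling `send(health_req)` against a running server is a quick way to confirm the container is up before starting an episode.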
## Benchmark Results
Pre-benchmarked on 11+ models with 1,844 completions. Results for four of those models:
| Model | Accuracy | Avg Turns |
| ---------------- | -------- | --------- |
| Claude Opus 4.5 | 77.3% | 4.2 |
| Qwen3 Max | 46.7% | 5.1 |
| Mistral Large | 45.3% | 5.8 |
| Llama 4 Maverick | 38.7% | 6.2 |
## Links
- **GitHub**: [vj-09/codeblue-env](https://github.com/vj-09/codeblue-env)
- **Leaderboard**: [analytics-rl.com](https://www.analytics-rl.com)
- **OpenEnv Spec**: [meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
## License
MIT License
## Author
**Vijay Athithya**
- GitHub: [@vj-09](https://github.com/vj-09)
- LinkedIn: [vijay-athithya](https://www.linkedin.com/in/vijay-athithya/)