	---
	title: CodeDark Environment Server
	emoji: 📊
	colorFrom: yellow
	colorTo: purple
	sdk: docker
	pinned: false
	license: mit
	tags:
	- openenv
	- reinforcement-learning
	- data-analytics
	- agents
	- benchmark
	---

	# CodeDark: Data Analytics Environment for RL Agents

	OpenEnv-compatible multi-turn environment for training AI agents on real business analytics tasks.

	## Overview

	CodeDark is the first data analytics environment in the OpenEnv ecosystem. Agents analyze CSV datasets with Python/Pandas, which tests whether they can reason like data scientists rather than merely execute code.

	### Key Features

	- Real Business Tasks: Bank marketing and road safety datasets with genuine analytical questions
	- Multi-Turn Interaction: Agents explore data, save notes, ask clarifications, and submit answers
	- Shaped Rewards: 80% correctness + 10% efficiency + 10% token cost
	- Pre-Benchmarked: 25 curated L5-L6 difficulty tasks validated on 11+ models

	## Quick Start

	### Connect to the Environment

	```python
	from openenv import EnvClient

	# Connect to this Space
	env = EnvClient.from_hub("openenv/codedark")

	# Reset for a new task
	obs = env.reset()
	print(f"Task: {obs['question']}")

	# Execute Python code
	obs = env.step({"tool": "run_python", "args": "<code>result = df.shape</code>"})
	print(f"Result: {obs['stdout']}")

	# Submit answer
	obs = env.step({"tool": "submit_answer", "args": "<answer>42.5</answer>"})
	print(f"Reward: {obs['reward']}")
	```

	### Available Tools

	| Tool            | Description                                                    |
	| --------------- | -------------------------------------------------------------- |
	| `run_python`    | Execute Python/pandas code. Store result in `result` variable. |
	| `read_notes`    | Read saved notes from previous turns.                          |
	| `save_note`     | Save observations for later recall.                            |
	| `clarify`       | Ask clarifying questions (max 2 per episode).                  |
	| `submit_answer` | Submit final answer. Ends episode.                             |
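
The two tools shown in the Quick Start can be wrapped in small helpers so that the tag format lives in one place. This is a minimal sketch, not part of the CodeDark client: the helper names, `run_episode`, and `StubEnv` are hypothetical, and the `done` key in the observation is an assumption. The argument formats for `read_notes`, `save_note`, and `clarify` are not documented here, so they are left out.

```python
def run_python(code: str) -> dict:
    """Wrap pandas code in the <code> tags used in the Quick Start example."""
    return {"tool": "run_python", "args": f"<code>{code}</code>"}


def submit_answer(value) -> dict:
    """Wrap a final answer in the <answer> tags used in the Quick Start example."""
    return {"tool": "submit_answer", "args": f"<answer>{value}</answer>"}


class StubEnv:
    """Offline stand-in with a reset/step interface. NOT the real server."""

    def reset(self) -> dict:
        self.turns = 0
        return {"question": "How many rows does df have?", "done": False}

    def step(self, action: dict) -> dict:
        self.turns += 1
        # Assumption: the episode ends as soon as an answer is submitted.
        done = action["tool"] == "submit_answer"
        return {"stdout": "stub output", "reward": 1.0 if done else 0.0, "done": done}


def run_episode(env, actions) -> dict:
    """Send actions in order and stop once the environment reports done."""
    obs = env.reset()
    for action in actions:
        obs = env.step(action)
        if obs.get("done"):
            break
    return obs


env = StubEnv()
final = run_episode(
    env,
    [run_python("result = len(df)"), submit_answer(3), run_python("result = 1")],
)
print(final["reward"], env.turns)
```

The third action is never sent, because `submit_answer` ends the episode on the second turn.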

	## Datasets

	### Bank Marketing (750K rows)

	- Target: Term deposit subscription prediction
	- Features: age, job, marital, education, balance, housing, loan, contact, day, month, duration, campaign

	### Road Safety (500K rows)

	- Target: Accident risk assessment
	- Features: road_type, num_lanes, curvature, speed_limit, lighting, weather, time_of_day

	## Task Difficulty

	| Level | Complexity      | Example                                      |
	| ----- | --------------- | -------------------------------------------- |
	| L4    | Quartile/binned | "Subscription rate in Q1 balance?"           |
	| L5    | Multi-condition | "Rate for month='may' AND job='management'?" |
	| L6    | Nested extrema  | "In lowest subscription month, avg day?"     |
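
To make the three levels concrete, the example questions above can be answered on a toy table as follows. This is a sketch under stated assumptions: the eight rows are invented for illustration, and the 0/1 target column name `y` is an assumption (the feature list does not name the target column).

```python
import pandas as pd

# Toy stand-in for the bank marketing data. The rows are made up, and the
# 0/1 target column name `y` is an assumption.
df = pd.DataFrame({
    "month":   ["may", "may", "may", "jun", "jun", "jul", "jul", "jul"],
    "job":     ["management", "management", "technician", "management",
                "services", "technician", "management", "services"],
    "day":     [5, 12, 20, 3, 18, 7, 14, 28],
    "balance": [100, 2500, 800, 50, 1200, 3000, 400, 1500],
    "y":       [1, 0, 0, 1, 1, 0, 0, 0],
})

# L4 (quartile/binned): subscription rate in the lowest balance quartile.
quartile = pd.qcut(df["balance"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
l4_rate = df.loc[quartile == "Q1", "y"].mean()

# L5 (multi-condition): rate where month is 'may' AND job is 'management'.
mask = (df["month"] == "may") & (df["job"] == "management")
l5_rate = df.loc[mask, "y"].mean()

# L6 (nested extrema): find the month with the lowest subscription rate,
# then average `day` within that month.
monthly_rate = df.groupby("month")["y"].mean()
worst_month = monthly_rate.idxmin()
l6_avg_day = df.loc[df["month"] == worst_month, "day"].mean()

print(l4_rate, l5_rate, worst_month, l6_avg_day)
```

The L6 pattern is where agents most often need two steps: the inner aggregation selects a group, and the outer aggregation is computed only on that group.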

	## Reward Structure

	| Component   | Weight | Description                                     |
	| ----------- | ------ | ----------------------------------------------- |
	| Correctness | 80%    | Binary correct/incorrect with numeric tolerance |
	| Efficiency  | 10%    | Fewer turns = better score                      |
	| Token Cost  | 10%    | Lower token usage = better score                |
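
The weighting above could be combined roughly as follows. Only the 80/10/10 weights and the binary, tolerance-based correctness check come from the table. The linear decay for turns and tokens, the parameters `max_turns`, `token_budget`, and `rel_tol`, and the function name `shaped_reward` are all assumptions made for this sketch; the environment's actual scoring may differ.

```python
import math


def shaped_reward(answer, expected, turns, tokens,
                  max_turns=10, token_budget=4000, rel_tol=1e-2):
    """Illustrative 80/10/10 reward. The decay formulas are assumptions."""
    # Correctness: binary, with a numeric tolerance.
    is_correct = math.isclose(answer, expected, rel_tol=rel_tol, abs_tol=1e-9)
    correct = 1.0 if is_correct else 0.0
    # Efficiency: 1.0 for a single turn, falling linearly to 0.0 at max_turns.
    efficiency = max(0.0, 1.0 - (turns - 1) / max(1, max_turns - 1))
    # Token cost: 1.0 for zero tokens, falling linearly to 0.0 at the budget.
    token_score = max(0.0, 1.0 - tokens / token_budget)
    return 0.8 * correct + 0.1 * efficiency + 0.1 * token_score


example = shaped_reward(42.5, 42.5, turns=4, tokens=2000)
print(f"Reward: {example:.3f}")
```

With these weights a wrong answer can score at most 0.2, so correctness dominates and the other two terms mainly break ties between correct runs.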

	## API Endpoints

	| Endpoint    | Method | Description           |
	| ----------- | ------ | --------------------- |
	| `/health`   | GET    | Health check          |
	| `/reset`    | POST   | Reset for new episode |
	| `/step`     | POST   | Execute action        |
	| `/state`    | GET    | Current state         |
	| `/metadata` | GET    | Environment metadata  |
	| `/schema`   | GET    | Type schemas          |
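
For callers who prefer plain HTTP over the client, the endpoints above might be exercised as sketched below using only the standard library. The base URL, the empty JSON body for `/reset`, and the `{"action": ...}` wrapper for `/step` are assumptions; check `/schema` on a running server for the actual request types. The helpers `build_request` and `call` are hypothetical names.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: a locally running container


def build_request(endpoint, payload=None):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = f"{BASE_URL}{endpoint}"
    if payload is None:
        return request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return request.Request(url, data=body, headers=headers, method="POST")


def call(req, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


health_req = build_request("/health")
reset_req = build_request("/reset", {})  # assumption: empty JSON body
step_req = build_request(
    "/step",
    # Assumption: the action dict from the Quick Start is nested under "action".
    {"action": {"tool": "run_python", "args": "<code>result = df.shape</code>"}},
)

# Requests are only constructed here. With a server running, send them with:
#   print(call(health_req))
```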

	## Benchmark Results

	Pre-benchmarked on 11+ models with 1,844 completions:

	| Model            | Accuracy | Avg Turns |
	| ---------------- | -------- | --------- |
	| Claude Opus 4.5  | 77.3%    | 4.2       |
	| Qwen3 Max        | 46.7%    | 5.1       |
	| Mistral Large    | 45.3%    | 5.8       |
	| Llama 4 Maverick | 38.7%    | 6.2       |

	## Links

	- GitHub: [vj-09/codeblue-env](https://github.com/vj-09/codeblue-env)
	- Leaderboard: [analytics-rl.com](https://www.analytics-rl.com)
	- OpenEnv Spec: [meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)

	## License

	MIT License

	## Author

	Vijay Athithya

	- GitHub: [@vj-09](https://github.com/vj-09)
	- LinkedIn: [vijay-athithya](https://www.linkedin.com/in/vijay-athithya/)