---
title: ReasoningEconomicsEnv
sdk: docker
app_port: 8000
tags:
- openenv
- reasoning-economic-env
- rl
- math
---
# ReasoningEconomicsEnv
**An RL environment for learning to allocate reasoning compute under budget constraints.**
> Modern reasoning models like DeepSeek-R1 "think" by generating internal tokens before
> answering. More tokens = deeper reasoning = better answers, but tokens cost compute and
> money. How should an agent decide how much to think on each problem?
ReasoningEconomicsEnv frames this as a sequential decision problem: an agent faces a series
of math questions with a fixed total token budget and must learn to **allocate tokens wisely**:
spending less on easy questions and more on hard ones.
Built on [Meta's OpenEnv framework](https://github.com/meta-pytorch/OpenEnv) for the
[AgentX–AgentBeats Competition](https://rdi.berkeley.edu/agentx-agentbeats) hosted by
Berkeley RDI.
---
## How It Works
```
Episode (10 questions, 4000 token budget)
┌──────────────────────────────────────────────────────────┐
│ 1. Agent observes: question embedding, remaining budget  │
│ 2. Agent decides: token allocation (50–800)              │
│ 3. Solver attempts question with that token limit        │
│ 4. Reward = correctness − β·cost + γ·efficiency_bonus    │
│ 5. Repeat until all questions answered or budget gone    │
└──────────────────────────────────────────────────────────┘
```
**Reward formula:** `R = correctness(±1/−0.1) − β·(tokens_used/budget) + γ·(savings/budget)`
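The reward formula can be sketched in Python. Note that the `beta` and `gamma` defaults and the exact definition of `savings` below are illustrative assumptions, not the environment's actual constants:

```python
def step_reward(correct: bool, tokens_used: int, allocated: int, budget: int,
                beta: float = 0.5, gamma: float = 0.2) -> float:
    """Sketch of the per-step reward: correctness score, minus a cost
    penalty proportional to tokens spent, plus an efficiency bonus for
    allocated tokens the solver did not need."""
    correctness = 1.0 if correct else -0.1   # assumed mapping of the ±1/−0.1 term
    cost = tokens_used / budget
    savings = max(allocated - tokens_used, 0) / budget  # assumed savings definition
    return correctness - beta * cost + gamma * savings
```

For example, a correct answer that used 200 of 300 allocated tokens against a 4000-token budget would score `1.0 − 0.5·0.05 + 0.2·0.025 = 0.98` under these assumed coefficients.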
---
## Quick Start
```bash
pip install -e .
# Run the OpenEnv server
uvicorn reasonbudget_gym.server.app:app --port 8000
# In another terminal: use the Python client
python -c "
from reasonbudget_gym.client import ReasonBudgetClient
client = ReasonBudgetClient()
obs = client.reset()
result = client.step(200)
print(result.reward, result.done)
"
```
**Or run baseline evaluation locally:**
```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
python -m reasonbudget_gym.eval.plots eval_results.json
```
---
## Baselines
| Agent | Mean Accuracy | Mean Reward | Budget Used |
|-------|---------------|-------------|-------------|
| `uniform` | 0.780 | 7.620 | 100.0% |
| `greedy_max` | 0.840 | 4.163 | 100.0% |
| `oracle` | 0.728 | 6.933 | 98.3% |
| `bandit` | 0.744 | 6.526 | 98.8% |
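As a reference point, the `uniform` baseline can be sketched as an even split of the remaining budget over the remaining questions. This is an illustrative reimplementation, not the repo's code; the clamping bounds mirror the action space described below:

```python
def uniform_allocation(remaining_budget: int, questions_remaining: int,
                       min_tokens: int = 50, max_tokens: int = 800) -> int:
    """Uniform baseline: split the remaining budget evenly across the
    questions still to come, clamped to the action bounds and to the
    tokens actually left in the episode."""
    per_question = remaining_budget // max(questions_remaining, 1)
    # Clamp to [min_tokens, max_tokens], then to what is actually left.
    return min(max(per_question, min_tokens), max_tokens, remaining_budget)
```

At the start of a fresh episode this yields `uniform_allocation(4000, 10) == 400`, i.e. an even 400-token share per question.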
Evaluation command:
```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
```
![Baseline comparison](docs/agent_comparison.png)
![Budget pacing](docs/budget_pacing.png)
---
## Observation Space
| Field | Shape | Description |
|-------|-------|-------------|
| `question_embedding` | 384-dim | Sentence-transformer encoding |
| `remaining_budget` | int | Tokens left in episode |
| `questions_remaining` | int | Questions left |
| `budget_per_remaining` | float | remaining_budget / questions_remaining |
| `accuracy_so_far` | float | Running accuracy [0, 1] |
| `history` | list | Past (allocated, used, correct) tuples |
**Action:** integer token allocation, clamped to `[min_tokens, max_tokens]` and remaining budget.
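A policy can combine these observation fields into a simple pacing heuristic. The `adaptive_allocation` function below is a hypothetical example (not part of the repo) that starts from the even split and spends more when running accuracy is low, on the assumption that wrong answers signal harder questions ahead:

```python
def adaptive_allocation(obs: dict, min_tokens: int = 50, max_tokens: int = 800) -> int:
    """Hypothetical pacing heuristic over the observation fields:
    scale the even split up as running accuracy drops (1.0x at perfect
    accuracy, up to 1.5x when every answer so far was wrong)."""
    base = obs["budget_per_remaining"]
    scale = 1.0 + 0.5 * (1.0 - obs["accuracy_so_far"])
    action = int(base * scale)
    # Clamp to the action bounds and to the tokens actually remaining.
    return min(max(action, min_tokens), max_tokens, obs["remaining_budget"])
```

With `budget_per_remaining = 400` and a running accuracy of 0.5, this allocates `int(400 · 1.25) = 500` tokens.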
---
## Data
The repo ships with a deterministic offline question bundle and response cache under
`reasonbudget_gym/data/`, so demos and tests work without external services.
A **synthetic cache** (`reasonbudget_gym/data/response_cache.json`) simulates realistic
DeepSeek-R1 accuracy curves across 4 difficulty tiers: `gsm8k`, `math_l1_l2`, `math_l3`,
`math_l4_l5`. The sampler also caches MiniLM embeddings to
`reasonbudget_gym/data/embeddings.npy` after the first run.
Regenerate the synthetic cache with:
```bash
python reasonbudget_gym/data/generate_synthetic_cache.py
```
---
## Deployment (Docker / HF Spaces)
```bash
docker build -t reasoning-economic-env .
docker run -p 8000:8000 reasoning-economic-env
curl http://localhost:8000/health
```
---
## Related Work
- **[MAS-TTS](https://github.com/jincan333/MAS-TTS):** Allocates reasoning across *agents* on
one problem vs. our approach of allocating across *questions* for a single agent.
- **[AgentTTS](https://arxiv.org/abs/2508.00890):** Test-time compute-optimal scaling across
multi-stage complex tasks.
---
## Citation
Part of the AgentX–AgentBeats Competition (Berkeley RDI, 2026).
Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) by Meta/PyTorch.