---
title: ReasoningEconomicsEnv
sdk: docker
app_port: 8000
tags:
  - openenv
  - reasoning-economic-env
  - rl
  - math
---

# ReasoningEconomicsEnv

An RL environment for learning to allocate reasoning compute under budget constraints.

Modern reasoning models like DeepSeek-R1 "think" by generating internal tokens before answering. More tokens = deeper reasoning = better answers, but tokens cost compute and money. How should an agent decide how much to think on each problem?

ReasoningEconomicsEnv frames this as a sequential decision problem: an agent faces a series of math questions with a fixed total token budget and must learn to allocate tokens wisely, spending less on easy questions and more on hard ones.

Built on Meta's OpenEnv framework for the AgentX–AgentBeats Competition hosted by Berkeley RDI.


## How It Works

```
Episode (10 questions, 4000-token budget)
┌──────────────────────────────────────────────────────────┐
│  1. Agent observes: question embedding, remaining budget │
│  2. Agent decides: token allocation (50–800)             │
│  3. Solver attempts question with that token limit       │
│  4. Reward = correctness − β·cost + γ·efficiency_bonus   │
│  5. Repeat until all questions answered or budget gone   │
└──────────────────────────────────────────────────────────┘
```

Reward formula: `R = correctness (±1 / −0.1) − β·(tokens_used/budget) + γ·(savings/budget)`
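The per-question reward can be sketched in Python. This is a hedged illustration, not the environment's actual code: the `beta` and `gamma` defaults are made-up values, `savings` is interpreted as allocated-minus-used tokens, and correctness is read as +1 for a right answer and −0.1 otherwise (one reading of the ±1/−0.1 notation above):

```python
def step_reward(correct: bool, tokens_used: int, allocated: int,
                budget: int, beta: float = 0.5, gamma: float = 0.2) -> float:
    """Correctness minus a token-cost penalty plus a frugality bonus.

    beta/gamma defaults are illustrative assumptions, not the env's real values.
    """
    correctness = 1.0 if correct else -0.1              # +1 right, -0.1 wrong (assumed)
    cost_penalty = beta * (tokens_used / budget)        # pay for tokens actually used
    savings = allocated - tokens_used                   # tokens allocated but not spent
    return correctness - cost_penalty + gamma * (savings / budget)
```

With these toy coefficients, a correct answer that uses 200 of 300 allocated tokens out of a 4000-token budget yields a reward just under 1.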


## Quick Start

```bash
pip install -e .

# Run the OpenEnv server
uvicorn reasonbudget_gym.server.app:app --port 8000

# In another terminal, use the Python client
python -c "
from reasonbudget_gym.client import ReasonBudgetClient
client = ReasonBudgetClient()
obs = client.reset()
result = client.step(200)
print(result.reward, result.done)
"
```

Or run the baseline evaluation locally:

```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
python -m reasonbudget_gym.eval.plots eval_results.json
```

## Baselines

| Agent | Mean Accuracy | Mean Reward | Budget Used |
|---|---|---|---|
| uniform | 0.780 | 7.620 | 100.0% |
| greedy_max | 0.840 | 4.163 | 100.0% |
| oracle | 0.728 | 6.933 | 98.3% |
| bandit | 0.744 | 6.526 | 98.8% |
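For reference, the uniform baseline amounts to splitting the remaining budget evenly over the remaining questions. A minimal sketch, assuming per-question allocations are clamped to the env's 50–800 action range (the function name and clamping details are assumptions, not the repo's actual baseline code):

```python
def uniform_allocation(remaining_budget: int, questions_remaining: int,
                       min_tokens: int = 50, max_tokens: int = 800) -> int:
    # Even split of what's left, clamped to the per-question action range.
    share = remaining_budget // max(questions_remaining, 1)
    return max(min_tokens, min(share, max_tokens))
```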

Evaluation command:

```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
```

*(Figures: baseline comparison and budget pacing, generated by `reasonbudget_gym.eval.plots`.)*


## Observation Space

| Field | Shape | Description |
|---|---|---|
| `question_embedding` | 384-dim | Sentence-transformer encoding |
| `remaining_budget` | int | Tokens left in episode |
| `questions_remaining` | int | Questions left |
| `budget_per_remaining` | float | remaining / questions_left |
| `accuracy_so_far` | float | Running accuracy in [0, 1] |
| `history` | list | Past (allocated, used, correct) tuples |
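A learned policy will typically flatten these fields into one feature vector. A minimal sketch, assuming the observation is exposed as a dict keyed by the field names above (the dict layout is an assumption; `history` is left out here since it is variable-length):

```python
def featurize(obs: dict) -> list:
    # 384-dim question embedding followed by the scalar budget/progress features.
    return list(obs["question_embedding"]) + [
        float(obs["remaining_budget"]),
        float(obs["questions_remaining"]),
        float(obs["budget_per_remaining"]),
        float(obs["accuracy_so_far"]),
    ]
```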

Action: integer token allocation, clamped to [min_tokens, max_tokens] and remaining budget.
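The clamping described above might look like this (a sketch of the stated behavior; the actual environment code may differ, e.g. in how a near-empty budget interacts with `min_tokens`):

```python
def clamp_action(requested: int, min_tokens: int, max_tokens: int,
                 remaining_budget: int) -> int:
    # Clamp into the configured range first, then cap at what's left in the budget.
    allocation = max(min_tokens, min(requested, max_tokens))
    return min(allocation, remaining_budget)
```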


## Data

The repo ships with a deterministic offline question bundle and response cache under `reasonbudget_gym/data/`, so demos and tests work without external services.

A synthetic cache (`reasonbudget_gym/data/response_cache.json`) simulates realistic DeepSeek-R1 accuracy curves across four difficulty tiers: `gsm8k`, `math_l1_l2`, `math_l3`, `math_l4_l5`. The sampler also caches MiniLM embeddings to `reasonbudget_gym/data/embeddings.npy` after the first run.
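The cached curves follow the usual shape: more thinking tokens raise the solve probability with diminishing returns, and harder tiers need more tokens for the same accuracy. A toy illustration of that shape only; the functional form, constants, and `difficulty` parameterization are assumptions, not the repo's actual generator:

```python
import math

def simulated_accuracy(tokens: int, difficulty: float) -> float:
    # Saturating curve: accuracy rises with tokens but levels off below 1.0;
    # larger `difficulty` shifts the curve right (harder tiers need more tokens).
    return 0.95 * (1.0 - math.exp(-tokens / (150.0 * difficulty)))
```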

Regenerate the synthetic cache with:

```bash
python reasonbudget_gym/data/generate_synthetic_cache.py
```

## Deployment (Docker / HF Spaces)

```bash
docker build -t reasoning-economic-env .
docker run -p 8000:8000 reasoning-economic-env
curl http://localhost:8000/health
```

## Related Work

  • MAS-TTS: Allocates reasoning across agents on one problem vs. our approach of allocating across questions for a single agent.
  • AgentTTS: Test-time compute-optimal scaling across multi-stage complex tasks.

## Citation

Part of the AgentX–AgentBeats Competition (Berkeley RDI, 2026). Built on OpenEnv by Meta/PyTorch.