---
title: ReasoningEconomicsEnv
sdk: docker
app_port: 8000
tags:
- openenv
- reasoning-economic-env
- rl
- math
---
# ReasoningEconomicsEnv
**An RL environment for learning to allocate reasoning compute under budget constraints.**
> Modern reasoning models like DeepSeek-R1 "think" by generating internal tokens before
> answering. More tokens = deeper reasoning = better answers, but tokens cost compute and
> money. How should an agent decide how much to think on each problem?
ReasoningEconomicsEnv frames this as a sequential decision problem: an agent faces a series
of math questions with a fixed total token budget and must learn to **allocate tokens wisely**,
spending less on easy questions and more on hard ones.
Built on [Meta's OpenEnv framework](https://github.com/meta-pytorch/OpenEnv) for the
[AgentX-AgentBeats Competition](https://rdi.berkeley.edu/agentx-agentbeats) hosted by
Berkeley RDI.
---
## How It Works
```
Episode (10 questions, 4000 token budget)
┌───────────────────────────────────────────────────────────┐
│ 1. Agent observes: question embedding, remaining budget   │
│ 2. Agent decides: token allocation (50–800)               │
│ 3. Solver attempts question with that token limit         │
│ 4. Reward = correctness − β·cost + γ·efficiency_bonus     │
│ 5. Repeat until all questions answered or budget gone     │
└───────────────────────────────────────────────────────────┘
```
**Reward formula:** `R = correctness(±1/−0.1) − β·(tokens_used/budget) + γ·(savings/budget)`
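A minimal sketch of this reward, under one plausible reading: correctness contributes +1 for a right answer and −0.1 for a wrong one, and `savings` is the unspent portion of the budget. The `beta` and `gamma` values here are illustrative placeholders, not the environment's actual coefficients.

```python
def step_reward(correct, tokens_used, budget, beta=0.5, gamma=0.5):
    """Sketch of the per-step reward: correctness minus a cost term
    plus a bonus for unspent budget. Coefficients are illustrative."""
    correctness = 1.0 if correct else -0.1
    cost = beta * (tokens_used / budget)            # penalize spending
    savings_bonus = gamma * ((budget - tokens_used) / budget)  # reward thrift
    return correctness - cost + savings_bonus
```

The cost and savings terms pull in opposite directions, so the agent is rewarded for answering correctly with as few tokens as it can get away with.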
---
## Quick Start
```bash
pip install -e .
# Run the OpenEnv server
uvicorn reasonbudget_gym.server.app:app --port 8000
# In another terminal β use the Python client
python -c "
from reasonbudget_gym.client import ReasonBudgetClient
client = ReasonBudgetClient()
obs = client.reset()
result = client.step(200)
print(result.reward, result.done)
"
```
**Or run baseline evaluation locally:**
```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
python -m reasonbudget_gym.eval.plots eval_results.json
```
---
## Baselines
| Agent | Mean Accuracy | Mean Reward | Budget Used |
|-------|---------------|-------------|-------------|
| `uniform` | 0.780 | 7.620 | 100.0% |
| `greedy_max` | 0.840 | 4.163 | 100.0% |
| `oracle` | 0.728 | 6.933 | 98.3% |
| `bandit` | 0.744 | 6.526 | 98.8% |
Evaluation command:
```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
```


---
## Observation Space
| Field | Shape | Description |
|-------|-------|-------------|
| `question_embedding` | 384-dim | Sentence-transformer encoding |
| `remaining_budget` | int | Tokens left in episode |
| `questions_remaining` | int | Questions left |
| `budget_per_remaining` | float | remaining / questions_left |
| `accuracy_so_far` | float | Running accuracy [0, 1] |
| `history` | list | Past (allocated, used, correct) tuples |
**Action:** integer token allocation, clamped to `[min_tokens, max_tokens]` and remaining budget.
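A minimal allocation heuristic over these observation fields is the fair-share policy the `uniform` baseline suggests: spend `budget_per_remaining` on each question, clamped to the legal range. The clamp bounds below are illustrative defaults, not the environment's actual configuration, and the dict-style observation access is an assumption.

```python
def allocate(obs, min_tokens=50, max_tokens=800):
    """Spend the fair share of the remaining budget on the next question."""
    fair_share = obs["budget_per_remaining"]  # remaining / questions_left
    action = int(round(fair_share))
    # Clamp to the legal action range, then to what is actually left.
    action = max(min_tokens, min(action, max_tokens))
    return min(action, obs["remaining_budget"])
```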
---
## Data
The repo ships with a deterministic offline question bundle and response cache under
`reasonbudget_gym/data/`, so demos and tests work without external services.
A **synthetic cache** (`reasonbudget_gym/data/response_cache.json`) simulates realistic
DeepSeek-R1 accuracy curves across 4 difficulty tiers: `gsm8k`, `math_l1_l2`, `math_l3`,
`math_l4_l5`. The sampler also caches MiniLM embeddings to
`reasonbudget_gym/data/embeddings.npy` after the first run.
Regenerate the synthetic cache with:
```bash
python reasonbudget_gym/data/generate_synthetic_cache.py
```
---
## Deployment (Docker / HF Spaces)
```bash
docker build -t reasoning-economic-env .
docker run -p 8000:8000 reasoning-economic-env
curl http://localhost:8000/health
```
---
## Related Work
- **[MAS-TTS](https://github.com/jincan333/MAS-TTS):** Allocates reasoning compute across
  *agents* working on one problem, whereas we allocate across *questions* for a single agent.
- **[AgentTTS](https://arxiv.org/abs/2508.00890):** Test-time compute-optimal scaling across
multi-stage complex tasks.
---
## Citation
Part of the AgentXβAgentBeats Competition (Berkeley RDI, 2026).
Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) by Meta/PyTorch.