---
title: ReasoningEconomicsEnv
sdk: docker
app_port: 8000
tags:
  - openenv
  - reasoning-economic-env
  - rl
  - math
---
# ReasoningEconomicsEnv

**An RL environment for learning to allocate reasoning compute under budget constraints.**

> Modern reasoning models like DeepSeek-R1 "think" by generating internal tokens before
> answering. More tokens mean deeper reasoning and better answers, but tokens cost compute and
> money. How should an agent decide how much to think on each problem?

ReasoningEconomicsEnv frames this as a sequential decision problem: an agent faces a series
of math questions with a fixed total token budget and must learn to **allocate tokens wisely**,
spending less on easy questions and more on hard ones.

Built on [Meta's OpenEnv framework](https://github.com/meta-pytorch/OpenEnv) for the
[AgentX–AgentBeats Competition](https://rdi.berkeley.edu/agentx-agentbeats) hosted by
Berkeley RDI.

---

## How It Works
```
Episode (10 questions, 4000 token budget)
┌───────────────────────────────────────────────────────────┐
│ 1. Agent observes: question embedding, remaining budget   │
│ 2. Agent decides: token allocation (50–800)               │
│ 3. Solver attempts question with that token limit         │
│ 4. Reward = correctness − β·cost + γ·efficiency_bonus     │
│ 5. Repeat until all questions answered or budget gone     │
└───────────────────────────────────────────────────────────┘
```
**Reward formula:** `R = correctness(±1/−0.1) − β·(tokens_used/budget) + γ·(savings/budget)`
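The formula can be sketched in a few lines of Python. The `beta`/`gamma` defaults below are illustrative placeholders (the env's actual weights live in its config), and `savings` is read here as unspent allocation, which is an assumption:

```python
def reward(correct: bool, tokens_used: int, allocated: int, budget: int,
           beta: float = 0.1, gamma: float = 0.05) -> float:
    """Sketch of the per-question reward: correctness minus a token-cost
    penalty plus an efficiency bonus for unspent allocation.

    beta/gamma values are illustrative, not the env's defaults.
    """
    correctness = 1.0 if correct else -0.1   # +1 correct, -0.1 otherwise (assumed mapping)
    cost = tokens_used / budget              # fraction of total budget spent
    savings = (allocated - tokens_used) / budget  # unspent allocation (assumption)
    return correctness - beta * cost + gamma * savings
```

With the illustrative weights, a correct answer that used 200 of 300 allocated tokens against a 4000-token budget scores just under 1.0, while a wrong answer with the same spend lands slightly below −0.1.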
---

## Quick Start

```bash
pip install -e .

# Run the OpenEnv server
uvicorn reasonbudget_gym.server.app:app --port 8000

# In another terminal: use the Python client
python -c "
from reasonbudget_gym.client import ReasonBudgetClient
client = ReasonBudgetClient()
obs = client.reset()
result = client.step(200)
print(result.reward, result.done)
"
```

**Or run baseline evaluation locally:**

```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
python -m reasonbudget_gym.eval.plots eval_results.json
```

---

## Baselines
| Agent | Mean Accuracy | Mean Reward | Budget Used |
|-------|---------------|-------------|-------------|
| `uniform` | 0.780 | 7.620 | 100.0% |
| `greedy_max` | 0.840 | 4.163 | 100.0% |
| `oracle` | 0.728 | 6.933 | 98.3% |
| `bandit` | 0.744 | 6.526 | 98.8% |
Evaluation command:

```bash
python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
```
---
## Observation Space

| Field | Shape | Description |
|-------|-------|-------------|
| `question_embedding` | 384-dim | Sentence-transformer encoding |
| `remaining_budget` | int | Tokens left in episode |
| `questions_remaining` | int | Questions left |
| `budget_per_remaining` | float | remaining_budget / questions_remaining |
| `accuracy_so_far` | float | Running accuracy in [0, 1] |
| `history` | list | Past (allocated, used, correct) tuples |

**Action:** integer token allocation, clamped to `[min_tokens, max_tokens]` and the remaining budget.
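The clamping described above might look like the following sketch (parameter names are assumptions; the env's real bounds come from its config):

```python
def clamp_action(requested: int, min_tokens: int, max_tokens: int,
                 remaining_budget: int) -> int:
    """Clamp a requested allocation to [min_tokens, max_tokens], then to
    the remaining budget. A sketch of the behavior described above."""
    allocation = max(min_tokens, min(requested, max_tokens))
    return min(allocation, remaining_budget)
```

So a request of 1000 with bounds `[50, 800]` becomes 800, and any request is further capped by whatever budget is left.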
---

## Data

The repo ships with a deterministic offline question bundle and response cache under
`reasonbudget_gym/data/`, so demos and tests work without external services.

A **synthetic cache** (`reasonbudget_gym/data/response_cache.json`) simulates realistic
DeepSeek-R1 accuracy curves across 4 difficulty tiers: `gsm8k`, `math_l1_l2`, `math_l3`,
`math_l4_l5`. The sampler also caches MiniLM embeddings to
`reasonbudget_gym/data/embeddings.npy` after the first run.

Regenerate the synthetic cache with:

```bash
python reasonbudget_gym/data/generate_synthetic_cache.py
```
---

## Deployment (Docker / HF Spaces)

```bash
docker build -t reasoning-economic-env .
docker run -p 8000:8000 reasoning-economic-env
curl http://localhost:8000/health
```
---

## Related Work

- **[MAS-TTS](https://github.com/jincan333/MAS-TTS):** allocates reasoning across *agents* on
  a single problem, whereas we allocate across *questions* for a single agent.
- **[AgentTTS](https://arxiv.org/abs/2508.00890):** test-time compute-optimal scaling across
  multi-stage complex tasks.
---

## Citation

Part of the AgentX–AgentBeats Competition (Berkeley RDI, 2026).
Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) by Meta/PyTorch.