Andrew Lara

Codex Context — ReasoningEconomicsEnv

Project

  • Repo root: /Users/andrew/Mac/RL Research
  • GitHub repo: git@github.com:laraandrew/reasoningeconomicsenv.git
  • Active branch: polish-and-deploy
  • Hugging Face Space: landrew9/CollabReasoning
  • Package: reasonbudget_gym
  • Goal: build an RL environment for token-budget allocation, submit it as a competition entry, and deploy it as a Docker-based HF Space

Remotes

  • origin: git@github.com:laraandrew/reasoningeconomicsenv.git
  • hf: https://huggingface.co/spaces/landrew9/CollabReasoning

Current State

  • main and polish-and-deploy originally pointed to the same base commit.
  • Work on polish-and-deploy is pushed to GitHub through commit efdc42b.
  • The shipped cache works:
    • CachedSolver(EnvConfig())._cache loads 500 entries.
  • The environment now defaults to an offline-safe path for cached runs:
    • EpisodeSampler uses deterministic bundled questions when the cached solver is active.
  • Real question embeddings are enabled and cached at:
    • reasonbudget_gym/data/embeddings.npy
  • README now contains measured evaluation metrics and embedded plot assets.
  • CI exists at .github/workflows/ci.yml.
  • Dockerfile was slimmed to a runtime-only serving image suitable for HF Spaces.
  • The Hugging Face Space repo was force-updated from a clean temporary clone because Hugging Face rejected the branch's historical raw binary blobs.
  • The live Space is currently:
    • Hub page: https://huggingface.co/spaces/landrew9/CollabReasoning
    • Host: https://landrew9-collabreasoning.hf.space
    • Runtime stage: RUNNING
    • Health endpoint: /health
    • Root path originally returned 404; a landing page at / was then added in server/app.py
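
The cache claim above (CachedSolver loading 500 entries) can be spot-checked without importing the package. The following is a minimal sketch, assuming reasonbudget_gym/data/response_cache.json holds a single JSON object or array whose top-level length equals the entry count. That layout is an assumption, and `count_cache_entries` is a hypothetical helper, not part of reasonbudget_gym.

```python
import json
from pathlib import Path


def count_cache_entries(path):
    """Return the number of top-level entries in a JSON cache file.

    Assumes the file holds one JSON object or array. This is a guess
    about the layout of response_cache.json, not a confirmed format.
    """
    with Path(path).open("r", encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, (dict, list)):
        raise ValueError(f"unexpected top-level JSON type: {type(data).__name__}")
    return len(data)


if __name__ == "__main__":
    cache_path = Path("reasonbudget_gym/data/response_cache.json")
    if cache_path.exists():
        print(f"{cache_path}: {count_cache_entries(cache_path)} entries")
    else:
        print(f"{cache_path} not found; run this from the repo root")
```

If the count differs from 500, the cache file and this context file have probably drifted apart and one of them needs updating.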

Local Tooling

  • Hugging Face CLI installed globally via the official installer.
  • Binary path: /Users/andrew/.local/bin/hf
  • Reported version at install time: 1.8.0
  • Installer added /Users/andrew/.local/bin to /Users/andrew/.zshrc
  • git-lfs and git-xet are installed and initialized globally.
  • .gitattributes now tracks:
    • docs/*.png
    • reasonbudget_gym/data/*.npy
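
To see which repo files fall under the two tracked patterns, a rough check along these lines may help. It is only an approximation of gitattributes matching for simple `dir/*.ext` patterns (no `**`, no negation), and `is_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Patterns listed in .gitattributes according to this file.
TRACKED_PATTERNS = ["docs/*.png", "reasonbudget_gym/data/*.npy"]


def is_tracked(path, patterns=TRACKED_PATTERNS):
    """Approximate gitattributes matching for simple 'dir/*.ext' patterns.

    Path segments are compared one by one so that '*' never crosses a '/'.
    This simplification does not handle '**', negation, or patterns
    without a slash.
    """
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.split("/")
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


for candidate in [
    "docs/agent_comparison.png",
    "docs/budget_pacing.png",
    "reasonbudget_gym/data/embeddings.npy",
    "reasonbudget_gym/data/response_cache.json",
]:
    label = "matches a tracked pattern" if is_tracked(candidate) else "plain git"
    print(f"{candidate}: {label}")
```

For an authoritative answer, `git check-attr` reports the attributes git itself applies to a given path.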

Verified Commands

  • Tests:
    • .venv/bin/python -m pytest reasonbudget_gym/tests/ -v
    • Result: 8 passed
  • Eval:
    • .venv/bin/python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
  • Plot generation:
    • .venv/bin/python -c "from reasonbudget_gym.eval.plots import agent_comparison, budget_pacing; agent_comparison('eval_results.json', 'docs/agent_comparison.png'); budget_pacing('eval_results.json', 'docs/budget_pacing.png')"
  • PPO smoke test:
    • .venv/bin/python -m reasonbudget_gym.training.ppo_train --n_episodes 100 --output_dir runs/smoke
    • Completed successfully and wrote checkpoints.
  • Docker:
    • docker build -t reasoning-economic-env .
    • docker run -d -p 8000:8000 --name reasoning-economic-env-test reasoning-economic-env
    • curl http://127.0.0.1:8000/health
    • Result: {"status":"ok","env":"ReasonBudgetEnv","version":"0.1.0"}
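
The curl check can also be scripted, for example as a post-deploy smoke test. This is a sketch using only the standard library; `check_health` and `parse_health` are hypothetical helpers, and the only thing taken from this file is the /health path and the response shape shown above.

```python
import json
import urllib.request


def parse_health(body):
    """Decode a /health response body and require status == "ok"."""
    payload = json.loads(body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"unhealthy response: {payload}")
    return payload


def check_health(base_url, timeout=5.0):
    """Fetch <base_url>/health and return the validated JSON payload."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_health(response.read())


# Example calls (not executed here because they need a running server):
#   check_health("http://127.0.0.1:8000")
#   check_health("https://landrew9-collabreasoning.hf.space")
```

Splitting parsing from fetching keeps the validation logic testable without a container or network access.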

Current Eval Numbers

From eval_results.json with --n_episodes 50 --seed 42:

| Agent | Mean Accuracy | Mean Reward | Budget Used |
|---|---|---|---|
| uniform | 0.780 | 7.620 | 100.0% |
| greedy_max | 0.840 | 4.163 | 100.0% |
| oracle | 0.728 | 6.933 | 98.3% |
| bandit | 0.744 | 6.526 | 98.8% |
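
Accuracy and reward rank the agents differently, which is easy to miss when scanning the numbers. The sketch below hard-codes the values above (it does not read eval_results.json, whose schema is not described here) and sorts the agents by each metric.

```python
# Values copied from the eval numbers above (50 episodes, seed 42).
RESULTS = {
    "uniform":    {"accuracy": 0.780, "reward": 7.620, "budget_used": 1.000},
    "greedy_max": {"accuracy": 0.840, "reward": 4.163, "budget_used": 1.000},
    "oracle":     {"accuracy": 0.728, "reward": 6.933, "budget_used": 0.983},
    "bandit":     {"accuracy": 0.744, "reward": 6.526, "budget_used": 0.988},
}


def rank_by(metric, results=RESULTS):
    """Return agent names sorted from highest to lowest on one metric."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)


print("by accuracy:", rank_by("accuracy"))
print("by reward:  ", rank_by("reward"))
```

In these numbers greedy_max has the highest mean accuracy but the lowest mean reward, while uniform leads on reward.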

Important Files

  • reasonbudget_gym/env/episode_sampler.py
  • reasonbudget_gym/env/config.py
  • reasonbudget_gym/solver/cached_solver.py
  • reasonbudget_gym/eval/evaluate.py
  • reasonbudget_gym/server/app.py
  • Dockerfile
  • README.md
  • .github/workflows/ci.yml
  • eval_results.json
  • docs/agent_comparison.png
  • docs/budget_pacing.png

Git History Added On This Branch

  • 29b6ad0 Add gitignore for local dev artifacts
  • ecd0ab1 Use bundled questions for cached offline runs
  • 9e122a2 Cache MiniLM question embeddings
  • c4d6234 Add GitHub Actions test workflow
  • fc6c606 Add baseline eval results and README plots
  • 280a6de Slim Docker image for HF deployment
  • fc4c73c Add living Codex context file
  • efdc42b Track Space binaries with Xet

Notes For Next Codex

  • Keep HANDOFF.md deleted; update this file instead.
  • Do not remove reasonbudget_gym/data/response_cache.json or reasonbudget_gym/data/embeddings.npy; they are part of the current offline/demo story.
  • The Docker image should stay lean; avoid reintroducing sentence-transformers, datasets, or training dependencies into the serving image unless truly needed.
  • If enabling the live solver later, configure secrets in Hugging Face Space settings rather than hard-coding them.
  • The hf remote listed under Remotes points at the Space repo; if it is configured in the local clone, pushing to it triggers a Space rebuild.
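
On the secrets note: Hugging Face Spaces expose configured secrets to the running container as environment variables, so the serving code can read them at startup. A minimal sketch follows, where `SOLVER_API_KEY` and `load_solver_api_key` are hypothetical names that do not come from this repo.

```python
import os


def load_solver_api_key(env_var="SOLVER_API_KEY", environ=None):
    """Return the live-solver credential from the environment, or None.

    SOLVER_API_KEY is a hypothetical name; use whatever secret name is
    configured in the Space settings. Returning None lets the caller
    fall back to the cached, offline path.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(env_var, "").strip()
    return value or None


api_key = load_solver_api_key()
print("live solver enabled" if api_key else "no secret found; using cached solver")
```

Defaulting to the cached path when the variable is missing keeps the current offline demo behavior unchanged.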