CollabReasoning / CODEX_CONTEXT.md

Andrew Lara

Deploy landing page update to Space

ee91164 17 days ago

Codex Context — ReasoningEconomicsEnv

Repo root: /Users/andrew/Mac/RL Research
GitHub repo: git@github.com:laraandrew/reasoningeconomicsenv.git
Active branch: polish-and-deploy
Hugging Face Space: landrew9/CollabReasoning
Package: reasonbudget_gym
Goal: RL environment for token-budget allocation, competition submission, Docker-based HF Space deployment

main and polish-and-deploy originally pointed to the same base commit.
Work on polish-and-deploy is pushed to GitHub through commit efdc42b.
The shipped cache loads correctly:
- CachedSolver(EnvConfig())._cache loads 500 entries.
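
The entry count can also be sanity-checked without importing the package by reading the cache file directly. This is a minimal sketch, not the package's own loader. It assumes the cache is the reasonbudget_gym/data/response_cache.json file named later in these notes, and that its top-level JSON value is an object or array with one item per cached entry. Neither assumption is confirmed here, and count_cache_entries is a hypothetical helper.

```python
import json
from pathlib import Path


def count_cache_entries(path):
    """Return the number of top-level items in a JSON cache file.

    Assumption: the file holds a JSON object or array with one item
    per cached entry. The real schema is not documented in these notes.
    """
    with Path(path).open("r", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, (dict, list)):
        raise ValueError("expected a JSON object or array at the top level")
    return len(data)


if __name__ == "__main__":
    # Path taken from these notes; only read it if it is present.
    cache_path = Path("reasonbudget_gym/data/response_cache.json")
    if cache_path.exists():
        print(count_cache_entries(cache_path))
```

If the assumptions hold, running this from the repo root should print the same count that CachedSolver reports.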
The environment now defaults to an offline-safe path for cached runs:
- EpisodeSampler uses deterministic bundled questions when the cached solver is active.
Real question embeddings are enabled and cached at:
- reasonbudget_gym/data/embeddings.npy
README now contains measured evaluation metrics and embedded plot assets.
CI exists at .github/workflows/ci.yml.
Dockerfile was slimmed to a runtime-only serving image suitable for HF Spaces.
The Hugging Face Space repo was force-updated from a clean temporary clone because Hugging Face rejected the raw binary blobs in the branch's history.
The live Space is currently:
- Hub page: https://huggingface.co/spaces/landrew9/CollabReasoning
- Host: https://landrew9-collabreasoning.hf.space
- Runtime stage: RUNNING
- Health endpoint: /health
- Root path: originally returned 404; a landing page at / has since been added in server/app.py

Tests:
- .venv/bin/python -m pytest reasonbudget_gym/tests/ -v
- Result: 8 passed
Eval:
- .venv/bin/python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json
Plot generation:
- .venv/bin/python -c "from reasonbudget_gym.eval.plots import agent_comparison, budget_pacing; agent_comparison('eval_results.json', 'docs/agent_comparison.png'); budget_pacing('eval_results.json', 'docs/budget_pacing.png')"
PPO smoke test:
- .venv/bin/python -m reasonbudget_gym.training.ppo_train --n_episodes 100 --output_dir runs/smoke
- Completed successfully and wrote checkpoints.
Docker:
- docker build -t reasoning-economic-env .
- docker run -d -p 8000:8000 --name reasoning-economic-env-test reasoning-economic-env
- curl http://127.0.0.1:8000/health
- Result: {"status":"ok","env":"ReasonBudgetEnv","version":"0.1.0"}
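
The curl check above can be scripted so that a deploy step fails loudly when the service is unhealthy. This is a minimal sketch using only the standard library. The function names is_healthy and fetch_health are hypothetical, and the only thing assumed about the response is the "status" field shown in the result above.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True when a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (for example an HTML error page).
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/health and return the decoded response body."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the test container running, `is_healthy(fetch_health("http://127.0.0.1:8000"))` performs the same check as the curl command. Pointing base_url at the Space host checks the deployed copy instead.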

From eval_results.json with --n_episodes 50 --seed 42:

| Agent | Mean Accuracy | Mean Reward | Budget Used |
| --- | --- | --- | --- |
| `uniform` | 0.780 | 7.620 | 100.0% |
| `greedy_max` | 0.840 | 4.163 | 100.0% |
| `oracle` | 0.728 | 6.933 | 98.3% |
| `bandit` | 0.744 | 6.526 | 98.8% |
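
The accuracy and reward columns do not rank the agents the same way, which is easy to miss when scanning the table. The sketch below sorts the agents on each metric. The numbers are copied by hand from the table because the schema of eval_results.json is not documented in these notes, and RESULTS and rank_by are illustrative names only.

```python
# Values copied from the table above (--n_episodes 50 --seed 42).
# Budget used is stored as a fraction (100.0% -> 1.000).
RESULTS = {
    "uniform": {"accuracy": 0.780, "reward": 7.620, "budget_used": 1.000},
    "greedy_max": {"accuracy": 0.840, "reward": 4.163, "budget_used": 1.000},
    "oracle": {"accuracy": 0.728, "reward": 6.933, "budget_used": 0.983},
    "bandit": {"accuracy": 0.744, "reward": 6.526, "budget_used": 0.988},
}


def rank_by(metric):
    """Return agent names sorted from highest to lowest on the given metric."""
    return sorted(RESULTS, key=lambda name: RESULTS[name][metric], reverse=True)


by_accuracy = rank_by("accuracy")
by_reward = rank_by("reward")
print("accuracy:", by_accuracy)
print("reward:  ", by_reward)
```

On these figures, greedy_max has the highest mean accuracy but the lowest mean reward, while uniform has the highest mean reward.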

Keep HANDOFF.md deleted; update this file instead.
Do not remove reasonbudget_gym/data/response_cache.json or reasonbudget_gym/data/embeddings.npy; they are part of the current offline/demo story.
The Docker image should stay lean; avoid reintroducing sentence-transformers, datasets, or training dependencies into the serving image unless truly needed.
If enabling the live solver later, configure secrets in Hugging Face Space settings rather than hard-coding them.
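
Secrets configured in the Space settings are exposed to the running container as environment variables, so the server can read them at startup. This is a minimal sketch of that pattern. SOLVER_API_KEY and load_solver_api_key are hypothetical names, not something the package is known to define.

```python
import os


def load_solver_api_key(env=None, var_name="SOLVER_API_KEY"):
    """Read the live-solver credential from the environment.

    SOLVER_API_KEY is a hypothetical name; use whatever secret name is
    configured in the Space settings. Returns None when the variable is
    unset or blank so the caller can stay on the cached, offline-safe path.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    return value or None
```

Returning None instead of raising keeps the cached path as the default, which matches the offline-safe behavior described earlier in these notes.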
The local repo may also have an hf remote pointing at the Space repo; if so, pushes there will trigger Space rebuilds.
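
Before pushing, it can help to confirm which remotes exist so that a Space rebuild is never triggered by accident. The sketch below parses the output of `git remote -v`. The helper names remote_urls and list_remotes are hypothetical, and the remote name hf is taken from the note above.

```python
import subprocess


def remote_urls(remote_verbose_output):
    """Map remote name -> push URL from `git remote -v` output."""
    urls = {}
    for line in remote_verbose_output.splitlines():
        parts = line.split()
        # Lines look like "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) == 3 and parts[2] == "(push)":
            urls[parts[0]] = parts[1]
    return urls


def list_remotes(repo_dir="."):
    """Run `git remote -v` in repo_dir and return its parsed push URLs."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return remote_urls(result.stdout)
```

Run from the repo root, `"hf" in list_remotes()` reports whether an hf remote is configured, and the mapped URL shows where a push would go.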