Space: thomasm6m6/openenv_hack

thomasm6m6 committed on Mar 8

Commit 3650272 (verified) · 1 parent: 7bbdee4

Switch Space to minimal OpenEnv demo

Files changed (1)

README.md +9 -103

@@ -1,6 +1,6 @@
 ---
-title: Freeciv Environment Server
-emoji: 🎮
 colorFrom: blue
 colorTo: indigo
 sdk: docker
@@ -11,107 +11,13 @@ tags:
   - openenv
 ---
-# freeciv-env
-OpenEnv environment for Freeciv, built on top of `freeciv-bot`.
-## Current scope
-This environment exposes a small, trainable action surface:
-- `end_turn`
-- `move_unit(unit_id, direction)`
-- `build_city(unit_id)`
-- `set_city_production(city_id, target)`
-- `set_research(tech_name)`
-Observations are text-first and include compact structured summaries of:
-- current turn
-- score
-- known and visible map tiles
-- units
-- cities
-- legal actions
-## Local development
-Install dependencies:
-```bash
-uv sync --extra dev
-```
-Run tests:
-```bash
-uv run pytest
-```
-Run the server:
-```bash
-uv run uvicorn freeciv_env.server.app:app --host 0.0.0.0 --port 8000
-```
-Run the fast GRPO loop:
-```bash
-uv sync --extra dev --extra train
-uv run python scripts/train_grpo_fast.py --env-url http://127.0.0.1 --max-steps 50
-```
-## Hackathon / Unsloth notes
-For the hackathon Colab submission path on H100s, Unsloth recommended the BF16 OpenEnv gpt-oss 20B notebook:
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb>
-If you adapt that notebook for this environment, reduce `max_steps` to `300` for a faster run.
-Useful notebook indexes:
-- RL notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl>
-- all notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks>
-- notebook repo: <https://github.com/unslothai/notebooks/tree/main/nb>
-If GRPO is too slow, start from a smaller notebook with `fast_inference = True` and add the Freeciv/OpenEnv calls:
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb>
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb>
-If vLLM GRPO fails, Unsloth suggested a clean virtualenv install:
-```bash
-python -m venv unsloth_env
-source unsloth_env/bin/activate
-pip install --upgrade pip && pip install uv
-uv pip install unsloth vllm --torch-backend=auto
-```
-If Unsloth is already installed, update it for the latest GRPO fixes:
-```bash
-pip install --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
-```
-## Live runtime requirements
-The default server app uses `freeciv-bot` against a local Freeciv Web runtime.
-Environment variables:
-- `FREECIV_SERVER_URL` (default: `http://127.0.0.1`)
-- `FREECIV_USERNAME` (default: `openenvbot`)
-- `FREECIV_CLIENT_PORT` (default: `6000`)
-- `FREECIV_TURN_TIMEOUT_S` (default: `60`)
-The included automated tests use a fake session backend, so they do not require a live Freeciv server.
-The GRPO training script uses:
-- `Qwen/Qwen3.5-0.8B`
-- Unsloth bf16 LoRA loading
-- TRL `GRPOTrainer`
-- integer-only action selection to minimize generated tokens
-- offline GRPO over env-sampled states for maximum throughput

 ---
+title: Minimal OpenEnv Demo
+emoji: ✅
 colorFrom: blue
 colorTo: indigo
 sdk: docker
   - openenv
 ---
+# minimal-openenv-demo
+A tiny OpenEnv Space for UI screenshots.
+Actions:
+- `noop`
+- `increment(amount)`
+- `finish`
+The environment only maintains a small counter and always responds immediately.
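
The three actions in the new README can be illustrated with a small counter class. This is a minimal sketch, not the Space's actual code: the `CounterEnv` class, its `step` method, the default `amount` of 1, and the returned dictionary shape are all assumptions made for illustration, and the sketch does not use the OpenEnv API.

```python
from dataclasses import dataclass


@dataclass
class CounterEnv:
    """Hypothetical stand-in for the demo environment (not the Space's code)."""

    counter: int = 0
    done: bool = False

    def step(self, action: str, amount: int = 1) -> dict:
        """Apply one action and return the resulting state immediately."""
        if self.done:
            raise RuntimeError("episode already finished")
        if action == "noop":
            pass  # leave the counter unchanged
        elif action == "increment":
            self.counter += amount
        elif action == "finish":
            self.done = True
        else:
            raise ValueError(f"unknown action: {action!r}")
        # Assumed observation shape: the counter value and a done flag.
        return {"counter": self.counter, "done": self.done}


env = CounterEnv()
env.step("noop")
env.step("increment", 3)
print(env.step("finish"))
```

Because every action is a constant-time update with no external dependency, a design like this can respond immediately, which matches the README's description of an environment meant only for UI screenshots.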