Switch Space to minimal OpenEnv demo
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
-title:
-emoji:
+title: Minimal OpenEnv Demo
+emoji: ✅
 colorFrom: blue
 colorTo: indigo
 sdk: docker
@@ -11,107 +11,13 @@ tags:
 - openenv
 ---
 
-#
+# minimal-openenv-demo
 
-OpenEnv
+A tiny OpenEnv Space for UI screenshots.
 
-
+Actions:
+- `noop`
+- `increment(amount)`
+- `finish`
 
-
-
-- `end_turn`
-- `move_unit(unit_id, direction)`
-- `build_city(unit_id)`
-- `set_city_production(city_id, target)`
-- `set_research(tech_name)`
-
-Observations are text-first and include compact structured summaries of:
-
-- current turn
-- score
-- known and visible map tiles
-- units
-- cities
-- legal actions
-
-## Local development
-
-Install dependencies:
-
-```bash
-uv sync --extra dev
-```
-
-Run tests:
-
-```bash
-uv run pytest
-```
-
-Run the server:
-
-```bash
-uv run uvicorn freeciv_env.server.app:app --host 0.0.0.0 --port 8000
-```
-
-Run the fast GRPO loop:
-
-```bash
-uv sync --extra dev --extra train
-uv run python scripts/train_grpo_fast.py --env-url http://127.0.0.1 --max-steps 50
-```
-
-## Hackathon / Unsloth notes
-
-For the hackathon Colab submission path on H100s, Unsloth recommended the BF16 OpenEnv gpt-oss 20B notebook:
-
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb>
-
-If you adapt that notebook for this environment, reduce `max_steps` to `300` for a faster run.
-
-Useful notebook indexes:
-
-- RL notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl>
-- all notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks>
-- notebook repo: <https://github.com/unslothai/notebooks/tree/main/nb>
-
-If GRPO is too slow, start from a smaller notebook with `fast_inference = True` and add the Freeciv/OpenEnv calls:
-
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb>
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb>
-
-If vLLM GRPO fails, Unsloth suggested a clean virtualenv install:
-
-```bash
-python -m venv unsloth_env
-source unsloth_env/bin/activate
-pip install --upgrade pip && pip install uv
-uv pip install unsloth vllm --torch-backend=auto
-```
-
-If Unsloth is already installed, update it for the latest GRPO fixes:
-
-```bash
-pip install --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
-```
-
-## Live runtime requirements
-
-The default server app uses `freeciv-bot` against a local Freeciv Web runtime.
-
-Environment variables:
-
-- `FREECIV_SERVER_URL` (default: `http://127.0.0.1`)
-- `FREECIV_USERNAME` (default: `openenvbot`)
-- `FREECIV_CLIENT_PORT` (default: `6000`)
-- `FREECIV_TURN_TIMEOUT_S` (default: `60`)
-
-The included automated tests use a fake session backend, so they do not require a live Freeciv server.
-
-The GRPO training script uses:
-
-- `Qwen/Qwen3.5-0.8B`
-- Unsloth bf16 LoRA loading
-- TRL `GRPOTrainer`
-- integer-only action selection to minimize generated tokens
-- offline GRPO over env-sampled states for maximum throughput
+The environment only maintains a small counter and always responds immediately.
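
The three actions in the new README (`noop`, `increment(amount)`, `finish`) and its single counter could be modeled roughly as below. This is only a sketch under stated assumptions: `CounterState`, `MinimalCounterEnv`, and the `reset`/`step` signatures are hypothetical names chosen for illustration, not the Space's actual code and not the OpenEnv API. The commit does not say what happens after `finish`, so raising an error there is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CounterState:
    """Hypothetical state: one counter plus a flag set by `finish`."""
    count: int = 0
    done: bool = False


class MinimalCounterEnv:
    """Illustrative stand-in for the demo environment (assumed design)."""

    def __init__(self) -> None:
        self.state = CounterState()

    def reset(self) -> CounterState:
        # Start a fresh episode with the counter at zero.
        self.state = CounterState()
        return self.state

    def step(self, action: str, amount: int = 1) -> CounterState:
        # Assumption: stepping a finished episode is treated as an error.
        if self.state.done:
            raise RuntimeError("episode finished; call reset() first")
        if action == "noop":
            pass  # leave the counter unchanged
        elif action == "increment":
            self.state.count += amount
        elif action == "finish":
            self.state.done = True
        else:
            raise ValueError(f"unknown action: {action!r}")
        return self.state


env = MinimalCounterEnv()
env.reset()
env.step("noop")
env.step("increment", amount=3)
final = env.step("finish")
print(final.count, final.done)  # prints "3 True"
```

Every branch is a constant-time update with no I/O, which matches the README's statement that the environment always responds immediately.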