thomasm6m6 committed on
Commit 3650272 · verified · 1 parent: 7bbdee4

Switch Space to minimal OpenEnv demo

Files changed (1)
  1. README.md +9 -103
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
-title: Freeciv Environment Server
-emoji: 🎮
+title: Minimal OpenEnv Demo
+emoji:
 colorFrom: blue
 colorTo: indigo
 sdk: docker
@@ -11,107 +11,13 @@ tags:
 - openenv
 ---
 
-# freeciv-env
+# minimal-openenv-demo
 
-OpenEnv environment for Freeciv, built on top of `freeciv-bot`.
+A tiny OpenEnv Space for UI screenshots.
 
-## Current scope
+Actions:
+- `noop`
+- `increment(amount)`
+- `finish`
 
-This environment exposes a small, trainable action surface:
-
-- `end_turn`
-- `move_unit(unit_id, direction)`
-- `build_city(unit_id)`
-- `set_city_production(city_id, target)`
-- `set_research(tech_name)`
-
-Observations are text-first and include compact structured summaries of:
-
-- current turn
-- score
-- known and visible map tiles
-- units
-- cities
-- legal actions
-
-## Local development
-
-Install dependencies:
-
-```bash
-uv sync --extra dev
-```
-
-Run tests:
-
-```bash
-uv run pytest
-```
-
-Run the server:
-
-```bash
-uv run uvicorn freeciv_env.server.app:app --host 0.0.0.0 --port 8000
-```
-
-Run the fast GRPO loop:
-
-```bash
-uv sync --extra dev --extra train
-uv run python scripts/train_grpo_fast.py --env-url http://127.0.0.1 --max-steps 50
-```
-
-## Hackathon / Unsloth notes
-
-For the hackathon Colab submission path on H100s, Unsloth recommended the BF16 OpenEnv gpt-oss 20B notebook:
-
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb>
-
-If you adapt that notebook for this environment, reduce `max_steps` to `300` for a faster run.
-
-Useful notebook indexes:
-
-- RL notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl>
-- all notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks>
-- notebook repo: <https://github.com/unslothai/notebooks/tree/main/nb>
-
-If GRPO is too slow, start from a smaller notebook with `fast_inference = True` and add the Freeciv/OpenEnv calls:
-
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb>
-- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb>
-
-If vLLM GRPO fails, Unsloth suggested a clean virtualenv install:
-
-```bash
-python -m venv unsloth_env
-source unsloth_env/bin/activate
-pip install --upgrade pip && pip install uv
-uv pip install unsloth vllm --torch-backend=auto
-```
-
-If Unsloth is already installed, update it for the latest GRPO fixes:
-
-```bash
-pip install --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
-```
-
-## Live runtime requirements
-
-The default server app uses `freeciv-bot` against a local Freeciv Web runtime.
-
-Environment variables:
-
-- `FREECIV_SERVER_URL` (default: `http://127.0.0.1`)
-- `FREECIV_USERNAME` (default: `openenvbot`)
-- `FREECIV_CLIENT_PORT` (default: `6000`)
-- `FREECIV_TURN_TIMEOUT_S` (default: `60`)
-
-The included automated tests use a fake session backend, so they do not require a live Freeciv server.
-
-The GRPO training script uses:
-
-- `Qwen/Qwen3.5-0.8B`
-- Unsloth bf16 LoRA loading
-- TRL `GRPOTrainer`
-- integer-only action selection to minimize generated tokens
-- offline GRPO over env-sampled states for maximum throughput
+The environment only maintains a small counter and always responds immediately.
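The new README describes the demo environment only in prose: three actions and a counter. A minimal sketch of what such an environment could look like, assuming a plain-Python `reset()`/`step()` interface. `CounterEnv`, `CounterObservation`, the `done` flag, and the error handling are all hypothetical and are not the Space's actual code or the OpenEnv API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterObservation:
    # Hypothetical observation shape: the README only says a small counter is kept.
    counter: int
    done: bool


class CounterEnv:
    """Hypothetical counter environment; not the Space's actual implementation."""

    def __init__(self) -> None:
        self.counter = 0
        self.done = False

    def reset(self) -> CounterObservation:
        # Start a fresh episode with the counter at zero.
        self.counter = 0
        self.done = False
        return CounterObservation(self.counter, self.done)

    def step(self, action: str, amount: Optional[int] = None) -> CounterObservation:
        if self.done:
            raise RuntimeError("episode already finished; call reset() first")
        if action == "noop":
            pass  # leave the counter unchanged
        elif action == "increment":
            if amount is None:
                raise ValueError("increment requires an amount")
            self.counter += amount
        elif action == "finish":
            self.done = True  # end the episode; the counter keeps its value
        else:
            raise ValueError(f"unknown action: {action!r}")
        # No I/O and no external game server, so every step returns immediately.
        return CounterObservation(self.counter, self.done)


if __name__ == "__main__":
    env = CounterEnv()
    env.reset()
    env.step("increment", amount=3)
    env.step("noop")
    print(env.step("finish"))  # → CounterObservation(counter=3, done=True)
```

Rejecting unknown actions with `ValueError`, rather than silently treating them as `noop`, is a choice made for this sketch only; the README does not say how invalid actions are handled.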