Metadata-Version: 2.4
Name: freeciv-env
Version: 0.1.0
Summary: OpenEnv environment for Freeciv via freeciv-bot
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: openenv-core[core]==0.2.1
Requires-Dist: freecivbot @ git+https://github.com/chris1869/freeciv-bot.git
Requires-Dist: uvicorn>=0.35.0
Provides-Extra: dev
Requires-Dist: pytest>=8.4.1; extra == "dev"
Requires-Dist: requests>=2.32.5; extra == "dev"
Provides-Extra: train
Requires-Dist: accelerate>=1.10.0; extra == "train"
Requires-Dist: bitsandbytes>=0.47.0; extra == "train"
Requires-Dist: datasets>=4.0.0; extra == "train"
Requires-Dist: trl>=0.24.0; extra == "train"
Requires-Dist: unsloth>=2026.3.4; extra == "train"
---
title: Freeciv Environment Server
emoji: 🎮
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# freeciv-env
OpenEnv environment for Freeciv, built on top of `freeciv-bot`.
## Current scope
This environment exposes a small, trainable action surface:
- `end_turn`
- `move_unit(unit_id, direction)`
- `build_city(unit_id)`
- `set_city_production(city_id, target)`
- `set_research(tech_name)`
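As a rough illustration of the action surface above, the five actions can be modelled as one small record with per-action required arguments. This is only a sketch: the `Action` class, the `REQUIRED` table, and `validate` are hypothetical names, and the argument types (for example an integer `direction`) are assumptions rather than the environment's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical container; the real action type in freeciv_env may differ.
@dataclass(frozen=True)
class Action:
    kind: str
    unit_id: Optional[int] = None
    city_id: Optional[int] = None
    direction: Optional[int] = None  # assumption: directions are integer codes
    target: Optional[str] = None
    tech_name: Optional[str] = None


# Required arguments per action kind, mirroring the list above.
REQUIRED = {
    "end_turn": (),
    "move_unit": ("unit_id", "direction"),
    "build_city": ("unit_id",),
    "set_city_production": ("city_id", "target"),
    "set_research": ("tech_name",),
}


def validate(action: Action) -> None:
    """Raise ValueError if the action kind is unknown or an argument is missing."""
    if action.kind not in REQUIRED:
        raise ValueError(f"unknown action kind: {action.kind}")
    missing = [name for name in REQUIRED[action.kind] if getattr(action, name) is None]
    if missing:
        raise ValueError(f"{action.kind} is missing {missing}")
```

Keeping the required-argument table separate from the record makes it easy to reject malformed actions before they reach the game session.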
Observations are text-first and include compact structured summaries of:
- current turn
- score
- known and visible map tiles
- units
- cities
- legal actions
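To make "text-first" concrete, here is a minimal sketch of how such a summary could be rendered. The function name, the dictionary keys, and the line layout are all assumptions made for illustration; the environment's real observation format may look different. Map tiles are left out to keep the example short.

```python
def _join(items):
    """Join summary fragments, or report 'none' for an empty list."""
    return ", ".join(items) if items else "none"


def render_observation(turn, score, units, cities, legal_actions):
    """Build a compact text observation (hypothetical layout)."""
    lines = [f"Turn: {turn}", f"Score: {score}"]
    lines.append("Units: " + _join(
        [f"{u['id']}:{u['type']}@{u['x']},{u['y']}" for u in units]
    ))
    lines.append("Cities: " + _join(
        [f"{c['id']}:{c['name']}" for c in cities]
    ))
    # Number the legal actions so a policy can refer to them by index.
    lines.append("Legal actions:")
    for index, action in enumerate(legal_actions):
        lines.append(f"  {index}: {action}")
    return "\n".join(lines)


print(render_observation(
    turn=3,
    score=12,
    units=[{"id": 101, "type": "Settlers", "x": 4, "y": 7}],
    cities=[],
    legal_actions=["end_turn", "build_city(101)"],
))
```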
## Local development
Install dependencies:
```bash
uv sync --extra dev
```
Run tests:
```bash
uv run pytest
```
Run the server:
```bash
uv run uvicorn freeciv_env.server.app:app --host 0.0.0.0 --port 8000
```
Run the fast GRPO loop:
```bash
uv sync --extra dev --extra train
uv run python scripts/train_grpo_fast.py --env-url http://127.0.0.1 --max-steps 50
```
## Hackathon / Unsloth notes
For the hackathon Colab submission path on H100s, Unsloth recommended the BF16 OpenEnv gpt-oss 20B notebook:
- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb>
If you adapt that notebook for this environment, reduce `max_steps` to `300` for a faster run.
Useful notebook indexes:
- RL notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl>
- All notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks>
- Notebook repo: <https://github.com/unslothai/notebooks/tree/main/nb>
If GRPO is too slow, start from a smaller notebook with `fast_inference = True` and add the Freeciv/OpenEnv calls:
- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb>
- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb>
If vLLM GRPO fails, Unsloth suggested a clean virtualenv install:
```bash
python -m venv unsloth_env
source unsloth_env/bin/activate
pip install --upgrade pip && pip install uv
uv pip install unsloth vllm --torch-backend=auto
```
If Unsloth is already installed, update it for the latest GRPO fixes:
```bash
pip install --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
```
## Live runtime requirements
The default server app uses `freeciv-bot` against a local Freeciv Web runtime.
Environment variables:
- `FREECIV_SERVER_URL` (default: `http://127.0.0.1`)
- `FREECIV_USERNAME` (default: `openenvbot`)
- `FREECIV_CLIENT_PORT` (default: `6000`)
- `FREECIV_TURN_TIMEOUT_S` (default: `60`)
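The variables and defaults listed above could be read as in the following sketch. The `RuntimeConfig` class and `load_config` function are hypothetical names, and parsing the port as an integer and the timeout as a float is an assumption; the server app may read its settings differently.

```python
import os
from dataclasses import dataclass


# Hypothetical settings holder; field names are illustrative only.
@dataclass(frozen=True)
class RuntimeConfig:
    server_url: str
    username: str
    client_port: int
    turn_timeout_s: float


def load_config(env=None) -> RuntimeConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return RuntimeConfig(
        server_url=env.get("FREECIV_SERVER_URL", "http://127.0.0.1"),
        username=env.get("FREECIV_USERNAME", "openenvbot"),
        client_port=int(env.get("FREECIV_CLIENT_PORT", "6000")),
        turn_timeout_s=float(env.get("FREECIV_TURN_TIMEOUT_S", "60")),
    )
```

Passing a plain dictionary as `env` keeps the loader easy to exercise in tests without touching the process environment.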
The included automated tests use a fake session backend, so they do not require a live Freeciv server.
The GRPO training script (`scripts/train_grpo_fast.py`) uses:
- `Qwen/Qwen3.5-0.8B`
- Unsloth bf16 LoRA loading
- TRL `GRPOTrainer`
- integer-only action selection to minimize generated tokens
- offline GRPO over env-sampled states for maximum throughput
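Integer-only action selection can be pictured as follows: the model emits a single number, and that number is mapped onto the list of legal actions for the current state. This is a sketch under assumptions, not the training script's actual parsing code; `parse_action_index` is a hypothetical helper, and zero-based indexing is assumed.

```python
import re
from typing import Optional


def parse_action_index(completion: str, num_legal: int) -> Optional[int]:
    """Return the first integer in the completion if it is a valid index.

    Returns None when no integer is present or it falls outside
    the range [0, num_legal).
    """
    match = re.search(r"\d+", completion)
    if match is None:
        return None
    index = int(match.group())
    return index if 0 <= index < num_legal else None


legal_actions = ["end_turn", "build_city(101)", "set_research(Alphabet)"]
choice = parse_action_index(" 1", len(legal_actions))
if choice is not None:
    print(legal_actions[choice])
```

Returning `None` for unparseable or out-of-range output gives a reward function a clear signal for penalising invalid completions.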