Metadata-Version: 2.4
Name: freeciv-env
Version: 0.1.0
Summary: OpenEnv environment for Freeciv via freeciv-bot
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: openenv-core[core]==0.2.1
Requires-Dist: freecivbot @ git+https://github.com/chris1869/freeciv-bot.git
Requires-Dist: uvicorn>=0.35.0
Provides-Extra: dev
Requires-Dist: pytest>=8.4.1; extra == "dev"
Requires-Dist: requests>=2.32.5; extra == "dev"
Provides-Extra: train
Requires-Dist: accelerate>=1.10.0; extra == "train"
Requires-Dist: bitsandbytes>=0.47.0; extra == "train"
Requires-Dist: datasets>=4.0.0; extra == "train"
Requires-Dist: trl>=0.24.0; extra == "train"
Requires-Dist: unsloth>=2026.3.4; extra == "train"

---
title: Freeciv Environment Server
emoji: 🎮
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# freeciv-env

OpenEnv environment for Freeciv, built on top of `freeciv-bot`.

## Current scope

This environment exposes a small, trainable action surface:

- `end_turn`
- `move_unit(unit_id, direction)`
- `build_city(unit_id)`
- `set_city_production(city_id, target)`
- `set_research(tech_name)`

Observations are text-first and include compact structured summaries of:

- current turn
- score
- known and visible map tiles
- units
- cities
- legal actions
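
As a rough illustration of the action surface above, the five actions could be modelled as small immutable records. This is a sketch only: the class names, the `describe` helper, and the choice of an integer `direction` are assumptions made for this example and are not taken from the `freeciv_env` package.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical action records; field names follow the signatures listed
# above, class names are assumptions for illustration only.
@dataclass(frozen=True)
class EndTurn:
    pass


@dataclass(frozen=True)
class MoveUnit:
    unit_id: int
    direction: int  # assumed to be an integer direction index


@dataclass(frozen=True)
class BuildCity:
    unit_id: int


@dataclass(frozen=True)
class SetCityProduction:
    city_id: int
    target: str


@dataclass(frozen=True)
class SetResearch:
    tech_name: str


Action = Union[EndTurn, MoveUnit, BuildCity, SetCityProduction, SetResearch]


def describe(action: Action) -> str:
    """Render an action in the call-like notation used in the list above."""
    if isinstance(action, EndTurn):
        return "end_turn"
    if isinstance(action, MoveUnit):
        return f"move_unit({action.unit_id}, {action.direction})"
    if isinstance(action, BuildCity):
        return f"build_city({action.unit_id})"
    if isinstance(action, SetCityProduction):
        return f"set_city_production({action.city_id}, {action.target})"
    if isinstance(action, SetResearch):
        return f"set_research({action.tech_name})"
    raise TypeError(f"unsupported action: {action!r}")
```

Keeping the set this small means every legal action in an observation can be rendered as one short line of text, which suits a text-first observation format.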

## Local development

Install dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Run the server:

```bash
uv run uvicorn freeciv_env.server.app:app --host 0.0.0.0 --port 8000
```

Run the fast GRPO loop:

```bash
uv sync --extra dev --extra train
uv run python scripts/train_grpo_fast.py --env-url http://127.0.0.1 --max-steps 50
```

## Hackathon / Unsloth notes

For the hackathon Colab submission path on H100s, Unsloth recommended the BF16 OpenEnv gpt-oss 20B notebook:

- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb>

If you adapt that notebook for this environment, reduce `max_steps` to `300` for a faster run.

Useful notebook indexes:

- RL notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl>
- all notebooks: <https://unsloth.ai/docs/get-started/unsloth-notebooks>
- notebook repo: <https://github.com/unslothai/notebooks/tree/main/nb>

If GRPO is too slow, start from a smaller notebook with `fast_inference = True` and add the Freeciv/OpenEnv calls:

- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb>
- <https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb>

If vLLM GRPO fails, Unsloth suggested a clean virtualenv install:

```bash
python -m venv unsloth_env
source unsloth_env/bin/activate
pip install --upgrade pip && pip install uv
uv pip install unsloth vllm --torch-backend=auto
```

If Unsloth is already installed, update it for the latest GRPO fixes:

```bash
pip install --upgrade --no-cache-dir --no-deps unsloth unsloth_zoo
```

## Live runtime requirements

The default server app uses `freeciv-bot` against a local Freeciv Web runtime.

Environment variables:

- `FREECIV_SERVER_URL` (default: `http://127.0.0.1`)
- `FREECIV_USERNAME` (default: `openenvbot`)
- `FREECIV_CLIENT_PORT` (default: `6000`)
- `FREECIV_TURN_TIMEOUT_S` (default: `60`)
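
A minimal sketch of how these variables and their defaults might be read into one settings object. The variable names and default values come from the list above; the `FreecivSettings` class and `load_settings` function are hypothetical names used only for this example, and the real server app may parse its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FreecivSettings:
    """Hypothetical container for the four documented settings."""

    server_url: str
    username: str
    client_port: int
    turn_timeout_s: float


def load_settings(env: Mapping[str, str] = os.environ) -> FreecivSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return FreecivSettings(
        server_url=env.get("FREECIV_SERVER_URL", "http://127.0.0.1"),
        username=env.get("FREECIV_USERNAME", "openenvbot"),
        # Port and timeout arrive as strings and are converted here.
        client_port=int(env.get("FREECIV_CLIENT_PORT", "6000")),
        turn_timeout_s=float(env.get("FREECIV_TURN_TIMEOUT_S", "60")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise in tests, for example `load_settings({"FREECIV_CLIENT_PORT": "6001"})`.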

The included automated tests use a fake session backend, so they do not require a live Freeciv server.

The GRPO training script uses:

- `Qwen/Qwen3.5-0.8B`
- Unsloth bf16 LoRA loading
- TRL `GRPOTrainer`
- integer-only action selection to minimize generated tokens
- offline GRPO over env-sampled states for maximum throughput
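
To make the integer-only action selection idea concrete, here is one way it could work: number the legal actions in the prompt, then accept a completion only if it is a single valid index. This is a sketch under assumptions. The function names `build_action_menu` and `parse_choice` are hypothetical, and the actual prompt format and parsing in `scripts/train_grpo_fast.py` may differ.

```python
import re
from typing import Optional, Sequence


def build_action_menu(legal_actions: Sequence[str]) -> str:
    """Number each legal action so the model only has to emit an index."""
    return "\n".join(f"{i}: {action}" for i, action in enumerate(legal_actions))


def parse_choice(completion: str, num_actions: int) -> Optional[int]:
    """Return the chosen index, or None if the completion is not a valid one.

    Only a bare non-negative integer (optionally surrounded by whitespace)
    is accepted, which keeps the expected completion to a few tokens.
    """
    match = re.fullmatch(r"\s*(\d+)\s*", completion)
    if match is None:
        return None
    index = int(match.group(1))
    return index if index < num_actions else None
```

A rejected completion (`None`) can then be given a low reward, while a valid index is mapped back to the corresponding legal action from the observation.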