---
language:
- en
license: mit
tags:
- reinforcement-learning
- q-learning
- game-ai
- teeworlds
- openenv
library_name: custom
pipeline_tag: reinforcement-learning
model-index:
- name: teeunit-agent
  results:
  - task:
      type: reinforcement-learning
      name: Game Playing
    dataset:
      type: custom
      name: TeeUnit Environment
    metrics:
    - type: reward
      value: 39.38
      name: Total Reward (20 episodes)
---

# TeeUnit Agent

Trained RL agents for the [TeeUnit Environment](https://huggingface.co/spaces/ziadbc/teeunit-env) - an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.
| |
|
## Environment

- **Space**: [ziadbc/teeunit-env](https://huggingface.co/spaces/ziadbc/teeunit-env)
- **GitHub**: [ziadgit/teeunit](https://github.com/ziadgit/teeunit)
- **Game**: Teeworlds 0.7.5 arena (simulation mode)

## Available Models

### Q-Learning Agent (Latest)

- **File**: `teeunit_qlearning_agent.json` / `teeunit_qlearning_agent.pkl`
- **Algorithm**: Tabular Q-Learning
- **Training**: 20 episodes, 938 steps
- **Total Reward**: 39.38

### Actions

The agent can perform 7 discrete actions:

| Action | Description |
|--------|-------------|
| `move left` | Move character left |
| `move right` | Move character right |
| `move none` | Stop moving |
| `jump` | Jump |
| `shoot pistol` | Fire pistol (weapon 1) |
| `shoot shotgun` | Fire shotgun (weapon 2) |
| `hook` | Use grappling hook |

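The seven actions above map onto the environment's tools. As a sketch, they could be encoded as a list of `{tool, args}` records; the exact layout of the `actions` field inside the model file may differ, but the `tool`/`args` keys match the access pattern used in the usage code below:

```python
# Hypothetical encoding of the 7-action table as tool calls.
# Weapon numbers follow the table above (pistol = 1, shotgun = 2).
ACTIONS = [
    {"tool": "move", "args": {"direction": "left"}},
    {"tool": "move", "args": {"direction": "right"}},
    {"tool": "move", "args": {"direction": "none"}},
    {"tool": "jump", "args": {}},
    {"tool": "shoot", "args": {"weapon": 1}},  # pistol
    {"tool": "shoot", "args": {"weapon": 2}},  # shotgun
    {"tool": "hook", "args": {}},
]
```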
## Usage

### Load and Use the Agent

```python
import json
import random

# Load the trained Q-table and action list
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Discretize the game status text into a state key."""
    state = []
    for line in status_text.split('\n'):
        if 'Position:' in line:
            try:
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x // 100)}_{int(y // 100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health // 3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except ValueError:
                pass
    return str(tuple(sorted(state))) if state else "('default',)"

def choose_action(state_key):
    """Return the index of the best known action, or a random one for unseen states."""
    if state_key in q_table:
        q_values = q_table[state_key]
        best_action = max(q_values, key=q_values.get)
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage (status_text comes from the environment's get_status tool)
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")
```

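As a quick sanity check of the discretization above, here is a hand-worked example. The status lines are hypothetical but follow the patterns `get_state_key` looks for (`Position: (x, y)`, `Health: h/max`, `... units away`):

```python
# Hypothetical status lines and the buckets the parser above derives from them:
#   "Position: (350.0, 120.0)"        -> 100-unit grid cell
#   "Health: 7/10"                    -> health band of 3
#   "Enemy spotted, 85.5 units away"  -> close (< 100) / mid (< 200) / far

pos_bucket = f"pos_{int(350.0 // 100)}_{int(120.0 // 100)}"  # 'pos_3_1'
hp_bucket = f"hp_{7 // 3}"                                   # 'hp_2'
enemy_bucket = "enemy_close"                                 # 85.5 < 100

# Same sorted-tuple key that get_state_key() would return
state_key = str(tuple(sorted([pos_bucket, hp_bucket, enemy_bucket])))
print(state_key)  # ('enemy_close', 'hp_2', 'pos_3_1')
```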
### Connect to Environment

```python
import asyncio
import json

import websockets

async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'

    async with websockets.connect(uri) as ws:
        # Reset the environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()

        # Request the current game status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']

        # Choose and execute an action (get_state_key, choose_action, and
        # actions are defined in the loading snippet above)
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]

        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())
```

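Each request in the snippet above repeats the same `{'type': 'step', 'data': {...}}` envelope; a small helper (hypothetical, not part of the environment package) can keep that boilerplate in one place:

```python
import json

def make_step_message(tool_name, arguments=None):
    """Build the JSON envelope for a single call_tool step."""
    return json.dumps({
        "type": "step",
        "data": {
            "type": "call_tool",
            "tool_name": tool_name,
            "arguments": arguments or {},
        },
    })

# e.g. await ws.send(make_step_message("move", {"direction": "left"}))
```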
## Training Your Own Agent

See the [Colab notebook](https://github.com/ziadgit/teeunit/blob/main/notebooks/teeunit_training.ipynb) for training examples using:

- **Q-Learning** (tabular)
- **Stable Baselines3** (PPO, A2C)
- **Unsloth/TRL** (LLM fine-tuning)

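For the tabular option, the core of training is the standard one-step Q-Learning update, `Q(s,a) += α * (r + γ * max_a' Q(s',a') - Q(s,a))`. A minimal sketch; the hyperparameter values here are illustrative, not the ones used for this checkpoint:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # illustrative learning rate and discount factor

q_table = defaultdict(lambda: defaultdict(float))

def q_update(state, action, reward, next_state):
    """Apply one tabular Q-Learning update."""
    best_next = max(q_table[next_state].values(), default=0.0)
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])

q_update("s0", 3, 1.0, "s1")  # Q(s0, 3) becomes 0.1 * (1.0 + 0.95 * 0 - 0) = 0.1
```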
## Environment API

The TeeUnit environment exposes these MCP tools:

| Tool | Arguments | Description |
|------|-----------|-------------|
| `move` | `direction: "left"\|"right"\|"none"` | Move horizontally |
| `jump` | - | Jump (can double-jump) |
| `aim` | `x: int, y: int` | Aim at coordinates |
| `shoot` | `weapon: 0-5` | Fire weapon |
| `hook` | - | Toggle grappling hook |
| `get_status` | - | Get game state as text |

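Tools can be combined within an episode, for example aiming before firing. A sketch of such a two-step sequence as `call_tool` payloads (the target coordinates are arbitrary; weapon 1 is the pistol per the actions table above):

```python
# Hypothetical combat sequence: aim at a target, then fire the pistol.
target_x, target_y = 420, 180  # arbitrary example coordinates

sequence = [
    {"type": "call_tool", "tool_name": "aim", "arguments": {"x": target_x, "y": target_y}},
    {"type": "call_tool", "tool_name": "shoot", "arguments": {"weapon": 1}},
]

for payload in sequence:
    print(payload["tool_name"], payload["arguments"])
```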
## License

MIT License - see the [GitHub repo](https://github.com/ziadgit/teeunit) for details.