---
language:
- en
license: mit
tags:
- reinforcement-learning
- q-learning
- game-ai
- teeworlds
- openenv
library_name: custom
pipeline_tag: reinforcement-learning
model-index:
- name: teeunit-agent
  results:
  - task:
      type: reinforcement-learning
      name: Game Playing
    dataset:
      type: custom
      name: TeeUnit Environment
    metrics:
    - type: reward
      value: 39.38
      name: Total Reward (20 episodes)
---

# TeeUnit Agent

Trained reinforcement learning (RL) agents for the [TeeUnit Environment](https://huggingface.co/spaces/ziadbc/teeunit-env), an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.

## Environment

- **Space**: [ziadbc/teeunit-env](https://huggingface.co/spaces/ziadbc/teeunit-env)
- **GitHub**: [ziadgit/teeunit](https://github.com/ziadgit/teeunit)
- **Game**: Teeworlds 0.7.5 arena (simulation mode)

## Available Models

### Q-Learning Agent (Latest)
- **File**: `teeunit_qlearning_agent.json` / `teeunit_qlearning_agent.pkl`
- **Algorithm**: Tabular Q-Learning
- **Training**: 20 episodes, 938 steps
- **Total Reward**: 39.38 (over the 20 training episodes)
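
For readers unfamiliar with tabular Q-learning, the update it relies on can be sketched as follows. This is a minimal illustration, not the training code used for this agent: the learning rate `ALPHA`, the discount `GAMMA`, and the list-based table layout are assumptions, and the shipped JSON file may store its Q-values differently.

```python
from collections import defaultdict

N_ACTIONS = 7   # matches the 7 actions listed in this card
ALPHA = 0.1     # learning rate (assumed value, not taken from the training run)
GAMMA = 0.95    # discount factor (assumed value, not taken from the training run)

# q_table[state_key][action_index] -> estimated return (illustrative layout)
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def q_update(state, action, reward, next_state, done=False):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Bootstrap from the best action in the next state unless the episode ended
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
    return q_table[state][action]
```

Each call moves the stored value a fraction `ALPHA` of the way toward the one-step target `reward + GAMMA * max Q(next_state, ·)`.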

### Actions
The agent can perform 7 actions:

| Action | Description |
|--------|-------------|
| `move left` | Move character left |
| `move right` | Move character right |
| `move none` | Stop moving |
| `jump` | Jump |
| `shoot pistol` | Fire pistol (weapon 1) |
| `shoot shotgun` | Fire shotgun (weapon 2) |
| `hook` | Use grappling hook |
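
The usage code in this card reads each action as a dictionary with `tool` and `args` keys. A plausible shape for the 7-entry action list is sketched below. The ordering, the `ACTIONS` name, and the `describe` helper are hypothetical; the `actions` field in the model file is the authoritative source.

```python
# Hypothetical reconstruction of the action list; order and contents are assumed.
# Argument names (`direction`, `weapon`) follow the Environment API table.
ACTIONS = [
    {"tool": "move", "args": {"direction": "left"}},
    {"tool": "move", "args": {"direction": "right"}},
    {"tool": "move", "args": {"direction": "none"}},
    {"tool": "jump", "args": {}},
    {"tool": "shoot", "args": {"weapon": 1}},  # pistol
    {"tool": "shoot", "args": {"weapon": 2}},  # shotgun
    {"tool": "hook", "args": {}},
]

WEAPON_NAMES = {1: "pistol", 2: "shotgun"}

def describe(action):
    """Render an action with the label used in the table, e.g. 'move left'."""
    tool, args = action["tool"], action["args"]
    if tool == "move":
        return f"move {args['direction']}"
    if tool == "shoot":
        return f"shoot {WEAPON_NAMES.get(args['weapon'], args['weapon'])}"
    return tool
```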

## Usage

### Load and Use the Agent

```python
import json
import random

# Load model
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Discretize the get_status text into a Q-table key."""
    state = []
    for line in status_text.split('\n'):
        if 'Position:' in line:
            try:
                # Bucket coordinates into 100x100-unit cells
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x//100)}_{int(y//100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                # Bucket health in steps of 3
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health//3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                # Bucket enemy distance: <100 close, <200 mid, otherwise far
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except (IndexError, ValueError):
                pass
    # The Q-table is looked up by this stringified, sorted tuple
    return str(tuple(sorted(state))) if state else "('default',)"

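def choose_action(state_key):
    """Choose the greedy action for a known state, or a random one otherwise."""
    if state_key in q_table:
        q_values = q_table[state_key]
        # JSON object keys are strings, so convert the winner back to an index
        best_action = max(q_values, key=q_values.get)
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage. `status_text` is the text returned by the `get_status` tool;
# the sample below is illustrative only and may not match the real output.
status_text = "Position: (250, 310)\nHealth: 10/10\nEnemy at (300, 310), 50.0 units away"
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")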
```

### Connect to Environment

```python
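import asyncio
import json

import websockets  # third-party: pip install websockets

# Reuses get_state_key, choose_action, and actions from the previous snippet.
async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'
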
    async with websockets.connect(uri) as ws:
        # Reset environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()
        
        # Get status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']
        
        # Choose and execute action
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]
        
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())
```
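
The snippet above builds the same `step` message twice and indexes directly into the response. A small pair of helpers, sketched here, can keep a longer episode loop readable. The function names are hypothetical, and the message and response shapes are inferred only from the example above rather than from a published protocol specification.

```python
import json

def build_step_message(tool_name, arguments=None):
    """Serialize a `step` message that calls one tool (shape taken from the example above)."""
    return json.dumps({
        "type": "step",
        "data": {
            "type": "call_tool",
            "tool_name": tool_name,
            "arguments": arguments or {},
        },
    })

def parse_step_response(raw):
    """Return (reward, observation) from a raw response; missing fields become None."""
    data = json.loads(raw).get("data", {})
    return data.get("reward"), data.get("observation")
```

With these in place, each turn inside `play()` reduces to `await ws.send(build_step_message(...))` followed by `parse_step_response(await ws.recv())`.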

## Training Your Own Agent

See the [Colab notebook](https://github.com/ziadgit/teeunit/blob/main/notebooks/teeunit_training.ipynb) for training examples using:
- **Q-Learning** (tabular)
- **Stable Baselines3** (PPO, A2C)
- **Unsloth/TRL** (LLM fine-tuning)

## Environment API

The TeeUnit environment exposes these MCP tools:

| Tool | Arguments | Description |
|------|-----------|-------------|
| `move` | `direction: "left"\|"right"\|"none"` | Move horizontally |
| `jump` | - | Jump (can double-jump) |
| `aim` | `x: int, y: int` | Aim at coordinates |
| `shoot` | `weapon: 0-5` | Fire weapon |
| `hook` | - | Toggle grappling hook |
| `get_status` | - | Get game state as text |
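
When generating tool calls programmatically (for example from LLM output), it can help to check them against the table before sending. The sketch below encodes the table as a client-side check. `TOOL_SPECS` and `validate_tool_call` are hypothetical names, and treating every listed argument as required is an assumption; the server may accept defaults.

```python
def _is_int(value):
    """True for real integers, excluding booleans."""
    return isinstance(value, int) and not isinstance(value, bool)

# Argument checks derived from the tool table above
TOOL_SPECS = {
    "move": {"direction": lambda v: v in ("left", "right", "none")},
    "jump": {},
    "aim": {"x": _is_int, "y": _is_int},
    "shoot": {"weapon": lambda v: _is_int(v) and 0 <= v <= 5},
    "hook": {},
    "get_status": {},
}

def validate_tool_call(tool, arguments):
    """Return a list of problems; an empty list means the call matches the table."""
    if tool not in TOOL_SPECS:
        return [f"unknown tool: {tool}"]
    spec = TOOL_SPECS[tool]
    problems = [f"missing argument: {name}" for name in spec if name not in arguments]
    problems += [f"unexpected argument: {name}" for name in arguments if name not in spec]
    problems += [
        f"invalid value for {name}: {arguments[name]!r}"
        for name, check in spec.items()
        if name in arguments and not check(arguments[name])
    ]
    return problems
```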

## License

MIT License - See [GitHub repo](https://github.com/ziadgit/teeunit) for details.