---
language:
- en
license: mit
tags:
- reinforcement-learning
- q-learning
- game-ai
- teeworlds
- openenv
library_name: custom
pipeline_tag: reinforcement-learning
model-index:
- name: teeunit-agent
  results:
  - task:
      type: reinforcement-learning
      name: Game Playing
    dataset:
      type: custom
      name: TeeUnit Environment
    metrics:
    - type: reward
      value: 39.38
      name: Total Reward (20 episodes)
---

# TeeUnit Agent

Trained RL agents for the [TeeUnit Environment](https://huggingface.co/spaces/ziadbc/teeunit-env) - an OpenEnv-compatible Teeworlds arena for LLM-based reinforcement learning.
| |
|
## Environment

- **Space**: [ziadbc/teeunit-env](https://huggingface.co/spaces/ziadbc/teeunit-env)
- **GitHub**: [ziadgit/teeunit](https://github.com/ziadgit/teeunit)
- **Game**: Teeworlds 0.7.5 arena (simulation mode)

## Available Models

### Q-Learning Agent (Latest)

- **File**: `teeunit_qlearning_agent.json` / `teeunit_qlearning_agent.pkl`
- **Algorithm**: Tabular Q-Learning
- **Training**: 20 episodes, 938 steps
- **Total Reward**: 39.38

### Actions

The agent can perform 7 discrete actions:

| Action | Description |
|--------|-------------|
| `move left` | Move character left |
| `move right` | Move character right |
| `move none` | Stop moving |
| `jump` | Jump |
| `shoot pistol` | Fire pistol (weapon 1) |
| `shoot shotgun` | Fire shotgun (weapon 2) |
| `hook` | Use grappling hook |

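The seven actions above map onto the environment's tools. As a sketch, they could be encoded as a list of `{tool, args}` records; the exact layout of the `actions` field inside the model file may differ, but the `tool`/`args` keys match the access pattern used in the usage code below:

```python
# Hypothetical encoding of the 7-action table as tool calls.
# Weapon numbers follow the table above (pistol = 1, shotgun = 2).
ACTIONS = [
    {"tool": "move", "args": {"direction": "left"}},
    {"tool": "move", "args": {"direction": "right"}},
    {"tool": "move", "args": {"direction": "none"}},
    {"tool": "jump", "args": {}},
    {"tool": "shoot", "args": {"weapon": 1}},  # pistol
    {"tool": "shoot", "args": {"weapon": 2}},  # shotgun
    {"tool": "hook", "args": {}},
]
```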
## Usage

### Load and Use the Agent

```python
import json
import random

# Load the trained Q-table and action list
with open('teeunit_qlearning_agent.json') as f:
    model = json.load(f)

q_table = model['q_table']
actions = model['actions']

def get_state_key(status_text):
    """Discretize the game status text into a state key."""
    state = []
    for line in status_text.split('\n'):
        if 'Position:' in line:
            try:
                pos = line.split('(')[1].split(')')[0]
                x, y = map(float, pos.split(','))
                state.append(f'pos_{int(x // 100)}_{int(y // 100)}')
            except (IndexError, ValueError):
                state.append('pos_unknown')
        if 'Health:' in line:
            try:
                health = int(line.split(':')[1].split('/')[0].strip())
                state.append(f'hp_{health // 3}')
            except (IndexError, ValueError):
                pass
        if 'units away' in line:
            try:
                dist = float(line.split(',')[-1].replace('units away', '').strip())
                state.append(f'enemy_{"close" if dist < 100 else "mid" if dist < 200 else "far"}')
            except ValueError:
                pass
    return str(tuple(sorted(state))) if state else "('default',)"

def choose_action(state_key):
    """Return the index of the best known action, or a random one for unseen states."""
    if state_key in q_table:
        q_values = q_table[state_key]
        best_action = max(q_values, key=q_values.get)
        return int(best_action)
    return random.randint(0, len(actions) - 1)

# Example usage (status_text comes from the environment's get_status tool)
state_key = get_state_key(status_text)
action_idx = choose_action(state_key)
action = actions[action_idx]
print(f"Action: {action['tool']} with args {action['args']}")
```

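As a quick sanity check of the discretization above, here is a hand-worked example. The status lines are hypothetical but follow the patterns `get_state_key` looks for (`Position: (x, y)`, `Health: h/max`, `... units away`):

```python
# Hypothetical status lines and the buckets the parser above derives from them:
#   "Position: (350.0, 120.0)"        -> 100-unit grid cell
#   "Health: 7/10"                    -> health band of 3
#   "Enemy spotted, 85.5 units away"  -> close (< 100) / mid (< 200) / far

pos_bucket = f"pos_{int(350.0 // 100)}_{int(120.0 // 100)}"  # 'pos_3_1'
hp_bucket = f"hp_{7 // 3}"                                   # 'hp_2'
enemy_bucket = "enemy_close"                                 # 85.5 < 100

# Same sorted-tuple key that get_state_key() would return
state_key = str(tuple(sorted([pos_bucket, hp_bucket, enemy_bucket])))
print(state_key)  # ('enemy_close', 'hp_2', 'pos_3_1')
```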
### Connect to Environment

```python
import asyncio
import json

import websockets

async def play():
    uri = 'wss://ziadbc-teeunit-env.hf.space/ws'

    async with websockets.connect(uri) as ws:
        # Reset the environment
        await ws.send(json.dumps({'type': 'reset', 'data': {}}))
        await ws.recv()

        # Request the current game status
        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': 'get_status', 'arguments': {}}
        }))
        resp = json.loads(await ws.recv())
        status = resp['data']['observation']['result']['data']

        # Choose and execute an action (get_state_key, choose_action, and
        # actions are defined in the loading snippet above)
        state_key = get_state_key(status)
        action = actions[choose_action(state_key)]

        await ws.send(json.dumps({
            'type': 'step',
            'data': {'type': 'call_tool', 'tool_name': action['tool'], 'arguments': action['args']}
        }))
        resp = json.loads(await ws.recv())
        reward = resp['data']['reward']
        print(f"Reward: {reward}")

asyncio.run(play())
```

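Each request in the snippet above repeats the same `{'type': 'step', 'data': {...}}` envelope; a small helper (hypothetical, not part of the environment package) can keep that boilerplate in one place:

```python
import json

def make_step_message(tool_name, arguments=None):
    """Build the JSON envelope for a single call_tool step."""
    return json.dumps({
        "type": "step",
        "data": {
            "type": "call_tool",
            "tool_name": tool_name,
            "arguments": arguments or {},
        },
    })

# e.g. await ws.send(make_step_message("move", {"direction": "left"}))
```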
## Training Your Own Agent

See the [Colab notebook](https://github.com/ziadgit/teeunit/blob/main/notebooks/teeunit_training.ipynb) for training examples using:

- **Q-Learning** (tabular)
- **Stable Baselines3** (PPO, A2C)
- **Unsloth/TRL** (LLM fine-tuning)

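For the tabular option, the core of training is the standard one-step Q-Learning update, `Q(s,a) += α * (r + γ * max_a' Q(s',a') - Q(s,a))`. A minimal sketch; the hyperparameter values here are illustrative, not the ones used for this checkpoint:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95  # illustrative learning rate and discount factor

q_table = defaultdict(lambda: defaultdict(float))

def q_update(state, action, reward, next_state):
    """Apply one tabular Q-Learning update."""
    best_next = max(q_table[next_state].values(), default=0.0)
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])

q_update("s0", 3, 1.0, "s1")  # Q(s0, 3) becomes 0.1 * (1.0 + 0.95 * 0 - 0) = 0.1
```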
## Environment API

The TeeUnit environment exposes these MCP tools:

| Tool | Arguments | Description |
|------|-----------|-------------|
| `move` | `direction: "left"\|"right"\|"none"` | Move horizontally |
| `jump` | - | Jump (can double-jump) |
| `aim` | `x: int, y: int` | Aim at coordinates |
| `shoot` | `weapon: 0-5` | Fire weapon |
| `hook` | - | Toggle grappling hook |
| `get_status` | - | Get game state as text |

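Tools can be combined within an episode, for example aiming before firing. A sketch of such a two-step sequence as `call_tool` payloads (the target coordinates are arbitrary; weapon 1 is the pistol per the actions table above):

```python
# Hypothetical combat sequence: aim at a target, then fire the pistol.
target_x, target_y = 420, 180  # arbitrary example coordinates

sequence = [
    {"type": "call_tool", "tool_name": "aim", "arguments": {"x": target_x, "y": target_y}},
    {"type": "call_tool", "tool_name": "shoot", "arguments": {"weapon": 1}},
]

for payload in sequence:
    print(payload["tool_name"], payload["arguments"])
```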
## License

MIT License - see the [GitHub repo](https://github.com/ziadgit/teeunit) for details.