title: Carrom RL Env
emoji: 🎯
colorFrom: yellow
colorTo: red
sdk: docker
pinned: false
app_port: 8000
tags:
- openenv
Carrom RL Env
About
Carrom RL Env is an OpenEnv-compatible, physics-based reinforcement-learning environment for the South Asian board game Carrom. Pieces slide on a Pymunk-simulated board with Coulomb (boric-acid-style) kinetic friction, and every shot is scored under the full International Carrom Federation (ICF) rule set: due rule, queen cover, foul handling, and colour-based turn continuation.
The environment ships with LLM-friendly text actions ("aim at queen_0 with strong force from centre"), rich text-summary observations that include live rule reminders, and a green-agent evaluator (AgentBeats-style) that owns the task suite and scoring. Any policy (random, heuristic, LLM-behind-an-API, or a freshly GRPO-trained model) can therefore be benchmarked head-to-head on a consistent ICF-compliance score. It deploys as a single Docker container exposing both a FastAPI + WebSocket OpenEnv API at the root and a live Gradio board at the same URL for human or LLM auto-play.
Features
- Coulomb board friction: per-body velocity_func applies constant deceleration (not viscous drag), matching pieces on a boric-acid-powdered carrom surface
- ICF-compliant rules: due rule, queen cover, foul handling, colour-based turn continuation
- LLM-friendly: text actions ("aim at queen_0 with strong force") and rich board-state observations with rule reminders
- Multi-agent: single-agent API with automatic scripted opponent turns
- Green Agent (evaluator): task suite + ICF-aware scoring for purple-agent benchmarking, à la AgentBeats
- Deterministic: seeded resets for reproducible experiments
- OpenEnv standard: reset()/step()/state() API with WebSocket support
Installation
pip install -e .
Optional rendering:
pip install -e ".[render]"
Quick Start
As a client (connecting to a running Space)
import asyncio
from client import CarromEnv
from carrom_env.models import Action

async def main():
    async with CarromEnv(base_url="https://your-space.hf.space") as env:
        result = await env.reset()
        print(result.observation.text_summary)
        result = await env.step(Action(placement_x=0.0, angle=0.1, force=0.6))
        print(f"Reward: {result.reward}, Done: {result.done}")

asyncio.run(main())
Synchronous usage:
from client import CarromEnv
from carrom_env.models import Action
with CarromEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset()
    result = env.step(Action(placement_x=0.0, angle=0.0, force=0.6))
Local development
from carrom_env.env import CarromEnv
from carrom_env.models import Action
env = CarromEnv(seed=42)
obs = env.reset()
action = Action(placement_x=0.0, angle=0.0, force=0.6)
obs, reward, terminated, truncated, info = env.step(action)
Text actions (for LLM agents)
action = Action(action_type="text", text="aim at queen_0 with strong force from center")
obs, reward, terminated, truncated, info = env.step(action)
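How the environment parses text actions is not described here. Purely as an illustration, a phrase like the one above could be mapped to numeric fields with a keyword lookup. The function name parse_text_action, the keyword tables, and the force and placement values are all hypothetical and are not taken from the package's actual parser.

```python
import re

# Hypothetical keyword tables; the real environment may use other words and values.
FORCE_WORDS = {"soft": 0.3, "medium": 0.6, "strong": 0.9}
PLACEMENT_WORDS = {"left": -0.4, "center": 0.0, "centre": 0.0, "right": 0.4}


def parse_text_action(text):
    """Extract a target id, force, and baseline placement from a shot description."""
    lowered = text.lower()
    target = re.search(r"aim at (\w+)", lowered)
    # Fall back to a medium, centred shot when a keyword is missing.
    force = next(
        (v for k, v in FORCE_WORDS.items() if re.search(rf"\b{k}\b", lowered)), 0.6
    )
    placement = next(
        (v for k, v in PLACEMENT_WORDS.items() if re.search(rf"\bfrom {k}\b", lowered)),
        0.0,
    )
    return {
        "target": target.group(1) if target else None,
        "force": force,
        "placement_x": placement,
    }


parsed = parse_text_action("aim at queen_0 with strong force from center")
print(parsed)
```

Turning the target id into a shot angle would additionally need the target's board position from the observation, which this sketch leaves out.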
Game Rules (ICF-Compliant)
This environment implements the key rules from the International Carrom Federation (ICF).
Board & Pieces
- 9 black coins, 9 white coins, 1 queen (red): 19 pieces total
- Agent plays white; opponent plays black
- Four corner pockets
Shooting
- On each turn the player places their striker anywhere on their baseline and shoots
- Striker placement is automatically nudged away from any coin sitting on the baseline
Scoring & Turn Continuation
- Pocket your own colour → +1 point, take another turn
- Pocket the queen → +3 points; you must then "cover" it (see below)
- Miss (no own coin pocketed) → turn passes to opponent
Due Rule
- If you pocket your opponent's colour, that coin is returned to the board centre
- You score nothing for it and your turn ends, even if you also pocketed your own coins on the same shot; turn continuation applies only when a shot pockets own-colour coins and no opponent coin
Queen Cover Rule
- After pocketing the queen you must pocket one of your own coins on the same shot or on your next turn to "cover" it
- If you fail to cover, the queen is returned to the board centre and your queen points are reversed
Foul
- Pocketing the striker is a foul
- One of your previously pocketed coins is returned to the board centre
- Your turn ends and passes to the opponent
Win Condition
All coins cleared from the board → game ends; the player with the higher score wins.
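The turn-continuation, due, foul, and queen-cover rules above can be condensed into a small decision function. This is a sketch of the rules as written in this section, not the code in env.py; the function name resolve_shot and its return keys are hypothetical, and score changes are deliberately left out.

```python
def resolve_shot(own_coins, opponent_coins, queen, striker_pocketed):
    """Summarise the rule consequences of one shot (hypothetical helper).

    own_coins / opponent_coins: number of coins of each colour pocketed.
    queen: True if the queen was pocketed on this shot.
    striker_pocketed: True if the striker fell into a pocket (foul).
    """
    due = opponent_coins > 0          # due rule: opponent coin goes back to centre
    foul = striker_pocketed           # foul: turn ends, one own coin is returned
    # The turn continues only on a clean shot that pockets an own-colour coin.
    keeps_turn = own_coins > 0 and not due and not foul
    # The queen is covered immediately if an own coin went in on the same shot.
    queen_cover_pending = queen and own_coins == 0
    return {
        "keeps_turn": keeps_turn,
        "coins_returned_to_centre": opponent_coins,
        "foul": foul,
        "queen_cover_pending": queen_cover_pending,
    }


print(resolve_shot(own_coins=1, opponent_coins=0, queen=False, striker_pocketed=False))
print(resolve_shot(own_coins=1, opponent_coins=1, queen=False, striker_pocketed=False))
```

The second call shows the due rule overriding continuation: an own coin went in, but the turn still ends.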
ICF Compliance Table
| Rule | Status | Notes |
|---|---|---|
| 9 black + 9 white + 1 queen | Yes | Full piece complement |
| Agent = white, Opponent = black | Yes | Enforced throughout |
| Score 1 pt per own coin | Yes | |
| Queen = 3 pts | Yes | Simplified from ICF face-value (1-9) |
| Due rule: opponent's coin returns to centre, no score, turn ends | Yes | |
| Queen cover rule: cover on same/next shot or queen returns | Yes | |
| Foul: striker pocketed returns own coin, ends turn | Yes | |
| Turn continuation on own-colour pocket only | Yes | Due coins do not extend turn |
| Baseline shooting with obstruction check | Yes | Striker nudged clear of coins |
| Coulomb board friction (boric-acid surface, μ_k ≈ 0.04) | Yes | BOARD_DECEL = 2.5 units/s² via velocity_func |
| Elastic rubber cushion walls | Yes | ELASTICITY = 0.92 |
| Pocket capture (no corner dead zones) | Yes | pocket_capture_radius = 0.09 decoupled from wall gap |
| Numbered coin scoring (ICF 1-9 per colour) | No | Simplified to 1 pt per coin |
| Touch-coin / out-of-turn penalties | No | Not applicable for AI agents |
Physics Design
Coulomb Board Friction
Real carrom boards are dusted with boric acid powder, giving a kinetic friction coefficient of roughly μ_k ≈ 0.02-0.05. Unlike viscous drag (speed-proportional), sliding friction produces constant deceleration regardless of a piece's current speed.
This environment implements Coulomb friction via Pymunk's body.velocity_func callback on every piece and the striker:
a_friction = BOARD_DECEL # 2.5 units/s², equivalent to μ_k ≈ 0.04 on a normalised board
With BOARD_DECEL = 2.5:
- Full-force shot (v₀ ≈ 5 units/s): pieces settle in ~2 seconds after bouncing
- Medium shot (v₀ ≈ 2.5 units/s): pieces settle in ~1 second
- The simulation ends early once all pieces drop below SETTLE_VELOCITY = 0.02 units/s
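The settle times quoted above follow directly from constant deceleration (t ≈ v₀ / BOARD_DECEL). The sketch below applies the same per-step speed reduction that a velocity callback would apply, in plain Python without Pymunk. The helper names coulomb_step and settle_time and the 1/240 s time step are assumptions made for illustration, and wall bounces and collisions are ignored.

```python
import math

BOARD_DECEL = 2.5        # units/s^2, value quoted in this README
SETTLE_VELOCITY = 0.02   # units/s, value quoted in this README


def coulomb_step(vx, vy, dt, decel=BOARD_DECEL):
    """Reduce speed by a constant decel * dt while keeping the direction."""
    speed = math.hypot(vx, vy)
    drop = decel * dt
    if speed <= drop:
        return 0.0, 0.0           # friction never reverses the motion
    scale = (speed - drop) / speed
    return vx * scale, vy * scale


def settle_time(v0, dt=1.0 / 240.0):
    """Time for a free-sliding piece to fall to SETTLE_VELOCITY or below."""
    vx, vy, t = v0, 0.0, 0.0
    while math.hypot(vx, vy) > SETTLE_VELOCITY:
        vx, vy = coulomb_step(vx, vy, dt)
        t += dt
    return t


print(round(settle_time(5.0), 2), round(settle_time(2.5), 2))
```

With viscous drag the speed would decay exponentially and never hit a fixed stopping time, so the linear ramp here is what makes the "~2 seconds" and "~1 second" figures predictable.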
Contact Physics
Shape-to-shape contact friction (FRICTION = 0.15) handles the interaction between colliding pieces and between pieces and the rubber-cushioned walls. Collision restitution is ELASTICITY = 0.92, reflecting the near-elastic bounce of polished wooden pieces off a rubber cushion.
Pocket Detection Geometry
Pocket capture uses two separate radii to handle a subtle geometry problem:
| Field | Value | Purpose |
|---|---|---|
| pocket_radius | 0.06 | Visual pocket size; also the wall gap at each corner |
| pocket_capture_radius | 0.09 | Radius within which a piece is considered pocketed |
Why they differ: walls have a wall_thickness = 0.02 and end at pocket_radius from each corner. Pymunk segments have rounded endcaps, so a piece (radius 0.03) moving along a wall is constrained to stay at distance ≥ 0.05 from the wall endcap. A piece can therefore come to rest at e.g. (-0.44, -0.45), which is inside the pocket gap but at distance ≈ 0.078 from the corner and so outside the old 0.06 detection radius (a "dead zone"). pocket_capture_radius = pocket_radius + coin_radius = 0.09 fires as soon as the coin's edge reaches the pocket rim, eliminating the dead zone.
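The dead-zone arithmetic can be checked in a few lines. The corner position (-0.5, -0.5) is inferred from the example distance quoted above rather than stated explicitly, and is_pocketed is a hypothetical helper, not the environment's own check.

```python
import math

POCKET_RADIUS = 0.06
COIN_RADIUS = 0.03
POCKET_CAPTURE_RADIUS = POCKET_RADIUS + COIN_RADIUS  # 0.09

# Assumption: the board spans [-0.5, 0.5], so this is the bottom-left pocket.
CORNER = (-0.5, -0.5)


def is_pocketed(position, corner=CORNER, capture_radius=POCKET_CAPTURE_RADIUS):
    """True when the coin centre lies within capture_radius of the pocket."""
    return math.dist(position, corner) <= capture_radius


resting = (-0.44, -0.45)
distance = math.dist(resting, CORNER)
print(f"distance to corner: {distance:.3f}")
print("old radius catches it:", is_pocketed(resting, capture_radius=POCKET_RADIUS))
print("capture radius catches it:", is_pocketed(resting))
```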
Action Space
| Field | Type | Description |
|---|---|---|
| action_type | str | "numeric" (default) or "text" for natural-language actions |
| placement_x | float | Striker placement along baseline [-0.4, 0.4], 0 = center |
| angle | float | Shot angle in radians, 0 = straight ahead toward +y |
| force | float | Normalized shot force in [0, 1] |
| text | str | Natural-language shot description (when action_type="text") |
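To make the numeric fields concrete, here is one way they could translate into an initial striker velocity. It assumes force scales linearly up to the ~5 units/s full-force speed mentioned in the physics section, and that positive angles rotate the shot toward +x; neither assumption is confirmed by the source, and action_to_velocity is a hypothetical helper.

```python
import math

MAX_SPEED = 5.0  # units/s at force = 1.0 (assumed linear mapping)


def action_to_velocity(placement_x, angle, force):
    """Clamp the inputs to their documented ranges and return (x, (vx, vy))."""
    x = min(max(placement_x, -0.4), 0.4)   # baseline range from the table
    f = min(max(force, 0.0), 1.0)          # normalised force range
    speed = f * MAX_SPEED
    # angle = 0 points straight ahead along +y.
    velocity = (speed * math.sin(angle), speed * math.cos(angle))
    return x, velocity


x, (vx, vy) = action_to_velocity(placement_x=0.0, angle=0.1, force=0.6)
print(x, round(vx, 3), round(vy, 3))
```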
Observation
| Field | Type | Description |
|---|---|---|
| positions | List[List[float]] | [N, 2] positions for striker + coins |
| velocities | List[List[float]] | [N, 2] velocities |
| pocketed | List[bool] | [N] pocketed flags |
| agent_score | int | Agent's current score |
| opponent_score | int | Opponent's current score |
| current_player | str | "agent" or "opponent" |
| remaining_coins | int | Coins still on the board |
| coins | List[CoinInfo] | Per-coin details with nearest pocket info |
| text_summary | str | Rich text board state for LLM prompting (includes rule reminders) |
Reward Design
| Event | Reward | Description |
|---|---|---|
| Each agent turn | −0.01 | Small negative to encourage efficiency |
| Own coin potted | +1.0 | Per own-colour coin pocketed |
| Queen potted | +3.0 | Queen is worth 3× |
| Due coin (opponent's colour potted) | −0.3 | Coin returned to centre; teaches avoidance |
| Foul (striker pocketed) | −1.5 | Score −1 plus −0.5 extra penalty |
| Win (cleared board, agent leads) | +5.0 | Bonus for winning |
| Loss (cleared board, opponent leads) | −2.0 | Penalty for losing |
| Opponent scores | −0.5× | Partial penalty when opponent pots own coins |
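As a rough check on how the table combines, the sketch below adds the listed terms for a single agent turn. It assumes the components simply sum and that the "−0.5×" entry multiplies the points the opponent scored on its following turns; the function name shot_reward and its parameters are hypothetical and the environment's actual reward code may differ.

```python
def shot_reward(own_coins=0, queen=False, dues=0, foul=False,
                opponent_points=0, won=False, lost=False):
    """Sum the reward-table terms for one agent turn (illustrative only)."""
    reward = -0.01                      # per-turn efficiency cost
    reward += 1.0 * own_coins           # own-colour coins pocketed
    reward += 3.0 if queen else 0.0     # queen bonus
    reward -= 0.3 * dues                # opponent coins pocketed (due rule)
    reward -= 1.5 if foul else 0.0      # striker pocketed
    reward -= 0.5 * opponent_points     # assumed reading of the 0.5x penalty
    reward += 5.0 if won else 0.0       # terminal win bonus
    reward -= 2.0 if lost else 0.0      # terminal loss penalty
    return reward


print(round(shot_reward(own_coins=1), 2))
print(round(shot_reward(own_coins=2, dues=1), 2))
print(round(shot_reward(foul=True), 2))
```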
info Dict Keys
| Key | Type | Description |
|---|---|---|
| sim_steps | float | Physics steps taken this turn |
| energy | float | Cumulative kinetic energy this turn |
| coin_potted | float | Own coins pocketed this turn |
| due_coins | float | Opponent's coins returned to centre (due rule) |
| foul | float | 1.0 if striker was pocketed |
| queen_cover_pending | bool | True if queen cover is still required |
| placement_x_actual | float | Actual striker x after obstruction nudge |
Green Agent (Evaluator)
In the AgentBeats / AgentX taxonomy:
- 🟢 Green Agent: evaluator; defines tasks, environment, and scoring
- 🟣 Purple Agent: competitor; the AI being tested (any Callable[[Observation], Action])
- 🔴 Red Agent: adversarial tester (not used here)
GreenCarromAgent is the green agent for this benchmark. It owns:
- A task suite: curated seeded boards across easy/standard/hard tiers
- The environment: wraps CarromEnv with full ICF rules
- Scoring: ICF-aware metrics (reward, win rate, ICF compliance from dues/fouls) plus compute efficiency
from carrom_env.green_agent import GreenCarromAgent, Task
from carrom_env.models import Action

def my_purple_agent(obs):
    # Fixed shot regardless of the observation; replace with a real policy
    return Action(placement_x=0.0, angle=0.1, force=0.6)

# Default suite: 3 easy + 3 standard + 3 hard tasks
evaluator = GreenCarromAgent()
report = evaluator.evaluate(my_purple_agent, verbose=True)
print(report.summary())
# {'n_tasks': 9, 'avg_reward': ..., 'win_rate': ..., 'icf_compliance': ...,
#  'efficiency_score': ..., ...}

# Or define a custom suite
tasks = [Task(task_id="focus", seed=0, max_turns=30, tier="standard")]
report = GreenCarromAgent(tasks=tasks).evaluate(my_purple_agent)
report.by_tier()  # per-tier breakdown
Scoring metrics
| Metric | Type | Description |
|---|---|---|
| avg_reward | game | Mean episode reward across the suite |
| win_rate | game | Fraction of tasks where agent beat the opponent |
| avg_coins_potted | game | Mean own-coins pocketed per task |
| avg_dues | ICF | Mean opponent-coin pockets per task (lower = better) |
| avg_fouls | ICF | Mean strikers pocketed per task (lower = better) |
| icf_compliance | ICF | 1 − (dues + fouls) / turns, the fraction of shots obeying ICF rules |
| total_sim_steps | compute | Total physics steps across all tasks |
| efficiency_score | compute | Coins potted per 1000 sim steps |
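The two derived metrics are simple ratios, sketched below from the table's definitions. The function names mirror the metric names but are written here for illustration and are not the evaluator's own code; the sim-step count in the example is a made-up round number.

```python
def icf_compliance(dues, fouls, turns):
    """Fraction of turns with no due and no foul: 1 - (dues + fouls) / turns."""
    if turns <= 0:
        raise ValueError("turns must be positive")
    return 1.0 - (dues + fouls) / turns


def efficiency_score(coins_potted, sim_steps):
    """Own coins pocketed per 1000 physics steps."""
    if sim_steps <= 0:
        raise ValueError("sim_steps must be positive")
    return 1000.0 * coins_potted / sim_steps


# 1 due and 5 fouls over 200 turns
print(round(icf_compliance(dues=1, fouls=5, turns=200), 2))
# 8 coins over a hypothetical 10,000 physics steps
print(round(efficiency_score(coins_potted=8, sim_steps=10_000), 3))
```

Because turns is the denominator, a long episode dilutes each due or foul, so compliance percentages are only comparable between runs with similar turn counts.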
What max_turns counts
Task.max_turns (and MAX_STEPS in the inference script) counts combined agent + opponent turns: the env's internal turn counter increments once for every played shot, on either side. A setting of max_turns=200 therefore caps the episode at ~100 agent shots + ~100 heuristic-opponent shots. Set it to 400 if you want roughly 200 agent shots, or pass a custom Task with whatever cap you need.
Benchmark Results
Full inference logs live in inference_runs/.
MiniMaxAI/MiniMax-M2.5-fast (Nebius)
1 task × 200 turns (seed 0), via https://api.tokenfactory.us-central1.nebius.com/v1.
Full log: inference_runs/minimax-m2.5-fast_200turns_inference.log
| Purple Agent | Reward | Win% | Coins | Dues | ICF% | Efficiency |
|---|---|---|---|---|---|---|
| Random | −4.21 | 0% | 4.0 | 2.00 | 98% | 0.368 |
| Heuristic (ICF-aware) | −6.31 | 0% | 4.0 | 0.00 | 100% | 0.337 |
| LLM · MiniMax-M2.5-fast | +4.07 | 100% | 8.0 | 1.00 | 97% | 0.759 |
MiniMax-M2.5-fast wins the board at 8 coins potted with 1 due and 5 fouls, beating both baselines on reward and efficiency. The heuristic tanks to −6.31 because it's aggressive about shooting at white coins but hands the opponent easy board position on misses.
nvidia/Nemotron-3-Super-120b-a12b (Nebius)
1 task × 200 turns (seed 0), via https://api.tokenfactory.us-central1.nebius.com/v1.
Full log: inference_runs/nemotron-3-super-120b_200turns_inference.log
Wall-clock: 2001.6 s (~33 min) for the full LLM episode.
| Purple Agent | Reward | Win% | Coins | Dues | ICF% | Efficiency |
|---|---|---|---|---|---|---|
| Random | +4.17 | 100% | 8.0 | 1.00 | 100% | 0.712 |
| Heuristic (ICF-aware) | −6.31 | 0% | 4.0 | 0.00 | 100% | 0.337 |
| LLM · Nemotron-3-Super-120b | +11.94 | 100% | 13.0 | 2.00 | 97% | 1.179 |
Nemotron, NVIDIA's 120B hybrid-MoE reasoning model, matches or beats both baselines on every game metric (it ties Random on win rate at 100%) and has the highest compute efficiency I've seen so far at 1.18 coins potted per 1000 sim steps. It had one parse-failure fallback over 106 agent shots, with 2 dues and 4 fouls. The extra reasoning horsepower shows up as more aggressive scoring play than MiniMax (+11.94 vs +4.07 reward on the same seed) at the cost of one extra due (2 vs 1); both models land at 97% ICF compliance.
Frontier-model comparison (seed 0)
| Model | Reward | Coins | Dues | Fouls | ICF % | Efficiency |
|---|---|---|---|---|---|---|
| MiniMax-M2.5-fast | +4.07 | 8 | 1 | 5 | 97% | 0.759 |
| Nemotron-3-Super-120b | +11.94 | 13 | 2 | 4 | 97% | 1.179 |
Setup & Usage
Prerequisites
- Python 3.10+
- Docker (for containerized deployment)
pip install openenv-core pymunk
Local Development
pip install -e ".[dev]"
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
Docker
docker build -t carrom-env:latest .
docker run -p 8000:8000 carrom-env:latest
Baseline Inference
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen3-4B"
export HF_TOKEN="hf_..."
python inference.py
Project Structure
carrom_rl_env/
├── __init__.py                      # Module exports
├── carrom_env/
│   ├── __init__.py                  # Package exports
│   ├── env.py                       # CarromEnv (physics + ICF game logic)
│   ├── models.py                    # Action, Observation, State models
│   ├── constants.py                 # Board config + physics constants (BOARD_DECEL, FRICTION, …)
│   └── green_agent.py               # Green Agent (evaluator: task suite + ICF-aware scoring)
├── client.py                        # CarromEnv (EnvClient)
├── inference.py                     # Baseline inference script
├── server/
│   ├── __init__.py
│   ├── carrom_environment.py        # Server-side Environment wrapper
│   └── app.py                       # FastAPI application
├── examples/
│   ├── train_stub.py                # Quick demo
│   ├── grpo_utils.py                # GRPO training utilities
│   └── grpo_carrom_tutorial.ipynb   # Training notebook
├── tests/
│   └── test_env_basic.py            # Test suite
├── openenv.yaml                     # OpenEnv manifest
├── pyproject.toml                   # Dependencies
├── Dockerfile                       # Container image
└── README.md                        # This file