Spaces: nuriyev/chess

nuriyev committed on Dec 30, 2025

Commit c789271
1 Parent(s): 2da0b5e

update model reference and improve output format in README and app.py

Files changed (2)

README.md +5 -5
app.py +36 -73

README.md CHANGED

@@ -13,7 +13,7 @@ short_description: Play against chess-playing reasoning LLM
 # ♔ Chess Reasoner
-Play chess against a reasoning LLM! This demo showcases **[nuriyev/chess-reasoner](https://huggingface.co/nuriyev/chess-reasoner)**, a Qwen3-4B model fine-tuned to output structured reasoning before selecting moves.
 ## 🎮 How to Play
@@ -28,20 +28,20 @@ Play chess against a reasoning LLM! This demo showcases **[nuriyev/chess-reasone
 |-----------|-------|
 | Base Model | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
 | Training | SFT with LoRA (r=32) on reasoning traces |
-| Dataset | [nuriyev/chess-reasoning](https://huggingface.co/datasets/nuriyev/chess-reasoning) |
-| Output Format | `<think>reasoning</think><uci_move>move</uci_move>` |
 ## 📋 Output Format
 The model outputs structured reasoning:
 ```
-<think>The opponent left their queen undefended. Taking it wins material.</think>
 <uci_move>d4d8</uci_move>
 ```
 ## ⚠️ Limitations
-This is an **SFT checkpoint** focused on format alignment. The model outputs valid reasoning but hasn't been optimized for chess strength via reinforcement learning yet. A GRPO stage using Stockfish rewards is planned.
 ## 🔗 Links

 # ♔ Chess Reasoner
+Play chess against a reasoning LLM! This demo showcases **[nuriyev/chess-reasoner-grpo](https://huggingface.co/nuriyev/chess-reasoner-grpo)**, a Qwen3-4B model tuned to play chess with detailed reasoning traces.
 ## 🎮 How to Play
 |-----------|-------|
 | Base Model | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
 | Training | SFT with LoRA (r=32) on reasoning traces |
+| Dataset | [aicrowd/ChessExplained](https://huggingface.co/datasets/aicrowd/ChessExplained) |
+| Output Format | `<reason>...</reason><uci_move>move</uci_move>` |
 ## 📋 Output Format
 The model outputs structured reasoning:
 ```
+<reason>The opponent left their queen undefended. Taking it wins material.</reason>
 <uci_move>d4d8</uci_move>
 ```
 ## ⚠️ Limitations
+Model is still very bad at playing chess! I am working on creating a beast. Coming soon...
 ## 🔗 Links
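
The `<reason>...</reason><uci_move>move</uci_move>` format introduced in this README change can be parsed with two regular expressions, mirroring the patterns visible in the `app.py` diff. The following is a minimal sketch rather than the Space's actual code: the function name `parse_model_output`, the `legal_moves` argument, and the fallback to `None` for a missing or illegal move are all assumptions made for illustration.

```python
import re
from typing import Optional

# Same tag patterns as in the app.py diff; DOTALL lets the reasoning span several lines.
REASON_RE = re.compile(r"<reason>(.*?)</reason>", re.DOTALL)
MOVE_RE = re.compile(r"<uci_move>(.*?)</uci_move>")


def parse_model_output(generated: str, legal_moves: list[str]) -> tuple[str, Optional[str]]:
    """Extract (reasoning, move) from raw model text.

    Hypothetical helper: returns an empty reasoning string when the
    <reason> tag is absent, and None as the move when the <uci_move>
    tag is absent or the move is not in `legal_moves`.
    """
    reason_match = REASON_RE.search(generated)
    move_match = MOVE_RE.search(generated)

    reasoning = reason_match.group(1).strip() if reason_match else ""
    move = move_match.group(1).strip() if move_match else None

    # Reject anything the caller did not list as legal (assumed policy).
    if move not in legal_moves:
        move = None
    return reasoning, move


if __name__ == "__main__":
    sample = (
        "<reason>The opponent left their queen undefended. "
        "Taking it wins material.</reason>\n<uci_move>d4d8</uci_move>"
    )
    print(parse_model_output(sample, ["d4d8", "e2e4"]))
```

Checking the move against the legal-move list before using it keeps a malformed or hallucinated move from reaching the board; what to do in that case (retry, random legal move, resign) is left to the caller.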

app.py CHANGED Viewed

@@ -11,7 +11,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 # Model Loading
 # ============================================================================
-MODEL_ID = "nuriyev/chess-reasoner"
 print("Loading model...")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
@@ -20,84 +20,48 @@ model = AutoModelForCausalLM.from_pretrained(
     torch_dtype=torch.float16,
     device_map="auto",
     trust_remote_code=True,
 )
 model.eval()
 print("Model loaded!")
-# Custom chat template (matching training)
-CHAT_TEMPLATE = """{%- if messages[0].role == 'system' %}
-    {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
-{%- endif %}
-{%- for message in messages %}
-    {%- if message.role == 'user' or (message.role == 'system' and not loop.first) %}
-        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
-    {%- elif message.role == 'assistant' %}
-        {{- '<|im_start|>assistant\n' + message.content + '<|im_end|>\n' }}
-    {%- endif %}
-{%- endfor %}
-{%- if add_generation_prompt %}
-    {{- '<|im_start|>assistant\n<think>\n' }}
-{%- endif %}"""
-tokenizer.chat_template = CHAT_TEMPLATE
-# ============================================================================
-# Chess Rendering (matching training exactly)
-# ============================================================================
-UNICODE_PIECES = {
-    'P': '♙', 'R': '♖', 'N': '♘', 'B': '♗', 'Q': '♕', 'K': '♔',
-    'p': '♟', 'r': '♜', 'n': '♞', 'b': '♝', 'q': '♛', 'k': '♚',
-}
-def render_board_unicode(board: chess.Board) -> str:
-    """Render the chess board using Unicode pieces (matching training format)."""
-    lines = []
-    files = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
-    ranks = ['8', '7', '6', '5', '4', '3', '2', '1']
-    coord_parts = [f" {file} " for file in files]
-    coord_line = "   " + "".join(coord_parts) + "  "
-    lines.append(coord_line)
-    border_width = len(files) * 3
-    lines.append("   +" + "-" * border_width + "+")
-    for rank_idx, rank in enumerate(ranks):
-        line_parts = [f"{rank} |"]
-        for file_idx, file in enumerate(files):
-            square = chess.parse_square(file + rank)
-            piece = board.piece_at(square)
-            piece_char = "·" if piece is None else UNICODE_PIECES[piece.symbol(
-            )]
-            line_parts.append(f" {piece_char} ")
-        line_parts.append(f"| {rank}")
-        lines.append("".join(line_parts))
-    lines.append("   +" + "-" * border_width + "+")
-    lines.append(coord_line)
-    return "\n".join(lines)
 # ============================================================================
 # Prompts (matching training exactly)
 # ============================================================================
-SYSTEM_PROMPT = """You are an expert chess player.
-Given a current game state, you must select the best next move. Think in 1-2 sentences, then output your chosen move.
-Output format:
-<think>brief thinking (2 sentences max)</think>
-<uci_move>your_move</uci_move>"""
-USER_PROMPT = Template("""Here is the current game state
-Board (Fen): {{ fen }}
-Turn: It is your turn ({{ turn }})
-Legal Moves: {{ legal_moves }}
-Board (Unicode):
-{{ board_utf }}""")
 # ============================================================================
@@ -111,12 +75,11 @@ def get_model_move(fen: str) -> tuple[str, str, str]:
     turn = "white" if board.turn else "black"
     messages = [
-        {"role": "system", "content": SYSTEM_PROMPT},
         {"role": "user", "content": USER_PROMPT.render(
             fen=fen,
-            board_utf=render_board_unicode(board),
-            turn=turn,
-            legal_moves=", ".join([move.uci() for move in board.legal_moves])
         )},
     ]
@@ -143,7 +106,7 @@ def get_model_move(fen: str) -> tuple[str, str, str]:
         outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=False)
     # Parse the output
-    think_match = re.search(r'<think>(.*?)</think>', generated, re.DOTALL)
     move_match = re.search(r'<uci_move>(.*?)</uci_move>', generated)
     reasoning = think_match.group(1).strip(
@@ -265,7 +228,7 @@ with gr.Blocks(title="♟️ Chess Reasoner") as demo:
     gr.Markdown("""
     ---
-    **Model:** [nuriyev/chess-reasoner](https://huggingface.co/nuriyev/chess-reasoner) • Fine-tuned from Qwen3-4B-Instruct
     """)
     # Events

 # Model Loading
 # ============================================================================
+MODEL_ID = "nuriyev/chess-reasoner-grpo"
 print("Loading model...")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
     torch_dtype=torch.float16,
     device_map="auto",
     trust_remote_code=True,
+    revision="b7e531a630fd35065f9c8287f4bd21dff42f871b",
 )
 model.eval()
 print("Model loaded!")
 # ============================================================================
 # Prompts (matching training exactly)
 # ============================================================================
+USER_PROMPT = Template("""You are an expert chess player.
+Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.
+## State
+Board:
+{% set fen_board = FEN.split()[0] %}
+{%- set ns = namespace(board='') -%}
+{%- for char in fen_board -%}
+{%- if char in '12345678' -%}
+{%- set ns.board = ns.board ~ '.' * (char|int) -%}
+{%- elif char != '/' -%}
+{%- set ns.board = ns.board ~ char -%}
+{%- endif -%}
+{%- endfor -%}
+{#- Output coordinate grid by file -#}
+{%- set files = 'abcdefgh' -%}
+{% for f in range(8) %}
+{%- for r in range(1, 9) -%}
+{{ files[f] }}{{ r }}:{{ ns.board[(8-r)*8 + f] }}{% if r < 8 %} {% endif -%}
+{%- endfor %}
+{% endfor %}
+Turn: It is your turn ({{ side_to_move }})
+Legal Moves: {{ legal_moves_uci }}
+## Output format
+<reason>...brief thinking (1-2 first-person very short concise sentences, identifying threat or opportunity, then deciding on the best move to play next)...</reason>
+<uci_move>...your_move...</uci_move>
+NOTE: capital letters are white, lowercase are black.""")
 # ============================================================================
     turn = "white" if board.turn else "black"
     messages = [
         {"role": "user", "content": USER_PROMPT.render(
             fen=fen,
+            side_to_move=turn,
+            legal_moves_uci=", ".join([move.uci()
+                                      for move in board.legal_moves])
         )},
     ]
         outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=False)
     # Parse the output
+    think_match = re.search(r'<reason>(.*?)</reason>', generated, re.DOTALL)
     move_match = re.search(r'<uci_move>(.*?)</uci_move>', generated)
     reasoning = think_match.group(1).strip(
     gr.Markdown("""
     ---
+    **Model:** [nuriyev/chess-reasoner-grpo](https://huggingface.co/nuriyev/chess-reasoner-grpo) • Fine-tuned from Qwen3-4B-Instruct
     """)
     # Events
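
The new `USER_PROMPT` template replaces the removed `render_board_unicode` helper with board rendering done inside Jinja: it expands the piece-placement field of the FEN into a 64-character string and then prints one line per file, listing each square as `square:piece`. The sketch below reproduces that logic in plain Python so it can be inspected and tested outside the template. The function names `expand_fen_board` and `render_board_by_file` are hypothetical, the length check is an added assumption, and the exact whitespace of the real template output may differ.

```python
def expand_fen_board(fen: str) -> str:
    """Expand the FEN piece-placement field into 64 characters.

    Digits become runs of '.', rank separators '/' are dropped.
    Index 0 is a8, index 63 is h1, as in the template's `ns.board`.
    """
    placement = fen.split()[0]
    cells = []
    for ch in placement:
        if ch in "12345678":
            cells.append("." * int(ch))
        elif ch != "/":
            cells.append(ch)
    board = "".join(cells)
    if len(board) != 64:  # sanity check, not present in the template
        raise ValueError(f"expected 64 squares, got {len(board)}")
    return board


def render_board_by_file(fen: str) -> str:
    """Render one line per file (a..h), ranks 1..8 left to right."""
    board = expand_fen_board(fen)
    files = "abcdefgh"
    lines = []
    for f in range(8):
        squares = [
            # Same index arithmetic as the template: (8 - r) * 8 + f
            f"{files[f]}{r}:{board[(8 - r) * 8 + f]}"
            for r in range(1, 9)
        ]
        lines.append(" ".join(squares))
    return "\n".join(lines)


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(render_board_by_file(start))
```

For the starting position, the first line is `a1:R a2:P a3:. a4:. a5:. a6:. a7:p a8:r`. One thing that may be worth checking in the diff itself: the template reads a variable named `FEN`, while the `render` call shown passes `fen=`, and Jinja2 variable names are case-sensitive.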