---
base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- chess
- reasoning
- sft
license: apache-2.0
language:
- en
datasets:
- nuriyev/chess-reasoning
pipeline_tag: text-generation
---

# Chess Reasoner

A chess move prediction model fine-tuned from **Qwen3-4B-Instruct** to output structured reasoning before selecting moves.
## Overview

This model is **Phase 1** of a two-stage training pipeline:

1. **SFT (this model)** — align the model to output in a specific `<think>` + `<uci_move>` format
2. **GRPO (next step)** — reinforce with Stockfish rewards for stronger play

## Output Format

```
<think>brief reasoning (1-2 sentences)</think>
<uci_move>e2e4</uci_move>
```
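
Because the move is wrapped in a fixed tag pair, it is easy to pull out of a completion with a regular expression. A minimal sketch (the `parse_move` helper is illustrative, not part of any shipped tooling):

```python
import re

# Matches a UCI move (from-square, to-square, optional promotion piece)
# inside the <uci_move> tags the model is trained to emit.
UCI_MOVE_RE = re.compile(r"<uci_move>\s*([a-h][1-8][a-h][1-8][qrbn]?)\s*</uci_move>")

def parse_move(completion):
    """Extract the UCI move from a model completion, or None if absent."""
    match = UCI_MOVE_RE.search(completion)
    return match.group(1) if match else None

completion = "<think>Control the center.</think>\n<uci_move>e2e4</uci_move>"
print(parse_move(completion))  # e2e4
```

The pattern also rejects malformed moves (wrong squares, stray text), which matters because an SFT-only checkpoint can occasionally produce off-format output.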

## Usage

### System Prompt

```python
SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""
```

### User Prompt Template

The model expects the board state in the following format:

```
Here is the current game state
Board (Fen): <FEN string>
Turn: It is your turn (<white/black>)
Legal Moves: <comma-separated UCI moves>
Board:
<board representation>
```
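
Filling the template reduces to plain string formatting. A sketch with a hypothetical `build_user_prompt` helper; in practice the field values come from a `python-chess` `Board`, as in the full example below:

```python
def build_user_prompt(fen, turn, legal_moves, board_str):
    """Render the board state in the exact template the model was trained on."""
    return (
        "Here is the current game state\n"
        f"Board (Fen): {fen}\n"
        f"Turn: It is your turn ({turn})\n"
        f"Legal Moves: {', '.join(legal_moves)}\n"
        "Board:\n"
        f"{board_str}"
    )

prompt = build_user_prompt(
    fen="rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
    turn="black",
    legal_moves=["g8f6", "e7e5", "c7c5"],
    board_str="<ASCII board here>",
)
print(prompt)
```

Keeping the wording and field order identical to the training template matters for an SFT model; paraphrasing the prompt can degrade format adherence.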

### Full Inference Example

```python
import chess
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("nuriyev/chess-reasoner")
tokenizer = AutoTokenizer.from_pretrained("nuriyev/chess-reasoner")

# System prompt
SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""

# Example position: after 1. e4
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
board = chess.Board(fen)

# Build user prompt
user_content = f"""Here is the current game state
Board (Fen): {fen}
Turn: It is your turn ({'white' if board.turn else 'black'})
Legal Moves: {', '.join([move.uci() for move in board.legal_moves])}
Board:
{board}"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_content},
]

# Generate
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
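
Since this checkpoint is format-aligned rather than strength-optimized, the generated move should be validated before it is played. A minimal sketch that checks the extracted move against the legal-move list already computed for the prompt (the helper name is illustrative):

```python
import re

def extract_legal_move(completion, legal_moves):
    """Return the generated UCI move if it appears in legal_moves, else None."""
    match = re.search(r"<uci_move>\s*(\S+)\s*</uci_move>", completion)
    if match and match.group(1) in legal_moves:
        return match.group(1)
    return None

# legal_moves would normally be [m.uci() for m in board.legal_moves]
legal_moves = ["g8f6", "e7e5", "c7c5"]
print(extract_legal_move("<think>...</think>\n<uci_move>c7c5</uci_move>", legal_moves))  # c7c5
print(extract_legal_move("<uci_move>e2e4</uci_move>", legal_moves))  # None (not legal here)
```

On a `None` result, a caller might resample with `do_sample=True` or fall back to a random legal move.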

### Example Output

```
<think>The e5 pawn is undefended, so I will move my knight to e5 to challenge the center and set up a queen attack on g7.</think>
<uci_move>c7c5</uci_move>
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | SFT with LoRA (r=32, α=64) |
| Dataset | [nuriyev/chess-reasoning](https://huggingface.co/datasets/nuriyev/chess-reasoning) |
| Epochs | 2 |
| Learning Rate | 2e-4 |
| Batch Size | 16 |
| Max Seq Length | 1024 |
|
| | Trained using [Unsloth](https://github.com/unslothai/unsloth) with response-only loss masking. |
| |
|
| | ## Limitations |
| |
|
| | This SFT checkpoint is format-aligned but not yet optimized for move quality. The upcoming GRPO stage will train against Stockfish evaluations to improve actual chess performance. |
| |
|
| | ## LoRA Adapter |
| |
|
| | Also available: [nuriyev/chess-reasoner-lora](https://huggingface.co/nuriyev/chess-reasoner-lora) |
| |
|
| | [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |