Update README.md
README.md
CHANGED
|
@@ -17,31 +17,29 @@ language:
|
|
| 17 |
- en
|
| 18 |
---
|
| 19 |
|
| 20 |
-
#
|
| 21 |
|
| 22 |
-
|
| 23 |
|
| 24 |
-
|
| 25 |
|
| 26 |
-
-
|
| 27 |
-
- This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
|
| 28 |
-
- It's made for a specific project task.
|
| 29 |
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
- **License:** *TBD*
|
| 34 |
-
- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
|
| 35 |
-
- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
|
| 36 |
|
| 37 |
-
|
| 38 |
|
| 39 |
-
|
| 40 |
-

|
| 41 |
|
| 42 |
-
|
| 43 |
|
| 44 |
-
* Solution #1:
|
| 45 |
```python
|
| 46 |
from transformers import pipeline
|
| 47 |
|
|
@@ -56,7 +54,7 @@ Board:
|
|
| 56 |
Strategy:
|
| 57 |
1. Identify taken positions, and empty positions.
|
| 58 |
2. Find and execute winning moves.
|
| 59 |
-
3. If There isn't a winning move, then block your opponent
|
| 60 |
4. Control the center and set up future moves.
|
| 61 |
|
| 62 |
Respond in XML:
|
|
@@ -77,69 +75,72 @@ board = {
|
|
| 77 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
| 78 |
|
| 79 |
# use 'empty', 'one_move' or 'four_moves' in board['']
|
| 80 |
-
output = generator([
|
| 81 |
print(output["generated_text"])
|
| 82 |
```
|
| 83 |
-
* Solution #2:
|
| 84 |
-
[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the Quantized GGUF in any of your favorite GGUF inference engine(e.g. LMStudio)
|
| 85 |
-
|
| 86 |
-
* Solution #3:
|
| 87 |
-
[Huggingface Space](http://hf.co/spaces/Lyte/QuadConnect)): You can duplicate the space or download the code from the space and use it locally.
|
| 88 |
|
| 89 |
-
|
| 90 |
|
| 91 |
-
|
| 92 |
|
| 93 |
-
|
| 94 |
|
| 95 |
-
|
| 96 |
-
- The final dataset is Lyte/ConnectFour-T10
|
| 97 |
|
| 98 |
-
|
| 99 |
|
| 100 |
-
|
| 101 |
|
| 102 |
-
|
| 103 |
-
* temperature=0.6, top_p=0.95, max_tokens=1024
|
| 104 |
|
| 105 |
-
|
| 106 |
|
| 107 |
-
|
| 108 |
-
|-----------------------|--------------------------------|--------------------------------|--------------------------------|--------------------------------|
|
| 109 |
-
| Total games evaluated | 5082 | 5082 | 5082 | 5082 |
|
| 110 |
-
| Correct predictions | 518 | 394 | 516 | **713** |
|
| 111 |
-
| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** |
|
| 112 |
-
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | **a (31.01%)** |
|
| 113 |
-
| Middle column usage | 75.05% | 99.53% | 29.08% | **35.43%** |
|
| 114 |
|
| 115 |
-
|
| 116 |
|
| 117 |
-
|
| 118 |
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) |
|
| 126 |
-
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) |
|
| 127 |
-
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) |
|
| 128 |
|
| 129 |
|
| 130 |
-
|
| 131 |
-
### Framework versions
|
| 132 |
-
|
| 133 |
- TRL: 0.15.1
|
| 134 |
- Transformers: 4.49.0
|
| 135 |
-
-
|
| 136 |
- Datasets: 3.2.0
|
| 137 |
- Tokenizers: 0.21.0
|
| 138 |
|
| 139 |
-
## Citations
|
| 140 |
-
|
| 141 |
-
Cite GRPO as:
|
| 142 |
|
|
|
|
| 143 |
```bibtex
|
| 144 |
@article{zhihong2024deepseekmath,
|
| 145 |
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
|
|
@@ -147,18 +148,16 @@ Cite GRPO as:
|
|
| 147 |
year = 2024,
|
| 148 |
eprint = {arXiv:2402.03300},
|
| 149 |
}
|
| 150 |
-
|
| 151 |
```
|
| 152 |
|
| 153 |
-
|
| 154 |
-
|
| 155 |
```bibtex
|
| 156 |
@misc{vonwerra2022trl,
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
}
|
| 164 |
```
|
|
|
|
| 17 |
- en
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# QuadConnect2.5-0.5B-v0.0.9b - A Strategic Connect Four AI
|
| 21 |
|
| 22 |
+

|
| 23 |
|
| 24 |
+
## 🎮 Overview
|
| 25 |
|
| 26 |
+
QuadConnect2.5-0.5B is a specialized language model trained to play Connect Four. Built on Qwen 2.5 (0.5B-parameter instruct base), it is fine-tuned with GRPO (Group Relative Policy Optimization) to reason over board positions and choose strategic moves.
|
|
|
|
|
|
|
| 27 |
|
| 28 |
+
**Status**: Early training experiments (v0.0.9b) - Reward functions still evolving
|
| 29 |
+
|
| 30 |
+
## 🔍 Model Details
|
|
|
|
|
|
|
|
|
|
| 31 |
|
| 32 |
+
- **Developed by:** [Lyte](https://hf.co/Lyte)
|
| 33 |
+
- **Model type:** Small Language Model (SLM)
|
| 34 |
+
- **Language:** English
|
| 35 |
+
- **Base model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
|
| 36 |
+
- **Training method:** [TRL](https://github.com/huggingface/trl)'s GRPO
|
| 37 |
+
- **Training data:** [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
|
| 38 |
|
| 39 |
+
## 🚀 Quick Start
|
|
|
|
| 40 |
|
| 41 |
+
### Option 1: Using Transformers
|
| 42 |
|
|
|
|
| 43 |
```python
|
| 44 |
from transformers import pipeline
|
| 45 |
|
|
|
|
| 54 |
Strategy:
|
| 55 |
1. Identify taken positions, and empty positions.
|
| 56 |
2. Find and execute winning moves.
|
| 57 |
+
3. If There isn't a winning move, then block your opponent's potential wins.
|
| 58 |
4. Control the center and set up future moves.
|
| 59 |
|
| 60 |
Respond in XML:
|
|
|
|
| 75 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
| 76 |
|
| 77 |
# use 'empty', 'one_move' or 'four_moves' in board['']
|
| 78 |
+
output = generator([
|
| 79 |
+
{"role": "system", "content": SYSTEM_PROMPT},
|
| 80 |
+
{"role": "user", "content": board['empty']}
|
| 81 |
+
], max_new_tokens=1024, return_full_text=False)[0]
|
| 82 |
+
|
| 83 |
print(output["generated_text"])
|
| 84 |
```
|
| 85 |
|
| 86 |
+
### Option 2: Using GGUF
|
| 87 |
|
| 88 |
+
Download the [Quantized GGUF (Q8_0)](https://huggingface.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/unsloth.Q8_0.gguf) and use it in your favorite GGUF inference engine (e.g., LMStudio).
|
| 89 |
|
| 90 |
+
### Option 3: Using Hugging Face Space
|
| 91 |
|
| 92 |
+
Visit the [QuadConnect Space](https://huggingface.co/spaces/Lyte/QuadConnect) to interact with the model directly. You can also duplicate the space or download its code for local use.
|
|
|
|
| 93 |
|
| 94 |
+
## 📊 Evaluation Results
|
| 95 |
|
| 96 |
+
Model performance was evaluated on the [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10) validation split. Earlier versions (v0.0.6b, v0.0.8b) were run at temperature 0.6; v0.0.9b was run at temperatures 0.6, 0.8, and 1.0.
|
| 97 |
|
| 98 |
+
### Summary Metrics Comparison
|
|
|
|
| 99 |
|
| 100 |
+
| Metric | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
|
| 101 |
+
|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
|
| 102 |
+
| Total games evaluated | 5082 | 5082 | 5082 | 5082 | 5082 |
|
| 103 |
+
| Correct predictions | 518 | 394 | 516 | **713** | 677 |
|
| 104 |
+
| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** | 13.32% |
|
| 105 |
+
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | a (31.01%) | a (26.99%) |
|
| 106 |
+
| Middle column usage | 75.05% | 99.53% | 29.08% | 35.43% | 39.49% |
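
As a quick sanity check, the accuracy row can be recomputed from the two rows above it. The README does not state the formula, so the sketch below assumes accuracy is simply correct predictions divided by total games evaluated; the numbers are copied from the table.

```python
# Values copied from the summary table above.
# Assumption: accuracy = correct predictions / total games evaluated.
TOTAL_GAMES = 5082

correct_predictions = {
    "v0.0.6b (Temp 0.6)": 518,
    "v0.0.8b (Temp 0.6)": 394,
    "v0.0.9b (Temp 0.6)": 516,
    "v0.0.9b (Temp 0.8)": 713,
    "v0.0.9b (Temp 1.0)": 677,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


accuracies = {
    name: accuracy_percent(correct, TOTAL_GAMES)
    for name, correct in correct_predictions.items()
}

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}%")
```

Under that assumption the recomputed values agree with the reported accuracy row for all five runs.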
|
| 107 |
|
| 108 |
+
### Move Distribution by Column
|
| 109 |
|
| 110 |
+
| Column | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
|
| 111 |
+
|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
|
| 112 |
+
| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) | 1351 (26.99%) |
|
| 113 |
+
| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) | 997 (19.92%) |
|
| 114 |
+
| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) | 985 (19.68%) |
|
| 115 |
+
| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) | 306 (6.11%) |
|
| 116 |
+
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) | 686 (13.70%) |
|
| 117 |
+
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) | 354 (7.07%) |
|
| 118 |
+
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) | 327 (6.53%) |
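
The "Middle column usage" figures in the summary table match the combined share of columns c, d, and e in this table. That definition is inferred from the numbers rather than stated in the README, so the following is only a sketch of how the two tables appear to relate. The counts are copied from the first two columns of the table.

```python
COLUMNS = "abcdefg"
MIDDLE = ("c", "d", "e")  # assumed definition of "middle columns"

# Per-column move counts copied from the table above (columns a..g).
move_counts = {
    "v0.0.6b (Temp 0.6)": [603, 111, 785, 1304, 290, 50, 27],
    "v0.0.8b (Temp 0.6)": [3, 4, 463, 1743, 360, 3, 2],
}


def middle_usage_percent(counts: list[int]) -> float:
    """Share of moves played in the assumed middle columns, in percent."""
    by_column = dict(zip(COLUMNS, counts))
    middle = sum(by_column[c] for c in MIDDLE)
    return round(100 * middle / sum(counts), 2)


def most_common_column(counts: list[int]) -> str:
    """Column letter with the highest move count."""
    return max(zip(COLUMNS, counts), key=lambda pair: pair[1])[0]


for name, counts in move_counts.items():
    print(name, sum(counts), most_common_column(counts), middle_usage_percent(counts))
```

Two observations follow from the raw counts. The per-column totals sum to fewer than 5082 games, and the README does not say why. Also, recomputing from raw counts for v0.0.9b at temperature 0.6 gives 29.09% instead of the reported 29.08%, which would be consistent with the table summing already-rounded per-column percentages.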
|
| 119 |
|
| 120 |
+
## 🔧 Training Details
|
| 121 |
|
| 122 |
+
### Data Preparation
|
| 123 |
+
1. Started with [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection)
|
| 124 |
+
2. Filtered for clean, complete entries
|
| 125 |
+
3. Further filtered to include only games with 10 or fewer turns
|
| 126 |
+
4. Split into train and validation sets
|
| 127 |
+
5. Final dataset: [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
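
The filtering and splitting steps above can be sketched in plain Python. This is not the author's actual preprocessing script: the record fields (`moves`, `result`), the meaning of "clean, complete", counting one turn per move, and the validation fraction are all hypothetical choices made for illustration.

```python
import random


def prepare(games, max_turns=10, val_fraction=0.2, seed=0):
    """Filter toy game records and split them into train/validation lists."""
    # Assumption: "clean, complete" means a non-empty move list and a known result.
    clean = [g for g in games if g.get("moves") and g.get("result") is not None]
    # Assumption: one turn equals one move in the record.
    short = [g for g in clean if len(g["moves"]) <= max_turns]
    shuffled = short[:]
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical records, not taken from the real dataset.
games = [{"moves": ["d1"] * n, "result": "win"} for n in range(1, 11)]
games.append({"moves": [], "result": "draw"})          # dropped: no moves
games.append({"moves": ["a1"] * 12, "result": "loss"})  # dropped: more than 10 turns
games.append({"moves": ["d1", "e1"], "result": None})   # dropped: incomplete

train, val = prepare(games)
print(len(train), len(val))
```

Seeding the shuffle keeps the split stable between runs, which matters when results from different model versions are compared on the same validation set.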
|
|
|
|
|
|
|
|
|
|
| 128 |
|
| 129 |
+
### Evaluation Parameters
|
| 130 |
+
- Temperature: 0.6, 0.8, 1.0 (compared)
|
| 131 |
+
- Top-p: 0.95
|
| 132 |
+
- Max tokens: 1024
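
For reproducing a run with the Transformers pipeline, these settings could be expressed as generation keyword arguments. The mapping of "Max tokens" to `max_new_tokens` is an assumption, and `do_sample=True` is added because temperature and top-p only take effect when sampling is enabled; the README does not show the evaluation script itself.

```python
TEMPERATURES = (0.6, 0.8, 1.0)  # the three settings compared above


def sampling_kwargs(temperature: float) -> dict:
    """Build generation kwargs for one evaluation setting (assumed mapping)."""
    return {
        "do_sample": True,        # required for temperature/top_p to apply
        "temperature": temperature,
        "top_p": 0.95,
        "max_new_tokens": 1024,   # assumed equivalent of "Max tokens"
        "return_full_text": False,
    }


configs = [sampling_kwargs(t) for t in TEMPERATURES]
for cfg in configs:
    print(cfg)
```

Each dictionary can then be unpacked into a pipeline call, for example `generator(messages, **configs[0])`, where `generator` and `messages` are the pipeline and chat messages from the Quick Start example.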
|
| 133 |
|
| 134 |
+
### Framework Versions
|
|
|
|
|
|
|
| 135 |
- TRL: 0.15.1
|
| 136 |
- Transformers: 4.49.0
|
| 137 |
+
- PyTorch: 2.5.1+cu121
|
| 138 |
- Datasets: 3.2.0
|
| 139 |
- Tokenizers: 0.21.0
|
| 140 |
|
| 141 |
+
## 📚 Citations
|
|
|
|
|
|
|
| 142 |
|
| 143 |
+
For GRPO:
|
| 144 |
```bibtex
|
| 145 |
@article{zhihong2024deepseekmath,
|
| 146 |
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
|
|
|
|
| 148 |
year = 2024,
|
| 149 |
eprint = {arXiv:2402.03300},
|
| 150 |
}
|
|
|
|
| 151 |
```
|
| 152 |
|
| 153 |
+
For TRL:
|
|
|
|
| 154 |
```bibtex
|
| 155 |
@misc{vonwerra2022trl,
|
| 156 |
+
title = {{TRL: Transformer Reinforcement Learning}},
|
| 157 |
+
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
|
| 158 |
+
year = 2020,
|
| 159 |
+
journal = {GitHub repository},
|
| 160 |
+
publisher = {GitHub},
|
| 161 |
+
howpublished = {\url{https://github.com/huggingface/trl}}
|
| 162 |
}
|
| 163 |
```
|