|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: WebScraper991923/Affine-S3 |
|
|
tags: |
|
|
- qwen3 |
|
|
- affine |
|
|
- game |
|
|
- reinforcement-learning |
|
|
- openspiel |
|
|
--- |
|
|
|
|
|
# Affine-S3-GAME-Improved |
|
|
|
|
|
Fine-tuned version of [WebScraper991923/Affine-S3](https://huggingface.co/WebScraper991923/Affine-S3) with improved GAME (OpenSpiel) performance for Bittensor Subnet 120 (Affine). |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base Model**: WebScraper991923/Affine-S3 (Qwen3-4B) |
|
|
- **Training**: LoRA fine-tuning on 7,071 MCTS-generated game examples |
|
|
- **Target**: Improved strategic game-playing for Affine evaluation |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Method**: LoRA (r=32, alpha=32) |
|
|
- **Data**: 7,071 examples from MCTS self-play across 9 games: |
|
|
- checkers (2,702 examples) |
|
|
- gin_rummy (1,896 examples) |
|
|
- othello (1,209 examples) |
|
|
- quoridor, phantom_ttt, hex, dots_and_boxes, leduc_poker, liars_dice |
|
|
- **Epochs**: 2 |
|
|
- **Final Loss**: 0.024 |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Benchmark | Base Model | This Model | |
|
|
|-----------|------------|------------| |
|
|
| GAME Accuracy | ~30% | **76%** | |
|
|
| LGC | 99.9% | 99.9% (preserved) | |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained("altro/Affine-S3-GAME", torch_dtype="bfloat16", device_map="auto") |
|
|
tokenizer = AutoTokenizer.from_pretrained("altro/Affine-S3-GAME") |
|
|
``` |
|
|
|
|
|
## Affine Competition |
|
|
|
|
|
This model is designed for Bittensor Subnet 120 (Affine), which rewards models that dominate the Pareto frontier across multiple RL evaluation tasks. |
|
|
|