🏆 Evaluation Results
#3, opened by nathanael-fijalkow
Evaluation Results
Model: LLM-course/chess-player-v2
Parameters: 997,136 [PASS]
Chess library check: [PASS]
Performance
| Metric | Value |
|---|---|
| Total moves played | 500 |
| Games played | 31 |
| Legal moves (first try) | 0 (0.0%) |
| Legal moves (with retries) | 0 (0.0%) |
Interpretation
- >90% legal rate: Excellent! Model has learned chess rules well.
- 70-90% legal rate: Good, but room for improvement.
- <70% legal rate: Model struggles with legal move generation.
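To make the bands above concrete, here is a minimal sketch of how the legal rate and its interpretation could be derived from the table. The function names `legal_rate` and `interpret` are hypothetical and not taken from the actual evaluation script. A rate of exactly 90% is assumed to fall into the 70-90% band.

```python
def legal_rate(legal_moves: int, total_moves: int) -> float:
    """Return the percentage of legal moves; 0.0 if no moves were played."""
    if total_moves == 0:
        return 0.0
    return 100.0 * legal_moves / total_moves


def interpret(rate_percent: float) -> str:
    """Map a legal-move rate (in percent) to the bands listed above."""
    if rate_percent > 90.0:
        return "Excellent! Model has learned chess rules well."
    if rate_percent >= 70.0:  # assumption: exactly 90% belongs to this band
        return "Good, but room for improvement."
    return "Model struggles with legal move generation."


# Values from the Performance table: 0 legal moves out of 500 played.
rate = legal_rate(0, 500)
print(f"{rate:.1f}% -> {interpret(rate)}")
# prints: 0.0% -> Model struggles with legal move generation.
```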
This is really strange! It performs very well locally!
Did you run the evaluation script using the model from HF?
Yes. In this version I changed MHA to GQA and the sinusoidal positional embedding to RoPE. I'm working on a solution.
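One possible cause, which is only an assumption and not confirmed in this thread, is that the uploaded checkpoint and the model code used by the evaluation disagree on parameter names after the architecture change. A minimal sketch for comparing the two sets of names follows; `diff_param_names` and all parameter names in it are hypothetical placeholders, not the real names from this model.

```python
def diff_param_names(expected, loaded):
    """Compare parameter names the model code expects with those in a checkpoint."""
    expected, loaded = set(expected), set(loaded)
    return {
        "missing": sorted(expected - loaded),     # expected by the code, absent in checkpoint
        "unexpected": sorted(loaded - expected),  # present in checkpoint, unused by the code
    }


# Hypothetical names for illustration only.
expected_names = ["attn.q_proj.weight", "attn.kv_proj.weight"]
checkpoint_names = [
    "attn.q_proj.weight",
    "attn.k_proj.weight",
    "attn.v_proj.weight",
    "pos_emb.weight",
]

report = diff_param_names(expected_names, checkpoint_names)
print(report["missing"])     # ['attn.kv_proj.weight']
print(report["unexpected"])  # ['attn.k_proj.weight', 'attn.v_proj.weight', 'pos_emb.weight']
```

If both lists come back empty, the mismatch would have to be somewhere else, for example in the tokenizer or in the move-decoding step.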