🏆 Evaluation Results

#3
by nathanael-fijalkow - opened
MVA+IASD LLM for code and proof org

Evaluation Results

Model: LLM-course/chess-player-v2
Parameters: 997,136 [PASS]
Chess library check: [PASS]

Performance

| Metric | Value |
|---|---|
| Total moves played | 500 |
| Games played | 31 |
| Legal moves (first try) | 0 (0.0%) |
| Legal moves (with retries) | 0 (0.0%) |

Interpretation

  • >90% legal rate: Excellent! Model has learned chess rules well.
  • 70-90% legal rate: Good, but room for improvement.
  • <70% legal rate: Model struggles with legal move generation.
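
For reference, the legal rate and the interpretation bands above can be reproduced with a few lines of Python. This is only a minimal sketch, not the actual evaluation script: the function names `legal_rate` and `interpret` are hypothetical, and it assumes that a rate of exactly 90% or exactly 70% falls into the "Good" band.

```python
def legal_rate(legal_moves: int, total_moves: int) -> float:
    """Return the share of legal moves as a percentage (0.0 if no moves were played)."""
    if total_moves <= 0:
        return 0.0
    return 100.0 * legal_moves / total_moves


def interpret(rate_percent: float) -> str:
    """Map a legal-move rate to the bands listed in the report.

    Assumption: exactly 90% and exactly 70% both count as "Good".
    """
    if rate_percent > 90:
        return "Excellent"
    if rate_percent >= 70:
        return "Good"
    return "Struggles"


# Numbers from the report above: 0 legal moves out of 500 played.
rate = legal_rate(0, 500)
print(f"{rate:.1f}% -> {interpret(rate)}")  # prints "0.0% -> Struggles"
```

With the reported numbers this lands in the lowest band, which matches the 0.0% shown in the table.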

This is really strange! The model performs very well locally.

edited 6 days ago

Did you run the evaluation script using the model from HF?

edited 6 days ago

Yes. In this version I changed MHA to GQA and the sinusoidal positional embedding to RoPE. I'm working on a solution.
