<html><head><meta charset='utf-8'><title>Training results</title>
<style>
body { font-family: -apple-system, BlinkMacSystemFont, sans-serif; max-width: 1000px; margin: 40px auto; line-height: 1.45; padding: 0 20px; }
.card { background: #f6f8fa; border-radius: 10px; padding: 18px; margin: 16px 0; }
img { max-width: 100%; border: 1px solid #ddd; border-radius: 8px; }
table { border-collapse: collapse; width: 100%; }
td, th { border-bottom: 1px solid #ddd; padding: 8px; text-align: left; }
</style></head><body>
<h1>Training results</h1>
<div class='card'>
<b>Live backend:</b> real Freeciv Web on H100<br>
<b>Model:</b> Qwen/Qwen3.5-0.8B + Unsloth LoRA + TRL GRPO<br>
<b>Run:</b> 10 steps, 32 live states, batch size 8<br>
<b>Train runtime:</b> not recorded
</div>
<div class='card'>
<b>Observed reward improvement:</b> 0.125 (step 1) → 1.000 (step 10)<br>
<b>Best visible point:</b> step 10, reward 1.000
</div>
<h2>Reward curve</h2>
<p><img src='reward_curve.png' alt='reward curve'></p>
<h2>Start vs end</h2>
<p><img src='before_after_reward.png' alt='before after reward'></p>
<h2>Per-step reward</h2>
<table>
<tr><th>step</th><th>reward</th><th>reward std</th></tr>
<tr><td>1</td><td>0.125</td><td>0.250</td></tr>
<tr><td>2</td><td>0.375</td><td>0.539</td></tr>
<tr><td>3</td><td>0.250</td><td>0.500</td></tr>
<tr><td>4</td><td>0.500</td><td>0.577</td></tr>
<tr><td>5</td><td>0.625</td><td>0.539</td></tr>
<tr><td>6</td><td>0.875</td><td>0.250</td></tr>
<tr><td>7</td><td>0.750</td><td>0.500</td></tr>
<tr><td>8</td><td>0.875</td><td>0.250</td></tr>
<tr><td>9</td><td>0.750</td><td>0.500</td></tr>
<tr><td>10</td><td>1.000</td><td>0.000</td></tr>
</table>
</body></html>