---
<div style="max-width:980px;margin:0 auto;padding:24px 16px;font:16px/1.55 system-ui,-apple-system,Segoe UI,Roboto;">
<header style="display:flex;gap:16px;align-items:center;margin-bottom:12px;">
<img src="./assets/logo.png" alt="Foaster.ai" width="42" height="42" style="border-radius:8px">
<div>
<h1 style="margin:0;font-size:26px">Foaster.ai</h1>
<p style="margin:4px 0 0;color:#64748b">Reshaping businesses for the agentic era.</p>
</div>
</header>
<p><strong>Foaster.ai</strong> is a French start-up building agentic systems and running research to understand how LLMs behave in social settings. At <em>Foaster Labs</em>, our first project—<strong>Werewolf Benchmark</strong>—pits leading models against each other in live social-deduction games to measure leadership, bluffing, and resistance to manipulation.</p>
<p>We ran <strong>210 full games</strong> with <strong>7 top models</strong> to produce a <em>role-conditioned</em> Elo: the wolf rating approximates manipulation power, and the villager rating approximates resistance to manipulation. GPT-5 currently sits alone at the top; contenders welcome. 🐺</p>
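
<p>To make "role-conditioned" concrete, here is a minimal sketch of one way such ratings could be maintained. It is not the benchmark's actual scoring code. It assumes each game can be reduced to one model playing the wolf side against one model playing the villager side, keeps a separate wolf rating and villager rating per model, and applies the standard Elo update to that pair. The starting rating (<code>BASE</code>), the step size (<code>K</code>), the function names, and averaging the two role ratings into an overall score are all hypothetical choices made for illustration.</p>

```python
from collections import defaultdict

# Hypothetical parameters: not taken from the benchmark.
BASE = 1000.0  # starting rating for every model in every role
K = 32.0       # Elo step size

# One rating table per role: wolf Elo (manipulation power) and
# villager Elo (resistance to manipulation).
wolf_elo = defaultdict(lambda: BASE)
villager_elo = defaultdict(lambda: BASE)


def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of a side rated r_a against a side rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(wolf_model: str, villager_model: str, wolves_won: bool) -> None:
    """Update the wolf rating of one model and the villager rating of its opponent."""
    r_w = wolf_elo[wolf_model]
    r_v = villager_elo[villager_model]
    e_w = expected(r_w, r_v)          # wolves' expected score
    s_w = 1.0 if wolves_won else 0.0  # wolves' actual score
    wolf_elo[wolf_model] = r_w + K * (s_w - e_w)
    # The villager side gains exactly what the wolf side loses, and vice versa.
    villager_elo[villager_model] = r_v + K * ((1.0 - s_w) - (1.0 - e_w))


def overall(model: str) -> float:
    """Assumed aggregate: the plain mean of the two role ratings."""
    return (wolf_elo[model] + villager_elo[model]) / 2.0


# Example: model-a plays the wolves, model-b the villagers, and the villagers win.
update("model-a", "model-b", wolves_won=False)
print(wolf_elo["model-a"], villager_elo["model-b"])  # 984.0 1016.0
```

<p>Keeping two ratings per model lets a leaderboard separate how well a model deceives as a wolf from how well it resists deception as a villager; a single pooled rating would blur the two.</p>
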
<div style="margin:22px 0 8px;font-weight:700;">Role-conditioned Elo (quick view)</div>
<p style="margin:0 0 12px;color:#64748b">Overall Elo plus per-role splits (ELO-W = wolf, ELO-V = villager). For the full interactive version, <a href="./index.html">open the Space</a>.</p>
<div style="border:1px solid #e5e7eb;border-radius:12px;overflow:hidden">
<table style="width:100%;border-collapse:collapse;font-size:14px">
<thead style="background:#f9fafb;color:#475569;text-transform:uppercase;font-size:12px;letter-spacing:.3px">
<tr>
<th style="padding:10px;text-align:left">Rank</th>
<th style="padding:10px;text-align:left">Model</th>
<th style="padding:10px;text-align:center">ELO</th>
<th style="padding:10px;text-align:center">ELO-W</th>
<th style="padding:10px;text-align:center">ELO-V</th>
<th style="padding:10px;text-align:center">Win rate</th>
<th style="padding:10px;text-align:center">Matches</th>
</tr>
</thead>
<tbody>
<tr><td style="padding:10px">🥇</td><td style="padding:10px">GPT-5 (OpenAI)</td><td style="padding:10px;text-align:center">1492</td><td style="padding:10px;text-align:center">1508</td><td style="padding:10px;text-align:center">1476</td><td style="padding:10px;text-align:center">96.7%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">🥈</td><td style="padding:10px">Gemini 2.5 Pro (Google)</td><td style="padding:10px;text-align:center">1261</td><td style="padding:10px;text-align:center">1163</td><td style="padding:10px;text-align:center">1360</td><td style="padding:10px;text-align:center">63.3%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">🥉</td><td style="padding:10px">Gemini 2.5 Flash (Google)</td><td style="padding:10px;text-align:center">1188</td><td style="padding:10px;text-align:center">1103</td><td style="padding:10px;text-align:center">1273</td><td style="padding:10px;text-align:center">51.7%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">#4</td><td style="padding:10px">Qwen3-235B-Instruct (Alibaba)</td><td style="padding:10px;text-align:center">1176</td><td style="padding:10px;text-align:center">1077</td><td style="padding:10px;text-align:center">1274</td><td style="padding:10px;text-align:center">45.0%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">#5</td><td style="padding:10px">GPT-5-mini (OpenAI)</td><td style="padding:10px;text-align:center">1173</td><td style="padding:10px;text-align:center">1107</td><td style="padding:10px;text-align:center">1239</td><td style="padding:10px;text-align:center">41.7%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">#6</td><td style="padding:10px">Kimi-K2-Instruct (Moonshot AI)</td><td style="padding:10px;text-align:center">1130</td><td style="padding:10px;text-align:center">1168</td><td style="padding:10px;text-align:center">1091</td><td style="padding:10px;text-align:center">36.7%</td><td style="padding:10px;text-align:center">60</td></tr>
<tr><td style="padding:10px">#7</td><td style="padding:10px">GPT-OSS-120B (OpenAI)</td><td style="padding:10px;text-align:center">980</td><td style="padding:10px;text-align:center">931</td><td style="padding:10px;text-align:center">1030</td><td style="padding:10px;text-align:center">15.0%</td><td style="padding:10px;text-align:center">60</td></tr>
</tbody>
</table>
</div>
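
<p>The figures in the table can be cross-checked with a little arithmetic. The sketch below copies the rows into Python and checks two things. First, the win counts implied by win rate × matches add up to the 210 games mentioned above, which fits (but does not prove) a setup where each game produces exactly one winning model among those listed. Second, for every model the overall ELO lies within half a point of the mean of ELO-W and ELO-V. That second relation is only an observation about these numbers, not a documented definition of the overall score.</p>

```python
# Rows copied from the table: (model, elo, elo_w, elo_v, win_rate_pct, matches)
ROWS = [
    ("GPT-5", 1492, 1508, 1476, 96.7, 60),
    ("Gemini 2.5 Pro", 1261, 1163, 1360, 63.3, 60),
    ("Gemini 2.5 Flash", 1188, 1103, 1273, 51.7, 60),
    ("Qwen3-235B-Instruct", 1176, 1077, 1274, 45.0, 60),
    ("GPT-5-mini", 1173, 1107, 1239, 41.7, 60),
    ("Kimi-K2-Instruct", 1130, 1168, 1091, 36.7, 60),
    ("GPT-OSS-120B", 980, 931, 1030, 15.0, 60),
]

TOTAL_GAMES = 210  # stated in the text

# Win rates are rounded to 0.1%, so round the implied win count to an integer.
wins = {name: round(rate / 100 * matches) for name, _, _, _, rate, matches in ROWS}
print(sum(wins.values()), "wins over", TOTAL_GAMES, "games")  # 210 wins over 210 games

# Compare the overall ELO with the mean of the two role-conditioned ratings.
for name, elo, elo_w, elo_v, _, _ in ROWS:
    mean_wv = (elo_w + elo_v) / 2
    print(f"{name:<20} ELO={elo:<5} mean(W,V)={mean_wv:.1f} gap={elo - mean_wv:+.1f}")
```

<p>If the table is updated, re-running a check like this is a quick way to catch a transcription error in a row.</p>
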
</div>