Improve chart clarity and fix visualization issues
- Replace dual-axis L1 Time/Quality chart with bubble scatter plot
- Add counts/percentage toggle to failure analysis
- Fix label overlaps in heatmap, scatter, and failure charts
- Remove misleading series divider lines from sorted charts
- Fix total-score description for 50% reference line
- Increase margins and use smart label placement throughout
Made-with: Cursor
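
The counts/percentage toggle listed in the commit message can be read as a per-experiment normalization: each subtask's failure count is divided by that experiment's total failures, and an experiment with no failures stays at zero instead of dividing by zero. The block below is a standalone sketch of that idea, not the code from the commit. `toPercentages` is a hypothetical helper name, and the sample counts are copied from the `1.7` row of `L2_FAILURES` in this diff.

```javascript
// Hypothetical helper (not part of the commit): convert one experiment's
// per-subtask failure counts into shares of that experiment's total failures.
function toPercentages(counts, subtasks) {
  // Subtasks missing from `counts` are treated as zero failures.
  const total = subtasks.reduce((sum, s) => sum + (counts[s] || 0), 0);
  const shares = {};
  for (const s of subtasks) {
    // Guard against division by zero for experiments with no failed rollouts.
    shares[s] = total > 0 ? ((counts[s] || 0) / total) * 100 : 0;
  }
  return { total, shares };
}

const SUBTASKS_L2 = ['Unfold', 'Fold 1', 'Fold 2', 'Fold 3', 'Fold 4', 'Rotation'];

// Counts copied from the '1.7' row of L2_FAILURES in this diff.
const result = toPercentages({ 'Unfold': 8, 'Fold 3': 1, 'Rotation': 1 }, SUBTASKS_L2);
console.log(result.total, result.shares);

// An experiment with no failures yields all-zero shares.
const empty = toPercentages({}, SUBTASKS_L2);
console.log(empty.total, empty.shares);
```

Keeping the raw total next to the shares lets a percentage view still label each bar with its sample size, which matters when some experiments have only one or two failed rollouts.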
- README.md +5 -5
- app/src/components/HtmlEmbed.astro +1 -0
- app/src/content/article.mdx +1 -1
- app/src/content/chapters/folding/08-ablations.mdx +3 -3
- app/src/content/embeds/folding/failure-analysis.html +150 -128
- app/src/content/embeds/folding/l1-time-quality.html +104 -152
- app/src/content/embeds/folding/loss-curves.html +11 -11
- app/src/content/embeds/folding/statistical-analysis.html +22 -21
- app/src/content/embeds/folding/subtask-heatmap.html +29 -61
- app/src/content/embeds/folding/success-rates.html +28 -59
- app/src/content/embeds/folding/total-score.html +39 -58
README.md
CHANGED
|
@@ -1,9 +1,9 @@
|
|
| 1 |
---
|
| 2 |
-
title: '
|
| 3 |
-
short_desc: '
|
| 4 |
-
emoji:
|
| 5 |
-
colorFrom:
|
| 6 |
-
colorTo:
|
| 7 |
sdk: docker
|
| 8 |
pinned: false
|
| 9 |
header: mini
|
|
|
|
| 1 |
---
|
| 2 |
+
title: 'Unfolding Robotics: Open-Source Shirt Folding from Data to Deployment'
|
| 3 |
+
short_desc: 'The complete open-source recipe for teaching robots to fold clothes'
|
| 4 |
+
emoji: 🤖
|
| 5 |
+
colorFrom: yellow
|
| 6 |
+
colorTo: orange
|
| 7 |
sdk: docker
|
| 8 |
pinned: false
|
| 9 |
header: mini
|
app/src/components/HtmlEmbed.astro
CHANGED
|
@@ -293,6 +293,7 @@ const htmlWithId =
|
|
| 293 |
padding: 24px;
|
| 294 |
z-index: calc(var(--z-elevated) + 1);
|
| 295 |
position: relative;
|
|
|
|
| 296 |
}
|
| 297 |
.html-embed__card.is-frameless {
|
| 298 |
background: transparent;
|
|
|
|
| 293 |
padding: 24px;
|
| 294 |
z-index: calc(var(--z-elevated) + 1);
|
| 295 |
position: relative;
|
| 296 |
+
overflow: visible;
|
| 297 |
}
|
| 298 |
.html-embed__card.is-frameless {
|
| 299 |
background: transparent;
|
app/src/content/article.mdx
CHANGED
|
@@ -44,7 +44,7 @@ tags:
|
|
| 44 |
- open-source
|
| 45 |
tableOfContentsAutoCollapse: true
|
| 46 |
pdfProOnly: false
|
| 47 |
-
showPdf:
|
| 48 |
---
|
| 49 |
|
| 50 |
import Hero from "./chapters/folding/01-hero.mdx";
|
|
|
|
| 44 |
- open-source
|
| 45 |
tableOfContentsAutoCollapse: true
|
| 46 |
pdfProOnly: false
|
| 47 |
+
showPdf: false
|
| 48 |
---
|
| 49 |
|
| 50 |
import Hero from "./chapters/folding/01-hero.mdx";
|
app/src/content/chapters/folding/08-ablations.mdx
CHANGED
|
@@ -78,7 +78,7 @@ The gap between Series 1 and Series 2 is immediately visible. Experiment 2.5 rea
|
|
| 78 |
id="total-score"
|
| 79 |
src="folding/total-score.html"
|
| 80 |
title="Total Score by Experiment"
|
| 81 |
-
desc="Overall score (% of maximum 1500) per experiment. The
|
| 82 |
/>
|
| 83 |
|
| 84 |
Total score captures partial progress that binary success rate misses. Even failed rollouts earn credit for completed subtasks, revealing that some Series 1 experiments make meaningful progress despite 0% Level 2 success. Only two experiments break the 50% threshold, all from Series 2.
|
|
@@ -86,8 +86,8 @@ Total score captures partial progress that binary success rate misses. Even fail
|
|
| 86 |
<HtmlEmbed
|
| 87 |
id="l1-time-quality"
|
| 88 |
src="folding/l1-time-quality.html"
|
| 89 |
-
title="Level 1 Completion Time
|
| 90 |
-
desc="
|
| 91 |
/>
|
| 92 |
|
| 93 |
Speed and quality correlate strongly with data quality. Series 2 experiments fold 2-3x faster than Series 1 (40s vs 100s+), and fold quality only breaks past 3.0 with high-quality training data. Faster isn't a separate goal from better; it's a consequence of the policy learning a clear, unambiguous strategy.
|
|
|
|
| 78 |
id="total-score"
|
| 79 |
src="folding/total-score.html"
|
| 80 |
title="Total Score by Experiment"
|
| 81 |
+
desc="Overall score (% of maximum 1500) per experiment. The dashed line marks 50% as a reference point."
|
| 82 |
/>
|
| 83 |
|
| 84 |
Total score captures partial progress that binary success rate misses. Even failed rollouts earn credit for completed subtasks, revealing that some Series 1 experiments make meaningful progress despite 0% Level 2 success. Only two experiments break the 50% threshold, all from Series 2.
|
|
|
|
| 86 |
<HtmlEmbed
|
| 87 |
id="l1-time-quality"
|
| 88 |
src="folding/l1-time-quality.html"
|
| 89 |
+
title="Level 1 Completion Time vs. Fold Quality"
|
| 90 |
+
desc="Each bubble is one experiment. X-axis = completion time (faster is left), Y-axis = fold quality (higher is better), bubble size = total success rate. The best experiments cluster in the top-left corner."
|
| 91 |
/>
|
| 92 |
|
| 93 |
Speed and quality correlate strongly with data quality. Series 2 experiments fold 2-3x faster than Series 1 (40s vs 100s+), and fold quality only breaks past 3.0 with high-quality training data. Faster isn't a separate goal from better; it's a consequence of the policy learning a clear, unambiguous strategy.
|
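
The new `desc` for the `l1-time-quality` embed says bubble size encodes total success rate. The part of the diff shown here does not include the scale that does this, so the following is only a sketch of one common choice: making bubble area, rather than radius, proportional to the value, so large values are not visually exaggerated. `bubbleRadius` and the 24 px maximum are assumptions for illustration.

```javascript
// Hypothetical helper: radius such that circle AREA scales linearly with
// success rate. maxRadius is an illustrative default, not a value from the commit.
function bubbleRadius(successRatePct, maxRadius = 24) {
  // Clamp to the valid percentage range before scaling.
  const clamped = Math.min(100, Math.max(0, successRatePct));
  // Area ~ r^2, so take the square root of the normalized value.
  return maxRadius * Math.sqrt(clamped / 100);
}

// Some total_sr values as they appear in the `raw` array of l1-time-quality.html.
for (const sr of [0, 20, 40, 75, 90]) {
  console.log(sr, bubbleRadius(sr).toFixed(1));
}
```

With this mapping a 0% experiment gets a zero radius, so a real chart would likely add a small minimum radius to keep every experiment visible.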
app/src/content/embeds/folding/failure-analysis.html
CHANGED
|
@@ -62,14 +62,42 @@
|
|
| 62 |
padding: 20px 20px 12px;
|
| 63 |
}
|
| 64 |
|
| 65 |
.chart-title {
|
| 66 |
font-size: 11px;
|
| 67 |
text-transform: uppercase;
|
| 68 |
letter-spacing: 0.08em;
|
| 69 |
color: #8b8fa8;
|
| 70 |
-
margin-bottom: 16px;
|
| 71 |
}
|
| 72 |
|
| 73 |
.legend {
|
| 74 |
display: flex;
|
| 75 |
flex-wrap: wrap;
|
|
@@ -149,15 +177,21 @@
|
|
| 149 |
<!-- LEVEL 2 PANEL -->
|
| 150 |
<div class="panel active" id="panel-l2">
|
| 151 |
<div class="insight-box">
|
| 152 |
-
<strong>Series 1:</strong> nearly all level
|
| 153 |
<strong>Series 2:</strong> Unfold failures collapse (2.5: 0%), but late-stage failures (Fold 3, Rotation) emerge — the model now reliably unfolds but precision degrades at the end.
|
| 154 |
</div>
|
| 155 |
<div class="chart-wrap">
|
| 156 |
-
<div class="chart-
|
| 157 |
-
|
| 158 |
<div class="legend" id="legend-l2"></div>
|
| 159 |
</div>
|
| 160 |
-
<p class="note">Each bar = one experiment, showing how its failed Level 2 rollouts distribute across subtasks. Only failed rollouts shown — successful rollouts are excluded.
|
| 161 |
</div>
|
| 162 |
|
| 163 |
<!-- LEVEL 1 PANEL -->
|
|
@@ -166,71 +200,68 @@
|
|
| 166 |
<strong>Level 1 failures</strong> are more distributed since unfolding is given. Series 1 failures concentrate at Fold 2 and Fold 4 (mid-task precision). Series 2 nearly eliminates failures entirely — only 2.3 (mirroring) and 2.4 (chunk=45) regress significantly.
|
| 167 |
</div>
|
| 168 |
<div class="chart-wrap">
|
| 169 |
-
<div class="chart-
|
| 170 |
-
|
| 171 |
<div class="legend" id="legend-l1"></div>
|
| 172 |
</div>
|
| 173 |
-
<p class="note">Level 1 begins with the shirt already laid flat, so "Unfold" is not a failure point.
|
| 174 |
</div>
|
| 175 |
|
| 176 |
</div>
|
| 177 |
|
| 178 |
<script>
|
| 179 |
function _initFailureAnalysis() {
|
| 180 |
-
// ── DATA ──────────────────────────────────────────────────────────────────────
|
| 181 |
-
// Derived from raw rollout data: for each failed rollout,
|
| 182 |
-
// "failure point" = first subtask that was NOT reached after a previous one was TRUE
|
| 183 |
-
// Level 1: Unfold is given (None), failure starts from Fold 1
|
| 184 |
-
// Level 2: Unfold is explicit
|
| 185 |
-
|
| 186 |
const EXPERIMENTS = [
|
| 187 |
-
{ id:'1.1', series:1 },
|
| 188 |
-
{ id:'1.2', series:1 },
|
| 189 |
-
{ id:'1.3', series:1 },
|
| 190 |
-
{ id:'1.4', series:1 },
|
| 191 |
-
{ id:'1.5', series:1 },
|
| 192 |
-
{ id:'1.7', series:1 },
|
| 193 |
-
{ id:'2.1', series:2 },
|
| 194 |
-
{ id:'2.2', series:2 },
|
| 195 |
-
{ id:'2.3', series:2 },
|
| 196 |
-
{ id:'2.4', series:2 },
|
| 197 |
-
{ id:'2.5', series:2 },
|
| 198 |
];
|
| 199 |
|
| 200 |
-
// L2 failures: {experimentId: {subtask: count}}
|
| 201 |
const L2_FAILURES = {
|
| 202 |
-
'1.1': { 'Unfold':10 },
|
| 203 |
-
'1.2': { 'Unfold':9, 'Rotation':1 },
|
| 204 |
-
'1.3': { 'Unfold':10 },
|
| 205 |
-
'1.4': { 'Unfold':10 },
|
| 206 |
-
'1.5': { 'Unfold':9, 'Fold 1':1 },
|
| 207 |
-
'1.7': { 'Unfold':8, 'Fold 3':1, 'Rotation':1 },
|
| 208 |
-
'2.1': { 'Unfold':8, 'Rotation':1 },
|
| 209 |
-
'2.2': { 'Unfold':4, 'Rotation':1 },
|
| 210 |
-
'2.3': { 'Unfold':8, 'Fold 1':1 },
|
| 211 |
-
'2.4': { 'Unfold':9, 'Fold 3':1 },
|
| 212 |
-
'2.5': { 'Unfold':2 },
|
| 213 |
};
|
| 214 |
|
| 215 |
-
// L1 failures
|
| 216 |
const L1_FAILURES = {
|
| 217 |
-
'1.1': { 'Fold 2':1 },
|
| 218 |
-
'1.2': { 'Rotation':4, 'Fold 4':2, 'Fold 2':1 },
|
| 219 |
-
'1.3': { 'Rotation':1, 'Fold 4':1 },
|
| 220 |
-
'1.4': { 'Rotation':2, 'Fold 3':1, 'Fold 4':2, 'Fold 2':3 },
|
| 221 |
-
'1.5': { 'Fold 3':2, 'Fold 2':6, 'Fold 1':1 },
|
| 222 |
-
'1.7': { 'Fold 4':1, 'Fold 2':1, 'Rotation':1 },
|
| 223 |
-
'2.1': { 'Fold 2':1, 'Fold 4':1 },
|
| 224 |
-
'2.2': { 'Fold 2':1 },
|
| 225 |
-
'2.3': { 'Fold 1':3, 'Fold 4':3, 'Fold 3':3 },
|
| 226 |
-
'2.4': { 'Rotation':2, 'Fold 4':3, 'Fold 3':1 },
|
| 227 |
-
'2.5': {},
|
| 228 |
};
|
| 229 |
|
| 230 |
const SUBTASKS_L2 = ['Unfold','Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
|
| 231 |
const SUBTASKS_L1 = ['Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
|
| 232 |
|
| 233 |
-
// Colour: warm→cool progression. Unfold = red (early), Rotation = teal (late)
|
| 234 |
const COLORS = {
|
| 235 |
'Unfold': '#ef4444',
|
| 236 |
'Fold 1': '#f97316',
|
|
@@ -240,12 +271,22 @@ const COLORS = {
|
|
| 240 |
'Rotation': '#818cf8',
|
| 241 |
};
|
| 242 |
|
| 243 |
-
|
| 244 |
-
|
| 245 |
const svgEl = document.getElementById(svgId);
|
| 246 |
const W = svgEl.parentElement.clientWidth - 40;
|
| 247 |
-
const H =
|
| 248 |
-
const margin = { top:
|
| 249 |
const innerW = W - margin.left - margin.right;
|
| 250 |
const innerH = H - margin.top - margin.bottom;
|
| 251 |
|
|
@@ -256,10 +297,11 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 256 |
.attr('viewBox', `0 0 ${W} ${H}`)
|
| 257 |
.attr('height', H);
|
| 258 |
|
|
|
|
|
|
|
| 259 |
const g = svg.append('g')
|
| 260 |
.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 261 |
|
| 262 |
-
// Prepare stacked data
|
| 263 |
const expIds = experiments.map(a => a.id);
|
| 264 |
const stackData = expIds.map(id => {
|
| 265 |
const row = { id };
|
|
@@ -268,44 +310,39 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 268 |
return row;
|
| 269 |
});
|
| 270 |
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
|
| 277 |
|
| 278 |
-
const
|
| 279 |
-
.domain([0, maxTotal])
|
| 280 |
-
.range([innerH, 0])
|
| 281 |
-
.nice();
|
| 282 |
|
| 283 |
-
const
|
|
|
|
|
|
|
| 284 |
|
| 285 |
// Grid lines
|
| 286 |
-
g.append('g')
|
| 287 |
-
.
|
| 288 |
-
.call(d3.axisLeft(y)
|
| 289 |
-
.tickSize(-innerW)
|
| 290 |
-
.tickFormat('')
|
| 291 |
-
.ticks(5))
|
| 292 |
.call(gg => {
|
| 293 |
gg.select('.domain').remove();
|
| 294 |
-
gg.selectAll('line')
|
| 295 |
-
.attr('stroke', '#2a2d3a')
|
| 296 |
-
.attr('stroke-dasharray', '3,3');
|
| 297 |
});
|
| 298 |
|
| 299 |
// Stacked bars
|
| 300 |
-
const layer = g.selectAll('.layer')
|
| 301 |
-
.
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
.attr('fill', d => COLORS[d.key] || '#666');
|
| 305 |
-
|
| 306 |
-
layer.selectAll('rect')
|
| 307 |
-
.data(d => d)
|
| 308 |
-
.join('rect')
|
| 309 |
.attr('x', d => x(d.data.id))
|
| 310 |
.attr('y', d => y(d[1]))
|
| 311 |
.attr('height', d => Math.max(0, y(d[0]) - y(d[1])))
|
|
@@ -313,17 +350,18 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 313 |
.attr('rx', 2)
|
| 314 |
.attr('opacity', 0.88);
|
| 315 |
|
| 316 |
-
//
|
| 317 |
-
g.selectAll('.bar-label')
|
| 318 |
-
.data(stackData)
|
| 319 |
-
.join('text')
|
| 320 |
.attr('class', 'bar-label')
|
| 321 |
.attr('x', d => x(d.id) + x.bandwidth() / 2)
|
| 322 |
-
.attr('y', d => d._total === 0 ? y(0) - 4 : y(d.
|
| 323 |
.attr('text-anchor', 'middle')
|
| 324 |
.attr('fill', d => d._total === 0 ? '#3a3d4a' : '#8b8fa8')
|
| 325 |
.attr('font-size', '9')
|
| 326 |
-
.text(d =>
|
| 327 |
|
| 328 |
// Series divider line
|
| 329 |
const s1Last = experiments.filter(a => a.series === 1).pop().id;
|
|
@@ -332,43 +370,28 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 332 |
const xDiv = x(s1Last) + x.bandwidth() + x.step() * 0.14;
|
| 333 |
g.append('line')
|
| 334 |
.attr('x1', xDiv).attr('x2', xDiv)
|
| 335 |
-
.attr('y1', -
|
| 336 |
-
.attr('stroke', '#3a3d4a')
|
| 337 |
-
|
| 338 |
-
|
| 339 |
-
|
| 340 |
-
g.append('text')
|
| 341 |
-
.attr('x', xDiv - 6)
|
| 342 |
-
.attr('y', -4)
|
| 343 |
-
.attr('text-anchor', 'end')
|
| 344 |
-
.attr('fill', '#f7934f')
|
| 345 |
-
.attr('font-size', '8')
|
| 346 |
-
.attr('letter-spacing', '0.06em')
|
| 347 |
-
.text('SERIES 1');
|
| 348 |
|
| 349 |
if (s2First) {
|
| 350 |
-
g.append('text')
|
| 351 |
-
.attr('
|
| 352 |
-
.attr('y', -4)
|
| 353 |
-
.attr('text-anchor', 'start')
|
| 354 |
-
.attr('fill', '#4dc98a')
|
| 355 |
-
.attr('font-size', '8')
|
| 356 |
-
.attr('letter-spacing', '0.06em')
|
| 357 |
-
.text('SERIES 2');
|
| 358 |
}
|
| 359 |
}
|
| 360 |
|
| 361 |
// Axes
|
| 362 |
g.append('g')
|
| 363 |
-
.call(d3.axisLeft(y).ticks(5).tickSize(4))
|
| 364 |
.call(gg => {
|
| 365 |
gg.select('.domain').attr('stroke', '#2a2d3a');
|
| 366 |
gg.selectAll('text').attr('fill', '#8b8fa8').attr('font-size', '9');
|
| 367 |
gg.selectAll('line').attr('stroke', '#2a2d3a');
|
| 368 |
});
|
| 369 |
|
| 370 |
-
g.append('g')
|
| 371 |
-
.attr('transform', `translate(0,${innerH})`)
|
| 372 |
.call(d3.axisBottom(x).tickSize(0))
|
| 373 |
.call(gg => {
|
| 374 |
gg.select('.domain').attr('stroke', '#2a2d3a');
|
|
@@ -377,19 +400,18 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 377 |
const a = experiments.find(a => a.id === d);
|
| 378 |
return a?.series === 2 ? '#4dc98a' : '#f7934f';
|
| 379 |
})
|
| 380 |
-
.attr('font-size', '
|
| 381 |
-
.attr('
|
| 382 |
});
|
| 383 |
|
| 384 |
// Y axis label
|
| 385 |
-
svg.append('text')
|
| 386 |
-
.attr('
|
| 387 |
-
.attr('
|
| 388 |
-
.
|
| 389 |
-
.attr('text-anchor', 'middle')
|
| 390 |
-
.attr('fill', '#555e7a')
|
| 391 |
-
.attr('font-size', '9')
|
| 392 |
-
.text('Failed rollouts (n)');
|
| 393 |
|
| 394 |
// Legend
|
| 395 |
const legendEl = document.getElementById(legendId);
|
|
@@ -401,14 +423,14 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
|
|
| 401 |
`).join('');
|
| 402 |
}
|
| 403 |
|
| 404 |
-
// ── TAB SWITCHER ──────────────────────────────────────────────────────────────
|
| 405 |
const rendered = { l2: false, l1: false };
|
| 406 |
|
| 407 |
function renderTab(id) {
|
| 408 |
if (rendered[id]) return;
|
| 409 |
rendered[id] = true;
|
| 410 |
-
|
| 411 |
-
if (id === '
|
|
|
|
| 412 |
}
|
| 413 |
|
| 414 |
function showTab(id) {
|
|
@@ -422,8 +444,8 @@ function showTab(id) {
|
|
| 422 |
}
|
| 423 |
|
| 424 |
window.showTab = showTab;
|
|
|
|
| 425 |
|
| 426 |
-
// ── RENDER (only the visible tab) ─────────────────────────────────────────────
|
| 427 |
renderTab('l2');
|
| 428 |
|
| 429 |
}
|
|
|
|
| 62 |
padding: 20px 20px 12px;
|
| 63 |
}
|
| 64 |
|
| 65 |
+
.chart-header {
|
| 66 |
+
display: flex;
|
| 67 |
+
justify-content: space-between;
|
| 68 |
+
align-items: center;
|
| 69 |
+
flex-wrap: wrap;
|
| 70 |
+
gap: 8px;
|
| 71 |
+
margin-bottom: 16px;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
.chart-title {
|
| 75 |
font-size: 11px;
|
| 76 |
text-transform: uppercase;
|
| 77 |
letter-spacing: 0.08em;
|
| 78 |
color: #8b8fa8;
|
|
|
|
| 79 |
}
|
| 80 |
|
| 81 |
+
.mode-toggle {
|
| 82 |
+
display: flex;
|
| 83 |
+
gap: 0;
|
| 84 |
+
}
|
| 85 |
+
.mode-btn {
|
| 86 |
+
padding: 4px 12px;
|
| 87 |
+
font-size: 10px;
|
| 88 |
+
font-family: inherit;
|
| 89 |
+
cursor: pointer;
|
| 90 |
+
border: 1px solid #2a2d3a;
|
| 91 |
+
background: none;
|
| 92 |
+
color: #8b8fa8;
|
| 93 |
+
transition: all 0.15s;
|
| 94 |
+
letter-spacing: 0.04em;
|
| 95 |
+
}
|
| 96 |
+
.mode-btn:first-child { border-radius: 4px 0 0 4px; }
|
| 97 |
+
.mode-btn:last-child { border-radius: 0 4px 4px 0; border-left: none; }
|
| 98 |
+
.mode-btn.active { background: #252835; color: #e8eaf0; border-color: #4a4d5a; }
|
| 99 |
+
.mode-btn:hover:not(.active) { color: #e8eaf0; }
|
| 100 |
+
|
| 101 |
.legend {
|
| 102 |
display: flex;
|
| 103 |
flex-wrap: wrap;
|
|
|
|
| 177 |
<!-- LEVEL 2 PANEL -->
|
| 178 |
<div class="panel active" id="panel-l2">
|
| 179 |
<div class="insight-box">
|
| 180 |
+
<strong>Series 1:</strong> nearly all level 2 failures occur at Unfold — the robot never gets past step 1.
|
| 181 |
<strong>Series 2:</strong> Unfold failures collapse (2.5: 0%), but late-stage failures (Fold 3, Rotation) emerge — the model now reliably unfolds but precision degrades at the end.
|
| 182 |
</div>
|
| 183 |
<div class="chart-wrap">
|
| 184 |
+
<div class="chart-header">
|
| 185 |
+
<div class="chart-title">Where does the robot fail? — Level 2 failed rollouts by subtask</div>
|
| 186 |
+
<div class="mode-toggle">
|
| 187 |
+
<button class="mode-btn active" id="mode-l2-abs" onclick="setMode('l2','abs')">Counts</button>
|
| 188 |
+
<button class="mode-btn" id="mode-l2-pct" onclick="setMode('l2','pct')">Percentage</button>
|
| 189 |
+
</div>
|
| 190 |
+
</div>
|
| 191 |
+
<svg id="chart-l2" width="100%" height="320" style="overflow:visible"></svg>
|
| 192 |
<div class="legend" id="legend-l2"></div>
|
| 193 |
</div>
|
| 194 |
+
<p class="note">Each bar = one experiment, showing how its failed Level 2 rollouts distribute across subtasks. Only failed rollouts shown — successful rollouts are excluded. Toggle "Percentage" to compare failure distributions regardless of total failure count.</p>
|
| 195 |
</div>
|
| 196 |
|
| 197 |
<!-- LEVEL 1 PANEL -->
|
|
|
|
| 200 |
<strong>Level 1 failures</strong> are more distributed since unfolding is given. Series 1 failures concentrate at Fold 2 and Fold 4 (mid-task precision). Series 2 nearly eliminates failures entirely — only 2.3 (mirroring) and 2.4 (chunk=45) regress significantly.
|
| 201 |
</div>
|
| 202 |
<div class="chart-wrap">
|
| 203 |
+
<div class="chart-header">
|
| 204 |
+
<div class="chart-title">Where does the robot fail? — Level 1 failed rollouts by subtask</div>
|
| 205 |
+
<div class="mode-toggle">
|
| 206 |
+
<button class="mode-btn active" id="mode-l1-abs" onclick="setMode('l1','abs')">Counts</button>
|
| 207 |
+
<button class="mode-btn" id="mode-l1-pct" onclick="setMode('l1','pct')">Percentage</button>
|
| 208 |
+
</div>
|
| 209 |
+
</div>
|
| 210 |
+
<svg id="chart-l1" width="100%" height="320" style="overflow:visible"></svg>
|
| 211 |
<div class="legend" id="legend-l1"></div>
|
| 212 |
</div>
|
| 213 |
+
<p class="note">Level 1 begins with the shirt already laid flat, so "Unfold" is not a failure point. Toggle "Percentage" to compare where each experiment struggles, independent of how many total failures it has.</p>
|
| 214 |
</div>
|
| 215 |
|
| 216 |
</div>
|
| 217 |
|
| 218 |
<script>
|
| 219 |
function _initFailureAnalysis() {
|
| 220 |
const EXPERIMENTS = [
|
| 221 |
+
{ id:'1.1 π0', series:1 },
|
| 222 |
+
{ id:'1.2 π0.5', series:1 },
|
| 223 |
+
{ id:'1.3 ΔActions', series:1 },
|
| 224 |
+
{ id:'1.4 RABC low', series:1 },
|
| 225 |
+
{ id:'1.5 RABC high', series:1 },
|
| 226 |
+
{ id:'1.7 Δ+RABC', series:1 },
|
| 227 |
+
{ id:'2.1 HQ', series:2 },
|
| 228 |
+
{ id:'2.2 HQ+RABC+Δ', series:2 },
|
| 229 |
+
{ id:'2.3 HQ+mirror', series:2 },
|
| 230 |
+
{ id:'2.4 HQ chunk45', series:2 },
|
| 231 |
+
{ id:'2.5 HQ+RABC+Δ★', series:2 },
|
| 232 |
];
|
| 233 |
|
|
|
|
| 234 |
const L2_FAILURES = {
|
| 235 |
+
'1.1 π0': { 'Unfold':10 },
|
| 236 |
+
'1.2 π0.5': { 'Unfold':9, 'Rotation':1 },
|
| 237 |
+
'1.3 ΔActions': { 'Unfold':10 },
|
| 238 |
+
'1.4 RABC low': { 'Unfold':10 },
|
| 239 |
+
'1.5 RABC high': { 'Unfold':9, 'Fold 1':1 },
|
| 240 |
+
'1.7 Δ+RABC': { 'Unfold':8, 'Fold 3':1, 'Rotation':1 },
|
| 241 |
+
'2.1 HQ': { 'Unfold':8, 'Rotation':1 },
|
| 242 |
+
'2.2 HQ+RABC+Δ': { 'Unfold':4, 'Rotation':1 },
|
| 243 |
+
'2.3 HQ+mirror': { 'Unfold':8, 'Fold 1':1 },
|
| 244 |
+
'2.4 HQ chunk45': { 'Unfold':9, 'Fold 3':1 },
|
| 245 |
+
'2.5 HQ+RABC+Δ★': { 'Unfold':2 },
|
| 246 |
};
|
| 247 |
|
|
|
|
| 248 |
const L1_FAILURES = {
|
| 249 |
+
'1.1 π0': { 'Fold 2':1 },
|
| 250 |
+
'1.2 π0.5': { 'Rotation':4, 'Fold 4':2, 'Fold 2':1 },
|
| 251 |
+
'1.3 ΔActions': { 'Rotation':1, 'Fold 4':1 },
|
| 252 |
+
'1.4 RABC low': { 'Rotation':2, 'Fold 3':1, 'Fold 4':2, 'Fold 2':3 },
|
| 253 |
+
'1.5 RABC high': { 'Fold 3':2, 'Fold 2':6, 'Fold 1':1 },
|
| 254 |
+
'1.7 Δ+RABC': { 'Fold 4':1, 'Fold 2':1, 'Rotation':1 },
|
| 255 |
+
'2.1 HQ': { 'Fold 2':1, 'Fold 4':1 },
|
| 256 |
+
'2.2 HQ+RABC+Δ': { 'Fold 2':1 },
|
| 257 |
+
'2.3 HQ+mirror': { 'Fold 1':3, 'Fold 4':3, 'Fold 3':3 },
|
| 258 |
+
'2.4 HQ chunk45': { 'Rotation':2, 'Fold 4':3, 'Fold 3':1 },
|
| 259 |
+
'2.5 HQ+RABC+Δ★': {},
|
| 260 |
};
|
| 261 |
|
| 262 |
const SUBTASKS_L2 = ['Unfold','Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
|
| 263 |
const SUBTASKS_L1 = ['Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
|
| 264 |
|
|
|
|
| 265 |
const COLORS = {
|
| 266 |
'Unfold': '#ef4444',
|
| 267 |
'Fold 1': '#f97316',
|
|
|
|
| 271 |
'Rotation': '#818cf8',
|
| 272 |
};
|
| 273 |
|
| 274 |
+
const modes = { l2: 'abs', l1: 'abs' };
|
| 275 |
+
|
| 276 |
+
function setMode(level, mode) {
|
| 277 |
+
modes[level] = mode;
|
| 278 |
+
document.getElementById(`mode-${level}-abs`).classList.toggle('active', mode === 'abs');
|
| 279 |
+
document.getElementById(`mode-${level}-pct`).classList.toggle('active', mode === 'pct');
|
| 280 |
+
// Force re-render
|
| 281 |
+
rendered[level] = false;
|
| 282 |
+
renderTab(level);
|
| 283 |
+
}
|
| 284 |
+
|
| 285 |
+
function buildStackedBar(svgId, legendId, data, subtasks, experiments, normalize) {
|
| 286 |
const svgEl = document.getElementById(svgId);
|
| 287 |
const W = svgEl.parentElement.clientWidth - 40;
|
| 288 |
+
const H = 340;
|
| 289 |
+
const margin = { top: 30, right: 16, bottom: 80, left: 70 };
|
| 290 |
const innerW = W - margin.left - margin.right;
|
| 291 |
const innerH = H - margin.top - margin.bottom;
|
| 292 |
|
|
|
|
| 297 |
.attr('viewBox', `0 0 ${W} ${H}`)
|
| 298 |
.attr('height', H);
|
| 299 |
|
| 300 |
+
svg.selectAll('*').remove();
|
| 301 |
+
|
| 302 |
const g = svg.append('g')
|
| 303 |
.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 304 |
|
|
|
|
| 305 |
const expIds = experiments.map(a => a.id);
|
| 306 |
const stackData = expIds.map(id => {
|
| 307 |
const row = { id };
|
|
|
|
| 310 |
return row;
|
| 311 |
});
|
| 312 |
|
| 313 |
+
let displayData;
|
| 314 |
+
if (normalize) {
|
| 315 |
+
displayData = stackData.map(row => {
|
| 316 |
+
const out = { id: row.id, _total: row._total };
|
| 317 |
+
subtasks.forEach(s => {
|
| 318 |
+
out[s] = row._total > 0 ? (row[s] / row._total) * 100 : 0;
|
| 319 |
+
});
|
| 320 |
+
out._displayTotal = row._total > 0 ? 100 : 0;
|
| 321 |
+
return out;
|
| 322 |
+
});
|
| 323 |
+
} else {
|
| 324 |
+
displayData = stackData.map(row => ({ ...row, _displayTotal: row._total }));
|
| 325 |
+
}
|
| 326 |
|
| 327 |
+
const maxVal = normalize ? 100 : (d3.max(displayData, d => d._displayTotal) || 10);
|
| 328 |
|
| 329 |
+
const x = d3.scaleBand().domain(expIds).range([0, innerW]).padding(0.28);
|
| 330 |
+
const y = d3.scaleLinear().domain([0, maxVal]).range([innerH, 0]).nice();
|
| 331 |
+
const stack = d3.stack().keys(subtasks)(displayData);
|
| 332 |
|
| 333 |
// Grid lines
|
| 334 |
+
g.append('g').attr('class', 'grid')
|
| 335 |
+
.call(d3.axisLeft(y).tickSize(-innerW).tickFormat('').ticks(5))
|
| 336 |
.call(gg => {
|
| 337 |
gg.select('.domain').remove();
|
| 338 |
+
gg.selectAll('line').attr('stroke', '#2a2d3a').attr('stroke-dasharray', '3,3');
|
|
|
|
|
|
|
| 339 |
});
|
| 340 |
|
| 341 |
// Stacked bars
|
| 342 |
+
const layer = g.selectAll('.layer').data(stack).join('g')
|
| 343 |
+
.attr('class', 'layer').attr('fill', d => COLORS[d.key] || '#666');
|
| 344 |
+
|
| 345 |
+
layer.selectAll('rect').data(d => d).join('rect')
|
| 346 |
.attr('x', d => x(d.data.id))
|
| 347 |
.attr('y', d => y(d[1]))
|
| 348 |
.attr('height', d => Math.max(0, y(d[0]) - y(d[1])))
|
|
|
|
| 350 |
.attr('rx', 2)
|
| 351 |
.attr('opacity', 0.88);
|
| 352 |
|
| 353 |
+
// Labels on top
|
| 354 |
+
g.selectAll('.bar-label').data(displayData).join('text')
|
|
|
|
|
|
|
| 355 |
.attr('class', 'bar-label')
|
| 356 |
.attr('x', d => x(d.id) + x.bandwidth() / 2)
|
| 357 |
+
.attr('y', d => d._total === 0 ? y(0) - 4 : y(d._displayTotal) - 5)
|
| 358 |
.attr('text-anchor', 'middle')
|
| 359 |
.attr('fill', d => d._total === 0 ? '#3a3d4a' : '#8b8fa8')
|
| 360 |
.attr('font-size', '9')
|
| 361 |
+
.text(d => {
|
| 362 |
+
if (d._total === 0) return '✓ 0 failures';
|
| 363 |
+
return normalize ? `n=${d._total}` : d._total;
|
| 364 |
+
});
|
| 365 |
|
| 366 |
// Series divider line
|
| 367 |
const s1Last = experiments.filter(a => a.series === 1).pop().id;
|
|
|
|
| 370 |
const xDiv = x(s1Last) + x.bandwidth() + x.step() * 0.14;
|
| 371 |
g.append('line')
|
| 372 |
.attr('x1', xDiv).attr('x2', xDiv)
|
| 373 |
+
.attr('y1', -22).attr('y2', innerH + 4)
|
| 374 |
+
.attr('stroke', '#3a3d4a').attr('stroke-width', 1).attr('stroke-dasharray', '4,3');
|
| 375 |
+
|
| 376 |
+
g.append('text').attr('x', xDiv - 6).attr('y', -18).attr('text-anchor', 'end')
|
| 377 |
+
.attr('fill', '#f7934f').attr('font-size', '8').attr('letter-spacing', '0.06em').text('SERIES 1');
|
| 378 |
|
| 379 |
if (s2First) {
|
| 380 |
+
g.append('text').attr('x', xDiv + 6).attr('y', -18).attr('text-anchor', 'start')
|
| 381 |
+
.attr('fill', '#4dc98a').attr('font-size', '8').attr('letter-spacing', '0.06em').text('SERIES 2');
|
| 382 |
}
|
| 383 |
}
|
| 384 |
|
| 385 |
// Axes
|
| 386 |
g.append('g')
|
| 387 |
+
.call(d3.axisLeft(y).ticks(5).tickSize(4).tickFormat(d => normalize ? d + '%' : d))
|
| 388 |
.call(gg => {
|
| 389 |
gg.select('.domain').attr('stroke', '#2a2d3a');
|
| 390 |
gg.selectAll('text').attr('fill', '#8b8fa8').attr('font-size', '9');
|
| 391 |
gg.selectAll('line').attr('stroke', '#2a2d3a');
|
| 392 |
});
|
| 393 |
|
| 394 |
+
g.append('g').attr('transform', `translate(0,${innerH})`)
|
|
|
|
| 395 |
.call(d3.axisBottom(x).tickSize(0))
|
| 396 |
.call(gg => {
|
| 397 |
gg.select('.domain').attr('stroke', '#2a2d3a');
|
|
|
|
| 400 |
const a = experiments.find(a => a.id === d);
|
| 401 |
return a?.series === 2 ? '#4dc98a' : '#f7934f';
|
| 402 |
})
|
| 403 |
+
.attr('font-size', '9')
|
| 404 |
+
.attr('transform', 'rotate(-40)')
|
| 405 |
+
.attr('text-anchor', 'end')
|
| 406 |
+
.attr('dx', '-0.5em')
|
| 407 |
+
.attr('dy', '0.3em');
|
| 408 |
});
|
| 409 |
|
| 410 |
// Y axis label
|
| 411 |
+
svg.append('text').attr('transform', 'rotate(-90)')
|
| 412 |
+
.attr('x', -(margin.top + innerH / 2)).attr('y', 10).attr('text-anchor', 'middle')
|
| 413 |
+
.attr('fill', '#555e7a').attr('font-size', '9')
|
| 414 |
+
.text(normalize ? 'Failure distribution (%)' : 'Failed rollouts (n)');
|
| 415 |
|
| 416 |
// Legend
|
| 417 |
const legendEl = document.getElementById(legendId);
|
|
|
|
| 423 |
`).join('');
|
| 424 |
}
|
| 425 |
|
|
|
|
| 426 |
const rendered = { l2: false, l1: false };
|
| 427 |
|
| 428 |
function renderTab(id) {
|
| 429 |
if (rendered[id]) return;
|
| 430 |
rendered[id] = true;
|
| 431 |
+
const normalize = modes[id] === 'pct';
|
| 432 |
+
if (id === 'l2') buildStackedBar('chart-l2', 'legend-l2', L2_FAILURES, SUBTASKS_L2, EXPERIMENTS, normalize);
|
| 433 |
+
if (id === 'l1') buildStackedBar('chart-l1', 'legend-l1', L1_FAILURES, SUBTASKS_L1, EXPERIMENTS, normalize);
|
| 434 |
}
|
| 435 |
|
| 436 |
function showTab(id) {
|
|
|
|
| 444 |
}
|
| 445 |
|
| 446 |
window.showTab = showTab;
|
| 447 |
+
window.setMode = setMode;
|
| 448 |
|
|
|
|
| 449 |
renderTab('l2');
|
| 450 |
|
| 451 |
}
|
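
The comments removed from `failure-analysis.html` describe `L2_FAILURES` and `L1_FAILURES` as derived from raw rollout data, with the failure point being the first subtask that was not reached, and with Unfold treated as given at Level 1. The raw data and the derivation script are not part of this diff, so the block below is a sketch under assumptions: the record shape (`success`, `reached`), the use of `null` for a given subtask, and both function names are hypothetical, and the sample rollouts are made up for illustration.

```javascript
// Hypothetical record shape (an assumption, not from the commit):
//   { success: boolean, reached: { [subtask]: true | false | null } }
// where null marks a subtask that is given rather than attempted.

// Return the first subtask, in task order, that the rollout did not reach.
function failurePoint(rollout, subtasks) {
  for (const s of subtasks) {
    const reached = rollout.reached[s];
    if (reached === null) continue; // given subtask, e.g. Unfold at Level 1
    if (reached !== true) return s;
  }
  return null; // every attempted subtask was reached
}

// Count failure points per subtask, looking at failed rollouts only.
function tallyFailures(rollouts, subtasks) {
  const counts = {};
  for (const r of rollouts) {
    if (r.success) continue; // successful rollouts are excluded
    const s = failurePoint(r, subtasks);
    if (s !== null) counts[s] = (counts[s] || 0) + 1;
  }
  return counts;
}

const SUBTASKS = ['Unfold', 'Fold 1', 'Fold 2', 'Fold 3', 'Fold 4', 'Rotation'];

// Made-up sample rollouts for illustration only.
const sampleRollouts = [
  { success: false, reached: { 'Unfold': false } },
  { success: false, reached: { 'Unfold': true, 'Fold 1': true, 'Fold 2': false } },
  { success: false, reached: { 'Unfold': null, 'Fold 1': false } },
  { success: true,  reached: { 'Unfold': true, 'Fold 1': true, 'Fold 2': true,
                               'Fold 3': true, 'Fold 4': true, 'Rotation': true } },
];
console.log(tallyFailures(sampleRollouts, SUBTASKS));
```

The resulting object has the same `{ subtask: count }` shape as one row of the failure tables, so it could feed the stacked bar chart directly.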
app/src/content/embeds/folding/l1-time-quality.html
CHANGED
|
@@ -7,207 +7,159 @@
|
|
| 7 |
:root { --bg: transparent; --text: #e8eaf0; --subtext: #8b8fa8; --grid: #2a2d3a; --border: #2a2d3a; }
|
| 8 |
* { box-sizing: border-box; margin: 0; padding: 0; }
|
| 9 |
body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
|
| 10 |
-
|
| 11 |
-
.legend
|
| 12 |
-
.legend-
|
| 13 |
-
.legend-
|
| 14 |
-
.legend-
|
|
|
|
|
|
|
| 15 |
.tooltip {
|
| 16 |
position: absolute; background: #1a1d27; border: 1px solid var(--border);
|
| 17 |
border-radius: 8px; padding: 10px 14px; pointer-events: none;
|
| 18 |
-
opacity: 0; transition: opacity .15s; z-index: 10; min-width:
|
| 19 |
box-shadow: 0 4px 16px rgba(0,0,0,.4); font-size: 13px;
|
| 20 |
}
|
| 21 |
.tooltip strong { display: block; margin-bottom: 5px; }
|
| 22 |
.tooltip-row { display: flex; justify-content: space-between; gap: 12px; margin-top: 3px; font-size: 12px; color: var(--subtext); }
|
| 23 |
.tooltip-row span:last-child { color: var(--text); font-weight: 600; }
|
| 24 |
|
| 25 |
-
.
|
| 26 |
-
.exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
|
| 27 |
-
padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
|
| 28 |
-
.exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
|
| 29 |
-
.exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
|
| 30 |
-
.exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
|
| 31 |
-
border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
|
| 32 |
-
.exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
|
| 33 |
-
.exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
|
| 34 |
-
.exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
|
| 35 |
-
.exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
|
| 36 |
-
.exp-table tr:hover td { background: rgba(255,255,255,0.04); }
|
| 37 |
-
|
| 38 |
-
.axis text { fill: var(--subtext); font-size: 12px; }
|
| 39 |
.axis line, .axis path { stroke: var(--grid); }
|
| 40 |
.grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
|
|
|
|
|
|
|
| 41 |
</style>
|
| 42 |
</head>
|
| 43 |
<body>
|
| 44 |
<div class="legend">
|
| 45 |
-
<div class="legend-item"><div class="legend-pip" style="background:#f7934f"></div>
|
| 46 |
-
<div class="legend-item"><div class="legend-
|
| 47 |
-
<
|
| 48 |
-
|
| 49 |
-
<
|
| 50 |
-
<div id="exp-ref"></div>
|
| 51 |
</div>
|
| 52 |
<div style="position:relative">
|
| 53 |
-
<svg id="tq-chart"></svg>
|
| 54 |
<div class="tooltip" id="tq-tooltip"></div>
|
| 55 |
</div>
|
| 56 |
<script>
|
| 57 |
function _initL1TimeQuality() {
|
| 58 |
const raw = [
|
| 59 |
-
{label:"1.1",series:"1",l1time:121.5, quality:2.70, total_sr:40},
|
| 60 |
-
{label:"1.2",series:"1",l1time:90.75, quality:2.50, total_sr:20},
|
| 61 |
-
{label:"1.3",series:"1",l1time:113.86,quality:2.80, total_sr:35},
|
| 62 |
-
{label:"1.4",series:"1",l1time:78.33, quality:2.20, total_sr:15},
|
| 63 |
-
{label:"1.5",series:"1",l1time:null, quality:1.00, total_sr:0 },
|
| 64 |
-
{label:"1.7",series:"1",l1time:99.5, quality:2.30, total_sr:40},
|
| 65 |
-
{label:"2.1",series:"2",l1time:57.57, quality:2.80, total_sr:40},
|
| 66 |
-
{label:"2.2",series:"2",l1time:43.2, quality:3.30, total_sr:75},
|
| 67 |
-
{label:"2.3",series:"2",l1time:null, quality:1.00, total_sr:5 },
|
| 68 |
-
{label:"2.4",series:"2",l1time:72.5, quality:1.80, total_sr:20},
|
| 69 |
-
{label:"2.5",series:"2",l1time:40.8, quality:4.10, total_sr:90},
|
| 70 |
];
|
| 71 |
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
const data = [...raw].sort((a,b) => {
|
| 75 |
-
if (a.l1time===null && b.l1time===null) return 0;
|
| 76 |
-
if (a.l1time===null) return 1;
|
| 77 |
-
if (b.l1time===null) return -1;
|
| 78 |
-
return a.l1time - b.l1time;
|
| 79 |
-
});
|
| 80 |
|
| 81 |
const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
|
| 82 |
-
const margin = {top:
|
| 83 |
const svg = d3.select("#tq-chart");
|
| 84 |
const container = svg.node().parentElement;
|
| 85 |
const tooltip = d3.select("#tq-tooltip");
|
| 86 |
|
|
|
|
|
|
|
| 87 |
function render() {
|
| 88 |
svg.selectAll("*").remove();
|
| 89 |
const W = container.clientWidth;
|
| 90 |
-
const H = Math.max(
|
| 91 |
const w = W - margin.left - margin.right;
|
| 92 |
const h = H - margin.top - margin.bottom;
|
| 93 |
svg.attr("width",W).attr("height",H);
|
| 94 |
const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
|
| 95 |
|
| 96 |
-
const
|
| 97 |
-
const
|
| 98 |
-
const yQual = d3.scaleLinear().domain([0, 5]).range([h,0]);
|
| 99 |
|
| 100 |
-
|
| 101 |
-
|
| 102 |
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
d3.axisLeft(yTime).ticks(5).tickFormat(d=>d+"s").tickSize(0))
|
| 107 |
-
.call(ax=>ax.select(".domain").remove())
|
| 108 |
-
.call(ax=>ax.selectAll(".tick line").remove());
|
| 109 |
-
g.append("g").attr("class","axis").attr("transform",`translate(${w},0)`).call(
|
| 110 |
-
d3.axisRight(yQual).ticks(5).tickSize(0))
|
| 111 |
-
.call(ax=>ax.select(".domain").remove())
|
| 112 |
-
.call(ax=>ax.selectAll(".tick line").remove());
|
| 113 |
-
|
| 114 |
-
// Right axis label
|
| 115 |
-
g.append("text").attr("x",w+50).attr("y",h/2).attr("text-anchor","middle")
|
| 116 |
-
.attr("fill","#fbbf24").attr("font-size",10)
|
| 117 |
-
.attr("transform",`rotate(90,${w+50},${h/2})`)
|
| 118 |
-
.text("Quality (1–5)");
|
| 119 |
-
|
| 120 |
-
// Series pip under labels
|
| 121 |
-
data.forEach(d => {
|
| 122 |
-
g.append("rect")
|
| 123 |
-
.attr("x",x(d.label)).attr("width",x.bandwidth())
|
| 124 |
-
.attr("y",h+28).attr("height",4).attr("rx",2)
|
| 125 |
-
.attr("fill",seriesColor(d.series)).attr("opacity",0.8);
|
| 126 |
-
});
|
| 127 |
|
| 128 |
-
//
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
.attr("x",x(d.label)).attr("width",x.bandwidth())
|
| 133 |
-
.attr("y",yTime(d.l1time)).attr("height",h-yTime(d.l1time))
|
| 134 |
-
.attr("fill",seriesColor(d.series)).attr("rx",3).attr("opacity",0.85)
|
| 135 |
-
.style("cursor","pointer")
|
| 136 |
-
.on("mousemove",function(event){
|
| 137 |
-
tooltip.style("opacity",1).html(`
|
| 138 |
-
<strong>Experiment ${d.label} <small style="color:${seriesColor(d.series)}">(Series ${d.series})</small></strong>\n <div style=\"margin-top:6px;padding-top:6px;border-top:1px solid #2a2d3a;font-size:11px;color:#8b8fa8;line-height:1.5\">${(EXPERIMENTS[d.label]||{}).note||''}</div>
|
| 139 |
-
<div class="tooltip-row"><span>Avg L1 Time</span><span>${d.l1time.toFixed(1)}s</span></div>
|
| 140 |
-
<div class="tooltip-row"><span>Fold Quality</span><span>${d.quality.toFixed(2)}/5</span></div>
|
| 141 |
-
<div class="tooltip-row"><span>Total SR</span><span>${d.total_sr}%</span></div>
|
| 142 |
-
`);
|
| 143 |
-
const bx=container.getBoundingClientRect();
|
| 144 |
-
const ex=event.clientX-bx.left, ey=event.clientY-bx.top;
|
| 145 |
-
tooltip.style("left",Math.min(ex+12,W-185)+"px").style("top",Math.max(ey-90,0)+"px");
|
| 146 |
-
})
|
| 147 |
-
.on("mouseleave",()=>tooltip.style("opacity",0));
|
| 148 |
-
|
| 149 |
-
g.append("text")
|
| 150 |
-
.attr("x",x(d.label)+x.bandwidth()/2).attr("y",yTime(d.l1time)-4)
|
| 151 |
-
.attr("text-anchor","middle").attr("fill","#e8eaf0")
|
| 152 |
-
.attr("font-size",Math.max(8,Math.min(11,x.bandwidth()*0.28)))
|
| 153 |
-
.text(d.l1time.toFixed(0)+"s");
|
| 154 |
-
} else {
|
| 155 |
-
g.append("text")
|
| 156 |
-
.attr("x",x(d.label)+x.bandwidth()/2).attr("y",h-8)
|
| 157 |
-
.attr("text-anchor","middle").attr("fill","#3a3d4a").attr("font-size",9)
|
| 158 |
-
.text("N/A");
|
| 159 |
-
}
|
| 160 |
-
});
|
| 161 |
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
.
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
g.append("circle")
|
| 170 |
-
.attr("cx",
|
| 171 |
-
.attr("
|
| 172 |
});
|
| 173 |
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
}
|
| 178 |
|
| 179 |
render();
|
| 180 |
window.addEventListener("resize", render);
|
| 181 |
-
|
| 182 |
-
const EXPERIMENTS = {
|
| 183 |
-
"1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
|
| 184 |
-
"1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
|
| 185 |
-
"1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
|
| 186 |
-
"1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
|
| 187 |
-
"1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
|
| 188 |
-
"1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 189 |
-
"2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
|
| 190 |
-
"2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 191 |
-
"2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
|
| 192 |
-
"2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
|
| 193 |
-
"2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 194 |
-
};
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
(function buildRefTable() {
|
| 198 |
-
const container = document.getElementById('exp-ref');
|
| 199 |
-
if (!container) return;
|
| 200 |
-
const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
|
| 201 |
-
let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
|
| 202 |
-
order.forEach(k => {
|
| 203 |
-
const a = EXPERIMENTS[k];
|
| 204 |
-
const series = k.startsWith("2") ? "s2" : "s1";
|
| 205 |
-
html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
|
| 206 |
-
});
|
| 207 |
-
html += '</tbody></table>';
|
| 208 |
-
container.innerHTML = html;
|
| 209 |
-
})();
|
| 210 |
-
|
| 211 |
}
|
| 212 |
|
| 213 |
if (typeof d3 !== "undefined") {
|
|
|
|
| 7 |
:root { --bg: transparent; --text: #e8eaf0; --subtext: #8b8fa8; --grid: #2a2d3a; --border: #2a2d3a; }
|
| 8 |
* { box-sizing: border-box; margin: 0; padding: 0; }
|
| 9 |
body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
|
| 10 |
+
|
| 11 |
+
.legend { display: flex; gap: 16px; justify-content: center; flex-wrap: wrap; margin-bottom: 8px; align-items: center; }
|
| 12 |
+
.legend-item { display: flex; align-items: center; gap: 6px; font-size: 11px; color: var(--subtext); }
|
| 13 |
+
.legend-pip { width: 10px; height: 10px; border-radius: 50%; display: inline-block; border: 1.5px solid #1a1d27; }
|
| 14 |
+
.legend-size { display: flex; align-items: center; gap: 4px; font-size: 10px; color: var(--subtext); }
|
| 15 |
+
.legend-size circle { fill: none; stroke: var(--subtext); stroke-width: 1; }
|
| 16 |
+
|
| 17 |
.tooltip {
|
| 18 |
position: absolute; background: #1a1d27; border: 1px solid var(--border);
|
| 19 |
border-radius: 8px; padding: 10px 14px; pointer-events: none;
|
| 20 |
+
opacity: 0; transition: opacity .15s; z-index: 10; min-width: 200px;
|
| 21 |
box-shadow: 0 4px 16px rgba(0,0,0,.4); font-size: 13px;
|
| 22 |
}
|
| 23 |
.tooltip strong { display: block; margin-bottom: 5px; }
|
| 24 |
.tooltip-row { display: flex; justify-content: space-between; gap: 12px; margin-top: 3px; font-size: 12px; color: var(--subtext); }
|
| 25 |
.tooltip-row span:last-child { color: var(--text); font-weight: 600; }
|
| 26 |
|
| 27 |
+
.axis text { fill: var(--subtext); font-size: 11px; }
|
| 28 |
.axis line, .axis path { stroke: var(--grid); }
|
| 29 |
.grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
|
| 30 |
+
|
| 31 |
+
.annotation-line { stroke: #3a3d4a; stroke-dasharray: 4,3; stroke-width: 1; }
|
| 32 |
</style>
|
| 33 |
</head>
|
| 34 |
<body>
|
| 35 |
<div class="legend">
|
| 36 |
+
<div class="legend-item"><div class="legend-pip" style="background:#f7934f"></div>Series 2</div>
|
| 37 |
+
<div class="legend-item"><div class="legend-pip" style="background:#4f8ef7"></div>Series 1</div>
|
| 38 |
+
<div class="legend-item" style="margin-left:8px; font-size:10px; color:#555">
|
| 39 |
+
Bubble size = Total SR
|
| 40 |
+
</div>
|
|
|
|
| 41 |
</div>
|
| 42 |
<div style="position:relative">
|
| 43 |
+
<svg id="tq-chart" style="overflow:visible"></svg>
|
| 44 |
<div class="tooltip" id="tq-tooltip"></div>
|
| 45 |
</div>
|
| 46 |
<script>
|
| 47 |
function _initL1TimeQuality() {
|
| 48 |
const raw = [
|
| 49 |
+
{label:"1.1 π0",series:"1",l1time:121.5, quality:2.70, total_sr:40},
|
| 50 |
+
{label:"1.2 π0.5",series:"1",l1time:90.75, quality:2.50, total_sr:20},
|
| 51 |
+
{label:"1.3 ΔActions",series:"1",l1time:113.86,quality:2.80, total_sr:35},
|
| 52 |
+
{label:"1.4 RABC low",series:"1",l1time:78.33, quality:2.20, total_sr:15},
|
| 53 |
+
{label:"1.5 RABC high",series:"1",l1time:null, quality:1.00, total_sr:0 },
|
| 54 |
+
{label:"1.7 Δ+RABC",series:"1",l1time:99.5, quality:2.30, total_sr:40},
|
| 55 |
+
{label:"2.1 HQ",series:"2",l1time:57.57, quality:2.80, total_sr:40},
|
| 56 |
+
{label:"2.2 HQ+RABC+Δ",series:"2",l1time:43.2, quality:3.30, total_sr:75},
|
| 57 |
+
{label:"2.3 HQ+mirror",series:"2",l1time:null, quality:1.00, total_sr:5 },
|
| 58 |
+
{label:"2.4 HQ chunk45",series:"2",l1time:72.5, quality:1.80, total_sr:20},
|
| 59 |
+
{label:"2.5 HQ+RABC+Δ★",series:"2",l1time:40.8, quality:4.10, total_sr:90},
|
| 60 |
];
|
| 61 |
|
| 62 |
+
const data = raw.filter(d => d.l1time !== null);
|
| 63 |
+
const noData = raw.filter(d => d.l1time === null);
|
| 64 |
|
| 65 |
const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
|
| 66 |
+
const margin = {top:40, right:24, bottom:48, left:54};
|
| 67 |
const svg = d3.select("#tq-chart");
|
| 68 |
const container = svg.node().parentElement;
|
| 69 |
const tooltip = d3.select("#tq-tooltip");
|
| 70 |
|
| 71 |
+
const rScale = d3.scaleSqrt().domain([0, 100]).range([4, 22]);
|
| 72 |
+
|
| 73 |
function render() {
|
| 74 |
svg.selectAll("*").remove();
|
| 75 |
const W = container.clientWidth;
|
| 76 |
+
const H = Math.max(300, Math.min(400, W * 0.52));
|
| 77 |
const w = W - margin.left - margin.right;
|
| 78 |
const h = H - margin.top - margin.bottom;
|
| 79 |
svg.attr("width",W).attr("height",H);
|
| 80 |
const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
|
| 81 |
|
| 82 |
+
const xTime = d3.scaleLinear().domain([30, 135]).range([0,w]);
|
| 83 |
+
const yQual = d3.scaleLinear().domain([1.5, 4.5]).range([h,0]);
|
|
|
|
| 84 |
|
| 85 |
+
// Grid
|
| 86 |
+
g.append("g").attr("class","grid").selectAll("line.h").data(yQual.ticks(5)).join("line")
|
| 87 |
+
.attr("x1",0).attr("x2",w).attr("y1",d=>yQual(d)).attr("y2",d=>yQual(d));
|
| 88 |
+
g.append("g").attr("class","grid").selectAll("line.v").data(xTime.ticks(6)).join("line")
|
| 89 |
+
.attr("y1",0).attr("y2",h).attr("x1",d=>xTime(d)).attr("x2",d=>xTime(d));
|
| 90 |
|
| 91 |
+
// "Better" direction annotation
|
| 92 |
+
g.append("text").attr("x",4).attr("y",8).attr("fill","#4dc98a").attr("font-size",9).attr("opacity",0.6)
|
| 93 |
+
.text("← faster, better quality ↑");
|
| 94 |
|
| 95 |
+
// Axes
|
| 96 |
+
g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
|
| 97 |
+
d3.axisBottom(xTime).ticks(6).tickFormat(d=>d+"s").tickSize(0))
|
| 98 |
+
.call(gg=>gg.select(".domain").remove());
|
| 99 |
|
| 100 |
+
g.append("g").attr("class","axis").call(
|
| 101 |
+
d3.axisLeft(yQual).ticks(5).tickFormat(d=>d.toFixed(1)).tickSize(0))
|
| 102 |
+
.call(ax=>ax.select(".domain").remove());
|
| 103 |
+
|
| 104 |
+
// Axis labels
|
| 105 |
+
g.append("text").attr("x",w/2).attr("y",h+38).attr("text-anchor","middle")
|
| 106 |
+
.attr("fill","#8b8fa8").attr("font-size",11).text("Level 1 Completion Time (s) → slower");
|
| 107 |
+
g.append("text").attr("x",-h/2).attr("y",-40).attr("text-anchor","middle")
|
| 108 |
+
.attr("transform","rotate(-90)").attr("fill","#8b8fa8").attr("font-size",11).text("Fold Quality (1–5)");
|
| 109 |
+
|
| 110 |
+
// Quality = 3.0 reference line
|
| 111 |
+
g.append("line").attr("class","annotation-line")
|
| 112 |
+
.attr("x1",0).attr("x2",w).attr("y1",yQual(3.0)).attr("y2",yQual(3.0));
|
| 113 |
+
g.append("text").attr("x",w-2).attr("y",yQual(3.0)-5).attr("text-anchor","end")
|
| 114 |
+
.attr("fill","#3a3d4a").attr("font-size",8).text("quality = 3.0");
|
| 115 |
+
|
| 116 |
+
// Draw bubbles (larger ones first so smaller ones are on top)
|
| 117 |
+
const sorted = [...data].sort((a,b) => b.total_sr - a.total_sr);
|
| 118 |
+
|
| 119 |
+
sorted.forEach(d => {
|
| 120 |
+
const cx = xTime(d.l1time);
|
| 121 |
+
const cy = yQual(d.quality);
|
| 122 |
+
const r = rScale(d.total_sr);
|
| 123 |
+
const c = seriesColor(d.series);
|
| 124 |
+
|
| 125 |
+
// Bubble
|
| 126 |
g.append("circle")
|
| 127 |
+
.attr("cx",cx).attr("cy",cy).attr("r",r)
|
| 128 |
+
.attr("fill",c).attr("fill-opacity",0.25)
|
| 129 |
+
.attr("stroke",c).attr("stroke-width",1.5)
|
| 130 |
+
.style("cursor","pointer")
|
| 131 |
+
.on("mousemove",function(event){
|
| 132 |
+
tooltip.style("opacity",1).html(`
|
| 133 |
+
<strong>${d.label} <small style="color:${c}">(Series ${d.series})</small></strong>
|
| 134 |
+
<div class="tooltip-row"><span>L1 Completion Time</span><span>${d.l1time.toFixed(1)}s</span></div>
|
| 135 |
+
<div class="tooltip-row"><span>Fold Quality</span><span>${d.quality.toFixed(2)} / 5</span></div>
|
| 136 |
+
<div class="tooltip-row"><span>Total Success Rate</span><span>${d.total_sr}%</span></div>
|
| 137 |
+
`);
|
| 138 |
+
const bx=container.getBoundingClientRect();
|
| 139 |
+
const ex=event.clientX-bx.left, ey=event.clientY-bx.top;
|
| 140 |
+
tooltip.style("left",Math.min(ex+12,W-210)+"px").style("top",Math.max(ey-90,0)+"px");
|
| 141 |
+
})
|
| 142 |
+
.on("mouseleave",()=>tooltip.style("opacity",0));
|
| 143 |
+
|
| 144 |
+
// Label below bubble
|
| 145 |
+
g.append("text")
|
| 146 |
+
.attr("x",cx).attr("y", cy + r + 12)
|
| 147 |
+
.attr("text-anchor","middle")
|
| 148 |
+
.attr("fill","#e8eaf0").attr("font-size",9).attr("font-weight","500")
|
| 149 |
+
.text(d.label);
|
| 150 |
});
|
| 151 |
|
| 152 |
+
// "No data" annotation for experiments with null L1 time
|
| 153 |
+
if (noData.length > 0) {
|
| 154 |
+
const noDataText = noData.map(d => d.label).join(", ");
|
| 155 |
+
g.append("text").attr("x",w).attr("y",h-4).attr("text-anchor","end")
|
| 156 |
+
.attr("fill","#3a3d4a").attr("font-size",9)
|
| 157 |
+
.text(`No L1 completions: ${noDataText}`);
|
| 158 |
+
}
|
| 159 |
}
|
| 160 |
|
| 161 |
render();
|
| 162 |
window.addEventListener("resize", render);
|
| 163 |
}
|
| 164 |
|
| 165 |
if (typeof d3 !== "undefined") {
|
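The new `l1-time-quality.html` sizes each bubble with `d3.scaleSqrt().domain([0, 100]).range([4, 22])`, so Total SR maps to the radius through a square root rather than linearly. The sketch below reproduces that mapping in plain JavaScript, without D3, to make the arithmetic visible. `bubbleRadius` is a hypothetical helper name, not part of the commit; the domain and range defaults are copied from the diff.

```javascript
// Hypothetical stand-in for d3.scaleSqrt().domain([0, 100]).range([4, 22]).
// A sqrt scale takes the square root of the input and of both domain bounds,
// then interpolates linearly into the output range (no clamping).
function bubbleRadius(totalSr, domain = [0, 100], range = [4, 22]) {
  const s0 = Math.sqrt(domain[0]);
  const s1 = Math.sqrt(domain[1]);
  const t = (Math.sqrt(totalSr) - s0) / (s1 - s0); // 0..1 inside the domain
  return range[0] + t * (range[1] - range[0]);
}

console.log(bubbleRadius(0));   // 4  (experiments with 0% still get a visible dot)
console.log(bubbleRadius(25));  // 13
console.log(bubbleRadius(100)); // 22
```

Because the range starts at 4 rather than 0, bubble area is not strictly proportional to Total SR; the floor appears to be a readability choice so that low-scoring experiments remain hoverable.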
app/src/content/embeds/folding/loss-curves.html
CHANGED
|
@@ -65,17 +65,17 @@
|
|
| 65 |
<script>
|
| 66 |
function _initLossCurves() {
|
| 67 |
const RUNS = [
|
| 68 |
-
{ key: "ablation1_1_2", label: "1.1
|
| 69 |
-
{ key: "ablation1_2_2", label: "1.2
|
| 70 |
-
{ key: "ablation1_3_17_q", label: "1.3
|
| 71 |
-
{ key: "ablation1-4", label: "1.4
|
| 72 |
-
{ key: "ablation1-5_9", label: "1.5
|
| 73 |
-
{ key: "ablation1-7_2", label: "1.7
|
| 74 |
-
{ key: "ablation2-1_100k_q", label: "2.1
|
| 75 |
-
{ key: "ablation2-2_100k", label: "2.2
|
| 76 |
-
{ key: "ablation2-3_100k_q_it", label: "2.3
|
| 77 |
-
{ key: "ablation2-4_100k_q", label: "2.4
|
| 78 |
-
{ key: "ablation2-5_0", label: "2.5
|
| 79 |
];
|
| 80 |
|
| 81 |
// ──────────────────────────────────────────────
|
|
|
|
| 65 |
<script>
|
| 66 |
function _initLossCurves() {
|
| 67 |
const RUNS = [
|
| 68 |
+
{ key: "ablation1_1_2", label: "1.1 π0", series: "s1", color: "#ef5350" },
|
| 69 |
+
{ key: "ablation1_2_2", label: "1.2 π0.5", series: "s1", color: "#f59e0b" },
|
| 70 |
+
{ key: "ablation1_3_17_q", label: "1.3 ΔActions", series: "s1", color: "#10b981" },
|
| 71 |
+
{ key: "ablation1-4", label: "1.4 RABC low", series: "s1", color: "#3b82f6" },
|
| 72 |
+
{ key: "ablation1-5_9", label: "1.5 RABC high", series: "s1", color: "#8b5cf6" },
|
| 73 |
+
{ key: "ablation1-7_2", label: "1.7 Δ+RABC", series: "s1", color: "#ec4899" },
|
| 74 |
+
{ key: "ablation2-1_100k_q", label: "2.1 HQ", series: "s2", color: "#14b8a6" },
|
| 75 |
+
{ key: "ablation2-2_100k", label: "2.2 HQ+RABC+Δ", series: "s2", color: "#6366f1" },
|
| 76 |
+
{ key: "ablation2-3_100k_q_it", label: "2.3 HQ+mirror", series: "s2", color: "#a78bfa" },
|
| 77 |
+
{ key: "ablation2-4_100k_q", label: "2.4 HQ chunk45", series: "s2", color: "#f97316" },
|
| 78 |
+
{ key: "ablation2-5_0", label: "2.5 HQ+RABC+Δ★", series: "s2", color: "#22d3ee" },
|
| 79 |
];
|
| 80 |
|
| 81 |
// ──────────────────────────────────────────────
|
app/src/content/embeds/folding/statistical-analysis.html
CHANGED
|
@@ -8,7 +8,7 @@
|
|
| 8 |
body{background:transparent;font-family:system-ui,sans-serif;color:#e8eaf0}
|
| 9 |
.wrap{max-width:980px;margin:0 auto;padding:20px 20px 36px}
|
| 10 |
|
| 11 |
-
.card{background:#1a1d27;border:1px solid #2a2d3a;border-radius:6px;overflow:
|
| 12 |
.card-head{padding:9px 14px;border-bottom:1px solid #2a2d3a;font-size:10px;text-transform:uppercase;letter-spacing:.07em;color:#8b8fa8;display:flex;justify-content:space-between;align-items:center;flex-wrap:wrap;gap:8px}
|
| 13 |
.chart-area{padding:16px 16px 10px}
|
| 14 |
svg text{font-family:system-ui,sans-serif}
|
|
@@ -44,7 +44,7 @@ svg text{font-family:system-ui,sans-serif}
|
|
| 44 |
<button class="ctrl-btn" id="v-l2" onclick="setVLevel('L2')">Level 2</button>
|
| 45 |
</div>
|
| 46 |
</div>
|
| 47 |
-
<div class="chart-area"><svg id="svg-violin" width="100%" height="
|
| 48 |
<div class="legend">
|
| 49 |
<div class="li"><div class="lsw" style="background:#f7934f"></div>Series 1</div>
|
| 50 |
<div class="li"><div class="lsw" style="background:#4dc98a"></div>Series 2</div>
|
|
@@ -59,26 +59,26 @@ svg text{font-family:system-ui,sans-serif}
|
|
| 59 |
<script>
|
| 60 |
function _initStatAnalysis() {
|
| 61 |
// ── DATA ──────────────────────────────────────────────────────────────────────
|
| 62 |
-
const EXPS = ['1.1','1.2','1.3','1.4','1.5','1.7','2.1','2.2','2.3','2.4','2.5'];
|
| 63 |
const DATA = {
|
| 64 |
-
'1.1': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
|
| 65 |
-
'1.2': {total:[4,20], L1:[4,10], L2:[0,10], series:1},
|
| 66 |
-
'1.3': {total:[7,20], L1:[7,10], L2:[0,10], series:1},
|
| 67 |
-
'1.4': {total:[3,20], L1:[3,10], L2:[0,10], series:1},
|
| 68 |
-
'1.5': {total:[0,20], L1:[0,10], L2:[0,10], series:1},
|
| 69 |
-
'1.7': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
|
| 70 |
-
'2.1': {total:[8,20], L1:[7,10], L2:[1,10], series:2},
|
| 71 |
-
'2.2': {total:[15,20], L1:[10,10], L2:[5,10], series:2},
|
| 72 |
-
'2.3': {total:[1,20], L1:[0,10], L2:[1,10], series:2},
|
| 73 |
-
'2.4': {total:[4,20], L1:[4,10], L2:[0,10], series:2},
|
| 74 |
-
'2.5': {total:[18,20], L1:[10,10], L2:[8,10], series:2},
|
| 75 |
};
|
| 76 |
|
| 77 |
// CLD assignments (Barnard's exact test, two-sided, Bonferroni α=0.10/55)
|
| 78 |
const CLD = {
|
| 79 |
-
total: {'2.5':'a','2.2':'ab','1.1':'bc','1.7':'bc','2.1':'bc','1.3':'bc','1.2':'c','2.4':'c','1.4':'c','2.3':'c','1.5':'c'},
|
| 80 |
-
L1: {'2.2':'a','2.5':'a','1.1':'ab','1.7':'ab','1.3':'ab','2.1':'ab','1.2':'abc','2.4':'abc','1.4':'bc','1.5':'c','2.3':'c'},
|
| 81 |
-
L2: {'2.5':'a','2.2':'ab','2.1':'b','2.3':'b','1.1':'b','1.2':'b','1.3':'b','1.4':'b','1.5':'b','1.7':'b','2.4':'b'},
|
| 82 |
};
|
| 83 |
|
| 84 |
// ── BETA DISTRIBUTION PDF ────────────────────────────────────────────────────
|
|
@@ -127,11 +127,12 @@ function setVLevel(lv){
|
|
| 127 |
|
| 128 |
function drawViolin(){
|
| 129 |
const svgEl=document.getElementById('svg-violin');
|
| 130 |
-
const W=svgEl.parentElement.clientWidth-32, H=
|
| 131 |
-
const m={top:50,right:16,bottom:
|
| 132 |
const iW=W-m.left-m.right, iH=H-m.top-m.bottom;
|
| 133 |
svgEl.setAttribute('viewBox',`0 0 ${W} ${H}`);
|
| 134 |
-
|
|
|
|
| 135 |
svg.selectAll('*').remove();
|
| 136 |
const g=svg.append('g').attr('transform',`translate(${m.left},${m.top})`);
|
| 137 |
|
|
@@ -219,7 +220,7 @@ function drawViolin(){
|
|
| 219 |
// Axes
|
| 220 |
g.append('g').attr('transform',`translate(0,${iH})`)
|
| 221 |
.call(d3.axisBottom(x).tickSize(0))
|
| 222 |
-
.call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',d=>seriesColor(DATA[d].series)).attr('font-size',
|
| 223 |
g.append('g').call(d3.axisLeft(y).ticks(5).tickFormat(d=>Math.round(d*100)+'%').tickSize(3))
|
| 224 |
.call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',SUB).attr('font-size',9);gg.selectAll('line').attr('stroke',BORDER)});
|
| 225 |
g.append('text').attr('transform','rotate(-90)').attr('x',-iH/2).attr('y',-30).attr('text-anchor','middle')
|
|
|
|
| 8 |
body{background:transparent;font-family:system-ui,sans-serif;color:#e8eaf0}
|
| 9 |
.wrap{max-width:980px;margin:0 auto;padding:20px 20px 36px}
|
| 10 |
|
| 11 |
+
.card{background:#1a1d27;border:1px solid #2a2d3a;border-radius:6px;overflow:visible;margin-bottom:12px}
|
| 12 |
.card-head{padding:9px 14px;border-bottom:1px solid #2a2d3a;font-size:10px;text-transform:uppercase;letter-spacing:.07em;color:#8b8fa8;display:flex;justify-content:space-between;align-items:center;flex-wrap:wrap;gap:8px}
|
| 13 |
.chart-area{padding:16px 16px 10px}
|
| 14 |
svg text{font-family:system-ui,sans-serif}
|
|
|
|
| 44 |
<button class="ctrl-btn" id="v-l2" onclick="setVLevel('L2')">Level 2</button>
|
| 45 |
</div>
|
| 46 |
</div>
|
| 47 |
+
<div class="chart-area" style="overflow:visible"><svg id="svg-violin" width="100%" height="500" style="overflow:visible"></svg></div>
|
| 48 |
<div class="legend">
|
| 49 |
<div class="li"><div class="lsw" style="background:#f7934f"></div>Series 1</div>
|
| 50 |
<div class="li"><div class="lsw" style="background:#4dc98a"></div>Series 2</div>
|
|
|
|
| 59 |
<script>
|
| 60 |
function _initStatAnalysis() {
|
| 61 |
// ── DATA ──────────────────────────────────────────────────────────────────────
|
| 62 |
+
const EXPS = ['1.1 π0','1.2 π0.5','1.3 ΔActions','1.4 RABC low','1.5 RABC high','1.7 Δ+RABC','2.1 HQ','2.2 HQ+RABC+Δ','2.3 HQ+mirror','2.4 HQ chunk45','2.5 HQ+RABC+Δ★'];
|
| 63 |
const DATA = {
|
| 64 |
+
'1.1 π0': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
|
| 65 |
+
'1.2 π0.5': {total:[4,20], L1:[4,10], L2:[0,10], series:1},
|
| 66 |
+
'1.3 ΔActions': {total:[7,20], L1:[7,10], L2:[0,10], series:1},
|
| 67 |
+
'1.4 RABC low': {total:[3,20], L1:[3,10], L2:[0,10], series:1},
|
| 68 |
+
'1.5 RABC high': {total:[0,20], L1:[0,10], L2:[0,10], series:1},
|
| 69 |
+
'1.7 Δ+RABC': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
|
| 70 |
+
'2.1 HQ': {total:[8,20], L1:[7,10], L2:[1,10], series:2},
|
| 71 |
+
'2.2 HQ+RABC+Δ': {total:[15,20], L1:[10,10], L2:[5,10], series:2},
|
| 72 |
+
'2.3 HQ+mirror': {total:[1,20], L1:[0,10], L2:[1,10], series:2},
|
| 73 |
+
'2.4 HQ chunk45': {total:[4,20], L1:[4,10], L2:[0,10], series:2},
|
| 74 |
+
'2.5 HQ+RABC+Δ★': {total:[18,20], L1:[10,10], L2:[8,10], series:2},
|
| 75 |
};
|
| 76 |
|
| 77 |
// CLD assignments (Barnard's exact test, two-sided, Bonferroni α=0.10/55)
|
| 78 |
const CLD = {
|
| 79 |
+
total: {'2.5 HQ+RABC+Δ★':'a','2.2 HQ+RABC+Δ':'ab','1.1 π0':'bc','1.7 Δ+RABC':'bc','2.1 HQ':'bc','1.3 ΔActions':'bc','1.2 π0.5':'c','2.4 HQ chunk45':'c','1.4 RABC low':'c','2.3 HQ+mirror':'c','1.5 RABC high':'c'},
|
| 80 |
+
L1: {'2.2 HQ+RABC+Δ':'a','2.5 HQ+RABC+Δ★':'a','1.1 π0':'ab','1.7 Δ+RABC':'ab','1.3 ΔActions':'ab','2.1 HQ':'ab','1.2 π0.5':'abc','2.4 HQ chunk45':'abc','1.4 RABC low':'bc','1.5 RABC high':'c','2.3 HQ+mirror':'c'},
|
| 81 |
+
L2: {'2.5 HQ+RABC+Δ★':'a','2.2 HQ+RABC+Δ':'ab','2.1 HQ':'b','2.3 HQ+mirror':'b','1.1 π0':'b','1.2 π0.5':'b','1.3 ΔActions':'b','1.4 RABC low':'b','1.5 RABC high':'b','1.7 Δ+RABC':'b','2.4 HQ chunk45':'b'},
|
| 82 |
};
|
| 83 |
|
| 84 |
// ── BETA DISTRIBUTION PDF ────────────────────────────────────────────────────
|
|
|
|
| 127 |
|
| 128 |
function drawViolin(){
|
| 129 |
const svgEl=document.getElementById('svg-violin');
|
| 130 |
+
const W=svgEl.parentElement.clientWidth-32, H=500;
|
| 131 |
+
const m={top:50,right:16,bottom:80,left:70};
|
| 132 |
const iW=W-m.left-m.right, iH=H-m.top-m.bottom;
|
| 133 |
svgEl.setAttribute('viewBox',`0 0 ${W} ${H}`);
|
| 134 |
+
svgEl.setAttribute('height', H);
|
| 135 |
+
const svg=d3.select('#svg-violin').attr('viewBox',`0 0 ${W} ${H}`).attr('height',H);
|
| 136 |
svg.selectAll('*').remove();
|
| 137 |
const g=svg.append('g').attr('transform',`translate(${m.left},${m.top})`);
|
| 138 |
|
|
|
|
| 220 |
// Axes
|
| 221 |
g.append('g').attr('transform',`translate(0,${iH})`)
|
| 222 |
.call(d3.axisBottom(x).tickSize(0))
|
| 223 |
+
.call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',d=>seriesColor(DATA[d].series)).attr('font-size',9).attr('transform','rotate(-40)').attr('text-anchor','end').attr('dx','-0.5em').attr('dy','0.3em')});
|
| 224 |
g.append('g').call(d3.axisLeft(y).ticks(5).tickFormat(d=>Math.round(d*100)+'%').tickSize(3))
|
| 225 |
.call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',SUB).attr('font-size',9);gg.selectAll('line').attr('stroke',BORDER)});
|
| 226 |
g.append('text').attr('transform','rotate(-90)').attr('x',-iH/2).attr('y',-30).attr('text-anchor','middle')
|
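The CLD comment in `statistical-analysis.html` cites a Bonferroni threshold of α=0.10/55. Assuming the 55 is the number of pairwise comparisons among the 11 experiments in `EXPS` (the diff does not say so explicitly), the per-test threshold can be checked with a short sketch. `pairCount` and `bonferroniAlpha` are hypothetical helper names used only for this illustration.

```javascript
// Number of unordered pairs among n items: n choose 2.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

// Bonferroni correction: split the family-wise alpha evenly across all pairwise tests.
function bonferroniAlpha(familyAlpha, n) {
  return familyAlpha / pairCount(n);
}

const experiments = 11; // length of EXPS in the diff
console.log(pairCount(experiments));                        // 55
console.log(bonferroniAlpha(0.10, experiments).toFixed(5)); // 0.00182
```

Under that assumption, a pair of experiments receives different CLD letters only when its two-sided p-value falls below roughly 0.0018.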
app/src/content/embeds/folding/subtask-heatmap.html
CHANGED
|
@@ -19,25 +19,9 @@
|
|
| 19 |
.legend-bar { display: flex; align-items: center; gap: 8px; margin-top: 10px; font-size: 11px; color: var(--subtext); justify-content: center; }
|
| 20 |
.legend-gradient { height: 10px; width: 180px; border-radius: 5px; flex-shrink: 0; }
|
| 21 |
|
| 22 |
-
.exp-ref-wrap { margin-bottom: 14px; }
|
| 23 |
-
.exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
|
| 24 |
-
padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
|
| 25 |
-
.exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
|
| 26 |
-
.exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
|
| 27 |
-
.exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
|
| 28 |
-
border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
|
| 29 |
-
.exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
|
| 30 |
-
.exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
|
| 31 |
-
.exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
|
| 32 |
-
.exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
|
| 33 |
-
.exp-table tr:hover td { background: rgba(255,255,255,0.04); }
|
| 34 |
</style>
|
| 35 |
</head>
|
| 36 |
<body>
|
| 37 |
-
<div class="exp-ref-wrap">
|
| 38 |
-
<button class="exp-ref-toggle" onclick="var t=document.getElementById('exp-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
|
| 39 |
-
<div id="exp-ref"></div>
|
| 40 |
-
</div>
|
| 41 |
<div style="position:relative">
|
| 42 |
<svg id="hm-chart"></svg>
|
| 43 |
<div class="tooltip" id="hm-tooltip"></div>
|
|
@@ -50,17 +34,17 @@
|
|
| 50 |
<script>
|
| 51 |
function _initSubtaskHeatmap() {
|
| 52 |
const rawData = [
|
| 53 |
-
{label:"1.1",series:"1",total_sr:40, times:[null, 19.2, 42.22, 14.33, 19.88, 27.25]},
|
| 54 |
-
{label:"1.2",series:"1",total_sr:20, times:[50, 39.27, 41.5, 12.3, 13.75, 10.75]},
|
| 55 |
-
{label:"1.3",series:"1",total_sr:35, times:[null, 19.5, 44.2, 14.8, 30.33, 22.14]},
|
| 56 |
-
{label:"1.4",series:"1",total_sr:15, times:[null, 20.8, 36.62, 10.0, 18.8, 12.67]},
|
| 57 |
-
{label:"1.5",series:"1",total_sr:0, times:[240, 21.4, 100.0, null, null, null ]},
|
| 58 |
-
{label:"1.7",series:"1",total_sr:40, times:[157.5,19.33, 32.64, 8.9, 11.0, 23.38]},
|
| 59 |
-
{label:"2.1",series:"2",total_sr:40, times:[77.5, 11.08, 21.09, 5.45, 5.5, 11.5 ]},
|
| 60 |
-
{label:"2.2",series:"2",total_sr:75, times:[34.33,6.25, 12.31, 3.75, 5.31, 8.93 ]},
|
| 61 |
-
{label:"2.3",series:"2",total_sr:5, times:[49, 14.0, 23.71, 17.5, 11.0, 4.0 ]},
|
| 62 |
-
{label:"2.4",series:"2",total_sr:20, times:[120, 10.09, 41.18, 7.89, 7.33, 10.0 ]},
|
| 63 |
-
{label:"2.5",series:"2",total_sr:90, times:[62.25,8.28, 12.0, 5.28, 5.22, 6.83 ]},
|
| 64 |
];
|
| 65 |
|
| 66 |
// Sort rows: best → worst by total_sr (heatmap: top = best)
|
|
@@ -87,7 +71,7 @@ const canvas = document.getElementById("lgd");
|
|
| 87 |
const ctx = canvas.getContext("2d");
|
| 88 |
for (let i=0; i<180; i++) { ctx.fillStyle=colorScale(i/180*120); ctx.fillRect(i,0,1,10); }
|
| 89 |
|
| 90 |
-
const margin = {top:12, right:16, bottom:36, left:
|
| 91 |
const svg = d3.select("#hm-chart");
|
| 92 |
const container = svg.node().parentElement;
|
| 93 |
const tooltip = d3.select("#hm-tooltip");
|
|
@@ -96,7 +80,7 @@ function render() {
|
|
| 96 |
svg.selectAll("*").remove();
|
| 97 |
const W = container.clientWidth;
|
| 98 |
const cellW = Math.floor((W - margin.left - margin.right) / subtasks.length);
|
| 99 |
-
const cellH = Math.max(
|
| 100 |
const H = data.length * cellH + margin.top + margin.bottom;
|
| 101 |
svg.attr("width",W).attr("height",H);
|
| 102 |
const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
|
|
@@ -118,16 +102,14 @@ function render() {
|
|
| 118 |
.attr("fill",seriesColor(d.series)).attr("opacity",0.9);
|
| 119 |
|
| 120 |
g.append("text")
|
| 121 |
-
.attr("x",-8).attr("y",ri*cellH+cellH/2
|
| 122 |
-
.attr("text-anchor","end").attr("fill","#e8eaf0").attr("font-size",
|
| 123 |
.text(d.label);
|
| 124 |
|
| 125 |
-
// total SR badge
|
| 126 |
g.append("text")
|
| 127 |
-
.attr("x",-8).attr("y",ri*cellH+cellH/2+
|
| 128 |
-
.attr("text-anchor","
|
| 129 |
-
.
|
| 130 |
-
.text(d.total_sr+"%");
|
| 131 |
});
|
| 132 |
|
| 133 |
// Cells
|
|
@@ -177,34 +159,20 @@ render();
|
|
| 177 |
window.addEventListener("resize", render);
|
| 178 |
|
| 179 |
const EXPERIMENTS = {
|
| 180 |
-
"1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
|
| 181 |
-
"1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
|
| 182 |
-
"1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
|
| 183 |
-
"1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
|
| 184 |
-
"1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
|
| 185 |
-
"1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 186 |
-
"2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
|
| 187 |
-
"2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 188 |
-
"2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
|
| 189 |
-
"2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
|
| 190 |
-
"2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 191 |
};
|
| 192 |
|
| 193 |
|
| 194 |
-
(function buildRefTable() {
|
| 195 |
-
const container = document.getElementById('exp-ref');
|
| 196 |
-
if (!container) return;
|
| 197 |
-
const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
|
| 198 |
-
let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
|
| 199 |
-
order.forEach(k => {
|
| 200 |
-
const a = EXPERIMENTS[k];
|
| 201 |
-
const series = k.startsWith("2") ? "s2" : "s1";
|
| 202 |
-
html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
|
| 203 |
-
});
|
| 204 |
-
html += '</tbody></table>';
|
| 205 |
-
container.innerHTML = html;
|
| 206 |
-
})();
|
| 207 |
-
|
| 208 |
}
|
| 209 |
|
| 210 |
if (typeof d3 !== "undefined") {
|
|
|
|
| 19 |
.legend-bar { display: flex; align-items: center; gap: 8px; margin-top: 10px; font-size: 11px; color: var(--subtext); justify-content: center; }
|
| 20 |
.legend-gradient { height: 10px; width: 180px; border-radius: 5px; flex-shrink: 0; }
|
| 21 |
|
| 22 |
</style>
|
| 23 |
</head>
|
| 24 |
<body>
|
| 25 |
<div style="position:relative">
|
| 26 |
<svg id="hm-chart"></svg>
|
| 27 |
<div class="tooltip" id="hm-tooltip"></div>
|
|
|
|
| 34 |
<script>
|
| 35 |
function _initSubtaskHeatmap() {
|
| 36 |
const rawData = [
|
| 37 |
+
{label:"1.1 π0",series:"1",total_sr:40, times:[null, 19.2, 42.22, 14.33, 19.88, 27.25]},
|
| 38 |
+
{label:"1.2 π0.5",series:"1",total_sr:20, times:[50, 39.27, 41.5, 12.3, 13.75, 10.75]},
|
| 39 |
+
{label:"1.3 ΔActions",series:"1",total_sr:35, times:[null, 19.5, 44.2, 14.8, 30.33, 22.14]},
|
| 40 |
+
{label:"1.4 RABC low",series:"1",total_sr:15, times:[null, 20.8, 36.62, 10.0, 18.8, 12.67]},
|
| 41 |
+
{label:"1.5 RABC high",series:"1",total_sr:0, times:[240, 21.4, 100.0, null, null, null ]},
|
| 42 |
+
{label:"1.7 Δ+RABC",series:"1",total_sr:40, times:[157.5,19.33, 32.64, 8.9, 11.0, 23.38]},
|
| 43 |
+
{label:"2.1 HQ",series:"2",total_sr:40, times:[77.5, 11.08, 21.09, 5.45, 5.5, 11.5 ]},
|
| 44 |
+
{label:"2.2 HQ+RABC+Δ",series:"2",total_sr:75, times:[34.33,6.25, 12.31, 3.75, 5.31, 8.93 ]},
|
| 45 |
+
{label:"2.3 HQ+mirror",series:"2",total_sr:5, times:[49, 14.0, 23.71, 17.5, 11.0, 4.0 ]},
|
| 46 |
+
{label:"2.4 HQ chunk45",series:"2",total_sr:20, times:[120, 10.09, 41.18, 7.89, 7.33, 10.0 ]},
|
| 47 |
+
{label:"2.5 HQ+RABC+Δ★",series:"2",total_sr:90, times:[62.25,8.28, 12.0, 5.28, 5.22, 6.83 ]},
|
| 48 |
];
|
| 49 |
|
| 50 |
// Sort rows: best → worst by total_sr (heatmap: top = best)
|
|
|
|
| 71 |
const ctx = canvas.getContext("2d");
|
| 72 |
for (let i=0; i<180; i++) { ctx.fillStyle=colorScale(i/180*120); ctx.fillRect(i,0,1,10); }
|
| 73 |
|
| 74 |
+
const margin = {top:12, right:16, bottom:36, left:120};
|
| 75 |
const svg = d3.select("#hm-chart");
|
| 76 |
const container = svg.node().parentElement;
|
| 77 |
const tooltip = d3.select("#hm-tooltip");
|
|
|
|
| 80 |
svg.selectAll("*").remove();
|
| 81 |
const W = container.clientWidth;
|
| 82 |
const cellW = Math.floor((W - margin.left - margin.right) / subtasks.length);
|
| 83 |
+
const cellH = Math.max(34, Math.min(44, cellW * 0.7));
|
| 84 |
const H = data.length * cellH + margin.top + margin.bottom;
|
| 85 |
svg.attr("width",W).attr("height",H);
|
| 86 |
const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
|
|
|
|
| 102 |
.attr("fill",seriesColor(d.series)).attr("opacity",0.9);
|
| 103 |
|
| 104 |
g.append("text")
|
| 105 |
+
.attr("x",-8).attr("y",ri*cellH+cellH/2)
|
| 106 |
+
.attr("text-anchor","end").attr("fill","#e8eaf0").attr("font-size",10).attr("font-weight","500")
|
| 107 |
.text(d.label);
|
| 108 |
|
|
|
|
| 109 |
g.append("text")
|
| 110 |
+
.attr("x",-8).attr("y",ri*cellH+cellH/2+11)
|
| 111 |
+
.attr("text-anchor","end").attr("fill","#8b8fa8").attr("font-size",8)
|
| 112 |
+
.text(d.total_sr+"% SR");
|
|
|
|
| 113 |
});
|
| 114 |
|
| 115 |
// Cells
|
|
|
|
| 159 |
window.addEventListener("resize", render);
|
| 160 |
|
| 161 |
const EXPERIMENTS = {
|
| 162 |
+
"1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
|
| 163 |
+
"1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
|
| 164 |
+
"1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
|
| 165 |
+
"1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
|
| 166 |
+
"1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
|
| 167 |
+
"1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 168 |
+
"2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
|
| 169 |
+
"2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 170 |
+
"2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
|
| 171 |
+
"2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
|
| 172 |
+
"2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 173 |
};
|
| 174 |
|
| 175 |
|
| 176 |
}
|
| 177 |
|
| 178 |
if (typeof d3 !== "undefined") {
|
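The heatmap sizing in the added lines above can be read in isolation from d3. The following is a minimal sketch, not the embed's actual code: the function names `sortBestFirst` and `heatmapLayout` are hypothetical, and the subtask count is passed in as a parameter because `subtasks` is defined outside the shown hunks. Only the clamp expression `Math.max(34, Math.min(44, cellW * 0.7))`, the margin values, and the "best → worst by total_sr" ordering are taken from the diff.

```javascript
// Sketch only: helper names are hypothetical. The margin values and the
// clamp bounds (34, 44, 0.7) are copied from the added lines of the diff.
const margin = { top: 12, right: 16, bottom: 36, left: 120 };

// Order rows best -> worst by total success rate without mutating the input.
function sortBestFirst(rows) {
  return [...rows].sort((a, b) => b.total_sr - a.total_sr);
}

// Compute cell size and total SVG height for a given container width.
function heatmapLayout(containerWidth, rowCount, subtaskCount) {
  const cellW = Math.floor(
    (containerWidth - margin.left - margin.right) / subtaskCount
  );
  // Row height follows cell width but is clamped to the range [34, 44] px.
  const cellH = Math.max(34, Math.min(44, cellW * 0.7));
  const height = rowCount * cellH + margin.top + margin.bottom;
  return { cellW, cellH, height };
}
```

The clamp keeps each row tall enough for the two stacked text lines (the label and the "% SR" line placed 11 px below it) on narrow containers, while capping the row height on wide ones. Assuming six subtask columns (the length of each `times` array), an 800 px container gives 110 px wide and 44 px tall cells.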
app/src/content/embeds/folding/success-rates.html
CHANGED
|
@@ -61,19 +61,6 @@
|
|
| 61 |
.tooltip-ci { font-size: 10px; color: #555; margin-left: 4px; }
|
| 62 |
.tooltip-note { margin-top: 7px; padding-top: 7px; border-top: 1px solid #2a2d3a; font-size: 11px; color: #8b8fa8; line-height: 1.5; }
|
| 63 |
|
| 64 |
-
/* ── Ablation table ── */
|
| 65 |
-
.abl-ref-wrap { margin-bottom: 14px; }
|
| 66 |
-
.abl-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
|
| 67 |
-
padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
|
| 68 |
-
.abl-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
|
| 69 |
-
.abl-table { width: 100%; border-collapse: collapse; font-size: 11px; }
|
| 70 |
-
.abl-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px; border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
|
| 71 |
-
.abl-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
|
| 72 |
-
.abl-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
|
| 73 |
-
.abl-table tr.s2 td { background: rgba(247,147,79,0.05); }
|
| 74 |
-
.abl-table tr.s1 td { background: rgba(79,142,247,0.04); }
|
| 75 |
-
.abl-table tr:hover td { background: rgba(255,255,255,0.04); }
|
| 76 |
-
|
| 77 |
/* ── Chart ── */
|
| 78 |
.axis text { fill: var(--subtext); font-size: 12px; }
|
| 79 |
.axis line, .axis path { stroke: var(--grid); }
|
|
@@ -117,14 +104,8 @@
|
|
| 117 |
</span>
|
| 118 |
</div>
|
| 119 |
|
| 120 |
-
<!-- Experiment reference table -->
|
| 121 |
-
<div class="abl-ref-wrap">
|
| 122 |
-
<button class="abl-ref-toggle" onclick="var t=document.getElementById('sr-abl-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
|
| 123 |
-
<div id="sr-abl-ref"></div>
|
| 124 |
-
</div>
|
| 125 |
-
|
| 126 |
<div style="position:relative">
|
| 127 |
-
<svg id="sr-chart"></svg>
|
| 128 |
<div class="tooltip" id="sr-tooltip"></div>
|
| 129 |
</div>
|
| 130 |
|
|
@@ -132,17 +113,17 @@
|
|
| 132 |
function _initSuccessRates() {
|
| 133 |
// ── Experiment metadata ───────────────────────────────────────────────────
|
| 134 |
const EXPERIMENTS = {
|
| 135 |
-
"1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"π0 base model fine-tuned on full dataset. Default normalization." },
|
| 136 |
-
"1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to π0.5 with MEAN_STD normalization." },
|
| 137 |
-
"1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds delta actions and switches to QUANTILES normalization." },
|
| 138 |
-
"1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"RABC with low κ — not very selective. MEAN_STD norm, no delta actions." },
|
| 139 |
-
"1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"RABC with higher κ (mean + ½σ). MEAN_STD norm, no delta actions." },
|
| 140 |
-
"1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 141 |
-
"2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data. No RABC." },
|
| 142 |
-
"2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 143 |
-
"2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments HQ data with mirrored trajectories." },
|
| 144 |
-
"2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Larger action chunk size (45 vs default 30)." },
|
| 145 |
-
"2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 146 |
};
|
| 147 |
|
| 148 |
// ── Raw data ───────────────────────────────────────────────────────────────
|
|
@@ -150,17 +131,17 @@ const EXPERIMENTS = {
|
|
| 150 |
const N = { total: 20, l1: 10, l2: 10 };
|
| 151 |
|
| 152 |
const raw = [
|
| 153 |
-
{label:"1.1", series:"1", total:40, l1:80, l2:0 },
|
| 154 |
-
{label:"1.2", series:"1", total:20, l1:40, l2:0 },
|
| 155 |
-
{label:"1.3", series:"1", total:35, l1:70, l2:0 },
|
| 156 |
-
{label:"1.4", series:"1", total:15, l1:30, l2:0 },
|
| 157 |
-
{label:"1.5", series:"1", total:0, l1:0, l2:0 },
|
| 158 |
-
{label:"1.7", series:"1", total:40, l1:80, l2:0 },
|
| 159 |
-
{label:"2.1", series:"2", total:40, l1:70, l2:10},
|
| 160 |
-
{label:"2.2", series:"2", total:75, l1:100, l2:50},
|
| 161 |
-
{label:"2.3", series:"2", total:5, l1:0, l2:10},
|
| 162 |
-
{label:"2.4", series:"2", total:20, l1:40, l2:0 },
|
| 163 |
-
{label:"2.5", series:"2", total:90, l1:100, l2:80},
|
| 164 |
];
|
| 165 |
|
| 166 |
// ── Wilson 90% CI ──────────────────────────────────────────────────────────
|
|
@@ -218,7 +199,7 @@ function getSorted() {
|
|
| 218 |
}
|
| 219 |
|
| 220 |
// ── Render ─────────────────────────────────────────────────────────────────
|
| 221 |
-
const margin = {top:28, right:20, bottom:
|
| 222 |
const svg = d3.select("#sr-chart");
|
| 223 |
const container = svg.node().parentElement;
|
| 224 |
const tooltip = d3.select("#sr-tooltip");
|
|
@@ -229,7 +210,7 @@ function render() {
|
|
| 229 |
const activeKeys = ["total","l1","l2"].filter(k => active[k]);
|
| 230 |
|
| 231 |
const W = container.clientWidth;
|
| 232 |
-
const H = Math.max(
|
| 233 |
const w = W - margin.left - margin.right;
|
| 234 |
const h = H - margin.top - margin.bottom;
|
| 235 |
svg.attr("width", W).attr("height", H);
|
|
@@ -245,7 +226,8 @@ function render() {
|
|
| 245 |
|
| 246 |
// Axes
|
| 247 |
g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
|
| 248 |
-
d3.axisBottom(x0).tickSize(0))
|
|
|
|
| 249 |
g.append("g").attr("class","axis").call(
|
| 250 |
d3.axisLeft(y).tickValues([0,25,50,75,100]).tickFormat(d=>d+"%").tickSize(0))
|
| 251 |
.call(ax=>ax.select(".domain").remove())
|
|
@@ -255,7 +237,7 @@ function render() {
|
|
| 255 |
sortedData.forEach(d => {
|
| 256 |
g.append("rect")
|
| 257 |
.attr("x", x0(d.label)).attr("width", x0.bandwidth())
|
| 258 |
-
.attr("y", h+
|
| 259 |
.attr("fill", seriesColor(d.series)).attr("opacity", 0.8);
|
| 260 |
});
|
| 261 |
|
|
@@ -350,19 +332,6 @@ function render() {
|
|
| 350 |
.text(`sorted: best → worst by ${skLabel[sk]}`);
|
| 351 |
}
|
| 352 |
|
| 353 |
-
// ── Reference table ────────────────────────────────────────────────────────
|
| 354 |
-
(function() {
|
| 355 |
-
const el = document.getElementById("sr-abl-ref");
|
| 356 |
-
const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
|
| 357 |
-
let html = '<table class="abl-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
|
| 358 |
-
order.forEach(k => {
|
| 359 |
-
const e = EXPERIMENTS[k], cls = k.startsWith("2") ? "s2" : "s1";
|
| 360 |
-
html += `<tr class="${cls}"><td><strong>${k}</strong></td><td>${e.desc}</td></tr>`;
|
| 361 |
-
});
|
| 362 |
-
html += "</tbody></table>";
|
| 363 |
-
el.innerHTML = html;
|
| 364 |
-
})();
|
| 365 |
-
|
| 366 |
render();
|
| 367 |
window.addEventListener("resize", render);
|
| 368 |
}
|
|
|
|
| 61 |
.tooltip-ci { font-size: 10px; color: #555; margin-left: 4px; }
|
| 62 |
.tooltip-note { margin-top: 7px; padding-top: 7px; border-top: 1px solid #2a2d3a; font-size: 11px; color: #8b8fa8; line-height: 1.5; }
|
| 63 |
|
| 64 |
/* ── Chart ── */
|
| 65 |
.axis text { fill: var(--subtext); font-size: 12px; }
|
| 66 |
.axis line, .axis path { stroke: var(--grid); }
|
|
|
|
| 104 |
</span>
|
| 105 |
</div>
|
| 106 |
|
| 107 |
<div style="position:relative">
|
| 108 |
+
<svg id="sr-chart" style="overflow:visible"></svg>
|
| 109 |
<div class="tooltip" id="sr-tooltip"></div>
|
| 110 |
</div>
|
| 111 |
|
|
|
|
| 113 |
function _initSuccessRates() {
|
| 114 |
// ── Experiment metadata ───────────────────────────────────────────────────
|
| 115 |
const EXPERIMENTS = {
|
| 116 |
+
"1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"π0 base model fine-tuned on full dataset. Default normalization." },
|
| 117 |
+
"1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to π0.5 with MEAN_STD normalization." },
|
| 118 |
+
"1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds delta actions and switches to QUANTILES normalization." },
|
| 119 |
+
"1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"RABC with low κ — not very selective. MEAN_STD norm, no delta actions." },
|
| 120 |
+
"1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"RABC with higher κ (mean + ½σ). MEAN_STD norm, no delta actions." },
|
| 121 |
+
"1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 122 |
+
"2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data. No RABC." },
|
| 123 |
+
"2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 124 |
+
"2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments HQ data with mirrored trajectories." },
|
| 125 |
+
"2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Larger action chunk size (45 vs default 30)." },
|
| 126 |
+
"2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 127 |
};
|
| 128 |
|
| 129 |
// ── Raw data ───────────────────────────────────────────────────────────────
|
|
|
|
| 131 |
const N = { total: 20, l1: 10, l2: 10 };
|
| 132 |
|
| 133 |
const raw = [
|
| 134 |
+
{label:"1.1 π0", series:"1", total:40, l1:80, l2:0 },
|
| 135 |
+
{label:"1.2 π0.5", series:"1", total:20, l1:40, l2:0 },
|
| 136 |
+
{label:"1.3 ΔActions", series:"1", total:35, l1:70, l2:0 },
|
| 137 |
+
{label:"1.4 RABC low", series:"1", total:15, l1:30, l2:0 },
|
| 138 |
+
{label:"1.5 RABC high", series:"1", total:0, l1:0, l2:0 },
|
| 139 |
+
{label:"1.7 Δ+RABC", series:"1", total:40, l1:80, l2:0 },
|
| 140 |
+
{label:"2.1 HQ", series:"2", total:40, l1:70, l2:10},
|
| 141 |
+
{label:"2.2 HQ+RABC+Δ", series:"2", total:75, l1:100, l2:50},
|
| 142 |
+
{label:"2.3 HQ+mirror", series:"2", total:5, l1:0, l2:10},
|
| 143 |
+
{label:"2.4 HQ chunk45", series:"2", total:20, l1:40, l2:0 },
|
| 144 |
+
{label:"2.5 HQ+RABC+Δ★", series:"2", total:90, l1:100, l2:80},
|
| 145 |
];
|
| 146 |
|
| 147 |
// ── Wilson 90% CI ──────────────────────────────────────────────────────────
|
|
|
|
| 199 |
}
|
| 200 |
|
| 201 |
// ── Render ─────────────────────────────────────────────────────────────────
|
| 202 |
+
const margin = {top:28, right:20, bottom:80, left:80};
|
| 203 |
const svg = d3.select("#sr-chart");
|
| 204 |
const container = svg.node().parentElement;
|
| 205 |
const tooltip = d3.select("#sr-tooltip");
|
|
|
|
| 210 |
const activeKeys = ["total","l1","l2"].filter(k => active[k]);
|
| 211 |
|
| 212 |
const W = container.clientWidth;
|
| 213 |
+
const H = Math.max(300, Math.min(400, W * 0.50));
|
| 214 |
const w = W - margin.left - margin.right;
|
| 215 |
const h = H - margin.top - margin.bottom;
|
| 216 |
svg.attr("width", W).attr("height", H);
|
|
|
|
| 226 |
|
| 227 |
// Axes
|
| 228 |
g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
|
| 229 |
+
d3.axisBottom(x0).tickSize(0))
|
| 230 |
+
.call(gg=>{gg.select(".domain").remove();gg.selectAll("text").attr("transform","rotate(-40)").attr("text-anchor","end").attr("dx","-0.5em").attr("dy","0.3em").attr("font-size",9)});
|
| 231 |
g.append("g").attr("class","axis").call(
|
| 232 |
d3.axisLeft(y).tickValues([0,25,50,75,100]).tickFormat(d=>d+"%").tickSize(0))
|
| 233 |
.call(ax=>ax.select(".domain").remove())
|
|
|
|
| 237 |
sortedData.forEach(d => {
|
| 238 |
g.append("rect")
|
| 239 |
.attr("x", x0(d.label)).attr("width", x0.bandwidth())
|
| 240 |
+
.attr("y", h+60).attr("height", 4).attr("rx", 2)
|
| 241 |
.attr("fill", seriesColor(d.series)).attr("opacity", 0.8);
|
| 242 |
});
|
| 243 |
|
|
|
|
| 332 |
.text(`sorted: best → worst by ${skLabel[sk]}`);
|
| 333 |
}
|
| 334 |
|
| 335 |
render();
|
| 336 |
window.addEventListener("resize", render);
|
| 337 |
}
|
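The success-rates embed mentions a Wilson 90% CI, but the hunks shown here do not include its implementation. As a hedged illustration of what such a computation usually looks like, here is a minimal sketch. The names `wilsonInterval` and `intervalForPct` are hypothetical, and `z = 1.645` is an assumed approximation of the two-sided 90% normal quantile. The trial counts in `N` are copied from the diff.

```javascript
// Assumption: 1.645 approximates the two-sided 90% normal quantile.
const Z90 = 1.645;

// Wilson score interval for `successes` out of `n` trials, as proportions in [0, 1].
function wilsonInterval(successes, n, z = Z90) {
  if (n <= 0) return { lo: 0, hi: 1 };
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return { lo: Math.max(0, center - half), hi: Math.min(1, center + half) };
}

// Trial counts as they appear in the diff.
const N = { total: 20, l1: 10, l2: 10 };

// Convert a percentage from `raw` into a count, then return the interval in percent.
function intervalForPct(pct, key) {
  const n = N[key];
  const k = Math.round((pct / 100) * n);
  const { lo, hi } = wilsonInterval(k, n);
  return { lo: 100 * lo, hi: 100 * hi };
}
```

Under these assumptions, `intervalForPct(90, "total")` spans roughly 74% to 97%. With only 10 or 20 trials per experiment, the intervals are wide, which is worth keeping in mind when comparing neighbouring bars.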
app/src/content/embeds/folding/total-score.html
CHANGED
|
@@ -8,19 +8,6 @@
|
|
| 8 |
* { box-sizing: border-box; margin: 0; padding: 0; }
|
| 9 |
body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
|
| 10 |
|
| 11 |
-
.exp-ref-wrap { margin-bottom: 14px; }
|
| 12 |
-
.exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
|
| 13 |
-
padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
|
| 14 |
-
.exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
|
| 15 |
-
.exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
|
| 16 |
-
.exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
|
| 17 |
-
border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
|
| 18 |
-
.exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
|
| 19 |
-
.exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
|
| 20 |
-
.exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
|
| 21 |
-
.exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
|
| 22 |
-
.exp-table tr:hover td { background: rgba(255,255,255,0.04); }
|
| 23 |
-
|
| 24 |
.axis text { fill: var(--subtext); font-size: 11px; }
|
| 25 |
.axis line, .axis path { stroke: var(--grid); }
|
| 26 |
.grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
|
|
@@ -37,28 +24,24 @@
|
|
| 37 |
</style>
|
| 38 |
</head>
|
| 39 |
<body>
|
| 40 |
-
<div class="exp-ref-wrap">
|
| 41 |
-
<button class="exp-ref-toggle" onclick="var t=document.getElementById('ts-exp-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
|
| 42 |
-
<div id="ts-exp-ref"></div>
|
| 43 |
-
</div>
|
| 44 |
<div style="position:relative">
|
| 45 |
-
<svg id="ts-chart"></svg>
|
| 46 |
<div class="tooltip" id="ts-tooltip"></div>
|
| 47 |
</div>
|
| 48 |
<script>
|
| 49 |
function _initTotalScore() {
|
| 50 |
const raw = [
|
| 51 |
-
{label:"1.1",series:"1",score:440, pct:29.3,total_sr:40},
|
| 52 |
-
{label:"1.2",series:"1",score:480, pct:32.0,total_sr:20},
|
| 53 |
-
{label:"1.3",series:"1",score:460, pct:30.7,total_sr:35},
|
| 54 |
-
{label:"1.4",series:"1",score:330, pct:22.0,total_sr:15},
|
| 55 |
-
{label:"1.5",series:"1",score:170, pct:11.3,total_sr:0 },
|
| 56 |
-
{label:"1.7",series:"1",score:600, pct:40.0,total_sr:40},
|
| 57 |
-
{label:"2.1",series:"2",score:620, pct:41.3,total_sr:40},
|
| 58 |
-
{label:"2.2",series:"2",score:1090,pct:72.7,total_sr:75},
|
| 59 |
-
{label:"2.3",series:"2",score:310, pct:20.7,total_sr:5 },
|
| 60 |
-
{label:"2.4",series:"2",score:460, pct:30.7,total_sr:20},
|
| 61 |
-
{label:"2.5",series:"2",score:1300,pct:86.7,total_sr:90},
|
| 62 |
];
|
| 63 |
|
| 64 |
// Sort highest → lowest score %
|
|
@@ -69,7 +52,7 @@ const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
|
|
| 69 |
const perfColor = d3.scaleSequential().domain([0,100])
|
| 70 |
.interpolator(d3.interpolateRgbBasis(["#f87171","#fbbf24","#4dc98a"]));
|
| 71 |
|
| 72 |
-
const margin = {top:28, right:20, bottom:
|
| 73 |
const svg = d3.select("#ts-chart");
|
| 74 |
const container = svg.node().parentElement;
|
| 75 |
const tooltip = d3.select("#ts-tooltip");
|
|
@@ -77,7 +60,7 @@ const tooltip = d3.select("#ts-tooltip");
|
|
| 77 |
function render() {
|
| 78 |
svg.selectAll("*").remove();
|
| 79 |
const W = container.clientWidth;
|
| 80 |
-
const H = Math.max(
|
| 81 |
const w = W - margin.left - margin.right;
|
| 82 |
const h = H - margin.top - margin.bottom;
|
| 83 |
svg.attr("width",W).attr("height",H);
|
|
@@ -96,7 +79,8 @@ function render() {
|
|
| 96 |
g.append("text").attr("x",w+3).attr("y",y(50)+4).attr("fill","#fbbf24").attr("font-size",9).text("50%");
|
| 97 |
|
| 98 |
g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
|
| 99 |
-
d3.axisBottom(x).tickSize(0))
|
|
|
|
| 100 |
g.append("g").attr("class","axis").call(
|
| 101 |
d3.axisLeft(y).ticks(5).tickFormat(d=>d+"%").tickSize(0))
|
| 102 |
.call(ax=>ax.select(".domain").remove())
|
|
@@ -106,7 +90,7 @@ function render() {
|
|
| 106 |
data.forEach(d => {
|
| 107 |
g.append("rect")
|
| 108 |
.attr("x",x(d.label)).attr("width",x.bandwidth())
|
| 109 |
-
.attr("y",h+
|
| 110 |
.attr("fill",seriesColor(d.series)).attr("opacity",0.8);
|
| 111 |
});
|
| 112 |
|
|
@@ -152,6 +136,17 @@ function render() {
|
|
| 152 |
.text(d.pct+"%");
|
| 153 |
});
|
| 154 |
|
| 155 |
g.append("text").attr("x",w).attr("y",-12).attr("text-anchor","end")
|
| 156 |
.attr("fill","#8b8fa8").attr("font-size",10)
|
| 157 |
.text("sorted: highest → lowest score %");
|
|
@@ -161,34 +156,20 @@ render();
|
|
| 161 |
window.addEventListener("resize", render);
|
| 162 |
|
| 163 |
const EXPERIMENTS = {
|
| 164 |
-
"1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
|
| 165 |
-
"1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
|
| 166 |
-
"1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
|
| 167 |
-
"1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
|
| 168 |
-
"1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
|
| 169 |
-
"1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 170 |
-
"2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
|
| 171 |
-
"2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 172 |
-
"2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
|
| 173 |
-
"2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
|
| 174 |
-
"2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 175 |
};
|
| 176 |
|
| 177 |
|
| 178 |
-
(function buildRefTable() {
|
| 179 |
-
const container = document.getElementById('ts-exp-ref');
|
| 180 |
-
if (!container) return;
|
| 181 |
-
const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
|
| 182 |
-
let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
|
| 183 |
-
order.forEach(k => {
|
| 184 |
-
const a = EXPERIMENTS[k];
|
| 185 |
-
const series = k.startsWith("2") ? "s2" : "s1";
|
| 186 |
-
html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
|
| 187 |
-
});
|
| 188 |
-
html += '</tbody></table>';
|
| 189 |
-
container.innerHTML = html;
|
| 190 |
-
})();
|
| 191 |
-
|
| 192 |
}
|
| 193 |
|
| 194 |
if (typeof d3 !== "undefined") {
|
|
|
|
| 8 |
* { box-sizing: border-box; margin: 0; padding: 0; }
|
| 9 |
body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
|
| 10 |
|
| 11 |
.axis text { fill: var(--subtext); font-size: 11px; }
|
| 12 |
.axis line, .axis path { stroke: var(--grid); }
|
| 13 |
.grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
|
|
|
|
| 24 |
</style>
|
| 25 |
</head>
|
| 26 |
<body>
|
| 27 |
<div style="position:relative">
|
| 28 |
+
<svg id="ts-chart" style="overflow:visible"></svg>
|
| 29 |
<div class="tooltip" id="ts-tooltip"></div>
|
| 30 |
</div>
|
| 31 |
<script>
|
| 32 |
function _initTotalScore() {
|
| 33 |
const raw = [
|
| 34 |
+
{label:"1.1 π0",series:"1",score:440, pct:29.3,total_sr:40},
|
| 35 |
+
{label:"1.2 π0.5",series:"1",score:480, pct:32.0,total_sr:20},
|
| 36 |
+
{label:"1.3 ΔActions",series:"1",score:460, pct:30.7,total_sr:35},
|
| 37 |
+
{label:"1.4 RABC low",series:"1",score:330, pct:22.0,total_sr:15},
|
| 38 |
+
{label:"1.5 RABC high",series:"1",score:170, pct:11.3,total_sr:0 },
|
| 39 |
+
{label:"1.7 Δ+RABC",series:"1",score:600, pct:40.0,total_sr:40},
|
| 40 |
+
{label:"2.1 HQ",series:"2",score:620, pct:41.3,total_sr:40},
|
| 41 |
+
{label:"2.2 HQ+RABC+Δ",series:"2",score:1090,pct:72.7,total_sr:75},
|
| 42 |
+
{label:"2.3 HQ+mirror",series:"2",score:310, pct:20.7,total_sr:5 },
|
| 43 |
+
{label:"2.4 HQ chunk45",series:"2",score:460, pct:30.7,total_sr:20},
|
| 44 |
+
{label:"2.5 HQ+RABC+Δ★",series:"2",score:1300,pct:86.7,total_sr:90},
|
| 45 |
];
|
| 46 |
|
| 47 |
// Sort highest → lowest score %
|
|
|
|
| 52 |
const perfColor = d3.scaleSequential().domain([0,100])
|
| 53 |
.interpolator(d3.interpolateRgbBasis(["#f87171","#fbbf24","#4dc98a"]));
|
| 54 |
|
| 55 |
+
const margin = {top:28, right:20, bottom:80, left:80};
|
| 56 |
const svg = d3.select("#ts-chart");
|
| 57 |
const container = svg.node().parentElement;
|
| 58 |
const tooltip = d3.select("#ts-tooltip");
|
|
|
|
| 60 |
function render() {
|
| 61 |
svg.selectAll("*").remove();
|
| 62 |
const W = container.clientWidth;
|
| 63 |
+
const H = Math.max(290, Math.min(380, W * 0.47));
|
| 64 |
const w = W - margin.left - margin.right;
|
| 65 |
const h = H - margin.top - margin.bottom;
|
| 66 |
svg.attr("width",W).attr("height",H);
|
|
|
|
| 79 |
g.append("text").attr("x",w+3).attr("y",y(50)+4).attr("fill","#fbbf24").attr("font-size",9).text("50%");
|
| 80 |
|
| 81 |
g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
|
| 82 |
+
d3.axisBottom(x).tickSize(0))
|
| 83 |
+
.call(gg=>{gg.select(".domain").remove();gg.selectAll("text").attr("transform","rotate(-40)").attr("text-anchor","end").attr("dx","-0.5em").attr("dy","0.3em").attr("font-size",9)});
|
| 84 |
g.append("g").attr("class","axis").call(
|
| 85 |
d3.axisLeft(y).ticks(5).tickFormat(d=>d+"%").tickSize(0))
|
| 86 |
.call(ax=>ax.select(".domain").remove())
|
|
|
|
| 90 |
data.forEach(d => {
|
| 91 |
g.append("rect")
|
| 92 |
.attr("x",x(d.label)).attr("width",x.bandwidth())
|
| 93 |
+
.attr("y",h+60).attr("height",4).attr("rx",2)
|
| 94 |
.attr("fill",seriesColor(d.series)).attr("opacity",0.8);
|
| 95 |
});
|
| 96 |
|
|
|
|
| 136 |
.text(d.pct+"%");
|
| 137 |
});
|
| 138 |
|
| 139 |
+
// Highlight best experiment
|
| 140 |
+
const best = data[0];
|
| 141 |
+
if (best) {
|
| 142 |
+
const bx = x(best.label) + x.bandwidth()/2;
|
| 143 |
+
const by = y(best.pct);
|
| 144 |
+
g.append("line").attr("x1",bx).attr("x2",bx).attr("y1",by-16).attr("y2",-8)
|
| 145 |
+
.attr("stroke","#4dc98a").attr("stroke-width",1).attr("stroke-dasharray","2,2").attr("opacity",0.5);
|
| 146 |
+
g.append("text").attr("x",bx).attr("y",-12).attr("text-anchor","middle")
|
| 147 |
+
.attr("fill","#4dc98a").attr("font-size",9).attr("font-weight","600").text("★ best");
|
| 148 |
+
}
|
| 149 |
+
|
| 150 |
g.append("text").attr("x",w).attr("y",-12).attr("text-anchor","end")
|
| 151 |
.attr("fill","#8b8fa8").attr("font-size",10)
|
| 152 |
.text("sorted: highest → lowest score %");
|
|
|
|
| 156 |
window.addEventListener("resize", render);
|
| 157 |
|
| 158 |
const EXPERIMENTS = {
|
| 159 |
+
"1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
|
| 160 |
+
"1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
|
| 161 |
+
"1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
|
| 162 |
+
"1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Stage-Aware Reward Model (SARM) with low κ (≈ mean threshold, not very selective)." },
|
| 163 |
+
"1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
|
| 164 |
+
"1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
|
| 165 |
+
"2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
|
| 166 |
+
"2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
|
| 167 |
+
"2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
|
| 168 |
+
"2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
|
| 169 |
+
"2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
|
| 170 |
};
|
| 171 |
|
| 172 |
|
| 173 |
}
|
| 174 |
|
| 175 |
if (typeof d3 !== "undefined") {
|