Commit 5f268a7 · pepijn223 (HF Staff) · unverified · 1 parent: fb7e94b

Improve chart clarity and fix visualization issues


- Replace dual-axis L1 Time/Quality chart with bubble scatter plot
- Add counts/percentage toggle to failure analysis
- Fix label overlaps in heatmap, scatter, and failure charts
- Remove misleading series divider lines from sorted charts
- Fix total-score description for 50% reference line
- Increase margins and use smart label placement throughout

Made-with: Cursor
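
The counts/percentage toggle listed above comes down to normalizing each experiment's failure counts by its own total. The following is a minimal sketch of that step as a standalone function, not the commit's code: `toPercentages` is a hypothetical helper name, and the two sample rows are taken from the `L1_FAILURES` table in `failure-analysis.html`.

```javascript
// Hypothetical helper (not part of the commit): convert per-subtask failure
// counts into percentages of that experiment's total failures.
function toPercentages(counts, subtasks) {
  const total = subtasks.reduce((sum, s) => sum + (counts[s] || 0), 0);
  const out = { _total: total };
  for (const s of subtasks) {
    // An experiment with zero failures stays at 0% everywhere,
    // which avoids a division by zero.
    out[s] = total > 0 ? ((counts[s] || 0) / total) * 100 : 0;
  }
  return out;
}

const SUBTASKS_L1 = ['Fold 1', 'Fold 2', 'Fold 3', 'Fold 4', 'Rotation'];

// Experiment 2.1: one failure at Fold 2 and one at Fold 4.
const row = toPercentages({ 'Fold 2': 1, 'Fold 4': 1 }, SUBTASKS_L1);

// Experiment 2.5: no Level 1 failures at all.
const empty = toPercentages({}, SUBTASKS_L1);

console.log(row, empty);
```

Keeping `_total` next to the percentages lets a chart still show the sample size (for example as an `n=` label) once the bars are all scaled to 100%.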

README.md CHANGED
@@ -1,9 +1,9 @@
1
  ---
2
- title: 'Bringing paper to life: A modern template for scientific writing'
3
- short_desc: 'A practical journey behind training SOTA LLMs'
4
- emoji: 📝
5
- colorFrom: blue
6
- colorTo: indigo
7
  sdk: docker
8
  pinned: false
9
  header: mini
 
1
  ---
2
+ title: 'Unfolding Robotics: Open-Source Shirt Folding from Data to Deployment'
3
+ short_desc: 'The complete open-source recipe for teaching robots to fold clothes'
4
+ emoji: 🤖
5
+ colorFrom: yellow
6
+ colorTo: orange
7
  sdk: docker
8
  pinned: false
9
  header: mini
app/src/components/HtmlEmbed.astro CHANGED
@@ -293,6 +293,7 @@ const htmlWithId =
293
  padding: 24px;
294
  z-index: calc(var(--z-elevated) + 1);
295
  position: relative;
 
296
  }
297
  .html-embed__card.is-frameless {
298
  background: transparent;
 
293
  padding: 24px;
294
  z-index: calc(var(--z-elevated) + 1);
295
  position: relative;
296
+ overflow: visible;
297
  }
298
  .html-embed__card.is-frameless {
299
  background: transparent;
app/src/content/article.mdx CHANGED
@@ -44,7 +44,7 @@ tags:
44
  - open-source
45
  tableOfContentsAutoCollapse: true
46
  pdfProOnly: false
47
- showPdf: true
48
  ---
49
 
50
  import Hero from "./chapters/folding/01-hero.mdx";
 
44
  - open-source
45
  tableOfContentsAutoCollapse: true
46
  pdfProOnly: false
47
+ showPdf: false
48
  ---
49
 
50
  import Hero from "./chapters/folding/01-hero.mdx";
app/src/content/chapters/folding/08-ablations.mdx CHANGED
@@ -78,7 +78,7 @@ The gap between Series 1 and Series 2 is immediately visible. Experiment 2.5 rea
78
  id="total-score"
79
  src="folding/total-score.html"
80
  title="Total Score by Experiment"
81
- desc="Overall score (% of maximum 1500) per experiment. The 50% threshold line highlights which experiments achieve at least half the maximum score."
82
  />
83
 
84
  Total score captures partial progress that binary success rate misses. Even failed rollouts earn credit for completed subtasks, revealing that some Series 1 experiments make meaningful progress despite 0% Level 2 success. Only two experiments break the 50% threshold, both from Series 2.
@@ -86,8 +86,8 @@ Total score captures partial progress that binary success rate misses. Even fail
86
  <HtmlEmbed
87
  id="l1-time-quality"
88
  src="folding/l1-time-quality.html"
89
- title="Level 1 Completion Time & Fold Quality"
90
- desc="Average Level 1 completion time (bars) and fold quality score (dashed line, right axis) per experiment. Lower time and higher quality are better."
91
  />
92
 
93
  Speed and quality correlate strongly with data quality. Series 2 experiments fold 2-3x faster than Series 1 (40s vs 100s+), and fold quality only breaks past 3.0 with high-quality training data. Faster isn't a separate goal from better; it's a consequence of the policy learning a clear, unambiguous strategy.
 
78
  id="total-score"
79
  src="folding/total-score.html"
80
  title="Total Score by Experiment"
81
+ desc="Overall score (% of maximum 1500) per experiment. The dashed line marks 50% as a reference point."
82
  />
83
 
84
  Total score captures partial progress that binary success rate misses. Even failed rollouts earn credit for completed subtasks, revealing that some Series 1 experiments make meaningful progress despite 0% Level 2 success. Only two experiments break the 50% threshold, both from Series 2.
 
86
  <HtmlEmbed
87
  id="l1-time-quality"
88
  src="folding/l1-time-quality.html"
89
+ title="Level 1 Completion Time vs. Fold Quality"
90
+ desc="Each bubble is one experiment. X-axis = completion time (faster is left), Y-axis = fold quality (higher is better), bubble size = total success rate. The best experiments cluster in the top-left corner."
91
  />
92
 
93
  Speed and quality correlate strongly with data quality. Series 2 experiments fold 2-3x faster than Series 1 (40s vs 100s+), and fold quality only breaks past 3.0 with high-quality training data. Faster isn't a separate goal from better; it's a consequence of the policy learning a clear, unambiguous strategy.
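
The bubble encoding in the new description (x = completion time, y = fold quality, size = total success rate) could be computed along these lines. This is a sketch under assumptions, not the commit's implementation: the square-root radius mapping, the 22 px maximum radius, the helper name `bubbleRadius`, and the choice to drop experiments with no measured time are all illustrative. The four data rows are copied from the `raw` array in `l1-time-quality.html`.

```javascript
// Rows copied from the embed's `raw` data:
// label, average L1 time (s), fold quality (1-5), total success rate (%).
const rows = [
  { label: '1.1', l1time: 121.5, quality: 2.7, total_sr: 40 },
  { label: '2.2', l1time: 43.2, quality: 3.3, total_sr: 75 },
  { label: '2.3', l1time: null, quality: 1.0, total_sr: 5 },
  { label: '2.5', l1time: 40.8, quality: 4.1, total_sr: 90 },
];

// Assumed mapping: radius is proportional to sqrt(success rate), so the
// bubble's area scales linearly with the value. rMax is an arbitrary choice.
function bubbleRadius(totalSr, rMax = 22) {
  return rMax * Math.sqrt(totalSr / 100);
}

// Assumption: experiments without a measured L1 time have no x position,
// so they are left out of the scatter.
const plotted = rows
  .filter(d => d.l1time !== null)
  .map(d => ({
    label: d.label,
    x: d.l1time,
    y: d.quality,
    r: bubbleRadius(d.total_sr),
  }));

console.log(plotted);
```

Scaling the radius by the square root keeps a 90% experiment from looking more than twice as successful as a 40% one, which is what a linear radius would suggest visually.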
app/src/content/embeds/folding/failure-analysis.html CHANGED
@@ -62,14 +62,42 @@
62
  padding: 20px 20px 12px;
63
  }
64
 
65
  .chart-title {
66
  font-size: 11px;
67
  text-transform: uppercase;
68
  letter-spacing: 0.08em;
69
  color: #8b8fa8;
70
- margin-bottom: 16px;
71
  }
72
 
73
  .legend {
74
  display: flex;
75
  flex-wrap: wrap;
@@ -149,15 +177,21 @@
149
  <!-- LEVEL 2 PANEL -->
150
  <div class="panel active" id="panel-l2">
151
  <div class="insight-box">
152
- <strong>Series 1:</strong> nearly all level 2failures occur at Unfold , the robot never gets past step 1.&nbsp;
153
  <strong>Series 2:</strong> Unfold failures collapse (2.5: 0%), but late-stage failures (Fold 3, Rotation) emerge — the model now reliably unfolds but precision degrades at the end.
154
  </div>
155
  <div class="chart-wrap">
156
- <div class="chart-title">Where does the robot fail? — Level 2 failed rollouts by subtask</div>
157
- <svg id="chart-l2" width="100%" height="320"></svg>
 
 
 
 
 
 
158
  <div class="legend" id="legend-l2"></div>
159
  </div>
160
- <p class="note">Each bar = one experiment, showing how its failed Level 2 rollouts distribute across subtasks. Only failed rollouts shown — successful rollouts are excluded. A failure at "Unfold" means the robot never spread the shirt; a failure at "Rotation" means it completed folding but failed the final placement.</p>
161
  </div>
162
 
163
  <!-- LEVEL 1 PANEL -->
@@ -166,71 +200,68 @@
166
  <strong>Level 1 failures</strong> are more distributed since unfolding is given. Series 1 failures concentrate at Fold 2 and Fold 4 (mid-task precision). Series 2 nearly eliminates failures entirely — only 2.3 (mirroring) and 2.4 (chunk=45) regress significantly.
167
  </div>
168
  <div class="chart-wrap">
169
- <div class="chart-title">Where does the robot fail? — Level 1 failed rollouts by subtask</div>
170
- <svg id="chart-l1" width="100%" height="320"></svg>
 
 
 
 
 
 
171
  <div class="legend" id="legend-l1"></div>
172
  </div>
173
- <p class="note">Level 1 begins with the shirt already laid flat, so "Unfold" is not a failure point. Failures at Fold 1 indicate the robot cannot even begin folding a severe failure. Failures at Rotation indicate the robot folded successfully but failed the final placement step.</p>
174
  </div>
175
 
176
  </div>
177
 
178
  <script>
179
  function _initFailureAnalysis() {
180
- // ── DATA ──────────────────────────────────────────────────────────────────────
181
- // Derived from raw rollout data: for each failed rollout,
182
- // "failure point" = first subtask that was NOT reached after a previous one was TRUE
183
- // Level 1: Unfold is given (None), failure starts from Fold 1
184
- // Level 2: Unfold is explicit
185
-
186
  const EXPERIMENTS = [
187
- { id:'1.1', series:1 },
188
- { id:'1.2', series:1 },
189
- { id:'1.3', series:1 },
190
- { id:'1.4', series:1 },
191
- { id:'1.5', series:1 },
192
- { id:'1.7', series:1 },
193
- { id:'2.1', series:2 },
194
- { id:'2.2', series:2 },
195
- { id:'2.3', series:2 },
196
- { id:'2.4', series:2 },
197
- { id:'2.5', series:2 },
198
  ];
199
 
200
- // L2 failures: {experimentId: {subtask: count}}
201
  const L2_FAILURES = {
202
- '1.1': { 'Unfold':10 },
203
- '1.2': { 'Unfold':9, 'Rotation':1 },
204
- '1.3': { 'Unfold':10 },
205
- '1.4': { 'Unfold':10 },
206
- '1.5': { 'Unfold':9, 'Fold 1':1 },
207
- '1.7': { 'Unfold':8, 'Fold 3':1, 'Rotation':1 },
208
- '2.1': { 'Unfold':8, 'Rotation':1 },
209
- '2.2': { 'Unfold':4, 'Rotation':1 },
210
- '2.3': { 'Unfold':8, 'Fold 1':1 },
211
- '2.4': { 'Unfold':9, 'Fold 3':1 },
212
- '2.5': { 'Unfold':2 },
213
  };
214
 
215
- // L1 failures
216
  const L1_FAILURES = {
217
- '1.1': { 'Fold 2':1 },
218
- '1.2': { 'Rotation':4, 'Fold 4':2, 'Fold 2':1 },
219
- '1.3': { 'Rotation':1, 'Fold 4':1 },
220
- '1.4': { 'Rotation':2, 'Fold 3':1, 'Fold 4':2, 'Fold 2':3 },
221
- '1.5': { 'Fold 3':2, 'Fold 2':6, 'Fold 1':1 },
222
- '1.7': { 'Fold 4':1, 'Fold 2':1, 'Rotation':1 },
223
- '2.1': { 'Fold 2':1, 'Fold 4':1 },
224
- '2.2': { 'Fold 2':1 },
225
- '2.3': { 'Fold 1':3, 'Fold 4':3, 'Fold 3':3 },
226
- '2.4': { 'Rotation':2, 'Fold 4':3, 'Fold 3':1 },
227
- '2.5': {}, // 0 L1 failures
228
  };
229
 
230
  const SUBTASKS_L2 = ['Unfold','Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
231
  const SUBTASKS_L1 = ['Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
232
 
233
- // Colour: warm→cool progression. Unfold = red (early), Rotation = teal (late)
234
  const COLORS = {
235
  'Unfold': '#ef4444',
236
  'Fold 1': '#f97316',
@@ -240,12 +271,22 @@ const COLORS = {
240
  'Rotation': '#818cf8',
241
  };
242
 
243
- // ── CHART BUILDER ─────────────────────────────────────────────────────────────
244
- function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
 
 
 
 
 
 
 
 
 
 
245
  const svgEl = document.getElementById(svgId);
246
  const W = svgEl.parentElement.clientWidth - 40;
247
- const H = 300;
248
- const margin = { top: 14, right: 16, bottom: 48, left: 38 };
249
  const innerW = W - margin.left - margin.right;
250
  const innerH = H - margin.top - margin.bottom;
251
 
@@ -256,10 +297,11 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
256
  .attr('viewBox', `0 0 ${W} ${H}`)
257
  .attr('height', H);
258
 
 
 
259
  const g = svg.append('g')
260
  .attr('transform', `translate(${margin.left},${margin.top})`);
261
 
262
- // Prepare stacked data
263
  const expIds = experiments.map(a => a.id);
264
  const stackData = expIds.map(id => {
265
  const row = { id };
@@ -268,44 +310,39 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
268
  return row;
269
  });
270
 
271
- const maxTotal = d3.max(stackData, d => d._total) || 10;
272
-
273
- const x = d3.scaleBand()
274
- .domain(expIds)
275
- .range([0, innerW])
276
- .padding(0.28);
 
 
 
 
 
 
 
277
 
278
- const y = d3.scaleLinear()
279
- .domain([0, maxTotal])
280
- .range([innerH, 0])
281
- .nice();
282
 
283
- const stack = d3.stack().keys(subtasks)(stackData);
 
 
284
 
285
  // Grid lines
286
- g.append('g')
287
- .attr('class', 'grid')
288
- .call(d3.axisLeft(y)
289
- .tickSize(-innerW)
290
- .tickFormat('')
291
- .ticks(5))
292
  .call(gg => {
293
  gg.select('.domain').remove();
294
- gg.selectAll('line')
295
- .attr('stroke', '#2a2d3a')
296
- .attr('stroke-dasharray', '3,3');
297
  });
298
 
299
  // Stacked bars
300
- const layer = g.selectAll('.layer')
301
- .data(stack)
302
- .join('g')
303
- .attr('class', 'layer')
304
- .attr('fill', d => COLORS[d.key] || '#666');
305
-
306
- layer.selectAll('rect')
307
- .data(d => d)
308
- .join('rect')
309
  .attr('x', d => x(d.data.id))
310
  .attr('y', d => y(d[1]))
311
  .attr('height', d => Math.max(0, y(d[0]) - y(d[1])))
@@ -313,17 +350,18 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
313
  .attr('rx', 2)
314
  .attr('opacity', 0.88);
315
 
316
- // Count labels on top of bars
317
- g.selectAll('.bar-label')
318
- .data(stackData)
319
- .join('text')
320
  .attr('class', 'bar-label')
321
  .attr('x', d => x(d.id) + x.bandwidth() / 2)
322
- .attr('y', d => d._total === 0 ? y(0) - 4 : y(d._total) - 5)
323
  .attr('text-anchor', 'middle')
324
  .attr('fill', d => d._total === 0 ? '#3a3d4a' : '#8b8fa8')
325
  .attr('font-size', '9')
326
- .text(d => d._total === 0 ? '✓' : d._total);
 
 
 
327
 
328
  // Series divider line
329
  const s1Last = experiments.filter(a => a.series === 1).pop().id;
@@ -332,43 +370,28 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
332
  const xDiv = x(s1Last) + x.bandwidth() + x.step() * 0.14;
333
  g.append('line')
334
  .attr('x1', xDiv).attr('x2', xDiv)
335
- .attr('y1', -8).attr('y2', innerH + 4)
336
- .attr('stroke', '#3a3d4a')
337
- .attr('stroke-width', 1)
338
- .attr('stroke-dasharray', '4,3');
339
-
340
- g.append('text')
341
- .attr('x', xDiv - 6)
342
- .attr('y', -4)
343
- .attr('text-anchor', 'end')
344
- .attr('fill', '#f7934f')
345
- .attr('font-size', '8')
346
- .attr('letter-spacing', '0.06em')
347
- .text('SERIES 1');
348
 
349
  if (s2First) {
350
- g.append('text')
351
- .attr('x', xDiv + 6)
352
- .attr('y', -4)
353
- .attr('text-anchor', 'start')
354
- .attr('fill', '#4dc98a')
355
- .attr('font-size', '8')
356
- .attr('letter-spacing', '0.06em')
357
- .text('SERIES 2');
358
  }
359
  }
360
 
361
  // Axes
362
  g.append('g')
363
- .call(d3.axisLeft(y).ticks(5).tickSize(4))
364
  .call(gg => {
365
  gg.select('.domain').attr('stroke', '#2a2d3a');
366
  gg.selectAll('text').attr('fill', '#8b8fa8').attr('font-size', '9');
367
  gg.selectAll('line').attr('stroke', '#2a2d3a');
368
  });
369
 
370
- g.append('g')
371
- .attr('transform', `translate(0,${innerH})`)
372
  .call(d3.axisBottom(x).tickSize(0))
373
  .call(gg => {
374
  gg.select('.domain').attr('stroke', '#2a2d3a');
@@ -377,19 +400,18 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
377
  const a = experiments.find(a => a.id === d);
378
  return a?.series === 2 ? '#4dc98a' : '#f7934f';
379
  })
380
- .attr('font-size', '10')
381
- .attr('dy', '1.2em');
 
 
 
382
  });
383
 
384
  // Y axis label
385
- svg.append('text')
386
- .attr('transform', `rotate(-90)`)
387
- .attr('x', -(margin.top + innerH / 2))
388
- .attr('y', 10)
389
- .attr('text-anchor', 'middle')
390
- .attr('fill', '#555e7a')
391
- .attr('font-size', '9')
392
- .text('Failed rollouts (n)');
393
 
394
  // Legend
395
  const legendEl = document.getElementById(legendId);
@@ -401,14 +423,14 @@ function buildStackedBar(svgId, legendId, data, subtasks, experiments) {
401
  `).join('');
402
  }
403
 
404
- // ── TAB SWITCHER ──────────────────────────────────────────────────────────────
405
  const rendered = { l2: false, l1: false };
406
 
407
  function renderTab(id) {
408
  if (rendered[id]) return;
409
  rendered[id] = true;
410
- if (id === 'l2') buildStackedBar('chart-l2', 'legend-l2', L2_FAILURES, SUBTASKS_L2, EXPERIMENTS);
411
- if (id === 'l1') buildStackedBar('chart-l1', 'legend-l1', L1_FAILURES, SUBTASKS_L1, EXPERIMENTS);
 
412
  }
413
 
414
  function showTab(id) {
@@ -422,8 +444,8 @@ function showTab(id) {
422
  }
423
 
424
  window.showTab = showTab;
 
425
 
426
- // ── RENDER (only the visible tab) ─────────────────────────────────────────────
427
  renderTab('l2');
428
 
429
  }
 
62
  padding: 20px 20px 12px;
63
  }
64
 
65
+ .chart-header {
66
+ display: flex;
67
+ justify-content: space-between;
68
+ align-items: center;
69
+ flex-wrap: wrap;
70
+ gap: 8px;
71
+ margin-bottom: 16px;
72
+ }
73
+
74
  .chart-title {
75
  font-size: 11px;
76
  text-transform: uppercase;
77
  letter-spacing: 0.08em;
78
  color: #8b8fa8;
 
79
  }
80
 
81
+ .mode-toggle {
82
+ display: flex;
83
+ gap: 0;
84
+ }
85
+ .mode-btn {
86
+ padding: 4px 12px;
87
+ font-size: 10px;
88
+ font-family: inherit;
89
+ cursor: pointer;
90
+ border: 1px solid #2a2d3a;
91
+ background: none;
92
+ color: #8b8fa8;
93
+ transition: all 0.15s;
94
+ letter-spacing: 0.04em;
95
+ }
96
+ .mode-btn:first-child { border-radius: 4px 0 0 4px; }
97
+ .mode-btn:last-child { border-radius: 0 4px 4px 0; border-left: none; }
98
+ .mode-btn.active { background: #252835; color: #e8eaf0; border-color: #4a4d5a; }
99
+ .mode-btn:hover:not(.active) { color: #e8eaf0; }
100
+
101
  .legend {
102
  display: flex;
103
  flex-wrap: wrap;
 
177
  <!-- LEVEL 2 PANEL -->
178
  <div class="panel active" id="panel-l2">
179
  <div class="insight-box">
180
+ <strong>Series 1:</strong> nearly all Level 2 failures occur at Unfold: the robot never gets past step 1.&nbsp;
181
  <strong>Series 2:</strong> Unfold failures collapse (2.5: 0%), but late-stage failures (Fold 3, Rotation) emerge — the model now reliably unfolds but precision degrades at the end.
182
  </div>
183
  <div class="chart-wrap">
184
+ <div class="chart-header">
185
+ <div class="chart-title">Where does the robot fail? — Level 2 failed rollouts by subtask</div>
186
+ <div class="mode-toggle">
187
+ <button class="mode-btn active" id="mode-l2-abs" onclick="setMode('l2','abs')">Counts</button>
188
+ <button class="mode-btn" id="mode-l2-pct" onclick="setMode('l2','pct')">Percentage</button>
189
+ </div>
190
+ </div>
191
+ <svg id="chart-l2" width="100%" height="320" style="overflow:visible"></svg>
192
  <div class="legend" id="legend-l2"></div>
193
  </div>
194
+ <p class="note">Each bar = one experiment, showing how its failed Level 2 rollouts distribute across subtasks. Only failed rollouts shown — successful rollouts are excluded. Toggle "Percentage" to compare failure distributions regardless of total failure count.</p>
195
  </div>
196
 
197
  <!-- LEVEL 1 PANEL -->
 
200
  <strong>Level 1 failures</strong> are more distributed since unfolding is given. Series 1 failures concentrate at Fold 2 and Fold 4 (mid-task precision). Series 2 nearly eliminates failures entirely — only 2.3 (mirroring) and 2.4 (chunk=45) regress significantly.
201
  </div>
202
  <div class="chart-wrap">
203
+ <div class="chart-header">
204
+ <div class="chart-title">Where does the robot fail? — Level 1 failed rollouts by subtask</div>
205
+ <div class="mode-toggle">
206
+ <button class="mode-btn active" id="mode-l1-abs" onclick="setMode('l1','abs')">Counts</button>
207
+ <button class="mode-btn" id="mode-l1-pct" onclick="setMode('l1','pct')">Percentage</button>
208
+ </div>
209
+ </div>
210
+ <svg id="chart-l1" width="100%" height="320" style="overflow:visible"></svg>
211
  <div class="legend" id="legend-l1"></div>
212
  </div>
213
+ <p class="note">Level 1 begins with the shirt already laid flat, so "Unfold" is not a failure point. Toggle "Percentage" to compare where each experiment struggles, independent of how many total failures it has.</p>
214
  </div>
215
 
216
  </div>
217
 
218
  <script>
219
  function _initFailureAnalysis() {
 
 
 
 
 
 
220
  const EXPERIMENTS = [
221
+ { id:'1.1 π0', series:1 },
222
+ { id:'1.2 π0.5', series:1 },
223
+ { id:'1.3 ΔActions', series:1 },
224
+ { id:'1.4 RABC low', series:1 },
225
+ { id:'1.5 RABC high', series:1 },
226
+ { id:'1.7 Δ+RABC', series:1 },
227
+ { id:'2.1 HQ', series:2 },
228
+ { id:'2.2 HQ+RABC+Δ', series:2 },
229
+ { id:'2.3 HQ+mirror', series:2 },
230
+ { id:'2.4 HQ chunk45', series:2 },
231
+ { id:'2.5 HQ+RABC+Δ★', series:2 },
232
  ];
233
 
 
234
  const L2_FAILURES = {
235
+ '1.1 π0': { 'Unfold':10 },
236
+ '1.2 π0.5': { 'Unfold':9, 'Rotation':1 },
237
+ '1.3 ΔActions': { 'Unfold':10 },
238
+ '1.4 RABC low': { 'Unfold':10 },
239
+ '1.5 RABC high': { 'Unfold':9, 'Fold 1':1 },
240
+ '1.7 Δ+RABC': { 'Unfold':8, 'Fold 3':1, 'Rotation':1 },
241
+ '2.1 HQ': { 'Unfold':8, 'Rotation':1 },
242
+ '2.2 HQ+RABC+Δ': { 'Unfold':4, 'Rotation':1 },
243
+ '2.3 HQ+mirror': { 'Unfold':8, 'Fold 1':1 },
244
+ '2.4 HQ chunk45': { 'Unfold':9, 'Fold 3':1 },
245
+ '2.5 HQ+RABC+Δ★': { 'Unfold':2 },
246
  };
247
 
 
248
  const L1_FAILURES = {
249
+ '1.1 π0': { 'Fold 2':1 },
250
+ '1.2 π0.5': { 'Rotation':4, 'Fold 4':2, 'Fold 2':1 },
251
+ '1.3 ΔActions': { 'Rotation':1, 'Fold 4':1 },
252
+ '1.4 RABC low': { 'Rotation':2, 'Fold 3':1, 'Fold 4':2, 'Fold 2':3 },
253
+ '1.5 RABC high': { 'Fold 3':2, 'Fold 2':6, 'Fold 1':1 },
254
+ '1.7 Δ+RABC': { 'Fold 4':1, 'Fold 2':1, 'Rotation':1 },
255
+ '2.1 HQ': { 'Fold 2':1, 'Fold 4':1 },
256
+ '2.2 HQ+RABC+Δ': { 'Fold 2':1 },
257
+ '2.3 HQ+mirror': { 'Fold 1':3, 'Fold 4':3, 'Fold 3':3 },
258
+ '2.4 HQ chunk45': { 'Rotation':2, 'Fold 4':3, 'Fold 3':1 },
259
+ '2.5 HQ+RABC+Δ★': {},
260
  };
261
 
262
  const SUBTASKS_L2 = ['Unfold','Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
263
  const SUBTASKS_L1 = ['Fold 1','Fold 2','Fold 3','Fold 4','Rotation'];
264
 
 
265
  const COLORS = {
266
  'Unfold': '#ef4444',
267
  'Fold 1': '#f97316',
 
271
  'Rotation': '#818cf8',
272
  };
273
 
274
+ const modes = { l2: 'abs', l1: 'abs' };
275
+
276
+ function setMode(level, mode) {
277
+ modes[level] = mode;
278
+ document.getElementById(`mode-${level}-abs`).classList.toggle('active', mode === 'abs');
279
+ document.getElementById(`mode-${level}-pct`).classList.toggle('active', mode === 'pct');
280
+ // Force re-render
281
+ rendered[level] = false;
282
+ renderTab(level);
283
+ }
284
+
285
+ function buildStackedBar(svgId, legendId, data, subtasks, experiments, normalize) {
286
  const svgEl = document.getElementById(svgId);
287
  const W = svgEl.parentElement.clientWidth - 40;
288
+ const H = 340;
289
+ const margin = { top: 30, right: 16, bottom: 80, left: 70 };
290
  const innerW = W - margin.left - margin.right;
291
  const innerH = H - margin.top - margin.bottom;
292
 
 
297
  .attr('viewBox', `0 0 ${W} ${H}`)
298
  .attr('height', H);
299
 
300
+ svg.selectAll('*').remove();
301
+
302
  const g = svg.append('g')
303
  .attr('transform', `translate(${margin.left},${margin.top})`);
304
 
 
305
  const expIds = experiments.map(a => a.id);
306
  const stackData = expIds.map(id => {
307
  const row = { id };
 
310
  return row;
311
  });
312
 
313
+ let displayData;
314
+ if (normalize) {
315
+ displayData = stackData.map(row => {
316
+ const out = { id: row.id, _total: row._total };
317
+ subtasks.forEach(s => {
318
+ out[s] = row._total > 0 ? (row[s] / row._total) * 100 : 0;
319
+ });
320
+ out._displayTotal = row._total > 0 ? 100 : 0;
321
+ return out;
322
+ });
323
+ } else {
324
+ displayData = stackData.map(row => ({ ...row, _displayTotal: row._total }));
325
+ }
326
 
327
+ const maxVal = normalize ? 100 : (d3.max(displayData, d => d._displayTotal) || 10);
 
 
 
328
 
329
+ const x = d3.scaleBand().domain(expIds).range([0, innerW]).padding(0.28);
330
+ const y = d3.scaleLinear().domain([0, maxVal]).range([innerH, 0]).nice();
331
+ const stack = d3.stack().keys(subtasks)(displayData);
332
 
333
  // Grid lines
334
+ g.append('g').attr('class', 'grid')
335
+ .call(d3.axisLeft(y).tickSize(-innerW).tickFormat('').ticks(5))
 
 
 
 
336
  .call(gg => {
337
  gg.select('.domain').remove();
338
+ gg.selectAll('line').attr('stroke', '#2a2d3a').attr('stroke-dasharray', '3,3');
 
 
339
  });
340
 
341
  // Stacked bars
342
+ const layer = g.selectAll('.layer').data(stack).join('g')
343
+ .attr('class', 'layer').attr('fill', d => COLORS[d.key] || '#666');
344
+
345
+ layer.selectAll('rect').data(d => d).join('rect')
 
 
 
 
 
346
  .attr('x', d => x(d.data.id))
347
  .attr('y', d => y(d[1]))
348
  .attr('height', d => Math.max(0, y(d[0]) - y(d[1])))
 
350
  .attr('rx', 2)
351
  .attr('opacity', 0.88);
352
 
353
+ // Labels on top
354
+ g.selectAll('.bar-label').data(displayData).join('text')
 
 
355
  .attr('class', 'bar-label')
356
  .attr('x', d => x(d.id) + x.bandwidth() / 2)
357
+ .attr('y', d => d._total === 0 ? y(0) - 4 : y(d._displayTotal) - 5)
358
  .attr('text-anchor', 'middle')
359
  .attr('fill', d => d._total === 0 ? '#3a3d4a' : '#8b8fa8')
360
  .attr('font-size', '9')
361
+ .text(d => {
362
+ if (d._total === 0) return '✓ 0 failures';
363
+ return normalize ? `n=${d._total}` : d._total;
364
+ });
365
 
366
  // Series divider line
367
  const s1Last = experiments.filter(a => a.series === 1).pop().id;
 
370
  const xDiv = x(s1Last) + x.bandwidth() + x.step() * 0.14;
371
  g.append('line')
372
  .attr('x1', xDiv).attr('x2', xDiv)
373
+ .attr('y1', -22).attr('y2', innerH + 4)
374
+ .attr('stroke', '#3a3d4a').attr('stroke-width', 1).attr('stroke-dasharray', '4,3');
375
+
376
+ g.append('text').attr('x', xDiv - 6).attr('y', -18).attr('text-anchor', 'end')
377
+ .attr('fill', '#f7934f').attr('font-size', '8').attr('letter-spacing', '0.06em').text('SERIES 1');
 
 
 
 
 
 
 
 
378
 
379
  if (s2First) {
380
+ g.append('text').attr('x', xDiv + 6).attr('y', -18).attr('text-anchor', 'start')
381
+ .attr('fill', '#4dc98a').attr('font-size', '8').attr('letter-spacing', '0.06em').text('SERIES 2');
 
 
 
 
 
 
382
  }
383
  }
384
 
385
  // Axes
386
  g.append('g')
387
+ .call(d3.axisLeft(y).ticks(5).tickSize(4).tickFormat(d => normalize ? d + '%' : d))
388
  .call(gg => {
389
  gg.select('.domain').attr('stroke', '#2a2d3a');
390
  gg.selectAll('text').attr('fill', '#8b8fa8').attr('font-size', '9');
391
  gg.selectAll('line').attr('stroke', '#2a2d3a');
392
  });
393
 
394
+ g.append('g').attr('transform', `translate(0,${innerH})`)
 
395
  .call(d3.axisBottom(x).tickSize(0))
396
  .call(gg => {
397
  gg.select('.domain').attr('stroke', '#2a2d3a');
 
400
  const a = experiments.find(a => a.id === d);
401
  return a?.series === 2 ? '#4dc98a' : '#f7934f';
402
  })
403
+ .attr('font-size', '9')
404
+ .attr('transform', 'rotate(-40)')
405
+ .attr('text-anchor', 'end')
406
+ .attr('dx', '-0.5em')
407
+ .attr('dy', '0.3em');
408
  });
409
 
410
  // Y axis label
411
+ svg.append('text').attr('transform', 'rotate(-90)')
412
+ .attr('x', -(margin.top + innerH / 2)).attr('y', 10).attr('text-anchor', 'middle')
413
+ .attr('fill', '#555e7a').attr('font-size', '9')
414
+ .text(normalize ? 'Failure distribution (%)' : 'Failed rollouts (n)');
 
 
 
 
415
 
416
  // Legend
417
  const legendEl = document.getElementById(legendId);
 
423
  `).join('');
424
  }
425
 
 
426
  const rendered = { l2: false, l1: false };
427
 
428
  function renderTab(id) {
429
  if (rendered[id]) return;
430
  rendered[id] = true;
431
+ const normalize = modes[id] === 'pct';
432
+ if (id === 'l2') buildStackedBar('chart-l2', 'legend-l2', L2_FAILURES, SUBTASKS_L2, EXPERIMENTS, normalize);
433
+ if (id === 'l1') buildStackedBar('chart-l1', 'legend-l1', L1_FAILURES, SUBTASKS_L1, EXPERIMENTS, normalize);
434
  }
435
 
436
  function showTab(id) {
 
444
  }
445
 
446
  window.showTab = showTab;
447
+ window.setMode = setMode;
448
 
 
449
  renderTab('l2');
450
 
451
  }
app/src/content/embeds/folding/l1-time-quality.html CHANGED
@@ -7,207 +7,159 @@
7
  :root { --bg: transparent; --text: #e8eaf0; --subtext: #8b8fa8; --grid: #2a2d3a; --border: #2a2d3a; }
8
  * { box-sizing: border-box; margin: 0; padding: 0; }
9
  body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
10
- .legend { display: flex; gap: 20px; justify-content: center; flex-wrap: wrap; margin-bottom: 8px; }
11
- .legend-item { display: flex; align-items: center; gap: 6px; font-size: 12px; color: var(--subtext); }
12
- .legend-dot { width: 12px; height: 12px; border-radius: 50%; }
13
- .legend-line { width: 18px; height: 3px; border-radius: 2px; }
14
- .legend-pip { width: 8px; height: 8px; border-radius: 50%; display: inline-block; }
 
 
15
  .tooltip {
16
  position: absolute; background: #1a1d27; border: 1px solid var(--border);
17
  border-radius: 8px; padding: 10px 14px; pointer-events: none;
18
- opacity: 0; transition: opacity .15s; z-index: 10; min-width: 220px;
19
  box-shadow: 0 4px 16px rgba(0,0,0,.4); font-size: 13px;
20
  }
21
  .tooltip strong { display: block; margin-bottom: 5px; }
22
  .tooltip-row { display: flex; justify-content: space-between; gap: 12px; margin-top: 3px; font-size: 12px; color: var(--subtext); }
23
  .tooltip-row span:last-child { color: var(--text); font-weight: 600; }
24
 
25
- .exp-ref-wrap { margin-bottom: 14px; }
26
- .exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
27
- padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
28
- .exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
29
- .exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
30
- .exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
31
- border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
32
- .exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
33
- .exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
34
- .exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
35
- .exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
36
- .exp-table tr:hover td { background: rgba(255,255,255,0.04); }
37
-
38
- .axis text { fill: var(--subtext); font-size: 12px; }
39
  .axis line, .axis path { stroke: var(--grid); }
40
  .grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
 
 
41
  </style>
42
  </head>
43
  <body>
44
  <div class="legend">
45
- <div class="legend-item"><div class="legend-pip" style="background:#f7934f"></div>&nbsp;Series 2 bar &nbsp;<div class="legend-pip" style="background:#4f8ef7"></div>&nbsp;Series 1 bar</div>
46
- <div class="legend-item"><div class="legend-line" style="background:#fbbf24"></div>Quality (right axis, 1–5)</div>
47
- </div>
48
- <div class="exp-ref-wrap">
49
- <button class="exp-ref-toggle" onclick="var t=document.getElementById('exp-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
50
- <div id="exp-ref"></div>
51
  </div>
52
  <div style="position:relative">
53
- <svg id="tq-chart"></svg>
54
  <div class="tooltip" id="tq-tooltip"></div>
55
  </div>
56
  <script>
57
  function _initL1TimeQuality() {
58
  const raw = [
59
- {label:"1.1",series:"1",l1time:121.5, quality:2.70, total_sr:40},
60
- {label:"1.2",series:"1",l1time:90.75, quality:2.50, total_sr:20},
61
- {label:"1.3",series:"1",l1time:113.86,quality:2.80, total_sr:35},
62
- {label:"1.4",series:"1",l1time:78.33, quality:2.20, total_sr:15},
63
- {label:"1.5",series:"1",l1time:null, quality:1.00, total_sr:0 },
64
- {label:"1.7",series:"1",l1time:99.5, quality:2.30, total_sr:40},
65
- {label:"2.1",series:"2",l1time:57.57, quality:2.80, total_sr:40},
66
- {label:"2.2",series:"2",l1time:43.2, quality:3.30, total_sr:75},
67
- {label:"2.3",series:"2",l1time:null, quality:1.00, total_sr:5 },
68
- {label:"2.4",series:"2",l1time:72.5, quality:1.80, total_sr:20},
69
- {label:"2.5",series:"2",l1time:40.8, quality:4.10, total_sr:90},
70
  ];
71
 
72
- // Sort: fastest L1 time first (best performance = least time).
73
- // N/A (null) goes to the end.
74
- const data = [...raw].sort((a,b) => {
75
- if (a.l1time===null && b.l1time===null) return 0;
76
- if (a.l1time===null) return 1;
77
- if (b.l1time===null) return -1;
78
- return a.l1time - b.l1time;
79
- });
80
 
81
  const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
82
- const margin = {top:20, right:58, bottom:48, left:46};
83
  const svg = d3.select("#tq-chart");
84
  const container = svg.node().parentElement;
85
  const tooltip = d3.select("#tq-tooltip");
86
 
 
 
87
  function render() {
88
  svg.selectAll("*").remove();
89
  const W = container.clientWidth;
90
- const H = Math.max(240, Math.min(320, W * 0.4));
91
  const w = W - margin.left - margin.right;
92
  const h = H - margin.top - margin.bottom;
93
  svg.attr("width",W).attr("height",H);
94
  const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
95
 
96
- const x = d3.scaleBand().domain(data.map(d=>d.label)).range([0,w]).padding(0.28);
97
- const yTime = d3.scaleLinear().domain([0, 140]).range([h,0]).nice();
98
- const yQual = d3.scaleLinear().domain([0, 5]).range([h,0]);
99
 
100
- g.append("g").attr("class","grid").selectAll("line").data(yTime.ticks(5)).join("line")
101
- .attr("x1",0).attr("x2",w).attr("y1",d=>yTime(d)).attr("y2",d=>yTime(d));
 
 
 
102
 
103
- g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
104
- d3.axisBottom(x).tickSize(0)).select(".domain").remove();
105
- g.append("g").attr("class","axis").call(
106
- d3.axisLeft(yTime).ticks(5).tickFormat(d=>d+"s").tickSize(0))
107
- .call(ax=>ax.select(".domain").remove())
108
- .call(ax=>ax.selectAll(".tick line").remove());
109
- g.append("g").attr("class","axis").attr("transform",`translate(${w},0)`).call(
110
- d3.axisRight(yQual).ticks(5).tickSize(0))
111
- .call(ax=>ax.select(".domain").remove())
112
- .call(ax=>ax.selectAll(".tick line").remove());
113
-
114
- // Right axis label
115
- g.append("text").attr("x",w+50).attr("y",h/2).attr("text-anchor","middle")
116
- .attr("fill","#fbbf24").attr("font-size",10)
117
- .attr("transform",`rotate(90,${w+50},${h/2})`)
118
- .text("Quality (1–5)");
119
-
120
- // Series pip under labels
121
- data.forEach(d => {
122
- g.append("rect")
123
- .attr("x",x(d.label)).attr("width",x.bandwidth())
124
- .attr("y",h+28).attr("height",4).attr("rx",2)
125
- .attr("fill",seriesColor(d.series)).attr("opacity",0.8);
126
- });
127
 
128
- // Bars
129
- data.forEach(d => {
130
- if (d.l1time !== null) {
131
- g.append("rect")
132
- .attr("x",x(d.label)).attr("width",x.bandwidth())
133
- .attr("y",yTime(d.l1time)).attr("height",h-yTime(d.l1time))
134
- .attr("fill",seriesColor(d.series)).attr("rx",3).attr("opacity",0.85)
135
- .style("cursor","pointer")
136
- .on("mousemove",function(event){
137
- tooltip.style("opacity",1).html(`
138
- <strong>Experiment ${d.label} <small style="color:${seriesColor(d.series)}">(Series ${d.series})</small></strong>\n <div style=\"margin-top:6px;padding-top:6px;border-top:1px solid #2a2d3a;font-size:11px;color:#8b8fa8;line-height:1.5\">${(EXPERIMENTS[d.label]||{}).note||''}</div>
139
- <div class="tooltip-row"><span>Avg L1 Time</span><span>${d.l1time.toFixed(1)}s</span></div>
140
- <div class="tooltip-row"><span>Fold Quality</span><span>${d.quality.toFixed(2)}/5</span></div>
141
- <div class="tooltip-row"><span>Total SR</span><span>${d.total_sr}%</span></div>
142
- `);
143
- const bx=container.getBoundingClientRect();
144
- const ex=event.clientX-bx.left, ey=event.clientY-bx.top;
145
- tooltip.style("left",Math.min(ex+12,W-185)+"px").style("top",Math.max(ey-90,0)+"px");
146
- })
147
- .on("mouseleave",()=>tooltip.style("opacity",0));
148
-
149
- g.append("text")
150
- .attr("x",x(d.label)+x.bandwidth()/2).attr("y",yTime(d.l1time)-4)
151
- .attr("text-anchor","middle").attr("fill","#e8eaf0")
152
- .attr("font-size",Math.max(8,Math.min(11,x.bandwidth()*0.28)))
153
- .text(d.l1time.toFixed(0)+"s");
154
- } else {
155
- g.append("text")
156
- .attr("x",x(d.label)+x.bandwidth()/2).attr("y",h-8)
157
- .attr("text-anchor","middle").attr("fill","#3a3d4a").attr("font-size",9)
158
- .text("N/A");
159
- }
160
- });
161
 
162
- // Quality line
163
- const validLine = d3.line().x(d=>x(d.label)+x.bandwidth()/2).y(d=>yQual(d.quality))
164
- .curve(d3.curveMonotoneX);
165
- g.append("path").datum(data).attr("d",validLine)
166
- .attr("fill","none").attr("stroke","#fbbf24").attr("stroke-width",2)
167
- .attr("stroke-dasharray","5,3").attr("opacity",0.85);
168
- data.forEach(d => {
169
  g.append("circle")
170
- .attr("cx",x(d.label)+x.bandwidth()/2).attr("cy",yQual(d.quality))
171
- .attr("r",4).attr("fill","#fbbf24").attr("stroke","#1a1d27").attr("stroke-width",1.5);
172
  });
173
 
174
- g.append("text").attr("x",w).attr("y",-8).attr("text-anchor","end")
175
- .attr("fill","#8b8fa8").attr("font-size",10)
176
- .text("sorted: fastest slowest L1 time (N/A last)");
177
  }
178
 
179
  render();
180
  window.addEventListener("resize", render);
181
-
182
- const EXPERIMENTS = {
183
- "1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
184
- "1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
185
- "1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
186
- "1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
187
- "1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
188
- "1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
189
- "2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
190
- "2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
191
- "2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
192
- "2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
193
- "2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
194
- };
195
-
196
-
197
- (function buildRefTable() {
198
- const container = document.getElementById('exp-ref');
199
- if (!container) return;
200
- const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
201
- let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
202
- order.forEach(k => {
203
- const a = EXPERIMENTS[k];
204
- const series = k.startsWith("2") ? "s2" : "s1";
205
- html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
206
- });
207
- html += '</tbody></table>';
208
- container.innerHTML = html;
209
- })();
210
-
211
  }
212
 
213
  if (typeof d3 !== "undefined") {
 
7
  :root { --bg: transparent; --text: #e8eaf0; --subtext: #8b8fa8; --grid: #2a2d3a; --border: #2a2d3a; }
8
  * { box-sizing: border-box; margin: 0; padding: 0; }
9
  body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
10
+
11
+ .legend { display: flex; gap: 16px; justify-content: center; flex-wrap: wrap; margin-bottom: 8px; align-items: center; }
12
+ .legend-item { display: flex; align-items: center; gap: 6px; font-size: 11px; color: var(--subtext); }
13
+ .legend-pip { width: 10px; height: 10px; border-radius: 50%; display: inline-block; border: 1.5px solid #1a1d27; }
14
+ .legend-size { display: flex; align-items: center; gap: 4px; font-size: 10px; color: var(--subtext); }
15
+ .legend-size circle { fill: none; stroke: var(--subtext); stroke-width: 1; }
16
+
17
  .tooltip {
18
  position: absolute; background: #1a1d27; border: 1px solid var(--border);
19
  border-radius: 8px; padding: 10px 14px; pointer-events: none;
20
+ opacity: 0; transition: opacity .15s; z-index: 10; min-width: 200px;
21
  box-shadow: 0 4px 16px rgba(0,0,0,.4); font-size: 13px;
22
  }
23
  .tooltip strong { display: block; margin-bottom: 5px; }
24
  .tooltip-row { display: flex; justify-content: space-between; gap: 12px; margin-top: 3px; font-size: 12px; color: var(--subtext); }
25
  .tooltip-row span:last-child { color: var(--text); font-weight: 600; }
26
 
27
+ .axis text { fill: var(--subtext); font-size: 11px; }
28
  .axis line, .axis path { stroke: var(--grid); }
29
  .grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
30
+
31
+ .annotation-line { stroke: #3a3d4a; stroke-dasharray: 4,3; stroke-width: 1; }
32
  </style>
33
  </head>
34
  <body>
35
  <div class="legend">
36
+ <div class="legend-item"><div class="legend-pip" style="background:#f7934f"></div>Series 2</div>
37
+ <div class="legend-item"><div class="legend-pip" style="background:#4f8ef7"></div>Series 1</div>
38
+ <div class="legend-item" style="margin-left:8px; font-size:10px; color:#555">
39
+ Bubble size = Total SR
40
+ </div>
 
41
  </div>
42
  <div style="position:relative">
43
+ <svg id="tq-chart" style="overflow:visible"></svg>
44
  <div class="tooltip" id="tq-tooltip"></div>
45
  </div>
46
  <script>
47
  function _initL1TimeQuality() {
48
  const raw = [
49
+ {label:"1.1 π0",series:"1",l1time:121.5, quality:2.70, total_sr:40},
50
+ {label:"1.2 π0.5",series:"1",l1time:90.75, quality:2.50, total_sr:20},
51
+ {label:"1.3 ΔActions",series:"1",l1time:113.86,quality:2.80, total_sr:35},
52
+ {label:"1.4 RABC low",series:"1",l1time:78.33, quality:2.20, total_sr:15},
53
+ {label:"1.5 RABC high",series:"1",l1time:null, quality:1.00, total_sr:0 },
54
+ {label:"1.7 Δ+RABC",series:"1",l1time:99.5, quality:2.30, total_sr:40},
55
+ {label:"2.1 HQ",series:"2",l1time:57.57, quality:2.80, total_sr:40},
56
+ {label:"2.2 HQ+RABC+Δ",series:"2",l1time:43.2, quality:3.30, total_sr:75},
57
+ {label:"2.3 HQ+mirror",series:"2",l1time:null, quality:1.00, total_sr:5 },
58
+ {label:"2.4 HQ chunk45",series:"2",l1time:72.5, quality:1.80, total_sr:20},
59
+ {label:"2.5 HQ+RABC+Δ★",series:"2",l1time:40.8, quality:4.10, total_sr:90},
60
  ];
61
 
62
+ const data = raw.filter(d => d.l1time !== null);
63
+ const noData = raw.filter(d => d.l1time === null);
64
 
65
  const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
66
+ const margin = {top:40, right:24, bottom:48, left:54};
67
  const svg = d3.select("#tq-chart");
68
  const container = svg.node().parentElement;
69
  const tooltip = d3.select("#tq-tooltip");
70
 
71
+ const rScale = d3.scaleSqrt().domain([0, 100]).range([4, 22]);
72
+
73
  function render() {
74
  svg.selectAll("*").remove();
75
  const W = container.clientWidth;
76
+ const H = Math.max(300, Math.min(400, W * 0.52));
77
  const w = W - margin.left - margin.right;
78
  const h = H - margin.top - margin.bottom;
79
  svg.attr("width",W).attr("height",H);
80
  const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
81
 
82
+ const xTime = d3.scaleLinear().domain([30, 135]).range([0,w]);
83
+ const yQual = d3.scaleLinear().domain([1.5, 4.5]).range([h,0]);
 
84
 
85
+ // Grid
86
+ g.append("g").attr("class","grid").selectAll("line.h").data(yQual.ticks(5)).join("line")
87
+ .attr("x1",0).attr("x2",w).attr("y1",d=>yQual(d)).attr("y2",d=>yQual(d));
88
+ g.append("g").attr("class","grid").selectAll("line.v").data(xTime.ticks(6)).join("line")
89
+ .attr("y1",0).attr("y2",h).attr("x1",d=>xTime(d)).attr("x2",d=>xTime(d));
90
 
91
+ // "Better" direction annotation
92
+ g.append("text").attr("x",4).attr("y",8).attr("fill","#4dc98a").attr("font-size",9).attr("opacity",0.6)
93
+ .text("← faster, better quality ↑");
94
 
95
+ // Axes
96
+ g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
97
+ d3.axisBottom(xTime).ticks(6).tickFormat(d=>d+"s").tickSize(0))
98
+ .call(gg=>gg.select(".domain").remove());
99
 
100
+ g.append("g").attr("class","axis").call(
101
+ d3.axisLeft(yQual).ticks(5).tickFormat(d=>d.toFixed(1)).tickSize(0))
102
+ .call(ax=>ax.select(".domain").remove());
103
+
104
+ // Axis labels
105
+ g.append("text").attr("x",w/2).attr("y",h+38).attr("text-anchor","middle")
106
+ .attr("fill","#8b8fa8").attr("font-size",11).text("Level 1 Completion Time (s) → slower");
107
+ g.append("text").attr("x",-h/2).attr("y",-40).attr("text-anchor","middle")
108
+ .attr("transform","rotate(-90)").attr("fill","#8b8fa8").attr("font-size",11).text("Fold Quality (1–5)");
109
+
110
+ // Quality = 3.0 reference line
111
+ g.append("line").attr("class","annotation-line")
112
+ .attr("x1",0).attr("x2",w).attr("y1",yQual(3.0)).attr("y2",yQual(3.0));
113
+ g.append("text").attr("x",w-2).attr("y",yQual(3.0)-5).attr("text-anchor","end")
114
+ .attr("fill","#3a3d4a").attr("font-size",8).text("quality = 3.0");
115
+
116
+ // Draw bubbles (larger ones first so smaller ones are on top)
117
+ const sorted = [...data].sort((a,b) => b.total_sr - a.total_sr);
118
+
119
+ sorted.forEach(d => {
120
+ const cx = xTime(d.l1time);
121
+ const cy = yQual(d.quality);
122
+ const r = rScale(d.total_sr);
123
+ const c = seriesColor(d.series);
124
+
125
+ // Bubble
126
  g.append("circle")
127
+ .attr("cx",cx).attr("cy",cy).attr("r",r)
128
+ .attr("fill",c).attr("fill-opacity",0.25)
129
+ .attr("stroke",c).attr("stroke-width",1.5)
130
+ .style("cursor","pointer")
131
+ .on("mousemove",function(event){
132
+ tooltip.style("opacity",1).html(`
133
+ <strong>${d.label} <small style="color:${c}">(Series ${d.series})</small></strong>
134
+ <div class="tooltip-row"><span>L1 Completion Time</span><span>${d.l1time.toFixed(1)}s</span></div>
135
+ <div class="tooltip-row"><span>Fold Quality</span><span>${d.quality.toFixed(2)} / 5</span></div>
136
+ <div class="tooltip-row"><span>Total Success Rate</span><span>${d.total_sr}%</span></div>
137
+ `);
138
+ const bx=container.getBoundingClientRect();
139
+ const ex=event.clientX-bx.left, ey=event.clientY-bx.top;
140
+ tooltip.style("left",Math.min(ex+12,W-210)+"px").style("top",Math.max(ey-90,0)+"px");
141
+ })
142
+ .on("mouseleave",()=>tooltip.style("opacity",0));
143
+
144
+ // Label below bubble
145
+ g.append("text")
146
+ .attr("x",cx).attr("y", cy + r + 12)
147
+ .attr("text-anchor","middle")
148
+ .attr("fill","#e8eaf0").attr("font-size",9).attr("font-weight","500")
149
+ .text(d.label);
150
  });
151
 
152
+ // "No data" annotation for experiments with null L1 time
153
+ if (noData.length > 0) {
154
+ const noDataText = noData.map(d => d.label).join(", ");
155
+ g.append("text").attr("x",w).attr("y",h-4).attr("text-anchor","end")
156
+ .attr("fill","#3a3d4a").attr("font-size",9)
157
+ .text(`No L1 completions: ${noDataText}`);
158
+ }
159
  }
160
 
161
  render();
162
  window.addEventListener("resize", render);
163
  }
164
 
165
  if (typeof d3 !== "undefined") {
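The bubble chart in this diff maps Total SR to radius with `d3.scaleSqrt().domain([0, 100]).range([4, 22])` and draws the largest bubbles first so that smaller ones stay clickable on top. The sketch below reproduces that mapping and draw order in plain JavaScript. `makeSqrtScale` is a hypothetical stand-in for the D3 scale (not part of this commit), and the sample rows reuse three `total_sr` values from `raw`.

```javascript
// Hypothetical plain-JS equivalent of d3.scaleSqrt().domain(...).range(...):
// take the square root of the input, then interpolate linearly into the range.
function makeSqrtScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const s0 = Math.sqrt(d0);
  const s1 = Math.sqrt(d1);
  return (value) => r0 + ((Math.sqrt(value) - s0) / (s1 - s0)) * (r1 - r0);
}

// Same domain and range as rScale in the diff.
const rScale = makeSqrtScale([0, 100], [4, 22]);

// Three rows taken from the chart data (label shortened to the experiment id).
const runs = [
  { label: "1.4", total_sr: 15 },
  { label: "2.5", total_sr: 90 },
  { label: "2.2", total_sr: 75 },
];

// Largest success rate first, so later (smaller) circles are painted on top.
const drawOrder = [...runs].sort((a, b) => b.total_sr - a.total_sr);

for (const run of drawOrder) {
  console.log(run.label, rScale(run.total_sr).toFixed(1) + "px");
}
```

Because the range starts at 4 px rather than 0, bubble area is not strictly proportional to success rate. The floor appears to be there so that low-SR runs remain visible and hoverable.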
app/src/content/embeds/folding/loss-curves.html CHANGED
@@ -65,17 +65,17 @@
65
  <script>
66
  function _initLossCurves() {
67
  const RUNS = [
68
- { key: "ablation1_1_2", label: "1.1: π0 · MEAN_STD", series: "s1", color: "#ef5350" },
69
- { key: "ablation1_2_2", label: "1.2: π0.5 · MEAN_STD", series: "s1", color: "#f59e0b" },
70
- { key: "ablation1_3_17_q", label: "1.3: π0.5 · ΔActions · QUANTILES", series: "s1", color: "#10b981" },
71
- { key: "ablation1-4", label: "1.4: π0.5 · RABC κ=0.01", series: "s1", color: "#3b82f6" },
72
- { key: "ablation1-5_9", label: "1.5: π0.5 · RABC κ=0.0215", series: "s1", color: "#8b5cf6" },
73
- { key: "ablation1-7_2", label: "1.7: π0.5 · ΔActions · RABC κ=0.0215 · QUANTILES",series: "s1", color: "#ec4899" },
74
- { key: "ablation2-1_100k_q", label: "2.1: HQ finetune from 1.3", series: "s2", color: "#14b8a6" },
75
- { key: "ablation2-2_100k", label: "2.2: HQ + RABC κ=0.0265 + ΔActions", series: "s2", color: "#6366f1" },
76
- { key: "ablation2-3_100k_q_it", label: "2.3: HQ + ΔActions + mirroring", series: "s2", color: "#a78bfa" },
77
- { key: "ablation2-4_100k_q", label: "2.4: HQ + ΔActions · chunk=45", series: "s2", color: "#f97316" },
78
- { key: "ablation2-5_0", label: "2.5: HQ + RABC + ΔActions (best)", series: "s2", color: "#22d3ee" },
79
  ];
80
 
81
  // ──────────────────────────────────────────────
 
65
  <script>
66
  function _initLossCurves() {
67
  const RUNS = [
68
+ { key: "ablation1_1_2", label: "1.1 π0", series: "s1", color: "#ef5350" },
69
+ { key: "ablation1_2_2", label: "1.2 π0.5", series: "s1", color: "#f59e0b" },
70
+ { key: "ablation1_3_17_q", label: "1.3 ΔActions", series: "s1", color: "#10b981" },
71
+ { key: "ablation1-4", label: "1.4 RABC low", series: "s1", color: "#3b82f6" },
72
+ { key: "ablation1-5_9", label: "1.5 RABC high", series: "s1", color: "#8b5cf6" },
73
+ { key: "ablation1-7_2", label: "1.7 Δ+RABC", series: "s1", color: "#ec4899" },
74
+ { key: "ablation2-1_100k_q", label: "2.1 HQ", series: "s2", color: "#14b8a6" },
75
+ { key: "ablation2-2_100k", label: "2.2 HQ+RABC+Δ", series: "s2", color: "#6366f1" },
76
+ { key: "ablation2-3_100k_q_it", label: "2.3 HQ+mirror", series: "s2", color: "#a78bfa" },
77
+ { key: "ablation2-4_100k_q", label: "2.4 HQ chunk45", series: "s2", color: "#f97316" },
78
+ { key: "ablation2-5_0", label: "2.5 HQ+RABC+Δ", series: "s2", color: "#22d3ee" },
79
  ];
80
 
81
  // ──────────────────────────────────────────────
app/src/content/embeds/folding/statistical-analysis.html CHANGED
@@ -8,7 +8,7 @@
8
  body{background:transparent;font-family:system-ui,sans-serif;color:#e8eaf0}
9
  .wrap{max-width:980px;margin:0 auto;padding:20px 20px 36px}
10
 
11
- .card{background:#1a1d27;border:1px solid #2a2d3a;border-radius:6px;overflow:hidden;margin-bottom:12px}
12
  .card-head{padding:9px 14px;border-bottom:1px solid #2a2d3a;font-size:10px;text-transform:uppercase;letter-spacing:.07em;color:#8b8fa8;display:flex;justify-content:space-between;align-items:center;flex-wrap:wrap;gap:8px}
13
  .chart-area{padding:16px 16px 10px}
14
  svg text{font-family:system-ui,sans-serif}
@@ -44,7 +44,7 @@ svg text{font-family:system-ui,sans-serif}
44
  <button class="ctrl-btn" id="v-l2" onclick="setVLevel('L2')">Level 2</button>
45
  </div>
46
  </div>
47
- <div class="chart-area"><svg id="svg-violin" width="100%" height="340"></svg></div>
48
  <div class="legend">
49
  <div class="li"><div class="lsw" style="background:#f7934f"></div>Series 1</div>
50
  <div class="li"><div class="lsw" style="background:#4dc98a"></div>Series 2</div>
@@ -59,26 +59,26 @@ svg text{font-family:system-ui,sans-serif}
59
  <script>
60
  function _initStatAnalysis() {
61
  // ── DATA ──────────────────────────────────────────────────────────────────────
62
- const EXPS = ['1.1','1.2','1.3','1.4','1.5','1.7','2.1','2.2','2.3','2.4','2.5'];
63
  const DATA = {
64
- '1.1': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
65
- '1.2': {total:[4,20], L1:[4,10], L2:[0,10], series:1},
66
- '1.3': {total:[7,20], L1:[7,10], L2:[0,10], series:1},
67
- '1.4': {total:[3,20], L1:[3,10], L2:[0,10], series:1},
68
- '1.5': {total:[0,20], L1:[0,10], L2:[0,10], series:1},
69
- '1.7': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
70
- '2.1': {total:[8,20], L1:[7,10], L2:[1,10], series:2},
71
- '2.2': {total:[15,20], L1:[10,10], L2:[5,10], series:2},
72
- '2.3': {total:[1,20], L1:[0,10], L2:[1,10], series:2},
73
- '2.4': {total:[4,20], L1:[4,10], L2:[0,10], series:2},
74
- '2.5': {total:[18,20], L1:[10,10], L2:[8,10], series:2},
75
  };
76
 
77
  // CLD assignments (Barnard's exact test, two-sided, Bonferroni α=0.10/55)
78
  const CLD = {
79
- total: {'2.5':'a','2.2':'ab','1.1':'bc','1.7':'bc','2.1':'bc','1.3':'bc','1.2':'c','2.4':'c','1.4':'c','2.3':'c','1.5':'c'},
80
- L1: {'2.2':'a','2.5':'a','1.1':'ab','1.7':'ab','1.3':'ab','2.1':'ab','1.2':'abc','2.4':'abc','1.4':'bc','1.5':'c','2.3':'c'},
81
- L2: {'2.5':'a','2.2':'ab','2.1':'b','2.3':'b','1.1':'b','1.2':'b','1.3':'b','1.4':'b','1.5':'b','1.7':'b','2.4':'b'},
82
  };
83
 
84
  // ── BETA DISTRIBUTION PDF ────────────────────────────────────────────────────
@@ -127,11 +127,12 @@ function setVLevel(lv){
127
 
128
  function drawViolin(){
129
  const svgEl=document.getElementById('svg-violin');
130
- const W=svgEl.parentElement.clientWidth-32, H=340;
131
- const m={top:50,right:16,bottom:32,left:38};
132
  const iW=W-m.left-m.right, iH=H-m.top-m.bottom;
133
  svgEl.setAttribute('viewBox',`0 0 ${W} ${H}`);
134
- const svg=d3.select('#svg-violin').attr('viewBox',`0 0 ${W} ${H}`);
 
135
  svg.selectAll('*').remove();
136
  const g=svg.append('g').attr('transform',`translate(${m.left},${m.top})`);
137
 
@@ -219,7 +220,7 @@ function drawViolin(){
219
  // Axes
220
  g.append('g').attr('transform',`translate(0,${iH})`)
221
  .call(d3.axisBottom(x).tickSize(0))
222
- .call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',d=>seriesColor(DATA[d].series)).attr('font-size',10).attr('dy','1.3em')});
223
  g.append('g').call(d3.axisLeft(y).ticks(5).tickFormat(d=>Math.round(d*100)+'%').tickSize(3))
224
  .call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',SUB).attr('font-size',9);gg.selectAll('line').attr('stroke',BORDER)});
225
  g.append('text').attr('transform','rotate(-90)').attr('x',-iH/2).attr('y',-30).attr('text-anchor','middle')
 
8
  body{background:transparent;font-family:system-ui,sans-serif;color:#e8eaf0}
9
  .wrap{max-width:980px;margin:0 auto;padding:20px 20px 36px}
10
 
11
+ .card{background:#1a1d27;border:1px solid #2a2d3a;border-radius:6px;overflow:visible;margin-bottom:12px}
12
  .card-head{padding:9px 14px;border-bottom:1px solid #2a2d3a;font-size:10px;text-transform:uppercase;letter-spacing:.07em;color:#8b8fa8;display:flex;justify-content:space-between;align-items:center;flex-wrap:wrap;gap:8px}
13
  .chart-area{padding:16px 16px 10px}
14
  svg text{font-family:system-ui,sans-serif}
 
44
  <button class="ctrl-btn" id="v-l2" onclick="setVLevel('L2')">Level 2</button>
45
  </div>
46
  </div>
47
+ <div class="chart-area" style="overflow:visible"><svg id="svg-violin" width="100%" height="500" style="overflow:visible"></svg></div>
48
  <div class="legend">
49
  <div class="li"><div class="lsw" style="background:#f7934f"></div>Series 1</div>
50
  <div class="li"><div class="lsw" style="background:#4dc98a"></div>Series 2</div>
 
59
  <script>
60
  function _initStatAnalysis() {
61
  // ── DATA ──────────────────────────────────────────────────────────────────────
62
+ const EXPS = ['1.1 π0','1.2 π0.5','1.3 ΔActions','1.4 RABC low','1.5 RABC high','1.7 Δ+RABC','2.1 HQ','2.2 HQ+RABC+Δ','2.3 HQ+mirror','2.4 HQ chunk45','2.5 HQ+RABC+Δ★'];
63
  const DATA = {
64
+ '1.1 π0': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
65
+ '1.2 π0.5': {total:[4,20], L1:[4,10], L2:[0,10], series:1},
66
+ '1.3 ΔActions': {total:[7,20], L1:[7,10], L2:[0,10], series:1},
67
+ '1.4 RABC low': {total:[3,20], L1:[3,10], L2:[0,10], series:1},
68
+ '1.5 RABC high': {total:[0,20], L1:[0,10], L2:[0,10], series:1},
69
+ '1.7 Δ+RABC': {total:[8,20], L1:[8,10], L2:[0,10], series:1},
70
+ '2.1 HQ': {total:[8,20], L1:[7,10], L2:[1,10], series:2},
71
+ '2.2 HQ+RABC+Δ': {total:[15,20], L1:[10,10], L2:[5,10], series:2},
72
+ '2.3 HQ+mirror': {total:[1,20], L1:[0,10], L2:[1,10], series:2},
73
+ '2.4 HQ chunk45': {total:[4,20], L1:[4,10], L2:[0,10], series:2},
74
+ '2.5 HQ+RABC+Δ★': {total:[18,20], L1:[10,10], L2:[8,10], series:2},
75
  };
76
 
77
  // CLD assignments (Barnard's exact test, two-sided, Bonferroni α=0.10/55)
78
  const CLD = {
79
+ total: {'2.5 HQ+RABC+Δ★':'a','2.2 HQ+RABC+Δ':'ab','1.1 π0':'bc','1.7 Δ+RABC':'bc','2.1 HQ':'bc','1.3 ΔActions':'bc','1.2 π0.5':'c','2.4 HQ chunk45':'c','1.4 RABC low':'c','2.3 HQ+mirror':'c','1.5 RABC high':'c'},
80
+ L1: {'2.2 HQ+RABC+Δ':'a','2.5 HQ+RABC+Δ★':'a','1.1 π0':'ab','1.7 Δ+RABC':'ab','1.3 ΔActions':'ab','2.1 HQ':'ab','1.2 π0.5':'abc','2.4 HQ chunk45':'abc','1.4 RABC low':'bc','1.5 RABC high':'c','2.3 HQ+mirror':'c'},
81
+ L2: {'2.5 HQ+RABC+Δ★':'a','2.2 HQ+RABC+Δ':'ab','2.1 HQ':'b','2.3 HQ+mirror':'b','1.1 π0':'b','1.2 π0.5':'b','1.3 ΔActions':'b','1.4 RABC low':'b','1.5 RABC high':'b','1.7 Δ+RABC':'b','2.4 HQ chunk45':'b'},
82
  };
83
 
84
  // ── BETA DISTRIBUTION PDF ────────────────────────────────────────────────────
 
127
 
128
  function drawViolin(){
129
  const svgEl=document.getElementById('svg-violin');
130
+ const W=svgEl.parentElement.clientWidth-32, H=500;
131
+ const m={top:50,right:16,bottom:80,left:70};
132
  const iW=W-m.left-m.right, iH=H-m.top-m.bottom;
133
  svgEl.setAttribute('viewBox',`0 0 ${W} ${H}`);
134
+ svgEl.setAttribute('height', H);
135
+ const svg=d3.select('#svg-violin').attr('viewBox',`0 0 ${W} ${H}`).attr('height',H);
136
  svg.selectAll('*').remove();
137
  const g=svg.append('g').attr('transform',`translate(${m.left},${m.top})`);
138
 
 
220
  // Axes
221
  g.append('g').attr('transform',`translate(0,${iH})`)
222
  .call(d3.axisBottom(x).tickSize(0))
223
+ .call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',d=>seriesColor(DATA[d].series)).attr('font-size',9).attr('transform','rotate(-40)').attr('text-anchor','end').attr('dx','-0.5em').attr('dy','0.3em')});
224
  g.append('g').call(d3.axisLeft(y).ticks(5).tickFormat(d=>Math.round(d*100)+'%').tickSize(3))
225
  .call(gg=>{gg.select('.domain').attr('stroke',BORDER);gg.selectAll('text').attr('fill',SUB).attr('font-size',9);gg.selectAll('line').attr('stroke',BORDER)});
226
  g.append('text').attr('transform','rotate(-90)').attr('x',-iH/2).attr('y',-30).attr('text-anchor','middle')
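The CLD comment in this file cites a Bonferroni threshold of α=0.10/55. A small sketch of where the 55 plausibly comes from, assuming it is the number of unordered pairs among the 11 experiments in `EXPS`. The function names are illustrative and do not appear in the commit.

```javascript
// Number of unordered pairs among n items: n choose 2.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

// Bonferroni correction: split the family-wise alpha evenly across all pairwise tests.
function bonferroniAlpha(familyAlpha, nGroups) {
  return familyAlpha / pairCount(nGroups);
}

const nExperiments = 11; // length of EXPS in the diff
const perTestAlpha = bonferroniAlpha(0.10, nExperiments);

console.log(pairCount(nExperiments), perTestAlpha.toFixed(5));
```

With 11 experiments this gives 55 comparisons, so each pairwise test has to clear a threshold of roughly 0.0018 before two experiments are assigned different letters.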
app/src/content/embeds/folding/subtask-heatmap.html CHANGED
@@ -19,25 +19,9 @@
19
  .legend-bar { display: flex; align-items: center; gap: 8px; margin-top: 10px; font-size: 11px; color: var(--subtext); justify-content: center; }
20
  .legend-gradient { height: 10px; width: 180px; border-radius: 5px; flex-shrink: 0; }
21
 
22
- .exp-ref-wrap { margin-bottom: 14px; }
23
- .exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
24
- padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
25
- .exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
26
- .exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
27
- .exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
28
- border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
29
- .exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
30
- .exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
31
- .exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
32
- .exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
33
- .exp-table tr:hover td { background: rgba(255,255,255,0.04); }
34
  </style>
35
  </head>
36
  <body>
37
- <div class="exp-ref-wrap">
38
- <button class="exp-ref-toggle" onclick="var t=document.getElementById('exp-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
39
- <div id="exp-ref"></div>
40
- </div>
41
  <div style="position:relative">
42
  <svg id="hm-chart"></svg>
43
  <div class="tooltip" id="hm-tooltip"></div>
@@ -50,17 +34,17 @@
50
  <script>
51
  function _initSubtaskHeatmap() {
52
  const rawData = [
53
- {label:"1.1",series:"1",total_sr:40, times:[null, 19.2, 42.22, 14.33, 19.88, 27.25]},
54
- {label:"1.2",series:"1",total_sr:20, times:[50, 39.27, 41.5, 12.3, 13.75, 10.75]},
55
- {label:"1.3",series:"1",total_sr:35, times:[null, 19.5, 44.2, 14.8, 30.33, 22.14]},
56
- {label:"1.4",series:"1",total_sr:15, times:[null, 20.8, 36.62, 10.0, 18.8, 12.67]},
57
- {label:"1.5",series:"1",total_sr:0, times:[240, 21.4, 100.0, null, null, null ]},
58
- {label:"1.7",series:"1",total_sr:40, times:[157.5,19.33, 32.64, 8.9, 11.0, 23.38]},
59
- {label:"2.1",series:"2",total_sr:40, times:[77.5, 11.08, 21.09, 5.45, 5.5, 11.5 ]},
60
- {label:"2.2",series:"2",total_sr:75, times:[34.33,6.25, 12.31, 3.75, 5.31, 8.93 ]},
61
- {label:"2.3",series:"2",total_sr:5, times:[49, 14.0, 23.71, 17.5, 11.0, 4.0 ]},
62
- {label:"2.4",series:"2",total_sr:20, times:[120, 10.09, 41.18, 7.89, 7.33, 10.0 ]},
63
- {label:"2.5",series:"2",total_sr:90, times:[62.25,8.28, 12.0, 5.28, 5.22, 6.83 ]},
64
  ];
65
 
66
  // Sort rows: best → worst by total_sr (heatmap: top = best)
@@ -87,7 +71,7 @@ const canvas = document.getElementById("lgd");
87
  const ctx = canvas.getContext("2d");
88
  for (let i=0; i<180; i++) { ctx.fillStyle=colorScale(i/180*120); ctx.fillRect(i,0,1,10); }
89
 
90
- const margin = {top:12, right:16, bottom:36, left:60};
91
  const svg = d3.select("#hm-chart");
92
  const container = svg.node().parentElement;
93
  const tooltip = d3.select("#hm-tooltip");
@@ -96,7 +80,7 @@ function render() {
96
  svg.selectAll("*").remove();
97
  const W = container.clientWidth;
98
  const cellW = Math.floor((W - margin.left - margin.right) / subtasks.length);
99
- const cellH = Math.max(26, Math.min(40, cellW * 0.7));
100
  const H = data.length * cellH + margin.top + margin.bottom;
101
  svg.attr("width",W).attr("height",H);
102
  const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
@@ -118,16 +102,14 @@ function render() {
118
  .attr("fill",seriesColor(d.series)).attr("opacity",0.9);
119
 
120
  g.append("text")
121
- .attr("x",-8).attr("y",ri*cellH+cellH/2+4)
122
- .attr("text-anchor","end").attr("fill","#e8eaf0").attr("font-size",11).attr("font-weight","500")
123
  .text(d.label);
124
 
125
- // total SR badge
126
  g.append("text")
127
- .attr("x",-8).attr("y",ri*cellH+cellH/2+4)
128
- .attr("text-anchor","start").attr("fill","#8b8fa8").attr("font-size",8)
129
- .attr("transform",`translate(${-margin.left+10},0)`)
130
- .text(d.total_sr+"%");
131
  });
132
 
133
  // Cells
@@ -177,34 +159,20 @@ render();
177
  window.addEventListener("resize", render);
178
 
179
  const EXPERIMENTS = {
180
- "1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
181
- "1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
182
- "1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
183
- "1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
184
- "1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
185
- "1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
186
- "2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
187
- "2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
188
- "2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
189
- "2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
190
- "2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
191
  };
192
 
193
 
194
- (function buildRefTable() {
195
- const container = document.getElementById('exp-ref');
196
- if (!container) return;
197
- const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
198
- let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
199
- order.forEach(k => {
200
- const a = EXPERIMENTS[k];
201
- const series = k.startsWith("2") ? "s2" : "s1";
202
- html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
203
- });
204
- html += '</tbody></table>';
205
- container.innerHTML = html;
206
- })();
207
-
208
  }
209
 
210
  if (typeof d3 !== "undefined") {
 
19
  .legend-bar { display: flex; align-items: center; gap: 8px; margin-top: 10px; font-size: 11px; color: var(--subtext); justify-content: center; }
20
  .legend-gradient { height: 10px; width: 180px; border-radius: 5px; flex-shrink: 0; }
21
 
22
  </style>
23
  </head>
24
  <body>
25
  <div style="position:relative">
26
  <svg id="hm-chart"></svg>
27
  <div class="tooltip" id="hm-tooltip"></div>
 
34
  <script>
35
  function _initSubtaskHeatmap() {
36
  const rawData = [
37
+ {label:"1.1 π0",series:"1",total_sr:40, times:[null, 19.2, 42.22, 14.33, 19.88, 27.25]},
38
+ {label:"1.2 π0.5",series:"1",total_sr:20, times:[50, 39.27, 41.5, 12.3, 13.75, 10.75]},
39
+ {label:"1.3 ΔActions",series:"1",total_sr:35, times:[null, 19.5, 44.2, 14.8, 30.33, 22.14]},
40
+ {label:"1.4 RABC low",series:"1",total_sr:15, times:[null, 20.8, 36.62, 10.0, 18.8, 12.67]},
41
+ {label:"1.5 RABC high",series:"1",total_sr:0, times:[240, 21.4, 100.0, null, null, null ]},
42
+ {label:"1.7 Δ+RABC",series:"1",total_sr:40, times:[157.5,19.33, 32.64, 8.9, 11.0, 23.38]},
43
+ {label:"2.1 HQ",series:"2",total_sr:40, times:[77.5, 11.08, 21.09, 5.45, 5.5, 11.5 ]},
44
+ {label:"2.2 HQ+RABC+Δ",series:"2",total_sr:75, times:[34.33,6.25, 12.31, 3.75, 5.31, 8.93 ]},
45
+ {label:"2.3 HQ+mirror",series:"2",total_sr:5, times:[49, 14.0, 23.71, 17.5, 11.0, 4.0 ]},
46
+ {label:"2.4 HQ chunk45",series:"2",total_sr:20, times:[120, 10.09, 41.18, 7.89, 7.33, 10.0 ]},
47
+ {label:"2.5 HQ+RABC+Δ★",series:"2",total_sr:90, times:[62.25,8.28, 12.0, 5.28, 5.22, 6.83 ]},
48
  ];
49
 
50
  // Sort rows: best → worst by total_sr (heatmap: top = best)
 
71
  const ctx = canvas.getContext("2d");
72
  for (let i=0; i<180; i++) { ctx.fillStyle=colorScale(i/180*120); ctx.fillRect(i,0,1,10); }
73
 
74
+ const margin = {top:12, right:16, bottom:36, left:120};
75
  const svg = d3.select("#hm-chart");
76
  const container = svg.node().parentElement;
77
  const tooltip = d3.select("#hm-tooltip");
 
80
  svg.selectAll("*").remove();
81
  const W = container.clientWidth;
82
  const cellW = Math.floor((W - margin.left - margin.right) / subtasks.length);
83
+ const cellH = Math.max(34, Math.min(44, cellW * 0.7));
84
  const H = data.length * cellH + margin.top + margin.bottom;
85
  svg.attr("width",W).attr("height",H);
86
  const g = svg.append("g").attr("transform",`translate(${margin.left},${margin.top})`);
 
102
  .attr("fill",seriesColor(d.series)).attr("opacity",0.9);
103
 
104
  g.append("text")
105
+ .attr("x",-8).attr("y",ri*cellH+cellH/2)
106
+ .attr("text-anchor","end").attr("fill","#e8eaf0").attr("font-size",10).attr("font-weight","500")
107
  .text(d.label);
108
 
 
109
  g.append("text")
110
+ .attr("x",-8).attr("y",ri*cellH+cellH/2+11)
111
+ .attr("text-anchor","end").attr("fill","#8b8fa8").attr("font-size",8)
112
+ .text(d.total_sr+"% SR");
 
113
  });
114
 
115
  // Cells
 
159
  window.addEventListener("resize", render);
160
 
161
  const EXPERIMENTS = {
162
+ "1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
163
+ "1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
164
+ "1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
165
+ "1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
166
+ "1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
167
+ "1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
168
+ "2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
169
+ "2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
170
+ "2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
171
+ "2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
172
+ "2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
173
  };
174
 
175
 
176
  }
177
 
178
  if (typeof d3 !== "undefined") {
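The row-height rule this commit adds to the heatmap (`cellH = Math.max(34, Math.min(44, cellW * 0.7))`) can be checked in isolation. The sketch below is a hypothetical helper, `heatmapLayout`, that is not part of the commit. It reuses the margins from the diff and assumes six subtask columns, matching the six entries in each `times` array.

```javascript
// Hypothetical helper (not in the commit): reproduces the heatmap sizing arithmetic.
const margin = { top: 12, right: 16, bottom: 36, left: 120 }; // values taken from the diff

function heatmapLayout(containerWidth, rowCount, columnCount) {
  // Column width: split the drawable width evenly across the subtask columns.
  const cellW = Math.floor((containerWidth - margin.left - margin.right) / columnCount);
  // Row height: 70% of the column width, clamped to the range [34, 44] px.
  const cellH = Math.max(34, Math.min(44, cellW * 0.7));
  // Total SVG height: one row per experiment plus vertical margins.
  const height = rowCount * cellH + margin.top + margin.bottom;
  return { cellW, cellH, height };
}

// 11 experiments, 6 subtask columns (the column count is an assumption).
console.log(heatmapLayout(800, 11, 6));
console.log(heatmapLayout(400, 11, 6));
```

On a narrow container the lower bound keeps rows tall enough for the two stacked labels (experiment name and "% SR"), which is presumably why the minimum was raised to 34.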
app/src/content/embeds/folding/success-rates.html CHANGED
@@ -61,19 +61,6 @@
61
  .tooltip-ci { font-size: 10px; color: #555; margin-left: 4px; }
62
  .tooltip-note { margin-top: 7px; padding-top: 7px; border-top: 1px solid #2a2d3a; font-size: 11px; color: #8b8fa8; line-height: 1.5; }
63
 
64
- /* ── Ablation table ── */
65
- .abl-ref-wrap { margin-bottom: 14px; }
66
- .abl-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
67
- padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
68
- .abl-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
69
- .abl-table { width: 100%; border-collapse: collapse; font-size: 11px; }
70
- .abl-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px; border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
71
- .abl-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
72
- .abl-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
73
- .abl-table tr.s2 td { background: rgba(247,147,79,0.05); }
74
- .abl-table tr.s1 td { background: rgba(79,142,247,0.04); }
75
- .abl-table tr:hover td { background: rgba(255,255,255,0.04); }
76
-
77
  /* ── Chart ── */
78
  .axis text { fill: var(--subtext); font-size: 12px; }
79
  .axis line, .axis path { stroke: var(--grid); }
@@ -117,14 +104,8 @@
117
  </span>
118
  </div>
119
 
120
- <!-- Experiment reference table -->
121
- <div class="abl-ref-wrap">
122
- <button class="abl-ref-toggle" onclick="var t=document.getElementById('sr-abl-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
123
- <div id="sr-abl-ref"></div>
124
- </div>
125
-
126
  <div style="position:relative">
127
- <svg id="sr-chart"></svg>
128
  <div class="tooltip" id="sr-tooltip"></div>
129
  </div>
130
 
@@ -132,17 +113,17 @@
132
  function _initSuccessRates() {
133
  // ── Experiment metadata ───────────────────────────────────────────────────
134
  const EXPERIMENTS = {
135
- "1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"π0 base model fine-tuned on full dataset. Default normalization." },
136
- "1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to π0.5 with MEAN_STD normalization." },
137
- "1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds delta actions and switches to QUANTILES normalization." },
138
- "1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"RABC with low κ — not very selective. MEAN_STD norm, no delta actions." },
139
- "1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"RABC with higher κ (mean + ½σ). MEAN_STD norm, no delta actions." },
140
- "1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
141
- "2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data. No RABC." },
142
- "2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
143
- "2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments HQ data with mirrored trajectories." },
144
- "2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Larger action chunk size (45 vs default 30)." },
145
- "2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
146
  };
147
 
148
  // ── Raw data ───────────────────────────────────────────────────────────────
@@ -150,17 +131,17 @@ const EXPERIMENTS = {
150
  const N = { total: 20, l1: 10, l2: 10 };
151
 
152
  const raw = [
153
- {label:"1.1", series:"1", total:40, l1:80, l2:0 },
154
- {label:"1.2", series:"1", total:20, l1:40, l2:0 },
155
- {label:"1.3", series:"1", total:35, l1:70, l2:0 },
156
- {label:"1.4", series:"1", total:15, l1:30, l2:0 },
157
- {label:"1.5", series:"1", total:0, l1:0, l2:0 },
158
- {label:"1.7", series:"1", total:40, l1:80, l2:0 },
159
- {label:"2.1", series:"2", total:40, l1:70, l2:10},
160
- {label:"2.2", series:"2", total:75, l1:100, l2:50},
161
- {label:"2.3", series:"2", total:5, l1:0, l2:10},
162
- {label:"2.4", series:"2", total:20, l1:40, l2:0 },
163
- {label:"2.5", series:"2", total:90, l1:100, l2:80},
164
  ];
165
 
166
  // ── Wilson 90% CI ──────────────────────────────────────────────────────────
@@ -218,7 +199,7 @@ function getSorted() {
218
  }
219
 
220
  // ── Render ─────────────────────────────────────────────────────────────────
221
- const margin = {top:28, right:20, bottom:48, left:46};
222
  const svg = d3.select("#sr-chart");
223
  const container = svg.node().parentElement;
224
  const tooltip = d3.select("#sr-tooltip");
@@ -229,7 +210,7 @@ function render() {
229
  const activeKeys = ["total","l1","l2"].filter(k => active[k]);
230
 
231
  const W = container.clientWidth;
232
- const H = Math.max(260, Math.min(360, W * 0.46));
233
  const w = W - margin.left - margin.right;
234
  const h = H - margin.top - margin.bottom;
235
  svg.attr("width", W).attr("height", H);
@@ -245,7 +226,8 @@ function render() {
245
 
246
  // Axes
247
  g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
248
- d3.axisBottom(x0).tickSize(0)).select(".domain").remove();
 
249
  g.append("g").attr("class","axis").call(
250
  d3.axisLeft(y).tickValues([0,25,50,75,100]).tickFormat(d=>d+"%").tickSize(0))
251
  .call(ax=>ax.select(".domain").remove())
@@ -255,7 +237,7 @@ function render() {
255
  sortedData.forEach(d => {
256
  g.append("rect")
257
  .attr("x", x0(d.label)).attr("width", x0.bandwidth())
258
- .attr("y", h+26).attr("height", 4).attr("rx", 2)
259
  .attr("fill", seriesColor(d.series)).attr("opacity", 0.8);
260
  });
261
 
@@ -350,19 +332,6 @@ function render() {
350
  .text(`sorted: best → worst by ${skLabel[sk]}`);
351
  }
352
 
353
- // ── Reference table ────────────────────────────────────────────────────────
354
- (function() {
355
- const el = document.getElementById("sr-abl-ref");
356
- const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
357
- let html = '<table class="abl-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
358
- order.forEach(k => {
359
- const e = EXPERIMENTS[k], cls = k.startsWith("2") ? "s2" : "s1";
360
- html += `<tr class="${cls}"><td><strong>${k}</strong></td><td>${e.desc}</td></tr>`;
361
- });
362
- html += "</tbody></table>";
363
- el.innerHTML = html;
364
- })();
365
-
366
  render();
367
  window.addEventListener("resize", render);
368
  }
 
61
  .tooltip-ci { font-size: 10px; color: #555; margin-left: 4px; }
62
  .tooltip-note { margin-top: 7px; padding-top: 7px; border-top: 1px solid #2a2d3a; font-size: 11px; color: #8b8fa8; line-height: 1.5; }
63
 
64
  /* ── Chart ── */
65
  .axis text { fill: var(--subtext); font-size: 12px; }
66
  .axis line, .axis path { stroke: var(--grid); }
 
104
  </span>
105
  </div>
106
 
107
  <div style="position:relative">
108
+ <svg id="sr-chart" style="overflow:visible"></svg>
109
  <div class="tooltip" id="sr-tooltip"></div>
110
  </div>
111
 
 
113
  function _initSuccessRates() {
114
  // ── Experiment metadata ───────────────────────────────────────────────────
115
  const EXPERIMENTS = {
116
+ "1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"π0 base model fine-tuned on full dataset. Default normalization." },
117
+ "1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to π0.5 with MEAN_STD normalization." },
118
+ "1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds delta actions and switches to QUANTILES normalization." },
119
+ "1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"RABC with low κ — not very selective. MEAN_STD norm, no delta actions." },
120
+ "1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"RABC with higher κ (mean + ½σ). MEAN_STD norm, no delta actions." },
121
+ "1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
122
+ "2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data. No RABC." },
123
+ "2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
124
+ "2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments HQ data with mirrored trajectories." },
125
+ "2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Larger action chunk size (45 vs default 30)." },
126
+ "2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
127
  };
128
 
129
  // ── Raw data ───────────────────────────────────────────────────────────────
 
131
  const N = { total: 20, l1: 10, l2: 10 };
132
 
133
  const raw = [
134
+ {label:"1.1 π0", series:"1", total:40, l1:80, l2:0 },
135
+ {label:"1.2 π0.5", series:"1", total:20, l1:40, l2:0 },
136
+ {label:"1.3 ΔActions", series:"1", total:35, l1:70, l2:0 },
137
+ {label:"1.4 RABC low", series:"1", total:15, l1:30, l2:0 },
138
+ {label:"1.5 RABC high", series:"1", total:0, l1:0, l2:0 },
139
+ {label:"1.7 Δ+RABC", series:"1", total:40, l1:80, l2:0 },
140
+ {label:"2.1 HQ", series:"2", total:40, l1:70, l2:10},
141
+ {label:"2.2 HQ+RABC+Δ", series:"2", total:75, l1:100, l2:50},
142
+ {label:"2.3 HQ+mirror", series:"2", total:5, l1:0, l2:10},
143
+ {label:"2.4 HQ chunk45", series:"2", total:20, l1:40, l2:0 },
144
+ {label:"2.5 HQ+RABC+Δ★", series:"2", total:90, l1:100, l2:80},
145
  ];
146
 
147
  // ── Wilson 90% CI ──────────────────────────────────────────────────────────
 
199
  }
200
 
201
  // ── Render ─────────────────────────────────────────────────────────────────
202
+ const margin = {top:28, right:20, bottom:80, left:80};
203
  const svg = d3.select("#sr-chart");
204
  const container = svg.node().parentElement;
205
  const tooltip = d3.select("#sr-tooltip");
 
210
  const activeKeys = ["total","l1","l2"].filter(k => active[k]);
211
 
212
  const W = container.clientWidth;
213
+ const H = Math.max(300, Math.min(400, W * 0.50));
214
  const w = W - margin.left - margin.right;
215
  const h = H - margin.top - margin.bottom;
216
  svg.attr("width", W).attr("height", H);
 
226
 
227
  // Axes
228
  g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
229
+ d3.axisBottom(x0).tickSize(0))
230
+ .call(gg=>{gg.select(".domain").remove();gg.selectAll("text").attr("transform","rotate(-40)").attr("text-anchor","end").attr("dx","-0.5em").attr("dy","0.3em").attr("font-size",9)});
231
  g.append("g").attr("class","axis").call(
232
  d3.axisLeft(y).tickValues([0,25,50,75,100]).tickFormat(d=>d+"%").tickSize(0))
233
  .call(ax=>ax.select(".domain").remove())
 
237
  sortedData.forEach(d => {
238
  g.append("rect")
239
  .attr("x", x0(d.label)).attr("width", x0.bandwidth())
240
+ .attr("y", h+60).attr("height", 4).attr("rx", 2)
241
  .attr("fill", seriesColor(d.series)).attr("opacity", 0.8);
242
  });
243
 
 
332
  .text(`sorted: best → worst by ${skLabel[sk]}`);
333
  }
334
 
335
  render();
336
  window.addEventListener("resize", render);
337
  }
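The hunks above skip the body of the "Wilson 90% CI" section, so the commit's own implementation is not visible here. As a minimal sketch of what such a helper usually looks like, the standard Wilson score interval can be written as below. The function name `wilsonInterval`, the constant `Z_90`, and the percent-in/percent-out convention are assumptions for illustration. Only the trial counts in `N` come from the diff.

```javascript
// Sketch of a Wilson score interval; NOT the commit's code, which the diff does not show.
const Z_90 = 1.6449; // approximate two-sided 90% standard-normal quantile

// successPct: observed success rate in percent (0..100); n: number of trials.
function wilsonInterval(successPct, n, z = Z_90) {
  const p = successPct / 100;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  // The interval is centred on a value shrunk toward 0.5, not on p itself.
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return {
    lo: Math.max(0, center - half) * 100,
    hi: Math.min(1, center + half) * 100,
  };
}

const N = { total: 20, l1: 10, l2: 10 }; // trial counts from the diff
console.log(wilsonInterval(90, N.total)); // best run, 18 of 20 successes
console.log(wilsonInterval(0, N.total));  // a run with no successes
```

With only 10 to 20 trials per experiment the intervals are wide, and unlike the plain normal approximation the Wilson interval stays inside 0–100% and does not collapse to zero width at 0% or 100%.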
app/src/content/embeds/folding/total-score.html CHANGED
@@ -8,19 +8,6 @@
8
  * { box-sizing: border-box; margin: 0; padding: 0; }
9
  body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
10
 
11
- .exp-ref-wrap { margin-bottom: 14px; }
12
- .exp-ref-toggle { background: none; border: 1px solid #2a2d3a; color: #8b8fa8; font-size: 11px;
13
- padding: 4px 10px; border-radius: 6px; cursor: pointer; margin-bottom: 8px; }
14
- .exp-ref-toggle:hover { color: #e8eaf0; border-color: #4f8ef7; }
15
- .exp-table { width: 100%; border-collapse: collapse; font-size: 11px; }
16
- .exp-table th { color: #8b8fa8; font-weight: 500; text-align: left; padding: 4px 8px;
17
- border-bottom: 1px solid #2a2d3a; white-space: nowrap; }
18
- .exp-table td { color: #c8cad8; padding: 4px 8px; border-bottom: 1px solid #1a1d27; vertical-align: top; }
19
- .exp-table td:first-child { color: #e8eaf0; font-weight: 600; white-space: nowrap; }
20
- .exp-table tr.s2 td { background: rgba(247,147,79,0.05); }
21
- .exp-table tr.s1 td { background: rgba(79,142,247,0.04); }
22
- .exp-table tr:hover td { background: rgba(255,255,255,0.04); }
23
-
24
  .axis text { fill: var(--subtext); font-size: 11px; }
25
  .axis line, .axis path { stroke: var(--grid); }
26
  .grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
@@ -37,28 +24,24 @@
37
  </style>
38
  </head>
39
  <body>
40
- <div class="exp-ref-wrap">
41
- <button class="exp-ref-toggle" onclick="var t=document.getElementById('ts-exp-ref');t.style.display=t.style.display==='none'?'':'none';this.textContent=t.style.display==='none'?'▶ Show experiment descriptions':'▼ Hide experiment descriptions'">▼ Hide experiment descriptions</button>
42
- <div id="ts-exp-ref"></div>
43
- </div>
44
  <div style="position:relative">
45
- <svg id="ts-chart"></svg>
46
  <div class="tooltip" id="ts-tooltip"></div>
47
  </div>
48
  <script>
49
  function _initTotalScore() {
50
  const raw = [
51
- {label:"1.1",series:"1",score:440, pct:29.3,total_sr:40},
52
- {label:"1.2",series:"1",score:480, pct:32.0,total_sr:20},
53
- {label:"1.3",series:"1",score:460, pct:30.7,total_sr:35},
54
- {label:"1.4",series:"1",score:330, pct:22.0,total_sr:15},
55
- {label:"1.5",series:"1",score:170, pct:11.3,total_sr:0 },
56
- {label:"1.7",series:"1",score:600, pct:40.0,total_sr:40},
57
- {label:"2.1",series:"2",score:620, pct:41.3,total_sr:40},
58
- {label:"2.2",series:"2",score:1090,pct:72.7,total_sr:75},
59
- {label:"2.3",series:"2",score:310, pct:20.7,total_sr:5 },
60
- {label:"2.4",series:"2",score:460, pct:30.7,total_sr:20},
61
- {label:"2.5",series:"2",score:1300,pct:86.7,total_sr:90},
62
  ];
63
 
64
  // Sort highest → lowest score %
@@ -69,7 +52,7 @@ const seriesColor = s => s === "2" ? "#f7934f" : "#4f8ef7";
69
  const perfColor = d3.scaleSequential().domain([0,100])
70
  .interpolator(d3.interpolateRgbBasis(["#f87171","#fbbf24","#4dc98a"]));
71
 
72
- const margin = {top:28, right:20, bottom:48, left:50};
73
  const svg = d3.select("#ts-chart");
74
  const container = svg.node().parentElement;
75
  const tooltip = d3.select("#ts-tooltip");
@@ -77,7 +60,7 @@ const tooltip = d3.select("#ts-tooltip");
77
  function render() {
78
  svg.selectAll("*").remove();
79
  const W = container.clientWidth;
80
- const H = Math.max(250, Math.min(340, W * 0.43));
81
  const w = W - margin.left - margin.right;
82
  const h = H - margin.top - margin.bottom;
83
  svg.attr("width",W).attr("height",H);
@@ -96,7 +79,8 @@ function render() {
96
  g.append("text").attr("x",w+3).attr("y",y(50)+4).attr("fill","#fbbf24").attr("font-size",9).text("50%");
97
 
98
  g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
99
- d3.axisBottom(x).tickSize(0)).select(".domain").remove();
 
100
  g.append("g").attr("class","axis").call(
101
  d3.axisLeft(y).ticks(5).tickFormat(d=>d+"%").tickSize(0))
102
  .call(ax=>ax.select(".domain").remove())
@@ -106,7 +90,7 @@ function render() {
106
  data.forEach(d => {
107
  g.append("rect")
108
  .attr("x",x(d.label)).attr("width",x.bandwidth())
109
- .attr("y",h+28).attr("height",4).attr("rx",2)
110
  .attr("fill",seriesColor(d.series)).attr("opacity",0.8);
111
  });
112
 
@@ -152,6 +136,17 @@ function render() {
152
  .text(d.pct+"%");
153
  });
154
 
155
  g.append("text").attr("x",w).attr("y",-12).attr("text-anchor","end")
156
  .attr("fill","#8b8fa8").attr("font-size",10)
157
  .text("sorted: highest → lowest score %");
@@ -161,34 +156,20 @@ render();
161
  window.addEventListener("resize", render);
162
 
163
  const EXPERIMENTS = {
164
- "1.1": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
165
- "1.2": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
166
- "1.3": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
167
- "1.4": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
168
- "1.5": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
169
- "1.7": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
170
- "2.1": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
171
- "2.2": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
172
- "2.3": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
173
- "2.4": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
174
- "2.5": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
175
  };
176
 
177
 
178
- (function buildRefTable() {
179
- const container = document.getElementById('ts-exp-ref');
180
- if (!container) return;
181
- const order = ["1.1","1.2","1.3","1.4","1.5","1.7","2.1","2.2","2.3","2.4","2.5"];
182
- let html = '<table class="exp-table"><thead><tr><th>#</th><th>Description</th></tr></thead><tbody>';
183
- order.forEach(k => {
184
- const a = EXPERIMENTS[k];
185
- const series = k.startsWith("2") ? "s2" : "s1";
186
- html += `<tr class="${series}"><td><strong>${k}</strong></td><td>${a.desc}</td></tr>`;
187
- });
188
- html += '</tbody></table>';
189
- container.innerHTML = html;
190
- })();
191
-
192
  }
193
 
194
  if (typeof d3 !== "undefined") {
 
8
  * { box-sizing: border-box; margin: 0; padding: 0; }
9
  body { background: var(--bg); font-family: system-ui, sans-serif; color: var(--text); }
10
 
11
  .axis text { fill: var(--subtext); font-size: 11px; }
12
  .axis line, .axis path { stroke: var(--grid); }
13
  .grid line { stroke: var(--grid); stroke-dasharray: 3,3; }
 
24
  </style>
25
  </head>
26
  <body>
27
  <div style="position:relative">
28
+ <svg id="ts-chart" style="overflow:visible"></svg>
29
  <div class="tooltip" id="ts-tooltip"></div>
30
  </div>
31
  <script>
32
  function _initTotalScore() {
33
  const raw = [
34
+ {label:"1.1 π0",series:"1",score:440, pct:29.3,total_sr:40},
35
+ {label:"1.2 π0.5",series:"1",score:480, pct:32.0,total_sr:20},
36
+ {label:"1.3 ΔActions",series:"1",score:460, pct:30.7,total_sr:35},
37
+ {label:"1.4 RABC low",series:"1",score:330, pct:22.0,total_sr:15},
38
+ {label:"1.5 RABC high",series:"1",score:170, pct:11.3,total_sr:0 },
39
+ {label:"1.7 Δ+RABC",series:"1",score:600, pct:40.0,total_sr:40},
40
+ {label:"2.1 HQ",series:"2",score:620, pct:41.3,total_sr:40},
41
+ {label:"2.2 HQ+RABC+Δ",series:"2",score:1090,pct:72.7,total_sr:75},
42
+ {label:"2.3 HQ+mirror",series:"2",score:310, pct:20.7,total_sr:5 },
43
+ {label:"2.4 HQ chunk45",series:"2",score:460, pct:30.7,total_sr:20},
44
+ {label:"2.5 HQ+RABC+Δ★",series:"2",score:1300,pct:86.7,total_sr:90},
45
  ];
46
 
47
  // Sort highest → lowest score %
 
52
  const perfColor = d3.scaleSequential().domain([0,100])
53
  .interpolator(d3.interpolateRgbBasis(["#f87171","#fbbf24","#4dc98a"]));
54
 
55
+ const margin = {top:28, right:20, bottom:80, left:80};
56
  const svg = d3.select("#ts-chart");
57
  const container = svg.node().parentElement;
58
  const tooltip = d3.select("#ts-tooltip");
 
60
  function render() {
61
  svg.selectAll("*").remove();
62
  const W = container.clientWidth;
63
+ const H = Math.max(290, Math.min(380, W * 0.47));
64
  const w = W - margin.left - margin.right;
65
  const h = H - margin.top - margin.bottom;
66
  svg.attr("width",W).attr("height",H);
 
79
  g.append("text").attr("x",w+3).attr("y",y(50)+4).attr("fill","#fbbf24").attr("font-size",9).text("50%");
80
 
81
  g.append("g").attr("class","axis").attr("transform",`translate(0,${h})`).call(
82
+ d3.axisBottom(x).tickSize(0))
83
+ .call(gg=>{gg.select(".domain").remove();gg.selectAll("text").attr("transform","rotate(-40)").attr("text-anchor","end").attr("dx","-0.5em").attr("dy","0.3em").attr("font-size",9)});
84
  g.append("g").attr("class","axis").call(
85
  d3.axisLeft(y).ticks(5).tickFormat(d=>d+"%").tickSize(0))
86
  .call(ax=>ax.select(".domain").remove())
 
90
  data.forEach(d => {
91
  g.append("rect")
92
  .attr("x",x(d.label)).attr("width",x.bandwidth())
93
+ .attr("y",h+60).attr("height",4).attr("rx",2)
94
  .attr("fill",seriesColor(d.series)).attr("opacity",0.8);
95
  });
96
 
 
136
  .text(d.pct+"%");
137
  });
138
 
139
+ // Highlight best experiment
140
+ const best = data[0];
141
+ if (best) {
142
+ const bx = x(best.label) + x.bandwidth()/2;
143
+ const by = y(best.pct);
144
+ g.append("line").attr("x1",bx).attr("x2",bx).attr("y1",by-16).attr("y2",-8)
145
+ .attr("stroke","#4dc98a").attr("stroke-width",1).attr("stroke-dasharray","2,2").attr("opacity",0.5);
146
+ g.append("text").attr("x",bx).attr("y",-12).attr("text-anchor","middle")
147
+ .attr("fill","#4dc98a").attr("font-size",9).attr("font-weight","600").text("★ best");
148
+ }
149
+
150
  g.append("text").attr("x",w).attr("y",-12).attr("text-anchor","end")
151
  .attr("fill","#8b8fa8").attr("font-size",10)
152
  .text("sorted: highest → lowest score %");
 
156
  window.addEventListener("resize", render);
157
 
158
  const EXPERIMENTS = {
159
+ "1.1 π0": { desc:"π0 · all data · 200k steps · MEAN_STD", note:"Base pi0 policy trained from scratch on the full dataset." },
160
+ "1.2 π0.5": { desc:"π0.5 · all data · 200k steps · MEAN_STD", note:"Upgraded to pi0.5 architecture, same data and steps." },
161
+ "1.3 ΔActions": { desc:"π0.5 · all data · 200k steps · ΔActions · QUANTILES", note:"Adds Delta Actions on top of 1.2 — actions expressed as deltas." },
162
+ "1.4 RABC low": { desc:"π0.5 · all data · 200k steps · RABC κ=0.01", note:"Selective Action Reward Model with low κ (≈ mean threshold, not very selective)." },
163
+ "1.5 RABC high": { desc:"π0.5 · all data · 200k steps · RABC κ=0.0215", note:"SARM with κ = mean + ½ std — more selective filtering than 1.4." },
164
+ "1.7 Δ+RABC": { desc:"π0.5 · all data · 200k steps · ΔActions + RABC κ=0.0215 · QUANTILES", note:"Best of Series 1. Base checkpoint for 2.5." },
165
+ "2.1 HQ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3", note:"Fine-tunes 1.3 on curated high-quality data only." },
166
+ "2.2 HQ+RABC+Δ": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 + RABC κ=0.0265 + ΔActions", note:"Adds RABC on high-quality fine-tune from 1.3." },
167
+ "2.3 HQ+mirror": { desc:"π0.5 · HQ + mirrored · 100k steps · fine-tune from 1.3 + ΔActions + mirroring", note:"Augments the high-quality dataset with mirrored trajectories." },
168
+ "2.4 HQ chunk45": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.3 · chunk=45", note:"Explores chunked action prediction (chunk=50, RTC size=50, execution horizon=35)." },
169
+ "2.5 HQ+RABC+Δ★": { desc:"π0.5 · HQ data · 100k steps · fine-tune from 1.7 + RABC κ=0.0265 + ΔActions (best)", note:"Top performer. Best overall result." },
170
  };
171
 
172
 
173
  }
174
 
175
  if (typeof d3 !== "undefined") {