amirali1985 committed
Commit 2a58c25 · verified · 1 Parent(s): ac126fc

Upload folder using huggingface_hub

Files changed (1)
  1. app.py +11 -31
app.py CHANGED
@@ -324,16 +324,16 @@ on **hard carry/borrow cascades** — problems requiring multi-digit propagation
  | C4 (4 hot carries) | 88% | **100%** | **+12pp** |
  | C5 (5 hot carries) | 76% | **100%** | **+24pp** |
  | C6 (6 hot carries) | 92% | **100%** | **+8pp** |
- | sub\_M4 (4 borrows) | 8% | **100%** | **+92pp** |
+ | sub\_M4 (4 borrows) | 10% | **100%** | **+90pp** |
 
  **Undersized model (2L/1H/128d) at 100K — where both plateau below 100%:**
 
  | Split | Baseline | SoRL K=1 abs30 | Gap |
  |-------|----------|----------------|-----|
- | C3 (3 hot carries) | 44% | **93%** | **+49pp** |
+ | C3 (3 hot carries) | 28% | **98%** | **+70pp** |
  | C4 (4 hot carries) | 38% | **94%** | **+56pp** |
- | C5 (5 hot carries) | 33% | **85%** | **+52pp** |
- | C6 (6 hot carries) | 39% | **96%** | **+57pp** |
+ | C5 (5 hot carries) | 48% | **86%** | **+38pp** |
+ | C6 (6 hot carries) | 32% | **94%** | **+62pp** |
 
  Even when the model is too small to reach 100%, SoRL's abstraction tokens provide
  external scratch-pad memory that doubles or triples accuracy on hard cascades.
@@ -356,38 +356,18 @@ See the **Results** and **Interpretability** tabs for figures and analysis.
  interactive=False,
  )
 
+ with gr.Accordion("Data Efficiency & Undersized Models", open=False):
+ gr.Image("static_figures/fig_data_efficiency.png")
+ gr.Markdown("At 10K, SoRL K=1 abs30 reaches **96.7%** vs baseline **76.6%** (+20pp). By 50K both hit 100%.")
+ gr.Image("static_figures/fig_undersized.png")
+ gr.Markdown("Undersized 2L/1H/128d: baseline 50% → SoRL **85%** (+35pp). Abstraction tokens compensate for limited capacity.")
+
  with gr.Accordion("Per-Split Detail", open=False):
  model_selector = gr.Dropdown(label="Model", choices=[], allow_custom_value=True)
  detail_btn = gr.Button("Show splits")
  detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
 
- # ── Tab 2: Results ──
- with gr.TabItem("Results"):
- gr.Markdown("""## SoRL K=1 abs30: never loses to baseline
-
- Our best config — **K=1 (abstraction at every position), vocab size 30** — matches or beats
- the SFT baseline on every data size and every architecture tested. No exceptions.
- """)
- gr.Image("static_figures/fig_data_efficiency.png")
-
- gr.Markdown("""**At 10K training examples**, SoRL K=1 abs30 reaches **96.1%** while the baseline
- reaches only 72.4% — a **+24 percentage point** improvement. At 25K, SoRL hits 100% while the
- baseline is at 91.6%. By 50K both reach 100%.
-
- K=4 (abstraction every 4th position) fails at 10K data — it doesn't have enough examples to learn
- useful abstractions through search. K=1 is more data-efficient because every position gets a
- scratchpad token.
- """)
-
- gr.Markdown("### SoRL helps undersized models the most")
- gr.Image("static_figures/fig_undersized.png")
-
- gr.Markdown("""The biggest gains are on **capacity-limited architectures**. A 2L/1H/128d model
- goes from 50% (baseline) to **85%** (SoRL K=1 abs30) — a +35pp improvement. The abstraction tokens
- effectively give the model external memory that compensates for its limited hidden dimensions.
- """)
-
- # ── Tab 3: Interpretability ──
+ # ── Tab 2: Interpretability ──
  with gr.TabItem("Interpretability"):
  gr.Markdown("""## SoRL tokens externalize arithmetic circuits
 
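Because this commit rewrites several rows of the undersized-model table by hand, a quick recomputation of the Gap column can catch arithmetic slips. The sketch below is not part of `app.py`. The names `ROWS`, `gap_pp`, `ratio`, and `summarize` are hypothetical, and the percentages are copied from the updated (`+`) rows plus the unchanged C4 row. It also reports the SoRL-to-baseline ratio, which bears on the "doubles or triples accuracy" sentence.

```python
# Baseline and SoRL K=1 abs30 accuracies (percent), copied from the updated
# undersized-model table in this commit. Names here are illustrative only.
ROWS = {
    "C3": (28, 98),
    "C4": (38, 94),
    "C5": (48, 86),
    "C6": (32, 94),
}


def gap_pp(baseline, sorl):
    """Absolute gap in percentage points."""
    return sorl - baseline


def ratio(baseline, sorl):
    """SoRL accuracy as a multiple of the baseline accuracy."""
    return sorl / baseline


def summarize(rows):
    """Map each split to (gap in pp, ratio rounded to two decimals)."""
    return {
        split: (gap_pp(b, s), round(ratio(b, s), 2))
        for split, (b, s) in rows.items()
    }


if __name__ == "__main__":
    for split, (gap, r) in summarize(ROWS).items():
        print(f"{split}: +{gap}pp, {r:.2f}x baseline")
```

With these numbers the recomputed gaps match the table (+70, +56, +38, +62pp), and the ratios run from about 1.8x (C5) to 3.5x (C3), so "doubles or triples" holds as a rough summary rather than for every row.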