amirali1985 committed on
Commit f8026c4 · 1 Parent(s): c15e8c4

finding boxes: trim to single sentences

Files changed (1)
  1. app.py +12 -30
app.py CHANGED
@@ -78,11 +78,8 @@ routing being the mechanism behind the gain.
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#1},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- \sorl{} outperforms \sft{} on 12/13 configurations and all 13 on C6 ($+50$\,pp),
- externalizing carry/borrow routing as named tokens that recover Quirke's
- subtask taxonomy without supervision and support targeted single-position
- interventions (27--31\% fix rate). Full analysis in Appendix~\ref{app:arithmetic}.
  \end{tcolorbox}
  """
 
@@ -418,10 +415,8 @@ Three patterns are notable:
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#2},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- Tokens are causally necessary: knockout $\to$ 0.1\% accuracy.
- Shuffle hurts more than random on cascades --- wrong-position tokens cause
- systematic carry errors; random tokens cause broader incoherence.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
@@ -472,10 +467,7 @@ token \texttt{t23} is the subtraction mirror (UD, 88\%, position $d_3$).
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#3},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- 23 of 30 codebook tokens are active; each concentrates on 1--2 Quirke
- subtasks (${\geq}70\%$ purity for most) and is locked to one or two
- answer positions.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
@@ -506,9 +498,7 @@ single-position swap cannot resolve.
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#4},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- Replacing a single abstraction token fixes 27--31\% of mispredicted
- carry-heavy examples --- no weight updates, no activation access required.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
@@ -552,11 +542,8 @@ post-hoc analysis.
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#5},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- \sorl{} rediscovers Quirke's carry-state tri-classifier ($\{0, U, 1\}$)
- without supervision: the three regimes map onto disjoint token clusters.
- What \citet{quirke_2024_addsub_preprint} needed PCA to reveal, \sorl{}
- externalizes as a readable routing token.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
@@ -607,11 +594,8 @@ variable. The specialist tokens concentrate at mid-sequence positions
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#6},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- Specialist tokens (\texttt{t21}: 94\% US, single position) coexist with
- polysemantic fallbacks (\texttt{t1}: 24\% purity, five positions);
- polysemanticity concentrates at overflow positions where carry state is
- most variable.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
@@ -652,10 +636,8 @@ The full procedure is in \texttt{experiments/11\_auto\_interp/run.py}.
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#7},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
- \small
- Automated interpretation matches Quirke labels without accessing them:
- 8 high-confidence tokens (${\geq}0.88$ softmax) receive crisp role
- descriptions; polysemantic tokens (${\leq}0.50$) get appropriately vague ones.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#1},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small \sorl{} wins on 12/13 configurations overall and all 13 on C6 ($+50$\,pp),
+ externalizing carry routing as named tokens that recover Quirke's taxonomy without supervision.
  \end{tcolorbox}
  """
 
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#2},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small Tokens are causally necessary (knockout $\to$ 0.1\%);
+ shuffle hurts more than random because wrong-position tokens cause systematic carry errors.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#3},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small 23/30 tokens active; each locks to 1--2 Quirke subtasks (${\geq}70\%$ purity) and 1--2 answer positions.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#4},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small Swapping one token fixes 27--31\% of mispredicted carry-heavy examples --- no weight updates needed.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#5},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small \sorl{} rediscovers the carry-state tri-classifier ($\{0,U,1\}$) unsupervised;
+ what \citet{quirke_2024_addsub_preprint} needed PCA to reveal, \sorl{} externalizes as a routing token.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#6},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small Specialist tokens (\texttt{t21}: 94\% purity) coexist with polysemantic fallbacks (\texttt{t1}: 24\%);
+ polysemanticity concentrates at the most variable overflow positions.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#7},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
+ \small Automated interpretation matches Quirke labels without accessing them:
+ high-confidence specialists get crisp descriptions; polysemantic tokens get vague ones.
  \end{tcolorbox}

  % ─────────────────────────────────────────────────────────────────────────────
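
Several of the finding boxes quote a per-token "purity" figure (for example 94\% for t21 and 24\% for t1) without the diff showing how it is computed. The sketch below is one plausible reading, not the repository's code: it assumes purity means the share of a token's occurrences that fall under its single most frequent subtask label. The function name `token_purity`, the input format, and the toy data are all hypothetical.

```python
from collections import Counter


def token_purity(assignments):
    """Map each token to (most frequent label, purity).

    `assignments` is an iterable of (token, subtask_label) pairs.
    Purity is ASSUMED here to be: count of the token's most common
    label divided by the token's total number of occurrences.
    """
    counts_by_token = {}
    for token, label in assignments:
        counts_by_token.setdefault(token, Counter())[label] += 1

    result = {}
    for token, counts in counts_by_token.items():
        top_label, top_count = counts.most_common(1)[0]
        result[token] = (top_label, top_count / sum(counts.values()))
    return result


# Toy data for illustration only; these are not the reported measurements.
toy = (
    [("t21", "US")] * 9
    + [("t21", "UD")]
    + [("t1", "US"), ("t1", "UD"), ("t1", "other_a"), ("t1", "other_b")]
)
purity = token_purity(toy)
```

Under this definition a specialist token scores close to 1.0, while a fallback token spread evenly over k labels scores about 1/k, which is the contrast the specialist versus polysemantic wording describes.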