Commit · f8026c4
1
Parent(s): c15e8c4
finding boxes: trim to single sentences
app.py
CHANGED
|
@@ -78,11 +78,8 @@ routing being the mechanism behind the gain.
|
|
| 78 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 79 |
fonttitle=\bfseries\small, title={Finding \#1},
|
| 80 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 81 |
-
\small
|
| 82 |
-
|
| 83 |
-
externalizing carry/borrow routing as named tokens that recover Quirke's
|
| 84 |
-
subtask taxonomy without supervision and support targeted single-position
|
| 85 |
-
interventions (27--31\% fix rate). Full analysis in Appendix~\ref{app:arithmetic}.
|
| 86 |
\end{tcolorbox}
|
| 87 |
"""
|
| 88 |
|
|
@@ -418,10 +415,8 @@ Three patterns are notable:
|
|
| 418 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 419 |
fonttitle=\bfseries\small, title={Finding \#2},
|
| 420 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 421 |
-
\small
|
| 422 |
-
|
| 423 |
-
Shuffle hurts more than random on cascades --- wrong-position tokens cause
|
| 424 |
-
systematic carry errors; random tokens cause broader incoherence.
|
| 425 |
\end{tcolorbox}
|
| 426 |
|
| 427 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
@@ -472,10 +467,7 @@ token \texttt{t23} is the subtraction mirror (UD, 88\%, position $d_3$).
|
|
| 472 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 473 |
fonttitle=\bfseries\small, title={Finding \#3},
|
| 474 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 475 |
-
\small
|
| 476 |
-
23 of 30 codebook tokens are active; each concentrates on 1--2 Quirke
|
| 477 |
-
subtasks (${\geq}70\%$ purity for most) and is locked to one or two
|
| 478 |
-
answer positions.
|
| 479 |
\end{tcolorbox}
|
| 480 |
|
| 481 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
@@ -506,9 +498,7 @@ single-position swap cannot resolve.
|
|
| 506 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 507 |
fonttitle=\bfseries\small, title={Finding \#4},
|
| 508 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 509 |
-
\small
|
| 510 |
-
Replacing a single abstraction token fixes 27--31\% of mispredicted
|
| 511 |
-
carry-heavy examples --- no weight updates, no activation access required.
|
| 512 |
\end{tcolorbox}
|
| 513 |
|
| 514 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
@@ -552,11 +542,8 @@ post-hoc analysis.
|
|
| 552 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 553 |
fonttitle=\bfseries\small, title={Finding \#5},
|
| 554 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 555 |
-
\small
|
| 556 |
-
\
|
| 557 |
-
without supervision: the three regimes map onto disjoint token clusters.
|
| 558 |
-
What \citet{quirke_2024_addsub_preprint} needed PCA to reveal, \sorl{}
|
| 559 |
-
externalizes as a readable routing token.
|
| 560 |
\end{tcolorbox}
|
| 561 |
|
| 562 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
@@ -607,11 +594,8 @@ variable. The specialist tokens concentrate at mid-sequence positions
|
|
| 607 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 608 |
fonttitle=\bfseries\small, title={Finding \#6},
|
| 609 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 610 |
-
\small
|
| 611 |
-
|
| 612 |
-
polysemantic fallbacks (\texttt{t1}: 24\% purity, five positions);
|
| 613 |
-
polysemanticity concentrates at overflow positions where carry state is
|
| 614 |
-
most variable.
|
| 615 |
\end{tcolorbox}
|
| 616 |
|
| 617 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
@@ -652,10 +636,8 @@ The full procedure is in \texttt{experiments/11\_auto\_interp/run.py}.
|
|
| 652 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 653 |
fonttitle=\bfseries\small, title={Finding \#7},
|
| 654 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 655 |
-
\small
|
| 656 |
-
|
| 657 |
-
8 high-confidence tokens (${\geq}0.88$ softmax) receive crisp role
|
| 658 |
-
descriptions; polysemantic tokens (${\leq}0.50$) get appropriately vague ones.
|
| 659 |
\end{tcolorbox}
|
| 660 |
|
| 661 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 78 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 79 |
fonttitle=\bfseries\small, title={Finding \#1},
|
| 80 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 81 |
+
\small \sorl{} wins on 12/13 configurations overall and all 13 on C6 ($+50$\,pp),
|
| 82 |
+
externalizing carry routing as named tokens that recover Quirke's taxonomy without supervision.
|
| 83 |
\end{tcolorbox}
|
| 84 |
"""
|
| 85 |
|
|
|
|
| 415 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 416 |
fonttitle=\bfseries\small, title={Finding \#2},
|
| 417 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 418 |
+
\small Tokens are causally necessary (knockout $\to$ 0.1\%);
|
| 419 |
+
shuffle hurts more than random because wrong-position tokens cause systematic carry errors.
|
| 420 |
\end{tcolorbox}
|
| 421 |
|
| 422 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 467 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 468 |
fonttitle=\bfseries\small, title={Finding \#3},
|
| 469 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 470 |
+
\small 23/30 tokens active; each locks to 1--2 Quirke subtasks (${\geq}70\%$ purity) and 1--2 answer positions.
|
| 471 |
\end{tcolorbox}
|
| 472 |
|
| 473 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 498 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 499 |
fonttitle=\bfseries\small, title={Finding \#4},
|
| 500 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 501 |
+
\small Swapping one token fixes 27--31\% of mispredicted carry-heavy examples --- no weight updates needed.
|
| 502 |
\end{tcolorbox}
|
| 503 |
|
| 504 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 542 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 543 |
fonttitle=\bfseries\small, title={Finding \#5},
|
| 544 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 545 |
+
\small \sorl{} rediscovers the carry-state tri-classifier ($\{0,U,1\}$) unsupervised;
|
| 546 |
+
what \citet{quirke_2024_addsub_preprint} needed PCA to reveal, \sorl{} externalizes as a routing token.
|
| 547 |
\end{tcolorbox}
|
| 548 |
|
| 549 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 594 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 595 |
fonttitle=\bfseries\small, title={Finding \#6},
|
| 596 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 597 |
+
\small Specialist tokens (\texttt{t21}: 94\% purity) coexist with polysemantic fallbacks (\texttt{t1}: 24\%);
|
| 598 |
+
polysemanticity concentrates at the most variable overflow positions.
|
| 599 |
\end{tcolorbox}
|
| 600 |
|
| 601 |
% ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 636 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 637 |
fonttitle=\bfseries\small, title={Finding \#7},
|
| 638 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 639 |
+
\small Automated interpretation matches Quirke labels without accessing them:
|
| 640 |
+
high-confidence specialists get crisp descriptions; polysemantic tokens get vague ones.
|
| 641 |
\end{tcolorbox}
|
| 642 |
|
| 643 |
% ─────────────────────────────────────────────────────────────────────────────
|