Commit · e64295d
1 Parent(s): 4a90eaf
main body: rebalance interp + performance
app.py
CHANGED
|
@@ -60,32 +60,32 @@ Tokens are also causally necessary: knocking out all tokens collapses
|
|
| 60 |
accuracy from 95.5\% to 0.1\%, confirming they carry the computation
|
| 61 |
rather than merely annotating it.
|
| 62 |
|
| 63 |
-
\paragraph{Named tokens enable targeted intervention.}
|
| 64 |
Because the routing codes are discrete and named, surgical model
|
| 65 |
-
edits are possible that have no analog in standard transformers
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
\
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
Table~\ref{tab:undersized-wins}).
|
| 78 |
|
| 79 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 80 |
fonttitle=\bfseries\small, title={Finding \#1},
|
| 81 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 82 |
\small
|
| 83 |
-
\sorl{} externalizes
|
| 84 |
-
|
| 85 |
-
without any supervision on those circuits
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
|
|
|
| 89 |
Full analysis in Appendix~\ref{app:arithmetic}.
|
| 90 |
\end{tcolorbox}
|
| 91 |
"""
|
|
|
|
| 60 |
accuracy from 95.5\% to 0.1\%, confirming they carry the computation
|
| 61 |
rather than merely annotating it.
|
| 62 |
|
| 63 |
+
\paragraph{Named tokens enable targeted intervention and better performance.}
|
| 64 |
Because the routing codes are discrete and named, surgical model
|
| 65 |
+
edits are possible that have no analog in standard transformers:
|
| 66 |
+
swapping a single token at one answer position fixes wrong predictions
|
| 67 |
+
at a 27--31\% rate on carry-heavy examples (cross-operation transplant:
|
| 68 |
+
93.5\% vs.\ 75.5\% random baseline).
|
| 69 |
+
Interpretability here is not merely post-hoc — it translates directly
|
| 70 |
+
into the ability to correct the model.
|
| 71 |
+
Correspondingly, \sorl{} outperforms \sft{} on 12 of 13 tested
|
| 72 |
+
(architecture, data-size) configurations, and on \emph{all 13} on
|
| 73 |
+
the hardest 6-deep carry cascades, with gains as large as $+50$\,pp
|
| 74 |
+
(Table~\ref{tab:undersized-wins}).
|
| 75 |
+
The margin grows with cascade depth, consistent with explicit carry/borrow
|
| 76 |
+
routing being the mechanism behind the gain.
|
|
|
|
| 77 |
|
| 78 |
\begin{tcolorbox}[colback=gray!6, colframe=gray!40,
|
| 79 |
fonttitle=\bfseries\small, title={Finding \#1},
|
| 80 |
left=5pt, right=5pt, top=4pt, bottom=4pt]
|
| 81 |
\small
|
| 82 |
+
\sorl{} externalizes carry/borrow routing as discrete, named abstraction
|
| 83 |
+
tokens — recovering the subtask taxonomy of \citet{quirke_2024_addsub_preprint}
|
| 84 |
+
without any supervision on those circuits — and translates that transparency
|
| 85 |
+
into measurable gains: 12/13 configurations overall, all 13 on C6,
|
| 86 |
+
up to $+50$\,pp on the hardest cascades.
|
| 87 |
+
The tokens are causally necessary (knockout $\to$ 0.1\%) and support
|
| 88 |
+
targeted single-position interventions (27--31\% fix rate).
|
| 89 |
Full analysis in Appendix~\ref{app:arithmetic}.
|
| 90 |
\end{tcolorbox}
|
| 91 |
"""
|
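Reviewer note: the "fix rate" quoted in the new paragraph (the share of initially wrong predictions that become correct after swapping one routing token) can be made concrete with a small sketch. Everything below is hypothetical and for illustration only: `fix_rate`, `toy_predict`, the `CARRY`/`NO_CARRY` token names, and the three examples are assumptions, not the paper's model, data, or evaluation code.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def fix_rate(
    examples: Sequence[Tuple[Tokens, str]],
    predict: Callable[[Tokens], str],
    position: int,
    replacement: str,
) -> float:
    """Fraction of initially wrong predictions fixed by a single-token swap."""
    # Only examples the model gets wrong before the intervention are counted.
    wrong = [(toks, gold) for toks, gold in examples if predict(toks) != gold]
    if not wrong:
        return 0.0
    fixed = 0
    for toks, gold in wrong:
        patched = list(toks)              # copy so the original stays untouched
        patched[position] = replacement   # the single-position intervention
        if predict(patched) == gold:
            fixed += 1
    return fixed / len(wrong)


def toy_predict(tokens: Tokens) -> str:
    """Hypothetical stand-in model: adds two digits plus a routed carry-in."""
    a, b, route = tokens
    return str(int(a) + int(b) + (1 if route == "CARRY" else 0))


# Hypothetical examples: (tokens, gold answer).
examples = [
    (["4", "5", "NO_CARRY"], "10"),  # wrong before the swap, fixed by CARRY
    (["2", "3", "NO_CARRY"], "5"),   # already correct, so not counted
    (["6", "2", "CARRY"], "8"),      # wrong before and after the swap
]

rate = fix_rate(examples, toy_predict, position=2, replacement="CARRY")
```

In this toy setup, one of the two initially wrong examples is repaired by the swap. A random-token baseline like the one in the paragraph could be estimated the same way by drawing `replacement` at random instead of fixing it.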