amirali1985 committed
Commit e64295d · 1 Parent(s): 4a90eaf

main body: rebalance interp + performance

Files changed (1)
  1. app.py +20 -20
app.py CHANGED
@@ -60,32 +60,32 @@ Tokens are also causally necessary: knocking out all tokens collapses
  accuracy from 95.5\% to 0.1\%, confirming they carry the computation
  rather than merely annotating it.
 
- \paragraph{Named tokens enable targeted intervention.}
  Because the routing codes are discrete and named, surgical model
- edits are possible that have no analog in standard transformers.
- Swapping a single token at one answer position while leaving all
- others intact — fixes wrong predictions at a 27--31\% rate on
- carry-heavy examples (cross-operation transplant: 93.5\% vs.\ 75.5\%
- random baseline).
- An automated interpretation procedure (\`{a} la
- \citealt{bills2023language}) applied to the 10 highest-confidence
- examples per token produces human-readable role descriptions that
- match the Quirke labels (Appendix~\ref{app:autointerp}), providing
- an independent sanity check that the tokens mean what they appear to mean.
- \sorl{} also outperforms \sft{} on 12 of 13 tested configurations,
- with the largest gains on the hardest cascades ($+50$\,pp on C6;
- Table~\ref{tab:undersized-wins}).
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#1},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
  \small
- \sorl{} externalizes the carry/borrow routing circuits identified by
- \citet{quirke_2024_addsub_preprint} as discrete, named abstraction tokens
- without any supervision on those circuits.
- The tokens are causally necessary (knockout $\to$ 0.1\%), support
- targeted single-position interventions (27--31\% fix rate), and
- receive human-readable interpretations from an automated procedure.
  Full analysis in Appendix~\ref{app:arithmetic}.
  \end{tcolorbox}
  """
 
  accuracy from 95.5\% to 0.1\%, confirming they carry the computation
  rather than merely annotating it.
 
+ \paragraph{Named tokens enable targeted intervention and better performance.}
  Because the routing codes are discrete and named, surgical model
+ edits are possible that have no analog in standard transformers:
+ swapping a single token at one answer position fixes wrong predictions
+ at a 27--31\% rate on carry-heavy examples (cross-operation transplant:
+ 93.5\% vs.\ 75.5\% random baseline).
+ Interpretability here is not merely post hoc: it translates directly
+ into the ability to correct the model.
+ Correspondingly, \sorl{} outperforms \sft{} on 12 of 13 tested
+ (architecture, data-size) configurations overall, and on \emph{all 13}
+ when evaluated on the hardest 6-deep carry cascades (C6), with gains
+ as large as $+50$\,pp (Table~\ref{tab:undersized-wins}).
+ The margin grows with cascade depth, consistent with explicit carry/borrow
+ routing being the mechanism behind the gain.
 
  \begin{tcolorbox}[colback=gray!6, colframe=gray!40,
  fonttitle=\bfseries\small, title={Finding \#1},
  left=5pt, right=5pt, top=4pt, bottom=4pt]
  \small
+ \sorl{} externalizes carry/borrow routing as discrete, named abstraction
+ tokens that recover the subtask taxonomy of \citet{quirke_2024_addsub_preprint}
+ without any supervision on those circuits, and translates that transparency
+ into measurable gains over \sft{}: 12/13 configurations overall, all 13 on C6,
+ up to $+50$\,pp on the hardest cascades.
+ The tokens are causally necessary (knockout: 95.5\% $\to$ 0.1\% accuracy) and support
+ targeted single-position interventions (27--31\% fix rate).
  Full analysis in Appendix~\ref{app:arithmetic}.
  \end{tcolorbox}
  """
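The changed text refers to "6-deep carry cascades" (C6) without defining cascade depth in this hunk. As a rough illustration only, the sketch below assumes that depth means the longest run of consecutive digit positions that each emit a carry during addition. That definition and the name `carry_cascade_depth` are assumptions made for this example; they are not taken from app.py or the paper.

```python
def carry_cascade_depth(a: int, b: int) -> int:
    """Longest run of consecutive digit positions that each emit a carry in a + b.

    Assumed definition, for illustration; the paper may define C6 differently.
    Expects non-negative integers.
    """
    carry = 0
    run = 0       # length of the current unbroken chain of carries
    longest = 0   # longest chain seen so far
    while a > 0 or b > 0:
        digit_sum = a % 10 + b % 10 + carry
        carry = digit_sum // 10           # 1 if this position carries out, else 0
        run = run + 1 if carry else 0     # a non-carrying position breaks the chain
        longest = max(longest, run)
        a //= 10
        b //= 10
    return longest


if __name__ == "__main__":
    # Each of the six 9s passes the carry along.
    print(carry_cascade_depth(999999, 1))  # 6
```

Under this assumed definition, 999999 + 1 would count as a 6-deep cascade, while 123 + 456 has depth 0 because no position carries.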