amirali1985 committed on
Commit eaf9f36 · 1 Parent(s): ba15dfe

replace \paragraph with \textbf
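
A minimal sketch of how the same substitution could be applied programmatically to a LaTeX string. It is not taken from this commit: the helper name `paragraph_to_textbf` and the sample string are hypothetical, and it assumes every heading uses the plain `\paragraph{...}` form (no starred or optional-argument variants).

```python
def paragraph_to_textbf(latex: str) -> tuple[str, int]:
    # Hypothetical helper (not in app.py): swap the \paragraph{ opener for \textbf{.
    # The heading text and its closing brace are left untouched.
    old, new = "\\paragraph{", "\\textbf{"
    return latex.replace(old, new), latex.count(old)


# Illustrative input modelled on one of the headings changed in this commit.
sample = "\\paragraph{Setup.}\nAll interpretability analyses use model"
converted, count = paragraph_to_textbf(sample)
print(count)
print(converted)
```

Plain `str.replace` is used here rather than `re.sub` because a replacement template such as `r"\textbf{"` passed to `re.sub` would have its `\t` interpreted as a tab character.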

  1. app.py +5 -5
app.py CHANGED
@@ -47,7 +47,7 @@ trivial positions (SC/SA) — the carry structure is readable off the token
  sequence with no probing or patching required.
  Full training and architecture details are in Appendix~\ref{app:training}.
 
- \paragraph{Abstraction tokens recover known circuits without supervision.}
+ \textbf{Abstraction tokens recover known circuits without supervision.}
  Analysis of \texttt{2L/1H/128d} shows that \sorl{}'s codebook
  spontaneously partitions into subtask-specialist tokens:
  each of the 23 active tokens concentrates on a narrow slice of the
@@ -60,7 +60,7 @@ Tokens are also causally necessary: knocking out all tokens collapses
  accuracy from 95.5\% to 0.1\%, confirming they carry the computation
  rather than merely annotating it.
 
- \paragraph{Named tokens enable targeted intervention and better performance.}
+ \textbf{Named tokens enable targeted intervention and better performance.}
  Because the routing codes are discrete and named, surgical model
  edits are possible that have no analog in standard transformers:
  swapping a single token at one answer position fixes wrong predictions
@@ -324,7 +324,7 @@ LATEX_APPENDIX = r"""% ═══════════════════
  \label{tab:quirke-subtasks}
  \end{table}
 
- \paragraph{Setup.}
+ \textbf{Setup.}
  All interpretability analyses use model
  \texttt{add\_sub\_sorl\_v1\_abs30\_K1\_100K\_2L1H128d}
  (\texttt{2L/1H/128d}, 2 layers, 1 head, hidden size 128; trained on 100K examples),
@@ -384,7 +384,7 @@ Table~\ref{tab:ablation-splits} shows per-split accuracy under each condition.
  \label{tab:ablation-splits}
  \end{table}
 
- \paragraph{Commentary.}
+ \textbf{Commentary.}
  Knockout reduces accuracy to $\leq$2\% on every split, confirming that
  the model has offloaded computation into the routing tokens.
  Three patterns are notable:
@@ -483,7 +483,7 @@ every other token in the codebook (29 candidates $\times$ 5 positions = 145
  interventions per example) and measure how many wrong predictions become
  correct — and how many previously-correct predictions break.
 
- \paragraph{Results.}
+ \textbf{Results.}
  At positions $d_0$--$d_2$ (the carry-heavy positions), a fixing swap exists
  for 27--31\% of mispredicted examples.
  The best single swap is replacing \texttt{t16} with \texttt{t25} at $d_1$: