amirali1985 committed
Commit 0dc2d60 · verified · 1 Parent(s): 6c7e2bf

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -36,3 +36,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  static_figures/fig1_token_difficulty_profiles.png filter=lfs diff=lfs merge=lfs -text
  static_figures/fig1_token_specialization.png filter=lfs diff=lfs merge=lfs -text
  static_figures/fig2_causal_by_depth.png filter=lfs diff=lfs merge=lfs -text
+ static_figures/fig_k1_causal.png filter=lfs diff=lfs merge=lfs -text
+ static_figures/fig_k1_token_difficulty.png filter=lfs diff=lfs merge=lfs -text
app.py CHANGED
@@ -358,59 +358,40 @@ effectively give the model external memory that compensates for its limited hidd
  gr.Markdown("""## Do abstraction tokens encode carry/borrow circuits?
 
  When humans add multi-digit numbers, they track **carries** — if 7+8=15, they write 5 and carry 1.
+ For a chain like 999999+1, the carry cascades through all 6 digits. Subtraction has the same
+ structure with **borrows** instead of carries.
+
  [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed transformers learn carry/borrow
- circuits internally, discoverable only through activation-level analysis. They define a **tri-state
- carry classifier** (eq. 2): each position is **0** (no carry), **1** (definite carry), or **U**
- (uncertain — digit sum = 9, carry depends on cascade from right).
+ circuits internally, but these are only discoverable through activation-level analysis (PCA, probing).
 
- **SoRL makes these circuits visible as explicit tokens.** We analyze our K=4 abs30 model (which
- has fewer abstraction positions, forcing sharper specialization).
+ **SoRL makes these circuits visible as explicit tokens.** With K=1 (an abstraction at every
+ position), each answer digit gets its own scratchpad token. We analyze whether these tokens
+ specialize by problem difficulty.
  """)
 
  gr.Markdown("""### 1. Token specialization by difficulty
 
  For each token, we ask: **what kinds of problems does this token appear in?** The heatmap shows
- P(difficulty level | token) — if a token is uniformly distributed across S0-S6, it carries no
- difficulty-specific information. If it concentrates on specific levels, it's a specialist.
-
- **Addition (left):** Token t3 (simple addition, 0% carry) concentrates on S0 (no cascades). Tokens
- t8, t9 (100% carry) spread across S1-S5 — they're the carry workhorses. Token t2 peaks at S6
- (the hardest cascade) despite having only 5% local carry — it encodes cascade *propagation*, not
- local carry state.
-
- **Subtraction (right):** Token t16 appears 93% in M0 (no borrows) — a pure "easy case" marker.
- Tokens t5 and t11 shift toward M4/M5 (deep borrow cascades).
+ P(difficulty | token). Tokens at the top specialize in easy problems, tokens at the bottom
+ specialize in hard cascades.
  """)
- gr.Image("static_figures/fig1_token_difficulty_profiles.png")
+ gr.Image("static_figures/fig_k1_token_difficulty.png")
 
- gr.Markdown("""### 2. Causal verification: token identity matters for hard cascades
+ gr.Markdown("""### 2. Causal verification: token identity is critical
 
  Three interventions test whether tokens carry real information:
- - **Shuffle**: randomly permute token IDs within the sequence (keeps positions, scrambles identity)
- - **Random**: replace all tokens with random IDs from the vocabulary
- - **Knockout**: remove all abstraction tokens entirely (replace with placeholder)
+ - **Shuffle**: randomly permute token IDs (keeps positions, scrambles identity)
+ - **Random**: replace all tokens with random IDs
+ - **Knockout**: remove all abstraction tokens (0% accuracy — total dependence)
 
- For easy problems (S0-S2), shuffling barely matters. For **deep cascades (S4-S6)**,
- shuffling drops accuracy by 10-30 percentage points. The model needs the *correct*
- token identity to propagate carries through multiple digits.
+ **Shuffle drops accuracy by 56-66 percentage points on S5/S6** (5-6 carry cascades).
+ Even easy problems (S0) drop ~30pp — with K=1, every position has an abstraction,
+ so shuffling disrupts every digit's computation.
  """)
- gr.Image("static_figures/fig2_causal_by_depth.png")
-
- gr.Markdown("""### 3. Key token profiles
-
- | Token | Appears when... | Interpretation |
- |-------|----------------|----------------|
- | **t3** | Simple add (SA), no carry, answer digit = 0 | **"No overflow"** — leftmost digit is 0 |
- | **t6** | Use carry (UC), input digit sum mod 10 = 9 in 92% of cases | **"Sum-of-9"** — Quirke's tri-state **U** (uncertain carry) |
- | **t8, t9** | Use carry (UC), carry = 100% of cases | **"Definite carry"** — Quirke's tri-state **1** |
- | **t17** | Make borrow (MB) 51%, subtraction only | **"Borrow indicator"** — Quirke's MBn (eq. 7) |
- | **t16** | Carry = 82%, mixed addition/subtraction | **"Active carry/borrow state"** |
-
- The model independently discovered Quirke's tri-state carry classifier: separate tokens for
- "no carry" (t3), "uncertain / sum=9" (t6), and "definite carry" (t8/t9).
- This emerged purely from SoRL's info-gain objective — no supervision about carry logic.
+ gr.Image("static_figures/fig_k1_causal.png")
 
- *Model: abs30 K=4, 2L/3H/510d, 100K training examples. Analysis: 2400 eval examples.*
+ gr.Markdown("""
+ *Model: K=1 abs30, 2L/3H/510d, 100K training examples. Analysis: 4400 eval examples (200/split).*
  """)
 
  # ── Tab 3: About ──
static_figures/fig_k1_causal.png ADDED

Git LFS Details

  • SHA256: b23e9c96d2c7c42146b46c3421b6b1ff3c4c89ad1fb0cfb30f35ef9dabeda86f
  • Pointer size: 131 Bytes
  • Size of remote file: 108 kB
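
The three interventions named in the `app.py` text (shuffle, random, knockout) could be sketched as operations on the list of abstraction token IDs for one sequence. This is an illustration only: the function names, the `vocab_size` parameter, and the `placeholder` ID are assumptions, and the actual evaluation code is not part of this commit.

```python
import random


def shuffle_tokens(token_ids: list[int], rng: random.Random) -> list[int]:
    """Permute the IDs among their positions; the multiset of IDs is unchanged."""
    shuffled = list(token_ids)
    rng.shuffle(shuffled)
    return shuffled


def random_tokens(token_ids: list[int], vocab_size: int, rng: random.Random) -> list[int]:
    """Replace every ID with one drawn uniformly from an assumed vocabulary size."""
    return [rng.randrange(vocab_size) for _ in token_ids]


def knockout_tokens(token_ids: list[int], placeholder: int) -> list[int]:
    """Replace every ID with a single placeholder ID (value is an assumption)."""
    return [placeholder] * len(token_ids)


rng = random.Random(0)          # fixed seed so the example is reproducible
original = [3, 8, 9, 6, 2, 16]  # made-up abstraction token IDs
shuffled = shuffle_tokens(original, rng)
randomized = random_tokens(original, vocab_size=30, rng=rng)
knocked_out = knockout_tokens(original, placeholder=0)
```

Shuffle keeps which tokens are present and changes only where they sit, so an accuracy drop under shuffle points at position-specific token identity rather than at the mere presence of abstraction tokens.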
static_figures/fig_k1_token_difficulty.png ADDED

Git LFS Details

  • SHA256: 88f04b6615a5cb415e76040ce2b3b4e4f2143324302643f515e10590fcd5128c
  • Pointer size: 131 Bytes
  • Size of remote file: 256 kB
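
The heatmap quantity P(difficulty | token) described in the `app.py` text can be estimated by counting how often each token co-occurs with each difficulty level and normalizing per token. The sketch below assumes the analysis data is available as `(token, difficulty)` pairs; that input format and the name `difficulty_given_token` are hypothetical, not taken from the repository.

```python
from collections import Counter, defaultdict


def difficulty_given_token(pairs):
    """Estimate P(difficulty | token) from (token, difficulty) observations.

    Returns {token: {difficulty: probability}}; each inner dict sums to 1.
    """
    counts = defaultdict(Counter)
    for token, difficulty in pairs:
        counts[token][difficulty] += 1
    table = {}
    for token, level_counts in counts.items():
        total = sum(level_counts.values())
        table[token] = {level: n / total for level, n in level_counts.items()}
    return table


# Made-up observations for illustration, not data from the model.
observations = [("t3", "S0"), ("t3", "S0"), ("t3", "S1"), ("t8", "S5")]
profile = difficulty_given_token(observations)
```

A token whose row is close to uniform carries little difficulty-specific information, while a row concentrated on a few levels marks a specialist.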