amirali1985 commited on
Commit
b8f9488
Β·
verified Β·
1 Parent(s): 521c255

Upload folder using huggingface_hub

Browse files
Files changed (1) hide show
  1. app.py +49 -30
app.py CHANGED
@@ -531,46 +531,65 @@ t2 = carry cascade. t7 = borrow cascade (subtraction only). t3 = no cascade need
531
  The model has learned a **vocabulary for arithmetic reasoning** that maps directly to
532
  Quirke's circuit definitions β€” without any supervision about carry logic.
533
 
534
- ### 4. Surgical token transplant: fixing errors by swapping one token
535
 
536
- The strongest causal evidence: we find problems where the model gets the answer **wrong**,
537
- then transplant a single abstraction token from a **correct** example (same subtask, same
538
- position), and the error is reduced.
539
 
540
- Using the 1L/3H/510d model at 100K (C3-C6 accuracy ~65%, mix of correct and wrong):
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
541
 
542
  ```
543
- Example 1: 014560 + 125450 = 0140010
544
- Wrong: 0139010 β€” t9 at d2 (Use Carry position)
545
- Fix: transplant t3 from a correct UC example
546
- Result: 0130010 β€” d2 fixed (9β†’0), 2β†’1 errors βœ“
547
-
548
- Example 2: 109221 + 326780 = 0436001
549
- Wrong: 0435901 β€” t21 at d3 (Use Carry position)
550
- Fix: transplant t18 from a correct UC example
551
- Result: 0435001 β€” d3 fixed (9β†’0), 2β†’1 errors βœ“
552
-
553
- Example 3: 332200 + 868010 = 1200210
554
- Wrong: 1190210 β€” t9 at d1 (Use Carry position)
555
- Fix: transplant t3 from a correct UC example
556
- Result: 1100210 β€” d1 fixed (9β†’0), 2β†’1 errors βœ“
557
  ```
558
 
559
- **What went wrong in each case?** The model confused the carry state:
 
 
 
560
 
561
- - **t21** encodes "sum-of-9, carry uncertain" (Quirke's U state β€” 93% US, sum=9 in 95% of cases)
562
- - **t9** encodes "maybe carry, maybe not" (mixed β€” 56% sum=9, only 22% carry)
563
- - **t18** encodes "definite carry, not from a sum-9" (Quirke's 1 state β€” 46% carry, only 2% sum=9)
564
 
565
- The model assigned t21 or t9 ("carry is uncertain") at positions where the carry was actually
566
- **resolved** β€” it needed t18 or t3 ("carry is definite"). Transplanting the correct carry-state
567
- token from a problem where the model got it right fixes that digit.
 
568
 
569
- The fixes are partial (2β†’1 errors, not 2β†’0) because fixing one carry doesn't always
570
- fix the downstream cascade. But they prove that **token identity causally determines
571
- carry computation** β€” wrong token = wrong carry state = wrong answer digit.
 
572
 
573
- *Model: abs30 K=1, 1L/3H/510d, 100K training examples.*
 
 
574
  """)
575
 
576
  gr.Markdown("""### 3. Tokens spread across digit positions
 
531
  The model has learned a **vocabulary for arithmetic reasoning** that maps directly to
532
  Quirke's circuit definitions β€” without any supervision about carry logic.
533
 
534
+ ### 4. Surgical intervention: the right token fixes hard-case failures
535
 
536
+ *All experiments below use the 1L/3H/510d model with K=1 abs30, trained on 100K examples.
537
+ This model gets C3-C6 accuracy ~70% β€” enough correct examples for comparison, enough errors to fix.*
 
538
 
539
+ **The two carry tokens: t9 vs t21**
540
+
541
+ The model learned two tokens that both appear at carry-related positions, but with different specializations:
542
+
543
+ | | **t9** (n=573) | **t21** (n=719) |
544
+ |---|---|---|
545
+ | Position | d2-d4 (spread) | d3 only (100%) |
546
+ | Sum = 9 | 56% | **95%** |
547
+ | Carry rate | 22% | 3% |
548
+ | Difficulty | **easy** (S0=23%, S2=28%) | **hard** (S5=37%, S6=37%) |
549
+ | Role | "shallow carry, maybe sum-9" | "deep cascade, definitely sum-9" |
550
+
551
+ t9 is a **shallow/ambiguous** token β€” it appears in easy problems where carry state doesn't matter much.
552
+ t21 is a **deep cascade specialist** β€” it appears specifically in 5-6 carry cascades where
553
+ every digit's sum is exactly 9 (Quirke's uncertain U state, eq. 2).
554
+
555
+ **The failure mode: using t9 at hard-cascade positions**
556
+
557
+ When the model encounters a hard cascade (C5/C6), it sometimes assigns t9 (the shallow token)
558
+ instead of t21 (the cascade specialist). This is like using the wrong circuit β€” the model
559
+ treats a deep cascade as a shallow one and gets the carry propagation wrong.
560
+
561
+ **The fix: globally replacing t9 with t21**
562
+
563
+ We test what happens when we force the model to always use t21 (the cascade specialist)
564
+ instead of t9:
565
 
566
  ```
567
+ Normal All t9β†’t21 Effect
568
+ C3-C6 (hard carries): 70% 93% +23pp ← hard cases dramatically improve
569
+ S5 (5 cascades): 27% 92% +65pp ← nearly fixes the hardest split
570
+ S0 (no carries): 99% 74% -25pp ← easy cases get worse
 
 
 
 
 
 
 
 
 
 
571
  ```
572
 
573
+ Forcing t21 everywhere is like telling the model "always assume deep cascade" β€” it fixes
574
+ hard problems (+65pp on S5!) but hurts easy ones where no cascade exists. The model needs
575
+ to *correctly choose* between t9 and t21 based on the input, and its main failure mode on
576
+ hard cases is choosing too conservatively (t9 instead of t21).
577
 
578
+ **Individual transplants confirm this:**
 
 
579
 
580
+ ```
581
+ 014560 + 125450 = 0140010
582
+ Wrong: 0139010 β€” t9 at d2 (shallow token at carry position)
583
+ Fixed: 0130010 β€” transplant correct token β†’ d2 fixed βœ“
584
 
585
+ 332200 + 868010 = 1200210
586
+ Wrong: 1190210 β€” t9 at d1 (shallow token at carry position)
587
+ Fixed: 1100210 β€” transplant correct token β†’ d1 fixed βœ“
588
+ ```
589
 
590
+ Each transplant fixes the specific digit where the wrong carry token was assigned.
591
+ This proves **token identity causally determines carry computation** β€” the wrong
592
+ token = wrong carry state = wrong answer digit.
593
  """)
594
 
595
  gr.Markdown("""### 3. Tokens spread across digit positions