amirali1985 committed on
Commit
1daccad
·
verified ·
1 Parent(s): 66e5f77

Upload folder using huggingface_hub

Files changed (2)
  1. app.py +43 -2
  2. static_figures/fig_token_vignettes.png +0 -0
app.py CHANGED
@@ -389,9 +389,50 @@ effectively give the model external memory that compensates for its limited hidd
389
  with gr.TabItem("Interpretability"):
390
  gr.Markdown("""## SoRL tokens externalize arithmetic circuits
391
 
392
  [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn
393
- carry/borrow circuits for multi-digit arithmetic, but these are hidden inside activations
394
- and require PCA, probing, or ablation at the activation level to discover.
395
 
396
  **SoRL makes these circuits directly observable as tokens.** We show that:
397
  1. More abstraction vocabulary → richer representations
 
389
  with gr.TabItem("Interpretability"):
390
  gr.Markdown("""## SoRL tokens externalize arithmetic circuits
391
 
392
+ ### Background: how multi-digit arithmetic works
393
+
394
+ Adding two 6-digit numbers like `345678 + 657893` requires tracking **carries**: when
395
+ a column sums to 10 or more, you carry 1 to the next column. A **carry cascade** happens
396
+ when carries chain through multiple consecutive columns (e.g., `999 + 1 = 1000`).
397
+
398
+ We evaluate on **C-splits**: problems grouped by how many consecutive columns produce carries,
399
+ with varied (non-zero) answer digits:
400
+
401
+ | Split | Meaning | Example | Why it's hard |
402
+ |-------|---------|---------|---------------|
403
+ | **C1** | 1 carry | `345678 + 100921 = 446599` | Single carry; easy |
404
+ | **C2** | 2 consecutive carries | `503847 + 297162 = 801009` | Carry propagates once |
405
+ | **C3** | 3 consecutive carries | `145232 + 957868 = 1103100` | Must track 3-step cascade |
406
+ | **C4** | 4 consecutive carries | `780149 + 819959 = 1600108` | Longer cascade chain |
407
+ | **C5** | 5 consecutive carries | `553777 + 847927 = 1401704` | Nearly full cascade |
408
+ | **C6** | 6 carries (max) | `503847 + 996167 = 1500014` | Every column cascades |
409
+
410
  [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn
411
+ these carry/borrow circuits internally, but they're hidden in activations, discoverable only
412
+ through PCA, probing, or ablation at the activation level.
413
+
414
+ ### Quirke's subtask definitions
415
+
416
+ At each digit position, the model must compute one of these operations
417
+ ([Quirke et al. §3.2-3.3](https://arxiv.org/abs/2402.02619)):
418
+
419
+ **Addition:**
420
+ | Subtask | Meaning | Quirke eq. |
421
+ |---------|---------|-----------|
422
+ | **SA** | Simple Add: `(d₁ + d₂) mod 10` | - |
423
+ | **SC** | Sum Carry: `d₁ + d₂ ≥ 10`, produces a local carry | - |
424
+ | **SS** | Sum-of-9: `d₁ + d₂ = 9`, carry state is **uncertain** | eq. 2: STn = U |
425
+ | **UC** | Use Carry: this digit's answer depends on carry from right | eq. 2: STn = 1 |
426
+ | **US** | Use Sum-9 cascade: carry propagates through a chain of sum-9 digits | eq. 4-6 |
427
+
428
+ **Subtraction** (same structure with borrows replacing carries):
429
+ | Subtask | Meaning | Quirke eq. |
430
+ |---------|---------|-----------|
431
+ | **MD** | Base Diff: `(d₁ - d₂) mod 10` | - |
432
+ | **MB** | Make Borrow: `d₁ < d₂`, produces a local borrow | eq. 7: MBn = 1 |
433
+ | **ME** | Equal digits: `d₁ = d₂`, borrow state is **uncertain** | eq. 7: MBn = U |
434
+ | **UB** | Use Borrow: answer depends on borrow from right | - |
435
+ | **UD** | Cascade borrow: borrow propagates through equal-digit chain | - |
436
 
437
  **SoRL makes these circuits directly observable as tokens.** We show that:
438
  1. More abstraction vocabulary → richer representations
static_figures/fig_token_vignettes.png ADDED
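
Reviewer note on the `app.py` hunk: the Markdown added in this commit defines carries, carry cascades, and the local subtasks in prose only. The sketch below is one way to check those definitions against concrete numbers. It is not code from this repository. The helper names (`digit_pairs`, `carry_flags`, `longest_carry_run`, `addition_subtasks`, `subtraction_subtasks`) are hypothetical, and `longest_carry_run` is only one plausible reading of "consecutive columns that produce carries"; the commit does not show how the C-splits are actually generated. Only the purely local subtasks (SA, SC, SS, MD, MB, ME) are covered, since their conditions are spelled out in the tables.

```python
from typing import Dict, List, Tuple


def digit_pairs(a: int, b: int, width: int = 6) -> List[Tuple[int, int]]:
    """Return (d1, d2) digit pairs, least significant column first."""
    pairs = []
    for _ in range(width):
        pairs.append((a % 10, b % 10))
        a //= 10
        b //= 10
    return pairs


def carry_flags(a: int, b: int, width: int = 6) -> List[bool]:
    """True at column i if column i sends a carry into column i + 1."""
    flags, carry = [], 0
    for d1, d2 in digit_pairs(a, b, width):
        carry = 1 if d1 + d2 + carry >= 10 else 0
        flags.append(carry == 1)
    return flags


def longest_carry_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive carrying columns (assumed metric)."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def addition_subtasks(d1: int, d2: int) -> Dict[str, object]:
    """Local addition subtasks for one column, following the table's conditions."""
    total = d1 + d2
    return {"SA": total % 10, "SC": total >= 10, "SS": total == 9}


def subtraction_subtasks(d1: int, d2: int) -> Dict[str, object]:
    """Local subtraction subtasks for one column, following the table's conditions."""
    return {"MD": (d1 - d2) % 10, "MB": d1 < d2, "ME": d1 == d2}


print(longest_carry_run(carry_flags(999, 1, width=3)))  # 3
print(longest_carry_run(carry_flags(345678, 100921)))   # 1
print(addition_subtasks(8, 1))     # {'SA': 9, 'SC': False, 'SS': True}
print(subtraction_subtasks(2, 7))  # {'MD': 5, 'MB': True, 'ME': False}
```

Under this reading, `999 + 1` carries through all three columns, and the C1 example `345678 + 100921` carries out of exactly one column. The carry-dependent subtasks (UC, US, UB, UD) are left out because they depend on the carry or borrow state arriving from the right, which the tables describe only informally.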