Upload folder using huggingface_hub
- app.py +43 -2
- static_figures/fig_token_vignettes.png +0 -0
app.py
CHANGED
|
@@ -389,9 +389,50 @@ effectively give the model external memory that compensates for its limited hidd
|
|
| 389 |
with gr.TabItem("Interpretability"):
|
| 390 |
gr.Markdown("""## SoRL tokens externalize arithmetic circuits
|
| 391 |
| 392 |
[Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn
|
| 393 |
-
carry/borrow circuits
|
| 394 |
-
| 395 |
|
| 396 |
**SoRL makes these circuits directly observable as tokens.** We show that:
|
| 397 |
1. More abstraction vocabulary → richer representations
|
|
|
|
| 389 |
with gr.TabItem("Interpretability"):
|
| 390 |
gr.Markdown("""## SoRL tokens externalize arithmetic circuits
|
| 391 |
|
| 392 |
+
### Background: how multi-digit arithmetic works
|
| 393 |
+
|
| 394 |
+
Adding two 6-digit numbers like `345678 + 657893` requires tracking **carries**: when
|
| 395 |
+
a column sums to 10 or more, you carry 1 to the next column. A **carry cascade** happens
|
| 396 |
+
when carries chain through multiple consecutive columns (e.g., `999 + 1 = 1000`).
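
The column-by-column carry rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: `carry_flags` and `longest_carry_run` are hypothetical helper names, and the "longest run of consecutive carrying columns" computed here is a naive measure that is not necessarily the grouping rule the app uses for its evaluation splits.

```python
def carry_flags(a: int, b: int) -> list[bool]:
    # One flag per column, least-significant column first:
    # True when that column (including any incoming carry) sums to 10 or more.
    flags = []
    carry = 0
    while a > 0 or b > 0:
        total = a % 10 + b % 10 + carry
        carry = 1 if total >= 10 else 0
        flags.append(carry == 1)
        a //= 10
        b //= 10
    return flags


def longest_carry_run(a: int, b: int) -> int:
    # Length of the longest chain of consecutive carrying columns.
    best = run = 0
    for flag in carry_flags(a, b):
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


print(carry_flags(999, 1))        # [True, True, True]
print(longest_carry_run(999, 1))  # 3
```

For `999 + 1`, every column carries into the next, which is the cascade the paragraph describes.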
|
| 397 |
+
|
| 398 |
+
We evaluate on **C-splits** β problems grouped by how many consecutive columns produce carries,
|
| 399 |
+
with varied (non-zero) answer digits:
|
| 400 |
+
|
| 401 |
+
| Split | Meaning | Example | Why it's hard |
|
| 402 |
+
|-------|---------|---------|---------------|
|
| 403 |
+
| **C1** | 1 carry | `345678 + 100921 = 446599` | Single carry, easy |
|
| 404 |
+
| **C2** | 2 consecutive carries | `503847 + 297162 = 801009` | Carry propagates once |
|
| 405 |
+
| **C3** | 3 consecutive carries | `145232 + 957868 = 1103100` | Must track 3-step cascade |
|
| 406 |
+
| **C4** | 4 consecutive carries | `780149 + 819959 = 1600108` | Longer cascade chain |
|
| 407 |
+
| **C5** | 5 consecutive carries | `553777 + 847927 = 1401704` | Nearly full cascade |
|
| 408 |
+
| **C6** | 6 carries (max) | `503847 + 996167 = 1500014` | Every column cascades |
|
| 409 |
+
|
| 410 |
[Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn
|
| 411 |
+
these carry/borrow circuits internally, but they're hidden in activations, discoverable only
|
| 412 |
+
through PCA, probing, or ablation at the activation level.
|
| 413 |
+
|
| 414 |
+
### Quirke's subtask definitions
|
| 415 |
+
|
| 416 |
+
At each digit position, the model must compute one of these operations
|
| 417 |
+
([Quirke et al. §3.2-3.3](https://arxiv.org/abs/2402.02619)):
|
| 418 |
+
|
| 419 |
+
**Addition:**
|
| 420 |
+
| Subtask | Meaning | Quirke eq. |
|
| 421 |
+
|---------|---------|-----------|
|
| 422 |
+
| **SA** | Simple Add: `(d₁ + d₂) mod 10` | n/a |
|
| 423 |
+
| **SC** | Sum Carry: `d₁ + d₂ ≥ 10`, produces a local carry | n/a |
|
| 424 |
+
| **SS** | Sum-of-9: `d₁ + d₂ = 9`, carry state is **uncertain** | eq. 2: STn = U |
|
| 425 |
+
| **UC** | Use Carry: this digit's answer depends on carry from right | eq. 2: STn = 1 |
|
| 426 |
+
| **US** | Use Sum-9 cascade: carry propagates through a chain of sum-9 digits | eq. 4-6 |
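
As a rough illustration of the per-digit quantities in the addition table, the sketch below labels each digit position with the locally computable subtasks (SA, SC, SS) and tracks the incoming carry that UC and US depend on. The function name `addition_subtasks`, the dictionary layout, and the fixed `n_digits` width are assumptions made for this example; it is not the model's circuit or the paper's code.

```python
def addition_subtasks(a: int, b: int, n_digits: int = 6) -> list[dict]:
    # Walk the digit pairs from the rightmost position (pos 0) to the left.
    rows = []
    carry_in = 0
    for pos in range(n_digits):
        d1 = (a // 10**pos) % 10
        d2 = (b // 10**pos) % 10
        pair_sum = d1 + d2
        rows.append({
            "pos": pos,
            "SA": pair_sum % 10,      # Simple Add
            "SC": pair_sum >= 10,     # Sum Carry: local carry
            "SS": pair_sum == 9,      # Sum-of-9: outcome depends on carry from the right
            "carry_in": carry_in,     # what "Use Carry" would consume
            "answer_digit": (pair_sum + carry_in) % 10,
        })
        carry_in = 1 if pair_sum + carry_in >= 10 else 0
    return rows


for row in addition_subtasks(345678, 100921):
    print(row)
```

Reading the `answer_digit` values from the highest position down reproduces `446599` for this input, with SS set at the two rightmost positions and SC set at position 2.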
|
| 427 |
+
|
| 428 |
+
**Subtraction** (same structure with borrows replacing carries):
|
| 429 |
+
| Subtask | Meaning | Quirke eq. |
|
| 430 |
+
|---------|---------|-----------|
|
| 431 |
+
| **MD** | Base Diff: `(d₁ - d₂) mod 10` | n/a |
|
| 432 |
+
| **MB** | Make Borrow: `d₁ < d₂`, produces a local borrow | eq. 7: MBn = 1 |
|
| 433 |
+
| **ME** | Equal digits: `d₁ = d₂`, borrow state is **uncertain** | eq. 7: MBn = U |
|
| 434 |
+
| **UB** | Use Borrow: answer depends on borrow from right | n/a |
|
| 435 |
+
| **UD** | Cascade borrow: borrow propagates through equal-digit chain | n/a |
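
The subtraction subtasks can be sketched the same way. This example assumes `d₁` is the digit of the number being subtracted from, `d₂` is the digit of the number being subtracted, and `a >= b`; `subtraction_subtasks` and its output layout are hypothetical names chosen for illustration.

```python
def subtraction_subtasks(a: int, b: int, n_digits: int = 6) -> list[dict]:
    # Assumes a >= b. Digit pairs are visited from the rightmost position (pos 0).
    rows = []
    borrow_in = 0
    for pos in range(n_digits):
        d1 = (a // 10**pos) % 10
        d2 = (b // 10**pos) % 10
        rows.append({
            "pos": pos,
            "MD": (d1 - d2) % 10,     # Base Diff
            "MB": d1 < d2,            # Make Borrow: local borrow
            "ME": d1 == d2,           # Equal digits: outcome depends on borrow from the right
            "borrow_in": borrow_in,   # what "Use Borrow" would consume
            "answer_digit": (d1 - d2 - borrow_in) % 10,
        })
        borrow_in = 1 if d1 - d2 - borrow_in < 0 else 0
    return rows


for row in subtraction_subtasks(1000, 1, n_digits=4):
    print(row)
```

With `1000 - 1`, position 0 makes a borrow, and the two equal-digit positions that follow pass it along, which is the cascade case in the last table row.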
|
| 436 |
|
| 437 |
**SoRL makes these circuits directly observable as tokens.** We show that:
|
| 438 |
1. More abstraction vocabulary → richer representations
|
static_figures/fig_token_vignettes.png
ADDED
|