Update README.md
README.md
CHANGED
|
@@ -1,101 +1,103 @@
|
|
| 1 |
-
---
|
| 2 |
-
language: en
|
| 3 |
-
license: mit
|
| 4 |
-
tags:
|
| 5 |
-
- propagation-logic
|
| 6 |
-
- mechanism-first
|
| 7 |
-
- abstract-reasoning
|
| 8 |
-
- derivation-traces
|
| 9 |
-
- boundary-conditions
|
| 10 |
-
datasets:
|
| 11 |
-
- ApplePiesFromScratch/dta-benchmark
|
| 12 |
-
metrics:
|
| 13 |
-
- dta
|
| 14 |
-
---
|
| 15 |
-
|
| 16 |
-
# MechanismBase: P / G → Q
|
| 17 |
-
|
| 18 |
-
A 10M-parameter transformer trained on derivation traces, not natural language.
|
| 19 |
-
|
| 20 |
-
## What this is
|
| 21 |
-
|
| 22 |
-
Standard language models learn statistical patterns over text.
|
| 23 |
-
This model was trained on the **procedure** P / G → Q: explicit derivation
|
| 24 |
-
traces showing closure analysis, fixed point detection, cycle structure
|
| 25 |
-
identification, and forced boundary condition derivation.
|
| 26 |
-
|
| 27 |
-
**The claim:** given any carrier V and gradient family Ξ, the model can derive
|
| 28 |
-
forced boundary conditions: what logic system the carrier implies, what
|
| 29 |
-
fixed points exist, what cycle structure is forced.
|
| 30 |
-
|
| 31 |
-
## Theory
|
| 32 |
-
|
| 33 |
-
Propagation Logic v13, SSRN Abstract ID: 6439258 (James Pugmire)
|
| 34 |
-
|
| 35 |
-
The single primitive operator: `P / G → Q`
|
| 36 |
-
|
| 37 |
-
A loaded pattern P propagates through gradient field G in context C to
|
| 38 |
-
produce updated pattern Q. All of classical logic, fuzzy logic, arithmetic,
|
| 39 |
-
calculus, and grammar fall out of different (V, Ξ) choices.
|
| 40 |
-
|
| 41 |
-
## Model
|
| 42 |
-
|
| 43 |
-
- Architecture: Transformer decoder (custom, mechanism-aligned)
|
| 44 |
-
- Parameters: 10.5M
|
| 45 |
-
- Training tokens: ~
|
| 46 |
-
- Training epochs: 5
|
| 47 |
-
|
| 48 |
-
## Benchmark: DTA (Derivation Trace Accuracy)
|
| 49 |
-
|
| 50 |
-
The correct benchmark for this model is not BLiMP or MMLU.
|
| 51 |
-
It is DTA: how accurately does the model predict forced boundary conditions
|
| 52 |
-
on novel carriers?
|
| 53 |
-
|
| 54 |
-
See: `ApplePiesFromScratch/dta-benchmark`
|
| 55 |
-
|
| 1 |
+
---
|
| 2 |
+
language: en
|
| 3 |
+
license: mit
|
| 4 |
+
tags:
|
| 5 |
+
- propagation-logic
|
| 6 |
+
- mechanism-first
|
| 7 |
+
- abstract-reasoning
|
| 8 |
+
- derivation-traces
|
| 9 |
+
- boundary-conditions
|
| 10 |
+
datasets:
|
| 11 |
+
- ApplePiesFromScratch/dta-benchmark
|
| 12 |
+
metrics:
|
| 13 |
+
- dta
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
# MechanismBase: P / G → Q
|
| 17 |
+
|
| 18 |
+
A 10M-parameter transformer trained on derivation traces, not natural language.
|
| 19 |
+
|
| 20 |
+
## What this is
|
| 21 |
+
|
| 22 |
+
Standard language models learn statistical patterns over text.
|
| 23 |
+
This model was trained on the **procedure** P / G → Q: explicit derivation
|
| 24 |
+
traces showing closure analysis, fixed point detection, cycle structure
|
| 25 |
+
identification, and forced boundary condition derivation.
|
| 26 |
+
|
| 27 |
+
**The claim:** given any carrier V and gradient family Ξ, the model can derive
|
| 28 |
+
forced boundary conditions: what logic system the carrier implies, what
|
| 29 |
+
fixed points exist, what cycle structure is forced.
|
| 30 |
+
|
| 31 |
+
## Theory
|
| 32 |
+
|
| 33 |
+
Propagation Logic v13, SSRN Abstract ID: 6439258 (James Pugmire)
|
| 34 |
+
|
| 35 |
+
The single primitive operator: `P / G → Q`
|
| 36 |
+
|
| 37 |
+
A loaded pattern P propagates through gradient field G in context C to
|
| 38 |
+
produce updated pattern Q. All of classical logic, fuzzy logic, arithmetic,
|
| 39 |
+
calculus, and grammar fall out of different (V, Ξ) choices.
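The quantities named in this README (closure, fixed points, involution, cycle structure) can be illustrated on a toy case. The sketch below is only an interpretation, not the repository's engine: it assumes a carrier is a finite collection and a gradient is a plain function on it, and the name `analyze` and the mod-4 examples are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def analyze(
    carrier: Sequence[Hashable],
    gradient: Callable[[Hashable], Hashable],
) -> Dict[str, object]:
    """Closure, fixed points, involution and cycles of one map on a finite carrier.

    Assumption: a gradient is modelled as an ordinary function on the carrier.
    """
    members = set(carrier)
    image = {v: gradient(v) for v in carrier}

    # Closed: every element is mapped back into the carrier.
    closed = all(w in members for w in image.values())
    fixed_points = [v for v in carrier if image[v] == v]
    # Involution: applying the map twice returns every element (needs closure).
    involution = closed and all(image[image[v]] == v for v in carrier)

    cycles: List[List[Hashable]] = []
    if closed:
        seen = set()
        for start in carrier:
            path: List[Hashable] = []
            index: Dict[Hashable, int] = {}
            v = start
            # Follow the orbit until it repeats or joins an explored orbit.
            while v not in index and v not in seen:
                index[v] = len(path)
                path.append(v)
                v = image[v]
            if v in index:  # the walk closed on itself: a new cycle
                cycles.append(path[index[v]:])
            seen.update(path)

    return {
        "closed": closed,
        "fixed_points": fixed_points,
        "involution": involution,
        "cycles": cycles,
    }


if __name__ == "__main__":
    carrier = [0, 1, 2, 3]
    print(analyze(carrier, lambda x: (-x) % 4))     # negation mod 4
    print(analyze(carrier, lambda x: (x + 1) % 4))  # successor mod 4
    print(analyze(carrier, lambda x: x + 1))        # leaves the carrier
```

Negation mod 4 is closed and is an involution with two fixed points, successor mod 4 forms a single 4-cycle, and plain `x + 1` is not closed on this carrier, so no cycle structure is reported for it.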
|
| 40 |
+
|
| 41 |
+
## Model
|
| 42 |
+
|
| 43 |
+
- Architecture: Transformer decoder (custom, mechanism-aligned)
|
| 44 |
+
- Parameters: 10.5M
|
| 45 |
+
- Training tokens: ~1M (derivation traces)
|
| 46 |
+
- Training epochs: 5
|
| 47 |
+
|
| 48 |
+
## Benchmark: DTA (Derivation Trace Accuracy)
|
| 49 |
+
|
| 50 |
+
The correct benchmark for this model is not BLiMP or MMLU.
|
| 51 |
+
It is DTA: how accurately does the model predict forced boundary conditions
|
| 52 |
+
on novel carriers?
|
| 53 |
+
|
| 54 |
+
See: `ApplePiesFromScratch/dta-benchmark`
|
| 55 |
+
| Model | DTA-Overall | DTA-Closure | DTA-FixedPts | DTA-Involution | DTA-Cycle |
|
| 56 |
+
|-------|-------------|-------------|--------------|----------------|-----------|
|
| 57 |
+
| MechanismBase (10M) | 77.5% | 80.0% | 90.0% | 100.0% | 40.0% |
|
| 58 |
+
| GPT-3.5-turbo (175B) | 55.0% | 70.0% | 10.0% | 50.0% | 90.0% |
|
| 59 |
+
| GPT-4 (1.8T) | 87.5% | 100.0% | 70.0% | 90.0% | 90.0% |
|
| 60 |
+
| Random baseline | 25.0% | 50.0% | 25.0% | 50.0% | 25.0% |
|
| 61 |
+
| Engine (oracle) | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
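The README does not say how DTA-Overall is aggregated. The sketch below assumes it is the unweighted mean of the four category scores, and checks that assumption against the table. The dictionary layout and the function name `overall` are illustrative only. Under this assumption the reported overall is reproduced for every row except the random baseline, whose category scores average 37.5% rather than the listed 25.0%, so that row may be computed differently.

```python
from statistics import mean
from typing import Dict

# Category scores copied from the table: closure, fixed points, involution, cycle.
CATEGORY_SCORES: Dict[str, Dict[str, float]] = {
    "MechanismBase (10M)": {"closure": 80.0, "fixed_pts": 90.0, "involution": 100.0, "cycle": 40.0},
    "GPT-3.5-turbo": {"closure": 70.0, "fixed_pts": 10.0, "involution": 50.0, "cycle": 90.0},
    "GPT-4": {"closure": 100.0, "fixed_pts": 70.0, "involution": 90.0, "cycle": 90.0},
    "Random baseline": {"closure": 50.0, "fixed_pts": 25.0, "involution": 50.0, "cycle": 25.0},
    "Engine (oracle)": {"closure": 100.0, "fixed_pts": 100.0, "involution": 100.0, "cycle": 100.0},
}

# DTA-Overall as reported in the table.
REPORTED_OVERALL: Dict[str, float] = {
    "MechanismBase (10M)": 77.5,
    "GPT-3.5-turbo": 55.0,
    "GPT-4": 87.5,
    "Random baseline": 25.0,
    "Engine (oracle)": 100.0,
}


def overall(per_category: Dict[str, float]) -> float:
    """Unweighted mean over categories (an assumption, not the documented metric)."""
    return mean(list(per_category.values()))


if __name__ == "__main__":
    for name, scores in CATEGORY_SCORES.items():
        computed = overall(scores)
        status = "matches" if computed == REPORTED_OVERALL[name] else "differs"
        print(f"{name}: computed {computed:.1f}%, reported {REPORTED_OVERALL[name]:.1f}% ({status})")
```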
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
## Usage
|
| 65 |
+
|
| 66 |
+
```python
|
| 67 |
+
# The model requires the pl/ library and engine.py from the repo
|
| 68 |
+
# Clone: github.com/ApplePiesFromScratch/propagation-logic
|
| 69 |
+
|
| 70 |
+
from model import MechanismBase, SmallConfig
|
| 71 |
+
from tokenizers import Tokenizer
|
| 72 |
+
import torch
|
| 73 |
+
|
| 74 |
+
config = SmallConfig()
|
| 75 |
+
model = MechanismBase(config)
|
| 76 |
+
# Load weights from Hub (see full usage in repo)
|
| 77 |
+
|
| 78 |
+
tokenizer = Tokenizer.from_file("mechanism_tokenizer/tokenizer.json")
|
| 79 |
+
|
| 80 |
+
# Give the model a partial derivation trace
|
| 81 |
+
partial = """DOMAIN: color_domain
|
| 82 |
+
CARRIER: ['red', 'green', 'blue']
|
| 83 |
+
GRADIENTS: ['complement', 'id']
|
| 84 |
+
THETA: 1.0
|
| 85 |
+
---
|
| 86 |
+
"""
|
| 87 |
+
|
| 88 |
+
ids = torch.tensor(tokenizer.encode(partial).ids).unsqueeze(0)
|
| 89 |
+
output = model.generate(ids, max_new_tokens=200, temperature=0.3)
|
| 90 |
+
print(tokenizer.decode(output[0].tolist()))
|
| 91 |
+
```
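The partial trace in the usage example follows a fixed header layout, so it can be generated rather than typed by hand. This is a minimal sketch that only reproduces the layout shown above. The helper name `build_prompt` is hypothetical and is not part of the repository, and whether the model accepts other field orders or spacing is not stated in this README.

```python
from typing import Sequence


def build_prompt(
    domain: str,
    carrier: Sequence[str],
    gradients: Sequence[str],
    theta: float,
) -> str:
    """Render the trace header in the layout used by the usage example.

    Hypothetical helper: it mirrors the example string, nothing more.
    """
    return (
        f"DOMAIN: {domain}\n"
        f"CARRIER: {list(carrier)!r}\n"
        f"GRADIENTS: {list(gradients)!r}\n"
        f"THETA: {theta}\n"
        "---\n"
    )


if __name__ == "__main__":
    prompt = build_prompt(
        "color_domain",
        ["red", "green", "blue"],
        ["complement", "id"],
        1.0,
    )
    print(prompt, end="")
```

The resulting string can be passed to `tokenizer.encode(...)` in place of the hand-written `partial` value.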
|
| 92 |
+
|
| 93 |
+
## Training
|
| 94 |
+
|
| 95 |
+
```bash
|
| 96 |
+
python generate_data.py # generates derivation trace corpus
|
| 97 |
+
python tokenizer_train.py # BPE tokenizer on corpus
|
| 98 |
+
python train.py # SmallConfig, ~30 min on RTX 4060 Ti
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
## Repository
|
| 102 |
+
|
| 103 |
+
GitHub: [ApplePiesFromScratch/propagation-logic](https://github.com/ApplePiesFromScratch/propagation-logic)
|