reuAC committed on
Commit d5e3ea6 · verified · 1 Parent(s): a01bf08

Upload 4 files

Files changed (4)
  1. README.md +183 -183
  2. README_CN.md +13 -2
  3. paper/paper-cn.pdf +2 -2
  4. paper/paper.pdf +2 -2
README.md CHANGED
@@ -1,183 +1,183 @@
- ---
- license: mit
- language:
- - en
- - zh
- tags:
- - transformer
- - interpretability
- - mechanistic-interpretability
- - language-model
- - signal-decomposition
- - sparse-representations
- - pytorch
- datasets:
- - openwebtext
- pipeline_tag: text-generation
- ---
-
- # reFlow
-
- **A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.
-
- reFlow reconstructs the traditional full-rank embedding matrix into the product of a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. A dynamic vocabulary matrix $W_{vocab} = W_{recipe} \times W_{basis}$ is reconstructed in real-time at each forward pass, serving simultaneously as both the embedding matrix and the output projection matrix.
-
- > **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf)
-
- ## Project Structure
-
- ```
- reFlow/
- ├── train.py # Training script (single GPU / DDP)
- ├── sample.py # Text generation from trained models
- ├── experiment.py # 12-experiment interpretability suite (Chinese)
- ├── experiment_en.py # 12-experiment interpretability suite (English)
- ├── check.py # Checkpoint parameter inspector
- ├── bench.py # Performance benchmarking
- ├── models/
- │ ├── gpt2.py # Standard GPT-2 baseline
- │ ├── gpt2-new.py # Modernized GPT-2 (RoPE + SwiGLU + RMSNorm)
- │ ├── reflow.py # reFlow base architecture
- │ ├── reflow-topk.py # reFlow with ReLU + Top-K hard sparsity
- │ └── reflow-lite.py # reFlow with GQA + reduced MLP
- ├── config/ # Training / sampling / eval configurations
- ├── data/
- │ ├── openwebtext/ # OpenWebText dataset preparation
- │ └── sft-lima/ # LIMA SFT dataset preparation
- └── out/ # Checkpoints and experiment reports
- ```
-
- ## Installation
-
- ### Prerequisites
-
- - Python 3.10+
- - CUDA-compatible GPU (tested on Tesla T4 x4)
-
- ### 1. PyTorch (CUDA 12.8)
-
- ```bash
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
- ```
-
- > Adjust the CUDA version in the URL to match your driver. See [PyTorch Get Started](https://pytorch.org/get-started/locally/).
-
- ### 2. Core Dependencies
-
- ```bash
- pip install datasets tiktoken wandb tqdm
- ```
-
- ### 3. Experiment Suite Dependencies
-
- The interpretability experiments (`experiment.py`) require additional packages:
-
- ```bash
- pip install numpy matplotlib seaborn scikit-learn scipy adjustText
- ```
-
- ### Quick Install (All-in-One)
-
- ```bash
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
- pip install datasets tiktoken wandb tqdm numpy matplotlib seaborn scikit-learn scipy adjustText
- ```
-
- ## Data Preparation
-
- ### OpenWebText
-
- ```bash
- python data/openwebtext/prepare.py
- ```
-
- This downloads the OpenWebText corpus (~54 GB) and tokenizes it with the GPT-2 BPE tokenizer. Output: `data/openwebtext/train.bin` (~17 GB, ~9B tokens) and `val.bin`.
-
- ## Training
-
- All configurations are in `config/`. No CLI overrides — all hyperparameters must be set in the config file.
-
- ### Single GPU
-
- ```bash
- python train.py config/train_reflow_1.py
- ```
-
- ### Multi-GPU (DDP)
-
- ```bash
- torchrun --standalone --nproc_per_node=4 train.py config/train_reflow_1.py
- ```
-
- ### Available Training Configs
-
- | Config | Architecture | Layers | Params | Notes |
- |--------|-------------|--------|--------|-------|
- | `train_gpt2.py` | GPT-2 | 36 | 505.62M | Standard baseline |
- | `train_gpt2_new.py` | GPT-2-New | 36 | 514.01M | + RoPE, SwiGLU, RMSNorm |
- | `train_reflow_1.py` | reFlow | 32 | 463.67M | Base reFlow, constant lr |
- | `train_reflow_1_big.py` | reFlow | 36 | 515.06M | lr decay, for interpretability |
- | `train_reflow_1_topk_big.py` | reFlow-TopK | 36 | 515.06M | + ReLU + Top-64 sparsity |
- | `train_reflow_1_lite.py` | reFlow-Lite | 32 | 413.34M | + GQA, reduced MLP |
- | `train_reflow_1_small.py` | reFlow | 6 | 46.47M | Small-scale validation |
-
- ### Resume Training
-
- Append `_resume` to the config name (e.g., `train_reflow_1_big_resume.py`).
-
- ## Text Generation
-
- ```bash
- python sample.py config/sample_reflow_1.py
- ```
-
- Edit the config file to change the prompt, temperature, top-k, etc.
-
- ## Interpretability Experiments
-
- The experiment suite runs 12 analyses on a trained reFlow model. Both Chinese and English versions are available:
-
- ```bash
- python experiment_en.py config/train_reflow_1_big.py # English
- python experiment.py config/train_reflow_1_big.py # Chinese
- ```
-
- An interactive menu will appear:
-
- | # | Experiment | Group |
- |---|-----------|-------|
- | 1 | Recipe Atlas — recipe-space nearest neighbors | A. Signal Identity |
- | 2 | Sparsity Profile — activation sparsity analysis | A. Signal Identity |
- | 3 | Basis Geometry — singular value & effective rank | A. Signal Identity |
- | 4 | Semantic Galaxy — PCA clustering visualization | B. Semantic Properties |
- | 5 | Semantic Algebra — vector arithmetic (king − man + woman = queen) | B. Semantic Properties |
- | 6 | Typo Resilience — robustness to spelling errors | B. Semantic Properties |
- | 7 | Layer Evolution — per-layer probability crystallization | C. Mechanistic Analysis |
- | 8 | Signal Flow — signal activation heatmaps across layers | C. Mechanistic Analysis |
- | 9 | Causal Ablation — progressive signal knockout curves | C. Mechanistic Analysis |
- | 10 | Emotion Surgery — sentiment steering via signal injection | D. Control & Steering |
- | 11 | Concept Inception — binary-search concept implantation | D. Control & Steering |
- | 12 | Genetic Hijack — global recipe matrix manipulation | D. Control & Steering |
-
- Enter `all` to run all experiments, or specific numbers (e.g., `1 3 5`). Reports are saved to `out/<model>/audit_reports/`.
-
- ## Checkpoint Inspection
-
- ```bash
- python check.py config/train_reflow_1.py out/reflow-1/ckpt.pt
- ```
-
- ## License
-
- MIT License. Based on [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy.
-
- ```bibtex
- @misc{reuac_2026,
-   author    = { reuAC },
-   title     = { reFlow (Revision 672259a) },
-   year      = 2026,
-   url       = { https://huggingface.co/reuAC/reFlow },
-   doi       = { 10.57967/hf/8047 },
-   publisher = { Hugging Face }
- }
- ```
 
+ ---
+ license: mit
+ language:
+ - en
+ - zh
+ tags:
+ - transformer
+ - interpretability
+ - mechanistic-interpretability
+ - language-model
+ - signal-decomposition
+ - sparse-representations
+ - pytorch
+ datasets:
+ - openwebtext
+ pipeline_tag: text-generation
+ ---
+
+ # reFlow
+
+ **A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.
+
+ reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
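The factorization above can be sketched in a few lines of NumPy. This is an editor's illustration with toy sizes and variable names mirroring the notation, not the repository's implementation:

```python
import numpy as np

# Shapes follow the notation above: V tokens, S signals, d model dims.
# Toy sizes for illustration only.
V, S, d = 100, 16, 32
rng = np.random.default_rng(0)

W_recipe = rng.normal(size=(V, S)) / np.sqrt(S)  # per-token signal recipes
W_basis = rng.normal(size=(S, d)) / np.sqrt(d)   # shared signal basis

# The factored product stands in for the full-rank embedding matrix E (V x d);
# its rank is bounded by S.
W_vocab = W_recipe @ W_basis

# Input side: embed token ids by indexing rows of the product.
tokens = np.array([3, 7, 42])
x = W_vocab[tokens]            # shape (3, d)

# Output side: the same product is reused as the projection to logits,
# so no separate LM head is needed.
h = rng.normal(size=(1, d))    # a final hidden state
logits = h @ W_vocab.T         # shape (1, V)
```

The point of the sketch is the rank constraint: however the two factors are trained, the effective vocabulary matrix can never exceed rank $S$, which is what forces tokens to share a common signal basis.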
+
+ ## Key Results
+
+ **Convergence.** At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big achieves a validation loss within ~1% of GPT-2-New (514M). Three scale points — Small (46.47M), reFlow-1 (463.67M), Big (515.06M) — confirm strict scaling law compliance (val loss: 3.55 → 3.01 → 2.92).
+
+ **Emergent Interpretable Structure** (pure language modeling objective, no auxiliary loss):
+ - Recipe-space semantic algebra: king + woman − man → queen (rank #1), 3/3 tests passed
+ - Natural sparsity: each token activates ~11% of signals (mean 117/1024), Gini coefficient 0.085
+ - Causal traceability: single-signal ablation collapses target probability from 8.31% to 0.03%
+ - Information crystallization boundary: semantic interventions are effective at L0–L12 but inert beyond L18
+ - Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
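The Top-64 constraint in the last bullet (ReLU followed by hard Top-K, as in `reflow-topk.py`) amounts to keeping only the k strongest signal activations per token. A minimal sketch of such a mask, assuming nothing about the repository's actual code:

```python
import numpy as np

def topk_sparsify(recipe: np.ndarray, k: int) -> np.ndarray:
    """ReLU gate, then keep only the k largest entries per row (hard sparsity)."""
    r = np.maximum(recipe, 0.0)                          # ReLU: clip negatives
    kth = np.partition(r, -k, axis=-1)[..., -k:-k + 1]   # k-th largest per row
    return np.where(r >= kth, r, 0.0)                    # zero everything weaker

rng = np.random.default_rng(1)
recipes = rng.normal(size=(5, 1024))   # 1024 signals, matching the 117/1024 stat above
sparse = topk_sparsify(recipes, 64)    # Top-64, as in reFlow-TopK
```

With continuous random values, exactly 64 entries survive per row; contrast this hard cutoff with the natural ~117-signal soft sparsity the base model learns on its own.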
+
+ > **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
+
+ ## Project Structure
+
+ ```
+ reFlow/
+ ├── train.py # Training script (single GPU / DDP)
+ ├── sample.py # Text generation from trained models
+ ├── experiment.py # 12-experiment interpretability suite (Chinese)
+ ├── experiment_en.py # 12-experiment interpretability suite (English)
+ ├── check.py # Checkpoint parameter inspector
+ ├── bench.py # Performance benchmarking
+ ├── models/
+ │ ├── gpt2.py # Standard GPT-2 baseline
+ │ ├── gpt2-new.py # Modernized GPT-2 (RoPE + SwiGLU + RMSNorm)
+ │ ├── reflow.py # reFlow base architecture
+ │ ├── reflow-topk.py # reFlow with ReLU + Top-K hard sparsity
+ │ └── reflow-lite.py # reFlow with GQA + reduced MLP
+ ├── config/ # Training / sampling / eval configurations
+ ├── data/
+ │ ├── openwebtext/ # OpenWebText dataset preparation
+ │ └── sft-lima/ # LIMA SFT dataset preparation
+ └── out/ # Checkpoints and experiment reports
+ ```
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.10+
+ - CUDA-compatible GPU (tested on Tesla T4 x4)
+
+ ### 1. PyTorch (CUDA 12.8)
+
+ ```bash
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+ ```
+
+ > Adjust the CUDA version in the URL to match your driver. See [PyTorch Get Started](https://pytorch.org/get-started/locally/).
+
+ ### 2. Core Dependencies
+
+ ```bash
+ pip install datasets tiktoken wandb tqdm
+ ```
+
+ ### 3. Experiment Suite Dependencies
+
+ The interpretability experiments (`experiment.py`) require additional packages:
+
+ ```bash
+ pip install numpy matplotlib seaborn scikit-learn scipy adjustText
+ ```
+
+ ### Quick Install (All-in-One)
+
+ ```bash
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+ pip install datasets tiktoken wandb tqdm numpy matplotlib seaborn scikit-learn scipy adjustText
+ ```
+
+ ## Data Preparation
+
+ ### OpenWebText
+
+ ```bash
+ python data/openwebtext/prepare.py
+ ```
+
+ This downloads the OpenWebText corpus (~54 GB) and tokenizes it with the GPT-2 BPE tokenizer. Output: `data/openwebtext/train.bin` (~17 GB, ~9B tokens) and `val.bin`.
+
+ ## Training
+
+ All configurations are in `config/`. No CLI overrides — all hyperparameters must be set in the config file.
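Since the project is based on nanoGPT, a training config is presumably a flat Python file of bare assignments that `train.py` executes. The values below are illustrative placeholders, not the shipped `train_reflow_1.py`:

```python
# Hypothetical nanoGPT-style config sketch. Every hyperparameter lives here,
# since train.py takes no CLI overrides. All values are placeholders.
out_dir = 'out/reflow-1'
dataset = 'openwebtext'

n_layer = 32              # base reFlow depth (per the config table)
n_head = 16               # placeholder
n_embd = 1024             # placeholder

batch_size = 12           # placeholder
learning_rate = 3e-4      # placeholder; the base run uses a constant lr
decay_lr = False          # lr decay is enabled only in the "big" configs
```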
+
+ ### Single GPU
+
+ ```bash
+ python train.py config/train_reflow_1.py
+ ```
+
+ ### Multi-GPU (DDP)
+
+ ```bash
+ torchrun --standalone --nproc_per_node=4 train.py config/train_reflow_1.py
+ ```
+
+ ### Available Training Configs
+
+ | Config | Architecture | Layers | Params | Notes |
+ |--------|-------------|--------|--------|-------|
+ | `train_gpt2.py` | GPT-2 | 36 | 505.62M | Standard baseline |
+ | `train_gpt2_new.py` | GPT-2-New | 36 | 514.01M | + RoPE, SwiGLU, RMSNorm |
+ | `train_reflow_1.py` | reFlow | 32 | 463.67M | Base reFlow, constant lr |
+ | `train_reflow_1_big.py` | reFlow | 36 | 515.06M | lr decay, for interpretability |
+ | `train_reflow_1_topk_big.py` | reFlow-TopK | 36 | 515.06M | + ReLU + Top-64 sparsity |
+ | `train_reflow_1_lite.py` | reFlow-Lite | 32 | 413.34M | + GQA, reduced MLP |
+ | `train_reflow_1_small.py` | reFlow | 6 | 46.47M | Small-scale validation |
+
+ ### Resume Training
+
+ Append `_resume` to the config name (e.g., `train_reflow_1_big_resume.py`).
+
+ ## Text Generation
+
+ ```bash
+ python sample.py config/sample_reflow_1.py
+ ```
+
+ Edit the config file to change the prompt, temperature, top-k, etc.
+
+ ## Interpretability Experiments
+
+ The experiment suite runs 12 analyses on a trained reFlow model. Both Chinese and English versions are available:
+
+ ```bash
+ python experiment_en.py config/train_reflow_1_big.py # English
+ python experiment.py config/train_reflow_1_big.py # Chinese
+ ```
+
+ An interactive menu will appear:
+
+ | # | Experiment | Group |
+ |---|-----------|-------|
+ | 1 | Recipe Atlas — recipe-space nearest neighbors | A. Signal Identity |
+ | 2 | Sparsity Profile — activation sparsity analysis | A. Signal Identity |
+ | 3 | Basis Geometry — singular value & effective rank | A. Signal Identity |
+ | 4 | Semantic Galaxy — PCA clustering visualization | B. Semantic Properties |
+ | 5 | Semantic Algebra — vector arithmetic (king − man + woman = queen) | B. Semantic Properties |
+ | 6 | Typo Resilience — robustness to spelling errors | B. Semantic Properties |
+ | 7 | Layer Evolution — per-layer probability crystallization | C. Mechanistic Analysis |
+ | 8 | Signal Flow — signal activation heatmaps across layers | C. Mechanistic Analysis |
+ | 9 | Causal Ablation — progressive signal knockout curves | C. Mechanistic Analysis |
+ | 10 | Emotion Surgery — sentiment steering via signal injection | D. Control & Steering |
+ | 11 | Concept Inception — binary-search concept implantation | D. Control & Steering |
+ | 12 | Genetic Hijack — global recipe matrix manipulation | D. Control & Steering |
+
+ Enter `all` to run all experiments, or specific numbers (e.g., `1 3 5`). Reports are saved to `out/<model>/audit_reports/`.
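Experiment 5's vector arithmetic can be illustrated on hand-made toy recipes. The real test operates on rows of $W_{recipe}$; the three-signal vectors below are invented for the illustration:

```python
import numpy as np

# Toy "recipes" over 3 hypothetical signals: [royalty, male, female].
vocab = ['king', 'man', 'woman', 'queen', 'apple']
R = np.array([
    [0.9, 0.8, 0.1],   # king:  royalty + male
    [0.1, 0.9, 0.1],   # man:   male
    [0.1, 0.1, 0.9],   # woman: female
    [0.9, 0.1, 0.9],   # queen: royalty + female
    [0.0, 0.1, 0.2],   # apple: unrelated
])

def nearest(v, exclude):
    # cosine similarity against every recipe, skipping the query words
    sims = (R @ v) / (np.linalg.norm(R, axis=1) * np.linalg.norm(v))
    for i in np.argsort(-sims):
        if vocab[i] not in exclude:
            return vocab[i]

target = R[0] + R[2] - R[1]                          # king + woman - man
answer = nearest(target, {'king', 'man', 'woman'})   # -> 'queen'
```

The arithmetic cancels the male signal and keeps royalty plus female, so the nearest remaining recipe is queen, which is exactly the rank-#1 behavior reported in the Key Results.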
+
+ ## Checkpoint Inspection
+
+ ```bash
+ python check.py config/train_reflow_1.py out/reflow-1/ckpt.pt
+ ```
+
+ ## License
+
+ MIT License. Based on [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy.
README_CN.md CHANGED
@@ -2,9 +2,20 @@
 
 **A Metal Soul In My Hand** — 具备原生可解释性的特征解耦 Transformer 架构。
 
- reFlow 将传统全秩词嵌入矩阵重构为**配方矩阵** $W_{recipe} \in \mathbb{R}^{V \times S}$ 与**信号基底矩阵** $W_{basis} \in \mathbb{R}^{S \times d}$ 的乘积形式,迫使模型在潜空间中维护一组连续、低冗余的信号基底。动态词表矩阵 $W_{vocab} = W_{recipe} \times W_{basis}$ 在每次前向传播中实时重构,同时作为嵌入矩阵与输出投影矩阵使用
+ reFlow 将嵌入矩阵 $E \in \mathbb{R}^{V \times d}$ 分解为**配方矩阵** $W_{recipe} \in \mathbb{R}^{V \times S}$ 与**信号基底矩阵** $W_{basis} \in \mathbb{R}^{S \times d}$ 的乘积形式,迫使模型在潜空间中维护一组连续、低冗余的信号基底。同一乘积 $W_{recipe} \times W_{basis}$ 同时用于输入嵌入与输出投影,构成端到端的信号流形计算闭环,无需独立 LM Head
 
- > **论文**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf)
+ ## 核心结果
+
+ **收敛性。** 在对齐深度与参数量(36 层,~515M)的条件下,reFlow-1-Big 的验证损失与 GPT-2-New(514M)差距仅约 1%。三个参数规模点 — Small(46.47M)、reFlow-1(463.67M)、Big(515.06M)— 验证损失分别为 3.55、3.01、2.92,严格遵循缩放定律。
+
+ **自发涌现的可解释结构**(纯语言建模目标,无辅助损失):
+ - 配方空间语义代数:king + woman − man → queen(排名 #1),3/3 测试通过
+ - 自然稀疏性:每个 token 平均激活约 11% 的信号(均值 117/1024),Gini 系数 0.085
+ - 因果可追踪性:消融单个信号即可将目标概率从 8.31% 摧毁至 0.03%
+ - 信息结晶边界:语义干预在 L0–L12 有效,L18 之后失效
+ - 硬稀疏约束(Top-64)系统性摧毁配方空间语义结构(代数 3/3 → 0/3,轮廓系数 +0.11 → −0.02)
+
+ > **论文**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — 理论推导、12 项可解释性实验及缩放/消融分析。
 
 ## 项目结构
 
paper/paper-cn.pdf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:8342d5e5b6c3d6e4d1b6dbc3ccde4933ab8b6cf9cd346a999b22d0dafece0cc3
- size 315963
+ oid sha256:24d5e8402b4f41db3e3a26317876c4236ff5b6c534b07238567a2a912b4f4a22
+ size 316420
paper/paper.pdf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:9afa7ec6025e700d5876a6be83f8e4ac5f1bd37c55104a06129808db951d97e0
- size 296359
+ oid sha256:4fbbc48ebae91e0f4009d7de9f0313b560f908998680780ae47d816ca49fa3e4
+ size 114729