---
license: mit
language:
- en
- zh
tags:
- transformer
- interpretability
- mechanistic-interpretability
- language-model
- signal-decomposition
- sparse-representations
- pytorch
datasets:
- openwebtext
pipeline_tag: text-generation
---

# reFlow

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19160838.svg)](https://doi.org/10.5281/zenodo.19160838)

[ [中文](README_CN.md) | English ]

**A Metal Soul In My Hand** — A feature-decoupled Transformer architecture with native interpretability.

reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
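The tied factorization above can be sketched in a few lines of numpy. The shapes below are small illustrative stand-ins (not reFlow's actual dimensions), and the random matrices stand in for trained weights:

```python
import numpy as np

# Illustrative shapes only (NOT reFlow's actual config):
V, S, d = 1000, 64, 32  # vocab size, number of signals, model width

rng = np.random.default_rng(0)
W_recipe = rng.standard_normal((V, S)) * 0.02  # per-token signal recipes
W_basis = rng.standard_normal((S, d)) * 0.02   # shared signal basis

# Input side: a token's embedding is its recipe mixed through the basis.
token_ids = np.array([42, 7, 99])
x = W_recipe[token_ids] @ W_basis  # shape (3, d)

# Output side: the SAME factored product is reused as the vocabulary
# projection, so no separate LM head exists.
h = rng.standard_normal((3, d))       # stand-in for final hidden states
logits = h @ (W_recipe @ W_basis).T   # shape (3, V)
```

Because both sides share one factorization, any edit to a row of `W_recipe` simultaneously changes how that token is read in and how it is scored at the output.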

## Online Demo

**Try reFlow in your browser:**
- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow) (Global Access)
- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow) (China Access)

## Key Results

**Convergence.** At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big achieves a validation loss within ~1% of GPT-2-New (514M). Three scale points — Small (46.47M), reFlow-1 (463.67M), Big (515.06M) — confirm strict scaling-law compliance (val loss: 3.55 → 3.01 → 2.92).

**Emergent Interpretable Structure** (pure language modeling objective, no auxiliary loss):
- Recipe-space semantic algebra: king + woman − man → queen (rank #1), 3/3 tests passed
- Natural sparsity: each token activates ~11% of signals (mean 117/1024), Gini coefficient 0.085
- Causal traceability: single-signal ablation collapses the target probability from 8.31% to 0.03%
- Information crystallization boundary: semantic interventions are effective at L0–L12 but inert beyond L18
- Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
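For reference, the two sparsity metrics quoted above (fraction of active signals per token, and the Gini coefficient of a recipe's magnitudes) can be computed as in the sketch below. It uses a random stand-in matrix and an assumed activation threshold, not the trained weights or the paper's exact procedure:

```python
import numpy as np

def active_fraction(recipes, eps=1e-3):
    """Per-token fraction of signals whose |weight| exceeds a small
    threshold (eps is an assumed cutoff, not the paper's)."""
    return (np.abs(recipes) > eps).mean(axis=1)

def gini(x):
    """Gini coefficient of the magnitudes in x: 0 = perfectly even,
    ~1 = all mass on one signal."""
    x = np.sort(np.abs(x).ravel())  # ascending magnitudes
    n = x.size
    lorenz = np.cumsum(x) / x.sum()  # cumulative share of total mass
    return (n + 1 - 2 * lorenz.sum()) / n

rng = np.random.default_rng(0)
# Stand-in recipe matrix: 100 tokens x 1024 signals, NOT trained weights.
recipes = rng.standard_normal((100, 1024)) * 0.02

print(active_fraction(recipes).mean())  # avg. fraction of active signals
print(gini(recipes[0]))                 # inequality of one token's recipe
```

A low Gini value, as reported for reFlow, means activation mass is spread fairly evenly across the signals a token does use.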

> **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) — Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
>
> **Pretrained Weights**: [HuggingFace](https://huggingface.co/reuAC/reFlow)

## Project Structure

```
reFlow/
├── train.py            # Training script (single GPU / DDP)
├── sample.py           # Text generation from trained models
├── experiment.py       # 12-experiment interpretability suite (Chinese)
├── experiment_en.py    # 12-experiment interpretability suite (English)
├── check.py            # Checkpoint parameter inspector
├── bench.py            # Performance benchmarking
├── models/
│   ├── gpt2.py         # Standard GPT-2 baseline
│   ├── gpt2-new.py     # Modernized GPT-2 (RoPE + SwiGLU + RMSNorm)
│   ├── reflow.py       # reFlow base architecture
│   ├── reflow-topk.py  # reFlow with ReLU + Top-K hard sparsity
│   └── reflow-lite.py  # reFlow with GQA + reduced MLP
├── config/             # Training / sampling / eval configurations
├── data/
│   ├── openwebtext/    # OpenWebText dataset preparation
│   └── sft-lima/       # LIMA SFT dataset preparation
└── out/                # Checkpoints and experiment reports
```

## Installation

### Prerequisites

- Python 3.10+
- CUDA-compatible GPU (tested on Tesla T4 x4)

### 1. PyTorch (CUDA 12.8)

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```

> Adjust the CUDA version in the URL to match your driver. See [PyTorch Get Started](https://pytorch.org/get-started/locally/).

### 2. Core Dependencies

```bash
pip install datasets tiktoken wandb tqdm
```

### 3. Experiment Suite Dependencies

The interpretability experiments (`experiment.py`) require additional packages:

```bash
pip install numpy matplotlib seaborn scikit-learn scipy adjustText
```

### Quick Install (All-in-One)

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install datasets tiktoken wandb tqdm numpy matplotlib seaborn scikit-learn scipy adjustText
```

## Data Preparation

### OpenWebText

```bash
python data/openwebtext/prepare.py
```

This downloads the OpenWebText corpus (~54 GB) and tokenizes it with the GPT-2 BPE tokenizer. Output: `data/openwebtext/train.bin` (~17 GB, ~9B tokens) and `val.bin`.
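Assuming the nanoGPT convention of a flat array of `uint16` GPT-2 token ids, the `.bin` files can be inspected with a numpy memmap without loading them into RAM. A small sketch, using a throwaway stand-in file in place of the real 17 GB output:

```python
import os
import tempfile

import numpy as np

# Write a tiny stand-in for train.bin: a flat array of uint16 token ids
# (the nanoGPT-style format this repo is assumed to inherit).
tokens = np.array([464, 3290, 318, 257, 1332], dtype=np.uint16)
path = os.path.join(tempfile.mkdtemp(), "train.bin")
tokens.tofile(path)

# Memory-map it read-only: the OS pages data in on demand, so even the
# full-size file can be sliced without exhausting RAM.
data = np.memmap(path, dtype=np.uint16, mode="r")
print(len(data), data[:5].tolist())
```

The same `np.memmap(..., dtype=np.uint16, mode="r")` call works unchanged on the real `train.bin` and `val.bin`.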

## Training

All configurations are in `config/`. No CLI overrides — all hyperparameters must be set in the config file.

### Single GPU

```bash
python train.py config/train_reflow_1.py
```

### Multi-GPU (DDP)

```bash
torchrun --standalone --nproc_per_node=4 train.py config/train_reflow_1.py
```

### Available Training Configs

| Config | Architecture | Layers | Params | Notes |
|--------|--------------|--------|--------|-------|
| `train_gpt2.py` | GPT-2 | 36 | 505.62M | Standard baseline |
| `train_gpt2_new.py` | GPT-2-New | 36 | 514.01M | + RoPE, SwiGLU, RMSNorm |
| `train_reflow_1.py` | reFlow | 32 | 463.67M | Base reFlow, constant lr |
| `train_reflow_1_big.py` | reFlow | 36 | 515.06M | lr decay, for interpretability |
| `train_reflow_1_topk_big.py` | reFlow-TopK | 36 | 515.06M | + ReLU + Top-64 sparsity |
| `train_reflow_1_lite.py` | reFlow-Lite | 32 | 413.34M | + GQA, reduced MLP |
| `train_reflow_1_small.py` | reFlow | 6 | 46.47M | Small-scale validation |

### Resume Training

Append `_resume` to the config name (e.g., `train_reflow_1_big_resume.py`).

## Text Generation

```bash
python sample.py config/sample_reflow_1.py
```

Edit the config file to change the prompt, temperature, top-k, etc.
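As background on those two knobs: temperature rescales the logits (lower values sharpen the distribution), and top-k restricts sampling to the k highest-scoring tokens. A generic numpy sketch of how they interact (an illustration of the standard technique, not `sample.py`'s actual code):

```python
import numpy as np

def sample_next(logits, temperature=0.8, top_k=50, rng=None):
    """Sample one token id: scale logits by temperature, mask out all but
    the top_k highest logits, then draw from the resulting softmax."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k is not None and top_k < logits.size:
        cutoff = np.sort(logits)[-top_k]          # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())         # stable softmax
    probs /= probs.sum()
    return int(rng.choice(logits.size, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next(logits, temperature=0.8, top_k=2, rng=rng))  # only 0 or 1 possible
```

With `top_k=2`, tokens 2 and 3 receive zero probability, so only the two highest-scoring ids can ever be drawn.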

## Interpretability Experiments

The experiment suite runs 12 analyses on a trained reFlow model. Both Chinese and English versions are available:

```bash
python experiment_en.py config/train_reflow_1_big.py   # English
python experiment.py config/train_reflow_1_big.py      # Chinese
```

An interactive menu will appear:

| # | Experiment | Group |
|---|------------|-------|
| 1 | Recipe Atlas — recipe-space nearest neighbors | A. Signal Identity |
| 2 | Sparsity Profile — activation sparsity analysis | A. Signal Identity |
| 3 | Basis Geometry — singular value & effective rank | A. Signal Identity |
| 4 | Semantic Galaxy — PCA clustering visualization | B. Semantic Properties |
| 5 | Semantic Algebra — vector arithmetic (king − man + woman = queen) | B. Semantic Properties |
| 6 | Typo Resilience — robustness to spelling errors | B. Semantic Properties |
| 7 | Layer Evolution — per-layer probability crystallization | C. Mechanistic Analysis |
| 8 | Signal Flow — signal activation heatmaps across layers | C. Mechanistic Analysis |
| 9 | Causal Ablation — progressive signal knockout curves | C. Mechanistic Analysis |
| 10 | Emotion Surgery — sentiment steering via signal injection | D. Control & Steering |
| 11 | Concept Inception — binary-search concept implantation | D. Control & Steering |
| 12 | Genetic Hijack — global recipe matrix manipulation | D. Control & Steering |

Enter `all` to run all experiments, or specific numbers (e.g., `1 3 5`). Reports are saved to `out/<model>/audit_reports/`.
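Conceptually, the Semantic Algebra experiment (#5) reduces to nearest-neighbor search after vector arithmetic in recipe space. A toy sketch with hand-made 2-D "recipes" (the real suite operates on rows of the trained recipe matrix, not on these made-up vectors):

```python
import numpy as np

def analogy(recipes, vocab, a, b, c):
    """Rank vocab by cosine similarity to recipes[a] - recipes[b] + recipes[c],
    excluding the three query words themselves."""
    ix = {w: i for i, w in enumerate(vocab)}
    q = recipes[ix[a]] - recipes[ix[b]] + recipes[ix[c]]
    sims = recipes @ q / (np.linalg.norm(recipes, axis=1) * np.linalg.norm(q) + 1e-9)
    sims[[ix[a], ix[b], ix[c]]] = -np.inf  # never return a query word
    return [vocab[i] for i in np.argsort(-sims)]

# Toy 2-D recipes: axis 0 = "royalty", axis 1 = "gender" (made up for illustration).
vocab = ["king", "queen", "man", "woman"]
recipes = np.array([[1.0,  1.0],   # king:  royal + male
                    [1.0, -1.0],   # queen: royal + female
                    [0.0,  1.0],   # man
                    [0.0, -1.0]])  # woman

print(analogy(recipes, vocab, "king", "man", "woman")[0])  # -> queen
```

Here king − man + woman lands on the "royal + female" direction, so queen is the top-ranked neighbor, mirroring the rank-#1 result reported above.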

## Checkpoint Inspection

```bash
python check.py config/train_reflow_1.py out/reflow-1/ckpt.pt
```

## License

MIT License. Based on [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy.