---
license: mit
language:
- en
- zh
tags:
- transformer
- interpretability
- mechanistic-interpretability
- language-model
- signal-decomposition
- sparse-representations
- pytorch
datasets:
- openwebtext
pipeline_tag: text-generation
---
# reFlow
[DOI](https://doi.org/10.5281/zenodo.19160838)
[ [中文](README_CN.md) | English ]
**A Metal Soul In My Hand** – A feature-decoupled Transformer architecture with native interpretability.
reFlow factorizes the embedding matrix $E \in \mathbb{R}^{V \times d}$ into a **Recipe Matrix** $W_{recipe} \in \mathbb{R}^{V \times S}$ and a **Signal Basis Matrix** $W_{basis} \in \mathbb{R}^{S \times d}$, forcing the model to maintain a set of continuous, low-redundancy signal bases in latent space. The same factored product $W_{recipe} \times W_{basis}$ serves as both the input embedding and the output projection, forming an end-to-end signal-manifold computation loop without a separate LM head.
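The factored, tied embedding can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the repository's actual module; the class name, argument names, and initialization here are hypothetical:

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Sketch of a reFlow-style factored embedding (illustrative, not the repo's code).

    E = W_recipe @ W_basis is used on the input side as the token embedding and
    on the output side as the projection back to logits, so no separate LM head
    is needed.
    """
    def __init__(self, vocab_size: int, n_signals: int, d_model: int):
        super().__init__()
        self.recipe = nn.Parameter(torch.randn(vocab_size, n_signals) * 0.02)
        self.basis = nn.Parameter(torch.randn(n_signals, d_model) * 0.02)

    def embed(self, idx: torch.Tensor) -> torch.Tensor:
        # Input side: look up each token's recipe and mix the signal bases.
        return self.recipe[idx] @ self.basis            # (..., d_model)

    def logits(self, h: torch.Tensor) -> torch.Tensor:
        # Output side: project hidden states through the same tied factors.
        return h @ self.basis.t() @ self.recipe.t()     # (..., vocab_size)

emb = FactoredEmbedding(vocab_size=50304, n_signals=1024, d_model=768)
tokens = torch.randint(0, 50304, (2, 8))
h = emb.embed(tokens)       # shape (2, 8, 768)
out = emb.logits(h)         # shape (2, 8, 50304)
```

Because both directions share `recipe` and `basis`, gradients from the language-modeling loss shape a single signal manifold rather than two independent embedding tables.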
## Online Demo
**Try reFlow in your browser:**
- [HuggingFace Space](https://huggingface.co/spaces/reuAC/reFlow) (Global Access)
- [ModelScope Studio](https://www.modelscope.cn/studios/recuAC/reFlow) (China Access)
## Key Results
**Convergence.** At matched depth and scale (36 layers, ~515M parameters), reFlow-1-Big reaches a validation loss within ~1% of GPT-2-New (514M). Three scale points (Small, 46.47M; reFlow-1, 463.67M; Big, 515.06M) follow the expected scaling-law trend (val loss 3.55 → 3.01 → 2.92).
**Emergent Interpretable Structure** (pure language modeling objective, no auxiliary loss):
- Recipe-space semantic algebra: king + woman − man ≈ queen (rank #1), 3/3 tests passed
- Natural sparsity: each token activates ~11% of signals (mean 117/1024), Gini coefficient 0.085
- Causal traceability: single-signal ablation collapses target probability from 8.31% to 0.03%
- Information crystallization boundary: semantic interventions are effective at layers L0–L12 but inert beyond L18
- Hard sparsity (Top-64) systematically destroys recipe-space semantic structure (algebra 3/3 → 0/3, silhouette +0.11 → −0.02)
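The recipe-space algebra test above can be sketched as follows. This is a toy illustration of the ranking procedure only: the concept directions and word list are hand-made stand-ins, whereas the real experiment ranks the query against every row of a trained $W_{recipe}$ over the full vocabulary:

```python
import torch
import torch.nn.functional as F

# Toy recipes built from two hand-made concept directions ("royal", gendered)
# plus random fillers -- illustrative stand-ins, not trained weights.
torch.manual_seed(0)
S = 1024
royal, male, female = torch.randn(3, S)
words = ["king", "queen", "man", "woman", "apple", "river", "blue"]
vecs = {"king": royal + male, "queen": royal + female, "man": male, "woman": female}
recipe = torch.stack([vecs.get(w, torch.randn(S)) for w in words])
idx = {w: i for i, w in enumerate(words)}

# king + woman - man, ranked against all recipes by cosine similarity.
query = recipe[idx["king"]] + recipe[idx["woman"]] - recipe[idx["man"]]
sims = F.cosine_similarity(query.unsqueeze(0), recipe, dim=-1)
for w in ("king", "woman", "man"):   # exclude the query's own source tokens
    sims[idx[w]] = float("-inf")
top = words[int(sims.argmax())]      # "queen"
```

The "rank #1" criterion in the results above corresponds to `top` matching the expected answer after the source tokens are excluded.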
> **Paper**: [English (PDF)](./paper/paper.pdf) | [中文 (PDF)](./paper/paper-cn.pdf) – Theoretical derivation, 12 interpretability experiments, and scaling/ablation analysis.
>
> **Pretrained Weights**: [HuggingFace](https://huggingface.co/reuAC/reFlow)
## Project Structure
```
reFlow/
├── train.py             # Training script (single GPU / DDP)
├── sample.py            # Text generation from trained models
├── experiment.py        # 12-experiment interpretability suite (Chinese)
├── experiment_en.py     # 12-experiment interpretability suite (English)
├── check.py             # Checkpoint parameter inspector
├── bench.py             # Performance benchmarking
├── models/
│   ├── gpt2.py          # Standard GPT-2 baseline
│   ├── gpt2-new.py      # Modernized GPT-2 (RoPE + SwiGLU + RMSNorm)
│   ├── reflow.py        # reFlow base architecture
│   ├── reflow-topk.py   # reFlow with ReLU + Top-K hard sparsity
│   └── reflow-lite.py   # reFlow with GQA + reduced MLP
├── config/              # Training / sampling / eval configurations
├── data/
│   ├── openwebtext/     # OpenWebText dataset preparation
│   └── sft-lima/        # LIMA SFT dataset preparation
└── out/                 # Checkpoints and experiment reports
```
## Installation
### Prerequisites
- Python 3.10+
- CUDA-compatible GPU (tested on Tesla T4 x4)
### 1. PyTorch (CUDA 12.8)
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
> Adjust the CUDA version in the URL to match your driver. See [PyTorch Get Started](https://pytorch.org/get-started/locally/).
### 2. Core Dependencies
```bash
pip install datasets tiktoken wandb tqdm
```
### 3. Experiment Suite Dependencies
The interpretability experiments (`experiment.py`) require additional packages:
```bash
pip install numpy matplotlib seaborn scikit-learn scipy adjustText
```
### Quick Install (All-in-One)
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install datasets tiktoken wandb tqdm numpy matplotlib seaborn scikit-learn scipy adjustText
```
## Data Preparation
### OpenWebText
```bash
python data/openwebtext/prepare.py
```
This downloads the OpenWebText corpus (~54 GB) and tokenizes it with the GPT-2 BPE tokenizer. Output: `data/openwebtext/train.bin` (~17 GB, ~9B tokens) and `val.bin`.
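Once prepared, the `.bin` files can be read back without loading them into RAM. A minimal sketch, assuming the nanoGPT convention that this repository builds on (GPT-2 BPE token ids stored as `uint16`); a tiny stand-in file replaces `data/openwebtext/train.bin` here so the snippet is self-contained:

```python
import os
import tempfile
import numpy as np

# Stand-in for data/openwebtext/train.bin: a small file of uint16 token ids.
path = os.path.join(tempfile.mkdtemp(), "train.bin")
np.arange(4096, dtype=np.uint16).tofile(path)

# Zero-copy read via memmap, then sample random training blocks from it.
data = np.memmap(path, dtype=np.uint16, mode="r")
block_size = 1024
ix = np.random.randint(0, len(data) - block_size, size=(4,))
batch = np.stack([data[i : i + block_size].astype(np.int64) for i in ix])
# batch has shape (4, 1024), ready to wrap in a tensor for training
```

The cast to `int64` matters: `uint16` keeps the files small on disk, but embedding lookups expect integer indices in a wider type.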
## Training
All configurations are in `config/`. No CLI overrides: all hyperparameters must be set in the config file.
### Single GPU
```bash
python train.py config/train_reflow_1.py
```
### Multi-GPU (DDP)
```bash
torchrun --standalone --nproc_per_node=4 train.py config/train_reflow_1.py
```
### Available Training Configs
| Config | Architecture | Layers | Params | Notes |
|--------|-------------|--------|--------|-------|
| `train_gpt2.py` | GPT-2 | 36 | 505.62M | Standard baseline |
| `train_gpt2_new.py` | GPT-2-New | 36 | 514.01M | + RoPE, SwiGLU, RMSNorm |
| `train_reflow_1.py` | reFlow | 32 | 463.67M | Base reFlow, constant lr |
| `train_reflow_1_big.py` | reFlow | 36 | 515.06M | lr decay, for interpretability |
| `train_reflow_1_topk_big.py` | reFlow-TopK | 36 | 515.06M | + ReLU + Top-64 sparsity |
| `train_reflow_1_lite.py` | reFlow-Lite | 32 | 413.34M | + GQA, reduced MLP |
| `train_reflow_1_small.py` | reFlow | 6 | 46.47M | Small-scale validation |
### Resume Training
Append `_resume` to the config name (e.g., `train_reflow_1_big_resume.py`).
## Text Generation
```bash
python sample.py config/sample_reflow_1.py
```
Edit the config file to change the prompt, temperature, top-k, etc.
## Interpretability Experiments
The experiment suite runs 12 analyses on a trained reFlow model. Both Chinese and English versions are available:
```bash
python experiment_en.py config/train_reflow_1_big.py # English
python experiment.py config/train_reflow_1_big.py # Chinese
```
An interactive menu will appear:
| # | Experiment | Group |
|---|-----------|-------|
| 1 | Recipe Atlas – recipe-space nearest neighbors | A. Signal Identity |
| 2 | Sparsity Profile – activation sparsity analysis | A. Signal Identity |
| 3 | Basis Geometry – singular value & effective rank | A. Signal Identity |
| 4 | Semantic Galaxy – PCA clustering visualization | B. Semantic Properties |
| 5 | Semantic Algebra – vector arithmetic (king − man + woman = queen) | B. Semantic Properties |
| 6 | Typo Resilience – robustness to spelling errors | B. Semantic Properties |
| 7 | Layer Evolution – per-layer probability crystallization | C. Mechanistic Analysis |
| 8 | Signal Flow – signal activation heatmaps across layers | C. Mechanistic Analysis |
| 9 | Causal Ablation – progressive signal knockout curves | C. Mechanistic Analysis |
| 10 | Emotion Surgery – sentiment steering via signal injection | D. Control & Steering |
| 11 | Concept Inception – binary-search concept implantation | D. Control & Steering |
| 12 | Genetic Hijack – global recipe matrix manipulation | D. Control & Steering |
Enter `all` to run all experiments, or specific numbers (e.g., `1 3 5`). Reports are saved to `out/<model>/audit_reports/`.
## Checkpoint Inspection
```bash
python check.py config/train_reflow_1.py out/reflow-1/ckpt.pt
```
## License
MIT License. Based on [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy.