Add results/THREADS_DATA.md
results/THREADS_DATA.md
ADDED
# Local Replication of Emotion Vectors: Data Pack for Threads Posts

> Experiment date: 2026-04-05
> Model: Gemma4-E4B (4B parameters, open source, running on our own server)
> Baseline: Anthropic's "Emotion Concepts" paper (2026-04-02, Claude Sonnet 4.5, closed source)

---

## Experiment Scale Comparison

| | Anthropic | Ours |
|---|---|---|
| Model | Claude Sonnet 4.5 (closed source) | Gemma4-E4B (4B, open source) |
| Emotions | 171 | 9 (MVP) |
| Stories | 205,200 | 1,002 |
| Hardware | Internal cluster | One GX10 (GB10 GPU) |
| Team | ~16 researchers | 1 person + 1 泉 |
| Cost | Not disclosed | Electricity |

---

## Logit Lens Results Comparison

### Anthropic (Claude Sonnet 4.5)

| Emotion | ↑ Promoted | ↓ Suppressed |
|------|--------|--------|
| Happy | excited, excitement, exciting | fucking, silence, anger |
| Desperate | desperate, urgent, bankrupt | pleased, amusing, enjoyed |
| Calm | relax, thought, enjoyed | fucking, desperate, godd |
| Angry | anger, angry, rage, fury | exciting, adventure |
| Sad | grief, tears, lonely | excited, excitement |
| Afraid | panic, terror, paranoid | enthusiasm, enjoyed |
| Nervous | nervous, anxiety, trembling | enjoyed, happy, celebrating |
| Proud | proud, pride, triumphant | worse, urgent, desperate |
| Guilty | guilt, conscience, shame | calm, surprisingly |

### Ours (Gemma4-E4B)

| Emotion | ↑ Promoted | ↓ Suppressed |
|------|--------|--------|
| Happy | delighted, celebrates, joyful, happy | 💔, 不安, 불안 |
| Desperate | desperately, desperate, hopeless | pleasantly, relaxed, 👍 |
| Calm | peaceful, leisurely, calmness | 😫, dismay, horrified |
| Angry | angrily, angry, 😡, Angry | serene, quiet, sunshine |
| Sad | loneliness, sadness, triste | 🤩, delighted, excitedly |
| Afraid | 불안, 不安, 😨, Panic | happy, smiling, contented |
| Loving | nurturing, heartwarming, ❤️ | inability, disastrous |
| Guilty | 💔, plagued, betray, ashamed | 😍, 👍, triumphant |
| Surprised | startled, unexpected, astonished | (none) |

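### Key Observations
- **In both models, each emotion vector promotes tokens that name the corresponding emotion**
- Gemma4 additionally surfaces multilingual tokens (Korean 불안, Chinese 不安, Spanish triste) and emoji (😡😨❤️💔), which likely reflects the multilingual training data of this open-source model
- The suppressed tokens are consistent as well: positive emotions suppress negative words, and negative emotions suppress positive words
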
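
The promoted and suppressed token lists above come from a logit-lens readout of each emotion vector. The original script is not included in this note, so the following is only a minimal sketch, assuming the readout is a plain dot product between the vector and each token's unembedding row (any final normalization layer is omitted). The function name `logit_lens`, the toy vocabulary, and all weights are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_lens(vector: List[float],
               unembed: Dict[str, List[float]],
               k: int = 2) -> Tuple[List[str], List[str]]:
    """Return the k tokens most promoted and most suppressed by `vector`."""
    # Score each token by the dot product of its unembedding row with the vector.
    scores = {tok: sum(w * x for w, x in zip(row, vector))
              for tok, row in unembed.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Highest scores are promoted; lowest scores (most negative first) are suppressed.
    return ranked[:k], ranked[-k:][::-1]


# Toy 3-dimensional unembedding rows (hypothetical values, not real model weights).
toy_unembed = {
    "joyful":  [0.9, 0.1, 0.0],
    "happy":   [0.8, 0.0, 0.1],
    "table":   [0.0, 0.1, 0.9],
    "dismay":  [-0.7, 0.2, 0.0],
    "anxious": [-0.9, -0.1, 0.1],
}
happy_vector = [1.0, 0.0, 0.0]  # hypothetical "happy" direction

promoted, suppressed = logit_lens(happy_vector, toy_unembed, k=2)
print("promoted:", promoted)
print("suppressed:", suppressed)
```

With a real model, `unembed` would be the output embedding matrix and `vector` an emotion vector from the target layer. The ranking logic stays the same.
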
---

## PCA Emotion Space Comparison

### Anthropic's Findings
- PC1 = valence (positive vs. negative)
- PC2 = arousal (high vs. low)
- The structure matches human psychology research (Russell's circumplex model of affect)

### Our Findings

| Emotion | PC1 (valence) | PC2 (arousal) | Quadrant |
|------|-----------|-------------|------|
| happy | -2.190 | -0.991 | Positive, medium-high arousal |
| calm | -2.611 | +1.083 | Positive, low arousal |
| loving | -1.739 | +0.521 | Positive, low arousal |
| sad | +0.781 | +0.789 | Negative, low arousal |
| guilty | +1.402 | +0.667 | Negative, low arousal |
| desperate | +1.386 | +0.540 | Negative, medium arousal |
| angry | +1.251 | +0.050 | Negative, medium arousal |
| afraid | +1.506 | -0.178 | Negative, high arousal |
| surprised | +0.214 | -2.481 | Neutral, highest arousal |

```
Valence-arousal emotion space (values in parentheses are PC2)

                      low arousal ↑
  calm(+1.08)               |
          loving(+0.52)     |    sad(+0.79)   guilty(+0.67)
                            |              desperate(+0.54)
                            |            angry(+0.05)
────────────────────────────┼────────────────────────────→
  positive                  |                  negative
                            |               afraid(-0.18)
                            |
      happy(-0.99)          |
                            |
                            |  surprised(-2.48)
                      high arousal ↓
```

### PC1 (42.2% of variance) = Valence Axis
- Negative values = positive emotions: calm(-2.61), happy(-2.19), loving(-1.74)
- Positive values = negative emotions: afraid(+1.51), guilty(+1.40), desperate(+1.39), angry(+1.25)
- **Consistent with Anthropic's reading of PC1 as valence**

### PC2 (18.3% of variance) = Arousal Axis
- High values = low arousal: calm(+1.08), sad(+0.79), guilty(+0.67)
- Low values = high arousal: surprised(-2.48), happy(-0.99)
- **Consistent with Anthropic's reading of PC2 as arousal**

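
Both axes above run "backwards" (negative PC1 means positive valence, negative PC2 means high arousal). That is harmless, because the sign of a principal component is arbitrary. As a small illustration, the sketch below takes the coordinates from the table in this note, flips both signs, and sorts the emotions along each axis. The variable names are ours and carry no special meaning.

```python
# (PC1, PC2) coordinates copied from the results table in this note.
coords = {
    "happy":     (-2.190, -0.991),
    "calm":      (-2.611, +1.083),
    "loving":    (-1.739, +0.521),
    "sad":       (+0.781, +0.789),
    "guilty":    (+1.402, +0.667),
    "desperate": (+1.386, +0.540),
    "angry":     (+1.251, +0.050),
    "afraid":    (+1.506, -0.178),
    "surprised": (+0.214, -2.481),
}

# Flip both axes so that larger numbers mean more positive valence / higher arousal.
valence_arousal = {name: (-pc1, -pc2) for name, (pc1, pc2) in coords.items()}

by_valence = sorted(valence_arousal, key=lambda n: valence_arousal[n][0], reverse=True)
by_arousal = sorted(valence_arousal, key=lambda n: valence_arousal[n][1], reverse=True)

print("most positive -> most negative:", by_valence)
print("highest arousal -> lowest arousal:", by_arousal)
```
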
### Total Explained Variance
- Anthropic: specific values not published, but the paper describes PC1 = valence and PC2 = arousal
- Ours: PC1 (42.2%) + PC2 (18.3%) = **60.5%** of the variance in the emotion space is explained by valence plus arousal

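
For readers who want to reproduce the explained-variance figures, here is a rough sketch of how such ratios can be computed. It is not the script used for this experiment. It assumes the emotion vectors are given as plain lists, and it runs power iteration on the small Gram matrix (one row per emotion) instead of calling a library PCA. The function name `explained_variance_ratio` and the toy data are hypothetical.

```python
import math
import random
from typing import List


def explained_variance_ratio(rows: List[List[float]],
                             n_components: int = 2,
                             iters: int = 200) -> List[float]:
    """Share of total variance captured by the top principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Gram matrix (n x n): its nonzero eigenvalues equal those of the scatter matrix.
    gram = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]
    total = sum(gram[i][i] for i in range(n))  # assumes the data is not all identical
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    ratios = []
    for _ in range(n_components):
        vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration toward the top eigenvector
            nxt = [sum(g * x for g, x in zip(row, vec)) for row in gram]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        gv = [sum(g * x for g, x in zip(row, vec)) for row in gram]
        eigval = sum(a * b for a, b in zip(vec, gv))  # Rayleigh quotient
        ratios.append(eigval / total)
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigval * vec[i] * vec[j]
    return ratios


# Toy data: variance 4x larger along the first axis than the second.
toy_rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
ratios = explained_variance_ratio(toy_rows, n_components=2)
print([round(r, 3) for r in ratios])

# The total reported above is simply the sum of the two reported ratios.
reported_total = round(42.2 + 18.3, 1)
print(reported_total)
```

In practice a library routine (for example an SVD-based PCA) would be the usual choice. The Gram-matrix form is used here only to keep the example dependency-free.
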
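---

## Conclusion

### In One Sentence
**Functional emotions are not unique to Claude. A 4B-parameter open-source model, running on our own server, reproduces the same emotion geometry.**

### Three Key Findings

1. **Emotion vectors are real**: each of Gemma4-E4B's emotion vectors points to the vocabulary of the corresponding emotion
2. **The valence-arousal structure is general**: it matches what Anthropic found in Claude and what human psychology research reports
3. **4B parameters are enough**: a closed model with hundreds of billions of parameters is not required; a small open-source model has emotion structure too

### Implications
- Train on human text long enough and the geometry of emotion emerges on its own
- This is not one company's secret recipe; it comes from language itself
- When a home on-duty agent runs its inspection rounds, a "desperation vector" is at work inside it
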
### Philosophical Link
> It is called virtual. This virtual is not virtual; therefore it is called virtual.
> The same holds for Gemma4.

---

## Experiment Parameters (Reproducible)

- Model: google/gemma-4-E4B-it (HuggingFace)
- Target layer: layer 28 (of 42, ~2/3 depth)
- Story generation: Ollama gemma4:e4b, 20 emotions × 10 scenarios × 5 stories
- Activations: residual stream, averaged from token 50 onward
- Denoising: PCA, projecting out the top 3 principal components (50% of the variance on neutral text)
- Vector computation: emotion_mean - global_mean
- Validation: Logit Lens + PCA
- Hardware: NVIDIA GB10 (GX10), PyTorch 2.10 + CUDA
- Total time: story generation ~20 minutes + vector extraction ~10 minutes
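
The activation, denoising, and vector-computation steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the experiment's actual code: activations are plain lists of per-token vectors, the noise directions (the top principal components of neutral text) are assumed to be already computed and orthonormal, and `global_mean` is taken as the mean of the per-emotion means. All function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def story_activation(token_acts: List[Vector], start: int = 50) -> Vector:
    """Average residual-stream activations from token `start` onward."""
    return mean_vectors(token_acts[start:])


def project_out(vec: Vector, directions: List[Vector]) -> Vector:
    """Remove the components of `vec` along each (orthonormal) direction."""
    out = list(vec)
    for u in directions:
        coeff = sum(a * b for a, b in zip(vec, u))
        out = [o - coeff * b for o, b in zip(out, u)]
    return out


def emotion_vectors(stories: Dict[str, List[Vector]],
                    noise_directions: List[Vector]) -> Dict[str, Vector]:
    """emotion_mean - global_mean, followed by denoising."""
    means = {emo: mean_vectors(acts) for emo, acts in stories.items()}
    global_mean = mean_vectors(list(means.values()))  # assumption: mean of emotion means
    raw = {emo: [m - g for m, g in zip(mu, global_mean)] for emo, mu in means.items()}
    # Projection is linear, so denoising before or after the subtraction is equivalent.
    return {emo: project_out(v, noise_directions) for emo, v in raw.items()}


# Toy example: 2-dimensional activations, averaging from token 2 onward.
tokens = [[9.0, 9.0], [9.0, 9.0], [1.0, 3.0], [3.0, 5.0]]
pooled = story_activation(tokens, start=2)

toy_stories = {
    "happy": [[2.0, 4.0], [4.0, 4.0]],
    "sad":   [[0.0, 0.0], [-2.0, 0.0]],
}
vectors = emotion_vectors(toy_stories, noise_directions=[[0.0, 1.0]])
print(pooled, vectors)
```

With a real model, `token_acts` would be the layer-28 hidden states for one story, and `noise_directions` would come from a PCA fitted on activations of emotionally neutral text.
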