# Neural MRI Scanner: Implementation Specification
## Model Resonance Imaging for AI Interpretability
**Project Codename:** NeuralMRI
**Full Name:** Neural MRI: Model Resonance Imaging
**Version:** 0.1 (MVP)
**Date:** 2026-02-24
**Author:** JJ (Asia2G Capital / ModuLabs)
---
## 1. Executive Summary
Neural MRI Scanner is an AI model interpretability tool that visualizes the internals of open-source LLMs the way an MRI visualizes a brain, and lets you apply perturbations in real time and observe the resulting changes. MRI stands for **Model Resonance Imaging**: just as medical MRI (Magnetic Resonance Imaging) looks inside the brain, this tool finds and images the neurons and circuits inside an AI model that "resonate" with a specific input.
**Core idea:** Map the multimodal scan paradigm of medical imaging (T1, T2, fMRI, DTI, FLAIR) directly onto the analysis of AI model internals. Engineers and decision-makers, not only researchers, should be able to grasp intuitively "what is happening inside this model".
**Target users:**
- AI engineers (model debugging, diagnosing fine-tuning problems)
- Researchers (support for mechanistic interpretability research)
- Technical leaders/decision-makers (building intuition about model behavior)
---
## 2. MRI Modality → AI Interpretability Mapping
This is the project's core framework. It defines which aspect of an AI model each medical imaging technique reveals. The terminology of medical MRI is fully redefined for the AI context, giving the project its own vocabulary.
### Terminology Map
| Medical original | Neural MRI redefinition | Full name | Meaning |
|-----------|-------------------|--------|------|
| MRI (Magnetic Resonance Imaging) | **MRI** | **Model Resonance Imaging** | Imaging of resonance inside an AI model |
| T1-weighted | **T1** | **Topology Layer 1** | First-order structure: static architecture topology |
| T2-weighted | **T2** | **Tensor Layer 2** | Second-order structure: tensor (weight) distribution |
| fMRI (functional Magnetic Resonance Imaging) | **fMRI** | **functional Model Resonance Imaging** | Functional activation imaging |
| DTI (Diffusion Tensor Imaging) | **DTI** | **Data Tractography Imaging** | Tracing data-flow paths |
| FLAIR (Fluid-Attenuated Inversion Recovery) | **FLAIR** | **Feature-Level Anomaly Identification & Reporting** | Feature-level anomaly detection and reporting |
### 2.1 T1: Topology Layer 1 (Model Architecture)
| Item | Description |
|------|------|
| **Medical original** | T1-weighted MRI: shows the anatomical structure of tissue |
| **AI mapping** | The model's static structure: number of layers, neurons/heads per layer, parameter count |
| **Visualization** | Each layer is a node cluster sized in proportion to its parameter count. Grayscale tones |
| **Data source** | Extracted directly from `model.config` (static) |
| **Interaction** | Hovering shows layer details (hidden_size, num_heads, intermediate_size, etc.) |
### 2.2 T2: Tensor Layer 2 (Weight Distribution)
| Item | Description |
|------|------|
| **Medical original** | T2-weighted MRI: uses different timing from T1 to show a different tissue contrast |
| **AI mapping** | Distribution, magnitude, and statistical properties of the weights |
| **Visualization** | Weight magnitude of each neuron/head as a blue-scale heatmap. Brighter means larger weights |
| **Data source** | Weight tensor of each layer from `model.state_dict()` → statistics (mean, std, max, L2 norm) |
| **Interaction** | Per-layer/per-head weight histograms. Outlier weights are highlighted |
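As a rough illustration of the statistics named in the T2 data source row, the sketch below computes mean, std, max, and L2 norm for one flattened weight tensor. It uses plain Python lists so it runs without PyTorch. The helper name `summarize_weights`, the use of the population standard deviation, and reading "max" as the largest absolute value are all assumptions, not part of the spec.

```python
import math


def summarize_weights(values: list[float]) -> dict[str, float]:
    """Summary statistics for one flattened weight tensor (hypothetical helper)."""
    if not values:
        raise ValueError("weight tensor is empty")
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (assumption: the spec does not say which).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {
        "mean": mean,
        "std": std,
        "max": max(abs(v) for v in values),  # largest magnitude (assumption)
        "l2_norm": math.sqrt(sum(v * v for v in values)),
    }


stats = summarize_weights([3.0, -4.0, 0.0, 1.0])
```

In the real backend the same reductions would more likely be done with tensor operations on each `state_dict()` entry; the list version only shows what is being computed.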
### 2.3 fMRI: functional Model Resonance Imaging (Activation Patterns)
| Item | Description |
|------|------|
| **Medical original** | fMRI: shows active brain regions in real time through changes in blood flow |
| **AI mapping** | Activation pattern of each layer/neuron for a specific input (prompt) |
| **Visualization** | Cool-to-hot colormap (blue→yellow→red). Highly activated neurons appear "hot". Real-time pulse animation |
| **Data source** | TransformerLens `run_with_cache()` → per-layer activation tensors |
| **Interaction** | Changing the prompt changes the activations in real time. Token-by-token step-through is supported |
| **Core technique** | Caching at `hook_resid_post`, `hook_attn_out`, `hook_mlp_out` |
### 2.4 DTI: Data Tractography Imaging (Circuit Tracing)
| Item | Description |
|------|------|
| **Medical original** | DTI: traces nerve-fiber tracts in white matter to show connections between brain regions |
| **AI mapping** | Traces the path along which information flows (attention head → MLP → next layer) |
| **Visualization** | Directional color encoding. Only significant information-flow paths are drawn, as thick curves. Flow-direction animation |
| **Data source** | (1) Attention pattern: the attention matrix of each head. (2) Attribution patching: each component's contribution to the output |
| **Interaction** | Selecting an output token highlights the path that contributed most to that token |
| **Core technique** | TransformerLens activation patching, attention pattern extraction |
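The attribution idea in the data source row can be sketched with a toy stand-in for the model. Below, `logit_fn` is a hypothetical callable that maps per-component outputs to a target logit; in the real tool this would be a forward pass with a zeroing hook on one component. The function names and the toy component values are illustrative assumptions.

```python
from typing import Callable, Mapping

# Hypothetical stand-in for a forward pass: component outputs -> target logit.
LogitFn = Callable[[Mapping[str, float]], float]


def ablation_attribution(outputs: Mapping[str, float], logit_fn: LogitFn) -> dict[str, float]:
    """Change in the target logit when each component's output is zeroed in turn."""
    baseline = logit_fn(outputs)
    scores = {}
    for name in outputs:
        ablated = dict(outputs)
        ablated[name] = 0.0  # zero out this component only
        scores[name] = logit_fn(ablated) - baseline
    return scores


def top_paths(scores: Mapping[str, float], k: int = 2) -> list[str]:
    """Components with the largest absolute effect, most important first."""
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]


# Toy example: the target logit is a plain sum of component contributions.
toy_outputs = {
    "blocks.0.attn.head_3": 0.5,
    "blocks.5.mlp": 4.0,
    "blocks.6.attn.head_1": -1.5,
}
scores = ablation_attribution(toy_outputs, lambda o: sum(o.values()))
ranking = top_paths(scores)
```

Ranking by absolute value means a component that suppresses the target token counts as an important path too, which matches the spec's "large absolute value = important path" wording in section 6.2.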
### 2.5 FLAIR: Feature-Level Anomaly Identification & Reporting (Bias & Hallucination Detection)
| Item | Description |
|------|------|
| **Medical original** | FLAIR: emphasizes lesions so that abnormal regions show up clearly |
| **AI mapping** | The model's "problem spots": regions with high hallucination risk, bias, or uncertainty |
| **Visualization** | Normal regions are dark; anomalous regions pulse in red/pink. Intensity follows the anomaly score |
| **Data source** | (1) Logit lens: how far intermediate-layer predictions are from the final prediction. (2) Entropy: next-token prediction uncertainty at each position. (3) Activation of SAE features known to relate to bias or hallucination |
| **Interaction** | Clicking an anomalous node shows details for that neuron/feature and an estimate of related training-data patterns |
---
## 3. System Architecture
```
┌───────────────────────────────────────────────────────┐
│                   Frontend (React)                    │
│  ┌─────────────┐ ┌───────────┐ ┌────────────────────┐ │
│  │ MRI Canvas  │ │ Mode Tabs │ │ Control Panels     │ │
│  │ (D3 / SVG)  │ │ T1~FLAIR  │ │ Stim, Perturb,     │ │
│  │             │ │           │ │ Layer Summary      │ │
│  └──────┬──────┘ └───────────┘ └────────────────────┘ │
│         │ WebSocket (real-time activation stream)     │
├─────────┼─────────────────────────────────────────────┤
│         ▼      Backend (FastAPI + Python)             │
│  ┌─────────────┐ ┌───────────┐ ┌────────────────────┐ │
│  │ Model       │ │ Analysis  │ │ Perturbation       │ │
│  │ Manager     │ │ Engine    │ │ Engine             │ │
│  │ (load/      │ │ (Trans-   │ │ (activation        │ │
│  │  swap)      │ │formerLens)│ │  patching, etc.)   │ │
│  └─────────────┘ └───────────┘ └────────────────────┘ │
│         │                                             │
│  ┌──────▼──────────────────────────────────────────┐  │
│  │ Model Registry (HuggingFace Hub cache)          │  │
│  │ Llama-3.2-3B, Qwen-2.5-3B, Gemma-2-2B, etc.     │  │
│  └─────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────┘
```
### 3.1 Frontend
| Item | Technology |
|------|------|
| **Framework** | React 18+ (Vite) |
| **Visualization engine** | D3.js (SVG-based): neuron/connection rendering |
| **Real-time communication** | WebSocket (activation streaming) |
| **State management** | Zustand (lightweight) |
| **Styling** | Tailwind CSS + CSS Variables (DICOM theme) |
| **Animation** | requestAnimationFrame (canvas pulse), CSS transitions (UI) |
### 3.2 Backend
| Item | Technology |
|------|------|
| **Server** | FastAPI (Python 3.11+) |
| **Model introspection** | TransformerLens (`HookedTransformer`) |
| **SAE analysis** | SAELens (optional, Phase 2) |
| **Tensor operations** | PyTorch 2.x |
| **Model loading** | HuggingFace `transformers` + `accelerate` |
| **WebSocket** | `fastapi[websockets]` |
| **Serialization** | `orjson` (serializing large tensor data) |
### 3.3 Supported Models (MVP)
| Model | Parameters | TransformerLens support | Priority |
|------|---------|---------------------|---------|
| GPT-2 small (124M) | 124M | ✅ Officially supported | P0 (development/testing) |
| GPT-2 medium (355M) | 355M | ✅ Officially supported | P0 |
| Pythia-1.4B | 1.4B | ✅ Officially supported | P0 |
| Gemma-2-2B | 2B | ✅ Supported | P1 |
| Llama-3.2-3B | 3.21B | ⚠️ Community support | P1 |
| Qwen-2.5-3B | 3B | ⚠️ Community/custom | P1 |
| Mistral-7B-v0.3 | 7.24B | ⚠️ Community support | P2 (GPU required) |
| Phi-3-mini-3.8B | 3.8B | ⚠️ Custom work required | P2 |
> **Note:** TransformerLens is most stable with the GPT-2 and Pythia families. For Llama, Qwen, and similar models, `HookedTransformer.from_pretrained()` compatibility needs to be verified. Unsupported models can fall back to nnsight.
---
## 4. API Design
### 4.1 REST Endpoints
```
POST   /api/model/load          Load a model (HuggingFace ID or local path)
GET    /api/model/info          Structure info for the currently loaded model (T1 data)
DELETE /api/model/unload        Unload the model (free memory)
POST   /api/scan/structural     T1 scan: return static structure data
POST   /api/scan/weights        T2 scan: return weight statistics
POST   /api/scan/activation     fMRI scan: return prompt-based activations
POST   /api/scan/circuits       DTI scan: return attention + attribution paths
POST   /api/scan/anomaly        FLAIR scan: return anomaly detection results
POST   /api/perturb/zero        Zero out a specific component
POST   /api/perturb/amplify     Amplify a specific component (factor)
POST   /api/perturb/ablate      Ablate (remove) a specific component
POST   /api/perturb/inject      Inject activations at a specific location
POST   /api/perturb/patch       Activation patching (causal tracing)
POST   /api/perturb/reset       Reset perturbations (restore the original)
GET    /api/features/list       List SAE features (Phase 2)
POST   /api/features/activate   Activate/deactivate a specific SAE feature (Phase 2)
```
### 4.2 WebSocket Endpoint
```
WS /ws/stream
Client → Server:
{
  "type": "scan_stream",
  "mode": "fMRI",
  "prompt": "The capital of France is",
  "token_step": true  // if true, stream token by token
}
Server → Client:
{
  "type": "activation_frame",
  "token_idx": 3,
  "token": "capital",
  "layers": [
    {
      "layer_id": "blocks.0.attn",
      "type": "attention",
      "activations": [0.12, 0.87, ...],  // summarized per-head values
      "attention_pattern": [[...], ...]  // included in DTI mode
    },
    ...
  ]
}
```
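The two message shapes above could be handled on the server roughly as follows. This is a minimal sketch using only the standard library `json` module (the spec itself names `orjson` for serialization); the names `LayerFrame`, `build_activation_frame`, and `parse_scan_request`, the validation rules, and the `token_step` default of `False` are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass

SCAN_MODES = {"T1", "T2", "fMRI", "DTI", "FLAIR"}


@dataclass
class LayerFrame:
    layer_id: str
    type: str
    activations: list[float]


def parse_scan_request(raw: str) -> dict:
    """Validate a client scan_stream message; raises ValueError on bad input."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") != "scan_stream":
        raise ValueError("unsupported message type")
    if msg.get("mode") not in SCAN_MODES:
        raise ValueError("unknown scan mode")
    if not isinstance(msg.get("prompt"), str) or not msg["prompt"]:
        raise ValueError("prompt must be a non-empty string")
    msg.setdefault("token_step", False)  # assumed default
    return msg


def build_activation_frame(token_idx: int, token: str, layers: list[LayerFrame]) -> str:
    """Serialize one per-token frame in the activation_frame shape."""
    return json.dumps({
        "type": "activation_frame",
        "token_idx": token_idx,
        "token": token,
        "layers": [asdict(layer) for layer in layers],
    })
```

Rejecting malformed messages before any model work starts keeps a bad client request from triggering an expensive forward pass.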
### 4.3 Request/Response Schema Examples
#### POST /api/scan/activation
**Request:**
```json
{
  "prompt": "The Eiffel Tower is located in",
  "layers": "all",            // or ["blocks.3.mlp", "blocks.4.attn"]
  "aggregation": "l2_norm",   // "l2_norm" | "max" | "mean" | "raw"
  "include_residual": true,
  "token_positions": "all"    // or [0, 1, 5] (specific token positions)
}
```
**Response:**
```json
{
  "model": "gpt2-small",
  "prompt_tokens": ["The", " Eiff", "el", " Tower", " is", " located", " in"],
  "scan_mode": "fMRI",
  "data": {
    "embed": {
      "type": "embedding",
      "shape": [7, 768],
      "activations_summary": [0.45, 0.52, 0.48, 0.61, 0.33, 0.55, 0.41]
    },
    "blocks.0.attn": {
      "type": "attention",
      "num_heads": 12,
      "per_head_activation": [0.12, 0.87, 0.34, ...],
      "attention_patterns": {
        "shape": [12, 7, 7],
        "data_url": "/api/tensor/attn_0_patterns"
      }
    },
    "blocks.0.mlp": {
      "type": "mlp",
      "activation_summary": [0.22, 0.91, 0.45, ...],
      "top_neurons": [
        {"idx": 1247, "activation": 3.82, "label": null},
        {"idx": 892, "activation": 2.91, "label": null}
      ]
    }
  },
  "metadata": {
    "compute_time_ms": 342,
    "gpu_memory_mb": 1240
  }
}
```
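The `aggregation` field in the request selects how each position's activation vector is reduced to the scalars seen in the response. A minimal sketch of the four options, assuming the input is a `[seq_len, d_model]` matrix given as nested lists; the function name `aggregate` is hypothetical, and the "max" and "mean" options use absolute values, as section 6.2 describes.

```python
import math


def aggregate(activations: list[list[float]], method: str = "l2_norm"):
    """Reduce a [seq_len, d_model] activation matrix to one scalar per position."""
    if method == "raw":
        return activations  # full tensor, returned unchanged
    reducers = {
        "l2_norm": lambda row: math.sqrt(sum(x * x for x in row)),
        "max": lambda row: max(abs(x) for x in row),
        "mean": lambda row: sum(abs(x) for x in row) / len(row),
    }
    if method not in reducers:
        raise ValueError(f"unknown aggregation: {method}")
    return [reducers[method](row) for row in activations]


acts = [[3.0, -4.0], [0.0, 1.0]]
per_position = aggregate(acts, "l2_norm")
```

All three scalar reductions return one value per token position, so the frontend can treat them interchangeably when coloring nodes.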
#### POST /api/perturb/patch
**Request:**
```json
{
  "prompt": "The Eiffel Tower is located in",
  "target_token_idx": -1,
  "target_component": "blocks.5.mlp",
  "method": "zero",
  "compare_logits": true
}
```
**Response:**
```json
{
  "original_prediction": {
    "token": " Paris",
    "logit": 12.34,
    "prob": 0.87
  },
  "perturbed_prediction": {
    "token": " the",
    "logit": 8.12,
    "prob": 0.23
  },
  "logit_diff": -4.22,
  "affected_components": [
    {"id": "blocks.5.mlp", "impact_score": 0.92},
    {"id": "blocks.6.attn.head_3", "impact_score": 0.45}
  ]
}
```
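The prediction fields in this response could be derived from two logit vectors as sketched below. In the sample, `logit_diff` (-4.22) equals the perturbed top logit (8.12) minus the original top logit (12.34), so the sketch follows that reading; whether that is the intended definition is an assumption. The function names and the three-token vocabulary are illustrative only.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: list[float], vocab: list[str]) -> dict:
    """Highest-logit token with its logit and probability."""
    probs = softmax(logits)
    i = max(range(len(logits)), key=logits.__getitem__)
    return {"token": vocab[i], "logit": logits[i], "prob": probs[i]}


def compare(original: list[float], perturbed: list[float], vocab: list[str]) -> dict:
    orig = top_prediction(original, vocab)
    pert = top_prediction(perturbed, vocab)
    return {
        "original_prediction": orig,
        "perturbed_prediction": pert,
        # Perturbed top logit minus original top logit (assumed convention).
        "logit_diff": pert["logit"] - orig["logit"],
    }


vocab = [" Paris", " the", " London"]
result = compare([12.34, 8.0, 5.0], [6.0, 8.12, 5.0], vocab)
```

A stricter alternative would track the logit of the original top token in both runs, which isolates the effect on that one token instead of mixing two different tokens.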
---
## 5. Frontend Specification
### 5.1 Overall Layout
```
┌─ Top Bar ───────────────────────────────────────────────────────────────────────┐
│ [●] NEURAL MRI │ Model Resonance Imaging │ Model: [Dropdown ▾] │ GPU: 2.1GB/8GB │
├─ Mode Tabs ─────────────────────────────────────────────────────────────────────┤
│ [ T1 Topology ]  [ T2 Tensor ]  [ fMRI ]                                        │
│ [ DTI ]  [ FLAIR ]                                                              │
├────────────────────────────────────────────┬────────────────────────────────────┤
│                                            │ Layer Summary                      │
│  DICOM Header                              │ ├─ Embed: ████░░ 0.45              │
│  ┌────────────────────────┐                │ ├─ Attn1: ██████ 0.87              │
│  │                        │                │ ├─ MLP1:  ███░░░ 0.34              │
│  │  Main Scan Canvas      │                │ └─ ...                             │
│  │  (SVG/D3)              │                │                                    │
│  │                        │                │ ◉ Stimulation Panel                │
│  │  - neurons             │                │   ID: blocks.3.attn.h7             │
│  │  - connections         │                │   Activation: 0.8721               │
│  │  - flow animations     │                │   [Zero] [Amp] [Inv]               │
│  │                        │                │   [Noise] [Ablate]                 │
│  └────────────────────────┘                │                                    │
│                                            │ Comparison Panel                   │
│  PROMPT: [________________]                │   Original:  "Paris" (0.87)        │
│  [▶ SCAN] [⏸ PAUSE] [↺ RESET]              │   Perturbed: "the" (0.23)          │
│                                            │                                    │
├── Log Panel ───────────────────────────────┴────────────────────────────────────┤
│ [00:12] Scan complete - Mode: fMRI, 7 tokens processed                          │
│ [00:14] Perturbation: Zero-out on blocks.3.attn.head_7                          │
└─────────────────────────────────────────────────────────────────────────────────┘
```
### 5.2 Design System
**Theme: "Medical Dark", a DICOM viewer plus operating-room monitor aesthetic**
```css
:root {
  /* Color Palette */
  --bg-primary: #0a0c10;   /* near black, slightly blue */
  --bg-secondary: #0c0e14; /* panel background */
  --bg-surface: #12151c;   /* card/input background */
  --border: rgba(100, 170, 136, 0.15); /* medical-green border */
  --text-primary: #66aa88;   /* medical-green text */
  --text-secondary: #556;    /* gray secondary text */
  --text-data: #aabbcc;      /* data values */
  --accent-active: #00ffaa;  /* selection/active highlight */
  --scan-line: rgba(255, 255, 255, 0.04); /* scanline overlay */
  /* Mode-specific Colors (T1=Topology, T2=Tensor, fMRI=functional MRI, DTI=Data Tractography, FLAIR=Feature-Level Anomaly) */
  --t1-base: #8899aa; --t1-accent: #e0e0e0;
  --t2-base: #4488cc; --t2-accent: #aaccee;
  --fmri-cold: #1a2a5a; --fmri-warm: #cc8830; --fmri-hot: #ff4420;
  --dti-green: #44ddaa; --dti-purple: #8866ff;
  --flair-normal: #334; --flair-hot: #ff4466;
  /* Typography: monospace only */
  --font-primary: 'JetBrains Mono', 'Fira Code', 'Courier New', monospace;
  --font-size-xs: 9px;  /* logs, legends */
  --font-size-sm: 10px; /* labels, tabs */
  --font-size-md: 11px; /* body data */
  --font-size-lg: 14px; /* titles */
}
```
**Required visual elements:**
1. **Scanline overlay**: horizontal lines at 1px spacing over the canvas, opacity 0.03~0.05, for a CRT-monitor feel
2. **DICOM header**: medical-imaging-style metadata at the top of the canvas (model name, sequence, date/time, FOV, the "Model Resonance Imaging" label)
3. **Vignette effect**: the edges of the canvas darken slightly
4. **Pulse animation**: in fMRI mode, the size and brightness of activated neurons fluctuate subtly and periodically
5. **Flow animation**: in DTI mode, small particles or brightness flow along the connection lines
### 5.3 Canvas Rendering Specification
#### Neuron (Node) Rendering
```
Each neuron is drawn as a circle.
Position:
  - Y axis: layer order (top = embedding, bottom = output)
  - X axis: neurons in the same layer are spread horizontally
  - Layer spacing: 60~80px
  - Neuron spacing: adjusted automatically to the number of neurons in the layer
Size (per mode):
  - T1: proportional to parameter count (4~10px radius)
  - T2: proportional to weight magnitude
  - fMRI: base size × (0.5 + activation × 1.0) × pulse_factor
  - DTI: constant size, direction encoded by color
  - FLAIR: normal = small, anomalous = large + pulse
Color (per mode):
  - T1: grayscale (rgb(v,v,v+10), v = 160~220)
  - T2: blue scale (dark navy for small weights, bright sky blue for large ones)
  - fMRI: cool-to-hot colormap
    - activation < 0.3: dark blue rgb(30+a*80, 30+a*100, 80+a*120)
    - activation 0.3~0.6: yellow/orange rgb(a*200, a*160, 40+a*60)
    - activation > 0.6: red/white rgb(200+a*55, a*120, a*30)
  - DTI: HSL, hue = (x/width)*120 + (y/height)*120, saturation 70%, lightness 55%
  - FLAIR: normal = rgb(60,65,75), anomalous = rgb(255, 50+a*60, 80+a*40) with pulse
```
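The fMRI color and size rules and the DTI hue rule above translate directly into small pure functions. The sketch below is a Python reference version (the frontend would port it to JavaScript in `colorMaps.js`). The function names, the clamping of `a` to [0, 1], the rounding to 8-bit channels, and placing the boundary values 0.3 and 0.6 in the middle band are assumptions; the formulas themselves are copied from the spec as written.

```python
def _to_channel(x: float) -> int:
    """Round to an integer channel value and clamp to 0..255."""
    return max(0, min(255, round(x)))


def fmri_color(a: float) -> tuple[int, int, int]:
    """Cool-to-hot RGB for a normalized activation a in [0, 1]."""
    a = max(0.0, min(1.0, a))
    if a < 0.3:
        rgb = (30 + a * 80, 30 + a * 100, 80 + a * 120)
    elif a <= 0.6:
        rgb = (a * 200, a * 160, 40 + a * 60)
    else:
        rgb = (200 + a * 55, a * 120, a * 30)
    return (_to_channel(rgb[0]), _to_channel(rgb[1]), _to_channel(rgb[2]))


def fmri_radius(base: float, a: float, pulse_factor: float = 1.0) -> float:
    """Node radius in fMRI mode: base × (0.5 + activation × 1.0) × pulse_factor."""
    return base * (0.5 + a * 1.0) * pulse_factor


def dti_hue(x: float, y: float, width: float, height: float) -> float:
    """DTI hue in degrees from the node position on the canvas."""
    return (x / width) * 120 + (y / height) * 120
```

Evaluating the middle band at `a = 0.5` gives `(100, 80, 70)`, which is a muted tone, so the piecewise formulas may need tuning if a clearly yellow/orange band is wanted.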
#### Connection (Edge) Rendering
```
Connections represent the flow of information between layers.
Per-mode rendering:
  - T1: thin gray lines (opacity 0.15, width 0.5)
  - T2: opacity and thickness vary with weight magnitude
  - fMRI: color and thickness vary with the mean activation of the two endpoint neurons
    - high activation: hot color, thick line
    - low activation: nearly transparent
  - DTI: only significant pathways are shown
    - curved lines (quadratic bezier)
    - HSL color by direction
    - flow animation (opacity modulated by a sine wave)
    - non-pathway connections are nearly transparent
  - FLAIR: only edges connected to anomalous nodes are highlighted in red
```
#### Topology Layout Options (Phase 2 and later)
```
MVP: vertical layer stack (top → bottom)
Phase 2: the user can choose a layout mode
  - Stack (default): vertical layer stack
  - Brain: nodes arranged inside an elliptical brain shape (cortical mapping analogy)
  - Network: force-directed graph (D3 force simulation)
  - Radial: layers expand outward from the center
```
### 5.4 Interaction Specification
#### Neuron Selection (Stimulation Mode)
```
1. Click a neuron → enter the selected state
2. Concentric-circle animation around the selected neuron (green glow)
3. The right panel shows details:
   - Node ID (layer.component.index)
   - Layer type (attention / mlp / embedding / output)
   - Primary value for the current mode (activation, weight, anomaly score)
   - Top-k connected neurons (strongest connections)
4. Perturbation buttons become active:
   - Zero-out: set the component's output to 0
   - Amplify 2×: double the output
   - Invert: flip the sign of the output
   - Noise ±σ: add Gaussian noise
   - Ablate: full removal (zero + gradient blocking)
5. When a perturbation is applied:
   - Send a perturbation request to the backend → receive new activations
   - The whole canvas plays a 0.3 s rescan animation
   - Changed regions are briefly highlighted
   - Before/after is shown in the Comparison Panel on the right
```
#### Prompt Input & Scan
```
1. Enter a prompt → click SCAN (or press Enter)
2. A scan progress bar is shown (reflecting actual backend processing time)
3. Per-token activations are streamed over WebSocket
4. Token step-through:
   - Each token is shown as a chip in the prompt area
   - Click a token chip → show only the activations at that token
   - ← → arrow keys move between tokens
   - Auto-play (0.5 s interval)
```
#### Mode Switching
```
1. Click a mode tab → 0.3 s crossfade transition
2. The topology (neuron positions) stays the same; only color, size, and connection rendering change
3. This mirrors switching from T1 to fMRI for the same patient in a real MRI
```
### 5.5 Responsive Considerations
```
- Minimum supported resolution: 1280×720
- Recommended resolution: 1920×1080
- Canvas size: scales to fit the container (using SVG viewBox)
- Mobile: not supported (desktop-only tool)
```
---
## 6. Backend Specification
### 6.1 Model Manager (ModelManager)
```python
from __future__ import annotations

from transformer_lens import HookedTransformer

from neural_mri.schemas.model import ModelInfo  # schemas/model.py in section 8.4


class ModelManager:
    """Model loading, swapping, and memory management."""

    def load_model(self, model_id: str, device: str = "auto") -> ModelInfo:
        """
        Load a HuggingFace model as a TransformerLens HookedTransformer.
        - model_id: "gpt2", "EleutherAI/pythia-1.4b", "meta-llama/Llama-3.2-3B", etc.
        - device: "cpu", "cuda", "mps", "auto"
        - Returns: model metadata (number of layers, hidden size, number of heads, etc.)
        """

    def unload_model(self) -> None:
        """Unload the current model and free GPU memory (gc + torch.cuda.empty_cache)."""

    def get_model_info(self) -> ModelInfo:
        """Return architecture info for the currently loaded model (T1 data)."""

    def get_model(self) -> HookedTransformer:
        """Return the currently loaded model instance."""
```
### 6.2 Analysis Engine (AnalysisEngine)
```python
from __future__ import annotations

# Result types live in schemas/scan.py (section 8.4).
from neural_mri.schemas.scan import (
    ActivationData, AnomalyData, CircuitData, StructuralData, WeightData,
)


class AnalysisEngine:
    """Runs the analysis for each scan mode."""

    def scan_structural(self) -> StructuralData:
        """T1: extract the static structure from model.cfg."""

    def scan_weights(self, layers: list[str] | None = None) -> WeightData:
        """T2: extract weight statistics from state_dict."""

    def scan_activation(self, prompt: str, **kwargs) -> ActivationData:
        """
        fMRI: cache the activations for a prompt.
        Uses TransformerLens run_with_cache().
        Core implementation:
            logits, cache = model.run_with_cache(prompt)
        Hook points to extract:
            - hook_embed: embedding layer
            - blocks.{i}.hook_resid_pre: residual input to each block
            - blocks.{i}.attn.hook_result: attention output
              (per-head; only populated when use_attn_result is enabled)
            - blocks.{i}.hook_mlp_out: MLP output
            - blocks.{i}.hook_resid_post: residual output of each block
        aggregation options:
            - "l2_norm": L2 norm per position (scalar)
            - "max": max absolute value
            - "mean": mean absolute value
            - "raw": return the full tensor (large, optional)
        """

    def scan_circuits(self, prompt: str, target_token: int = -1) -> CircuitData:
        """
        DTI: extract attention patterns + attribution paths.
        (1) Attention Pattern:
            _, cache = model.run_with_cache(prompt)
            attn_patterns = cache["blocks.{i}.attn.hook_pattern"]
            → shape: [num_heads, seq_len, seq_len] (after dropping the batch dimension)
        (2) Attribution (simplified version):
            Compute the change in the target logit when the output of each
            head/MLP is zeroed out.
            → components with a large absolute change = important paths
        """

    def scan_anomaly(self, prompt: str) -> AnomalyData:
        """
        FLAIR: anomaly detection.
        (1) Logit Lens:
            Unembed the residual stream of each intermediate layer and compute
            the KL divergence between the intermediate and final predictions.
            Large divergence = the prediction shifts sharply at that layer = potential anomaly
        (2) Entropy:
            softmax over the logits at each position → compute entropy.
            High entropy = the model is uncertain = hallucination risk
        (3) Anomaly score:
            anomaly_score = α * normalized_kl_div + β * normalized_entropy
            α = 0.6, β = 0.4 (tunable)
        """
```
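The FLAIR score in `scan_anomaly` can be sketched numerically as follows. The spec fixes only the weighted sum with α = 0.6 and β = 0.4; everything else here is an assumption made for illustration: min-max normalization, the KL direction (final prediction against intermediate prediction), and that both signals have already been reduced to one value per token position. The function names are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q); here p = final prediction, q = intermediate (assumed direction)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def anomaly_scores(kl_divs: list[float], entropies: list[float],
                   alpha: float = 0.6, beta: float = 0.4) -> list[float]:
    """anomaly_score = α * normalized_kl_div + β * normalized_entropy, per position."""
    return [alpha * k + beta * e for k, e in zip(min_max(kl_divs), min_max(entropies))]


scores = anomaly_scores([0.0, 1.0, 2.0], [0.5, 0.5, 1.5])
```

Because min-max normalization is relative to the current prompt, scores from different prompts are not directly comparable; a fixed reference scale (for example dividing entropy by the log of the vocabulary size) would be one alternative.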
### 6.3 Perturbation Engine (PerturbationEngine)
```python
from __future__ import annotations

from neural_mri.schemas.perturb import PatchResult, PerturbResult  # section 8.4


class PerturbationEngine:
    """Applies stimulation/perturbation inside the model and compares the results."""

    def zero_out(self, component: str, prompt: str) -> PerturbResult:
        """
        Set the output of a component to 0 and re-run.
        Implementation:
            def zero_hook(value, hook):
                value[:, :, :] = 0  # or only a specific head
                return value
            model.run_with_hooks(prompt, fwd_hooks=[(component, zero_hook)])
        """

    def amplify(self, component: str, factor: float, prompt: str) -> PerturbResult:
        """Amplify the output by multiplying it by factor."""

    def ablate(self, component: str, prompt: str) -> PerturbResult:
        """Remove the component entirely (mean ablation: replace with the mean value)."""

    def inject_activation(self, component: str, values: list, prompt: str) -> PerturbResult:
        """Inject specific activation values directly."""

    def activation_patch(
        self,
        clean_prompt: str,
        corrupt_prompt: str,
        component: str
    ) -> PatchResult:
        """
        Activation Patching (Causal Tracing).
        Swap the clean_prompt activation of a component into the
        corrupt_prompt run and measure how much of the output is recovered.
        Implementation:
            _, clean_cache = model.run_with_cache(clean_prompt)
            clean_activation = clean_cache[component]
            def patch_hook(value, hook):
                # assumes both prompts tokenize to the same length
                value[:] = clean_activation
                return value
            patched_logits = model.run_with_hooks(
                corrupt_prompt,
                fwd_hooks=[(component, patch_hook)]
            )
            recovery = (patched_logit - corrupt_logit) / (clean_logit - corrupt_logit)
        """

    def compare_results(self, original: Logits, perturbed: Logits) -> ComparisonData:
        """Compare original and perturbed results: top-k predictions, logit diff, KL divergence."""
```
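The recovery formula at the end of `activation_patch` is easy to get wrong at the edges, so a small standalone version may help. This sketch takes the three target-token logits as plain floats; the function name and the choice to raise an error when the clean and corrupt logits coincide are assumptions.

```python
def patching_recovery(clean_logit: float, corrupt_logit: float, patched_logit: float) -> float:
    """Fraction of the clean-vs-corrupt logit gap restored by patching.

    0.0 means the patch changed nothing; 1.0 means the clean behaviour
    was fully restored. Values outside [0, 1] are possible.
    """
    gap = clean_logit - corrupt_logit
    if gap == 0:
        raise ValueError("clean and corrupt logits are equal; recovery is undefined")
    return (patched_logit - corrupt_logit) / gap


recovery = patching_recovery(clean_logit=12.0, corrupt_logit=8.0, patched_logit=11.0)
```

A component whose patch yields a recovery near 1.0 carries most of the information that distinguishes the clean prompt from the corrupt one.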
### 6.4 Data Summarization Strategy
Strategy for summarizing large tensors before sending them to the frontend:
```
Problem: even for GPT-2 small, the full activation cache for a single prompt is hundreds of MB.
Solution:
1. Default response: send only per-layer/per-head summary statistics (L2 norm, max, mean → scalar arrays)
2. On demand: when the user selects a specific layer/head, send detailed data for that part only
3. Attention patterns: the full attention matrix is sent only on request (shape: [heads, seq, seq])
4. Streaming: during token step-through, send only each token's data incrementally
5. Caching: caches for identical prompts are kept in server memory (LRU, at most 5 prompts)
```
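Item 5 above (an LRU cache holding at most 5 prompts) could look roughly like this. It is a minimal sketch built on `collections.OrderedDict`; the class name `PromptCache` and keying on the prompt string alone are assumptions.

```python
from collections import OrderedDict
from typing import Any, Optional


class PromptCache:
    """Least-recently-used cache of per-prompt activation data."""

    def __init__(self, max_prompts: int = 5) -> None:
        self.max_prompts = max_prompts
        self._entries: OrderedDict[str, Any] = OrderedDict()

    def get(self, prompt: str) -> Optional[Any]:
        if prompt not in self._entries:
            return None
        self._entries.move_to_end(prompt)  # mark as most recently used
        return self._entries[prompt]

    def put(self, prompt: str, value: Any) -> None:
        self._entries[prompt] = value
        self._entries.move_to_end(prompt)
        if len(self._entries) > self.max_prompts:
            self._entries.popitem(last=False)  # evict the least recently used

    def __len__(self) -> int:
        return len(self._entries)


cache = PromptCache(max_prompts=5)
for i in range(5):
    cache.put(f"prompt {i}", {"id": i})
cache.get("prompt 0")             # touch the oldest entry
cache.put("prompt 5", {"id": 5})  # evicts "prompt 1"
```

If several models can be loaded over a session, including the model ID in the key and clearing the cache on unload would avoid serving activations from the wrong model.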
---
## 7. Implementation Phases
### Phase 0: Foundation (1~2 weeks)
```
Goal: set up the project structure + get T1/T2 modes working with GPT-2 small
Backend:
- [ ] FastAPI project setup (dependency management with poetry/uv)
- [ ] Implement ModelManager (load GPT-2 small)
- [ ] Implement scan_structural() → return T1 data
- [ ] Implement scan_weights() → return T2 data
- [ ] Basic REST API endpoints (/model/load, /model/info, /scan/structural, /scan/weights)
Frontend:
- [ ] Vite + React project setup
- [ ] Define the DICOM theme CSS variables
- [ ] Implement the basic layout (Top Bar, Mode Tabs, Canvas, Panels)
- [ ] T1 canvas rendering: visualize the model structure as nodes/edges
- [ ] T2 canvas rendering: weight heatmap
- [ ] Model selector dropdown
Tests:
- [ ] Load GPT-2 small → display T1 data → verify the switch to T2 mode
```
### Phase 1: Core Scanning (2~3 weeks)
```
Goal: fMRI + DTI modes working. Prompt input → activation visualization
Backend:
- [ ] TransformerLens integration (HookedTransformer.from_pretrained)
- [ ] Implement scan_activation() → return fMRI data
- [ ] Implement scan_circuits() → return DTI data
- [ ] WebSocket endpoint (per-token activation streaming)
- [ ] Data summarization/serialization pipeline (orjson)
Frontend:
- [ ] fMRI canvas: cool-to-hot colormap, pulse animation
- [ ] DTI canvas: curved paths, directional colors, flow animation
- [ ] Prompt input UI + SCAN button + progress bar
- [ ] Token step-through UI (token chips + arrow navigation)
- [ ] Layer Summary bar chart (adapts per mode)
- [ ] WebSocket connection + real-time updates
Tests:
- [ ] "The capital of France is" → in fMRI, confirm that related neurons activate at the "France" token
- [ ] In DTI, confirm that significant information-flow paths are visualized
```
### Phase 2: Perturbation + FLAIR (2~3 weeks)
```
Goal: stimulation/perturbation experiments + anomaly detection
Backend:
- [ ] Full PerturbationEngine implementation (zero, amplify, ablate, inject, patch)
- [ ] Implement scan_anomaly() (logit lens + entropy)
- [ ] Implement compare_results() (before/after comparison)
- [ ] Implement activation patching (causal tracing)
Frontend:
- [ ] FLAIR canvas: anomalous-region highlighting, pulse animation
- [ ] Stimulation Panel: click a neuron → details + perturbation buttons
- [ ] Comparison Panel: original vs perturbed results side by side
- [ ] Rescan animation when a perturbation is applied
- [ ] Reset (clear all perturbations)
- [ ] Scanline overlay + vignette effect
Tests:
- [ ] Zero out a specific attention head → confirm the prediction changes
- [ ] "The Eiffel Tower is in" → ablate fact-related components → confirm that a hallucination is induced
- [ ] Confirm that high-entropy positions are highlighted correctly in FLAIR
```
### Phase 3: Polish + Multi-Model (2 weeks)
```
Goal: multi-model support + finished UX
Backend:
- [ ] Add and test support for Pythia-1.4B and Gemma-2-2B
- [ ] Llama-3.2-3B support (verify TransformerLens compatibility; fall back to nnsight if needed)
- [ ] Optimize memory management during model swaps
- [ ] API response caching layer
Frontend:
- [ ] Crossfade animation on mode switch
- [ ] Neuron hover tooltips
- [ ] Finish the overall scanline + CRT aesthetic
- [ ] Performance optimization (hold 60fps on large graphs)
- [ ] Error/loading state UX
Tests:
- [ ] Confirm there are no memory leaks when swapping models
- [ ] Confirm the full scan pipeline end to end on a 3B model
```
### Phase 4: Advanced Features (future)
```
- [ ] SAE feature explorer (SAELens integration)
- [ ] Brain layout mode (cortical mapping)
- [ ] Multi-prompt comparison (activation differences for different inputs to the same model)
- [ ] Time-series recording/playback (save scan sessions)
- [ ] Export: save scan results as images/video
- [ ] Collaboration: multiple users share the same scan session
- [ ] Automated diagnosis: generate "this model may have the following problems" reports
```
---
## 8. Development Environment
### 8.1 Requirements
```
- Python 3.11+
- Node.js 20+
- GPU: NVIDIA GPU with 8GB+ VRAM (recommended). CPU-only also works (GPT-2 small only)
- CUDA 12.x (when using a GPU)
- Memory: 16GB+ RAM
```
### 8.2 Backend Dependencies
```toml
[project]
name = "neural-mri"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.110",
    "uvicorn[standard]>=0.27",
    "websockets>=12.0",
    "transformer-lens>=2.0",
    "torch>=2.2",
    "transformers>=4.40",
    "accelerate>=0.28",
    "sae-lens>=3.0",  # Phase 2
    "orjson>=3.9",
    "numpy>=1.26",
    "pydantic>=2.6",
]
```
### 8.3 Frontend Dependencies
```json
{
"dependencies": {
"react": "^18.3",
"react-dom": "^18.3",
"d3": "^7.9",
"zustand": "^4.5",
"react-use-websocket": "^4.8"
},
"devDependencies": {
"vite": "^5.4",
"@vitejs/plugin-react": "^4.2",
"tailwindcss": "^3.4",
"autoprefixer": "^10.4",
"postcss": "^8.4"
}
}
```
### 8.4 Project Structure
```
neural-mri/
├── README.md
├── docker-compose.yml
│
├── backend/
│   ├── pyproject.toml
│   ├── neural_mri/
│   │   ├── __init__.py
│   │   ├── main.py                    # FastAPI app entry
│   │   ├── config.py                  # Settings (model paths, cache, GPU)
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   ├── routes_model.py        # /api/model/* routes
│   │   │   ├── routes_scan.py         # /api/scan/* routes
│   │   │   ├── routes_perturb.py      # /api/perturb/* routes
│   │   │   └── ws_stream.py           # WebSocket handler
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── model_manager.py
│   │   │   ├── analysis_engine.py
│   │   │   └── perturbation_engine.py
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── model.py               # ModelInfo, ModelConfig
│   │   │   ├── scan.py                # ActivationData, CircuitData, etc.
│   │   │   └── perturb.py             # PerturbResult, PatchResult
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── tensor_summary.py      # Tensor → summary conversion
│   │       └── serialization.py       # Custom orjson serialization
│   └── tests/
│       ├── test_model_manager.py
│       ├── test_analysis.py
│       └── test_perturbation.py
│
├── frontend/
│   ├── package.json
│   ├── vite.config.js
│   ├── tailwind.config.js
│   ├── index.html
│   ├── src/
│   │   ├── main.jsx
│   │   ├── App.jsx
│   │   ├── theme/
│   │   │   ├── variables.css          # DICOM theme CSS variables
│   │   │   └── globals.css
│   │   ├── store/
│   │   │   ├── useModelStore.js
│   │   │   ├── useScanStore.js
│   │   │   └── usePerturbStore.js
│   │   ├── components/
│   │   │   ├── TopBar.jsx
│   │   │   ├── ModeTabs.jsx
│   │   │   ├── DicomHeader.jsx
│   │   │   ├── ScanCanvas/
│   │   │   │   ├── ScanCanvas.jsx           # Main SVG canvas
│   │   │   │   ├── NeuronRenderer.jsx       # Neuron rendering logic
│   │   │   │   ├── ConnectionRenderer.jsx   # Edge rendering logic
│   │   │   │   ├── ScanLineOverlay.jsx      # CRT scanlines
│   │   │   │   └── colorMaps.js             # Per-mode color functions
│   │   │   ├── Panels/
│   │   │   │   ├── LayerSummary.jsx         # Per-layer bar chart
│   │   │   │   ├── StimPanel.jsx            # Neuron selection + perturbation
│   │   │   │   ├── ComparisonPanel.jsx      # Before/after comparison
│   │   │   │   └── LogPanel.jsx             # Bottom log
│   │   │   ├── PromptInput.jsx
│   │   │   └── TokenStepper.jsx       # Per-token step-through
│   │   ├── hooks/
│   │   │   ├── useWebSocket.js
│   │   │   └── useAnimationFrame.js
│   │   └── api/
│   │       ├── client.js              # REST API client
│   │       └── ws.js                  # WebSocket client
│   └── public/
│       └── fonts/                     # JetBrains Mono
│
└── docs/
    ├── SPEC.md                        # This document
    ├── API.md                         # Detailed API documentation
    └── ARCHITECTURE.md                # Architecture diagrams
```
---
## 9. Key Technical Decisions & Risks
### 9.1 TransformerLens Compatibility
```
Risk: TransformerLens officially supports only a subset of models, such as GPT-2 and Pythia.
Llama, Qwen, and similar models rely on community implementations and may break between versions.
Mitigation:
1. Start the MVP with GPT-2 small/medium + Pythia (reliably supported)
2. Write a from_pretrained() compatibility test script whenever a new model is added
3. Fall back to an nnsight backend for models TransformerLens does not support
4. Hook point names may differ per model, so an abstraction layer is needed
```
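The abstraction layer in item 4 could look roughly like the sketch below. It is only an illustration: `HookResolver` and the logical component names (`resid_post`, `attn_pattern`, and so on) are hypothetical and not part of this spec, while the right-hand templates follow TransformerLens's `blocks.{layer}...` hook naming. An nnsight backend would presumably supply its own template table.

```python
from dataclasses import dataclass

# Logical component name (hypothetical) -> TransformerLens hook-name template.
TRANSFORMER_LENS_HOOKS = {
    "resid_post": "blocks.{layer}.hook_resid_post",
    "attn_pattern": "blocks.{layer}.attn.hook_pattern",
    "attn_out": "blocks.{layer}.hook_attn_out",
    "mlp_out": "blocks.{layer}.hook_mlp_out",
}


@dataclass(frozen=True)
class HookResolver:
    """Maps backend-neutral component names to backend-specific hook names."""

    templates: dict
    n_layers: int

    def resolve(self, component: str, layer: int) -> str:
        # Fail loudly on unknown components or layers instead of silently
        # producing a hook name that the backend would never fire.
        if component not in self.templates:
            raise KeyError(f"unsupported component: {component!r}")
        if not 0 <= layer < self.n_layers:
            raise IndexError(f"layer {layer} out of range for {self.n_layers} layers")
        return self.templates[component].format(layer=layer)


# GPT-2 small has 12 transformer blocks.
gpt2_small = HookResolver(TRANSFORMER_LENS_HOOKS, n_layers=12)
print(gpt2_small.resolve("attn_pattern", 3))
```

With this shape, the analysis and perturbation engines would only ever ask for logical names, and a per-model compatibility test could simply iterate over every `(component, layer)` pair and check that the resolved hook exists.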
### 9.2 Performance
```
Risk: The full activation cache of a 3B+ model can reach several GB.
Mitigation:
1. Summary-first strategy: by default, send only per-layer/per-head statistics instead of full tensors
2. Lazy loading: send detailed data only when the user selects a specific layer
3. Server-side caching: keep a cache of results for identical prompts (LRU, 5 entries)
4. Token streaming: process the whole sequence in one pass, but send results to the frontend token by token
5. GPU memory: automatically offload to CPU when model + cache exceed VRAM
```
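Items 1 and 3 can be sketched with the standard library alone. The following is a minimal illustration, not the project's implementation: `ScanCache`, `make_key`, and `summarize` are hypothetical names, and the choice of statistics (mean, max absolute value, L2 norm) is an assumption about what a "summary" might contain.

```python
import hashlib
import math
from collections import OrderedDict


def summarize(values):
    """Reduce a flat list of activations to a few statistics (assumed set)."""
    return {
        "mean": sum(values) / len(values),
        "max_abs": max(abs(v) for v in values),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
    }


class ScanCache:
    """Least-recently-used cache for per-prompt scan summaries."""

    def __init__(self, max_entries=5):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def make_key(model_name, prompt):
        # Hash model + prompt so keys stay short even for long prompts.
        raw = f"{model_name}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, model_name, prompt):
        key = self.make_key(model_name, prompt)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, model_name, prompt, summary):
        key = self.make_key(model_name, prompt)
        self._entries[key] = summary
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._entries)
```

Caching the small summary dictionaries rather than raw tensors keeps the five cached entries cheap; whether full activation caches should also be retained is a separate memory trade-off that this sketch does not address.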
### 9.3 Visualization Performance
```
Risk: SVG rendering can slow down when there are hundreds of nodes/edges.
Mitigation:
1. Aggregated representation: represent nodes at the "head" or "layer component" level rather than as individual neurons
(GPT-2 small: 12 layers × 3 components = 36 nodes + embedding + output)
2. Viewport culling: render only the nodes visible on screen
3. Edge simplification: depending on the mode, skip rendering inactive edges entirely
4. Canvas switch: move to WebGL (Three.js) or Canvas 2D if SVG hits its performance limits
```
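The node budget in item 1 and the culling rule in item 2 can be checked with a short sketch. The three per-layer component names used here (`attn`, `mlp`, `resid`) are an assumption, since this section only states that there are three; the function names and the rectangular viewport format are likewise hypothetical. The culling logic is language-neutral and would live in the frontend in practice.

```python
def build_component_nodes(n_layers, components=("attn", "mlp", "resid")):
    """Build one node per layer component, plus embedding and output nodes."""
    nodes = [{"id": "embed", "layer": None, "kind": "embedding"}]
    for layer in range(n_layers):
        for kind in components:
            nodes.append({"id": f"L{layer}.{kind}", "layer": layer, "kind": kind})
    nodes.append({"id": "unembed", "layer": None, "kind": "output"})
    return nodes


def visible_nodes(nodes, positions, viewport):
    """Keep only nodes whose (x, y) position lies inside the viewport.

    viewport is (x_min, y_min, x_max, y_max) in the same coordinates as positions.
    """
    x_min, y_min, x_max, y_max = viewport
    kept = []
    for node in nodes:
        x, y = positions[node["id"]]
        if x_min <= x <= x_max and y_min <= y <= y_max:
            kept.append(node)
    return kept


nodes = build_component_nodes(n_layers=12)  # GPT-2 small
print(len(nodes))  # 12 * 3 + 2 = 38
```

At 38 nodes, GPT-2 small stays far below the "hundreds of nodes" range where SVG is expected to struggle, which suggests culling and the Canvas fallback matter mainly for larger models or per-head views.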
### 9.4 Perturbation Safety
```
Risk: If a perturbation modifies the model weights themselves, recovery is difficult.
Mitigation:
1. Use run_with_hooks() only: never modify model weights. Alter activations only, via hooks.
2. Reset button: remove all hooks and return to the original state
3. All perturbations are stateless: hooks are set up anew for each request
```
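A stateless perturbation along these lines might be sketched as follows. `make_scale_hook` and `run_perturbed` are hypothetical helper names; the only TransformerLens API assumed is `run_with_hooks(..., fwd_hooks=[(hook_name, hook_fn)])`, where the hook function receives the activation and a `hook` argument and returns the replacement activation. Scaling a whole component is just one possible perturbation, chosen here because it is easy to verify.

```python
def make_scale_hook(factor):
    """Return a hook that scales an activation.

    factor=0.0 zero-ablates the component; factor > 1.0 amplifies it.
    """

    def hook_fn(activation, hook=None):
        # Return a new value; no model parameter is read or written here.
        return activation * factor

    return hook_fn


def run_perturbed(model, tokens, hook_name, factor):
    """Run one forward pass with a temporary hook built for this request only."""
    return model.run_with_hooks(
        tokens,
        fwd_hooks=[(hook_name, make_scale_hook(factor))],
    )


# Example usage (requires transformer-lens and a model download, so left commented):
# from transformer_lens import HookedTransformer
# model = HookedTransformer.from_pretrained("gpt2")
# tokens = model.to_tokens("The capital of France is")
# ablated_logits = run_perturbed(model, tokens, "blocks.0.hook_mlp_out", 0.0)
```

Because each request builds its own hook and passes it through `fwd_hooks`, the server does not need to keep perturbation state between requests, and a "Reset" action reduces to running the model again without hooks.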
---
## 10. Success Metrics
### MVP (on completion of Phases 0-2)
```
1. All 5 modes work on GPT-2 small
2. Prompt input → scan completion within 2 seconds (on GPU)
3. Token step-through runs smoothly (no dropped frames)
4. Perturbation applied → result comparison within 1 second
5. Mode switches complete within 0.3 seconds while preserving the topology
```
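The latency targets above lend themselves to an automated check. The sketch below is one possible shape, assuming hypothetical stage names (`scan`, `perturb`, `mode_switch`) mapped to the 2 s, 1 s, and 0.3 s budgets; how each stage is actually measured (backend timer, browser performance API) is left open.

```python
import time
from contextlib import contextmanager

# Budgets in seconds, taken from the MVP metrics; the keys are hypothetical.
LATENCY_BUDGETS_S = {"scan": 2.0, "perturb": 1.0, "mode_switch": 0.3}


@contextmanager
def timed(name, results):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = time.perf_counter() - start


def over_budget(results, budgets=LATENCY_BUDGETS_S):
    """Return the measured stages that exceeded their budget."""
    return {
        name: elapsed
        for name, elapsed in results.items()
        if name in budgets and elapsed > budgets[name]
    }


measurements = {}
with timed("scan", measurements):
    pass  # a real check would run the scan pipeline here
print(over_budget(measurements))
```

A check like this could run in the backend test suite against GPT-2 small, so a regression past the 2-second scan target surfaces as a failing test rather than a subjective slowdown.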
### Extended (Phase 3 and later)
```
1. Full pipeline within 5 seconds on a 3B model
2. Support for at least 3 open-source models
3. Activation patching (causal tracing) visualizations at paper-figure quality
```
---
## 11. References
### Core Libraries
- TransformerLens: https://github.com/TransformerLensOrg/TransformerLens
- SAELens: https://github.com/jbloomAus/SAELens
- nnsight: https://github.com/ndif-team/nnsight
### Key Papers/Resources
- Elhage et al. (2022), "Toy Models of Superposition": superposition theory
- Wang et al. (2022), "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small": circuit analysis (IOI circuit)
- Meng et al. (2022), "Locating and Editing Factual Associations in GPT" (ROME: Rank-One Model Editing): tracing where facts are stored
- Anthropic (2024), "Scaling Monosemanticity": SAE feature extraction
- Neel Nanda's TransformerLens tutorials: https://neelnanda.io/
### Inspiration
- 3D Slicer (medical image visualization): https://www.slicer.org/
- FreeSurfer (brain image analysis): https://surfer.nmr.mgh.harvard.edu/
- Neuronpedia (SAE feature explorer): https://www.neuronpedia.org/
---
*End of Specification*