Neural MRI Scanner: Implementation Specification

Model Resonance Imaging for AI Interpretability

Project Codename: NeuralMRI
Full Name: Neural MRI - Model Resonance Imaging
Version: 0.1 (MVP)
Date: 2026-02-24
Author: JJ (Asia2G Capital / ModuLabs)


1. Executive Summary

Neural MRI Scanner is an interpretability tool for AI models. It visualizes the internals of open-source LLMs the way an MRI visualizes a brain, and lets the user apply perturbations in real time and observe what changes. MRI here stands for Model Resonance Imaging: just as medical MRI (Magnetic Resonance Imaging) looks inside the brain, the tool finds and images the neurons and circuits inside a model that "resonate" with a given input.

Core idea: map the multimodal scanning paradigm of medical imaging (T1, T2, fMRI, DTI, FLAIR) directly onto the analysis of model internals, so that engineers and decision-makers, not only researchers, can intuitively grasp "what is happening inside this model".

Target users:

  • AI engineers (model debugging, diagnosing fine-tuning problems)
  • Researchers (support for mechanistic interpretability research)
  • Technical leads and decision-makers (building intuition about model behavior)

2. MRI Modality → AI Interpretability Mapping

This is the project's core framework. It defines which aspect of an AI model each medical imaging technique reveals. The terminology of medical MRI is fully redefined for the AI context, giving the project its own vocabulary.

Terminology Map

| Medical original | Neural MRI redefinition | Full name | Meaning |
|---|---|---|---|
| MRI (Magnetic Resonance Imaging) | MRI | Model Resonance Imaging | Resonance imaging of AI model internals |
| T1-weighted | T1 | Topology Layer 1 | Primary structure: static architecture topology |
| T2-weighted | T2 | Tensor Layer 2 | Secondary structure: tensor (weight) distribution |
| fMRI (functional Magnetic Resonance Imaging) | fMRI | functional Model Resonance Imaging | Functional activation imaging |
| DTI (Diffusion Tensor Imaging) | DTI | Data Tractography Imaging | Tracing of data flow paths |
| FLAIR (Fluid-Attenuated Inversion Recovery) | FLAIR | Feature-Level Anomaly Identification & Reporting | Feature-level anomaly detection and reporting |

2.1 T1: Topology Layer 1 (Model Architecture)

| Item | Description |
|---|---|
| Medical original | T1-weighted MRI: shows the anatomical structure of tissue |
| AI mapping | The model's static structure: number of layers, neurons/heads per layer, parameter count |
| Visualization | Each layer is a cluster of nodes sized in proportion to its parameter count. Grayscale tones |
| Data source | Extracted directly from model.config (static) |
| Interaction | On hover, show layer details (hidden_size, num_heads, intermediate_size, etc.) |

2.2 T2: Tensor Layer 2 (Weight Distribution)

| Item | Description |
|---|---|
| Medical original | T2-weighted MRI: uses different timing than T1 to show a different tissue contrast |
| AI mapping | Distribution, magnitude, and statistical properties of the weights |
| Visualization | Weight magnitude of each neuron/head as a blue-scale heatmap. Brighter means larger weights |
| Data source | Weight tensor of each layer from model.state_dict() → statistics (mean, std, max, L2 norm) |
| Interaction | Per-layer and per-head weight histograms. Outlier weights are highlighted |
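As a rough illustration of the T2 statistics listed above, the sketch below computes mean, std, max, and L2 norm for a flattened list of weights. It uses only the standard library so it runs anywhere; in the backend the values would come from the tensors in `model.state_dict()`. The function name `weight_stats`, the use of the population standard deviation, and taking the max over absolute values are assumptions, not part of the spec.

```python
import math
import statistics


def weight_stats(weights: list[float]) -> dict[str, float]:
    """Summarize one flattened weight tensor into the four T2 statistics."""
    if not weights:
        raise ValueError("weights must not be empty")
    return {
        "mean": statistics.fmean(weights),
        # Population std (assumption): the spec does not say which variant to use.
        "std": statistics.pstdev(weights),
        # Max over absolute values (assumption), so large negative weights count too.
        "max": max(abs(w) for w in weights),
        "l2_norm": math.sqrt(sum(w * w for w in weights)),
    }


stats = weight_stats([3.0, -4.0])
print(stats)
```

Outlier highlighting could then flag weights whose distance from `mean` exceeds some multiple of `std`; the threshold is left open here because the spec does not fix one.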

2.3 fMRI: functional Model Resonance Imaging (Activation Patterns)

| Item | Description |
|---|---|
| Medical original | fMRI: shows active brain regions in real time through changes in blood flow |
| AI mapping | Activation pattern of each layer/neuron for a specific input (prompt) |
| Visualization | Cool-to-hot colormap (blue → yellow → red). Highly activated neurons are shown "hot". Real-time pulse animation |
| Data source | TransformerLens run_with_cache() → activation tensor for each layer |
| Interaction | Changing the prompt updates the activations in real time. Token-by-token step-through is supported |
| Key technique | Caching at hook_resid_post, hook_attn_out, hook_mlp_out |

2.4 DTI: Data Tractography Imaging (Circuit Tracing)

| Item | Description |
|---|---|
| Medical original | DTI: traces nerve fiber tracts in white matter to show connections between brain regions |
| AI mapping | Traces the path along which information flows (attention head → MLP → next layer) |
| Visualization | Directional color encoding. Only significant information-flow paths are drawn, as thick curves. Flow-direction animation |
| Data source | (1) Attention pattern: the attention matrix of each head. (2) Attribution patching: each component's contribution to the output |
| Interaction | Selecting an output token highlights the paths that contributed most to that token |
| Key technique | TransformerLens activation patching and attention pattern extraction |
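The "contribution to the output" idea can be illustrated with a small ranking step: given the target-token logit of an unmodified run and the logit obtained after zeroing each component in turn, the components with the largest absolute change are treated as the significant pathway. This is a sketch under assumptions; `rank_by_attribution` and the numbers below are hypothetical, and the logits themselves would come from the model runs, which are not shown.

```python
def rank_by_attribution(
    baseline_logit: float,
    ablated_logits: dict[str, float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Rank components by |change in target logit| when each one is zeroed out."""
    deltas = {name: logit - baseline_logit for name, logit in ablated_logits.items()}
    ranked = sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]


# Hypothetical numbers: target logit after zeroing each component in turn.
example = {
    "blocks.5.mlp": 8.0,
    "blocks.6.attn.head_3": 11.0,
    "blocks.0.attn.head_1": 12.5,
}
pathway = rank_by_attribution(12.0, example, top_k=2)
print(pathway)
```

Only the components returned here would be drawn as thick curves; everything else stays near-transparent.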

2.5 FLAIR: Feature-Level Anomaly Identification & Reporting (Bias & Hallucination Detection)

| Item | Description |
|---|---|
| Medical original | FLAIR: emphasizes lesions so that abnormal regions stand out clearly |
| AI mapping | The model's "problem spots": regions with high hallucination risk, bias, or uncertainty |
| Visualization | Normal regions are dark; anomalous regions pulse in red/pink, with intensity following the anomaly score |
| Data source | (1) Logit lens: how far an intermediate layer's prediction is from the final prediction. (2) Entropy: uncertainty of the next-token prediction at each position. (3) Activation of SAE features known to relate to bias or hallucination |
| Interaction | Clicking an anomalous node shows details of that neuron/feature and an estimate of related training-data patterns |
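The first two data sources can be combined into a single score, following the weighted sum given later in section 6.2 (α = 0.6 for the logit-lens KL term, β = 0.4 for the entropy term). The sketch below works on plain lists of logits. The normalization choices are assumptions, since the spec only says "normalized": entropy is divided by log(vocabulary size), and KL is capped at a hypothetical `kl_cap` and rescaled to [0, 1]. The KL direction (final prediction relative to the intermediate one) is also an assumption.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def anomaly_score(
    layer_logits: list[float],
    final_logits: list[float],
    alpha: float = 0.6,
    beta: float = 0.4,
    kl_cap: float = 10.0,  # hypothetical cap used only for normalization
) -> float:
    if len(final_logits) < 2 or len(layer_logits) != len(final_logits):
        raise ValueError("need two logit vectors of equal length >= 2")
    final_p = softmax(final_logits)
    layer_p = softmax(layer_logits)
    norm_kl = min(kl_divergence(final_p, layer_p), kl_cap) / kl_cap
    norm_entropy = entropy(final_p) / math.log(len(final_p))
    return alpha * norm_kl + beta * norm_entropy


uncertain = anomaly_score([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
confident = anomaly_score([10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0])
print(uncertain, confident)
```

With both terms in [0, 1] and α + β = 1, the score stays in [0, 1], which maps directly onto the pulse intensity described in the table.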

3. System Architecture

┌───────────────────────────────────────────────────────┐
│                    Frontend (React)                   │
│  ┌────────────┐  ┌───────────┐  ┌──────────────────┐  │
│  │ MRI Canvas │  │ Mode Tabs │  │ Control Panels   │  │
│  │ (D3 / SVG) │  │ T1~FLAIR  │  │ Stim, Perturb,   │  │
│  │            │  │           │  │ Layer Summary    │  │
│  └─────┬──────┘  └───────────┘  └──────────────────┘  │
│        │  WebSocket (real-time activation stream)     │
├────────┼──────────────────────────────────────────────┤
│        ▼          Backend (FastAPI + Python)          │
│  ┌────────────┐  ┌───────────┐  ┌──────────────────┐  │
│  │ Model      │  │ Analysis  │  │ Perturbation     │  │
│  │ Manager    │  │ Engine    │  │ Engine           │  │
│  │ (load/     │  │ (Trans-   │  │ (activation      │  │
│  │  swap)     │  │formerLens)│  │  patching, etc.) │  │
│  └────────────┘  └───────────┘  └──────────────────┘  │
│        │                                              │
│  ┌─────▼────────────────────────────────────────────┐ │
│  │  Model Registry (HuggingFace Hub cache)          │ │
│  │  Llama-3.2-3B, Qwen-2.5-3B, Gemma-2-2B, etc.     │ │
│  └──────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────┘

3.1 Frontend

| Item | Technology |
|---|---|
| Framework | React 18+ (Vite) |
| Visualization engine | D3.js (SVG-based): neuron and connection rendering |
| Real-time communication | WebSocket (activation streaming) |
| State management | Zustand (lightweight) |
| Styling | Tailwind CSS + CSS Variables (DICOM theme) |
| Animation | requestAnimationFrame (canvas pulse), CSS transitions (UI) |

3.2 Backend

| Item | Technology |
|---|---|
| Server | FastAPI (Python 3.11+) |
| Model introspection | TransformerLens (HookedTransformer) |
| SAE analysis | SAELens (optional, Phase 2) |
| Tensor operations | PyTorch 2.x |
| Model loading | HuggingFace transformers + accelerate |
| WebSocket | FastAPI WebSocket routes (websockets package) |
| Serialization | orjson (serializing large tensor data) |

3.3 Supported Models (MVP)

| Model | Parameters | TransformerLens support | Priority |
|---|---|---|---|
| GPT-2 small (124M) | 124M | ✅ Officially supported | P0 (development/testing) |
| GPT-2 medium (355M) | 355M | ✅ Officially supported | P0 |
| Pythia-1.4B | 1.4B | ✅ Officially supported | P0 |
| Gemma-2-2B | 2B | ✅ Supported | P1 |
| Llama-3.2-3B | 3.21B | ⚠️ Community support | P1 |
| Qwen-2.5-3B | 3B | ⚠️ Community/custom | P1 |
| Mistral-7B-v0.3 | 7.24B | ⚠️ Community support | P2 (GPU required) |
| Phi-3-mini-3.8B | 3.8B | ⚠️ Custom work required | P2 |

Note: TransformerLens is most stable with the GPT-2 and Pythia families. For Llama, Qwen, and similar models, compatibility with HookedTransformer.from_pretrained() needs to be verified. Unsupported models can fall back to nnsight.


4. API Design

4.1 REST Endpoints

POST   /api/model/load          Load a model (HuggingFace ID or local path)
GET    /api/model/info           Structure info for the currently loaded model (T1 data)
DELETE /api/model/unload         Unload the model (free memory)

POST   /api/scan/structural      T1 scan: return static structure data
POST   /api/scan/weights         T2 scan: return weight statistics
POST   /api/scan/activation      fMRI scan: return prompt-based activations
POST   /api/scan/circuits        DTI scan: return attention + attribution paths
POST   /api/scan/anomaly         FLAIR scan: return anomaly detection results

POST   /api/perturb/zero         Zero out a specific component
POST   /api/perturb/amplify      Amplify a specific component (factor)
POST   /api/perturb/ablate       Ablate (remove) a specific component
POST   /api/perturb/inject       Inject an activation at a specific location
POST   /api/perturb/patch        Activation patching (causal tracing)
POST   /api/perturb/reset        Reset perturbations (restore the original)

GET    /api/features/list        List SAE features (Phase 2)
POST   /api/features/activate    Activate/deactivate a specific SAE feature (Phase 2)

4.2 WebSocket Endpoint

WS  /ws/stream

Client → server:
{
  "type": "scan_stream",
  "mode": "fMRI",
  "prompt": "The capital of France is",
  "token_step": true       // if true, stream token by token
}

Server → client:
{
  "type": "activation_frame",
  "token_idx": 3,
  "token": "capital",
  "layers": [
    {
      "layer_id": "blocks.0.attn",
      "type": "attention",
      "activations": [0.12, 0.87, ...],   // summarized per-head values
      "attention_pattern": [[...], ...]     // included in DTI mode
    },
    ...
  ]
}
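The two message shapes above are written with `//` comments and `...` placeholders, which strict JSON does not allow. A minimal sketch of how the server side might validate an incoming `scan_stream` request and serialize an `activation_frame` follows. It uses the standard-library `json` module for portability (the spec names orjson for production), and the helper names and default values are assumptions.

```python
import json


def parse_scan_request(raw: str) -> dict:
    """Validate a client-to-server 'scan_stream' message and fill in defaults."""
    msg = json.loads(raw)
    if msg.get("type") != "scan_stream":
        raise ValueError("unsupported message type")
    if not isinstance(msg.get("prompt"), str) or not msg["prompt"]:
        raise ValueError("prompt must be a non-empty string")
    msg.setdefault("mode", "fMRI")       # default mode is an assumption
    msg.setdefault("token_step", False)  # default is an assumption
    return msg


def make_activation_frame(token_idx: int, token: str, layers: list[dict]) -> str:
    """Serialize one server-to-client frame as strict JSON."""
    frame = {
        "type": "activation_frame",
        "token_idx": token_idx,
        "token": token,
        "layers": layers,
    }
    return json.dumps(frame, ensure_ascii=False)


request = parse_scan_request('{"type": "scan_stream", "prompt": "The capital of France is"}')
frame = make_activation_frame(
    3,
    "capital",
    [{"layer_id": "blocks.0.attn", "type": "attention", "activations": [0.12, 0.87]}],
)
print(request["mode"], frame)
```

Keeping `attention_pattern` out of the frame unless the mode is DTI keeps each message small enough to stream per token.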

4.3 Request/Response Schema Examples

POST /api/scan/activation

Request:

{
  "prompt": "The Eiffel Tower is located in",
  "layers": "all",              // or ["blocks.3.mlp", "blocks.4.attn"]
  "aggregation": "l2_norm",     // "l2_norm" | "max" | "mean" | "raw"
  "include_residual": true,
  "token_positions": "all"      // or [0, 1, 5]  (specific token positions)
}

Response:

{
  "model": "gpt2-small",
  "prompt_tokens": ["The", " Eiff", "el", " Tower", " is", " located", " in"],
  "scan_mode": "fMRI",
  "data": {
    "embed": {
      "type": "embedding",
      "shape": [7, 768],
      "activations_summary": [0.45, 0.52, 0.48, 0.61, 0.33, 0.55, 0.41]
    },
    "blocks.0.attn": {
      "type": "attention",
      "num_heads": 12,
      "per_head_activation": [0.12, 0.87, 0.34, ...],
      "attention_patterns": {
        "shape": [12, 7, 7],
        "data_url": "/api/tensor/attn_0_patterns"
      }
    },
    "blocks.0.mlp": {
      "type": "mlp",
      "activation_summary": [0.22, 0.91, 0.45, ...],
      "top_neurons": [
        {"idx": 1247, "activation": 3.82, "label": null},
        {"idx": 892, "activation": 2.91, "label": null}
      ]
    }
  },
  "metadata": {
    "compute_time_ms": 342,
    "gpu_memory_mb": 1240
  }
}

POST /api/perturb/patch

Request:

{
  "prompt": "The Eiffel Tower is located in",
  "target_token_idx": -1,
  "target_component": "blocks.5.mlp",
  "method": "zero",
  "compare_logits": true
}

Response:

{
  "original_prediction": {
    "token": " Paris",
    "logit": 12.34,
    "prob": 0.87
  },
  "perturbed_prediction": {
    "token": " the",
    "logit": 8.12,
    "prob": 0.23
  },
  "logit_diff": -4.22,
  "affected_components": [
    {"id": "blocks.5.mlp", "impact_score": 0.92},
    {"id": "blocks.6.attn.head_3", "impact_score": 0.45}
  ]
}
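The comparison fields in this response can be derived from two sets of logits. The sketch below uses a toy three-token vocabulary with hypothetical logits; `logit_diff` is taken as the perturbed top logit minus the original top logit, which is an assumption chosen to match the 8.12 - 12.34 = -4.22 arithmetic in the example. It also includes the recovery ratio that section 6.3 defines for activation patching, with a guard for the case where the clean and corrupt logits coincide.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over a token -> logit mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_prediction(logits: dict[str, float]) -> dict:
    """Return the highest-logit token with its logit and probability."""
    probs = softmax(logits)
    token = max(logits, key=logits.get)
    return {"token": token, "logit": logits[token], "prob": probs[token]}


def compare(original: dict[str, float], perturbed: dict[str, float]) -> dict:
    before, after = top_prediction(original), top_prediction(perturbed)
    return {
        "original_prediction": before,
        "perturbed_prediction": after,
        # Assumption: difference between the two top logits.
        "logit_diff": round(after["logit"] - before["logit"], 2),
    }


def patching_recovery(clean_logit: float, corrupt_logit: float, patched_logit: float) -> float:
    """(patched - corrupt) / (clean - corrupt): 1.0 means fully restored, 0.0 means no effect."""
    denom = clean_logit - corrupt_logit
    if abs(denom) < 1e-8:
        raise ValueError("clean and corrupt logits are equal; recovery is undefined")
    return (patched_logit - corrupt_logit) / denom


result = compare(
    {" Paris": 12.34, " the": 9.0, " London": 8.0},
    {" Paris": 6.0, " the": 8.12, " London": 7.0},
)
recovery = patching_recovery(clean_logit=10.0, corrupt_logit=2.0, patched_logit=6.0)
print(result["logit_diff"], recovery)
```

The probabilities here come from the toy vocabulary and will not match the 0.87 and 0.23 in the example, which would be computed over the full vocabulary.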

5. Frontend Specification

5.1 Overall Layout

┌─ Top Bar ───────────────────────────────────────────────────┐
│ [●] NEURAL MRI  │  Model Resonance Imaging                  │
│ Model: [Dropdown ▾]  │  GPU: 2.1GB/8GB                      │
├─ Mode Tabs ─────────────────────────────────────────────────┤
│ [ T1 Topology ] [ T2 Tensor ] [ fMRI ]                      │
│ [ DTI ] [ FLAIR ]                                           │
├────────────────────────────────┬────────────────────────────┤
│                                │  Layer Summary             │
│   DICOM Header                 │  ├─ Embed:  ████░░ 0.45    │
│   ┌──────────────────────┐     │  ├─ Attn1:  ██████ 0.87    │
│   │                      │     │  ├─ MLP1:   ███░░░ 0.34    │
│   │    Main Scan Canvas  │     │  └─ ...                    │
│   │    (SVG/D3)          │     │                            │
│   │                      │     │  ◉ Stimulation Panel       │
│   │    - neurons         │     │  ID: blocks.3.attn.h7      │
│   │    - connections     │     │  Activation: 0.8721        │
│   │    - flow animations │     │  [Zero] [Amp] [Inv]        │
│   │                      │     │  [Noise] [Ablate]          │
│   └──────────────────────┘     │                            │
│                                │  Comparison Panel          │
│   PROMPT: [________________]   │  Original: "Paris" (0.87)  │
│   [▶ SCAN] [⏸ PAUSE] [↺ RESET] │  Perturbed: "the" (0.23)   │
│                                │                            │
├── Log Panel ───────────────────┴────────────────────────────┤
│ [00:12] Scan complete - Mode: fMRI, 7 tokens processed      │
│ [00:14] Perturbation: Zero-out on blocks.3.attn.head_7      │
└─────────────────────────────────────────────────────────────┘

5.2 Design System

Theme: "Medical Dark", a DICOM viewer combined with an operating-room monitor aesthetic

:root {
  /* Color Palette */
  --bg-primary:     #0a0c10;     /* near black with a slight blue tint */
  --bg-secondary:   #0c0e14;     /* panel background */
  --bg-surface:     #12151c;     /* card/input background */
  --border:         rgba(100, 170, 136, 0.15);  /* medical-green border */
  --text-primary:   #66aa88;     /* medical-green text */
  --text-secondary: #556;        /* gray secondary text */
  --text-data:      #aabbcc;     /* data values */
  --accent-active:  #00ffaa;     /* selected/active highlight */
  --scan-line:      rgba(255, 255, 255, 0.04);  /* scanline overlay */

  /* Mode-specific Colors (T1=Topology, T2=Tensor, fMRI=functional MRI, DTI=Data Tractography, FLAIR=Feature-Level Anomaly) */
  --t1-base:     #8899aa;   --t1-accent:  #e0e0e0;
  --t2-base:     #4488cc;   --t2-accent:  #aaccee;
  --fmri-cold:   #1a2a5a;   --fmri-warm:  #cc8830;  --fmri-hot: #ff4420;
  --dti-green:   #44ddaa;   --dti-purple: #8866ff;
  --flair-normal:#334;       --flair-hot:  #ff4466;

  /* Typography: monospace only */
  --font-primary: 'JetBrains Mono', 'Fira Code', 'Courier New', monospace;
  --font-size-xs:  9px;   /* logs, legends */
  --font-size-sm:  10px;  /* labels, tabs */
  --font-size-md:  11px;  /* body data */
  --font-size-lg:  14px;  /* titles */
}

Required visual elements:

  1. Scanline overlay: horizontal lines at 1px spacing over the canvas, opacity 0.03-0.05, for a CRT-monitor feel
  2. DICOM header: medical-imaging-style metadata at the top of the canvas (model name, sequence, date/time, FOV, and the "Model Resonance Imaging" label)
  3. Vignette effect: the edges of the canvas darken slightly
  4. Pulse animation: in fMRI mode, the size and brightness of activated neurons fluctuate slightly and periodically
  5. Flow animation: in DTI mode, small particles or brightness flow along the connection lines

5.3 Canvas Rendering Specification

Neuron (Node) Rendering

Each neuron is drawn as a circle.

Position:
- Y axis: layer order (top = embedding, bottom = output)
- X axis: neurons in the same layer are spread horizontally
- Spacing between layers: 60-80px
- Spacing between neurons: adjusted automatically to the number of neurons in the layer

Size (per mode):
- T1: proportional to parameter count (4-10px radius)
- T2: proportional to weight magnitude
- fMRI: base size × (0.5 + activation × 1.0) × pulse_factor
- DTI: constant size; direction is encoded by color
- FLAIR: normal = small, anomalous = large + pulse

Color (per mode):
- T1: grayscale (rgb(v,v,v+10), v = 160-220)
- T2: blue scale (dark navy for small weights, bright sky blue for large weights)
- fMRI: cool-to-hot colormap
  - activation < 0.3: dark blue rgb(30+a*80, 30+a*100, 80+a*120)
  - activation 0.3-0.6: yellow/orange rgb(a*200, a*160, 40+a*60)
  - activation > 0.6: red/white rgb(200+a*55, a*120, a*30)
- DTI: HSL, hue = (x/width)*120 + (y/height)*120, saturation 70%, lightness 55%
- FLAIR: normal = rgb(60,65,75), anomalous = rgb(255, 50+a*60, 80+a*40) with pulse
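The fMRI size and color rules and the DTI hue rule above translate directly into small pure functions. The sketch below is written in Python for consistency with the backend examples, although the production version would live in the frontend. Clamping the activation to [0, 1], rounding channels to integers, and assigning the exact boundary values 0.3 and 0.6 to the middle band are assumptions, as are the function names.

```python
def _clamp01(a: float) -> float:
    """Keep an activation inside the [0, 1] range the formulas expect."""
    return max(0.0, min(1.0, a))


def fmri_color(activation: float) -> tuple[int, ...]:
    """Cool-to-hot RGB using the three piecewise formulas from the spec."""
    a = _clamp01(activation)
    if a < 0.3:
        rgb = (30 + a * 80, 30 + a * 100, 80 + a * 120)
    elif a <= 0.6:
        rgb = (a * 200, a * 160, 40 + a * 60)
    else:
        rgb = (200 + a * 55, a * 120, a * 30)
    return tuple(round(c) for c in rgb)


def fmri_radius(base: float, activation: float, pulse_factor: float = 1.0) -> float:
    """base size x (0.5 + activation x 1.0) x pulse_factor."""
    return base * (0.5 + _clamp01(activation) * 1.0) * pulse_factor


def dti_hue(x: float, y: float, width: float, height: float) -> float:
    """Hue in degrees from the node position; saturation and lightness stay fixed."""
    return (x / width) * 120 + (y / height) * 120


print(fmri_color(0.0), fmri_color(0.5), fmri_color(1.0))
print(fmri_radius(6.0, 1.0), dti_hue(400, 300, 800, 600))
```

Because the three bands are defined independently, the color jumps at the band edges; an implementation that wants a smooth gradient would need to interpolate between the band endpoints instead.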

Connection (Edge) Rendering

Connections represent the flow of information between layers.

Per-mode rendering:
- T1: thin gray lines (opacity 0.15, width 0.5)
- T2: opacity and thickness vary with weight magnitude
- fMRI: color and thickness vary with the mean activation of the two endpoint neurons
  - High activation: hot color, thick line
  - Low activation: nearly transparent
- DTI: only significant pathways are shown
  - Drawn as curves (quadratic bezier)
  - HSL color by direction
  - Flow animation (opacity modulated by a sine wave)
  - Non-pathway connections are nearly transparent
- FLAIR: only edges connected to anomalous nodes are highlighted in red

Topology Layout Options (Phase 2 and later)

MVP: vertical layer stack (top → bottom)
Phase 2: the user can choose a layout mode
  - Stack (default): vertical layer stack
  - Brain: nodes wrapped into an elliptical brain shape (cortical-mapping metaphor)
  - Network: force-directed graph (D3 force simulation)
  - Radial: layers expand outward from the center

5.4 Interaction Specification

Neuron Selection (Stimulation Mode)

1. Click a neuron → enter the selected state
2. Concentric-ring animation around the selected neuron (green glow)
3. Details shown in the right panel:
   - Node ID (layer.component.index)
   - Layer type (attention / mlp / embedding / output)
   - Primary value for the current mode (activation, weight, anomaly score)
   - Top-k connected neurons (strongest connections)
4. Perturbation buttons become active:
   - Zero-out: set the component's output to 0
   - Amplify 2×: double the output
   - Invert: flip the sign of the output
   - Noise ±σ: add Gaussian noise
   - Ablate: remove completely (zero + block gradients)
5. When a perturbation is applied:
   - Send a perturbation request to the backend → receive the new activations
   - The whole canvas plays a 0.3-second rescan animation
   - Changed regions are briefly highlighted
   - Before/after is shown in the Comparison Panel on the right

Prompt Input & Scan

1. Enter a prompt → click the SCAN button (or press Enter)
2. A scan progress bar is shown (reflecting actual backend processing time)
3. Per-token activations are streamed over WebSocket
4. Token step-through:
   - Each token is shown as a chip in the prompt area
   - Click a token chip → show only the activations at that token
   - Move between tokens with the ← → arrow keys
   - Auto-play (0.5-second interval)

Mode Switching

1. Click a mode tab → 0.3-second crossfade transition
2. The topology (neuron positions) stays the same; only color, size, and connection rendering change
3. This mirrors switching from T1 to fMRI for the same patient in a real MRI

5.5 Responsive Considerations

- Minimum supported resolution: 1280×720
- Recommended resolution: 1920×1080
- Canvas size: scaled to fit the container (using SVG viewBox)
- Mobile: not supported (desktop-only tool)

6. Backend Specification

6.1 Model Manager (ModelManager)

from transformer_lens import HookedTransformer

from neural_mri.schemas.model import ModelInfo  # schemas/model.py in the project structure (8.4)


class ModelManager:
    """Model loading, swapping, and memory management."""

    def load_model(self, model_id: str, device: str = "auto") -> ModelInfo:
        """
        Load a HuggingFace model as a TransformerLens HookedTransformer.
        - model_id: "gpt2", "EleutherAI/pythia-1.4b", "meta-llama/Llama-3.2-3B", etc.
        - device: "cpu", "cuda", "mps", "auto"
        - Returns: model metadata (number of layers, hidden size, number of heads, etc.)
        """

    def unload_model(self) -> None:
        """Unload the current model and free GPU memory (gc + torch.cuda.empty_cache)."""

    def get_model_info(self) -> ModelInfo:
        """Return architecture info for the currently loaded model (T1 data)."""

    def get_model(self) -> HookedTransformer:
        """Return the currently loaded model instance."""

6.2 Analysis Engine (AnalysisEngine)

# Result types; assumed to live in schemas/scan.py per the project structure (8.4)
from neural_mri.schemas.scan import (
    ActivationData, AnomalyData, CircuitData, StructuralData, WeightData,
)


class AnalysisEngine:
    """Runs the analysis for each scan mode."""

    def scan_structural(self) -> StructuralData:
        """T1: extract the static structure from model.cfg."""

    def scan_weights(self, layers: list[str] | None = None) -> WeightData:
        """T2: extract weight statistics from state_dict."""

    def scan_activation(self, prompt: str, **kwargs) -> ActivationData:
        """
        fMRI: cache the activations for a prompt.
        Uses TransformerLens run_with_cache().

        Core implementation:
        logits, cache = model.run_with_cache(prompt)

        Hook points to extract:
        - hook_embed: embedding layer
        - blocks.{i}.hook_resid_pre: residual input to each block
        - blocks.{i}.attn.hook_result: attention output (per head; only cached when use_attn_result is enabled)
        - blocks.{i}.hook_mlp_out: MLP output
        - blocks.{i}.hook_resid_post: residual output of each block

        Aggregation options:
        - "l2_norm": L2 norm per position (scalar)
        - "max": max absolute value
        - "mean": mean absolute value
        - "raw": return the full tensor (large, optional)
        """

    def scan_circuits(self, prompt: str, target_token: int = -1) -> CircuitData:
        """
        DTI: extract attention patterns and attribution paths.

        (1) Attention pattern:
        _, cache = model.run_with_cache(prompt)
        attn_patterns = cache["blocks.{i}.attn.hook_pattern"]
        → shape: [num_heads, seq_len, seq_len]

        (2) Attribution (simplified version):
        Compute the change in the target logit when the output of each head/MLP is zeroed out.
        → Components with a large absolute change = important pathway
        """

    def scan_anomaly(self, prompt: str) -> AnomalyData:
        """
        FLAIR: anomaly detection.

        (1) Logit lens:
        Unembed the residual stream at each intermediate layer and compute the
        KL divergence between the intermediate prediction and the final prediction.
        Large divergence = the model's "thinking changed a lot" at that layer = potential anomaly

        (2) Entropy:
        softmax over the logits at each position → entropy.
        High entropy = the model is uncertain = hallucination risk

        (3) Anomaly score:
        anomaly_score = α * normalized_kl_div + β * normalized_entropy
        α = 0.6, β = 0.4 (tunable)
        """

6.3 Perturbation Engine (PerturbationEngine)

from neural_mri.schemas.perturb import PatchResult, PerturbResult  # schemas/perturb.py (8.4)


class PerturbationEngine:
    """Applies stimulation/modification inside the model and compares the results."""

    def zero_out(self, component: str, prompt: str) -> PerturbResult:
        """
        Set the output of a specific component to 0 and re-run.

        Implementation:
        def zero_hook(value, hook):
            value[:, :, :] = 0  # or only a specific head
            return value

        model.run_with_hooks(prompt, fwd_hooks=[(component, zero_hook)])
        """

    def amplify(self, component: str, factor: float, prompt: str) -> PerturbResult:
        """Amplify the output by multiplying it by factor."""

    def ablate(self, component: str, prompt: str) -> PerturbResult:
        """Remove the component entirely (mean ablation: replace with the mean value)."""

    def inject_activation(self, component: str, values: list, prompt: str) -> PerturbResult:
        """Inject specific activation values directly."""

    def activation_patch(
        self,
        clean_prompt: str,
        corrupt_prompt: str,
        component: str
    ) -> PatchResult:
        """
        Activation Patching (Causal Tracing).

        Replace a component's activation during the corrupt_prompt run with the
        one cached from clean_prompt, and measure how much of the clean behavior
        is recovered. Both prompts must tokenize to the same length so that the
        cached activation matches in shape.

        Implementation:
        _, clean_cache = model.run_with_cache(clean_prompt)
        clean_activation = clean_cache[component]

        def patch_hook(value, hook):
            value[:] = clean_activation
            return value

        patched_logits = model.run_with_hooks(
            corrupt_prompt,
            fwd_hooks=[(component, patch_hook)]
        )

        recovery = (patched_logit - corrupt_logit) / (clean_logit - corrupt_logit)
        """

    def compare_results(self, original: Logits, perturbed: Logits) -> ComparisonData:
        """Compare original and perturbed results: top-k predictions, logit diff, KL divergence."""

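6.4 Data Summarization Strategy

Strategy for summarizing large tensors before sending them to the frontend:

Problem: even for GPT-2 small, the full activation cache for a single prompt can reach hundreds of MB for long prompts, because attention patterns grow quadratically with sequence length.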
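A back-of-the-envelope estimate for the attention patterns alone shows how the cache grows with prompt length. The sketch assumes float32 values (4 bytes) and GPT-2 small's 12 layers, 12 heads, and 1024-token context window; the function name is illustrative, and the other cached activations (residual stream, MLP outputs) come on top of this figure.

```python
def attention_pattern_bytes(
    n_layers: int, n_heads: int, seq_len: int, bytes_per_value: int = 4
) -> int:
    """Bytes needed to hold one [heads, seq, seq] pattern per layer."""
    return n_layers * n_heads * seq_len * seq_len * bytes_per_value


short_prompt = attention_pattern_bytes(12, 12, 7)      # the 7-token example prompt
full_context = attention_pattern_bytes(12, 12, 1024)   # GPT-2's full context window

print(short_prompt, "bytes")
print(full_context / 2**20, "MiB")
```

A short prompt costs only tens of kilobytes, while a full-context prompt costs several hundred MiB for the patterns alone, which is why full attention matrices are best sent only on request.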

Solution:
1. Default response: send only per-layer/per-head summary statistics (L2 norm, max, mean → scalar arrays)
2. On demand: when the user selects a specific layer/head, send detailed data for that part only
3. Attention patterns: the full attention matrix is sent only on request (shape: [heads, seq, seq])
4. Streaming: during token step-through, send only each token's data incrementally
5. Caching: caches for identical prompts are kept in server memory (LRU, up to 5 prompts)
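Items 1 and 5 can be sketched with the standard library alone. `summarize` reduces a `[seq_len][d_model]` activation (nested lists standing in for a tensor) to one scalar per token position using the three aggregation options, and `PromptCache` is a small LRU cache holding at most five prompts. Both names are hypothetical, and the real backend would operate on PyTorch tensors rather than lists.

```python
import math
from collections import OrderedDict


def summarize(activation: list[list[float]], how: str = "l2_norm") -> list[float]:
    """Reduce a [seq_len][d_model] activation to one scalar per token position."""
    reducers = {
        "l2_norm": lambda row: math.sqrt(sum(v * v for v in row)),
        "max": lambda row: max(abs(v) for v in row),
        "mean": lambda row: sum(abs(v) for v in row) / len(row),
    }
    if how not in reducers:
        raise ValueError(f"unknown aggregation: {how}")
    return [reducers[how](row) for row in activation]


class PromptCache:
    """LRU cache keyed by prompt; evicts the least recently used entry."""

    def __init__(self, max_prompts: int = 5):
        self.max_prompts = max_prompts
        self._entries = OrderedDict()

    def get(self, prompt: str):
        if prompt not in self._entries:
            return None
        self._entries.move_to_end(prompt)  # mark as most recently used
        return self._entries[prompt]

    def put(self, prompt: str, cache: object) -> None:
        self._entries[prompt] = cache
        self._entries.move_to_end(prompt)
        while len(self._entries) > self.max_prompts:
            self._entries.popitem(last=False)  # drop the oldest entry

    def __len__(self) -> int:
        return len(self._entries)


toy = [[3.0, 4.0], [0.0, -2.0]]
print(summarize(toy), summarize(toy, "max"), summarize(toy, "mean"))

store = PromptCache(max_prompts=5)
for i in range(6):
    store.put(f"prompt {i}", {"id": i})
print(len(store), store.get("prompt 0"), store.get("prompt 5"))
```

Keying the cache on the prompt alone assumes a single loaded model; if models can be swapped, the key would also need the model ID.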

7. Implementation Phases

Phase 0: Foundation (1-2 weeks)

Goal: set up the project structure and get T1/T2 modes working with GPT-2 small

Backend:
- [ ] FastAPI project setup (dependency management with poetry/uv)
- [ ] Implement ModelManager (load GPT-2 small)
- [ ] Implement scan_structural() → return T1 data
- [ ] Implement scan_weights() → return T2 data
- [ ] Basic REST API endpoints (/model/load, /model/info, /scan/structural, /scan/weights)

Frontend:
- [ ] Vite + React project setup
- [ ] Define the DICOM theme CSS variables
- [ ] Implement the basic layout (Top Bar, Mode Tabs, Canvas, Panels)
- [ ] T1 Canvas rendering: visualize the model structure as nodes/edges
- [ ] T2 Canvas rendering: weight heatmap
- [ ] Model selector dropdown

Tests:
- [ ] Load GPT-2 small → display T1 data → verify switching to T2 mode

Phase 1: Core Scanning (2-3 weeks)

Goal: fMRI + DTI modes working. Prompt input → activation visualization

Backend:
- [ ] TransformerLens integration (HookedTransformer.from_pretrained)
- [ ] Implement scan_activation() → return fMRI data
- [ ] Implement scan_circuits() → return DTI data
- [ ] WebSocket endpoint (per-token activation streaming)
- [ ] Data summarization/serialization pipeline (orjson)

Frontend:
- [ ] fMRI Canvas: cool-to-hot colormap, pulse animation
- [ ] DTI Canvas: curved paths, directional colors, flow animation
- [ ] Prompt input UI + SCAN button + progress bar
- [ ] Token step-through UI (token chips + arrow navigation)
- [ ] Layer Summary bar chart (adapts per mode)
- [ ] WebSocket connection + real-time updates

Tests:
- [ ] "The capital of France is" → in fMRI, confirm that related neurons activate at the "France" token
- [ ] In DTI, confirm that significant information-flow paths are visualized

Phase 2: Perturbation + FLAIR (2-3 weeks)

Goal: stimulation/modification experiments + anomaly detection

Backend:
- [ ] Full PerturbationEngine implementation (zero, amplify, ablate, inject, patch)
- [ ] Implement scan_anomaly() (logit lens + entropy)
- [ ] Implement compare_results() (before/after comparison)
- [ ] Implement activation patching (causal tracing)

Frontend:
- [ ] FLAIR Canvas: anomalous-region highlight, pulse animation
- [ ] Stimulation Panel: click a neuron → details + perturbation buttons
- [ ] Comparison Panel: original vs perturbed results side by side
- [ ] Rescan animation when a perturbation is applied
- [ ] Reset function (clear all perturbations)
- [ ] Scanline overlay + vignette effect

Tests:
- [ ] Zero out a specific attention head → confirm the prediction changes
- [ ] "The Eiffel Tower is in" → ablate fact-related components → confirm that a hallucination is induced
- [ ] Confirm that high-entropy positions are highlighted correctly in FLAIR

Phase 3: Polish + Multi-Model (2 weeks)

Goal: multi-model support + finished UX

Backend:
- [ ] Add and test support for Pythia-1.4B and Gemma-2-2B
- [ ] Llama-3.2-3B support (verify TransformerLens compatibility; fall back to nnsight if needed)
- [ ] Optimize memory management on model swap
- [ ] API response caching layer

Frontend:
- [ ] Mode-switch crossfade animation
- [ ] Neuron hover tooltips
- [ ] Complete the full scanline + CRT aesthetic
- [ ] Performance optimization (hold 60fps on large graphs)
- [ ] Error/loading state UX

Tests:
- [ ] Confirm no memory leak when swapping between models
- [ ] Confirm the full scan pipeline end to end on a 3B model
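One way to make the model-swap check above testable is to route every load through a holder that releases the previous model first. The sketch below is an assumption about how that could look, not the ModelManager itself: `ModelSlot` and the stand-in loader are hypothetical, and the torch import is optional so the example runs without a GPU stack. In the backend the loader would wrap `HookedTransformer.from_pretrained`.

```python
import gc
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds at most one loaded model and releases it before loading another."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._model: Optional[Any] = None
        self.model_id: Optional[str] = None

    def load(self, model_id: str) -> Any:
        self.unload()  # free the previous model first so two never coexist
        self._model = self._loader(model_id)
        self.model_id = model_id
        return self._model

    def unload(self) -> None:
        self._model = None
        self.model_id = None
        gc.collect()
        try:
            import torch  # optional: only present in the real backend
        except ImportError:
            return
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


# Stand-in loader for illustration; it just records which model was requested.
slot = ModelSlot(loader=lambda model_id: {"name": model_id})
slot.load("gpt2")
slot.load("EleutherAI/pythia-1.4b")
print(slot.model_id)
```

A leak test could then record GPU memory after each swap and assert that it returns to roughly the same level; other references to the old model (for example cached activations) also have to be dropped for the memory to be reclaimed.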

Phase 4: Advanced Features (future)

- [ ] SAE feature explorer (SAELens integration)
- [ ] Brain layout mode (cortical mapping)
- [ ] Multi-prompt comparison (activation differences for different inputs on the same model)
- [ ] Time-series recording/playback (save scan sessions)
- [ ] Export: save scan results as images/video
- [ ] Collaboration: multiple users share the same scan session
- [ ] Automated diagnosis: generate a report along the lines of "this model may have the following problems"

8. Development Environment

8.1 Requirements

- Python 3.11+
- Node.js 20+
- GPU: NVIDIA GPU with 8GB+ VRAM (recommended). CPU-only also works (GPT-2 small only)
- CUDA 12.x (when using a GPU)
- Memory: 16GB+ RAM

8.2 Backend Dependencies

[project]
name = "neural-mri"
requires-python = ">=3.11"

dependencies = [
    "fastapi>=0.110",
    "uvicorn[standard]>=0.27",
    "websockets>=12.0",
    "transformer-lens>=2.0",
    "torch>=2.2",
    "transformers>=4.40",
    "accelerate>=0.28",
    "sae-lens>=3.0",         # Phase 2
    "orjson>=3.9",
    "numpy>=1.26",
    "pydantic>=2.6",
]

8.3 Frontend Dependencies

{
  "dependencies": {
    "react": "^18.3",
    "react-dom": "^18.3",
    "d3": "^7.9",
    "zustand": "^4.5",
    "react-use-websocket": "^4.8"
  },
  "devDependencies": {
    "vite": "^5.4",
    "@vitejs/plugin-react": "^4.2",
    "tailwindcss": "^3.4",
    "autoprefixer": "^10.4",
    "postcss": "^8.4"
  }
}

8.4 Project Structure

neural-mri/
├── README.md
├── docker-compose.yml
│
├── backend/
│   ├── pyproject.toml
│   ├── neural_mri/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI app entry
│   │   ├── config.py            # Settings (model paths, cache, GPU)
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   ├── routes_model.py  # /api/model/* routes
│   │   │   ├── routes_scan.py   # /api/scan/* routes
│   │   │   ├── routes_perturb.py # /api/perturb/* routes
│   │   │   └── ws_stream.py     # WebSocket handler
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── model_manager.py
│   │   │   ├── analysis_engine.py
│   │   │   └── perturbation_engine.py
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── model.py         # ModelInfo, ModelConfig
│   │   │   ├── scan.py          # ActivationData, CircuitData, etc.
│   │   │   └── perturb.py       # PerturbResult, PatchResult
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── tensor_summary.py # Tensor → summary conversion
│   │       └── serialization.py  # Custom orjson serialization
│   └── tests/
│       ├── test_model_manager.py
│       ├── test_analysis.py
│       └── test_perturbation.py
│
├── frontend/
│   ├── package.json
│   ├── vite.config.js
│   ├── tailwind.config.js
│   ├── index.html
│   ├── src/
│   │   ├── main.jsx
│   │   ├── App.jsx
│   │   ├── theme/
│   │   │   ├── variables.css    # DICOM theme CSS variables
│   │   │   └── globals.css
│   │   ├── store/
│   │   │   ├── useModelStore.js
│   │   │   ├── useScanStore.js
│   │   │   └── usePerturbStore.js
│   │   ├── components/
│   │   │   ├── TopBar.jsx
│   │   │   ├── ModeTabs.jsx
│   │   │   ├── DicomHeader.jsx
│   │   │   ├── ScanCanvas/
│   │   │   │   ├── ScanCanvas.jsx       # Main SVG canvas
│   │   │   │   ├── NeuronRenderer.jsx   # Neuron rendering logic
│   │   │   │   ├── ConnectionRenderer.jsx # Edge rendering logic
│   │   │   │   ├── ScanLineOverlay.jsx  # CRT scanlines
│   │   │   │   └── colorMaps.js         # Per-mode color functions
│   │   │   ├── Panels/
│   │   │   │   ├── LayerSummary.jsx     # Per-layer bar chart
│   │   │   │   ├── StimPanel.jsx        # Neuron selection + perturbation
│   │   │   │   ├── ComparisonPanel.jsx  # Before/after comparison
│   │   │   │   └── LogPanel.jsx         # Bottom log
│   │   │   ├── PromptInput.jsx
│   │   │   └── TokenStepper.jsx         # Per-token step-through
│   │   ├── hooks/
│   │   │   ├── useWebSocket.js
│   │   │   └── useAnimationFrame.js
│   │   └── api/
│   │       ├── client.js                # REST API client
│   │       └── ws.js                    # WebSocket client
│   └── public/
│       └── fonts/                       # JetBrains Mono
│
└── docs/
    ├── SPEC.md                          # This document
    ├── API.md                           # Detailed API documentation
    └── ARCHITECTURE.md                  # Architecture diagrams

9. Key Technical Decisions & Risks

9.1 TransformerLens Compatibility

Risk: TransformerLens officially supports only a subset of models, such as GPT-2 and Pythia. Llama, Qwen, and similar models rely on community implementations and may break between versions.

Mitigation:
1. Start the MVP with GPT-2 small/medium + Pythia (reliably supported)
2. Write a from_pretrained() compatibility test script whenever a new model is added
3. Fall back to an nnsight backend for models TransformerLens does not support
4. Hook point names may differ between models, so an abstraction layer is needed
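
The abstraction layer in item 4 could be as small as a lookup table from canonical component names to backend-specific hook names. The sketch below is one possible shape: the TransformerLens templates follow its standard `blocks.{layer}...` naming, while the canonical keys and the `HookNameResolver` class are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Canonical component name -> TransformerLens hook name template.
TL_TEMPLATES = {
    "resid_post": "blocks.{layer}.hook_resid_post",
    "attn_z": "blocks.{layer}.attn.hook_z",
    "attn_pattern": "blocks.{layer}.attn.hook_pattern",
    "mlp_out": "blocks.{layer}.hook_mlp_out",
}

@dataclass
class HookNameResolver:
    """Maps canonical component names to one backend's hook names."""
    templates: dict

    def resolve(self, component, layer):
        try:
            return self.templates[component].format(layer=layer)
        except KeyError:
            raise ValueError(f"unsupported component: {component!r}") from None

tl_resolver = HookNameResolver(TL_TEMPLATES)
name = tl_resolver.resolve("attn_z", layer=3)
```

An nnsight-backed resolver would supply its own template table. Those paths depend on the module layout of the underlying model and are deliberately not guessed here.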

9.2 Performance

Risk: the full activation cache of a 3B+ model can reach several GB.
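
A back-of-the-envelope estimate shows where the "several GB" comes from. The function below is a rough sketch: it counts only residual-stream-sized tensors and attention patterns for a single prompt, and the model shapes passed in are assumed, illustrative values for a 3B-class model rather than figures taken from this spec.

```python
def activation_cache_bytes(n_layers, n_heads, d_model, seq_len,
                           bytes_per_value=4, resid_points_per_layer=3):
    """Rough size of a full activation cache for one prompt (batch size 1)."""
    # Residual-stream-sized tensors: [seq_len, d_model] per hook point.
    resid = n_layers * resid_points_per_layer * seq_len * d_model
    # Attention patterns: [n_heads, seq_len, seq_len] per layer.
    patterns = n_layers * n_heads * seq_len * seq_len
    return (resid + patterns) * bytes_per_value

# Assumed shapes: 28 layers, 24 heads, d_model 3072, 1024-token prompt, float32.
size = activation_cache_bytes(n_layers=28, n_heads=24, d_model=3072, seq_len=1024)
size_gib = size / 2**30  # about 3.6 GiB under these assumptions
```

The attention-pattern term grows with the square of the sequence length, so long prompts dominate the total.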

Mitigation:
1. Summary-first strategy: by default, send only per-layer/per-head statistics instead of full tensors
2. Lazy loading: send detailed data only when the user selects a specific layer
3. Server-side caching: keep cached results for identical prompts (LRU, 5 entries)
4. Token streaming: process the whole sequence in one pass, but send results to the frontend token by token
5. GPU memory: automatically offload to CPU when model + cache exceed VRAM
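
The server-side cache in item 3 can be sketched with an `OrderedDict`. This is a minimal illustration under the assumption that entries are keyed by model name and prompt; the `ScanCache` class and its method names are hypothetical.

```python
from collections import OrderedDict

class ScanCache:
    """Small LRU cache keyed by (model_name, prompt)."""
    def __init__(self, max_entries=5):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, model_name, prompt):
        key = (model_name, prompt)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, model_name, prompt, summary):
        key = (model_name, prompt)
        self._entries[key] = summary
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

cache = ScanCache(max_entries=5)
for i in range(6):
    cache.put("gpt2", f"prompt {i}", {"id": i})
```

After the sixth insert the oldest entry ("prompt 0") has been evicted. Storing summaries rather than raw tensors keeps each entry small, which fits the summary-first strategy in item 1.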

9.3 Visualization Performance

Risk: SVG rendering can slow down once there are hundreds of nodes/edges.

Mitigation:
1. Aggregated representation: draw nodes at the "head" or "layer component" level rather than as individual neurons
   (GPT-2 small: 12 layers × 3 components = 36 nodes, plus embedding and output)
2. Viewport culling: render only the nodes visible on screen
3. Edge simplification: depending on the mode, skip rendering inactive edges entirely
4. Canvas switch: if SVG hits its performance limit, move to WebGL (Three.js) or Canvas 2D
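
The node count in item 1 can be reproduced with a small topology builder. The three component kinds used here ("attn", "mlp", "resid") are an assumption made for illustration, since this section does not name them, and the node dictionary layout is likewise hypothetical.

```python
def build_nodes(n_layers, components=("attn", "mlp", "resid")):
    """Return one node per layer component, plus embedding and output nodes."""
    nodes = [{"id": "embed", "layer": None, "kind": "embed"}]
    for layer in range(n_layers):
        for kind in components:
            nodes.append({"id": f"L{layer}.{kind}", "layer": layer, "kind": kind})
    nodes.append({"id": "output", "layer": None, "kind": "output"})
    return nodes

nodes = build_nodes(n_layers=12)  # GPT-2 small has 12 layers
```

For 12 layers this yields 36 component nodes plus the embedding and output nodes, 38 in total, which is well within what SVG handles comfortably.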

9.4 Perturbation Safety

Risk: if a perturbation modifies the model weights themselves, recovery is difficult.

Mitigation:
1. Use run_with_hooks() only: model weights are never modified. Hooks alter activations only.
2. Reset button: remove all hooks and return to the original state
3. Every perturbation is stateless: hooks are set up fresh for each request
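
The stateless pattern can be shown with a toy stand-in for the model. `ToyModel` below is not TransformerLens: it only mimics the idea that hooks passed to `run_with_hooks()` apply for that single call and never touch the weights. The hook signature is simplified here; in TransformerLens a hook function receives `(activation, hook)` and is passed as `fwd_hooks=[(hook_name, hook_fn)]`.

```python
class ToyModel:
    """Two-stage toy 'model': hidden = x * w1, output = hidden * w2."""
    def __init__(self):
        self.weights = {"w1": 2.0, "w2": 3.0}

    def run_with_hooks(self, x, fwd_hooks=()):
        # Hooks live only for the duration of this call.
        hooks = dict(fwd_hooks)
        hidden = x * self.weights["w1"]
        if "hidden" in hooks:
            hidden = hooks["hidden"](hidden)
        return hidden * self.weights["w2"]

def zero_ablate(activation):
    """Replace the activation with zero, leaving the weights untouched."""
    return 0.0 * activation

model = ToyModel()
baseline = model.run_with_hooks(1.0)
ablated = model.run_with_hooks(1.0, fwd_hooks=[("hidden", zero_ablate)])
after = model.run_with_hooks(1.0)  # no hook passed, so behaviour is back to baseline
```

Because nothing persists between calls, a "reset" amounts to issuing the next request without hooks.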

10. Success Metrics

MVP (on completion of Phases 0-2)

1. All 5 modes work on GPT-2 small
2. Prompt input → scan complete within 2 seconds (on GPU)
3. Token step-through runs smoothly (no dropped frames)
4. Apply perturbation → result comparison within 1 second
5. Mode switch completes within 0.3 seconds while preserving the topology
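
The latency targets above could be checked with a small timing helper. This is only a sketch: the budget keys and helper names are hypothetical, and the mode-switch budget would in practice be measured in the frontend rather than in Python.

```python
import time

# Budgets in seconds, taken from the MVP criteria above.
BUDGETS = {"scan": 2.0, "perturb": 1.0, "mode_switch": 0.3}

def measure(fn, *args, **kwargs):
    """Run `fn` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def within_budget(name, elapsed):
    """True if `elapsed` seconds fits the budget registered under `name`."""
    return elapsed <= BUDGETS[name]

# Placeholder workload; a real check would call the scan or perturb endpoint.
_, elapsed = measure(sum, range(1000))
ok = within_budget("mode_switch", elapsed)
```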

Extended (Phase 3 onward)

1. Full pipeline within 5 seconds on a 3B model
2. Support for at least 3 open-source models
3. Activation patching (causal tracing) visualizations at paper-figure quality

11. References

Core Libraries

Key Papers/Resources

- Elhage et al. (2022), "Toy Models of Superposition": superposition theory
- Wang et al. (2022), "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small": circuit analysis (IOI)
- Meng et al. (2022), "Locating and Editing Factual Associations in GPT" (ROME, Rank-One Model Editing): locating where facts are stored
- Anthropic (2024), "Scaling Monosemanticity": SAE feature extraction
- Neel Nanda's TransformerLens tutorials: https://neelnanda.io/

Inspiration


End of Specification