Neural MRI: Phase 5 Direction Memo

From: JJ
To: Cody
Date: 2026-03-01
Re: Phase 4 completion review + revised priorities going forward + new feature proposal


1. On the completion of Phase 4

Thank you for wrapping up all of Phases 0–4 so cleanly. The SAE Feature explorer, real-time collaboration, recording/playback, and the four layouts stand out in particular: everything the original spec had filed under "future" made it in, which exceeded my expectations. The CI pipeline plus 111 pytest tests are also a major asset given the planned open-source release.


2. Re-prioritizing the future proposals

The Tier classification you proposed is accurate if the yardstick is completeness as an interpretability tool. But Neural MRI is not "yet another interpretability tool"; it sits inside a larger frame as the diagnostic equipment of Model Medicine. I want to adjust the priorities from that perspective.

Adjustment 1: Cross-model comparison → promoted to top priority in Tier 1

In medicine, the core of diagnosis is the "normal vs. abnormal" comparison. Viewing how GPT-2, Pythia, and Gemma respond to the same prompt side by side is, in Model Medicine terms, **Comparative Anatomy** and the foundational tool for **Differential Diagnosis**.

From the Four Shell Model perspective, if we can show visually how models with different Cores (model weights) respond differently to the same Shell (prompt), that alone is a result good enough to use as a paper figure. The CompareView infrastructure was already built for Multi-prompt, so extending it may turn out to be easier than the Tier 2 items.

Adjustment 2: Causal Tracing visualization → promoted to Tier 2

activation_patch is already implemented in PerturbationEngine. The only missing piece is the frontend visualization, in the form of a layer × token heatmap. Once that is built:

  • The key figure from the ROME/MEMIT papers can be generated with one click
  • It becomes the gold standard diagnostic test in Model Medicine for the question "which layer of this model stores this fact?"
  • In the medical analogy, it corresponds to a CT scan or a contrast-enhanced MRI

The backend is nearly complete, so the core work will be the D3 heatmap visualization plus displaying a recovery score per component.
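For reference, a minimal sketch of how the recovery scores behind the layer × token heatmap could be computed. The function names `recovery_score` and `recovery_grid` are hypothetical, and the normalization used here (fraction of the clean-vs-corrupted gap restored by patching) is an assumption about the usual activation-patching convention, not a description of what PerturbationEngine currently returns.

```python
from typing import List, Sequence


def recovery_score(clean: float, corrupted: float, patched: float) -> float:
    """Fraction of the clean-vs-corrupted gap restored by one patch.

    0.0 means the patch changed nothing; 1.0 means the clean behavior
    was fully restored. Values outside [0, 1] are possible and kept as-is.
    """
    gap = clean - corrupted
    if abs(gap) < 1e-12:
        # No measurable effect to recover, so report zero instead of dividing.
        return 0.0
    return (patched - corrupted) / gap


def recovery_grid(
    clean: float,
    corrupted: float,
    patched: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Convert a [layer][token] grid of patched metrics into recovery scores."""
    return [[recovery_score(clean, corrupted, value) for value in row] for row in patched]
```

The resulting nested list maps directly onto heatmap cells (rows as layers, columns as token positions), so the frontend would only need to pick a color scale.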

Adjusted priority summary

| Rank | Feature | Rationale |
|------|---------|-----------|
| 1 | Cross-model comparison | Foundation of differential diagnosis. Can be built by extending CompareView |
| 2 | Causal Tracing visualization | Backend is complete. Only the frontend heatmap needs to be added |
| 3 | Attention Head Heatmap | Reuses DTI data. Quick to implement |
| 4 | Logit Lens dashboard | Reuses FLAIR's internal logic. Visualizes the model's internal decision process |
| 5 | Prompt template library | User onboarding. Presets such as IOI and Greater-Than |
| 6 | Extended keyboard shortcuts | Power-user productivity. Low difficulty |

The Tier 3 items, the nnsight fallback and the plugin system, stay low priority, since support for the current 8 models is sufficient for now.


3. New feature proposal: connecting to remote models on HuggingFace Hub

Today Neural MRI downloads a model locally and loads it with TransformerLens. Let's look into extending this so that a model hosted on HuggingFace Hub can be connected and scanned directly from the web UI.

Concept

  • The user enters a HuggingFace model ID → look up model metadata → check TransformerLens compatibility → one-click scan if the model can be loaded
  • Extend the Model Registry from a static list to dynamic search
  • Provide a model search/filter UI backed by the HuggingFace Hub API (the huggingface_hub library)
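The metadata lookup step could be sketched roughly as follows. `summarize_model_info` and `fetch_model_summary` are hypothetical helper names, and the returned dictionary shape is an assumption about what the registry would need. The only library call used is `HfApi().model_info(...)` from huggingface_hub, imported lazily so the summarizing logic can be exercised without network access. The `config` field may be missing for some repositories, which the sketch treats as "no declared architecture".

```python
from typing import Any, Dict, Optional


def summarize_model_info(info: Any) -> Dict[str, Any]:
    """Reduce a Hub model-info object to the fields a registry entry needs."""
    config = getattr(info, "config", None) or {}
    gated = getattr(info, "gated", False)
    return {
        "model_id": getattr(info, "id", None),
        # The Hub reports gating as False, "auto", or "manual".
        "gated": bool(gated),
        # Architecture names as declared in the model's config, if any.
        "architectures": list(config.get("architectures") or []),
    }


def fetch_model_summary(model_id: str, token: Optional[str] = None) -> Dict[str, Any]:
    """Look up one model on the Hub (performs a network request)."""
    from huggingface_hub import HfApi  # third-party dependency, imported lazily

    return summarize_model_info(HfApi().model_info(model_id, token=token))
```

A gated flag in the summary lets the UI decide up front whether a token is required before attempting a load.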

Problem to solve: access tokens

Gated models (Gemma, Llama, etc.) currently require NMRI_HF_TOKEN to be set manually in .env. Automating this is the key challenge. Please evaluate the following approaches:

Approach A: token input UI

  • Add an HF Token input field to the frontend Settings panel
  • Pass the entered token to the backend for the duration of the session only (never persisted)
  • Pro: simplest and safest
  • Con: the user has to enter the token every time

Approach B: HuggingFace OAuth login

  • Authenticate through the HF Hub OAuth flow
  • Use login() from the huggingface_hub library or an OAuth redirect
  • Pro: good user experience
  • Con: high implementation complexity, requires server-side token management

Approach C: auto-detect the local HF CLI token

  • Automatically read the token already stored at ~/.cache/huggingface/token
  • Use the HfApi(token=True) pattern from huggingface_hub
  • Pro: zero configuration for local users
  • Con: does not work for Docker or remote deployments

Approach D: dynamic support for non-gated models only

  • Restrict dynamic search to models that are not gated
  • Keep the existing manual token setup for gated models
  • Pro: sidesteps the token problem entirely
  • Con: excludes major models such as Gemma and Llama

My thoughts

Realistically, a C + A hybrid looks like the most reasonable option: auto-detect the token if huggingface-cli login has been run locally, and otherwise ask for it in the UI. But I would like your opinion on whether this is worth the implementation cost, and whether there is another way.
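A rough sketch of the C + A hybrid, using only the standard library. The function name `resolve_hf_token` and the lookup order (session token from the UI first, then NMRI_HF_TOKEN, then the CLI token file) are assumptions for discussion, not a settled design. The file path is the one cited above; it can differ when the HF_HOME environment variable is set, which this sketch does not handle.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Location written by `huggingface-cli login`, as cited in this memo.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def resolve_hf_token(
    session_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    token_path: Path = DEFAULT_TOKEN_PATH,
) -> Optional[str]:
    """Return the first usable token, or None if the UI should prompt for one."""
    env = os.environ if env is None else env

    # 1. Token typed into the Settings panel for this session (approach A).
    if session_token and session_token.strip():
        return session_token.strip()

    # 2. Existing manual configuration from .env.
    env_token = env.get("NMRI_HF_TOKEN", "").strip()
    if env_token:
        return env_token

    # 3. Token saved by the local HF CLI (approach C).
    try:
        cli_token = token_path.read_text(encoding="utf-8").strip()
    except OSError:
        # File missing or unreadable, e.g. inside a Docker container.
        cli_token = ""
    return cli_token or None
```

Returning None instead of raising keeps the Docker/remote case simple: the backend reports "token required" and the frontend falls back to the input field.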

It would also help if you could look into integration with model sources other than HuggingFace (local Ollama, the GGUF format, etc.). We need to understand how far this can be extended within what TransformerLens supports.

Additional consideration: automatic TransformerLens compatibility checks

Loading models dynamically can run into compatibility problems. We will probably need the following logic:

  1. Read the architecture type from the HF model metadata
  2. Match it against the list of architectures TransformerLens is known to support
  3. If it matches, attempt the load; on failure, give the user clear feedback
  4. In the long run, hook this up to the nnsight fallback
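Steps 1 to 3 could look roughly like the sketch below. Everything named here is hypothetical: `SUPPORTED_ARCHITECTURES` is a tiny illustrative allowlist (the real list would have to be derived from TransformerLens itself), `CompatResult` is an invented result type, and `loader` is an injected callable standing in for whatever function actually loads the model.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Mapping, Optional, Tuple

# Illustrative allowlist only; not the actual TransformerLens support list.
SUPPORTED_ARCHITECTURES: FrozenSet[str] = frozenset(
    {"GPT2LMHeadModel", "GPTNeoXForCausalLM", "GemmaForCausalLM", "LlamaForCausalLM"}
)


@dataclass(frozen=True)
class CompatResult:
    loadable: bool
    architecture: Optional[str]
    message: str


def check_compat(
    config: Mapping[str, Any],
    supported: FrozenSet[str] = SUPPORTED_ARCHITECTURES,
) -> CompatResult:
    """Steps 1-2: read the declared architectures and match against the allowlist."""
    architectures = list(config.get("architectures") or [])
    if not architectures:
        return CompatResult(False, None, "Model config declares no architecture.")
    for name in architectures:
        if name in supported:
            return CompatResult(True, name, f"{name} is on the supported list.")
    first = architectures[0]
    return CompatResult(False, first, f"{first} is not on the supported list.")


def try_load(
    model_id: str,
    config: Mapping[str, Any],
    loader: Callable[[str], Any],
) -> Tuple[Optional[Any], CompatResult]:
    """Step 3: attempt the load only on a match, and report failures clearly."""
    result = check_compat(config)
    if not result.loadable:
        return None, result
    try:
        return loader(model_id), result
    except Exception as exc:  # surface any loader error to the user
        failed = CompatResult(False, result.architecture, f"Load failed for {model_id}: {exc}")
        return None, failed
```

A non-loadable result with a known architecture is the natural place to route into the nnsight fallback from step 4 later on.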

4. Summary

| Category | Items |
|----------|-------|
| Start now | Cross-model comparison (top priority) + Causal Tracing visualization |
| Next | Attention Heatmap + Logit Lens dashboard + prompt templates |
| Research/design | Dynamic model connection to HuggingFace Hub (including token strategy) |
| Low priority | nnsight fallback, plugin system, PDF reports |

Let me know if you have questions or see it differently.