AD-Styles committed on
Commit ee5e61c · verified · 1 Parent(s): 85399b0

docs: soften OOD claim, name slim as PEFT workaround, clarify VQA scope

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -24,7 +24,7 @@ tags:
  > On top of the v2 baseline: **2 capabilities added (Korean·OOD) + 1 deployment optimization (Slim packaging)**.
  > Trained weights for a hand-implemented Vision-Language Model: CLIP-ViT-B/32 + MLP Projector + Qwen2.5-0.5B + LoRA(r=16).
  >
- > ⚠️ **Note: size ≠ performance**: the Slim adapter (8.28 MB) is **the same model with the same outputs** (greedy 7/7 bit-identical). The model did not get smarter; only the packaging became more efficient. The real capability improvements are the two items Korean / OOD.

  ## 📦 Contents of this repo (~14 MB total)

@@ -87,7 +87,7 @@ detector = OODDetector(threshold=0.5, device="cpu")
  | Item | v2 | **v3 (this repo)** |
  |---|---|---|
  | Multilingual responses | ❌ English only (catastrophic forgetting) | ✅ **English + Korean** |
- | OOD signal | ❌ Always answers (hallucination) | ✅ **Can say "not sure"** (CLIP+entropy) |

  ### 🔵 Deployment optimization (zero performance change, deployment efficiency only)

@@ -100,7 +100,7 @@ detector = OODDetector(threshold=0.5, device="cpu")
  ### 🟡 What did not change (stated honestly)

  - Image understanding accuracy: same level in v2 and v3 because of the 0.5B LLM limit (to be addressed in v4 by scaling up the LLM)
- - English VQA: v3 baseline 36.67% (+2.00%p over v2's 34.67%; VQAv2, 50 samples, greedy decoding)

  ## 🧠 Training data (Step 1, 175 min)

@@ -124,7 +124,7 @@ entropy_signal: H(LLM first-token logits) / 8.0 nats

  Validation results (`scripts/test_ood_integration.py`): In-Dist (real dog) 0.365 (✅) · OOD (Pikachu cartoon) 0.505 (⚠️)

- ## 🪶 Slim Adapter: core technique

  Standard PEFT saves `modules_to_save` (embed_tokens + lm_head) **in full** → 1 GB.
  But a preliminary analysis found:
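
The entropy term named in this hunk's header (`entropy_signal: H(LLM first-token logits) / 8.0 nats`) and the threshold comparison behind the 0.365 / 0.505 results can be sketched roughly as follows. This is a minimal illustration, not the repo's `OODDetector`: the function names are hypothetical, the way the CLIP signal is combined with the entropy term is not shown in this diff and is left out, and treating a score at or above the threshold as OOD is an assumption.

```python
import math

# Divisor taken from the hunk header: "H(LLM first-token logits) / 8.0 nats".
ENTROPY_NORMALIZER_NATS = 8.0


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_signal(first_token_logits):
    """Shannon entropy (in nats) of the first-token distribution, divided by 8.0."""
    probs = softmax(first_token_logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / ENTROPY_NORMALIZER_NATS


def is_ood(score, threshold=0.5):
    """Assumption: scores at or above the threshold are flagged as OOD."""
    return score >= threshold


# The two scores reported in the validation line, checked against threshold=0.5.
in_dist_flag = is_ood(0.365)  # real dog
ood_flag = is_ood(0.505)      # Pikachu cartoon
```

A flat distribution gives a high entropy signal and a peaked one gives a value near zero, which is why the term can serve as a "not sure" indicator. With only two validation cases, the 0.5 threshold should be read as a placeholder rather than a tuned value.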
@@ -138,6 +138,8 @@ saved embed_tokens vs base Qwen2.5:
  → Only `image_token_row.safetensors` (7 KB) is stored separately; at inference, only the last row of base Qwen2.5 is patched.
  → **7/7 greedy-decoding responses match bit for bit** (`scripts/verify_slim_adapter.py`).

  ## ⚠️ Limitations

  - **0.5B LLM**: image-content accuracy is still limited (e.g. mistaking a dog for a cow)
 
@@ -24,7 +24,7 @@ tags:
  > On top of the v2 baseline: **2 capabilities added (Korean·OOD) + 1 deployment optimization (Slim packaging)**.
  > Trained weights for a hand-implemented Vision-Language Model: CLIP-ViT-B/32 + MLP Projector + Qwen2.5-0.5B + LoRA(r=16).
  >
+ > ⚠️ **Note: size ≠ performance**: the Slim adapter (8.28 MB) is **the same model with the same outputs** (greedy 7/7 bit-identical). The model did not get smarter; only the packaging became more efficient. The real capability improvement is Korean (the model can respond in Korean). OOD is at the level of an implementation plus a 2-case sanity check; full validation is deferred to v4.

  ## 📦 Contents of this repo (~14 MB total)

 
@@ -87,7 +87,7 @@ detector = OODDetector(threshold=0.5, device="cpu")
  | Item | v2 | **v3 (this repo)** |
  |---|---|---|
  | Multilingual responses | ❌ English only (catastrophic forgetting) | ✅ **English + Korean** |
+ | OOD signal | ❌ Always answers (hallucination) | ✅ **"Not sure" layer added** (CLIP+entropy, validated on N=2; full ROC analysis in v4) |

  ### 🔵 Deployment optimization (zero performance change, deployment efficiency only)

 
@@ -100,7 +100,7 @@ detector = OODDetector(threshold=0.5, device="cpu")
  ### 🟡 What did not change (stated honestly)

  - Image understanding accuracy: same level in v2 and v3 because of the 0.5B LLM limit (to be addressed in v4 by scaling up the LLM)
+ - English VQA: v3 baseline 36.67% vs v2 34.67% (+2.00%p; VQAv2, 50 samples, greedy decoding). Adding the inference wrapper does not affect scores on free-form questions either; the wrapper's meaningful gains are in blocking hallucinations on POPE (+3 to +20%p; see the GitHub README for details)

  ## 🧠 Training data (Step 1, 175 min)

 
@@ -124,7 +124,7 @@ entropy_signal: H(LLM first-token logits) / 8.0 nats

  Validation results (`scripts/test_ood_integration.py`): In-Dist (real dog) 0.365 (✅) · OOD (Pikachu cartoon) 0.505 (⚠️)

+ ## 🪶 Slim Adapter: working around PEFT default behavior (not model compression)

  Standard PEFT saves `modules_to_save` (embed_tokens + lm_head) **in full** → 1 GB.
  But a preliminary analysis found:
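
The 1 GB figure can be sanity-checked with a back-of-the-envelope size estimate. The sketch below assumes a vocabulary of 151,936 rows and a hidden size of 896 (values from the published Qwen2.5-0.5B config, not stated in this diff) and float32 storage (also an assumption); the helper name is hypothetical.

```python
# Assumptions, none of which appear in this diff:
VOCAB_SIZE = 151_936   # rows in embed_tokens / lm_head (Qwen2.5-0.5B config)
HIDDEN_SIZE = 896      # columns per row (Qwen2.5-0.5B config)
BYTES_PER_PARAM = 4    # float32 storage


def matrix_bytes(rows, cols, bytes_per_param=BYTES_PER_PARAM):
    """Size in bytes of a dense rows x cols matrix."""
    return rows * cols * bytes_per_param


# Saving embed_tokens and lm_head in full, as modules_to_save does.
full_copy_bytes = 2 * matrix_bytes(VOCAB_SIZE, HIDDEN_SIZE)

# Saving a single row instead.
single_row_bytes = matrix_bytes(1, HIDDEN_SIZE)

print(f"full copies: {full_copy_bytes / 1e9:.2f} GB")
print(f"single row:  {single_row_bytes} bytes")
```

Under these assumptions the two full matrices come to roughly 1.09 GB, while one float32 row is 3,584 bytes, so a file holding one or two such rows lands in the single-digit KB range reported for `image_token_row.safetensors`.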
 
@@ -138,6 +138,8 @@ saved embed_tokens vs base Qwen2.5:
  → Only `image_token_row.safetensors` (7 KB) is stored separately; at inference, only the last row of base Qwen2.5 is patched.
  → **7/7 greedy-decoding responses match bit for bit** (`scripts/verify_slim_adapter.py`).

+ > To be candid, this 99% reduction is not model compression but **the result of working around a PEFT default: `modules_to_save`, combined with tied embeddings, saves every row in full, including rows that were never trained**. The plan is to write this up as a PEFT issue for other users who run into the same problem.
+
  ## ⚠️ Limitations

  - **0.5B LLM**: image-content accuracy is still limited (e.g. mistaking a dog for a cow)
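
The row-level idea behind the Slim adapter (compare the saved embedding against the base, keep only the rows that changed, and patch them back at inference) can be illustrated with a toy example. This is a sketch under stated assumptions, not the repo's code: nested Python lists stand in for tensors, `changed_rows` and `patch_rows` are hypothetical names, and the real scripts presumably work on safetensors files.

```python
def changed_rows(base, tuned):
    """Return {row_index: copy_of_tuned_row} for every row that differs from base."""
    if len(base) != len(tuned):
        raise ValueError("matrices must have the same number of rows")
    return {i: list(t) for i, (b, t) in enumerate(zip(base, tuned)) if b != t}


def patch_rows(base, rows):
    """Return a copy of base with the given rows overwritten; base is not modified."""
    patched = [list(r) for r in base]
    for index, row in rows.items():
        patched[index] = list(row)
    return patched


# Toy 4x3 "embedding matrix" in which only the last row (the image token) was trained.
base = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.0, 0.0, 0.0]]
tuned = [list(r) for r in base]
tuned[-1] = [0.25, -0.5, 0.75]

delta = changed_rows(base, tuned)    # what a slim checkpoint would need to store
restored = patch_rows(base, delta)   # what inference would rebuild from base + delta
```

Because the untouched rows are taken from the base matrix unchanged, the rebuilt matrix equals the fine-tuned one exactly, which is in line with the bit-for-bit match reported above.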