AD-Styles committed 62f15c2 · verified · 1 parent: eeaf08c

Honest messaging: separate capability gains (Korean, OOD) from deployment optimization (Slim)

Files changed (1)
  1. README.md +21 -6
README.md CHANGED
@@ -19,10 +19,12 @@ tags:
  - mini-llava
  ---

- # Mini-LLaVA v3: Korean Multilingual + Slim LoRA + OOD Detection

- > An evolved version of v2 that squarely targets its three unresolved problems (Korean forgetting, a 1 GB adapter, OOD hallucination).
  > Trained weights of a Vision-Language Model built by hand from CLIP-ViT-B/32 + MLP Projector + Qwen2.5-0.5B + LoRA(r=16).

  ## 📦 Contents of this repo (~14 MB total)
 
@@ -78,14 +80,27 @@ detector = OODDetector(threshold=0.5, device="cpu")
  # When calling generate, pass output_scores=True to get first_logits, then call detector.score(image, first_logits)
  ```

- ## ✨ Key improvements, v2 → v3

  | Item | v2 | **v3 (this repo)** |
  |---|---|---|
  | Multilingual responses | ❌ English only (catastrophic forgetting) | ✅ **English + Korean** |
- | LoRA adapter | 1045 MB | **8.28 MB (−99.21%)** |
- | OOD handling | Always answers (hallucination) | **Can say "I'm not sure"** (CLIP+entropy) |
- | Total download size | ~1051 MB | **~14 MB** |

  ## 🧠 Training data (Step 1, 175 min)
 
  - mini-llava
  ---

+ # Mini-LLaVA v3: Korean Multilingual + OOD Detection + Slim Deploy

+ > Built on the v2 baseline: **two capability additions (Korean, OOD) plus one deployment optimization (Slim packaging)**.
  > Trained weights of a Vision-Language Model built by hand from CLIP-ViT-B/32 + MLP Projector + Qwen2.5-0.5B + LoRA(r=16).
+ >
+ > ⚠️ **Size ≠ performance**: the Slim adapter (8.28 MB) is **the same model with the same outputs** (greedy decoding bit-identical on 7/7 checks). The model did not get smarter; only the packaging became more efficient. The real capability gains are Korean and OOD.

  ## 📦 Contents of this repo (~14 MB total)

 
  # When calling generate, pass output_scores=True to get first_logits, then call detector.score(image, first_logits)
  ```
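
The usage snippet passes `first_logits` to `detector.score` without showing what the entropy half of "CLIP+entropy" computes. Below is a minimal sketch, assuming `first_logits` is the score vector for the first generated token. With Hugging Face `transformers`, `generate(..., output_scores=True, return_dict_in_generate=True)` returns one score tensor per generated step in `out.scores`, so the first step is `out.scores[0]`. How `OODDetector` actually combines entropy with the CLIP signal is not shown in this diff, so `clip_sim` (assumed already scaled to [0, 1]), the equal weighting, and every function name below are hypothetical. The 0.5 default mirrors `threshold=0.5` in the README snippet, but using it as a "greater than" cutoff is also an assumption.

```python
import math
from typing import Sequence


def normalized_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy of softmax(logits), divided by log(V) so the result lies in [0, 1]."""
    if len(logits) < 2:
        raise ValueError("need at least two logits")
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(logits))


def ood_score(clip_sim: float, first_logits: Sequence[float], weight: float = 0.5) -> float:
    """Hypothetical blend: low image-text similarity and a flat first-token
    distribution both raise the score. clip_sim is assumed to be in [0, 1]."""
    return weight * (1.0 - clip_sim) + (1.0 - weight) * normalized_entropy(first_logits)


def should_abstain(clip_sim: float, first_logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True (answer "I'm not sure") when the blended score exceeds the threshold."""
    return ood_score(clip_sim, first_logits) > threshold


# Confident first token + high similarity -> answer; flat distribution + low similarity -> abstain.
print(should_abstain(0.9, [12.0, 0.0, 0.0, 0.0]))  # False
print(should_abstain(0.1, [0.0, 0.0, 0.0, 0.0]))   # True
```

Dividing by log V keeps the entropy term on the same [0, 1] scale as the similarity term regardless of vocabulary size, which is what lets a single fixed threshold be meaningful in this sketch.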

+ ## ✨ v2 → v3 changes (capability vs. deployment, kept separate)
+
+ ### 🟢 Capability additions (what the model can newly do: genuine performance gains)

  | Item | v2 | **v3 (this repo)** |
  |---|---|---|
  | Multilingual responses | ❌ English only (catastrophic forgetting) | ✅ **English + Korean** |
+ | OOD signal | ❌ Always answers (hallucination) | ✅ **Can say "I'm not sure"** (CLIP+entropy) |
+
+ ### 🔵 Deployment optimization (no change in performance, deployment efficiency only)
+
+ | Item | v2 | v3 |
+ |---|---|---|
+ | LoRA adapter | 1045 MB | 8.28 MB (−99.21%) |
+ | Total model assets | ~1051 MB | ~14 MB |
+ | Model output | (baseline) | **bit-identical** to FULL (verified with greedy decoding, 7/7) |
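
Both deployment claims can be checked with plain arithmetic and an exact comparison. The sketch below recomputes the −99.21% reduction from the two adapter sizes and counts how many inputs produce identical greedy token sequences under both adapters. The input keys, token IDs, and the `full_outputs` / `slim_outputs` structure are hypothetical stand-ins, since the 7 verification cases are not listed here; in practice each entry would come from running greedy decoding with the FULL and the Slim adapter on the same input.

```python
from typing import Dict, List


def size_reduction_pct(old_mb: float, new_mb: float) -> float:
    """Percentage by which new_mb is smaller than old_mb."""
    return (1.0 - new_mb / old_mb) * 100.0


def count_bit_identical(full: Dict[str, List[int]], slim: Dict[str, List[int]]) -> int:
    """Number of inputs whose greedy token IDs match exactly under both adapters."""
    return sum(1 for key, ids in full.items() if slim.get(key) == ids)


print(f"{size_reduction_pct(1045, 8.28):.2f}%")  # 99.21%

# Hypothetical greedy outputs for two inputs; the real check used 7 test cases.
full_outputs = {"cat.jpg": [101, 7, 42], "dog.jpg": [101, 9, 13]}
slim_outputs = {"cat.jpg": [101, 7, 42], "dog.jpg": [101, 9, 13]}
n = count_bit_identical(full_outputs, slim_outputs)
print(f"{n}/{len(full_outputs)} identical")  # 2/2 identical
```

Comparing token IDs rather than decoded strings is the stricter test: two different token sequences can decode to the same text, but identical IDs guarantee identical output.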
+
+ ### 🟡 What did not change (stated honestly)
+
+ - Image understanding accuracy: the same level in v2 and v3, limited by the 0.5B LLM (planned to be addressed in v4 by scaling up the LLM)
+ - English VQA head-to-head: a v2 vs. v3 comparison has not been measured

  ## 🧠 Training data (Step 1, 175 min)