Repository: AD-Styles/mini-llava-v3
docs: sync model card with corrected VQAv2 measurement (+2pp v3>v2)
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ lora_adapter_slim/
 └─ README.md (PEFT auto-generated)
 ```
 
-**-99.21% vs. v2** (1045 MB → 8.28 MB). For the principle behind the slimming, see [GitHub README §Slim Adapter](https://github.com/AD-Styles/vlm-from-scratch-v3#
+**-99.21% vs. v2** (1045 MB → 8.28 MB). For the principle behind the slimming, see [GitHub README §Slim Adapter](https://github.com/AD-Styles/vlm-from-scratch-v3#step-4--slim-adapter-1045-mb--828-mb-출력-변화-없음).
 
 ## Quick Start
 
@@ -100,7 +100,7 @@ detector = OODDetector(threshold=0.5, device="cpu")
 ### What did not change (stated honestly)
 
 - Image understanding accuracy: on par between v2 and v3 because of the 0.5B LLM's limits (to be addressed in v4 by scaling up the LLM)
-- English VQA
+- English VQA: v3 baseline 36.67% (+2.00 pp over v2's 34.67%; VQAv2, 50 samples, greedy decoding)
 
 ## Training data (Step 1, 175 min)
 
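The two headline figures in this change (the adapter size cut and the VQAv2 gain) can be re-derived from the raw numbers quoted in the diff. The snippet below is a minimal sketch of that arithmetic, not code from the repository: the helper names are hypothetical, and the per-sample weight assumes every one of the 50 samples counts equally toward the accuracy.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction, as a percentage of the original value."""
    return (before - after) / before * 100.0


def percentage_point_delta(new_pct: float, old_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


# Adapter size quoted in the README: 1045 MB (v2) -> 8.28 MB (v3 slim).
size_cut = percent_reduction(1045.0, 8.28)

# English VQA accuracy quoted in the README: v2 34.67%, v3 36.67%.
vqa_gain = percentage_point_delta(36.67, 34.67)

# Assumption: equal weighting, so one sample out of 50 is worth 100 / 50 pp.
per_sample_pp = 100.0 / 50

print(f"adapter size reduction: {size_cut:.2f}%")
print(f"VQAv2 accuracy gain:    {vqa_gain:+.2f} pp")
print(f"weight of one sample:   {per_sample_pp:.2f} pp")
```

Under the equal-weighting assumption, the reported +2.00 pp gap corresponds to one sample's worth of score on a 50-sample evaluation, which is worth keeping in mind when reading the v3 > v2 claim.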