---
license: apache-2.0
library_name: peft
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- vision-language
- multimodal
- llava
- qlora
---
# Mini-LLaVA v4: weights

Trained weights for a multimodal LLM assembled from scratch (`vlm-from-scratch-v4`).

- **Architecture**: CLIP-ViT-B/32 (frozen) + 2-layer MLP Projector + Qwen2.5-1.5B-Instruct + LoRA
- **Training**: QLoRA 4-bit NF4 · Stage 1 alignment → Stage 2 instruction tuning on 46K samples (balanced English + Korean mix) · RTX 4060 8GB
- **Evaluation**: VQAv2 56.8% / POPE 71.8%, measured on the raw model (n=400, no wrapper). This is a small model trained on an 8GB GPU with about 90K samples, so its absolute performance falls short of public VLMs; see the GitHub README for details.
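
The image path in the architecture above can be traced as a quick shape check. This is a minimal sketch, not the repository's code: the 768 and 1536 widths come from this card, while the 224-pixel input resolution and the hidden width of the 2-layer MLP are assumptions marked in the comments.

```python
# Shape walk-through for the image path (CLIP patches -> projector -> LLM).
# Values marked ASSUMED are not stated in this model card.

PATCH = 32            # patch size of ViT-B/32
IMAGE_SIZE = 224      # ASSUMED: the usual CLIP ViT-B/32 input resolution
CLIP_DIM = 768        # CLIP feature width, as listed for projector.pt
LLM_DIM = 1536        # Qwen2.5-1.5B embedding width, as listed for projector.pt
HIDDEN_DIM = LLM_DIM  # ASSUMED: hidden width of the 2-layer MLP


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameter count of one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


# Number of patch tokens for one image (any CLS token is not counted here).
num_patches = (IMAGE_SIZE // PATCH) ** 2

# Two linear layers; the activation between them has no parameters.
projector_params = (
    linear_params(CLIP_DIM, HIDDEN_DIM) + linear_params(HIDDEN_DIM, LLM_DIM)
)

print(f"patch tokens per image : {num_patches}")
print(f"projector input shape  : ({num_patches}, {CLIP_DIM})")
print(f"projector output shape : ({num_patches}, {LLM_DIM})")
print(f"projector parameters   : {projector_params:,}")
```

Under these assumptions the projector is only a few million parameters, small next to the 1.5B-parameter language model it feeds.
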
## Files

| File | Description |
|---|---|
| `projector.pt` | `state_dict` of the MultiModalProjector (CLIP 768 → LLM 1536) |
| `lora_adapter/` | LoRA adapter (r=16) for the linear layers of Qwen2.5-1.5B |

The `<image>` token reuses Qwen2.5's built-in `<|image_pad|>`, so the adapter carries no extra embedding weights (all 70 MB is LoRA).
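
The 70 MB figure can be sanity-checked with simple arithmetic. The sketch below assumes the layer dimensions from the public Qwen2.5-1.5B config, assumes that all seven linear projections in each decoder block are LoRA targets, and assumes float32 storage; only r=16 is taken from this card.

```python
# Back-of-the-envelope size of an r=16 LoRA adapter on Qwen2.5-1.5B.
# Dimensions are ASSUMED from the public Qwen2.5-1.5B config.
HIDDEN = 1536        # hidden size
INTERMEDIATE = 8960  # MLP intermediate size
NUM_LAYERS = 28      # decoder blocks
KV_DIM = 256         # 2 key/value heads x 128 head dim
RANK = 16            # from this card (r=16)

# (in_features, out_features) of each linear layer in one decoder block.
# ASSUMED: every one of these is a LoRA target.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes: dict, rank: int, num_layers: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every target layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total_params = lora_param_count(LINEAR_SHAPES, RANK, NUM_LAYERS)
size_mib = total_params * 4 / 2**20  # ASSUMED: 4 bytes per float32 weight

print(f"LoRA parameters: {total_params:,}")
print(f"approx. size   : {size_mib:.1f} MiB")
```

Under these assumptions the adapter has about 18.5M parameters, roughly 70 MiB in float32, which is consistent with the adapter holding LoRA weights only.
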
## Usage

For inference code, see `src/` in [github.com/AD-Styles/vlm-from-scratch-v4](https://github.com/AD-Styles/vlm-from-scratch-v4). Demo: HF Space `AD-Styles/mini-llava-v4-demo`.
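
For orientation, fetching the two artifacts might look roughly like the sketch below. It is not the repository's inference code: it assumes `lora_adapter/` is a standard PEFT adapter directory and that `projector.pt` is a plain `state_dict`. The function name `load_mini_llava_weights` is hypothetical, and the projector class itself still has to come from `src/` in the GitHub repository.

```python
REPO_ID = "AD-Styles/mini-llava-v4"
BASE_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"


def load_mini_llava_weights(repo_id: str = REPO_ID, base_model: str = BASE_MODEL):
    """Hypothetical helper: returns (tokenizer, LoRA-wrapped LLM, projector state_dict)."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    import torch
    from huggingface_hub import hf_hub_download
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    llm = AutoModelForCausalLM.from_pretrained(base_model)

    # ASSUMED: lora_adapter/ contains a standard PEFT adapter
    # (adapter_config.json plus adapter weights).
    llm = PeftModel.from_pretrained(llm, repo_id, subfolder="lora_adapter")

    # ASSUMED: projector.pt is a state_dict of tensors. Building the
    # MultiModalProjector module to load it into is left to the repo's src/.
    projector_path = hf_hub_download(repo_id=repo_id, filename="projector.pt")
    projector_state = torch.load(projector_path, map_location="cpu")

    return tokenizer, llm, projector_state
```

Calling the function downloads the base model and the adapter, so it needs network access and the `torch`, `transformers`, `peft`, and `huggingface_hub` packages. The CLIP vision encoder and the image-token splicing are not covered here.
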