Mini-LLaVA v4: weights
Trained weights for a multimodal LLM assembled from scratch (vlm-from-scratch-v4).
- Architecture: CLIP-ViT-B/32 (frozen) + 2-layer MLP Projector + Qwen2.5-1.5B-Instruct + LoRA
- Training: QLoRA 4-bit NF4 · Stage 1 alignment → Stage 2 instruction tuning on 46K samples (balanced English + Korean mix) · RTX 4060 8GB
- Evaluation: VQAv2 56.8% / POPE 71.8% on the raw model (n=400, no wrapper). This is a small model trained on an 8GB GPU with about 90K samples, so its absolute performance falls short of publicly released VLMs. See the GitHub README for details.
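
The architecture bullet describes the usual LLaVA-style flow: frozen CLIP patch features pass through the projector and then occupy placeholder positions in the LLM input sequence. The sketch below only illustrates that bookkeeping. It assumes a 224×224 input (so ViT-B/32 yields a 7×7 patch grid) and that the CLS token is dropped; neither detail is confirmed by this card. `IMAGE_PLACEHOLDER_ID` and all function names are hypothetical, and the real inference code lives in the GitHub repository.

```python
from typing import List

# Hypothetical id for illustration; the real value comes from the tokenizer.
IMAGE_PLACEHOLDER_ID = -1


def num_visual_tokens(image_size: int = 224, patch_size: int = 32,
                      keep_cls: bool = False) -> int:
    """Number of tokens a ViT emits for a square image of the given size."""
    per_side = image_size // patch_size
    return per_side * per_side + (1 if keep_cls else 0)


def expand_placeholder(token_ids: List[int], n_visual: int) -> List[int]:
    """Repeat each image placeholder n_visual times, one slot per visual token."""
    out: List[int] = []
    for tok in token_ids:
        if tok == IMAGE_PLACEHOLDER_ID:
            out.extend([IMAGE_PLACEHOLDER_ID] * n_visual)
        else:
            out.append(tok)
    return out


def visual_positions(token_ids: List[int]) -> List[int]:
    """Indices whose text embeddings would be overwritten by projected features."""
    return [i for i, tok in enumerate(token_ids) if tok == IMAGE_PLACEHOLDER_ID]


n_visual = num_visual_tokens()                   # 7 * 7 patches under the assumptions above
prompt = [101, IMAGE_PLACEHOLDER_ID, 102, 103]   # made-up token ids
expanded = expand_placeholder(prompt, n_visual)
positions = visual_positions(expanded)
print(n_visual, len(expanded), positions[0], positions[-1])
```

At those positions, the projected CLIP features (one 1536-dimensional vector per patch) would replace the text embeddings before the sequence enters the LLM.
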
Files

| File | Description |
|---|---|
| `projector.pt` | MultiModalProjector (CLIP 768 → LLM 1536) state_dict |
| `lora_adapter/` | LoRA adapter for the linear layers of Qwen2.5-1.5B (r=16) |
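
As a rough sanity check on `projector.pt`, the parameter count of a 2-layer MLP mapping 768 to 1536 can be worked out by hand. The card does not state the hidden width, so the sketch assumes it equals the LLM width (1536), as in LLaVA-1.5's two-layer GELU projector, and that both layers have biases. `linear_params` is an illustrative helper, not part of the repository.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameter count of a dense layer: weight matrix plus optional bias."""
    return d_in * d_out + (d_out if bias else 0)


CLIP_DIM = 768        # from the table: CLIP feature width
LLM_DIM = 1536        # from the table: LLM embedding width
HIDDEN_DIM = LLM_DIM  # ASSUMPTION: hidden width is not stated in the card

projector_params = (
    linear_params(CLIP_DIM, HIDDEN_DIM)    # first layer: 768 -> 1536
    + linear_params(HIDDEN_DIM, LLM_DIM)   # second layer: 1536 -> 1536
)
size_mb_fp32 = projector_params * 4 / 1e6  # 4 bytes per fp32 parameter

print(f"{projector_params:,} parameters, about {size_mb_fp32:.1f} MB in fp32")
```

Under these assumptions the projector has about 3.5M parameters, so the file would be roughly 14 MB in fp32 or half that in 16-bit precision.
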
The `<image>` token reuses Qwen2.5's built-in `<|image_pad|>` token, so the adapter carries no extra embedding weights (all 70 MB is LoRA).
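
The 70 MB figure can be cross-checked with a short calculation: a rank-r LoRA on a `d_in × d_out` linear layer adds `r * (d_in + d_out)` parameters. The sketch below assumes the adapter targets all seven projection matrices in every decoder block and is stored in fp32. The layer dimensions other than the hidden size of 1536 are taken from the published Qwen2.5-1.5B configuration as recalled here and should be verified against the model's `config.json`.

```python
RANK = 16            # from the card (r=16)
HIDDEN = 1536        # from the card (LLM width)
KV_DIM = 256         # ASSUMPTION: 2 KV heads x 128 head dim
INTERMEDIATE = 8960  # ASSUMPTION: MLP intermediate size
NUM_LAYERS = 28      # ASSUMPTION: number of decoder blocks

# ASSUMPTION: LoRA is applied to every linear projection in each block.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to the frozen weight."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS
size_mib = total * 4 / 2**20  # fp32 storage

print(f"{total:,} LoRA parameters, about {size_mib:.1f} MiB in fp32")
```

With these assumptions the total is about 18.5M parameters, roughly 70 MiB in fp32, which is consistent with an adapter that contains only LoRA matrices and no resized embedding table.
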
Usage
For inference code, see src/ in github.com/AD-Styles/vlm-from-scratch-v4. Demo: HF Space AD-Styles/mini-llava-v4-demo.