Mini-LLaVA v4: weights

์ฒ˜์Œ๋ถ€ํ„ฐ ์กฐ๋ฆฝํ•œ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ LLM (vlm-from-scratch-v4) ์˜ ํ•™์Šต๋œ ๊ฐ€์ค‘์น˜.

  • Architecture: CLIP-ViT-B/32 (frozen) + 2-layer MLP Projector + Qwen2.5-1.5B-Instruct + LoRA
  • Training: QLoRA 4-bit NF4 · Stage 1 alignment → Stage 2 instruction tuning on 46K samples (balanced English + Korean mix) · RTX 4060 8GB
  • Evaluation: VQAv2 56.8% / POPE 71.8% on the raw model (n=400, no wrapper). This is a small model trained on an 8GB GPU with roughly 90K samples, so its absolute performance falls short of publicly released VLMs; see the GitHub README for details.
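
The QLoRA setup named above (4-bit NF4 quantization of the base LLM, with LoRA adapters on top) might be configured roughly as follows. This is a minimal sketch, not the project's actual training configuration: only the 4-bit NF4 setting and r=16 come from this card. The bfloat16 compute dtype and the use of PEFT's "all-linear" shorthand are assumptions, and every other hyperparameter (alpha, dropout, and so on) is left at the library default because the card does not state it. The snippet assumes torch, transformers, peft, and bitsandbytes are installed.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base LLM (stated in the card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
)

# LoRA on every linear layer with rank 16 (rank stated in the card).
lora_config = LoraConfig(
    r=16,
    target_modules="all-linear",  # assumption: PEFT shorthand for "all linear layers"
)
```

Both objects would typically be passed to the model loader and to PEFT's adapter wrapper; the exact wiring used for this model lives in the GitHub repository.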

ํŒŒ์ผ

ํŒŒ์ผ ์„ค๋ช…
projector.pt MultiModalProjector (CLIP 768 โ†’ LLM 1536) state_dict
lora_adapter/ Qwen2.5-1.5B ์ „ linear layer LoRA ์–ด๋Œ‘ํ„ฐ (r=16)
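
To make the 768 → 1536 mapping concrete, a 2-layer MLP projector could look like the sketch below. The class name MultiModalProjector and the input/output widths come from this card. The hidden width of 1536, the GELU activation, and the attribute name `net` are assumptions, so the parameter names here may not match the keys stored in projector.pt. The 49-token input assumes the 7×7 patch grid CLIP-ViT-B/32 produces at 224×224 resolution.

```python
import torch
from torch import nn


class MultiModalProjector(nn.Module):
    """Maps CLIP patch features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 768, llm_dim: int = 1536):
        super().__init__()
        # Assumed layout: Linear -> GELU -> Linear (LLaVA-1.5 style).
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.net(vision_features)


projector = MultiModalProjector()
patch_features = torch.randn(1, 49, 768)  # dummy stand-in for CLIP patch features
image_embeds = projector(patch_features)
print(image_embeds.shape)  # torch.Size([1, 49, 1536])
```

Each projected vector has the same width as a Qwen2.5-1.5B token embedding, which is what allows image features to stand in for placeholder tokens in the LLM input sequence.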

The <image> token reuses Qwen2.5's built-in <|image_pad|> token, so the adapter carries no extra embedding weights (the full 70 MB is LoRA).
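
The 70 MB figure can be sanity-checked with a short calculation. The sketch below counts the LoRA parameters for r=16 on the seven linear projections of each transformer block. The layer shapes are recalled from the public Qwen2.5-1.5B configuration rather than taken from this card, and both the set of targeted modules and fp32 storage are assumptions.

```python
# Layer shapes recalled from the public Qwen2.5-1.5B config (assumption).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 256  # 2 key/value heads x head_dim 128
RANK = 16     # r=16, as stated in the card

# (in_features, out_features) for each linear layer in one transformer block.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds two matrices per layer: A (rank x in) and B (out x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in LINEAR_SHAPES.values())
total_params = per_layer * NUM_LAYERS
size_bytes_fp32 = total_params * 4  # assumption: weights saved as fp32

print(total_params)              # 18464768
print(size_bytes_fp32 / 2**20)   # 70.4375
```

Under these assumptions the LoRA matrices alone come to about 70.4 MiB, which is consistent with an adapter that stores no additional embedding rows.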

Usage

For inference code, see src/ in github.com/AD-Styles/vlm-from-scratch-v4. Demo: HF Space AD-Styles/mini-llava-v4-demo.
