---
tags:
  - vision
  - image-to-text
  - pruning
  - llava
base_model: llava-hf/llava-1.5-7b-hf
---

# llava-l1-30pct

This is a pruned version of [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf).

## Pruning Details

- **Method:** L1 unstructured pruning
- **Sparsity:** 30%

The goal of pruning is to reduce the effective parameter count while preserving as much of the base model's performance as possible. Because the pruning is unstructured, weights are zeroed in place, so the checkpoint keeps the original dense tensor shapes.
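For reference, the snippet below is a minimal sketch of how 30% L1 unstructured pruning can be applied with `torch.nn.utils.prune`. The choice to prune every `nn.Linear` layer is an assumption; the exact set of modules pruned for this checkpoint is not documented here.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Zero out the 30% of weights with the smallest L1 magnitude in each
# Linear layer (assumption: which layers were actually pruned is not stated).
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        # Make the pruning permanent: drop the reparametrization and
        # keep the zeroed weight tensor.
        prune.remove(module, "weight")
```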

## Usage

Since the pruning is unstructured (weights are zeroed in place rather than removed), the architecture is unchanged and remains compatible with the standard `LlavaForConditionalGeneration` class. However, you should use the processor from the base model to ensure correct input preprocessing.

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "CrystalRaindropsFall/llava-l1-30pct"
base_model_id = "llava-hf/llava-1.5-7b-hf"

# 1. Load the processor from the base model
processor = AutoProcessor.from_pretrained(base_model_id)

# 2. Load the pruned model
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# 3. Example inference
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_logo.png?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image?\nASSISTANT:"

# Move inputs to the model's device and cast floating-point tensors to its dtype
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, model.dtype)

output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```
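
If you want to sanity-check the reported 30% sparsity, you can count zero-valued weights across the model's linear layers. This is a hypothetical verification snippet that reuses the `model` loaded above, not part of the official recipe; the measured figure may differ slightly if only a subset of layers was pruned.

```python
import torch.nn as nn

# Estimate sparsity by counting zeroed weights in all Linear layers.
total, zeros = 0, 0
for module in model.modules():
    if isinstance(module, nn.Linear):
        weight = module.weight
        total += weight.numel()
        zeros += (weight == 0).sum().item()

print(f"Measured sparsity in Linear layers: {zeros / total:.1%}")
```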