
Granite Vision 4 Pretrained (Base Model)

This is a pretrained Granite Vision 4 model with custom modeling code, compatible with recent releases of the Transformers library (4.57.3 and later).

Compatibility

  • ✅ Transformers >= 4.57.3
  • ✅ SFT / Fine-tuning ready (TRL, PEFT, LoRA)
  • ✅ Requires trust_remote_code=True

Usage

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = "granite-vision-dev/granite-vision-pretrained"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(model_path, trust_remote_code=True).to(device)

# Prepare inputs
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "path/to/image.png"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

Fine-tuning with LoRA

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model
import torch

model_path = "granite-vision-dev/granite-vision-pretrained"

model = AutoModelForVision2Seq.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
```
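The `r` and `lora_alpha` values above control a low-rank additive update: each targeted projection weight W is effectively replaced by W + (lora_alpha / r) · B A, where A and B are the small trainable factors. A minimal sketch of that arithmetic in plain PyTorch (dimensions are illustrative, not the model's actual projection sizes):

```python
import torch

# Illustrative dimensions only; real projection sizes come from the model config.
d, r, alpha = 512, 64, 64

W = torch.randn(d, d)          # frozen base weight (e.g. q_proj)
A = torch.randn(r, d) * 0.01   # trainable low-rank factor A (r x d)
B = torch.zeros(d, r)          # trainable low-rank factor B (d x r), zero-initialized

# Effective weight after merging the LoRA update.
W_eff = W + (alpha / r) * (B @ A)

# Because B starts at zero, the update is initially a no-op:
assert torch.equal(W_eff, W)
```

Since B is zero-initialized, training starts exactly from the pretrained weights, and the scaling factor `lora_alpha / r` (here 64/64 = 1) determines how strongly the learned update is applied.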

Model Architecture

  • Vision encoder: SigLIP2
  • Vision-language connector: Two-layer MLP with GELU activation
  • Language model: Granite 3.1 2B Instruct
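The connector described above can be sketched as a two-layer MLP with a GELU between the linear layers, projecting vision-encoder features into the language model's embedding space. The dimensions below are assumed placeholders, not values read from the released config:

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP with GELU projecting vision features to LM hidden size.

    Sketch only: 1152 (a SigLIP-style feature width) and 2048 are
    assumed dimensions, not the model's actual configuration.
    """
    def __init__(self, vision_dim: int = 1152, text_dim: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, text_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(text_dim, text_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(image_features)))

# A batch of patch features: (batch, num_patches, vision_dim)
feats = torch.randn(1, 729, 1152)
projected = VisionLanguageConnector()(feats)
print(projected.shape)  # torch.Size([1, 729, 2048])
```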

License

Apache 2.0
