
BLIP HyperModulator

A hypernetwork that generates LoRA adapters for BLIP image captioning models from natural language task descriptions.

Overview

This model is a hypernetwork that generates task-specific LoRA (Low-Rank Adaptation) weights for BLIP image captioning models. Given a natural language description of a desired captioning style or task, it produces LoRA adapters that modify the BLIP model's behavior accordingly.
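The core idea of a hypernetwork is a small network whose *outputs are the weights* of adapters for another model. As a minimal sketch (not the actual implementation), the sizes below come from the Architecture section, while the target-layer width `d` and the MLP structure are assumptions for illustration:

```python
import numpy as np

# Sketch: project a task embedding through a latent bottleneck into the
# flattened A and B factors of one LoRA adapter.
task_dim, latent_dim = 384, 256  # task embedding and latent sizes (see Architecture)
rank, d = 16, 768                # rank from Architecture; d is a hypothetical layer width

rng = np.random.default_rng(0)
W1 = rng.standard_normal((latent_dim, task_dim)) * 0.02       # embedding -> latent
W2 = rng.standard_normal((2 * rank * d, latent_dim)) * 0.02   # latent -> LoRA params

task_embedding = rng.standard_normal(task_dim)  # e.g. from a sentence encoder
latent = np.tanh(W1 @ task_embedding)
flat = W2 @ latent
A = flat[: rank * d].reshape(rank, d)   # down-projection factor
B = flat[rank * d :].reshape(d, rank)   # up-projection factor
```

A separate head per target module would produce one such (A, B) pair for each adapted layer.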

Features

  • Cross-attention: attention between the task-description embedding and per-layer conditioning signals
  • Image-aware adaptation: Optional conditioning on image features for context-aware LoRA generation
  • Adaptive gating: Learned gating mechanisms to control LoRA application strength
  • Parameter scaling: Dynamic scaling of LoRA parameters
  • Caching: Efficient caching of similar task embeddings
  • Task interpolation: Ability to blend different captioning styles
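Task interpolation can be pictured as blending the embeddings of two task descriptions before LoRA generation. A minimal sketch, assuming linear interpolation in embedding space (the helper name and vectors are hypothetical):

```python
import numpy as np

def interpolate_tasks(emb_a: np.ndarray, emb_b: np.ndarray, t: float) -> np.ndarray:
    """Linearly blend two task embeddings; t=0 gives style A, t=1 gives style B."""
    return (1.0 - t) * emb_a + t * emb_b

# Toy 3-d embeddings standing in for two captioning styles
pirate = np.array([1.0, 0.0, 0.0])
formal = np.array([0.0, 1.0, 0.0])
half = interpolate_tasks(pirate, formal, 0.5)  # an equal blend of both styles
```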

Architecture

  • Total Parameters: 515,644,222
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Task Embedding Size: 384
  • Latent Size: 256
  • Target Modules: dense, fc1, fc2, intermediate, key, output, projection, qkv, query, value
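The rank and alpha above follow the standard LoRA parameterization, where a target module's weight W is adapted as W + (alpha / rank) · B·A. A sketch with the listed hyperparameters (the module dimensions are assumptions, since BLIP layer widths are not stated here):

```python
import numpy as np

rank, alpha = 16, 32          # values from the Architecture section
d_out, d_in = 768, 768        # hypothetical dims of a target module like "query"

A = np.random.randn(rank, d_in) * 0.01  # down-projection, small random init
B = np.zeros((d_out, rank))             # up-projection, zero init (no-op at start)
scaling = alpha / rank                  # = 2.0 for rank 16, alpha 32

delta_W = scaling * (B @ A)             # low-rank update added to the base weight
```

With B zero-initialized, the adapter starts as an identity modification and only diverges as B is trained (or, here, generated).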

Usage

from transformers import BlipForConditionalGeneration, BlipProcessor
from blip_hyper_modulator import BlipHyperModulator
from sentence_transformers import SentenceTransformer

# Load base BLIP model
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

# Load hypermodulator
hypermodulator = BlipHyperModulator.from_pretrained("path/to/model", blip_model)

# Load text encoder
text_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Generate LoRA weights
lora_weights = hypermodulator.generate_lora_weights(
    task_description="Describe this image like a pirate would",
    text_encoder=text_encoder,
    text_tokenizer=None
)
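The structure of the returned lora_weights is defined by BlipHyperModulator and not documented above. Assuming it maps target-module names to (A, B) factor pairs, folding the generated adapters into base weights might look like the following sketch (the dict layout and helper are hypothetical):

```python
import numpy as np

rank, alpha, d = 16, 32, 768  # rank/alpha from the Architecture section; d assumed

# Hypothetical layout: target-module name -> (A, B) factor pair
lora_weights = {"query": (np.random.randn(rank, d) * 0.01, np.zeros((d, rank)))}

def merged_weight(base_W, A, B, alpha=32, rank=16):
    """Fold a scaled low-rank update into a base weight matrix."""
    return base_W + (alpha / rank) * (B @ A)

base_W = np.zeros((d, d))           # stand-in for a BLIP module's weight
A, B = lora_weights["query"]
W_adapted = merged_weight(base_W, A, B)
```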

Model Details

  • Base Model: Salesforce/blip-image-captioning-large
  • Created: 2025-08-29
  • Version: 1.0.0
  • Framework: PyTorch
  • License: MIT

Citation

If you use this model, please cite:

@misc{blip-hypermodulator,
  title={BLIP HyperModulator: Dynamic LoRA Generation for Image Captioning},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/TanVir17Niloy/blip-hypermodulator-v1}
}