# BLIP HyperModulator
An enhanced hypernetwork that generates LoRA adapters for BLIP image captioning models based on natural language task descriptions.
## Overview
This model is a sophisticated hypernetwork that can generate task-specific LoRA (Low-Rank Adaptation) weights for BLIP image captioning models. Given a natural language description of a desired captioning style or task, it generates appropriate LoRA adapters to modify the BLIP model's behavior.
## Features
- Cross-attention: attention between task-description embeddings and per-layer conditioning signals
- Image-aware adaptation: Optional conditioning on image features for context-aware LoRA generation
- Adaptive gating: Learned gating mechanisms to control LoRA application strength
- Parameter scaling: Dynamic scaling of LoRA parameters
- Caching: Efficient caching of similar task embeddings
- Task interpolation: Ability to blend different captioning styles
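The gating and interpolation features above can be sketched concretely. The function names, shapes, and the linear blending rule below are illustrative assumptions, not this repository's actual API:

```python
import torch

def apply_gated_lora(x, W, A, B, gate, alpha=32, rank=16):
    """Linear forward with a gated LoRA branch: y = x W^T + gate * (alpha/rank) * x A^T B^T."""
    base = x @ W.T
    lora = (x @ A.T) @ B.T  # low-rank path: d_in -> rank -> d_out
    return base + gate * (alpha / rank) * lora

def interpolate_tasks(emb_a, emb_b, t):
    """Blend two task embeddings; t=0 keeps style A, t=1 keeps style B."""
    return (1 - t) * emb_a + t * emb_b
```

With `gate=0` the adapted model reduces exactly to the base model, which is the usual safety property of gated LoRA application.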
## Architecture
- Total Parameters: 515,644,222
- LoRA Rank: 16
- LoRA Alpha: 32
- Task Embedding Size: 384
- Latent Size: 256
- Target Modules: dense, fc1, fc2, intermediate, key, output, projection, qkv, query, value
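With rank r = 16 and alpha = 32, each target module receives a low-rank update ΔW = (α/r)·B·A added to its frozen base weight. A minimal sketch of that update (the module dimensions are illustrative, not taken from BLIP's actual config):

```python
import torch

rank, alpha = 16, 32    # LoRA rank and scaling from the architecture above
d_out, d_in = 768, 768  # illustrative shapes for a target module such as `query`

# A generated adapter consists of two low-rank factors.
A = torch.randn(rank, d_in) * 0.01  # down-projection
B = torch.zeros(d_out, rank)        # up-projection (zero-init keeps the base model unchanged)

# The effective weight update applied on top of the frozen base weight W.
delta_W = (alpha / rank) * (B @ A)
```

Zero-initializing `B` is the standard LoRA convention: the adapter starts as an identity modification and only diverges as its parameters are generated or trained.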
## Usage
```python
from transformers import BlipForConditionalGeneration, BlipProcessor
from sentence_transformers import SentenceTransformer

from blip_hyper_modulator import BlipHyperModulator

# Load base BLIP model
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

# Load hypermodulator
hypermodulator = BlipHyperModulator.from_pretrained("path/to/model", blip_model)

# Load text encoder for task descriptions
text_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Generate LoRA weights from a natural language task description
lora_weights = hypermodulator.generate_lora_weights(
    task_description="Describe this image like a pirate would",
    text_encoder=text_encoder,
    text_tokenizer=None,
)
```
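The card doesn't document how the generated weights are attached to the BLIP model, but mathematically, applying a rank-r adapter to a linear layer means adding (α/r)·B·A to its weight. A hedged sketch using a plain `torch.nn.Linear` (the function name is an assumption, not this repository's API):

```python
import torch

def merge_lora_into_linear(linear, A, B, alpha=32, rank=16):
    """Merge a LoRA adapter into a linear layer in place: W <- W + (alpha/rank) * B @ A."""
    with torch.no_grad():
        linear.weight += (alpha / rank) * (B @ A)
    return linear
```

Merging in place adds no inference overhead; keeping the LoRA branch separate instead (as in the gated formulation) allows adapters to be swapped or disabled at runtime.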
## Model Details
- Base Model: Salesforce/blip-image-captioning-large
- Created: 2025-08-29
- Version: 1.0.0
- Framework: PyTorch
- License: MIT
## Citation
If you use this model, please cite:
```bibtex
@misc{blip-hypermodulator,
  title={BLIP HyperModulator: Dynamic LoRA Generation for Image Captioning},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/TanVir17Niloy/blip-hypermodulator-v1}
}
```