# BLIP HyperModulator
An enhanced hypernetwork that generates LoRA adapters for BLIP image captioning models based on natural language task descriptions.
## Overview
This model is a sophisticated hypernetwork that can generate task-specific LoRA (Low-Rank Adaptation) weights for BLIP image captioning models. Given a natural language description of a desired captioning style or task, it generates appropriate LoRA adapters to modify the BLIP model's behavior.
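The core idea, in miniature: a small network consumes a task embedding and emits the low-rank factors of a LoRA adapter. The sketch below is illustrative only (the real BlipHyperModulator adds cross-attention, gating, and image conditioning); the 768 hidden size and the MLP shape are assumptions for the example.

```python
import torch
import torch.nn as nn

class TinyLoraHypernet(nn.Module):
    """Toy hypernetwork: task embedding -> one LoRA adapter's (A, B) factors."""

    def __init__(self, task_dim=384, rank=16, d_in=768, d_out=768):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        self.net = nn.Sequential(
            nn.Linear(task_dim, 256),
            nn.ReLU(),
            # Emit both factors flattened: A is (rank x d_in), B is (d_out x rank)
            nn.Linear(256, rank * d_in + d_out * rank),
        )

    def forward(self, task_emb):
        flat = self.net(task_emb)
        A = flat[: self.rank * self.d_in].view(self.rank, self.d_in)
        B = flat[self.rank * self.d_in :].view(self.d_out, self.rank)
        return A, B

hypernet = TinyLoraHypernet()
task_emb = torch.randn(384)  # stands in for a sentence-transformer embedding
A, B = hypernet(task_emb)    # A: (16, 768), B: (768, 16)
```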
## Features
- **Cross-attention**: attention between task-description embeddings and per-layer conditioning
- **Image-aware adaptation**: optional conditioning on image features for context-aware LoRA generation
- **Adaptive gating**: learned gates that control how strongly each LoRA adapter is applied
- **Parameter scaling**: dynamic scaling of LoRA parameters
- **Caching**: efficient reuse of embeddings for similar task descriptions
- **Task interpolation**: the ability to blend different captioning styles
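One plausible way task interpolation works (an assumption for illustration, not the documented API) is to blend two task-description embeddings before they reach the hypernetwork. Random vectors stand in for real sentence-transformer embeddings here:

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings; in practice these would come from encoding two
# task descriptions (e.g. "pirate style" vs. "formal style").
e_pirate = F.normalize(torch.randn(384), dim=0)
e_formal = F.normalize(torch.randn(384), dim=0)

def blend(e1, e2, alpha=0.5):
    # Interpolate, then re-normalize so the result stays on the unit sphere.
    return F.normalize(alpha * e1 + (1 - alpha) * e2, dim=0)

e_mix = blend(e_pirate, e_formal, alpha=0.7)  # 70% pirate, 30% formal
```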
## Architecture
- **Total Parameters**: 515,644,222
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Task Embedding Size**: 384
- **Latent Size**: 256
- **Target Modules**: `dense`, `fc1`, `fc2`, `intermediate`, `key`, `output`, `projection`, `qkv`, `query`, `value`
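With rank r = 16 and alpha = 32, each adapter contributes r × (d_in + d_out) parameters per target module and is scaled by alpha / r, the standard LoRA scaling factor. A quick check of the arithmetic (the 768 × 768 module size is an assumption for illustration):

```python
rank, alpha = 16, 32
d_in = d_out = 768                        # assumed module dimensions

lora_params = rank * d_in + d_out * rank  # A: (r, d_in) plus B: (d_out, r)
scale = alpha / rank                      # LoRA scaling applied to B @ A

print(lora_params, scale)                 # 24576 2.0
```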
## Usage
```python
from transformers import BlipForConditionalGeneration, BlipProcessor
from blip_hyper_modulator import BlipHyperModulator
from sentence_transformers import SentenceTransformer

# Load base BLIP model
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

# Load hypermodulator
hypermodulator = BlipHyperModulator.from_pretrained("path/to/model", blip_model)

# Load text encoder
text_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Generate LoRA weights
lora_weights = hypermodulator.generate_lora_weights(
    task_description="Describe this image like a pirate would",
    text_encoder=text_encoder,
    text_tokenizer=None,
)
```
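How the generated weights are merged into a target module is model-specific; the sketch below only illustrates the standard LoRA merge (W ← W + (alpha/r) · B·A) on a stand-in `nn.Linear`. The 768 dimensions, the (A, B) format, and the gate value are assumptions, not the actual BlipHyperModulator API.

```python
import torch
import torch.nn as nn

rank, alpha = 16, 32
layer = nn.Linear(768, 768, bias=False)   # stands in for e.g. a "query" projection

A = torch.randn(rank, 768) * 0.01         # down-projection factor
B = torch.randn(768, rank) * 0.01         # up-projection factor

scale = alpha / rank                      # standard LoRA scaling
gate = 1.0                                # an adaptive gate would scale this in [0, 1]

before = layer.weight.detach().clone()
with torch.no_grad():
    layer.weight.add_(gate * scale * (B @ A))   # merge the low-rank update in place
```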
## Model Details
- **Base Model**: `Salesforce/blip-image-captioning-large`
- **Created**: 2025-08-29
- **Version**: 1.0.0
- **Framework**: PyTorch
- **License**: MIT
## Citation
If you use this model, please cite:
```bibtex
@misc{blip-hypermodulator,
  title={BLIP HyperModulator: Dynamic LoRA Generation for Image Captioning},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/TanVir17Niloy/blip-hypermodulator-v1}
}
```