Image-to-Text
Transformers
Safetensors
PyTorch
blip
image-text-to-text
image-captioning
vision-language-model
multimodal-ai
computer-vision
deep-learning
Instructions to use YaekobB/blip-caption-model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use YaekobB/blip-caption-model with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline

pipe = pipeline("image-to-text", model="YaekobB/blip-caption-model")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("YaekobB/blip-caption-model")
model = AutoModelForImageTextToText.from_pretrained("YaekobB/blip-caption-model")
- Notebooks
- Google Colab
- Kaggle
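The Transformers snippet stops once the processor and model are loaded. As a minimal sketch of how a caption might then be generated, the helper below wraps the usual preprocess, generate, decode sequence. The function name `caption_image`, the `max_new_tokens` default, and the example file path are assumptions for illustration and do not come from the model card; the heavy imports are deferred into the function so that defining it does not require the libraries or a download.

```python
def caption_image(image_path, model_id="YaekobB/blip-caption-model", max_new_tokens=30):
    """Return a generated caption for the image at image_path.

    Hypothetical helper: the name and default arguments are illustrative.
    Requires torch, Pillow and transformers; the weights are downloaded on first use.
    """
    # Deferred imports keep the definition cheap; they run on the first call.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # The processor resizes and normalizes the image into a pixel_values tensor.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Generate caption token ids without tracking gradients.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Convert token ids back to text, dropping special tokens.
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Example call (the path is a placeholder):
# print(caption_image("photo.jpg"))
```

Reloading the model on every call is wasteful for batch use; in practice the processor and model would be loaded once and reused.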
File size: 1,370 Bytes
386f5a8
{
  "architectures": [
    "BlipForConditionalGeneration"
  ],
  "dtype": "float32",
  "image_text_hidden_size": 256,
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "label_smoothing": 0.0,
  "logit_scale_init_value": 2.6592,
  "model_type": "blip",
  "pad_token_id": 0,
  "projection_dim": 512,
  "text_config": {
    "attention_probs_dropout_prob": 0.4,
    "dtype": "float32",
    "encoder_hidden_size": 768,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.4,
    "hidden_size": 768,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "label_smoothing": 0.0,
    "layer_norm_eps": 1e-12,
    "max_position_embeddings": 512,
    "model_type": "blip_text_model",
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "projection_dim": 768,
    "use_cache": true,
    "vocab_size": 30524
  },
  "transformers_version": "4.56.0",
  "vision_config": {
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "dtype": "float32",
    "hidden_act": "gelu",
    "hidden_size": 768,
    "image_size": 384,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-05,
    "model_type": "blip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 16,
    "projection_dim": 512
  }
}
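A few quantities follow directly from the values in this config. The sketch below derives the vision encoder's sequence length and the per-head attention width from `image_size`, `patch_size`, `hidden_size`, and `num_attention_heads`. The function names are illustrative, and the extra position for a class embedding assumes the standard ViT-style layout used by the BLIP vision model in Transformers.

```python
# Values copied from the config above.
VISION = {"image_size": 384, "patch_size": 16, "hidden_size": 768, "num_attention_heads": 12}
TEXT = {"hidden_size": 768, "num_attention_heads": 12}


def vision_sequence_length(image_size, patch_size):
    """Number of positions the vision encoder sees for one image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    # One position per patch, plus one for the class embedding (assumed ViT-style layout).
    return patches_per_side ** 2 + 1


def head_dim(hidden_size, num_attention_heads):
    """Width of each attention head."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden_size // num_attention_heads


seq_len = vision_sequence_length(VISION["image_size"], VISION["patch_size"])
vision_head = head_dim(VISION["hidden_size"], VISION["num_attention_heads"])
text_head = head_dim(TEXT["hidden_size"], TEXT["num_attention_heads"])

print(seq_len)      # 24 * 24 patches + 1 = 577
print(vision_head)  # 768 / 12 = 64
print(text_head)    # 768 / 12 = 64
```

Because `encoder_hidden_size` in `text_config` equals the vision `hidden_size` (768), the text decoder's cross-attention can read the 577 image positions without an extra projection.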