Vietnamese Image Captioning
A Vietnamese image captioning model trained on Flickr8k and 10k MSCOCO images, with captions translated into Vietnamese.
Given an input image, the model generates a single caption sentence in Vietnamese.
Architecture
- Encoder: facebook/deit-base-distilled-patch16-224
- Decoder: NlpHUST/gpt2-vietnamese
- Transformers class: VisionEncoderDecoderModel
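As a rough illustration of what the decoder attends to: DeiT splits a 224x224 image into 16x16 patches, and the distilled variant prepends a class token and a distillation token, so the encoder emits 198 vectors per image (768-dimensional for the base model). In a VisionEncoderDecoderModel, the GPT-2 decoder reads this sequence through cross-attention. The small sketch below only does the shape arithmetic; the helper name `deit_sequence_length` is hypothetical and not part of transformers.

```python
# Sketch: encoder output shape for facebook/deit-base-distilled-patch16-224.
# image_size and patch_size follow the checkpoint name; extra_tokens counts the
# [CLS] token and the distillation token of the distilled DeiT variant.
def deit_sequence_length(image_size: int = 224, patch_size: int = 16, extra_tokens: int = 2) -> int:
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    return num_patches + extra_tokens            # 196 + 2 = 198


HIDDEN_SIZE = 768  # hidden size of DeiT-base
batch_size = 1
encoder_output_shape = (batch_size, deit_sequence_length(), HIDDEN_SIZE)
print(encoder_output_shape)  # (1, 198, 768)
```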
Usage
```python
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model_id = "slyviee/vietnamese-image-captioning"
device = "cuda" if torch.cuda.is_available() else "cpu"

# The tokenizer is loaded from this repo; the image processor comes from the DeiT encoder checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
image_processor = AutoImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
model.eval()

# Replace "image.jpg" with the path to your own image.
image = Image.open("image.jpg").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)

# Beam search with 4 beams, generating at most 40 new tokens.
with torch.no_grad():
    output_ids = model.generate(pixel_values, max_new_tokens=40, num_beams=4)

caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```
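The `num_beams=4` argument in the usage example makes `generate` run beam search: at each step it keeps the 4 highest-scoring partial captions instead of committing to the single most likely token. The sketch below illustrates the idea on a hypothetical, hand-written next-token table; `NEXT_TOKEN_LOGPROBS` and `beam_search` are illustrative names, not transformers APIs. The real implementation conditions on the full prefix and the image features and supports further options such as `length_penalty` and `early_stopping`.

```python
import math

# Hypothetical toy decoder: log-probabilities of the next token given only the
# previous token. A real captioning decoder conditions on the whole prefix and
# on the image features from the encoder.
NEXT_TOKEN_LOGPROBS = {
    "<s>": {"một": math.log(0.6), "hai": math.log(0.4)},
    "một": {"con": math.log(0.55), "người": math.log(0.45)},
    "hai": {"con": math.log(1.0)},
    "con": {"chó": math.log(0.7), "</s>": math.log(0.3)},
    "người": {"</s>": math.log(1.0)},
    "chó": {"</s>": math.log(1.0)},
}


def beam_search(num_beams: int, max_new_tokens: int) -> tuple[list[str], float]:
    """Return the best caption tokens and their total log-probability."""
    beams = [(["<s>"], 0.0)]  # live hypotheses: (tokens, cumulative log-prob)
    finished = []             # hypotheses that emitted the end token "</s>"
    for _ in range(max_new_tokens):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (tokens + [tok], score + logprob)
            for tokens, score in beams
            for tok, logprob in NEXT_TOKEN_LOGPROBS[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the num_beams best unfinished hypotheses; set finished ones aside.
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == "</s>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
                if len(beams) == num_beams:
                    break
        if not beams:
            break
    # Hypotheses cut off by max_new_tokens still compete at the end.
    best_tokens, best_score = max(finished + beams, key=lambda c: c[1])
    return [t for t in best_tokens if t not in ("<s>", "</s>")], best_score


print(beam_search(num_beams=1, max_new_tokens=10)[0])  # ['một', 'con', 'chó']
print(beam_search(num_beams=2, max_new_tokens=10)[0])  # ['hai', 'con', 'chó']
```

With one beam (greedy decoding) the toy search commits to "một", the most likely first token, and ends at "một con chó" with probability 0.6 x 0.55 x 0.7 = 0.231. Two beams keep "hai" alive and find "hai con chó" with probability 0.4 x 1.0 x 0.7 = 0.28.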