metadata
title: Image Captioning Model Comparison
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
Image Captioning Model Comparison
This Space lets you test three image captioning models in one live Gradio app:
- Custom EfficientNet-V2-S + Transformer trained on 5k samples
- Custom EfficientNet-V2-S + Transformer trained on 100k samples
- BLIP image-captioning base fine-tuned with LoRA on COCO 2014
Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.
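The "compare all three" behavior can be pictured as one function that runs every registered captioner on the same image. The following is a minimal sketch, assuming each model is wrapped in a callable that takes an image and returns a string. The names `CAPTIONERS`, `caption_one`, and `compare_all`, as well as the stub lambdas, are hypothetical and are not taken from `app.py`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: display name -> function(image) -> caption.
# The lambdas are placeholders standing in for real model inference.
CAPTIONERS: Dict[str, Callable[[Any], str]] = {
    "Custom (5k)": lambda image: "stub caption from the 5k model",
    "Custom (100k)": lambda image: "stub caption from the 100k model",
    "BLIP + LoRA": lambda image: "stub caption from BLIP",
}


def caption_one(image: Any, model_name: str) -> str:
    """Caption an image with a single model chosen by name."""
    if model_name not in CAPTIONERS:
        raise ValueError(f"Unknown model: {model_name!r}")
    return CAPTIONERS[model_name](image)


def compare_all(image: Any) -> Dict[str, str]:
    """Run every registered model on the same image."""
    results: Dict[str, str] = {}
    for name, captioner in CAPTIONERS.items():
        try:
            results[name] = captioner(image)
        except Exception as exc:
            # One failing model should not hide the other captions.
            results[name] = f"[error: {exc}]"
    return results
```

Keeping the models behind a name-to-callable mapping means a dropdown and a "compare" button could share the same lookup, though the actual app may be organized differently.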
Files
.
├── app.py
├── custom_caption_model.py
├── requirements.txt
├── README.md
└── models/
    ├── custom_5k/
    │   ├── best_phase-5k.pt
    │   └── vocab-5k.json
    ├── custom_100k/
    │   ├── best_phase-100k.pt
    │   └── vocab-100k.json
    └── blip_lora/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        ├── preprocessor_config.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        └── vocab.txt
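Because the app depends on every file in this layout, a quick check before launch can catch a missing upload. This is a small sketch using only the standard library. The paths are copied from the layout above, while the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the file layout in this README.
EXPECTED_FILES = [
    "models/custom_5k/best_phase-5k.pt",
    "models/custom_5k/vocab-5k.json",
    "models/custom_100k/best_phase-100k.pt",
    "models/custom_100k/vocab-100k.json",
    "models/blip_lora/adapter_config.json",
    "models/blip_lora/adapter_model.safetensors",
    "models/blip_lora/preprocessor_config.json",
    "models/blip_lora/tokenizer.json",
    "models/blip_lora/tokenizer_config.json",
    "models/blip_lora/special_tokens_map.json",
    "models/blip_lora/vocab.txt",
]


def missing_model_files(root: str = ".") -> List[str]:
    """Return the expected files that are not present under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

Calling `missing_model_files()` from the repository root returns an empty list when everything is in place.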
Notes
The custom models use their original PyTorch architecture and saved vocabularies. The BLIP model uses the base model Salesforce/blip-image-captioning-base plus the LoRA adapter files.
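Loading the BLIP variant might look roughly like the sketch below, which assumes the `transformers` and `peft` packages are installed and that the processor files in `models/blip_lora/` can be loaded directly from that folder. The function names `load_blip_lora` and `blip_caption` and the `max_new_tokens` default are illustrative choices, not necessarily what `app.py` does.

```python
def load_blip_lora(
    adapter_dir: str = "models/blip_lora",
    base_id: str = "Salesforce/blip-image-captioning-base",
    device: str = "cpu",
):
    """Load the base BLIP model and attach the LoRA adapter (sketch)."""
    # Imported lazily so defining this function does not require the libraries.
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the adapter folder also holds the processor/tokenizer files.
    processor = BlipProcessor.from_pretrained(adapter_dir)
    base = BlipForConditionalGeneration.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.to(device)
    model.eval()
    return processor, model


def blip_caption(image, processor, model, max_new_tokens: int = 30) -> str:
    """Generate one caption for a PIL image (sketch)."""
    import torch

    # Move the inputs to whichever device the model weights live on.
    device = next(model.parameters()).device
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The adapter files alone are not a complete model, which is why the base checkpoint is loaded first and the adapter is applied on top of it.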
For faster inference, use GPU hardware in the Space settings.
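Whether GPU hardware is actually visible to the app can be checked at startup. This is a minimal sketch, assuming PyTorch is the runtime. The helper name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to place on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed to `.to(...)` when moving a model or its inputs.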