
title: Image Captioning Model Comparison
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

Image Captioning Model Comparison

This Space lets you test three image captioning models in one live Gradio app:

Custom EfficientNet-V2-S + Transformer trained on 5k samples
Custom EfficientNet-V2-S + Transformer trained on 100k samples
BLIP image-captioning base fine-tuned with LoRA on COCO 2014

Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.
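The choose-one-model and compare-all flows described above could be organised roughly as follows. This is only a sketch, not the Space's actual `app.py`: the registry `CAPTIONERS`, the display names, and the helpers `caption_one`, `caption_all`, and `build_demo` are hypothetical, and the caption functions are placeholders standing in for the real models.

```python
from typing import Callable, Dict

# Hypothetical registry: display name -> caption function.
# The lambdas are stand-ins so the sketch runs without model weights.
CaptionFn = Callable[[object], str]

CAPTIONERS: Dict[str, CaptionFn] = {
    "Custom 5k": lambda image: "placeholder caption (custom 5k)",
    "Custom 100k": lambda image: "placeholder caption (custom 100k)",
    "BLIP + LoRA": lambda image: "placeholder caption (BLIP + LoRA)",
}


def caption_one(image: object, model_name: str) -> str:
    """Caption the image with the single model chosen in the UI."""
    if model_name not in CAPTIONERS:
        raise ValueError(f"Unknown model: {model_name!r}")
    return CAPTIONERS[model_name](image)


def caption_all(image: object) -> Dict[str, str]:
    """Run every registered model on the same image for comparison."""
    return {name: fn(image) for name, fn in CAPTIONERS.items()}


def build_demo():
    """Wire the single-model path into a Gradio interface (assumed layout)."""
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    return gr.Interface(
        fn=caption_one,
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Dropdown(choices=list(CAPTIONERS), value="BLIP + LoRA", label="Model"),
        ],
        outputs=gr.Textbox(label="Caption"),
    )
```

Keeping the models behind one name-to-function mapping means the comparison path is just a loop over the same registry, so a fourth model would only need one new entry.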

Files

.
├── app.py
├── custom_caption_model.py
├── requirements.txt
├── README.md
└── models/
    ├── custom_5k/
    │   ├── best_phase-5k.pt
    │   └── vocab-5k.json
    ├── custom_100k/
    │   ├── best_phase-100k.pt
    │   └── vocab-100k.json
    └── blip_lora/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        ├── preprocessor_config.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        └── vocab.txt
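
Because the app depends on every file in this layout, a small startup check can report what is missing before any model is loaded. The sketch below is an assumption about how such a check might look. The helper name `missing_model_files` is hypothetical; the paths are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the file tree in this README.
EXPECTED_FILES = [
    "models/custom_5k/best_phase-5k.pt",
    "models/custom_5k/vocab-5k.json",
    "models/custom_100k/best_phase-100k.pt",
    "models/custom_100k/vocab-100k.json",
    "models/blip_lora/adapter_config.json",
    "models/blip_lora/adapter_model.safetensors",
    "models/blip_lora/preprocessor_config.json",
    "models/blip_lora/tokenizer.json",
    "models/blip_lora/tokenizer_config.json",
    "models/blip_lora/special_tokens_map.json",
    "models/blip_lora/vocab.txt",
]


def missing_model_files(root: Path = Path(".")) -> List[str]:
    """Return the expected files that do not exist under `root`."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_model_files()` from the repository root returns an empty list when the layout is complete.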

Notes

The custom models use their original PyTorch architecture and saved vocabularies. The BLIP model uses the base model Salesforce/blip-image-captioning-base plus the LoRA adapter files.
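
One plausible way to combine the base BLIP model with the adapter files is through the `peft` library, sketched below. This is an assumption about the loading path rather than the code in `app.py`: the helper names `load_blip_lora` and `generate_caption` are hypothetical, and loading the processor from `models/blip_lora` assumes the preprocessor and tokenizer files there are the ones intended for inference.

```python
BASE_MODEL_ID = "Salesforce/blip-image-captioning-base"
ADAPTER_DIR = "models/blip_lora"  # adapter location taken from the file tree


def load_blip_lora(adapter_dir: str = ADAPTER_DIR, device: str = "cpu"):
    """Load the base BLIP model and attach the LoRA adapter (assumed workflow)."""
    # Heavy imports are kept local so importing this module stays cheap.
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the processor files saved next to the adapter are used.
    processor = BlipProcessor.from_pretrained(adapter_dir)
    base = BlipForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.to(device).eval()
    return processor, model


def generate_caption(processor, model, image, max_new_tokens: int = 30) -> str:
    """Generate one caption for a PIL image with the loaded model."""
    import torch

    device = next(model.parameters()).device
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The base weights are downloaded from the Hub on first use, so only the small adapter files need to live in the repository.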

For faster inference, switch the Space to GPU hardware in the Space settings.
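
For GPU hardware to help, the app has to place the models on the GPU when one is present. A minimal sketch of that choice, assuming PyTorch is the backend (the helper name `pick_device` is hypothetical):

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to move to a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed to `model.to(...)` so the same code runs on both CPU and GPU hardware.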