metadata
title: Image Captioning Model Comparison
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

Image Captioning Model Comparison

This Space lets you test three image captioning models in one live Gradio app:

  1. Custom EfficientNet-V2-S + Transformer trained on 5k samples
  2. Custom EfficientNet-V2-S + Transformer trained on 100k samples
  3. BLIP image-captioning base fine-tuned with LoRA on COCO 2014

Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.
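The compare-all mode described above could be sketched as follows. This is a minimal illustration, not the code in app.py: `compare_models` and the `Captioner` type are hypothetical names, and each captioner is assumed to be a function that takes an image path and returns a caption string.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps an image path to a caption string.
Captioner = Callable[[str], str]


def compare_models(image_path: str, captioners: Dict[str, Captioner]) -> Dict[str, str]:
    """Run every captioner on the same image and collect captions by model name."""
    results: Dict[str, str] = {}
    for name, caption_fn in captioners.items():
        try:
            results[name] = caption_fn(image_path)
        except Exception as exc:
            # One failing model should not hide the output of the others.
            results[name] = f"[error: {exc}]"
    return results


if __name__ == "__main__":
    # Stand-in captioners for illustration only; real ones would wrap the models.
    demo = {
        "custom_5k": lambda path: "placeholder caption from the 5k model",
        "custom_100k": lambda path: "placeholder caption from the 100k model",
        "blip_lora": lambda path: "placeholder caption from BLIP + LoRA",
    }
    for name, caption in compare_models("example.jpg", demo).items():
        print(f"{name}: {caption}")
```

Catching errors per model keeps the comparison usable even when one checkpoint fails to load or run.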

Files

.
├── app.py
├── custom_caption_model.py
├── requirements.txt
├── README.md
└── models/
    ├── custom_5k/
    │   ├── best_phase-5k.pt
    │   └── vocab-5k.json
    ├── custom_100k/
    │   ├── best_phase-100k.pt
    │   └── vocab-100k.json
    └── blip_lora/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        ├── preprocessor_config.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        └── vocab.txt
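Before starting the app, it can help to confirm that the layout above is complete. The following is a small sketch using only the standard library; `EXPECTED_FILES` and `missing_files` are hypothetical names and are not part of app.py.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the file tree in this README.
EXPECTED_FILES = [
    "app.py",
    "custom_caption_model.py",
    "requirements.txt",
    "README.md",
    "models/custom_5k/best_phase-5k.pt",
    "models/custom_5k/vocab-5k.json",
    "models/custom_100k/best_phase-100k.pt",
    "models/custom_100k/vocab-100k.json",
    "models/blip_lora/adapter_config.json",
    "models/blip_lora/adapter_model.safetensors",
    "models/blip_lora/preprocessor_config.json",
    "models/blip_lora/tokenizer.json",
    "models/blip_lora/tokenizer_config.json",
    "models/blip_lora/special_tokens_map.json",
    "models/blip_lora/vocab.txt",
]


def missing_files(root: str = ".") -> List[str]:
    """Return the expected files that are absent under `root`, in listed order."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```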

Notes

The custom models use their original PyTorch architecture and saved vocabularies. The BLIP model uses the base model Salesforce/blip-image-captioning-base plus the LoRA adapter files.
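As an illustration of the BLIP setup described above, the sketch below loads the base model and attaches the LoRA adapter with the `peft` library. It is an assumption about how the loading might look, not the code in app.py: the function names are hypothetical, and loading the processor from models/blip_lora assumes the processor files saved there are complete (otherwise the base model ID could be used instead).

```python
BASE_MODEL_ID = "Salesforce/blip-image-captioning-base"
ADAPTER_DIR = "models/blip_lora"  # directory from the file tree in this README


def load_blip_lora(adapter_dir: str = ADAPTER_DIR, device: str = "cpu"):
    """Load the BLIP base model and wrap it with the saved LoRA adapter."""
    # Heavy libraries are imported lazily so importing this module stays cheap.
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the adapter directory also holds usable processor files.
    processor = BlipProcessor.from_pretrained(adapter_dir)
    base = BlipForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.to(device)
    model.eval()
    return processor, model


def caption_image(image_path: str, processor, model, max_new_tokens: int = 30) -> str:
    """Generate one caption for the image at `image_path`."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `processor, model = load_blip_lora()` followed by `caption_image("example.jpg", processor, model)`; the first call downloads the base weights from the Hugging Face Hub if they are not cached.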

For faster inference, select GPU hardware in the Space settings.
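Code running in the Space still has to place the models on the GPU once one is available. A minimal sketch of device selection, assuming PyTorch is installed; `pick_device` is a hypothetical helper name, and the CPU fallback on a missing import only keeps the sketch runnable outside the Space.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to move to a GPU; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running inference on: {pick_device()}")
```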