---
title: Image Captioning Model Comparison
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Image Captioning Model Comparison

This Space lets you test three image captioning models in one live Gradio app:

1. Custom EfficientNet-V2-S + Transformer, trained on 5k samples
2. Custom EfficientNet-V2-S + Transformer, trained on 100k samples
3. BLIP (`Salesforce/blip-image-captioning-base`), fine-tuned with LoRA on COCO 2014

Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.

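A minimal sketch of the single-model and compare-all flows, assuming each model is wrapped as a function that takes an image and returns a caption string. The names `caption_with` and `compare_all` and the model labels are hypothetical and are not taken from `app.py`; placeholder functions stand in for the real models so the code runs without any weights.

```python
from typing import Any, Callable, Dict

# Hypothetical interface: every model is wrapped as image -> caption string.
Captioner = Callable[[Any], str]


def caption_with(model_name: str, image: Any, captioners: Dict[str, Captioner]) -> str:
    """Caption `image` with the one model the user selected."""
    if model_name not in captioners:
        raise ValueError(f"unknown model {model_name!r}; expected one of {sorted(captioners)}")
    return captioners[model_name](image)


def compare_all(image: Any, captioners: Dict[str, Captioner]) -> Dict[str, str]:
    """Run every registered model on the same image and collect captions by model name."""
    return {name: caption(image) for name, caption in captioners.items()}


if __name__ == "__main__":
    # Placeholder captioners standing in for the real models (no weights are loaded).
    stubs: Dict[str, Captioner] = {
        "custom-5k": lambda image: "<caption from the 5k model>",
        "custom-100k": lambda image: "<caption from the 100k model>",
        "blip-lora": lambda image: "<caption from BLIP + LoRA>",
    }
    print(caption_with("custom-100k", None, stubs))
    for name, text in compare_all(None, stubs).items():
        print(f"{name}: {text}")
```

Keeping every model behind the same signature is what turns the compare-all path into a single loop; in a Gradio app, these two functions would be natural targets for button callbacks.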
## Files

```text
.
├── app.py
├── custom_caption_model.py
├── requirements.txt
├── README.md
└── models/
    ├── custom_5k/
    │   ├── best_phase-5k.pt
    │   └── vocab-5k.json
    ├── custom_100k/
    │   ├── best_phase-100k.pt
    │   └── vocab-100k.json
    └── blip_lora/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        ├── preprocessor_config.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        └── vocab.txt
```

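If you duplicate the Space or upload the weights yourself, a quick standard-library check like the one below can confirm that the `models/` layout matches the tree above before you launch the app. `missing_model_files` is a hypothetical helper, not part of this repository, and it assumes it is run from the Space root.

```python
from pathlib import Path
from typing import List

# Model files listed in the tree above, relative to the Space root.
EXPECTED_MODEL_FILES = [
    "models/custom_5k/best_phase-5k.pt",
    "models/custom_5k/vocab-5k.json",
    "models/custom_100k/best_phase-100k.pt",
    "models/custom_100k/vocab-100k.json",
    "models/blip_lora/adapter_config.json",
    "models/blip_lora/adapter_model.safetensors",
    "models/blip_lora/preprocessor_config.json",
    "models/blip_lora/tokenizer.json",
    "models/blip_lora/tokenizer_config.json",
    "models/blip_lora/special_tokens_map.json",
    "models/blip_lora/vocab.txt",
]


def missing_model_files(root: str = ".") -> List[str]:
    """Return the expected model files that are not present under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_MODEL_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All model files found.")
```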
## Notes

The custom models use their original PyTorch architecture and saved vocabularies. The BLIP model loads the base model `Salesforce/blip-image-captioning-base` and applies the LoRA adapter files from `models/blip_lora/`.

For faster inference, select GPU hardware in the Space settings.
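A minimal sketch of how the BLIP base model and the LoRA adapter can be combined with the `transformers` and `peft` libraries, assuming the adapter folder was written with `save_pretrained` and that the processor files in `models/blip_lora/` match the base model. `app.py` may load the model differently; the function names and the `max_new_tokens` default are illustrative. Imports are deferred into the functions so the sketch can be imported without `torch`, `transformers`, or `peft` installed.

```python
def load_blip_lora(adapter_dir: str = "models/blip_lora"):
    """Load the BLIP base model and apply the LoRA adapter stored in `adapter_dir`."""
    # Deferred imports: heavy dependencies are only needed when the model is loaded.
    import torch
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Use a GPU when the Space has one; fall back to CPU otherwise.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # The adapter folder ships its own preprocessor and tokenizer files.
    processor = BlipProcessor.from_pretrained(adapter_dir)
    base = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    # Attach the LoRA weights, then merge them into the base weights for plain inference.
    model = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    model.to(device).eval()
    return processor, model, device


def caption_image(image, processor, model, device: str, max_new_tokens: int = 30) -> str:
    """Generate a caption for a PIL image with the adapted BLIP model."""
    import torch

    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

For example, `caption_image(Image.open("photo.jpg").convert("RGB"), *load_blip_lora())` captions a local file (`photo.jpg` is a placeholder; `Image` comes from Pillow). `merge_and_unload()` folds the LoRA weights into the base model so inference runs without the extra adapter layers; skip it if you want to keep the adapter separable. On a CPU-only Space the same code runs, just more slowly.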