---
title: Image Captioning Model Comparison
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# Image Captioning Model Comparison
This Space lets you test three image captioning models in one live Gradio app:
1. Custom EfficientNet-V2-S + Transformer trained on 5k samples
2. Custom EfficientNet-V2-S + Transformer trained on 100k samples
3. BLIP image-captioning base fine-tuned with LoRA on COCO 2014
Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.
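The compare-all mode described above can be sketched as a small dispatcher. This is only an illustration, not the code in `app.py`: it assumes each model is wrapped in a callable that takes an image and returns a caption string, and the names `compare_captions` and `Captioner` are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps an image object to a caption string.
Captioner = Callable[[object], str]


def compare_captions(image: object, captioners: Dict[str, Captioner]) -> Dict[str, str]:
    """Run every captioner on the same image and collect the results by model name."""
    results: Dict[str, str] = {}
    for name, captioner in captioners.items():
        try:
            results[name] = captioner(image)
        except Exception as exc:
            # Keep going so one failing model does not hide the other captions.
            results[name] = f"[error: {exc}]"
    return results


if __name__ == "__main__":
    # Stub captioners stand in for the three real models.
    stubs = {
        "custom_5k": lambda img: "a dog on grass",
        "custom_100k": lambda img: "a brown dog running on grass",
        "blip_lora": lambda img: "a dog running through a field",
    }
    for model_name, text in compare_captions(None, stubs).items():
        print(f"{model_name}: {text}")
```

Catching exceptions per model is a design choice for a side-by-side demo: a missing checkpoint for one model still leaves the other two captions visible.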
## Files
```text
.
├── app.py
├── custom_caption_model.py
├── requirements.txt
├── README.md
└── models/
    ├── custom_5k/
    │   ├── best_phase-5k.pt
    │   └── vocab-5k.json
    ├── custom_100k/
    │   ├── best_phase-100k.pt
    │   └── vocab-100k.json
    └── blip_lora/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        ├── preprocessor_config.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── special_tokens_map.json
        └── vocab.txt
```
## Notes
The custom models use their original PyTorch architecture and their saved vocabularies (`vocab-5k.json` and `vocab-100k.json`). The BLIP model loads the base model `Salesforce/blip-image-captioning-base` and applies the LoRA adapter files from `models/blip_lora/`.
For faster inference, select GPU hardware in the Space settings.
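
As a rough illustration of the BLIP setup described in these notes, the base model and the LoRA adapter could be combined along the following lines. This is a minimal sketch, not necessarily how `app.py` does it: it assumes the `transformers`, `peft`, and `torch` packages are installed, that the processor is loaded from the base model ID, and the function names `missing_adapter_files`, `load_blip_lora`, and `caption` are hypothetical.

```python
from pathlib import Path

BASE_MODEL_ID = "Salesforce/blip-image-captioning-base"
ADAPTER_DIR = Path("models/blip_lora")
# Adapter files listed in the Files section of this README.
REQUIRED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(adapter_dir=ADAPTER_DIR):
    """Return the names of required adapter files that are not present."""
    adapter_dir = Path(adapter_dir)
    return [name for name in REQUIRED_ADAPTER_FILES if not (adapter_dir / name).is_file()]


def load_blip_lora(adapter_dir=ADAPTER_DIR, base_model_id=BASE_MODEL_ID):
    """Load the base BLIP model, attach the LoRA adapter, and pick a device."""
    # Heavy imports are local so the helpers above work without these packages.
    import torch
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    missing = missing_adapter_files(adapter_dir)
    if missing:
        raise FileNotFoundError(f"Missing adapter files in {adapter_dir}: {missing}")

    # Use the GPU when the Space hardware provides one, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = BlipProcessor.from_pretrained(base_model_id)
    base_model = BlipForConditionalGeneration.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, str(adapter_dir))
    model.to(device)
    model.eval()
    return processor, model, device


def caption(image, processor, model, device, max_new_tokens=30):
    """Generate one caption for a PIL image."""
    import torch

    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Usage would look like `processor, model, device = load_blip_lora()` followed by `caption(Image.open("photo.jpg").convert("RGB"), processor, model, device)`, where `photo.jpg` is a placeholder path and `Image` comes from Pillow.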