	---
	title: Image Captioning Model Comparison
	emoji: 🖼️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	app_file: app.py
	pinned: false
	---

	# Image Captioning Model Comparison

	This Space lets you test three image captioning models in one live Gradio app:

	1. Custom EfficientNet-V2-S + Transformer trained on 5k samples
	2. Custom EfficientNet-V2-S + Transformer trained on 100k samples
	3. BLIP image-captioning base fine-tuned with LoRA on COCO 2014

	Upload an image, choose a model, and generate a caption. You can also compare all three models on the same image.
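The single-model and compare-all flows described above can be sketched as a small dispatch layer. This is a minimal sketch, not the contents of `app.py`: the names `caption_one`, `caption_all`, and `build_demo`, and the idea of a registry mapping a display name to a captioning function, are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes an image and returns a caption.
Captioner = Callable[[object], str]


def caption_one(captioners: Dict[str, Captioner], model_name: str, image) -> str:
    """Caption an image with the single model the user selected."""
    if image is None:
        return "Please upload an image first."
    return captioners[model_name](image)


def caption_all(captioners: Dict[str, Captioner], image) -> Dict[str, str]:
    """Run every registered model on the same image for side-by-side comparison."""
    return {name: fn(image) for name, fn in captioners.items()}


def build_demo(captioners: Dict[str, Captioner]):
    """Wire the registry into a Gradio interface (assumed layout, not the real app)."""
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    names = list(captioners)

    def run(image, model_name):
        return caption_one(captioners, model_name, image)

    return gr.Interface(
        fn=run,
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Dropdown(choices=names, value=names[0], label="Model"),
        ],
        outputs=gr.Textbox(label="Caption"),
    )
```

Keeping the model functions in a plain dictionary means a fourth model could be added without touching the interface code; calling `build_demo(captioners).launch()` would start the app under these assumptions.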

	## Files

	```text
	.
	├── app.py
	├── custom_caption_model.py
	├── requirements.txt
	├── README.md
	└── models/
	    ├── custom_5k/
	    │   ├── best_phase-5k.pt
	    │   └── vocab-5k.json
	    ├── custom_100k/
	    │   ├── best_phase-100k.pt
	    │   └── vocab-100k.json
	    └── blip_lora/
	        ├── adapter_config.json
	        ├── adapter_model.safetensors
	        ├── preprocessor_config.json
	        ├── tokenizer.json
	        ├── tokenizer_config.json
	        ├── special_tokens_map.json
	        └── vocab.txt
	```
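Because the app depends on every file in this layout, a startup check can report what is missing before any model is loaded. The following is a rough sketch using only the standard library; the helper name `missing_model_files` and the `EXPECTED_FILES` table are hypothetical, and the table assumes the three subfolders sit directly under `models/`.

```python
from pathlib import Path
from typing import List

# File names copied from the listing above, grouped by subfolder of models/.
EXPECTED_FILES = {
    "custom_5k": ["best_phase-5k.pt", "vocab-5k.json"],
    "custom_100k": ["best_phase-100k.pt", "vocab-100k.json"],
    "blip_lora": [
        "adapter_config.json",
        "adapter_model.safetensors",
        "preprocessor_config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.txt",
    ],
}


def missing_model_files(models_root: str = "models") -> List[str]:
    """Return 'subfolder/file' entries that are absent under models_root."""
    root = Path(models_root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

An empty list means the layout is complete; otherwise the returned entries can be shown in a startup error message.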

	## Notes

	The custom models use their original PyTorch architecture and saved vocabularies. The BLIP model uses the base model `Salesforce/blip-image-captioning-base` plus the LoRA adapter files.

	For faster inference, select GPU hardware in the Space settings.
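As an illustration of the notes above, the BLIP base model can be combined with the LoRA adapter through the `transformers` and `peft` libraries, with the device chosen at runtime. This is a sketch under stated assumptions rather than the code in `app.py`: the function names, the `models/blip_lora` path, loading the processor from the adapter folder, and the `max_new_tokens` default are all assumptions.

```python
BASE_MODEL_ID = "Salesforce/blip-image-captioning-base"
ADAPTER_DIR = "models/blip_lora"  # assumed location of the LoRA adapter files


def pick_device() -> str:
    """Use the GPU when the Space provides one, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_blip_lora(adapter_dir: str = ADAPTER_DIR, device: str = None):
    """Load the base BLIP weights, then attach the LoRA adapter on top."""
    # Heavy imports are kept inside the function so importing this module stays cheap.
    from peft import PeftModel
    from transformers import BlipForConditionalGeneration, BlipProcessor

    device = device or pick_device()
    # Assumption: the processor and tokenizer files saved next to the adapter are used.
    processor = BlipProcessor.from_pretrained(adapter_dir)
    base = BlipForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.to(device).eval()
    return processor, model, device


def generate_caption(processor, model, device, image, max_new_tokens: int = 30) -> str:
    """Generate one caption for a PIL image."""
    import torch

    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Loading the model once at startup and reusing the returned objects for every request avoids repeating the base-model download and adapter merge on each caption.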