---
title: HW 3 Vision Language AI Demo
emoji: 🤖
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🤖 Vision Language AI Demo

A comprehensive web application showcasing state-of-the-art Vision-Language AI models.
## ✨ Features
### 🖼️ Image Captioning

Automatically generates natural language descriptions of images using the BLIP model.
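For reference, captioning with BLIP through the Transformers library looks roughly like the sketch below; the checkpoint name and image path are illustrative, and the loading code in `app.py` may differ.

```python
# Minimal BLIP captioning sketch; "example.jpg" is a placeholder path.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```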
### 🔍 Visual Question Answering (VQA)

Ask questions about an image and get answers grounded in its visual content.
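The same pattern applies with the BLIP VQA checkpoint; a minimal sketch, where the checkpoint, image, and question are placeholders:

```python
# Minimal BLIP-VQA sketch; inputs are placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, text="What color is the car?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(out[0], skip_special_tokens=True))
```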
### 🏷️ Zero-Shot Image Classification

Classify images into custom categories, without any training, using the CLIP model.
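Zero-shot classification works by comparing the image embedding against a text embedding for each candidate label. A minimal sketch, assuming the standard `openai/clip-vit-base-patch32` checkpoint:

```python
# Minimal CLIP zero-shot sketch; labels and image are placeholders.
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["cat", "dog", "bird"]
image = Image.open("example.jpg").convert("RGB")
inputs = processor(text=[f"a photo of a {label}" for label in labels],
                   images=image, return_tensors="pt", padding=True)

# logits_per_image holds one similarity score per label; softmax -> probabilities
probs = model(**inputs).logits_per_image.softmax(dim=1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.1%}")
```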
### 💬 Multimodal Chat

Hold interactive conversations about an image's content, with context retained across turns.
## 📸 Demo Screenshots

- Main Interface
- Image Captioning
- Visual Question Answering
- Zero-Shot Classification
- Multimodal Chat
## 🚀 Quick Start

### Local Run

```bash
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860.
### Deploy to Hugging Face Spaces

1. **Create a Space**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name and select the Gradio SDK
2. **Upload Files**
   - Upload `app.py`, `requirements.txt`, and `README.md` via the web UI, or use Git:
     ```bash
     git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
     cd YOUR_SPACE_NAME
     # Copy your files here
     git add .
     git commit -m "Initial commit"
     git push
     ```
3. **Wait for Build**
   - The Space will auto-deploy in 5-10 minutes
   - Access it at https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
4. **Enable GPU (Optional)**
   - Go to Space Settings → Hardware
   - Select a GPU option for faster processing
   - Restart the Space (a device-selection sketch follows this list)
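Enabling GPU hardware only provides the device; the code must still place models on it. A minimal sketch of the usual PyTorch device-selection pattern (the variable name is illustrative, not necessarily what `app.py` uses):

```python
import torch
from transformers import BlipForConditionalGeneration

# Use the GPU when the Space provides one, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
caption_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)
```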
## 🛠️ Models Used
| Model | Purpose | Size |
|---|---|---|
| BLIP-Captioning | Image Description | 447MB |
| BLIP-VQA | Visual Q&A | 447MB |
| CLIP | Classification | 605MB |
## 📖 Usage Examples
### Image Captioning

Upload an image → Click "Generate Caption" → Get a description.

Example output:

```
📝 Image Caption:
A golden retriever sitting in a park with green grass
```
### Visual Question Answering

Upload an image → Ask a question → Get an answer.

Example:

```
Q: What color is the car?
A: red
```
### Zero-Shot Classification

Upload an image → Define categories (comma-separated) → Get probabilities.

Example:

```
Categories: cat, dog, bird
Results:
cat:  92.5% ██████████████████
dog:   5.2% █
bird:  2.3% █
```
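To reproduce this bar-style output elsewhere, a small hypothetical helper (not taken from `app.py`) could look like:

```python
# Hypothetical helper that renders label probabilities as text bars.
def format_results(scores, width=20):
    lines = []
    for label, p in sorted(scores.items(), key=lambda kv: -kv[1]):
        bar = "█" * max(1, round(p * width))
        lines.append(f"{label}: {p:.1%} {bar}")
    return "\n".join(lines)

print(format_results({"cat": 0.925, "dog": 0.052, "bird": 0.023}))
```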
### Multimodal Chat

Upload an image → Chat naturally about it.

Example:

```
You: Describe this image
AI: A modern kitchen with white cabinets
You: What color are the walls?
AI: white
```
## ⚙️ Configuration

### Change Models

Edit `app.py` to use different models:

```python
from transformers import BlipForConditionalGeneration

# Use the larger BLIP captioning model
caption_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
)
```
### Customize Interface

Modify `custom_css` in `app.py`:

```python
custom_css = """
#title {
    background: linear-gradient(90deg, #YOUR_COLOR 0%, #YOUR_COLOR 100%);
}
"""
```
## 🐛 Troubleshooting

**Issue: Models downloading slowly**

```bash
# Set the Hugging Face cache directory
export HF_HOME=/path/to/storage
```

**Issue: Out of memory**

```python
# Use CPU only
device = "cpu"
```
**Issue: Port already in use**

```bash
# Gradio reads the port from the GRADIO_SERVER_PORT environment variable
GRADIO_SERVER_PORT=8080 python app.py
```
## 📄 License

MIT License - see the LICENSE file.
## 🙏 Acknowledgments

Built with Hugging Face Transformers, Gradio, Salesforce BLIP, and OpenAI CLIP.

⭐ Star this project if you find it helpful!