---
title: HW 3 Vision Language AI Demo
emoji: 🤖
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🤖 Vision Language AI Demo

A Gradio web application showcasing state-of-the-art vision-language AI models: image captioning, visual question answering, zero-shot classification, and multimodal chat.

## ✨ Features

### 🖼️ Image Captioning

Automatically generate natural-language descriptions of images using the BLIP model.
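
A minimal sketch of how BLIP captioning can be called through Hugging Face Transformers. The checkpoint name and generation settings are assumptions (`Salesforce/blip-image-captioning-base`); app.py may differ.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint; app.py may load a different BLIP variant.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")          # any RGB image
inputs = processor(images=image, return_tensors="pt")     # pixel_values tensor
output_ids = model.generate(**inputs, max_new_tokens=30)  # greedy caption
print(processor.decode(output_ids[0], skip_special_tokens=True))
```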

### 🔍 Visual Question Answering (VQA)

Ask questions about an image and get answers grounded in its visual content.
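
A minimal VQA sketch, assuming the `Salesforce/blip-vqa-base` checkpoint (the actual checkpoint in app.py may differ):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")  # assumed checkpoint
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, text="What color is the car?", return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))  # e.g. "red"
```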

### 🏷️ Zero-Shot Image Classification

Classify images into custom categories, with no training required, using the CLIP model.
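
A minimal sketch using the Transformers zero-shot image-classification pipeline. The CLIP checkpoint shown (`openai/clip-vit-base-patch32`) is an assumption; app.py may load CLIP differently.

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",  # assumed CLIP checkpoint
)

# Scores the image against the user-provided labels.
results = classifier("example.jpg", candidate_labels=["cat", "dog", "bird"])
for r in results:
    print(f"{r['label']}: {r['score']:.1%}")
```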

### 💬 Multimodal Chat

Hold interactive conversations about an image, with the chat history retained across turns.
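
A minimal sketch of one possible implementation, assuming each chat turn is answered by the BLIP VQA model and the transcript is kept as a list of (user, assistant) pairs, the format Gradio's Chatbot component expects. The actual logic in app.py may differ.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")  # assumed checkpoint
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def chat_turn(image: Image.Image, message: str, history: list) -> list:
    """Answer one user message about the image and append it to the history."""
    inputs = vqa_processor(images=image, text=message, return_tensors="pt")
    output_ids = vqa_model.generate(**inputs)
    answer = vqa_processor.decode(output_ids[0], skip_special_tokens=True)
    return history + [(message, answer)]
```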

## 📸 Demo Screenshots

- Main Interface
- Image Captioning
- Visual Question Answering
- Zero-Shot Classification
- Multimodal Chat

## 🚀 Quick Start

### Local Run

```bash
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860.
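
The contents of requirements.txt are not shown here; a plausible minimal dependency set for this app (an assumption, check the actual file for exact pins) would be:

```text
# Assumed minimal dependencies for this Space; see the actual requirements.txt
gradio
torch
transformers
Pillow
```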

### Deploy to Hugging Face Spaces

1. Create a Space
2. Upload files
   - Upload `app.py`, `requirements.txt`, and `README.md`
   - Or use Git:

     ```bash
     git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
     cd YOUR_SPACE_NAME
     # Copy your files here
     git add .
     git commit -m "Initial commit"
     git push
     ```

3. Wait for the build
   - The Space auto-deploys in about 5-10 minutes
   - Access it at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

### Enable GPU (Optional)

- Go to Space Settings → Hardware
- Select a GPU option for faster inference
- Restart the Space (see the device-selection sketch below)

## 🛠️ Models Used

| Model           | Purpose           | Size   |
|-----------------|-------------------|--------|
| BLIP-Captioning | Image description | 447 MB |
| BLIP-VQA        | Visual Q&A        | 447 MB |
| CLIP            | Classification    | 605 MB |

## 📖 Usage Examples

### Image Captioning

Upload an image → Click "Generate Caption" → Get a description

Example output:

```
📝 Image Caption:
A golden retriever sitting in a park with green grass
```

### Visual Question Answering

Upload an image → Ask a question → Get an answer

Example:

```
Q: What color is the car?
A: red
```

### Zero-Shot Classification

Upload an image → Define categories (comma-separated) → Get probabilities

Example:

```
Categories: cat, dog, bird
Results:
cat:  92.5% ██████████████████
dog:   5.2% █
bird:  2.3% ▌
```

### Multimodal Chat

Upload an image → Chat naturally about it

Example:

```
You: Describe this image
AI: A modern kitchen with white cabinets
You: What color are the walls?
AI: white
```

## ⚙️ Configuration

### Change Models

Edit app.py to use different models:

```python
# Use the larger BLIP captioning model
caption_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
)
```
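
When swapping checkpoints, the processor should be changed to the matching one as well, otherwise preprocessing and the model can disagree. A one-line sketch (the variable name `caption_processor` is an assumption):

```python
from transformers import BlipProcessor

# Keep the processor in sync with the model checkpoint.
caption_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
```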

### Customize Interface

Modify custom_css in app.py:

```python
custom_css = """
#title {
    background: linear-gradient(90deg, #YOUR_COLOR 0%, #YOUR_COLOR 100%);
}
"""
```

πŸ› Troubleshooting

Issue: Models downloading slowly

# Set cache directory
export HF_HOME=/path/to/storage

Issue: Out of memory

# Use CPU only
device = "cpu"

Issue: Port already in use

python app.py --server-port 8080
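
The --server-port flag only works if app.py parses command-line arguments. With a stock Gradio app you can instead set the GRADIO_SERVER_PORT environment variable, or pass the port in the launch call; the object name `demo` below is an assumption about app.py, and 8080 is just an example value.

```python
# In app.py, assuming the Gradio Blocks/Interface object is named `demo`:
demo.launch(server_port=8080)
```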

## 📄 License

MIT License - see the LICENSE file.

## 🙏 Acknowledgments

- BLIP (Salesforce) - image captioning and VQA
- CLIP (OpenAI) - zero-shot classification
- Hugging Face Transformers and Gradio

⭐ Star this project if you find it helpful!