Spaces: kkkai123456/HW_3

kkkai123456 committed on Nov 5

Commit 4f09101 (verified)

1 Parent(s): ce86ad4

Update README.md

Files changed (1)

README.md +126 -72

@@ -11,129 +11,183 @@ pinned: false
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-🤖 Vision Language AI Demo
 A comprehensive web application showcasing state-of-the-art Vision-Language AI models.
-✨ Features
-🖼️ Image Captioning
 Automatically generate natural language descriptions of images using BLIP model.
-🔍 Visual Question Answering (VQA)
 Ask questions about images and get intelligent answers based on visual content.
-🏷️ Zero-Shot Image Classification
 Classify images into custom categories without training using CLIP model.
-💬 Multimodal Chat
 Interactive conversations about image content with context retention.
-📸 Demo Screenshots
-Main Interface
-Show Image
-Image Captioning
-Show Image
-Visual Question Answering
-Show Image
-Zero-Shot Classification
-Show Image
-Multimodal Chat
-Show Image
-🚀 Quick Start
-Local Run
-bashpip install -r requirements.txt
-python app.py
-Access at http://localhost:7860
-Deploy to Hugging Face Spaces
-Create a Space
-Go to https://huggingface.co/spaces
-Click "Create new Space"
-Choose name and select Gradio SDK
-Upload Files
-Upload app.py, requirements.txt, and README.md
-Or use Git:
-bash   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    # Copy your files here
    git add .
    git commit -m "Initial commit"
    git push
-Wait for Build
-Space will auto-deploy in 5-10 minutes
-Access at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
-Enable GPU (Optional)
-Go to Space Settings → Hardware
-Select GPU option for faster processing
-Restart the Space
-🛠️ Models Used
-ModelPurposeSizeBLIP-CaptioningImage Description447MBBLIP-VQAVisual Q&A447MBCLIPClassification605MB
-📖 Usage Examples
-Image Captioning
 Upload an image → Click "Generate Caption" → Get description
-Example Output:
 📝 Image Caption:
 A golden retriever sitting in a park with green grass
-Visual Question Answering
 Upload image → Ask question → Get answer
-Example:
 Q: What color is the car?
 A: red
-Zero-Shot Classification
 Upload image → Define categories (comma-separated) → Get probabilities
-Example:
 Categories: cat, dog, bird
 Results:
 cat:  92.5% ██████████████████
 dog:   5.2% █
 bird:  2.3% ▌
-Multimodal Chat
 Upload image → Chat naturally about it
-Example:
 You: Describe this image
 AI: A modern kitchen with white cabinets
 You: What color are the walls?
 AI: white
-⚙️ Configuration
-Change Models
-Edit app.py to use different models:
-python# Use larger BLIP model
 caption_model = BlipForConditionalGeneration.from_pretrained(
     "Salesforce/blip-image-captioning-large"
 )
-Customize Interface
-Modify custom_css in app.py:
-pythoncustom_css = """
 #title {
     background: linear-gradient(90deg, #YOUR_COLOR 0%, #YOUR_COLOR 100%);
 }
 """
-🐛 Troubleshooting
-Issue: Models downloading slowly
-bash# Set cache directory
 export HF_HOME=/path/to/storage
-Issue: Out of memory
-python# Use CPU only
 device = "cpu"
-Issue: Port already in use
-bashpython app.py --server-port 8080
-📄 License
-MIT License - See LICENSE file
-🙏 Acknowledgments
-Salesforce BLIP
-OpenAI CLIP
-Hugging Face
-Gradio
-⭐ Star this project if you find it helpful!

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🤖 Vision Language AI Demo
 A comprehensive web application showcasing state-of-the-art Vision-Language AI models.
+## ✨ Features
+### 🖼️ Image Captioning
 Automatically generate natural language descriptions of images using BLIP model.
+### 🔍 Visual Question Answering (VQA)
 Ask questions about images and get intelligent answers based on visual content.
+### 🏷️ Zero-Shot Image Classification
 Classify images into custom categories without training using CLIP model.
+### 💬 Multimodal Chat
 Interactive conversations about image content with context retention.
+## 📸 Demo Screenshots
+### Main Interface
+![Main Interface](https://via.placeholder.com/800x400/667eea/ffffff?text=Main+Interface)
+### Image Captioning
+![Image Captioning](https://via.placeholder.com/800x400/667eea/ffffff?text=Image+Captioning)
+### Visual Question Answering
+![VQA](https://via.placeholder.com/800x400/667eea/ffffff?text=Visual+QA)
+### Zero-Shot Classification
+![Classification](https://via.placeholder.com/800x400/667eea/ffffff?text=Classification)
+### Multimodal Chat
+![Chat](https://via.placeholder.com/800x400/667eea/ffffff?text=Multimodal+Chat)
+## 🚀 Quick Start
+### Local Run
+```bash
+pip install -r requirements.txt
+python app.py
+```
+Access at `http://localhost:7860`
+### Deploy to Hugging Face Spaces
+1. **Create a Space**
+   - Go to https://huggingface.co/spaces
+   - Click "Create new Space"
+   - Choose name and select **Gradio** SDK
+2. **Upload Files**
+   - Upload `app.py`, `requirements.txt`, and `README.md`
+   - Or use Git:
+   ```bash
+   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    # Copy your files here
    git add .
    git commit -m "Initial commit"
    git push
+   ```
+3. **Wait for Build**
+   - Space will auto-deploy in 5-10 minutes
+   - Access at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
+### Enable GPU (Optional)
+- Go to Space Settings → Hardware
+- Select GPU option for faster processing
+- Restart the Space
+## 🛠️ Models Used
+| Model | Purpose | Size |
+|-------|---------|------|
+| [BLIP-Captioning](https://huggingface.co/Salesforce/blip-image-captioning-base) | Image Description | 447MB |
+| [BLIP-VQA](https://huggingface.co/Salesforce/blip-vqa-base) | Visual Q&A | 447MB |
+| [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) | Classification | 605MB |
+## 📖 Usage Examples
+### Image Captioning
 Upload an image → Click "Generate Caption" → Get description
+**Example Output:**
+```
 📝 Image Caption:
 A golden retriever sitting in a park with green grass
+```
+### Visual Question Answering
 Upload image → Ask question → Get answer
+**Example:**
+```
 Q: What color is the car?
 A: red
+```
+### Zero-Shot Classification
 Upload image → Define categories (comma-separated) → Get probabilities
+**Example:**
+```
 Categories: cat, dog, bird
 Results:
 cat:  92.5% ██████████████████
 dog:   5.2% █
 bird:  2.3% ▌
+```
+### Multimodal Chat
 Upload image → Chat naturally about it
+**Example:**
+```
 You: Describe this image
 AI: A modern kitchen with white cabinets
 You: What color are the walls?
 AI: white
+```
+## ⚙️ Configuration
+### Change Models
+Edit `app.py` to use different models:
+```python
+# Use larger BLIP model
 caption_model = BlipForConditionalGeneration.from_pretrained(
     "Salesforce/blip-image-captioning-large"
 )
+```
+### Customize Interface
+Modify `custom_css` in `app.py`:
+```python
+custom_css = """
 #title {
     background: linear-gradient(90deg, #YOUR_COLOR 0%, #YOUR_COLOR 100%);
 }
 """
+```
+## 🐛 Troubleshooting
+**Issue: Models downloading slowly**
+```bash
+# Set cache directory
 export HF_HOME=/path/to/storage
+```
+**Issue: Out of memory**
+```python
+# Use CPU only
 device = "cpu"
+```
+**Issue: Port already in use**
+```bash
+python app.py --server-port 8080
+```
+## 📄 License
+MIT License - See [LICENSE](LICENSE) file
+## 🙏 Acknowledgments
+- [Salesforce BLIP](https://github.com/salesforce/BLIP)
+- [OpenAI CLIP](https://github.com/openai/CLIP)
+- [Hugging Face](https://huggingface.co/)
+- [Gradio](https://gradio.app/)
+---
+**⭐ Star this project if you find it helpful!**
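
The Zero-Shot Classification example in the README takes comma-separated categories and reports a percentage per category. The sketch below illustrates only that pre- and post-processing step, with no model involved. It is not taken from `app.py`: the function names `parse_categories`, `softmax`, and `format_results`, the `bar_width` parameter, and the example scores are all hypothetical, and the bar rendering is only an approximation of the style shown in the README.

```python
import math


def parse_categories(text):
    """Split a comma-separated string into clean, non-empty labels."""
    return [part.strip() for part in text.split(",") if part.strip()]


def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_results(labels, logits, bar_width=20):
    """Return one 'label: percent bar' line per label, highest score first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    lines = []
    for label, prob in ranked:
        bar = "█" * round(prob * bar_width)
        lines.append(f"{label}: {prob * 100:5.1f}% {bar}".rstrip())
    return lines


labels = parse_categories("cat, dog, bird")
# Made-up scores standing in for the image-text similarities a model such as
# CLIP would produce; they are an assumption, not real model output.
example_logits = [4.0, 1.0, 0.5]
report = format_results(labels, example_logits)
print("\n".join(report))
```

In a real app, the scores would come from the model rather than a hard-coded list; the parsing and formatting steps would stay the same.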