kkkai123456 committed on
Commit 4f09101 · verified · 1 Parent(s): ce86ad4

Update README.md

Files changed (1):
  1. README.md +126 -72
README.md CHANGED
@@ -11,129 +11,183 @@ pinned: false
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🤖 Vision Language AI Demo

A comprehensive web application showcasing state-of-the-art Vision-Language AI models.

## ✨ Features

### 🖼️ Image Captioning
Automatically generate natural language descriptions of images using the BLIP model.

### 🔍 Visual Question Answering (VQA)
Ask questions about images and get intelligent answers based on visual content.

### 🏷️ Zero-Shot Image Classification
Classify images into custom categories, without any training, using the CLIP model.

### 💬 Multimodal Chat
Interactive conversations about image content with context retention.

## 📸 Demo Screenshots

### Main Interface
![Main Interface](https://via.placeholder.com/800x400/667eea/ffffff?text=Main+Interface)

### Image Captioning
![Image Captioning](https://via.placeholder.com/800x400/667eea/ffffff?text=Image+Captioning)

### Visual Question Answering
![VQA](https://via.placeholder.com/800x400/667eea/ffffff?text=Visual+QA)

### Zero-Shot Classification
![Classification](https://via.placeholder.com/800x400/667eea/ffffff?text=Classification)

### Multimodal Chat
![Chat](https://via.placeholder.com/800x400/667eea/ffffff?text=Multimodal+Chat)

## 🚀 Quick Start

### Local Run
```bash
pip install -r requirements.txt
python app.py
```

Access at `http://localhost:7860`

### Deploy to Hugging Face Spaces

1. **Create a Space**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name and select the **Gradio** SDK

2. **Upload Files**
   - Upload `app.py`, `requirements.txt`, and `README.md`
   - Or use Git (a scripted alternative is sketched after this list):
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   # Copy your files here
   git add .
   git commit -m "Initial commit"
   git push
   ```

3. **Wait for Build**
   - The Space will auto-deploy in 5-10 minutes
   - Access it at `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
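
As an alternative to the Git commands above, the same files can be pushed with the `huggingface_hub` client. This is only a sketch: it assumes the Space already exists and that you have authenticated with `huggingface-cli login` (the repo id is a placeholder).

```python
# Sketch: push app.py, requirements.txt and README.md to an existing Space.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path=".",                          # directory containing the app files
    repo_id="YOUR_USERNAME/YOUR_SPACE_NAME",  # placeholder Space id
    repo_type="space",
    commit_message="Initial commit",
)
```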

### Enable GPU (Optional)
- Go to Space Settings → Hardware
- Select a GPU option for faster processing
- Restart the Space

## 🛠️ Models Used

| Model | Purpose | Size |
|-------|---------|------|
| [BLIP-Captioning](https://huggingface.co/Salesforce/blip-image-captioning-base) | Image Description | 447 MB |
| [BLIP-VQA](https://huggingface.co/Salesforce/blip-vqa-base) | Visual Q&A | 447 MB |
| [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) | Classification | 605 MB |
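
For reference, the three models in the table can be loaded with the `transformers` library as sketched below; the actual `app.py` may organize its loading differently.

```python
# Sketch: load the Hub models listed above with transformers.
from transformers import (
    BlipForConditionalGeneration,
    BlipForQuestionAnswering,
    BlipProcessor,
    CLIPModel,
    CLIPProcessor,
)

caption_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
caption_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
```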

## 📖 Usage Examples

### Image Captioning
Upload an image → Click "Generate Caption" → Get a description

**Example Output:**
```
📝 Image Caption:
A golden retriever sitting in a park with green grass
```
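
The same step can be reproduced outside the web UI; the snippet below is an illustrative sketch using `transformers` (the image path is a placeholder).

```python
# Sketch: caption a local image with BLIP.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("dog.jpg").convert("RGB")  # placeholder image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```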

### Visual Question Answering
Upload an image → Ask a question → Get an answer

**Example:**
```
Q: What color is the car?
A: red
```
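
Programmatically, the question and the image go to BLIP-VQA together; a minimal sketch (image path and question are placeholders):

```python
# Sketch: answer a question about a local image with BLIP-VQA.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("car.jpg").convert("RGB")  # placeholder image
inputs = processor(images=image, text="What color is the car?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```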
114
+
115
+ ### Zero-Shot Classification
116
  Upload image β†’ Define categories (comma-separated) β†’ Get probabilities
117
+
118
+ **Example:**
119
+ ```
120
  Categories: cat, dog, bird
121
  Results:
122
  cat: 92.5% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
123
  dog: 5.2% β–ˆ
124
  bird: 2.3% β–Œ
125
+ ```
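
This kind of classification compares the image against one text prompt per category using CLIP; the pattern is sketched below (image path and labels are placeholders).

```python
# Sketch: zero-shot classification with CLIP over user-defined labels.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("pet.jpg").convert("RGB")  # placeholder image
labels = ["cat", "dog", "bird"]
inputs = processor(
    text=[f"a photo of a {label}" for label in labels],
    images=image,
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.1%}")
```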
126
+
127
+ ### Multimodal Chat
128
  Upload image β†’ Chat naturally about it
129
+
130
+ **Example:**
131
+ ```
132
  You: Describe this image
133
  AI: A modern kitchen with white cabinets
134
  You: What color are the walls?
135
  AI: white
136
+ ```
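
One way to drive such a chat is to route "describe"-style messages to the captioning model and everything else to BLIP-VQA while keeping a running history. The sketch below illustrates that idea only; it is not necessarily how `app.py` implements context retention.

```python
# Sketch: a minimal chat turn that keeps history and reuses the BLIP models.
from PIL import Image
from transformers import (
    BlipForConditionalGeneration,
    BlipForQuestionAnswering,
    BlipProcessor,
)

cap_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
cap_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
vqa_proc = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def chat_turn(image, message, history):
    """Answer one chat message about `image` and append the turn to `history`."""
    if "describe" in message.lower():
        inputs = cap_proc(images=image, return_tensors="pt")
        ids = cap_model.generate(**inputs, max_new_tokens=30)
        reply = cap_proc.decode(ids[0], skip_special_tokens=True)
    else:
        inputs = vqa_proc(images=image, text=message, return_tensors="pt")
        ids = vqa_model.generate(**inputs)
        reply = vqa_proc.decode(ids[0], skip_special_tokens=True)
    history.append((message, reply))
    return reply

image = Image.open("kitchen.jpg").convert("RGB")  # placeholder image
history = []
print(chat_turn(image, "Describe this image", history))
print(chat_turn(image, "What color are the walls?", history))
```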

## ⚙️ Configuration

### Change Models
Edit `app.py` to use different models:
```python
from transformers import BlipForConditionalGeneration

# Use the larger BLIP captioning model
caption_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
)
```

### Customize Interface
Modify `custom_css` in `app.py`:
```python
custom_css = """
#title {
    background: linear-gradient(90deg, #YOUR_COLOR 0%, #YOUR_COLOR 100%);
}
"""
```

## 🐛 Troubleshooting

**Issue: Models downloading slowly**
```bash
# Set the cache directory
export HF_HOME=/path/to/storage
```

**Issue: Out of memory**
```python
# Use CPU only
device = "cpu"
```
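
If you only want the CPU fallback when no GPU is present, a common pattern (an assumption, not necessarily what `app.py` ships with) is:

```python
import torch

# Use the GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
```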
172
+
173
+ **Issue: Port already in use**
174
+ ```bash
175
+ python app.py --server-port 8080
176
+ ```
177
+
178
+ ## πŸ“„ License
179
 
180
+ MIT License - See [LICENSE](LICENSE) file
181
+
182
+ ## πŸ™ Acknowledgments
183
+
184
+ - [Salesforce BLIP](https://github.com/salesforce/BLIP)
185
+ - [OpenAI CLIP](https://github.com/openai/CLIP)
186
+ - [Hugging Face](https://huggingface.co/)
187
+ - [Gradio](https://gradio.app/)
188
+
189
+ ---
190
 
191
+ **⭐ Star this project if you find it helpful!**
192
 
 
193