kkkai123456 committed
Commit 1ca85db · verified · 1 Parent(s): 434a1b5

Update README.md

Files changed (1):
  1. README.md +4 -114
README.md CHANGED
@@ -50,13 +50,13 @@ Interactive conversations about image content with context retention.
 ## 📸 Demo Screenshots
 
 ### Image Captioning
-![Image Captioning](source/image%20(1).png)
+![Image Captioning](source/image%20(4).png)
 
 ### Visual Question Answering
-![Visual Question Answering](source/image%20(1).png)
+![Visual Question Answering](source/image%20(3).png)
 
 ### Zero-Shot Classification
-![Zero-Shot Classification](source/image%20(1).png)
+![Zero-Shot Classification](source/image%20(2).png)
 
 ### Multimodal Chat
 ![Multimodal Chat](source/image%20(1).png)
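The screenshot links in this section percent-encode the space in each filename (`image (4).png` becomes `image%20(4).png`). A minimal sketch of generating such Markdown links with the standard library (the helper name is my own, not from the repo):

```python
from urllib.parse import quote

def md_image(alt_text: str, path: str) -> str:
    # Percent-encode spaces and other unsafe characters so the Markdown
    # image link resolves; keep "/" and parentheses literal, matching
    # the style of the links in this README.
    return f"![{alt_text}]({quote(path, safe='/()')})"

print(md_image("Image Captioning", "source/image (4).png"))
# -> ![Image Captioning](source/image%20(4).png)
```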
@@ -76,7 +76,6 @@ Access at `http://localhost:7860`
 
 ### Deploy to Hugging Face Spaces
 
-#### Method 1: Web Interface
 1. Go to https://huggingface.co/spaces
 2. Click **"Create new Space"**
 3. Fill in:
@@ -91,28 +90,7 @@ Access at `http://localhost:7860`
 - `source/` folder (with screenshots)
 5. Space will auto-deploy in 5-10 minutes
 
-#### Method 2: Git
-```bash
-# Clone your space repository
-git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
-cd YOUR_SPACE_NAME
-
-# Copy your files
-cp app.py requirements.txt README.md ./
-cp -r source ./
-
-# Push to Hugging Face
-git add .
-git commit -m "Initial commit"
-git push
-```
-
-#### Enable GPU (Optional)
-1. Go to **Settings** → **Hardware**
-2. Select **GPU** option
-3. Restart the Space
 
-GPU provides 10-50x faster processing and better user experience.
 
 ## 🛠️ Models Used
 
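Besides the web upload and the Git method shown above, the same files can be pushed to a Space programmatically. A minimal sketch assuming `huggingface_hub` is installed and you are logged in; the repo-id placeholders mirror the ones in the Git example, and `upload_space` is my own helper name:

```python
def space_repo_id(username: str, space_name: str) -> str:
    # A Space is addressed as "<user>/<name>", the same path used in
    # its git URL: https://huggingface.co/spaces/<user>/<name>
    return f"{username}/{space_name}"

def upload_space(folder: str, repo_id: str) -> None:
    # upload_folder pushes the folder's contents to the Space in one commit.
    from huggingface_hub import HfApi
    HfApi().upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")

if __name__ == "__main__":
    try:
        upload_space(".", space_repo_id("YOUR_USERNAME", "YOUR_SPACE_NAME"))
    except Exception as exc:  # package missing or not logged in
        print(f"upload skipped: {exc}")
```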
@@ -222,80 +200,6 @@ building: 0.00%
 - Build on previous responses
 - Keep questions related to the image
 
----
-
-## ⚙️ Advanced Configuration
-
-### Change Models
-Edit `app.py` to use different models:
-
-```python
-# Use larger BLIP model for better quality
-caption_model = BlipForConditionalGeneration.from_pretrained(
-    "Salesforce/blip-image-captioning-large"  # 990MB, better quality
-)
-
-# Use larger CLIP model
-clip_model = CLIPModel.from_pretrained(
-    "openai/clip-vit-large-patch14"  # 1.7GB, more accurate
-)
-```
-
-### Customize Interface Style
-Modify `custom_css` in `app.py`:
-
-```python
-custom_css = """
-#title {
-    background: linear-gradient(90deg, #FF6B6B 0%, #4ECDC4 100%);
-    font-size: 3.5em;
-}
-"""
-```
-
-### Adjust Generation Parameters
-Control model behavior:
-
-```python
-# Generate longer captions
-out = caption_model.generate(**inputs, max_length=100)
-
-# More accurate but slower VQA
-out = vqa_model.generate(**inputs, max_length=50, num_beams=5)
-```
-
-## 🐛 Troubleshooting
-
-### Common Issues
-
-**Models downloading slowly**
-```bash
-# Set cache directory to a location with more space
-export HF_HOME=/path/to/large/storage
-python app.py
-```
-
-**Out of memory error**
-```python
-# Add at the start of app.py
-import torch
-torch.cuda.empty_cache()
-
-# Or force CPU usage
-device = "cpu"
-```
-
-**Port already in use**
-```bash
-# Use different port
-python app.py --server-port 8080
-```
-
-**Space build failing**
-- Check `requirements.txt` for correct package versions
-- Verify all files are uploaded correctly
-- Check build logs in Space settings
-
 ### Getting Help
 - 📖 [Gradio Documentation](https://gradio.app/docs/)
 - 🤗 [Hugging Face Forums](https://discuss.huggingface.co/)
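The "Port already in use" entry removed in this hunk can also be checked up front before launching. A small stdlib sketch (function name is my own) for testing whether a port is free:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    # Binding succeeds only if nothing is already bound to the port,
    # so a successful bind means the port is available.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Pick 7860 (Gradio's default) if free, otherwise fall back to 8080.
port = 7860 if port_is_free(7860) else 8080
```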
@@ -324,7 +228,6 @@ MIT License - See [LICENSE](LICENSE) file for details.
 - **BLIP**: BSD-3-Clause License
 - **CLIP**: MIT License
 
-All models are free for commercial use.
 
 ## 🙏 Acknowledgments
 
@@ -334,18 +237,5 @@ Built with amazing open-source projects:
 - [Hugging Face Transformers](https://huggingface.co/docs/transformers) - Model hub and inference
 - [Gradio](https://gradio.app/) - Beautiful web interfaces
 
-## 🔗 Links
-
-- **Live Demo**: [Your Space URL]
-- **GitHub Repository**: [Your Repo URL]
-- **Report Issues**: [GitHub Issues]
-
----
-
-<div align="center">
-
-**⭐ If you find this project helpful, please star it! ⭐**
-
-Made with ❤️ by the open-source community
 
-</div>
+---
 