---
title: CV MCP Server
emoji: 💻
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: mit
short_description: Real-time CV VLM MCP Server for MCP 1st Birthday Hackathon
tags:
- building-mcp-track-creative
---

# 🎥 CV MCP Server

A **Model Context Protocol (MCP) server** that provides **real-time scene analysis** for webcam images.  
This Space allows users to stream live video feeds and get detailed insights about the environment, objects, humans, and more.  

Check out the Hackathon details [here](https://huggingface.co/MCP-1st-Birthday).  
Social media posts for this MCP Hackathon project: [Discord](https://discord.com/channels/879548962464493619/1439001549492719726/1443045145284051084), [Instagram](https://www.instagram.com/p/DRsw2KOADrB/), [Threads](https://www.threads.com/@oppa.ai_the.one.and.only/post/DRsxlNzAdCj?xmt=AQF0fVYU0qfeEUT4nDojv48yYZmjtK6tCrMx3sehnhVyOw).  

🎬 Watch a demo of the CV MCP Server analyzing a robot’s camera feed:  
[Demo Video](https://photos.app.goo.gl/guxui1EsdPNoL4mw7)  

GitHub repo of the CV robot Python script:
https://github.com/OppaAI/CV_Robot_MCP

---

## 🌟 Key Features

- Real-time scene description  
- Human detection  
- Animal and object detection  
- Environment classification (indoor/outdoor)  
- Lighting condition analysis  
- Hazard identification  
- Optimized for **context window efficiency**  

---

## 🔧 How It Works

This Space runs an MCP server that analyzes images captured from your webcam:

1. Stream an image from your webcam.  
2. The image is sent to the **MCP server**.  
3. The server analyzes it with a vision-language model, exposed via the **Model Context Protocol**.  
4. Outputs are returned and displayed in the UI, including:
   - General scene description  
   - Detected humans  
   - Detected animals and objects  
   - Environment type (indoor/outdoor)  
   - Lighting condition  
   - Hazards  
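
Before handing the results above to an LLM, the six fields can be collapsed into one compact summary line, in the spirit of the server's context-window efficiency goal. A minimal Python sketch — the field names here are illustrative assumptions, not the server's actual output schema:

```python
# Hypothetical sketch: pack the six scene-analysis fields into a single
# compact line. The dict keys below are illustrative assumptions.
def pack_scene_analysis(analysis: dict) -> str:
    """Collapse a scene-analysis dict into one compact summary line."""
    order = ["scene", "humans", "animals_objects",
             "environment", "lighting", "hazards"]
    # Keep a fixed field order and skip empty fields to save tokens.
    parts = [f"{key}={analysis[key]}" for key in order if analysis.get(key)]
    return "; ".join(parts)

sample = {
    "scene": "cluttered desk with a laptop",
    "humans": "1 person, seated",
    "animals_objects": "laptop, mug",
    "environment": "indoor",
    "lighting": "dim artificial light",
    "hazards": "none",
}
print(pack_scene_analysis(sample))
```

A single delimited line like this keeps the analysis readable while spending far fewer tokens than a nested JSON payload.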

---

## ⚡ Demo

Compatible with **PC, mobile, and robots with cameras**.  

Stream images via your webcam or phone camera to receive real-time scene analysis.  

Watch a demo video of the CV MCP Server analyzing the video feed from my robot:  
[Demo Video](https://photos.app.goo.gl/guxui1EsdPNoL4mw7)  

---

## 🔑 Requirements

- A valid **Hugging Face API Token** is required.  
- If running locally, set your token as an environment variable:  

```bash
export HF_TOKEN=your_huggingface_token
```
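
On the Python side, the app can fail fast when the variable is missing instead of erroring mid-inference. A minimal check, assuming only the `HF_TOKEN` variable name shown above:

```python
import os

def get_hf_token() -> str:
    """Return the HF_TOKEN exported above, failing fast if it is missing."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        # Raise early with a clear message rather than failing on first API call.
        raise RuntimeError("HF_TOKEN is not set; export it before launching the app.")
    return token
```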