Update README.md
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: green
|
| 5 |
colorTo: gray
|
| 6 |
sdk: gradio
|
|
@@ -11,3 +11,30 @@ license: apache-2.0
|
|
| 11 |
tag: agent-demo-track
|
| 12 |
---
|
| 13 |
|
| 1 |
---
|
| 2 |
+
title: ScouterAI
|
| 3 |
+
emoji: 👓
|
| 4 |
colorFrom: green
|
| 5 |
colorTo: gray
|
| 6 |
sdk: gradio
|
|
|
|
| 11 |
tag: agent-demo-track
|
| 12 |
---
|
| 13 |
|
| 14 |
+
# ScouterAI - The Vision-Enhanced Agent
|
| 15 |
+
|
| 16 |
+
Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission.
|
| 17 |
+
This app falls under Track 3: Agentic Demo.
|
| 18 |
+
The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision.
|
| 19 |
+
LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them.
|
| 20 |
+
Computer vision models such as object detection or image segmentation models are tailored to these tasks, but they require some engineering to wrap them and make them user-ready.
|
| 21 |
+
The idea of the agentic demo is to give a powerful LLM access to expert vision models such as object detection or image segmentation models.
|
| 22 |
+
The agent can fulfill precise perception tasks on any object present in the image: detection, localization, classification, masking, counting, etc.
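
As a rough illustration of the counting task mentioned above, the sketch below tallies detections per label. It assumes detections arrive as a list of dicts with `label`, `score`, and `box` keys (the shape typically returned by a Hugging Face `object-detection` pipeline). The `count_objects` helper, the 0.5 threshold, and the sample data are hypothetical and not taken from the app.

```python
from collections import Counter


def count_objects(detections, min_score=0.5):
    """Count detections per label, ignoring those below min_score.

    Hypothetical helper for illustration; not part of ScouterAI.
    """
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


# Hand-written sample detections (assumed format, made-up values).
sample = [
    {"label": "cat", "score": 0.97, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"label": "cat", "score": 0.81, "box": {"xmin": 150, "ymin": 30, "xmax": 260, "ymax": 210}},
    {"label": "dog", "score": 0.66, "box": {"xmin": 300, "ymin": 40, "xmax": 420, "ymax": 230}},
    {"label": "bird", "score": 0.12, "box": {"xmin": 5, "ymin": 5, "xmax": 25, "ymax": 25}},
]

counts = count_objects(sample)
print(dict(counts))
```

The low-confidence "bird" entry is dropped by the threshold, which is the kind of post-processing a language model alone cannot do reliably without a detector's structured output.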
|
| 23 |
+
|
| 24 |
+
##
|
| 25 |
+
|
| 26 |
+
In this preliminary app, the agent is a CodeAgent (from the smolagents framework) with access to a set of tools:
|
| 27 |
+
- Any object detection and image segmentation models available on Hugging Face
|
| 28 |
+
- Image processing functions
|
| 29 |
+
- Image annotation functions
|
| 30 |
+
|
| 31 |
+
The agent uses these tools to complete a user request.
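
To make the idea of a code-writing agent with tools more concrete, here is a minimal, library-free sketch. It is not how smolagents is implemented and does not use its API: the `detect_objects` and `annotate` tools, the `run_agent_code` helper, and the hard-coded "generated" snippet are all hypothetical stand-ins. The point is only that the agent's action is a piece of Python code that calls the tools it has been given.

```python
def detect_objects(image_name):
    """Hypothetical detection tool returning canned results for a known image."""
    fake_results = {
        "street.jpg": [
            {"label": "car", "score": 0.91},
            {"label": "person", "score": 0.88},
        ]
    }
    return fake_results.get(image_name, [])


def annotate(detections):
    """Hypothetical annotation tool: build one caption per detection."""
    return [f"{d['label']} ({d['score']:.2f})" for d in detections]


TOOLS = {"detect_objects": detect_objects, "annotate": annotate}

# Code a code-writing agent might produce for "label everything in street.jpg".
# Hard-coded here; a real agent would obtain it from the LLM.
generated_code = """
detections = detect_objects("street.jpg")
result = annotate(detections)
"""


def run_agent_code(code, tools):
    """Run generated code with the tools in scope and return `result`.

    Plain exec is NOT a sandbox; this is for illustration only.
    """
    namespace = dict(tools)
    exec(code, namespace)
    return namespace.get("result")


labels = run_agent_code(generated_code, TOOLS)
print(labels)
```

In a real setup the framework is responsible for prompting the model, executing the code safely, and feeding observations back to the model over several steps.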
|
| 32 |
+
|
| 33 |
+
## Use-cases
|
| 34 |
+
|
| 35 |
+
## Stack
|
| 36 |
+
|
| 37 |
+
Agent framework: smolagents
|
| 38 |
+
LLM: Anthropic
|
| 39 |
+
Compute: Modal
|
| 40 |
+
|