| title: ScouterAI | |
| emoji: 👓 | |
| colorFrom: green | |
| colorTo: gray | |
| sdk: gradio | |
| sdk_version: 5.33.0 | |
| app_file: app.py | |
| pinned: true | |
| license: apache-2.0 | |
| tag: agent-demo-track | |
| short_description: The agent using over 9000 vision models from the HF Hub. | |
| # ScouterAI - The Vision enhanced Agent | |
| Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission. | |
| This app falls under Track 3 : Agentic Demo. | |
| The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision. | |
| LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them. | |
| Computer vision models such as object detection or image segmentation models are tailored to these tasks but require some engineering to wrap them and make them user-ready. | |
| The idea of the agentic demo is to give a powerful LLM access to expert vision models such as object detection or image segmentation models. | |
| The agent can then fulfill precise perception tasks on any object present in the image : detection, location, classification, masking, counting, etc. | |
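As a rough illustration of the counting and location tasks, the sketch below post-processes detections in the list-of-dicts shape returned by the `transformers` object-detection pipeline (`score`, `label`, `box`). The sample detections, the 0.5 threshold and the helper names `count_objects` and `box_centers` are hypothetical and are not taken from the app.

```python
from collections import Counter

# Hypothetical detections, shaped like transformers object-detection pipeline output.
detections = [
    {"score": 0.98, "label": "car", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 80}},
    {"score": 0.91, "label": "car", "box": {"xmin": 150, "ymin": 30, "xmax": 260, "ymax": 90}},
    {"score": 0.35, "label": "car", "box": {"xmin": 300, "ymin": 40, "xmax": 340, "ymax": 70}},
    {"score": 0.88, "label": "person", "box": {"xmin": 200, "ymin": 10, "xmax": 230, "ymax": 100}},
]


def count_objects(detections, threshold=0.5):
    """Count detections per label, ignoring those below the score threshold."""
    return Counter(d["label"] for d in detections if d["score"] >= threshold)


def box_centers(detections, label, threshold=0.5):
    """Return the (x, y) center of each confident box for the given label."""
    centers = []
    for d in detections:
        if d["label"] == label and d["score"] >= threshold:
            box = d["box"]
            centers.append(((box["xmin"] + box["xmax"]) / 2, (box["ymin"] + box["ymax"]) / 2))
    return centers


counts = count_objects(detections)
car_centers = box_centers(detections, "car")
```

Keeping this arithmetic in plain code rather than asking the LLM to estimate it from pixels is the kind of split between language model and vision model that the demo argues for.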
| ## Overview | |
| In this preliminary app, the agent is a CodeAgent provided by the smolagents framework. | |
| Its interface consists of a chat panel with examples and a gallery that displays the agent's work. | |
| The agent is provided with a set of tools : | |
| - Task model retriever : a RAG tool which, given a task (object-detection or image-segmentation) and a query (e.g. car), returns a list of models with their model ids and the classes each one is capable of detecting/segmenting. The list is based on a curated dataset of all the models available on the HuggingFace Hub, returns the mo | |
| - Computer vision models : any object detection or image segmentation model available on HuggingFace | |
| - Image processing functions : Resizing, cropping, ... | |
| - Image annotation functions : Label, bounding box and mask annotators | |
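To make the retriever's input and output concrete, here is a minimal sketch of a lookup that takes a task and a query and returns model ids with their class lists. It uses exact class-name matching over a tiny in-memory catalog, not the RAG retrieval the app describes. The catalog entries, the `ModelCard` type and the `retrieve_models` function are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    model_id: str
    task: str
    classes: tuple


# Hypothetical catalog; the real tool is described as using a curated dataset of Hub models.
CATALOG = [
    ModelCard("example-org/street-detector", "object-detection", ("car", "truck", "person")),
    ModelCard("example-org/animal-detector", "object-detection", ("cat", "dog")),
    ModelCard("example-org/road-segmenter", "image-segmentation", ("road", "car", "sky")),
]


def retrieve_models(task, query, catalog=CATALOG):
    """Return models for the task whose class list contains the query (exact match)."""
    query = query.strip().lower()
    return [
        {"model_id": card.model_id, "classes": list(card.classes)}
        for card in catalog
        if card.task == task and query in card.classes
    ]


car_detectors = retrieve_models("object-detection", "car")
```

An agent could call such a tool first, pick one of the returned model ids, and then pass it to a detection or segmentation tool.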
| To complete a user request | |
| ## Use-cases | |
| ## Stack | |
| Agent framework : smolagents | |
| LLM : Anthropic | |
| Compute : Modal |