---
title: ScouterAI
emoji: 👓
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: true
license: apache-2.0
tag: agent-demo-track
---
# ScouterAI - The Vision-Enhanced Agent

Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission.
This app falls under Track 3: Agentic Demo.

The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision.
LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them.
Computer vision models such as object detection or image segmentation models are tailored to these tasks, but they require some engineering to wrap them and make them user-ready.

The idea of the agentic demo is to give a powerful LLM access to expert vision models such as object detection or image segmentation models.
The agent can then fulfill precise perception tasks on any object present in the image: detection, localization, classification, masking, counting, etc.
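As a rough illustration of what "counting" and "localization" mean once a detector has run, here is a minimal sketch. It assumes detections shaped like the output of the Hugging Face `object-detection` pipeline (a `label`, a `score`, and a `box` with `xmin`/`ymin`/`xmax`/`ymax`). The helper names `count_objects` and `box_center` and the sample data are hypothetical and are not taken from the app's code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_objects(detections: List[dict], min_score: float = 0.5) -> Dict[str, int]:
    """Count detections per label, ignoring those below min_score."""
    return dict(Counter(d["label"] for d in detections if d["score"] >= min_score))


def box_center(box: dict) -> Tuple[float, float]:
    """Return the (x, y) center of a box given as xmin/ymin/xmax/ymax."""
    return ((box["xmin"] + box["xmax"]) / 2, (box["ymin"] + box["ymax"]) / 2)


# Made-up detections, for illustration only.
detections = [
    {"label": "car", "score": 0.93, "box": {"xmin": 0, "ymin": 0, "xmax": 100, "ymax": 50}},
    {"label": "car", "score": 0.81, "box": {"xmin": 120, "ymin": 10, "xmax": 220, "ymax": 70}},
    {"label": "person", "score": 0.76, "box": {"xmin": 40, "ymin": 60, "xmax": 80, "ymax": 180}},
    {"label": "person", "score": 0.30, "box": {"xmin": 5, "ymin": 5, "xmax": 15, "ymax": 25}},
]

print(count_objects(detections))          # {'car': 2, 'person': 1}
print(box_center(detections[0]["box"]))   # (50.0, 25.0)
```

The low-confidence "person" detection is dropped by the `min_score` threshold, which is the kind of precise, checkable answer an LLM cannot reliably give from pixels alone.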
##

In this preliminary app, the agent is a CodeAgent (provided by the smolagents framework). To complete a user request, it has access to a set of tools:

- Any object detection and image segmentation model available on Hugging Face
- Image processing functions
- Image annotation functions
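The following plain-Python sketch illustrates the general idea behind a code-writing agent: the LLM emits a short Python snippet, and that snippet runs with the tools in scope. It deliberately does not use the smolagents API. The tool functions are stubs with canned results, `run_step` is a hypothetical helper, and the built-in `exec` is only a simplification of a real agent's code executor.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the three tool families listed above.
def detect_objects(image: str) -> List[dict]:
    """Stub for an object detection model; returns canned detections."""
    return [
        {"label": "cat", "score": 0.97, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
        {"label": "dog", "score": 0.88, "box": {"xmin": 150, "ymin": 30, "xmax": 300, "ymax": 240}},
    ]


def crop(image: str, box: dict) -> str:
    """Stub image processing function; describes the crop instead of doing it."""
    return f"{image}[{box['xmin']}:{box['xmax']}, {box['ymin']}:{box['ymax']}]"


def annotate(image: str, detections: List[dict]) -> str:
    """Stub annotation function; lists the labels that would be drawn."""
    labels = ", ".join(d["label"] for d in detections)
    return f"{image} annotated with: {labels}"


TOOLS: Dict[str, Callable] = {
    "detect_objects": detect_objects,
    "crop": crop,
    "annotate": annotate,
}


def run_step(generated_code: str, image: str) -> object:
    """Execute one code action with only the tools and the image in scope."""
    namespace = {**TOOLS, "image": image}
    exec(generated_code, namespace)  # simplification: no sandboxing here
    return namespace.get("result")


# In the real app the LLM would write this snippet; here it is hard-coded.
step = """
detections = detect_objects(image)
cats = [d for d in detections if d["label"] == "cat"]
result = annotate(image, cats)
"""

print(run_step(step, "photo.jpg"))  # photo.jpg annotated with: cat
```

Because each tool is an ordinary function, a single generated snippet can chain detection, filtering, and annotation in one step.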
## Use-cases
| ## Stack | |
- Agent framework: smolagents
- LLM: Anthropic
- Compute: Modal