Spaces:
Running
Running
| title: Potato - Agent Trace Evaluation Demo | |
| emoji: 🥔 | |
| colorFrom: yellow | |
| colorTo: yellow | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| license: apache-2.0 | |
| tags: | |
| - annotation | |
| - evaluation | |
| - agent-traces | |
| - potato | |
| # Potato Agent Trace Evaluation Demo | |
| Live demo of [Potato](https://github.com/davidjurgens/potato), the portable text annotation tool for NLP research. | |
| This Space showcases **agent trace evaluation** — annotating AI agent execution traces to assess: | |
| - Task completion success | |
| - Efficiency of the agent's approach | |
| - Side effects and safety concerns | |
| - Error taxonomy (MAST framework) | |
| - Span-level hallucination marking | |
| ## Try It Out | |
| 1. Enter any username to log in (no password required) | |
| 2. Read the agent trace showing the AI agent's reasoning and actions | |
| 3. Annotate using the schemas on the right panel | |
| 4. Click Next to proceed to the next trace | |
| ## About Potato | |
| Potato is a free, open-source annotation platform supporting 20+ annotation types, AI-assisted annotation, quality control, and more. [Learn more on GitHub](https://github.com/davidjurgens/potato). | |