A newer version of the Streamlit SDK is available: 1.58.0
title: PhantomOps
emoji: π»
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app_hf.py
pinned: false
tags:
- amd
- amd-hackathon-2026
- ai-agents
- streamlit
short_description: Crash-test lab for AI agents β finding failures before users
π» PhantomOps
The Crash Test Lab for AI Agents
Finds failures in your AI agents before your users do β then fixes them automatically.
Built on AMD Instinct MI300X using ROCm 7.2 and Qwen 2.5 from HuggingFace Hub.
The Problem
Every AI agent gets tested in a perfect world. Real users do not live in a perfect world.
You build an agent. You test it on clean inputs. You deploy it confidently. Then real users arrive with messy requests, incomplete information, contradictory instructions, and adversarial prompts. The agent fails β confidently, silently β and your user trusts the wrong answer.
Tools like LangSmith show you what broke after the fact. They do not prevent it. They do not fix it.
PhantomOps prevents it. Automatically.
3 Things No Other Tool Does
Personalized Chaos β Not generic benchmarks. PhantomOps studies YOUR specific agent first, then builds adversarial scenarios designed for exactly how your agent thinks and where it is most likely to break.
Reasoning Autopsy β Does not just flag failures. Traces the reasoning chain to find the exact step where logic broke down and identifies the root cause.
Auto-Patching β Rewrites the system prompt to fix the failure, tests the fix immediately, and verifies it works.
The 6 Agents
| Agent | What it does |
|---|---|
| π Fingerprint Agent | Maps your agent's domain and predicts specific weaknesses |
| π₯ Chaos Generator | Builds 5 personalized adversarial scenarios for YOUR agent |
| π― Target Runner | Runs all scenarios on AMD MI300X simultaneously |
| π¬ Reasoning Autopsy | Finds root cause β not just what failed, but why |
| π§ Patch Agent | Rewrites prompts automatically and verifies fixes work |
| π Drift Detector | Monitors behavior so agents never silently degrade |
Why AMD MI300X
Running 6 agents simultaneously with a local model in real time requires serious compute. Cloud APIs impose rate limits that make parallel simulation impossible and send your private agent data to external servers.
AMD MI300X with ROCm runs everything locally β fast, private, and practical for continuous monitoring.
The hardware is not just the platform. It is the reason this works.
Quick Start
git clone https://github.com/YOUR_USERNAME/phantomops
cd phantomops
python3 -m venv venv
source venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install transformers==4.37.0 accelerate streamlit huggingface_hub sentencepiece
hf auth login
export PYTHONPATH=$(pwd)
python main.py
streamlit run ui/app.py --server.port 8501
Requires AMD GPU with ROCm. HuggingFace account needed to pull Qwen 2.5.
Stack
| Component | Technology |
|---|---|
| GPU | AMD Instinct MI300X |
| GPU Software | ROCm 7.2 |
| Model | Qwen 2.5 1.5B from HuggingFace Hub |
| Framework | Python + Transformers |
| UI | Streamlit |
About This Space
This Space displays real results generated by running the full PhantomOps pipeline on AMD MI300X hardware. The full pipeline requires a GPU. Source code on GitHub.
AMD Developer Hackathon 2026 β Track 1: AI Agents and Agentic Workflows