Spaces:

lablab-ai-amd-developer-hackathon
/

PhantomOps

Sleeping

App Files Files Community

PhantomOps / README.md

shanko585

Update README.md

d78ee64 verified about 2 months ago

preview code

Raw

History Blame Contribute Delete

3.88 kB

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade

metadata

title: PhantomOps
emoji: 👻
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app_hf.py
pinned: false
tags:
  - amd
  - amd-hackathon-2026
  - ai-agents
  - streamlit
short_description: Crash-test lab for AI agents — finding failures before users

👻 PhantomOps

The Crash Test Lab for AI Agents

Finds failures in your AI agents before your users do — then fixes them automatically.

Built on AMD Instinct MI300X using ROCm 7.2 and Qwen 2.5 from HuggingFace Hub.

The Problem

Every AI agent gets tested in a perfect world. Real users do not live in a perfect world.

You build an agent. You test it on clean inputs. You deploy it confidently. Then real users arrive with messy requests, incomplete information, contradictory instructions, and adversarial prompts. The agent fails — confidently, silently — and your user trusts the wrong answer.

Tools like LangSmith show you what broke after the fact. They do not prevent it. They do not fix it.

PhantomOps prevents it. Automatically.

3 Things No Other Tool Does

Personalized Chaos — Not generic benchmarks. PhantomOps studies YOUR specific agent first, then builds adversarial scenarios designed for exactly how your agent thinks and where it is most likely to break.

Reasoning Autopsy — Does not just flag failures. Traces the reasoning chain to find the exact step where logic broke down and identifies the root cause.

Auto-Patching — Rewrites the system prompt to fix the failure, tests the fix immediately, and verifies it works.

The 6 Agents

Agent	What it does
🔍 Fingerprint Agent	Maps your agent's domain and predicts specific weaknesses
💥 Chaos Generator	Builds 5 personalized adversarial scenarios for YOUR agent
🎯 Target Runner	Runs all scenarios on AMD MI300X simultaneously
🔬 Reasoning Autopsy	Finds root cause — not just what failed, but why
🔧 Patch Agent	Rewrites prompts automatically and verifies fixes work
📈 Drift Detector	Monitors behavior so agents never silently degrade

Why AMD MI300X

Running 6 agents simultaneously with a local model in real time requires serious compute. Cloud APIs impose rate limits that make parallel simulation impossible and send your private agent data to external servers.

AMD MI300X with ROCm runs everything locally — fast, private, and practical for continuous monitoring.

The hardware is not just the platform. It is the reason this works.

Quick Start

git clone https://github.com/YOUR_USERNAME/phantomops
cd phantomops
python3 -m venv venv
source venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install transformers==4.37.0 accelerate streamlit huggingface_hub sentencepiece
hf auth login
export PYTHONPATH=$(pwd)
python main.py
streamlit run ui/app.py --server.port 8501

Requires AMD GPU with ROCm. HuggingFace account needed to pull Qwen 2.5.

Stack

Component	Technology
GPU	AMD Instinct MI300X
GPU Software	ROCm 7.2
Model	Qwen 2.5 1.5B from HuggingFace Hub
Framework	Python + Transformers
UI	Streamlit

About This Space

This Space displays real results generated by running the full PhantomOps pipeline on AMD MI300X hardware. The full pipeline requires a GPU. Source code on GitHub.

AMD Developer Hackathon 2026 — Track 1: AI Agents and Agentic Workflows