PhantomOps / README.md
shanko585's picture
Update README.md
d78ee64 verified
|
Raw
History Blame Contribute Delete
3.88 kB

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade
metadata
title: PhantomOps
emoji: πŸ‘»
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.32.0
app_file: app_hf.py
pinned: false
tags:
  - amd
  - amd-hackathon-2026
  - ai-agents
  - streamlit
short_description: Crash-test lab for AI agents β€” finding failures before users

πŸ‘» PhantomOps

The Crash Test Lab for AI Agents

Finds failures in your AI agents before your users do β€” then fixes them automatically.

Built on AMD Instinct MI300X using ROCm 7.2 and Qwen 2.5 from HuggingFace Hub.


The Problem

Every AI agent gets tested in a perfect world. Real users do not live in a perfect world.

You build an agent. You test it on clean inputs. You deploy it confidently. Then real users arrive with messy requests, incomplete information, contradictory instructions, and adversarial prompts. The agent fails β€” confidently, silently β€” and your user trusts the wrong answer.

Tools like LangSmith show you what broke after the fact. They do not prevent it. They do not fix it.

PhantomOps prevents it. Automatically.


3 Things No Other Tool Does

Personalized Chaos β€” Not generic benchmarks. PhantomOps studies YOUR specific agent first, then builds adversarial scenarios designed for exactly how your agent thinks and where it is most likely to break.

Reasoning Autopsy β€” Does not just flag failures. Traces the reasoning chain to find the exact step where logic broke down and identifies the root cause.

Auto-Patching β€” Rewrites the system prompt to fix the failure, tests the fix immediately, and verifies it works.


The 6 Agents

Agent What it does
πŸ” Fingerprint Agent Maps your agent's domain and predicts specific weaknesses
πŸ’₯ Chaos Generator Builds 5 personalized adversarial scenarios for YOUR agent
🎯 Target Runner Runs all scenarios on AMD MI300X simultaneously
πŸ”¬ Reasoning Autopsy Finds root cause β€” not just what failed, but why
πŸ”§ Patch Agent Rewrites prompts automatically and verifies fixes work
πŸ“ˆ Drift Detector Monitors behavior so agents never silently degrade

Why AMD MI300X

Running 6 agents simultaneously with a local model in real time requires serious compute. Cloud APIs impose rate limits that make parallel simulation impossible and send your private agent data to external servers.

AMD MI300X with ROCm runs everything locally β€” fast, private, and practical for continuous monitoring.

The hardware is not just the platform. It is the reason this works.


Quick Start

git clone https://github.com/YOUR_USERNAME/phantomops
cd phantomops
python3 -m venv venv
source venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install transformers==4.37.0 accelerate streamlit huggingface_hub sentencepiece
hf auth login
export PYTHONPATH=$(pwd)
python main.py
streamlit run ui/app.py --server.port 8501

Requires AMD GPU with ROCm. HuggingFace account needed to pull Qwen 2.5.


Stack

Component Technology
GPU AMD Instinct MI300X
GPU Software ROCm 7.2
Model Qwen 2.5 1.5B from HuggingFace Hub
Framework Python + Transformers
UI Streamlit

About This Space

This Space displays real results generated by running the full PhantomOps pipeline on AMD MI300X hardware. The full pipeline requires a GPU. Source code on GitHub.


AMD Developer Hackathon 2026 β€” Track 1: AI Agents and Agentic Workflows