---
title: NEXUS-AI
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
<!-- LAST_SYNC_VERIFICATION: 2026-04-08 00:07:00 -->
# NEXUS-AI
### Autonomous Incident Investigation Dashboard
<div align="center">





**Status:** Active Simulation Pipeline
**Architecture:** Real-time WebSockets + Multi-Agent Consensus
</div>
---
## What is NEXUS-AI?
NEXUS-AI is an autonomous dual-agent environment for investigating and validating software incidents in real time. An **Investigator** agent and a **Validator** agent form hypotheses, execute system tools, evaluate system behavior, and must reach consensus on the root cause.
Manual debugging demands constant context-switching between tools. NEXUS-AI addresses this through:
1. **Dual-Agent Autonomy**: Two specialized models whose exchange is streamed word by word over WebSockets.
2. **Dynamic Tool Execution**: Integrated system terminals that let agents run sandboxed validation scripts.
3. **Semantic Reward Engine**: Scores conversational drift numerically, using embeddings computed on the GPU through Ollama.
The result: an AI "Incident Response Team" that navigates servers, traces logs, and proposes fixes much as a human SRE would.
---
## Application Screenshots
### Simulation Dashboard
> The core command center. Features live agent terminals, a dual-communication consensus log, and a mathematical performance reward graph plotting investigation confidence.
<div align="center">
<img src="./assets/screenshots/Dashboard.png" alt="Simulation Dashboard" width="90%"/>
</div>
---
## Scenario Registry & Core Settings
> The system is built for quick adaptation: switch LLM providers and inject custom threat models entirely from the frontend.
<table>
<tr>
<td align="center" width="50%">
<img src="./assets/screenshots/Scenarios.png" alt="Scenario Browser"/>
<br/><b>Scenario Registry</b>
<br/><sub>A persistent LocalStorage-backed grid of tactical simulations. Users can dynamically inject custom infrastructure-specific incidents directly into the agent pipeline.</sub>
</td>
<td align="center" width="50%">
<img src="./assets/screenshots/Settings.png" alt="Hardware Configuration"/>
<br/><b>Runtime Configuration</b>
<br/><sub>Discovers the locally installed Ollama models and lets the user pair them (e.g., Qwen vs Dolphin-Phi) with fully independent parameters.</sub>
</td>
</tr>
</table>
---
## System Architecture
```text
┌─────────────────────────────────────────────────────────────────┐
│                         CLIENT BROWSER                          │
│              React SPA (Tailwind + Framer Motion)               │
│                         localhost:5173                          │
└───────────┬─────────────────────────────────┬───────────────────┘
            │ HTTP (REST)                     │ ws://
            ▼                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                FASTAPI BACKEND (localhost:7860)                 │
│   ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐   │
│   │ /config  │ │/scenarios│ │  /reset  │ │ ws:// Simulator  │   │
│   │ Env Sync │ │ DB Cache │ │ Injection│ │ Live Stream Sync │   │
│   └──────────┘ └──────────┘ └──────────┘ └──────────────────┘   │
└───────────┬───────────────────────────────────┬─────────────────┘
            │                                   │
            ▼                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                  OLLAMA ENGINE / LLM PIPELINE                   │
│   Agent A (Investigator)  ◄─────►  Agent B (Validator)          │
│   - Generates Hypotheses           - Challenges Assertions      │
│   - Runs System Tools              - Requires Proof             │
└─────────────────────────────────────────────────────────────────┘
```
---
## Execution Environments
NEXUS-AI supports two distinct execution models for agent tools, toggleable via the **Settings** dashboard:
### 1. Simulated Mode (Safe Sandbox)
* **Default Mode**: Agents interact with a pre-defined `clue_map` within the scenario YAML.
* **No System Impact**: Commands like `read_logs` or `check_service` return mocked data.
* **Use Case**: Training, logic validation, and "what-if" analysis without infrastructure risk.
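
As a rough illustration of simulated mode, the sketch below resolves a tool call against a `clue_map`. The key format (`"tool:argument"`), the `run_simulated_tool` helper, and the sample outputs are assumptions made for this example; the actual scenario YAML schema may differ.

```python
from typing import Dict

# Hypothetical clue_map, as it might look after loading a scenario YAML.
# Keys are "tool:argument" strings; values are the mocked tool output.
CLUE_MAP: Dict[str, str] = {
    "read_logs:app.log": "upstream rejected request: rate limit exceeded",
    "check_service:nginx": "nginx is active (running)",
}


def run_simulated_tool(tool: str, argument: str, clue_map: Dict[str, str]) -> str:
    """Return mocked output for a tool call without touching any real system."""
    key = f"{tool}:{argument}"
    # Unknown calls get a neutral message instead of an exception, so the
    # agent can keep investigating.
    return clue_map.get(key, f"no data available for {tool}({argument})")
```

Because every answer comes from a dictionary lookup, the same scenario always yields the same evidence, which keeps training runs reproducible.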
### 2. SSH Lab Node (Real-World Execution)
* **Live Connection**: Commands are executed in real-time on a remote Linux server via SSH.
* **Autonomous Terminal**: Agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
* **Security**: Includes a command blocklist to prevent highly destructive operations (e.g., `rm -rf /`).
* **Use Case**: Actual incident response on isolated Lab/Staging nodes.
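
A command blocklist of the kind described above could be checked before anything is sent over SSH. This is a minimal sketch: the patterns and the `is_command_allowed` helper are hypothetical, and the project's real blocklist is not shown in this README.

```python
import re
from typing import List

# Hypothetical patterns for highly destructive commands (assumed, not the
# project's actual list).
BLOCKED_PATTERNS: List[str] = [
    r"\brm\s+(-\w+\s+)*/(\s|$)",   # rm with any flags, targeting the root directory
    r"\bmkfs(\.\w+)?\b",           # formatting a filesystem
    r"\bdd\s+.*\bof=/dev/",        # writing raw data to a device
    r"\bshutdown\b",
    r"\breboot\b",
]


def is_command_allowed(command: str) -> bool:
    """Return False if the command matches any blocked pattern."""
    return not any(re.search(pattern, command) for pattern in BLOCKED_PATTERNS)
```

A blocklist only catches the patterns it knows about, so it complements rather than replaces running Lab Node mode on isolated machines.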
---
## OpenEnv Specification
NEXUS-AI strictly adheres to the **OpenEnv 1.0** standard for agent-environment interaction.
### Action Space
The environment accepts a typed **NexusAction** (Text-based with structured tool calls).
- **agent_id**: `string` ("agent_a" or "agent_b")
- **message**: `string` (The natural language reasoning/communication)
- **tool_calls**: `List[ToolCall]` (Optional structured calls like `TOOL: read_logs(file='app.log')`)
- **confidence**: `float` (0.0 - 1.0)
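
The action fields listed above could be modeled as plain dataclasses. This is only a sketch: the field names follow the list, but the shape of `ToolCall` (a name plus string arguments) and the validation rules are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolCall:
    # Assumed shape: a tool name plus keyword arguments as strings.
    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class NexusAction:
    agent_id: str                                   # "agent_a" or "agent_b"
    message: str                                    # natural-language reasoning
    tool_calls: List[ToolCall] = field(default_factory=list)
    confidence: float = 0.0                         # expected range: 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges documented in the action space.
        if self.agent_id not in ("agent_a", "agent_b"):
            raise ValueError(f"unknown agent_id: {self.agent_id!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

For example, `NexusAction("agent_a", "Checking logs", [ToolCall("read_logs", {"file": "app.log"})], 0.6)` mirrors the `TOOL: read_logs(file='app.log')` call mentioned above.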
### Observation Space
The environment returns a structured **NexusObservation** summarizing the system state.
- **scenario_description**: `string` (High-level objective)
- **scenario_context**: `string` (Background telemetry/environment info)
- **partner_message**: `string` (The last message from the other agent)
- **tool_results**: `List[ToolResult]` (Output of any executed system tools)
- **clues_found**: `List[string]` (Accumulated evidence identified by the Reward Engine)
- **investigation_stage**: `string` (`investigating`, `narrowing`, `found`, `verified`)
- **round**: `integer` (Current episode round)
- **available_tools**: `List[string]` (List of permitted tools for the current mode)
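
The observation can be sketched the same way. The field names and the four stage values come from the list above; the `ToolResult` shape and the default values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names as listed in the observation space.
STAGES = ("investigating", "narrowing", "found", "verified")


@dataclass
class ToolResult:
    # Assumed shape: which tool ran and what it printed.
    tool: str
    output: str


@dataclass
class NexusObservation:
    scenario_description: str
    scenario_context: str
    partner_message: str = ""
    tool_results: List[ToolResult] = field(default_factory=list)
    clues_found: List[str] = field(default_factory=list)
    investigation_stage: str = "investigating"
    round: int = 0
    available_tools: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Only the four documented stages are accepted.
        if self.investigation_stage not in STAGES:
            raise ValueError(f"unknown stage: {self.investigation_stage!r}")
```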
### Task Registry & Difficulty
| Task Name | Difficulty | Objective | Grader Method |
|---|---|---|---|
| `software-incident` | **Easy** | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
| `business-process-failure` | **Medium** | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
| `cascade-system-failure` | **Hard** | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |
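
A "State Check" grader can be pictured as a lookup of a dotted path in the final environment state. The sketch below is an assumption about how such a check might work; the state layout, the helper names, and the expected value `"100r/s"` are hypothetical and do not come from the project.

```python
from typing import Any, Dict, Optional


def get_path(state: Dict[str, Any], dotted_path: str) -> Optional[Any]:
    """Walk a nested dict using a dotted path such as 'nginx-proxy.rate_limit'."""
    node: Any = state
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def grade_state_check(state: Dict[str, Any], dotted_path: str, expected: Any) -> float:
    """Score 1.0 when the state holds the expected value at the path, else 0.0."""
    return 1.0 if get_path(state, dotted_path) == expected else 0.0
```

With a hypothetical final state `{"nginx-proxy": {"rate_limit": "100r/s"}}`, the call `grade_state_check(state, "nginx-proxy.rate_limit", "100r/s")` passes the check.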
### Baseline Benchmarks
Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B).
- **Software Incident**: 0.88 / 1.00
- **Business Process Failure**: 0.72 / 1.00
- **Cascade System Failure**: 0.48 / 1.00
---
## The AI Pipeline Deep-Dive
### Step 1: Scenario Injection & Bootstrapping
```python
# The EpisodeManager receives the custom scenario JSON from the frontend,
# then broadcasts 'episode_start' over the WebSocket to synchronize the UI.
await broadcast("episode_start", {
    "scenario": active_scenario,
    "agent_a_model": settings.AGENT_A_MODEL,
})
```
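
The `broadcast` helper used above is not shown in this README. One plausible shape, assuming each connected client exposes an async `send_text` method (as FastAPI/Starlette WebSockets do) and assuming an `{"event": ..., "data": ...}` envelope, is sketched here; the `connections` list and the envelope keys are hypothetical.

```python
import asyncio
import json
from typing import Any, Dict, List, Protocol


class TextSocket(Protocol):
    """Anything with an async send_text method, e.g. a FastAPI WebSocket."""

    async def send_text(self, data: str) -> None: ...


# Hypothetical registry of currently connected clients.
connections: List[TextSocket] = []


async def broadcast(event: str, payload: Dict[str, Any]) -> None:
    """Serialize one event and send it to every connected client."""
    message = json.dumps({"event": event, "data": payload})
    # return_exceptions=True keeps one failed socket from aborting the
    # sends to the remaining clients.
    await asyncio.gather(
        *(ws.send_text(message) for ws in connections),
        return_exceptions=True,
    )
```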
### Step 2: Agent Consensus Loop
```python
# Agents take turns: the Investigator proposes a solution and the Validator
# challenges it. Both agents can call system tools.
client, model_name = model_manager.get_client(agent_id)
stream = await client.chat.completions.create(
    model=model_name,
    messages=injected_history,
    tools=available_tools,  # e.g. fix_proposer, run_terminal_command
    stream=True,
)
```
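
Once the stream is open, each chunk has to be forwarded to the UI as it arrives. The sketch below assumes an OpenAI-compatible async client whose chunks expose `choices[0].delta.content`; the `relay_stream` helper and its `send` callback are hypothetical names, and `fake_stream` only stands in for a real model response.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Awaitable, Callable, List


async def relay_stream(
    stream: AsyncIterator[Any],
    send: Callable[[str], Awaitable[None]],
) -> str:
    """Forward each streamed text fragment to `send` and return the full reply."""
    parts: List[str] = []
    async for chunk in stream:
        if not chunk.choices:
            continue
        token = chunk.choices[0].delta.content
        if token:  # skip empty or None deltas (e.g. tool-call chunks)
            parts.append(token)
            await send(token)
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Any]:
    """Stand-in for a model stream, used only to exercise relay_stream."""
    for text in ["Checking ", None, "logs"]:
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

In the backend, `send` would typically wrap a WebSocket send so the dashboard can render the reply word by word.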
### Step 3: Fast GPU Embeddings (Similarity Evaluation)
```python
from functools import lru_cache
from typing import List

import httpx

# The embedding model runs inside the Ollama server (GPU-accelerated where
# available), so the backend never loads it; repeated texts hit the LRU cache.
@lru_cache(maxsize=256)
def get_embedding(text: str) -> List[float]:
    response = httpx.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "all-minilm", "prompt": text},
        timeout=60.0,
    )
    return response.json().get("embedding", [])
```
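
The README does not say how two embeddings are compared. Cosine similarity is a common choice for this kind of semantic scoring, so the sketch below assumes it; the `cosine_similarity` helper is illustrative and not taken from the project.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has no magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # An empty or all-zero vector (e.g. a failed embedding lookup that
    # returned []) has no direction, so treat it as "no similarity".
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1.0 would indicate that two messages stay on the same topic, while a falling score across rounds would suggest conversational drift.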
---
## Full Technology Stack
| Layer | Technology | Why |
|---|---|---|
| Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
| Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
| Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
| Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
| Local AI Engine | Ollama | Native device acceleration, absolute privacy |
| Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
| SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
| Data Persistence | LocalStorage & `.env` Injection | Avoids over-architected SQL constraints |
---
## How to Run This Project (Full Step-by-Step Guide)
### Prerequisites
- Python 3.10+
- Node.js 18+
- [Ollama](https://ollama.com/) (installed locally for model hosting)
- **Optional**: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode
---
### 1️⃣ Backend Setup (FastAPI / Python)
```bash
cd backend
# Create and activate virtual environment
python -m venv venv
# source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
# Install all dependencies
pip install -r requirements.txt
```
#### Start the Backend Engine
```bash
# This exposes the core REST API and the WebSocket simulation tunnel
python main.py
```
---
### 2️⃣ Frontend Setup (React)
Open a **new terminal tab**:
```bash
cd frontend
# Install Node.js dependencies
npm install
# Start the Vite development server
npm run dev
```
The application is now fully accessible at [http://localhost:5173](http://localhost:5173).
---
### 3️⃣ Pulling Models
To run the simulation locally without cloud API keys, pull suitable reasoning models and the embedding model through Ollama:
```bash
ollama pull qwen2.5:3b       # Compact model that works well as the Validator
ollama pull dolphin-llama3   # Uncensored model for the Investigator
ollama pull all-minilm # Mandatory for semantic similarity scoring
```
---
## Automated Testing
NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.
```bash
# Run the OpenEnv specification validator
python openenv_validator.py
# Run unit tests for core logic
pip install pytest
pytest tests/
```
---
## Authors
**Developed by: Ashish Menon** & Vector