---

title: NEXON-AI
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---


<!-- LAST_SYNC_VERIFICATION: 2026-04-08 00:07:00 -->

# NEXUS-AI 🌐🛡️
### Autonomous Incident Investigation Dashboard

<div align="center">

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?style=for-the-badge&logo=fastapi&logoColor=white)
![React](https://img.shields.io/badge/React-18.x-61DAFB?style=for-the-badge&logo=react&logoColor=black)
![Tailwind](https://img.shields.io/badge/Tailwind_CSS-3.x-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white)
![Ollama](https://img.shields.io/badge/Ollama-Local_LLM-000000?style=for-the-badge&logo=ollama)

**Status:** Active Simulation Pipeline  
**Architecture:** Real-time WebSockets + Multi-Agent Consensus

</div>

---

## 📖 What is NEXUS-AI?

NEXUS is a next-generation, autonomous dual-agent environment designed to investigate and validate software incidents in real time. Using an **Investigator** agent and a **Validator** agent, NEXUS autonomously forms hypotheses, executes system tools, evaluates system behavior, and reaches strict consensus on root causes.

Traditional manual debugging demands extensive context switching and leads to tool fatigue. NEXUS addresses this through:
1. **Dual-Agent Autonomy**: Two specialized models communicating word-by-word via WebSockets.
2. **Dynamic Tool Execution**: Fully integrated system terminals allowing agents to run sandboxed validation scripts.
3. **Semantic Reward Engine**: Evaluates conversational drift mathematically (using native GPU embeddings).

The result: an AI "Incident Response Team" that navigates servers, traces logs, and fixes bugs much as a human SRE would.

---

## 🖼️ Application Screenshots

### 📊 Simulation Dashboard

> The core command center. Features live agent terminals, a dual-communication consensus log, and a mathematical performance reward graph plotting investigation confidence.

<div align="center">
  <img src="./assets/screenshots/Dashboard.png" alt="Simulation Dashboard" width="90%"/>
</div>

---

## 🎛️ Scenario Registry & Core Settings

> The system is architected for instant adaptability: switch LLM providers and inject custom threat models entirely through the frontend UI.

<table>
  <tr>
    <td align="center" width="50%">
      <img src="./assets/screenshots/Scenarios.png" alt="Scenario Browser"/>
      <br/><b>Scenario Registry</b>
      <br/><sub>A persistent LocalStorage-backed grid of tactical simulations. Users can dynamically inject custom infrastructure-specific incidents directly into the agent pipeline.</sub>
    </td>
    <td align="center" width="50%">
      <img src="./assets/screenshots/Settings.png" alt="Hardware Configuration"/>
      <br/><b>Runtime Configuration</b>
      <br/><sub>Dynamically maps available locally-installed Ollama networks, allowing the user to pair models (e.g., Qwen vs Dolphin-Phi) with fully independent parameters.</sub>
    </td>
  </tr>
</table>


---

## 🏗️ System Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                    CLIENT BROWSER                               │
│          React SPA (Tailwind + Framer Motion)                   │
│          localhost:5173                                         │
└───────────┬─────────────────────────────────┬───────────────────┘
            │ HTTP (REST)                     │ ws://
            ▼                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              FASTAPI BACKEND (localhost:7860)                   │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐    │
│  │ /config  │ │/scenarios│ │  /reset  │ │  ws:// Simulator │    │
│  │ Env Sync │ │ DB Cache │ │ Injection│ │  Live Stream Sync│    │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘    │
└───────────┬───────────────────────────────────┬─────────────────┘
            │                                   │
            ▼                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                  OLLAMA ENGINE / LLM PIPELINE                   │
│  Agent A (Investigator)   ◄──────►   Agent B (Validator)        │
│  - Generates Hypotheses              - Challenges Assertions    │
│  - Runs System Tools                 - Requires Proof           │
└─────────────────────────────────────────────────────────────────┘
```

---

## 🌐 Execution Environments

NEXUS-AI supports two distinct execution models for agent tools, toggleable via the **Settings** dashboard:

### 1. Simulated Mode (Safe Sandbox)
*   **Default Mode**: Agents interact with a pre-defined `clue_map` within the scenario YAML.
*   **No System Impact**: Commands like `read_logs` or `check_service` return mocked data.
*   **Use Case**: Training, logic validation, and "what-if" analysis without infrastructure risk.
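
A minimal sketch of how a simulated tool call might resolve against a scenario's `clue_map`. The map layout, the `run_simulated_tool` helper, and the sample outputs are hypothetical; the actual scenario YAML schema may differ.

```python
from typing import Dict

# Hypothetical clue_map, as it might look after loading the scenario YAML.
# Keys are "tool:target" pairs; values are the mocked tool output.
CLUE_MAP: Dict[str, str] = {
    "read_logs:app.log": "ERROR upstream returned 503 (rate limit exceeded)",
    "check_service:nginx-proxy": "active (running)",
}


def run_simulated_tool(tool: str, target: str, clue_map: Dict[str, str]) -> str:
    """Return mocked output for a tool call without touching any real system."""
    return clue_map.get(f"{tool}:{target}", f"No data available for {tool}({target})")


print(run_simulated_tool("read_logs", "app.log", CLUE_MAP))
```

Because every answer comes from the map, an unknown tool or target simply yields a "no data" message rather than an error.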

### 2. SSH Lab Node (Real-World Execution)
*   **Live Connection**: Commands are executed in real-time on a remote Linux server via SSH.
*   **Autonomous Terminal**: Agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
*   **Security**: Includes a command blocklist to prevent highly destructive operations (e.g., `rm -rf /`).
*   **Use Case**: Actual incident response on isolated Lab/Staging nodes.
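
The command blocklist could be sketched roughly as follows. The patterns and the `is_command_allowed` helper are assumptions for illustration; the project's actual blocklist is not shown in this README.

```python
import re
from typing import List

# Hypothetical patterns covering a few highly destructive operations.
BLOCKED_PATTERNS: List[str] = [
    r"\brm\s+(-\w+\s+)*/(\s|$)",      # rm (with any flags) aimed at the root directory
    r"\bmkfs(\.\w+)?\b",              # formatting a filesystem
    r"\bdd\s+.*\bof=/dev/",           # raw writes to a block device
    r"\b(shutdown|reboot|halt)\b",    # taking the node offline
]


def is_command_allowed(command: str) -> bool:
    """Return False if the command matches any blocked pattern."""
    return not any(re.search(pattern, command) for pattern in BLOCKED_PATTERNS)


for cmd in ("systemctl status nginx", "rm -rf /", "rm -rf /tmp/cache"):
    print(cmd, "->", "allowed" if is_command_allowed(cmd) else "blocked")
```

A pattern blocklist only catches the commands it anticipates, which is one reason to restrict this mode to isolated Lab/Staging nodes.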

---

## OpenEnv Specification

NEXUS-AI strictly adheres to the **OpenEnv 1.0** standard for agent-environment interaction.

### 🎮 Action Space
The environment accepts a typed **NexusAction** (Text-based with structured tool calls).
- **agent_id**: `string` ("agent_a" or "agent_b")
- **message**: `string` (The natural language reasoning/communication)
- **tool_calls**: `List[ToolCall]` (Optional structured calls like `TOOL: read_logs(file='app.log')`)
- **confidence**: `float` (0.0 - 1.0)
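
As an illustration, the action fields listed above could be modelled like this. The field names come from the list; the `ToolCall` layout, the validation rules, and the `parse_tool_calls` helper are assumptions, not the project's actual types.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolCall:
    # Hypothetical shape: a tool name plus keyword arguments.
    name: str
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class NexusAction:
    agent_id: str
    message: str
    tool_calls: List[ToolCall] = field(default_factory=list)
    confidence: float = 0.0

    def __post_init__(self) -> None:
        if self.agent_id not in ("agent_a", "agent_b"):
            raise ValueError(f"unknown agent_id: {self.agent_id!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")


TOOL_PATTERN = re.compile(r"TOOL:\s*(\w+)\((.*?)\)")
ARG_PATTERN = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def parse_tool_calls(text: str) -> List[ToolCall]:
    """Extract calls written as TOOL: name(key='value', ...) from agent text."""
    return [
        ToolCall(name, dict(ARG_PATTERN.findall(raw_args)))
        for name, raw_args in TOOL_PATTERN.findall(text)
    ]


reply = "Checking the proxy. TOOL: read_logs(file='app.log')"
action = NexusAction("agent_a", reply, parse_tool_calls(reply), confidence=0.6)
print(action.tool_calls)
```

The regular expressions handle only single-quoted arguments without closing parentheses inside them, which is enough for the example format shown above.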

### 🧐 Observation Space
The environment returns a structured **NexusObservation** summarizing the system state.
- **scenario_description**: `string` (High-level objective)
- **scenario_context**: `string` (Background telemetry/environment info)
- **partner_message**: `string` (The last message from the other agent)
- **tool_results**: `List[ToolResult]` (Output of any executed system tools)
- **clues_found**: `List[string]` (Accumulated evidence identified by the Reward Engine)
- **investigation_stage**: `string` (`investigating`, `narrowing`, `found`, `verified`)
- **round**: `integer` (Current episode round)
- **available_tools**: `List[string]` (List of permitted tools for the current mode)
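
The observation fields above map naturally onto a dataclass. This is a sketch only: the `ToolResult` layout, the default values, and the idea that `verified` marks the end of an episode are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names as listed in the specification above.
STAGES = ("investigating", "narrowing", "found", "verified")


@dataclass
class ToolResult:
    # Hypothetical shape: which tool ran and what it printed.
    tool: str
    output: str


@dataclass
class NexusObservation:
    scenario_description: str
    scenario_context: str
    partner_message: str = ""
    tool_results: List[ToolResult] = field(default_factory=list)
    clues_found: List[str] = field(default_factory=list)
    investigation_stage: str = "investigating"
    round: int = 0
    available_tools: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.investigation_stage not in STAGES:
            raise ValueError(f"unknown stage: {self.investigation_stage!r}")


def is_terminal(observation: NexusObservation) -> bool:
    """Assumption: an episode is finished once the root cause is verified."""
    return observation.investigation_stage == "verified"


obs = NexusObservation("Fix the 503 errors", "nginx-proxy in front of the app")
print(obs.investigation_stage, is_terminal(obs))
```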



### Task Registry & Difficulty

| Task Name | Difficulty | Objective | Grader Method |
|---|---|---|---|
| `software-incident` | **Easy** | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
| `business-process-failure` | **Medium** | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
| `cascade-system-failure` | **Hard** | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |
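
To make the "State Check" grader method concrete, here is a hedged sketch of a dotted-path state check with a red-herring penalty. The function names, the expected value `100`, and the penalty size are hypothetical; the project's real graders may work differently.

```python
from typing import Any, Dict


def get_path(state: Dict[str, Any], dotted: str) -> Any:
    """Walk a nested dict using a dotted path such as 'nginx-proxy.rate_limit'."""
    node: Any = state
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def grade_state_check(
    state: Dict[str, Any],
    path: str,
    expected: Any,
    red_herrings_touched: int = 0,
    penalty: float = 0.1,
) -> float:
    """Score 1.0 when the state matches, minus a penalty per red herring pursued."""
    base = 1.0 if get_path(state, path) == expected else 0.0
    return max(0.0, base - penalty * red_herrings_touched)


final_state = {"nginx-proxy": {"rate_limit": 100}}
print(grade_state_check(final_state, "nginx-proxy.rate_limit", 100))
```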

### 📈 Baseline Benchmarks
Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B).
- **Software Incident**: 0.88 / 1.00
- **Business Process Failure**: 0.72 / 1.00
- **Cascade System Failure**: 0.48 / 1.00

---

## 🧠 The AI Pipeline Deep-Dive

### Step 1: Scenario Injection & Bootstrapping
```python
# The EpisodeManager receives the frontend custom scenario JSON
# Broadcasts 'episode_start' natively over the WebSocket to synchronize the UI
await broadcast("episode_start", {
    "scenario": active_scenario,
    "agent_a_model": settings.AGENT_A_MODEL
})
```
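
A `broadcast` helper like the one used above could be sketched as follows. The `ConnectionHub` class, the `{"type", "data"}` message envelope, and the `FakeSocket` stand-in are assumptions; the only requirement on a client is an async `send_text` method, which FastAPI's `WebSocket` provides.

```python
import asyncio
import json
from typing import Any, Dict, List, Protocol


class TextSender(Protocol):
    async def send_text(self, data: str) -> None: ...


class ConnectionHub:
    """Keeps track of connected clients and fans events out to all of them."""

    def __init__(self) -> None:
        self.clients: List[TextSender] = []

    async def broadcast(self, event: str, payload: Dict[str, Any]) -> None:
        # Hypothetical envelope: the event name plus its payload.
        message = json.dumps({"type": event, "data": payload})
        await asyncio.gather(*(client.send_text(message) for client in self.clients))


class FakeSocket:
    """Stand-in for a WebSocket connection so the sketch runs without a server."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send_text(self, data: str) -> None:
        self.sent.append(data)


async def demo() -> List[str]:
    hub, socket = ConnectionHub(), FakeSocket()
    hub.clients.append(socket)
    await hub.broadcast("episode_start", {"scenario": "software-incident"})
    return socket.sent


sent = asyncio.run(demo())
print(sent[0])
```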

### Step 2: Agent Consensus Loop
```python
# Agents interact sequentially. The Investigator attempts a solution
# while the Validator challenges it. Both agents have access to dynamic system execution.
client, model_name = model_manager.get_client(agent_id)
stream = await client.chat.completions.create(
    model=model_name,
    messages=injected_history,
    tools=available_tools,  # e.g. fix_proposer, run_terminal_command
    stream=True
)
```
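
The sequential turn-taking around that call can be illustrated with stub agents in place of the LLM clients. The stopping rule (both confidences at or above a threshold), the `run_consensus` name, and the scripted replies are assumptions made for this sketch.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Reply = Tuple[str, float]            # (message, confidence)
Agent = Callable[[List[str]], Reply]


def run_consensus(
    investigator: Agent,
    validator: Agent,
    max_rounds: int = 5,
    threshold: float = 0.8,
) -> Tuple[Optional[int], List[str]]:
    """Alternate turns until both agents are confident or rounds run out."""
    history: List[str] = []
    for round_no in range(1, max_rounds + 1):
        message_a, confidence_a = investigator(history)
        history.append(f"agent_a: {message_a}")
        message_b, confidence_b = validator(history)
        history.append(f"agent_b: {message_b}")
        if min(confidence_a, confidence_b) >= threshold:
            return round_no, history
    return None, history


def scripted(replies: Iterable[Reply]) -> Agent:
    """Build a stub agent that returns pre-written replies in order."""
    iterator = iter(replies)
    return lambda history: next(iterator)


investigator = scripted([("Maybe DNS?", 0.5), ("Rate limit too low", 0.7), ("Confirmed in logs", 0.9)])
validator = scripted([("Show proof", 0.3), ("Plausible, verify", 0.6), ("Agreed", 0.85)])
rounds, history = run_consensus(investigator, validator)
print(rounds, len(history))
```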

### Step 3: Fast GPU Embeddings (Similarity Evaluation)
```python
from functools import lru_cache
from typing import List

import httpx

# Embedding computation is offloaded to the local Ollama server rather than
# running in the backend process. Results are cached, so a repeated text
# does not trigger another request.
@lru_cache(maxsize=256)
def get_embedding(text: str) -> List[float]:
    response = httpx.post("http://localhost:11434/api/embeddings", json={
        "model": "all-minilm",
        "prompt": text
    }, timeout=60.0)
    return response.json().get("embedding", [])
```
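
Once two embeddings are available, their similarity can be scored. The sketch below uses cosine similarity, which is a common choice but an assumption here; the README does not state which metric the Reward Engine applies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Values near 1 indicate that two messages point in the same semantic direction, while values near 0 suggest the conversation has drifted.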

---

## 🛠️ Full Technology Stack

| Layer | Technology | Why |
|---|---|---|
| Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
| Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
| Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
| Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
| Local AI Engine | Ollama | Native device acceleration, absolute privacy |
| Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
| SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
| Data Persistence | LocalStorage & `.env` Injection | Avoids over-architected SQL constraints |

---

## 🚀 How to Run This Project (Full Step-by-Step Guide)

### 📋 Prerequisites
- Python 3.10+
- Node.js 18+
- [Ollama](https://ollama.com/) (installed locally for model hosting)
- **Optional**: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode

---

### 1️⃣ Backend Setup (FastAPI / Python)

```bash
cd backend

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate       # Linux/macOS
# venv\Scripts\activate        # Windows (run this line instead)

# Install all dependencies
pip install -r requirements.txt
```

#### Start the Backend Engine
```bash
# This exposes the core REST API and the WebSocket simulation tunnel
python main.py
```

---

### 2️⃣ Frontend Setup (React)

Open a **new terminal tab**:

```bash
cd frontend

# Install Node.js dependencies
npm install

# Start the Vite development server
npm run dev
```

The application is now fully accessible at [http://localhost:5173](http://localhost:5173).

---

### 3️⃣ Pulling Models

To run the simulation locally without cloud API keys, pull suitable reasoning models through Ollama:

```bash
ollama pull qwen2.5:3b     # Compact model suited to the Validator role
ollama pull dolphin-llama3 # Uncensored model for the Investigator role
ollama pull all-minilm     # Mandatory for semantic similarity scoring
```

---

## 🧪 Automated Testing
NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.

```bash
# Run the OpenEnv specification validator
python openenv_validator.py

# Run unit tests for core logic
pip install pytest
pytest tests/
```

---

## 🤝 Authors
**Developed by: Ashish Menon** & Vector