---
title: NEXON-AI
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

<!-- LAST_SYNC_VERIFICATION: 2026-04-08 00:07:00 -->

# NEXUS-AI 🌐🛡️
### Autonomous Incident Investigation Dashboard

<div align="center">

![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688?style=for-the-badge&logo=fastapi&logoColor=white)
![React](https://img.shields.io/badge/React-18.x-61DAFB?style=for-the-badge&logo=react&logoColor=black)
![Tailwind](https://img.shields.io/badge/Tailwind_CSS-3.x-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white)
![Ollama](https://img.shields.io/badge/Ollama-Local_LLM-000000?style=for-the-badge&logo=ollama)

**Status:** Active Simulation Pipeline  
**Architecture:** Real-time WebSockets + Multi-Agent Consensus

</div>

---

## 📖 What is NEXUS-AI?

NEXUS is a next-generation, autonomous dual-agent environment designed to investigate and validate software incidents in real-time. Using a combination of an **Investigator** and a **Validator** agent, NEXUS autonomously forms hypotheses, executes systems tools, evaluates system behavior, and reaches strict consensus on root causes.

Traditional manual debugging requires extensive context-switching and tool fatigue. NEXUS solves this through:
1. **Dual-Agent Autonomy**: Two specialized models communicating word-by-word via WebSockets.
2. **Dynamic Tool Execution**: Fully integrated system terminals allowing agents to run sandboxed validation scripts.
3. **Semantic Reward Engine**: Evaluates conversational drift mathematically (using native GPU embeddings).

The result: An AI "Incident Response Team" that navigates servers, traces logs, and fixes bugs identically to a human SRE.

---

## 🖼️ Application Screenshots

### 📊 Simulation Dashboard

> The core command center. Features live agent terminals, a dual-communication consensus log, and a mathematical performance reward graph plotting investigation confidence.

<div align="center">
  <img src="./assets/screenshots/Dashboard.png" alt="Simulation Dashboard" width="90%"/>
</div>

---

## 🎛️ Scenario Registry & Core Settings

> The system is architected for instant adaptability — seamlessly switch LLM providers and inject custom threat models entirely through the frontend DOM.

<table>
  <tr>
    <td align="center" width="50%">
      <img src="./assets/screenshots/Scenarios.png" alt="Scenario Browser"/>
      <br/><b>Scenario Registry</b>
      <br/><sub>A persistent LocalStorage-backed grid of tactical simulations. Users can dynamically inject custom infrastructure-specific incidents directly into the agent pipeline.</sub>
    </td>
    <td align="center" width="50%">
      <img src="./assets/screenshots/Settings.png" alt="Hardware Configuration"/>
      <br/><b>Runtime Configuration</b>
      <br/><sub>Dynamically maps available locally-installed Ollama networks, allowing the user to pair models (e.g., Qwen vs Dolphin-Phi) with fully independent parameters.</sub>
    </td>
  </tr>
</table>

---

## 🏗️ System Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                    CLIENT BROWSER                               │
│          React SPA (Tailwind + Framer Motion)                   │
│          localhost:5173                                         │
└───────────┬─────────────────────────────────┬───────────────────┘
            │ HTTP (REST)                     │ ws://
            ▼                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              FASTAPI BACKEND (localhost:7860)                   │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐    │
│  │ /config  │ │/scenarios│ │  /reset  │ │  ws:// Simulator │    │
│  │ Env Sync │ │ DB Cache │ │ Injection│ │  Live Stream Sync│    │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘    │
└───────────┬───────────────────────────────────┬─────────────────┘
            │                                   │
            ▼                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                  OLLAMA ENGINE / LLM PIPELINE                   │
│  Agent A (Investigator)   ◄──────►   Agent B (Validator)        │
│  - Generates Hypotheses              - Challenges Assertions    │
│  - Runs System Tools                 - Requires Proof           │
└─────────────────────────────────────────────────────────────────┘
```

---

## 🌐 Execution Environments

NEXUS-AI supports two distinct execution models for agent tools, toggleable via the **Settings** dashboard:

### 1. Simulated Mode (Safe Sandbox)
*   **Default Mode**: Agents interact with a pre-defined `clue_map` within the scenario YAML.
*   **No System Impact**: Commands like `read_logs` or `check_service` return mocked data.
*   **Use Case**: Training, logic validation, and "what-if" analysis without infrastructure risk.

### 2. SSH Lab Node (Real-World Execution)
*   **Live Connection**: Commands are executed in real-time on a remote Linux server via SSH.
*   **Autonomous Terminal**: Agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
*   **Security**: Includes a command blocklist to prevent highly destructive operations (e.g., `rm -rf /`).
*   **Use Case**: Actual incident response on isolated Lab/Staging nodes.

---

## 📐 OpenEnv Specification

NEXUS-AI strictly adheres to the **OpenEnv 1.0** standard for agent-environment interaction.

### 🎮 Action Space
The environment accepts a typed **NexusAction** (Text-based with structured tool calls).
- **agent_id**: `string` ("agent_a" or "agent_b")
- **message**: `string` (The natural language reasoning/communication)
- **tool_calls**: `List[ToolCall]` (Optional structured calls like `TOOL: read_logs(file='app.log')`)
- **confidence**: `float` (0.0 - 1.0)

### 🧐 Observation Space
The environment returns a structured **NexusObservation** summarizing the system state.
- **scenario_description**: `string` (High-level objective)
- **scenario_context**: `string` (Background telemetry/environment info)
- **partner_message**: `string` (The last message from the other agent)
- **tool_results**: `List[ToolResult]` (Output of any executed system tools)
- **clues_found**: `List[string]` (Accumulated evidence identified by the Reward Engine)
- **investigation_stage**: `string` (`investigating`, `narrowing`, `found`, `verified`)
- **round**: `integer` (Current episode round)
- **available_tools**: `List[string]` (List of permitted tools for the current mode)

### 📝 Task Registry & Difficulty
| Task Name | Difficulty | Objective | Grader Method |
|---|---|---|---|
| `software-incident` | **Easy** | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
| `business-process-failure` | **Medium** | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
| `cascade-system-failure` | **Hard** | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |

### 📈 Baseline Benchmarks
Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B).
- **Software Incident**: 0.88 / 1.00
- **Business Process Failure**: 0.72 / 1.00
- **Cascade System Failure**: 0.48 / 1.00

---

## 🧠 The AI Pipeline Deep-Dive

### Step 1: Scenario Injection & Bootstrapping
```python
# The EpisodeManager receives the frontend custom scenario JSON
# Broadcasts 'episode_start' natively over the WebSocket to synchronize the UI
await broadcast("episode_start", {
    "scenario": active_scenario,
    "agent_a_model": settings.AGENT_A_MODEL
})
```

### Step 2: Agent Consensus Loop
```python
# Agents interact sequentially. The Investigator attempts a solution
# while the Validator challenges it. Both agents have access to dynamic system execution.
client, model_name = model_manager.get_client(agent_id)
stream = await client.chat.completions.create(
    model=model_name,
    messages=injected_history,
    tools=available_tools, # e.g. fix_proposer, run_terminal_command
    stream=True
)
```

### Step 3: Fast GPU Embeddings (Similarity Evaluation)
```python
# Heavy CPU blocking is completely bypassed.
# Semantic embedding computations map strictly into the Ollama GPU pipeline.
@lru_cache(maxsize=256)
def get_embedding(text: str) -> List[float]:
    response = httpx.post("http://localhost:11434/api/embeddings", json={
        "model": "all-minilm",
        "prompt": text
    }, timeout=60.0)
    return response.json().get("embedding", [])
```

---

## 🛠️ Full Technology Stack

| Layer | Technology | Why |
|---|---|---|
| Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
| Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
| Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
| Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
| Local AI Engine | Ollama | Native device acceleration, absolute privacy |
| Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
| SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
| Data Persistence | LocalStorage & `.env` Injection | Avoids over-architected SQL constraints |

---

## 🚀 How to Run This Project (Full Step-by-Step Guide)

### 📋 Prerequisites
- Python 3.10+
- Node.js 18+
- [Ollama](https://ollama.com/) (installed locally for model hosting)
- **Optional**: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode

---

### 1️⃣ Backend Setup (FastAPI / Python)

```bash
cd backend

# Create and activate virtual environment
python -m venv venv
# source venv/bin/activate       # Linux/macOS
venv\Scripts\activate        # Windows

# Install all dependencies
pip install -r requirements.txt
```

#### Start the Backend Engine
```bash
# This exposes the core REST API and the WebSocket simulation tunnel
python main.py
```

---

### 2️⃣ Frontend Setup (React)

Open a **new terminal tab**:

```bash
cd frontend

# Install Node.js dependencies
npm install

# Start the Vite development server
npm run dev
```

The application is now fully accessible at [http://localhost:5173](http://localhost:5173).

---

### 3️⃣ Pulling Models

To run the simulation locally without cloud API keys, you must ensure you pull suitable reasoning models through Ollama:

```bash
ollama run qwen2.5:3b     # Excellent validator logic footprint
ollama run dolphin-llama3 # Uncensored investigative assertions
ollama pull all-minilm    # Mandatory for semantic similarity scoring
```

---

## 🧪 Automated Testing
NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.

```bash
# Run the OpenEnv specification validator
python openenv_validator.py

# Run unit tests for core logic
pip install pytest
pytest tests/
```

---

## 🤝 Authors
**Developed by: Ashish Menon** & Vector