---
language: py
tags:
- concept-reasoning
- neural-symbolic
- graph-neural-network
- GAT
- self-correcting-agent
- code-generation
- edge-ai
license: mit
---

# 🚀 CAT V3 Coding Agent (Graph-MoE + Self-Correcting Sandbox)

Welcome to the official repository for the **CAT V3 Coding Agent**, a **neural-symbolic coding agent** designed for edge deployment. It decouples high-level logical path planning (System 2) from code syntax generation (System 1) and pairs them with a multi-language, self-correcting execution sandbox.

👉 **Model Repository**: [huggingface.co/Chaman1234/cat-v3-coding-agent](https://huggingface.co/Chaman1234/cat-v3-coding-agent)

---

## 🏛️ Architecture & Core Philosophy

Traditional LLMs generate code token by token, which frequently leads to logical drift, syntax errors, and reasoning hallucinations. The **Concept Attention Transformer V3 (CAT V3)** addresses this by first planning a path through a concept graph and only then generating code along that path:

```text
User Query ➔ Semantic Router ➔ Specialist Expert GATs ➔ Concept Path (0% Logical Hallucinations)
                                                                 │
┌─────────────────────────── Self-Correction Loop ◄──────────────┘
▼
Code Draft (Ollama 3B) ➔ Sandboxed Execution ➔ Success / Debug Retry
```

### Key Stages:
1. **Query Seeding & Normalization**: The input query is cleaned by the `grammar_parser` (resolving spelling errors, normalizing units, and mapping boundary conditions).
2. **Sparse Graph Mixture of Experts (Graph-MoE)**: The query is semantically routed to active specialists. For programming tasks, it routes to the **Coding GAT Specialist**.
3. **Topologically Bounded Concept Planning**: The GAT specialist operates on a concept graph. It predicts a deterministic transition path of concept nodes (e.g. `["List Input", "Modulo Condition", "List Comprehension", "Filtered Output"]`) that strictly respects GNN edge transition masks.
4. **Autonomous Agent Code Generator**: The planned path context is passed to the local generative model (Ollama `qwen2.5-coder:3b`) to draft the source code.
5. **Sandboxed Subprocess Executor**: Code runs inside a safe environment. Supported runtimes include **Python, JavaScript, C++, Go, SQL (SQLite3), HTML/CSS, Java, and Rust**.
6. **Iterative Debugger**: If a run fails (non-zero exit code), the sandbox captures `stderr` and feeds it back to the agent for self-correction (up to 5 attempts).
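
Stages 4 to 6 can be pictured as a draft, run, retry loop. The sketch below is a minimal illustration, not the code in `agent_executor.py`: the names `run_python`, `self_correct`, and `fake_generator` are hypothetical, the generator is a stand-in for the Ollama call, and only the Python runtime is covered.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 5  # matches the "up to 5 attempts" limit described above


def run_python(source: str, timeout_s: float = 10.0) -> Tuple[int, str, str]:
    """Run Python source in a child process; return (exit_code, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "draft.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return 1, "", f"timed out after {timeout_s} seconds"
        return proc.returncode, proc.stdout, proc.stderr


def self_correct(
    draft_code: Callable[[str, Optional[str]], str],
    task: str,
) -> Tuple[bool, int, str]:
    """Draft, execute, and retry with stderr feedback until success or the limit."""
    feedback: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        source = draft_code(task, feedback)
        exit_code, stdout, stderr = run_python(source)
        if exit_code == 0:
            return True, attempt, stdout
        feedback = stderr  # a non-zero exit code sends stderr back to the generator
    return False, MAX_ATTEMPTS, feedback or ""


def fake_generator(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM: the first draft fails, the retry succeeds."""
    if feedback is None:
        return "print(undefined_name)"  # raises NameError, exit code 1
    return "print(sum(range(5)))"


if __name__ == "__main__":
    print(self_correct(fake_generator, "sum the integers 0..4"))
```

A child process gives each draft a clean interpreter and a reliable exit code, but it is not a security boundary by itself; a real sandbox would add resource and filesystem restrictions on top.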

---

## 📊 Research Benchmarks & Scalability Results

The CAT V3/VLCM concept-based framework achieves massive memory compression and inference efficiency compared to standard token-based autoregressive models.

### 1. Empirical Model Comparison
Benchmarked on a single physics query: *"Why does compressor pressure ratio affect turbine efficiency?"*

| Metric | CAT V3 (Concept Graph-MoE) | Traditional Causal LLM (GPT-style) | Advantage / Scale Factor |
| :--- | :---: | :---: | :---: |
| **Model Parameters** | 2,294,835 | 721,900 | CAT V3 is ~3.18x larger |
| **Inference Latency** | 324.49 ms | 232.31 ms | CAT V3 is ~1.40x slower; single-pass, linear execution |
| **Logic Hallucination Rate** | **0.0%** (Topologically Masked) | High (Unconstrained next-token drift) | **0% Hallucinations** |
| **Explainable Reasoning Trace**| **Yes** (100% Auditable Path) | No (Black-box attention states) | Full Audit Trail |
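
The "Topologically Masked" entry refers to restricting each step to nodes that have an edge from the current node. The sketch below shows that idea on the example path from the Key Stages list; the `ALLOWED` graph, the scores, and the function names are hypothetical and do not reflect the actual GAT expert.

```python
import math
from typing import Dict, List, Set

# Hypothetical concept graph: node -> nodes reachable in one step.
ALLOWED: Dict[str, Set[str]] = {
    "List Input": {"Modulo Condition"},
    "Modulo Condition": {"List Comprehension"},
    "List Comprehension": {"Filtered Output"},
    "Filtered Output": set(),
}


def masked_argmax(current: str, scores: Dict[str, float]) -> str:
    """Pick the best-scoring node among those with an edge from `current`."""
    masked = {
        node: (score if node in ALLOWED[current] else -math.inf)
        for node, score in scores.items()
    }
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError(f"no outgoing edge from {current!r}")
    return best


def plan_path(start: str, scores: Dict[str, float], max_hops: int) -> List[str]:
    """Follow masked choices until a node without outgoing edges or the hop limit."""
    path = [start]
    for _ in range(max_hops):
        if not ALLOWED[path[-1]]:
            break
        path.append(masked_argmax(path[-1], scores))
    return path


if __name__ == "__main__":
    # Misleading scores on purpose: the raw top choice is never a valid next step.
    scores = {
        "List Input": 9.0,
        "Filtered Output": 5.0,
        "List Comprehension": 2.0,
        "Modulo Condition": 1.0,
    }
    print(plan_path("List Input", scores, max_hops=5))
```

Because invalid transitions are scored as negative infinity, every hop in the resulting path is an existing edge, which is also what makes the path auditable.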

### 2. CAT V3 Scalability Stress Test (100 ➔ 10,000 Concepts)
Demonstrating how the Graph-MoE routing and expert networks scale as the vocabulary size grows:

| Vocabulary Size | Avg Expert Activations | Inference Latency | RAM Footprint Increase | VRAM Usage |
| :---: | :---: | :---: | :---: | :---: |
| **100 Concepts** | 5.0 experts | 167.82 ms | +2.76 MB | 2.38 MB |
| **1,000 Concepts** | 3.7 experts | 232.51 ms | +3.82 MB | 10.77 MB |
| **10,000 Concepts** | 3.8 experts | 292.42 ms | -728.45 MB (cleanups) | 697.07 MB |

*Scaling the vocabulary by **100x** only increases latency by **1.7x** due to sparse routing, enabling massive scale-up on consumer CPUs.*
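
The fractional "Avg Expert Activations" values suggest that the number of active experts varies per query. One common way to get that behaviour is probability-thresholded top-k gating, sketched below. This is an assumption for illustration only: the expert names, `top_k`, and `min_prob` are hypothetical, and the actual CAT V3 router may work differently.

```python
import math
from typing import Dict, List


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over named router logits."""
    peak = max(logits.values())
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def route(logits: Dict[str, float], top_k: int = 5, min_prob: float = 0.05) -> List[str]:
    """Return at most `top_k` experts whose routing probability is at least `min_prob`."""
    probs = softmax(logits)
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen = [name for name in ranked[:top_k] if probs[name] >= min_prob]
    return chosen or ranked[:1]  # always keep the single best expert


if __name__ == "__main__":
    # Hypothetical router logits for a programming query.
    logits = {
        "coding": 4.0,
        "math": 2.5,
        "physics": 0.5,
        "biology": -1.0,
        "law": -2.0,
        "history": -3.0,
    }
    print(route(logits))
```

Only the selected experts run, so the cost per query depends on how many experts pass the gate and not on how many exist.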

### 3. VLCM Memory Footprint Savings (KV Cache vs. Graph State)
Comparison representing 100,000 tokens of corpus knowledge:
- **Sequence unit count**: 100,000 (LLM) vs. **5,000** (VLCM)
- **KV Cache size (Llama-3 70B at 8k context)**: **2.50 GB** vs. **131 KB** (VLCM Tiny Decoder)
- **Graph state memory**: **2.61 MB** (VLCM) ➔ **19,134.6x memory compression**
- **Generation FLOPs per query**: ~8.19 Trillion FLOPs vs. **~7.66 Million FLOPs** (1,000,000x savings)
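
The 2.50 GB KV cache figure can be reproduced from the public Llama-3 70B attention shape (80 layers, 8 key/value heads under grouped-query attention, head dimension 128). The sketch below assumes 16-bit cache entries and an 8,192-token context, and reads "GB" as GiB; those three choices are assumptions, not statements from the benchmark itself. It also recomputes the two ratios quoted in the list.

```python
# Llama-3 70B attention shape from the public model configuration.
LAYERS = 80
KV_HEADS = 8             # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2      # assumption: fp16/bf16 cache entries
CONTEXT_TOKENS = 8192    # assumption: "8k context" means 8,192 tokens


def kv_cache_bytes(tokens: int) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    # The leading factor of 2 covers the separate key and value tensors.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens


if __name__ == "__main__":
    total = kv_cache_bytes(CONTEXT_TOKENS)
    print(f"KV cache: {total / 2**30:.2f} GiB")
    print(f"Sequence unit ratio: {100_000 // 5_000}x")
    print(f"FLOPs ratio: about {8.19e12 / 7.66e6:.3g}x")
```

Under these assumptions each cached token costs 327,680 bytes, and the FLOPs ratio comes out slightly above one million, consistent with the rounded figure in the list.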

### 4. End-to-End Stress Test: 100,000 Concepts & Actual Code Generation

We stress-tested the performance, memory footprint, and reliability of the scaled symbolic reasoning engine using a **100,001-node coding concept graph with 1.2 million directed edges**, paired with the local **Qwen 2.5 Coder 3B** model (`qwen2.5-coder:3b`) and a multi-language subprocess execution sandbox.

#### 📈 Stress Test Performance & Memory Metrics:
*   **Graph Sizing**: **100,001 nodes** and **1,200,000 directed edges**
*   **Graph Load Time**: **14.58 seconds** (deserializing and building the in-memory graph structure)
*   **RAM Memory Footprint**: **1,501.59 MB** (approx. 1.50 GB)
*   **Symbolic Traversal Latency (5-hop Beam Search)**: **121.81 ms** (average over 50 runs, highly optimized via pre-calculated activation mappings)
*   **Average Code Generation Time**: **8.94 seconds** per task (System 1 inference)
*   **Sandbox Code Execution Time**: **0.41 seconds** (subprocess execution in the sandbox)
*   **Sandbox Compilation/Execution Success Rate**: **100.0%** (5 out of 5 tasks successfully compiled and passed on the first attempt)
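
The symbolic traversal measured above is a 5-hop beam search. A minimal sketch of that technique over an adjacency dictionary follows; the toy graph, edge scores, and beam width are hypothetical, and the pre-calculated activation mappings mentioned in the list are not modelled here.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge score}
Beam = Tuple[float, List[str]]       # (cumulative score, path)


def beam_search(graph: Graph, start: str, hops: int = 5, beam_width: int = 3) -> List[Beam]:
    """Keep the `beam_width` best paths after each hop, following existing edges only."""
    beams: List[Beam] = [(0.0, [start])]
    for _ in range(hops):
        candidates: List[Beam] = []
        for score, path in beams:
            for neighbor, weight in graph.get(path[-1], {}).items():
                if neighbor not in path:  # skip cycles
                    candidates.append((score + weight, path + [neighbor]))
        if not candidates:
            break  # every surviving path has reached a node with no unvisited neighbors
        beams = heapq.nlargest(beam_width, candidates, key=lambda item: item[0])
    return beams


if __name__ == "__main__":
    # Hypothetical miniature concept graph.
    graph: Graph = {
        "Array Allocation": {"Array Execution": 0.9, "Array Parsing": 0.4},
        "Array Execution": {"Array Optimization": 0.8},
        "Array Parsing": {"Array Optimization": 0.3},
        "Array Optimization": {},
    }
    for score, path in beam_search(graph, "Array Allocation", hops=5, beam_width=2):
        print(round(score, 2), path)
```

Each hop expands at most `beam_width` paths, so the work per query grows with the beam width and the out-degree of the visited nodes, not with the total size of the graph. Note that this simplified version drops a path that reaches a dead end while other paths can still be extended.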

---

#### 💻 Multi-Language Code Generation & Sandbox Results

We ran 5 distinct coding tasks across Python and JavaScript, enforcing strict concept planning paths to test compliance, syntax validity, and execution outcomes.

````carousel
### 🐍 Task 1: Fibonacci Sequence (Python)
*   **Prompt**: *"Write a Python function fibonacci(n) that returns the first n Fibonacci numbers. In the main block, call this function with n=10, print the result, and do not use any interactive input() calls."*
*   **GNN Concept Path**: `Array Allocation in Python` ➔ `Array Execution in Python` ➔ `Array Optimization in Python`
*   **Generation Time**: 10.99 seconds
*   **Sandbox Output**: `The first 10 Fibonacci numbers are: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]`

```python
# Fibonacci sequence generator in Python

def fibonacci(n):
    '''
    Generate the first n Fibonacci numbers.

    Parameters:
    n (int): The number of Fibonacci numbers to generate.

    Returns:
    list: A list containing the first n Fibonacci numbers.
    '''
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    
    fib_sequence = [0, 1]
    for i in range(2, n):
        next_fib = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_fib)
    
    return fib_sequence

# Main block to test the fibonacci function
if __name__ == "__main__":
    n = 10
    try:
        result = fibonacci(n)
        print(f"The first {n} Fibonacci numbers are: {result}")
    except Exception as e:
        print(f"An error occurred: {e}")
```
<!-- slide -->
### 🐍 Task 2: Greatest Common Divisor (Python)
*   **Prompt**: *"Write a Python function calculate_gcd(a, b) that computes the greatest common divisor using the Euclidean algorithm. Test it with inputs a=48 and b=18, print the result, and do not use interactive input() calls."*
*   **GNN Concept Path**: `Variable Normalization in Python` ➔ `Variable Execution in Python` ➔ `Variable Optimization in Python`
*   **Generation Time**: 6.26 seconds
*   **Sandbox Output**: `The greatest common divisor of 48 and 18 is: 6`

```python
# Function to calculate the greatest common divisor using the Euclidean algorithm
def calculate_gcd(a, b):
    # Ensure both numbers are positive
    a = abs(a)
    b = abs(b)
    
    # Use the Euclidean algorithm to find GCD
    while b:
        a, b = b, a % b
    
    return a

# Main block with test data
if __name__ == "__main__":
    a = 48
    b = 18
    try:
        result = calculate_gcd(a, b)
        print(f"The greatest common divisor of {a} and {b} is: {result}")
    except Exception as e:
        print(f"An error occurred: {e}")
```
<!-- slide -->
### 🐍 Task 3: Matrix Transpose (Python)
*   **Prompt**: *"Write a Python function transpose(matrix) that transposes a 2D list (matrix) of size 3x3. Test it with matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], print the transposed matrix, and do not use interactive input() calls."*
*   **GNN Concept Path**: `Matrix Normalization in Python` ➔ `Matrix Execution in Python` ➔ `Matrix Optimization in Python`
*   **Generation Time**: 8.75 seconds
*   **Sandbox Output**:
    ```text
    Original Matrix:
    [1, 2, 3]
    [4, 5, 6]
    [7, 8, 9]

    Transposed Matrix:
    [1, 4, 7]
    [2, 5, 8]
    [3, 6, 9]
    ```

```python
import numpy as np

def transpose(matrix):
    try:
        if len(matrix) != 3 or any(len(row) != 3 for row in matrix):
            raise ValueError("Input must be a 3x3 matrix.")
        
        # Transpose using numpy's transpose function
        transposed_matrix = np.array(matrix).T.tolist()
        return transposed_matrix
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

if __name__ == "__main__":
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    transposed_matrix = transpose(matrix)
    if transposed_matrix is not None:
        print("Original Matrix:")
        for row in matrix:
            print(row)
        print("\nTransposed Matrix:")
        for row in transposed_matrix:
            print(row)
```
<!-- slide -->
### 🟨 Task 4: Run-length Encoding (JavaScript)
*   **Prompt**: *"Write a JavaScript function rle(input) that performs run-length encoding on a string. For example, 'AABBBCCCC' should return 'A2B3C4'. Call it with 'AABBBCCCC' in the main block and print the result using console.log. Do not use interactive prompt() calls."*
*   **GNN Concept Path**: `String Parsing in JavaScript` ➔ `String Execution in JavaScript` ➔ `String Validation in JavaScript`
*   **Generation Time**: 9.93 seconds
*   **Sandbox Output**: `A2B3C4`

```javascript
// Function to perform run-length encoding on a string
function rle(input) {
    if (typeof input !== 'string') {
        throw new Error('Input must be a string');
    }

    let encoded = '';
    let count = 1;

    for (let i = 1; i < input.length; i++) {
        if (input[i] === input[i - 1]) {
            count++;
        } else {
            encoded += input[i - 1] + count;
            count = 1;
        }
    }

    // Add the last character and its count
    encoded += input[input.length - 1] + count;
    return encoded;
}

// Main block to test the rle function
try {
    const inputString = 'AABBBCCCC';
    const result = rle(inputString);
    console.log(result); // Output: A2B3C4
} catch (error) {
    console.error('Error:', error.message);
}
```
<!-- slide -->
### 🐍 Task 5: Bubble Sort (Python)
*   **Prompt**: *"Write a Python function bubble_sort(arr) that sorts an array of integers in ascending order. Test it with inputs arr=[64, 34, 25, 12, 22, 11, 90], print the sorted array, and do not use interactive input() calls."*
*   **GNN Concept Path**: `Array Optimization in Python` ➔ `Array Parsing in Python` ➔ `Array Execution in Python`
*   **Generation Time**: 8.75 seconds
*   **Sandbox Output**:
    ```text
    Original array: [64, 34, 25, 12, 22, 11, 90]
    Sorted array: [11, 12, 22, 25, 34, 64, 90]
    ```

```python
# Bubble Sort Function in Python

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

if __name__ == "__main__":
    arr = [64, 34, 25, 12, 22, 11, 90]
    try:
        print("Original array:", arr)
        bubble_sort(arr)
        print("Sorted array:", arr)
    except Exception as e:
        print(f"An error occurred: {e}")
```
````

---

## 🚀 How to Run the Coding Lab Locally

1. **Prerequisites**: Make sure you have Python installed, plus a local Ollama instance serving the `qwen2.5-coder:3b` model used by the code generator.
2. **Start the server**:
   ```bash
   python coding_lab_server.py
   ```
3. **Open the browser**: Navigate to **[http://localhost:8002/](http://localhost:8002/)**.
4. **Features**:
   - Visual **Vis.js Concept Network** displaying active nodes and transition edges.
   - Real-time **MoE routing probability bars**.
   - Interactive tab panel showing the **Execution Trace logs**, **Generated Code**, and **Sandbox Stdout/Stderr**.

---

## 📂 Project Structure

- `cat_v3/`: Core model definition, router, GAT experts, and combiner.
- `checkpoints/cat_v3/cat_v3_model.pt`: Pre-trained weights (Graph-MoE).
- `agent_executor.py`: Sandbox runner and execution manager.
- `coding_lab_server.py`: Web server hosting the GUI and APIs.
- `push_to_hf.py`: Helper script to synchronize files with Hugging Face Hub.

---

## ⚖️ License
This project is licensed under the MIT License.