---
title: CodeTribunal
emoji: 💻
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: The AI Courtroom That Exposes Bad Freelance Code
---
<div align="center">

# CodeTribunal

### Put Freelance Code on Trial.

**Upload code. Get a verdict. Know the risk.**

Built with **GLM 5.1 + CrewAI + GritQL**

[![Tests](https://github.com/amineyagoub/CodeTribunal/actions/workflows/tests.yml/badge.svg)](https://github.com/amineyagoub/CodeTribunal/actions/workflows/tests.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Built for GLM 5.1 Hackathon](https://img.shields.io/badge/Built%20for-GLM%205.1-ff69b4)](https://build-with-glm-5-1-challenge.devpost.com)

</div>

---

## 🚨 The Problem

Clients receive code they don’t understand.

- Looks clean… but hides security risks
- Passes linters… but fails in production
- Works… but is architecturally broken

**No one answers the only question that matters:**

> _Is this code safe, professional, and worth paying for?_

---

## The Solution

**CodeTribunal turns code review into a courtroom trial.**

Upload a `.zip` → get:

- Forensic evidence (AST-level)
- Multi-agent investigation
- AI courtroom debate
- Final verdict + risk score

> Not just analysis — **judgment**.

---

## 🧠 Why This Exist

### 1. Real System

- 6-phase pipeline
- 8 specialized agents
- Persistent execution engine

### 2. Agents That Actually Act

- File reads, pattern search, call tracing
- Real tool usage via function calling (not fake reasoning)

### 3. Deterministic + AI Hybrid

- **GritQL = ground truth**
- **Agents = interpretation + argument**

### 4. End-to-End Story

From raw code → evidence → debate → verdict → report

## How It Works

CodeTribunal runs a **6-phase pipeline**, each building on the last:

### Phase 1: Forensic Evidence (Deterministic — No LLM)

GritQL scans the entire codebase with **17 forensic patterns** across security and quality domains:

| Domain      | Patterns | Examples                                                                                 |
| ----------- | -------- | ---------------------------------------------------------------------------------------- |
| 🔴 Security | 13       | Hardcoded secrets, `eval()`, SQL injection, `pickle.load()`, `os.system()`, weak hashing |
| 🟡 Quality  | 4        | `TODO`, `FIXME`, `HACK` comments                                                         |

All scanning is **read-only** (`--dry-run`) and runs in **parallel** across patterns.

### Phase 2: Code Dependency Graph (AST — No LLM)

Python's `ast` module and regex-based JS parsing build a **lightweight dependency graph**:

- Nodes: files, functions, classes, imports
- Edges: calls, imports, containment, inheritance
- Enables call-chain tracing: `eval() → handle_request() → app.route()`

### Phase 3: Investigation (3 ReACT Agents + 4 Tools)

Three specialist investigators, each running a **genuine ReACT loop** (Reason → Act → Observe → Repeat) using **Z.ai's native function calling** via LiteLLM:

| Agent                        | Tools                                                     | Purpose                                    |
| ---------------------------- | --------------------------------------------------------- | ------------------------------------------ |
| 🛡️ Security Investigator     | FileReader, PatternSearch, CodeGraphQuery, FindingContext | Find vulnerabilities, trace attack vectors |
| 📋 Quality Investigator      | FileReader, FindingContext                                | Assess technical debt, detect negligence   |
| 🏗️ Architecture Investigator | FileReader, CodeGraphQuery                                | Analyze structure, trace dependencies      |

Each agent **autonomously decides which tools to call**, observes the results, and iterates. For example, the Security Investigator might:

1. Call `file_reader` to read a flagged file
2. Observe hardcoded secrets on specific lines
3. Call `code_graph_query` to trace where those secrets are used
4. Produce a detailed report with file paths, line numbers, and severity ratings

**Verified working**: GLM-5 + LiteLLM function calling confirmed. Agents make real tool calls that execute real code analysis.

### Phase 4: The Trial (3 Agents)

A courtroom debate between AI agents:

1. ** The Prosecutor** — builds the case for negligence, cites specific evidence
2. ** The Defense Attorney** — challenges claims, argues context and proportionality
3. ** Rebuttal** — the prosecutor responds to the defense

Agents use CrewAI's `context` parameter to chain arguments: prosecution output feeds into defense context, both feed into rebuttal.

### Phase 5: The Verdict

**🔨 The Judge** reviews all evidence, investigation reports, and the full trial transcript. Delivers:

- Overall ruling: GUILTY / MIXED / NOT GUILTY
- Reputational Risk Score (0-100)
- Findings summary with severity rankings

### Phase 6: Structured Report

**📝 Verdict Report Agent** compiles everything into a professional report:

- Executive Summary
- Findings Table (sorted by severity)
- Per-Finding Analysis (impact, remediation, estimated fix effort)
- Sentencing Recommendations

---

## Architecture

```
                          ┌──────────────┐
                          │  Gradio UI   │
                          │  + Export    │
                          └──────┬───────┘
                                 │
                     ┌───────────▼────────────┐
                     │    Pipeline Engine      │
                     │  State · Persistence    │
                     │  Cancel · Resume        │
                     └───────────┬────────────┘
                                 │
          ┌──────────┬───────────┼───────────┬──────────┐
          ▼          ▼           ▼           ▼          ▼
     ┌─────────┐ ┌──────┐ ┌─────────┐ ┌─────────┐ ┌──────┐
     │Evidence │ │Code  │ │Invest.  │ │  Trial  │ │Report│
     │ Scanner │ │Graph │ │ Agents  │ │ Agents  │ │Agent │
     │(GritQL) │ │(AST) │ │+ Tools  │ │         │ │      │
     └─────────┘ └──────┘ └─────────┘ └─────────┘ └──────┘
          │          │           │           │          │
          └──────────┴───────────┴───────────┴──────────┘
                                 │
                     ┌───────────▼────────────┐
                     │   Custom Tool Layer    │
                     │ FileReader · Pattern   │
                     │ CodeGraph · Context    │
                     └────────────────────────┘
```

### Key Design Decisions

| Decision                              | Why                                                                                         |
| ------------------------------------- | ------------------------------------------------------------------------------------------- |
| **Agents have tools, not text dumps** | Agents read files, search patterns, and trace calls on demand — scales to any codebase size |
| **ReACT loop via LiteLLM**            | Direct function calling with GLM-5 — bypasses CrewAI's unreliable tool routing              |
| **Pipeline state persisted to JSON**  | Runs can resume after crashes. State is queryable                                           |
| **GritQL for evidence**               | AST-level pattern matching, not regex. Language-aware, precise                              |
| **Custom CrewAI tools (BaseTool)**    | Pydantic-validated inputs, proper error handling, CrewAI-native integration                 |
| **Rate-limit retry with backoff**     | Exponential backoff (4s → 64s) on Z.ai 429 errors — pipeline survives API spikes            |

---

## Tech Stack

| Component            | Technology               | Purpose                                              |
| -------------------- | ------------------------ | ---------------------------------------------------- |
| **LLM**              | GLM 5 via Z.ai (LiteLLM) | Agent reasoning and debate                           |
| **Code Scanning**    | GritQL                   | Deterministic AST-level pattern matching             |
| **Multi-Agent**      | CrewAI 1.12              | Agent orchestration, task chaining, context handoffs |
| **Function Calling** | LiteLLM                  | Direct ReACT loop with GLM-5 tool calling            |
| **Code Graph**       | Python `ast` + regex     | Dependency graph (Python + JS)                       |
| **UI**               | Gradio 6                 | Streaming chatbot, file upload, export               |
| **Export**           | fpdf2                    | PDF report generation                                |

---

## Install

```bash
# Clone
git clone https://github.com/amineyagoub/CodeTribunal.git
cd CodeTribunal

# Install dependencies
pip install -e .

# Install GritQL CLI
npm install -g @getgrit/cli

# Configure
cp .env.example .env
# Edit .env: set ZAI_API_KEY (get one at https://open.bigmodel.cn/)
```

### Requirements

- Python 3.11+
- Node.js (for GritQL CLI)
- Z.ai API key ([get one here](https://open.bigmodel.cn/))

---

## Usage

### Web UI (Recommended)

```bash
python3 -m code_tribunal.app
```

Open http://localhost:7860, upload a `.zip` of code, and watch the trial unfold.

### CLI

```bash
# Full trial
code-tribunal ./path/to/codebase

# Evidence only (no LLM, fast)
code-tribunal ./path/to/codebase --evidence-only

# Save results to JSON
code-tribunal ./path/to/codebase --output report.json
```

### Python API

```python
from code_tribunal.config import TribunalConfig
from code_tribunal.courtroom import Courtroom
from code_tribunal.pipeline import Phase

config = TribunalConfig()
courtroom = Courtroom(config)

for event in courtroom.run("./path/to/code"):
    print(f"[{event.phase.value}] {event.status}")

# Interactive Q&A
answer = courtroom.ask_question(
    "Why was eval() considered critical?",
    context={"evidence": "...", "verdict": "...", ...}
)
```

---

## 🔧 Production Features

| Feature                        | Details                                                                                 |
| ------------------------------ | --------------------------------------------------------------------------------------- |
| **4 Custom Tools**             | FileReader, PatternSearch, CodeGraphQuery, FindingContext — agents actively investigate |
| **8 Specialized Agents**       | 3 investigators, prosecutor, defense, rebuttal, judge, verdict report, expert witness   |
| **ReACT Engine**               | Custom Reason-Act-Observe loop via LiteLLM function calling with GLM-5                  |
| **Code Dependency Graph**      | AST-based (Python + JS), with call-chain tracing and impact analysis                    |
| **Parallel Evidence Scanning** | ThreadPoolExecutor for GritQL patterns — 4x faster than sequential                      |
| **Rate-Limit Resilience**      | Exponential backoff retry on 429 errors — survives API rate limits                      |
| **Pipeline Persistence**       | State saved to JSON, runs can resume after interruption                                 |
| **Deduplication**              | Same file+line merged into one finding with multiple categories                         |
| **Zip Safety**                 | Zip-slip attack prevention                                                              |
| **Streaming UI**               | Real-time pipeline progress in Gradio Chatbot with phase indicators                     |
| **Export**                     | Markdown and PDF report generation                                                      |

---

## 🧪 Testing

```bash
# Run evidence scan on test fixtures
code-tribunal tests/fixtures/locale/ --evidence-only

# Run Python tests
pytest tests/
```

Test fixtures in `tests/fixtures/locale/` contain deliberately bad Python and JavaScript code with:

- Hardcoded passwords, API keys, AWS secrets, Stripe keys, JWT secrets
- SQL injection via f-strings and template literals
- `eval()`, `pickle.load()`, `os.system()`, `subprocess.call(shell=True)`
- MD5 hashing
- TODO, FIXME, HACK comments

---

---

<div align="center">

Built for the [Build with GLM 5.1](https://build-with-glm-5-1-challenge.devpost.com) hackathon.

> > > > > > > b4fcdee (feat: Add initial CodeTribunal implementation)

</div>