Spaces:

amine-yagoub
/

CodeTribunal

Sleeping

App Files Files Community

amine-yagoub commited on Apr 2

Commit

eecc2a5

1 Parent(s): 12c5d69

docs: expand README with comprehensive project documentation

Browse files

Files changed (1) hide show

README.md +300 -15

README.md CHANGED Viewed

@@ -1,5 +1,5 @@
-<<<<<<< HEAD
----
 title: CodeTribunal
 emoji: 💻
 colorFrom: pink
@@ -8,34 +8,319 @@ sdk: docker
 pinned: false
 license: mit
 short_description: The AI Courtroom That Exposes Bad Freelance Code
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-=======
-# CodeTribunal
-The AI courtroom that exposes bad freelance code.
-Multi-agent forensic investigation powered by GLM 5.1. Instead of guessing code quality, CodeTribunal puts it on trial — a live-streaming debate where an AI Prosecutor and Defense Attorney clash over real, deterministic technical evidence.
-## Install
 ```bash
 pip install -e .
 ```
-## Usage
 ```bash
 code-tribunal ./path/to/codebase
 ```
-## How it works
-1. **Evidence Gathering** — Deterministic scans (security, code smells, hardcoded secrets, TODOs)
-2. **Investigation** — GLM 5.1 agents analyze the evidence
-3. **The Trial** — Prosecutor and Defense debate in a live-streamed courtroom
-4. **Verdict** — The Judge delivers a final ruling
 Built for the [Build with GLM 5.1](https://build-with-glm-5-1-challenge.devpost.com) hackathon.
->>>>>>> b4fcdee (feat: Add initial CodeTribunal implementation)

+## <<<<<<< HEAD
 title: CodeTribunal
 emoji: 💻
 colorFrom: pink
 pinned: false
 license: mit
 short_description: The AI Courtroom That Exposes Bad Freelance Code
+---
+# Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+<div align="center">
+# ⚖️ CodeTribunal
+### The AI Courtroom That Exposes Bad Freelance Code
+**Multi-Agent Forensic Investigation Powered by GLM 5 + GritQL + CrewAI**
+[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Built for GLM 5.1 Hackathon](https://img.shields.io/badge/Built%20for-GLM%205.1-ff69b4)](https://build-with-glm-5-1-challenge.devpost.com)
+[How It Works](#-how-it-works) • [Architecture](#-architecture) • [Install](#-install) • [Usage](#-usage) • [Demo](#-demo)
+</div>
+---
+## 🎬 The Problem
+A freelancer delivers code. The client can't tell if it's professional work or a security nightmare. Traditional linters find syntax errors. Code reviews miss architectural flaws. Nobody puts it all together and tells you:
+> _"This code is negligent, here's exactly why, and here's what it will cost you."_
+**CodeTribunal does.**
+Upload a `.zip` of code and watch a full courtroom trial unfold — evidence gathering, investigation by specialist agents, a live-streamed debate between an AI Prosecutor and Defense Attorney, and a Judge's verdict with a reputational risk score.
+---
+## 🏛️ How It Works
+CodeTribunal runs a **6-phase pipeline**, each building on the last:
+### Phase 1: Forensic Evidence (Deterministic — No LLM)
+GritQL scans the entire codebase with **17 forensic patterns** across security and quality domains:
+| Domain      | Patterns | Examples                                                                                 |
+| ----------- | -------- | ---------------------------------------------------------------------------------------- |
+| 🔴 Security | 13       | Hardcoded secrets, `eval()`, SQL injection, `pickle.load()`, `os.system()`, weak hashing |
+| 🟡 Quality  | 4        | `TODO`, `FIXME`, `HACK` comments                                                         |
+All scanning is **read-only** (`--dry-run`) and runs in **parallel** across patterns.
+### Phase 2: Code Dependency Graph (AST — No LLM)
+Python's `ast` module and regex-based JS parsing build a **lightweight dependency graph**:
+- Nodes: files, functions, classes, imports
+- Edges: calls, imports, containment, inheritance
+- Enables call-chain tracing: `eval() → handle_request() → app.route()`
+### Phase 3: Investigation (3 ReACT Agents + 4 Tools)
+Three specialist investigators, each running a **genuine ReACT loop** (Reason → Act → Observe → Repeat) using **Z.ai's native function calling** via LiteLLM:
+| Agent                        | Tools                                                     | Purpose                                    |
+| ---------------------------- | --------------------------------------------------------- | ------------------------------------------ |
+| 🛡️ Security Investigator     | FileReader, PatternSearch, CodeGraphQuery, FindingContext | Find vulnerabilities, trace attack vectors |
+| 📋 Quality Investigator      | FileReader, FindingContext                                | Assess technical debt, detect negligence   |
+| 🏗️ Architecture Investigator | FileReader, CodeGraphQuery                                | Analyze structure, trace dependencies      |
+Each agent **autonomously decides which tools to call**, observes the results, and iterates. For example, the Security Investigator might:
+1. Call `file_reader` to read a flagged file
+2. Observe hardcoded secrets on specific lines
+3. Call `code_graph_query` to trace where those secrets are used
+4. Produce a detailed report with file paths, line numbers, and severity ratings
+**Verified working**: GLM-5 + LiteLLM function calling confirmed. Agents make real tool calls that execute real code analysis.
+### Phase 4: The Trial (3 Agents)
+A courtroom debate between AI agents:
+1. **⚖️ The Prosecutor** — builds the case for negligence, cites specific evidence
+2. **🛡️ The Defense Attorney** — challenges claims, argues context and proportionality
+3. **⚖️ Rebuttal** — the prosecutor responds to the defense
+Agents use CrewAI's `context` parameter to chain arguments: prosecution output feeds into defense context, both feed into rebuttal.
+### Phase 5: The Verdict
+**🔨 The Judge** reviews all evidence, investigation reports, and the full trial transcript. Delivers:
+- Overall ruling: GUILTY / MIXED / NOT GUILTY
+- Reputational Risk Score (0-100)
+- Findings summary with severity rankings
+### Phase 6: Structured Report
+**📝 Verdict Report Agent** compiles everything into a professional report:
+- Executive Summary
+- Findings Table (sorted by severity)
+- Per-Finding Analysis (impact, remediation, estimated fix effort)
+- Sentencing Recommendations
 ---
+## 🏗️ Architecture
+```
+                          ┌──────────────┐
+                          │  Gradio UI   │
+                          │  + Export    │
+                          └──────┬───────┘
+                                 │
+                     ┌───────────▼────────────┐
+                     │    Pipeline Engine      │
+                     │  State · Persistence    │
+                     │  Cancel · Resume        │
+                     └───────────┬────────────┘
+                                 │
+          ┌──────────┬───────────┼───────────┬──────────┐
+          ▼          ▼           ▼           ▼          ▼
+     ┌─────────┐ ┌──────┐ ┌─────────┐ ┌─────────┐ ┌──────┐
+     │Evidence │ │Code  │ │Invest.  │ │  Trial  │ │Report│
+     │ Scanner │ │Graph │ │ Agents  │ │ Agents  │ │Agent │
+     │(GritQL) │ │(AST) │ │+ Tools  │ │         │ │      │
+     └─────────┘ └──────┘ └─────────┘ └─────────┘ └──────┘
+          │          │           │           │          │
+          └──────────┴───────────┴───────────┴──────────┘
+                                 │
+                     ┌───────────▼────────────┐
+                     │   Custom Tool Layer    │
+                     │ FileReader · Pattern   │
+                     │ CodeGraph · Context    │
+                     └────────────────────────┘
+```
+### Key Design Decisions
+| Decision                              | Why                                                                                         |
+| ------------------------------------- | ------------------------------------------------------------------------------------------- |
+| **Agents have tools, not text dumps** | Agents read files, search patterns, and trace calls on demand — scales to any codebase size |
+| **ReACT loop via LiteLLM**            | Direct function calling with GLM-5 — bypasses CrewAI's unreliable tool routing              |
+| **Pipeline state persisted to JSON**  | Runs can resume after crashes. State is queryable                                           |
+| **GritQL for evidence**               | AST-level pattern matching, not regex. Language-aware, precise                              |
+| **Custom CrewAI tools (BaseTool)**    | Pydantic-validated inputs, proper error handling, CrewAI-native integration                 |
+| **Rate-limit retry with backoff**     | Exponential backoff (4s → 64s) on Z.ai 429 errors — pipeline survives API spikes            |
+---
+## 🧰 Tech Stack
+| Component            | Technology               | Purpose                                              |
+| -------------------- | ------------------------ | ---------------------------------------------------- |
+| **LLM**              | GLM 5 via Z.ai (LiteLLM) | Agent reasoning and debate                           |
+| **Code Scanning**    | GritQL                   | Deterministic AST-level pattern matching             |
+| **Multi-Agent**      | CrewAI 1.12              | Agent orchestration, task chaining, context handoffs |
+| **Function Calling** | LiteLLM                  | Direct ReACT loop with GLM-5 tool calling            |
+| **Code Graph**       | Python `ast` + regex     | Dependency graph (Python + JS)                       |
+| **UI**               | Gradio 6                 | Streaming chatbot, file upload, export               |
+| **Export**           | fpdf2                    | PDF report generation                                |
+---
+## 📦 Install
 ```bash
+# Clone
+git clone https://github.com/amineyagoub/CodeTribunal.git
+cd CodeTribunal
+# Install dependencies
 pip install -e .
+# Install GritQL CLI
+npm install -g @getgrit/cli
+# Configure
+cp .env.example .env
+# Edit .env: set ZAI_API_KEY
+```
+### Requirements
+- Python 3.11+
+- Node.js (for GritQL CLI)
+- Z.ai API key ([get one here](https://open.bigmodel.cn/))
+---
+## 🚀 Usage
+### Web UI (Recommended)
+```bash
+python3 -m code_tribunal.app
 ```
+Open http://localhost:7860, upload a `.zip` of code, and watch the trial unfold.
+### CLI
 ```bash
+# Full trial
 code-tribunal ./path/to/codebase
+# Evidence only (no LLM, fast)
+code-tribunal ./path/to/codebase --evidence-only
+# Save results to JSON
+code-tribunal ./path/to/codebase --output report.json
 ```
+### Python API
+```python
+from code_tribunal.config import TribunalConfig
+from code_tribunal.courtroom import Courtroom
+from code_tribunal.pipeline import Phase
+config = TribunalConfig()
+courtroom = Courtroom(config)
+for event in courtroom.run("./path/to/code"):
+    print(f"[{event.phase.value}] {event.status}")
+# Interactive Q&A
+answer = courtroom.ask_question(
+    "Why was eval() considered critical?",
+    context={"evidence": "...", "verdict": "...", ...}
+)
+```
+---
+## 🔧 Production Features
+| Feature                        | Details                                                                                 |
+| ------------------------------ | --------------------------------------------------------------------------------------- |
+| **4 Custom Tools**             | FileReader, PatternSearch, CodeGraphQuery, FindingContext — agents actively investigate |
+| **8 Specialized Agents**       | 3 investigators, prosecutor, defense, rebuttal, judge, verdict report, expert witness   |
+| **ReACT Engine**               | Custom Reason-Act-Observe loop via LiteLLM function calling with GLM-5                  |
+| **Code Dependency Graph**      | AST-based (Python + JS), with call-chain tracing and impact analysis                    |
+| **Parallel Evidence Scanning** | ThreadPoolExecutor for GritQL patterns — 4x faster than sequential                      |
+| **Rate-Limit Resilience**      | Exponential backoff retry on 429 errors — survives API rate limits                      |
+| **Pipeline Persistence**       | State saved to JSON, runs can resume after interruption                                 |
+| **Deduplication**              | Same file+line merged into one finding with multiple categories                         |
+| **Zip Safety**                 | Zip-slip attack prevention                                                              |
+| **Streaming UI**               | Real-time pipeline progress in Gradio Chatbot with phase indicators                     |
+| **Export**                     | Markdown and PDF report generation                                                      |
+---
+## 🧪 Testing
+```bash
+# Run evidence scan on test fixtures
+code-tribunal tests/fixtures/locale/ --evidence-only
+# Run Python tests
+pytest tests/
+```
+Test fixtures in `tests/fixtures/locale/` contain deliberately bad Python and JavaScript code with:
+- Hardcoded passwords, API keys, AWS secrets, Stripe keys, JWT secrets
+- SQL injection via f-strings and template literals
+- `eval()`, `pickle.load()`, `os.system()`, `subprocess.call(shell=True)`
+- MD5 hashing
+- TODO, FIXME, HACK comments
+---
+## 📁 Project Structure
+```
+CodeTribunal/
+├── src/code_tribunal/
+│   ├── config.py          # Centralized configuration
+│   ├── evidence.py        # GritQL forensic scanning (17 patterns)
+│   ├── code_graph.py      # AST code dependency graph
+│   ├── tools.py           # 4 custom CrewAI tools
+│   ├── agents.py          # 8 agent definitions
+│   ├── react.py           # ReACT engine (LiteLLM function calling)
+│   ├── courtroom.py       # 6-phase pipeline orchestrator
+│   ├── pipeline.py        # State machine + persistence
+│   ├── app.py             # Gradio UI + export
+│   └── cli.py             # CLI entry point
+├── tests/
+│   ├── fixtures/locale/   # Deliberately bad code samples
+│   └── bad_code.zip       # Zip fixture for UI testing
+├── assets/
+│   └── logo.png           # 3D courtroom logo
+└── pyproject.toml
+```
+---
+## 🎓 What Makes This a Strong Hackathon Entry
+1. **System Complexity** — 6-phase pipeline with 8 agents, 4 custom tools, code graph, and streaming
+2. **Effective Tool Use** — Agents use BaseTool with Pydantic schemas to read files, search patterns, and trace calls
+3. **Context Handoffs** — CrewAI `context` parameter chains prosecution → defense → rebuttal
+4. **Custom ReACT Engine** — Direct LiteLLM function calling with GLM-5 for reliable tool use
+5. **Deterministic + AI** — GritQL provides ground-truth evidence, agents provide interpretation and debate
+6. **Resilient** — Rate-limit retry, pipeline persistence, error recovery
+---
+<div align="center">
 Built for the [Build with GLM 5.1](https://build-with-glm-5-1-challenge.devpost.com) hackathon.
+> > > > > > > b4fcdee (feat: Add initial CodeTribunal implementation)
+</div>