Spaces:
Sleeping
Sleeping
File size: 13,274 Bytes
c30312e 38cd7bb c30312e eecc2a5 1de0435 eecc2a5 c30312e eecc2a5 c30312e 64d4a2f 662e309 eecc2a5 c30312e eecc2a5 c30312e eecc2a5 c30312e eecc2a5 c30312e eecc2a5 c30312e eecc2a5 c30312e 1de0435 eecc2a5 1de0435 eecc2a5 38cd7bb 1de0435 eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 d5341cc 1de0435 eecc2a5 1de0435 d5341cc eecc2a5 d5341cc eecc2a5 1de0435 eecc2a5 1de0435 eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 d5341cc eecc2a5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 | ---
title: CodeTribunal
emoji: π»
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: The AI Courtroom That Exposes Bad Freelance Code
---
<div align="center">
# CodeTribunal
### Put Freelance Code on Trial.
**Upload code. Get a verdict. Know the risk.**
Built with **GLM 5.1 + CrewAI + GritQL**
[](https://github.com/amineyagoub/CodeTribunal/actions/workflows/tests.yml)
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)
[](https://build-with-glm-5-1-challenge.devpost.com)
</div>
---
## π¨ The Problem
Clients receive code they donβt understand.
- Looks clean⦠but hides security risks
- Passes linters⦠but fails in production
- Works⦠but is architecturally broken
**No one answers the only question that matters:**
> _Is this code safe, professional, and worth paying for?_
---
## The Solution
**CodeTribunal turns code review into a courtroom trial.**
Upload a `.zip` β get:
- Forensic evidence (AST-level)
- Multi-agent investigation
- AI courtroom debate
- Final verdict + risk score
> Not just analysis β **judgment**.
---
## π§ Why This Exist
### 1. Real System
- 6-phase pipeline
- 8 specialized agents
- Persistent execution engine
### 2. Agents That Actually Act
- File reads, pattern search, call tracing
- Real tool usage via function calling (not fake reasoning)
### 3. Deterministic + AI Hybrid
- **GritQL = ground truth**
- **Agents = interpretation + argument**
### 4. End-to-End Story
From raw code β evidence β debate β verdict β report
## How It Works
CodeTribunal runs a **6-phase pipeline**, each building on the last:
### Phase 1: Forensic Evidence (Deterministic β No LLM)
GritQL scans the entire codebase with **17 forensic patterns** across security and quality domains:
| Domain | Patterns | Examples |
| ----------- | -------- | ---------------------------------------------------------------------------------------- |
| π΄ Security | 13 | Hardcoded secrets, `eval()`, SQL injection, `pickle.load()`, `os.system()`, weak hashing |
| π‘ Quality | 4 | `TODO`, `FIXME`, `HACK` comments |
All scanning is **read-only** (`--dry-run`) and runs in **parallel** across patterns.
### Phase 2: Code Dependency Graph (AST β No LLM)
Python's `ast` module and regex-based JS parsing build a **lightweight dependency graph**:
- Nodes: files, functions, classes, imports
- Edges: calls, imports, containment, inheritance
- Enables call-chain tracing: `eval() β handle_request() β app.route()`
### Phase 3: Investigation (3 ReACT Agents + 4 Tools)
Three specialist investigators, each running a **genuine ReACT loop** (Reason β Act β Observe β Repeat) using **Z.ai's native function calling** via LiteLLM:
| Agent | Tools | Purpose |
| ---------------------------- | --------------------------------------------------------- | ------------------------------------------ |
| π‘οΈ Security Investigator | FileReader, PatternSearch, CodeGraphQuery, FindingContext | Find vulnerabilities, trace attack vectors |
| π Quality Investigator | FileReader, FindingContext | Assess technical debt, detect negligence |
| ποΈ Architecture Investigator | FileReader, CodeGraphQuery | Analyze structure, trace dependencies |
Each agent **autonomously decides which tools to call**, observes the results, and iterates. For example, the Security Investigator might:
1. Call `file_reader` to read a flagged file
2. Observe hardcoded secrets on specific lines
3. Call `code_graph_query` to trace where those secrets are used
4. Produce a detailed report with file paths, line numbers, and severity ratings
**Verified working**: GLM-5 + LiteLLM function calling confirmed. Agents make real tool calls that execute real code analysis.
### Phase 4: The Trial (3 Agents)
A courtroom debate between AI agents:
1. ** The Prosecutor** β builds the case for negligence, cites specific evidence
2. ** The Defense Attorney** β challenges claims, argues context and proportionality
3. ** Rebuttal** β the prosecutor responds to the defense
Agents use CrewAI's `context` parameter to chain arguments: prosecution output feeds into defense context, both feed into rebuttal.
### Phase 5: The Verdict
**π¨ The Judge** reviews all evidence, investigation reports, and the full trial transcript. Delivers:
- Overall ruling: GUILTY / MIXED / NOT GUILTY
- Reputational Risk Score (0-100)
- Findings summary with severity rankings
### Phase 6: Structured Report
**π Verdict Report Agent** compiles everything into a professional report:
- Executive Summary
- Findings Table (sorted by severity)
- Per-Finding Analysis (impact, remediation, estimated fix effort)
- Sentencing Recommendations
---
## Architecture
```
ββββββββββββββββ
β Gradio UI β
β + Export β
ββββββββ¬ββββββββ
β
βββββββββββββΌβββββββββββββ
β Pipeline Engine β
β State Β· Persistence β
β Cancel Β· Resume β
βββββββββββββ¬βββββββββββββ
β
ββββββββββββ¬ββββββββββββΌββββββββββββ¬βββββββββββ
βΌ βΌ βΌ βΌ βΌ
βββββββββββ ββββββββ βββββββββββ βββββββββββ ββββββββ
βEvidence β βCode β βInvest. β β Trial β βReportβ
β Scanner β βGraph β β Agents β β Agents β βAgent β
β(GritQL) β β(AST) β β+ Tools β β β β β
βββββββββββ ββββββββ βββββββββββ βββββββββββ ββββββββ
β β β β β
ββββββββββββ΄ββββββββββββ΄ββββββββββββ΄βββββββββββ
β
βββββββββββββΌβββββββββββββ
β Custom Tool Layer β
β FileReader Β· Pattern β
β CodeGraph Β· Context β
ββββββββββββββββββββββββββ
```
### Key Design Decisions
| Decision | Why |
| ------------------------------------- | ------------------------------------------------------------------------------------------- |
| **Agents have tools, not text dumps** | Agents read files, search patterns, and trace calls on demand β scales to any codebase size |
| **ReACT loop via LiteLLM** | Direct function calling with GLM-5 β bypasses CrewAI's unreliable tool routing |
| **Pipeline state persisted to JSON** | Runs can resume after crashes. State is queryable |
| **GritQL for evidence** | AST-level pattern matching, not regex. Language-aware, precise |
| **Custom CrewAI tools (BaseTool)** | Pydantic-validated inputs, proper error handling, CrewAI-native integration |
| **Rate-limit retry with backoff** | Exponential backoff (4s β 64s) on Z.ai 429 errors β pipeline survives API spikes |
---
## Tech Stack
| Component | Technology | Purpose |
| -------------------- | ------------------------ | ---------------------------------------------------- |
| **LLM** | GLM 5 via Z.ai (LiteLLM) | Agent reasoning and debate |
| **Code Scanning** | GritQL | Deterministic AST-level pattern matching |
| **Multi-Agent** | CrewAI 1.12 | Agent orchestration, task chaining, context handoffs |
| **Function Calling** | LiteLLM | Direct ReACT loop with GLM-5 tool calling |
| **Code Graph** | Python `ast` + regex | Dependency graph (Python + JS) |
| **UI** | Gradio 6 | Streaming chatbot, file upload, export |
| **Export** | fpdf2 | PDF report generation |
---
## Install
```bash
# Clone
git clone https://github.com/amineyagoub/CodeTribunal.git
cd CodeTribunal
# Install dependencies
pip install -e .
# Install GritQL CLI
npm install -g @getgrit/cli
# Configure
cp .env.example .env
# Edit .env: set ZAI_API_KEY (get one at https://open.bigmodel.cn/)
```
### Requirements
- Python 3.11+
- Node.js (for GritQL CLI)
- Z.ai API key ([get one here](https://open.bigmodel.cn/))
---
## Usage
### Web UI (Recommended)
```bash
python3 -m code_tribunal.app
```
Open http://localhost:7860, upload a `.zip` of code, and watch the trial unfold.
### CLI
```bash
# Full trial
code-tribunal ./path/to/codebase
# Evidence only (no LLM, fast)
code-tribunal ./path/to/codebase --evidence-only
# Save results to JSON
code-tribunal ./path/to/codebase --output report.json
```
### Python API
```python
from code_tribunal.config import TribunalConfig
from code_tribunal.courtroom import Courtroom
from code_tribunal.pipeline import Phase
config = TribunalConfig()
courtroom = Courtroom(config)
for event in courtroom.run("./path/to/code"):
print(f"[{event.phase.value}] {event.status}")
# Interactive Q&A
answer = courtroom.ask_question(
"Why was eval() considered critical?",
context={"evidence": "...", "verdict": "...", ...}
)
```
---
## π§ Production Features
| Feature | Details |
| ------------------------------ | --------------------------------------------------------------------------------------- |
| **4 Custom Tools** | FileReader, PatternSearch, CodeGraphQuery, FindingContext β agents actively investigate |
| **8 Specialized Agents** | 3 investigators, prosecutor, defense, rebuttal, judge, verdict report, expert witness |
| **ReACT Engine** | Custom Reason-Act-Observe loop via LiteLLM function calling with GLM-5 |
| **Code Dependency Graph** | AST-based (Python + JS), with call-chain tracing and impact analysis |
| **Parallel Evidence Scanning** | ThreadPoolExecutor for GritQL patterns β 4x faster than sequential |
| **Rate-Limit Resilience** | Exponential backoff retry on 429 errors β survives API rate limits |
| **Pipeline Persistence** | State saved to JSON, runs can resume after interruption |
| **Deduplication** | Same file+line merged into one finding with multiple categories |
| **Zip Safety** | Zip-slip attack prevention |
| **Streaming UI** | Real-time pipeline progress in Gradio Chatbot with phase indicators |
| **Export** | Markdown and PDF report generation |
---
## π§ͺ Testing
```bash
# Run evidence scan on test fixtures
code-tribunal tests/fixtures/locale/ --evidence-only
# Run Python tests
pytest tests/
```
Test fixtures in `tests/fixtures/locale/` contain deliberately bad Python and JavaScript code with:
- Hardcoded passwords, API keys, AWS secrets, Stripe keys, JWT secrets
- SQL injection via f-strings and template literals
- `eval()`, `pickle.load()`, `os.system()`, `subprocess.call(shell=True)`
- MD5 hashing
- TODO, FIXME, HACK comments
---
---
<div align="center">
Built for the [Build with GLM 5.1](https://build-with-glm-5-1-challenge.devpost.com) hackathon.
> > > > > > > b4fcdee (feat: Add initial CodeTribunal implementation)
</div>
|