NinjainPJs
/

ninja-code-guard


NinjainPJs committed on Mar 20

Commit

4b445f6

0 Parent(s):

initial - commit

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

.env.example +27 -0
.github/workflows/ci.yml +31 -0
.github/workflows/prewarm.yml +14 -0
.gitignore +67 -0
PROJECT_PLAN.md +704 -0
README.md +161 -0
app/__init__.py +0 -0
app/agents/__init__.py +0 -0
app/agents/base_agent.py +295 -0
app/agents/performance_agent.py +44 -0
app/agents/security_agent.py +107 -0
app/agents/style_agent.py +43 -0
app/agents/synthesizer.py +291 -0
app/config.py +40 -0
app/context/__init__.py +0 -0
app/context/embedder.py +126 -0
app/context/indexer.py +127 -0
app/context/retriever.py +116 -0
app/db/__init__.py +0 -0
app/db/postgres.py +144 -0
app/db/redis_cache.py +121 -0
app/github/__init__.py +0 -0
app/github/auth.py +135 -0
app/github/client.py +362 -0
app/github/comment_formatter.py +215 -0
app/github/webhook.py +84 -0
app/main.py +355 -0
app/models/__init__.py +0 -0
app/models/findings.py +55 -0
app/models/webhook_payloads.py +55 -0
app/services/__init__.py +0 -0
app/services/health_score.py +85 -0
app/tools/__init__.py +0 -0
app/tools/bandit_tool.py +173 -0
app/tools/detect_secrets_tool.py +118 -0
app/tools/linter_tool.py +113 -0
app/tools/radon_tool.py +107 -0
dashboard/.gitignore +41 -0
dashboard/AGENTS.md +5 -0
dashboard/CLAUDE.md +1 -0
dashboard/README.md +36 -0
dashboard/app/favicon.ico +0 -0
dashboard/app/globals.css +152 -0
dashboard/app/layout.tsx +104 -0
dashboard/app/page.tsx +291 -0
dashboard/app/repos/[owner]/[repo]/page.tsx +170 -0
dashboard/app/repos/[owner]/[repo]/prs/[number]/page.tsx +168 -0
dashboard/components/AgentBreakdown.tsx +113 -0
dashboard/components/AnimatedCounter.tsx +44 -0
dashboard/components/FindingsTable.tsx +185 -0

.env.example ADDED

	@@ -0,0 +1,27 @@

+# === LLM APIs ===
+GROQ_API_KEY=gsk_your_groq_api_key_here
+GEMINI_API_KEY=AIza_your_gemini_api_key_here
+# === GitHub App ===
+GITHUB_APP_ID=123456
+GITHUB_APP_PRIVATE_KEY_PATH=./keys/app.pem
+GITHUB_WEBHOOK_SECRET=your_webhook_secret_here
+# === Database ===
+DATABASE_URL=postgresql://user:pass@host.neon.tech/sentinel_ai?sslmode=require
+# === Redis Cache ===
+UPSTASH_REDIS_URL=rediss://default:your_token@your-endpoint.upstash.io:6379
+# === Embedding Model ===
+EMBEDDING_MODEL=all-MiniLM-L6-v2
+# === App Config ===
+ENVIRONMENT=development
+LOG_LEVEL=INFO
+CONFIDENCE_THRESHOLD=0.6
+MAX_REPO_FILES_INDEX=500
+# === Security ===
+DASHBOARD_API_KEY=generate-a-random-key-here
+CORS_ALLOWED_ORIGINS=http://localhost:3000

.github/workflows/ci.yml ADDED

	@@ -0,0 +1,31 @@

+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: pip install -r requirements-dev.txt
+      - name: Lint with ruff
+        run: ruff check app/ tests/
+      - name: Type check with mypy
+        run: mypy app/ --ignore-missing-imports
+        continue-on-error: true
+      - name: Run tests
+        run: pytest tests/ -v --tb=short

.github/workflows/prewarm.yml ADDED

	@@ -0,0 +1,14 @@

+name: Pre-warm Render
+on:
+  schedule:
+    # Ping every 10 minutes on weekdays (Mon-Fri), 06:00-20:59 UTC
+    - cron: "*/10 6-20 * * 1-5"
+jobs:
+  ping:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Ping health endpoint
+        run: |
+          curl -sf "${{ secrets.RENDER_HEALTH_URL }}/health" || echo "Service cold — will wake on next request"

.gitignore ADDED

	@@ -0,0 +1,67 @@

+# Project planning docs (confidential)
+*.pdf
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+# Virtual environments
+.venv/
+venv/
+env/
+# Environment variables
+.env
+.env.local
+.env.production
+# Keys & secrets
+keys/
+*.pem
+*.key
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# ChromaDB persistence
+chroma_data/
+chromadb/
+# Test & coverage
+.pytest_cache/
+htmlcov/
+.coverage
+coverage.xml
+# Node (dashboard)
+dashboard/node_modules/
+dashboard/.next/
+dashboard/out/
+# Render
+.render/
+# Claude Code
+.claude/
+# Screenshots (local only)
+*.png
+# Misc
+*.log
+*.tmp

PROJECT_PLAN.md ADDED

	@@ -0,0 +1,704 @@

+# CodeProbe — Complete Project Plan & Progress Tracker
+> **Multi-Agent Code Review System**
+> Author: Ninjacode911 | Started: March 2026 | Target: 10 Weeks
+---
+## Table of Contents
+1. [Project Overview](#1-project-overview)
+2. [Architecture Deep Dive](#2-architecture-deep-dive)
+3. [Complete Tech Stack](#3-complete-tech-stack)
+4. [Directory Structure](#4-directory-structure)
+5. [Week-by-Week Implementation Plan](#5-week-by-week-implementation-plan)
+6. [Non-Coding Tasks](#6-non-coding-tasks)
+7. [GPU / WSL Tasks](#7-gpu--wsl-tasks)
+8. [Data Models & Schemas](#8-data-models--schemas)
+9. [API Endpoints](#9-api-endpoints)
+10. [Agent Prompt Design](#10-agent-prompt-design)
+11. [Evaluation Plan](#11-evaluation-plan)
+12. [Deployment Checklist](#12-deployment-checklist)
+13. [Progress Tracker](#13-progress-tracker)
+---
+## 1. Project Overview
+**What:** A multi-agent PR review system that reviews GitHub pull requests using 4 specialized LangChain agents (Security, Performance, Style, Synthesizer), posts inline GitHub comments, and tracks code health via a Next.js dashboard.
+**Why:** AI-generated code (41% of GitHub commits) introduces 1.7x more issues. Existing tools use single-pass LLM calls. Sentinel AI uses domain-specialized agents with debate/consensus, RAG context, and static analysis tools.
+**Core Thesis:** Separate security, performance, and style review into specialized agents — each with distinct prompts, tools, and context — then merge via a Synthesizer into a coherent, ranked, deduplicated review.
+**Key Differentiators:**
+- Multi-agent specialization (3 domain + 1 synthesizer)
+- Debate & consensus protocol (agents challenge each other before synthesis)
+- Repo-aware RAG context (ChromaDB indexes full repo, not just diff)
+- $0/month architecture (all free tiers)
+- Structured severity scoring (Critical/High/Medium/Low with CWE IDs)
+- Auto-fix suggestions (corrected code snippets inline)
+---
+## 2. Architecture Deep Dive
+### 2.1 Four Layers
+```
+┌─────────────────────────────────────────────────────┐
+│  GITHUB LAYER                                       │
+│  Webhooks · PR Events · Inline Comments             │
+└──────────────────────┬──────────────────────────────┘
+                       │ pull_request webhook
+┌──────────────────────▼──────────────────────────────┐
+│  ORCHESTRATION LAYER (FastAPI on Render)             │
+│  Webhook receiver · HMAC validation · Redis cache    │
+│  Agent dispatcher · GitHub API client                │
+└──────────────────────┬──────────────────────────────┘
+                       │ asyncio.gather()
+┌──────────────────────▼──────────────────────────────┐
+│  AGENT LAYER (LangChain ReAct Agents)               │
+│  ┌──────────┐ ┌──────────────┐ ┌─────────┐         │
+│  │ Security │ │ Performance  │ │  Style  │ PARALLEL │
+│  │  Agent   │ │    Agent     │ │  Agent  │          │
+│  └────┬─────┘ └──────┬───────┘ └────┬────┘         │
+│       └──────────────┼───────────────┘              │
+│                      ▼                               │
+│            ┌──────────────────┐                      │
+│            │  Synthesizer     │  SEQUENTIAL           │
+│            │  Agent           │                      │
+│            └──────────────────┘                      │
+└──────────────────────┬──────────────────────────────┘
+                       │
+┌──────────────────────▼──────────────────────────────┐
+│  KNOWLEDGE LAYER                                     │
+│  ChromaDB (vector store) · Upstash Redis (cache)     │
+│  Neon Postgres (history) · sentence-transformers     │
+└─────────────────────────────────────────────────────┘
+```
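The parallel-then-sequential shape in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `run_agent` and `review` are hypothetical names, and the agent body is a stand-in for the real LLM and static-tool calls.

```python
import asyncio


async def run_agent(name: str, diff: str) -> list[dict]:
    # Hypothetical stand-in for a real agent (LLM call + static tools).
    await asyncio.sleep(0)
    return [{"agent": name, "title": f"{name} finding"}]


async def review(diff: str) -> dict:
    names = ("security", "performance", "style")
    # Run the three domain agents concurrently. return_exceptions=True keeps
    # one failing agent from discarding the results of the other two.
    results = await asyncio.gather(
        *(run_agent(name, diff) for name in names),
        return_exceptions=True,
    )
    findings: list[dict] = []
    failed: list[str] = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            failed.append(name)
        else:
            findings.extend(result)
    # The synthesizer step would run here, sequentially, on `findings`.
    return {"findings": findings, "failed_agents": failed}
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping against `names` is safe.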
+### 2.2 Data Flow (11 Steps)
+1. GitHub fires `pull_request` webhook → Render FastAPI endpoint
+2. FastAPI validates HMAC-SHA256 signature (GitHub App secret)
+3. Check Upstash Redis: commit SHA already reviewed? → return cached
+4. Fetch via GitHub API: PR diff, changed files, full contents, commit history
+5. Build repo context: embed chunks with sentence-transformers → upsert ChromaDB
+6. Dispatch 3 parallel agents: `asyncio.gather(security, performance, style)`
+7. Each agent: system prompt + RAG context → Groq API → static tools → typed findings
+8. Synthesizer: deduplicate + resolve conflicts + Health Score + executive summary
+9. GitHub API: post inline comment per finding + PR summary comment
+10. Write review to Neon Postgres + set Redis cache (TTL: 7 days)
+11. Next.js dashboard fetches from Neon and updates Health Score chart
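Step 2 of the flow above (HMAC-SHA256 validation) might look something like the sketch below. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` computed over the raw request body. The function name `verify_github_signature` is an assumption for illustration and is not taken from `app/github/webhook.py`.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(
    secret: str, body: bytes, signature_header: Optional[str]
) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The check has to run against the raw bytes of the request, before any JSON parsing, since re-serializing the payload can change the byte sequence and invalidate the signature.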
+### 2.3 Context Loading (5 Layers per Agent)
+1. Raw PR diff (changed lines, file paths, additions/deletions)
+2. Relevant file sections from full repo (ChromaDB semantic search on diff)
+3. Recent commit history for changed files (pattern detection)
+4. Repo configuration (language, framework, linter rules, test coverage)
+5. Domain-specific knowledge base (OWASP Top 10, DDIA patterns, style guides)
+---
+## 3. Complete Tech Stack
+### 3.1 LLM & AI
+| Tool | Free Tier | Purpose |
+|------|-----------|---------|
+| **Groq API** (Llama-3.1-70B) | 14,400 req/day, 500 tok/sec | Primary LLM for all agents |
+| **Gemini 1.5 Flash** | 1M tokens/day | Fallback when Groq exhausted |
+| **LangChain** | OSS | Agent orchestration, LCEL, ReAct framework |
+| **sentence-transformers** | Local (GPU) | Embeddings for ChromaDB — runs on RTX 5070 via WSL |
+### 3.2 Backend & APIs
+| Tool | Free Tier | Purpose |
+|------|-----------|---------|
+| **FastAPI** | OSS | Webhook receiver, agent dispatcher, REST API |
+| **Render.com** | Free web service | Hosts backend (30s cold start after 15min idle) |
+| **GitHub Apps API** | Free | Webhooks, PR comments, file fetching |
+| **Upstash Redis** | 10K req/day | Cache PR analysis by commit SHA |
+| **Neon.tech** | Free Postgres 512MB | Review history, Health Score trends |
+### 3.3 Knowledge & Static Analysis
+| Tool | Free Tier | Purpose |
+|------|-----------|---------|
+| **ChromaDB** | OSS, in-memory/persisted | Vector store for RAG context retrieval |
+| **Semgrep OSS** | Free, 3K+ rules | SAST rules for Security Agent |
+| **Bandit** | Free | Python AST security analysis |
+| **detect-secrets** | Free | Credential/API key scanning |
+| **radon** | Free | Cyclomatic complexity & maintainability index |
+| **pylint/ESLint/Ruff** | Free | Linting for Style Agent |
+### 3.4 Frontend & Deployment
+| Tool | Free Tier | Purpose |
+|------|-----------|---------|
+| **Vercel** | Free hobby tier | Hosts Next.js dashboard |
+| **Next.js** | OSS | Dashboard UI |
+| **Recharts** | OSS | Health Score trend charts, pie charts |
+| **GitHub Actions** | 2K min/month | CI/CD for Sentinel AI itself |
+---
+## 4. Directory Structure
+```
+sentinel-ai/
+├── app/
+│   ├── __init__.py
+│   ├── main.py                    # FastAPI app, webhook endpoint, lifespan
+│   ├── config.py                  # Settings via pydantic-settings (env vars)
+│   ├── agents/
+│   │   ├── __init__.py
+│   │   ├── base_agent.py          # Shared agent interface / base class
+│   │   ├── security_agent.py      # Security ReAct agent
+│   │   ├── performance_agent.py   # Performance ReAct agent
+│   │   ├── style_agent.py         # Style & Maintainability agent
+│   │   └── synthesizer.py         # Synthesizer + Health Score + dedup
+│   ├── tools/
+│   │   ├── __init__.py
+│   │   ├── semgrep_tool.py        # LangChain tool wrapper for Semgrep
+│   │   ├── bandit_tool.py         # LangChain tool wrapper for Bandit
+│   │   ├── detect_secrets_tool.py # Credential scanner tool
+│   │   ├── radon_tool.py          # Complexity metrics tool
+│   │   ├── ast_analyzer.py        # Python AST analysis (N+1, patterns)
+│   │   └── linter_tool.py         # Ruff/ESLint/pylint subprocess tool
+│   ├── context/
+│   │   ├── __init__.py
+│   │   ├── embedder.py            # sentence-transformers embedding pipeline
+│   │   ├── indexer.py             # ChromaDB repo indexer (upsert chunks)
+│   │   └── retriever.py           # RAG retriever (query ChromaDB for context)
+│   ├── github/
+│   │   ├── __init__.py
+│   │   ├── webhook.py             # Webhook validation (HMAC-SHA256)
+│   │   ├── client.py              # GitHub API client (fetch diff, post comments)
+│   │   └── comment_formatter.py   # Format findings as GitHub Markdown comments
+│   ├── models/
+│   │   ├── __init__.py
+│   │   ├── findings.py            # Finding, PRReview Pydantic schemas
+│   │   └── webhook_payloads.py    # GitHub webhook event schemas
+│   ├── db/
+│   │   ├── __init__.py
+│   │   ├── postgres.py            # Neon Postgres connection + queries
+│   │   └── redis_cache.py         # Upstash Redis cache logic
+│   └── services/
+│       ├── __init__.py
+│       ├── orchestrator.py        # Main orchestration: dispatch agents, synthesize
+│       └── health_score.py        # Health Score calculation formula
+├── dashboard/                     # Next.js app (deployed to Vercel)
+│   ├── package.json
+│   ├── next.config.js
+│   ├── tsconfig.json
+│   ├── app/
+│   │   ├── layout.tsx
+│   │   ├── page.tsx               # / — Repository Overview
+│   │   ├── repos/
+│   │   │   └── [owner]/
+│   │   │       └── [repo]/
+│   │   │           ├── page.tsx   # Repo Detail (trends, charts)
+│   │   │           └── prs/
+│   │   │               └── [number]/
+│   │   │                   └── page.tsx  # PR Review Detail
+│   │   └── api/
+│   │       ├── repos/
+│   │       │   └── route.ts       # API proxy to FastAPI backend
+│   │       └── health/
+│   │           └── route.ts
+│   ├── components/
+│   │   ├── HealthScoreRing.tsx    # Circular gauge 0-100
+│   │   ├── FindingsTable.tsx      # Sortable, filterable findings
+│   │   ├── TrendChart.tsx         # Recharts LineChart
+│   │   ├── AgentBreakdown.tsx     # 3-column agent summary cards
+│   │   ├── SeverityBadge.tsx      # Color-coded severity pill
+│   │   └── Navbar.tsx
+│   └── lib/
+│       ├── api.ts                 # Fetch wrapper for backend API
+│       └── types.ts               # TypeScript types matching backend schemas
+├── tests/
+│   ├── __init__.py
+│   ├── conftest.py                # Shared fixtures
+│   ├── unit/
+│   │   ├── test_findings_schema.py
+│   │   ├── test_synthesizer_dedup.py
+│   │   ├── test_webhook_validation.py
+│   │   ├── test_redis_cache.py
+│   │   └── test_health_score.py
+│   ├── integration/
+│   │   ├── test_full_pipeline.py
+│   │   └── test_github_posting.py
+│   └── eval/
+│       ├── dataset/               # 20-PR benchmark dataset (JSON fixtures)
+│       ├── run_eval.py            # Evaluation harness
+│       └── metrics.py             # Precision, recall, latency tracking
+├── prompts/
+│   ├── security_system.md         # Security Agent system prompt
+│   ├── performance_system.md      # Performance Agent system prompt
+│   ├── style_system.md            # Style Agent system prompt
+│   └── synthesizer_system.md      # Synthesizer system prompt
+├── knowledge/
+│   ├── owasp_top10_2025.md        # OWASP cheat sheet for Security RAG
+│   ├── ddia_patterns.md           # DDIA patterns for Performance RAG
+│   └── style_guides/              # Language style guides for Style RAG
+├── .env.example                   # Template for env vars (no secrets)
+├── .gitignore
+├── requirements.txt               # Python dependencies
+├── requirements-dev.txt           # Dev/test dependencies
+├── render.yaml                    # Render deployment config
+├── sentinel.yml.example           # Per-repo config template
+├── Dockerfile                     # For Render deployment
+├── pyproject.toml                 # Project metadata + tool configs
+└── README.md                      # Installation, usage, architecture docs
+```
+---
+## 5. Week-by-Week Implementation Plan
+### WEEK 1: Foundation & Setup
+**Goal:** Project skeleton running locally, all external services provisioned.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 1.1 | Initialize git repo, create directory structure | Code | [ ] |
+| 1.2 | Set up Python virtual environment + requirements.txt | Code | [ ] |
+| 1.3 | Register GitHub App (github.com/settings/apps/new) | Config | [ ] |
+| 1.4 | Provision Neon.tech Postgres database + create `pr_reviews` table | Config | [ ] |
+| 1.5 | Provision Upstash Redis instance | Config | [ ] |
+| 1.6 | Get Groq API key (console.groq.com) | Config | [ ] |
+| 1.7 | Get Gemini API key (aistudio.google.com) | Config | [ ] |
+| 1.8 | Create FastAPI skeleton (`app/main.py`) with health endpoint | Code | [ ] |
+| 1.9 | Create `app/config.py` with pydantic-settings (all env vars) | Code | [ ] |
+| 1.10 | Create Pydantic models (`Finding`, `PRReview` schemas) | Code | [ ] |
+| 1.11 | Set up .env.example, .gitignore, pyproject.toml | Code | [ ] |
+| 1.12 | Deploy FastAPI skeleton to Render (verify /health works) | Deploy | [ ] |
+| 1.13 | Write unit tests for Finding schema validation | Test | [ ] |
+| 1.14 | Set up GitHub Actions CI (lint + test on push) | CI/CD | [ ] |
+### WEEK 2: GitHub Integration
+**Goal:** Receive webhooks, validate signatures, fetch PR data, post dummy comment.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 2.1 | Implement HMAC-SHA256 webhook validation (`app/github/webhook.py`) | Code | [ ] |
+| 2.2 | Implement GitHub API client — fetch PR diff (`app/github/client.py`) | Code | [ ] |
+| 2.3 | Implement GitHub API client — fetch file contents | Code | [ ] |
+| 2.4 | Implement GitHub API client — fetch commit history | Code | [ ] |
+| 2.5 | Implement GitHub API client — post inline review comments | Code | [ ] |
+| 2.6 | Implement GitHub API client — post PR summary comment | Code | [ ] |
+| 2.7 | Create webhook endpoint (`POST /webhook/github`) in main.py | Code | [ ] |
+| 2.8 | Implement comment formatter (`app/github/comment_formatter.py`) | Code | [ ] |
+| 2.9 | Set up ngrok for local webhook testing | Config | [ ] |
+| 2.10 | End-to-end test: open PR on test repo → dummy comment posted | Test | [ ] |
+| 2.11 | Implement Redis cache check (skip if commit SHA already reviewed) | Code | [ ] |
+| 2.12 | Write unit tests for HMAC validation (valid + invalid signatures) | Test | [ ] |
+| 2.13 | Write unit tests for Redis cache hit/miss logic | Test | [ ] |
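The cache check in task 2.11 can be illustrated with an in-memory stand-in. This is only a sketch of the hit/miss/expiry logic: the real project targets Upstash Redis, and the class name `ReviewCache`, the key format, and the injectable clock are all assumptions made for the example. The 7-day TTL comes from the data flow in section 2.2.

```python
import time
from typing import Callable, Optional

SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL from the plan, in seconds


class ReviewCache:
    """In-memory stand-in for a Redis cache keyed by repo and commit SHA."""

    def __init__(
        self,
        ttl_seconds: int = SEVEN_DAYS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict = {}

    @staticmethod
    def key(repo: str, commit_sha: str) -> str:
        # Hypothetical key layout; any stable, unique format would do.
        return f"review:{repo}:{commit_sha}"

    def get(self, repo: str, commit_sha: str) -> Optional[dict]:
        k = self.key(repo, commit_sha)
        entry = self._store.get(k)
        if entry is None:
            return None
        expires_at, review = entry
        if self._clock() >= expires_at:
            del self._store[k]  # expired: treat as a miss
            return None
        return review

    def set(self, repo: str, commit_sha: str, review: dict) -> None:
        k = self.key(repo, commit_sha)
        self._store[k] = (self._clock() + self._ttl, review)
```

Injecting the clock makes the expiry path testable without sleeping, which fits the unit tests planned in task 2.13.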
+### WEEK 3: Security Agent v1
+**Goal:** Security Agent analyzes diffs, returns structured findings with CWE IDs.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 3.1 | Install & configure Semgrep OSS with security rulesets | Config | [ ] |
+| 3.2 | Create Semgrep LangChain tool (`app/tools/semgrep_tool.py`) | Code | [ ] |
+| 3.3 | Install & configure Bandit for Python AST security analysis | Config | [ ] |
+| 3.4 | Create Bandit LangChain tool (`app/tools/bandit_tool.py`) | Code | [ ] |
+| 3.5 | Install & configure detect-secrets | Config | [ ] |
+| 3.6 | Create detect-secrets LangChain tool (`app/tools/detect_secrets_tool.py`) | Code | [ ] |
+| 3.7 | Write Security Agent system prompt (`prompts/security_system.md`) | Prompt | [ ] |
+| 3.8 | Prepare OWASP Top 10 (2025) knowledge base (`knowledge/owasp_top10_2025.md`) | Data | [ ] |
+| 3.9 | Implement Security Agent ReAct loop (`app/agents/security_agent.py`) | Code | [ ] |
+| 3.10 | Implement base agent interface (`app/agents/base_agent.py`) | Code | [ ] |
+| 3.11 | Set up Groq LLM client via LangChain (`ChatGroq`) | Code | [ ] |
+| 3.12 | Implement structured output parsing (JSON → Finding objects) | Code | [ ] |
+| 3.13 | Create 10 synthetic security-vulnerable PRs for testing | Data | [ ] |
+| 3.14 | Evaluate Security Agent on synthetic dataset — measure precision/recall | Eval | [ ] |
+| 3.15 | Iterate on system prompt based on eval results | Prompt | [ ] |
+### WEEK 4: Performance Agent v1
+**Goal:** Performance Agent detects N+1 queries, complexity issues, returns findings.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 4.1 | Create Python AST analyzer tool (`app/tools/ast_analyzer.py`) | Code | [ ] |
+| 4.2 | Implement N+1 query pattern detector (Django/SQLAlchemy ORM patterns) | Code | [ ] |
+| 4.3 | Create radon complexity tool (`app/tools/radon_tool.py`) | Code | [ ] |
+| 4.4 | Write Performance Agent system prompt (`prompts/performance_system.md`) | Prompt | [ ] |
+| 4.5 | Prepare DDIA patterns knowledge base (`knowledge/ddia_patterns.md`) | Data | [ ] |
+| 4.6 | Implement Performance Agent ReAct loop (`app/agents/performance_agent.py`) | Code | [ ] |
+| 4.7 | Fetch 10 Django PRs with known performance issues for testing | Data | [ ] |
+| 4.8 | Evaluate Performance Agent on Django PR dataset | Eval | [ ] |
+| 4.9 | Iterate on system prompt based on eval results | Prompt | [ ] |
+### WEEK 5: Style Agent v1
+**Goal:** Style Agent checks naming, complexity, dead code, test coverage gaps.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 5.1 | Create linter tool wrapper — Ruff/ESLint/pylint (`app/tools/linter_tool.py`) | Code | [ ] |
+| 5.2 | Implement dead code detector (unused imports, unreachable branches) | Code | [ ] |
+| 5.3 | Write Style Agent system prompt (`prompts/style_system.md`) | Prompt | [ ] |
+| 5.4 | Prepare language style guides knowledge base (`knowledge/style_guides/`) | Data | [ ] |
+| 5.5 | Implement Style Agent ReAct loop (`app/agents/style_agent.py`) | Code | [ ] |
+| 5.6 | Fetch 10 Exercism PRs with style/refactoring issues | Data | [ ] |
+| 5.7 | Evaluate Style Agent on Exercism dataset | Eval | [ ] |
+| 5.8 | Iterate on system prompt based on eval results | Prompt | [ ] |
+### WEEK 6: ChromaDB + RAG Context
+**Goal:** Full RAG pipeline — embed repo, retrieve context, inject into agents.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 6.1 | Set up sentence-transformers embedding pipeline (`app/context/embedder.py`) | Code | [ ] |
+| 6.2 | **Run embedding model on RTX 5070 via WSL** — benchmark speed | GPU | [ ] |
+| 6.3 | Implement ChromaDB repo indexer (`app/context/indexer.py`) — chunk files, upsert | Code | [ ] |
+| 6.4 | Implement RAG retriever (`app/context/retriever.py`) — query by diff content | Code | [ ] |
+| 6.5 | Integrate RAG context into Security Agent | Code | [ ] |
+| 6.6 | Integrate RAG context into Performance Agent | Code | [ ] |
+| 6.7 | Integrate RAG context into Style Agent | Code | [ ] |
+| 6.8 | Evaluate: does cross-file RAG context improve recall vs. diff-only? | Eval | [ ] |
+| 6.9 | Optimize chunk size and retrieval top-k for quality vs. latency | Code | [ ] |
+| 6.10 | Limit repo index to 500 most recently changed files (Render memory constraint) | Code | [ ] |
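For the chunking half of task 6.3, one simple option is fixed-size line windows with overlap, sketched below. The function name `chunk_lines` and the defaults (40 lines, 10 overlapping) are assumptions for illustration; the plan leaves chunk size to be tuned in task 6.9.

```python
def chunk_lines(text: str, chunk_size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping windows of lines with 1-based line ranges."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    lines = text.splitlines()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append(
            {
                "start_line": start + 1,
                "end_line": start + len(window),
                "text": "\n".join(window),
            }
        )
        # Stop once a window has reached the end of the file, so the tail
        # is not emitted again as a smaller, fully redundant chunk.
        if start + chunk_size >= len(lines):
            break
    return chunks
```

Keeping the line range with each chunk lets retrieved context be mapped back to file positions when a finding is posted as an inline comment.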
+### WEEK 7: Synthesizer Agent
+**Goal:** Deduplication, conflict resolution, Health Score, executive summary, full pipeline.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 7.1 | Write Synthesizer system prompt (`prompts/synthesizer_system.md`) | Prompt | [ ] |
+| 7.2 | Implement deduplication logic (cosine similarity on findings via ChromaDB) | Code | [ ] |
+| 7.3 | Implement severity conflict resolution (Security > Performance > Style precedence) | Code | [ ] |
+| 7.4 | Implement composite re-ranking: severity × exploitability × fix_complexity | Code | [ ] |
+| 7.5 | Implement PR Health Score formula (0-100) (`app/services/health_score.py`) | Code | [ ] |
+| 7.6 | Implement executive summary generation (3-5 sentences) | Code | [ ] |
+| 7.7 | Implement auto-block logic (Critical findings → block merge recommendation) | Code | [ ] |
+| 7.8 | Implement Synthesizer Agent (`app/agents/synthesizer.py`) | Code | [ ] |
+| 7.9 | Build main orchestrator (`app/services/orchestrator.py`) — ties everything together | Code | [ ] |
+| 7.10 | Implement Gemini Flash fallback when Groq quota exhausted | Code | [ ] |
+| 7.11 | Full end-to-end pipeline test: PR → agents → synthesizer → GitHub comments | Test | [ ] |
+| 7.12 | Write unit tests for Health Score formula | Test | [ ] |
+| 7.13 | Write unit tests for deduplication with synthetic conflicting findings | Test | [ ] |
+| 7.14 | Implement Neon Postgres write (store review record) | Code | [ ] |
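The precedence rule in task 7.3 (Security > Performance > Style) could be applied along these lines. This is a sketch under stated assumptions: it treats two findings as conflicting when they touch overlapping line ranges in the same file, whereas the plan calls for cosine similarity via ChromaDB in task 7.2. The `Finding` dataclass here is a trimmed stand-in for the Pydantic model, and `resolve_conflicts` is a hypothetical name.

```python
from dataclasses import dataclass

# Lower number wins when findings from different agents conflict.
AGENT_PRECEDENCE = {"security": 0, "performance": 1, "style": 2}


@dataclass(frozen=True)
class Finding:
    agent: str
    file_path: str
    line_start: int
    line_end: int
    severity: str
    title: str


def overlaps(a: Finding, b: Finding) -> bool:
    # Assumption: same file and intersecting line ranges means "same spot".
    return (
        a.file_path == b.file_path
        and a.line_start <= b.line_end
        and b.line_start <= a.line_end
    )


def resolve_conflicts(findings: list) -> list:
    kept: list = []
    for f in sorted(findings, key=lambda x: AGENT_PRECEDENCE[x.agent]):
        shadowed = any(
            overlaps(f, k)
            and AGENT_PRECEDENCE[k.agent] < AGENT_PRECEDENCE[f.agent]
            for k in kept
        )
        if not shadowed:
            kept.append(f)
    return kept
```

Findings from the same agent never shadow each other here, so two distinct security issues on adjacent lines both survive.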
+### WEEK 8: Next.js Dashboard
+**Goal:** Dashboard on Vercel showing review history, Health Scores, charts.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 8.1 | Initialize Next.js app in `dashboard/` with TypeScript | Code | [ ] |
+| 8.2 | Deploy to Vercel (connect GitHub repo) | Deploy | [ ] |
+| 8.3 | Create TypeScript types matching backend schemas (`lib/types.ts`) | Code | [ ] |
+| 8.4 | Create API fetch wrapper (`lib/api.ts`) — calls FastAPI backend | Code | [ ] |
+| 8.5 | Build `HealthScoreRing` component (circular gauge, animated) | Code | [ ] |
+| 8.6 | Build `SeverityBadge` component (color-coded pills) | Code | [ ] |
+| 8.7 | Build `TrendChart` component (Recharts LineChart, 30-day trend) | Code | [ ] |
+| 8.8 | Build `FindingsTable` component (sortable, filterable) | Code | [ ] |
+| 8.9 | Build `AgentBreakdown` component (3-column cards) | Code | [ ] |
+| 8.10 | Build `/` page — Repository Overview (connected repos, avg scores) | Code | [ ] |
+| 8.11 | Build `/repos/[owner]/[repo]` page — Repo Detail (charts, PR list) | Code | [ ] |
+| 8.12 | Build `/repos/[owner]/[repo]/prs/[number]` page — PR Review Detail | Code | [ ] |
+| 8.13 | Add FastAPI CORS middleware for Vercel domain | Code | [ ] |
+| 8.14 | Implement REST API endpoints on FastAPI side for dashboard | Code | [ ] |
+### WEEK 9: Polish & Evaluation
+**Goal:** Full benchmark, prompt tuning, latency optimization, documentation.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 9.1 | Curate full 20-PR benchmark dataset (Django, Next.js, synthetic, Exercism) | Data | [ ] |
+| 9.2 | Build evaluation harness (`tests/eval/run_eval.py`) | Code | [ ] |
+| 9.3 | Run full benchmark — measure precision, recall, latency per agent | Eval | [ ] |
+| 9.4 | Tune agent prompts to reduce false positives (target: <30% FP rate) | Prompt | [ ] |
+| 9.5 | Implement confidence threshold: findings <0.6 shown as 'Suggestions' | Code | [ ] |
+| 9.6 | Latency optimization: measure p50/p95/p99 per PR size bucket | Eval | [ ] |
+| 9.7 | Optimize Groq API calls (reduce token usage, cache prompts) | Code | [ ] |
+| 9.8 | Write comprehensive README.md | Docs | [ ] |
+| 9.9 | Write installation guide in README | Docs | [ ] |
+| 9.10 | Add GitHub Actions pre-warm cron (ping /health every 10min) | CI/CD | [ ] |
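Task 9.5 amounts to partitioning findings by confidence. A minimal sketch, assuming findings are plain dicts with a `confidence` key and using the 0.6 default from `CONFIDENCE_THRESHOLD` in `.env.example`; the function name `split_by_confidence` is hypothetical.

```python
def split_by_confidence(findings: list, threshold: float = 0.6) -> tuple:
    """Return (confirmed, suggestions); findings below threshold are suggestions."""
    confirmed = [f for f in findings if f["confidence"] >= threshold]
    suggestions = [f for f in findings if f["confidence"] < threshold]
    return confirmed, suggestions
```

A finding at exactly 0.6 counts as confirmed here, matching the plan's wording that only findings below 0.6 are shown as suggestions.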
+### WEEK 10: Launch & Promotion
+**Goal:** Live on GitHub Marketplace, installed on public repos, launch posts published.
+| # | Task | Type | Status |
+|---|------|------|--------|
+| 10.1 | Install Sentinel AI on 3 public open-source repos | Launch | [ ] |
+| 10.2 | Record demo video (screen recording: PR opened → comments posted) | Content | [ ] |
+| 10.3 | Write Dev.to / HackerNews launch post | Content | [ ] |
+| 10.4 | Write LinkedIn demo post | Content | [ ] |
+| 10.5 | Submit to GitHub Marketplace (needs privacy policy, logo, description) | Launch | [ ] |
+| 10.6 | Create sentinel.yml.example per-repo config template | Code | [ ] |
+| 10.7 | Monitor first 48 hours — fix any production bugs | Ops | [ ] |
+---
+## 6. Non-Coding Tasks
+These tasks involve no project code but are essential to the project:
+### 6.1 External Service Provisioning
+| Service | Action | URL | Notes |
+|---------|--------|-----|-------|
+| **GitHub App** | Register new app | github.com/settings/apps/new | Need: App ID, Private Key (.pem), Webhook Secret |
+| **Groq** | Get API key | console.groq.com | Free: 14,400 req/day |
+| **Google AI Studio** | Get Gemini key | aistudio.google.com | Free: 1M tokens/day |
+| **Neon.tech** | Create Postgres DB | console.neon.tech | Free: 512MB, create `pr_reviews` table |
+| **Upstash** | Create Redis instance | console.upstash.com | Free: 10K req/day |
+| **Render** | Create web service | dashboard.render.com | Free tier, connect GitHub repo |
+| **Vercel** | Create project | vercel.com/new | Free hobby tier, connect dashboard/ |
+| **ngrok** | Install for local testing | ngrok.com | Free: 1 tunnel |
+### 6.2 GitHub App Configuration
+**Permissions required:**
+- Pull requests: Read & Write
+- Contents: Read
+- Metadata: Read
+- Commit statuses: Write (optional)
+**Webhook events to subscribe:**
+- `pull_request` (opened, synchronize, reopened, ready_for_review)
+- `pull_request_review_comment` (for @sentinel-ai re-review)
+### 6.3 Data Curation Tasks
+| Dataset | Source | Count | Purpose |
+|---------|--------|-------|---------|
+| Synthetic security PRs | Hand-crafted | 10 PRs | SQL injection, XSS, IDOR, hardcoded secrets |
+| Django security PRs | github.com/django/django | 5 PRs | Real-world Python security fixes |
+| Next.js performance PRs | github.com/vercel/next.js | 5 PRs | JS/TS performance changes |
+| Exercism style PRs | github.com/exercism | 5 PRs | Naming, complexity, documentation issues |
+| Mixed benchmark set | All above | 20 PRs | Full evaluation benchmark |
+### 6.4 Knowledge Base Curation
+| Document | Source | For Agent |
+|----------|--------|-----------|
+| OWASP Top 10 (2025) | owasp.org | Security Agent RAG |
+| DDIA performance patterns | "Designing Data-Intensive Applications" | Performance Agent RAG |
+| Python style guide (PEP 8) | python.org | Style Agent RAG |
+| JavaScript style guide | Various (Airbnb, Google) | Style Agent RAG |
+| TypeScript best practices | typescript-eslint.io | Style Agent RAG |
+---
+## 7. GPU / WSL Tasks
+Your **RTX 5070** with WSL will be used for:
+### 7.1 sentence-transformers Embedding (Required)
+**No training needed** — these are pre-trained models used for embedding generation.
+```
+Model: all-MiniLM-L6-v2 (or all-mpnet-base-v2 for higher quality)
+Task: Embed code chunks for ChromaDB indexing
+Where: Runs locally during repo indexing (can also run on Render CPU, slower)
+GPU benefit: ~10-50x faster embedding generation vs CPU
+```
+**Setup steps:**
+1. Ensure CUDA toolkit installed in WSL (`nvidia-smi` should show RTX 5070)
+2. `pip install sentence-transformers torch` (with CUDA support)
+3. Benchmark: embed 1000 code chunks, measure time GPU vs CPU
+4. Decision: if embedding is fast enough on CPU, skip GPU for deployment simplicity
+### 7.2 Local LLM Testing (Optional, Recommended)
+Running a local LLM for testing avoids burning Groq API quota during development:
+```
+Model: Llama-3.1-8B-Instruct (via Ollama or vLLM)
+Task: Test agent prompts locally before hitting Groq
+GPU benefit: Full inference locally, no API calls, no quota burn
+```
+**Setup steps:**
+1. Install Ollama in WSL: `curl -fsSL https://ollama.com/install.sh | sh`
+2. Pull model: `ollama pull llama3.1:8b`
+3. Use for prompt iteration — switch to Groq (70B) for production quality
+### 7.3 What You Do NOT Need to Train
+| Item | Reason |
+|------|--------|
+| LLM (Llama-3.1-70B) | Used via Groq API — inference only, no fine-tuning |
+| sentence-transformers | Pre-trained model, no fine-tuning needed for code embeddings |
+| Semgrep/Bandit/radon | Rule-based tools, no ML training |
+| Agent prompts | Iterative prompt engineering, not model training |
+**Bottom line:** This project is an **inference and orchestration** project, not a training project. Your GPU is used for fast local embeddings and optional local LLM testing — no model training required.
+---
+## 8. Data Models & Schemas
+### 8.1 Finding (per agent output)
+```python
+class Finding(BaseModel):
+    agent: Literal['security', 'performance', 'style']
+    file_path: str              # e.g. 'src/auth/login.py'
+    line_start: int
+    line_end: int
+    severity: Literal['critical', 'high', 'medium', 'low']
+    category: str               # e.g. 'sql_injection', 'n+1_query', 'naming'
+    title: str                  # Short one-liner
+    description: str            # Full explanation
+    suggested_fix: str          # Corrected code snippet
+    cwe_id: Optional[str]       # For security findings (e.g. 'CWE-89')
+    confidence: float           # 0.0 – 1.0
+```
+### 8.2 SynthesizedReview (Synthesizer output)
+```python
+class SynthesizedReview(BaseModel):
+    health_score: int                        # 0-100
+    executive_summary: str                   # 3-5 sentences
+    recommendation: Literal['approve', 'request_changes', 'block']
+    findings: List[Finding]                  # Deduplicated, re-ranked
+    critical_count: int
+    high_count: int
+    medium_count: int
+    low_count: int
+    duration_ms: int
+```
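The four `*_count` fields can be derived directly from the deduplicated findings list. The following is a minimal sketch of that derivation using only the standard library; `MiniFinding` and `count_by_severity` are hypothetical names for illustration, not part of the repository.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITIES = ("critical", "high", "medium", "low")


@dataclass
class MiniFinding:
    """Hypothetical stand-in for Finding, carrying only the severity field."""
    severity: str


def count_by_severity(findings):
    """Return one zero-filled count per severity level, keyed like the schema fields."""
    counts = Counter(f.severity for f in findings)
    return {f"{level}_count": counts.get(level, 0) for level in SEVERITIES}
```

Zero-filling keeps the result aligned with the `critical_count` through `low_count` fields even when a severity level has no findings.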
+### 8.3 PR Review Record (Neon Postgres)
+```sql
+CREATE TABLE pr_reviews (
+    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    repo_full_name  TEXT NOT NULL,
+    pr_number       INT NOT NULL,
+    commit_sha      TEXT NOT NULL,
+    health_score    INT NOT NULL,
+    critical_count  INT DEFAULT 0,
+    high_count      INT DEFAULT 0,
+    medium_count    INT DEFAULT 0,
+    low_count       INT DEFAULT 0,
+    summary         TEXT,
+    findings        JSONB NOT NULL,
+    duration_ms     INT,
+    created_at      TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX idx_pr_reviews_repo ON pr_reviews(repo_full_name);
+CREATE INDEX idx_pr_reviews_sha ON pr_reviews(commit_sha);
+```
+---
+## 9. API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/webhook/github` | POST | Receive GitHub webhook, validate HMAC, enqueue analysis |
+| `/api/repos/{owner}/{repo}/reviews` | GET | Paginated PR review list + Health Score trend |
+| `/api/repos/{owner}/{repo}/reviews/{pr_number}` | GET | Full findings for specific PR |
+| `/api/repos/{owner}/{repo}/stats` | GET | Aggregate stats: avg score, top categories, 30-day trend |
+| `/api/repos/{owner}/{repo}/reanalyze/{pr_number}` | POST | Re-trigger analysis (bypass cache) |
+| `/health` | GET | Health check: agent status, Groq quota remaining |
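The webhook endpoint's HMAC validation can be sketched with the standard library. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` of the raw request body. The function names below are hypothetical and this is not necessarily how `app/github/webhook.py` implements it.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the signature string GitHub would send for this body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True only if the header matches the HMAC-SHA256 of the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking match position through timing differences
    return hmac.compare_digest(expected, signature_header)
```

The check must run against the raw request bytes, before any JSON parsing, because re-serializing the payload can change the byte sequence and invalidate the signature.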
+---
+## 10. Agent Prompt Design
+Each agent prompt must include:
+1. **Role definition** — who the agent is (e.g., "senior AppSec engineer")
+2. **Scope boundaries** — what to look for and what to ignore
+3. **Output schema** — exact JSON structure expected
+4. **Severity guidelines** — when to use Critical vs. High vs. Medium vs. Low
+5. **Confidence scoring** — how to self-assess confidence (0.0-1.0)
+6. **Examples** — 2-3 few-shot examples of good findings
+7. **Anti-patterns** — common false positives to avoid
+Prompts are stored in `prompts/` as Markdown files and loaded at agent initialization.
+---
+## 11. Evaluation Plan
+### 11.1 Metrics
+| Metric | Target | Formula |
+|--------|--------|---------|
+| Security precision | >70% | true_positives / (true_positives + false_positives) |
+| Performance recall | >60% | true_positives / (true_positives + false_negatives) |
+| Deduplication rate | >15% | duplicates_removed / total_findings |
+| e2e latency (p95) | <20s | Time from webhook to first comment posted |
+| Groq quota usage | <10K/day | Total API calls per day |
+| System uptime | >95% | (total_time - downtime) / total_time |
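The precision, recall, and deduplication formulas in the table translate directly into code. This is a small sketch with zero-division guards added as an assumption; the function names are illustrative and may not match `metrics.py`.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """true_positives / (true_positives + false_positives), or 0.0 when nothing was flagged."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """true_positives / (true_positives + false_negatives), or 0.0 when nothing was expected."""
    expected = true_positives + false_negatives
    return true_positives / expected if expected else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def deduplication_rate(duplicates_removed: int, total_findings: int) -> float:
    """duplicates_removed / total_findings, or 0.0 for an empty review."""
    return duplicates_removed / total_findings if total_findings else 0.0
```

For example, 7 true positives and 3 false positives give a precision of 0.7, which would sit exactly at the stated security target rather than above it.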
+### 11.2 Evaluation Harness
+Located in `tests/eval/`:
+- `dataset/` — 20 PRs as JSON fixtures (diff, expected findings, ground truth labels)
+- `run_eval.py` — Runs each PR through full pipeline, compares output vs ground truth
+- `metrics.py` — Computes precision, recall, F1, latency percentiles
+- Results logged to console + optionally to LangSmith (free tier)
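A latency percentile such as the p95 target can be computed with the nearest-rank method. This is one of several common percentile definitions, chosen here as an assumption; the `percentile` helper is hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so pct=0 still returns the minimum
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With only 20 PRs in the dataset, the p95 under this definition is the 19th-slowest run, so a single outlier can move it noticeably.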
+---
+## 12. Deployment Checklist
+### Render (FastAPI Backend)
+- [ ] `render.yaml` configured with build + start commands
+- [ ] Environment variables set in Render dashboard
+- [ ] Health check endpoint (`/health`) configured
+- [ ] Auto-deploy from `main` branch enabled
+### Vercel (Next.js Dashboard)
+- [ ] Connected to GitHub repo `dashboard/` directory
+- [ ] Environment variable: `NEXT_PUBLIC_API_URL` pointing to Render backend
+- [ ] Custom domain (optional)
+### GitHub App
+- [ ] App registered with correct permissions
+- [ ] Webhook URL set to Render endpoint (`/webhook/github`)
+- [ ] Private key (.pem) downloaded and stored securely
+- [ ] App installed on test repo for development
+### GitHub Actions
+- [ ] CI workflow: lint (ruff) + test (pytest) on push/PR
+- [ ] Pre-warm cron: ping /health every 10 minutes during working hours
+---
+## 13. Progress Tracker
+### Overall Status
+| Week | Milestone | Status | Notes |
+|------|-----------|--------|-------|
+| 1 | Foundation & Setup | COMPLETE | All services provisioned, project scaffolded |
+| 2 | GitHub Integration | COMPLETE | E2E tested: webhook → fetch → comment on PR #1 |
+| 3 | Security Agent v1 | COMPLETE | Bandit + Llama-3.3-70B, live-tested on PR #3, 4 findings |
+| 4 | Performance Agent v1 | COMPLETE | Radon complexity + Llama-3.3-70B, 3 findings on PR #4 |
+| 5 | Style Agent v1 | COMPLETE | Ruff linter + Llama-3.3-70B, 6 findings on PR #4 |
+| 6 | ChromaDB + RAG Context | COMPLETE | sentence-transformers + ChromaDB, integrated into all agents |
+| 7 | Synthesizer Agent | COMPLETE | Dedup, conflict resolution, Health Score formula, exec summary |
+| 8 | Next.js Dashboard | COMPLETE | Next.js + Tailwind + Recharts, mock data, all pages |
+| 9 | Polish & Evaluation | COMPLETE | Eval harness, metrics, README, DB persistence |
+| 10 | Launch & Promotion | COMPLETE | Render config, Vercel ready, API endpoints for dashboard |
+### Key Decisions Log
+| Date | Decision | Rationale |
+|------|----------|-----------|
+| 2026-03-19 | Project plan created | Starting from scratch, PDF spec as source of truth |
+| 2026-03-19 | Project renamed to "Ninja Code Guard" | User's personal branding choice |
+| 2026-03-19 | GitHub App: "Ninja's Code Guard" (ID: 3133457) | Registered and tested with live PR |
+| 2026-03-19 | Test repo: ninjacode911/codeguard-test | Used for e2e webhook testing |
+| 2026-03-19 | Fail-open pattern for Redis cache | Missing a review is worse than duplicating |
+| 2026-03-19 | Background tasks for webhook processing | GitHub's 10s timeout requires async processing |
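The fail-open decision above can be illustrated with a short sketch: any cache error is treated as a miss so the review still runs. `already_reviewed` and the `cache_get` callable are hypothetical names, standing in for whatever wrapper `app/db/redis_cache.py` provides.

```python
import logging

logger = logging.getLogger("cache")


def already_reviewed(cache_get, key: str) -> bool:
    """Return True only when the cache positively reports a prior review.

    cache_get is a hypothetical callable (for example a thin wrapper
    around a Redis GET) that returns None on a miss.
    """
    try:
        return cache_get(key) is not None
    except Exception as exc:
        # Fail open: a broken cache may cause a duplicate review,
        # which is preferred over silently skipping one.
        logger.warning("cache unavailable, failing open: %s", exc)
        return False
```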
+---
+*Last updated: 2026-03-19*

README.md ADDED

	@@ -0,0 +1,161 @@

+# Ninja Code Guard
+**Multi-agent code review system that reviews GitHub pull requests the way a senior engineering team would.**
+Three specialized AI agents — Security, Performance, and Style — analyze your code in parallel, then a Synthesizer merges their findings into a single, prioritized, non-overlapping review with inline GitHub comments.
+## How It Works
+```
+PR opened on GitHub
+        │
+        ▼
+   Webhook received ──→ HMAC-SHA256 validated
+        │
+        ▼
+   Redis cache check ──→ Skip if already reviewed
+        │
+        ▼
+   Fetch PR data ──→ Diff + full file contents
+        │
+        ▼
+   RAG Context ──→ Embed files → ChromaDB → Retrieve related code
+        │
+        ▼
+   ┌─────────────────────────────────────────┐
+   │     3 Agents run IN PARALLEL            │
+   │  🔒 Security  ⚡ Performance  ✏️ Style  │
+   │  Bandit+LLM    Radon+LLM     Ruff+LLM  │
+   └─────────────┬───────────────────────────┘
+                 │
+                 ▼
+   Synthesizer ──→ Deduplicate → Rank → Score → Summarize
+        │
+        ▼
+   Post to GitHub ──→ Inline comments + Summary with Health Score
+```
+## What Each Agent Does
+| Agent | Focus | Static Tools | Example Findings |
+|-------|-------|-------------|------------------|
+| 🔒 **Security** | Vulnerabilities, auth, secrets | Bandit, detect-secrets | SQL injection, hardcoded API keys, weak crypto |
+| ⚡ **Performance** | Efficiency, scalability | Radon complexity | N+1 queries, O(n²) loops, blocking I/O |
+| ✏️ **Style** | Readability, maintainability | Ruff linter | Unused imports, bad naming, dead code |
+| 🧠 **Synthesizer** | Merge & prioritize | — | Deduplication, conflict resolution, Health Score |
+## Tech Stack
+| Layer | Technology | Why |
+|-------|-----------|-----|
+| LLM | Groq (Llama-3.3-70B) | 500+ tokens/sec, free 14.4K req/day |
+| Agents | LangChain + Structured Output | Typed JSON responses, prompt templates |
+| Backend | FastAPI on Render | Async, auto OpenAPI docs, free tier |
+| Vector DB | ChromaDB + sentence-transformers | RAG context, semantic code search |
+| Cache | Upstash Redis | Prevent duplicate reviews |
+| Database | Neon Postgres | Review history, Health Score trends |
+| Dashboard | Next.js on Vercel | Review history, trend charts |
+| GitHub | GitHub App (webhooks) | Inline PR comments, bot identity |
+## Quick Start
+### Prerequisites
+- Python 3.11+
+- Groq API key (free at console.groq.com)
+- GitHub App (registered at github.com/settings/apps)
+### Setup
+```bash
+# Clone and setup
+git clone https://github.com/ninjacode911/ninja-code-guard
+cd ninja-code-guard
+python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+# Configure
+cp .env.example .env
+# Edit .env with your API keys
+# Run
+uvicorn app.main:app --reload --port 8000
+```
+### Environment Variables
+```env
+GROQ_API_KEY=gsk_...
+GITHUB_APP_ID=123456
+GITHUB_APP_PRIVATE_KEY_PATH=./keys/app.pem
+GITHUB_WEBHOOK_SECRET=...
+DATABASE_URL=postgresql://...
+UPSTASH_REDIS_URL=rediss://...
+```
+## Architecture
+**4 Layers:**
+- **GitHub Layer** — Webhooks, PR events, inline comments
+- **Orchestration Layer** — FastAPI, agent dispatch, asyncio.gather
+- **Agent Layer** — 3 domain agents + synthesizer (LangChain structured-output chains)
+- **Knowledge Layer** — ChromaDB (RAG), Redis (cache), Postgres (history)
+**Key Design Patterns:**
+- Template Method — All agents share a base class, override only prompt + tools
+- Structured Output — LLM constrained to return valid JSON (Pydantic schema)
+- Fail-Open Cache — If Redis is down, proceed with analysis
+- Background Tasks — Return 200 to GitHub immediately, review asynchronously
+- Parallel Execution — asyncio.gather runs 3 agents concurrently
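The parallel execution pattern can be sketched as follows. The `run_agents` helper and the name-to-coroutine mapping are assumptions for illustration; the actual orchestration lives in `app/main.py` and may differ. `return_exceptions=True` is added here as an extra safeguard so one failing agent cannot cancel the others.

```python
import asyncio


async def run_agents(agents, pr_data):
    """Run every agent concurrently and merge the findings of those that succeed.

    agents: hypothetical mapping of agent name -> async callable taking pr_data.
    """
    names = list(agents)
    results = await asyncio.gather(
        *(agents[name](pr_data) for name in names),
        return_exceptions=True,  # collect failures instead of raising on the first one
    )
    findings = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            continue  # skip the failed agent, keep the rest
        findings.extend(result)
    return findings
```

Because the agents are I/O-bound (each waits on an LLM call), total wall time under this pattern is close to that of the slowest agent rather than the sum of all three.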
+## Test Results
+```
+PR #4 on codeguard-test repo:
+  Security:    5 findings  (SQL injection, weak crypto, hardcoded secrets)
+  Performance: 3 findings  (O(n²) loop, blocking I/O, high complexity)
+  Style:       6 findings  (unused imports, magic numbers, bad naming)
+  Total:       14 findings
+  Health Score: 14/100
+  Latency:     ~13 seconds (after model load)
+```
+## Running Tests
+```bash
+pytest tests/unit/ -v
+```
+## Project Structure
+```
+app/
+  agents/          # Security, Performance, Style, Synthesizer
+  tools/           # Bandit, detect-secrets, Radon, Ruff wrappers
+  context/         # RAG pipeline (embedder, indexer, retriever)
+  github/          # Webhook validation, API client, comment formatter
+  models/          # Pydantic schemas (Finding, SynthesizedReview)
+  db/              # Redis cache, Postgres queries
+  services/        # Health Score calculator
+dashboard/         # Next.js frontend (Vercel)
+tests/             # Unit tests + evaluation harness
+prompts/           # Agent system prompts (Markdown)
+docs/              # Week-by-week documentation
+```
+## Documentation
+Detailed week-by-week documentation available in `docs/`:
+- [Week 1: Foundation & Setup](docs/WEEK1_FOUNDATION_AND_SETUP.md)
+- [Week 2: GitHub Integration](docs/WEEK2_GITHUB_INTEGRATION.md)
+- [Week 3: Security Agent](docs/WEEK3_SECURITY_AGENT.md)
+- [Week 4: Performance Agent](docs/WEEK4_PERFORMANCE_AGENT.md)
+- [Week 5: Style Agent](docs/WEEK5_STYLE_AGENT.md)
+- [Week 6: RAG & Parallel Execution](docs/WEEK6_RAG_AND_PARALLEL.md)
+## License
+MIT
+---
+Built by [ninjacode911](https://github.com/ninjacode911)

app/__init__.py ADDED

File without changes

app/agents/__init__.py ADDED

File without changes

app/agents/base_agent.py ADDED

	@@ -0,0 +1,295 @@

+"""
+Base Agent Interface
+=====================
+All domain agents (Security, Performance, Style) inherit from this base class.
+It provides shared infrastructure:
+1. **Groq LLM client** — ChatGroq configured with Llama-3.3-70B (`llama-3.3-70b-versatile`)
+2. **Structured output** — LLM returns typed Finding objects, not raw text
+3. **Error handling** — graceful fallback if the LLM call fails
+4. **Timing** — measures how long each agent takes (for latency metrics)
+Design pattern: Template Method
+- The base class defines the algorithm skeleton (receive diff → run tools → call LLM → return findings)
+- Subclasses override specific steps (system_prompt, run_static_analysis)
+- This prevents code duplication across 3 agents that follow the same flow
+Why LangChain?
+- Provides a unified interface across LLM providers (Groq, Gemini, OpenAI)
+- If Groq goes down, we swap to Gemini by changing one line
+- Structured output parsing is built in (with_structured_output)
+- Prompt templates with variable substitution
+"""
+from __future__ import annotations
+import time
+from abc import ABC, abstractmethod
+import structlog
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_groq import ChatGroq
+from pydantic import BaseModel, Field
+from app.config import settings
+from app.github.client import PRData
+from app.models.findings import Finding
+logger = structlog.get_logger()
+class AgentFindings(BaseModel):
+    """
+    Schema for the LLM's structured output.
+    By wrapping findings in a Pydantic model, we can use LangChain's
+    `with_structured_output()` which constrains the LLM to return
+    valid JSON matching this exact schema. No more parsing raw text!
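+    How with_structured_output() works, roughly (details vary by provider
+    and by the `method` argument):
+    1. It passes the JSON schema to the model, as a tool/function definition
+       or through the provider's JSON mode
+    2. It parses the model's response into an AgentFindings instance
+    3. Pydantic validates the parsed data against the schema
+    4. A validation failure raises an error; it is not retried automatically,
+       so retries must be added explicitly (e.g., with `.with_retry()`)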
+    """
+    findings: list[FindingOutput] = Field(
+        default_factory=list,
+        description="List of security/performance/style findings",
+    )
+class FindingOutput(BaseModel):
+    """
+    The schema we ask the LLM to produce for each finding.
+    This is slightly different from our internal Finding model because:
+    - The LLM doesn't know which agent it is (we add that after)
+    - Severity is a plain string here, so a slightly off value (e.g., "High")
+      can be normalized afterwards instead of failing validation
+    - We validate and convert to our Finding model post-LLM
+    Note: This class is defined AFTER AgentFindings, so AgentFindings
+    refers to it through a forward reference. Annotations are not evaluated
+    at class-definition time (`from __future__ import annotations`), and the
+    model_rebuild() call below resolves the reference.
+    """
+    file_path: str = Field(description="Path to the file (e.g., 'app.py')")
+    line_start: int = Field(description="Starting line number of the issue")
+    line_end: int = Field(description="Ending line number of the issue")
+    severity: str = Field(description="One of: critical, high, medium, low")
+    category: str = Field(description="Issue category (e.g., 'sql_injection', 'hardcoded_secret')")
+    title: str = Field(description="Short one-line title of the finding")
+    description: str = Field(description="Detailed explanation of the issue and its impact")
+    suggested_fix: str = Field(default="", description="Corrected code snippet")
+    cwe_id: str | None = Field(default=None, description="CWE ID if applicable (e.g., 'CWE-89')")
+    confidence: float = Field(description="Confidence score from 0.0 to 1.0")
+# Rebuild the model to resolve the forward reference
+AgentFindings.model_rebuild()
+class BaseAgent(ABC):
+    """
+    Abstract base class for all domain agents.
+    Subclasses must implement:
+    - agent_name: which agent this is ("security", "performance", "style")
+    - system_prompt: the detailed system prompt for the LLM
+    Subclasses may override:
+    - run_static_analysis(): optional static tools (Bandit, Semgrep, etc.)
+    Usage:
+        agent = SecurityAgent()
+        findings = await agent.review(pr_data)
+    """
+    def __init__(self):
+        """
+        Initialize the LLM client.
+        ChatGroq connects to Groq's API, which serves Llama-3.3-70B at
+        500+ tokens/sec, among the fastest hosted inference for open-weight LLMs.
+        This speed is critical: we need each agent to complete in 3-8 seconds
+        so the full review stays under 15 seconds.
+        Temperature=0.1: We want nearly deterministic output. Code review
+        should be consistent — the same code should get the same findings.
+        A small temperature (not 0) allows slight variation to avoid
+        getting stuck in repetitive patterns.
+        """
+        self.llm = ChatGroq(
+            model="llama-3.3-70b-versatile",
+            api_key=settings.groq_api_key,
+            temperature=0.1,
+            max_tokens=4096,
+        )
+    @property
+    @abstractmethod
+    def agent_name(self) -> str:
+        """The agent identifier: 'security', 'performance', or 'style'."""
+        ...
+    @property
+    @abstractmethod
+    def system_prompt(self) -> str:
+        """The full system prompt for this agent."""
+        ...
+    async def run_static_analysis(self, pr_data: PRData) -> str:
+        """
+        Run static analysis tools on the PR files.
+        Override in subclasses to run agent-specific tools:
+        - SecurityAgent: Bandit + detect-secrets
+        - PerformanceAgent: radon + AST analysis
+        - StyleAgent: Ruff/pylint
+        Returns a string summary of tool findings to include in the LLM prompt.
+        Default: no static analysis (LLM-only review).
+        """
+        return ""
+    def _build_prompt(self) -> ChatPromptTemplate:
+        """
+        Build the LangChain prompt template.
+        ChatPromptTemplate.from_messages() creates a multi-message prompt:
+        - ("system", ...) → the system message (agent persona + instructions)
+        - ("human", ...) → the user message (the actual PR data to review)
+        Variables in {curly_braces} are substituted at runtime with .ainvoke().
+        """
+        return ChatPromptTemplate.from_messages([
+            ("system", self.system_prompt),
+            ("human", (
+                "## PR Diff\n"
+                "```diff\n{diff}\n```\n\n"
+                "## Changed File Contents\n"
+                "{file_contents}\n\n"
+                "## Static Analysis Results\n"
+                "{static_analysis}\n\n"
+                "{rag_context}\n\n"
+                "Analyze this PR and return your findings as structured JSON."
+            )),
+        ])
+    def _convert_to_findings(self, agent_output: AgentFindings) -> list[Finding]:
+        """
+        Convert the LLM's output to our internal Finding model.
+        This adds the agent_name field and validates/clamps values:
+        - Severity is lowercased and validated
+        - Confidence is clamped to [0.0, 1.0]
+        - Invalid findings are skipped (not crashed on)
+        """
+        findings = []
+        for f in agent_output.findings:
+            try:
+                severity = f.severity.lower().strip()
+                if severity not in ("critical", "high", "medium", "low"):
+                    severity = "medium"  # Default for ambiguous severity
+                confidence = max(0.0, min(1.0, f.confidence))
+                finding = Finding(
+                    agent=self.agent_name,
+                    file_path=f.file_path,
+                    line_start=f.line_start,
+                    line_end=f.line_end,
+                    severity=severity,
+                    category=f.category,
+                    title=f.title,
+                    description=f.description,
+                    suggested_fix=f.suggested_fix,
+                    cwe_id=f.cwe_id,
+                    confidence=confidence,
+                )
+                findings.append(finding)
+            except Exception as e:
+                logger.warning(
+                    "Skipping malformed finding",
+                    agent=self.agent_name,
+                    error=str(e),
+                )
+        return findings
+    def _format_file_contents(self, file_contents: dict[str, str]) -> str:
+        """
+        Format file contents for the LLM prompt.
+        Each file is wrapped in a code block with its path as a header.
+        We truncate very long files to stay within LLM context limits.
+        Groq's Llama-3.3-70B has 128K context, so we have plenty of room
+        for typical PRs, but we cap each file at 500 lines to be safe.
+        """
+        parts = []
+        for filepath, content in file_contents.items():
+            lines = content.split("\n")
+            if len(lines) > 500:
+                content = "\n".join(lines[:500]) + "\n... (truncated)"
+            parts.append(f"### {filepath}\n```\n{content}\n```")
+        return "\n\n".join(parts) if parts else "No file contents available."
+    async def review(self, pr_data: PRData, rag_context: str = "") -> list[Finding]:
+        """
+        Main entry point: review a PR and return findings.
+        This is the Template Method:
+        1. Run static analysis tools (subclass-specific)
+        2. Build the prompt with diff + files + tool output + RAG context
+        3. Call the LLM with structured output
+        4. Convert to Finding objects
+        5. Log timing and return
+        If the LLM call fails, we return an empty list rather than crashing
+        the entire pipeline. The other agents can still contribute findings.
+        Args:
+            pr_data: The PR diff, file contents, and metadata
+            rag_context: Optional RAG context from ChromaDB (related code chunks)
+        """
+        start_time = time.time()
+        try:
+            # Step 1: Run static analysis tools
+            static_results = await self.run_static_analysis(pr_data)
+            # Step 2: Build the prompt
+            prompt = self._build_prompt()
+            # Step 3: Create the structured output chain
+            structured_llm = self.llm.with_structured_output(AgentFindings)
+            chain = prompt | structured_llm
+            # Step 4: Call the LLM
+            result = await chain.ainvoke({
+                "diff": pr_data.diff[:15000],  # Cap diff size for token limits
+                "file_contents": self._format_file_contents(pr_data.file_contents),
+                "static_analysis": static_results or "No static analysis results.",
+                "rag_context": rag_context or "",
+            })
+            # Step 5: Convert to Finding objects
+            findings = self._convert_to_findings(result)
+            elapsed_ms = int((time.time() - start_time) * 1000)
+            logger.info(
+                "Agent review completed",
+                agent=self.agent_name,
+                findings_count=len(findings),
+                elapsed_ms=elapsed_ms,
+            )
+            return findings
+        except Exception as e:
+            elapsed_ms = int((time.time() - start_time) * 1000)
+            logger.error(
+                "Agent review failed",
+                agent=self.agent_name,
+                error=str(e),
+                elapsed_ms=elapsed_ms,
+            )
+            return []  # Don't crash the pipeline — other agents can still work
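The normalization rules applied in `_convert_to_findings` can be exercised in isolation. The sketch below restates them as two standalone helpers; the function names are hypothetical and exist only to make the behavior easy to test, assuming the same fallback to "medium" for unrecognized severities.

```python
VALID_SEVERITIES = ("critical", "high", "medium", "low")


def normalize_severity(raw: str) -> str:
    """Lowercase and trim the LLM's severity; fall back to 'medium' if unrecognized."""
    severity = raw.lower().strip()
    return severity if severity in VALID_SEVERITIES else "medium"


def clamp_confidence(value: float) -> float:
    """Clamp the LLM's confidence into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))
```

For instance, `normalize_severity(" HIGH ")` yields `"high"`, and an out-of-range confidence such as 1.7 is clamped to 1.0 rather than rejected.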

app/agents/performance_agent.py ADDED

	@@ -0,0 +1,44 @@

+"""
+Performance Agent
+==================
+Evaluates code for computational efficiency, memory usage, and scalability.
+Uses radon for complexity metrics and the LLM for semantic analysis of
+query patterns, I/O operations, and algorithmic efficiency.
+Same architecture as SecurityAgent — inherits from BaseAgent, overrides
+only agent_name, system_prompt, and run_static_analysis().
+"""
+from __future__ import annotations
+from pathlib import Path
+import structlog
+from app.agents.base_agent import BaseAgent
+from app.github.client import PRData
+from app.tools.radon_tool import run_radon
+logger = structlog.get_logger()
+class PerformanceAgent(BaseAgent):
+    @property
+    def agent_name(self) -> str:
+        return "performance"
+    @property
+    def system_prompt(self) -> str:
+        prompt_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "prompts"
+            / "performance_system.md"
+        )
+        return prompt_path.read_text(encoding="utf-8")
+    async def run_static_analysis(self, pr_data: PRData) -> str:
+        """Run radon complexity analysis on changed Python files."""
+        radon_output = await run_radon(pr_data.file_contents)
+        return radon_output if radon_output else ""

app/agents/security_agent.py ADDED

	@@ -0,0 +1,107 @@

+"""
+Security Agent
+===============
+The Security Agent acts as a senior application security engineer (AppSec).
+It reviews every changed line through the lens of exploitability, data exposure,
+and authentication integrity.
+Architecture:
+1. Run static analysis tools (Bandit + detect-secrets) on changed files
+2. Combine static results with PR diff and full file contents
+3. Send everything to Groq's Llama-3.3-70B with a security-focused system prompt
+4. LLM produces structured JSON findings with CWE IDs and suggested fixes
+Why both static tools AND an LLM?
+Static tools (Bandit):
+  ✅ Fast, deterministic, reliably catches the patterns it has rules for
+  ✅ Free — no API cost
+  ❌ Can't understand context (doesn't know if input is already sanitized)
+  ❌ Only catches patterns it has rules for
+LLM (Llama-3.3-70B):
+  ✅ Understands context, intent, data flow between functions
+  ✅ Can catch novel vulnerability patterns
+  ✅ Provides natural language explanations and fixes
+  ❌ Can hallucinate findings (false positives)
+  ❌ Costs API calls (though Groq's free tier is generous)
+Together: static tools provide HIGH-CONFIDENCE anchors, the LLM provides DEPTH.
+The Synthesizer (Week 7) will merge and deduplicate their outputs.
+"""
+from __future__ import annotations
+from pathlib import Path
+import structlog
+from app.agents.base_agent import BaseAgent
+from app.github.client import PRData
+from app.tools.bandit_tool import run_bandit
+from app.tools.detect_secrets_tool import run_detect_secrets
+logger = structlog.get_logger()
+class SecurityAgent(BaseAgent):
+    """
+    Security-focused code review agent.
+    Inherits from BaseAgent which provides:
+    - Groq LLM client (ChatGroq with Llama-3.3-70B)
+    - Structured output parsing (with_structured_output)
+    - Error handling and timing
+    - The review() method that orchestrates the flow
+    This class only needs to provide:
+    - agent_name: "security"
+    - system_prompt: loaded from prompts/security_system.md
+    - run_static_analysis(): runs Bandit + detect-secrets
+    """
+    @property
+    def agent_name(self) -> str:
+        return "security"
+    @property
+    def system_prompt(self) -> str:
+        """
+        Load the system prompt from the Markdown file.
+        We store prompts as separate files (not inline strings) because:
+        1. They're long (50+ lines) — inline strings clutter the code
+        2. They change frequently during prompt tuning (Week 9)
+        3. Non-engineers (product managers) can review/edit them
+        4. Git diff shows prompt changes clearly
+        """
+        prompt_path = Path(__file__).resolve().parent.parent.parent / "prompts" / "security_system.md"
+        return prompt_path.read_text(encoding="utf-8")
+    async def run_static_analysis(self, pr_data: PRData) -> str:
+        """
+        Run security-specific static analysis tools.
+        We run Bandit and detect-secrets in sequence (not parallel) because:
+        1. Each takes <5 seconds — parallelism gains are minimal
+        2. They both write to temp dirs — simpler to keep sequential
+        3. If one fails, the other still runs (independent try/except in each tool)
+        The results are concatenated into a single string that gets injected
+        into the LLM prompt. The LLM uses these as high-confidence signals
+        to anchor its own analysis.
+        """
+        results = []
+        # Run Bandit (Python security linter)
+        bandit_output = await run_bandit(pr_data.file_contents)
+        if bandit_output:
+            results.append(bandit_output)
+        # Run detect-secrets (credential scanner)
+        secrets_output = await run_detect_secrets(pr_data.file_contents)
+        if secrets_output:
+            results.append(secrets_output)
+        return "\n\n".join(results) if results else ""

app/agents/style_agent.py ADDED

	@@ -0,0 +1,43 @@

+"""
+Style & Maintainability Agent
+===============================
+Reviews code for readability, naming quality, documentation, test coverage,
+and architectural consistency. Uses Ruff for mechanical lint checks and the
+LLM for deeper maintainability analysis.
+Same architecture as SecurityAgent and PerformanceAgent.
+"""
+from __future__ import annotations
+from pathlib import Path
+import structlog
+from app.agents.base_agent import BaseAgent
+from app.github.client import PRData
+from app.tools.linter_tool import run_ruff
+logger = structlog.get_logger()
+class StyleAgent(BaseAgent):
+    @property
+    def agent_name(self) -> str:
+        return "style"
+    @property
+    def system_prompt(self) -> str:
+        prompt_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "prompts"
+            / "style_system.md"
+        )
+        return prompt_path.read_text(encoding="utf-8")
+    async def run_static_analysis(self, pr_data: PRData) -> str:
+        """Run Ruff linter on changed Python files."""
+        ruff_output = await run_ruff(pr_data.file_contents)
+        return ruff_output if ruff_output else ""

app/agents/synthesizer.py ADDED

	@@ -0,0 +1,291 @@

+"""
+Synthesizer Agent
+==================
+The Synthesizer is the "senior engineering manager" of Ninja Code Guard.
+It takes findings from all three domain agents (Security, Performance, Style)
+and produces a unified, non-redundant review.
+Responsibilities:
+1. **Deduplicate** — If Security and Performance flag the same line for
+   different reasons, merge them into one finding with both perspectives.
+2. **Resolve conflicts** — If agents disagree on severity, use a precedence
+   hierarchy: Security > Performance > Style.
+3. **Re-rank** — Sort findings by composite score: severity × confidence.
+4. **Compute Health Score** — 0-100 based on weighted finding density.
+5. **Generate executive summary** — 3-5 sentences summarizing the review.
+6. **Determine recommendation** — approve / request_changes / block.
+Why a Synthesizer instead of just concatenating findings?
+- Without dedup: the same SQL injection might be flagged by both Security
+  (as CWE-89) and Performance (as "unbounded query") — confusing for devs.
+- Without conflict resolution: Security says "critical", Style says "medium"
+  for the same issue — which severity should the comment show?
+- Without re-ranking: findings appear in arbitrary order — devs should see
+  the most important issues first.
+"""
+from __future__ import annotations
+import time
+from collections import defaultdict
+import structlog
+from app.models.findings import Finding, SynthesizedReview
+from app.services.health_score import calculate_health_score, determine_recommendation
+logger = structlog.get_logger()
+# Agent precedence for severity conflicts (higher = takes priority)
+AGENT_PRECEDENCE = {
+    "security": 3,
+    "performance": 2,
+    "style": 1,
+}
+SEVERITY_RANK = {
+    "critical": 4,
+    "high": 3,
+    "medium": 2,
+    "low": 1,
+}
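The re-rank step named in the module docstring (severity × confidence) could look roughly like the sketch below. It uses plain dicts instead of the `Finding` model, and `composite_score` and `rank_findings` are hypothetical names; the repository's actual ranking code may differ.

```python
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def composite_score(severity: str, confidence: float) -> float:
    """Severity rank multiplied by confidence; unknown severities score 0."""
    return SEVERITY_RANK.get(severity, 0) * confidence


def rank_findings(findings):
    """Return findings sorted by composite score, highest first.

    Each finding is a dict with 'severity' and 'confidence' keys
    (a simplification of the Finding model for this sketch).
    """
    return sorted(
        findings,
        key=lambda f: composite_score(f["severity"], f["confidence"]),
        reverse=True,
    )
```

One consequence of a multiplicative score is that a high-severity finding at 0.9 confidence (2.7) outranks a critical finding at 0.5 confidence (2.0), which may or may not be the desired ordering.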
+def _finding_key(f: Finding) -> str:
+    """
+    Generate a deduplication key for a finding.
+    Two findings are considered duplicates if they share the same file,
+    start line, and category. This is a simplified key: it does not detect
+    overlapping line ranges that start on different lines, and findings on
+    the same line with different categories are kept as separate findings.
+    Findings from different agents that produce the same key are merged.
+    return f"{f.file_path}:{f.line_start}:{f.category}"
+def deduplicate_findings(findings: list[Finding]) -> list[Finding]:
+    """
+    Merge duplicate findings that share the same deduplication key.
+    When multiple agents flag the same file, start line, and category, we keep
+    the finding from the highest-precedence agent (Security > Performance > Style)
+    and take the maximum severity and confidence across the group.
+    Example:
+        Security flags app.py:5 as "critical" (SQL injection)
+        Performance flags app.py:5 as "high" (unbounded query)
+        → Keep Security's finding with "critical" severity
+        → Append an "Also flagged by" note naming the Performance agent
+    """
+    # Group findings by location
+    groups: dict[str, list[Finding]] = defaultdict(list)
+    for finding in findings:
+        key = _finding_key(finding)
+        groups[key].append(finding)
+    deduped = []
+    duplicates_removed = 0
+    for key, group in groups.items():
+        if len(group) == 1:
+            deduped.append(group[0])
+            continue
+        # Sort by agent precedence (highest first)
+        group.sort(
+            key=lambda f: AGENT_PRECEDENCE.get(f.agent, 0), reverse=True
+        )
+        # Take the primary finding (highest precedence agent)
+        primary = group[0]
+        # Take the maximum severity across all agents
+        max_severity = max(group, key=lambda f: SEVERITY_RANK.get(f.severity, 0))
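+        # Merge: keep primary's structure, upgrade severity if needed.
+        # List each other agent once; repeats from the primary agent itself
+        # are not worth an "Also flagged by" note.
+        merged_description = primary.description
+        other_agents = list(
+            dict.fromkeys(f.agent for f in group[1:] if f.agent != primary.agent)
+        )
+        if other_agents:
+            merged_description += (
+                f"\n\n*Also flagged by: {', '.join(other_agents)} agent(s).*"
+            )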
+        merged = Finding(
+            agent=primary.agent,
+            file_path=primary.file_path,
+            line_start=primary.line_start,
+            line_end=primary.line_end,
+            severity=max_severity.severity,
+            category=primary.category,
+            title=primary.title,
+            description=merged_description,
+            suggested_fix=primary.suggested_fix,
+            cwe_id=primary.cwe_id,
+            confidence=max(f.confidence for f in group),
+        )
+        deduped.append(merged)
+        duplicates_removed += len(group) - 1
+    if duplicates_removed > 0:
+        logger.info(
+            "Deduplicated findings",
+            removed=duplicates_removed,
+            before=len(findings),
+            after=len(deduped),
+        )
+    return deduped
+def rank_findings(findings: list[Finding]) -> list[Finding]:
+    """
+    Sort findings by importance: severity (desc) then confidence (desc).
+    Developers should see the most critical, highest-confidence issues first.
+    This matches how a senior engineer would present a review — lead with
+    the blocking issues, then the nice-to-haves.
+    """
+    return sorted(
+        findings,
+        key=lambda f: (SEVERITY_RANK.get(f.severity, 0), f.confidence),
+        reverse=True,
+    )
+def generate_executive_summary(
+    findings: list[Finding],
+    health_score: int,
+    recommendation: str,
+) -> str:
+    """
+    Generate a 3-5 sentence executive summary of the review.
+    This appears at the top of the PR comment, giving the author a quick
+    overview without needing to read every finding.
+    """
+    if not findings:
+        return (
+            "No issues were found in this pull request. "
+            "The code changes look clean across security, performance, and style dimensions. "
+            "Safe to merge."
+        )
+    # Count by agent
+    agent_counts = defaultdict(int)
+    for f in findings:
+        agent_counts[f.agent] += 1
+    # Count by severity
+    sev_counts = defaultdict(int)
+    for f in findings:
+        sev_counts[f.severity] += 1
+    parts = []
+    # Opening line
+    total = len(findings)
+    parts.append(
+        f"Multi-agent review analyzed this PR across security, performance, and style dimensions, "
+        f"finding {total} issue{'s' if total != 1 else ''}."
+    )
+    # Severity breakdown
+    sev_parts = []
+    for sev in ["critical", "high", "medium", "low"]:
+        count = sev_counts.get(sev, 0)
+        if count > 0:
+            sev_parts.append(f"{count} {sev}")
+    if sev_parts:
+        parts.append(f"Breakdown: {', '.join(sev_parts)}.")
+    # Agent breakdown
+    agent_parts = []
+    for agent in ["security", "performance", "style"]:
+        count = agent_counts.get(agent, 0)
+        if count > 0:
+            agent_parts.append(f"{agent.capitalize()}: {count}")
+    if agent_parts:
+        parts.append(f"By domain: {', '.join(agent_parts)}.")
+    # Top issue highlight
+    if sev_counts.get("critical", 0) > 0:
+        critical_finding = next(f for f in findings if f.severity == "critical")
+        parts.append(
+            f"Most urgent: {critical_finding.title} in `{critical_finding.file_path}`."
+        )
+    elif sev_counts.get("high", 0) > 0:
+        high_finding = next(f for f in findings if f.severity == "high")
+        parts.append(
+            f"Top priority: {high_finding.title} in `{high_finding.file_path}`."
+        )
+    return " ".join(parts)
+def synthesize(
+    security_findings: list[Finding],
+    performance_findings: list[Finding],
+    style_findings: list[Finding],
+) -> SynthesizedReview:
+    """
+    Main entry point: synthesize findings from all agents into a unified review.
+    Pipeline:
+    1. Combine all findings
+    2. Deduplicate (merge overlapping findings)
+    3. Rank by severity and confidence
+    4. Calculate Health Score
+    5. Determine recommendation
+    6. Generate executive summary
+    Returns a SynthesizedReview ready for posting to GitHub.
+    """
+    start = time.time()
+    # Step 1: Combine
+    all_findings = security_findings + performance_findings + style_findings
+    # Step 2: Deduplicate
+    deduped = deduplicate_findings(all_findings)
+    # Step 3: Rank
+    ranked = rank_findings(deduped)
+    # Step 4: Health Score
+    health_score = calculate_health_score(ranked)
+    # Step 5: Recommendation
+    recommendation = determine_recommendation(ranked, health_score)
+    # Step 6: Executive summary
+    summary = generate_executive_summary(ranked, health_score, recommendation)
+    # Count by severity
+    critical = sum(1 for f in ranked if f.severity == "critical")
+    high = sum(1 for f in ranked if f.severity == "high")
+    medium = sum(1 for f in ranked if f.severity == "medium")
+    low = sum(1 for f in ranked if f.severity == "low")
+    elapsed_ms = int((time.time() - start) * 1000)
+    logger.info(
+        "Synthesis complete",
+        input_findings=len(all_findings),
+        after_dedup=len(ranked),
+        health_score=health_score,
+        recommendation=recommendation,
+        elapsed_ms=elapsed_ms,
+    )
+    return SynthesizedReview(
+        health_score=health_score,
+        executive_summary=summary,
+        recommendation=recommendation,
+        findings=ranked,
+        critical_count=critical,
+        high_count=high,
+        medium_count=medium,
+        low_count=low,
+        duration_ms=elapsed_ms,
+    )
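
The dedup and ranking steps of synthesizer.py can be tried in isolation. The sketch below is a simplified illustration, not the module itself: `ToyFinding` is a hypothetical stand-in for `app.models.findings.Finding` (the real model has more fields), and `dedupe_and_rank` is an assumed name that folds `deduplicate_findings` and `rank_findings` into one function.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

AGENT_PRECEDENCE = {"security": 3, "performance": 2, "style": 1}
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1}


@dataclass(frozen=True)
class ToyFinding:
    """Hypothetical stand-in for the real Finding model."""
    agent: str
    file_path: str
    line_start: int
    category: str
    severity: str
    confidence: float


def dedupe_and_rank(findings):
    # Group by the same key the synthesizer uses: file, start line, category.
    groups = defaultdict(list)
    for f in findings:
        groups[(f.file_path, f.line_start, f.category)].append(f)

    merged = []
    for group in groups.values():
        # Highest-precedence agent supplies the finding's structure.
        primary = max(group, key=lambda f: AGENT_PRECEDENCE.get(f.agent, 0))
        # Severity and confidence are the maximum across the group.
        top = max(group, key=lambda f: SEVERITY_RANK.get(f.severity, 0))
        merged.append(
            replace(
                primary,
                severity=top.severity,
                confidence=max(f.confidence for f in group),
            )
        )

    # Severity first, confidence as the tie-breaker, both descending.
    return sorted(
        merged,
        key=lambda f: (SEVERITY_RANK.get(f.severity, 0), f.confidence),
        reverse=True,
    )


findings = [
    ToyFinding("style", "app.py", 5, "query", "critical", 0.7),
    ToyFinding("security", "app.py", 5, "query", "high", 0.9),
    ToyFinding("performance", "db.py", 12, "n+1", "medium", 0.8),
]
result = dedupe_and_rank(findings)
```

Here the two `app.py:5` findings collapse into one that keeps the security agent's structure but carries the style agent's "critical" severity, which shows that precedence and severity are resolved independently.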

app/config.py ADDED Viewed

	@@ -0,0 +1,40 @@

+"""Application configuration via environment variables."""
+from pydantic_settings import BaseSettings
+class Settings(BaseSettings):
+    """All configuration loaded from environment variables."""
+    # LLM APIs
+    groq_api_key: str = ""
+    gemini_api_key: str = ""
+    # GitHub App
+    github_app_id: str = ""
+    github_app_private_key_path: str = "./keys/app.pem"
+    github_webhook_secret: str = ""
+    # Database
+    database_url: str = ""
+    # Redis Cache
+    upstash_redis_url: str = ""
+    # Embedding
+    embedding_model: str = "all-MiniLM-L6-v2"
+    # App Config
+    environment: str = "development"
+    log_level: str = "INFO"
+    confidence_threshold: float = 0.6
+    max_repo_files_index: int = 500
+    # Security
+    dashboard_api_key: str = ""  # Set in production to protect dashboard API
+    cors_allowed_origins: str = ""  # Comma-separated origins, e.g. "https://myapp.vercel.app"
+    model_config = {"env_file": ".env", "env_file_encoding": "utf-8"}
+settings = Settings()
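
`cors_allowed_origins` is stored as a single comma-separated string, so whatever consumes it has to split it into a list. How the app does this is not shown in this part of the diff; the helper below, `parse_origins`, is a hypothetical name and only a minimal sketch of one way to do it.

```python
def parse_origins(raw: str) -> list:
    """Split a comma-separated origins string, dropping blanks and whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


origins = parse_origins("https://myapp.vercel.app, http://localhost:3000,")
no_origins = parse_origins("")
```

Filtering out empty entries means the default value `""` yields an empty list rather than `[""]`, which would otherwise be passed on as a bogus origin.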

app/context/__init__.py ADDED Viewed

File without changes

app/context/embedder.py ADDED Viewed

	@@ -0,0 +1,126 @@

+"""
+Code Embedding Pipeline
+========================
+Converts source code into vector embeddings using sentence-transformers.
+These embeddings are stored in ChromaDB for semantic search.
+How it works:
+1. Source code is split into chunks (functions, classes, or fixed-size blocks)
+2. Each chunk is embedded into a 384-dimensional vector
+3. Vectors capture semantic meaning — similar code has similar vectors
+4. When reviewing a PR, we query ChromaDB with the diff to find related code
+Why embeddings for code?
+Consider this diff:
+    + user_id = request.args.get("id")
+    + data = db.query(f"SELECT * FROM users WHERE id = {user_id}")
+To evaluate this, the agent needs to know:
+- Does `db.query()` parameterize inputs? → Need the DB wrapper's source code
+- Is there middleware that validates `user_id`? → Need the middleware source
+- Are there other similar patterns in the codebase? → Need semantic search
+Embeddings let us find this related code WITHOUT knowing the exact file paths.
+The query "SQL query with user input" returns relevant code chunks ranked by
+semantic similarity — not keyword matching, but meaning matching.
+Model: all-MiniLM-L6-v2
+- 384 dimensions, 22M parameters
+- Runs locally on CPU in ~10ms per chunk (GPU: ~1ms)
+- Optimized for semantic similarity tasks
+- Good enough for code — not perfect, but fast and free
+"""
+from __future__ import annotations
+import structlog
+from app.config import settings
+logger = structlog.get_logger()
+# Lazy-loaded model to avoid slow import at startup
+_model = None
+def get_embedding_model():
+    """
+    Lazy-load the sentence-transformers model.
+    We load on first use (not at import time) because:
+    1. The model takes ~2 seconds to load
+    2. Not every request needs embeddings (cached reviews skip this)
+    3. Tests shouldn't load a real ML model
+    """
+    global _model
+    if _model is None:
+        try:
+            from sentence_transformers import SentenceTransformer
+            _model = SentenceTransformer(settings.embedding_model)
+            logger.info("Loaded embedding model", model=settings.embedding_model)
+        except ImportError:
+            logger.warning("sentence-transformers not installed — RAG context disabled")
+            return None
+    return _model
+def embed_texts(texts: list[str]) -> list[list[float]]:
+    """
+    Embed a list of text strings into vectors.
+    Args:
+        texts: List of code chunks or queries to embed
+    Returns:
+        List of embedding vectors (each is a list of floats)
+    """
+    model = get_embedding_model()
+    if model is None:
+        return []
+    embeddings = model.encode(texts, show_progress_bar=False)
+    return embeddings.tolist()
+def chunk_code(content: str, filepath: str, chunk_size: int = 60) -> list[dict]:
+    """
+    Split source code into overlapping chunks for embedding.
+    Strategy: We chunk by lines with overlap. Each chunk is ~60 lines
+    with 10 lines of overlap to preserve context across boundaries.
+    Why 60 lines? It's roughly one function/class — the natural unit of
+    code that a developer would reason about. Too small (10 lines) loses
+    context. Too large (200 lines) dilutes the embedding signal.
+    Args:
+        content: Full file source code
+        filepath: The file path (included as metadata)
+        chunk_size: Lines per chunk (default: 60)
+    Returns:
+        List of dicts with 'text', 'filepath', 'start_line', 'end_line'
+    """
+    lines = content.split("\n")
+    chunks = []
+    overlap = 10
+    start = 0
+    while start < len(lines):
+        end = min(start + chunk_size, len(lines))
+        chunk_text = "\n".join(lines[start:end])
+        # Skip very small chunks (less than 5 non-empty lines)
+        non_empty = sum(1 for line in lines[start:end] if line.strip())
+        if non_empty >= 5:
+            chunks.append({
+                "text": f"# File: {filepath}\n{chunk_text}",
+                "filepath": filepath,
+                "start_line": start + 1,
+                "end_line": end,
+            })
+        start += max(chunk_size - overlap, 1)  # Overlap for context continuity
+    return chunks
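
To see which line ranges `chunk_code` produces, the window arithmetic can be pulled out on its own. This is a sketch under assumptions: `window_bounds` is a hypothetical helper that mirrors only the loop's start/end logic and leaves out the "fewer than 5 non-empty lines" filter.

```python
def window_bounds(n_lines: int, chunk_size: int = 60, overlap: int = 10) -> list:
    """Return 1-indexed (start_line, end_line) pairs for each chunk window."""
    bounds = []
    start = 0
    while start < n_lines:
        end = min(start + chunk_size, n_lines)
        bounds.append((start + 1, end))
        # Advance by chunk_size - overlap so consecutive windows share lines.
        start += max(chunk_size - overlap, 1)
    return bounds


bounds = window_bounds(130)
exact_fit = window_bounds(60)
```

For a 130-line file the windows are lines 1-60, 51-110, and 101-130. One consequence of this stepping: a file of exactly 60 lines yields a second window covering lines 51-60, which lies entirely inside the first.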

app/context/indexer.py ADDED Viewed

	@@ -0,0 +1,127 @@

+"""
+ChromaDB Repo Indexer
+======================
+Indexes repository source code into ChromaDB for semantic search.
+Each repo gets its own ChromaDB collection, keyed by the repo's full name.
+How indexing works:
+1. Receive file contents from GitHub API
+2. Chunk each file into ~60-line blocks
+3. Embed each chunk using sentence-transformers
+4. Upsert into ChromaDB collection for this repo
+ChromaDB is an open-source vector database that:
+- Runs embedded in the Python process (no separate server needed)
+- Stores vectors + metadata + documents together
+- Supports fast approximate nearest neighbor (ANN) search
+- Can persist to disk or run entirely in-memory
+We use in-memory mode on Render (ephemeral storage): the index lives only as
+long as the process and is refreshed by upsert on each PR review. This is
+acceptable because indexing the changed files takes <1 second for typical PRs.
+"""
+from __future__ import annotations
+import chromadb
+import structlog
+from app.config import settings
+from app.context.embedder import chunk_code, embed_texts
+logger = structlog.get_logger()
+# Singleton ChromaDB client (in-memory)
+_chroma_client: chromadb.ClientAPI | None = None
+def _get_chroma_client() -> chromadb.ClientAPI:
+    """Get or create the ChromaDB client."""
+    global _chroma_client
+    if _chroma_client is None:
+        _chroma_client = chromadb.Client()  # In-memory, no persistence
+    return _chroma_client
+def _collection_name(repo_full_name: str) -> str:
+    """Generate a valid ChromaDB collection name from a repo name."""
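+    # ChromaDB names must be 3-63 chars and start and end with an alphanumeric
+    # character, so strip any trailing "_" or "." left behind by truncation.
+    name = repo_full_name.replace("/", "_").replace("-", "_")
+    return f"repo_{name}"[:63].rstrip("_.")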
+async def index_repo_files(
+    repo_full_name: str, file_contents: dict[str, str]
+) -> str:
+    """
+    Index repository files into ChromaDB for RAG retrieval.
+    This is called during each PR review to ensure the vector store
+    has the latest file contents. We upsert (insert or update) so
+    re-indexing the same file just overwrites the old vectors.
+    Args:
+        repo_full_name: "owner/repo" — used as collection name
+        file_contents: dict of {filepath: source_code}
+    Returns:
+        Collection name (for retrieval)
+    """
+    client = _get_chroma_client()
+    collection_name = _collection_name(repo_full_name)
+    # Get or create a collection for this repo
+    collection = client.get_or_create_collection(
+        name=collection_name,
+        metadata={"repo": repo_full_name},
+    )
+    # Chunk all files
+    all_chunks = []
+    for filepath, content in file_contents.items():
+        # Skip very large files (binary, generated code, etc.)
+        if len(content) > 100_000:
+            continue
+        chunks = chunk_code(content, filepath)
+        all_chunks.extend(chunks)
+    if not all_chunks:
+        logger.info("No chunks to index", repo=repo_full_name)
+        return collection_name
+    # Limit total chunks (Render memory constraint). Note: despite its name,
+    # max_repo_files_index is applied here as a cap on chunks, not files.
+    max_chunks = settings.max_repo_files_index
+    if len(all_chunks) > max_chunks:
+        all_chunks = all_chunks[:max_chunks]
+    # Embed all chunks
+    texts = [chunk["text"] for chunk in all_chunks]
+    embeddings = embed_texts(texts)
+    if not embeddings:
+        logger.warning("Embedding failed — RAG context unavailable")
+        return collection_name
+    # Upsert into ChromaDB
+    ids = [f"{chunk['filepath']}:{chunk['start_line']}" for chunk in all_chunks]
+    metadatas = [
+        {"filepath": chunk["filepath"], "start_line": chunk["start_line"], "end_line": chunk["end_line"]}
+        for chunk in all_chunks
+    ]
+    collection.upsert(
+        ids=ids,
+        embeddings=embeddings,
+        documents=texts,
+        metadatas=metadatas,
+    )
+    logger.info(
+        "Indexed repo files",
+        repo=repo_full_name,
+        chunks=len(all_chunks),
+        collection=collection_name,
+    )
+    return collection_name
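
Re-indexing overwrites rather than duplicates because each chunk ID is derived from its file path and start line. The sketch below illustrates this with a plain dict standing in for the ChromaDB collection; `upsert` here is a hypothetical local function, not the ChromaDB API.

```python
def upsert(store: dict, chunks: list) -> None:
    """Insert or overwrite chunks, keyed the same way as in index_repo_files."""
    for chunk in chunks:
        chunk_id = f"{chunk['filepath']}:{chunk['start_line']}"
        store[chunk_id] = {"document": chunk["text"], "end_line": chunk["end_line"]}


store = {}
first_pass = [{"filepath": "app.py", "start_line": 1, "end_line": 60, "text": "old body"}]
second_pass = [{"filepath": "app.py", "start_line": 1, "end_line": 60, "text": "new body"}]

upsert(store, first_pass)
upsert(store, second_pass)  # Same ID, so the earlier entry is replaced.
```

In this sketch, an ID that stops being produced (for example because a file shrank) is never removed; only IDs that reappear get overwritten.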

app/context/retriever.py ADDED Viewed

	@@ -0,0 +1,116 @@

+"""
+RAG Context Retriever
+======================
+Retrieves relevant code context from ChromaDB based on the PR diff.
+This is the "R" in RAG (Retrieval-Augmented Generation).
+How retrieval works:
+1. Take the PR diff text as a query
+2. Embed the query using the same model used for indexing
+3. Search ChromaDB for the most similar code chunks
+4. Return the top-k chunks as additional context for the LLM
+Why RAG for code review?
+The PR diff only shows CHANGED lines. But understanding a change often
+requires seeing RELATED code:
+- If a function is called from 5 places, changing it affects all callers
+- If a variable is validated in another file, the validation matters here
+- If the same pattern exists elsewhere, inconsistency is a style issue
+RAG gives the agents "peripheral vision" — they see not just the change,
+but the surrounding codebase context that makes the change meaningful.
+"""
+from __future__ import annotations
+import structlog
+from app.context.embedder import embed_texts
+from app.context.indexer import _get_chroma_client
+logger = structlog.get_logger()
+async def retrieve_context(
+    collection_name: str,
+    query_text: str,
+    top_k: int = 5,
+) -> str:
+    """
+    Retrieve relevant code context from ChromaDB.
+    Args:
+        collection_name: The ChromaDB collection to search
+        query_text: The PR diff or a specific query
+        top_k: Number of results to return (default: 5)
+    Returns:
+        A formatted string of relevant code chunks to include in the LLM prompt.
+        Returns empty string if retrieval fails or no results found.
+    """
+    try:
+        client = _get_chroma_client()
+        # Check if collection exists
+        try:
+            collection = client.get_collection(name=collection_name)
+        except Exception:
+            logger.debug("Collection not found — no RAG context", collection=collection_name)
+            return ""
+        # Skip if collection is empty
+        if collection.count() == 0:
+            return ""
+        # Embed the query
+        query_embeddings = embed_texts([query_text[:5000]])  # Cap query size
+        if not query_embeddings:
+            return ""
+        # Search for similar code chunks
+        results = collection.query(
+            query_embeddings=query_embeddings,
+            n_results=min(top_k, collection.count()),
+            include=["documents", "metadatas", "distances"],
+        )
+        if not results or not results["documents"] or not results["documents"][0]:
+            return ""
+        # Format results as context for the LLM
+        context_parts = ["## Related Code Context (from repository)\n"]
+        for doc, metadata, distance in zip(
+            results["documents"][0],
+            results["metadatas"][0],
+            results["distances"][0],
+        ):
+            filepath = metadata.get("filepath", "unknown")
+            start = metadata.get("start_line", "?")
+            end = metadata.get("end_line", "?")
+            # Lower distance = more similar. Assuming ChromaDB's default squared-L2
+            # metric and unit-length embeddings, 1 - distance / 2 is the cosine similarity.
+            similarity = max(0, 1 - distance / 2)
+            if similarity < 0.3:
+                continue  # Skip low-relevance results
+            context_parts.append(
+                f"### {filepath} (lines {start}-{end}, relevance: {similarity:.0%})\n"
+                f"```\n{doc}\n```\n"
+            )
+        if len(context_parts) == 1:  # Only the header, no results
+            return ""
+        context = "\n".join(context_parts)
+        logger.info(
+            "Retrieved RAG context",
+            collection=collection_name,
+            chunks_returned=len(context_parts) - 1,
+        )
+        return context
+    except Exception as e:
+        logger.warning("RAG retrieval failed", error=str(e))
+        return ""
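
The distance-to-similarity conversion in `retrieve_context` can be checked numerically. This sketch assumes the collection uses a squared-L2 distance and that embeddings are normalized to unit length; under those assumptions `1 - d / 2` equals the cosine similarity, since `d = 2 - 2 * cos`. The helper names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    # For unit vectors the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def distance_to_similarity(distance: float) -> float:
    # Same formula as in retrieve_context, clamped at zero.
    return max(0.0, 1 - distance / 2)


a = normalize([1.0, 0.0])
b = normalize([1.0, 1.0])   # 45 degrees from a
c = normalize([0.0, 1.0])   # orthogonal to a
d = normalize([-1.0, 0.0])  # opposite to a

sim_ab = distance_to_similarity(squared_l2(a, b))
sim_ac = distance_to_similarity(squared_l2(a, c))
sim_ad = distance_to_similarity(squared_l2(a, d))
```

Orthogonal and opposing vectors both end up at 0 because of the clamp, so the 0.3 cutoff only ever admits chunks with a clearly positive cosine similarity. If the embeddings were not unit length, the conversion would be only a rough heuristic.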

app/db/__init__.py ADDED Viewed

File without changes

app/db/postgres.py ADDED Viewed

	@@ -0,0 +1,144 @@

+"""
+Neon Postgres Database Client
+===============================
+Stores PR review history for the dashboard: health scores, finding counts,
+executive summaries, and full findings JSON.
+Uses asyncpg with a shared connection pool for both the async writes from the
+webhook pipeline and the dashboard reads.
+Schema is created by ensure_tables(), which is safe to call repeatedly.
+"""
+from __future__ import annotations
+import json
+from datetime import datetime, timezone
+from uuid import uuid4
+import structlog
+from app.config import settings
+from app.models.findings import SynthesizedReview
+logger = structlog.get_logger()
+# ── Connection pool (reuse connections instead of connect-per-query) ──────
+_pool = None
+async def _get_pool():
+    global _pool
+    if _pool is None:
+        import asyncpg
+        _pool = await asyncpg.create_pool(
+            settings.database_url,
+            min_size=1,
+            max_size=5,
+            command_timeout=10,
+        )
+    return _pool
+CREATE_TABLE_SQL = """
+CREATE TABLE IF NOT EXISTS pr_reviews (
+    id              TEXT PRIMARY KEY,
+    repo_full_name  TEXT NOT NULL,
+    pr_number       INT NOT NULL,
+    commit_sha      TEXT NOT NULL,
+    health_score    INT NOT NULL,
+    critical_count  INT DEFAULT 0,
+    high_count      INT DEFAULT 0,
+    medium_count    INT DEFAULT 0,
+    low_count       INT DEFAULT 0,
+    summary         TEXT,
+    findings        JSONB NOT NULL DEFAULT '[]',
+    duration_ms     INT DEFAULT 0,
+    created_at      TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX IF NOT EXISTS idx_pr_reviews_repo ON pr_reviews(repo_full_name);
+CREATE INDEX IF NOT EXISTS idx_pr_reviews_sha ON pr_reviews(commit_sha);
+"""
+async def ensure_tables():
+    """Create the pr_reviews table if it doesn't exist."""
+    if not settings.database_url:
+        logger.warning("DATABASE_URL not set — skipping table creation")
+        return
+    try:
+        pool = await _get_pool()
+        async with pool.acquire() as conn:
+            await conn.execute(CREATE_TABLE_SQL)
+        logger.info("Database tables ensured")
+    except Exception as e:
+        logger.warning("Database setup failed", error=str(e))
+async def save_review(
+    repo_full_name: str,
+    pr_number: int,
+    commit_sha: str,
+    review: SynthesizedReview,
+) -> None:
+    """Save a PR review to the database."""
+    if not settings.database_url:
+        return
+    try:
+        pool = await _get_pool()
+        async with pool.acquire() as conn:
+            await conn.execute(
+                """
+                INSERT INTO pr_reviews (id, repo_full_name, pr_number, commit_sha,
+                    health_score, critical_count, high_count, medium_count, low_count,
+                    summary, findings, duration_ms)
+                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
+                """,
+                str(uuid4()),
+                repo_full_name,
+                pr_number,
+                commit_sha,
+                review.health_score,
+                review.critical_count,
+                review.high_count,
+                review.medium_count,
+                review.low_count,
+                review.executive_summary,
+                json.dumps([f.model_dump() for f in review.findings]),
+                review.duration_ms,
+            )
+        logger.info("Saved review to database", repo=repo_full_name, pr=pr_number)
+    except Exception as e:
+        logger.warning("Database save failed", error=str(e))
+async def get_repo_reviews(repo_full_name: str, limit: int = 20) -> list[dict]:
+    """Get recent reviews for a repo, newest first (at most 100)."""
+    limit = min(limit, 100)  # Cap to prevent excessive queries
+    if not settings.database_url:
+        return []
+    try:
+        pool = await _get_pool()
+        async with pool.acquire() as conn:
+            rows = await conn.fetch(
+                """
+                SELECT id, pr_number, commit_sha, health_score,
+                       critical_count, high_count, medium_count, low_count,
+                       summary, duration_ms, created_at
+                FROM pr_reviews
+                WHERE repo_full_name = $1
+                ORDER BY created_at DESC
+                LIMIT $2
+                """,
+                repo_full_name,
+                limit,
+            )
+        return [dict(row) for row in rows]
+    except Exception as e:
+        logger.warning("Database query failed", error=str(e))
+        return []

app/db/redis_cache.py ADDED Viewed

	@@ -0,0 +1,121 @@

+"""
+Redis Cache for PR Review Deduplication
+========================================
+When a developer pushes multiple commits quickly (or force-pushes), GitHub sends
+a webhook for each push. Without caching, we'd re-analyze the same PR multiple times,
+wasting Groq API quota and spamming the PR with duplicate comments.
+Solution: Before analyzing a PR, we check Redis: "Have we already reviewed this
+exact commit SHA?" If yes, we skip the analysis entirely.
+Why Redis (Upstash) instead of in-memory cache?
+- Our Render free tier restarts the server frequently (cold starts)
+- In-memory cache would be lost on every restart
+- Redis persists across restarts and is shared if we scale to multiple workers
+- Upstash's serverless Redis gives us 10K requests/day free — more than enough
+Cache key structure: "ninjacg:reviewed:{commit_sha}"
+Cache value: "1" (just a flag — we don't store the review result here, that's in Postgres)
+TTL: 7 days (after which re-analysis is allowed)
+"""
+from __future__ import annotations
+import redis.asyncio as redis
+import structlog
+from app.config import settings
+logger = structlog.get_logger()
+# Connection pool — reused across requests for efficiency.
+# Redis connections are expensive to create (TCP handshake + TLS negotiation).
+# A pool keeps connections open and reuses them.
+_redis_client: redis.Redis | None = None
+# Cache TTL in seconds (7 days)
+CACHE_TTL = 7 * 24 * 60 * 60
+def _get_redis_client() -> redis.Redis:
+    """
+    Get or create the Redis client singleton.
+    Uses lazy initialization — the client is created on first use, not at import time.
+    This prevents connection errors during module import (e.g., in tests).
+    """
+    global _redis_client
+    if _redis_client is None:
+        _redis_client = redis.from_url(
+            settings.upstash_redis_url,
+            decode_responses=True,
+        )
+    return _redis_client
+def _cache_key(commit_sha: str) -> str:
+    """Build the Redis key for a commit SHA."""
+    return f"ninjacg:reviewed:{commit_sha}"
+async def is_already_reviewed(commit_sha: str) -> bool:
+    """
+    Check if a commit has already been reviewed.
+    This is called at the start of every webhook handler to short-circuit
+    duplicate analysis. Returns True if we should skip.
+    Args:
+        commit_sha: The HEAD commit SHA of the PR
+    Returns:
+        True if this commit has already been reviewed, False otherwise
+    """
+    try:
+        client = _get_redis_client()
+        result = await client.exists(_cache_key(commit_sha))
+        if result:
+            logger.info("Cache hit — skipping re-analysis", commit_sha=commit_sha[:8])
+        return bool(result)
+    except Exception as e:
+        # If Redis is down, we proceed with analysis (fail open).
+        # Better to review a PR twice than to miss a review entirely.
+        logger.warning("Redis check failed, proceeding with analysis", error=str(e))
+        return False
+async def mark_as_reviewed(commit_sha: str) -> None:
+    """
+    Mark a commit as reviewed in the cache.
+    Called after successfully posting a review to GitHub.
+    The TTL ensures stale entries are automatically cleaned up.
+    Args:
+        commit_sha: The HEAD commit SHA that was reviewed
+    """
+    try:
+        client = _get_redis_client()
+        await client.set(_cache_key(commit_sha), "1", ex=CACHE_TTL)
+        logger.info("Cached review result", commit_sha=commit_sha[:8], ttl_days=7)
+    except Exception as e:
+        # Non-fatal — if we can't cache, we'll just re-analyze next time
+        logger.warning("Redis set failed", error=str(e))
+async def invalidate_cache(commit_sha: str) -> None:
+    """
+    Remove a commit from the cache, forcing re-analysis.
+    Used by the /reanalyze endpoint when a user manually requests re-review.
+    Args:
+        commit_sha: The commit SHA to invalidate
+    """
+    try:
+        client = _get_redis_client()
+        await client.delete(_cache_key(commit_sha))
+        logger.info("Cache invalidated", commit_sha=commit_sha[:8])
+    except Exception as e:
+        logger.warning("Redis delete failed", error=str(e))
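
The check, analyze, mark flow and its fail-open behaviour can be exercised without a Redis server. This is a sketch under assumptions: `FakeRedis` is a hypothetical in-memory stand-in that ignores TTLs, and `handle_push` is an assumed name for the part of the webhook handler that wraps the analysis.

```python
import asyncio


class FakeRedis:
    """Hypothetical in-memory stand-in for the async Redis client."""

    def __init__(self, fail: bool = False) -> None:
        self.data = {}
        self.fail = fail

    async def exists(self, key: str) -> int:
        if self.fail:
            raise ConnectionError("redis unavailable")
        return int(key in self.data)

    async def set(self, key: str, value: str, ex: int = 0) -> None:
        if self.fail:
            raise ConnectionError("redis unavailable")
        self.data[key] = value  # TTL (ex) is ignored in this sketch.


async def handle_push(client: FakeRedis, commit_sha: str) -> str:
    key = f"ninjacg:reviewed:{commit_sha}"
    try:
        already_reviewed = bool(await client.exists(key))
    except Exception:
        already_reviewed = False  # Fail open: review rather than skip.
    if already_reviewed:
        return "skipped"
    # ... run the agents and post the review here ...
    try:
        await client.set(key, "1", ex=7 * 24 * 60 * 60)
    except Exception:
        pass  # Non-fatal: the next webhook simply re-analyzes.
    return "reviewed"


async def demo() -> list:
    healthy = FakeRedis()
    broken = FakeRedis(fail=True)
    return [
        await handle_push(healthy, "abc123"),
        await handle_push(healthy, "abc123"),
        await handle_push(broken, "abc123"),
        await handle_push(broken, "abc123"),
    ]


outcomes = asyncio.run(demo())
```

With a healthy cache the second webhook for the same SHA is skipped; with the cache down, every webhook is reviewed, which is the trade-off the module's comments describe.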

app/github/__init__.py ADDED Viewed

File without changes

app/github/auth.py ADDED Viewed

	@@ -0,0 +1,135 @@

+"""
+GitHub App Authentication
+==========================
+GitHub Apps authenticate via a two-step process:
+1. **JWT Generation**: We create a JSON Web Token (JWT) signed with our private key
+   (.pem file). This JWT proves we are the registered GitHub App. It's valid for
+   max 10 minutes — intentionally short-lived for security.
+2. **Installation Access Token**: We exchange the JWT for an installation access token
+   via GitHub's API. This token is scoped to a specific installation (a specific set
+   of repos where the app is installed) and lasts 1 hour.
+Why two steps? A GitHub App can be installed on hundreds of orgs/repos. The JWT says
+"I am the CodeProbe app" — the installation token says "I have permission to access
+@ninjacode911's repos specifically." This separation of identity vs. authorization
+is a production-grade security pattern (similar to OAuth2 client credentials).
+We cache the installation token in memory and refresh it when it expires, so we
+don't make unnecessary API calls.
+Reference: https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app
+"""
+import asyncio
+import time
+from pathlib import Path
+import httpx
+import jwt  # PyJWT library — used to create JSON Web Tokens
+from app.config import settings
+# In-memory cache for installation tokens
+_token_cache: dict[int, dict] = {}
+# Asyncio lock to prevent race conditions on token cache
+_token_lock = asyncio.Lock()
+# Cached private key (read from disk once, reused)
+_private_key: str | None = None
+# GitHub API base URL
+GITHUB_API = "https://api.github.com"
+def _generate_jwt() -> str:
+    """
+    Generate a JWT (JSON Web Token) signed with our GitHub App's private key.
+    A JWT has three parts (separated by dots):
+    1. Header: algorithm (RS256) and token type
+    2. Payload: who we are (iss = app ID), when issued, when it expires
+    3. Signature: the header+payload signed with our RSA private key
+    GitHub verifies the signature using our app's public key (which GitHub stores
+    when we register the app). This is asymmetric cryptography — we sign with the
+    private key, GitHub verifies with the public key.
+    RS256 = RSA + SHA-256 — the industry standard for JWT signing.
+    """
+    now = int(time.time())
+    # Cache the private key in memory after first read (avoid repeated disk I/O)
+    global _private_key
+    if _private_key is None:
+        project_root = Path(__file__).resolve().parent.parent.parent
+        private_key_path = project_root / settings.github_app_private_key_path
+        _private_key = private_key_path.read_text()
+    payload = {
+        # iat = "issued at" — when this token was created
+        "iat": now - 60,  # 60 seconds in the past to account for clock drift
+        # exp = "expires at": GitHub rejects JWTs that expire more than 10 minutes in the future
+        "exp": now + (9 * 60),  # 9 minutes (safely under the 10-min limit)
+        # iss = "issuer" — our GitHub App ID, proving which app we are
+        "iss": settings.github_app_id,
+    }
+    # Sign the JWT with our private RSA key using RS256 algorithm
+    return jwt.encode(payload, _private_key, algorithm="RS256")
+async def get_installation_token(installation_id: int) -> str:
+    """
+    Get an installation access token for a specific GitHub App installation.
+    This token is what we actually use to call GitHub APIs (fetch PRs, post comments).
+    It's scoped to the specific repos where the app is installed.
+    We cache tokens in memory and reuse them until they expire (1 hour lifetime).
+    This avoids making a new token request for every API call.
+    Args:
+        installation_id: The GitHub installation ID (sent in webhook payloads).
+                         Each org/user that installs our app gets a unique ID.
+    Returns:
+        A valid installation access token string.
+    """
+    # Check cache first (outside lock for fast path)
+    cached = _token_cache.get(installation_id)
+    if cached and cached["expires_at"] > time.time() + 60:
+        return cached["token"]
+    # Lock prevents race condition: two coroutines seeing cache miss simultaneously
+    async with _token_lock:
+        # Double-check inside lock (another coroutine may have filled the cache)
+        cached = _token_cache.get(installation_id)
+        if cached and cached["expires_at"] > time.time() + 60:
+            return cached["token"]
+        app_jwt = _generate_jwt()
+        # Exchange the JWT for an installation-scoped access token
+        async with httpx.AsyncClient(timeout=30.0) as client:
+            response = await client.post(
+                f"{GITHUB_API}/app/installations/{installation_id}/access_tokens",
+                headers={
+                    "Authorization": f"Bearer {app_jwt}",
+                    "Accept": "application/vnd.github+json",
+                    "X-GitHub-Api-Version": "2022-11-28",
+                },
+            )
+            response.raise_for_status()
+            data = response.json()
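+        # Cache the token. 3500 s is just under the 1-hour token lifetime, so the
+        # cached entry is treated as expired slightly before GitHub expires it.
+        _token_cache[installation_id] = {
+            "token": data["token"],
+            "expires_at": time.time() + 3500,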
+        }
+        return data["token"]
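The check-lock-recheck caching in `get_installation_token` can be tried in isolation. The following is a minimal sketch, not the module's code: `TokenCache` and its `_fetch` method are hypothetical stand-ins, and `_fetch` only simulates the token-exchange request with a short sleep. It shows that five concurrent callers trigger a single fetch.

```python
import asyncio
import time


class TokenCache:
    """Toy in-memory cache; fetch_count records how many simulated API calls ran."""

    def __init__(self):
        self.entries = {}
        self.lock = asyncio.Lock()
        self.fetch_count = 0

    async def _fetch(self, installation_id):
        # Hypothetical stand-in for the POST .../access_tokens request.
        self.fetch_count += 1
        await asyncio.sleep(0.01)
        return {"token": f"tok-{installation_id}", "expires_at": time.time() + 3600}

    def _fresh(self, installation_id):
        # Return the cached token only if it has more than 60 s of life left.
        entry = self.entries.get(installation_id)
        if entry and entry["expires_at"] > time.time() + 60:
            return entry["token"]
        return None

    async def get(self, installation_id):
        token = self._fresh(installation_id)  # fast path, no lock
        if token is not None:
            return token
        async with self.lock:
            token = self._fresh(installation_id)  # re-check inside the lock
            if token is not None:
                return token
            entry = await self._fetch(installation_id)
            self.entries[installation_id] = entry
            return entry["token"]


async def demo():
    cache = TokenCache()  # created inside the running event loop
    tokens = await asyncio.gather(*(cache.get(42) for _ in range(5)))
    return tokens, cache.fetch_count


tokens, fetch_count = asyncio.run(demo())
print(tokens, fetch_count)
```

Without the second check inside the lock, each waiting coroutine would perform its own fetch once it acquired the lock.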

app/github/client.py ADDED Viewed

	@@ -0,0 +1,362 @@

+"""
+GitHub API Client
+==================
+This module handles all communication with GitHub's REST API. It provides
+methods to:
+1. Fetch PR diff (the raw unified diff showing what changed)
+2. Fetch file contents (full source code for context/RAG)
+3. Fetch changed file list (which files were modified)
+4. Post a PR review with inline comments (anchored to specific lines)
+5. Post a summary comment on the PR conversation
+GitHub API Authentication:
+- We authenticate using installation access tokens (from auth.py)
+- Every request includes the token in the Authorization header
+- The token is scoped to the specific repos where our app is installed
+GitHub API Versioning:
+- We pin to version "2022-11-28" via X-GitHub-Api-Version header
+- This ensures our code doesn't break when GitHub ships API changes
+- This is a best practice for any API integration in production
+Rate Limits:
+- GitHub Apps get 5,000 requests/hour per installation
+- That's plenty for our use case (~10-20 API calls per PR review)
+Reference: https://docs.github.com/en/rest
+"""
+from __future__ import annotations
+import base64
+from dataclasses import dataclass
+import httpx
+import structlog
+from app.github.auth import get_installation_token
+logger = structlog.get_logger()
+GITHUB_API = "https://api.github.com"
+@dataclass
+class PRData:
+    """
+    All the data we fetch about a PR, bundled together.
+    This is passed to the agent orchestrator so agents have full context.
+    A dataclass (vs a dict) gives us type safety and autocomplete in the IDE.
+    """
+    repo_full_name: str       # e.g. "ninjacode911/myapp"
+    pr_number: int
+    commit_sha: str           # HEAD commit of the PR
+    title: str
+    diff: str                 # Raw unified diff (the actual code changes)
+    changed_files: list[dict] # List of {filename, status, additions, deletions, patch}
+    file_contents: dict[str, str]  # {filepath: full_file_content} for changed files
+class GitHubClient:
+    """
+    Async GitHub API client for a specific installation.
+    Usage:
+        client = GitHubClient(installation_id=12345)
+        pr_data = await client.fetch_pr_data("ninjacode911/myapp", 42)
+        await client.post_review_comment(...)
+    Why a class instead of standalone functions?
+    - The installation_id and token are shared across all API calls for one webhook event
+    - A class groups these related operations together with shared state
+    - Makes it easy to test by mocking one object
+    """
+    def __init__(self, installation_id: int):
+        self.installation_id = installation_id
+    async def _get_headers(self) -> dict[str, str]:
+        """
+        Build the authorization headers for GitHub API requests.
+        Delegates to auth.py which handles token caching and refresh.
+        No client-level cache — auth.py's cache is the single source of truth.
+        """
+        token = await get_installation_token(self.installation_id)
+        return {
+            "Authorization": f"token {token}",
+            "Accept": "application/vnd.github+json",
+            "X-GitHub-Api-Version": "2022-11-28",
+        }
+    async def fetch_pr_data(self, repo_full_name: str, pr_number: int) -> PRData:
+        """
+        Fetch all data needed to review a PR in one method.
+        This makes four kinds of API calls:
+        1. GET /repos/{owner}/{repo}/pulls/{pr_number}: PR metadata (JSON)
+        2. GET /repos/{owner}/{repo}/pulls/{pr_number}: raw diff (diff media type)
+        3. GET /repos/{owner}/{repo}/pulls/{pr_number}/files: changed files (paginated)
+        4. GET /repos/{owner}/{repo}/contents/{path}: full content, once per changed file
+        We fetch full file contents (not just the diff) because our agents need
+        surrounding context. The diff alone doesn't show imports, class definitions,
+        or the rest of the function — all critical for understanding security and
+        performance implications.
+        Args:
+            repo_full_name: "owner/repo" format (e.g. "ninjacode911/myapp")
+            pr_number: The PR number
+        Returns:
+            PRData with diff, changed files, and full file contents
+        """
+        headers = await self._get_headers()
+        async with httpx.AsyncClient(timeout=30.0) as http:
+            # --- 1. Fetch PR metadata ---
+            pr_response = await http.get(
+                f"{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}",
+                headers=headers,
+            )
+            pr_response.raise_for_status()
+            pr_json = pr_response.json()
+            commit_sha = pr_json["head"]["sha"]
+            title = pr_json["title"]
+            # --- 2. Fetch the raw diff ---
+            # By setting Accept to "application/vnd.github.diff", GitHub returns
+            # the raw unified diff instead of JSON. This is the same format you
+            # see with `git diff` — it's what our agents will analyze.
+            diff_response = await http.get(
+                f"{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}",
+                headers={**headers, "Accept": "application/vnd.github.diff"},
+            )
+            diff_response.raise_for_status()
+            diff = diff_response.text
+            # --- 3. Fetch list of changed files ---
+            # This gives us structured data: filename, status (added/modified/removed),
+            # number of additions/deletions, and the patch (per-file diff).
+            # We paginate because large PRs can have 100+ files.
+            changed_files = []
+            page = 1
+            while page <= 30:  # Cap at 3000 files to prevent runaway loops
+                files_response = await http.get(
+                    f"{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}/files",
+                    headers=headers,
+                    params={"per_page": 100, "page": page},
+                )
+                files_response.raise_for_status()
+                batch = files_response.json()
+                if not batch:
+                    break
+                changed_files.extend(batch)
+                if len(batch) < 100:
+                    break
+                page += 1
+            # --- 4. Fetch full file contents for each changed file ---
+            # We need the complete source code (not just the diff) for RAG context.
+            # The agents can then understand imports, class hierarchy, etc.
+            file_contents = {}
+            for file_info in changed_files:
+                filename = file_info["filename"]
+                status = file_info["status"]
+                # Skip deleted files: no content to review. Binary files are
+                # filtered later, when _fetch_file_content cannot decode them.
+                if status == "removed":
+                    continue
+                try:
+                    content = await self._fetch_file_content(
+                        http, headers, repo_full_name, filename, commit_sha
+                    )
+                    if content is not None:
+                        file_contents[filename] = content
+                except Exception as e:
+                    # Non-fatal: if we can't fetch one file, continue with the rest
+                    logger.warning(
+                        "Failed to fetch file content",
+                        filename=filename,
+                        error=str(e),
+                    )
+        logger.info(
+            "Fetched PR data",
+            repo=repo_full_name,
+            pr=pr_number,
+            changed_files=len(changed_files),
+            files_with_content=len(file_contents),
+        )
+        return PRData(
+            repo_full_name=repo_full_name,
+            pr_number=pr_number,
+            commit_sha=commit_sha,
+            title=title,
+            diff=diff,
+            changed_files=changed_files,
+            file_contents=file_contents,
+        )
+    async def _fetch_file_content(
+        self,
+        http: httpx.AsyncClient,
+        headers: dict,
+        repo_full_name: str,
+        filepath: str,
+        ref: str,
+    ) -> str | None:
+        """
+        Fetch the full content of a single file at a specific commit.
+        GitHub's Contents API returns file content as base64-encoded string.
+        We decode it to get the actual source code text.
+        Why base64? Because GitHub's API is JSON-based, and JSON can't safely
+        contain arbitrary binary content. Base64 encodes binary as ASCII text.
+        This is the same encoding used in email attachments (MIME).
+        Args:
+            http: The httpx client (reused for connection pooling)
+            headers: Auth headers
+            repo_full_name: "owner/repo"
+            filepath: Path to the file in the repo
+            ref: Git ref (commit SHA) to fetch the file at
+        Returns:
+            The file content as a string, or None if the file is binary/too large
+        """
+        response = await http.get(
+            f"{GITHUB_API}/repos/{repo_full_name}/contents/{filepath}",
+            headers=headers,
+            params={"ref": ref},
+        )
+        if response.status_code == 404:
+            return None
+        response.raise_for_status()
+        data = response.json()
+        # GitHub returns "file" type for regular files.
+        # Skip directories, symlinks, or submodules.
+        if data.get("type") != "file":
+            return None
+        # Files > 1MB use a different API (Blobs). Skip for now — these are
+        # usually auto-generated or binary files, not worth reviewing.
+        if data.get("size", 0) > 1_000_000:
+            logger.info("Skipping large file", filepath=filepath, size=data["size"])
+            return None
+        # Decode the base64-encoded content
+        content_b64 = data.get("content", "")
+        try:
+            return base64.b64decode(content_b64).decode("utf-8")
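+        except (ValueError, TypeError):
+            # Invalid base64 or non-UTF-8 bytes (binascii.Error and
+            # UnicodeDecodeError are both ValueError subclasses), or missing
+            # content: treat the file as binary/unreadable.
+            return None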
+    async def post_review(
+        self,
+        repo_full_name: str,
+        pr_number: int,
+        commit_sha: str,
+        body: str,
+        comments: list[dict],
+    ) -> dict:
+        """
+        Post a pull request review with inline comments.
+        This is the core output mechanism of Ninja Code Guard. A "review" in GitHub terms
+        is a batch of inline comments submitted together, optionally with a top-level
+        body and an event type (APPROVE, REQUEST_CHANGES, COMMENT).
+        Each inline comment is anchored to a specific file and line, so it appears
+        right next to the relevant code — just like a human reviewer would comment.
+        GitHub's review API is atomic: either all comments post successfully, or
+        none do. This prevents partial reviews that would confuse developers.
+        Args:
+            repo_full_name: "owner/repo"
+            pr_number: PR number
+            commit_sha: The exact commit SHA these comments reference
+            body: The top-level review summary (shown above inline comments)
+            comments: List of dicts with keys:
+                - path: file path (e.g. "src/auth/login.py")
+                - line: line number in the diff (the new file's line number)
+                - body: the comment text (Markdown supported)
+        Returns:
+            The GitHub API response as a dict
+        """
+        headers = await self._get_headers()
+        # We use "COMMENT" event — this posts the review without approving or
+        # requesting changes. Our bot shouldn't block PRs at the GitHub level;
+        # instead, we indicate blocking via the Health Score in the summary.
+        review_payload = {
+            "commit_id": commit_sha,
+            "body": body,
+            "event": "COMMENT",
+            "comments": comments,
+        }
+        async with httpx.AsyncClient(timeout=30.0) as http:
+            response = await http.post(
+                f"{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}/reviews",
+                headers=headers,
+                json=review_payload,
+            )
+            response.raise_for_status()
+        logger.info(
+            "Posted PR review",
+            repo=repo_full_name,
+            pr=pr_number,
+            inline_comments=len(comments),
+        )
+        return response.json()
+    async def post_comment(
+        self, repo_full_name: str, pr_number: int, body: str
+    ) -> dict:
+        """
+        Post a standalone comment on the PR conversation (not inline).
+        Used for the summary comment (Health Score, finding counts, executive summary)
+        when we don't have inline comments, or as a fallback.
+        This uses the Issues API (PRs are issues in GitHub's data model) rather
+        than the Pull Request Review API.
+        Args:
+            repo_full_name: "owner/repo"
+            pr_number: PR number
+            body: Comment text (Markdown)
+        Returns:
+            The GitHub API response as a dict
+        """
+        headers = await self._get_headers()
+        async with httpx.AsyncClient(timeout=30.0) as http:
+            response = await http.post(
+                f"{GITHUB_API}/repos/{repo_full_name}/issues/{pr_number}/comments",
+                headers=headers,
+                json={"body": body},
+            )
+            response.raise_for_status()
+        logger.info("Posted PR comment", repo=repo_full_name, pr=pr_number)
+        return response.json()
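The pagination loop in `fetch_pr_data` stops on a short page or when the page cap is reached. A minimal sketch of that pattern, with the HTTP call replaced by a hypothetical `fake_fetch_page` function and 250 made-up file entries, might look like this:

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=30):
    """Collect items page by page until a short page or the page cap is reached."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:
            break  # a short (or empty) page means there is nothing left
    return items


# Hypothetical data source standing in for GET .../pulls/{pr_number}/files.
FAKE_FILES = [{"filename": f"file_{i}.py"} for i in range(250)]
calls = []


def fake_fetch_page(page, per_page):
    calls.append(page)
    start = (page - 1) * per_page
    return FAKE_FILES[start:start + per_page]


all_files = fetch_all_pages(fake_fetch_page)
print(len(all_files), calls)
```

With 250 items and 100 per page, the third page is short, so the loop ends after three requests rather than probing for an empty fourth page.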

app/github/comment_formatter.py ADDED Viewed

	@@ -0,0 +1,215 @@

+"""
+GitHub Comment Formatter
+=========================
+Converts our internal Finding and SynthesizedReview data structures into
+GitHub-flavored Markdown for posting as PR comments.
+Two types of output:
+1. **Inline comments** — one per finding, anchored to a specific file+line.
+   These appear right next to the code, like a human reviewer's comments.
+2. **Summary comment** — a top-level PR comment with the Health Score,
+   finding counts by severity, and an executive summary.
+Design decisions:
+- We use emoji prefixes for severity to make scanning fast (most devs skim reviews)
+- Each inline comment includes the agent name and category for traceability
+- CWE IDs are linked for security findings (so devs can learn about the vulnerability)
+- Suggested fixes use fenced code blocks for easy copy-paste
+"""
+from __future__ import annotations
+from app.models.findings import Finding, SynthesizedReview
+# Emoji and color mapping for severity levels
+SEVERITY_EMOJI = {
+    "critical": "\U0001f6a8",  # 🚨
+    "high": "\U0001f7e0",      # 🟠
+    "medium": "\U0001f7e1",    # 🟡
+    "low": "\u2139\ufe0f",     # ℹ️
+}
+AGENT_EMOJI = {
+    "security": "\U0001f512",     # 🔒
+    "performance": "\u26a1",      # ⚡
+    "style": "\u270f\ufe0f",      # ✏️
+}
+def format_inline_comment(finding: Finding) -> str:
+    """
+    Format a single Finding as a GitHub inline comment body.
+    This Markdown will appear anchored to the specific file+line in the PR diff.
+    Example output:
+        🚨 **[CRITICAL — Security] SQL Injection Risk**
+        The query on line 47 constructs SQL via string interpolation.
+        User input is directly embedded without sanitization.
+        **Suggested fix:**
+        ```python
+        cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))
+        ```
+        > 🔒 Security · CWE-89 · Confidence: 0.92
+    """
+    severity_emoji = SEVERITY_EMOJI.get(finding.severity, "")
+    agent_emoji = AGENT_EMOJI.get(finding.agent, "")
+    severity_upper = finding.severity.upper()
+    agent_title = finding.agent.capitalize()
+    # Build the comment body
+    lines = [
+        f"{severity_emoji} **[{severity_upper} — {agent_title}] {finding.title}**",
+        "",
+        finding.description,
+    ]
+    # Add suggested fix if present
+    if finding.suggested_fix:
+        lines.extend([
+            "",
+            "**Suggested fix:**",
+            "```",
+            finding.suggested_fix,
+            "```",
+        ])
+    # Add metadata footer
+    footer_parts = [f"{agent_emoji} {agent_title}"]
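+    if finding.cwe_id:
+        # Guard against IDs without a "-" (same handling as format_summary_comment)
+        cwe_num = finding.cwe_id.split("-")[-1] if "-" in finding.cwe_id else ""
+        footer_parts.append(f"[{finding.cwe_id}](https://cwe.mitre.org/data/definitions/{cwe_num}.html)")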
+    footer_parts.append(f"Confidence: {finding.confidence:.2f}")
+    lines.extend(["", f"> {' · '.join(footer_parts)}"])
+    return "\n".join(lines)
+def format_summary_comment(review: SynthesizedReview) -> str:
+    """
+    Format the top-level PR summary comment with Health Score and finding overview.
+    This is posted as a regular PR comment (not inline). It gives the PR author
+    a quick overview without needing to look at every inline comment.
+    The Health Score gauge uses block characters to create a visual progress bar
+    in pure Unicode (works in GitHub Markdown without images).
+    """
+    score = review.health_score
+    # Determine overall status
+    if score >= 80:
+        status_emoji = "\u2705"  # ✅
+        status_text = "Healthy"
+    elif score >= 60:
+        status_emoji = "\u26a0\ufe0f"  # ⚠️
+        status_text = "Needs Attention"
+    else:
+        status_emoji = "\u274c"  # ❌
+        status_text = "Action Required"
+    # Build the visual health bar (20 segments)
+    filled = round(score / 5)
+    bar = "\u2588" * filled + "\u2591" * (20 - filled)
+    # Count total findings
+    total = (
+        review.critical_count
+        + review.high_count
+        + review.medium_count
+        + review.low_count
+    )
+    lines = [
+        f"## {status_emoji} Ninja Code Guard Review — Health Score: {score}/100",
+        "",
+        f"`{bar}` **{score}**/100 — {status_text}",
+        "",
+        "### Findings Summary",
+        "",
+        "| Severity | Count |",
+        "|----------|-------|",
+        f"| \U0001f6a8 Critical | {review.critical_count} |",
+        f"| \U0001f7e0 High | {review.high_count} |",
+        f"| \U0001f7e1 Medium | {review.medium_count} |",
+        f"| \u2139\ufe0f Low | {review.low_count} |",
+        f"| **Total** | **{total}** |",
+        "",
+    ]
+    # Add recommendation
+    rec_map = {
+        "approve": "\u2705 **Recommendation: Approve** — No critical issues found.",
+        "request_changes": "\u26a0\ufe0f **Recommendation: Request Changes** — Issues found that should be addressed.",
+        "block": "\u274c **Recommendation: Block Merge** — Critical issues must be resolved before merging.",
+    }
+    lines.append(rec_map.get(review.recommendation, ""))
+    lines.append("")
+    # Add executive summary
+    lines.extend([
+        "### Executive Summary",
+        "",
+        review.executive_summary,
+        "",
+    ])
+    # Add detailed findings (so all info is visible even if inline comments fail)
+    if review.findings:
+        lines.append("### Detailed Findings")
+        lines.append("")
+        for i, finding in enumerate(review.findings, 1):
+            severity_emoji = SEVERITY_EMOJI.get(finding.severity, "")
+            agent_emoji = AGENT_EMOJI.get(finding.agent, "")
+            lines.append(
+                f"<details>\n"
+                f"<summary>{severity_emoji} <b>[{finding.severity.upper()}]</b> "
+                f"{finding.title} — <code>{finding.file_path}:{finding.line_start}</code></summary>\n\n"
+                f"{finding.description}\n"
+            )
+            if finding.suggested_fix:
+                lines.append(f"**Suggested fix:**\n```\n{finding.suggested_fix}\n```\n")
+            footer_parts = [f"{agent_emoji} {finding.agent.capitalize()}"]
+            if finding.cwe_id:
+                cwe_num = finding.cwe_id.split("-")[-1] if "-" in finding.cwe_id else ""
+                footer_parts.append(f"[{finding.cwe_id}](https://cwe.mitre.org/data/definitions/{cwe_num}.html)")
+            footer_parts.append(f"Confidence: {finding.confidence:.2f}")
+            lines.append(f"> {' · '.join(footer_parts)}\n")
+            lines.append("</details>\n")
+    lines.extend([
+        "---",
+        "*Reviewed by [Ninja Code Guard](https://github.com/ninjacode911/ninja-code-guard) — Multi-agent code review*",
+    ])
+    return "\n".join(lines)
+def findings_to_review_comments(findings: list[Finding]) -> list[dict]:
+    """
+    Convert a list of Findings into GitHub review comment dicts.
+    Each dict has the structure that GitHub's Create Review API expects:
+    - path: the file path relative to repo root
+    - line: the line number in the NEW version of the file
+    - body: the formatted Markdown comment
+    Note: GitHub requires `line` to be within a diff hunk and rejects the review
+    otherwise. This function does not filter such findings itself, so callers must
+    make sure every finding's line falls inside the diff before posting.
+    We use `line` (not `position`) because position-based comments are deprecated.
+    """
+    comments = []
+    for finding in findings:
+        comment = {
+            "path": finding.file_path,
+            "line": finding.line_start,
+            "side": "RIGHT",  # RIGHT = new version of the file (what the PR introduces)
+            "body": format_inline_comment(finding),
+        }
+        comments.append(comment)
+    return comments
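Because GitHub rejects review comments whose `line` is outside a diff hunk, a pre-filter over the per-file `patch` text can be useful. The sketch below is one possible approach, not part of this module: `right_side_lines` and `filter_comments` are hypothetical helpers, and it assumes the commentable RIGHT-side lines are the added and context lines of each hunk.

```python
import re

# Matches hunk headers such as "@@ -10,7 +12,9 @@" and captures the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def right_side_lines(patch):
    """Return the new-file line numbers (added + context) covered by a patch."""
    lines = set()
    new_line = 0
    in_hunk = False
    for raw in patch.splitlines():
        match = HUNK_RE.match(raw)
        if match:
            new_line = int(match.group(1))
            in_hunk = True
            continue
        if not in_hunk or raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline" markers have no new-file number
        lines.add(new_line)
        new_line += 1
    return lines


def filter_comments(comments, patches):
    """Keep only comments whose line lies inside the patch for their path."""
    valid = {path: right_side_lines(patch) for path, patch in patches.items()}
    return [c for c in comments if c["line"] in valid.get(c["path"], set())]


patch = "@@ -1,3 +1,4 @@\n import os\n-old = 1\n+new = 1\n+extra = 2\n print(new)"
covered = right_side_lines(patch)
kept = filter_comments(
    [{"path": "a.py", "line": 2}, {"path": "a.py", "line": 10}],
    {"a.py": patch},
)
print(sorted(covered), kept)
```

The `patches` mapping here could be built from the `filename` and `patch` keys of the changed-file list, assuming `patch` is present for the file.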

app/github/webhook.py ADDED Viewed

	@@ -0,0 +1,84 @@

+"""
+GitHub Webhook Signature Validation
+====================================
+When GitHub sends a webhook event to our server, it includes a cryptographic
+signature in the `X-Hub-Signature-256` header. This signature proves the request
+genuinely came from GitHub, not from an attacker.
+The signature is computed as: HMAC-SHA256(webhook_secret, request_body)
+We recompute the same HMAC on our side and compare. If they match, the request
+is authentic. We use `hmac.compare_digest()` for constant-time comparison to
+prevent timing attacks — where an attacker measures response time differences
+to guess the signature byte by byte.
+Reference: https://docs.github.com/en/webhooks/using-webhooks/validating-webhook-deliveries
+"""
+import hashlib
+import hmac
+from fastapi import Header, HTTPException, Request
+from app.config import settings
+async def validate_webhook_signature(
+    request: Request,
+    x_hub_signature_256: str = Header(..., alias="X-Hub-Signature-256"),
+) -> bytes:
+    """
+    FastAPI dependency that validates the GitHub webhook HMAC-SHA256 signature.
+    How this works as a FastAPI dependency:
+    - FastAPI's dependency injection system calls this function before your endpoint runs
+    - It automatically extracts the X-Hub-Signature-256 header from the request
+    - If validation fails, it raises HTTPException and the endpoint never executes
+    - If it passes, it returns the raw request body for further processing
+    Args:
+        request: The incoming FastAPI request object (injected automatically)
+        x_hub_signature_256: The signature header from GitHub (extracted by FastAPI)
+    Returns:
+        The raw request body bytes (so the endpoint can parse it as JSON)
+    Raises:
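+        HTTPException 401: If the signature is empty, malformed, or invalid
+        HTTPException 500: If the webhook secret is not configured
+    Note: the header is declared as required (Header(...)), so FastAPI itself
+    rejects requests that omit it with a 422 before this function runs.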
+    """
+    # Read the raw request body — we need the exact bytes GitHub used to compute the HMAC.
+    # Important: we read raw bytes, NOT parsed JSON, because even a single whitespace
+    # difference would produce a completely different HMAC hash.
+    body = await request.body()
+    # Reject if webhook secret is not configured — empty secret = no security
+    if not settings.github_webhook_secret:
+        raise HTTPException(status_code=500, detail="Webhook secret not configured")
+    if not x_hub_signature_256:
+        raise HTTPException(status_code=401, detail="Missing webhook signature header")
+    # GitHub sends the signature as "sha256=<hex_digest>"
+    # We need to strip the "sha256=" prefix to get just the hex digest
+    if not x_hub_signature_256.startswith("sha256="):
+        raise HTTPException(status_code=401, detail="Invalid signature format")
+    received_signature = x_hub_signature_256[7:]  # Strip "sha256=" prefix
+    # Compute the expected HMAC using our stored webhook secret
+    # hmac.new() takes: key (bytes), message (bytes), hash algorithm
+    expected_signature = hmac.new(
+        key=settings.github_webhook_secret.encode("utf-8"),
+        msg=body,
+        digestmod=hashlib.sha256,
+    ).hexdigest()
+    # Constant-time comparison — this is critical for security.
+    # A naive `==` comparison short-circuits on the first different byte,
+    # which leaks timing information. compare_digest() always takes the
+    # same amount of time regardless of where the mismatch is.
+    if not hmac.compare_digest(expected_signature, received_signature):
+        raise HTTPException(status_code=401, detail="Invalid webhook signature")
+    return body
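The signature check can be exercised without FastAPI. This is a minimal sketch using only the standard library; `sign` and `verify` are hypothetical helper names and the secret and payload are made-up values. It also shows why the raw bytes matter: a single extra space in the body invalidates the signature.

```python
import hashlib
import hmac


def sign(secret, body):
    """Build the header value in the form GitHub sends: 'sha256=<hex digest>'."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(secret, body, header):
    """Constant-time check of a received X-Hub-Signature-256 header value."""
    if not header.startswith("sha256="):
        return False
    expected = sign(secret, body)
    # Compare as bytes so non-ASCII header values cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))


secret = "example-secret"            # made-up value for illustration
body = b'{"action":"opened"}'        # made-up payload
header = sign(secret, body)

ok = verify(secret, body, header)
tampered = verify(secret, b'{"action": "opened"}', header)  # one extra space
wrong_format = verify(secret, body, header[7:])             # prefix stripped
print(ok, tampered, wrong_format)
```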

app/main.py ADDED Viewed

	@@ -0,0 +1,355 @@

+"""
+Ninja Code Guard — FastAPI Application Entry Point
+=============================================
+This is the main entry point for the Ninja Code Guard backend. It sets up:
+1. The FastAPI application with CORS middleware
+2. The /health endpoint (used by Render health checks and the pre-warm cron)
+3. The /webhook/github endpoint (receives PR events from GitHub)
+Request lifecycle for a PR review:
+    GitHub webhook → HMAC validation → Redis cache check → fetch PR data
+    → (Week 3+: run agents) → post review comments → cache result
+The webhook handler uses FastAPI's "Background Tasks" feature to process
+the review asynchronously. This means we return 200 to GitHub immediately
+(within their 10-second timeout) and do the heavy lifting in the background.
+Without this, GitHub would record the delivery as timed out (failed) if we took too long.
+"""
+import asyncio
+import json
+import traceback
+from fastapi import (
+    BackgroundTasks, Depends, FastAPI, Header, HTTPException,
+    Request, Response, Security,
+)
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.security import APIKeyHeader
+import structlog
+from app.config import settings
+# ── API Key auth for dashboard endpoints ──────────────────────────────────
+_api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
+async def verify_api_key(api_key: str = Security(_api_key_header)):
+    """Reject dashboard API requests that don't carry a valid API key."""
+    if not settings.dashboard_api_key:
+        return  # No key configured → allow (dev mode)
+    if api_key != settings.dashboard_api_key:
+        raise HTTPException(status_code=403, detail="Invalid or missing API key")
+from app.agents.performance_agent import PerformanceAgent
+from app.agents.security_agent import SecurityAgent
+from app.agents.style_agent import StyleAgent
+from app.agents.synthesizer import synthesize
+from app.context.indexer import index_repo_files
+from app.context.retriever import retrieve_context
+from app.db.postgres import save_review
+from app.db.redis_cache import is_already_reviewed, mark_as_reviewed
+from app.github.client import GitHubClient
+from app.github.comment_formatter import (
+    findings_to_review_comments,
+    format_inline_comment,
+    format_summary_comment,
+)
+from app.github.webhook import validate_webhook_signature
+logger = structlog.get_logger()
+_is_production = settings.environment == "production"
+app = FastAPI(
+    title="Ninja Code Guard",
+    description="Multi-agent PR review system",
+    version="0.1.0",
+    # Disable auto-generated docs in production (exposes API schema)
+    docs_url=None if _is_production else "/docs",
+    redoc_url=None if _is_production else "/redoc",
+    openapi_url=None if _is_production else "/openapi.json",
+)
+# CORS middleware allows the Next.js dashboard (on Vercel) to call our API.
+# In production, restrict origins to your actual Vercel domain.
+_allowed_origins = (
+    [o.strip() for o in settings.cors_allowed_origins.split(",") if o.strip()]
+    if settings.cors_allowed_origins
+    else ["http://localhost:3000"]
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=_allowed_origins,
+    allow_credentials=True,
+    allow_methods=["GET", "POST"],
+    allow_headers=["Content-Type", "X-API-Key", "X-GitHub-Event", "X-Hub-Signature-256"],
+)
+@app.get("/health")
+async def health_check():
+    """
+    Health check endpoint.
+    Used by:
+    - Render.com to verify the service is running (healthCheckPath in render.yaml)
+    - The GitHub Actions pre-warm cron to keep the service from going cold
+    - Our Next.js dashboard to show service status
+    """
+    return {"status": "ok", "service": "Ninja Code Guard"}
+# --- Dashboard API Endpoints ---
+@app.get("/api/repos/{owner}/{repo}/reviews")
+async def get_reviews(owner: str, repo: str, _=Depends(verify_api_key)):
+    """Get recent PR reviews for a repo (used by dashboard)."""
+    from app.db.postgres import get_repo_reviews
+    repo_full_name = f"{owner}/{repo}"
+    reviews = await get_repo_reviews(repo_full_name)
+    return {"repo": repo_full_name, "reviews": reviews}
+@app.get("/api/repos/{owner}/{repo}/stats")
+async def get_stats(owner: str, repo: str, _=Depends(verify_api_key)):
+    """Get aggregate stats for a repo (used by dashboard)."""
+    from app.db.postgres import get_repo_reviews
+    repo_full_name = f"{owner}/{repo}"
+    reviews = await get_repo_reviews(repo_full_name, limit=50)
+    if not reviews:
+        return {"repo": repo_full_name, "total_reviews": 0, "avg_health_score": 0}
+    avg_score = sum(r.get("health_score", 0) for r in reviews) / len(reviews)
+    return {
+        "repo": repo_full_name,
+        "total_reviews": len(reviews),
+        "avg_health_score": round(avg_score),
+        "reviews": reviews[:10],
+    }
+# --- Webhook Actions (what to do for each event type) ---
+# We only process these PR actions. Others (labeled, assigned, etc.) are irrelevant.
+RELEVANT_PR_ACTIONS = {"opened", "synchronize", "reopened", "ready_for_review"}
+async def _process_pr_review(
+    repo_full_name: str,
+    pr_number: int,
+    commit_sha: str,
+    installation_id: int,
+) -> None:
+    """
+    Background task: fetch PR data and post a review.
+    Pipeline:
+    1. Fetch PR diff and file contents from GitHub
+    2. Index files into ChromaDB for RAG context
+    3. Run 3 domain agents IN PARALLEL (asyncio.gather)
+    4. Merge all findings and compute health score
+    5. Post review to GitHub
+    6. Save the review to Postgres (for the dashboard)
+    7. Mark the commit as reviewed in the Redis cache
+    """
+    try:
+        logger.info(
+            "Starting PR review",
+            repo=repo_full_name,
+            pr=pr_number,
+            sha=commit_sha[:8],
+        )
+        # Step 1: Fetch PR data
+        client = GitHubClient(installation_id)
+        pr_data = await client.fetch_pr_data(repo_full_name, pr_number)
+        # Step 2: Index files for RAG context
+        # This embeds the file contents into ChromaDB so agents can
+        # semantically search for related code across the repo
+        rag_context = ""
+        try:
+            collection_name = await index_repo_files(
+                repo_full_name, pr_data.file_contents
+            )
+            rag_context = await retrieve_context(
+                collection_name, pr_data.diff[:5000]
+            )
+        except Exception as rag_err:
+            logger.warning("RAG context unavailable", error=str(rag_err))
+        # Step 3: Run all 3 domain agents IN PARALLEL
+        # asyncio.gather() runs all three concurrently — total latency is
+        # max(agent_latencies) instead of sum(agent_latencies).
+        # With Groq at 500+ tokens/sec, each agent takes 2-5 seconds.
+        # Parallel: ~5 seconds total. Sequential: ~15 seconds.
+        security_agent = SecurityAgent()
+        performance_agent = PerformanceAgent()
+        style_agent = StyleAgent()
+        security_findings, performance_findings, style_findings = await asyncio.gather(
+            security_agent.review(pr_data, rag_context),
+            performance_agent.review(pr_data, rag_context),
+            style_agent.review(pr_data, rag_context),
+        )
+        logger.info(
+            "All agents completed",
+            security=len(security_findings),
+            performance=len(performance_findings),
+            style=len(style_findings),
+            total=len(security_findings) + len(performance_findings) + len(style_findings),
+            repo=repo_full_name,
+            pr=pr_number,
+        )
+        # Step 4: Synthesize — deduplicate, rank, score, summarize
+        review = synthesize(security_findings, performance_findings, style_findings)
+        # Step 5: Post the review to GitHub
+        if review.findings:
+            # Post inline comments anchored to specific lines
+            review_comments = findings_to_review_comments(review.findings)
+            try:
+                await client.post_review(
+                    repo_full_name,
+                    pr_number,
+                    commit_sha,
+                    body=format_summary_comment(review),
+                    comments=review_comments,
+                )
+            except Exception as review_err:
+                # If inline comments fail (e.g., line not in diff), fall back to summary only
+                logger.warning(
+                    "Inline review failed, posting summary comment instead",
+                    error=str(review_err),
+                )
+                await client.post_comment(
+                    repo_full_name, pr_number, format_summary_comment(review)
+                )
+        else:
+            # No findings — post a clean bill of health
+            await client.post_comment(
+                repo_full_name,
+                pr_number,
+                format_summary_comment(review),
+            )
+        # Save to Neon Postgres (for dashboard)
+        await save_review(repo_full_name, pr_number, commit_sha, review)
+        # Mark this commit as reviewed in Redis cache
+        await mark_as_reviewed(commit_sha)
+        logger.info(
+            "PR review completed",
+            repo=repo_full_name,
+            pr=pr_number,
+            sha=commit_sha[:8],
+        )
+    except Exception as e:
+        # Log the full traceback so we can debug failures
+        logger.error(
+            "PR review failed",
+            repo=repo_full_name,
+            pr=pr_number,
+            error=str(e),
+            traceback=traceback.format_exc(),
+        )
+@app.post("/webhook/github")
+async def webhook_github(
+    request: Request,
+    background_tasks: BackgroundTasks,
+    x_github_event: str = Header(..., alias="X-GitHub-Event"),
+    body: bytes = Depends(validate_webhook_signature),
+):
+    """
+    Receive and process GitHub webhook events.
+    This endpoint is called by GitHub whenever a PR event occurs on repos
+    where Ninja Code Guard is installed.
+    How the flow works:
+    1. FastAPI calls validate_webhook_signature() BEFORE this function runs
+       (it's a Depends() dependency). If HMAC validation fails, we never get here.
+    2. We parse the validated payload and check if it's a relevant event.
+    3. If it's a PR event we care about, we check Redis cache.
+    4. If not cached, we enqueue the review as a background task.
+    5. We return 200 immediately — GitHub expects a response within 10 seconds.
+    Why background tasks?
+    - GitHub has a 10-second webhook timeout. If we don't respond in time,
+      GitHub marks the delivery as failed and may retry (causing duplicates).
+    - Our actual review takes 15-20 seconds (agent calls + synthesis).
+    - So we acknowledge receipt immediately and process in the background.
+    Args:
+        request: The FastAPI request object
+        background_tasks: FastAPI's background task queue
+        x_github_event: The event type header (e.g., "pull_request")
+        body: The validated request body (returned by validate_webhook_signature)
+    """
+    # Parse the validated JSON payload
+    payload = json.loads(body)
+    # We only handle pull_request events for now
+    if x_github_event != "pull_request":
+        logger.debug("Ignoring non-PR event", github_event=x_github_event)
+        return {"status": "ignored", "reason": f"event type: {x_github_event}"}
+    action = payload.get("action", "")
+    if action not in RELEVANT_PR_ACTIONS:
+        logger.debug("Ignoring irrelevant PR action", action=action)
+        return {"status": "ignored", "reason": f"action: {action}"}
+    # Extract key data from the webhook payload
+    pr = payload["pull_request"]
+    repo_full_name = payload["repository"]["full_name"]
+    pr_number = payload["number"]
+    commit_sha = pr["head"]["sha"]
+    # Skip draft PRs — they're not ready for review
+    if pr.get("draft", False):
+        logger.info("Skipping draft PR", repo=repo_full_name, pr=pr_number)
+        return {"status": "ignored", "reason": "draft PR"}
+    # Check Redis cache — have we already reviewed this exact commit?
+    if await is_already_reviewed(commit_sha):
+        return {"status": "skipped", "reason": "already reviewed", "sha": commit_sha[:8]}
+    # Get the installation ID (needed for GitHub App authentication)
+    installation_id = payload.get("installation", {}).get("id")
+    if not installation_id:
+        logger.error("No installation ID in webhook payload")
+        return Response(status_code=400, content="Missing installation ID")
+    # Enqueue the review as a background task
+    # This returns 200 to GitHub immediately while processing continues
+    background_tasks.add_task(
+        _process_pr_review,
+        repo_full_name=repo_full_name,
+        pr_number=pr_number,
+        commit_sha=commit_sha,
+        installation_id=installation_id,
+    )
+    logger.info(
+        "Webhook received — review enqueued",
+        repo=repo_full_name,
+        pr=pr_number,
+        sha=commit_sha[:8],
+        action=action,
+    )
+    return {
+        "status": "accepted",
+        "pr": pr_number,
+        "sha": commit_sha[:8],
+    }
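
The latency argument in the Step 3 comment (parallel time is roughly the slowest agent, not the sum) can be illustrated with a small standalone sketch. `fake_agent` is a hypothetical stand-in that sleeps instead of calling an LLM; the names and delays are assumptions for illustration, not part of this commit.

```python
import asyncio
import time


async def fake_agent(name: str, delay: float) -> list[str]:
    """Hypothetical stand-in for an agent's review() call."""
    await asyncio.sleep(delay)  # simulates waiting on a network-bound LLM call
    return [f"{name}-finding"]


async def run_agents() -> tuple[list[list[str]], float]:
    start = time.perf_counter()
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the arguments.
    results = await asyncio.gather(
        fake_agent("security", 0.2),
        fake_agent("performance", 0.2),
        fake_agent("style", 0.2),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    findings, elapsed = asyncio.run(run_agents())
    print(findings, f"elapsed={elapsed:.2f}s")
```

Note that `asyncio.gather()` without `return_exceptions=True` propagates the first exception, so in the pipeline above a single failing agent would send the whole review to the outer `except` block. Whether that is the desired behavior is a design choice.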

app/models/__init__.py ADDED Viewed

File without changes

app/models/findings.py ADDED Viewed

	@@ -0,0 +1,55 @@

+"""Core data models for agent findings and PR reviews."""
+from __future__ import annotations
+from typing import Literal, Optional
+from uuid import UUID, uuid4
+from pydantic import BaseModel, Field
+class Finding(BaseModel):
+    """A single finding produced by a domain agent."""
+    agent: Literal["security", "performance", "style"]
+    file_path: str
+    line_start: int
+    line_end: int
+    severity: Literal["critical", "high", "medium", "low"]
+    category: str
+    title: str
+    description: str
+    suggested_fix: str = ""
+    cwe_id: Optional[str] = None
+    confidence: float = Field(ge=0.0, le=1.0)
+class SynthesizedReview(BaseModel):
+    """Final synthesized review output from the Synthesizer Agent."""
+    health_score: int = Field(ge=0, le=100)
+    executive_summary: str
+    recommendation: Literal["approve", "request_changes", "block"]
+    findings: list[Finding]
+    critical_count: int = 0
+    high_count: int = 0
+    medium_count: int = 0
+    low_count: int = 0
+    duration_ms: int = 0
+class PRReviewRecord(BaseModel):
+    """Database record for a completed PR review."""
+    id: UUID = Field(default_factory=uuid4)
+    repo_full_name: str
+    pr_number: int
+    commit_sha: str
+    health_score: int = Field(ge=0, le=100)
+    critical_count: int = 0
+    high_count: int = 0
+    medium_count: int = 0
+    low_count: int = 0
+    summary: str = ""
+    findings: list[Finding] = []
+    duration_ms: int = 0
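
The `critical_count` / `high_count` / `medium_count` / `low_count` fields are plain tallies over `findings`. The synthesizer that fills them is not shown in this part of the diff, so the following is only a sketch of one way such counts could be derived; `count_by_severity` is a hypothetical helper.

```python
from collections import Counter

# The four severity levels allowed by the Finding model.
SEVERITIES = ("critical", "high", "medium", "low")


def count_by_severity(severities: list[str]) -> dict[str, int]:
    """Hypothetical helper: tally severities into the *_count field names."""
    counts = Counter(severities)
    # Every level gets a key, so missing severities show up as 0.
    return {f"{level}_count": counts.get(level, 0) for level in SEVERITIES}


if __name__ == "__main__":
    print(count_by_severity(["high", "low", "high"]))
```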

app/models/webhook_payloads.py ADDED Viewed

	@@ -0,0 +1,55 @@

+"""GitHub webhook event payload schemas."""
+from __future__ import annotations
+from typing import Optional
+from pydantic import BaseModel
+class GitHubUser(BaseModel):
+    login: str
+    id: int
+class GitHubRepo(BaseModel):
+    id: int
+    full_name: str
+    private: bool
+    default_branch: str = "main"
+class PullRequestHead(BaseModel):
+    sha: str
+    ref: str
+class PullRequest(BaseModel):
+    number: int
+    title: str
+    state: str
+    head: PullRequestHead
+    draft: bool = False
+    changed_files: Optional[int] = None
+    additions: Optional[int] = None
+    deletions: Optional[int] = None
+class PullRequestEvent(BaseModel):
+    """GitHub pull_request webhook event."""
+    action: str  # opened, synchronize, reopened, ready_for_review
+    number: int
+    pull_request: PullRequest
+    repository: GitHubRepo
+    sender: GitHubUser
+class Installation(BaseModel):
+    id: int
+class PullRequestEventWithInstallation(PullRequestEvent):
+    """Pull request event with GitHub App installation context."""
+    installation: Optional[Installation] = None

app/services/__init__.py ADDED Viewed

File without changes

app/services/health_score.py ADDED Viewed

	@@ -0,0 +1,85 @@

+"""
+PR Health Score Calculator
+===========================
+Computes a 0-100 health score for a PR based on finding density and severity.
+Formula:
+    base_score = 100
+    penalty = sum(SEVERITY_WEIGHTS[f.severity] * max(0.3, f.confidence) for f in findings)
+    health_score = max(0, min(100, base_score - penalty))
+Severity weights are calibrated so that:
+- 1 critical finding drops the score by 25 points (one critical = action required)
+- 1 high finding drops by 15 points
+- 1 medium finding drops by 7 points
+- 1 low finding drops by 2 points
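+The confidence factor scales the penalty: a finding with 0.5 confidence penalizes
+half as much as one with 1.0 confidence. The factor is floored at 0.3, so even a
+very low-confidence finding still costs 30% of its severity weight. This rewards
+agents for being honest about uncertainty.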
+Score interpretation:
+    90-100: Excellent — safe to merge
+    70-89:  Good — minor issues, merge at discretion
+    50-69:  Needs attention — address before merging
+    30-49:  Poor — significant issues found
+    0-29:   Critical — do not merge
+"""
+from __future__ import annotations
+from app.models.findings import Finding
+SEVERITY_WEIGHTS = {
+    "critical": 25,
+    "high": 15,
+    "medium": 7,
+    "low": 2,
+}
+def calculate_health_score(findings: list[Finding]) -> int:
+    """
+    Calculate the PR Health Score from 0-100.
+    Higher confidence findings penalize more heavily. This incentivizes
+    agents to set confidence honestly — flagging everything as 1.0
+    confidence would over-penalize, while honest 0.6 confidence
+    for uncertain findings results in fairer scores.
+    """
+    if not findings:
+        return 100
+    total_penalty = 0.0
+    for finding in findings:
+        weight = SEVERITY_WEIGHTS.get(finding.severity, 5)
+        confidence_factor = max(0.3, finding.confidence)  # Minimum 0.3 floor
+        total_penalty += weight * confidence_factor
+    score = 100 - total_penalty
+    return max(0, min(100, round(score)))
+def determine_recommendation(
+    findings: list[Finding], health_score: int
+) -> str:
+    """
+    Determine the PR recommendation based on findings and score.
+    Logic:
+    - Any critical finding → block (regardless of score)
+    - Score < 50 → request_changes
+    - Score < 70 with high findings → request_changes
+    - Otherwise → approve
+    """
+    has_critical = any(f.severity == "critical" for f in findings)
+    has_high = any(f.severity == "high" for f in findings)
+    if has_critical:
+        return "block"
+    if health_score < 50:
+        return "request_changes"
+    if health_score < 70 and has_high:
+        return "request_changes"
+    return "approve"
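
A worked example may make the scoring arithmetic easier to check. The sketch below re-implements the two functions above against a minimal dataclass; `SimpleFinding` is a hypothetical stand-in for `Finding`, used so the example runs without pydantic, and the sample findings are made up.

```python
from dataclasses import dataclass

SEVERITY_WEIGHTS = {"critical": 25, "high": 15, "medium": 7, "low": 2}


@dataclass
class SimpleFinding:
    """Hypothetical stand-in for app.models.findings.Finding."""
    severity: str
    confidence: float


def health_score(findings: list[SimpleFinding]) -> int:
    # Same formula as calculate_health_score: weight * max(0.3, confidence).
    penalty = sum(
        SEVERITY_WEIGHTS.get(f.severity, 5) * max(0.3, f.confidence)
        for f in findings
    )
    return max(0, min(100, round(100 - penalty)))


def recommendation(findings: list[SimpleFinding], score: int) -> str:
    # Same decision order as determine_recommendation.
    if any(f.severity == "critical" for f in findings):
        return "block"
    if score < 50:
        return "request_changes"
    if score < 70 and any(f.severity == "high" for f in findings):
        return "request_changes"
    return "approve"


if __name__ == "__main__":
    # 25*0.9 + 7*0.5 + 2*0.3 (floor applied to 0.1) = 26.6 -> score 73
    a = [SimpleFinding("critical", 0.9), SimpleFinding("medium", 0.5),
         SimpleFinding("low", 0.1)]
    print(health_score(a), recommendation(a, health_score(a)))
    # 15 + 15 + 7 = 37 -> score 63, with high findings present
    b = [SimpleFinding("high", 1.0), SimpleFinding("high", 1.0),
         SimpleFinding("medium", 1.0)]
    print(health_score(b), recommendation(b, health_score(b)))
```

The first case shows that a critical finding blocks the PR even though the score (73) falls in the "Good" band.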

app/tools/__init__.py ADDED Viewed

File without changes

app/tools/bandit_tool.py ADDED Viewed

	@@ -0,0 +1,173 @@

+"""
+Bandit Static Analysis Tool
+=============================
+Bandit is an open-source Python security linter. It parses Python code into an
+Abstract Syntax Tree (AST) and checks each node against a set of security rules.
+What Bandit catches:
+- SQL injection patterns (string formatting in SQL calls)
+- Use of eval(), exec(), os.system() (command injection risk)
+- Hardcoded passwords and bind addresses
+- Use of insecure hash functions (MD5, SHA1)
+- Insecure temp file creation
+- SSL/TLS verification disabled (requests.get(verify=False))
+- Use of pickle (deserialization attacks)
+What Bandit CANNOT catch:
+- Business logic flaws
+- Missing authentication/authorization
+- Cross-file data flow (it analyzes one file at a time)
+- Vulnerabilities in non-Python code
+That's why we combine Bandit (mechanical pattern matching) with the LLM (semantic
+understanding). Bandit provides high-confidence, low-noise signals that anchor the
+LLM's analysis.
+How it works:
+1. We write the changed Python files to a temp directory
+2. Run `bandit -r <dir> -f json` as a subprocess
+3. Parse the JSON output into a human-readable summary
+4. Feed this summary into the LLM's prompt as additional context
+"""
+from __future__ import annotations
+import json
+import subprocess
+import tempfile
+from pathlib import Path
+import structlog
+logger = structlog.get_logger()
+async def run_bandit(file_contents: dict[str, str]) -> str:
+    """
+    Run Bandit security analysis on Python files.
+    Args:
+        file_contents: dict of {filepath: source_code} for changed files
+    Returns:
+        A formatted string summarizing Bandit's findings, suitable for
+        including in an LLM prompt. Returns empty string if no Python
+        files or no findings.
+    """
+    # Filter to only Python files — Bandit only understands Python
+    python_files = {
+        path: content
+        for path, content in file_contents.items()
+        if path.endswith(".py")
+    }
+    if not python_files:
+        return ""
+    try:
+        # Create a temp directory and write the Python files there.
+        # We need files on disk because Bandit operates on the filesystem.
+        # tempfile.TemporaryDirectory() creates a private temp dir and removes it on exit.
+        with tempfile.TemporaryDirectory(prefix="ninjacg_bandit_") as tmpdir:
+            tmpdir_path = Path(tmpdir)
+            for filepath, content in python_files.items():
+                # Recreate the directory structure (e.g., src/auth/login.py)
+                file_path = tmpdir_path / filepath
+                file_path.parent.mkdir(parents=True, exist_ok=True)
+                file_path.write_text(content, encoding="utf-8")
+            # Run Bandit as a subprocess
+            # -r: recursive (scan all files in directory)
+            # -f json: output as JSON (machine-parseable)
+            # -ll: only report medium severity and above
+            # --quiet: suppress progress bar
+            result = subprocess.run(
+                [
+                    "bandit",
+                    "-r", str(tmpdir_path),
+                    "-f", "json",
+                    "-ll",
+                    "--quiet",
+                ],
+                capture_output=True,
+                text=True,
+                timeout=30,  # Kill if it takes too long
+            )
+            # Bandit exit codes:
+            # 0 = no issues found
+            # 1 = issues found (this is NOT an error)
+            # 2+ = actual error
+            if result.returncode > 1:
+                logger.warning("Bandit returned error", stderr=result.stderr[:500])
+                return ""
+            if not result.stdout.strip():
+                return ""
+            # Parse the JSON output
+            bandit_output = json.loads(result.stdout)
+            findings = bandit_output.get("results", [])
+            if not findings:
+                return "Bandit static analysis: No security issues detected."
+            # Format findings as a human-readable summary for the LLM
+            summary_lines = [
+                f"Bandit static analysis found {len(findings)} issue(s):\n"
+            ]
+            for i, finding in enumerate(findings, 1):
+                # Map the temp file path back to the original file path
+                temp_path = finding.get("filename", "")
+                original_path = _map_temp_to_original(temp_path, tmpdir, python_files)
+                severity = finding.get("issue_severity", "UNKNOWN")
+                confidence = finding.get("issue_confidence", "UNKNOWN")
+                text = finding.get("issue_text", "")
+                test_id = finding.get("test_id", "")
+                line_no = finding.get("line_number", 0)
+                code = finding.get("code", "").strip()
+                summary_lines.append(
+                    f"{i}. [{severity}/{confidence}] {text}\n"
+                    f"   File: {original_path}, Line: {line_no}\n"
+                    f"   Test: {test_id}\n"
+                    f"   Code: {code}\n"
+                )
+            summary = "\n".join(summary_lines)
+            logger.info("Bandit analysis complete", findings_count=len(findings))
+            return summary
+    except subprocess.TimeoutExpired:
+        logger.warning("Bandit timed out after 30 seconds")
+        return ""
+    except FileNotFoundError:
+        # Bandit not installed — this is OK, the LLM can still analyze
+        logger.warning("Bandit not found in PATH — skipping static analysis")
+        return ""
+    except Exception as e:
+        logger.warning("Bandit analysis failed", error=str(e))
+        return ""
+def _map_temp_to_original(
+    temp_path: str, tmpdir: str, original_files: dict[str, str]
+) -> str:
+    """Map a temp directory path back to the original file path."""
+    try:
+        # The temp path looks like: /tmp/ninjacg_bandit_xxx/src/auth/login.py
+        # We need to strip the tmpdir prefix to get: src/auth/login.py
+        relative = str(Path(temp_path).relative_to(tmpdir))
+        # Normalize path separators
+        relative = relative.replace("\\", "/")
+        # Verify it's one of our original files
+        if relative in original_files:
+            return relative
+    except ValueError:
+        # relative_to() raises ValueError when temp_path is not under tmpdir
+        pass
+    # Fallback: return the filename only
+    return Path(temp_path).name
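
`run_bandit` is declared `async`, but `subprocess.run()` is a blocking call: while Bandit runs, the event loop cannot make progress on other coroutines. If the tool functions are awaited from agents that are meant to run concurrently (an assumption, since the callers are not shown here), one option is to push the blocking call onto a worker thread. A minimal sketch, using `sys.executable` as a stand-in command rather than a real Bandit invocation:

```python
import asyncio
import subprocess
import sys


async def run_tool(cmd: list[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a blocking subprocess in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(
        subprocess.run, cmd, capture_output=True, text=True, timeout=timeout
    )


async def main() -> list[str]:
    # Stand-in commands; a real caller would pass the tool's argv instead.
    cmds = [[sys.executable, "-c", f"print('tool-{i}')"] for i in range(3)]
    results = await asyncio.gather(*(run_tool(c) for c in cmds))
    return [r.stdout.strip() for r in results]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.to_thread` requires Python 3.9 or newer; `asyncio.create_subprocess_exec` would be an alternative that avoids threads entirely.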

app/tools/detect_secrets_tool.py ADDED Viewed

	@@ -0,0 +1,118 @@

+"""
+detect-secrets Tool
+====================
+detect-secrets scans code for hardcoded credentials: API keys, passwords,
+database connection strings, AWS access keys, private keys, etc.
+Why a dedicated tool for secrets?
+- Hardcoded secrets are among the most common security findings in code reviews
+- They're easy to detect with regex/entropy analysis but easy to miss manually
+- detect-secrets uses both pattern matching AND Shannon entropy analysis:
+  - Pattern matching: finds things that LOOK like API keys (e.g., "sk_live_...")
+  - Entropy analysis: finds random-looking strings that might be secrets
+    (high entropy = lots of randomness = probably a key, not a variable name)
+What Shannon entropy means:
+- "hello" has low entropy (~1.9 bits/char): predictable, probably not a secret
+- "a3f8g2kx9m" has higher entropy (~3.3 bits/char): random-looking, might be a secret
+- detect-secrets flags strings above a configurable entropy threshold
+We scan the full contents of the files changed in the PR (not just the diff hunks),
+so a pre-existing secret in a touched file is reported alongside newly added ones.
+"""
+from __future__ import annotations
+import json
+import subprocess
+import tempfile
+from pathlib import Path
+import structlog
+logger = structlog.get_logger()
+async def run_detect_secrets(file_contents: dict[str, str]) -> str:
+    """
+    Scan changed files for hardcoded secrets.
+    Args:
+        file_contents: dict of {filepath: source_code}
+    Returns:
+        A formatted string listing detected secrets, suitable for
+        including in an LLM prompt. Empty string if no secrets found.
+    """
+    if not file_contents:
+        return ""
+    try:
+        with tempfile.TemporaryDirectory(prefix="ninjacg_secrets_") as tmpdir:
+            tmpdir_path = Path(tmpdir)
+            for filepath, content in file_contents.items():
+                file_path = tmpdir_path / filepath
+                file_path.parent.mkdir(parents=True, exist_ok=True)
+                file_path.write_text(content, encoding="utf-8")
+            # Run detect-secrets scan
+            # --all-files: scan all files recursively, not only git-tracked files
+            result = subprocess.run(
+                [
+                    "detect-secrets", "scan",
+                    str(tmpdir_path),
+                    "--all-files",
+                ],
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            if result.returncode != 0 and not result.stdout:
+                logger.warning("detect-secrets error", stderr=result.stderr[:500])
+                return ""
+            if not result.stdout.strip():
+                return ""
+            scan_results = json.loads(result.stdout)
+            results_map = scan_results.get("results", {})
+            # Count total secrets found
+            total_secrets = sum(len(secrets) for secrets in results_map.values())
+            if total_secrets == 0:
+                return "detect-secrets scan: No hardcoded secrets detected."
+            # Format findings
+            summary_lines = [
+                f"detect-secrets found {total_secrets} potential secret(s):\n"
+            ]
+            for file_path, secrets in results_map.items():
+                # Map temp path back to original
+                try:
+                    relative = str(Path(file_path).relative_to(tmpdir)).replace("\\", "/")
+                except ValueError:
+                    relative = Path(file_path).name
+                for secret in secrets:
+                    secret_type = secret.get("type", "Unknown")
+                    line_no = secret.get("line_number", 0)
+                    summary_lines.append(
+                        f"- {secret_type} in {relative} at line {line_no}"
+                    )
+            summary = "\n".join(summary_lines)
+            logger.info("detect-secrets scan complete", secrets_found=total_secrets)
+            return summary
+    except FileNotFoundError:
+        logger.warning("detect-secrets not found in PATH — skipping")
+        return ""
+    except Exception as e:
+        logger.warning("detect-secrets scan failed", error=str(e))
+        return ""
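
The entropy figures for the docstring's two example strings can be checked with a few lines of standard-library Python. This is a sketch of plain per-character Shannon entropy; detect-secrets applies its own variant restricted to specific character sets (hex, base64), so treat the function below as illustrative rather than as the tool's implementation.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Per-character Shannon entropy in bits: -sum(p * log2(p)) over symbol frequencies."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(text).values()
    )


if __name__ == "__main__":
    print(f"{shannon_entropy('hello'):.2f}")       # 1.92
    print(f"{shannon_entropy('a3f8g2kx9m'):.2f}")  # 3.32
```

For a string of n distinct characters the value is exactly log2(n), which is why short strings can never score very high regardless of how random they look.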

app/tools/linter_tool.py ADDED Viewed

	@@ -0,0 +1,113 @@

+"""
+Linter Tool (Ruff)
+===================
+Ruff is an extremely fast Python linter written in Rust. It replaces
+flake8, isort, pycodestyle, and dozens of other tools in a single binary.
+It runs 10-100x faster than traditional Python linters.
+What Ruff catches:
+- Unused imports (F401)
+- Undefined names (F821)
+- Unused variables (F841)
+- Import ordering issues (I001)
+- Unnecessary f-strings (F541)
+- Bare except clauses (E722)
+- And 800+ other rules
+We run Ruff on the changed files and feed the output to the Style Agent
+as additional context. The LLM then combines Ruff's mechanical findings
+with its own understanding of readability and maintainability.
+"""
+from __future__ import annotations
+import json
+import subprocess
+import tempfile
+from pathlib import Path
+import structlog
+logger = structlog.get_logger()
+async def run_ruff(file_contents: dict[str, str]) -> str:
+    """
+    Run Ruff linter on Python files.
+    Returns a formatted string of linting issues.
+    """
+    python_files = {
+        path: content
+        for path, content in file_contents.items()
+        if path.endswith(".py")
+    }
+    if not python_files:
+        return ""
+    try:
+        with tempfile.TemporaryDirectory(prefix="ninjacg_ruff_") as tmpdir:
+            tmpdir_path = Path(tmpdir)
+            for filepath, content in python_files.items():
+                file_path = tmpdir_path / filepath
+                file_path.parent.mkdir(parents=True, exist_ok=True)
+                file_path.write_text(content, encoding="utf-8")
+            # Run ruff check with JSON output
+            # --output-format json: machine-parseable output
+            # --select F,E,W,I,N,UP,B,A,SIM,RET,ARG: a curated rule set (not ALL)
+            # --ignore E501,E402: skip line-length and import-position checks (too noisy)
+            result = subprocess.run(
+                [
+                    "ruff", "check",
+                    str(tmpdir_path),
+                    "--output-format", "json",
+                    "--select", "F,E,W,I,N,UP,B,A,SIM,RET,ARG",
+                    "--ignore", "E501,E402",
+                ],
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            # Ruff exit code 1 means issues found (not an error)
+            if not result.stdout.strip() or result.stdout.strip() == "[]":
+                return ""
+            issues = json.loads(result.stdout)
+            if not issues:
+                return ""
+            # Format findings
+            summary_lines = [f"Ruff linter found {len(issues)} issue(s):\n"]
+            for issue in issues[:20]:  # Cap at 20 to avoid prompt bloat
+                code = issue.get("code", "?")
+                message = issue.get("message", "")
+                filename = issue.get("filename", "")
+                line = issue.get("location", {}).get("row", 0)
+                try:
+                    relative = str(Path(filename).relative_to(tmpdir)).replace("\\", "/")
+                except ValueError:
+                    relative = Path(filename).name
+                summary_lines.append(f"- [{code}] {relative}:{line} — {message}")
+            if len(issues) > 20:
+                summary_lines.append(f"  ... and {len(issues) - 20} more issues")
+            summary = "\n".join(summary_lines)
+            logger.info("Ruff analysis complete", issues_count=len(issues))
+            return summary
+    except FileNotFoundError:
+        logger.warning("ruff not found in PATH — skipping lint analysis")
+        return ""
+    except Exception as e:
+        logger.warning("Ruff analysis failed", error=str(e))
+        return ""
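
The formatting step can be exercised without Ruff installed by feeding it a hand-written sample. The records below are made up for illustration; only the field names (`code`, `message`, `filename`, `location.row`) mirror what `run_ruff` reads. The sketch also guards against `location` being `null`, which the `.get("location", {})` form above would not handle.

```python
import json

# Hand-written sample in the shape run_ruff expects; not real tool output.
SAMPLE = json.dumps([
    {"code": "F401", "message": "`os` imported but unused",
     "filename": "app/main.py", "location": {"row": 3, "column": 8}},
    {"code": "E722", "message": "Do not use bare `except`",
     "filename": "app/db.py", "location": {"row": 41, "column": 5}},
    {"code": "F841", "message": "Local variable `x` is assigned to but never used",
     "filename": "app/db.py", "location": None},
])


def summarize(raw: str, cap: int = 20) -> str:
    """Format linter issues for a prompt, capping the list to limit prompt size."""
    issues = json.loads(raw)
    if not issues:
        return ""
    lines = [f"Ruff linter found {len(issues)} issue(s):"]
    for issue in issues[:cap]:
        row = (issue.get("location") or {}).get("row", 0)  # tolerate null location
        lines.append(
            f"- [{issue.get('code') or '?'}] {issue.get('filename', '')}:{row}: "
            f"{issue.get('message', '')}"
        )
    if len(issues) > cap:
        lines.append(f"  ... and {len(issues) - cap} more issues")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(SAMPLE, cap=2))
```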

app/tools/radon_tool.py ADDED Viewed

	@@ -0,0 +1,107 @@

+"""
+Radon Complexity Analysis Tool
+================================
+Radon measures cyclomatic complexity: the number of linearly independent execution
+paths through a function. Higher complexity = more branches = harder to test and
+maintain. It can also hint at performance issues, since deeply nested loops and
+branches are where O(n²) or worse algorithms tend to hide.
+Complexity grades:
+  A (1-5):   Simple, low risk
+  B (6-10):  Moderate complexity
+  C (11-20): High complexity, consider refactoring
+  D (21-30): Very high, likely performance and maintenance issues
+  E (31-40): Extremely complex
+  F (41+):   Unmaintainable
+We report functions with complexity grade C or worse (>10) to the Performance Agent.
+The agent uses this as a signal to look deeper at those functions for algorithmic issues.
+"""
+from __future__ import annotations
+import json
+import subprocess
+import tempfile
+from pathlib import Path
+import structlog
+logger = structlog.get_logger()
+async def run_radon(file_contents: dict[str, str]) -> str:
+    """
+    Run radon cyclomatic complexity analysis on Python files.
+    Returns a formatted string summarizing high-complexity functions.
+    """
+    python_files = {
+        path: content
+        for path, content in file_contents.items()
+        if path.endswith(".py")
+    }
+    if not python_files:
+        return ""
+    try:
+        with tempfile.TemporaryDirectory(prefix="ninjacg_radon_") as tmpdir:
+            tmpdir_path = Path(tmpdir)
+            for filepath, content in python_files.items():
+                file_path = tmpdir_path / filepath
+                file_path.parent.mkdir(parents=True, exist_ok=True)
+                file_path.write_text(content, encoding="utf-8")
+            # Run radon cc (cyclomatic complexity) with JSON output
+            # -j: JSON output
+            # -n C: only show grade C or worse (complexity > 10)
+            result = subprocess.run(
+                ["radon", "cc", "-j", "-n", "C", str(tmpdir_path)],
+                capture_output=True,
+                text=True,
+                timeout=30,
+            )
+            if not result.stdout.strip() or result.stdout.strip() == "{}":
+                return ""
+            radon_output = json.loads(result.stdout)
+            # Collect high-complexity functions
+            findings = []
+            for file_path, functions in radon_output.items():
+                try:
+                    relative = str(Path(file_path).relative_to(tmpdir)).replace("\\", "/")
+                except ValueError:
+                    relative = Path(file_path).name
+                for func in functions:
+                    if not isinstance(func, dict):
+                        continue
+                    name = func.get("name", "unknown")
+                    complexity = func.get("complexity", 0)
+                    rank = func.get("rank", "?")
+                    lineno = func.get("lineno", 0)
+                    findings.append(
+                        f"- {relative}:{lineno} — `{name}()` complexity={complexity} (grade {rank})"
+                    )
+            if not findings:
+                return ""
+            summary = (
+                f"Radon complexity analysis found {len(findings)} high-complexity function(s):\n"
+                + "\n".join(findings)
+            )
+            logger.info("Radon analysis complete", high_complexity_count=len(findings))
+            return summary
+    except FileNotFoundError:
+        logger.warning("radon not found in PATH — skipping complexity analysis")
+        return ""
+    except Exception as e:
+        logger.warning("Radon analysis failed", error=str(e))
+        return ""
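
The parsing half of `run_radon` can be exercised without installing radon or touching the filesystem. The following is a minimal sketch, not the repository's code: `summarize_radon`, `SAMPLE`, and the `/tmp/ninjacg_radon_x` paths are hypothetical names chosen for illustration, and the sample payload assumes the shape `radon cc -j` is understood to emit (a mapping from file path to a list of block records, with a dict carrying an `error` key for files it could not parse).

```python
from __future__ import annotations

import json
from pathlib import PurePosixPath


def summarize_radon(raw_json: str, root: str) -> list[str]:
    """Turn `radon cc -j` style output into one summary line per flagged block.

    Hypothetical helper for illustration; it mirrors the loop in run_radon.
    """
    report = json.loads(raw_json)
    lines: list[str] = []
    for file_path, blocks in report.items():
        # Assumed: unparsable files map to a dict rather than a list, so skip them.
        if not isinstance(blocks, list):
            continue
        try:
            relative = str(PurePosixPath(file_path).relative_to(root))
        except ValueError:
            # Path is not under the root; fall back to the bare file name.
            relative = PurePosixPath(file_path).name
        for block in blocks:
            lines.append(
                f"- {relative}:{block.get('lineno', 0)} "
                f"`{block.get('name', 'unknown')}()` "
                f"complexity={block.get('complexity', 0)} "
                f"(grade {block.get('rank', '?')})"
            )
    return lines


# Hand-written sample payload (an assumption, not captured radon output).
SAMPLE = json.dumps(
    {
        "/tmp/ninjacg_radon_x/app/main.py": [
            {"type": "function", "name": "handle", "lineno": 12,
             "complexity": 14, "rank": "C"}
        ],
        "/tmp/ninjacg_radon_x/app/broken.py": {"error": "invalid syntax"},
    }
)

print("\n".join(summarize_radon(SAMPLE, "/tmp/ninjacg_radon_x")))
```

Keeping the formatting in a pure function like this would let the summary text be unit-tested separately from the subprocess call, which is the part most likely to fail in CI when radon is missing from `PATH`.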

dashboard/.gitignore ADDED

	@@ -0,0 +1,41 @@

+# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+# dependencies
+/node_modules
+/.pnp
+.pnp.*
+.yarn/*
+!.yarn/patches
+!.yarn/plugins
+!.yarn/releases
+!.yarn/versions
+# testing
+/coverage
+# next.js
+/.next/
+/out/
+# production
+/build
+# misc
+.DS_Store
+*.pem
+# debug
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+.pnpm-debug.log*
+# env files (can opt-in for committing if needed)
+.env*
+# vercel
+.vercel
+# typescript
+*.tsbuildinfo
+next-env.d.ts

dashboard/AGENTS.md ADDED

	@@ -0,0 +1,5 @@

+<!-- BEGIN:nextjs-agent-rules -->
+# This is NOT the Next.js you know
+This version has breaking changes — APIs, conventions, and file structure may all differ from your training data. Read the relevant guide in `node_modules/next/dist/docs/` before writing any code. Heed deprecation notices.
+<!-- END:nextjs-agent-rules -->

dashboard/CLAUDE.md ADDED

	@@ -0,0 +1 @@


+@AGENTS.md

dashboard/README.md ADDED

	@@ -0,0 +1,36 @@

+This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
+## Getting Started
+First, run the development server:
+```bash
+npm run dev
+# or
+yarn dev
+# or
+pnpm dev
+# or
+bun dev
+```
+Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
+You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.
+This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.
+## Learn More
+To learn more about Next.js, take a look at the following resources:
+- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
+- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
+You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
+## Deploy on Vercel
+The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
+Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.

dashboard/app/favicon.ico ADDED

dashboard/app/globals.css ADDED

	@@ -0,0 +1,152 @@

+@import "tailwindcss";
+:root {
+  --background: #050507;
+  --foreground: #f4f4f5;
+  --glass-bg: rgba(255, 255, 255, 0.03);
+  --glass-border: rgba(255, 255, 255, 0.06);
+  --glass-hover: rgba(255, 255, 255, 0.06);
+}
+@theme inline {
+  --color-background: var(--background);
+  --color-foreground: var(--foreground);
+  --font-sans: var(--font-geist-sans);
+  --font-mono: var(--font-geist-mono);
+}
+body {
+  background: var(--background);
+  color: var(--foreground);
+  font-family: var(--font-sans, system-ui, -apple-system, sans-serif);
+}
+/* ─── Dot grid background ─── */
+.dot-grid {
+  background-image: radial-gradient(circle, rgba(255, 255, 255, 0.04) 1px, transparent 1px);
+  background-size: 32px 32px;
+}
+/* ─── Animated gradient orbs ─── */
+.gradient-orb {
+  position: absolute;
+  border-radius: 50%;
+  filter: blur(120px);
+  opacity: 0.15;
+  pointer-events: none;
+  animation: orbFloat 20s ease-in-out infinite;
+}
+.gradient-orb-1 {
+  width: 600px;
+  height: 600px;
+  background: linear-gradient(135deg, #7c3aed, #6d28d9);
+  top: -200px;
+  right: -100px;
+  animation-delay: 0s;
+}
+.gradient-orb-2 {
+  width: 500px;
+  height: 500px;
+  background: linear-gradient(135deg, #06b6d4, #0891b2);
+  bottom: -150px;
+  left: -100px;
+  animation-delay: -7s;
+}
+.gradient-orb-3 {
+  width: 400px;
+  height: 400px;
+  background: linear-gradient(135deg, #ec4899, #be185d);
+  top: 40%;
+  left: 50%;
+  animation-delay: -14s;
+}
+@keyframes orbFloat {
+  0%, 100% { transform: translate(0, 0) scale(1); }
+  25% { transform: translate(30px, -40px) scale(1.05); }
+  50% { transform: translate(-20px, 20px) scale(0.95); }
+  75% { transform: translate(40px, 30px) scale(1.03); }
+}
+/* ─── Glass card ─── */
+.glass {
+  background: var(--glass-bg);
+  border: 1px solid var(--glass-border);
+  backdrop-filter: blur(20px);
+  -webkit-backdrop-filter: blur(20px);
+}
+.glass-hover:hover {
+  background: var(--glass-hover);
+  border-color: rgba(255, 255, 255, 0.1);
+}
+/* ─── Glow effects ─── */
+.glow-violet { box-shadow: 0 0 40px -10px rgba(139, 92, 246, 0.3); }
+.glow-green  { box-shadow: 0 0 40px -10px rgba(34, 197, 94, 0.3); }
+.glow-red    { box-shadow: 0 0 40px -10px rgba(239, 68, 68, 0.3); }
+.glow-amber  { box-shadow: 0 0 40px -10px rgba(245, 158, 11, 0.3); }
+/* ─── Gradient text ─── */
+.text-gradient {
+  background: linear-gradient(135deg, #c4b5fd 0%, #818cf8 50%, #6d28d9 100%);
+  -webkit-background-clip: text;
+  -webkit-text-fill-color: transparent;
+  background-clip: text;
+}
+.text-gradient-cyan {
+  background: linear-gradient(135deg, #67e8f9 0%, #22d3ee 50%, #06b6d4 100%);
+  -webkit-background-clip: text;
+  -webkit-text-fill-color: transparent;
+  background-clip: text;
+}
+/* ─── Shimmer border animation ─── */
+@keyframes shimmer {
+  0% { background-position: 200% 0; }
+  100% { background-position: -200% 0; }
+}
+.shimmer-border {
+  background: linear-gradient(
+    90deg,
+    transparent 0%,
+    rgba(139, 92, 246, 0.15) 25%,
+    rgba(6, 182, 212, 0.15) 50%,
+    rgba(139, 92, 246, 0.15) 75%,
+    transparent 100%
+  );
+  background-size: 200% 100%;
+  animation: shimmer 6s ease-in-out infinite;
+}
+/* ─── Scrollbar ─── */
+::-webkit-scrollbar {
+  width: 6px;
+  height: 6px;
+}
+::-webkit-scrollbar-track {
+  background: transparent;
+}
+::-webkit-scrollbar-thumb {
+  background: rgba(113, 113, 122, 0.3);
+  border-radius: 3px;
+}
+::-webkit-scrollbar-thumb:hover {
+  background: rgba(113, 113, 122, 0.5);
+}
+/* ─── Noise texture overlay ─── */
+.noise::before {
+  content: "";
+  position: fixed;
+  inset: 0;
+  z-index: 100;
+  pointer-events: none;
+  opacity: 0.015;
+  background-image: url("data:image/svg+xml,%3Csvg viewBox='0 0 256 256' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.9' numOctaves='4' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23n)'/%3E%3C/svg%3E");
+}

dashboard/app/layout.tsx ADDED

	@@ -0,0 +1,104 @@

+import type { Metadata } from "next";
+import { Geist, Geist_Mono } from "next/font/google";
+import Link from "next/link";
+import "./globals.css";
+const geistSans = Geist({
+  variable: "--font-geist-sans",
+  subsets: ["latin"],
+});
+const geistMono = Geist_Mono({
+  variable: "--font-geist-mono",
+  subsets: ["latin"],
+});
+export const metadata: Metadata = {
+  title: "Ninja Code Guard",
+  description:
+    "Multi-agent AI code review dashboard — security, performance & style analysis at a glance.",
+};
+export default function RootLayout({
+  children,
+}: Readonly<{
+  children: React.ReactNode;
+}>) {
+  return (
+    <html
+      lang="en"
+      className={`${geistSans.variable} ${geistMono.variable} h-full antialiased dark`}
+    >
+      <body className="noise min-h-full flex flex-col bg-[#050507] text-zinc-100">
+        {/* ── Gradient orbs (ambient background) ── */}
+        <div className="fixed inset-0 overflow-hidden pointer-events-none z-0">
+          <div className="gradient-orb gradient-orb-1" />
+          <div className="gradient-orb gradient-orb-2" />
+          <div className="gradient-orb gradient-orb-3" />
+        </div>
+        {/* ── Navigation ── */}
+        <header className="sticky top-0 z-50 border-b border-white/[0.06] bg-[#050507]/70 backdrop-blur-2xl">
+          <div className="mx-auto flex h-16 max-w-7xl items-center justify-between px-6 lg:px-8">
+            <Link href="/" className="flex items-center gap-3 group">
+              <span className="relative flex items-center justify-center w-9 h-9 rounded-xl bg-gradient-to-br from-violet-600 to-violet-800 shadow-lg shadow-violet-900/30 group-hover:shadow-violet-700/40 transition-shadow">
+                <svg
+                  xmlns="http://www.w3.org/2000/svg"
+                  viewBox="0 0 24 24"
+                  fill="currentColor"
+                  className="w-5 h-5 text-white"
+                >
+                  <path
+                    fillRule="evenodd"
+                    d="M12.516 2.17a.75.75 0 00-1.032 0 11.209 11.209 0 01-7.877 3.08.75.75 0 00-.722.515A12.74 12.74 0 002.25 9.75c0 5.942 4.064 10.933 9.563 12.348a.749.749 0 00.374 0c5.499-1.415 9.563-6.406 9.563-12.348 0-1.39-.223-2.73-.635-3.985a.75.75 0 00-.722-.516 11.209 11.209 0 01-7.877-3.08z"
+                    clipRule="evenodd"
+                  />
+                </svg>
+              </span>
+              <div className="flex flex-col">
+                <span className="text-[15px] font-semibold tracking-tight text-white leading-tight">
+                  Ninja Code Guard
+                </span>
+                <span className="text-[10px] font-medium text-zinc-500 tracking-widest uppercase">
+                  AI Review Platform
+                </span>
+              </div>
+            </Link>
+            <nav className="flex items-center gap-1">
+              <Link
+                href="/"
+                className="px-4 py-2 text-sm text-zinc-400 hover:text-white hover:bg-white/[0.04] rounded-lg transition-all duration-200"
+              >
+                Dashboard
+              </Link>
+              <a
+                href="https://github.com"
+                target="_blank"
+                rel="noopener noreferrer"
+                className="px-4 py-2 text-sm text-zinc-400 hover:text-white hover:bg-white/[0.04] rounded-lg transition-all duration-200"
+              >
+                GitHub
+              </a>
+            </nav>
+          </div>
+        </header>
+        {/* ── Content ── */}
+        <main className="relative z-10 flex-1">{children}</main>
+        {/* ── Footer ── */}
+        <footer className="relative z-10 border-t border-white/[0.04] py-8">
+          <div className="mx-auto max-w-7xl px-6 lg:px-8 flex items-center justify-between">
+            <p className="text-xs text-zinc-600">
+              &copy; {new Date().getFullYear()} Ninja Code Guard
+            </p>
+            <p className="text-xs text-zinc-700">
+              Multi-Agent AI Code Review Platform
+            </p>
+          </div>
+        </footer>
+      </body>
+    </html>
+  );
+}

dashboard/app/page.tsx ADDED

	@@ -0,0 +1,291 @@

+"use client";
+import Link from "next/link";
+import { motion } from "framer-motion";
+import { MOCK_REPOS } from "@/lib/api";
+import {
+  StaggerContainer,
+  StaggerItem,
+  FadeIn,
+  HoverCard,
+} from "@/components/motion";
+import { AnimatedCounter } from "@/components/AnimatedCounter";
+function scoreColor(score: number): string {
+  if (score >= 80) return "text-emerald-400";
+  if (score >= 60) return "text-amber-400";
+  return "text-red-400";
+}
+function scoreGlow(score: number): string {
+  if (score >= 80) return "group-hover:shadow-emerald-500/10";
+  if (score >= 60) return "group-hover:shadow-amber-500/10";
+  return "group-hover:shadow-red-500/10";
+}
+function scoreDot(score: number): string {
+  if (score >= 80) return "bg-emerald-400";
+  if (score >= 60) return "bg-amber-400";
+  return "bg-red-400";
+}
+const STATS = [
+  { label: "Repos Monitored", value: MOCK_REPOS.length, suffix: "" },
+  {
+    label: "Avg Health Score",
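+    // Guard against an empty repo list, which would otherwise yield NaN.
+    value: MOCK_REPOS.length
+      ? Math.round(
+          MOCK_REPOS.reduce((s, r) => s + r.health_score, 0) / MOCK_REPOS.length
+        )
+      : 0,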
+    suffix: "%",
+  },
+  // Hardcoded placeholder values (not derived from MOCK_REPOS).
+  { label: "PRs Reviewed", value: 47, suffix: "" },
+  { label: "Issues Found", value: 132, suffix: "" },
+];
+const AGENTS = [
+  {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-6 h-6">
+        <path fillRule="evenodd" d="M12.516 2.17a.75.75 0 00-1.032 0 11.209 11.209 0 01-7.877 3.08.75.75 0 00-.722.515A12.74 12.74 0 002.25 9.75c0 5.942 4.064 10.933 9.563 12.348a.749.749 0 00.374 0c5.499-1.415 9.563-6.406 9.563-12.348 0-1.39-.223-2.73-.635-3.985a.75.75 0 00-.722-.516 11.209 11.209 0 01-7.877-3.08z" clipRule="evenodd" />
+      </svg>
+    ),
+    title: "Security Agent",
+    desc: "Scans for vulnerabilities, injection flaws, auth issues, and CWE-classified risks using Bandit and detect-secrets.",
+    color: "text-red-400",
+    bg: "from-red-500/10 via-red-500/5 to-transparent",
+    iconBg: "bg-red-500/10 text-red-400",
+    border: "border-red-500/10 hover:border-red-500/20",
+  },
+  {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-6 h-6">
+        <path fillRule="evenodd" d="M14.615 1.595a.75.75 0 01.359.852L12.982 9.75h7.268a.75.75 0 01.548 1.262l-10.5 11.25a.75.75 0 01-1.272-.71l1.992-7.302H3.75a.75.75 0 01-.548-1.262l10.5-11.25a.75.75 0 01.913-.143z" clipRule="evenodd" />
+      </svg>
+    ),
+    title: "Performance Agent",
+    desc: "Detects N+1 queries, memory leaks, blocking operations, and algorithmic inefficiencies with Radon analysis.",
+    color: "text-amber-400",
+    bg: "from-amber-500/10 via-amber-500/5 to-transparent",
+    iconBg: "bg-amber-500/10 text-amber-400",
+    border: "border-amber-500/10 hover:border-amber-500/20",
+  },
+  {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-6 h-6">
+        <path d="M11.7 2.805a.75.75 0 01.6 0A60.65 60.65 0 0122.83 8.72a.75.75 0 01-.231 1.337 49.949 49.949 0 00-9.902 3.912l-.003.002-.34.18a.75.75 0 01-.707 0A50.009 50.009 0 007.5 12.174v-.224c0-.131.067-.248.172-.311a54.614 54.614 0 014.653-2.52.75.75 0 00-.65-1.352 56.129 56.129 0 00-4.78 2.589 1.858 1.858 0 00-.859 1.228 49.803 49.803 0 00-4.634-1.527.75.75 0 01-.231-1.337A60.653 60.653 0 0111.7 2.805z" />
+        <path d="M13.06 15.473a48.45 48.45 0 017.666-3.282c.134 1.414.22 2.843.255 4.285a.75.75 0 01-.46.71 47.878 47.878 0 00-8.105 4.342.75.75 0 01-.832 0 47.877 47.877 0 00-8.104-4.342.75.75 0 01-.461-.71c.035-1.442.121-2.87.255-4.286A48.4 48.4 0 016 13.18v1.27a1.5 1.5 0 00-.14 2.508c-.09.38-.222.753-.397 1.11.452.213.901.434 1.346.661a6.729 6.729 0 00.551-1.608 1.5 1.5 0 00.14-2.67v-.645a48.549 48.549 0 013.44 1.668 2.25 2.25 0 002.12 0z" />
+        <path d="M4.462 19.462c.42-.419.753-.89 1-1.394.453.213.902.434 1.347.661a6.743 6.743 0 01-1.286 1.794.75.75 0 11-1.06-1.06z" />
+      </svg>
+    ),
+    title: "Style Agent",
+    desc: "Enforces naming conventions, reduces complexity, and ensures code consistency via Ruff linting.",
+    color: "text-cyan-400",
+    bg: "from-cyan-500/10 via-cyan-500/5 to-transparent",
+    iconBg: "bg-cyan-500/10 text-cyan-400",
+    border: "border-cyan-500/10 hover:border-cyan-500/20",
+  },
+];
+export default function HomePage() {
+  return (
+    <div className="dot-grid">
+      <div className="mx-auto max-w-7xl px-6 lg:px-8 py-16">
+        {/* ── Hero ── */}
+        <section className="text-center mb-20 pt-8">
+          <FadeIn delay={0}>
+            <div className="inline-flex items-center gap-2 rounded-full border border-violet-500/20 bg-violet-500/[0.06] px-4 py-1.5 text-sm text-violet-300 mb-8">
+              <span className="relative flex h-2 w-2">
+                <span className="animate-ping absolute inline-flex h-full w-full rounded-full bg-violet-400 opacity-75" />
+                <span className="relative inline-flex rounded-full h-2 w-2 bg-violet-500" />
+              </span>
+              Multi-Agent AI Review Platform
+            </div>
+          </FadeIn>
+          <FadeIn delay={0.1}>
+            <h1 className="text-5xl sm:text-7xl font-bold tracking-tight mb-6">
+              <span className="text-white">Code reviews,</span>
+              <br />
+              <span className="text-gradient">reimagined.</span>
+            </h1>
+          </FadeIn>
+          <FadeIn delay={0.2}>
+            <p className="text-lg sm:text-xl text-zinc-400 max-w-2xl mx-auto leading-relaxed">
+              Three specialised AI agents analyse every pull request for{" "}
+              <span className="text-red-400 font-medium">security</span>,{" "}
+              <span className="text-amber-400 font-medium">performance</span>,
+              and{" "}
+              <span className="text-cyan-400 font-medium">style</span>{" "}
+              — then synthesise a single, actionable review.
+            </p>
+          </FadeIn>
+        </section>
+        {/* ── Stats ── */}
+        <FadeIn delay={0.3}>
+          <section className="grid grid-cols-2 sm:grid-cols-4 gap-4 mb-20">
+            {STATS.map((s, i) => (
+              <div
+                key={s.label}
+                className="glass rounded-2xl p-5 text-center"
+              >
+                <p className="text-3xl sm:text-4xl font-bold text-white tabular-nums">
+                  <AnimatedCounter
+                    value={s.value}
+                    suffix={s.suffix}
+                    duration={1200 + i * 200}
+                  />
+                </p>
+                <p className="text-xs text-zinc-500 mt-2 font-medium tracking-wide uppercase">
+                  {s.label}
+                </p>
+              </div>
+            ))}
+          </section>
+        </FadeIn>
+        {/* ── Repositories ── */}
+        <section className="mb-24">
+          <FadeIn delay={0.15}>
+            <div className="flex items-center justify-between mb-6">
+              <h2 className="text-xl font-semibold text-white">
+                Repositories
+              </h2>
+              <span className="text-xs text-zinc-600 font-mono">
+                {MOCK_REPOS.length} monitored
+              </span>
+            </div>
+          </FadeIn>
+          <StaggerContainer className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-4">
+            {MOCK_REPOS.map((repo) => (
+              <StaggerItem key={repo.full_name}>
+                <HoverCard>
+                  <Link
+                    href={`/repos/${repo.owner}/${repo.repo}`}
+                    className={`group block glass glass-hover rounded-2xl p-6 transition-all duration-300 hover:shadow-xl ${scoreGlow(
+                      repo.health_score
+                    )}`}
+                  >
+                    <div className="flex items-start justify-between mb-5">
+                      <div>
+                        <p className="text-xs text-zinc-600 font-mono mb-1">
+                          {repo.owner}/
+                        </p>
+                        <p className="text-base font-semibold text-zinc-200 group-hover:text-white transition-colors">
+                          {repo.repo}
+                        </p>
+                      </div>
+                      <div className="text-right">
+                        <span
+                          className={`text-3xl font-bold tabular-nums ${scoreColor(
+                            repo.health_score
+                          )}`}
+                        >
+                          {repo.health_score}
+                        </span>
+                      </div>
+                    </div>
+                    {/* Mini bar */}
+                    <div className="w-full h-1.5 rounded-full bg-white/[0.04] mb-4 overflow-hidden">
+                      <motion.div
+                        initial={{ width: 0 }}
+                        animate={{ width: `${repo.health_score}%` }}
+                        transition={{
+                          duration: 1,
+                          delay: 0.5,
+                          ease: [0.25, 0.46, 0.45, 0.94],
+                        }}
+                        className={`h-full rounded-full ${
+                          repo.health_score >= 80
+                            ? "bg-emerald-500"
+                            : repo.health_score >= 60
+                            ? "bg-amber-500"
+                            : "bg-red-500"
+                        }`}
+                      />
+                    </div>
+                    <div className="flex items-center justify-between text-xs text-zinc-500">
+                      <span className="flex items-center gap-1.5">
+                        <span className={`w-1.5 h-1.5 rounded-full ${scoreDot(repo.health_score)}`} />
+                        {repo.open_prs} open PRs
+                      </span>
+                      <span>{repo.last_review}</span>
+                    </div>
+                  </Link>
+                </HoverCard>
+              </StaggerItem>
+            ))}
+          </StaggerContainer>
+        </section>
+        {/* ── How It Works ── */}
+        <section className="mb-12">
+          <FadeIn>
+            <div className="text-center mb-12">
+              <h2 className="text-2xl font-bold text-white mb-3">
+                How It Works
+              </h2>
+              <p className="text-sm text-zinc-500 max-w-lg mx-auto">
+                Each PR triggers three specialised agents that run in parallel,
+                then a synthesizer merges their findings into one review.
+              </p>
+            </div>
+          </FadeIn>
+          {/* Pipeline visualization */}
+          <FadeIn delay={0.1}>
+            <div className="flex items-center justify-center mb-12">
+              <div className="flex items-center gap-2 text-xs font-mono text-zinc-500">
+                <span className="px-3 py-1.5 rounded-lg glass border border-white/[0.06]">
+                  PR Opened
+                </span>
+                <svg className="w-4 h-4 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+                <span className="px-3 py-1.5 rounded-lg glass border border-violet-500/20 text-violet-400">
+                  3 Agents
+                </span>
+                <svg className="w-4 h-4 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+                <span className="px-3 py-1.5 rounded-lg glass border border-cyan-500/20 text-cyan-400">
+                  Synthesize
+                </span>
+                <svg className="w-4 h-4 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+                <span className="px-3 py-1.5 rounded-lg glass border border-emerald-500/20 text-emerald-400">
+                  Review Posted
+                </span>
+              </div>
+            </div>
+          </FadeIn>
+          <StaggerContainer className="grid grid-cols-1 sm:grid-cols-3 gap-5">
+            {AGENTS.map((agent) => (
+              <StaggerItem key={agent.title}>
+                <HoverCard>
+                  <div
+                    className={`glass rounded-2xl p-6 border ${agent.border} transition-all duration-300 h-full`}
+                  >
+                    <div
+                      className={`w-11 h-11 rounded-xl ${agent.iconBg} flex items-center justify-center mb-4`}
+                    >
+                      {agent.icon}
+                    </div>
+                    <h3
+                      className={`text-base font-semibold mb-2 ${agent.color}`}
+                    >
+                      {agent.title}
+                    </h3>
+                    <p className="text-sm text-zinc-500 leading-relaxed">
+                      {agent.desc}
+                    </p>
+                  </div>
+                </HoverCard>
+              </StaggerItem>
+            ))}
+          </StaggerContainer>
+        </section>
+      </div>
+    </div>
+  );
+}

dashboard/app/repos/[owner]/[repo]/page.tsx ADDED

	@@ -0,0 +1,170 @@

+import Link from "next/link";
+import { getRepoReviews, getRepoStats } from "@/lib/api";
+import HealthScoreRing from "@/components/HealthScoreRing";
+import TrendChart from "@/components/TrendChart";
+import AgentBreakdown from "@/components/AgentBreakdown";
+import SeverityBadge from "@/components/SeverityBadge";
+import type { Severity } from "@/lib/types";
+export default async function RepoPage({
+  params,
+}: {
+  params: Promise<{ owner: string; repo: string }>;
+}) {
+  const { owner, repo } = await params;
+  const [reviews, stats] = await Promise.all([
+    getRepoReviews(owner, repo),
+    getRepoStats(owner, repo),
+  ]);
+  const latestScore = reviews[0]?.health_score ?? 0;
+  const previousScore = reviews[1]?.health_score;
+  const allFindings = reviews.flatMap((r) => r.findings);
+  return (
+    <div className="dot-grid">
+      <div className="mx-auto max-w-7xl px-6 lg:px-8 py-10">
+        {/* ── Breadcrumb ── */}
+        <nav className="flex items-center gap-2 text-sm text-zinc-600 mb-8">
+          <Link href="/" className="hover:text-zinc-400 transition-colors">
+            Dashboard
+          </Link>
+          <svg className="w-3.5 h-3.5 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+          <span className="text-zinc-400 font-medium">
+            {owner}/{repo}
+          </span>
+        </nav>
+        {/* ── Header ── */}
+        <div className="flex flex-col sm:flex-row sm:items-end sm:justify-between gap-6 mb-12">
+          <div>
+            <p className="text-xs text-zinc-600 font-mono mb-1">{owner}/</p>
+            <h1 className="text-3xl font-bold text-white">{repo}</h1>
+          </div>
+          <div className="flex items-center gap-8 text-sm">
+            {[
+              { label: "Reviews", value: stats.total_reviews },
+              { label: "Findings", value: stats.total_findings },
+              { label: "Avg Score", value: `${stats.average_health_score}%` },
+            ].map((s) => (
+              <div key={s.label} className="text-center">
+                <p className="text-2xl font-bold text-white tabular-nums">
+                  {s.value}
+                </p>
+                <p className="text-[10px] text-zinc-600 uppercase tracking-wider mt-0.5">
+                  {s.label}
+                </p>
+              </div>
+            ))}
+          </div>
+        </div>
+        {/* ── Score + Trend ── */}
+        <div className="grid grid-cols-1 lg:grid-cols-[200px_1fr] gap-8 mb-12">
+          <div className="flex items-center justify-center">
+            <HealthScoreRing
+              score={latestScore}
+              previousScore={previousScore}
+              label="Latest Score"
+            />
+          </div>
+          <TrendChart scores={stats.recent_scores} />
+        </div>
+        {/* ── Agent Breakdown ── */}
+        <section className="mb-12">
+          <h2 className="text-sm font-semibold text-zinc-400 mb-4 uppercase tracking-wider">
+            Agent Breakdown
+          </h2>
+          <AgentBreakdown findings={allFindings} />
+        </section>
+        {/* ── PR Reviews Table ── */}
+        <section>
+          <h2 className="text-sm font-semibold text-zinc-400 mb-4 uppercase tracking-wider">
+            Recent PR Reviews
+          </h2>
+          <div className="overflow-x-auto glass rounded-2xl">
+            <table className="w-full text-sm text-left">
+              <thead>
+                <tr className="border-b border-white/[0.04] text-zinc-500 text-[11px] uppercase tracking-wider">
+                  <th className="px-5 py-3.5 font-medium">PR</th>
+                  <th className="px-5 py-3.5 font-medium">Score</th>
+                  <th className="px-5 py-3.5 font-medium">Critical</th>
+                  <th className="px-5 py-3.5 font-medium">High</th>
+                  <th className="px-5 py-3.5 font-medium">Medium</th>
+                  <th className="px-5 py-3.5 font-medium">Low</th>
+                  <th className="px-5 py-3.5 font-medium">Summary</th>
+                  <th className="px-5 py-3.5 font-medium">Duration</th>
+                </tr>
+              </thead>
+              <tbody>
+                {reviews.map((r) => {
+                  const scoreClass =
+                    r.health_score >= 80
+                      ? "text-emerald-400"
+                      : r.health_score >= 60
+                      ? "text-amber-400"
+                      : "text-red-400";
+                  return (
+                    <tr
+                      key={r.id}
+                      className="border-b border-white/[0.03] hover:bg-white/[0.02] transition-colors"
+                    >
+                      <td className="px-5 py-3.5">
+                        <Link
+                          href={`/repos/${owner}/${repo}/prs/${r.pr_number}`}
+                          className="text-violet-400 hover:text-violet-300 font-medium transition-colors"
+                        >
+                          #{r.pr_number}
+                        </Link>
+                      </td>
+                      <td className={`px-5 py-3.5 font-bold tabular-nums ${scoreClass}`}>
+                        {r.health_score}
+                      </td>
+                      <td className="px-5 py-3.5">
+                        {r.critical_count > 0 ? (
+                          <SeverityBadge severity={"critical" as Severity} />
+                        ) : (
+                          <span className="text-zinc-700">0</span>
+                        )}
+                      </td>
+                      <td className="px-5 py-3.5">
+                        {r.high_count > 0 ? (
+                          <span className="text-orange-400 font-medium tabular-nums">
+                            {r.high_count}
+                          </span>
+                        ) : (
+                          <span className="text-zinc-700">0</span>
+                        )}
+                      </td>
+                      <td className="px-5 py-3.5">
+                        {r.medium_count > 0 ? (
+                          <span className="text-amber-400 tabular-nums">
+                            {r.medium_count}
+                          </span>
+                        ) : (
+                          <span className="text-zinc-700">0</span>
+                        )}
+                      </td>
+                      <td className="px-5 py-3.5 text-zinc-600 tabular-nums">
+                        {r.low_count}
+                      </td>
+                      <td className="px-5 py-3.5 text-zinc-500 truncate max-w-[240px] text-xs">
+                        {r.summary}
+                      </td>
+                      <td className="px-5 py-3.5 text-zinc-600 tabular-nums text-xs font-mono">
+                        {(r.duration_ms / 1000).toFixed(1)}s
+                      </td>
+                    </tr>
+                  );
+                })}
+              </tbody>
+            </table>
+          </div>
+        </section>
+      </div>
+    </div>
+  );
+}

dashboard/app/repos/[owner]/[repo]/prs/[number]/page.tsx ADDED

	@@ -0,0 +1,168 @@

+import Link from "next/link";
+import { getReviewDetail } from "@/lib/api";
+import HealthScoreRing from "@/components/HealthScoreRing";
+import FindingsTable from "@/components/FindingsTable";
+import AgentBreakdown from "@/components/AgentBreakdown";
+import type { Recommendation } from "@/lib/types";
+const RECOMMENDATION_STYLE: Record<
+  Recommendation,
+  { bg: string; text: string; label: string; dot: string }
+> = {
+  approve: {
+    bg: "bg-emerald-500/10",
+    text: "text-emerald-400",
+    label: "Approve",
+    dot: "bg-emerald-400",
+  },
+  request_changes: {
+    bg: "bg-amber-500/10",
+    text: "text-amber-400",
+    label: "Request Changes",
+    dot: "bg-amber-400",
+  },
+  block: {
+    bg: "bg-red-500/10",
+    text: "text-red-400",
+    label: "Block",
+    dot: "bg-red-400",
+  },
+};
+export default async function PRReviewPage({
+  params,
+}: {
+  params: Promise<{ owner: string; repo: string; number: string }>;
+}) {
+  const { owner, repo, number: prNum } = await params;
+  const prNumber = parseInt(prNum, 10);
+  const { review, record } = await getReviewDetail(owner, repo, prNumber);
+  const rec = RECOMMENDATION_STYLE[review.recommendation];
+  return (
+    <div className="dot-grid">
+      <div className="mx-auto max-w-7xl px-6 lg:px-8 py-10">
+        {/* ── Breadcrumb ── */}
+        <nav className="flex items-center gap-2 text-sm text-zinc-600 mb-8">
+          <Link href="/" className="hover:text-zinc-400 transition-colors">
+            Dashboard
+          </Link>
+          <svg className="w-3.5 h-3.5 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+          <Link
+            href={`/repos/${owner}/${repo}`}
+            className="hover:text-zinc-400 transition-colors"
+          >
+            {owner}/{repo}
+          </Link>
+          <svg className="w-3.5 h-3.5 text-zinc-700" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" /></svg>
+          <span className="text-zinc-400 font-medium">PR #{prNumber}</span>
+        </nav>
+        {/* ── Header ── */}
+        <div className="flex flex-col sm:flex-row sm:items-start sm:justify-between gap-6 mb-12">
+          <div>
+            <p className="text-xs text-zinc-600 font-mono mb-1">
+              {owner}/{repo}
+            </p>
+            <h1 className="text-3xl font-bold text-white mb-4">
+              Pull Request #{prNumber}
+            </h1>
+            <div className="flex items-center gap-3">
+              <span
+                className={`inline-flex items-center gap-1.5 rounded-full px-3 py-1 text-xs font-semibold ${rec.bg} ${rec.text}`}
+              >
+                <span className={`w-1.5 h-1.5 rounded-full ${rec.dot}`} />
+                {rec.label}
+              </span>
+              <span className="text-[11px] text-zinc-600 font-mono">
+                {record.commit_sha}
+              </span>
+              <span className="text-[11px] text-zinc-700 font-mono">
+                {(record.duration_ms / 1000).toFixed(1)}s
+              </span>
+            </div>
+          </div>
+          <HealthScoreRing
+            score={review.health_score}
+            size={140}
+            label="Health Score"
+          />
+        </div>
+        {/* ── Executive Summary ── */}
+        <section className="glass rounded-2xl p-6 mb-8">
+          <h2 className="text-[10px] text-zinc-600 uppercase tracking-widest font-medium mb-3">
+            Executive Summary
+          </h2>
+          <p className="text-zinc-300 leading-relaxed text-[15px]">
+            {review.executive_summary}
+          </p>
+        </section>
+        {/* ── Severity Counts ── */}
+        <div className="grid grid-cols-2 sm:grid-cols-4 gap-4 mb-8">
+          {[
+            {
+              label: "Critical",
+              count: review.critical_count,
+              color: "text-red-400",
+              border: "border-red-500/[0.08]",
+              dot: "bg-red-400",
+            },
+            {
+              label: "High",
+              count: review.high_count,
+              color: "text-orange-400",
+              border: "border-orange-500/[0.08]",
+              dot: "bg-orange-400",
+            },
+            {
+              label: "Medium",
+              count: review.medium_count,
+              color: "text-amber-400",
+              border: "border-amber-500/[0.08]",
+              dot: "bg-amber-400",
+            },
+            {
+              label: "Low",
+              count: review.low_count,
+              color: "text-zinc-400",
+              border: "border-zinc-700/30",
+              dot: "bg-zinc-500",
+            },
+          ].map((s) => (
+            <div
+              key={s.label}
+              className={`glass rounded-2xl border ${s.border} p-5 text-center`}
+            >
+              <p className={`text-3xl font-bold tabular-nums ${s.color}`}>
+                {s.count}
+              </p>
+              <p className="text-[10px] text-zinc-600 mt-1 uppercase tracking-wider flex items-center justify-center gap-1.5">
+                <span className={`w-1.5 h-1.5 rounded-full ${s.dot}`} />
+                {s.label}
+              </p>
+            </div>
+          ))}
+        </div>
+        {/* ── Agent Breakdown ── */}
+        <section className="mb-8">
+          <h2 className="text-sm font-semibold text-zinc-400 mb-4 uppercase tracking-wider">
+            Agent Breakdown
+          </h2>
+          <AgentBreakdown findings={review.findings} />
+        </section>
+        {/* ── Findings ── */}
+        <section>
+          <h2 className="text-sm font-semibold text-zinc-400 mb-4 uppercase tracking-wider">
+            All Findings ({review.findings.length})
+          </h2>
+          <FindingsTable findings={review.findings} />
+        </section>
+      </div>
+    </div>
+  );
+}
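
The page above passes `parseInt(prNum, 10)` straight to `getReviewDetail`. `parseInt` returns `NaN` for a segment such as `abc` and silently accepts `12abc` as `12`, so a stricter parse may be worth considering. The following is a minimal sketch, not part of the commit: `parsePrNumber` is a hypothetical helper name, and the choice to reject `0` and negative values is an assumption about what counts as a valid PR number.

```typescript
// Hypothetical helper (not in the commit): validate the `number` route segment.
function parsePrNumber(raw: string): number | null {
  // Accept only plain positive decimal integers such as "42".
  // This rejects "abc", "12abc", "-3", "0" and the empty string.
  if (!/^[1-9]\d*$/.test(raw)) return null;
  return parseInt(raw, 10);
}

// Example: a null result is the signal to render a not-found state
// instead of calling the API with NaN.
const parsed = parsePrNumber("12abc");
console.log(parsed === null ? "invalid PR number" : `PR #${parsed}`);
```

In a Next.js App Router page, a `null` result could be handled by calling `notFound()` from `next/navigation` before any data is fetched.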

dashboard/components/AgentBreakdown.tsx ADDED

	@@ -0,0 +1,113 @@

+"use client";
+import { motion } from "framer-motion";
+import type { Finding, AgentKind } from "@/lib/types";
+interface AgentBreakdownProps {
+  findings: Finding[];
+}
+const AGENT_META: Record<
+  AgentKind,
+  {
+    icon: React.ReactNode;
+    label: string;
+    color: string;
+    iconBg: string;
+    border: string;
+  }
+> = {
+  security: {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-5 h-5">
+        <path fillRule="evenodd" d="M12.516 2.17a.75.75 0 00-1.032 0 11.209 11.209 0 01-7.877 3.08.75.75 0 00-.722.515A12.74 12.74 0 002.25 9.75c0 5.942 4.064 10.933 9.563 12.348a.749.749 0 00.374 0c5.499-1.415 9.563-6.406 9.563-12.348 0-1.39-.223-2.73-.635-3.985a.75.75 0 00-.722-.516 11.209 11.209 0 01-7.877-3.08z" clipRule="evenodd" />
+      </svg>
+    ),
+    label: "Security",
+    color: "text-red-400",
+    iconBg: "bg-red-500/10 text-red-400",
+    border: "border-red-500/[0.08]",
+  },
+  performance: {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-5 h-5">
+        <path fillRule="evenodd" d="M14.615 1.595a.75.75 0 01.359.852L12.982 9.75h7.268a.75.75 0 01.548 1.262l-10.5 11.25a.75.75 0 01-1.272-.71l1.992-7.302H3.75a.75.75 0 01-.548-1.262l10.5-11.25a.75.75 0 01.913-.143z" clipRule="evenodd" />
+      </svg>
+    ),
+    label: "Performance",
+    color: "text-amber-400",
+    iconBg: "bg-amber-500/10 text-amber-400",
+    border: "border-amber-500/[0.08]",
+  },
+  style: {
+    icon: (
+      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-5 h-5">
+        <path d="M11.7 2.805a.75.75 0 01.6 0A60.65 60.65 0 0122.83 8.72a.75.75 0 01-.231 1.337 49.949 49.949 0 00-9.902 3.912l-.003.002-.34.18a.75.75 0 01-.707 0A50.009 50.009 0 007.5 12.174v-.224c0-.131.067-.248.172-.311a54.614 54.614 0 014.653-2.52.75.75 0 00-.65-1.352 56.129 56.129 0 00-4.78 2.589 1.858 1.858 0 00-.859 1.228 49.803 49.803 0 00-4.634-1.527.75.75 0 01-.231-1.337A60.653 60.653 0 0111.7 2.805z" />
+        <path d="M13.06 15.473a48.45 48.45 0 017.666-3.282c.134 1.414.22 2.843.255 4.285a.75.75 0 01-.46.71 47.878 47.878 0 00-8.105 4.342.75.75 0 01-.832 0 47.877 47.877 0 00-8.104-4.342.75.75 0 01-.461-.71c.035-1.442.121-2.87.255-4.286A48.4 48.4 0 016 13.18v1.27a1.5 1.5 0 00-.14 2.508c-.09.38-.222.753-.397 1.11.452.213.901.434 1.346.661a6.729 6.729 0 00.551-1.608 1.5 1.5 0 00.14-2.67v-.645a48.549 48.549 0 013.44 1.668 2.25 2.25 0 002.12 0z" />
+        <path d="M4.462 19.462c.42-.419.753-.89 1-1.394.453.213.902.434 1.347.661a6.743 6.743 0 01-1.286 1.794.75.75 0 11-1.06-1.06z" />
+      </svg>
+    ),
+    label: "Style",
+    color: "text-cyan-400",
+    iconBg: "bg-cyan-500/10 text-cyan-400",
+    border: "border-cyan-500/[0.08]",
+  },
+};
+export default function AgentBreakdown({ findings }: AgentBreakdownProps) {
+  const agents: AgentKind[] = ["security", "performance", "style"];
+  const stats = agents.map((agent) => {
+    const agentFindings = findings.filter((f) => f.agent === agent);
+    const catCounts: Record<string, number> = {};
+    agentFindings.forEach((f) => {
+      catCounts[f.category] = (catCounts[f.category] ?? 0) + 1;
+    });
+    const topCategory =
+      Object.entries(catCounts).sort((a, b) => b[1] - a[1])[0]?.[0] ?? "—";
+    return {
+      agent,
+      count: agentFindings.length,
+      topCategory,
+      meta: AGENT_META[agent],
+    };
+  });
+  return (
+    <div className="grid grid-cols-1 sm:grid-cols-3 gap-4">
+      {stats.map(({ agent, count, topCategory, meta }, i) => (
+        <motion.div
+          key={agent}
+          initial={{ opacity: 0, y: 16 }}
+          animate={{ opacity: 1, y: 0 }}
+          transition={{ duration: 0.4, delay: i * 0.08 }}
+          whileHover={{ y: -2, transition: { duration: 0.15 } }}
+          className={`glass rounded-2xl p-5 border ${meta.border} transition-colors duration-300`}
+        >
+          <div className="flex items-center gap-3 mb-4">
+            <div
+              className={`w-9 h-9 rounded-xl ${meta.iconBg} flex items-center justify-center`}
+            >
+              {meta.icon}
+            </div>
+            <h3 className={`text-sm font-semibold ${meta.color}`}>
+              {meta.label}
+            </h3>
+          </div>
+          <p className="text-3xl font-bold text-white tabular-nums">{count}</p>
+          <p className="text-[11px] text-zinc-600 mt-0.5 uppercase tracking-wider">
+            findings
+          </p>
+          <div className="mt-4 pt-3 border-t border-white/[0.04]">
+            <p className="text-[10px] text-zinc-600 uppercase tracking-wider">
+              Top category
+            </p>
+            <p className="text-xs text-zinc-400 font-medium truncate mt-0.5">
+              {topCategory}
+            </p>
+          </div>
+        </motion.div>
+      ))}
+    </div>
+  );
+}
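
The per-agent statistics in `AgentBreakdown` are computed inline in the component. Pulled out as a pure function, the same aggregation can be exercised without rendering anything. This is a sketch under stated assumptions: `FindingLike`, `AgentStat` and `summarizeByAgent` are hypothetical names, and the type shapes are simplified stand-ins for whatever `@/lib/types` actually exports.

```typescript
// Simplified stand-ins for the types imported from "@/lib/types" (assumed shapes).
type AgentKind = "security" | "performance" | "style";
interface FindingLike {
  agent: AgentKind;
  category: string;
}
interface AgentStat {
  agent: AgentKind;
  count: number;
  topCategory: string;
}

const AGENTS: AgentKind[] = ["security", "performance", "style"];

function summarizeByAgent(findings: FindingLike[]): AgentStat[] {
  return AGENTS.map((agent) => {
    // Null-prototype object so a category named e.g. "constructor" is counted normally.
    const counts: Record<string, number> = Object.create(null);
    let count = 0;
    findings.forEach((f) => {
      if (f.agent !== agent) return;
      count += 1;
      counts[f.category] = (counts[f.category] ?? 0) + 1;
    });
    // On ties, the category that comes first in key order wins; "\u2014" means none.
    let topCategory = "\u2014";
    let best = 0;
    Object.keys(counts).forEach((category) => {
      if (counts[category] > best) {
        best = counts[category];
        topCategory = category;
      }
    });
    return { agent, count, topCategory };
  });
}
```

The function always returns one entry per agent, in the same fixed order the component uses, so an agent with no findings still gets a card with a count of 0.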

dashboard/components/AnimatedCounter.tsx ADDED

	@@ -0,0 +1,44 @@

+"use client";
+import { useEffect, useRef, useState } from "react";
+interface AnimatedCounterProps {
+  value: number;
+  suffix?: string;
+  duration?: number;
+  className?: string;
+}
+export function AnimatedCounter({
+  value,
+  suffix = "",
+  duration = 1200,
+  className,
+}: AnimatedCounterProps) {
+  const [display, setDisplay] = useState(0);
+  const ref = useRef<HTMLSpanElement>(null);
+  useEffect(() => {
+    // Restart the count-up whenever `value` or `duration` changes, and cancel
+    // the pending frame on cleanup so no state update runs after unmount.
+    let frame = 0;
+    const start = performance.now();
+    function tick(now: number) {
+      const elapsed = now - start;
+      const progress = Math.min(elapsed / duration, 1);
+      // ease-out expo
+      const ease = progress === 1 ? 1 : 1 - Math.pow(2, -10 * progress);
+      setDisplay(Math.round(ease * value));
+      if (progress < 1) frame = requestAnimationFrame(tick);
+    }
+    frame = requestAnimationFrame(tick);
+    return () => cancelAnimationFrame(frame);
+  }, [value, duration]);
+  return (
+    <span ref={ref} className={className}>
+      {display}
+      {suffix}
+    </span>
+  );
+}
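
The easing used by `AnimatedCounter` is the standard "ease-out expo" curve, `1 - 2^(-10t)`, with the end point pinned to exactly 1. The sketch below isolates that arithmetic as pure functions so it can be checked without a browser. `easeOutExpo` and `counterFrame` are hypothetical names introduced for illustration, and the clamping of `progress` to `[0, 1]` is an addition that the component does not perform on the lower bound.

```typescript
// Ease-out expo: fast at the start, slowing towards the end.
function easeOutExpo(progress: number): number {
  // Clamp so out-of-range inputs cannot produce values outside [0, 1].
  const t = Math.min(Math.max(progress, 0), 1);
  // Without this special case the curve would stop at 1 - 2^-10 = 0.9990234375,
  // so a large target value would end slightly short.
  return t === 1 ? 1 : 1 - Math.pow(2, -10 * t);
}

// Value the counter would display `elapsedMs` into an animation of `durationMs`.
// Assumes durationMs > 0.
function counterFrame(value: number, elapsedMs: number, durationMs: number): number {
  return Math.round(easeOutExpo(elapsedMs / durationMs) * value);
}

// Halfway through a 1200 ms animation towards 100 the counter already shows 97,
// because 1 - 2^-5 = 0.96875.
console.log(counterFrame(100, 600, 1200));
```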

dashboard/components/FindingsTable.tsx ADDED

	@@ -0,0 +1,185 @@

+"use client";
+import { useState, useMemo } from "react";
+import { motion, AnimatePresence } from "framer-motion";
+import type { Finding, Severity } from "@/lib/types";
+import SeverityBadge from "./SeverityBadge";
+const AGENT_ICON: Record<string, React.ReactNode> = {
+  security: (
+    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-4 h-4 text-red-400">
+      <path fillRule="evenodd" d="M12.516 2.17a.75.75 0 00-1.032 0 11.209 11.209 0 01-7.877 3.08.75.75 0 00-.722.515A12.74 12.74 0 002.25 9.75c0 5.942 4.064 10.933 9.563 12.348a.749.749 0 00.374 0c5.499-1.415 9.563-6.406 9.563-12.348 0-1.39-.223-2.73-.635-3.985a.75.75 0 00-.722-.516 11.209 11.209 0 01-7.877-3.08z" clipRule="evenodd" />
+    </svg>
+  ),
+  performance: (
+    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-4 h-4 text-amber-400">
+      <path fillRule="evenodd" d="M14.615 1.595a.75.75 0 01.359.852L12.982 9.75h7.268a.75.75 0 01.548 1.262l-10.5 11.25a.75.75 0 01-1.272-.71l1.992-7.302H3.75a.75.75 0 01-.548-1.262l10.5-11.25a.75.75 0 01.913-.143z" clipRule="evenodd" />
+    </svg>
+  ),
+  style: (
+    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-4 h-4 text-cyan-400">
+      <path d="M11.7 2.805a.75.75 0 01.6 0A60.65 60.65 0 0122.83 8.72a.75.75 0 01-.231 1.337 49.949 49.949 0 00-9.902 3.912l-.003.002-.34.18a.75.75 0 01-.707 0A50.009 50.009 0 007.5 12.174v-.224c0-.131.067-.248.172-.311a54.614 54.614 0 014.653-2.52.75.75 0 00-.65-1.352 56.129 56.129 0 00-4.78 2.589 1.858 1.858 0 00-.859 1.228 49.803 49.803 0 00-4.634-1.527.75.75 0 01-.231-1.337A60.653 60.653 0 0111.7 2.805z" />
+      <path d="M13.06 15.473a48.45 48.45 0 017.666-3.282c.134 1.414.22 2.843.255 4.285a.75.75 0 01-.46.71 47.878 47.878 0 00-8.105 4.342.75.75 0 01-.832 0 47.877 47.877 0 00-8.104-4.342.75.75 0 01-.461-.71c.035-1.442.121-2.87.255-4.286z" />
+    </svg>
+  ),
+};
+const SEVERITY_ORDER: Record<Severity, number> = {
+  critical: 0,
+  high: 1,
+  medium: 2,
+  low: 3,
+};
+type SortKey = "severity" | "agent" | "file_path" | "category" | "title";
+export default function FindingsTable({
+  findings,
+}: {
+  findings: Finding[];
+}) {
+  const [sortKey, setSortKey] = useState<SortKey>("severity");
+  const [sortAsc, setSortAsc] = useState(true);
+  const [expandedIdx, setExpandedIdx] = useState<number | null>(null);
+  const sorted = useMemo(() => {
+    const copy = [...findings];
+    copy.sort((a, b) => {
+      let cmp = 0;
+      if (sortKey === "severity") {
+        cmp = SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity];
+      } else {
+        cmp = (a[sortKey] as string).localeCompare(b[sortKey] as string);
+      }
+      return sortAsc ? cmp : -cmp;
+    });
+    return copy;
+  }, [findings, sortKey, sortAsc]);
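+  function handleSort(key: SortKey) {
+    // Collapse any open row: expandedIdx is a position in the sorted list,
+    // so it would point at a different finding once the order changes.
+    setExpandedIdx(null);
+    if (key === sortKey) setSortAsc((v) => !v);
+    else {
+      setSortKey(key);
+      setSortAsc(true);
+    }
+  }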
+  const arrow = (key: SortKey) =>
+    sortKey === key ? (sortAsc ? " \u25B2" : " \u25BC") : "";
+  return (
+    <motion.div
+      initial={{ opacity: 0, y: 12 }}
+      animate={{ opacity: 1, y: 0 }}
+      transition={{ duration: 0.4, delay: 0.1 }}
+      className="overflow-x-auto glass rounded-2xl"
+    >
+      <table className="w-full text-sm text-left">
+        <thead>
+          <tr className="border-b border-white/[0.04] text-zinc-500 text-[11px] uppercase tracking-wider">
+            {(
+              [
+                ["severity", "Severity"],
+                ["agent", "Agent"],
+                ["file_path", "File"],
+                ["category", "Category"],
+                ["title", "Title"],
+              ] as [SortKey, string][]
+            ).map(([key, label]) => (
+              <th
+                key={key}
+                onClick={() => handleSort(key)}
+                className="px-4 py-3.5 cursor-pointer select-none hover:text-zinc-300 transition-colors font-medium"
+              >
+                {label}
+                <span className="text-violet-400/70">{arrow(key)}</span>
+              </th>
+            ))}
+          </tr>
+        </thead>
+        <tbody>
+          {sorted.map((f, i) => {
+            const isExpanded = expandedIdx === i;
+            return (
+              <tr key={i} className="group">
+                <td colSpan={5} className="p-0">
+                  <button
+                    onClick={() => setExpandedIdx(isExpanded ? null : i)}
+                    className="w-full grid grid-cols-[100px_50px_1fr_130px_1fr] items-center text-left px-4 py-3 border-b border-white/[0.03] hover:bg-white/[0.02] transition-colors cursor-pointer"
+                  >
+                    <span>
+                      <SeverityBadge severity={f.severity} />
+                    </span>
+                    <span title={f.agent}>
+                      {AGENT_ICON[f.agent] ?? f.agent}
+                    </span>
+                    <span className="font-mono text-zinc-400 text-xs truncate pr-2">
+                      {f.file_path}
+                      <span className="text-zinc-700 ml-1">
+                        :{f.line_start}
+                      </span>
+                    </span>
+                    <span className="text-zinc-500 text-xs">{f.category}</span>
+                    <span className="text-zinc-300 text-xs truncate">
+                      {f.title}
+                    </span>
+                  </button>
+                  <AnimatePresence>
+                    {isExpanded && (
+                      <motion.div
+                        initial={{ height: 0, opacity: 0 }}
+                        animate={{ height: "auto", opacity: 1 }}
+                        exit={{ height: 0, opacity: 0 }}
+                        transition={{ duration: 0.25, ease: "easeInOut" }}
+                        className="overflow-hidden"
+                      >
+                        <div className="bg-white/[0.01] border-b border-white/[0.04] px-6 py-5 space-y-4">
+                          <div>
+                            <h4 className="text-[10px] text-zinc-600 uppercase tracking-widest mb-1.5 font-medium">
+                              Description
+                            </h4>
+                            <p className="text-zinc-300 text-sm leading-relaxed">
+                              {f.description}
+                            </p>
+                          </div>
+                          {f.suggested_fix && (
+                            <div>
+                              <h4 className="text-[10px] text-zinc-600 uppercase tracking-widest mb-1.5 font-medium">
+                                Suggested Fix
+                              </h4>
+                              <pre className="text-emerald-400/90 text-xs bg-emerald-500/[0.04] border border-emerald-500/10 rounded-xl px-4 py-3 overflow-x-auto whitespace-pre-wrap font-mono">
+                                {f.suggested_fix}
+                              </pre>
+                            </div>
+                          )}
+                          <div className="flex gap-5 text-[11px] text-zinc-600 pt-1">
+                            {f.cwe_id && (
+                              <span className="font-mono">{f.cwe_id}</span>
+                            )}
+                            <span>
+                              Confidence:{" "}
+                              <span className="text-zinc-400">
+                                {(f.confidence * 100).toFixed(0)}%
+                              </span>
+                            </span>
+                            <span>
+                              Lines{" "}
+                              <span className="text-zinc-400 font-mono">
+                                {f.line_start}–{f.line_end}
+                              </span>
+                            </span>
+                          </div>
+                        </div>
+                      </motion.div>
+                    )}
+                  </AnimatePresence>
+                </td>
+              </tr>
+            );
+          })}
+        </tbody>
+      </table>
+    </motion.div>
+  );
+}
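
The sorting rule inside `FindingsTable` (severity by rank, every other column by `localeCompare`, direction applied last) can be sketched as a standalone function. This is an illustration only: `FindingRow` and `sortFindings` are hypothetical names, and `FindingRow` lists just the fields the comparator reads, which is an assumption about the real `Finding` type in `@/lib/types`.

```typescript
// Assumed, reduced shapes; the real types live in "@/lib/types".
type Severity = "critical" | "high" | "medium" | "low";
type SortKey = "severity" | "agent" | "file_path" | "category" | "title";
interface FindingRow {
  severity: Severity;
  agent: string;
  file_path: string;
  category: string;
  title: string;
}

// Lower rank sorts first, so ascending order puts critical findings on top.
const SEVERITY_ORDER: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

function sortFindings(
  findings: FindingRow[],
  key: SortKey,
  ascending: boolean,
): FindingRow[] {
  // Copy first: Array.prototype.sort mutates, and props must not be mutated.
  const copy = findings.slice();
  copy.sort((a, b) => {
    const cmp =
      key === "severity"
        ? SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]
        : a[key].localeCompare(b[key]);
    return ascending ? cmp : -cmp;
  });
  return copy;
}
```

Comparing severities by rank rather than by string matters here: an alphabetical sort would order them critical, high, low, medium.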