# Week 1: Foundation & Setup — Detailed Documentation

> **Goal:** Project skeleton running locally, all external services provisioned.
> **Status:** Complete
> **Date:** 2026-03-19

---

## What We Accomplished

Week 1 established the entire project foundation: directory structure, configuration system,
data models, external service accounts, CI/CD pipeline, and the initial deployment config.

---

## Step-by-Step Log

### Step 1: Initialize the Project

**What we did:** Created the project directory structure following a modular Python backend
architecture with clear separation of concerns.

**Why this structure matters:**
```
app/                    ← All backend application code lives here
  agents/               ← One file per agent (security, performance, style, synthesizer)
  tools/                ← LangChain tool wrappers (semgrep, bandit, radon, etc.)
  context/              ← RAG pipeline (embedder → indexer → retriever)
  github/               ← All GitHub API interaction (webhook, auth, client, formatter)
  models/               ← Pydantic data models (Finding, PRReview, webhook payloads)
  db/                   ← Database & cache (Postgres, Redis)
  services/             ← Business logic (orchestrator, health score calculator)
dashboard/              ← Next.js frontend (deployed separately to Vercel)
tests/                  ← Mirrors the app/ structure (unit/, integration/, eval/)
prompts/                ← Agent system prompts as Markdown files
knowledge/              ← RAG knowledge bases (OWASP, DDIA, style guides)
docs/                   ← Project documentation (this file)
```

**Key principle:** Each directory has a single responsibility. The `agents/` folder doesn't
know about GitHub. The `github/` folder doesn't know about LangChain. The `services/`
folder orchestrates between them. This is called **separation of concerns** — it makes the
code testable, maintainable, and easy to explain in interviews.

**Commands run:**
```bash
# Create all directories
mkdir -p app/{agents,tools,context,github,models,db,services}
mkdir -p dashboard/{app/{repos,api},components,lib}
mkdir -p tests/{unit,integration,eval/dataset}
mkdir -p prompts knowledge/style_guides

# Create __init__.py files (makes directories Python packages)
touch app/__init__.py app/agents/__init__.py app/tools/__init__.py ...

# Initialize git
git init && git branch -m main
```
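
The same layout can also be scaffolded from Python with `pathlib`, which helps where the shell has no brace expansion (PowerShell or `cmd.exe`, for example). This is a minimal sketch that assumes every directory under `app/` should get an `__init__.py` (the `touch` line above elides the full list). `scaffold` is a hypothetical helper, not a file in the repo.

```python
from pathlib import Path

# Subpackages of app/, taken from the first mkdir command above.
APP_PACKAGES = ["agents", "tools", "context", "github", "models", "db", "services"]

# Plain (non-package) directories from the remaining mkdir commands.
OTHER_DIRS = [
    "dashboard/app/repos", "dashboard/app/api", "dashboard/components", "dashboard/lib",
    "tests/unit", "tests/integration", "tests/eval/dataset",
    "prompts", "knowledge/style_guides",
]


def scaffold(root: Path) -> None:
    """Create the Week 1 layout under `root`; safe to run more than once."""
    app = root / "app"
    for pkg in [app, *(app / name for name in APP_PACKAGES)]:
        pkg.mkdir(parents=True, exist_ok=True)
        (pkg / "__init__.py").touch()  # an __init__.py marks the directory as a package
    for rel in OTHER_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
```

Calling `scaffold(Path("."))` from the repo root can be repeated safely: `mkdir(exist_ok=True)` and `touch()` leave existing directories and file contents alone.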

### Step 2: Create Configuration System (app/config.py)

**What we did:** Created a centralized configuration file using `pydantic-settings`.

**How it works:**
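```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    groq_api_key: str = ""
    github_app_id: str = ""
    confidence_threshold: float = 0.6  # "0.6" from the environment is parsed into a float
    # ... all config vars

    model_config = SettingsConfigDict(env_file=".env")  # also read values from .env

settings = Settings()  # Singleton, imported everywhere
```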

**Why pydantic-settings instead of plain os.environ?**
1. **Type safety** — `confidence_threshold: float = 0.6` ensures it's a float, not a string
2. **Validation** — pydantic raises a clear `ValidationError` if a value can't be parsed into its declared type, or if a field declared without a default is missing
3. **Defaults** — each setting has a sensible default for development
4. **Auto-loads .env** — reads from `.env` file automatically (via `model_config`)
5. **IDE autocomplete** — `settings.groq_api_key` instead of `os.environ.get("GROQ_API_KEY")`
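
To make the loading order concrete, here is a simplified, stdlib-only model of how one field gets resolved: a real environment variable beats the `.env` file, the `.env` file beats the field default, and the raw string is then cast to the declared type. It is an illustration only, not pydantic-settings' actual implementation (which also handles prefixes, case options, nested values and more). `parse_dotenv` and `resolve` are hypothetical helpers.

```python
import os
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip().upper()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def resolve(
    name: str,
    cast: Callable[[str], T],
    default: T,
    dotenv: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> T:
    """Resolve one setting: a real env var beats .env, which beats the default."""
    key = name.upper()  # field `groq_api_key` is looked up as GROQ_API_KEY
    if key in environ:
        return cast(environ[key])
    if key in dotenv:
        return cast(dotenv[key])
    return default
```

For example, with `CONFIDENCE_THRESHOLD=0.75` in `.env`, `resolve("confidence_threshold", float, 0.6, dotenv, environ={})` yields the float `0.75`, while exporting `CONFIDENCE_THRESHOLD=0.9` in the real environment overrides it. This precedence is what lets the same code run unchanged in development and production.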

**Interview talking point:** "We use pydantic-settings for type-safe configuration management
following the 12-factor app methodology — config lives in environment variables, not in code.
This makes the same codebase work in development, staging, and production with zero code changes."

### Step 3: Define Data Models (app/models/findings.py)

**What we did:** Created Pydantic models that define the exact shape of data flowing through
the system.

**Three core models:**

#### Finding — Output of each domain agent
```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

class Finding(BaseModel):
    agent: Literal["security", "performance", "style"]  # Which agent found this
    file_path: str              # e.g. "src/auth/login.py"
    line_start: int             # Where the issue starts
    line_end: int               # Where the issue ends
    severity: Literal["critical", "high", "medium", "low"]  # How bad is it
    category: str               # e.g. "sql_injection", "n+1_query"
    title: str                  # One-liner for the inline comment header
    description: str            # Full explanation
    suggested_fix: str          # Corrected code snippet
    cwe_id: Optional[str] = None  # CWE ID for security findings (e.g. "CWE-89"); None otherwise
    confidence: float = Field(ge=0.0, le=1.0)  # 0.0–1.0, how sure the agent is
```

#### SynthesizedReview — Output of the Synthesizer Agent
```python
class SynthesizedReview(BaseModel):
    health_score: int = Field(ge=0, le=100)  # 0-100 (the headline metric)
    executive_summary: str      # 3-5 sentences for PR description
    recommendation: Literal["approve", "request_changes", "block"]
    findings: list[Finding]     # Deduplicated, re-ranked findings
    critical_count: int         # Counts by severity
    # ...
```

#### PRReviewRecord — What gets stored in Postgres
```python
class PRReviewRecord(BaseModel):
    id: UUID                    # Primary key
    repo_full_name: str         # "ninjacode911/myapp"
    pr_number: int
    commit_sha: str
    health_score: int
    findings: list[Finding]     # Full findings as JSONB
    duration_ms: int            # How long the review took
```

**Why Pydantic models instead of plain dicts?**
1. **Validation** — `severity: Literal["critical", "high", "medium", "low"]` rejects invalid values
2. **Serialization** — `.model_dump()` converts to dict, `.model_dump_json()` to JSON
3. **Documentation** — the schema IS the documentation
4. **Type checking** — mypy catches bugs at development time, not production
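
A quick demonstration of points 1 and 2, assuming Pydantic v2 and a trimmed copy of `Finding` (four fields only, with the confidence bound declared as `Field(ge=0.0, le=1.0)`). `MiniFinding` and `is_valid` are hypothetical names used for illustration.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class MiniFinding(BaseModel):
    """A trimmed copy of Finding (four fields), for illustration only."""

    agent: Literal["security", "performance", "style"]
    severity: Literal["critical", "high", "medium", "low"]
    cwe_id: Optional[str] = None
    confidence: float = Field(ge=0.0, le=1.0)


def is_valid(data: dict) -> bool:
    """Return True if `data` passes MiniFinding's validation rules."""
    try:
        MiniFinding.model_validate(data)
    except ValidationError:
        return False
    return True


finding = MiniFinding(agent="security", severity="high", cwe_id="CWE-89", confidence=0.9)
payload = finding.model_dump_json()                  # JSON string, e.g. for an API response
restored = MiniFinding.model_validate_json(payload)  # parse and validate in one step
```

`is_valid({"agent": "security", "severity": "urgent", "confidence": 0.5})` is `False`, as is any dict with `confidence` of `1.5`, and `restored == finding` holds because the JSON round trip is lossless.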

**Interview talking point:** "Every data boundary in the system uses Pydantic models — agent
outputs, API responses, database records. This gives us runtime validation, IDE autocomplete,
and auto-generated OpenAPI docs. If an agent returns malformed JSON, Pydantic catches it
immediately instead of letting bad data propagate through the pipeline."

### Step 4: Define Webhook Payload Models (app/models/webhook_payloads.py)

**What we did:** Created typed models for GitHub's webhook JSON payloads.

**Why type the webhook payload?**
GitHub sends complex nested JSON. Without types, you'd write:
```python
sha = payload["pull_request"]["head"]["sha"]  # Easy to typo, no autocomplete
```
With Pydantic models:
```python
event = PullRequestEvent(**payload)
sha = event.pull_request.head.sha  # Autocomplete, type-checked
```

We didn't use these models in the final webhook handler (we used raw dict access for
simplicity), but they're available for stricter validation later.
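
A minimal sketch of what such nested models can look like, assuming Pydantic v2. Only a handful of fields from GitHub's `pull_request` event are modelled, and the class names other than `PullRequestEvent` are hypothetical. Pydantic ignores unknown keys by default, so the rest of GitHub's large payload is simply dropped.

```python
from pydantic import BaseModel


class Ref(BaseModel):
    sha: str  # commit SHA at the tip of the branch
    ref: str  # branch name, e.g. "feature/login"


class PullRequest(BaseModel):
    number: int
    title: str
    head: Ref  # the PR's source branch
    base: Ref  # the branch it merges into


class Repository(BaseModel):
    full_name: str  # e.g. "ninjacode911/codeguard-test"


class PullRequestEvent(BaseModel):
    action: str  # e.g. "opened", "synchronize"
    number: int
    pull_request: PullRequest
    repository: Repository
```

`PullRequestEvent.model_validate(payload)` is equivalent to `PullRequestEvent(**payload)` for a dict; if `head.sha` is missing or has the wrong type, a `ValidationError` names the exact path instead of a `KeyError` surfacing deep in the handler.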

### Step 5: Create FastAPI Skeleton (app/main.py)

**What we did:** Created the FastAPI application with a `/health` endpoint.

```python
from fastapi import FastAPI

app = FastAPI(title="Ninja Code Guard", version="0.1.0")

@app.get("/health")
async def health_check():
    return {"status": "ok", "service": "Ninja Code Guard", "version": "0.1.0"}
```

**Why a /health endpoint?**
- **Render.com** uses it to know if your service is alive (configured in render.yaml)
- **GitHub Actions cron** pings it every 10 minutes during weekday working hours to prevent cold starts (see Step 8)
- **The dashboard** calls it to show service status
- **Load balancers** (if you scale up) use it to route traffic only to healthy instances
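
All of these callers only need an HTTP 200 with the expected JSON. A minimal stdlib probe in the spirit of the pre-warm job's `curl` call might look like this; `probe_health` is a hypothetical helper, and the base URL is whatever your Render service exposes.

```python
import json
import urllib.request


def probe_health(base_url: str, timeout: float = 60.0) -> bool:
    """Return True if GET <base_url>/health answers 200 with {"status": "ok"}.

    The generous default timeout leaves room for a Render cold start.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):  # connection/HTTP errors, timeouts, bad JSON
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

Checking the `status` field rather than only the HTTP code guards against a proxy or error page that happens to return 200.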

### Step 6: Provision External Services

**What we did:** Created accounts and obtained credentials for all external services.

#### 6a. GitHub App — "Ninja's Code Guard"

**Where:** github.com/settings/apps/new

**What we configured:**
| Setting | Value | Reason |
|---------|-------|--------|
| Name | Ninja's Code Guard | Bot identity: `ninja-s-code-guard[bot]` |
| Homepage URL | github.com/ninjacode911/codeprobe | Points to our repo |
| Webhook Active | Yes | We need to receive PR events |
| Webhook Secret | (generated with `python -c "import secrets; print(secrets.token_hex(32))"`) | HMAC authentication |
| Contents | Read | Fetch full file source code for RAG context |
| Pull requests | Read & Write | Read diffs, post review comments |
| Commit statuses | Write | Show health score as commit status check |
| Metadata | Read | Required — basic repo info |
| Events | pull_request, pull_request_review_comment | Our trigger events |
| Install target | Only this account | Dev-mode only for now |

**What we got:**
- App ID: 3133457
- Private Key: `.pem` file saved to `keys/ninja-s-code-guard.2026-03-19.private-key.pem`
- Webhook Secret: saved to `.env`

**How GitHub App authentication works (important concept):**
```
Step 1: Sign a JWT with our private key (.pem)
        JWT payload = {iss: APP_ID, iat: now, exp: now+9min}
        Signed with RS256 (RSA + SHA-256)
        This proves: "I am the Ninja Code Guard app"

Step 2: Exchange JWT for an installation access token
        POST /app/installations/{id}/access_tokens
        Headers: Authorization: Bearer <JWT>
        Returns: token valid for 1 hour, scoped to installed repos
        This proves: "I can access ninjacode911's repos"

Step 3: Use installation token for all API calls
        GET /repos/ninjacode911/codeguard-test/pulls/1
        Headers: Authorization: token <installation_token>
```

#### 6b. Groq API

**Where:** console.groq.com
**What:** API key for Llama-3.1-70B inference (14,400 free requests/day)
**Saved as:** `GROQ_API_KEY` in `.env`

#### 6c. Neon.tech Postgres

**Where:** console.neon.tech
**What:** Serverless Postgres database (512MB free tier)
**Saved as:** `DATABASE_URL` in `.env`
**Used for:** Storing PR review history, health score trends, finding details

#### 6d. Upstash Redis

**Where:** console.upstash.com
**What:** Serverless Redis (10K requests/day free tier)
**Saved as:** `UPSTASH_REDIS_URL` in `.env`
**Used for:** Caching reviewed commit SHAs to prevent duplicate analysis
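
Deduplication boils down to an atomic "set if not already present, with an expiry". With redis-py that is `client.set(key, "1", nx=True, ex=ttl)`, which returns `True` the first time and `None` while the key still exists. The sketch below mimics that contract with an in-memory stand-in so it runs without Redis; the `InMemoryRedis` class, the `claim_review` helper, the key format and the 7-day TTL are all assumptions for illustration.

```python
import time
from typing import Callable, Optional


class InMemoryRedis:
    """Stand-in for the one redis-py call used here: set(key, value, nx=..., ex=...)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[str, float]] = {}  # key -> (value, expiry time)
        self._clock = clock

    def set(self, key: str, value: str, nx: bool = False, ex: Optional[int] = None) -> Optional[bool]:
        now = self._clock()
        current = self._data.get(key)
        if nx and current is not None and current[1] > now:
            return None  # key still live, so NX refuses (redis-py also returns None)
        self._data[key] = (value, now + ex if ex is not None else float("inf"))
        return True


def claim_review(client, repo: str, pr_number: int, sha: str, ttl_s: int = 7 * 24 * 3600) -> bool:
    """Return True only for the first caller to claim this (repo, PR, commit)."""
    key = f"reviewed:{repo}:{pr_number}:{sha}"  # hypothetical key format
    return bool(client.set(key, "1", nx=True, ex=ttl_s))
```

Against Upstash, `client` would be a real `redis.Redis` instance (for example built with `redis.from_url(...)` from the `UPSTASH_REDIS_URL` value). Because the existence check and the write happen in a single `SET` command, two concurrent webhook deliveries for the same commit cannot both win.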

### Step 7: Create Configuration Files

#### .env.example
Template showing all required environment variables without actual values.
Committed to git so new developers know what to configure.

#### .gitignore
Prevents sensitive files from being committed:
- `.env` (contains API keys)
- `keys/` (contains private key .pem)
- `__pycache__/`, `.venv/` (generated files)
- `chroma_data/` (vector store data)
- `dashboard/node_modules/`, `dashboard/.next/` (Node.js generated)

#### pyproject.toml
Project metadata + tool configuration:
- `[tool.ruff]` — Python linter settings
- `[tool.pytest.ini_options]` — Test configuration (asyncio mode for pytest-asyncio, test paths)
- `[tool.mypy]` — Type checker settings

#### render.yaml
Render.com deployment configuration:
```yaml
services:
  - type: web
    name: ninja-code-guard
    runtime: python
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn app.main:app --host 0.0.0.0 --port $PORT
    healthCheckPath: /health
    plan: free
```

#### sentinel.yml.example
Per-repo configuration template that users place in their repo root:
```yaml
agents:
  security: true
  performance: true
  style: true
min_severity: low
min_confidence: 0.6
exclude:
  - "vendor/"
  - "node_modules/"
```
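
How the app applies these settings is not spelled out here. One plausible reading, sketched below as an assumption: a finding is kept only if its agent is enabled, its severity is at or above `min_severity`, its confidence is at least `min_confidence`, and its path does not start with an `exclude` prefix. The dict mirrors the YAML above after parsing (e.g. with PyYAML's `yaml.safe_load`); `keep_finding` and `SEVERITY_RANK` are hypothetical.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def keep_finding(finding: dict, config: dict) -> bool:
    """Apply a parsed sentinel.yml config to one finding (a dict with Finding's fields)."""
    if not config.get("agents", {}).get(finding["agent"], True):
        return False  # this agent is disabled for the repo
    min_rank = SEVERITY_RANK[config.get("min_severity", "low")]
    if SEVERITY_RANK[finding["severity"]] < min_rank:
        return False  # below the severity floor
    if finding["confidence"] < config.get("min_confidence", 0.0):
        return False  # agent is not confident enough
    # Simple prefix match: "vendor/" excludes "vendor/x.py" but not "src/vendor/x.py".
    return not any(finding["file_path"].startswith(p) for p in config.get("exclude", []))
```

Missing keys fall back to permissive defaults, so an empty `sentinel.yml` keeps every finding.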

### Step 8: Set Up CI/CD (GitHub Actions)

**Created two workflows:**

#### ci.yml — Runs on every push/PR
```yaml
steps:
  - run: ruff check .   # Lint (catches style/import issues)
  - run: mypy app       # Type check (catches type errors)
  - run: pytest         # Run the test suite
```

#### prewarm.yml — Cron job every 10 minutes on weekdays
```yaml
on:
  schedule:
    - cron: "*/10 6-20 * * 1-5"  # Every 10 min, 06:00-20:50 UTC, Mon-Fri
steps:
  - run: curl --fail <render-url>/health  # keeps the Render service warm
```

**Why pre-warm?** Render's free tier spins down after 15 minutes of inactivity. The first
request after a spin-down takes ~30 seconds (a cold start). By pinging /health every 10 minutes
during working hours, the service stays warm and responds to webhooks without that delay.
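
To see exactly when the pre-warm fires, here is a small stdlib check for this one expression (not a general cron parser; `matches_prewarm` is hypothetical). The hour field `6-20` includes the whole 20:00 hour, so the last ping of the day lands at 20:50 UTC. GitHub may also delay scheduled runs when Actions is under heavy load, so the 10-minute spacing is best-effort.

```python
from datetime import datetime, timezone


def matches_prewarm(t: datetime) -> bool:
    """True if `t` matches the cron expression "*/10 6-20 * * 1-5" in UTC.

    Naive datetimes are treated as UTC. Only this one expression is handled.
    """
    t = t.replace(tzinfo=timezone.utc) if t.tzinfo is None else t.astimezone(timezone.utc)
    cron_dow = (t.weekday() + 1) % 7  # cron counts 0=Sun..6=Sat; Python counts 0=Mon..6=Sun
    return (
        t.minute % 10 == 0       # */10 in the minute field
        and 6 <= t.hour <= 20    # 6-20 in the hour field, inclusive of the 20:00 hour
        and 1 <= cron_dow <= 5   # 1-5 = Monday to Friday
    )
```

Stepping through a weekday minute by minute gives 15 hours × 6 = 90 pings, the first at 06:00 and the last at 20:50 UTC; weekends get none.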

### Step 9: Write Initial Tests

**Created:** `tests/unit/test_findings_schema.py` — 8 tests for data model validation

These tests verify:
- Valid Finding objects are accepted
- Invalid agent types are rejected
- Invalid severity levels are rejected
- Confidence must be between 0.0 and 1.0
- CWE ID is optional (None allowed)
- Health score must be 0-100
- Invalid recommendation values are rejected

---

## Files Created in Week 1

| File | Purpose |
|------|---------|
| `app/__init__.py` | Makes app a Python package |
| `app/config.py` | Centralized configuration via environment variables |
| `app/main.py` | FastAPI app with /health endpoint (expanded in Week 2) |
| `app/models/__init__.py` | Models package |
| `app/models/findings.py` | Finding, SynthesizedReview, PRReviewRecord schemas |
| `app/models/webhook_payloads.py` | GitHub webhook event payload types |
| `tests/conftest.py` | Shared test fixtures (sample finding data) |
| `tests/unit/test_findings_schema.py` | 8 schema validation tests |
| `.env` | Environment variables (gitignored — contains secrets) |
| `.env.example` | Template for .env (committed — no secrets) |
| `.gitignore` | Files to exclude from git |
| `pyproject.toml` | Project metadata + tool configs |
| `requirements.txt` | Python production dependencies |
| `requirements-dev.txt` | Dev/test dependencies |
| `render.yaml` | Render.com deployment config |
| `sentinel.yml.example` | Per-repo config template |
| `.github/workflows/ci.yml` | CI pipeline (lint + type check + test) |
| `.github/workflows/prewarm.yml` | Render pre-warm cron |
| `keys/.gitignore` | Prevents .pem files from being committed |
| `PROJECT_PLAN.md` | Master project plan + progress tracker |

---

## Key Decisions Made

| Decision | Rationale |
|----------|-----------|
| Pydantic for all data models | Runtime validation + IDE autocomplete + auto-docs |
| pydantic-settings for config | Type-safe env vars, auto-loads .env, 12-factor pattern |
| FastAPI (not Flask/Django) | Async-native (needed for parallel agents), auto OpenAPI docs, modern Python |
| GitHub App (not Action) | One deployment serves all repos, webhook-driven, own bot identity |
| Upstash Redis (not in-memory cache) | Persists across Render restarts, shared across workers |
| Neon.tech (not SQLite) | Serverless, accessible from dashboard, persistent storage |

---

*Documentation written 2026-03-19 as part of Week 1 completion.*