Spaces:

NinjainPJs
/

ninja-code-guard

Sleeping

App Files Files Community

ninja-code-guard / docs /WEEK2_GITHUB_INTEGRATION.md

NinjainPJs

initial - commit

4b445f6 3 months ago

preview code

raw

history blame contribute delete

26.6 kB

Week 2: GitHub Integration — Detailed Documentation

Goal: Receive GitHub webhooks, validate signatures, fetch PR data, post comments. Status: Complete — End-to-end tested with live PR Date: 2026-03-19 Test PR: github.com/ninjacode911/codeguard-test/pull/1

What We Built

This week we built the communication layer between Ninja Code Guard and GitHub — the nervous system that listens for events, authenticates, fetches data, and responds.

End-to-end flow achieved:

PR opened on GitHub (21:54:52)
    → Webhook POST to our ngrok tunnel
    → HMAC-SHA256 signature validated
    → Redis cache checked (not previously reviewed)
    → Background task enqueued, 200 returned to GitHub
    → JWT signed with .pem, installation token obtained
    → PR diff + file contents fetched via GitHub API
    → Bot comment posted to PR #1
    → Commit SHA cached in Upstash Redis (7-day TTL)
    → Total time: ~5 seconds

Step-by-Step Implementation Log

Step 1: Webhook Signature Validation (app/github/webhook.py)

What: A FastAPI dependency that validates the HMAC-SHA256 signature on every incoming webhook request.

The problem it solves: Our /webhook/github endpoint is publicly accessible. Without validation, anyone could send fake webhook payloads to trigger bogus reviews, waste our Groq API quota, or spam PRs with fake comments.

How HMAC-SHA256 works:

                    Shared Secret
                    (GITHUB_WEBHOOK_SECRET)
                         │
          ┌──────────────┼──────────────┐
          │              │              │
      GitHub's side      │        Our server's side
          │              │              │
    request body ──→ HMAC-SHA256   HMAC-SHA256 ←── request body
          │              │              │
          ▼              │              ▼
    computed hash        │        computed hash
          │              │              │
    sent as header ──────┼──────→ compared with
    X-Hub-Signature-256  │        received header
                         │
                    Must match!

Key implementation details:

Raw bytes, not parsed JSON: We compute the HMAC on the raw request bytes, not parsed JSON. Even a single whitespace difference would produce a completely different hash. This is why we use await request.body() before any JSON parsing.
Constant-time comparison: We use hmac.compare_digest() instead of ==. A regular == short-circuits on the first different byte — an attacker could measure response time for different guesses and reconstruct the signature byte by byte. compare_digest() always takes the same time regardless of where the mismatch is. This is called a timing attack and is a real-world vulnerability (CVE-2013-0338, etc.).
FastAPI dependency injection: The validation is implemented as a Depends() function. FastAPI calls it automatically before the endpoint handler runs. If validation fails, the endpoint never executes. This ensures we can't accidentally forget to validate.

# How it's used in the endpoint — validation happens automatically via Depends()
@app.post("/webhook/github")
async def webhook_github(
    body: bytes = Depends(validate_webhook_signature),  # ← runs first
):
    payload = json.loads(body)  # Only reached if signature is valid

Signature format from GitHub:

X-Hub-Signature-256: sha256=5d7230d4d964e5c12a7e4e94c...
                     ^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     prefix   hex-encoded HMAC digest

Interview talking point: "We validate webhook authenticity using HMAC-SHA256 with constant-time comparison to prevent timing attacks. The validation is implemented as a FastAPI dependency so it's impossible to skip — the endpoint function only executes after successful validation."

Step 2: GitHub App JWT Authentication (app/github/auth.py)

What: Two-step authentication flow — sign a JWT, exchange it for a scoped token.

The problem it solves: We need to call GitHub's API (fetch PR data, post comments) on behalf of our installed app. GitHub needs to verify that API calls are coming from the registered "Ninja Code Guard" app, not from an impersonator.

Step 1 — JWT Generation:

A JWT (JSON Web Token) is a signed token with three parts:

eyJhbGciOiJSUzI1NiJ9.eyJpYXQiOjE3MTEuLi4sImV4cCI6MTcxMS4uLiwiX.SflKxwRJ...
└──────── Header ────────┘└────────── Payload ───────────┘└── Signature ──┘

Header:  {"alg": "RS256", "typ": "JWT"}
Payload: {"iat": <issued_at>, "exp": <expires_at>, "iss": "3133457"}
Signature: RSA-SHA256(header + "." + payload, private_key)

Why RS256 (RSA + SHA-256)?

This is asymmetric cryptography: we sign with our private key (.pem), GitHub verifies with the matching public key (stored when we registered the app)
Even if someone intercepts a JWT, they can't create new ones without the .pem file
This is the same algorithm used by Google Cloud, AWS Cognito, and Auth0

Code walkthrough:

def _generate_jwt() -> str:
    now = int(time.time())

    # Read the RSA private key from our .pem file
    project_root = Path(__file__).resolve().parent.parent.parent
    private_key_path = project_root / settings.github_app_private_key_path
    private_key = private_key_path.read_text()

    payload = {
        "iat": now - 60,        # Issued 60s ago (clock drift tolerance)
        "exp": now + (9 * 60),  # Expires in 9 minutes (GitHub max: 10min)
        "iss": settings.github_app_id,  # "I am app 3133457"
    }

    return jwt.encode(payload, private_key, algorithm="RS256")

Path resolution bug we hit and fixed:

Original: Path(settings.github_app_private_key_path) → resolved relative to CWD
Problem: When uvicorn runs, CWD might not be the project root
Fix: Path(__file__).resolve().parent.parent.parent / settings.github_app_private_key_path
This resolves relative to auth.py's location → up to app/github/ → app/ → project root

Step 2 — Installation Access Token:

async def get_installation_token(installation_id: int) -> str:
    # Check in-memory cache first
    cached = _token_cache.get(installation_id)
    if cached and cached["expires_at"] > time.time() + 60:
        return cached["token"]

    # Generate JWT and exchange for installation token
    app_jwt = _generate_jwt()
    response = await httpx.AsyncClient().post(
        f"https://api.github.com/app/installations/{installation_id}/access_tokens",
        headers={"Authorization": f"Bearer {app_jwt}"},
    )

    # Cache the token (valid for ~1 hour)
    _token_cache[installation_id] = {
        "token": response.json()["token"],
        "expires_at": time.time() + 3500,
    }

Why cache the token? Installation tokens last 1 hour. Without caching, we'd generate a new JWT and make a token exchange API call for every single GitHub API request. Caching reduces latency and API calls from ~10 per PR review to ~1 per hour.

Interview talking point: "GitHub Apps use a two-step auth flow — JWT for app identity, installation tokens for repo-scoped access. We cache installation tokens in memory with TTL-based expiry to avoid redundant token exchanges. This is the same client credentials pattern used in OAuth2."

Step 3: GitHub API Client (app/github/client.py)

What: An async HTTP client that fetches PR data and posts review comments.

The problem it solves: We need to:

Get the PR diff (what changed)
Get full file contents (for context — the diff alone isn't enough)
Post inline review comments (anchored to specific file+line)
Post a summary comment (health score, findings overview)

Key design decisions:

Why a class instead of standalone functions?

class GitHubClient:
    def __init__(self, installation_id: int):
        self.installation_id = installation_id
        self._token = None  # Lazily fetched on first API call

The installation_id and token are shared across all API calls for one webhook event. A class groups related operations with shared state. It's also easy to mock in tests.

Fetching the diff — two formats

# JSON format (structured data about each file)
GET /repos/{owner}/{repo}/pulls/{pr_number}/files
→ [{filename: "app.py", status: "modified", additions: 5, patch: "..."}, ...]

# Raw diff format (the unified diff, same as `git diff`)
GET /repos/{owner}/{repo}/pulls/{pr_number}
Accept: application/vnd.github.diff
→ "diff --git a/app.py b/app.py\n--- a/app.py\n+++ b/app.py\n@@ -1,3 +1,8 @@..."

We fetch BOTH. The raw diff is sent to agents for analysis. The structured file list tells us which files to fetch full contents for.

Why we fetch full file contents (not just the diff)

Consider this diff:

+ result = db.query(f"SELECT * FROM users WHERE id = {user_id}")

Questions an agent needs to answer:

Is user_id sanitized upstream? → Need to see the function signature
Is db.query() a safe ORM method or raw SQL? → Need to see the import
Is this in a public-facing endpoint? → Need to see the route decorator

Without full file: Agent sees one line, guesses wildly, produces false positives. With full file: Agent sees imports, class context, function scope — makes informed judgments.

# How we fetch file contents
response = await http.get(
    f"{GITHUB_API}/repos/{repo}/contents/{filepath}",
    params={"ref": commit_sha},  # At the exact commit, not HEAD
)
# GitHub returns content as base64 (because JSON can't hold binary)
content_b64 = response.json()["content"]
source_code = base64.b64decode(content_b64).decode("utf-8")

Posting reviews — two types of comments

PR #1 conversation:
┌─────────────────────────────────────────────┐
│ 📋 Summary Comment (post_comment)           │  ← Top-level, in the conversation
│ Health Score: 65/100, 1 critical finding     │
└─────────────────────────────────────────────┘

Files changed tab:
┌─────────────────────────────────────────────┐
│ app.py                                      │
│ ...                                         │
│ + result = db.query(f"SELECT * FROM...")     │
│   🚨 [CRITICAL] SQL Injection Risk          │  ← Inline, anchored to this line
│   User input directly embedded...            │     (post_review with comments)
│ ...                                         │
└─────────────────────────────────────────────┘

# Summary comment — uses Issues API (PRs are issues in GitHub's data model)
await http.post(
    f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments",
    json={"body": "## Health Score: 65/100\n..."},
)

# Inline review — uses Pull Request Reviews API
await http.post(
    f"{GITHUB_API}/repos/{repo}/pulls/{pr_number}/reviews",
    json={
        "commit_id": commit_sha,
        "body": "Summary text",
        "event": "COMMENT",  # Don't approve/block — just comment
        "comments": [
            {"path": "app.py", "line": 5, "body": "🚨 SQL Injection..."},
        ],
    },
)

Interview talking point: "We fetch full file contents via GitHub's Contents API, not just diffs, because our agents need surrounding context — imports, class definitions, function signatures — to make accurate assessments. This is the same approach used by Sourcery and CodeRabbit, but we go further by embedding this context into a vector store for semantic retrieval."

Step 4: Comment Formatter (app/github/comment_formatter.py)

What: Converts our internal Finding objects into GitHub-flavored Markdown.

Two output formats:

Inline comment (per finding):

🚨 **[CRITICAL — Security] SQL Injection Risk**

The query on line 5 constructs SQL via string interpolation.
User input is directly embedded without sanitization.

**Suggested fix:**
```python
cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))

🔒 Security · CWE-89 · Confidence: 0.92


#### Summary comment (per PR):
```markdown
## ✅ Ninja Code Guard Review — Health Score: 85/100

`████████████████░░░░` **85**/100 — Healthy

### Findings Summary
| Severity | Count |
|----------|-------|
| 🚨 Critical | 0 |
| 🟠 High | 1 |
| 🟡 Medium | 2 |
| ℹ️ Low | 0 |

✅ **Recommendation: Approve** — No critical issues found.

Design decisions:

Emoji prefixes for quick scanning (devs skim reviews)
CWE IDs are hyperlinked (so devs can learn about vulnerabilities)
Suggested fixes use fenced code blocks (easy copy-paste)
Health bar uses Unicode block characters (works everywhere, no images needed)

Step 5: Redis Cache (app/db/redis_cache.py)

What: Prevents re-analyzing the same PR commit that we've already reviewed.

The problem it solves: When a developer pushes multiple commits quickly, or force-pushes, GitHub sends a webhook for each push. Without caching, we'd burn Groq API quota re-analyzing the same code and spam the PR with duplicate comments.

How it works:

Webhook received with commit SHA "0c8ec514"
    │
    ├─ Check Redis: EXISTS ninjacg:reviewed:0c8ec514
    │   │
    │   ├─ Key exists → return "already reviewed" (skip)
    │   │
    │   └─ Key missing → proceed with analysis
    │                      │
    │                      ▼
    │                   Run agents...
    │                   Post comments...
    │                      │
    │                      ▼
    └─ Set Redis: SET ninjacg:reviewed:0c8ec514 "1" EX 604800
                                                      ^^^^^^
                                                   7 days TTL

Key design decision — "Fail Open" pattern:

async def is_already_reviewed(commit_sha: str) -> bool:
    try:
        client = _get_redis_client()
        result = await client.exists(_cache_key(commit_sha))
        return bool(result)
    except Exception:
        # If Redis is DOWN, return False → proceed with analysis
        return False  # ← This is "fail open"

Fail open vs. fail closed:

Fail open: If the check fails, allow the operation (may duplicate)
Fail closed: If the check fails, block the operation (may miss reviews)

For a code review tool, missing a review is worse than reviewing twice, so we fail open. This is the same pattern used by rate limiters and circuit breakers in production systems.

Why Upstash Redis instead of in-memory cache?

Render's free tier restarts the server frequently (cold starts every 15 min)
In-memory dict would be wiped on every restart
Redis persists across restarts
If we ever scale to multiple workers, they share the same cache

Interview talking point: "Our cache uses a fail-open pattern — if Redis is unavailable, we proceed with analysis rather than blocking. This prioritizes availability over exact-once semantics, which is correct for a non-critical review tool. The TTL-based expiry ensures stale entries are automatically cleaned without manual maintenance."

Step 6: Webhook Endpoint (app/main.py)

What: The FastAPI endpoint that receives GitHub webhooks and orchestrates the response.

The full request lifecycle:

@app.post("/webhook/github")
async def webhook_github(
    request: Request,
    background_tasks: BackgroundTasks,
    x_github_event: str = Header(..., alias="X-GitHub-Event"),
    body: bytes = Depends(validate_webhook_signature),  # ← Runs FIRST
):

Step-by-step:

HMAC validation (via Depends): If signature is invalid → 401, endpoint never runs
Parse payload: json.loads(body) — we know the body is authentic now
Filter events: Only process pull_request events with actions: opened, synchronize, reopened, ready_for_review
Skip drafts: Draft PRs aren't ready for review
Check cache: await is_already_reviewed(commit_sha) — skip if already done
Get installation ID: Extracted from the webhook payload — needed for auth
Enqueue background task: background_tasks.add_task(_process_pr_review, ...)
Return 200 immediately: GitHub gets a fast response, processing continues in background

Why background tasks?

GitHub has a 10-second webhook timeout. If we don't respond in time:

GitHub marks the delivery as failed
GitHub retries (up to 3 times at increasing intervals)
We'd get duplicate reviews

Our actual review pipeline takes 15-20 seconds (agent calls + synthesis). So we:

Return 200 immediately (~50ms)
Process the review in FastAPI's background task queue
GitHub is happy, we have unlimited time to process

# This returns 200 to GitHub immediately
background_tasks.add_task(
    _process_pr_review,
    repo_full_name=repo_full_name,
    pr_number=pr_number,
    commit_sha=commit_sha,
    installation_id=installation_id,
)
return {"status": "accepted", "pr": pr_number}
# ↑ GitHub gets this response in ~50ms
# ↓ Meanwhile, _process_pr_review runs in the background

The background task (_process_pr_review):

async def _process_pr_review(...):
    client = GitHubClient(installation_id)
    pr_data = await client.fetch_pr_data(repo_full_name, pr_number)

    # TODO (Week 3-7): Run agents here
    # For now: post a dummy comment proving the pipeline works

    await client.post_comment(repo_full_name, pr_number, summary)
    await mark_as_reviewed(commit_sha)

Interview talking point: "We use FastAPI's background tasks to acknowledge webhooks within GitHub's 10-second timeout, then process asynchronously. The webhook handler follows a filter-then-dispatch pattern — irrelevant events are filtered early (wrong event type, draft PR, already cached), and only valid PR events trigger the expensive analysis pipeline."

Step 7: Unit Tests

What: 20 tests covering all critical paths.

Test Suite: Webhook Validation (5 tests)

test_valid_signature_accepted     — Correctly signed payload → 200 ✅
test_invalid_signature_rejected   — Wrong secret → 401 ✅
test_tampered_payload_rejected    — Valid sig for different payload → 401 ✅
test_missing_signature_rejected   — No header → 422 ✅
test_malformed_signature_rejected — No "sha256=" prefix → 401 ✅

How the tests work:

# We create a minimal FastAPI app just for testing
test_app = FastAPI()
TEST_SECRET = "test_webhook_secret_for_unit_tests"

@test_app.post("/webhook-endpoint")
async def webhook_endpoint(body: bytes = Depends(validate_webhook_signature)):
    return {"status": "ok"}

# monkeypatch overrides the real secret with our test secret
@pytest.fixture
def client(monkeypatch):
    monkeypatch.setattr(
        "app.github.webhook.settings.github_webhook_secret",
        TEST_SECRET,
    )
    return TestClient(test_app)

# Then we compute the expected signature ourselves
def _compute_signature(payload: bytes, secret: str) -> str:
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={sig}"

Key testing pattern: monkeypatch temporarily overrides the real webhook secret so tests are deterministic and don't depend on .env values. This is standard practice — tests should never use real credentials.

Test Suite: Redis Cache (7 tests)

test_returns_false_for_new_commit        — New SHA → not reviewed ✅
test_returns_true_for_cached_commit      — Cached SHA → already reviewed ✅
test_redis_failure_returns_false          — Redis down → fail open (False) ✅
test_sets_key_with_ttl                   — SET with 7-day expiry ✅
test_redis_failure_does_not_raise        — Redis SET fails → no crash ✅
test_deletes_key                         — Cache invalidation works ✅
test_redis_failure_does_not_raise (del)  — Redis DELETE fails → no crash ✅

How the tests work:

@pytest.fixture
def mock_redis():
    mock = AsyncMock()  # Python's built-in mock for async functions
    with patch("app.db.redis_cache._get_redis_client", return_value=mock):
        yield mock

# Example: testing fail-open behavior
async def test_redis_failure_returns_false(mock_redis):
    mock_redis.exists.side_effect = ConnectionError("Redis unavailable")
    result = await is_already_reviewed("abc123")
    assert result is False  # Fail open — proceed with analysis

Key testing pattern: AsyncMock simulates Redis responses without a real Redis connection. Tests run in milliseconds, offline, and are deterministic.

Test Suite: Schema Validation (8 tests)

test_valid_finding                           — Valid data → accepted ✅
test_finding_rejects_invalid_agent          — "invalid" agent → ValidationError ✅
test_finding_rejects_invalid_severity       — "urgent" severity → ValidationError ✅
test_finding_confidence_bounds              — 1.5 and -0.1 → ValidationError ✅
test_finding_optional_cwe_id               — None cwe_id → accepted ✅
test_valid_review                           — Valid review → accepted ✅
test_review_health_score_bounds            — 101 and -1 → ValidationError ✅
test_review_rejects_invalid_recommendation — "maybe" → ValidationError ✅

Step 8: End-to-End Test with ngrok

What: Tested the full pipeline live — from GitHub PR to bot comment.

Setup:

Started FastAPI server: uvicorn app.main:app --reload --port 8000
Started ngrok tunnel: ngrok http 8000 → got public URL
Updated GitHub App webhook URL to the ngrok URL
Created test repo: github.com/ninjacode911/codeguard-test
Installed "Ninja Code Guard" app on the test repo
Created a PR with a SQL injection vulnerability in app.py

Test code in the PR (intentionally vulnerable):

import sqlite3

def get_user(user_id):
    conn = sqlite3.connect("users.db")
    query = f"SELECT * FROM users WHERE id = {user_id}"  # SQL injection!
    return conn.execute(query).fetchone()

def delete_user(name):
    conn = sqlite3.connect("users.db")
    conn.execute(f"DELETE FROM users WHERE name = '{name}'")  # SQL injection!

What happened (from server logs):

22:01:19  Webhook received — review enqueued  (action=opened, pr=1, sha=0c8ec514)
22:01:19  Starting PR review                   (HMAC validated ✅)
22:01:23  Fetched PR data                      (1 changed file, 1 file with content)
22:01:24  Posted PR comment                    (Bot comment appeared on PR)
22:01:24  Cached review result                 (TTL 7 days in Upstash Redis)
22:01:24  PR review completed                  (Total: ~5 seconds)

Bugs encountered and fixed:

Bug	Cause	Fix
`TypeError: meth() got multiple values for argument 'event'`	structlog reserves `event` as a keyword	Changed `event=x_github_event` to `github_event=x_github_event`
`FileNotFoundError: 'keys\\app.pem'`	.pem filename didn't match .env path	Updated .env to use actual filename: `ninja-s-code-guard.2026-03-19.private-key.pem`
Same .pem error after .env fix	`Path("./keys/app.pem")` resolves relative to CWD, not project root	Changed to `Path(__file__).resolve().parent.parent.parent / path`

Result: Bot comment posted successfully to PR #1 at github.com/ninjacode911/codeguard-test/pull/1

Files Created/Modified in Week 2

File	Type	Purpose
`app/github/webhook.py`	New	HMAC-SHA256 webhook signature validation
`app/github/auth.py`	New	GitHub App JWT + installation token authentication
`app/github/client.py`	New	GitHub REST API client (fetch PR data, post comments)
`app/github/comment_formatter.py`	New	Finding → GitHub Markdown conversion
`app/db/redis_cache.py`	New	Commit SHA deduplication cache (Upstash Redis)
`app/main.py`	Modified	Added webhook endpoint + background task processing
`requirements.txt`	Modified	Added PyJWT[crypto] dependency
`tests/unit/test_webhook_validation.py`	New	5 tests for HMAC validation
`tests/unit/test_redis_cache.py`	New	7 tests for cache logic
`docs/WEEK2_GITHUB_INTEGRATION.md`	New	This documentation file

Dependencies Added

Package	Version	Purpose
`PyJWT[crypto]`	>=2.9.0	JWT generation with RS256 (includes cryptography backend)
`httpx`	>=0.28.0	Async HTTP client for GitHub API calls
`redis`	>=5.2.0	Async Redis client for Upstash
`structlog`	>=24.4.0	Structured logging (JSON-formatted, key-value pairs)

Architecture Patterns Used (Interview Reference)

Pattern	Where Used	What It Means
HMAC authentication	webhook.py	Symmetric key message authentication
Asymmetric JWT auth	auth.py	RSA private key signing, public key verification
Token caching	auth.py	In-memory cache with TTL for installation tokens
Dependency injection	main.py	FastAPI Depends() for webhook validation
Background tasks	main.py	Async processing after immediate HTTP response
Fail-open pattern	redis_cache.py	If cache check fails, proceed (don't block)
Separation of concerns	All files	Each module has a single responsibility

What's Next (Week 3)

The dummy comment will be replaced with the real Security Agent output. The agent will use Semgrep, Bandit, and Groq's Llama-3.1-70B to find the SQL injection vulnerabilities in our test PR's app.py.

Documentation written 2026-03-19 as part of Week 2 completion.