Spaces:
Sleeping
Sleeping
| # Week 2: GitHub Integration β Detailed Documentation | |
| > **Goal:** Receive GitHub webhooks, validate signatures, fetch PR data, post comments. | |
| > **Status:** Complete β End-to-end tested with live PR | |
| > **Date:** 2026-03-19 | |
| > **Test PR:** github.com/ninjacode911/codeguard-test/pull/1 | |
| --- | |
| ## What We Built | |
| This week we built the **communication layer** between Ninja Code Guard and GitHub β | |
| the nervous system that listens for events, authenticates, fetches data, and responds. | |
| **End-to-end flow achieved:** | |
| ``` | |
| PR opened on GitHub (21:54:52) | |
| β Webhook POST to our ngrok tunnel | |
| β HMAC-SHA256 signature validated | |
| β Redis cache checked (not previously reviewed) | |
| β Background task enqueued, 200 returned to GitHub | |
| β JWT signed with .pem, installation token obtained | |
| β PR diff + file contents fetched via GitHub API | |
| β Bot comment posted to PR #1 | |
| β Commit SHA cached in Upstash Redis (7-day TTL) | |
| β Total time: ~5 seconds | |
| ``` | |
| --- | |
| ## Step-by-Step Implementation Log | |
| ### Step 1: Webhook Signature Validation (app/github/webhook.py) | |
| **What:** A FastAPI dependency that validates the HMAC-SHA256 signature on every | |
| incoming webhook request. | |
| **The problem it solves:** Our `/webhook/github` endpoint is publicly accessible. Without | |
| validation, anyone could send fake webhook payloads to trigger bogus reviews, waste our | |
| Groq API quota, or spam PRs with fake comments. | |
| **How HMAC-SHA256 works:** | |
| ``` | |
| Shared Secret | |
| (GITHUB_WEBHOOK_SECRET) | |
| β | |
| ββββββββββββββββΌβββββββββββββββ | |
| β β β | |
| GitHub's side β Our server's side | |
| β β β | |
| request body βββ HMAC-SHA256 HMAC-SHA256 βββ request body | |
| β β β | |
| βΌ β βΌ | |
| computed hash β computed hash | |
| β β β | |
| sent as header βββββββΌβββββββ compared with | |
| X-Hub-Signature-256 β received header | |
| β | |
| Must match! | |
| ``` | |
| **Key implementation details:** | |
| 1. **Raw bytes, not parsed JSON:** We compute the HMAC on the raw request bytes, not | |
| parsed JSON. Even a single whitespace difference would produce a completely different | |
| hash. This is why we use `await request.body()` before any JSON parsing. | |
| 2. **Constant-time comparison:** We use `hmac.compare_digest()` instead of `==`. | |
| A regular `==` short-circuits on the first different byte β an attacker could measure | |
| response time for different guesses and reconstruct the signature byte by byte. | |
| `compare_digest()` always takes the same time regardless of where the mismatch is. | |
| This is called a **timing attack** and is a real-world vulnerability (CVE-2013-0338, etc.). | |
| 3. **FastAPI dependency injection:** The validation is implemented as a `Depends()` function. | |
| FastAPI calls it automatically before the endpoint handler runs. If validation fails, | |
| the endpoint never executes. This ensures we can't accidentally forget to validate. | |
| ```python | |
| # How it's used in the endpoint β validation happens automatically via Depends() | |
| @app.post("/webhook/github") | |
| async def webhook_github( | |
| body: bytes = Depends(validate_webhook_signature), # β runs first | |
| ): | |
| payload = json.loads(body) # Only reached if signature is valid | |
| ``` | |
| **Signature format from GitHub:** | |
| ``` | |
| X-Hub-Signature-256: sha256=5d7230d4d964e5c12a7e4e94c... | |
| ^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| prefix hex-encoded HMAC digest | |
| ``` | |
| **Interview talking point:** "We validate webhook authenticity using HMAC-SHA256 with | |
| constant-time comparison to prevent timing attacks. The validation is implemented as a | |
| FastAPI dependency so it's impossible to skip β the endpoint function only executes | |
| after successful validation." | |
| --- | |
| ### Step 2: GitHub App JWT Authentication (app/github/auth.py) | |
| **What:** Two-step authentication flow β sign a JWT, exchange it for a scoped token. | |
| **The problem it solves:** We need to call GitHub's API (fetch PR data, post comments) | |
| on behalf of our installed app. GitHub needs to verify that API calls are coming from | |
| the registered "Ninja Code Guard" app, not from an impersonator. | |
| **Step 1 β JWT Generation:** | |
| A JWT (JSON Web Token) is a signed token with three parts: | |
| ``` | |
| eyJhbGciOiJSUzI1NiJ9.eyJpYXQiOjE3MTEuLi4sImV4cCI6MTcxMS4uLiwiX.SflKxwRJ... | |
| βββββββββ Header ββββββββββββββββββββ Payload βββββββββββββββ Signature βββ | |
| Header: {"alg": "RS256", "typ": "JWT"} | |
| Payload: {"iat": <issued_at>, "exp": <expires_at>, "iss": "3133457"} | |
| Signature: RSA-SHA256(header + "." + payload, private_key) | |
| ``` | |
| **Why RS256 (RSA + SHA-256)?** | |
| - This is **asymmetric** cryptography: we sign with our private key (.pem), GitHub | |
| verifies with the matching public key (stored when we registered the app) | |
| - Even if someone intercepts a JWT, they can't create new ones without the .pem file | |
| - This is the same algorithm used by Google Cloud, AWS Cognito, and Auth0 | |
| **Code walkthrough:** | |
| ```python | |
| def _generate_jwt() -> str: | |
| now = int(time.time()) | |
| # Read the RSA private key from our .pem file | |
| project_root = Path(__file__).resolve().parent.parent.parent | |
| private_key_path = project_root / settings.github_app_private_key_path | |
| private_key = private_key_path.read_text() | |
| payload = { | |
| "iat": now - 60, # Issued 60s ago (clock drift tolerance) | |
| "exp": now + (9 * 60), # Expires in 9 minutes (GitHub max: 10min) | |
| "iss": settings.github_app_id, # "I am app 3133457" | |
| } | |
| return jwt.encode(payload, private_key, algorithm="RS256") | |
| ``` | |
| **Path resolution bug we hit and fixed:** | |
| - Original: `Path(settings.github_app_private_key_path)` β resolved relative to CWD | |
| - Problem: When uvicorn runs, CWD might not be the project root | |
| - Fix: `Path(__file__).resolve().parent.parent.parent / settings.github_app_private_key_path` | |
| - This resolves relative to `auth.py`'s location β up to `app/github/` β `app/` β project root | |
| **Step 2 β Installation Access Token:** | |
| ```python | |
| async def get_installation_token(installation_id: int) -> str: | |
| # Check in-memory cache first | |
| cached = _token_cache.get(installation_id) | |
| if cached and cached["expires_at"] > time.time() + 60: | |
| return cached["token"] | |
| # Generate JWT and exchange for installation token | |
| app_jwt = _generate_jwt() | |
| response = await httpx.AsyncClient().post( | |
| f"https://api.github.com/app/installations/{installation_id}/access_tokens", | |
| headers={"Authorization": f"Bearer {app_jwt}"}, | |
| ) | |
| # Cache the token (valid for ~1 hour) | |
| _token_cache[installation_id] = { | |
| "token": response.json()["token"], | |
| "expires_at": time.time() + 3500, | |
| } | |
| ``` | |
| **Why cache the token?** Installation tokens last 1 hour. Without caching, we'd generate | |
| a new JWT and make a token exchange API call for every single GitHub API request. Caching | |
| reduces latency and API calls from ~10 per PR review to ~1 per hour. | |
| **Interview talking point:** "GitHub Apps use a two-step auth flow β JWT for app identity, | |
| installation tokens for repo-scoped access. We cache installation tokens in memory with | |
| TTL-based expiry to avoid redundant token exchanges. This is the same client credentials | |
| pattern used in OAuth2." | |
| --- | |
| ### Step 3: GitHub API Client (app/github/client.py) | |
| **What:** An async HTTP client that fetches PR data and posts review comments. | |
| **The problem it solves:** We need to: | |
| 1. Get the PR diff (what changed) | |
| 2. Get full file contents (for context β the diff alone isn't enough) | |
| 3. Post inline review comments (anchored to specific file+line) | |
| 4. Post a summary comment (health score, findings overview) | |
| **Key design decisions:** | |
| #### Why a class instead of standalone functions? | |
| ```python | |
| class GitHubClient: | |
| def __init__(self, installation_id: int): | |
| self.installation_id = installation_id | |
| self._token = None # Lazily fetched on first API call | |
| ``` | |
| The installation_id and token are shared across all API calls for one webhook event. | |
| A class groups related operations with shared state. It's also easy to mock in tests. | |
| #### Fetching the diff β two formats | |
| ```python | |
| # JSON format (structured data about each file) | |
| GET /repos/{owner}/{repo}/pulls/{pr_number}/files | |
| β [{filename: "app.py", status: "modified", additions: 5, patch: "..."}, ...] | |
| # Raw diff format (the unified diff, same as `git diff`) | |
| GET /repos/{owner}/{repo}/pulls/{pr_number} | |
| Accept: application/vnd.github.diff | |
| β "diff --git a/app.py b/app.py\n--- a/app.py\n+++ b/app.py\n@@ -1,3 +1,8 @@..." | |
| ``` | |
| We fetch BOTH. The raw diff is sent to agents for analysis. The structured file list | |
| tells us which files to fetch full contents for. | |
| #### Why we fetch full file contents (not just the diff) | |
| Consider this diff: | |
| ```diff | |
| + result = db.query(f"SELECT * FROM users WHERE id = {user_id}") | |
| ``` | |
| Questions an agent needs to answer: | |
| - Is `user_id` sanitized upstream? β Need to see the function signature | |
| - Is `db.query()` a safe ORM method or raw SQL? β Need to see the import | |
| - Is this in a public-facing endpoint? β Need to see the route decorator | |
| **Without full file:** Agent sees one line, guesses wildly, produces false positives. | |
| **With full file:** Agent sees imports, class context, function scope β makes informed judgments. | |
| ```python | |
| # How we fetch file contents | |
| response = await http.get( | |
| f"{GITHUB_API}/repos/{repo}/contents/{filepath}", | |
| params={"ref": commit_sha}, # At the exact commit, not HEAD | |
| ) | |
| # GitHub returns content as base64 (because JSON can't hold binary) | |
| content_b64 = response.json()["content"] | |
| source_code = base64.b64decode(content_b64).decode("utf-8") | |
| ``` | |
| #### Posting reviews β two types of comments | |
| ``` | |
| PR #1 conversation: | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| β π Summary Comment (post_comment) β β Top-level, in the conversation | |
| β Health Score: 65/100, 1 critical finding β | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| Files changed tab: | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| β app.py β | |
| β ... β | |
| β + result = db.query(f"SELECT * FROM...") β | |
| β π¨ [CRITICAL] SQL Injection Risk β β Inline, anchored to this line | |
| β User input directly embedded... β (post_review with comments) | |
| β ... β | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| ```python | |
| # Summary comment β uses Issues API (PRs are issues in GitHub's data model) | |
| await http.post( | |
| f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments", | |
| json={"body": "## Health Score: 65/100\n..."}, | |
| ) | |
| # Inline review β uses Pull Request Reviews API | |
| await http.post( | |
| f"{GITHUB_API}/repos/{repo}/pulls/{pr_number}/reviews", | |
| json={ | |
| "commit_id": commit_sha, | |
| "body": "Summary text", | |
| "event": "COMMENT", # Don't approve/block β just comment | |
| "comments": [ | |
| {"path": "app.py", "line": 5, "body": "π¨ SQL Injection..."}, | |
| ], | |
| }, | |
| ) | |
| ``` | |
| **Interview talking point:** "We fetch full file contents via GitHub's Contents API, not just | |
| diffs, because our agents need surrounding context β imports, class definitions, function | |
| signatures β to make accurate assessments. This is the same approach used by Sourcery and | |
| CodeRabbit, but we go further by embedding this context into a vector store for semantic retrieval." | |
| --- | |
| ### Step 4: Comment Formatter (app/github/comment_formatter.py) | |
| **What:** Converts our internal `Finding` objects into GitHub-flavored Markdown. | |
| **Two output formats:** | |
| #### Inline comment (per finding): | |
| ```markdown | |
| π¨ **[CRITICAL β Security] SQL Injection Risk** | |
| The query on line 5 constructs SQL via string interpolation. | |
| User input is directly embedded without sanitization. | |
| **Suggested fix:** | |
| ```python | |
| cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,)) | |
| ``` | |
| > π Security Β· [CWE-89](https://cwe.mitre.org/data/definitions/89.html) Β· Confidence: 0.92 | |
| ``` | |
| #### Summary comment (per PR): | |
| ```markdown | |
| ## β Ninja Code Guard Review β Health Score: 85/100 | |
| `ββββββββββββββββββββ` **85**/100 β Healthy | |
| ### Findings Summary | |
| | Severity | Count | | |
| |----------|-------| | |
| | π¨ Critical | 0 | | |
| | π High | 1 | | |
| | π‘ Medium | 2 | | |
| | βΉοΈ Low | 0 | | |
| β **Recommendation: Approve** β No critical issues found. | |
| ``` | |
| **Design decisions:** | |
| - Emoji prefixes for quick scanning (devs skim reviews) | |
| - CWE IDs are hyperlinked (so devs can learn about vulnerabilities) | |
| - Suggested fixes use fenced code blocks (easy copy-paste) | |
| - Health bar uses Unicode block characters (works everywhere, no images needed) | |
| --- | |
| ### Step 5: Redis Cache (app/db/redis_cache.py) | |
| **What:** Prevents re-analyzing the same PR commit that we've already reviewed. | |
| **The problem it solves:** When a developer pushes multiple commits quickly, or force-pushes, | |
| GitHub sends a webhook for each push. Without caching, we'd burn Groq API quota | |
| re-analyzing the same code and spam the PR with duplicate comments. | |
| **How it works:** | |
| ``` | |
| Webhook received with commit SHA "0c8ec514" | |
| β | |
| ββ Check Redis: EXISTS ninjacg:reviewed:0c8ec514 | |
| β β | |
| β ββ Key exists β return "already reviewed" (skip) | |
| β β | |
| β ββ Key missing β proceed with analysis | |
| β β | |
| β βΌ | |
| β Run agents... | |
| β Post comments... | |
| β β | |
| β βΌ | |
| ββ Set Redis: SET ninjacg:reviewed:0c8ec514 "1" EX 604800 | |
| ^^^^^^ | |
| 7 days TTL | |
| ``` | |
| **Key design decision β "Fail Open" pattern:** | |
| ```python | |
| async def is_already_reviewed(commit_sha: str) -> bool: | |
| try: | |
| client = _get_redis_client() | |
| result = await client.exists(_cache_key(commit_sha)) | |
| return bool(result) | |
| except Exception: | |
| # If Redis is DOWN, return False β proceed with analysis | |
| return False # β This is "fail open" | |
| ``` | |
| **Fail open vs. fail closed:** | |
| - **Fail open:** If the check fails, allow the operation (may duplicate) | |
| - **Fail closed:** If the check fails, block the operation (may miss reviews) | |
| For a code review tool, **missing a review is worse than reviewing twice**, so we fail open. | |
| This is the same pattern used by rate limiters and circuit breakers in production systems. | |
| **Why Upstash Redis instead of in-memory cache?** | |
| - Render's free tier restarts the server frequently (cold starts every 15 min) | |
| - In-memory dict would be wiped on every restart | |
| - Redis persists across restarts | |
| - If we ever scale to multiple workers, they share the same cache | |
| **Interview talking point:** "Our cache uses a fail-open pattern β if Redis is unavailable, | |
| we proceed with analysis rather than blocking. This prioritizes availability over exact-once | |
| semantics, which is correct for a non-critical review tool. The TTL-based expiry ensures | |
| stale entries are automatically cleaned without manual maintenance." | |
| --- | |
| ### Step 6: Webhook Endpoint (app/main.py) | |
| **What:** The FastAPI endpoint that receives GitHub webhooks and orchestrates the response. | |
| **The full request lifecycle:** | |
| ```python | |
| @app.post("/webhook/github") | |
| async def webhook_github( | |
| request: Request, | |
| background_tasks: BackgroundTasks, | |
| x_github_event: str = Header(..., alias="X-GitHub-Event"), | |
| body: bytes = Depends(validate_webhook_signature), # β Runs FIRST | |
| ): | |
| ``` | |
| **Step-by-step:** | |
| 1. **HMAC validation** (via `Depends`): If signature is invalid β 401, endpoint never runs | |
| 2. **Parse payload**: `json.loads(body)` β we know the body is authentic now | |
| 3. **Filter events**: Only process `pull_request` events with actions: opened, synchronize, reopened, ready_for_review | |
| 4. **Skip drafts**: Draft PRs aren't ready for review | |
| 5. **Check cache**: `await is_already_reviewed(commit_sha)` β skip if already done | |
| 6. **Get installation ID**: Extracted from the webhook payload β needed for auth | |
| 7. **Enqueue background task**: `background_tasks.add_task(_process_pr_review, ...)` | |
| 8. **Return 200 immediately**: GitHub gets a fast response, processing continues in background | |
| **Why background tasks?** | |
| GitHub has a **10-second webhook timeout**. If we don't respond in time: | |
| - GitHub marks the delivery as failed | |
| - GitHub retries (up to 3 times at increasing intervals) | |
| - We'd get duplicate reviews | |
| Our actual review pipeline takes 15-20 seconds (agent calls + synthesis). So we: | |
| 1. Return 200 immediately (~50ms) | |
| 2. Process the review in FastAPI's background task queue | |
| 3. GitHub is happy, we have unlimited time to process | |
| ```python | |
| # This returns 200 to GitHub immediately | |
| background_tasks.add_task( | |
| _process_pr_review, | |
| repo_full_name=repo_full_name, | |
| pr_number=pr_number, | |
| commit_sha=commit_sha, | |
| installation_id=installation_id, | |
| ) | |
| return {"status": "accepted", "pr": pr_number} | |
| # β GitHub gets this response in ~50ms | |
| # β Meanwhile, _process_pr_review runs in the background | |
| ``` | |
| **The background task (_process_pr_review):** | |
| ```python | |
| async def _process_pr_review(...): | |
| client = GitHubClient(installation_id) | |
| pr_data = await client.fetch_pr_data(repo_full_name, pr_number) | |
| # TODO (Week 3-7): Run agents here | |
| # For now: post a dummy comment proving the pipeline works | |
| await client.post_comment(repo_full_name, pr_number, summary) | |
| await mark_as_reviewed(commit_sha) | |
| ``` | |
| **Interview talking point:** "We use FastAPI's background tasks to acknowledge webhooks within | |
| GitHub's 10-second timeout, then process asynchronously. The webhook handler follows a | |
| filter-then-dispatch pattern β irrelevant events are filtered early (wrong event type, draft PR, | |
| already cached), and only valid PR events trigger the expensive analysis pipeline." | |
| --- | |
| ### Step 7: Unit Tests | |
| **What:** 20 tests covering all critical paths. | |
| #### Test Suite: Webhook Validation (5 tests) | |
| ``` | |
| test_valid_signature_accepted β Correctly signed payload β 200 β | |
| test_invalid_signature_rejected β Wrong secret β 401 β | |
| test_tampered_payload_rejected β Valid sig for different payload β 401 β | |
| test_missing_signature_rejected β No header β 422 β | |
| test_malformed_signature_rejected β No "sha256=" prefix β 401 β | |
| ``` | |
| **How the tests work:** | |
| ```python | |
| # We create a minimal FastAPI app just for testing | |
| test_app = FastAPI() | |
| TEST_SECRET = "test_webhook_secret_for_unit_tests" | |
| @test_app.post("/webhook-endpoint") | |
| async def webhook_endpoint(body: bytes = Depends(validate_webhook_signature)): | |
| return {"status": "ok"} | |
| # monkeypatch overrides the real secret with our test secret | |
| @pytest.fixture | |
| def client(monkeypatch): | |
| monkeypatch.setattr( | |
| "app.github.webhook.settings.github_webhook_secret", | |
| TEST_SECRET, | |
| ) | |
| return TestClient(test_app) | |
| # Then we compute the expected signature ourselves | |
| def _compute_signature(payload: bytes, secret: str) -> str: | |
| sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest() | |
| return f"sha256={sig}" | |
| ``` | |
| **Key testing pattern:** `monkeypatch` temporarily overrides the real webhook secret | |
| so tests are deterministic and don't depend on `.env` values. This is standard | |
| practice β tests should never use real credentials. | |
| #### Test Suite: Redis Cache (7 tests) | |
| ``` | |
| test_returns_false_for_new_commit β New SHA β not reviewed β | |
| test_returns_true_for_cached_commit β Cached SHA β already reviewed β | |
| test_redis_failure_returns_false β Redis down β fail open (False) β | |
| test_sets_key_with_ttl β SET with 7-day expiry β | |
| test_redis_failure_does_not_raise β Redis SET fails β no crash β | |
| test_deletes_key β Cache invalidation works β | |
| test_redis_failure_does_not_raise (del) β Redis DELETE fails β no crash β | |
| ``` | |
| **How the tests work:** | |
| ```python | |
| @pytest.fixture | |
| def mock_redis(): | |
| mock = AsyncMock() # Python's built-in mock for async functions | |
| with patch("app.db.redis_cache._get_redis_client", return_value=mock): | |
| yield mock | |
| # Example: testing fail-open behavior | |
| async def test_redis_failure_returns_false(mock_redis): | |
| mock_redis.exists.side_effect = ConnectionError("Redis unavailable") | |
| result = await is_already_reviewed("abc123") | |
| assert result is False # Fail open β proceed with analysis | |
| ``` | |
| **Key testing pattern:** `AsyncMock` simulates Redis responses without a real Redis | |
| connection. Tests run in milliseconds, offline, and are deterministic. | |
| #### Test Suite: Schema Validation (8 tests) | |
| ``` | |
| test_valid_finding β Valid data β accepted β | |
| test_finding_rejects_invalid_agent β "invalid" agent β ValidationError β | |
| test_finding_rejects_invalid_severity β "urgent" severity β ValidationError β | |
| test_finding_confidence_bounds β 1.5 and -0.1 β ValidationError β | |
| test_finding_optional_cwe_id β None cwe_id β accepted β | |
| test_valid_review β Valid review β accepted β | |
| test_review_health_score_bounds β 101 and -1 β ValidationError β | |
| test_review_rejects_invalid_recommendation β "maybe" β ValidationError β | |
| ``` | |
| --- | |
| ### Step 8: End-to-End Test with ngrok | |
| **What:** Tested the full pipeline live β from GitHub PR to bot comment. | |
| **Setup:** | |
| 1. Started FastAPI server: `uvicorn app.main:app --reload --port 8000` | |
| 2. Started ngrok tunnel: `ngrok http 8000` β got public URL | |
| 3. Updated GitHub App webhook URL to the ngrok URL | |
| 4. Created test repo: github.com/ninjacode911/codeguard-test | |
| 5. Installed "Ninja Code Guard" app on the test repo | |
| 6. Created a PR with a SQL injection vulnerability in app.py | |
| **Test code in the PR (intentionally vulnerable):** | |
| ```python | |
| import sqlite3 | |
| def get_user(user_id): | |
| conn = sqlite3.connect("users.db") | |
| query = f"SELECT * FROM users WHERE id = {user_id}" # SQL injection! | |
| return conn.execute(query).fetchone() | |
| def delete_user(name): | |
| conn = sqlite3.connect("users.db") | |
| conn.execute(f"DELETE FROM users WHERE name = '{name}'") # SQL injection! | |
| ``` | |
| **What happened (from server logs):** | |
| ``` | |
| 22:01:19 Webhook received β review enqueued (action=opened, pr=1, sha=0c8ec514) | |
| 22:01:19 Starting PR review (HMAC validated β ) | |
| 22:01:23 Fetched PR data (1 changed file, 1 file with content) | |
| 22:01:24 Posted PR comment (Bot comment appeared on PR) | |
| 22:01:24 Cached review result (TTL 7 days in Upstash Redis) | |
| 22:01:24 PR review completed (Total: ~5 seconds) | |
| ``` | |
| **Bugs encountered and fixed:** | |
| | Bug | Cause | Fix | | |
| |-----|-------|-----| | |
| | `TypeError: meth() got multiple values for argument 'event'` | structlog reserves `event` as a keyword | Changed `event=x_github_event` to `github_event=x_github_event` | | |
| | `FileNotFoundError: 'keys\\app.pem'` | .pem filename didn't match .env path | Updated .env to use actual filename: `ninja-s-code-guard.2026-03-19.private-key.pem` | | |
| | Same .pem error after .env fix | `Path("./keys/app.pem")` resolves relative to CWD, not project root | Changed to `Path(__file__).resolve().parent.parent.parent / path` | | |
| **Result:** Bot comment posted successfully to PR #1 at github.com/ninjacode911/codeguard-test/pull/1 | |
| --- | |
| ## Files Created/Modified in Week 2 | |
| | File | Type | Purpose | | |
| |------|------|---------| | |
| | `app/github/webhook.py` | **New** | HMAC-SHA256 webhook signature validation | | |
| | `app/github/auth.py` | **New** | GitHub App JWT + installation token authentication | | |
| | `app/github/client.py` | **New** | GitHub REST API client (fetch PR data, post comments) | | |
| | `app/github/comment_formatter.py` | **New** | Finding β GitHub Markdown conversion | | |
| | `app/db/redis_cache.py` | **New** | Commit SHA deduplication cache (Upstash Redis) | | |
| | `app/main.py` | **Modified** | Added webhook endpoint + background task processing | | |
| | `requirements.txt` | **Modified** | Added PyJWT[crypto] dependency | | |
| | `tests/unit/test_webhook_validation.py` | **New** | 5 tests for HMAC validation | | |
| | `tests/unit/test_redis_cache.py` | **New** | 7 tests for cache logic | | |
| | `docs/WEEK2_GITHUB_INTEGRATION.md` | **New** | This documentation file | | |
| --- | |
| ## Dependencies Added | |
| | Package | Version | Purpose | | |
| |---------|---------|---------| | |
| | `PyJWT[crypto]` | >=2.9.0 | JWT generation with RS256 (includes cryptography backend) | | |
| | `httpx` | >=0.28.0 | Async HTTP client for GitHub API calls | | |
| | `redis` | >=5.2.0 | Async Redis client for Upstash | | |
| | `structlog` | >=24.4.0 | Structured logging (JSON-formatted, key-value pairs) | | |
| --- | |
| ## Architecture Patterns Used (Interview Reference) | |
| | Pattern | Where Used | What It Means | | |
| |---------|------------|---------------| | |
| | **HMAC authentication** | webhook.py | Symmetric key message authentication | | |
| | **Asymmetric JWT auth** | auth.py | RSA private key signing, public key verification | | |
| | **Token caching** | auth.py | In-memory cache with TTL for installation tokens | | |
| | **Dependency injection** | main.py | FastAPI Depends() for webhook validation | | |
| | **Background tasks** | main.py | Async processing after immediate HTTP response | | |
| | **Fail-open pattern** | redis_cache.py | If cache check fails, proceed (don't block) | | |
| | **Separation of concerns** | All files | Each module has a single responsibility | | |
| --- | |
| ## What's Next (Week 3) | |
| The dummy comment will be replaced with the real **Security Agent** output. | |
| The agent will use Semgrep, Bandit, and Groq's Llama-3.1-70B to find the SQL injection | |
| vulnerabilities in our test PR's `app.py`. | |
| --- | |
| *Documentation written 2026-03-19 as part of Week 2 completion.* | |