# CodeProbe: Complete Project Plan & Progress Tracker

> **Multi-Agent Code Review System**
> Author: Ninjacode911 | Started: March 2026 | Target: 10 Weeks

---

## Table of Contents

1. [Project Overview](#1-project-overview)
2. [Architecture Deep Dive](#2-architecture-deep-dive)
3. [Complete Tech Stack](#3-complete-tech-stack)
4. [Directory Structure](#4-directory-structure)
5. [Week-by-Week Implementation Plan](#5-week-by-week-implementation-plan)
6. [Non-Coding Tasks](#6-non-coding-tasks)
7. [GPU / WSL Tasks](#7-gpu--wsl-tasks)
8. [Data Models & Schemas](#8-data-models--schemas)
9. [API Endpoints](#9-api-endpoints)
10. [Agent Prompt Design](#10-agent-prompt-design)
11. [Evaluation Plan](#11-evaluation-plan)
12. [Deployment Checklist](#12-deployment-checklist)
13. [Progress Tracker](#13-progress-tracker)

---

## 1. Project Overview

**What:** A multi-agent PR review system that reviews GitHub pull requests using 4 specialized LangChain agents (Security, Performance, Style, Synthesizer), posts inline GitHub comments, and tracks code health via a Next.js dashboard.

**Why:** AI-generated code (41% of GitHub commits) introduces 1.7x more issues. Existing tools use single-pass LLM calls. Sentinel AI uses domain-specialized agents with debate/consensus, RAG context, and static analysis tools.

**Core Thesis:** Separate security, performance, and style review into specialized agents, each with distinct prompts, tools, and context, then merge their output via a Synthesizer into a coherent, ranked, deduplicated review.

**Key Differentiators:**
- Multi-agent specialization (3 domain + 1 synthesizer)
- Debate & consensus protocol (agents challenge each other before synthesis)
- Repo-aware RAG context (ChromaDB indexes full repo, not just diff)
- $0/month architecture (all free tiers)
- Structured severity scoring (Critical/High/Medium/Low with CWE IDs)
- Auto-fix suggestions (corrected code snippets inline)

---

## 2. Architecture Deep Dive

### 2.1 Four Layers

```
┌──────────────────────────────────────────────────────┐
│  GITHUB LAYER                                        │
│  Webhooks · PR Events · Inline Comments              │
└──────────────────────┬───────────────────────────────┘
                       │ pull_request webhook
┌──────────────────────▼───────────────────────────────┐
│  ORCHESTRATION LAYER (FastAPI on Render)             │
│  Webhook receiver · HMAC validation · Redis cache    │
│  Agent dispatcher · GitHub API client                │
└──────────────────────┬───────────────────────────────┘
                       │ asyncio.gather()
┌──────────────────────▼───────────────────────────────┐
│  AGENT LAYER (LangChain ReAct Agents)                │
│  ┌──────────┐ ┌──────────────┐ ┌─────────┐           │
│  │ Security │ │ Performance  │ │  Style  │ PARALLEL  │
│  │  Agent   │ │    Agent     │ │  Agent  │           │
│  └────┬─────┘ └──────┬───────┘ └────┬────┘           │
│       └──────────────┼──────────────┘                │
│                      ▼                               │
│            ┌──────────────────┐                      │
│            │  Synthesizer     │  SEQUENTIAL          │
│            │  Agent           │                      │
│            └──────────────────┘                      │
└──────────────────────┬───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│  KNOWLEDGE LAYER                                     │
│  ChromaDB (vector store) · Upstash Redis (cache)     │
│  Neon Postgres (history) · sentence-transformers     │
└──────────────────────────────────────────────────────┘
```

### 2.2 Data Flow (11 Steps)

1. GitHub fires `pull_request` webhook → Render FastAPI endpoint
2. FastAPI validates HMAC-SHA256 signature (GitHub App secret)
3. Check Upstash Redis: commit SHA already reviewed? → return cached
4. Fetch via GitHub API: PR diff, changed files, full contents, commit history
5. Build repo context: embed chunks with sentence-transformers → upsert ChromaDB
6. Dispatch 3 parallel agents: `asyncio.gather(security, performance, style)`
7. Each agent: system prompt + RAG context → Groq API → static tools → typed findings
8. Synthesizer: deduplicate + resolve conflicts + Health Score + executive summary
9. GitHub API: post inline comment per finding + PR summary comment
10. Write review to Neon Postgres + set Redis cache (TTL: 7 days)
11. Next.js dashboard fetches from Neon and updates Health Score chart
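
Step 6 above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's orchestrator: `run_agent` and `dispatch_agents` are hypothetical names, and the agent body is a placeholder for the real LLM and tool calls. It shows one way `asyncio.gather` might fan out the three domain agents while keeping a single failing agent from aborting the whole review.

```python
import asyncio


async def run_agent(name: str, diff: str) -> list[dict]:
    """Hypothetical stand-in for one agent's LLM + tool loop."""
    await asyncio.sleep(0)  # placeholder for the real network-bound call
    return [{"agent": name, "title": f"{name} finding for a {len(diff)}-char diff"}]


async def dispatch_agents(diff: str) -> dict[str, list[dict]]:
    """Run the three domain agents concurrently and group results by agent name."""
    names = ["security", "performance", "style"]
    results = await asyncio.gather(
        *(run_agent(name, diff) for name in names),
        return_exceptions=True,  # exceptions come back as values instead of propagating
    )
    findings: dict[str, list[dict]] = {}
    for name, result in zip(names, results):
        # A failed agent contributes no findings; real code would also log the error.
        findings[name] = [] if isinstance(result, BaseException) else result
    return findings


findings_by_agent = asyncio.run(dispatch_agents("+ print('hello')"))
```

The Synthesizer would then consume `findings_by_agent` sequentially, matching the PARALLEL/SEQUENTIAL split in the architecture diagram.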

### 2.3 Context Loading (5 Layers per Agent)

1. Raw PR diff (changed lines, file paths, additions/deletions)
2. Relevant file sections from full repo (ChromaDB semantic search on diff)
3. Recent commit history for changed files (pattern detection)
4. Repo configuration (language, framework, linter rules, test coverage)
5. Domain-specific knowledge base (OWASP Top 10, DDIA patterns, style guides)

---

## 3. Complete Tech Stack

### 3.1 LLM & AI

| Tool | Free Tier | Purpose |
|------|-----------|---------|
| **Groq API** (Llama-3.1-70B) | 14,400 req/day, 500 tok/sec | Primary LLM for all agents |
| **Gemini 1.5 Flash** | 1M tokens/day | Fallback when Groq exhausted |
| **LangChain** | OSS | Agent orchestration, LCEL, ReAct framework |
| **sentence-transformers** | Local (GPU) | Embeddings for ChromaDB; runs on RTX 5070 via WSL |

### 3.2 Backend & APIs

| Tool | Free Tier | Purpose |
|------|-----------|---------|
| **FastAPI** | OSS | Webhook receiver, agent dispatcher, REST API |
| **Render.com** | Free web service | Hosts backend (30s cold start after 15min idle) |
| **GitHub Apps API** | Free | Webhooks, PR comments, file fetching |
| **Upstash Redis** | 10K req/day | Cache PR analysis by commit SHA |
| **Neon.tech** | Free Postgres 512MB | Review history, Health Score trends |

### 3.3 Knowledge & Static Analysis

| Tool | Free Tier | Purpose |
|------|-----------|---------|
| **ChromaDB** | OSS, in-memory/persisted | Vector store for RAG context retrieval |
| **Semgrep OSS** | Free, 3K+ rules | SAST rules for Security Agent |
| **Bandit** | Free | Python AST security analysis |
| **detect-secrets** | Free | Credential/API key scanning |
| **radon** | Free | Cyclomatic complexity & maintainability index |
| **pylint/ESLint/Ruff** | Free | Linting for Style Agent |

### 3.4 Frontend & Deployment

| Tool | Free Tier | Purpose |
|------|-----------|---------|
| **Vercel** | Free hobby tier | Hosts Next.js dashboard |
| **Next.js** | OSS | Dashboard UI |
| **Recharts** | OSS | Health Score trend charts, pie charts |
| **GitHub Actions** | 2K min/month | CI/CD for Sentinel AI itself |

---

## 4. Directory Structure

```
sentinel-ai/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI app, webhook endpoint, lifespan
│   ├── config.py                  # Settings via pydantic-settings (env vars)
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── base_agent.py          # Shared agent interface / base class
│   │   ├── security_agent.py      # Security ReAct agent
│   │   ├── performance_agent.py   # Performance ReAct agent
│   │   ├── style_agent.py         # Style & Maintainability agent
│   │   └── synthesizer.py         # Synthesizer + Health Score + dedup
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── semgrep_tool.py        # LangChain tool wrapper for Semgrep
│   │   ├── bandit_tool.py         # LangChain tool wrapper for Bandit
│   │   ├── detect_secrets_tool.py # Credential scanner tool
│   │   ├── radon_tool.py          # Complexity metrics tool
│   │   ├── ast_analyzer.py        # Python AST analysis (N+1, patterns)
│   │   └── linter_tool.py         # Ruff/ESLint/pylint subprocess tool
│   ├── context/
│   │   ├── __init__.py
│   │   ├── embedder.py            # sentence-transformers embedding pipeline
│   │   ├── indexer.py             # ChromaDB repo indexer (upsert chunks)
│   │   └── retriever.py           # RAG retriever (query ChromaDB for context)
│   ├── github/
│   │   ├── __init__.py
│   │   ├── webhook.py             # Webhook validation (HMAC-SHA256)
│   │   ├── client.py              # GitHub API client (fetch diff, post comments)
│   │   └── comment_formatter.py   # Format findings as GitHub Markdown comments
│   ├── models/
│   │   ├── __init__.py
│   │   ├── findings.py            # Finding, PRReview Pydantic schemas
│   │   └── webhook_payloads.py    # GitHub webhook event schemas
│   ├── db/
│   │   ├── __init__.py
│   │   ├── postgres.py            # Neon Postgres connection + queries
│   │   └── redis_cache.py         # Upstash Redis cache logic
│   └── services/
│       ├── __init__.py
│       ├── orchestrator.py        # Main orchestration: dispatch agents, synthesize
│       └── health_score.py        # Health Score calculation formula
├── dashboard/                     # Next.js app (deployed to Vercel)
│   ├── package.json
│   ├── next.config.js
│   ├── tsconfig.json
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx               # / - Repository Overview
│   │   ├── repos/
│   │   │   └── [owner]/
│   │   │       └── [repo]/
│   │   │           ├── page.tsx   # Repo Detail (trends, charts)
│   │   │           └── prs/
│   │   │               └── [number]/
│   │   │                   └── page.tsx  # PR Review Detail
│   │   └── api/
│   │       ├── repos/
│   │       │   └── route.ts       # API proxy to FastAPI backend
│   │       └── health/
│   │           └── route.ts
│   ├── components/
│   │   ├── HealthScoreRing.tsx    # Circular gauge 0-100
│   │   ├── FindingsTable.tsx      # Sortable, filterable findings
│   │   ├── TrendChart.tsx         # Recharts LineChart
│   │   ├── AgentBreakdown.tsx     # 3-column agent summary cards
│   │   ├── SeverityBadge.tsx      # Color-coded severity pill
│   │   └── Navbar.tsx
│   └── lib/
│       ├── api.ts                 # Fetch wrapper for backend API
│       └── types.ts               # TypeScript types matching backend schemas
├── tests/
│   ├── __init__.py
│   ├── conftest.py                # Shared fixtures
│   ├── unit/
│   │   ├── test_findings_schema.py
│   │   ├── test_synthesizer_dedup.py
│   │   ├── test_webhook_validation.py
│   │   ├── test_redis_cache.py
│   │   └── test_health_score.py
│   ├── integration/
│   │   ├── test_full_pipeline.py
│   │   └── test_github_posting.py
│   └── eval/
│       ├── dataset/               # 20-PR benchmark dataset (JSON fixtures)
│       ├── run_eval.py            # Evaluation harness
│       └── metrics.py             # Precision, recall, latency tracking
├── prompts/
│   ├── security_system.md         # Security Agent system prompt
│   ├── performance_system.md      # Performance Agent system prompt
│   ├── style_system.md            # Style Agent system prompt
│   └── synthesizer_system.md      # Synthesizer system prompt
├── knowledge/
│   ├── owasp_top10_2025.md        # OWASP cheat sheet for Security RAG
│   ├── ddia_patterns.md           # DDIA patterns for Performance RAG
│   └── style_guides/              # Language style guides for Style RAG
├── .env.example                   # Template for env vars (no secrets)
├── .gitignore
├── requirements.txt               # Python dependencies
├── requirements-dev.txt           # Dev/test dependencies
├── render.yaml                    # Render deployment config
├── sentinel.yml.example           # Per-repo config template
├── Dockerfile                     # For Render deployment
├── pyproject.toml                 # Project metadata + tool configs
└── README.md                      # Installation, usage, architecture docs
```

---

## 5. Week-by-Week Implementation Plan

### WEEK 1: Foundation & Setup
**Goal:** Project skeleton running locally, all external services provisioned.

| # | Task | Type | Status |
|---|------|------|--------|
| 1.1 | Initialize git repo, create directory structure | Code | [ ] |
| 1.2 | Set up Python virtual environment + requirements.txt | Code | [ ] |
| 1.3 | Register GitHub App (github.com/settings/apps/new) | Config | [ ] |
| 1.4 | Provision Neon.tech Postgres database + create `pr_reviews` table | Config | [ ] |
| 1.5 | Provision Upstash Redis instance | Config | [ ] |
| 1.6 | Get Groq API key (console.groq.com) | Config | [ ] |
| 1.7 | Get Gemini API key (aistudio.google.com) | Config | [ ] |
| 1.8 | Create FastAPI skeleton (`app/main.py`) with health endpoint | Code | [ ] |
| 1.9 | Create `app/config.py` with pydantic-settings (all env vars) | Code | [ ] |
| 1.10 | Create Pydantic models (`Finding`, `PRReview` schemas) | Code | [ ] |
| 1.11 | Set up .env.example, .gitignore, pyproject.toml | Code | [ ] |
| 1.12 | Deploy FastAPI skeleton to Render (verify /health works) | Deploy | [ ] |
| 1.13 | Write unit tests for Finding schema validation | Test | [ ] |
| 1.14 | Set up GitHub Actions CI (lint + test on push) | CI/CD | [ ] |

### WEEK 2: GitHub Integration
**Goal:** Receive webhooks, validate signatures, fetch PR data, post dummy comment.

| # | Task | Type | Status |
|---|------|------|--------|
| 2.1 | Implement HMAC-SHA256 webhook validation (`app/github/webhook.py`) | Code | [ ] |
| 2.2 | Implement GitHub API client: fetch PR diff (`app/github/client.py`) | Code | [ ] |
| 2.3 | Implement GitHub API client: fetch file contents | Code | [ ] |
| 2.4 | Implement GitHub API client: fetch commit history | Code | [ ] |
| 2.5 | Implement GitHub API client: post inline review comments | Code | [ ] |
| 2.6 | Implement GitHub API client: post PR summary comment | Code | [ ] |
| 2.7 | Create webhook endpoint (`POST /webhook/github`) in main.py | Code | [ ] |
| 2.8 | Implement comment formatter (`app/github/comment_formatter.py`) | Code | [ ] |
| 2.9 | Set up ngrok for local webhook testing | Config | [ ] |
| 2.10 | End-to-end test: open PR on test repo → dummy comment posted | Test | [ ] |
| 2.11 | Implement Redis cache check (skip if commit SHA already reviewed) | Code | [ ] |
| 2.12 | Write unit tests for HMAC validation (valid + invalid signatures) | Test | [ ] |
| 2.13 | Write unit tests for Redis cache hit/miss logic | Test | [ ] |
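
The HMAC-SHA256 check in task 2.1 might look roughly like the sketch below. It assumes the signature arrives in GitHub's `X-Hub-Signature-256` header in the form `sha256=<hex digest>` and that the raw request body is available as bytes; the function name `is_valid_signature` and the sample secret are hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check an `X-Hub-Signature-256` header value against the raw request body."""
    if not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids leaking the digest via timing.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Example with a made-up secret and payload (both are illustrative only).
secret = "dev-secret"
body = b'{"action": "opened"}'
good_header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
```

The digest has to be computed over the exact bytes received, so the body should be read before any JSON parsing or re-serialization. The valid and invalid cases in task 2.12 map directly onto calls with `good_header`, a tampered body, and a malformed header.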

### WEEK 3: Security Agent v1
**Goal:** Security Agent analyzes diffs, returns structured findings with CWE IDs.

| # | Task | Type | Status |
|---|------|------|--------|
| 3.1 | Install & configure Semgrep OSS with security rulesets | Config | [ ] |
| 3.2 | Create Semgrep LangChain tool (`app/tools/semgrep_tool.py`) | Code | [ ] |
| 3.3 | Install & configure Bandit for Python AST security analysis | Config | [ ] |
| 3.4 | Create Bandit LangChain tool (`app/tools/bandit_tool.py`) | Code | [ ] |
| 3.5 | Install & configure detect-secrets | Config | [ ] |
| 3.6 | Create detect-secrets LangChain tool (`app/tools/detect_secrets_tool.py`) | Code | [ ] |
| 3.7 | Write Security Agent system prompt (`prompts/security_system.md`) | Prompt | [ ] |
| 3.8 | Prepare OWASP Top 10 (2025) knowledge base (`knowledge/owasp_top10_2025.md`) | Data | [ ] |
| 3.9 | Implement Security Agent ReAct loop (`app/agents/security_agent.py`) | Code | [ ] |
| 3.10 | Implement base agent interface (`app/agents/base_agent.py`) | Code | [ ] |
| 3.11 | Set up Groq LLM client via LangChain (`ChatGroq`) | Code | [ ] |
| 3.12 | Implement structured output parsing (JSON → Finding objects) | Code | [ ] |
| 3.13 | Create 10 synthetic security-vulnerable PRs for testing | Data | [ ] |
| 3.14 | Evaluate Security Agent on synthetic dataset: measure precision/recall | Eval | [ ] |
| 3.15 | Iterate on system prompt based on eval results | Prompt | [ ] |
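
For the precision/recall measurement in task 3.14, a minimal scoring sketch is shown below. It assumes each finding can be reduced to a comparable key; the `file:line:category` key format and the sample data are illustrative assumptions, not the project's evaluation schema.

```python
def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for predicted finding keys against labelled ones."""
    true_positives = len(predicted & expected)
    # Guard against empty sets so an agent that reports nothing scores 0.0, not an error.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical labels for one synthetic PR, keyed as "file:line:category".
expected = {"login.py:42:sql_injection", "views.py:10:xss"}
predicted = {"login.py:42:sql_injection", "utils.py:7:hardcoded_secret"}
precision, recall = precision_recall(predicted, expected)
```

Exact-key matching is strict: a finding reported one line off counts as both a false positive and a miss, so a real harness may want a line tolerance.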

### WEEK 4: Performance Agent v1
**Goal:** Performance Agent detects N+1 queries, complexity issues, returns findings.

| # | Task | Type | Status |
|---|------|------|--------|
| 4.1 | Create Python AST analyzer tool (`app/tools/ast_analyzer.py`) | Code | [ ] |
| 4.2 | Implement N+1 query pattern detector (Django/SQLAlchemy ORM patterns) | Code | [ ] |
| 4.3 | Create radon complexity tool (`app/tools/radon_tool.py`) | Code | [ ] |
| 4.4 | Write Performance Agent system prompt (`prompts/performance_system.md`) | Prompt | [ ] |
| 4.5 | Prepare DDIA patterns knowledge base (`knowledge/ddia_patterns.md`) | Data | [ ] |
| 4.6 | Implement Performance Agent ReAct loop (`app/agents/performance_agent.py`) | Code | [ ] |
| 4.7 | Fetch 10 Django PRs with known performance issues for testing | Data | [ ] |
| 4.8 | Evaluate Performance Agent on Django PR dataset | Eval | [ ] |
| 4.9 | Iterate on system prompt based on eval results | Prompt | [ ] |

### WEEK 5: Style Agent v1
**Goal:** Style Agent checks naming, complexity, dead code, test coverage gaps.

| # | Task | Type | Status |
|---|------|------|--------|
| 5.1 | Create linter tool wrapper for Ruff/ESLint/pylint (`app/tools/linter_tool.py`) | Code | [ ] |
| 5.2 | Implement dead code detector (unused imports, unreachable branches) | Code | [ ] |
| 5.3 | Write Style Agent system prompt (`prompts/style_system.md`) | Prompt | [ ] |
| 5.4 | Prepare language style guides knowledge base (`knowledge/style_guides/`) | Data | [ ] |
| 5.5 | Implement Style Agent ReAct loop (`app/agents/style_agent.py`) | Code | [ ] |
| 5.6 | Fetch 10 Exercism PRs with style/refactoring issues | Data | [ ] |
| 5.7 | Evaluate Style Agent on Exercism dataset | Eval | [ ] |
| 5.8 | Iterate on system prompt based on eval results | Prompt | [ ] |

### WEEK 6: ChromaDB + RAG Context
**Goal:** Full RAG pipeline: embed repo, retrieve context, inject into agents.

| # | Task | Type | Status |
|---|------|------|--------|
| 6.1 | Set up sentence-transformers embedding pipeline (`app/context/embedder.py`) | Code | [ ] |
| 6.2 | **Run embedding model on RTX 5070 via WSL** and benchmark speed | GPU | [ ] |
| 6.3 | Implement ChromaDB repo indexer (`app/context/indexer.py`): chunk files, upsert | Code | [ ] |
| 6.4 | Implement RAG retriever (`app/context/retriever.py`): query by diff content | Code | [ ] |
| 6.5 | Integrate RAG context into Security Agent | Code | [ ] |
| 6.6 | Integrate RAG context into Performance Agent | Code | [ ] |
| 6.7 | Integrate RAG context into Style Agent | Code | [ ] |
| 6.8 | Evaluate: does cross-file RAG context improve recall vs. diff-only? | Eval | [ ] |
| 6.9 | Optimize chunk size and retrieval top-k for quality vs. latency | Code | [ ] |
| 6.10 | Limit repo index to 500 most recently changed files (Render memory constraint) | Code | [ ] |
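
The chunking step in task 6.3 and the chunk-size tuning in task 6.9 could start from something like the sketch below. It splits a file into overlapping line windows and records line ranges so retrieved chunks can be mapped back to the source. The function name `chunk_lines` and the default sizes are assumptions; the plan does not specify a chunking strategy.

```python
def chunk_lines(text: str, chunk_size: int = 40, overlap: int = 10) -> list[dict]:
    """Split text into overlapping windows of lines, keeping 1-based line ranges."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    lines = text.splitlines()
    step = chunk_size - overlap
    chunks: list[dict] = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append({
            "start_line": start + 1,
            "end_line": start + len(window),
            "text": "\n".join(window),
        })
        if start + chunk_size >= len(lines):
            break  # the last window already reaches the end of the file
    return chunks


# A 100-line sample file yields three windows: lines 1-40, 31-70, and 61-100.
sample = "\n".join(f"line {i}" for i in range(1, 101))
chunks = chunk_lines(sample, chunk_size=40, overlap=10)
```

Each chunk's `text` would be embedded and upserted with its line range as metadata. Splitting on function or class boundaries is a likely refinement, since fixed windows can cut a definition in half.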

### WEEK 7: Synthesizer Agent
**Goal:** Deduplication, conflict resolution, Health Score, executive summary, full pipeline.

| # | Task | Type | Status |
|---|------|------|--------|
| 7.1 | Write Synthesizer system prompt (`prompts/synthesizer_system.md`) | Prompt | [ ] |
| 7.2 | Implement deduplication logic (cosine similarity on findings via ChromaDB) | Code | [ ] |
| 7.3 | Implement severity conflict resolution (Security > Performance > Style precedence) | Code | [ ] |
| 7.4 | Implement composite re-ranking: severity × exploitability × fix_complexity | Code | [ ] |
| 7.5 | Implement PR Health Score formula (0-100) (`app/services/health_score.py`) | Code | [ ] |
| 7.6 | Implement executive summary generation (3-5 sentences) | Code | [ ] |
| 7.7 | Implement auto-block logic (Critical findings → block merge recommendation) | Code | [ ] |
| 7.8 | Implement Synthesizer Agent (`app/agents/synthesizer.py`) | Code | [ ] |
| 7.9 | Build main orchestrator (`app/services/orchestrator.py`) to tie everything together | Code | [ ] |
| 7.10 | Implement Gemini Flash fallback when Groq quota exhausted | Code | [ ] |
| 7.11 | Full end-to-end pipeline test: PR → agents → synthesizer → GitHub comments | Test | [ ] |
| 7.12 | Write unit tests for Health Score formula | Test | [ ] |
| 7.13 | Write unit tests for deduplication with synthetic conflicting findings | Test | [ ] |
| 7.14 | Implement Neon Postgres write (store review record) | Code | [ ] |
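
The precedence rule in task 7.3 can be illustrated with a deliberately simplified sketch. Note the substitution: instead of the cosine-similarity deduplication planned in task 7.2, this version treats any two findings with overlapping line ranges in the same file as duplicates, which is a crude stand-in chosen to keep the example dependency-free. Findings are plain dicts here, and `AGENT_PRECEDENCE`, `overlaps`, and `deduplicate` are hypothetical names.

```python
# Lower rank wins: Security > Performance > Style, as in task 7.3.
AGENT_PRECEDENCE = {"security": 0, "performance": 1, "style": 2}


def overlaps(a: dict, b: dict) -> bool:
    """True when two findings touch the same file and their line ranges intersect."""
    return (
        a["file_path"] == b["file_path"]
        and a["line_start"] <= b["line_end"]
        and b["line_start"] <= a["line_end"]
    )


def deduplicate(findings: list[dict]) -> list[dict]:
    """Keep one finding per overlapping region, preferring higher-precedence agents."""
    ordered = sorted(findings, key=lambda f: AGENT_PRECEDENCE[f["agent"]])
    kept: list[dict] = []
    for finding in ordered:
        if not any(overlaps(finding, other) for other in kept):
            kept.append(finding)
    return kept


# Illustrative input: the style finding overlaps the security finding in a.py.
findings = [
    {"agent": "style", "file_path": "a.py", "line_start": 10, "line_end": 12,
     "title": "Long function"},
    {"agent": "security", "file_path": "a.py", "line_start": 11, "line_end": 11,
     "title": "SQL injection"},
    {"agent": "performance", "file_path": "b.py", "line_start": 5, "line_end": 9,
     "title": "N+1 query"},
]
deduped = deduplicate(findings)
```

Overlap alone is too aggressive for production use, since unrelated issues on the same lines would be merged; the semantic similarity check is what would distinguish a true duplicate from a coincidental overlap.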

### WEEK 8: Next.js Dashboard
**Goal:** Dashboard on Vercel showing review history, Health Scores, charts.

| # | Task | Type | Status |
|---|------|------|--------|
| 8.1 | Initialize Next.js app in `dashboard/` with TypeScript | Code | [ ] |
| 8.2 | Deploy to Vercel (connect GitHub repo) | Deploy | [ ] |
| 8.3 | Create TypeScript types matching backend schemas (`lib/types.ts`) | Code | [ ] |
| 8.4 | Create API fetch wrapper (`lib/api.ts`) that calls the FastAPI backend | Code | [ ] |
| 8.5 | Build `HealthScoreRing` component (circular gauge, animated) | Code | [ ] |
| 8.6 | Build `SeverityBadge` component (color-coded pills) | Code | [ ] |
| 8.7 | Build `TrendChart` component (Recharts LineChart, 30-day trend) | Code | [ ] |
| 8.8 | Build `FindingsTable` component (sortable, filterable) | Code | [ ] |
| 8.9 | Build `AgentBreakdown` component (3-column cards) | Code | [ ] |
| 8.10 | Build `/` page: Repository Overview (connected repos, avg scores) | Code | [ ] |
| 8.11 | Build `/repos/[owner]/[repo]` page: Repo Detail (charts, PR list) | Code | [ ] |
| 8.12 | Build `/repos/[owner]/[repo]/prs/[number]` page: PR Review Detail | Code | [ ] |
| 8.13 | Add FastAPI CORS middleware for Vercel domain | Code | [ ] |
| 8.14 | Implement REST API endpoints on FastAPI side for dashboard | Code | [ ] |

### WEEK 9: Polish & Evaluation
**Goal:** Full benchmark, prompt tuning, latency optimization, documentation.

| # | Task | Type | Status |
|---|------|------|--------|
| 9.1 | Curate full 20-PR benchmark dataset (Django, Next.js, synthetic, Exercism) | Data | [ ] |
| 9.2 | Build evaluation harness (`tests/eval/run_eval.py`) | Code | [ ] |
| 9.3 | Run full benchmark: measure precision, recall, latency per agent | Eval | [ ] |
| 9.4 | Tune agent prompts to reduce false positives (target: <30% FP rate) | Prompt | [ ] |
| 9.5 | Implement confidence threshold: findings <0.6 shown as 'Suggestions' | Code | [ ] |
| 9.6 | Latency optimization: measure p50/p95/p99 per PR size bucket | Eval | [ ] |
| 9.7 | Optimize Groq API calls (reduce token usage, cache prompts) | Code | [ ] |
| 9.8 | Write comprehensive README.md | Docs | [ ] |
| 9.9 | Write installation guide in README | Docs | [ ] |
| 9.10 | Add GitHub Actions pre-warm cron (ping /health every 10min) | CI/CD | [ ] |
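
The confidence threshold in task 9.5 amounts to a simple partition. The sketch below assumes findings are dicts with a `confidence` value between 0.0 and 1.0; `split_by_confidence` is a hypothetical helper name, and treating exactly 0.6 as a full finding follows the "<0.6" wording in the task.

```python
SUGGESTION_THRESHOLD = 0.6  # from task 9.5: findings below this are 'Suggestions'


def split_by_confidence(
    findings: list[dict], threshold: float = SUGGESTION_THRESHOLD
) -> tuple[list[dict], list[dict]]:
    """Partition findings into (confirmed, suggestions) by confidence."""
    confirmed = [f for f in findings if f["confidence"] >= threshold]
    suggestions = [f for f in findings if f["confidence"] < threshold]
    return confirmed, suggestions


# Illustrative data only.
sample_findings = [
    {"title": "SQL injection", "confidence": 0.92},
    {"title": "Possible N+1 query", "confidence": 0.6},
    {"title": "Rename variable", "confidence": 0.41},
]
confirmed, suggestions = split_by_confidence(sample_findings)
```

Keeping the threshold as a named constant makes it easy to tune against the false-positive target in task 9.4.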

### WEEK 10: Launch & Promotion
**Goal:** Live on GitHub Marketplace, installed on public repos, launch posts published.

| # | Task | Type | Status |
|---|------|------|--------|
| 10.1 | Install Sentinel AI on 3 public open-source repos | Launch | [ ] |
| 10.2 | Record demo video (screen recording: PR opened → comments posted) | Content | [ ] |
| 10.3 | Write Dev.to / HackerNews launch post | Content | [ ] |
| 10.4 | Write LinkedIn demo post | Content | [ ] |
| 10.5 | Submit to GitHub Marketplace (needs privacy policy, logo, description) | Launch | [ ] |
| 10.6 | Create sentinel.yml.example per-repo config template | Code | [ ] |
| 10.7 | Monitor first 48 hours and fix any production bugs | Ops | [ ] |

---

## 6. Non-Coding Tasks

These tasks involve no project code but are essential to shipping the project:

### 6.1 External Service Provisioning

| Service | Action | URL | Notes |
|---------|--------|-----|-------|
| **GitHub App** | Register new app | github.com/settings/apps/new | Need: App ID, Private Key (.pem), Webhook Secret |
| **Groq** | Get API key | console.groq.com | Free: 14,400 req/day |
| **Google AI Studio** | Get Gemini key | aistudio.google.com | Free: 1M tokens/day |
| **Neon.tech** | Create Postgres DB | console.neon.tech | Free: 512MB, create `pr_reviews` table |
| **Upstash** | Create Redis instance | console.upstash.com | Free: 10K req/day |
| **Render** | Create web service | dashboard.render.com | Free tier, connect GitHub repo |
| **Vercel** | Create project | vercel.com/new | Free hobby tier, connect dashboard/ |
| **ngrok** | Install for local testing | ngrok.com | Free: 1 tunnel |

### 6.2 GitHub App Configuration

**Permissions required:**
- Pull requests: Read & Write
- Contents: Read
- Metadata: Read
- Commit statuses: Write (optional)

**Webhook events to subscribe:**
- `pull_request` (opened, synchronize, reopened, ready_for_review)
- `pull_request_review_comment` (for @sentinel-ai re-review)
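
A webhook handler would likely filter deliveries down to the `pull_request` actions listed above before doing any work. The sketch below assumes the event name comes from the `X-GitHub-Event` header and the action from the JSON payload; `should_review` is a hypothetical helper, skipping draft PRs is an assumption suggested by the `ready_for_review` subscription, and the `pull_request_review_comment` re-review path is left out.

```python
REVIEW_ACTIONS = {"opened", "synchronize", "reopened", "ready_for_review"}


def should_review(event: str, payload: dict) -> bool:
    """Decide whether a webhook delivery should trigger a full review."""
    if event != "pull_request":
        return False
    if payload.get("action") not in REVIEW_ACTIONS:
        return False
    # Assumption: drafts are skipped until GitHub sends ready_for_review.
    return not payload.get("pull_request", {}).get("draft", False)


# Illustrative payload fragments, trimmed to the fields the check reads.
opened = {"action": "opened", "pull_request": {"draft": False}}
draft = {"action": "opened", "pull_request": {"draft": True}}
closed = {"action": "closed", "pull_request": {"draft": False}}
```

Returning early for ignored events keeps Groq and Redis quota for deliveries that actually need a review.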

### 6.3 Data Curation Tasks

| Dataset | Source | Count | Purpose |
|---------|--------|-------|---------|
| Synthetic security PRs | Hand-crafted | 10 PRs | SQL injection, XSS, IDOR, hardcoded secrets |
| Django security PRs | github.com/django/django | 5 PRs | Real-world Python security fixes |
| Next.js performance PRs | github.com/vercel/next.js | 5 PRs | JS/TS performance changes |
| Exercism style PRs | github.com/exercism | 5 PRs | Naming, complexity, documentation issues |
| Mixed benchmark set | All above | 20 PRs | Full evaluation benchmark |

### 6.4 Knowledge Base Curation

| Document | Source | For Agent |
|----------|--------|-----------|
| OWASP Top 10 (2025) | owasp.org | Security Agent RAG |
| DDIA performance patterns | "Designing Data-Intensive Applications" | Performance Agent RAG |
| Python style guide (PEP 8) | python.org | Style Agent RAG |
| JavaScript style guide | Various (Airbnb, Google) | Style Agent RAG |
| TypeScript best practices | typescript-eslint.io | Style Agent RAG |

---

## 7. GPU / WSL Tasks

Your **RTX 5070** with WSL will be used for:

### 7.1 sentence-transformers Embedding (Required)

**No training needed.** These are pre-trained models used for embedding generation.

```
Model: all-MiniLM-L6-v2 (or all-mpnet-base-v2 for higher quality)
Task: Embed code chunks for ChromaDB indexing
Where: Runs locally during repo indexing (can also run on Render CPU, slower)
GPU benefit: ~10-50x faster embedding generation vs CPU
```

**Setup steps:**
1. Ensure CUDA toolkit installed in WSL (`nvidia-smi` should show RTX 5070)
2. `pip install sentence-transformers torch` (with CUDA support)
3. Benchmark: embed 1000 code chunks, measure time GPU vs CPU
4. Decision: if embedding is fast enough on CPU, skip GPU for deployment simplicity

### 7.2 Local LLM Testing (Optional, Recommended)

Running a local LLM for testing avoids burning Groq API quota during development:

```
Model: Llama-3.1-8B-Instruct (via Ollama or vLLM)
Task: Test agent prompts locally before hitting Groq
GPU benefit: Full inference locally, no API calls, no quota burn
```

**Setup steps:**
1. Install Ollama in WSL: `curl -fsSL https://ollama.com/install.sh | sh`
2. Pull model: `ollama pull llama3.1:8b`
3. Use the local model for prompt iteration, then switch to Groq (70B) for production-quality output
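
One way to make the local/Groq switch cheap is to keep both backends behind the same settings object, since Ollama and Groq both expose OpenAI-compatible HTTP endpoints. The sketch below is an assumption about how this could be wired, not the project's configuration: `LLMSettings`, `llm_settings`, and the `GROQ_MODEL` variable are hypothetical names, and the Groq model ID shown is only a default that should be checked against the model actually in use.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def llm_settings(backend: str) -> LLMSettings:
    """Return connection settings for 'local' (Ollama) or 'groq'."""
    if backend == "local":
        # Ollama serves an OpenAI-compatible API on port 11434 by default;
        # it ignores the API key, but most clients require a non-empty value.
        return LLMSettings("http://localhost:11434/v1", "llama3.1:8b", "ollama")
    if backend == "groq":
        return LLMSettings(
            "https://api.groq.com/openai/v1",
            # Assumed default model ID; override with GROQ_MODEL if it differs.
            os.environ.get("GROQ_MODEL", "llama-3.3-70b-versatile"),
            os.environ.get("GROQ_API_KEY", ""),
        )
    raise ValueError(f"unknown LLM backend: {backend!r}")
```

With this shape, prompt iteration runs against `llm_settings("local")` and production runs against `llm_settings("groq")` without touching agent code.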

### 7.3 What You Do NOT Need to Train

| Item | Reason |
|------|--------|
| LLM (Llama-3.1-70B) | Used via Groq API: inference only, no fine-tuning |
| sentence-transformers | Pre-trained model, no fine-tuning needed for code embeddings |
| Semgrep/Bandit/radon | Rule-based tools, no ML training |
| Agent prompts | Iterative prompt engineering, not model training |

**Bottom line:** This project is an **inference and orchestration** project, not a training project. Your GPU is used for fast local embeddings and optional local LLM testing; no model training is required.

---

## 8. Data Models & Schemas

### 8.1 Finding (per agent output)

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


class Finding(BaseModel):
    agent: Literal['security', 'performance', 'style']
    file_path: str              # e.g. 'src/auth/login.py'
    line_start: int
    line_end: int
    severity: Literal['critical', 'high', 'medium', 'low']
    category: str               # e.g. 'sql_injection', 'n+1_query', 'naming'
    title: str                  # Short one-liner
    description: str            # Full explanation
    suggested_fix: str          # Corrected code snippet
    cwe_id: Optional[str] = None               # Security findings only (e.g. 'CWE-89')
    confidence: float = Field(ge=0.0, le=1.0)  # 0.0 – 1.0, enforced at validation
```

### 8.2 SynthesizedReview (Synthesizer output)

```python
from typing import List, Literal

from pydantic import BaseModel


class SynthesizedReview(BaseModel):
    health_score: int                        # 0-100
    executive_summary: str                   # 3-5 sentences
    recommendation: Literal['approve', 'request_changes', 'block']
    findings: List[Finding]                  # Deduplicated, re-ranked (Finding from 8.1)
    critical_count: int
    high_count: int
    medium_count: int
    low_count: int
    duration_ms: int
```
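
To illustrate how the `findings` list and the four `*_count` fields could be derived, here is a minimal sketch. It makes two assumptions that the plan does not specify: findings count as duplicates when they share a file and category and their line ranges overlap, and the higher-severity (then higher-confidence) finding is kept. `FindingLite` is a hypothetical dependency-free stand-in for the `Finding` model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


@dataclass(frozen=True)
class FindingLite:
    """Stand-in for Finding with only the fields needed here."""
    agent: str
    file_path: str
    line_start: int
    line_end: int
    severity: str
    category: str
    confidence: float


def is_duplicate(a: FindingLite, b: FindingLite) -> bool:
    """Assumed rule: same file, same category, overlapping line ranges."""
    return (a.file_path == b.file_path
            and a.category == b.category
            and a.line_start <= b.line_end
            and b.line_start <= a.line_end)


def deduplicate(findings: List[FindingLite]) -> List[FindingLite]:
    """Keep the strongest finding of each duplicate group, ranked best first."""
    ranked = sorted(findings,
                    key=lambda f: (SEVERITY_RANK[f.severity], f.confidence),
                    reverse=True)
    kept: List[FindingLite] = []
    for finding in ranked:
        if not any(is_duplicate(finding, k) for k in kept):
            kept.append(finding)
    return kept


def severity_counts(findings: List[FindingLite]) -> Dict[str, int]:
    """Return the critical/high/medium/low counts keyed like the model fields."""
    counts = Counter(f.severity for f in findings)
    return {f"{level}_count": counts.get(level, 0) for level in SEVERITY_RANK}
```

Computing the counts from the deduplicated list keeps them consistent with `findings` by construction.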

### 8.3 PR Review Record (Neon Postgres)

```sql
CREATE TABLE pr_reviews (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    repo_full_name  TEXT NOT NULL,
    pr_number       INT NOT NULL,
    commit_sha      TEXT NOT NULL,
    health_score    INT NOT NULL,
    critical_count  INT DEFAULT 0,
    high_count      INT DEFAULT 0,
    medium_count    INT DEFAULT 0,
    low_count       INT DEFAULT 0,
    summary         TEXT,
    findings        JSONB NOT NULL,
    duration_ms     INT,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_pr_reviews_repo ON pr_reviews(repo_full_name);
CREATE INDEX idx_pr_reviews_sha ON pr_reviews(commit_sha);
```

---

## 9. API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `POST /webhook/github` | POST | Receive GitHub webhook, validate HMAC, enqueue analysis |
| `GET /api/repos/{owner}/{repo}/reviews` | GET | Paginated PR review list + Health Score trend |
| `GET /api/repos/{owner}/{repo}/reviews/{pr_number}` | GET | Full findings for specific PR |
| `GET /api/repos/{owner}/{repo}/stats` | GET | Aggregate stats: avg score, top categories, 30-day trend |
| `POST /api/repos/{owner}/{repo}/reanalyze/{pr_number}` | POST | Re-trigger analysis (bypass cache) |
| `GET /health` | GET | Health check: agent status, Groq quota remaining |
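
The HMAC validation on `POST /webhook/github` can be sketched with the standard library alone. GitHub signs the raw request body with the webhook secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name `verify_signature` is hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, body: bytes,
                     signature_header: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_header.encode("utf-8"))
```

The check must run on the exact bytes received, before any JSON parsing, because re-serializing the payload can change the digest.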

---

## 10. Agent Prompt Design

Each agent prompt must include:

1. **Role definition**: who the agent is (e.g., "senior AppSec engineer")
2. **Scope boundaries**: what to look for and what to ignore
3. **Output schema**: exact JSON structure expected
4. **Severity guidelines**: when to use Critical vs. High vs. Medium vs. Low
5. **Confidence scoring**: how to self-assess confidence (0.0-1.0)
6. **Examples**: 2-3 few-shot examples of good findings
7. **Anti-patterns**: common false positives to avoid

Prompts are stored in `prompts/` as Markdown files and loaded at agent initialization.
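
Loading the prompt files at initialization could look like the sketch below. It assumes one `.md` file per agent, keyed by file stem (for example `prompts/security.md`); that naming scheme and the function name `load_prompts` are hypothetical.

```python
from pathlib import Path
from typing import Dict


def load_prompts(prompts_dir: str = "prompts") -> Dict[str, str]:
    """Read every Markdown file in prompts_dir into a {stem: text} mapping."""
    prompts = {}
    for path in sorted(Path(prompts_dir).glob("*.md")):
        prompts[path.stem] = path.read_text(encoding="utf-8")
    if not prompts:
        # Fail at startup rather than on the first review request.
        raise FileNotFoundError(f"no prompt files found in {prompts_dir!r}")
    return prompts
```

Reading the files once at startup keeps prompt edits under version control and surfaces a missing prompt immediately.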

---

## 11. Evaluation Plan

### 11.1 Metrics

| Metric | Target | Formula |
|--------|--------|---------|
| Security precision | >70% | true_positives / (true_positives + false_positives) |
| Performance recall | >60% | true_positives / (true_positives + false_negatives) |
| Deduplication rate | >15% | duplicates_removed / total_findings |
| e2e latency (p95) | <20s | Time from webhook to first comment posted |
| Groq quota usage | <10K/day | Total API calls per day |
| System uptime | >95% | (total_time - downtime) / total_time |

### 11.2 Evaluation Harness

Located in `tests/eval/`:
- `dataset/`: 20 PRs as JSON fixtures (diff, expected findings, ground truth labels)
- `run_eval.py`: runs each PR through the full pipeline and compares output against ground truth
- `metrics.py`: computes precision, recall, F1, and latency percentiles
- Results are logged to the console and optionally to LangSmith (free tier)
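
The formulas in 11.1 translate directly into code. The following is a sketch of what `metrics.py` might contain, not its actual contents. The function names are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from typing import Sequence


def precision(true_positives: int, false_positives: int) -> float:
    """true_positives / (true_positives + false_positives); 0.0 if undefined."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """true_positives / (true_positives + false_negatives); 0.0 if undefined."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile, e.g. pct=95 for the p95 latency target."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With only 20 PRs in the benchmark, the p95 latency under nearest-rank is the 19th-slowest run, so a single slow review can decide whether the <20s target is met.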

---

## 12. Deployment Checklist

### Render (FastAPI Backend)
- [ ] `render.yaml` configured with build + start commands
- [ ] Environment variables set in Render dashboard
- [ ] Health check endpoint (`/health`) configured
- [ ] Auto-deploy from `main` branch enabled

### Vercel (Next.js Dashboard)
- [ ] Connected to GitHub repo `dashboard/` directory
- [ ] Environment variable: `NEXT_PUBLIC_API_URL` pointing to Render backend
- [ ] Custom domain (optional)

### GitHub App
- [ ] App registered with correct permissions
- [ ] Webhook URL set to Render endpoint (`/webhook/github`)
- [ ] Private key (.pem) downloaded and stored securely
- [ ] App installed on test repo for development

### GitHub Actions
- [ ] CI workflow: lint (ruff) + test (pytest) on push/PR
- [ ] Pre-warm cron: ping `/health` every 10 minutes during working hours

---

## 13. Progress Tracker

### Overall Status

| Week | Milestone | Status | Notes |
|------|-----------|--------|-------|
| 1 | Foundation & Setup | COMPLETE | All services provisioned, project scaffolded |
| 2 | GitHub Integration | COMPLETE | E2E tested: webhook → fetch → comment on PR #1 |
| 3 | Security Agent v1 | COMPLETE | Bandit + Llama-3.3-70B, live-tested on PR #3, 4 findings |
| 4 | Performance Agent v1 | COMPLETE | Radon complexity + Llama-3.3-70B, 3 findings on PR #4 |
| 5 | Style Agent v1 | COMPLETE | Ruff linter + Llama-3.3-70B, 6 findings on PR #4 |
| 6 | ChromaDB + RAG Context | COMPLETE | sentence-transformers + ChromaDB, integrated into all agents |
| 7 | Synthesizer Agent | COMPLETE | Dedup, conflict resolution, Health Score formula, exec summary |
| 8 | Next.js Dashboard | COMPLETE | Next.js + Tailwind + Recharts, mock data, all pages |
| 9 | Polish & Evaluation | COMPLETE | Eval harness, metrics, README, DB persistence |
| 10 | Launch & Promotion | COMPLETE | Render config, Vercel ready, API endpoints for dashboard |

### Key Decisions Log

| Date | Decision | Rationale |
|------|----------|-----------|
| 2026-03-19 | Project plan created | Starting from scratch, PDF spec as source of truth |
| 2026-03-19 | Project renamed to "Ninja Code Guard" | User's personal branding choice |
| 2026-03-19 | GitHub App: "Ninja's Code Guard" (ID: 3133457) | Registered and tested with live PR |
| 2026-03-19 | Test repo: ninjacode911/codeguard-test | Used for e2e webhook testing |
| 2026-03-19 | Fail-open pattern for Redis cache | Missing a review is worse than duplicating |
| 2026-03-19 | Background tasks for webhook processing | GitHub's 10s timeout requires async processing |
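
The fail-open decision for the Redis cache can be illustrated with a small wrapper. This is a sketch under assumptions: `already_reviewed` and the `cache_get` callable are hypothetical names, with `cache_get` standing in for whatever Redis lookup the project uses.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("codeguard.cache")


def already_reviewed(cache_get: Callable[[str], Optional[str]], key: str) -> bool:
    """Return True only when the cache positively confirms a prior review.

    Any cache failure is treated as a miss (fail open), so the PR is
    reviewed again instead of being skipped.
    """
    try:
        return cache_get(key) is not None
    except Exception:
        logger.warning("cache lookup failed for %s; proceeding with review", key)
        return False
```

The trade-off matches the rationale in the log: a cache outage may cause a duplicate review, but it never silently drops one.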

---

*Last updated: 2026-03-19*