Spaces:
Runtime error
Runtime error
File size: 7,421 Bytes
3a679f6 2616e64 7eb80e6 2616e64 3a679f6 2616e64 3a679f6 7eb80e6 3a679f6 7eb80e6 2616e64 7eb80e6 2616e64 7eb80e6 2616e64 7eb80e6 2616e64 3a679f6 2616e64 3a679f6 2616e64 3a679f6 2616e64 3a679f6 2616e64 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 | # DiffSense Technical Design
## Goal
Build a useful, demoable, privacy-first pull request reviewer for the Build Small hackathon. The app must work reliably inside a Gradio Space and stay eligible for the under-32B model constraint.
The implementation is intentionally offline-first: deterministic review rules provide the core value, and small-model inference is an optional enhancement rather than a single point of failure.
## Current Shipped Prototype
```text
Unified diff input or public GitHub PR URL
-> stdlib diff parser
-> deterministic review engine
-> structured findings
-> custom Gradio HTML diff viewer
-> optional Mellum 2 summary via HF OAuth
-> optional Nemotron 3 Nano routing via HF OAuth
-> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
-> optional MiniCPM-V 4.6 vision notes via HF OAuth
-> optional local checkpoints from /data/models on ZeroGPU
-> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
```
## Components
### Gradio UI
File: `app.py`
- Uses `gr.Blocks` instead of the default chatbot scaffold.
- Provides a sample risky diff for a one-click demo.
- Accepts pasted unified diffs and public GitHub PR URLs.
- Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
- Shows structured JSON for automation and judge inspection.
- Exposes model/provider toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
- Accepts PR screenshots or diagrams for the MiniCPM-V vision pass.
### Diff Parser
The input layer fetches public GitHub PR URLs through their `.diff` endpoint with a short timeout. Pasted diffs are handled entirely in-process.
The parser handles standard unified diffs:
- `diff --git` file boundaries.
- `+++ b/path` file names.
- `@@ -old,+new @@` hunk headers.
- Added, removed, and context lines with old/new line numbers.
No external parser is required, which keeps startup fast and dependency risk low.
### Review Engine
The deterministic engine checks added lines for high-signal review risks:
- Hardcoded credentials.
- Disabled verification such as TLS or JWT signature checks.
- Unsafe deserialization with `pickle`.
- Dynamic execution through `eval` or `exec`.
- `shell=True` command execution.
- SQL string interpolation.
- Bare `except:`.
- Temporary `TODO`, `FIXME`, or `HACK` markers.
- Return-contract changes such as newly introduced `return None`.
Each finding includes:
```json
{
"file": "src/auth.py",
"hunk": "@@ -1,9 +1,13 @@",
"line": 11,
"severity": "critical",
"category": "security",
"comment": "The change disables a verification check, which can turn a trusted boundary into a bypass.",
"suggestion": "Keep verification enabled and add a narrowly scoped test fixture for local development.",
"source": "deterministic"
}
```
### Optional Model Summary
When enabled, the app uses the signed-in Hugging Face OAuth token or `HF_TOKEN` through the Hugging Face Inference API to call:
```text
JetBrains/Mellum2-12B-A2.5B-Instruct
```
The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
If `/data/models/mellum2-instruct/config.json` exists, the app prefers that local checkpoint path before calling the hosted provider.
### Optional Nemotron Router
When enabled, the app calls:
```text
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
```
Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
If `/data/models/nemotron-3-nano-30b-a3b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
### Optional Tiny Titan Checker
When enabled, the app calls a <=4B model:
```text
nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
```
This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
If `/data/models/nemotron-3-nano-4b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
### Optional MiniCPM-V Vision Pass
When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
```text
openbmb/MiniCPM-V-4.6
```
This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
If `/data/models/minicpm-v-4.6/config.json` exists, the app reports the local MiniCPM-V checkpoint as ready and keeps the image ingestion path available for a custom local loader.
### ZeroGPU Bucket Mount
The Space has a read/write bucket mounted at `/data`. DiffSense checks the following model checkpoint locations at runtime and includes their status in the model-agent trace:
```text
/data/models/mellum2-instruct
/data/models/nemotron-3-nano-30b-a3b
/data/models/nemotron-3-nano-4b
/data/models/minicpm-v-4.6
```
This keeps the app repo small while making the model integration path explicit for the hackathon badges. Hosted provider failures are converted into concise status notes rather than raw request errors.
### Optional Modal Bridge
When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
## Hackathon Fit
Required criteria:
- Under 32B: Mellum, Nemotron 3 Nano 30B-A3B, Nemotron 3 Nano 4B, and MiniCPM-V 4.6 are all within the hackathon model-size constraint.
- Gradio app: implemented in `app.py`.
- README tags: included in `README.md` front matter.
- Demo-friendly: built-in sample diff produces multiple clear findings without setup.
Prize positioning:
- Backyard AI: practical developer workflow.
- Best Use of Codex: Codex is actively building and shaping the repo.
- Best Agent: staged review pipeline with parsing, classification, review, and summary.
- Off Brand: custom HTML diff UI instead of stock chat.
- Best Demo: one-click sample with visible before/after review value.
- Best MiniCPM Build: MiniCPM-V 4.6 image/diagram context path is implemented.
- Nemotron Hardware Prize: Nemotron 3 Nano routing path is implemented.
- Best Use of Modal: Modal endpoint bridge is implemented and controlled through a Space secret.
- Tiny Titan: Nemotron 3 Nano 4B checker path is implemented.
## Planned Extensions
These should only be added after the current app is deployed and recorded:
1. Add a hosted Modal endpoint and set `DIFFSENSE_MODAL_ENDPOINT`.
2. Add downloadable `.patch` files for suggested fixes.
3. Add richer multimodal demo assets for the MiniCPM-V path.
## Risk Controls
- The app remains useful without model availability.
- Dependencies are limited to Gradio and `huggingface_hub`.
- No pasted diff is sent externally unless the user explicitly enables the model summary.
- Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
- The sample diff demonstrates value even during GPU/API outages.
- Model/provider failures are rendered as agent trace notes rather than hard app failures.
|