Integrate sponsor model agents
Co-authored-by: Codex <codex@openai.com>
- README.md +20 -4
- TECH_DESIGN.md +49 -5
- app.py +299 -14
README.md
CHANGED
|
@@ -22,15 +22,22 @@ tags:
|
|
| 22 |
- best-agent
|
| 23 |
- off-brand
|
| 24 |
- best-demo
|
| 25 |
models:
|
| 26 |
- JetBrains/Mellum-2-12B-instruct
|
| 27 |
---
|
| 28 |
|
| 29 |
# DiffSense
|
| 30 |
|
| 31 |
Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
|
| 32 |
|
| 33 |
-
Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds
|
| 34 |
|
| 35 |
## Why We Built It
|
| 36 |
|
|
@@ -44,6 +51,10 @@ DiffSense is the small-model version of that workflow: useful immediately, inspe
|
|
| 44 |
- Inline custom diff viewer built in Gradio.
|
| 45 |
- Deterministic review findings for security, logic, maintainability, and test risks.
|
| 46 |
- Public GitHub PR URL fetching through the PR `.diff` endpoint.
|
| 47 |
- Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
|
| 48 |
- Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
|
| 49 |
|
|
@@ -57,6 +68,10 @@ Prize/badge targets:
|
|
| 57 |
- Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
|
| 58 |
- Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
|
| 59 |
- Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
|
| 60 |
|
| 61 |
## Planned Model Stack
|
| 62 |
|
|
@@ -66,9 +81,10 @@ All planned models are under the Build Small 32B parameter cap.
|
|
| 66 |
| --- | --- | --- |
|
| 67 |
| Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
|
| 68 |
| Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
|
| 69 |
-
| Agentic routing | NVIDIA Nemotron 3 Nano |
|
| 70 |
-
|
|
| 71 |
-
|
|
|
|
|
| 72 |
|
| 73 |
The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
|
| 74 |
|
|
|
|
| 22 |
- best-agent
|
| 23 |
- off-brand
|
| 24 |
- best-demo
|
| 25 |
+
- best-minicpm-build
|
| 26 |
+
- nemotron-hardware-prize
|
| 27 |
+
- best-use-of-modal
|
| 28 |
+
- tiny-titan
|
| 29 |
models:
|
| 30 |
- JetBrains/Mellum-2-12B-instruct
|
| 31 |
+
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
|
| 32 |
+
- nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
|
| 33 |
+
- openbmb/MiniCPM-V-4.6
|
| 34 |
---
|
| 35 |
|
| 36 |
# DiffSense
|
| 37 |
|
| 38 |
Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
|
| 39 |
|
| 40 |
+
Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds Mellum, Nemotron, MiniCPM-V, and Modal provider passes when credentials or endpoints are available.
|
| 41 |
|
| 42 |
## Why We Built It
|
| 43 |
|
|
|
|
| 51 |
- Inline custom diff viewer built in Gradio.
|
| 52 |
- Deterministic review findings for security, logic, maintainability, and test risks.
|
| 53 |
- Public GitHub PR URL fetching through the PR `.diff` endpoint.
|
| 54 |
+
- Optional Nemotron 3 Nano routing/triage pass.
|
| 55 |
+
- Optional Tiny Titan 4B checker pass.
|
| 56 |
+
- Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
|
| 57 |
+
- Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
|
| 58 |
- Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
|
| 59 |
- Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
|
| 60 |
|
|
|
|
| 68 |
- Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
|
| 69 |
- Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
|
| 70 |
- Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
|
| 71 |
+
- Best MiniCPM Build: MiniCPM-V 4.6 is integrated for optional image/diagram context.
|
| 72 |
+
- Nemotron Hardware Prize: Nemotron 3 Nano is integrated for optional agentic routing.
|
| 73 |
+
- Best Use of Modal: the app includes a provider bridge for a Modal-hosted review endpoint via `DIFFSENSE_MODAL_ENDPOINT`.
|
| 74 |
+
- Tiny Titan: a <=4B Nemotron 3 Nano checker is integrated as a separate optional pass.
|
| 75 |
|
| 76 |
## Planned Model Stack
|
| 77 |
|
|
|
|
| 81 |
| --- | --- | --- |
|
| 82 |
| Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
|
| 83 |
| Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
|
| 84 |
+
| Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook implemented |
|
| 85 |
+
| Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook implemented |
|
| 86 |
+
| Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + HF inference hook implemented |
|
| 87 |
+
| Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
|
| 88 |
|
| 89 |
The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
|
| 90 |
|
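The deterministic fallback that the README describes can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `app.py`: `with_fallback`, `flaky_provider`, and `deterministic_note` are hypothetical names introduced only for this example.

```python
from typing import Callable


def with_fallback(provider: Callable[[], str], fallback: Callable[[], str], label: str) -> str:
    """Return the provider's output, or a labelled fallback note if it raises."""
    try:
        return provider()
    except Exception as exc:  # Broad on purpose: any endpoint failure should degrade, not crash.
        return f"{label} unavailable: {exc}\n\n{fallback()}"


def flaky_provider() -> str:
    # Hypothetical stand-in for a hosted model call that is cold or rate-limited.
    raise TimeoutError("endpoint is cold")


def deterministic_note() -> str:
    # Hypothetical stand-in for a rule-based summary.
    return "- Deterministic fallback: 2 findings, merge risk is medium."


if __name__ == "__main__":
    print(with_fallback(flaky_provider, deterministic_note, "Model summary"))
```

The design choice is that a provider failure becomes visible text in the output instead of an exception, so the rule-based result is always shown.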
TECH_DESIGN.md
CHANGED
|
@@ -15,6 +15,10 @@ Unified diff input or public GitHub PR URL
|
|
| 15 |
-> structured findings
|
| 16 |
-> custom Gradio HTML diff viewer
|
| 17 |
-> optional Mellum 2 summary via HF OAuth
|
| 18 |
```
|
| 19 |
|
| 20 |
## Components
|
|
@@ -28,6 +32,8 @@ File: `app.py`
|
|
| 28 |
- Accepts pasted unified diffs and public GitHub PR URLs.
|
| 29 |
- Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
|
| 30 |
- Shows structured JSON for automation and judge inspection.
|
|
|
|
|
|
|
| 31 |
|
| 32 |
### Diff Parser
|
| 33 |
|
|
@@ -81,11 +87,45 @@ JetBrains/Mellum-2-12B-instruct
|
|
| 81 |
|
| 82 |
The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
|
| 83 |
|
| 84 |
## Hackathon Fit
|
| 85 |
|
| 86 |
Required criteria:
|
| 87 |
|
| 88 |
-
- Under 32B:
|
| 89 |
- Gradio app: implemented in `app.py`.
|
| 90 |
- README tags: included in `README.md` front matter.
|
| 91 |
- Demo-friendly: built-in sample diff produces multiple clear findings without setup.
|
|
@@ -97,15 +137,18 @@ Prize positioning:
|
|
| 97 |
- Best Agent: staged review pipeline with parsing, classification, review, and summary.
|
| 98 |
- Off Brand: custom HTML diff UI instead of stock chat.
|
| 99 |
- Best Demo: one-click sample with visible before/after review value.
|
| 100 |
|
| 101 |
## Planned Extensions
|
| 102 |
|
| 103 |
These should only be added after the current app is deployed and recorded:
|
| 104 |
|
| 105 |
-
1. Add Modal endpoint
|
| 106 |
-
2. Add
|
| 107 |
-
3. Add
|
| 108 |
-
4. Generate patch suggestions as downloadable `.patch` files.
|
| 109 |
|
| 110 |
## Risk Controls
|
| 111 |
|
|
@@ -114,3 +157,4 @@ These should only be added after the current app is deployed and recorded:
|
|
| 114 |
- No pasted diff is sent externally unless the user explicitly enables the model summary.
|
| 115 |
- Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
|
| 116 |
- The sample diff demonstrates value even during GPU/API outages.
|
|
|
|
|
|
| 15 |
-> structured findings
|
| 16 |
-> custom Gradio HTML diff viewer
|
| 17 |
-> optional Mellum 2 summary via HF OAuth
|
| 18 |
+
-> optional Nemotron 3 Nano routing via HF OAuth
|
| 19 |
+
-> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
|
| 20 |
+
-> optional MiniCPM-V 4.6 vision notes via HF OAuth
|
| 21 |
+
-> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
|
| 22 |
```
|
| 23 |
|
| 24 |
## Components
|
|
|
|
| 32 |
- Accepts pasted unified diffs and public GitHub PR URLs.
|
| 33 |
- Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
|
| 34 |
- Shows structured JSON for automation and judge inspection.
|
| 35 |
+
- Exposes model/provider toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
|
| 36 |
+
- Accepts PR screenshots or diagrams for the MiniCPM-V vision pass.
|
| 37 |
|
| 38 |
### Diff Parser
|
| 39 |
|
|
|
|
| 87 |
|
| 88 |
The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
|
| 89 |
|
| 90 |
+
### Optional Nemotron Router
|
| 91 |
+
|
| 92 |
+
When enabled, the app calls:
|
| 93 |
+
|
| 94 |
+
```text
|
| 95 |
+
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
|
| 99 |
+
|
| 100 |
+
### Optional Tiny Titan Checker
|
| 101 |
+
|
| 102 |
+
When enabled, the app calls a <=4B model:
|
| 103 |
+
|
| 104 |
+
```text
|
| 105 |
+
nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
|
| 109 |
+
|
| 110 |
+
### Optional MiniCPM-V Vision Pass
|
| 111 |
+
|
| 112 |
+
When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
|
| 113 |
+
|
| 114 |
+
```text
|
| 115 |
+
openbmb/MiniCPM-V-4.6
|
| 116 |
+
```
|
| 117 |
+
|
| 118 |
+
This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
|
| 119 |
+
|
| 120 |
+
### Optional Modal Bridge
|
| 121 |
+
|
| 122 |
+
When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
|
| 123 |
+
|
| 124 |
## Hackathon Fit
|
| 125 |
|
| 126 |
Required criteria:
|
| 127 |
|
| 128 |
+
- Under 32B: Mellum, Nemotron 3 Nano 30B-A3B, Nemotron 3 Nano 4B, and MiniCPM-V 4.6 are all within the hackathon model-size constraint.
|
| 129 |
- Gradio app: implemented in `app.py`.
|
| 130 |
- README tags: included in `README.md` front matter.
|
| 131 |
- Demo-friendly: built-in sample diff produces multiple clear findings without setup.
|
|
|
|
| 137 |
- Best Agent: staged review pipeline with parsing, classification, review, and summary.
|
| 138 |
- Off Brand: custom HTML diff UI instead of stock chat.
|
| 139 |
- Best Demo: one-click sample with visible before/after review value.
|
| 140 |
+
- Best MiniCPM Build: MiniCPM-V 4.6 image/diagram context path is implemented.
|
| 141 |
+
- Nemotron Hardware Prize: Nemotron 3 Nano routing path is implemented.
|
| 142 |
+
- Best Use of Modal: Modal endpoint bridge is implemented and controlled through a Space secret.
|
| 143 |
+
- Tiny Titan: Nemotron 3 Nano 4B checker path is implemented.
|
| 144 |
|
| 145 |
## Planned Extensions
|
| 146 |
|
| 147 |
These should only be added after the current app is deployed and recorded:
|
| 148 |
|
| 149 |
+
1. Add a hosted Modal endpoint and set `DIFFSENSE_MODAL_ENDPOINT`.
|
| 150 |
+
2. Add downloadable `.patch` files for suggested fixes.
|
| 151 |
+
3. Add richer multimodal demo assets for the MiniCPM-V path.
|
|
|
|
| 152 |
|
| 153 |
## Risk Controls
|
| 154 |
|
|
|
|
| 157 |
- No pasted diff is sent externally unless the user explicitly enables the model summary.
|
| 158 |
- Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
|
| 159 |
- The sample diff demonstrates value even during GPU/API outages.
|
| 160 |
+
- Model/provider failures are rendered as agent trace notes rather than hard app failures.
|
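The image handling described for the MiniCPM-V pass (an allowlist of PNG, JPEG, and WebP plus a payload size cap) can be tried on its own. The sketch below is a standalone approximation of that idea; `to_data_url`, `MAX_BYTES`, and `ALLOWED_TYPES` are names chosen for this example, and the 2,500,000-byte cap simply mirrors `MAX_IMAGE_BYTES` from the diff.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path
from typing import Optional

MAX_BYTES = 2_500_000  # Mirrors MAX_IMAGE_BYTES in the diff; adjust as needed.
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/webp"}


def to_data_url(path: str, max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Encode an allowed image as a base64 data URL, or return None if it is rejected."""
    file_path = Path(path)
    # Reject missing files and anything above the size cap before reading bytes.
    if not file_path.exists() or file_path.stat().st_size > max_bytes:
        return None
    # The type is guessed from the file extension only; the bytes are not inspected.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    if mime_type not in ALLOWED_TYPES:
        return None
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "screenshot.png"
        sample.write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG signature only, enough for a demo.
        url = to_data_url(str(sample))
        print(url[:22] if url else "rejected")
```

Because the check relies on the extension, a renamed non-image file would still pass; a stricter variant could inspect the leading bytes as well.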
app.py
CHANGED
|
@@ -2,9 +2,12 @@ from __future__ import annotations
|
|
| 2 |
|
| 3 |
import html
|
| 4 |
import json
|
|
|
|
|
|
|
| 5 |
import os
|
| 6 |
import re
|
| 7 |
from dataclasses import dataclass, field
|
|
|
|
| 8 |
from typing import Any
|
| 9 |
from urllib.parse import urlparse
|
| 10 |
from urllib.request import Request, urlopen
|
|
@@ -13,8 +16,13 @@ import gradio as gr
|
|
| 13 |
from huggingface_hub import InferenceClient
|
| 14 |
|
| 15 |
|
| 16 |
-
|
| 17 |
FETCH_TIMEOUT_SECONDS = 10
|
|
|
|
| 18 |
|
| 19 |
|
| 20 |
CSS = """
|
|
@@ -568,22 +576,236 @@ def summarize_with_model(
|
|
| 568 |
]
|
| 569 |
|
| 570 |
try:
|
| 571 |
-
|
| 572 |
-
response = client.chat_completion(
|
| 573 |
-
messages=messages,
|
| 574 |
-
max_tokens=320,
|
| 575 |
-
temperature=0.2,
|
| 576 |
-
top_p=0.9,
|
| 577 |
-
)
|
| 578 |
-
return response.choices[0].message.content or "Model returned an empty summary."
|
| 579 |
except Exception as exc: # The app must stay demoable when endpoints are unavailable.
|
| 580 |
return summarize_deterministic(
|
| 581 |
files,
|
| 582 |
findings,
|
| 583 |
-
prefix=f"Model summary unavailable from {
|
| 584 |
)
|
| 585 |
|
| 586 |
|
| 587 |
def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
|
| 588 |
hunk_count = sum(len(file.hunks) for file in files)
|
| 589 |
counts = {
|
|
@@ -683,8 +905,13 @@ def render_finding(finding: Finding) -> str:
|
|
| 683 |
def run_review(
|
| 684 |
diff_input: str,
|
| 685 |
use_model_summary: bool,
|
| 686 |
hf_token: gr.OAuthToken | None = None,
|
| 687 |
-
) -> tuple[str, list[dict[str, Any]], str]:
|
| 688 |
diff_text = normalize_diff(diff_input)
|
| 689 |
if not diff_text:
|
| 690 |
raise gr.Error("Paste a unified diff first, or load the sample diff.")
|
|
@@ -694,8 +921,29 @@ def run_review(
|
|
| 694 |
raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
|
| 695 |
|
| 696 |
findings = review_diff(files)
|
|
|
|
| 697 |
summary = summarize_with_model(files, findings, use_model_summary, hf_token)
|
| 698 |
-
|
| 699 |
|
| 700 |
|
| 701 |
def load_sample() -> str:
|
|
@@ -722,6 +970,26 @@ with gr.Blocks() as demo:
|
|
| 722 |
label="Add optional Mellum model summary",
|
| 723 |
info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
|
| 724 |
)
|
| 725 |
sample_btn = gr.Button("Load sample diff")
|
| 726 |
|
| 727 |
with gr.Row(equal_height=False):
|
|
@@ -734,12 +1002,21 @@ with gr.Blocks() as demo:
|
|
| 734 |
placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
|
| 735 |
interactive=True,
|
| 736 |
)
|
| 737 |
run_btn = gr.Button("Review diff", variant="primary")
|
| 738 |
with gr.Column(scale=4):
|
| 739 |
summary_output = gr.Markdown(
|
| 740 |
value="Run a review to get the risk summary.",
|
| 741 |
label="Reviewer summary",
|
| 742 |
)
|
| 743 |
json_output = gr.JSON(label="Structured findings")
|
| 744 |
|
| 745 |
review_output = gr.HTML(
|
|
@@ -750,8 +1027,16 @@ with gr.Blocks() as demo:
|
|
| 750 |
sample_btn.click(fn=load_sample, outputs=diff_input)
|
| 751 |
run_btn.click(
|
| 752 |
fn=run_review,
|
| 753 |
-
inputs=[
|
| 754 |
-
|
| 755 |
)
|
| 756 |
|
| 757 |
|
|
|
|
| 2 |
|
| 3 |
import html
|
| 4 |
import json
|
| 5 |
+
import base64
|
| 6 |
+
import mimetypes
|
| 7 |
import os
|
| 8 |
import re
|
| 9 |
from dataclasses import dataclass, field
|
| 10 |
+
from pathlib import Path
|
| 11 |
from typing import Any
|
| 12 |
from urllib.parse import urlparse
|
| 13 |
from urllib.request import Request, urlopen
|
|
|
|
| 16 |
from huggingface_hub import InferenceClient
|
| 17 |
|
| 18 |
|
| 19 |
+
MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct")
|
| 20 |
+
NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
|
| 21 |
+
TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
|
| 22 |
+
MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
|
| 23 |
+
MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
|
| 24 |
FETCH_TIMEOUT_SECONDS = 10
|
| 25 |
+
MAX_IMAGE_BYTES = 2_500_000
|
| 26 |
|
| 27 |
|
| 28 |
CSS = """
|
|
|
|
| 576 |
]
|
| 577 |
|
| 578 |
try:
|
| 579 |
+
return call_chat_model(MELLUM_MODEL, messages, token, max_tokens=320)
|
| 580 |
except Exception as exc: # The app must stay demoable when endpoints are unavailable.
|
| 581 |
return summarize_deterministic(
|
| 582 |
files,
|
| 583 |
findings,
|
| 584 |
+
prefix=f"Model summary unavailable from {MELLUM_MODEL}: {exc}",
|
| 585 |
+
)
|
| 586 |
+
|
| 587 |
+
|
| 588 |
+
def call_chat_model(
|
| 589 |
+
model: str,
|
| 590 |
+
messages: list[dict[str, Any]],
|
| 591 |
+
token: str,
|
| 592 |
+
max_tokens: int = 320,
|
| 593 |
+
temperature: float = 0.2,
|
| 594 |
+
) -> str:
|
| 595 |
+
client = InferenceClient(token=token, model=model)
|
| 596 |
+
response = client.chat_completion(
|
| 597 |
+
messages=messages,
|
| 598 |
+
max_tokens=max_tokens,
|
| 599 |
+
temperature=temperature,
|
| 600 |
+
top_p=0.9,
|
| 601 |
+
)
|
| 602 |
+
return response.choices[0].message.content or f"{model} returned an empty response."
|
| 603 |
+
|
| 604 |
+
|
| 605 |
+
def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
|
| 606 |
+
diff_excerpt = "\n".join(
|
| 607 |
+
f"{file.path}\n"
|
| 608 |
+
+ "\n".join(
|
| 609 |
+
f"{hunk.header}\n"
|
| 610 |
+
+ "\n".join(
|
| 611 |
+
f"{'+' if line.kind == 'add' else '-' if line.kind == 'del' else ' '} {line.text}"
|
| 612 |
+
for line in hunk.lines[:80]
|
| 613 |
+
)
|
| 614 |
+
for hunk in file.hunks[:4]
|
| 615 |
+
)
|
| 616 |
+
for file in files[:6]
|
| 617 |
+
)
|
| 618 |
+
deterministic = json.dumps([finding_to_dict(item) for item in findings[:15]], indent=2)
|
| 619 |
+
return f"Deterministic findings:\n{deterministic}\n\nDiff excerpt:\n{diff_excerpt}"[:max_chars]
|
| 620 |
+
|
| 621 |
+
|
| 622 |
+
def run_nemotron_router(
|
| 623 |
+
files: list[FileDiff],
|
| 624 |
+
findings: list[Finding],
|
| 625 |
+
enabled: bool,
|
| 626 |
+
token: str | None,
|
| 627 |
+
) -> str:
|
| 628 |
+
if not enabled:
|
| 629 |
+
return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
|
| 630 |
+
|
| 631 |
+
if not token:
|
| 632 |
+
return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
|
| 633 |
+
|
| 634 |
+
messages = [
|
| 635 |
+
{
|
| 636 |
+
"role": "system",
|
| 637 |
+
"content": (
|
| 638 |
+
"You are the DiffSense routing agent. Prioritize code review findings for a PR reviewer. "
|
| 639 |
+
"Return a concise markdown triage plan with: merge risk, files to inspect first, and follow-up tests."
|
| 640 |
+
),
|
| 641 |
+
},
|
| 642 |
+
{"role": "user", "content": compact_review_context(files, findings)},
|
| 643 |
+
]
|
| 644 |
+
try:
|
| 645 |
+
return call_chat_model(NEMOTRON_MODEL, messages, token, max_tokens=360)
|
| 646 |
+
except Exception as exc:
|
| 647 |
+
return (
|
| 648 |
+
f"Nemotron router attempted `{NEMOTRON_MODEL}` but the endpoint was unavailable: {exc}\n\n"
|
| 649 |
+
+ deterministic_router_fallback(files, findings)
|
| 650 |
+
)
|
| 651 |
+
|
| 652 |
+
|
| 653 |
+
def deterministic_router_fallback(files: list[FileDiff], findings: list[Finding]) -> str:
|
| 654 |
+
high_risk = [item for item in findings if item.severity == "critical"]
|
| 655 |
+
risk = "high" if high_risk else "medium" if findings else "low"
|
| 656 |
+
hot_files = []
|
| 657 |
+
for finding in findings:
|
| 658 |
+
if finding.file not in hot_files:
|
| 659 |
+
hot_files.append(finding.file)
|
| 660 |
+
bullets = [
|
| 661 |
+
f"Deterministic router fallback: merge risk is **{risk}**.",
|
| 662 |
+
f"Inspect first: {', '.join(hot_files[:4]) if hot_files else 'no risky files detected'}.",
|
| 663 |
+
"Follow-up tests: cover changed auth/security paths and empty-input branches before merge.",
|
| 664 |
+
]
|
| 665 |
+
return "\n".join(f"- {item}" for item in bullets)
|
| 666 |
+
|
| 667 |
+
|
| 668 |
+
def run_tiny_titan_checker(
|
| 669 |
+
files: list[FileDiff],
|
| 670 |
+
findings: list[Finding],
|
| 671 |
+
enabled: bool,
|
| 672 |
+
token: str | None,
|
| 673 |
+
) -> str:
|
| 674 |
+
if not enabled:
|
| 675 |
+
return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
|
| 676 |
+
|
| 677 |
+
if not token:
|
| 678 |
+
return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
|
| 679 |
+
|
| 680 |
+
messages = [
|
| 681 |
+
{
|
| 682 |
+
"role": "system",
|
| 683 |
+
"content": (
|
| 684 |
+
"You are a compact <=4B code-review sanity checker. Given deterministic PR findings, "
|
| 685 |
+
"return exactly three bullets: one missed-risk hypothesis, one test recommendation, and one merge decision."
|
| 686 |
+
),
|
| 687 |
+
},
|
| 688 |
+
{"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
|
| 689 |
+
]
|
| 690 |
+
try:
|
| 691 |
+
return call_chat_model(TINY_TITAN_MODEL, messages, token, max_tokens=260)
|
| 692 |
+
except Exception as exc:
|
| 693 |
+
return f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}` but the endpoint was unavailable: {exc}"
|
| 694 |
+
|
| 695 |
+
|
| 696 |
+
def run_minicpm_vision(
|
| 697 |
+
image_files: list[Any] | None,
|
| 698 |
+
files: list[FileDiff],
|
| 699 |
+
findings: list[Finding],
|
| 700 |
+
enabled: bool,
|
| 701 |
+
token: str | None,
|
| 702 |
+
) -> str:
|
| 703 |
+
images = normalize_uploaded_files(image_files)
|
| 704 |
+
if not images:
|
| 705 |
+
return f"MiniCPM-V vision not used: no PR screenshots or diagrams uploaded. Model configured: `{MINICPM_MODEL}`."
|
| 706 |
+
|
| 707 |
+
if not enabled:
|
| 708 |
+
return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
|
| 709 |
+
|
| 710 |
+
if not token:
|
| 711 |
+
return (
|
| 712 |
+
f"MiniCPM-V vision ready with {len(images)} image(s), but no Hugging Face OAuth/HF_TOKEN is available. "
|
| 713 |
+
f"Model configured: `{MINICPM_MODEL}`."
|
| 714 |
+
)
|
| 715 |
+
|
| 716 |
+
prompt = (
|
| 717 |
+
"You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
|
| 718 |
+
"Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
|
| 719 |
+
"or inconsistencies with the code diff.\n\n"
|
| 720 |
+
+ compact_review_context(files, findings, max_chars=3500)
|
| 721 |
+
)
|
| 722 |
+
content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
|
| 723 |
+
skipped = 0
|
| 724 |
+
for path in images[:3]:
|
| 725 |
+
data_url = image_to_data_url(path)
|
| 726 |
+
if data_url:
|
| 727 |
+
content.append({"type": "image_url", "image_url": {"url": data_url}})
|
| 728 |
+
else:
|
| 729 |
+
skipped += 1
|
| 730 |
+
|
| 731 |
+
if len(content) == 1:
|
| 732 |
+
return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
|
| 733 |
+
|
| 734 |
+
messages = [{"role": "user", "content": content}]
|
| 735 |
+
try:
|
| 736 |
+
return call_chat_model(MINICPM_MODEL, messages, token, max_tokens=420)
|
| 737 |
+
except Exception as exc:
|
| 738 |
+
return (
|
| 739 |
+
f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s), "
|
| 740 |
+
f"but the endpoint was unavailable: {exc}"
|
| 741 |
)
|
| 742 |
|
| 743 |
|
| 744 |
+
def normalize_uploaded_files(image_files: list[Any] | None) -> list[str]:
|
| 745 |
+
if not image_files:
|
| 746 |
+
return []
|
| 747 |
+
paths: list[str] = []
|
| 748 |
+
for file_obj in image_files:
|
| 749 |
+
if isinstance(file_obj, str):
|
| 750 |
+
paths.append(file_obj)
|
| 751 |
+
elif isinstance(file_obj, dict) and file_obj.get("path"):
|
| 752 |
+
paths.append(str(file_obj["path"]))
|
| 753 |
+
elif hasattr(file_obj, "name"):
|
| 754 |
+
paths.append(str(file_obj.name))
|
| 755 |
+
elif hasattr(file_obj, "path"):
|
| 756 |
+
paths.append(str(file_obj.path))
|
| 757 |
+
return [path for path in paths if Path(path).exists()]
|
| 758 |
+
|
| 759 |
+
|
| 760 |
+
def image_to_data_url(path: str) -> str | None:
|
| 761 |
+
file_path = Path(path)
|
| 762 |
+
if not file_path.exists() or file_path.stat().st_size > MAX_IMAGE_BYTES:
|
| 763 |
+
return None
|
| 764 |
+
|
| 765 |
+
mime_type, _ = mimetypes.guess_type(file_path.name)
|
| 766 |
+
if mime_type not in {"image/png", "image/jpeg", "image/webp"}:
|
| 767 |
+
return None
|
| 768 |
+
|
| 769 |
+
encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
|
| 770 |
+
return f"data:{mime_type};base64,{encoded}"
|
| 771 |
+
|
| 772 |
+
|
| 773 |
+
def run_modal_bridge(
|
| 774 |
+
files: list[FileDiff],
|
| 775 |
+
findings: list[Finding],
|
| 776 |
+
enabled: bool,
|
| 777 |
+
) -> str:
|
| 778 |
+
if not enabled:
|
| 779 |
+
return "Modal bridge disabled."
|
| 780 |
+
|
| 781 |
+
if not MODAL_ENDPOINT:
|
| 782 |
+
return "Modal bridge ready, but `DIFFSENSE_MODAL_ENDPOINT` is not configured as a Space secret."
|
| 783 |
+
|
| 784 |
+
payload = json.dumps(
|
| 785 |
+
{
|
| 786 |
+
"context": compact_review_context(files, findings, max_chars=12000),
|
| 787 |
+
"findings": [finding_to_dict(item) for item in findings],
|
| 788 |
+
"models": {
|
| 789 |
+
"mellum": MELLUM_MODEL,
|
| 790 |
+
"nemotron": NEMOTRON_MODEL,
|
| 791 |
+
"minicpm": MINICPM_MODEL,
|
| 792 |
+
},
|
| 793 |
+
}
|
| 794 |
+
).encode("utf-8")
|
| 795 |
+
request = Request(
|
| 796 |
+
MODAL_ENDPOINT,
|
| 797 |
+
data=payload,
|
| 798 |
+
headers={"Content-Type": "application/json", "User-Agent": "DiffSense/1.0"},
|
| 799 |
+
method="POST",
|
| 800 |
+
)
|
| 801 |
+
try:
|
| 802 |
+
with urlopen(request, timeout=20) as response:
|
| 803 |
+
body = response.read(20_000).decode("utf-8", errors="replace")
|
| 804 |
+
return f"Modal endpoint `{MODAL_ENDPOINT}` responded:\n\n```json\n{body}\n```"
|
| 805 |
+
except Exception as exc:
|
| 806 |
+
return f"Modal bridge attempted `{MODAL_ENDPOINT}` but failed: {exc}"
|
| 807 |
+
|
| 808 |
+
|
| 809 |
def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
|
| 810 |
hunk_count = sum(len(file.hunks) for file in files)
|
| 811 |
counts = {
|
|
|
|
| 905 |
def run_review(
|
| 906 |
diff_input: str,
|
| 907 |
use_model_summary: bool,
|
| 908 |
+
use_nemotron_router: bool,
|
| 909 |
+
use_tiny_titan: bool,
|
| 910 |
+
use_minicpm_vision: bool,
|
| 911 |
+
use_modal_bridge: bool,
|
| 912 |
+
image_files: list[Any] | None,
|
| 913 |
hf_token: gr.OAuthToken | None = None,
|
| 914 |
+
) -> tuple[str, list[dict[str, Any]], str, str]:
|
| 915 |
diff_text = normalize_diff(diff_input)
|
| 916 |
if not diff_text:
|
| 917 |
raise gr.Error("Paste a unified diff first, or load the sample diff.")
|
|
|
|
| 921 |
raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
|
| 922 |
|
| 923 |
findings = review_diff(files)
|
| 924 |
+
token = hf_token.token if hf_token else os.getenv("HF_TOKEN")
|
| 925 |
summary = summarize_with_model(files, findings, use_model_summary, hf_token)
|
| 926 |
+
nemotron_notes = run_nemotron_router(files, findings, use_nemotron_router, token)
|
| 927 |
+
tiny_titan_notes = run_tiny_titan_checker(files, findings, use_tiny_titan, token)
|
| 928 |
+
minicpm_notes = run_minicpm_vision(image_files, files, findings, use_minicpm_vision, token)
|
| 929 |
+
modal_notes = run_modal_bridge(files, findings, use_modal_bridge)
|
| 930 |
+
agent_trace = render_agent_trace(nemotron_notes, tiny_titan_notes, minicpm_notes, modal_notes)
|
| 931 |
+
return render_review(files, findings), [finding_to_dict(item) for item in findings], summary, agent_trace
|
| 932 |
+
|
| 933 |
+
|
| 934 |
+
def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
|
| 935 |
+
return "\n\n".join(
|
| 936 |
+
[
|
| 937 |
+
"### Nemotron 3 Nano Router",
|
| 938 |
+
nemotron_notes,
|
| 939 |
+
"### Tiny Titan 4B Checker",
|
| 940 |
+
tiny_titan_notes,
|
| 941 |
+
"### MiniCPM-V 4.6 Vision Context",
|
| 942 |
+
minicpm_notes,
|
| 943 |
+
"### Modal Provider Bridge",
|
| 944 |
+
modal_notes,
|
| 945 |
+
]
|
| 946 |
+
)
|
| 947 |
|
| 948 |
|
| 949 |
def load_sample() -> str:
|
|
|
|
| 970 |
label="Add optional Mellum model summary",
|
| 971 |
info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
|
| 972 |
)
|
| 973 |
+
use_nemotron_router = gr.Checkbox(
|
| 974 |
+
value=False,
|
| 975 |
+
label="Run Nemotron 3 Nano router",
|
| 976 |
+
info=f"Uses {NEMOTRON_MODEL} when OAuth/HF_TOKEN is available.",
|
| 977 |
+
)
|
| 978 |
+
use_tiny_titan = gr.Checkbox(
|
| 979 |
+
value=False,
|
| 980 |
+
label="Run Tiny Titan 4B checker",
|
| 981 |
+
info=f"Uses {TINY_TITAN_MODEL} when OAuth/HF_TOKEN is available.",
|
| 982 |
+
)
|
| 983 |
+
use_minicpm_vision = gr.Checkbox(
|
| 984 |
+
value=False,
|
| 985 |
+
label="Run MiniCPM-V 4.6 vision",
|
| 986 |
+
info=f"Uses {MINICPM_MODEL} on uploaded PR images.",
|
| 987 |
+
)
|
| 988 |
+
use_modal_bridge = gr.Checkbox(
|
| 989 |
+
value=False,
|
| 990 |
+
label="Send payload to Modal bridge",
|
| 991 |
+
info="Uses DIFFSENSE_MODAL_ENDPOINT when configured.",
|
| 992 |
+
)
|
| 993 |
sample_btn = gr.Button("Load sample diff")
|
| 994 |
|
| 995 |
with gr.Row(equal_height=False):
|
|
|
|
| 1002 |
placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
|
| 1003 |
interactive=True,
|
| 1004 |
)
|
| 1005 |
+
image_files = gr.File(
|
| 1006 |
+
label="PR screenshots or diagrams for MiniCPM-V",
|
| 1007 |
+
file_count="multiple",
|
| 1008 |
+
file_types=["image"],
|
| 1009 |
+
)
|
| 1010 |
run_btn = gr.Button("Review diff", variant="primary")
|
| 1011 |
with gr.Column(scale=4):
|
| 1012 |
summary_output = gr.Markdown(
|
| 1013 |
value="Run a review to get the risk summary.",
|
| 1014 |
label="Reviewer summary",
|
| 1015 |
)
|
| 1016 |
+
agent_output = gr.Markdown(
|
| 1017 |
+
value="Enable Nemotron or MiniCPM-V to see model-agent traces here.",
|
| 1018 |
+
label="Model agent trace",
|
| 1019 |
+
)
|
| 1020 |
json_output = gr.JSON(label="Structured findings")
|
| 1021 |
|
| 1022 |
review_output = gr.HTML(
|
|
|
|
| 1027 |
sample_btn.click(fn=load_sample, outputs=diff_input)
|
| 1028 |
run_btn.click(
|
| 1029 |
fn=run_review,
|
| 1030 |
+
inputs=[
|
| 1031 |
+
diff_input,
|
| 1032 |
+
use_model_summary,
|
| 1033 |
+
use_nemotron_router,
|
| 1034 |
+
use_tiny_titan,
|
| 1035 |
+
use_minicpm_vision,
|
| 1036 |
+
use_modal_bridge,
|
| 1037 |
+
image_files,
|
| 1038 |
+
],
|
| 1039 |
+
outputs=[review_output, json_output, summary_output, agent_output],
|
| 1040 |
)
|
| 1041 |
|
| 1042 |
|
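The rule-based routing fallback added in this change can be exercised without Gradio or any model endpoint. The following is a simplified sketch, not the function from `app.py`: the `Finding` dataclass here is a two-field stand-in (the real one is not shown in full in the diff), and `router_fallback` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Stand-in with only the two fields this sketch needs.
    file: str
    severity: str


def router_fallback(findings: list[Finding]) -> str:
    """Derive a coarse merge risk and an ordered list of files to inspect first."""
    has_critical = any(item.severity == "critical" for item in findings)
    risk = "high" if has_critical else "medium" if findings else "low"
    # dict.fromkeys removes duplicates while keeping first-seen order.
    hot_files = list(dict.fromkeys(item.file for item in findings))
    inspect = ", ".join(hot_files[:4]) if hot_files else "no risky files detected"
    return "\n".join(
        [
            f"- Merge risk is **{risk}**.",
            f"- Inspect first: {inspect}.",
        ]
    )


if __name__ == "__main__":
    sample = [
        Finding("auth.py", "critical"),
        Finding("auth.py", "warning"),
        Finding("utils.py", "info"),
    ]
    print(router_fallback(sample))
```

Keeping this path free of network calls is what lets the trace panel show a usable triage note when the hosted router is unavailable.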