avaliev Codex committed on
Commit 2616e64 · 1 Parent(s): a5cb79c

Integrate sponsor model agents

Co-authored-by: Codex <codex@openai.com>

Files changed (3)
  1. README.md +20 -4
  2. TECH_DESIGN.md +49 -5
  3. app.py +299 -14
README.md CHANGED
@@ -22,15 +22,22 @@ tags:
22
  - best-agent
23
  - off-brand
24
  - best-demo
 
 
 
 
25
  models:
26
  - JetBrains/Mellum-2-12B-instruct
 
 
 
27
  ---
28
 
29
  # DiffSense
30
 
31
  Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
32
 
33
- Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds a small-model summary through Hugging Face OAuth.
34
 
35
  ## Why We Built It
36
 
@@ -44,6 +51,10 @@ DiffSense is the small-model version of that workflow: useful immediately, inspe
44
  - Inline custom diff viewer built in Gradio.
45
  - Deterministic review findings for security, logic, maintainability, and test risks.
46
  - Public GitHub PR URL fetching through the PR `.diff` endpoint.
 
 
 
 
47
  - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
48
  - Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
49
 
@@ -57,6 +68,10 @@ Prize/badge targets:
57
  - Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
58
  - Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
59
  - Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
 
 
 
 
60
 
61
  ## Planned Model Stack
62
 
@@ -66,9 +81,10 @@ All planned models are under the Build Small 32B parameter cap.
66
  | --- | --- | --- |
67
  | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
68
  | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
69
- | Agentic routing | NVIDIA Nemotron 3 Nano | Planned extension, not submitted as current eligibility |
70
- | Visual PR context | OpenBMB MiniCPM-V 4.6 | Planned extension, not submitted as current eligibility |
71
- | Runtime | Modal | Planned extension, not submitted as current eligibility |
 
72
 
73
  The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
74
 
 
22
  - best-agent
23
  - off-brand
24
  - best-demo
25
+ - best-minicpm-build
26
+ - nemotron-hardware-prize
27
+ - best-use-of-modal
28
+ - tiny-titan
29
  models:
30
  - JetBrains/Mellum-2-12B-instruct
31
+ - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
32
+ - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
33
+ - openbmb/MiniCPM-V-4.6
34
  ---
35
 
36
  # DiffSense
37
 
38
  Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
39
 
40
+ Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds Mellum, Nemotron, MiniCPM-V, and Modal provider passes when credentials or endpoints are available.
41
 
42
  ## Why We Built It
43
 
 
51
  - Inline custom diff viewer built in Gradio.
52
  - Deterministic review findings for security, logic, maintainability, and test risks.
53
  - Public GitHub PR URL fetching through the PR `.diff` endpoint.
54
+ - Optional Nemotron 3 Nano routing/triage pass.
55
+ - Optional Tiny Titan 4B checker pass.
56
+ - Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
57
+ - Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
58
  - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
59
  - Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
60
 
 
68
  - Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
69
  - Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
70
  - Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
71
+ - Best MiniCPM Build: MiniCPM-V 4.6 is integrated for optional image/diagram context.
72
+ - Nemotron Hardware Prize: Nemotron 3 Nano is integrated for optional agentic routing.
73
+ - Best Use of Modal: the app includes a provider bridge for a Modal-hosted review endpoint via `DIFFSENSE_MODAL_ENDPOINT`.
74
+ - Tiny Titan: a <=4B Nemotron 3 Nano checker is integrated as a separate optional pass.
75
 
76
  ## Planned Model Stack
77
 
 
81
  | --- | --- | --- |
82
  | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
83
  | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
84
+ | Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook implemented |
85
+ | Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook implemented |
86
+ | Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + HF inference hook implemented |
87
+ | Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
88
 
89
  The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
90
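The deterministic fallback described in the README can be pictured as a small wrapper around any hosted-model call. The following is a minimal sketch, not the code in `app.py`: `with_fallback`, `flaky_model`, and `rule_based_summary` are hypothetical names introduced only for illustration, and the summary text is made up.

```python
from typing import Callable


def with_fallback(primary: Callable[[], str], fallback: Callable[[str], str]) -> str:
    """Return primary() when it succeeds, otherwise fallback(reason)."""
    try:
        result = primary()
    except Exception as exc:  # cold, rate-limited, or unreachable endpoint
        return fallback(f"model unavailable: {exc}")
    # Treat an empty reply like a failure so the UI always has text to show.
    return result or fallback("model returned an empty response")


def flaky_model() -> str:
    # Hypothetical stand-in for a hosted model call that times out.
    raise TimeoutError("endpoint is cold")


def rule_based_summary(reason: str) -> str:
    # Hypothetical stand-in for the deterministic summary.
    return f"Deterministic summary ({reason}): 2 findings, 1 critical."


summary = with_fallback(flaky_model, rule_based_summary)
print(summary)
```

Keeping the failure reason in the fallback text mirrors how the diff reports the exception message alongside the deterministic result instead of hiding it.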
 
TECH_DESIGN.md CHANGED
@@ -15,6 +15,10 @@ Unified diff input or public GitHub PR URL
15
  -> structured findings
16
  -> custom Gradio HTML diff viewer
17
  -> optional Mellum 2 summary via HF OAuth
 
 
 
 
18
  ```
19
 
20
  ## Components
@@ -28,6 +32,8 @@ File: `app.py`
28
  - Accepts pasted unified diffs and public GitHub PR URLs.
29
  - Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
30
  - Shows structured JSON for automation and judge inspection.
 
 
31
 
32
  ### Diff Parser
33
 
@@ -81,11 +87,45 @@ JetBrains/Mellum-2-12B-instruct
81
 
82
  The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
83
 
84
  ## Hackathon Fit
85
 
86
  Required criteria:
87
 
88
- - Under 32B: current optional model target is 12B; planned sponsor models are also under 32B.
89
  - Gradio app: implemented in `app.py`.
90
  - README tags: included in `README.md` front matter.
91
  - Demo-friendly: built-in sample diff produces multiple clear findings without setup.
@@ -97,15 +137,18 @@ Prize positioning:
97
  - Best Agent: staged review pipeline with parsing, classification, review, and summary.
98
  - Off Brand: custom HTML diff UI instead of stock chat.
99
  - Best Demo: one-click sample with visible before/after review value.
 
 
 
 
100
 
101
  ## Planned Extensions
102
 
103
  These should only be added after the current app is deployed and recorded:
104
 
105
- 1. Add Modal endpoint for open-weight Mellum inference.
106
- 2. Add MiniCPM-V image upload for PR screenshots and architecture diagrams.
107
- 3. Add Nemotron router only if there is enough time to make it real and visible.
108
- 4. Generate patch suggestions as downloadable `.patch` files.
109
 
110
  ## Risk Controls
111
 
@@ -114,3 +157,4 @@ These should only be added after the current app is deployed and recorded:
114
  - No pasted diff is sent externally unless the user explicitly enables the model summary.
115
  - Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
116
  - The sample diff demonstrates value even during GPU/API outages.
 
 
15
  -> structured findings
16
  -> custom Gradio HTML diff viewer
17
  -> optional Mellum 2 summary via HF OAuth
18
+ -> optional Nemotron 3 Nano routing via HF OAuth
19
+ -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
20
+ -> optional MiniCPM-V 4.6 vision notes via HF OAuth
21
+ -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
22
  ```
23
 
24
  ## Components
 
32
  - Accepts pasted unified diffs and public GitHub PR URLs.
33
  - Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
34
  - Shows structured JSON for automation and judge inspection.
35
+ - Exposes model/provider toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
36
+ - Accepts PR screenshots or diagrams for the MiniCPM-V vision pass.
37
 
38
  ### Diff Parser
39
 
 
87
 
88
  The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
89
 
90
+ ### Optional Nemotron Router
91
+
92
+ When enabled, the app calls:
93
+
94
+ ```text
95
+ nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
96
+ ```
97
+
98
+ Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
99
+
100
+ ### Optional Tiny Titan Checker
101
+
102
+ When enabled, the app calls a <=4B model:
103
+
104
+ ```text
105
+ nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
106
+ ```
107
+
108
+ This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
109
+
110
+ ### Optional MiniCPM-V Vision Pass
111
+
112
+ When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
113
+
114
+ ```text
115
+ openbmb/MiniCPM-V-4.6
116
+ ```
117
+
118
+ This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
119
+
120
+ ### Optional Modal Bridge
121
+
122
+ When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
123
+
124
  ## Hackathon Fit
125
 
126
  Required criteria:
127
 
128
+ - Under 32B: Mellum, Nemotron 3 Nano 30B-A3B, Nemotron 3 Nano 4B, and MiniCPM-V 4.6 are all within the hackathon model-size constraint.
129
  - Gradio app: implemented in `app.py`.
130
  - README tags: included in `README.md` front matter.
131
  - Demo-friendly: built-in sample diff produces multiple clear findings without setup.
 
137
  - Best Agent: staged review pipeline with parsing, classification, review, and summary.
138
  - Off Brand: custom HTML diff UI instead of stock chat.
139
  - Best Demo: one-click sample with visible before/after review value.
140
+ - Best MiniCPM Build: MiniCPM-V 4.6 image/diagram context path is implemented.
141
+ - Nemotron Hardware Prize: Nemotron 3 Nano routing path is implemented.
142
+ - Best Use of Modal: Modal endpoint bridge is implemented and controlled through a Space secret.
143
+ - Tiny Titan: Nemotron 3 Nano 4B checker path is implemented.
144
 
145
  ## Planned Extensions
146
 
147
  These should only be added after the current app is deployed and recorded:
148
 
149
+ 1. Add a hosted Modal endpoint and set `DIFFSENSE_MODAL_ENDPOINT`.
150
+ 2. Add downloadable `.patch` files for suggested fixes.
151
+ 3. Add richer multimodal demo assets for the MiniCPM-V path.
 
152
 
153
  ## Risk Controls
154
 
 
157
  - No pasted diff is sent externally unless the user explicitly enables the model summary.
158
  - Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
159
  - The sample diff demonstrates value even during GPU/API outages.
160
+ - Model/provider failures are rendered as agent trace notes rather than hard app failures.
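The Modal bridge described in TECH_DESIGN.md can be exercised locally before a real endpoint exists. The sketch below assumes nothing about Modal itself: it starts a throwaway standard-library HTTP server on localhost and POSTs a payload with the same `context` and `findings` keys the bridge sends. `StubReviewHandler`, the `/review` path, and the `received_findings` reply field are hypothetical and chosen only for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class StubReviewHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a hosted review endpoint."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = json.dumps({"received_findings": len(payload.get("findings", []))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), StubReviewHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
endpoint = f"http://127.0.0.1:{server.server_port}/review"

body = json.dumps(
    {"context": "Diff excerpt: (omitted)", "findings": [{"severity": "critical"}]}
).encode("utf-8")
request = Request(endpoint, data=body, headers={"Content-Type": "application/json"}, method="POST")
opener = build_opener(ProxyHandler({}))  # ignore any system proxy for the localhost call
with opener.open(request, timeout=5) as response:
    result = json.loads(response.read(20_000).decode("utf-8"))

server.shutdown()
server.server_close()
print(result)
```

Pointing `DIFFSENSE_MODAL_ENDPOINT` at a stub like this would let the bridge toggle be demonstrated without network access; the real endpoint's response format is not specified in this commit.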
app.py CHANGED
@@ -2,9 +2,12 @@ from __future__ import annotations
2
 
3
  import html
4
  import json
 
 
5
  import os
6
  import re
7
  from dataclasses import dataclass, field
 
8
  from typing import Any
9
  from urllib.parse import urlparse
10
  from urllib.request import Request, urlopen
@@ -13,8 +16,13 @@ import gradio as gr
13
  from huggingface_hub import InferenceClient
14
 
15
 
16
- DEFAULT_MODEL = os.getenv("DIFFSENSE_MODEL", "JetBrains/Mellum-2-12B-instruct")
 
 
 
 
17
  FETCH_TIMEOUT_SECONDS = 10
 
18
 
19
 
20
  CSS = """
@@ -568,22 +576,236 @@ def summarize_with_model(
568
  ]
569
 
570
  try:
571
- client = InferenceClient(token=token, model=DEFAULT_MODEL)
572
- response = client.chat_completion(
573
- messages=messages,
574
- max_tokens=320,
575
- temperature=0.2,
576
- top_p=0.9,
577
- )
578
- return response.choices[0].message.content or "Model returned an empty summary."
579
  except Exception as exc: # The app must stay demoable when endpoints are unavailable.
580
  return summarize_deterministic(
581
  files,
582
  findings,
583
- prefix=f"Model summary unavailable from {DEFAULT_MODEL}: {exc}",
584
  )
585
 
586
 
587
  def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
588
  hunk_count = sum(len(file.hunks) for file in files)
589
  counts = {
@@ -683,8 +905,13 @@ def render_finding(finding: Finding) -> str:
683
  def run_review(
684
  diff_input: str,
685
  use_model_summary: bool,
 
 
 
 
 
686
  hf_token: gr.OAuthToken | None = None,
687
- ) -> tuple[str, list[dict[str, Any]], str]:
688
  diff_text = normalize_diff(diff_input)
689
  if not diff_text:
690
  raise gr.Error("Paste a unified diff first, or load the sample diff.")
@@ -694,8 +921,29 @@ def run_review(
694
  raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
695
 
696
  findings = review_diff(files)
 
697
  summary = summarize_with_model(files, findings, use_model_summary, hf_token)
698
- return render_review(files, findings), [finding_to_dict(item) for item in findings], summary
699
 
700
 
701
  def load_sample() -> str:
@@ -722,6 +970,26 @@ with gr.Blocks() as demo:
722
  label="Add optional Mellum model summary",
723
  info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
724
  )
725
  sample_btn = gr.Button("Load sample diff")
726
 
727
  with gr.Row(equal_height=False):
@@ -734,12 +1002,21 @@ with gr.Blocks() as demo:
734
  placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
735
  interactive=True,
736
  )
 
 
 
 
 
737
  run_btn = gr.Button("Review diff", variant="primary")
738
  with gr.Column(scale=4):
739
  summary_output = gr.Markdown(
740
  value="Run a review to get the risk summary.",
741
  label="Reviewer summary",
742
  )
 
 
 
 
743
  json_output = gr.JSON(label="Structured findings")
744
 
745
  review_output = gr.HTML(
@@ -750,8 +1027,16 @@ with gr.Blocks() as demo:
750
  sample_btn.click(fn=load_sample, outputs=diff_input)
751
  run_btn.click(
752
  fn=run_review,
753
- inputs=[diff_input, use_model_summary],
754
- outputs=[review_output, json_output, summary_output],
755
  )
756
 
757
 
 
2
 
3
  import html
4
  import json
5
+ import base64
6
+ import mimetypes
7
  import os
8
  import re
9
  from dataclasses import dataclass, field
10
+ from pathlib import Path
11
  from typing import Any
12
  from urllib.parse import urlparse
13
  from urllib.request import Request, urlopen
 
16
  from huggingface_hub import InferenceClient
17
 
18
 
19
+ MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct")
20
+ NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
21
+ TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
22
+ MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
23
+ MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
24
  FETCH_TIMEOUT_SECONDS = 10
25
+ MAX_IMAGE_BYTES = 2_500_000
26
 
27
 
28
  CSS = """
 
576
  ]
577
 
578
  try:
579
+ return call_chat_model(MELLUM_MODEL, messages, token, max_tokens=320)
 
 
 
 
 
 
 
580
  except Exception as exc: # The app must stay demoable when endpoints are unavailable.
581
  return summarize_deterministic(
582
  files,
583
  findings,
584
+ prefix=f"Model summary unavailable from {MELLUM_MODEL}: {exc}",
585
+ )
586
+
587
+
588
+ def call_chat_model(
589
+ model: str,
590
+ messages: list[dict[str, Any]],
591
+ token: str,
592
+ max_tokens: int = 320,
593
+ temperature: float = 0.2,
594
+ ) -> str:
595
+ client = InferenceClient(token=token, model=model)
596
+ response = client.chat_completion(
597
+ messages=messages,
598
+ max_tokens=max_tokens,
599
+ temperature=temperature,
600
+ top_p=0.9,
601
+ )
602
+ return response.choices[0].message.content or f"{model} returned an empty response."
603
+
604
+
605
+ def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
606
+ diff_excerpt = "\n".join(
607
+ f"{file.path}\n"
608
+ + "\n".join(
609
+ f"{hunk.header}\n"
610
+ + "\n".join(
611
+ f"{'+' if line.kind == 'add' else '-' if line.kind == 'del' else ' '} {line.text}"
612
+ for line in hunk.lines[:80]
613
+ )
614
+ for hunk in file.hunks[:4]
615
+ )
616
+ for file in files[:6]
617
+ )
618
+ deterministic = json.dumps([finding_to_dict(item) for item in findings[:15]], indent=2)
619
+ return f"Deterministic findings:\n{deterministic}\n\nDiff excerpt:\n{diff_excerpt}"[:max_chars]
620
+
621
+
622
+ def run_nemotron_router(
623
+ files: list[FileDiff],
624
+ findings: list[Finding],
625
+ enabled: bool,
626
+ token: str | None,
627
+ ) -> str:
628
+ if not enabled:
629
+ return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
630
+
631
+ if not token:
632
+ return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
633
+
634
+ messages = [
635
+ {
636
+ "role": "system",
637
+ "content": (
638
+ "You are the DiffSense routing agent. Prioritize code review findings for a PR reviewer. "
639
+ "Return a concise markdown triage plan with: merge risk, files to inspect first, and follow-up tests."
640
+ ),
641
+ },
642
+ {"role": "user", "content": compact_review_context(files, findings)},
643
+ ]
644
+ try:
645
+ return call_chat_model(NEMOTRON_MODEL, messages, token, max_tokens=360)
646
+ except Exception as exc:
647
+ return (
648
+ f"Nemotron router attempted `{NEMOTRON_MODEL}` but the endpoint was unavailable: {exc}\n\n"
649
+ + deterministic_router_fallback(files, findings)
650
+ )
651
+
652
+
653
+ def deterministic_router_fallback(files: list[FileDiff], findings: list[Finding]) -> str:
654
+ high_risk = [item for item in findings if item.severity == "critical"]
655
+ risk = "high" if high_risk else "medium" if findings else "low"
656
+ hot_files = []
657
+ for finding in findings:
658
+ if finding.file not in hot_files:
659
+ hot_files.append(finding.file)
660
+ bullets = [
661
+ f"Deterministic router fallback: merge risk is **{risk}**.",
662
+ f"Inspect first: {', '.join(hot_files[:4]) if hot_files else 'no risky files detected'}.",
663
+ "Follow-up tests: cover changed auth/security paths and empty-input branches before merge.",
664
+ ]
665
+ return "\n".join(f"- {item}" for item in bullets)
666
+
667
+
668
+ def run_tiny_titan_checker(
669
+ files: list[FileDiff],
670
+ findings: list[Finding],
671
+ enabled: bool,
672
+ token: str | None,
673
+ ) -> str:
674
+ if not enabled:
675
+ return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
676
+
677
+ if not token:
678
+ return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
679
+
680
+ messages = [
681
+ {
682
+ "role": "system",
683
+ "content": (
684
+ "You are a compact <=4B code-review sanity checker. Given deterministic PR findings, "
685
+ "return exactly three bullets: one missed-risk hypothesis, one test recommendation, and one merge decision."
686
+ ),
687
+ },
688
+ {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
689
+ ]
690
+ try:
691
+ return call_chat_model(TINY_TITAN_MODEL, messages, token, max_tokens=260)
692
+ except Exception as exc:
693
+ return f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}` but the endpoint was unavailable: {exc}"
694
+
695
+
696
+ def run_minicpm_vision(
697
+ image_files: list[Any] | None,
698
+ files: list[FileDiff],
699
+ findings: list[Finding],
700
+ enabled: bool,
701
+ token: str | None,
702
+ ) -> str:
703
+ images = normalize_uploaded_files(image_files)
704
+ if not images:
705
+ return f"MiniCPM-V vision not used: no PR screenshots or diagrams uploaded. Model configured: `{MINICPM_MODEL}`."
706
+
707
+ if not enabled:
708
+ return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
709
+
710
+ if not token:
711
+ return (
712
+ f"MiniCPM-V vision ready with {len(images)} image(s), but no Hugging Face OAuth/HF_TOKEN is available. "
713
+ f"Model configured: `{MINICPM_MODEL}`."
714
+ )
715
+
716
+ prompt = (
717
+ "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
718
+ "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
719
+ "or inconsistencies with the code diff.\n\n"
720
+ + compact_review_context(files, findings, max_chars=3500)
721
+ )
722
+ content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
723
+ skipped = 0
724
+ for path in images[:3]:
725
+ data_url = image_to_data_url(path)
726
+ if data_url:
727
+ content.append({"type": "image_url", "image_url": {"url": data_url}})
728
+ else:
729
+ skipped += 1
730
+
731
+ if len(content) == 1:
732
+ return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
733
+
734
+ messages = [{"role": "user", "content": content}]
735
+ try:
736
+ return call_chat_model(MINICPM_MODEL, messages, token, max_tokens=420)
737
+ except Exception as exc:
738
+ return (
739
+ f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s), "
740
+ f"but the endpoint was unavailable: {exc}"
741
  )
742
 
743
 
744
+ def normalize_uploaded_files(image_files: list[Any] | None) -> list[str]:
745
+ if not image_files:
746
+ return []
747
+ paths: list[str] = []
748
+ for file_obj in image_files:
749
+ if isinstance(file_obj, str):
750
+ paths.append(file_obj)
751
+ elif isinstance(file_obj, dict) and file_obj.get("path"):
752
+ paths.append(str(file_obj["path"]))
753
+ elif hasattr(file_obj, "name"):
754
+ paths.append(str(file_obj.name))
755
+ elif hasattr(file_obj, "path"):
756
+ paths.append(str(file_obj.path))
757
+ return [path for path in paths if Path(path).exists()]
758
+
759
+
760
+ def image_to_data_url(path: str) -> str | None:
761
+ file_path = Path(path)
762
+ if not file_path.exists() or file_path.stat().st_size > MAX_IMAGE_BYTES:
763
+ return None
764
+
765
+ mime_type, _ = mimetypes.guess_type(file_path.name)
766
+ if mime_type not in {"image/png", "image/jpeg", "image/webp"}:
767
+ return None
768
+
769
+ encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
770
+ return f"data:{mime_type};base64,{encoded}"
771
+
772
+
773
+ def run_modal_bridge(
774
+ files: list[FileDiff],
775
+ findings: list[Finding],
776
+ enabled: bool,
777
+ ) -> str:
778
+ if not enabled:
779
+ return "Modal bridge disabled."
780
+
781
+ if not MODAL_ENDPOINT:
782
+ return "Modal bridge ready, but `DIFFSENSE_MODAL_ENDPOINT` is not configured as a Space secret."
783
+
784
+ payload = json.dumps(
785
+ {
786
+ "context": compact_review_context(files, findings, max_chars=12000),
787
+ "findings": [finding_to_dict(item) for item in findings],
788
+ "models": {
789
+ "mellum": MELLUM_MODEL,
790
+ "nemotron": NEMOTRON_MODEL,
791
+ "minicpm": MINICPM_MODEL,
792
+ },
793
+ }
794
+ ).encode("utf-8")
795
+ request = Request(
796
+ MODAL_ENDPOINT,
797
+ data=payload,
798
+ headers={"Content-Type": "application/json", "User-Agent": "DiffSense/1.0"},
799
+ method="POST",
800
+ )
801
+ try:
802
+ with urlopen(request, timeout=20) as response:
803
+ body = response.read(20_000).decode("utf-8", errors="replace")
804
+ return f"Modal endpoint `{MODAL_ENDPOINT}` responded:\n\n```json\n{body}\n```"
805
+ except Exception as exc:
806
+ return f"Modal bridge attempted `{MODAL_ENDPOINT}` but failed: {exc}"
807
+
808
+
809
  def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
810
  hunk_count = sum(len(file.hunks) for file in files)
811
  counts = {
 
905
  def run_review(
906
  diff_input: str,
907
  use_model_summary: bool,
908
+ use_nemotron_router: bool,
909
+ use_tiny_titan: bool,
910
+ use_minicpm_vision: bool,
911
+ use_modal_bridge: bool,
912
+ image_files: list[Any] | None,
913
  hf_token: gr.OAuthToken | None = None,
914
+ ) -> tuple[str, list[dict[str, Any]], str, str]:
915
  diff_text = normalize_diff(diff_input)
916
  if not diff_text:
917
  raise gr.Error("Paste a unified diff first, or load the sample diff.")
 
921
  raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
922
 
923
  findings = review_diff(files)
924
+ token = hf_token.token if hf_token else os.getenv("HF_TOKEN")
925
  summary = summarize_with_model(files, findings, use_model_summary, hf_token)
926
+ nemotron_notes = run_nemotron_router(files, findings, use_nemotron_router, token)
927
+ tiny_titan_notes = run_tiny_titan_checker(files, findings, use_tiny_titan, token)
928
+ minicpm_notes = run_minicpm_vision(image_files, files, findings, use_minicpm_vision, token)
929
+ modal_notes = run_modal_bridge(files, findings, use_modal_bridge)
930
+ agent_trace = render_agent_trace(nemotron_notes, tiny_titan_notes, minicpm_notes, modal_notes)
931
+ return render_review(files, findings), [finding_to_dict(item) for item in findings], summary, agent_trace
932
+
933
+
934
+ def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
935
+ return "\n\n".join(
936
+ [
937
+ "### Nemotron 3 Nano Router",
938
+ nemotron_notes,
939
+ "### Tiny Titan 4B Checker",
940
+ tiny_titan_notes,
941
+ "### MiniCPM-V 4.6 Vision Context",
942
+ minicpm_notes,
943
+ "### Modal Provider Bridge",
944
+ modal_notes,
945
+ ]
946
+ )
947
 
948
 
949
  def load_sample() -> str:
 
970
  label="Add optional Mellum model summary",
971
  info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
972
  )
973
+ use_nemotron_router = gr.Checkbox(
974
+ value=False,
975
+ label="Run Nemotron 3 Nano router",
976
+ info=f"Uses {NEMOTRON_MODEL} when OAuth/HF_TOKEN is available.",
977
+ )
978
+ use_tiny_titan = gr.Checkbox(
979
+ value=False,
980
+ label="Run Tiny Titan 4B checker",
981
+ info=f"Uses {TINY_TITAN_MODEL} when OAuth/HF_TOKEN is available.",
982
+ )
983
+ use_minicpm_vision = gr.Checkbox(
984
+ value=False,
985
+ label="Run MiniCPM-V 4.6 vision",
986
+ info=f"Uses {MINICPM_MODEL} on uploaded PR images.",
987
+ )
988
+ use_modal_bridge = gr.Checkbox(
989
+ value=False,
990
+ label="Send payload to Modal bridge",
991
+ info="Uses DIFFSENSE_MODAL_ENDPOINT when configured.",
992
+ )
993
  sample_btn = gr.Button("Load sample diff")
994
 
995
  with gr.Row(equal_height=False):
 
1002
  placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
1003
  interactive=True,
1004
  )
1005
+ image_files = gr.File(
1006
+ label="PR screenshots or diagrams for MiniCPM-V",
1007
+ file_count="multiple",
1008
+ file_types=["image"],
1009
+ )
1010
  run_btn = gr.Button("Review diff", variant="primary")
1011
  with gr.Column(scale=4):
1012
  summary_output = gr.Markdown(
1013
  value="Run a review to get the risk summary.",
1014
  label="Reviewer summary",
1015
  )
1016
+ agent_output = gr.Markdown(
1017
+ value="Enable Nemotron or MiniCPM-V to see model-agent traces here.",
1018
+ label="Model agent trace",
1019
+ )
1020
  json_output = gr.JSON(label="Structured findings")
1021
 
1022
  review_output = gr.HTML(
 
1027
  sample_btn.click(fn=load_sample, outputs=diff_input)
1028
  run_btn.click(
1029
  fn=run_review,
1030
+ inputs=[
1031
+ diff_input,
1032
+ use_model_summary,
1033
+ use_nemotron_router,
1034
+ use_tiny_titan,
1035
+ use_minicpm_vision,
1036
+ use_modal_bridge,
1037
+ image_files,
1038
+ ],
1039
+ outputs=[review_output, json_output, summary_output, agent_output],
1040
  )
1041
 
1042
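To see how the size and MIME-type gate in `image_to_data_url` behaves, the standalone sketch below re-creates the same checks and runs them against temporary files. The limit and the allowlist mirror the diff; the function name `to_data_url`, the file names, and the byte payloads are illustrative assumptions, and the files are not real images.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path
from typing import Optional

MAX_IMAGE_BYTES = 2_500_000  # same limit as in the diff
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}


def to_data_url(path: Path) -> Optional[str]:
    """Return a base64 data URL, or None if the file is missing, too large, or not an allowed type."""
    if not path.exists() or path.stat().st_size > MAX_IMAGE_BYTES:
        return None
    mime_type, _ = mimetypes.guess_type(path.name)  # guessed from the file name only
    if mime_type not in ALLOWED_MIME_TYPES:
        return None
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


with tempfile.TemporaryDirectory() as tmp:
    small_png = Path(tmp) / "screenshot.png"
    small_png.write_bytes(b"\x89PNG demo bytes")  # placeholder bytes, not a valid PNG
    gif = Path(tmp) / "animation.gif"
    gif.write_bytes(b"GIF89a")
    huge_png = Path(tmp) / "huge.png"
    huge_png.write_bytes(b"\0" * (MAX_IMAGE_BYTES + 1))

    accepted = to_data_url(small_png)      # allowed type, under the limit
    rejected_type = to_data_url(gif)       # image/gif is not in the allowlist
    rejected_size = to_data_url(huge_png)  # one byte over the limit

print(accepted is not None, rejected_type, rejected_size)
```

Because `mimetypes.guess_type` looks only at the file name, this gate accepts any file with an allowed extension. A stricter variant could inspect the leading bytes before encoding, though nothing in this commit indicates that is planned.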