build-small-hackathon/DiffSense


avaliev Codex committed 14 days ago

Commit 2616e64

1 Parent(s): a5cb79c

Integrate sponsor model agents

Co-authored-by: Codex <codex@openai.com>

Files changed (3)

README.md +20 -4
TECH_DESIGN.md +49 -5
app.py +299 -14

README.md CHANGED

@@ -22,15 +22,22 @@ tags:
   - best-agent
   - off-brand
   - best-demo
 models:
   - JetBrains/Mellum-2-12B-instruct
 ---
 # DiffSense
 Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
-Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds a small-model summary through Hugging Face OAuth.
 ## Why We Built It
@@ -44,6 +51,10 @@ DiffSense is the small-model version of that workflow: useful immediately, inspe
 - Inline custom diff viewer built in Gradio.
 - Deterministic review findings for security, logic, maintainability, and test risks.
 - Public GitHub PR URL fetching through the PR `.diff` endpoint.
 - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
 - Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
@@ -57,6 +68,10 @@ Prize/badge targets:
 - Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
 - Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
 - Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
 ## Planned Model Stack
@@ -66,9 +81,10 @@ All planned models are under the Build Small 32B parameter cap.
 | --- | --- | --- |
 | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
 | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
-| Agentic routing | NVIDIA Nemotron 3 Nano | Planned extension, not submitted as current eligibility |
-| Visual PR context | OpenBMB MiniCPM-V 4.6 | Planned extension, not submitted as current eligibility |
-| Runtime | Modal | Planned extension, not submitted as current eligibility |
 The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.

   - best-agent
   - off-brand
   - best-demo
+  - best-minicpm-build
+  - nemotron-hardware-prize
+  - best-use-of-modal
+  - tiny-titan
 models:
   - JetBrains/Mellum-2-12B-instruct
+  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
+  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
+  - openbmb/MiniCPM-V-4.6
 ---
 # DiffSense
 Private, offline-first pull request review for teams that cannot send proprietary code to cloud review bots.
+Paste a unified diff or a public GitHub PR URL and DiffSense returns severity-tagged findings, inline comments, and structured JSON that can be copied into a PR review. The prototype works without a GPU by using deterministic review rules, then optionally adds Mellum, Nemotron, MiniCPM-V, and Modal provider passes when credentials or endpoints are available.
 ## Why We Built It
 - Inline custom diff viewer built in Gradio.
 - Deterministic review findings for security, logic, maintainability, and test risks.
 - Public GitHub PR URL fetching through the PR `.diff` endpoint.
+- Optional Nemotron 3 Nano routing/triage pass.
+- Optional Tiny Titan 4B checker pass.
+- Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
+- Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
 - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
 - Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
 - Best Agent: the product is structured as a review pipeline: parse, classify, review, summarize, render.
 - Off Brand: the app uses a custom Gradio interface instead of the default chat UI.
 - Best Demo: the workflow is easy to show in under two minutes with a real risky diff.
+- Best MiniCPM Build: MiniCPM-V 4.6 is integrated for optional image/diagram context.
+- Nemotron Hardware Prize: Nemotron 3 Nano is integrated for optional agentic routing.
+- Best Use of Modal: the app includes a provider bridge for a Modal-hosted review endpoint via `DIFFSENSE_MODAL_ENDPOINT`.
+- Tiny Titan: a <=4B Nemotron 3 Nano checker is integrated as a separate optional pass.
 ## Planned Model Stack
 | --- | --- | --- |
 | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
 | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
+| Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook implemented |
+| Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook implemented |
+| Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + HF inference hook implemented |
+| Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
 The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
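
Review note: the README names `DIFFSENSE_MODAL_ENDPOINT` as the switch for the Modal bridge, and `app.py` reads the model IDs from `DIFFSENSE_*` variables with defaults. The following is a minimal sketch of that resolution pattern, not the app's code: `resolve_setting` and `modal_bridge_status` are hypothetical helper names, and the `env` parameter exists only so the behaviour can be checked without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_setting(name: str, default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of `name`, falling back to `default` when it is unset.

    Hypothetical helper: it mirrors the os.getenv(name, default) pattern
    that the commit uses for its DIFFSENSE_* settings.
    """
    source = os.environ if env is None else env
    return source.get(name, default)


def modal_bridge_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether the Modal bridge has an endpoint to call."""
    endpoint = resolve_setting("DIFFSENSE_MODAL_ENDPOINT", "", env)
    return "configured" if endpoint else "not configured"


# With no variables set, the default model ID applies and the bridge is inert.
print(resolve_setting("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct", {}))
print(modal_bridge_status({}))
```

An empty default for the endpoint keeps the bridge inert until a Space secret is set, which matches the "ready but not configured" message described in the design notes.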

TECH_DESIGN.md CHANGED

@@ -15,6 +15,10 @@ Unified diff input or public GitHub PR URL
   -> structured findings
   -> custom Gradio HTML diff viewer
   -> optional Mellum 2 summary via HF OAuth
 ```
 ## Components
@@ -28,6 +32,8 @@ File: `app.py`
 - Accepts pasted unified diffs and public GitHub PR URLs.
 - Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
 - Shows structured JSON for automation and judge inspection.
 ### Diff Parser
@@ -81,11 +87,45 @@ JetBrains/Mellum-2-12B-instruct
 The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
 ## Hackathon Fit
 Required criteria:
-- Under 32B: current optional model target is 12B; planned sponsor models are also under 32B.
 - Gradio app: implemented in `app.py`.
 - README tags: included in `README.md` front matter.
 - Demo-friendly: built-in sample diff produces multiple clear findings without setup.
@@ -97,15 +137,18 @@ Prize positioning:
 - Best Agent: staged review pipeline with parsing, classification, review, and summary.
 - Off Brand: custom HTML diff UI instead of stock chat.
 - Best Demo: one-click sample with visible before/after review value.
 ## Planned Extensions
 These should only be added after the current app is deployed and recorded:
-1. Add Modal endpoint for open-weight Mellum inference.
-2. Add MiniCPM-V image upload for PR screenshots and architecture diagrams.
-3. Add Nemotron router only if there is enough time to make it real and visible.
-4. Generate patch suggestions as downloadable `.patch` files.
 ## Risk Controls
@@ -114,3 +157,4 @@ These should only be added after the current app is deployed and recorded:
 - No pasted diff is sent externally unless the user explicitly enables the model summary.
 - Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
 - The sample diff demonstrates value even during GPU/API outages.

   -> structured findings
   -> custom Gradio HTML diff viewer
   -> optional Mellum 2 summary via HF OAuth
+  -> optional Nemotron 3 Nano routing via HF OAuth
+  -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
+  -> optional MiniCPM-V 4.6 vision notes via HF OAuth
+  -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
 ```
 ## Components
 - Accepts pasted unified diffs and public GitHub PR URLs.
 - Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
 - Shows structured JSON for automation and judge inspection.
+- Exposes model/provider toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
+- Accepts PR screenshots or diagrams for the MiniCPM-V vision pass.
 ### Diff Parser
 The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
+### Optional Nemotron Router
+When enabled, the app calls:
+```text
+nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
+```
+Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
+### Optional Tiny Titan Checker
+When enabled, the app calls a <=4B model:
+```text
+nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
+```
+This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
+### Optional MiniCPM-V Vision Pass
+When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
+```text
+openbmb/MiniCPM-V-4.6
+```
+This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
+### Optional Modal Bridge
+When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
 ## Hackathon Fit
 Required criteria:
+- Under 32B: Mellum, Nemotron 3 Nano 30B-A3B, Nemotron 3 Nano 4B, and MiniCPM-V 4.6 are all within the hackathon model-size constraint.
 - Gradio app: implemented in `app.py`.
 - README tags: included in `README.md` front matter.
 - Demo-friendly: built-in sample diff produces multiple clear findings without setup.
 - Best Agent: staged review pipeline with parsing, classification, review, and summary.
 - Off Brand: custom HTML diff UI instead of stock chat.
 - Best Demo: one-click sample with visible before/after review value.
+- Best MiniCPM Build: MiniCPM-V 4.6 image/diagram context path is implemented.
+- Nemotron Hardware Prize: Nemotron 3 Nano routing path is implemented.
+- Best Use of Modal: Modal endpoint bridge is implemented and controlled through a Space secret.
+- Tiny Titan: Nemotron 3 Nano 4B checker path is implemented.
 ## Planned Extensions
 These should only be added after the current app is deployed and recorded:
+1. Add a hosted Modal endpoint and set `DIFFSENSE_MODAL_ENDPOINT`.
+2. Add downloadable `.patch` files for suggested fixes.
+3. Add richer multimodal demo assets for the MiniCPM-V path.
 ## Risk Controls
 - No pasted diff is sent externally unless the user explicitly enables the model summary.
 - Public PR URLs are fetched as public `.diff` documents; private code should be pasted only when the model summary is off.
 - The sample diff demonstrates value even during GPU/API outages.
+- Model/provider failures are rendered as agent trace notes rather than hard app failures.
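
Review note: the last risk control says provider failures become agent trace notes instead of hard failures. A minimal sketch of that pattern follows, assuming each optional pass can be wrapped as a zero-argument callable. `run_optional_pass` and `flaky_endpoint` are hypothetical names used for illustration; the commit itself repeats the try/except inside each `run_*` function.

```python
from typing import Callable


def run_optional_pass(label: str, enabled: bool, call: Callable[[], str]) -> str:
    """Run one optional model pass and always return a markdown note.

    Hypothetical wrapper: it never raises, so a cold or rate-limited
    endpoint cannot block the deterministic review.
    """
    if not enabled:
        return f"{label} disabled."
    try:
        return call()
    except Exception as exc:  # Any provider failure becomes a visible note.
        return f"{label} attempted but the endpoint was unavailable: {exc}"


def flaky_endpoint() -> str:
    """Stand-in for a hosted model call that times out."""
    raise TimeoutError("endpoint is cold")


# The failure is reported as text in the trace instead of propagating.
print(run_optional_pass("Example router", True, flaky_endpoint))
```

Returning a string in every branch means the caller can join the notes into one trace without checking for errors.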

app.py CHANGED

@@ -2,9 +2,12 @@ from __future__ import annotations
 import html
 import json
 import os
 import re
 from dataclasses import dataclass, field
 from typing import Any
 from urllib.parse import urlparse
 from urllib.request import Request, urlopen
@@ -13,8 +16,13 @@ import gradio as gr
 from huggingface_hub import InferenceClient
-DEFAULT_MODEL = os.getenv("DIFFSENSE_MODEL", "JetBrains/Mellum-2-12B-instruct")
 FETCH_TIMEOUT_SECONDS = 10
 CSS = """
@@ -568,22 +576,236 @@ def summarize_with_model(
     ]
     try:
-        client = InferenceClient(token=token, model=DEFAULT_MODEL)
-        response = client.chat_completion(
-            messages=messages,
-            max_tokens=320,
-            temperature=0.2,
-            top_p=0.9,
-        )
-        return response.choices[0].message.content or "Model returned an empty summary."
     except Exception as exc:  # The app must stay demoable when endpoints are unavailable.
         return summarize_deterministic(
             files,
             findings,
-            prefix=f"Model summary unavailable from {DEFAULT_MODEL}: {exc}",
         )
 def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
     hunk_count = sum(len(file.hunks) for file in files)
     counts = {
@@ -683,8 +905,13 @@ def render_finding(finding: Finding) -> str:
 def run_review(
     diff_input: str,
     use_model_summary: bool,
     hf_token: gr.OAuthToken | None = None,
-) -> tuple[str, list[dict[str, Any]], str]:
     diff_text = normalize_diff(diff_input)
     if not diff_text:
         raise gr.Error("Paste a unified diff first, or load the sample diff.")
@@ -694,8 +921,29 @@ def run_review(
         raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
     findings = review_diff(files)
     summary = summarize_with_model(files, findings, use_model_summary, hf_token)
-    return render_review(files, findings), [finding_to_dict(item) for item in findings], summary
 def load_sample() -> str:
@@ -722,6 +970,26 @@ with gr.Blocks() as demo:
             label="Add optional Mellum model summary",
             info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
         )
         sample_btn = gr.Button("Load sample diff")
     with gr.Row(equal_height=False):
@@ -734,12 +1002,21 @@ with gr.Blocks() as demo:
                 placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
                 interactive=True,
             )
             run_btn = gr.Button("Review diff", variant="primary")
         with gr.Column(scale=4):
             summary_output = gr.Markdown(
                 value="Run a review to get the risk summary.",
                 label="Reviewer summary",
             )
             json_output = gr.JSON(label="Structured findings")
     review_output = gr.HTML(
@@ -750,8 +1027,16 @@ with gr.Blocks() as demo:
     sample_btn.click(fn=load_sample, outputs=diff_input)
     run_btn.click(
         fn=run_review,
-        inputs=[diff_input, use_model_summary],
-        outputs=[review_output, json_output, summary_output],
     )

 import html
 import json
+import base64
+import mimetypes
 import os
 import re
 from dataclasses import dataclass, field
+from pathlib import Path
 from typing import Any
 from urllib.parse import urlparse
 from urllib.request import Request, urlopen
 from huggingface_hub import InferenceClient
+MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct")
+NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
+TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
+MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
+MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
 FETCH_TIMEOUT_SECONDS = 10
+MAX_IMAGE_BYTES = 2_500_000
 CSS = """
     ]
     try:
+        return call_chat_model(MELLUM_MODEL, messages, token, max_tokens=320)
     except Exception as exc:  # The app must stay demoable when endpoints are unavailable.
         return summarize_deterministic(
             files,
             findings,
+            prefix=f"Model summary unavailable from {MELLUM_MODEL}: {exc}",
+        )
+def call_chat_model(
+    model: str,
+    messages: list[dict[str, Any]],
+    token: str,
+    max_tokens: int = 320,
+    temperature: float = 0.2,
+) -> str:
+    client = InferenceClient(token=token, model=model)
+    response = client.chat_completion(
+        messages=messages,
+        max_tokens=max_tokens,
+        temperature=temperature,
+        top_p=0.9,
+    )
+    return response.choices[0].message.content or f"{model} returned an empty response."
+def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
+    diff_excerpt = "\n".join(
+        f"{file.path}\n"
+        + "\n".join(
+            f"{hunk.header}\n"
+            + "\n".join(
+                f"{'+' if line.kind == 'add' else '-' if line.kind == 'del' else ' '} {line.text}"
+                for line in hunk.lines[:80]
+            )
+            for hunk in file.hunks[:4]
+        )
+        for file in files[:6]
+    )
+    deterministic = json.dumps([finding_to_dict(item) for item in findings[:15]], indent=2)
+    return f"Deterministic findings:\n{deterministic}\n\nDiff excerpt:\n{diff_excerpt}"[:max_chars]
+def run_nemotron_router(
+    files: list[FileDiff],
+    findings: list[Finding],
+    enabled: bool,
+    token: str | None,
+) -> str:
+    if not enabled:
+        return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
+    if not token:
+        return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are the DiffSense routing agent. Prioritize code review findings for a PR reviewer. "
+                "Return a concise markdown triage plan with: merge risk, files to inspect first, and follow-up tests."
+            ),
+        },
+        {"role": "user", "content": compact_review_context(files, findings)},
+    ]
+    try:
+        return call_chat_model(NEMOTRON_MODEL, messages, token, max_tokens=360)
+    except Exception as exc:
+        return (
+            f"Nemotron router attempted `{NEMOTRON_MODEL}` but the endpoint was unavailable: {exc}\n\n"
+            + deterministic_router_fallback(files, findings)
+        )
+def deterministic_router_fallback(files: list[FileDiff], findings: list[Finding]) -> str:
+    high_risk = [item for item in findings if item.severity == "critical"]
+    risk = "high" if high_risk else "medium" if findings else "low"
+    hot_files = []
+    for finding in findings:
+        if finding.file not in hot_files:
+            hot_files.append(finding.file)
+    bullets = [
+        f"Deterministic router fallback: merge risk is **{risk}**.",
+        f"Inspect first: {', '.join(hot_files[:4]) if hot_files else 'no risky files detected'}.",
+        "Follow-up tests: cover changed auth/security paths and empty-input branches before merge.",
+    ]
+    return "\n".join(f"- {item}" for item in bullets)
+def run_tiny_titan_checker(
+    files: list[FileDiff],
+    findings: list[Finding],
+    enabled: bool,
+    token: str | None,
+) -> str:
+    if not enabled:
+        return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
+    if not token:
+        return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a compact <=4B code-review sanity checker. Given deterministic PR findings, "
+                "return exactly three bullets: one missed-risk hypothesis, one test recommendation, and one merge decision."
+            ),
+        },
+        {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
+    ]
+    try:
+        return call_chat_model(TINY_TITAN_MODEL, messages, token, max_tokens=260)
+    except Exception as exc:
+        return f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}` but the endpoint was unavailable: {exc}"
+def run_minicpm_vision(
+    image_files: list[Any] | None,
+    files: list[FileDiff],
+    findings: list[Finding],
+    enabled: bool,
+    token: str | None,
+) -> str:
+    images = normalize_uploaded_files(image_files)
+    if not images:
+        return f"MiniCPM-V vision not used: no PR screenshots or diagrams uploaded. Model configured: `{MINICPM_MODEL}`."
+    if not enabled:
+        return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
+    if not token:
+        return (
+            f"MiniCPM-V vision ready with {len(images)} image(s), but no Hugging Face OAuth/HF_TOKEN is available. "
+            f"Model configured: `{MINICPM_MODEL}`."
+        )
+    prompt = (
+        "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
+        "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
+        "or inconsistencies with the code diff.\n\n"
+        + compact_review_context(files, findings, max_chars=3500)
+    )
+    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
+    skipped = 0
+    for path in images[:3]:
+        data_url = image_to_data_url(path)
+        if data_url:
+            content.append({"type": "image_url", "image_url": {"url": data_url}})
+        else:
+            skipped += 1
+    if len(content) == 1:
+        return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
+    messages = [{"role": "user", "content": content}]
+    try:
+        return call_chat_model(MINICPM_MODEL, messages, token, max_tokens=420)
+    except Exception as exc:
+        return (
+            f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s), "
+            f"but the endpoint was unavailable: {exc}"
         )
+def normalize_uploaded_files(image_files: list[Any] | None) -> list[str]:
+    if not image_files:
+        return []
+    paths: list[str] = []
+    for file_obj in image_files:
+        if isinstance(file_obj, str):
+            paths.append(file_obj)
+        elif isinstance(file_obj, dict) and file_obj.get("path"):
+            paths.append(str(file_obj["path"]))
+        elif hasattr(file_obj, "name"):
+            paths.append(str(file_obj.name))
+        elif hasattr(file_obj, "path"):
+            paths.append(str(file_obj.path))
+    return [path for path in paths if Path(path).exists()]
+def image_to_data_url(path: str) -> str | None:
+    file_path = Path(path)
+    if not file_path.exists() or file_path.stat().st_size > MAX_IMAGE_BYTES:
+        return None
+    mime_type, _ = mimetypes.guess_type(file_path.name)
+    if mime_type not in {"image/png", "image/jpeg", "image/webp"}:
+        return None
+    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
+    return f"data:{mime_type};base64,{encoded}"
+def run_modal_bridge(
+    files: list[FileDiff],
+    findings: list[Finding],
+    enabled: bool,
+) -> str:
+    if not enabled:
+        return "Modal bridge disabled."
+    if not MODAL_ENDPOINT:
+        return "Modal bridge ready, but `DIFFSENSE_MODAL_ENDPOINT` is not configured as a Space secret."
+    payload = json.dumps(
+        {
+            "context": compact_review_context(files, findings, max_chars=12000),
+            "findings": [finding_to_dict(item) for item in findings],
+            "models": {
+                "mellum": MELLUM_MODEL,
+                "nemotron": NEMOTRON_MODEL,
+                "minicpm": MINICPM_MODEL,
+            },
+        }
+    ).encode("utf-8")
+    request = Request(
+        MODAL_ENDPOINT,
+        data=payload,
+        headers={"Content-Type": "application/json", "User-Agent": "DiffSense/1.0"},
+        method="POST",
+    )
+    try:
+        with urlopen(request, timeout=20) as response:
+            body = response.read(20_000).decode("utf-8", errors="replace")
+        return f"Modal endpoint `{MODAL_ENDPOINT}` responded:\n\n```json\n{body}\n```"
+    except Exception as exc:
+        return f"Modal bridge attempted `{MODAL_ENDPOINT}` but failed: {exc}"
 def summarize_deterministic(files: list[FileDiff], findings: list[Finding], prefix: str) -> str:
     hunk_count = sum(len(file.hunks) for file in files)
     counts = {
 def run_review(
     diff_input: str,
     use_model_summary: bool,
+    use_nemotron_router: bool,
+    use_tiny_titan: bool,
+    use_minicpm_vision: bool,
+    use_modal_bridge: bool,
+    image_files: list[Any] | None,
     hf_token: gr.OAuthToken | None = None,
+) -> tuple[str, list[dict[str, Any]], str, str]:
     diff_text = normalize_diff(diff_input)
     if not diff_text:
         raise gr.Error("Paste a unified diff first, or load the sample diff.")
         raise gr.Error("I could not find unified diff hunks. Look for lines starting with @@.")
     findings = review_diff(files)
+    token = hf_token.token if hf_token else os.getenv("HF_TOKEN")
     summary = summarize_with_model(files, findings, use_model_summary, hf_token)
+    nemotron_notes = run_nemotron_router(files, findings, use_nemotron_router, token)
+    tiny_titan_notes = run_tiny_titan_checker(files, findings, use_tiny_titan, token)
+    minicpm_notes = run_minicpm_vision(image_files, files, findings, use_minicpm_vision, token)
+    modal_notes = run_modal_bridge(files, findings, use_modal_bridge)
+    agent_trace = render_agent_trace(nemotron_notes, tiny_titan_notes, minicpm_notes, modal_notes)
+    return render_review(files, findings), [finding_to_dict(item) for item in findings], summary, agent_trace
+def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
+    return "\n\n".join(
+        [
+            "### Nemotron 3 Nano Router",
+            nemotron_notes,
+            "### Tiny Titan 4B Checker",
+            tiny_titan_notes,
+            "### MiniCPM-V 4.6 Vision Context",
+            minicpm_notes,
+            "### Modal Provider Bridge",
+            modal_notes,
+        ]
+    )
 def load_sample() -> str:
             label="Add optional Mellum model summary",
             info="Deterministic review works without network or GPU. OAuth/HF_TOKEN enables the sponsor-model summary.",
         )
+        use_nemotron_router = gr.Checkbox(
+            value=False,
+            label="Run Nemotron 3 Nano router",
+            info=f"Uses {NEMOTRON_MODEL} when OAuth/HF_TOKEN is available.",
+        )
+        use_tiny_titan = gr.Checkbox(
+            value=False,
+            label="Run Tiny Titan 4B checker",
+            info=f"Uses {TINY_TITAN_MODEL} when OAuth/HF_TOKEN is available.",
+        )
+        use_minicpm_vision = gr.Checkbox(
+            value=False,
+            label="Run MiniCPM-V 4.6 vision",
+            info=f"Uses {MINICPM_MODEL} on uploaded PR images.",
+        )
+        use_modal_bridge = gr.Checkbox(
+            value=False,
+            label="Send payload to Modal bridge",
+            info="Uses DIFFSENSE_MODAL_ENDPOINT when configured.",
+        )
         sample_btn = gr.Button("Load sample diff")
     with gr.Row(equal_height=False):
                 placeholder="Paste a unified diff, paste https://github.com/org/repo/pull/123, or click Load sample diff.",
                 interactive=True,
             )
+            image_files = gr.File(
+                label="PR screenshots or diagrams for MiniCPM-V",
+                file_count="multiple",
+                file_types=["image"],
+            )
             run_btn = gr.Button("Review diff", variant="primary")
         with gr.Column(scale=4):
             summary_output = gr.Markdown(
                 value="Run a review to get the risk summary.",
                 label="Reviewer summary",
             )
+            agent_output = gr.Markdown(
+                value="Enable Nemotron or MiniCPM-V to see model-agent traces here.",
+                label="Model agent trace",
+            )
             json_output = gr.JSON(label="Structured findings")
     review_output = gr.HTML(
     sample_btn.click(fn=load_sample, outputs=diff_input)
     run_btn.click(
         fn=run_review,
+        inputs=[
+            diff_input,
+            use_model_summary,
+            use_nemotron_router,
+            use_tiny_titan,
+            use_minicpm_vision,
+            use_modal_bridge,
+            image_files,
+        ],
+        outputs=[review_output, json_output, summary_output, agent_output],
     )
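
Review note: the triage rule inside `deterministic_router_fallback` can be exercised on its own. The sketch below restates that rule under stated assumptions: the real `Finding` dataclass is not visible in this diff, so the two-field `Finding` here is a hypothetical stand-in, `merge_risk` and `hot_files` are illustrative names, and `"warning"` is an assumed severity label (only `"critical"` appears in the commit).

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in holding only the fields the fallback reads."""
    file: str
    severity: str


def merge_risk(findings: list[Finding]) -> str:
    """High if any finding is critical, medium if any findings exist, else low."""
    if any(item.severity == "critical" for item in findings):
        return "high"
    return "medium" if findings else "low"


def hot_files(findings: list[Finding], limit: int = 4) -> list[str]:
    """Files with findings, de-duplicated in first-seen order, capped at `limit`."""
    seen: list[str] = []
    for item in findings:
        if item.file not in seen:
            seen.append(item.file)
    return seen[:limit]


sample = [
    Finding("auth.py", "critical"),
    Finding("auth.py", "warning"),  # "warning" is an assumed label
    Finding("views.py", "warning"),
]
print(merge_risk(sample), hot_files(sample))
```

Under this rule only `critical` findings raise the risk to high; every other severity maps to medium.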