DiffSense / TECH_DESIGN.md
avaliev's picture
Add local model runtime status
7eb80e6
|
Raw
History Blame Contribute Delete
7.42 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

DiffSense Technical Design

Goal

Build a useful, demoable, privacy-first pull request reviewer for the Build Small hackathon. The app must work reliably inside a Gradio Space and stay eligible for the under-32B model constraint.

The implementation is intentionally offline-first: deterministic review rules provide the core value, and small-model inference is an optional enhancement rather than a single point of failure.

Current Shipped Prototype

Unified diff input or public GitHub PR URL
  -> stdlib diff parser
  -> deterministic review engine
  -> structured findings
  -> custom Gradio HTML diff viewer
  -> optional Mellum 2 summary via HF OAuth
  -> optional Nemotron 3 Nano routing via HF OAuth
  -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
  -> optional MiniCPM-V 4.6 vision notes via HF OAuth
  -> optional local checkpoints from /data/models on ZeroGPU
  -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT

Components

Gradio UI

File: app.py

  • Uses gr.Blocks instead of the default chatbot scaffold.
  • Provides a sample risky diff for a one-click demo.
  • Accepts pasted unified diffs and public GitHub PR URLs.
  • Renders an inline diff view with file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes.
  • Shows structured JSON for automation and judge inspection.
  • Exposes model/provider toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
  • Accepts PR screenshots or diagrams for the MiniCPM-V vision pass.

Diff Parser

The input layer fetches public GitHub PR URLs through their .diff endpoint with a short timeout. Pasted diffs are handled entirely in-process.

The parser handles standard unified diffs:

  • diff --git file boundaries.
  • +++ b/path file names.
  • @@ -old,+new @@ hunk headers.
  • Added, removed, and context lines with old/new line numbers.

No external parser is required, which keeps startup fast and dependency risk low.

Review Engine

The deterministic engine checks added lines for high-signal review risks:

  • Hardcoded credentials.
  • Disabled verification such as TLS or JWT signature checks.
  • Unsafe deserialization with pickle.
  • Dynamic execution through eval or exec.
  • shell=True command execution.
  • SQL string interpolation.
  • Bare except:.
  • Temporary TODO, FIXME, or HACK markers.
  • Return-contract changes such as newly introduced return None.

Each finding includes:

{
  "file": "src/auth.py",
  "hunk": "@@ -1,9 +1,13 @@",
  "line": 11,
  "severity": "critical",
  "category": "security",
  "comment": "The change disables a verification check, which can turn a trusted boundary into a bypass.",
  "suggestion": "Keep verification enabled and add a narrowly scoped test fixture for local development.",
  "source": "deterministic"
}

Optional Model Summary

When enabled, the app uses the signed-in Hugging Face OAuth token or HF_TOKEN through the Hugging Face Inference API to call:

JetBrains/Mellum2-12B-A2.5B-Instruct

The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.

If /data/models/mellum2-instruct/config.json exists, the app prefers that local checkpoint path before calling the hosted provider.

Optional Nemotron Router

When enabled, the app calls:

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.

If /data/models/nemotron-3-nano-30b-a3b/config.json exists, the app treats the local checkpoint as the preferred runtime path.

Optional Tiny Titan Checker

When enabled, the app calls a <=4B model:

nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.

If /data/models/nemotron-3-nano-4b/config.json exists, the app treats the local checkpoint as the preferred runtime path.

Optional MiniCPM-V Vision Pass

When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:

openbmb/MiniCPM-V-4.6

This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.

If /data/models/minicpm-v-4.6/config.json exists, the app reports the local MiniCPM-V checkpoint as ready and keeps the image ingestion path available for a custom local loader.

ZeroGPU Bucket Mount

The Space has a read/write bucket mounted at /data. DiffSense checks the following model checkpoint locations at runtime and includes their status in the model-agent trace:

/data/models/mellum2-instruct
/data/models/nemotron-3-nano-30b-a3b
/data/models/nemotron-3-nano-4b
/data/models/minicpm-v-4.6

This keeps the app repo small while making the model integration path explicit for the hackathon badges. Hosted provider failures are converted into concise status notes rather than raw request errors.

Optional Modal Bridge

When DIFFSENSE_MODAL_ENDPOINT is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.

Hackathon Fit

Required criteria:

  • Under 32B: Mellum, Nemotron 3 Nano 30B-A3B, Nemotron 3 Nano 4B, and MiniCPM-V 4.6 are all within the hackathon model-size constraint.
  • Gradio app: implemented in app.py.
  • README tags: included in README.md front matter.
  • Demo-friendly: built-in sample diff produces multiple clear findings without setup.

Prize positioning:

  • Backyard AI: practical developer workflow.
  • Best Use of Codex: Codex is actively building and shaping the repo.
  • Best Agent: staged review pipeline with parsing, classification, review, and summary.
  • Off Brand: custom HTML diff UI instead of stock chat.
  • Best Demo: one-click sample with visible before/after review value.
  • Best MiniCPM Build: MiniCPM-V 4.6 image/diagram context path is implemented.
  • Nemotron Hardware Prize: Nemotron 3 Nano routing path is implemented.
  • Best Use of Modal: Modal endpoint bridge is implemented and controlled through a Space secret.
  • Tiny Titan: Nemotron 3 Nano 4B checker path is implemented.

Planned Extensions

These should only be added after the current app is deployed and recorded:

  1. Add a hosted Modal endpoint and set DIFFSENSE_MODAL_ENDPOINT.
  2. Add downloadable .patch files for suggested fixes.
  3. Add richer multimodal demo assets for the MiniCPM-V path.

Risk Controls

  • The app remains useful without model availability.
  • Dependencies are limited to Gradio and huggingface_hub.
  • No pasted diff is sent externally unless the user explicitly enables the model summary.
  • Public PR URLs are fetched as public .diff documents; private code should be pasted only when the model summary is off.
  • The sample diff demonstrates value even during GPU/API outages.
  • Model/provider failures are rendered as agent trace notes rather than hard app failures.