Spaces: build-small-hackathon/DiffSense

avaliev committed 15 days ago

Commit 4de614b (1 parent: 8f3dee7)

Add submission writing artifacts

Files changed (4)

DEMO_VIDEO_PITCH.md +39 -0
HF_TECH_PAPER.md +219 -0
LINKEDIN_POST.md +42 -0
README.md +6 -0

DEMO_VIDEO_PITCH.md ADDED

	@@ -0,0 +1,39 @@

+# DiffSense Demo Video Pitch
+## 20-Second Selling Pitch
+DiffSense is a private, local-first pull request reviewer for teams that cannot send proprietary code to cloud review bots. Paste a diff or public GitHub PR URL, click **Review diff**, and get inline severity-tagged findings plus structured JSON. The deterministic reviewer works immediately, while optional bridges for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal are ready for hosted providers or local checkpoints under `/data`.
+## 60-Second Demo Script
+Hi, this is DiffSense, our Build Small hackathon project.
+The problem is that AI code review is useful, but most review bots require sending private source code to a hosted SaaS. That does not work for security-sensitive teams, regulated teams, or unreleased products.
+DiffSense is a local-first alternative. On the left, I configure optional model passes: Mellum for summaries, Nemotron for routing, Tiny Titan for a lightweight checker, MiniCPM-V for screenshots and diagrams, and a Modal bridge for hosted inference.
+In the center, I paste a public GitHub PR URL or a unified diff. I can also attach PR screenshots or diagrams for the vision path. Now I click **Review diff**.
+The app fetches or parses the diff, runs a deterministic review engine, and returns a summary. The model runtime panel shows that `/data` is mounted and writable, with persistent checkpoint slots ready for local model weights.
+On the right, DiffSense renders the detailed review as an inline diff: file headers, hunk headers, line numbers, severity badges, comments, and suggested fixes. Under that, it exposes structured JSON so the output can be copied into PR automation.
+The key design choice is reliability. The deterministic review path always works, and model bridges enhance it when OAuth, provider routes, Modal, or local checkpoints are available.
+So the product is useful now, private by default, and ready for small-model local inference.
+## User Flow Checklist
+1. Show the title and tagline: private, offline-first PR review.
+2. Point to the sidebar model toggles.
+3. Paste or keep the public PR URL in the center input.
+4. Upload an image if you want to show the MiniCPM-V path.
+5. Click **Review diff**.
+6. Read the summary and model runtime status.
+7. Move to the right pane and show the inline review.
+8. Scroll to the structured JSON.
+9. Close with the privacy and reliability point.
+## One-Line Close
+DiffSense turns a diff into a review artifact, not a chat transcript: private by default, useful without a GPU, and ready for local small-model checkpoints.

HF_TECH_PAPER.md ADDED

	@@ -0,0 +1,219 @@

+# DiffSense: A Local-First Pull Request Reviewer Built During Build Small
+## Abstract
+DiffSense is a privacy-first pull request review assistant built for the Hugging Face Build Small hackathon. The app accepts either a unified diff or a public GitHub pull request URL, parses the changed files and hunks, runs a deterministic review engine for high-signal security and correctness risks, and renders the result as inline review comments with structured JSON output.
+The core design choice is simple: the app must remain useful even when hosted model providers are unavailable, cold, rate-limited, or missing a particular model route. DiffSense therefore treats deterministic review as the always-on path and model inference as an enhancement layer. It exposes bridge points for JetBrains Mellum 2, NVIDIA Nemotron 3 Nano, NVIDIA Nemotron 3 Nano 4B, OpenBMB MiniCPM-V 4.6, and Modal, while also preparing persistent local checkpoint slots under the Space bucket mounted at `/data`.
+## Motivation
+Code review is a daily workflow for engineering teams, but most AI review tools assume that source code can be sent to a third-party SaaS service. That assumption is often wrong. Teams working on customer data, unreleased products, internal APIs, regulated systems, or security-sensitive infrastructure may need review assistance without exporting private code.
+DiffSense is aimed at that gap. It is not trying to replace a human reviewer with a black-box chat interface. Instead, it turns a diff into a concrete review artifact:
+- severity-tagged findings,
+- per-file and per-hunk locations,
+- inline comments attached to changed lines,
+- actionable fix suggestions,
+- JSON output that can be copied into automation or a pull request workflow.
+The hackathon constraint shaped the product in a useful way. Rather than building a large hosted reviewer that only works when every model endpoint is healthy, we built a small, inspectable workflow that starts from deterministic analysis and adds model passes where they make the product better.
+## Product Experience
+The app is a Gradio Space with a three-part workspace:
+- The left sidebar configures model and provider passes.
+- The center pane accepts the diff or pull request URL and any image uploads, then shows the summary and model trace after processing.
+- The right pane shows the detailed inline review and structured JSON.
+The user flow is intentionally short:
+1. Open the Space.
+2. Paste a unified diff or a public GitHub PR URL.
+3. Optionally upload PR screenshots, diagrams, or UI diffs.
+4. Click **Review diff**.
+5. Read inline comments and copy the structured JSON if needed.
+For public GitHub PRs, DiffSense appends `.diff` to the pull request URL and fetches the public unified diff with a short timeout. Pasted diffs stay inside the app process unless a model/provider pass is explicitly enabled.
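The URL handling described above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the names `PR_URL_RE`, `pr_diff_url`, and `fetch_public_diff`, the accepted URL shapes, and the 10-second timeout are all assumptions made for the example.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical pattern: matches https://github.com/<owner>/<repo>/pull/<number>
# with an optional trailing path, query, or fragment (for example "/files").
PR_URL_RE = re.compile(
    r"^https://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)(?:[/?#].*)?$"
)


def pr_diff_url(text: str) -> Optional[str]:
    """Return the .diff URL for a GitHub PR URL, or None for any other input."""
    match = PR_URL_RE.match(text.strip())
    if match is None:
        return None  # Treat the input as a pasted diff instead.
    owner, repo, number = match.groups()
    return f"https://github.com/{owner}/{repo}/pull/{number}.diff"


def fetch_public_diff(url: str, timeout: float = 10.0) -> str:
    """Fetch the public unified diff; the timeout value is an assumed default."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Keeping URL detection separate from the network call means pasted diffs never trigger a request, which matches the stated behavior that pasted input stays inside the app process.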
+## Architecture
+```text
+Unified diff or public GitHub PR URL
+  -> normalize input
+  -> fetch public .diff when needed
+  -> parse unified diff into files, hunks, and changed lines
+  -> run deterministic review rules
+  -> optionally summarize with Mellum bridge
+  -> optionally route/triage with Nemotron bridge
+  -> optionally sanity-check with Tiny Titan bridge
+  -> optionally process uploaded images with MiniCPM-V bridge
+  -> optionally POST to Modal endpoint
+  -> render summary, agent trace, inline diff review, and JSON
+```
+The app is implemented in a single `app.py` file to keep the Space easy to inspect during judging. The key pieces are:
+- `normalize_diff`: accepts pasted diffs or public GitHub PR URLs.
+- `parse_unified_diff`: converts unified diff text into file/hunk/line dataclasses.
+- `review_diff`: applies deterministic code-review rules.
+- `summarize_with_model`: narrows the model role to summarizing known findings.
+- `run_nemotron_router`: produces routing/triage notes.
+- `run_tiny_titan_checker`: runs a compact sanity check on a model of 4B parameters or fewer.
+- `run_minicpm_vision`: accepts image uploads for PR screenshots and diagrams.
+- `render_review`: renders a custom HTML diff view with inline findings.
+- `render_agent_trace`: exposes model runtime and bridge status.
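To make the parsing step concrete, here is a small sketch of turning unified diff text into file, hunk, and line dataclasses. The dataclass names and fields (`DiffLine`, `Hunk`, `FileDiff`) are hypothetical, and the function is named `parse_diff_sketch` to make clear it is not the actual `parse_unified_diff` from `app.py`, whose internals are not shown here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class DiffLine:
    kind: str                  # "add", "del", or "ctx"
    text: str
    new_lineno: Optional[int]  # None for deleted lines


@dataclass
class Hunk:
    header: str
    lines: List[DiffLine] = field(default_factory=list)


@dataclass
class FileDiff:
    path: str
    hunks: List[Hunk] = field(default_factory=list)


def parse_diff_sketch(text: str) -> List[FileDiff]:
    files: List[FileDiff] = []
    hunk: Optional[Hunk] = None
    old_left = new_left = new_lineno = 0
    for raw in text.splitlines():
        # Inside a hunk body: use the header's line counts to know when it ends.
        if hunk is not None and (old_left > 0 or new_left > 0):
            if raw.startswith("\\"):  # "\ No newline at end of file"
                continue
            marker, body = raw[:1], raw[1:]
            if marker == "+":
                hunk.lines.append(DiffLine("add", body, new_lineno))
                new_lineno += 1
                new_left -= 1
            elif marker == "-":
                hunk.lines.append(DiffLine("del", body, None))
                old_left -= 1
            else:
                hunk.lines.append(DiffLine("ctx", body, new_lineno))
                new_lineno += 1
                new_left -= 1
                old_left -= 1
            continue
        if raw.startswith("+++ "):
            path = raw[4:].split("\t")[0]
            files.append(FileDiff(path[2:] if path.startswith("b/") else path))
            hunk = None
            continue
        match = HUNK_RE.match(raw)
        if match and files:
            old_left = int(match.group(2) or 1)
            new_left = int(match.group(4) or 1)
            new_lineno = int(match.group(3))
            hunk = Hunk(header=raw)
            files[-1].hunks.append(hunk)
    return files
```

Counting lines from the hunk header, rather than guessing from prefixes alone, avoids misreading an added line whose content begins with `++ ` as a new file header.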
+## Deterministic Review Engine
+The deterministic path is the product's reliability layer. It parses added lines and checks for review risks that are common, high-signal, and easy to explain:
+- hardcoded credentials,
+- disabled TLS or JWT verification,
+- unsafe `pickle` deserialization,
+- dynamic execution via `eval` or `exec`,
+- `shell=True` subprocess calls,
+- SQL string interpolation,
+- bare `except:`,
+- temporary `TODO`, `FIXME`, or `HACK` markers,
+- return-contract changes such as newly introduced `return None`,
+- large behavior changes outside test files.
+Each finding is normalized into this shape:
+```json
+{
+  "file": "src/auth.py",
+  "hunk": "@@ -1,9 +1,13 @@",
+  "line": 11,
+  "severity": "critical",
+  "category": "security",
+  "comment": "The change disables a verification check, which can turn a trusted boundary into a bypass.",
+  "suggestion": "Keep verification enabled and add a narrowly scoped test fixture for local development.",
+  "source": "deterministic"
+}
+```
+This made the app demoable under time pressure. Even if all hosted inference routes fail, the reviewer still produces useful output.
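A rule-based pass that emits findings in the shape above might look like the sketch below. The rule table, the regular expressions, the severity labels other than `critical`, the category names, and the function name `review_added_lines` are all illustrative assumptions; the actual rules in `review_diff` may differ.

```python
import re

# Hypothetical rule table: (severity, category, pattern, comment, suggestion).
RULES = [
    ("critical", "security", re.compile(r"\bverify\s*=\s*False\b"),
     "The change disables a verification check.",
     "Keep verification enabled."),
    ("high", "security", re.compile(r"\bshell\s*=\s*True\b"),
     "shell=True passes the command through a shell.",
     "Pass an argument list and drop shell=True."),
    ("high", "security", re.compile(r"\b(eval|exec)\s*\("),
     "Dynamic execution of strings is hard to audit.",
     "Replace with explicit dispatch or a safe parser."),
    ("medium", "correctness", re.compile(r"^\s*except\s*:"),
     "A bare except hides unexpected errors.",
     "Catch the specific exception types you expect."),
    ("low", "maintainability", re.compile(r"\b(TODO|FIXME|HACK)\b"),
     "A temporary marker was added.",
     "Resolve it or link a tracking issue."),
]


def review_added_lines(file, hunk, added_lines):
    """Check (line_number, text) pairs for added lines against every rule."""
    findings = []
    for lineno, text in added_lines:
        for severity, category, pattern, comment, suggestion in RULES:
            if pattern.search(text):
                findings.append({
                    "file": file,
                    "hunk": hunk,
                    "line": lineno,
                    "severity": severity,
                    "category": category,
                    "comment": comment,
                    "suggestion": suggestion,
                    "source": "deterministic",
                })
    return findings
```

Because each rule is a plain pattern paired with a fixed comment, every finding can be traced back to the exact line and rule that produced it.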
+## Model and Provider Bridges
+DiffSense integrates the hackathon model stack as optional bridge points rather than hard dependencies.
+| Role | Model or Provider | Purpose |
+| --- | --- | --- |
+| Code summary | `JetBrains/Mellum2-12B-A2.5B-Instruct` | Summarize deterministic findings and diff risk |
+| Agentic routing | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` | Triage changed files, merge risk, and follow-up tests |
+| Tiny checker | `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` | <=4B lightweight review sanity check |
+| Visual context | `openbmb/MiniCPM-V-4.6` | PR screenshot, UI diff, and diagram context |
+| External runtime | Modal endpoint | Optional POST bridge via `DIFFSENSE_MODAL_ENDPOINT` |
+The model prompts are intentionally constrained. For example, Mellum is asked to summarize deterministic findings rather than invent findings from scratch. This keeps the output auditable and prevents the model layer from undermining the review engine.
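As one illustration of an optional bridge, the sketch below posts findings to the endpoint named by `DIFFSENSE_MODAL_ENDPOINT` and returns `None` when the bridge is not configured or the call fails. Only the environment variable name comes from the text above; the function name `post_to_modal`, the request payload fields, the JSON response format, and the timeout are assumptions.

```python
import json
import os
import urllib.request


def post_to_modal(findings, diff_excerpt, timeout=20.0):
    """POST deterministic findings to the optional endpoint, or return None."""
    endpoint = os.environ.get("DIFFSENSE_MODAL_ENDPOINT", "").strip()
    if not endpoint:
        return None  # Bridge not configured; the deterministic output stands.
    # Assumed payload shape: the real bridge may send different fields.
    payload = json.dumps({"findings": findings, "diff": diff_excerpt}).encode("utf-8")
    try:
        request = urllib.request.Request(
            endpoint,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Network errors, timeouts, bad URLs, and invalid JSON all fall back.
        return None
```

Returning `None` instead of raising keeps the bridge an enhancement: the caller can render the deterministic review whether or not the endpoint answered.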
+## Local Checkpoint Strategy
+The Space is configured with a read/write Hugging Face bucket mounted at `/data`. DiffSense creates and monitors these model slots:
+```text
+/data/models/mellum2-instruct
+/data/models/nemotron-3-nano-30b-a3b
+/data/models/nemotron-3-nano-4b
+/data/models/minicpm-v-4.6
+```
+Each slot is considered ready when it contains a `config.json`. Text-model bridge calls first check for local checkpoints before falling back to hosted Hugging Face Inference routes. This lets the app grow from a reliable deterministic demo into a local/ZeroGPU-backed model reviewer without committing checkpoints into the Space repo.
+The app also reports model runtime status directly in the UI so judges can see the configured local-first paths.
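The readiness rule (a slot is ready when it contains a `config.json`) can be sketched in a few lines. The slot names come from the paths listed above; the function name `slot_status` and the returned dictionary shape are assumptions for illustration.

```python
from pathlib import Path

MODEL_SLOTS = [
    "mellum2-instruct",
    "nemotron-3-nano-30b-a3b",
    "nemotron-3-nano-4b",
    "minicpm-v-4.6",
]


def slot_status(models_root="/data/models"):
    """Map each slot name to True when its directory holds a config.json."""
    root = Path(models_root)
    return {name: (root / name / "config.json").is_file() for name in MODEL_SLOTS}
```

A check like this is cheap enough to run on every review, so the status panel can reflect checkpoints that were staged after the Space started.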
+## Privacy Model
+DiffSense has three privacy tiers:
+1. Pasted diff with model toggles off: diff analysis stays in the app process.
+2. Public GitHub PR URL: the app fetches the public `.diff` document.
+3. Optional model/provider pass: compact diff context and deterministic findings are sent to the selected provider or local checkpoint path.
+This is why the deterministic review path is not just a fallback. It is the privacy-preserving default that makes the tool useful for sensitive code.
+## Gradio UI Design
+The UI uses `gr.Blocks` with custom CSS and HTML rendering rather than a chatbot layout. That choice matters because code review is a reading and scanning task. A chat transcript is the wrong shape for a diff.
+The current layout is optimized for a demo and for actual use:
+- configuration in the sidebar,
+- input and summary in the center,
+- detailed inline review in the larger right pane,
+- JSON output beneath the detailed review.
+Findings are rendered inside the diff with severity badges, file headers, hunk headers, line numbers, and suggested fixes. This makes the output feel like a review artifact rather than a model response.
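A simplified version of rendering one diff line with its attached findings is sketched below. The CSS class names, the markup, and the function name `render_line` are hypothetical; the real `render_review` output is not shown in this document.

```python
from html import escape


def render_line(lineno, text, findings):
    """Render one diff line, followed by any findings attached to that line."""
    rows = [
        f'<div class="line"><span class="lineno">{lineno}</span>'
        f"<code>{escape(text)}</code></div>"
    ]
    for finding in findings:
        if finding["line"] != lineno:
            continue
        severity = escape(finding["severity"])
        rows.append(
            f'<div class="finding sev-{severity}">'
            f'<span class="badge">{severity}</span> '
            f'{escape(finding["comment"])}</div>'
        )
    return "\n".join(rows)
```

Escaping both the source line and the comment matters here because diff content is untrusted input that is injected into custom HTML.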
+## Development Process
+The project was built under a tight hackathon deadline with Codex as an active build partner.
+The build sequence was:
+1. Analyze the hackathon constraints and sponsor badge criteria.
+2. Choose a real developer workflow that benefits from local AI: pull request review.
+3. Build a deterministic reviewer first so the demo could never be blocked by model availability.
+4. Add a custom Gradio UI for a non-chat, code-review-specific experience.
+5. Add public GitHub PR URL fetching.
+6. Add model/provider bridge toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
+7. Add persistent `/data` checkpoint slots for ZeroGPU/local checkpoint readiness.
+8. Stabilize Space runtime by disabling experimental Gradio SSR.
+9. Rebalance the UI into configuration, input/summary, and detailed review panes.
+10. Iterate the visible model status copy so the app reads as local-first and resilient rather than broken when hosted providers are unavailable.
+The most important engineering decision was to reduce risk early. A deterministic reviewer with a custom diff renderer is valuable on its own; model bridges then improve the experience rather than define it.
+## Failure Handling
+The app is designed to stay useful across common hackathon failure modes:
+- hosted model route unavailable,
+- OAuth token missing,
+- Space rebuild,
+- provider rate limit,
+- cold start,
+- missing local checkpoints,
+- public PR URL fetch failure.
+For model failures, the UI reports that the bridge is armed and that deterministic fallback is active. The review still completes.
+For rebuild persistence, model files belong under `/data`, not `/app`. The `/app` directory can be reset during rebuilds, but the mounted bucket persists as long as it remains attached to the Space.
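The fallback behavior for model failures can be sketched as a small wrapper. The function name `run_bridge`, its signature, and the status strings are assumptions that loosely follow the wording above, not the app's actual code.

```python
def run_bridge(name, call, fallback):
    """Run an optional bridge call; on any failure, keep the deterministic result.

    Returns a (result, status_message) pair so the UI can report what happened.
    """
    try:
        result = call()
    except Exception as exc:  # Timeouts, missing tokens, rate limits, and so on.
        reason = type(exc).__name__
        return fallback, f"{name}: bridge armed, deterministic fallback active ({reason})"
    if not result:
        return fallback, f"{name}: bridge armed, deterministic fallback active (empty response)"
    return result, f"{name}: ok"
```

For example, `run_bridge("mellum", lambda: summarize(findings), deterministic_summary)` would return the deterministic summary whenever the hypothetical `summarize` call raises or returns nothing.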
+## Hackathon Fit
+DiffSense targets the Backyard AI track because it is a practical local AI tool for a daily developer workflow.
+It also maps cleanly to sponsor badges:
+- Gradio app: implemented as a Hugging Face Space using Gradio.
+- Best Use of Codex: Codex was used throughout design, implementation, debugging, deployment, and documentation.
+- Best Agent: the app is a staged review pipeline with parsing, classification, summarization, routing, and rendering.
+- Off Brand: custom diff UI instead of a stock chat interface.
+- Best Demo: one-click sample or public PR URL produces clear review output quickly.
+- Best MiniCPM Build: MiniCPM-V 4.6 image path is integrated for visual PR context.
+- Nemotron Hardware Prize: Nemotron 3 Nano router bridge is integrated.
+- Tiny Titan: Nemotron 3 Nano 4B checker path is integrated.
+- Best Use of Modal: Modal endpoint bridge is included through `DIFFSENSE_MODAL_ENDPOINT`.
+## What We Would Build Next
+The next product improvements are straightforward:
+1. Add a real Modal endpoint and set `DIFFSENSE_MODAL_ENDPOINT`.
+2. Stage quantized checkpoints under `/data/models`.
+3. Add downloadable patch suggestions.
+4. Add GitHub comment export.
+5. Add per-rule enable/disable controls.
+6. Add a richer MiniCPM-V demo with screenshots and architecture diagrams.
+## Conclusion
+DiffSense is small by design. It does not require a perfect model endpoint to be useful, and it does not ask teams to send private code to a SaaS reviewer. It turns a diff into a structured, inspectable review artifact and creates clear extension points for local checkpoints and sponsor models.
+That combination, reliable deterministic review plus optional small-model intelligence, is the core idea: useful now, private by default, and ready to grow into a fully local AI code review workflow.

LINKEDIN_POST.md ADDED

	@@ -0,0 +1,42 @@

+# LinkedIn Post Draft
+We built DiffSense for the Hugging Face Build Small hackathon: a private, local-first pull request reviewer for teams that cannot send proprietary code to cloud review bots.
+The idea is simple:
+Paste a unified diff or public GitHub PR URL.
+Get severity-tagged review findings.
+Read inline comments attached to changed lines.
+Copy structured JSON into your PR workflow.
+The core review path is deterministic and runs inside the app process, so the demo stays useful even when model providers are cold, rate-limited, or unavailable. Then we add optional small-model bridges for the hackathon stack:
+- JetBrains Mellum 2 for code-review summaries
+- NVIDIA Nemotron 3 Nano for agentic routing and triage
+- NVIDIA Nemotron 3 Nano 4B for a Tiny Titan checker pass
+- OpenBMB MiniCPM-V 4.6 for PR screenshots, diagrams, and UI context
+- Modal through a provider bridge for hosted inference
+The Space also has a persistent `/data` bucket mount with local checkpoint slots, so the app is ready for ZeroGPU/local model runs without putting weights in the repo.
+What I like most about this project is that it is not a chat UI pretending to be a code-review tool. DiffSense renders a custom inline diff view with file headers, hunk headers, line numbers, severity badges, comments, suggested fixes, and machine-readable JSON.
+Built with Gradio, Hugging Face Spaces, Codex, and open model targets under the Build Small constraints.
+Try the Space: https://huggingface.co/spaces/build-small-hackathon/DiffSense
+#BuildSmall #HuggingFace #Gradio #LocalAI #CodeReview #OpenSource #AIEngineering
+## Shorter Version
+We shipped DiffSense for the Hugging Face Build Small hackathon.
+It is a private, local-first PR reviewer: paste a diff or public GitHub PR URL, get inline severity-tagged findings and structured JSON without relying on a SaaS code review bot.
+The app has a deterministic review engine for reliability, plus optional bridges for Mellum 2, Nemotron 3 Nano, Tiny Titan, MiniCPM-V 4.6, Modal, and persistent `/data` checkpoint slots.
+The result is not a chat transcript. It is a real code-review artifact: inline comments, hunk-level findings, suggested fixes, and JSON output.
+Space: https://huggingface.co/spaces/build-small-hackathon/DiffSense
+#BuildSmall #HuggingFace #Gradio #LocalAI #CodeReview

README.md CHANGED

@@ -147,6 +147,12 @@ Then open `http://localhost:7860`.
 4. Show the JSON output as a practical artifact for PR automation.
 5. Toggle the optional model summary to show the small-model enhancement path.
+## Submission Artifacts
+- [HF technical paper](HF_TECH_PAPER.md)
+- [LinkedIn post draft](LINKEDIN_POST.md)
+- [Demo video pitch](DEMO_VIDEO_PITCH.md)
 ## Social Post Draft
 DiffSense is our Build Small hackathon project: a private PR reviewer for teams that cannot send proprietary code to cloud bots.