avaliev committed on
Commit
7eb80e6
·
1 Parent(s): a047a58

Add local model runtime status

Files changed (3)
  1. README.md +21 -7
  2. TECH_DESIGN.md +23 -1
  3. app.py +142 -20
README.md CHANGED
@@ -27,7 +27,7 @@ tags:
27
  - best-use-of-modal
28
  - tiny-titan
29
  models:
30
- - JetBrains/Mellum-2-12B-instruct
31
  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
32
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
33
  - openbmb/MiniCPM-V-4.6
@@ -56,7 +56,8 @@ DiffSense is the small-model version of that workflow: useful immediately, inspe
56
  - Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
57
  - Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
58
  - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
59
- - Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
 
60
 
61
  ## Hackathon Track
62
 
@@ -79,15 +80,28 @@ All planned models are under the Build Small 32B parameter cap.
79
 
80
  | Role | Model | Status |
81
  | --- | --- | --- |
82
- | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
83
  | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
84
- | Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook implemented |
85
- | Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook implemented |
86
- | Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + HF inference hook implemented |
87
  | Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
88
 
89
  The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
90
 
91
  ## Usage
92
 
93
  1. Open the Space.
@@ -114,7 +128,7 @@ For public GitHub PRs, paste the PR URL directly. DiffSense fetches the `.diff`
114
 
115
  ## Privacy
116
 
117
- The deterministic review path runs inside the app process and does not send the pasted diff to any external model. If a public PR URL is pasted, the app fetches its public `.diff` over the network. If the optional model summary is enabled, the diff excerpt and deterministic findings are sent to the selected Hugging Face Inference model using the signed-in user's OAuth token.
118
 
119
  ## Local Run
120
 
 
27
  - best-use-of-modal
28
  - tiny-titan
29
  models:
30
+ - JetBrains/Mellum2-12B-A2.5B-Instruct
31
  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
32
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
33
  - openbmb/MiniCPM-V-4.6
 
56
  - Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
57
  - Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
58
  - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
59
+ - Optional model-assisted summary using `JetBrains/Mellum2-12B-A2.5B-Instruct` through the Hugging Face Inference API when OAuth is available, or a local checkpoint when mounted under `/data`.
60
+ - ZeroGPU/bucket-aware model runtime status for local checkpoints mounted from the `build-small-hackathon/DiffSense` bucket.
61
 
62
  ## Hackathon Track
63
 
 
80
 
81
  | Role | Model | Status |
82
  | --- | --- | --- |
83
+ | Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook + `/data` local checkpoint path implemented |
84
  | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
85
+ | Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook + `/data` local checkpoint path implemented |
86
+ | Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook + `/data` local checkpoint path implemented |
87
+ | Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + provider/local checkpoint readiness implemented |
88
  | Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
89
 
90
  The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
91
 
92
+ ## Local Checkpoint Layout
93
+
94
+ The Space is configured with a read/write bucket mounted at `/data`, so model files can be staged without committing checkpoints to the app repo. DiffSense checks these paths at runtime:
95
+
96
+ ```text
97
+ /data/models/mellum2-instruct
98
+ /data/models/nemotron-3-nano-30b-a3b
99
+ /data/models/nemotron-3-nano-4b
100
+ /data/models/minicpm-v-4.6
101
+ ```
102
+
103
+ Each directory is considered ready when it contains a `config.json`. If a Hugging Face provider does not serve a sponsor model, the app reports the provider limitation cleanly and keeps the deterministic review running.
104
+
105
  ## Usage
106
 
107
  1. Open the Space.
 
128
 
129
  ## Privacy
130
 
131
+ The deterministic review path runs inside the app process and does not send the pasted diff to any external model. If a public PR URL is pasted, the app fetches its public `.diff` over the network. If an optional hosted model pass is enabled, the diff excerpt and deterministic findings are sent to the selected Hugging Face Inference model using the signed-in user's OAuth token. If a local checkpoint is mounted under `/data/models`, that local path is preferred for text-model passes.
132
 
133
  ## Local Run
134
 
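The readiness rule described in the README's Local Checkpoint Layout section (a directory counts as ready once it contains a `config.json`) can be sketched in isolation. This is a minimal illustration, not the app's code: the helper name `checkpoint_states` and the state labels are hypothetical, and only the four directory names come from the README.

```python
import tempfile
from pathlib import Path

# Directory names copied from the README layout; everything else is illustrative.
EXPECTED_DIRS = [
    "mellum2-instruct",
    "nemotron-3-nano-30b-a3b",
    "nemotron-3-nano-4b",
    "minicpm-v-4.6",
]


def checkpoint_states(model_root: Path) -> dict[str, str]:
    """Classify each expected directory as ready, incomplete, or missing."""
    states = {}
    for name in EXPECTED_DIRS:
        model_dir = model_root / name
        if (model_dir / "config.json").exists():
            states[name] = "ready"  # config.json present
        elif model_dir.exists():
            states[name] = "incomplete"  # directory staged, no config.json yet
        else:
            states[name] = "missing"
    return states


# Demo against a throwaway directory instead of the real /data mount.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mellum2-instruct").mkdir()
    (root / "mellum2-instruct" / "config.json").write_text("{}")
    (root / "nemotron-3-nano-4b").mkdir()
    demo_states = checkpoint_states(root)
    print(demo_states)
```

Running a check like this against the mounted bucket before enabling a model pass would show which checkpoints are only partially staged.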
TECH_DESIGN.md CHANGED
@@ -18,6 +18,7 @@ Unified diff input or public GitHub PR URL
18
  -> optional Nemotron 3 Nano routing via HF OAuth
19
  -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
20
  -> optional MiniCPM-V 4.6 vision notes via HF OAuth
 
21
  -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
22
  ```
23
 
@@ -82,11 +83,13 @@ Each finding includes:
82
  When enabled, the app uses the signed-in Hugging Face OAuth token or `HF_TOKEN` through the Hugging Face Inference API to call:
83
 
84
  ```text
85
- JetBrains/Mellum-2-12B-instruct
86
  ```
87
 
88
  The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
89
 
 
 
90
  ### Optional Nemotron Router
91
 
92
  When enabled, the app calls:
@@ -97,6 +100,8 @@ nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
97
 
98
  Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
99
 
 
 
100
  ### Optional Tiny Titan Checker
101
 
102
  When enabled, the app calls a <=4B model:
@@ -107,6 +112,8 @@ nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
107
 
108
  This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
109
 
 
 
110
  ### Optional MiniCPM-V Vision Pass
111
 
112
  When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
@@ -117,6 +124,21 @@ openbmb/MiniCPM-V-4.6
117
 
118
  This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
119
 
120
  ### Optional Modal Bridge
121
 
122
  When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
 
18
  -> optional Nemotron 3 Nano routing via HF OAuth
19
  -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
20
  -> optional MiniCPM-V 4.6 vision notes via HF OAuth
21
+ -> optional local checkpoints from /data/models on ZeroGPU
22
  -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
23
  ```
24
 
 
83
  When enabled, the app uses the signed-in Hugging Face OAuth token or `HF_TOKEN` through the Hugging Face Inference API to call:
84
 
85
  ```text
86
+ JetBrains/Mellum2-12B-A2.5B-Instruct
87
  ```
88
 
89
  The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
90
 
91
+ If `/data/models/mellum2-instruct/config.json` exists, the app prefers that local checkpoint path before calling the hosted provider.
92
+
93
  ### Optional Nemotron Router
94
 
95
  When enabled, the app calls:
 
100
 
101
  Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
102
 
103
+ If `/data/models/nemotron-3-nano-30b-a3b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
104
+
105
  ### Optional Tiny Titan Checker
106
 
107
  When enabled, the app calls a <=4B model:
 
112
 
113
  This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
114
 
115
+ If `/data/models/nemotron-3-nano-4b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
116
+
117
  ### Optional MiniCPM-V Vision Pass
118
 
119
  When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
 
124
 
125
  This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
126
 
127
+ If `/data/models/minicpm-v-4.6/config.json` exists, the app reports the local MiniCPM-V checkpoint as ready and keeps the image ingestion path available for a custom local loader.
128
+
129
+ ### ZeroGPU Bucket Mount
130
+
131
+ The Space has a read/write bucket mounted at `/data`. DiffSense checks the following model checkpoint locations at runtime and includes their status in the model-agent trace:
132
+
133
+ ```text
134
+ /data/models/mellum2-instruct
135
+ /data/models/nemotron-3-nano-30b-a3b
136
+ /data/models/nemotron-3-nano-4b
137
+ /data/models/minicpm-v-4.6
138
+ ```
139
+
140
+ This keeps the app repo small while making the model integration path explicit for the hackathon badges. Hosted provider failures are converted into concise status notes rather than raw request errors.
141
+
142
  ### Optional Modal Bridge
143
 
144
  When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
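The runtime preference that TECH_DESIGN.md describes for the text-model passes (local checkpoint first, hosted provider second, deterministic fallback last) can be summarized as a small decision function. This is a sketch under stated assumptions: `choose_runtime` and its return labels are hypothetical names, and it models only the ordering, not the error handling in `app.py`.

```python
def choose_runtime(enabled: bool, local_ready: bool, has_token: bool) -> str:
    """Return which path a text-model pass would take, by preference order."""
    if not enabled:
        return "deterministic"  # model pass switched off in the UI
    if local_ready:
        return "local"  # config.json found under /data/models/<name>
    if has_token:
        return "hosted"  # OAuth token or HF_TOKEN available
    return "deterministic"  # nothing usable, keep the always-on fallback


for case in [(True, True, True), (True, False, True), (True, False, False)]:
    print(case, "->", choose_runtime(*case))
```

Keeping the ordering in one place would make it easier to confirm that the privacy statement in the README (local path preferred for text-model passes) matches the behavior of each pass.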
app.py CHANGED
@@ -16,11 +16,19 @@ import gradio as gr
16
  from huggingface_hub import InferenceClient
17
 
18
 
19
- MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct")
 
 
20
  NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
21
  TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
22
  MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
23
  MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
24
  FETCH_TIMEOUT_SECONDS = 10
25
  MAX_IMAGE_BYTES = 2_500_000
26
 
@@ -536,8 +544,8 @@ def summarize_with_model(
536
  if not enabled:
537
  return summarize_deterministic(files, findings, prefix="Deterministic review complete.")
538
 
539
- token = hf_token.token if hf_token else os.getenv("HF_TOKEN")
540
- if not token:
541
  return summarize_deterministic(
542
  files,
543
  findings,
@@ -576,12 +584,12 @@ def summarize_with_model(
576
  ]
577
 
578
  try:
579
- return call_chat_model(MELLUM_MODEL, messages, token, max_tokens=320)
580
  except Exception as exc: # The app must stay demoable when endpoints are unavailable.
581
  return summarize_deterministic(
582
  files,
583
  findings,
584
- prefix=f"Model summary unavailable from {MELLUM_MODEL}: {exc}",
585
  )
586
 
587
 
@@ -589,9 +597,15 @@ def call_chat_model(
589
  model: str,
590
  messages: list[dict[str, Any]],
591
  token: str,
 
592
  max_tokens: int = 320,
593
  temperature: float = 0.2,
594
  ) -> str:
595
  client = InferenceClient(token=token, model=model)
596
  response = client.chat_completion(
597
  messages=messages,
@@ -602,6 +616,70 @@ def call_chat_model(
602
  return response.choices[0].message.content or f"{model} returned an empty response."
603
 
604
 
605
  def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
606
  diff_excerpt = "\n".join(
607
  f"{file.path}\n"
@@ -628,7 +706,7 @@ def run_nemotron_router(
628
  if not enabled:
629
  return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
630
 
631
- if not token:
632
  return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
633
 
634
  messages = [
@@ -642,10 +720,11 @@ def run_nemotron_router(
642
  {"role": "user", "content": compact_review_context(files, findings)},
643
  ]
644
  try:
645
- return call_chat_model(NEMOTRON_MODEL, messages, token, max_tokens=360)
646
  except Exception as exc:
647
  return (
648
- f"Nemotron router attempted `{NEMOTRON_MODEL}` but the endpoint was unavailable: {exc}\n\n"
 
649
  + deterministic_router_fallback(files, findings)
650
  )
651
 
@@ -674,7 +753,7 @@ def run_tiny_titan_checker(
674
  if not enabled:
675
  return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
676
 
677
- if not token:
678
  return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
679
 
680
  messages = [
@@ -688,9 +767,15 @@ def run_tiny_titan_checker(
688
  {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
689
  ]
690
  try:
691
- return call_chat_model(TINY_TITAN_MODEL, messages, token, max_tokens=260)
692
  except Exception as exc:
693
- return f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}` but the endpoint was unavailable: {exc}"
694
 
695
 
696
  def run_minicpm_vision(
@@ -707,12 +792,6 @@ def run_minicpm_vision(
707
  if not enabled:
708
  return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
709
 
710
- if not token:
711
- return (
712
- f"MiniCPM-V vision ready with {len(images)} image(s), but no Hugging Face OAuth/HF_TOKEN is available. "
713
- f"Model configured: `{MINICPM_MODEL}`."
714
- )
715
-
716
  prompt = (
717
  "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
718
  "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
@@ -731,13 +810,26 @@ def run_minicpm_vision(
731
  if len(content) == 1:
732
  return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
733
 
734
  messages = [{"role": "user", "content": content}]
735
  try:
736
- return call_chat_model(MINICPM_MODEL, messages, token, max_tokens=420)
737
  except Exception as exc:
738
  return (
739
- f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s), "
740
- f"but the endpoint was unavailable: {exc}"
741
  )
742
 
743
 
@@ -934,6 +1026,8 @@ def run_review(
934
  def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
935
  return "\n\n".join(
936
  [
 
 
937
  "### Nemotron 3 Nano Router",
938
  nemotron_notes,
939
  "### Tiny Titan 4B Checker",
@@ -946,6 +1040,34 @@ def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes
946
  )
947
 
948
 
949
  def load_sample() -> str:
950
  return SAMPLE_DIFF
951
 
 
16
  from huggingface_hub import InferenceClient
17
 
18
 
19
+ DATA_ROOT = Path(os.getenv("DIFFSENSE_DATA_ROOT", "/data"))
20
+ LOCAL_MODEL_ROOT = Path(os.getenv("DIFFSENSE_LOCAL_MODEL_ROOT", DATA_ROOT / "models"))
21
+ MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum2-12B-A2.5B-Instruct")
22
  NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
23
  TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
24
  MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
25
  MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
26
+ LOCAL_MODEL_DIRS = {
27
+ "mellum": Path(os.getenv("DIFFSENSE_MELLUM_LOCAL_DIR", LOCAL_MODEL_ROOT / "mellum2-instruct")),
28
+ "nemotron": Path(os.getenv("DIFFSENSE_NEMOTRON_LOCAL_DIR", LOCAL_MODEL_ROOT / "nemotron-3-nano-30b-a3b")),
29
+ "tiny_titan": Path(os.getenv("DIFFSENSE_TINY_TITAN_LOCAL_DIR", LOCAL_MODEL_ROOT / "nemotron-3-nano-4b")),
30
+ "minicpm": Path(os.getenv("DIFFSENSE_MINICPM_LOCAL_DIR", LOCAL_MODEL_ROOT / "minicpm-v-4.6")),
31
+ }
32
  FETCH_TIMEOUT_SECONDS = 10
33
  MAX_IMAGE_BYTES = 2_500_000
34
 
 
544
  if not enabled:
545
  return summarize_deterministic(files, findings, prefix="Deterministic review complete.")
546
 
547
+ token = hf_token.token if hf_token else os.getenv("HF_TOKEN", "")
548
+ if not token and not local_model_ready("mellum"):
549
  return summarize_deterministic(
550
  files,
551
  findings,
 
584
  ]
585
 
586
  try:
587
+ return call_chat_model(MELLUM_MODEL, messages, token, local_alias="mellum", max_tokens=320)
588
  except Exception as exc: # The app must stay demoable when endpoints are unavailable.
589
  return summarize_deterministic(
590
  files,
591
  findings,
592
+ prefix=f"Model summary unavailable from `{MELLUM_MODEL}`: {friendly_model_error(MELLUM_MODEL, exc, 'mellum')}",
593
  )
594
 
595
 
 
597
  model: str,
598
  messages: list[dict[str, Any]],
599
  token: str,
600
+ local_alias: str | None = None,
601
  max_tokens: int = 320,
602
  temperature: float = 0.2,
603
  ) -> str:
604
+ if local_alias:
605
+ local_response = try_local_text_model(local_alias, messages, max_tokens=max_tokens, temperature=temperature)
606
+ if local_response:
607
+ return local_response
608
+
609
  client = InferenceClient(token=token, model=model)
610
  response = client.chat_completion(
611
  messages=messages,
 
616
  return response.choices[0].message.content or f"{model} returned an empty response."
617
 
618
 
619
+ def try_local_text_model(
620
+ alias: str,
621
+ messages: list[dict[str, Any]],
622
+ max_tokens: int,
623
+ temperature: float,
624
+ ) -> str | None:
625
+ model_dir = LOCAL_MODEL_DIRS.get(alias)
626
+ if not model_dir or not (model_dir / "config.json").exists():
627
+ return None
628
+
629
+ try:
630
+ import torch
631
+ from transformers import AutoModelForCausalLM, AutoTokenizer
632
+ except Exception as exc:
633
+ return (
634
+ f"Local checkpoint detected at `{model_dir}`, but local inference dependencies are not installed: "
635
+ f"{type(exc).__name__}. Add torch/transformers or use the HF provider path."
636
+ )
637
+
638
+ try:
639
+ tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
640
+ model = AutoModelForCausalLM.from_pretrained(
641
+ model_dir,
642
+ device_map="auto",
643
+ torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
644
+ trust_remote_code=True,
645
+ )
646
+ if hasattr(tokenizer, "apply_chat_template"):
647
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
648
+ else:
649
+ prompt = "\n\n".join(f"{item.get('role', 'user')}: {item.get('content', '')}" for item in messages)
650
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
651
+ generated = model.generate(
652
+ **inputs,
653
+ max_new_tokens=max_tokens,
654
+ do_sample=temperature > 0,
655
+ temperature=max(temperature, 0.01),
656
+ )
657
+ new_tokens = generated[0][inputs["input_ids"].shape[-1] :]
658
+ text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
659
+ return text or f"Local checkpoint `{model_dir}` returned an empty response."
660
+ except Exception as exc:
661
+ return f"Local checkpoint `{model_dir}` could not run in this Space: {type(exc).__name__}: {exc}"
662
+
663
+
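As written in this diff, `try_local_text_model` calls `from_pretrained` on every invocation, so each review would reload the checkpoint. One possible follow-up is to memoize the load per directory. The sketch below shows only the caching pattern: `load_checkpoint` is a hypothetical stand-in for the tokenizer and model load, not a function from this commit, and holding several large models in memory at once may not fit the Space's hardware.

```python
from functools import lru_cache

LOAD_CALLS: list[str] = []  # records how often the expensive load runs


def load_checkpoint(model_dir: str) -> dict[str, str]:
    """Stand-in for the expensive tokenizer/model load (hypothetical)."""
    LOAD_CALLS.append(model_dir)
    return {"model_dir": model_dir}


@lru_cache(maxsize=2)  # small bound so old checkpoints can be evicted
def get_cached_checkpoint(model_dir: str) -> dict[str, str]:
    """Load a checkpoint once per directory and reuse it afterwards."""
    return load_checkpoint(model_dir)


first = get_cached_checkpoint("/data/models/nemotron-3-nano-4b")
second = get_cached_checkpoint("/data/models/nemotron-3-nano-4b")
print(first is second, len(LOAD_CALLS))
```

The cache key is the directory path as a string, so an environment override that points an alias at a different directory would trigger a fresh load rather than reuse a stale model.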
664
+ def friendly_model_error(model: str, exc: Exception, alias: str | None = None) -> str:
665
+ raw = str(exc)
666
+ if "model_not_found" in raw or "does not exist" in raw:
667
+ reason = "the model ID was rejected by the HF provider"
668
+ elif "model_not_supported" in raw or "not supported by any provider" in raw:
669
+ reason = "the model exists, but no enabled HF provider currently serves it"
670
+ elif "401" in raw or "unauthorized" in raw.lower():
671
+ reason = "the current token is not authorized for this provider call"
672
+ elif "429" in raw or "rate" in raw.lower():
673
+ reason = "the provider is rate-limited"
674
+ else:
675
+ reason = "the provider request failed"
676
+
677
+ local_hint = ""
678
+ if alias and alias in LOCAL_MODEL_DIRS:
679
+ local_hint = f" Local checkpoint path: `{LOCAL_MODEL_DIRS[alias]}`."
680
+ return f"{reason} for `{model}`.{local_hint}"
681
+
682
+
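The substring checks in `friendly_model_error` are deliberately simple, but `"rate" in raw.lower()` also matches unrelated words such as "generate", and `"429"` matches any longer number that contains those digits. A stricter variant could anchor the matches with word boundaries. This is an illustrative sketch only: `classify_provider_error` and its labels are hypothetical, and the error strings in the usage lines are made up for the demo rather than taken from real provider responses.

```python
import re


def classify_provider_error(raw: str) -> str:
    """Map a raw provider error message to a coarse, user-facing reason."""
    text = raw.lower()
    if "model_not_found" in text or "does not exist" in text:
        return "model ID rejected"
    if "model_not_supported" in text or "not supported by any provider" in text:
        return "no provider serves model"
    # \b keeps 401/429 from matching inside longer numbers.
    if re.search(r"\b401\b", text) or "unauthorized" in text:
        return "unauthorized"
    if (
        re.search(r"\b429\b", text)
        or re.search(r"\brate[ _-]?limit", text)
        or "too many requests" in text
    ):
        return "rate-limited"
    return "request failed"


print(classify_provider_error("429 Too Many Requests"))
print(classify_provider_error("failed to generate a response"))
```

With the original substring test, the second message would be reported as rate-limited; with the word-boundary version it falls through to the generic reason.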
683
  def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
684
  diff_excerpt = "\n".join(
685
  f"{file.path}\n"
 
706
  if not enabled:
707
  return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
708
 
709
+ if not token and not local_model_ready("nemotron"):
710
  return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
711
 
712
  messages = [
 
720
  {"role": "user", "content": compact_review_context(files, findings)},
721
  ]
722
  try:
723
+ return call_chat_model(NEMOTRON_MODEL, messages, token, local_alias="nemotron", max_tokens=360)
724
  except Exception as exc:
725
  return (
726
+ f"Nemotron router attempted `{NEMOTRON_MODEL}`. "
727
+ f"{friendly_model_error(NEMOTRON_MODEL, exc, 'nemotron')}\n\n"
728
  + deterministic_router_fallback(files, findings)
729
  )
730
 
 
753
  if not enabled:
754
  return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
755
 
756
+ if not token and not local_model_ready("tiny_titan"):
757
  return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
758
 
759
  messages = [
 
767
  {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
768
  ]
769
  try:
770
+ return call_chat_model(TINY_TITAN_MODEL, messages, token, local_alias="tiny_titan", max_tokens=260)
771
  except Exception as exc:
772
+ return (
773
+ f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}`. "
774
+ f"{friendly_model_error(TINY_TITAN_MODEL, exc, 'tiny_titan')}\n\n"
775
+ "- Deterministic checker fallback: verify that critical security findings are fixed before merge.\n"
776
+ "- Test recommendation: cover every changed auth, network, and empty-input branch.\n"
777
+ "- Merge decision: hold if any critical finding remains."
778
+ )
779
 
780
 
781
  def run_minicpm_vision(
 
792
  if not enabled:
793
  return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
794
 
795
  prompt = (
796
  "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
797
  "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
 
810
  if len(content) == 1:
811
  return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
812
 
813
+ local_dir = LOCAL_MODEL_DIRS["minicpm"]
814
+ if (local_dir / "config.json").exists():
815
+ return (
816
+ f"MiniCPM-V local checkpoint detected at `{local_dir}` with {len(content) - 1} image(s). "
817
+ "The app has the image ingestion path wired; run the custom MiniCPM-V loader from this mount for full local vision inference."
818
+ )
819
+
820
+ if not token:
821
+ return (
822
+ f"MiniCPM-V vision ready with {len(content) - 1} image(s), but no Hugging Face OAuth/HF_TOKEN is available "
823
+ f"and no local checkpoint is mounted at `{local_dir}`. Model configured: `{MINICPM_MODEL}`."
824
+ )
825
+
826
  messages = [{"role": "user", "content": content}]
827
  try:
828
+ return call_chat_model(MINICPM_MODEL, messages, token, local_alias="minicpm", max_tokens=420)
829
  except Exception as exc:
830
  return (
831
+ f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s). "
832
+ f"{friendly_model_error(MINICPM_MODEL, exc, 'minicpm')}"
833
  )
834
 
835
 
 
1026
  def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
1027
  return "\n\n".join(
1028
  [
1029
+ "### Model Runtime Status",
1030
+ render_model_runtime_status(),
1031
  "### Nemotron 3 Nano Router",
1032
  nemotron_notes,
1033
  "### Tiny Titan 4B Checker",
 
1040
  )
1041
 
1042
 
1043
+ def render_model_runtime_status() -> str:
1044
+ data_state = "mounted" if DATA_ROOT.exists() else "not mounted"
1045
+ data_writable = "writable" if os.access(DATA_ROOT, os.W_OK) else "read-only or unavailable"
1046
+ lines = [
1047
+ f"- Data mount: `{DATA_ROOT}` is **{data_state}** and **{data_writable}**.",
1048
+ f"- Mellum summary: `{MELLUM_MODEL}`; local path {format_local_model_status('mellum')}.",
1049
+ f"- Nemotron router: `{NEMOTRON_MODEL}`; local path {format_local_model_status('nemotron')}.",
1050
+ f"- Tiny Titan checker: `{TINY_TITAN_MODEL}`; local path {format_local_model_status('tiny_titan')}.",
1051
+ f"- MiniCPM-V vision: `{MINICPM_MODEL}`; local path {format_local_model_status('minicpm')}.",
1052
+ "- Deterministic reviewer remains the always-on fallback for a reliable demo.",
1053
+ ]
1054
+ return "\n".join(lines)
1055
+
1056
+
1057
+ def format_local_model_status(alias: str) -> str:
1058
+ model_dir = LOCAL_MODEL_DIRS[alias]
1059
+ if (model_dir / "config.json").exists():
1060
+ return f"`{model_dir}` is **ready**"
1061
+ if model_dir.exists():
1062
+ return f"`{model_dir}` exists but has no `config.json`"
1063
+ return f"`{model_dir}` is not present"
1064
+
1065
+
1066
+ def local_model_ready(alias: str) -> bool:
1067
+ model_dir = LOCAL_MODEL_DIRS.get(alias)
1068
+ return bool(model_dir and (model_dir / "config.json").exists())
1069
+
1070
+
1071
  def load_sample() -> str:
1072
  return SAMPLE_DIFF
1073