Space: build-small-hackathon/DiffSense

avaliev committed 19 days ago

Commit 7eb80e6 (1 parent: a047a58)

Add local model runtime status

Files changed (3)

README.md +21 -7
TECH_DESIGN.md +23 -1
app.py +142 -20

README.md CHANGED

@@ -27,7 +27,7 @@ tags:
   - best-use-of-modal
   - tiny-titan
 models:
-  - JetBrains/Mellum-2-12B-instruct
   - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
   - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
   - openbmb/MiniCPM-V-4.6
@@ -56,7 +56,8 @@ DiffSense is the small-model version of that workflow: useful immediately, inspe
 - Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
 - Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
 - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
-- Optional model-assisted summary using `JetBrains/Mellum-2-12B-instruct` through the Hugging Face Inference API when OAuth is available.
 ## Hackathon Track
@@ -79,15 +80,28 @@ All planned models are under the Build Small 32B parameter cap.
 | Role | Model | Status |
 | --- | --- | --- |
-| Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook implemented |
 | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
-| Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook implemented |
-| Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook implemented |
-| Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + HF inference hook implemented |
 | Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
 The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
 ## Usage
 1. Open the Space.
@@ -114,7 +128,7 @@ For public GitHub PRs, paste the PR URL directly. DiffSense fetches the `.diff`
 ## Privacy
-The deterministic review path runs inside the app process and does not send the pasted diff to any external model. If a public PR URL is pasted, the app fetches its public `.diff` over the network. If the optional model summary is enabled, the diff excerpt and deterministic findings are sent to the selected Hugging Face Inference model using the signed-in user's OAuth token.
 ## Local Run

   - best-use-of-modal
   - tiny-titan
 models:
+  - JetBrains/Mellum2-12B-A2.5B-Instruct
   - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
   - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
   - openbmb/MiniCPM-V-4.6
 - Optional MiniCPM-V 4.6 vision pass for PR screenshots, architecture diagrams, and UI diffs.
 - Optional Modal bridge through `DIFFSENSE_MODAL_ENDPOINT`.
 - Structured JSON output with file, hunk, line, severity, category, comment, and suggestion.
+- Optional model-assisted summary using `JetBrains/Mellum2-12B-A2.5B-Instruct` through the Hugging Face Inference API when OAuth is available, or a local checkpoint when mounted under `/data`.
+- ZeroGPU/bucket-aware model runtime status for local checkpoints mounted from the `build-small-hackathon/DiffSense` bucket.
 ## Hackathon Track
 | Role | Model | Status |
 | --- | --- | --- |
+| Code review summary | JetBrains Mellum 2 12B Instruct | Optional HF inference hook + `/data` local checkpoint path implemented |
 | Provider | Hugging Face Inference API | Optional OAuth-backed summary provider |
+| Agentic routing | NVIDIA Nemotron 3 Nano | Optional HF inference hook + `/data` local checkpoint path implemented |
+| Tiny checker | NVIDIA Nemotron 3 Nano 4B | Optional HF inference hook + `/data` local checkpoint path implemented |
+| Visual PR context | OpenBMB MiniCPM-V 4.6 | Optional image upload + provider/local checkpoint readiness implemented |
 | Runtime | Modal | Optional provider bridge via `DIFFSENSE_MODAL_ENDPOINT` implemented |
 The current app intentionally keeps a deterministic fallback so the demo remains reliable even if a hosted model endpoint is cold, rate-limited, or unavailable.
+## Local Checkpoint Layout
+The Space is configured with a read/write bucket mounted at `/data`, so model files can be staged without committing checkpoints to the app repo. DiffSense checks these paths at runtime:
+```text
+/data/models/mellum2-instruct
+/data/models/nemotron-3-nano-30b-a3b
+/data/models/nemotron-3-nano-4b
+/data/models/minicpm-v-4.6
+```
+Each directory is considered ready when it contains a `config.json`. If a Hugging Face provider does not serve a sponsor model, the app reports the provider limitation cleanly and keeps the deterministic review running.
 ## Usage
 1. Open the Space.
 ## Privacy
+The deterministic review path runs inside the app process and does not send the pasted diff to any external model. If a public PR URL is pasted, the app fetches its public `.diff` over the network. If an optional hosted model pass is enabled, the diff excerpt and deterministic findings are sent to the selected Hugging Face Inference model using the signed-in user's OAuth token. If a local checkpoint is mounted under `/data/models`, that local path is preferred for text-model passes.
 ## Local Run
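
Reviewer note: the README change above says a checkpoint directory counts as ready once it contains a `config.json`. A minimal sketch of that rule as a three-state check follows. The function name `checkpoint_status` and the temporary directory standing in for `/data/models` are assumptions for illustration, and `is_file()` is a slightly stricter test than the `exists()` call used in the commit.

```python
from pathlib import Path
import tempfile


def checkpoint_status(model_dir: Path) -> str:
    """Classify a checkpoint directory as 'ready', 'incomplete', or 'missing'.

    Hypothetical helper: mirrors the README rule that a directory is ready
    when it contains a config.json.
    """
    if (model_dir / "config.json").is_file():
        return "ready"
    if model_dir.is_dir():
        return "incomplete"  # directory staged, but no config.json yet
    return "missing"


# Demo against a throwaway directory standing in for /data/models.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mellum2-instruct").mkdir()
    (root / "mellum2-instruct" / "config.json").write_text("{}")
    (root / "nemotron-3-nano-4b").mkdir()  # staged without config.json
    statuses = {
        name: checkpoint_status(root / name)
        for name in ("mellum2-instruct", "nemotron-3-nano-4b", "minicpm-v-4.6")
    }

print(statuses)
```

Returning a plain status value rather than a formatted sentence keeps the check reusable for both the readiness guard and the status text shown in the UI.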

TECH_DESIGN.md CHANGED

@@ -18,6 +18,7 @@ Unified diff input or public GitHub PR URL
   -> optional Nemotron 3 Nano routing via HF OAuth
   -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
   -> optional MiniCPM-V 4.6 vision notes via HF OAuth
   -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
 ```
@@ -82,11 +83,13 @@ Each finding includes:
 When enabled, the app uses the signed-in Hugging Face OAuth token or `HF_TOKEN` through the Hugging Face Inference API to call:
 ```text
-JetBrains/Mellum-2-12B-instruct
 ```
 The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
 ### Optional Nemotron Router
 When enabled, the app calls:
@@ -97,6 +100,8 @@ nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
 Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
 ### Optional Tiny Titan Checker
 When enabled, the app calls a <=4B model:
@@ -107,6 +112,8 @@ nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
 This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
 ### Optional MiniCPM-V Vision Pass
 When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
@@ -117,6 +124,21 @@ openbmb/MiniCPM-V-4.6
 This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
 ### Optional Modal Bridge
 When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.

   -> optional Nemotron 3 Nano routing via HF OAuth
   -> optional Nemotron 3 Nano 4B Tiny Titan check via HF OAuth
   -> optional MiniCPM-V 4.6 vision notes via HF OAuth
+  -> optional local checkpoints from /data/models on ZeroGPU
   -> optional Modal bridge via DIFFSENSE_MODAL_ENDPOINT
 ```
 When enabled, the app uses the signed-in Hugging Face OAuth token or `HF_TOKEN` through the Hugging Face Inference API to call:
 ```text
+JetBrains/Mellum2-12B-A2.5B-Instruct
 ```
 The model is asked to summarize the deterministic findings rather than invent new findings. This keeps the model role narrow, fast, and auditable.
+If `/data/models/mellum2-instruct/config.json` exists, the app prefers that local checkpoint path before calling the hosted provider.
 ### Optional Nemotron Router
 When enabled, the app calls:
 Nemotron receives deterministic findings plus a compact diff excerpt and returns a triage plan: merge risk, files to inspect first, and follow-up tests. If the endpoint is unavailable, the app shows a deterministic routing fallback.
+If `/data/models/nemotron-3-nano-30b-a3b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
 ### Optional Tiny Titan Checker
 When enabled, the app calls a <=4B model:
 This pass returns a compact sanity check: missed-risk hypothesis, test recommendation, and merge decision. It exists as a separate small-model path for the Tiny Titan badge while keeping the main reviewer reliable.
+If `/data/models/nemotron-3-nano-4b/config.json` exists, the app treats the local checkpoint as the preferred runtime path.
 ### Optional MiniCPM-V Vision Pass
 When enabled, uploaded PNG, JPEG, or WebP images are converted to data URLs and sent with the diff context to:
 This is intended for PR screenshots, architecture diagrams, and UI diffs. The app limits image payload size and reports endpoint failures visibly instead of blocking the review.
+If `/data/models/minicpm-v-4.6/config.json` exists, the app reports the local MiniCPM-V checkpoint as ready and keeps the image ingestion path available for a custom local loader.
+### ZeroGPU Bucket Mount
+The Space has a read/write bucket mounted at `/data`. DiffSense checks the following model checkpoint locations at runtime and includes their status in the model-agent trace:
+```text
+/data/models/mellum2-instruct
+/data/models/nemotron-3-nano-30b-a3b
+/data/models/nemotron-3-nano-4b
+/data/models/minicpm-v-4.6
+```
+This keeps the app repo small while making the model integration path explicit for the hackathon badges. Hosted provider failures are converted into concise status notes rather than raw request errors.
 ### Optional Modal Bridge
 When `DIFFSENSE_MODAL_ENDPOINT` is configured, the app can POST the deterministic findings and compact diff context to a Modal-hosted review endpoint. Without that secret, the UI reports that the bridge is ready but not configured.
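
Reviewer note: the design text above describes a preference order of local checkpoint, then hosted provider, then deterministic fallback. The sketch below shows one way to express that ordering. `run_with_fallback` and its three callables are hypothetical names, not functions from the commit. In the `app.py` change, a local checkpoint that fails to load returns its error text as the response. This sketch falls through to the hosted path instead, which is a design variation rather than the committed behavior.

```python
from typing import Callable, Optional


def run_with_fallback(
    local: Callable[[], Optional[str]],
    hosted: Callable[[], str],
    deterministic: Callable[[], str],
) -> tuple[str, str]:
    """Return (source, text) from the first path that produces output.

    Hypothetical helper illustrating the local -> hosted -> deterministic order.
    """
    try:
        text = local()
        if text:
            return "local", text
    except Exception:
        pass  # local checkpoint unusable; try the hosted provider next
    try:
        return "hosted", hosted()
    except Exception as exc:
        # Keep the review usable and surface a short note about the failure.
        note = f"(hosted path failed: {type(exc).__name__})"
        return "deterministic", f"{deterministic()} {note}"


def no_local_checkpoint() -> Optional[str]:
    return None  # stands in for "no config.json found"


def cold_endpoint() -> str:
    raise TimeoutError("endpoint is cold")


source, text = run_with_fallback(
    no_local_checkpoint, cold_endpoint, lambda: "Deterministic review complete."
)
print(source, "->", text)
```

Returning the source alongside the text lets the UI state which path produced the answer, which matches the goal of reporting provider failures as concise status notes.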

app.py CHANGED

@@ -16,11 +16,19 @@ import gradio as gr
 from huggingface_hub import InferenceClient
-MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum-2-12B-instruct")
 NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
 TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
 MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
 MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
 FETCH_TIMEOUT_SECONDS = 10
 MAX_IMAGE_BYTES = 2_500_000
@@ -536,8 +544,8 @@ def summarize_with_model(
     if not enabled:
         return summarize_deterministic(files, findings, prefix="Deterministic review complete.")
-    token = hf_token.token if hf_token else os.getenv("HF_TOKEN")
-    if not token:
         return summarize_deterministic(
             files,
             findings,
@@ -576,12 +584,12 @@ def summarize_with_model(
     ]
     try:
-        return call_chat_model(MELLUM_MODEL, messages, token, max_tokens=320)
     except Exception as exc:  # The app must stay demoable when endpoints are unavailable.
         return summarize_deterministic(
             files,
             findings,
-            prefix=f"Model summary unavailable from {MELLUM_MODEL}: {exc}",
         )
@@ -589,9 +597,15 @@ def call_chat_model(
     model: str,
     messages: list[dict[str, Any]],
     token: str,
     max_tokens: int = 320,
     temperature: float = 0.2,
 ) -> str:
     client = InferenceClient(token=token, model=model)
     response = client.chat_completion(
         messages=messages,
@@ -602,6 +616,70 @@ def call_chat_model(
     return response.choices[0].message.content or f"{model} returned an empty response."
 def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
     diff_excerpt = "\n".join(
         f"{file.path}\n"
@@ -628,7 +706,7 @@ def run_nemotron_router(
     if not enabled:
         return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
-    if not token:
         return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
     messages = [
@@ -642,10 +720,11 @@ def run_nemotron_router(
         {"role": "user", "content": compact_review_context(files, findings)},
     ]
     try:
-        return call_chat_model(NEMOTRON_MODEL, messages, token, max_tokens=360)
     except Exception as exc:
         return (
-            f"Nemotron router attempted `{NEMOTRON_MODEL}` but the endpoint was unavailable: {exc}\n\n"
             + deterministic_router_fallback(files, findings)
         )
@@ -674,7 +753,7 @@ def run_tiny_titan_checker(
     if not enabled:
         return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
-    if not token:
         return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
     messages = [
@@ -688,9 +767,15 @@ def run_tiny_titan_checker(
         {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
     ]
     try:
-        return call_chat_model(TINY_TITAN_MODEL, messages, token, max_tokens=260)
     except Exception as exc:
-        return f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}` but the endpoint was unavailable: {exc}"
 def run_minicpm_vision(
@@ -707,12 +792,6 @@ def run_minicpm_vision(
     if not enabled:
         return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
-    if not token:
-        return (
-            f"MiniCPM-V vision ready with {len(images)} image(s), but no Hugging Face OAuth/HF_TOKEN is available. "
-            f"Model configured: `{MINICPM_MODEL}`."
-        )
     prompt = (
         "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
         "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
@@ -731,13 +810,26 @@ def run_minicpm_vision(
     if len(content) == 1:
         return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
     messages = [{"role": "user", "content": content}]
     try:
-        return call_chat_model(MINICPM_MODEL, messages, token, max_tokens=420)
     except Exception as exc:
         return (
-            f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s), "
-            f"but the endpoint was unavailable: {exc}"
         )
@@ -934,6 +1026,8 @@ def run_review(
 def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
     return "\n\n".join(
         [
             "### Nemotron 3 Nano Router",
             nemotron_notes,
             "### Tiny Titan 4B Checker",
@@ -946,6 +1040,34 @@ def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes
     )
 def load_sample() -> str:
     return SAMPLE_DIFF

 from huggingface_hub import InferenceClient
+DATA_ROOT = Path(os.getenv("DIFFSENSE_DATA_ROOT", "/data"))
+LOCAL_MODEL_ROOT = Path(os.getenv("DIFFSENSE_LOCAL_MODEL_ROOT", DATA_ROOT / "models"))
+MELLUM_MODEL = os.getenv("DIFFSENSE_MELLUM_MODEL", "JetBrains/Mellum2-12B-A2.5B-Instruct")
 NEMOTRON_MODEL = os.getenv("DIFFSENSE_NEMOTRON_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16")
 TINY_TITAN_MODEL = os.getenv("DIFFSENSE_TINY_TITAN_MODEL", "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16")
 MINICPM_MODEL = os.getenv("DIFFSENSE_MINICPM_MODEL", "openbmb/MiniCPM-V-4.6")
 MODAL_ENDPOINT = os.getenv("DIFFSENSE_MODAL_ENDPOINT", "")
+LOCAL_MODEL_DIRS = {
+    "mellum": Path(os.getenv("DIFFSENSE_MELLUM_LOCAL_DIR", LOCAL_MODEL_ROOT / "mellum2-instruct")),
+    "nemotron": Path(os.getenv("DIFFSENSE_NEMOTRON_LOCAL_DIR", LOCAL_MODEL_ROOT / "nemotron-3-nano-30b-a3b")),
+    "tiny_titan": Path(os.getenv("DIFFSENSE_TINY_TITAN_LOCAL_DIR", LOCAL_MODEL_ROOT / "nemotron-3-nano-4b")),
+    "minicpm": Path(os.getenv("DIFFSENSE_MINICPM_LOCAL_DIR", LOCAL_MODEL_ROOT / "minicpm-v-4.6")),
+}
 FETCH_TIMEOUT_SECONDS = 10
 MAX_IMAGE_BYTES = 2_500_000
     if not enabled:
         return summarize_deterministic(files, findings, prefix="Deterministic review complete.")
+    token = hf_token.token if hf_token else os.getenv("HF_TOKEN", "")
+    if not token and not local_model_ready("mellum"):
         return summarize_deterministic(
             files,
             findings,
     ]
     try:
+        return call_chat_model(MELLUM_MODEL, messages, token, local_alias="mellum", max_tokens=320)
     except Exception as exc:  # The app must stay demoable when endpoints are unavailable.
         return summarize_deterministic(
             files,
             findings,
+            prefix=f"Model summary unavailable from `{MELLUM_MODEL}`: {friendly_model_error(MELLUM_MODEL, exc, 'mellum')}",
         )
     model: str,
     messages: list[dict[str, Any]],
     token: str,
+    local_alias: str | None = None,
     max_tokens: int = 320,
     temperature: float = 0.2,
 ) -> str:
+    if local_alias:
+        local_response = try_local_text_model(local_alias, messages, max_tokens=max_tokens, temperature=temperature)
+        if local_response:
+            return local_response
     client = InferenceClient(token=token, model=model)
     response = client.chat_completion(
         messages=messages,
     return response.choices[0].message.content or f"{model} returned an empty response."
+def try_local_text_model(
+    alias: str,
+    messages: list[dict[str, Any]],
+    max_tokens: int,
+    temperature: float,
+) -> str | None:
+    model_dir = LOCAL_MODEL_DIRS.get(alias)
+    if not model_dir or not (model_dir / "config.json").exists():
+        return None
+    try:
+        import torch
+        from transformers import AutoModelForCausalLM, AutoTokenizer
+    except Exception as exc:
+        return (
+            f"Local checkpoint detected at `{model_dir}`, but local inference dependencies are not installed: "
+            f"{type(exc).__name__}. Add torch/transformers or use the HF provider path."
+        )
+    try:
+        tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
+        model = AutoModelForCausalLM.from_pretrained(
+            model_dir,
+            device_map="auto",
+            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+            trust_remote_code=True,
+        )
+        if hasattr(tokenizer, "apply_chat_template"):
+            prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+        else:
+            prompt = "\n\n".join(f"{item.get('role', 'user')}: {item.get('content', '')}" for item in messages)
+        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+        generated = model.generate(
+            **inputs,
+            max_new_tokens=max_tokens,
+            do_sample=temperature > 0,
+            temperature=max(temperature, 0.01),
+        )
+        new_tokens = generated[0][inputs["input_ids"].shape[-1] :]
+        text = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+        return text or f"Local checkpoint `{model_dir}` returned an empty response."
+    except Exception as exc:
+        return f"Local checkpoint `{model_dir}` could not run in this Space: {type(exc).__name__}: {exc}"
+def friendly_model_error(model: str, exc: Exception, alias: str | None = None) -> str:
+    raw = str(exc)
+    if "model_not_found" in raw or "does not exist" in raw:
+        reason = "the model ID was rejected by the HF provider"
+    elif "model_not_supported" in raw or "not supported by any provider" in raw:
+        reason = "the model exists, but no enabled HF provider currently serves it"
+    elif "401" in raw or "unauthorized" in raw.lower():
+        reason = "the current token is not authorized for this provider call"
+    elif "429" in raw or "rate" in raw.lower():
+        reason = "the provider is rate-limited"
+    else:
+        reason = "the provider request failed"
+    local_hint = ""
+    if alias and alias in LOCAL_MODEL_DIRS:
+        local_hint = f" Local checkpoint path: `{LOCAL_MODEL_DIRS[alias]}`."
+    return f"{reason} for `{model}`.{local_hint}"
 def compact_review_context(files: list[FileDiff], findings: list[Finding], max_chars: int = 9000) -> str:
     diff_excerpt = "\n".join(
         f"{file.path}\n"
     if not enabled:
         return f"Nemotron router disabled. Model configured: `{NEMOTRON_MODEL}`."
+    if not token and not local_model_ready("nemotron"):
         return f"Nemotron router ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{NEMOTRON_MODEL}`."
     messages = [
         {"role": "user", "content": compact_review_context(files, findings)},
     ]
     try:
+        return call_chat_model(NEMOTRON_MODEL, messages, token, local_alias="nemotron", max_tokens=360)
     except Exception as exc:
         return (
+            f"Nemotron router attempted `{NEMOTRON_MODEL}`. "
+            f"{friendly_model_error(NEMOTRON_MODEL, exc, 'nemotron')}\n\n"
             + deterministic_router_fallback(files, findings)
         )
     if not enabled:
         return f"Tiny Titan checker disabled. Model configured: `{TINY_TITAN_MODEL}`."
+    if not token and not local_model_ready("tiny_titan"):
         return f"Tiny Titan checker ready, but no Hugging Face OAuth/HF_TOKEN is available. Model configured: `{TINY_TITAN_MODEL}`."
     messages = [
         {"role": "user", "content": compact_review_context(files, findings, max_chars=7000)},
     ]
     try:
+        return call_chat_model(TINY_TITAN_MODEL, messages, token, local_alias="tiny_titan", max_tokens=260)
     except Exception as exc:
+        return (
+            f"Tiny Titan checker attempted `{TINY_TITAN_MODEL}`. "
+            f"{friendly_model_error(TINY_TITAN_MODEL, exc, 'tiny_titan')}\n\n"
+            "- Deterministic checker fallback: verify that critical security findings are fixed before merge.\n"
+            "- Test recommendation: cover every changed auth, network, and empty-input branch.\n"
+            "- Merge decision: hold if any critical finding remains."
+        )
 def run_minicpm_vision(
     if not enabled:
         return f"MiniCPM-V vision disabled with {len(images)} image(s) attached. Model configured: `{MINICPM_MODEL}`."
     prompt = (
         "You are DiffSense vision context. Read these PR screenshots, UI diffs, or architecture diagrams. "
         "Return concise markdown notes that could affect code review: changed behavior, missing tests, security risks, "
     if len(content) == 1:
         return f"MiniCPM-V vision could not read the uploaded image files. {skipped} file(s) were skipped."
+    local_dir = LOCAL_MODEL_DIRS["minicpm"]
+    if (local_dir / "config.json").exists():
+        return (
+            f"MiniCPM-V local checkpoint detected at `{local_dir}` with {len(content) - 1} image(s). "
+            "The app has the image ingestion path wired; run the custom MiniCPM-V loader from this mount for full local vision inference."
+        )
+    if not token:
+        return (
+            f"MiniCPM-V vision ready with {len(content) - 1} image(s), but no Hugging Face OAuth/HF_TOKEN is available "
+            f"and no local checkpoint is mounted at `{local_dir}`. Model configured: `{MINICPM_MODEL}`."
+        )
     messages = [{"role": "user", "content": content}]
     try:
+        return call_chat_model(MINICPM_MODEL, messages, token, local_alias="minicpm", max_tokens=420)
     except Exception as exc:
         return (
+            f"MiniCPM-V attempted `{MINICPM_MODEL}` on {len(content) - 1} image(s). "
+            f"{friendly_model_error(MINICPM_MODEL, exc, 'minicpm')}"
         )
 def render_agent_trace(nemotron_notes: str, tiny_titan_notes: str, minicpm_notes: str, modal_notes: str) -> str:
     return "\n\n".join(
         [
+            "### Model Runtime Status",
+            render_model_runtime_status(),
             "### Nemotron 3 Nano Router",
             nemotron_notes,
             "### Tiny Titan 4B Checker",
     )
+def render_model_runtime_status() -> str:
+    data_state = "mounted" if DATA_ROOT.exists() else "not mounted"
+    data_writable = "writable" if os.access(DATA_ROOT, os.W_OK) else "read-only or unavailable"
+    lines = [
+        f"- Data mount: `{DATA_ROOT}` is **{data_state}** and **{data_writable}**.",
+        f"- Mellum summary: `{MELLUM_MODEL}`; local path {format_local_model_status('mellum')}.",
+        f"- Nemotron router: `{NEMOTRON_MODEL}`; local path {format_local_model_status('nemotron')}.",
+        f"- Tiny Titan checker: `{TINY_TITAN_MODEL}`; local path {format_local_model_status('tiny_titan')}.",
+        f"- MiniCPM-V vision: `{MINICPM_MODEL}`; local path {format_local_model_status('minicpm')}.",
+        "- Deterministic reviewer remains the always-on fallback for a reliable demo.",
+    ]
+    return "\n".join(lines)
+def format_local_model_status(alias: str) -> str:
+    model_dir = LOCAL_MODEL_DIRS[alias]
+    if (model_dir / "config.json").exists():
+        return f"`{model_dir}` is **ready**"
+    if model_dir.exists():
+        return f"`{model_dir}` exists but has no `config.json`"
+    return f"`{model_dir}` is not present"
+def local_model_ready(alias: str) -> bool:
+    model_dir = LOCAL_MODEL_DIRS.get(alias)
+    return bool(model_dir and (model_dir / "config.json").exists())
 def load_sample() -> str:
     return SAMPLE_DIFF
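
Reviewer note: `friendly_model_error` in this commit classifies provider failures by substring, and the check `"rate" in raw.lower()` also matches unrelated words such as "generate". The sketch below is a possible stricter variant using ordered regular expressions. `classify_provider_error` and `_ERROR_RULES` are hypothetical names, the reason strings are copied from the commit, and the extra "too many requests" pattern and the case-insensitive matching are assumptions about what provider messages may contain.

```python
import re

# Ordered (pattern, reason) pairs; the first matching rule wins.
_ERROR_RULES = [
    (
        re.compile(r"model_not_found|does not exist", re.I),
        "the model ID was rejected by the HF provider",
    ),
    (
        re.compile(r"model_not_supported|not supported by any provider", re.I),
        "the model exists, but no enabled HF provider currently serves it",
    ),
    (
        re.compile(r"\b401\b|unauthorized", re.I),
        "the current token is not authorized for this provider call",
    ),
    (
        # Require "rate limit" rather than any word that contains "rate".
        re.compile(r"\b429\b|rate.?limit|too many requests", re.I),
        "the provider is rate-limited",
    ),
]


def classify_provider_error(raw: str) -> str:
    """Map a raw provider error message to a short, user-facing reason."""
    for pattern, reason in _ERROR_RULES:
        if pattern.search(raw):
            return reason
    return "the provider request failed"


for message in (
    "429 Too Many Requests",
    "Could not generate a response",
    "Client error '401 Unauthorized'",
):
    print(f"{message!r} -> {classify_provider_error(message)}")
```

Word boundaries around the status codes avoid matching digits embedded in longer numbers such as request IDs.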