build-small-hackathon / ObjectverseDiary (Hugging Face Space)

qqyule committed on Jun 8

Commit d30bd8e (verified)

1 Parent(s): 9e874de

Sync runtime diagnostics and smoke helpers

Files changed (21)

README.md +16 -2
docs/DEMO_VIDEO_SCRIPT.md +2 -1
docs/DEVELOPMENT_STATUS.md +9 -2
docs/EXTERNAL_SETUP.md +24 -0
docs/FAILURES.md +27 -0
docs/FIELD_NOTES.md +20 -5
docs/FINAL_VERIFICATION_REPORT.md +28 -23
docs/MODEL_CARD.md +11 -2
docs/RUNTIME.md +32 -0
docs/SOCIAL_POST.md +4 -1
docs/SUBMISSION_GUIDE.md +5 -1
scripts/README.md +14 -2
scripts/check_llama_cpp_smoke.py +145 -0
scripts/check_space_vlm.py +214 -14
src/config.py +5 -1
src/models/llama_cpp_runner.py +1 -0
src/models/vision_runner.py +90 -0
src/ui/layout.py +14 -0
tests/test_llama_cpp_smoke.py +65 -0
tests/test_mock_mvp.py +47 -0
tests/test_space_vlm_tooling.py +80 -0

README.md CHANGED

@@ -23,7 +23,7 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
 ## Current Status
-Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, optional llama.cpp text runtime wiring, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
+Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a local GGUF smoke-test helper, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
 By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
@@ -31,6 +31,8 @@ By default, the app uses deterministic mock outputs for object understanding, pe
 `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
+`scripts/check_llama_cpp_smoke.py` is available for an explicit-confirmation local GGUF smoke test. The recommended baseline smoke model is `Qwen/Qwen2.5-1.5B-Instruct-GGUF` with `qwen2.5-1.5b-instruct-q4_k_m.gguf`, stored under ignored `models/` when used locally.
 Hugging Face Space:
 https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
@@ -59,7 +61,7 @@ The interface is English-first and Chinese-second.
 - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
 - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
 - [ ] OpenBMB Special — MiniCPM-V wiring exists, but hosted validation currently falls back to mock vision.
-- [ ] Llama Champion — llama.cpp wiring exists, but real GGUF smoke test is not complete.
+- [ ] Llama Champion — llama.cpp wiring and smoke helper exist, but real GGUF smoke test is not complete.
 - [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
 - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
@@ -106,6 +108,17 @@ python app.py
 If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
+Recommended explicit-confirmation smoke path:
+```bash
+# Download externally, do not commit the GGUF:
+# https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF
+# file: qwen2.5-1.5b-instruct-q4_k_m.gguf
+.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
 ## Initial MVP Flow
 The stable submission baseline supports:
@@ -133,6 +146,7 @@ The stable submission baseline supports:
 - Public mock traces: `data/traces/samples/`
 - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - Hosted VLM failure evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
+- Hosted VLM diagnostic support: hidden `/vision_runtime_probe` API and probe-aware `scripts/check_space_vlm.py`
 - Field Notes draft: `docs/FIELD_NOTES.md`
 - Demo video script: `docs/DEMO_VIDEO_SCRIPT.md`
 - Social post draft: `docs/SOCIAL_POST.md`
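The README lists four conditions under which the text runtime falls back to deterministic mock output and records `text-fallback-to-mock`. The following is a minimal Python sketch of that control flow, not the project's `src/models/llama_cpp_runner.py` code. It assumes a hypothetical `load_model` callable that stands in for llama-cpp-python loading and is assumed to raise `ImportError`, `OSError`, or `ValueError` on failure. The mock output shape and the placement of the `llama-cpp text generation` marker are illustrative.

```python
import json
from typing import Callable, Optional

FALLBACK_MARKER = "text-fallback-to-mock"
SUCCESS_MARKER = "llama-cpp text generation"


def mock_generate(prompt: str) -> dict:
    """Hypothetical deterministic stand-in for the mock text backend."""
    return {"text": f"[mock] {prompt[:40]}"}


def generate_with_fallback(
    prompt: str,
    model_path: Optional[str],
    load_model: Callable[[str], Callable[[str], str]],
) -> tuple[dict, list[str]]:
    """Try the llama.cpp path; on any listed failure, return mock output plus a fallback marker."""
    if not model_path:
        # TEXT_MODEL_PATH is empty, so no model load is attempted.
        return mock_generate(prompt), [FALLBACK_MARKER]
    try:
        # `load_model` is hypothetical: assumed to raise ImportError when
        # llama-cpp-python is missing and OSError/ValueError when loading fails.
        model = load_model(model_path)
        data = json.loads(model(prompt))  # json.JSONDecodeError is a ValueError
        if not isinstance(data, dict):
            raise ValueError("model output is JSON but not an object")
    except (ImportError, OSError, ValueError):
        return mock_generate(prompt), [FALLBACK_MARKER]
    return data, [SUCCESS_MARKER]
```

Returning the markers alongside the output keeps every fallback visible in traces instead of silently swapping backends.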

docs/DEMO_VIDEO_SCRIPT.md CHANGED

@@ -4,7 +4,7 @@
 Record a 90-second stable demo for Objectverse Diary using the mock-safe Hugging Face Space or local Gradio app.
-Do not claim that hosted MiniCPM-V, GGUF text generation, LoRA training, or model publishing are complete. The stable demo should emphasize the product loop, Gradio Off-Brand UI, public traces, and no commercial AI APIs.
+Do not claim that hosted MiniCPM-V validation, GGUF text generation, or live LoRA runtime wiring are complete. The stable demo should emphasize the product loop, Gradio Off-Brand UI, public traces, published dataset/LoRA evidence, and no commercial AI APIs.
 ## Recording Setup
@@ -104,5 +104,6 @@ Screen:
 ## Notes For Submission
 - Mention MiniCPM-V as wired but not hosted-validated yet.
+- Mention the published synthetic curated dataset and LoRA adapter only as training evidence, not live Space runtime.
 - Mention public traces and failure notes if the submission form asks for reproducibility.
 - Keep the final video under 2 minutes.

docs/DEVELOPMENT_STATUS.md CHANGED

@@ -21,6 +21,9 @@ Last updated: 2026-06-08
 - Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
 - Space VLM validation tooling:
   - `scripts/check_space_vlm.py`
+  - hidden `/vision_runtime_probe` API for non-secret MiniCPM-V diagnostics
+  - probe output support in Space VLM markdown and JSON reports
+  - failure-note updater for the latest Space VLM failure summary
   - failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
   - optional `--trace-output-dir` evidence export for validation traces
 - ZeroGPU compatibility:
@@ -36,18 +39,22 @@ Last updated: 2026-06-08
   - 50-row synthetic curated SFT dataset published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
   - Modal Qwen 1.5B LoRA test run completed with 20 steps
   - LoRA adapter published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
+- GGUF smoke-test helper:
+  - `scripts/check_llama_cpp_smoke.py`
+  - recommended baseline model documented as `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf`
+  - trace runtime no longer records literal `TEXT_MODEL_PATH`
 - Local tests and initial acceptance currently pass.
 ## Not Completed
 - Hosted Space MiniCPM-V validation with real public mug/keyboard/shoe images. Paid L4 was blocked by Hugging Face `402 Payment Required`; ZeroGPU CUDA probe passed; the 2026-06-08 full ZeroGPU validation reached the app but all three objects fell back to mock vision.
 - Passing real VLM demo trace capture. Failed Space VLM traces are kept as fallback evidence and do not replace mock sample traces.
-- Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
+- Real GGUF download/configuration outside Git and `TEXT_MODEL_PATH` smoke test. Model selection is now documented, but the file is not downloaded and optional `llama-cpp-python` is not installed by default.
 - Final text model parameter count documentation.
 - Real model traces from non-mock runtime.
 - GGUF conversion and runtime wiring for the published LoRA adapter.
 - GitHub sync / final public repository confirmation.
-- Published Field Notes URL, recorded demo video URL, social post URL, and final public submission.
+- Published Field Notes URL, recorded demo video URL, social post URL, GitHub push confirmation, Space sync confirmation, and final public submission.
 ## Current Safe Defaults
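The status update notes that the trace runtime no longer records a literal `TEXT_MODEL_PATH`. One way to get that property is to reduce the path to a boolean before it reaches a trace. This sketch uses hypothetical key names (`text_backend`, `text_model_path_configured`) and assumes `mock` as the default for `OBJECTVERSE_TEXT_BACKEND`; the project's actual status fields are not shown in this diff.

```python
import os
from typing import Mapping, Optional


def text_runtime_status(env: Optional[Mapping[str, str]] = None) -> dict:
    """Summarize the text runtime for a trace without recording the model path itself."""
    env = os.environ if env is None else env
    # Assumed default: the app uses mock text unless a backend is selected.
    backend = env.get("OBJECTVERSE_TEXT_BACKEND", "mock")
    model_path = env.get("TEXT_MODEL_PATH", "").strip()
    return {
        "text_backend": backend,
        # Hypothetical field: only whether a GGUF path is configured, never its value.
        "text_model_path_configured": bool(model_path),
    }
```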

docs/EXTERNAL_SETUP.md CHANGED

@@ -85,9 +85,12 @@ Automated validation command after confirmation:
   --output docs/SPACE_VLM_REPORT.md \
   --json-output docs/SPACE_VLM_REPORT.json \
   --trace-output-dir data/traces/space-vlm \
+  --failure-notes-output docs/FAILURES.md \
   --timeout-seconds 1200
 ```
+The validation command now calls the hidden `/vision_runtime_probe` endpoint before mug/keyboard/shoe generation. The probe output is written into the markdown/JSON report and must remain free of token markers, `.env` paths, and private local paths.
 Optional rollback to mock-safe settings:
 ```bash
@@ -99,6 +102,27 @@ Optional rollback to mock-safe settings:
 The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
+## Optional GGUF Smoke Test
+This is a local-only model evidence step. It should be run only after confirming optional dependency installation and GGUF download.
+Recommended model:
+```text
+repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
+file: qwen2.5-1.5b-instruct-q4_k_m.gguf
+local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+Do not commit the downloaded GGUF. After the file is present and optional `llama-cpp-python` is installed:
+```bash
+.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+Passing evidence requires `llama-cpp text generation` and no `text-fallback-to-mock` marker for generation or chat.
 2026-06-06 validation attempt:
 - `--configure-space` was run for `l4x1`.
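The smoke-test pass criterion can be expressed as a small predicate. This sketch assumes a hypothetical trace shape in which each of the persona, diary, and chat steps carries a list of marker strings; only the two marker strings come from the docs, and the actual `scripts/check_llama_cpp_smoke.py` may check its traces differently.

```python
from typing import Iterable, Mapping

SUCCESS_MARKER = "llama-cpp text generation"
FALLBACK_MARKER = "text-fallback-to-mock"
REQUIRED_STEPS = ("persona", "diary", "chat")  # assumed step names


def smoke_test_passed(step_markers: Mapping[str, Iterable[str]]) -> bool:
    """True only if every required step ran, none fell back to mock, and llama.cpp was recorded."""
    seen: set[str] = set()
    for step in REQUIRED_STEPS:
        if step not in step_markers:
            return False  # a step that never ran is not passing evidence
        markers = set(step_markers[step])
        if FALLBACK_MARKER in markers:
            return False
        seen |= markers
    return SUCCESS_MARKER in seen
```

Treating a missing step as a failure keeps a partial run from being reported as passing evidence.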

docs/FAILURES.md CHANGED

@@ -10,6 +10,14 @@ Use it for model/runtime/deployment/data issues, not for UI polish notes.
 MiniCPM-V 2.6 is wired as an optional vision backend. Hosted Space ZeroGPU validation ran on 2026-06-08, but all three public object checks fell back to mock vision, so full hosted MiniCPM-V validation is still unresolved.
+The app now includes a hidden `/vision_runtime_probe` API and `scripts/check_space_vlm.py` writes probe output into the Space VLM report before image validation. The next hosted run should use this probe to identify whether the fallback is caused by dependency import, GPU visibility, MiniCPM-V loading, or generation output.
+The recommended baseline GGUF for local text smoke testing is selected, but not downloaded or run:
+- repo: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+- file: `qwen2.5-1.5b-instruct-q4_k_m.gguf`
+- helper: `scripts/check_llama_cpp_smoke.py`
 Known non-blocking warning:
 - Gradio emits deprecation warnings for upcoming 6.0 API changes during local tests. This does not break the current Gradio Blocks build and can be handled with the later UI/API polish pass.
@@ -40,6 +48,25 @@ Known non-blocking warning:
 - Resolution: unresolved; inspect Space runtime logs or add non-secret fallback diagnostics for the MiniCPM-V load/chat exception.
 - Evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, and `data/traces/space-vlm/`.
+## Latest Space VLM Validation Failure
+- Updated: 2026-06-08
+- Area: Hugging Face Space vision runtime.
+- Probe backend: not available in the historical 2026-06-08 report; probe support was added afterward.
+- Failed checks: mug, keyboard, and shoe all included `vision-fallback-to-mock`.
+- Fallback used: mock object understanding plus mock text runtime.
+- Resolution: unresolved; keep the public Space mock-safe until a probe-aware validation run passes without `vision-fallback-to-mock`.
+## 2026-06-08 - GGUF Smoke Helper Prepared, Actual Smoke Pending
+- Area: llama.cpp text runtime evidence.
+- Reproduction: Run `scripts/check_llama_cpp_smoke.py` with an external GGUF model path after optional dependency installation.
+- Expected: trace records `llama-cpp text generation`, persona/diary/chat run without `text-fallback-to-mock`.
+- Actual: not run; `.venv` does not include `llama-cpp-python` by default and the GGUF file is intentionally not committed.
+- Impact: Llama Champion evidence remains incomplete.
+- Fallback used: default mock text runtime remains the safe public demo path.
+- Resolution: pending explicit confirmation to install optional local dependency and download `qwen2.5-1.5b-instruct-q4_k_m.gguf` into ignored `models/`.
 ## Anticipated Failure Areas
 ### Vision Runtime
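`docs/FAILURES.md` now carries a `## Latest Space VLM Validation Failure` section, and `scripts/check_space_vlm.py` can refresh it through `--failure-notes-output`. The script's implementation is not part of this view, so the following is a hypothetical Python sketch of one way to do that update: replace the section in place (up to the next `## ` heading) or append it when missing, so repeated validation runs do not stack duplicate summaries.

```python
import re

HEADING = "## Latest Space VLM Validation Failure"


def upsert_section(markdown: str, heading: str, body: str) -> str:
    """Replace the section under `heading` up to the next '## ' heading, or append it."""
    section = f"{heading}\n{body.rstrip()}\n"
    pattern = re.compile(
        rf"^{re.escape(heading)}\n.*?(?=^## |\Z)", re.MULTILINE | re.DOTALL
    )
    if pattern.search(markdown):
        # A callable replacement avoids backslash escapes in `section` being interpreted.
        return pattern.sub(lambda _match: section, markdown, count=1)
    if markdown and not markdown.endswith("\n"):
        markdown += "\n"
    return markdown + section
```

Because the lookahead stops only at `## ` (two hashes and a space), `###` subsections inside the latest-failure section are replaced along with it.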

docs/FIELD_NOTES.md CHANGED

@@ -2,7 +2,7 @@
 ## Status
-Stable submission draft. This document is ready to adapt into the final Field Notes post after the public GitHub, demo video, and social post URLs are confirmed.
+Publication-ready draft. Fill the public GitHub, demo video, and social post URLs before posting; do not publish until those external actions are explicitly confirmed.
 ## 1. Why I Built It
@@ -63,7 +63,7 @@ The app keeps the Gradio UI separate from model execution:
 - `src/traces/logger.py` writes anonymized trace records
 - `src/renderer/share_card.py` renders the shareable card preview
-This boundary matters. It lets the mock MVP, hosted Space validation, and future local GGUF experiments share the same data shapes and fallback markers.
+This boundary matters. It lets the mock MVP, hosted Space validation, diagnostics, and local GGUF experiments share the same data shapes and fallback markers.
 ## 6. Runtime And Fallbacks
@@ -91,6 +91,8 @@ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
 The fallback behavior is explicit. If MiniCPM-V fails or returns invalid JSON, the trace records `vision-fallback-to-mock`. If llama.cpp is unavailable, missing a model path, or returns invalid JSON, the trace records `text-fallback-to-mock`.
+The hosted Space also has a hidden `/vision_runtime_probe` endpoint for non-secret runtime diagnostics. It checks Torch and Transformers imports, GPU visibility, and whether MiniCPM-V can load, while redacting token markers and private paths.
 ## 7. What Worked
 The stable loop works locally and in the mock-safe Space:
@@ -115,6 +117,8 @@ Paid L4 hardware on the hackathon organization returned `402 Payment Required`.
 This is not hidden in the submission. The stable baseline treats MiniCPM-V as wired but not yet validated in the hosted environment.
+After this failure, I added a probe-aware validation path so the next hosted run can report whether the failure is happening at dependency import, GPU visibility, model loading, or generation time.
 ## 9. Traces And Reproducibility
 The project includes public mock traces for the six stable examples under `data/traces/samples/`. They are deterministic and intended for demo replay, schema validation, and public inspection.
@@ -127,6 +131,15 @@ The export command is:
 .venv/bin/python -B scripts/export_traces.py
 ```
+For text runtime evidence, the project now includes a local smoke helper for an external GGUF:
+```bash
+.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+The recommended baseline file is `qwen2.5-1.5b-instruct-q4_k_m.gguf` from `Qwen/Qwen2.5-1.5B-Instruct-GGUF`. It is intentionally not committed.
 ## 10. Privacy And Safety
 The project does not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. It does not commit GGUF files, private images, tokens, credit codes, or `.env` files.
@@ -140,9 +153,9 @@ The next model-focused step is to inspect Space runtime logs or add non-secret M
 After that:
 - rerun ZeroGPU MiniCPM-V validation
-- choose and smoke-test a real GGUF text model
-- generate and curate real training candidates
-- publish a dataset and fine-tuned adapter if time allows
+- run the documented GGUF smoke test after explicit confirmation
+- decide whether the published LoRA should remain badge evidence only or be converted later
+- generate real non-mock traces if hosted/local model validation passes
 - record a final demo video from the stable Space
 The current version is intentionally honest: it is a stable, reproducible small-model toy baseline with clear boundaries, visible failures, and a path to stronger model evidence.
@@ -150,6 +163,8 @@ The current version is intentionally honest: it is a stable, reproducible small-
 ## Evidence Links To Fill Before Final Submission
 - Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
+- Dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
+- LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 - GitHub repository: pending push confirmation
 - Demo video: pending recording
 - Social post: pending publishing
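Field Notes says the `/vision_runtime_probe` output redacts token markers and private paths. The following is a minimal redaction sketch under stated assumptions: the regular expressions are illustrative guesses for `hf_` tokens, the `HF_TOKEN`/`HUGGINGFACE_TOKEN` names, `.env` paths, and `/Users` or `/home` paths, not the project's actual patterns. The `\b` before `hf_` keeps identifiers such as `publish_hf_adapter` intact.

```python
import re

# Illustrative patterns only; the project's actual marker lists may differ.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]+")
TOKEN_NAME_PATTERN = re.compile(r"\b(?:HF_TOKEN|HUGGINGFACE_TOKEN)\b")
DOTENV_PATTERN = re.compile(r"[^\s'\"=]*\.env\b")
HOME_PATH_PATTERN = re.compile(r"(?:/Users|/home)/[^\s'\"]+")


def redact(text: str) -> str:
    """Replace token-like strings, token variable names, .env paths, and home paths."""
    text = TOKEN_PATTERN.sub("[REDACTED_TOKEN]", text)
    text = TOKEN_NAME_PATTERN.sub("[REDACTED_TOKEN_NAME]", text)
    # .env paths go before home paths so a path like /home/x/.env is caught whole.
    text = DOTENV_PATTERN.sub("[REDACTED_ENV_FILE]", text)
    text = HOME_PATH_PATTERN.sub("[REDACTED_PATH]", text)
    return text
```

Redacting with fixed placeholders rather than dropping the text keeps the probe report readable while still showing where a sensitive value appeared.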

docs/FINAL_VERIFICATION_REPORT.md CHANGED

@@ -1,30 +1,39 @@
 # Final Verification Report
-- Generated at: 2026-06-08 11:19:49 CST
-- Verified source commit: `b7cb470`
+- Generated at: 2026-06-08 16:24:23 CST
+- Verified source commit: uncommitted local implementation on `main`
 - Branch: `main`
-- Verification target: stable mock-safe submission baseline
-- Local app URL: `http://127.0.0.1:7860/`
+- Verification target: mock-safe submission baseline plus local diagnostics/smoke-helper implementation
+- Local app URL: not launched during this verification update
 ## Summary
-Objectverse Diary's stable mock-safe baseline is locally verifiable. The app starts with default mock backends, renders the archive-style Gradio interface, runs all six committed example objects, supports object chat, renders share cards, and exposes trace evidence.
-This report does not claim hosted MiniCPM-V validation, GGUF text generation, LoRA training, model publishing, dataset publishing, or final public submission URLs are complete.
+Objectverse Diary's stable mock-safe baseline remains locally verifiable. This update adds non-secret MiniCPM-V runtime diagnostics through a hidden Gradio API, probe-aware Space VLM reporting, a latest-failure-note updater, and a local llama.cpp GGUF smoke-test helper.
+This report does not claim hosted MiniCPM-V validation, real GGUF text generation, live LoRA runtime wiring, GitHub push, Field Notes publication, demo video publication, social post publication, or final public submission URLs are complete.
+## Implementation Additions
+- Hidden `/vision_runtime_probe` Gradio API returns sanitized backend, dependency, GPU, and MiniCPM-V load diagnostics.
+- `scripts/check_space_vlm.py` can include probe output in markdown/JSON reports and update the latest failure section in `docs/FAILURES.md`.
+- `scripts/check_llama_cpp_smoke.py` validates persona, diary, and chat through an externally configured GGUF without committing model files.
+- Runtime status no longer records literal `TEXT_MODEL_PATH`; traces only record whether an external GGUF path is configured.
+- Submission docs now distinguish final-draft materials from published URLs.
 ## Command Verification
 | Check | Result | Notes |
 | --- | --- | --- |
-| `git status --short --untracked-files=all` | PASS | Clean before report generation. |
-| `.venv/bin/python -B -m unittest discover -s tests` | PASS | 30 tests passed. Gradio 6.0 deprecation warnings are non-blocking. |
+| `.venv/bin/python -B -m unittest discover -s tests` | PASS | 46 tests passed. Gradio 6.0 deprecation warnings and an asyncio ResourceWarning remain non-blocking. |
 | `.venv/bin/python -B scripts/check_initial_stage.py` | PASS | Required files, runtime defaults, trace generation, sample traces, dataset preview, trace export, and Gradio build all passed. |
 | `.venv/bin/python -B scripts/export_traces.py` | PASS | Exported 6 traces to `data/traces/samples/objectverse_public_mock_traces.jsonl`. |
 | `git diff --check` | PASS | No whitespace errors. |
 ## Browser Verification
-The local app was started with:
+Not re-run in this verification update. The previous stable baseline browser verification remains useful evidence for the mock-safe UI, but the new hidden `/vision_runtime_probe` API was verified through unit coverage rather than a browser session.
+Previous local app command:
 ```bash
 GRADIO_SERVER_NAME=127.0.0.1 GRADIO_SERVER_PORT=7860 .venv/bin/python app.py
@@ -51,31 +60,26 @@ Browser checks:
 - Six stable public mock sample traces remain under `data/traces/samples/`.
 - The trace export JSONL was regenerated successfully.
 - Hosted Space VLM traces under `data/traces/space-vlm/` remain failure evidence because they include `vision-fallback-to-mock`; they are intentionally not used as successful real VLM traces.
+- New runtime traces do not include literal `TEXT_MODEL_PATH` values.
 ## Security Scan
-Scanned project docs, source, scripts, tests, and trace directories for:
+Targeted safety coverage now includes unit tests and an `rg` scan for probe/report/trace outputs that reject or redact:
 - `hf_`
 - `HF_TOKEN`
 - `HUGGINGFACE_TOKEN`
-- `BEGIN PRIVATE KEY`
-- `SUPABASE_SERVICE_ROLE_KEY`
-- test email pattern
-- private local path markers
 - `.env`
-Result: PASS with known safe hits only.
+Result: PASS for the targeted diagnostic/report paths and repository scan.
 Known safe hits:
-- test fixtures intentionally containing `user@example.com`
-- tests asserting that token markers are absent
 - `scripts/check_space_vlm.py` sensitive marker constants and auth helper names
-- documentation warning not to commit `.env`
-- `.env.example` path shown in architecture docs
-No real token, private key, credential, private image path, GGUF file, or `.env` file was found in the scanned project content.
+- tests intentionally containing fake `hf_forbidden` and `.env` strings to verify redaction
+- `publish_hf_adapter` filenames/imports that match the broad `hf_` scan pattern but are not tokens
+No GGUF file, real token, private key, credential, or `.env` file was added by this implementation.
 ## Remaining External Items
@@ -85,10 +89,11 @@ No real token, private key, credential, private image path, GGUF file, or `.env`
 - Field Notes URL is still pending publication.
 - Social post URL is still pending publication.
 - Hosted MiniCPM-V validation still falls back to mock vision.
-- Real GGUF smoke test, LoRA training, HF model publishing, and HF dataset publishing remain future work.
+- Real GGUF download, optional `llama-cpp-python` installation, and smoke test remain pending explicit confirmation.
+- GGUF conversion and live runtime wiring for the published LoRA adapter remain future work.
 ## Verdict
-PASS for the stable mock-safe local submission baseline.
-The project is ready for explicit-confirmation external steps: push `main`, record/publish the demo video, publish Field Notes/social post, and fill final submission URLs.
+PASS for the stable mock-safe local submission baseline plus local diagnostics/smoke-helper implementation.
+The project is ready for explicit-confirmation external steps: push `main`, sync the Space, rerun probe-aware Space VLM validation, run the local GGUF smoke test after optional dependency/model setup, record/publish the demo video, publish Field Notes/social post, and fill final submission URLs.
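The security scan separates marker hits from known safe hits. The report describes an `rg` scan, not this code; the following hypothetical Python sketch only illustrates the triage step. The marker list comes from the report, while the allowlist substrings are illustrative stand-ins for the listed known safe hits.

```python
from typing import Iterable

MARKERS = ("hf_", "HF_TOKEN", "HUGGINGFACE_TOKEN", ".env")
# Hypothetical allowlist standing in for the report's "known safe hits".
SAFE_SUBSTRINGS = ("publish_hf_adapter", "hf_forbidden")


def unexpected_hits(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line number, line) for marker hits not explained by a known-safe substring."""
    hits = []
    for number, line in enumerate(lines, start=1):
        cleaned = line
        for safe in SAFE_SUBSTRINGS:
            cleaned = cleaned.replace(safe, "")
        if any(marker in cleaned for marker in MARKERS):
            hits.append((number, line))
    return hits
```

Stripping allowlisted substrings before matching, rather than skipping whole lines, means a real token on the same line as an allowlisted name is still reported.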

docs/MODEL_CARD.md CHANGED

@@ -4,7 +4,7 @@
 Stable submission baseline plus one published text LoRA test adapter. The public Gradio Space still defaults to deterministic mock text; the adapter is training evidence and has not been converted to GGUF or wired into the live runtime.
-The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`. A Modal LoRA test run completed for the planned text model path and the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
 Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
@@ -20,7 +20,7 @@ Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validat
 | --- | --- | --- |
 | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
 | Text | deterministic mock text; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA test adapter | Adapter published; not converted to GGUF or wired into Space runtime. |
-| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
 ## Parameter Budget
@@ -34,6 +34,7 @@ Record final numbers here before submission:
 | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
 | Text base | Stable baseline mock text | 0 | no model parameters |
 | Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
 | Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
 | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
@@ -83,6 +84,13 @@ Training run summary:
 - Train loss: 1.6697
 - GGUF conversion: not completed
 ## Safety And Privacy
 - Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
@@ -95,6 +103,7 @@ Training run summary:
 - If VLM loading fails, use manual description and stable example flow.
 - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
 - If model JSON is invalid, repair and validate before rendering.
 - Hosted VLM fallback evidence is preserved in `data/traces/space-vlm/` and should not be described as successful real VLM output.
 ## Required Notes

 Stable submission baseline plus one published text LoRA test adapter. The public Gradio Space still defaults to deterministic mock text; the adapter is training evidence and has not been converted to GGUF or wired into the live runtime.
+The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments, with a hidden non-secret probe for hosted diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`. A Modal LoRA test run completed for the planned text model path and the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
 Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
 | --- | --- | --- |
 | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
 | Text | deterministic mock text; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA test adapter | Adapter published; not converted to GGUF or wired into Space runtime. |
+| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; smoke helper exists, real-model smoke test still pending. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
 ## Parameter Budget
 | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
 | Text base | Stable baseline mock text | 0 | no model parameters |
 | Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
+| Recommended GGUF smoke file | `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf` | ~1.5B base, quantized file | yes, if used for text runtime smoke |
 | Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
 | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
 - Train loss: 1.6697
 - GGUF conversion: not completed
+GGUF smoke status:
+- Recommended repo: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+- Recommended file: `qwen2.5-1.5b-instruct-q4_k_m.gguf`
+- Local helper: `scripts/check_llama_cpp_smoke.py`
+- Current state: file not downloaded, optional `llama-cpp-python` not installed by default, smoke test not run.
 ## Safety And Privacy
 - Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
 - If VLM loading fails, use manual description and stable example flow.
 - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
 - If model JSON is invalid, repair and validate before rendering.
+- Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
 - Hosted VLM fallback evidence is preserved in `data/traces/space-vlm/` and should not be described as successful real VLM output.
 ## Required Notes

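A minimal sketch of the trace rule above ("Runtime traces do not record literal `TEXT_MODEL_PATH`"), assuming the runtime reads the same two environment variables the model card names (`OBJECTVERSE_TEXT_BACKEND`, `TEXT_MODEL_PATH`). The helper name, the `"mock"` default, and the output field names are hypothetical and are not the project's actual trace schema. The point is only that a boolean derived from `TEXT_MODEL_PATH` reaches the trace, never the path itself.

```python
from collections.abc import Mapping


def text_runtime_trace_fields(env: Mapping[str, str]) -> dict[str, object]:
    """Build public trace fields for the text runtime without leaking paths.

    Hypothetical helper: field names and the "mock" default are illustrative
    assumptions, not the project's real trace schema.
    """
    model_path = env.get("TEXT_MODEL_PATH", "").strip()
    return {
        "text_backend": env.get("OBJECTVERSE_TEXT_BACKEND", "mock"),
        # Only a yes/no flag is recorded; the literal GGUF path never
        # enters the trace, so private local paths stay out of public data.
        "text_model_path_configured": bool(model_path),
    }
```

Passing `os.environ` to such a helper yields fields that are safe to publish even when the GGUF file lives under a private home directory.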
docs/RUNTIME.md CHANGED

@@ -37,6 +37,38 @@ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
 `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
 ## Environment Variables
 ```bash

 `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
+The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
+## Runtime Diagnostics
+The Gradio app exposes two hidden diagnostic APIs:
+- `/zero_gpu_probe`: checks Torch import and CUDA visibility.
+- `/vision_runtime_probe`: checks configured vision backend, Torch/Transformers import, CUDA/MPS visibility, and MiniCPM-V load success or sanitized failure summaries.
+These APIs are for validation scripts and are not visible in the main UI. They must not return tokens, `.env` paths, Hugging Face token markers, or private local filesystem paths.
+`scripts/check_space_vlm.py` calls `/vision_runtime_probe` before the mug/keyboard/shoe validation run and writes the probe output into `docs/SPACE_VLM_REPORT.md` and `docs/SPACE_VLM_REPORT.json`.
+## Optional GGUF Smoke Test
+Recommended baseline smoke model:
+```text
+repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
+file: qwen2.5-1.5b-instruct-q4_k_m.gguf
+local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+The `models/` directory and `*.gguf` files are ignored by Git. After downloading the file externally and, once confirmed, installing the optional `llama-cpp-python` package, run:
+```bash
+.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
 ## Environment Variables
 ```bash

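The smoke-test pass criteria in `docs/RUNTIME.md` can be expressed as a pure function over the fields the smoke script already prints (`model_runtime`, `fallbacks`, `chat_fallbacks`). This is a sketch, not the committed helper: `scripts/check_llama_cpp_smoke.py` stores a single `error` string, so when several checks fail only the last reason survives, whereas this version collects every reason. The function name is hypothetical, and treating the fallback markers as plain string collections is an assumption.

```python
from collections.abc import Collection, Mapping
from typing import Any

TEXT_RUNTIME_OK = "llama-cpp text generation"
TEXT_FALLBACK_MARKER = "text-fallback-to-mock"


def smoke_failure_reasons(
    model_runtime: Mapping[str, Any],
    fallbacks: Collection[str],
    chat_fallbacks: Collection[str],
) -> list[str]:
    """Return every reason a smoke run fails; an empty list means pass."""
    reasons: list[str] = []
    # The trace must say the diary text came from llama.cpp.
    if model_runtime.get("text") != TEXT_RUNTIME_OK:
        reasons.append("trace did not record llama-cpp text generation")
    # Neither the diary pipeline nor the chat reply may have fallen back.
    if TEXT_FALLBACK_MARKER in fallbacks:
        reasons.append("trace included text-fallback-to-mock")
    if TEXT_FALLBACK_MARKER in chat_fallbacks:
        reasons.append("chat included text-fallback-to-mock")
    return reasons
```

A caller would then set `status` to `"pass"` only when the list is empty and report the whole list as the error.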
docs/SOCIAL_POST.md CHANGED

@@ -6,6 +6,7 @@ I built Objectverse Diary for Build Small Hackathon: a Gradio app where everyday
 Stable demo: mock-safe, reproducible, no commercial AI APIs.
 MiniCPM-V and llama.cpp paths are wired behind fallbacks; hosted VLM validation is documented honestly.
 Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
@@ -24,6 +25,8 @@ Objectverse Diary is my Build Small Hackathon project: a strange little object a
 The stable submission baseline is mock-safe and reproducible, with no commercial AI APIs. MiniCPM-V vision and llama.cpp text paths are wired as optional backends, and the current hosted MiniCPM-V fallback is documented instead of hidden.
 Space:
 https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
@@ -35,4 +38,4 @@ https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 - Add GitHub URL after push is confirmed.
 - Add demo video URL after recording.
-- Do not claim LoRA, GGUF smoke test, or hosted MiniCPM-V validation are complete.

 Stable demo: mock-safe, reproducible, no commercial AI APIs.
 MiniCPM-V and llama.cpp paths are wired behind fallbacks; hosted VLM validation is documented honestly.
+Synthetic curated dataset + Qwen 1.5B LoRA adapter are published as training evidence.
 Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 The stable submission baseline is mock-safe and reproducible, with no commercial AI APIs. MiniCPM-V vision and llama.cpp text paths are wired as optional backends, and the current hosted MiniCPM-V fallback is documented instead of hidden.
+I also published a small synthetic curated SFT dataset and a Qwen 1.5B LoRA test adapter for Well-Tuned evidence. The adapter is not wired into the public Space runtime yet; the live demo stays intentionally reliable.
 Space:
 https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 - Add GitHub URL after push is confirmed.
 - Add demo video URL after recording.
+- Do not claim GGUF smoke test, hosted MiniCPM-V validation, or live LoRA runtime wiring are complete.

docs/SUBMISSION_GUIDE.md CHANGED

@@ -19,6 +19,7 @@
 - Dataset plan and preview workflow: `docs/DATASET.md`
 - External setup checklist: `docs/EXTERNAL_SETUP.md`
 - Space VLM validation report: `docs/SPACE_VLM_REPORT.md` currently failed. Paid L4 returned `402 Payment Required`; later ZeroGPU validation reached the app on 2026-06-08, but mug/keyboard/shoe all fell back to mock vision.
 - Space VLM trace evidence: `data/traces/space-vlm/`
 - Public mock traces: `data/traces/samples/`
 - Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
@@ -31,6 +32,8 @@
 - MiniCPM-V 2.6 backend wiring with fallback markers.
 - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
 - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
 - Synthetic curated SFT dataset published to Hugging Face Datasets.
 - Modal Qwen 1.5B LoRA test run completed and adapter published to Hugging Face Models.
 - Field Notes draft, demo video script, and social post draft for the stable submission package.
@@ -38,7 +41,7 @@
 ## Not Completed Yet
 - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
-- Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
 - Real model traces, GGUF conversion, and app runtime wiring for the published adapter.
 - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
@@ -46,6 +49,7 @@
 - [ ] Space is under the official organization.
 - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: wired but hosted validation falls back to mock.
 - [x] Demo video script targets under 2 minutes.
 - [x] README includes stable-baseline parameter budget and links to the model card.
 - [ ] No commercial cloud AI APIs are used.

 - Dataset plan and preview workflow: `docs/DATASET.md`
 - External setup checklist: `docs/EXTERNAL_SETUP.md`
 - Space VLM validation report: `docs/SPACE_VLM_REPORT.md` currently failed. Paid L4 returned `402 Payment Required`; later ZeroGPU validation reached the app on 2026-06-08, but mug/keyboard/shoe all fell back to mock vision.
+- Space VLM diagnostics: hidden `/vision_runtime_probe` API and probe-aware `scripts/check_space_vlm.py` are available for the next explicit-confirmation ZeroGPU validation.
 - Space VLM trace evidence: `data/traces/space-vlm/`
 - Public mock traces: `data/traces/samples/`
 - Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
 - MiniCPM-V 2.6 backend wiring with fallback markers.
 - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
 - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
+- Hosted Space VLM probe support and automatic latest-failure section updates in `docs/FAILURES.md` when validation fails.
+- Local GGUF smoke-test helper for `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf`; actual GGUF smoke remains pending.
 - Synthetic curated SFT dataset published to Hugging Face Datasets.
 - Modal Qwen 1.5B LoRA test run completed and adapter published to Hugging Face Models.
 - Field Notes draft, demo video script, and social post draft for the stable submission package.
 ## Not Completed Yet
 - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
+- Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count. The recommended baseline GGUF has been selected, but not downloaded or run.
 - Real model traces, GGUF conversion, and app runtime wiring for the published adapter.
 - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
 - [ ] Space is under the official organization.
 - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: wired but hosted validation falls back to mock.
+- [x] Space MiniCPM-V non-secret diagnostic probe is implemented locally.
 - [x] Demo video script targets under 2 minutes.
 - [x] README includes stable-baseline parameter budget and links to the model card.
 - [ ] No commercial cloud AI APIs are used.

scripts/README.md CHANGED

@@ -10,6 +10,7 @@ Implemented initial scripts:
 - `prepare_curated_dataset.py`: creates 50 synthetic curated SFT rows for Modal LoRA pipeline testing.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
 - `finetune_lora.py`: validates SFT JSONL locally and defines the Modal LoRA training scaffold for the future Well-Tuned path.
 - `publish_hf_adapter.py`: uploads a downloaded LoRA adapter folder to Hugging Face Hub.
@@ -76,7 +77,9 @@ Space VLM validation:
 ```bash
 .venv/bin/python -B scripts/check_space_vlm.py \
   --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
-  --output docs/SPACE_VLM_REPORT.md
 ```
 External Space changes are explicit:
@@ -85,4 +88,13 @@ External Space changes are explicit:
 .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
 ```
-Current status: mock trace generation, trace JSONL export, SFT preview generation, synthetic curated dataset publishing, optional MiniCPM-V wiring, optional llama.cpp wiring, hosted Space VLM validation tooling, Modal LoRA training scaffolding, one Modal LoRA test run, and HF adapter publishing are implemented. Real model validation on Space, GGUF conversion, and app runtime wiring for the adapter are not completed yet.

 - `prepare_curated_dataset.py`: creates 50 synthetic curated SFT rows for Modal LoRA pipeline testing.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
+- `check_llama_cpp_smoke.py`: smoke-tests the optional llama.cpp text runtime with an external GGUF model.
 - `finetune_lora.py`: validates SFT JSONL locally and defines the Modal LoRA training scaffold for the future Well-Tuned path.
 - `publish_hf_adapter.py`: uploads a downloaded LoRA adapter folder to Hugging Face Hub.
 ```bash
 .venv/bin/python -B scripts/check_space_vlm.py \
   --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --output docs/SPACE_VLM_REPORT.md \
+  --json-output docs/SPACE_VLM_REPORT.json \
+  --failure-notes-output docs/FAILURES.md
 ```
 External Space changes are explicit:
 .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
 ```
+Local GGUF smoke test after explicit confirmation:
+```bash
+.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
+```
+Recommended GGUF source: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`, file `qwen2.5-1.5b-instruct-q4_k_m.gguf`. Do not commit the downloaded file.
+Current status: mock trace generation, trace JSONL export, SFT preview generation, synthetic curated dataset publishing, optional MiniCPM-V wiring, optional llama.cpp wiring, hosted Space VLM validation tooling with non-secret probe support, local GGUF smoke helper, Modal LoRA training scaffolding, one Modal LoRA test run, and HF adapter publishing are implemented. Real model validation on Space, actual GGUF smoke, GGUF conversion, and app runtime wiring for the adapter are not completed yet.

scripts/check_llama_cpp_smoke.py ADDED

@@ -0,0 +1,145 @@

+"""Smoke-test the optional llama.cpp text runtime with an external GGUF model."""
+from __future__ import annotations
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+from src.models.llama_cpp_runner import get_text_runtime_fallbacks, reply_as_object
+from src.pipeline import generate_object_diary
+DEFAULT_GGUF_REPO = "Qwen/Qwen2.5-1.5B-Instruct-GGUF"
+DEFAULT_GGUF_FILE = "qwen2.5-1.5b-instruct-q4_k_m.gguf"
+TEXT_FALLBACK_MARKER = "text-fallback-to-mock"
+def run_llama_cpp_smoke(
+    *,
+    model_path: Path,
+    description: str,
+    mode: str,
+    save_trace: bool,
+) -> dict[str, Any]:
+    if not model_path.exists():
+        raise FileNotFoundError(f"GGUF model path does not exist: {model_path}")
+    previous_text_backend = os.environ.get("OBJECTVERSE_TEXT_BACKEND")
+    previous_text_model_path = os.environ.get("TEXT_MODEL_PATH")
+    try:
+        os.environ["OBJECTVERSE_TEXT_BACKEND"] = "llama-cpp"
+        os.environ["TEXT_MODEL_PATH"] = str(model_path)
+        result = generate_object_diary(
+            None,
+            description,
+            mode,
+            save=save_trace,
+        )
+        chat_reply = reply_as_object(
+            result.persona.model_dump(mode="json"),
+            "What did you see today?",
+        )
+        chat_fallbacks = get_text_runtime_fallbacks()
+    finally:
+        _restore_env("OBJECTVERSE_TEXT_BACKEND", previous_text_backend)
+        _restore_env("TEXT_MODEL_PATH", previous_text_model_path)
+    payload = {
+        "status": "pass",
+        "model_path": _display_model_path(model_path),
+        "description": description,
+        "mode": mode,
+        "trace_id": result.trace.trace_id,
+        "trace_path": result.trace_path,
+        "model_runtime": result.trace.model_runtime,
+        "fallbacks": result.trace.fallbacks,
+        "object_name": result.object_understanding.object.name,
+        "character_name": result.persona.persona.character_name,
+        "diary_title": result.diary.title,
+        "chat_reply_preview": chat_reply[:160],
+        "chat_fallbacks": chat_fallbacks,
+    }
+    if result.trace.model_runtime.get("text") != "llama-cpp text generation":
+        payload["status"] = "fail"
+        payload["error"] = "trace did not record llama-cpp text generation"
+    if TEXT_FALLBACK_MARKER in result.trace.fallbacks:
+        payload["status"] = "fail"
+        payload["error"] = "trace included text-fallback-to-mock"
+    if TEXT_FALLBACK_MARKER in chat_fallbacks:
+        payload["status"] = "fail"
+        payload["error"] = "chat included text-fallback-to-mock"
+    return payload
+def _restore_env(key: str, previous_value: str | None) -> None:
+    if previous_value is None:
+        os.environ.pop(key, None)
+    else:
+        os.environ[key] = previous_value
+def _print_json(payload: dict[str, Any]) -> None:
+    print(json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True), flush=True)
+def _display_model_path(model_path: Path) -> str:
+    try:
+        return str(model_path.resolve().relative_to(PROJECT_ROOT))
+    except ValueError:
+        return model_path.name
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--model-path",
+        type=Path,
+        default=Path("models") / DEFAULT_GGUF_FILE,
+        help=f"Path to {DEFAULT_GGUF_FILE} or another external GGUF file.",
+    )
+    parser.add_argument(
+        "--description",
+        default="old white coffee mug on a developer desk",
+    )
+    parser.add_argument("--mode", default="Cynical")
+    parser.add_argument("--save-trace", action="store_true")
+    return parser.parse_args()
+def main() -> None:
+    args = _parse_args()
+    try:
+        payload = run_llama_cpp_smoke(
+            model_path=args.model_path,
+            description=args.description,
+            mode=args.mode,
+            save_trace=args.save_trace,
+        )
+    except Exception as exc:
+        _print_json(
+            {
+                "status": "fail",
+                "model_path": _display_model_path(args.model_path),
+                "recommended_repo": DEFAULT_GGUF_REPO,
+                "recommended_file": DEFAULT_GGUF_FILE,
+                "error_type": type(exc).__name__,
+                "error": str(exc),
+            }
+        )
+        raise SystemExit(1) from exc
+    _print_json(payload)
+    if payload["status"] != "pass":
+        raise SystemExit(1)
+if __name__ == "__main__":
+    main()

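`run_llama_cpp_smoke` saves and restores `OBJECTVERSE_TEXT_BACKEND` and `TEXT_MODEL_PATH` by hand with a `try`/`finally` and `_restore_env`. The same pattern can be packaged as a context manager; the sketch below uses a hypothetical `temporary_env` name. If a test-oriented helper is acceptable, the standard library's `unittest.mock.patch.dict(os.environ, {...})` provides equivalent restore-on-exit behavior.

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides: str) -> Iterator[None]:
    """Set environment variables for the duration of a with-block.

    Previous values are restored afterwards; variables that were unset
    before are removed again, even if the block raises.
    """
    previous = {key: os.environ.get(key) for key in overrides}
    try:
        os.environ.update(overrides)
        yield
    finally:
        for key, value in previous.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
```

Inside the smoke helper, the override would read `with temporary_env(OBJECTVERSE_TEXT_BACKEND="llama-cpp", TEXT_MODEL_PATH=str(model_path)):` around the generation and chat calls.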
scripts/check_space_vlm.py CHANGED

@@ -25,11 +25,14 @@ DEFAULT_SPACE_URL = "https://huggingface.co/spaces/build-small-hackathon/Objectv
 DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
 DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
 DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
 DEFAULT_HARDWARE = "l4x1"
 MOCK_SAFE_HARDWARE = "cpu-basic"
 GENERATE_API_NAME = "/generate_object_file"
 REQUEST_TIMEOUT_SECONDS = 45
 PREDICTION_TIMEOUT_SECONDS = 360
 SPACE_VARIABLES = {
     "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
@@ -231,6 +234,23 @@ def run_space_validation(
     return results
 def _predict_with_timeout(
     client: Any,
     image: Any,
@@ -238,6 +258,22 @@ def _predict_with_timeout(
     mode: str,
     *,
     timeout_seconds: int,
 ) -> Any:
     def _raise_timeout(_signum: int, _frame: Any) -> None:
         raise TimeoutError(f"Gradio prediction did not finish within {timeout_seconds}s")
@@ -245,12 +281,7 @@ def _predict_with_timeout(
     previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
     signal.alarm(max(1, timeout_seconds))
     try:
-        return client.predict(
-            image,
-            description,
-            mode,
-            api_name=GENERATE_API_NAME,
-        )
     finally:
         signal.alarm(0)
         signal.signal(signal.SIGALRM, previous_handler)
@@ -323,6 +354,7 @@ def render_report(
     space_url: str,
     repo_id: str,
     results: list[ValidationResult],
     configured: dict[str, str] | None = None,
     rollback: dict[str, str] | None = None,
     configuration_error: str = "",
@@ -357,6 +389,12 @@ def render_report(
     if configuration_error:
         lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
     lines.extend(["", "## Results", ""])
     for result in results:
         lines.extend(
@@ -396,21 +434,55 @@ def write_report(markdown: str, output_path: Path = DEFAULT_OUTPUT_PATH) -> Path
     return output_path
-def write_json_results(results: list[ValidationResult], output_path: Path) -> Path:
     output_path.parent.mkdir(parents=True, exist_ok=True)
-    payload = [result.__dict__ for result in results]
-    output_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
     return output_path
 def write_trace_record(trace: TraceRecord, output_path: Path) -> Path:
     output_path.parent.mkdir(parents=True, exist_ok=True)
     serialized = json.dumps(trace.model_dump(mode="json"), ensure_ascii=False, indent=2, sort_keys=True)
-    _assert_trace_is_public_safe(serialized)
     output_path.write_text(serialized + "\n", encoding="utf-8")
     return output_path
 def _download_url(url: str, output_path: Path) -> None:
     request = urllib.request.Request(
         url,
@@ -434,14 +506,22 @@ def _extract_trace_payload(response: Any) -> dict[str, Any]:
     return trace_payload
 def extract_trace_record(response: Any) -> TraceRecord:
     return TraceRecord.model_validate(_extract_trace_payload(response))
-def _assert_trace_is_public_safe(serialized_trace: str) -> None:
     for marker in SENSITIVE_TRACE_MARKERS:
-        if marker in serialized_trace:
-            raise ValueError("Trace output may contain a sensitive token marker.")
 def _failure_reason(
@@ -471,6 +551,110 @@ def _runtime_stage_name(runtime: Any) -> str:
     return str(stage or "unknown")
 def _assert_hf_auth(api: Any) -> None:
     try:
         user = api.whoami()
@@ -499,6 +683,7 @@ def _parse_args() -> argparse.Namespace:
     parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
     parser.add_argument("--skip-validation", action="store_true")
     parser.add_argument("--trace-output-dir", type=Path)
     return parser.parse_args()
@@ -507,6 +692,7 @@ def main() -> None:
     repo_id = parse_space_repo_id(args.space_url)
     configured = None
     rollback = None
     configuration_error = ""
     if args.configure_space:
         try:
@@ -529,6 +715,13 @@ def main() -> None:
     results: list[ValidationResult] = []
     if not args.skip_validation and not configuration_error:
         try:
             results = run_space_validation(
                 space_url=args.space_url,
@@ -554,13 +747,20 @@ def main() -> None:
         space_url=args.space_url,
         repo_id=repo_id,
         results=results,
         configured=configured,
         rollback=rollback,
         configuration_error=configuration_error,
     )
     write_report(report, args.output)
     if args.json_output:
-        write_json_results(results, args.json_output)
     if configuration_error or (results and not all(result.passed for result in results)):
         raise SystemExit(1)

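The `_sanitize_error_summary` helper added in this file replaces only the literal prefix `hf_`, so a string such as `hf_abc123` becomes `[redacted]abc123` and the token body survives. A regex that matches the whole token avoids that. The sketch below is hypothetical and assumes Hugging Face access tokens are `hf_` followed by letters and digits; it also skips the home-directory replacement when the home directory is `/`, which would otherwise rewrite every slash.

```python
import re
from pathlib import Path

# Assumption: Hugging Face access tokens are "hf_" followed by letters and
# digits. Matching the whole run redacts the token body, not just its prefix.
HF_TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]*")
TOKEN_ENV_NAMES = re.compile(r"HUGGINGFACE_TOKEN|HF_TOKEN")


def sanitize_error_summary(value: str, *, max_length: int = 240) -> str:
    """Redact home paths and token-like strings, then truncate."""
    clean = value
    home = str(Path.home())
    # Skip a root or empty home directory so "/" is not rewritten everywhere.
    if home not in ("", "/"):
        clean = clean.replace(home, "[home]")
    clean = HF_TOKEN_PATTERN.sub("[redacted]", clean)
    clean = TOKEN_ENV_NAMES.sub("[redacted]", clean)
    if len(clean) > max_length:
        return clean[: max_length - 3] + "..."
    return clean
```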
 DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
 DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
 DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
+DEFAULT_FAILURE_NOTES_PATH = Path("docs/FAILURES.md")
 DEFAULT_HARDWARE = "l4x1"
 MOCK_SAFE_HARDWARE = "cpu-basic"
 GENERATE_API_NAME = "/generate_object_file"
+PROBE_API_NAME = "/vision_runtime_probe"
 REQUEST_TIMEOUT_SECONDS = 45
 PREDICTION_TIMEOUT_SECONDS = 360
+LATEST_FAILURE_HEADING = "## Latest Space VLM Validation Failure"
 SPACE_VARIABLES = {
     "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
     return results
+def run_vision_runtime_probe(
+    *,
+    space_url: str = DEFAULT_SPACE_URL,
+    timeout_seconds: int = 900,
+) -> dict[str, Any]:
+    client_url = space_client_url(space_url)
+    client = _build_gradio_client(client_url, timeout_seconds=timeout_seconds)
+    response = _predict_api_with_timeout(
+        client,
+        api_name=PROBE_API_NAME,
+        timeout_seconds=min(PREDICTION_TIMEOUT_SECONDS, timeout_seconds),
+    )
+    payload = _extract_probe_payload(response)
+    _assert_public_safe_serialized(json.dumps(payload, ensure_ascii=False, sort_keys=True), "Probe output")
+    return payload
 def _predict_with_timeout(
     client: Any,
     image: Any,
     mode: str,
     *,
     timeout_seconds: int,
+) -> Any:
+    return _predict_api_with_timeout(
+        client,
+        image,
+        description,
+        mode,
+        api_name=GENERATE_API_NAME,
+        timeout_seconds=timeout_seconds,
+    )
+def _predict_api_with_timeout(
+    client: Any,
+    *inputs: Any,
+    api_name: str,
+    timeout_seconds: int,
 ) -> Any:
     def _raise_timeout(_signum: int, _frame: Any) -> None:
         raise TimeoutError(f"Gradio prediction did not finish within {timeout_seconds}s")
     previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
     signal.alarm(max(1, timeout_seconds))
     try:
+        return client.predict(*inputs, api_name=api_name)
     finally:
         signal.alarm(0)
         signal.signal(signal.SIGALRM, previous_handler)
     space_url: str,
     repo_id: str,
     results: list[ValidationResult],
+    probe_result: dict[str, Any] | None = None,
     configured: dict[str, str] | None = None,
     rollback: dict[str, str] | None = None,
     configuration_error: str = "",
     if configuration_error:
         lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
+    lines.extend(["", "## Vision Runtime Probe", ""])
+    if probe_result:
+        lines.extend(_probe_lines(probe_result))
+    else:
+        lines.append("- Probe was not run.")
     lines.extend(["", "## Results", ""])
     for result in results:
         lines.extend(
     return output_path
+def write_json_results(
+    results: list[ValidationResult],
+    output_path: Path,
+    *,
+    probe_result: dict[str, Any] | None = None,
+) -> Path:
     output_path.parent.mkdir(parents=True, exist_ok=True)
+    result_payload = [result.__dict__ for result in results]
+    payload: Any = result_payload
+    if probe_result is not None:
+        payload = {"probe": probe_result, "results": result_payload}
+    serialized = json.dumps(payload, ensure_ascii=False, indent=2)
+    _assert_public_safe_serialized(serialized, "JSON report")
+    output_path.write_text(serialized, encoding="utf-8")
     return output_path
 def write_trace_record(trace: TraceRecord, output_path: Path) -> Path:
     output_path.parent.mkdir(parents=True, exist_ok=True)
     serialized = json.dumps(trace.model_dump(mode="json"), ensure_ascii=False, indent=2, sort_keys=True)
+    _assert_public_safe_serialized(serialized, "Trace output")
     output_path.write_text(serialized + "\n", encoding="utf-8")
     return output_path
+def update_failure_notes(
+    *,
+    results: list[ValidationResult],
+    probe_result: dict[str, Any] | None,
+    output_path: Path = DEFAULT_FAILURE_NOTES_PATH,
+    configuration_error: str = "",
+) -> Path | None:
+    failed_results = [result for result in results if not result.passed]
+    if not configuration_error and not failed_results:
+        return None
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    existing = output_path.read_text(encoding="utf-8") if output_path.exists() else "# Failure Notes\n"
+    section = _latest_failure_section(
+        results=failed_results,
+        probe_result=probe_result,
+        configuration_error=configuration_error,
+    )
+    updated = _replace_or_append_section(existing, LATEST_FAILURE_HEADING, section)
+    _assert_public_safe_serialized(updated, "Failure notes")
+    output_path.write_text(updated, encoding="utf-8")
+    return output_path
 def _download_url(url: str, output_path: Path) -> None:
     request = urllib.request.Request(
         url,
     return trace_payload
+def _extract_probe_payload(response: Any) -> dict[str, Any]:
+    if isinstance(response, dict):
+        return response
+    if isinstance(response, tuple | list) and len(response) == 1 and isinstance(response[0], dict):
+        return response[0]
+    raise ValueError("Probe output was not a JSON object.")
 def extract_trace_record(response: Any) -> TraceRecord:
     return TraceRecord.model_validate(_extract_trace_payload(response))
+def _assert_public_safe_serialized(serialized_payload: str, label: str) -> None:
     for marker in SENSITIVE_TRACE_MARKERS:
+        if marker in serialized_payload:
+            raise ValueError(f"{label} may contain a sensitive token marker.")
 def _failure_reason(
     return str(stage or "unknown")
+def _safe_error_payload(exc: Exception, *, stage: str) -> dict[str, str]:
+    return {
+        "backend": "unknown",
+        "probe_ok": "false",
+        "stage": stage,
+        "error_type": type(exc).__name__,
+        "error_summary": _sanitize_error_summary(str(exc) or type(exc).__name__),
+    }
+def _sanitize_error_summary(value: str, *, max_length: int = 240) -> str:
+    clean = value.replace(str(Path.home()), "[home]")
+    clean = clean.replace("HUGGINGFACE_TOKEN", "[redacted]")
+    clean = clean.replace("HF_TOKEN", "[redacted]")
+    clean = clean.replace("hf_", "[redacted]")
+    if len(clean) > max_length:
+        return clean[: max_length - 3] + "..."
+    return clean
+def _probe_lines(probe_result: dict[str, Any]) -> list[str]:
+    summary_keys = (
+        "backend",
+        "vision_model_id",
+        "torch_import",
+        "transformers_import",
+        "cuda_available",
+        "device_count",
+        "device_name",
+        "mps_available",
+        "minicpm_load_attempted",
+        "minicpm_load_ok",
+    )
+    lines: list[str] = []
+    for key in summary_keys:
+        if key in probe_result:
+            lines.append(f"- `{key}`: `{probe_result[key]}`")
+    errors = probe_result.get("errors")
+    if isinstance(errors, list) and errors:
+        lines.append("- Errors:")
+        for error in errors:
+            if isinstance(error, dict):
+                stage = error.get("stage", "unknown")
+                error_type = error.get("type", "unknown")
+                summary = error.get("summary", "")
+                lines.append(f"  - `{stage}`: `{error_type}` - {summary}")
+    elif "error_type" in probe_result:
+        lines.append(f"- Error: `{probe_result['error_type']}` - {probe_result.get('error_summary', '')}")
+    else:
+        lines.append("- Errors: none")
+    return lines
+def _latest_failure_section(
+    *,
+    results: list[ValidationResult],
+    probe_result: dict[str, Any] | None,
+    configuration_error: str,
+) -> str:
+    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
+    lines = [
+        LATEST_FAILURE_HEADING,
+        "",
+        f"- Updated: {now}",
+        "- Area: Hugging Face Space vision runtime.",
+    ]
+    if configuration_error:
+        lines.append(f"- Configuration error: `{_sanitize_error_summary(configuration_error)}`")
+    if probe_result:
+        lines.append(f"- Probe backend: `{probe_result.get('backend', 'unknown')}`")
+        lines.append(f"- MiniCPM load attempted: `{probe_result.get('minicpm_load_attempted', 'unknown')}`")
+        lines.append(f"- MiniCPM load ok: `{probe_result.get('minicpm_load_ok', 'unknown')}`")
+        errors = probe_result.get("errors")
+        if isinstance(errors, list) and errors:
+            probe_errors = []
+            for error in errors:
+                if isinstance(error, dict):
+                    probe_errors.append(f"{error.get('stage', 'unknown')}={error.get('type', 'unknown')}")
+            if probe_errors:
+                lines.append(f"- Probe errors: {', '.join(probe_errors)}")
+    if results:
+        failures = [f"{result.key}: {result.error or 'failed'}" for result in results]
+        lines.append(f"- Failed checks: {'; '.join(failures)}")
+    lines.extend(
+        [
+            "- Fallback used: mock object understanding plus mock text runtime if validation reaches generation.",
+            "- Resolution: unresolved; keep the public Space mock-safe until this section reports a passing VLM validation.",
+            "",
+        ]
+    )
+    return "\n".join(lines)
+def _replace_or_append_section(markdown: str, heading: str, section: str) -> str:
+    start = markdown.find(heading)
+    if start == -1:
+        return markdown.rstrip() + "\n\n" + section
+    next_start = markdown.find("\n## ", start + len(heading))
+    if next_start == -1:
+        return markdown[:start].rstrip() + "\n\n" + section
+    return markdown[:start].rstrip() + "\n\n" + section.rstrip() + "\n" + markdown[next_start:]
 def _assert_hf_auth(api: Any) -> None:
     try:
         user = api.whoami()
     parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
     parser.add_argument("--skip-validation", action="store_true")
     parser.add_argument("--trace-output-dir", type=Path)
+    parser.add_argument("--failure-notes-output", type=Path, default=DEFAULT_FAILURE_NOTES_PATH)
     return parser.parse_args()
     repo_id = parse_space_repo_id(args.space_url)
     configured = None
     rollback = None
+    probe_result = None
     configuration_error = ""
     if args.configure_space:
         try:
     results: list[ValidationResult] = []
     if not args.skip_validation and not configuration_error:
+        try:
+            probe_result = run_vision_runtime_probe(
+                space_url=args.space_url,
+                timeout_seconds=args.timeout_seconds,
+            )
+        except Exception as exc:
+            probe_result = _safe_error_payload(exc, stage="vision_runtime_probe")
         try:
             results = run_space_validation(
                 space_url=args.space_url,
         space_url=args.space_url,
         repo_id=repo_id,
         results=results,
+        probe_result=probe_result,
         configured=configured,
         rollback=rollback,
         configuration_error=configuration_error,
     )
     write_report(report, args.output)
     if args.json_output:
+        write_json_results(results, args.json_output, probe_result=probe_result)
+    update_failure_notes(
+        results=results,
+        probe_result=probe_result,
+        output_path=args.failure_notes_output,
+        configuration_error=configuration_error,
+    )
     if configuration_error or (results and not all(result.passed for result in results)):
         raise SystemExit(1)
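`_sanitize_error_summary` above replaces only the literal prefix `hf_` with `[redacted]`, so any characters that follow the prefix stay in the summary. `_sanitize_probe_text` in `src/models/vision_runner.py` instead uses a regex that drops the whole token. Below is a minimal sketch of that regex approach applied to error summaries. `redact_error_summary` and `TOKEN_PATTERN` are hypothetical names, and the token character class `[A-Za-z0-9_-]` is borrowed from the vision runner rather than from any Hugging Face specification.

```python
import re
from pathlib import Path

# Hypothetical helper: the token shape is assumed from the vision runner's regex.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9_-]+")
# Longer marker first so HUGGINGFACE_TOKEN is handled as one unit.
ENV_MARKERS = ("HUGGINGFACE_TOKEN", "HF_TOKEN")


def redact_error_summary(value: str, *, max_length: int = 240) -> str:
    """Scrub the home directory, whole hf_ tokens, and env-var names, then truncate."""
    clean = value.replace(str(Path.home()), "[home]")
    # Replace the full token, not just its "hf_" prefix, so no suffix leaks.
    clean = TOKEN_PATTERN.sub("[redacted-token]", clean)
    for marker in ENV_MARKERS:
        clean = clean.replace(marker, "[redacted]")
    if len(clean) > max_length:
        return clean[: max_length - 3] + "..."
    return clean
```

For example, `redact_error_summary("401 for hf_abc123XYZ while reading HF_TOKEN")` yields `"401 for [redacted-token] while reading [redacted]"`.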

src/config.py CHANGED

@@ -61,11 +61,15 @@ def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
     if text_backend == "mock":
         runtime_parts.append("no llama.cpp model connected yet")
     else:
-        runtime_parts.append(f"text model path: {current.text_model_path or '[not configured]'}")
     runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
 SETTINGS = get_runtime_settings()
 TRACE_DIR = SETTINGS.trace_output_dir
 MODEL_RUNTIME_STATUS = runtime_status(SETTINGS)

     if text_backend == "mock":
         runtime_parts.append("no llama.cpp model connected yet")
     else:
+        runtime_parts.append(f"text model path: {_text_model_path_status(current.text_model_path)}")
     runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
+def _text_model_path_status(text_model_path: str) -> str:
+    return "[configured external GGUF]" if text_model_path.strip() else "[not configured]"
 SETTINGS = get_runtime_settings()
 TRACE_DIR = SETTINGS.trace_output_dir
 MODEL_RUNTIME_STATUS = runtime_status(SETTINGS)

src/models/llama_cpp_runner.py CHANGED

@@ -80,6 +80,7 @@ def reply_as_object(persona_data: dict, message: str) -> str:
             return _reply_as_object_llama_cpp(persona_data, message, settings)
         except Exception as exc:
             _log_text_fallback("chat", exc)
     return _reply_as_object_mock(persona_data, message)

             return _reply_as_object_llama_cpp(persona_data, message, settings)
         except Exception as exc:
             _log_text_fallback("chat", exc)
+            _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
     return _reply_as_object_mock(persona_data, message)
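The change above records a `TEXT_FALLBACK_TO_MOCK` marker whenever a llama.cpp chat reply fails and the mock path answers instead. The body of `_add_text_fallback` is not part of this diff. Below is a minimal sketch of one way to implement that pattern with a per-request recorder. `FallbackRecorder` and `reply_with_fallback` are hypothetical, and the marker value `"text-fallback-to-mock"` is assumed from the string the smoke test checks for.

```python
import logging
from typing import Callable

logger = logging.getLogger("objectverse.sketch")

# Assumed marker value, taken from the smoke test's assertNotIn checks.
TEXT_FALLBACK_TO_MOCK = "text-fallback-to-mock"


class FallbackRecorder:
    """Hypothetical helper that collects fallback markers for one request."""

    def __init__(self) -> None:
        self.markers: list[str] = []

    def add(self, marker: str) -> None:
        # Record each marker once so traces stay compact.
        if marker not in self.markers:
            self.markers.append(marker)


def reply_with_fallback(
    real_reply: Callable[[str], str],
    mock_reply: Callable[[str], str],
    message: str,
    recorder: FallbackRecorder,
) -> str:
    """Try the real backend first; on any error, log it, mark it, and use the mock."""
    try:
        return real_reply(message)
    except Exception as exc:
        logger.warning("chat fell back to mock: %s", type(exc).__name__)
        recorder.add(TEXT_FALLBACK_TO_MOCK)
    return mock_reply(message)
```

Keeping the marker separate from the reply text lets a trace show that a fallback happened without changing what the user sees.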

src/models/vision_runner.py CHANGED

@@ -2,6 +2,7 @@
 from __future__ import annotations
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
@@ -25,6 +26,7 @@ KNOWN_OBJECTS = {
 MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
 MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
 _MINICPM_MODEL: Any | None = None
 _MINICPM_TOKENIZER: Any | None = None
@@ -42,6 +44,64 @@ def understand_object(image_path: str | None, description: str) -> ObjectUnderst
     return understand_object_with_metadata(image_path, description).object_understanding
 def understand_object_with_metadata(
     image_path: str | None,
     description: str,
@@ -166,6 +226,36 @@ def _log_vision_fallback(backend: str, exc: Exception) -> None:
     )
 def _infer_object_name(description: str, image_path: str | None) -> str:
     lowered = description.lower()
     for keyword, name in KNOWN_OBJECTS.items():

 from __future__ import annotations
+import re
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
 MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
 MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
+SENSITIVE_PROBE_MARKERS = ("HF_TOKEN", "HUGGINGFACE_TOKEN", "hf_", ".env")
 _MINICPM_MODEL: Any | None = None
 _MINICPM_TOKENIZER: Any | None = None
     return understand_object_with_metadata(image_path, description).object_understanding
+def probe_vision_runtime(
+    *,
+    settings: RuntimeSettings | None = None,
+    load_model: bool = True,
+) -> dict[str, Any]:
+    """Return non-secret runtime diagnostics for hosted MiniCPM-V debugging."""
+    current = settings or get_runtime_settings()
+    backend = current.vision_backend.strip().lower()
+    model_id = current.vision_model_id or MINICPM_DEFAULT_MODEL_ID
+    probe: dict[str, Any] = {
+        "backend": backend,
+        "vision_model_id": model_id if backend in MINICPM_BACKENDS else current.vision_model_id,
+        "torch_import": False,
+        "transformers_import": False,
+        "cuda_available": False,
+        "device_count": 0,
+        "device_name": "",
+        "mps_available": False,
+        "minicpm_load_attempted": False,
+        "minicpm_load_ok": False,
+        "errors": [],
+    }
+    torch_module: Any | None = None
+    try:
+        import torch
+        torch_module = torch
+        probe["torch_import"] = True
+        probe["cuda_available"] = torch.cuda.is_available()
+        probe["device_count"] = torch.cuda.device_count()
+        if probe["cuda_available"] and probe["device_count"]:
+            probe["device_name"] = torch.cuda.get_device_name(0)
+        probe["mps_available"] = bool(
+            getattr(torch.backends, "mps", None) and torch.backends.mps.is_available()
+        )
+    except Exception as exc:
+        _add_probe_error(probe, "torch", exc)
+    try:
+        from transformers import AutoModel as _AutoModel  # noqa: F401
+        from transformers import AutoTokenizer as _AutoTokenizer  # noqa: F401
+        probe["transformers_import"] = True
+    except Exception as exc:
+        _add_probe_error(probe, "transformers", exc)
+    if backend in MINICPM_BACKENDS and load_model:
+        probe["minicpm_load_attempted"] = True
+        try:
+            _load_minicpm_components(model_id)
+            probe["minicpm_load_ok"] = True
+        except Exception as exc:
+            _add_probe_error(probe, "minicpm_load", exc)
+    return _sanitize_probe_payload(probe)
 def understand_object_with_metadata(
     image_path: str | None,
     description: str,
     )
+def _add_probe_error(probe: dict[str, Any], stage: str, exc: Exception) -> None:
+    probe["errors"].append(
+        {
+            "stage": stage,
+            "type": type(exc).__name__,
+            "summary": _sanitize_probe_text(str(exc) or type(exc).__name__),
+        }
+    )
+def _sanitize_probe_payload(value: Any) -> Any:
+    if isinstance(value, dict):
+        return {str(key): _sanitize_probe_payload(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [_sanitize_probe_payload(item) for item in value]
+    if isinstance(value, str):
+        return _sanitize_probe_text(value)
+    return value
+def _sanitize_probe_text(value: str, *, max_length: int = 240) -> str:
+    clean = value.replace(str(Path.home()), "[home]")
+    clean = re.sub(r"hf_[A-Za-z0-9_-]+", "[redacted-token]", clean)
+    for marker in SENSITIVE_PROBE_MARKERS:
+        clean = clean.replace(marker, "[redacted]")
+    if len(clean) > max_length:
+        return clean[: max_length - 3] + "..."
+    return clean
 def _infer_object_name(description: str, image_path: str | None) -> str:
     lowered = description.lower()
     for keyword, name in KNOWN_OBJECTS.items():
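`probe_vision_runtime` never lets a failed import abort the probe. Each stage (`torch`, `transformers`, `minicpm_load`) is wrapped in its own `try` block, and failures are appended to `errors` as `{stage, type, summary}` entries. Below is a minimal sketch of the same idea generalized to a list of module names with `importlib`. `probe_imports` is a hypothetical helper, and it omits the summary sanitization that the real probe applies.

```python
import importlib
from typing import Any


def probe_imports(module_names: list[str]) -> dict[str, Any]:
    """Hypothetical helper: try each import and record success or a short error entry."""
    probe: dict[str, Any] = {"imports": {}, "errors": []}
    for name in module_names:
        try:
            importlib.import_module(name)
            probe["imports"][name] = True
        except Exception as exc:  # ImportError, or any error raised while the module initializes
            probe["imports"][name] = False
            probe["errors"].append({"stage": name, "type": type(exc).__name__})
    return probe
```

Catching `Exception` rather than only `ImportError` mirrors the probe above. For example, a CUDA-related failure raised while `torch` initializes still becomes a recorded error instead of a crashed endpoint.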

src/ui/layout.py CHANGED

@@ -13,6 +13,7 @@ from src.example_cache import load_sample_generation
 from src.examples import EXAMPLE_OBJECTS, example_button_label
 from src.models.llama_cpp_runner import reply_as_object
 from src.models.schema import GenerationResult
 from src.pipeline import format_diary_markdown, generate_object_diary
 from src.renderer.share_card import render_share_card
 from src.ui import copy
@@ -145,6 +146,8 @@ def build_app() -> gr.Blocks:
                 result_state = gr.State()
                 zero_gpu_probe_button = gr.Button(visible=False)
                 zero_gpu_probe_output = gr.JSON(visible=False)
                 # Intake & Examples Row
                 with gr.Row(elem_id="intake", elem_classes=["content-section"]):
@@ -324,6 +327,12 @@ def build_app() -> gr.Blocks:
             outputs=[zero_gpu_probe_output],
             api_name="zero_gpu_probe",
         )
     return demo
@@ -514,3 +523,8 @@ def zero_gpu_probe() -> dict[str, Any]:
         "device_count": torch.cuda.device_count(),
         "device_name": torch.cuda.get_device_name(0) if cuda_available else "",
     }

 from src.examples import EXAMPLE_OBJECTS, example_button_label
 from src.models.llama_cpp_runner import reply_as_object
 from src.models.schema import GenerationResult
+from src.models.vision_runner import probe_vision_runtime
 from src.pipeline import format_diary_markdown, generate_object_diary
 from src.renderer.share_card import render_share_card
 from src.ui import copy
                 result_state = gr.State()
                 zero_gpu_probe_button = gr.Button(visible=False)
                 zero_gpu_probe_output = gr.JSON(visible=False)
+                vision_runtime_probe_button = gr.Button(visible=False)
+                vision_runtime_probe_output = gr.JSON(visible=False)
                 # Intake & Examples Row
                 with gr.Row(elem_id="intake", elem_classes=["content-section"]):
             outputs=[zero_gpu_probe_output],
             api_name="zero_gpu_probe",
         )
+        vision_runtime_probe_button.click(
+            fn=vision_runtime_probe,
+            inputs=[],
+            outputs=[vision_runtime_probe_output],
+            api_name="vision_runtime_probe",
+        )
     return demo
         "device_count": torch.cuda.device_count(),
         "device_name": torch.cuda.get_device_name(0) if cuda_available else "",
     }
+@zero_gpu(duration=180)
+def vision_runtime_probe() -> dict[str, Any]:
+    return probe_vision_runtime(load_model=True)
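The hidden button is registered with `api_name="vision_runtime_probe"`, which Gradio exposes as the endpoint `/vision_runtime_probe`. The body of `run_vision_runtime_probe` in `scripts/check_space_vlm.py` is not part of this diff. Below is a minimal sketch of a caller that accepts any client with a `predict(..., api_name=...)` method and normalizes the response the same way `_extract_probe_payload` does. `call_vision_runtime_probe` and `PredictClient` are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Protocol


class PredictClient(Protocol):
    """Anything with a Gradio-style predict method (assumed interface)."""

    def predict(self, *args: Any, api_name: str | None = None) -> Any: ...


def call_vision_runtime_probe(client: PredictClient) -> dict[str, Any]:
    """Hypothetical helper: call the hidden endpoint and return its JSON object."""
    response = client.predict(api_name="/vision_runtime_probe")
    # Accept a bare dict or a one-element tuple/list wrapping one.
    if isinstance(response, dict):
        return response
    if isinstance(response, (tuple, list)) and len(response) == 1 and isinstance(response[0], dict):
        return response[0]
    raise ValueError("Probe output was not a JSON object.")
```

Against the live Space, a `gradio_client.Client("build-small-hackathon/ObjectverseDiary")` instance would be the natural argument, assuming the installed `gradio_client` accepts `api_name` in `predict`. Passing the client in keeps the normalization testable with a fake.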

tests/test_llama_cpp_smoke.py ADDED

@@ -0,0 +1,65 @@

+"""Tests for the optional llama.cpp smoke-test helper."""
+from __future__ import annotations
+import tempfile
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+from scripts.check_llama_cpp_smoke import run_llama_cpp_smoke
+class FakeLlamaModel:
+    def __init__(self, responses: list[str]) -> None:
+        self.responses = responses
+    def create_chat_completion(self, **_: object) -> dict:
+        response = self.responses.pop(0)
+        return {"choices": [{"message": {"content": response}}]}
+class LlamaCppSmokeTest(unittest.TestCase):
+    def test_smoke_passes_when_pipeline_uses_llama_cpp_without_fallback(self) -> None:
+        fake_llama = FakeLlamaModel(
+            [
+                """
+                {"persona":{"object_name":"coffee mug","character_name":"Mugworth","mood":"dry and suspicious","secret_fear":"being left empty forever","core_memory":"It remembers every late-night refill.","complaint":"I am treated like a ceramic fuel tank.","tags":["desk witness","warm archive","quiet judgment"]}}
+                """,
+                """
+                {"title":"Secret Diary - Day 418","english":"Today I held another bitter storm and called it service.","chinese":"今天我又装下一场苦涩风暴，并被称为有用。"}
+                """,
+                """
+                {"reply":"Mugworth: I saw another deadline dissolve into a coffee ring."}
+                """,
+            ]
+        )
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            model_path = Path(tmp_dir) / "model.gguf"
+            model_path.write_text("fake", encoding="utf-8")
+            with patch("src.models.llama_cpp_runner._load_llama_model", return_value=fake_llama):
+                result = run_llama_cpp_smoke(
+                    model_path=model_path,
+                    description="old white coffee mug",
+                    mode="Cynical",
+                    save_trace=False,
+                )
+        self.assertEqual(result["status"], "pass")
+        self.assertEqual(result["model_runtime"]["text"], "llama-cpp text generation")
+        self.assertNotIn("text-fallback-to-mock", result["fallbacks"])
+        self.assertNotIn("text-fallback-to-mock", result["chat_fallbacks"])
+    def test_smoke_fails_when_model_path_is_missing(self) -> None:
+        with self.assertRaises(FileNotFoundError):
+            run_llama_cpp_smoke(
+                model_path=Path("/tmp/objectverse-missing-model.gguf"),
+                description="old white coffee mug",
+                mode="Cynical",
+                save_trace=False,
+            )
+if __name__ == "__main__":
+    unittest.main()
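The smoke test patches `_load_llama_model` so the runner receives a fake whose `create_chat_completion` returns the `{"choices": [{"message": {"content": ...}}]}` shape used by `llama-cpp-python`. How the real runner parses that content is not shown here. Below is a minimal sketch of turning one such completion into a JSON object, paired with a fake like the test's. `chat_json` and `FakeLlama` are hypothetical names.

```python
import json
from typing import Any


def chat_json(model: Any, messages: list[dict[str, str]]) -> dict[str, Any]:
    """Hypothetical helper: run one chat turn and parse the reply as a JSON object."""
    completion = model.create_chat_completion(messages=messages)
    content = completion["choices"][0]["message"]["content"]
    # json.loads tolerates the leading/trailing whitespace seen in the test fixtures.
    payload = json.loads(content)
    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object from the model.")
    return payload


class FakeLlama:
    """Returns queued responses in the llama-cpp-python chat completion shape."""

    def __init__(self, responses: list[str]) -> None:
        self.responses = list(responses)

    def create_chat_completion(self, **_: Any) -> dict[str, Any]:
        return {"choices": [{"message": {"content": self.responses.pop(0)}}]}
```

Queuing one response per pipeline step (persona, diary, reply), as the test does, makes the fake fail loudly with `IndexError` if the runner calls the model more times than expected.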

tests/test_mock_mvp.py CHANGED

@@ -17,8 +17,10 @@ from src.models.llama_cpp_runner import (
     reset_text_runtime_fallbacks,
 )
 from src.models.vision_runner import understand_object, understand_object_with_metadata
 from src.pipeline import generate_object_diary
 from src.renderer.share_card import render_share_card
 from src.traces.anonymizer import anonymize_text
 from src.traces.logger import build_trace, save_trace
 from scripts.generate_sample_traces import generate_sample_traces
@@ -56,6 +58,20 @@ class MockMvpTest(unittest.TestCase):
         self.assertEqual(status["vision"], "mock object understanding")
         self.assertEqual(status["runtime"], "no llama.cpp model connected yet")
     def test_examples_cover_six_objects(self) -> None:
         self.assertEqual(len(EXAMPLE_OBJECTS), 6)
         self.assertEqual(len(gradio_examples()), 6)
@@ -201,6 +217,37 @@ class MockMvpTest(unittest.TestCase):
         self.assertEqual(result.object_understanding.object.name, "keyboard")
         self.assertEqual(result.fallbacks, ["vision-fallback-to-mock"])
     def test_pipeline_saves_generation_result(self) -> None:
         with tempfile.TemporaryDirectory() as tmp_dir:
             result = generate_object_diary(

     reset_text_runtime_fallbacks,
 )
 from src.models.vision_runner import understand_object, understand_object_with_metadata
+from src.models.vision_runner import probe_vision_runtime
 from src.pipeline import generate_object_diary
 from src.renderer.share_card import render_share_card
+from src.ui.layout import vision_runtime_probe
 from src.traces.anonymizer import anonymize_text
 from src.traces.logger import build_trace, save_trace
 from scripts.generate_sample_traces import generate_sample_traces
         self.assertEqual(status["vision"], "mock object understanding")
         self.assertEqual(status["runtime"], "no llama.cpp model connected yet")
+    def test_llama_cpp_runtime_status_does_not_expose_model_path(self) -> None:
+        status = runtime_status(
+            get_runtime_settings(
+                {
+                    "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+                    "TEXT_MODEL_PATH": "/Users/leo/private/model.gguf",
+                }
+            )
+        )
+        self.assertEqual(status["text"], "llama-cpp text generation")
+        self.assertIn("[configured external GGUF]", status["runtime"])
+        self.assertNotIn("/Users/leo", status["runtime"])
     def test_examples_cover_six_objects(self) -> None:
         self.assertEqual(len(EXAMPLE_OBJECTS), 6)
         self.assertEqual(len(gradio_examples()), 6)
         self.assertEqual(result.object_understanding.object.name, "keyboard")
         self.assertEqual(result.fallbacks, ["vision-fallback-to-mock"])
+    def test_vision_runtime_probe_redacts_sensitive_error_markers(self) -> None:
+        settings = get_runtime_settings(
+            {
+                "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
+                "VISION_MODEL_ID": "openbmb/MiniCPM-V-2_6",
+            }
+        )
+        with patch(
+            "src.models.vision_runner._load_minicpm_components",
+            side_effect=RuntimeError("failed with token hf_forbidden in /Users/leo/.env"),
+        ):
+            probe = probe_vision_runtime(settings=settings, load_model=True)
+        serialized = json.dumps(probe, ensure_ascii=False)
+        self.assertTrue(probe["minicpm_load_attempted"])
+        self.assertFalse(probe["minicpm_load_ok"])
+        self.assertNotIn("hf_", serialized)
+        self.assertNotIn("HF_TOKEN", serialized)
+        self.assertNotIn("/Users/leo", serialized)
+        self.assertNotIn(".env", serialized)
+    def test_hidden_vision_runtime_probe_returns_safe_json(self) -> None:
+        probe = vision_runtime_probe()
+        serialized = json.dumps(probe, ensure_ascii=False)
+        self.assertIn("backend", probe)
+        self.assertIn("torch_import", probe)
+        self.assertNotIn("hf_", serialized)
+        self.assertNotIn("HF_TOKEN", serialized)
     def test_pipeline_saves_generation_result(self) -> None:
         with tempfile.TemporaryDirectory() as tmp_dir:
             result = generate_object_diary(

tests/test_space_vlm_tooling.py CHANGED

@@ -14,7 +14,9 @@ from scripts.check_space_vlm import (
     parse_space_repo_id,
     render_report,
     space_client_url,
     validate_prediction,
     write_trace_record,
 )
 from src.models.schema import DiaryEntry, ObjectInfo, ObjectUnderstanding, Persona, PersonaEnvelope, TraceRecord
@@ -146,11 +148,14 @@ class SpaceVlmToolingTest(unittest.TestCase):
             space_url="https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary",
             repo_id="build-small-hackathon/ObjectverseDiary",
             results=[result],
             configured={"hardware": "l4x1", "OBJECTVERSE_VISION_BACKEND": "minicpm-v"},
             rollback={"hardware": "cpu-basic", "OBJECTVERSE_VISION_BACKEND": "mock"},
         )
         self.assertIn("Overall status: PASS", report)
         self.assertIn("Running shoe", report)
         self.assertIn("OBJECTVERSE_VISION_BACKEND", report)
         self.assertNotIn("hf_", report.lower())
@@ -169,6 +174,65 @@ class SpaceVlmToolingTest(unittest.TestCase):
         self.assertIn("Configuration Error", report)
         self.assertIn("402 Payment Required", report)
 def _trace_record(
     *,
@@ -216,5 +280,21 @@ def _trace_record(
     )
 if __name__ == "__main__":
     unittest.main()

     parse_space_repo_id,
     render_report,
     space_client_url,
+    update_failure_notes,
     validate_prediction,
+    write_json_results,
     write_trace_record,
 )
 from src.models.schema import DiaryEntry, ObjectInfo, ObjectUnderstanding, Persona, PersonaEnvelope, TraceRecord
             space_url="https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary",
             repo_id="build-small-hackathon/ObjectverseDiary",
             results=[result],
+            probe_result=_probe_result(minicpm_load_ok=True),
             configured={"hardware": "l4x1", "OBJECTVERSE_VISION_BACKEND": "minicpm-v"},
             rollback={"hardware": "cpu-basic", "OBJECTVERSE_VISION_BACKEND": "mock"},
         )
         self.assertIn("Overall status: PASS", report)
+        self.assertIn("Vision Runtime Probe", report)
+        self.assertIn("minicpm_load_ok", report)
         self.assertIn("Running shoe", report)
         self.assertIn("OBJECTVERSE_VISION_BACKEND", report)
         self.assertNotIn("hf_", report.lower())
         self.assertIn("Configuration Error", report)
         self.assertIn("402 Payment Required", report)
+    def test_write_json_results_includes_probe_when_present(self) -> None:
+        result = ValidationResult(
+            key="mug",
+            label="Coffee mug",
+            source_page="https://commons.wikimedia.org/wiki/File:Striped_coffee_mug.jpg",
+            image_path="/tmp/mug.jpg",
+            passed=False,
+            object_name="coffee mug",
+            visible_features=["uploaded photo provided"],
+            likely_context="everyday human environment",
+            confidence=0.42,
+            runtime_vision="minicpm-v object understanding",
+            runtime_text="mock persona and diary generation",
+            fallbacks=["vision-fallback-to-mock", "mock-text-runtime"],
+            error="vision fallback marker was present",
+        )
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            output_path = write_json_results(
+                [result],
+                Path(tmp_dir) / "report.json",
+                probe_result=_probe_result(minicpm_load_ok=False),
+            )
+            payload = output_path.read_text(encoding="utf-8")
+            parsed = output_path.read_text(encoding="utf-8")
+        self.assertIn('"probe"', payload)
+        self.assertIn('"results"', payload)
+        self.assertNotIn("hf_", parsed)
+        self.assertNotIn("HF_TOKEN", parsed)
+    def test_update_failure_notes_replaces_latest_failure_section(self) -> None:
+        failed = ValidationResult(
+            key="keyboard",
+            label="Computer keyboard",
+            source_page="https://commons.wikimedia.org/wiki/File:Computer_keyboard.jpg",
+            image_path="/tmp/keyboard.jpg",
+            passed=False,
+            object_name="keyboard",
+            visible_features=["uploaded photo provided"],
+            likely_context="everyday human environment",
+            confidence=0.42,
+            runtime_vision="minicpm-v object understanding",
+            runtime_text="mock persona and diary generation",
+            fallbacks=["vision-fallback-to-mock", "mock-text-runtime"],
+            error="vision fallback marker was present",
+        )
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            notes_path = Path(tmp_dir) / "FAILURES.md"
+            notes_path.write_text("# Failure Notes\n\n## Current Status\n\nStable.\n", encoding="utf-8")
+            update_failure_notes(results=[failed], probe_result=_probe_result(False), output_path=notes_path)
+            update_failure_notes(results=[failed], probe_result=_probe_result(False), output_path=notes_path)
+            content = notes_path.read_text(encoding="utf-8")
+        self.assertEqual(content.count("## Latest Space VLM Validation Failure"), 1)
+        self.assertIn("keyboard: vision fallback marker was present", content)
+        self.assertNotIn("hf_", content)
 def _trace_record(
     *,
     )
+def _probe_result(minicpm_load_ok: bool) -> dict[str, object]:
+    return {
+        "backend": "minicpm-v",
+        "vision_model_id": "openbmb/MiniCPM-V-2_6",
+        "torch_import": True,
+        "transformers_import": True,
+        "cuda_available": True,
+        "device_count": 1,
+        "device_name": "NVIDIA test device",
+        "mps_available": False,
+        "minicpm_load_attempted": True,
+        "minicpm_load_ok": minicpm_load_ok,
+        "errors": [] if minicpm_load_ok else [{"stage": "minicpm_load", "type": "RuntimeError", "summary": "test failure"}],
+    }
 if __name__ == "__main__":
     unittest.main()
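`test_update_failure_notes_replaces_latest_failure_section` relies on `_replace_or_append_section` being idempotent: writing the same section twice leaves a single heading. That helper locates the heading with `str.find`, which also matches the heading text when it appears in the middle of a line. Below is a minimal sketch of a line-anchored variant that uses `re.MULTILINE` instead. This is a swapped-in technique, not the script's code, and `replace_or_append_section` is a hypothetical name.

```python
import re


def replace_or_append_section(markdown: str, heading: str, section: str) -> str:
    """Hypothetical helper: replace the section that starts with `heading`, or append it."""
    # Match the heading only as a whole line, unlike a plain str.find().
    match = re.search(rf"^{re.escape(heading)}$", markdown, flags=re.MULTILINE)
    if match is None:
        return markdown.rstrip() + "\n\n" + section
    start = match.start()
    # The section ends where the next level-2 heading begins, or at end of file.
    next_match = re.search(r"^## ", markdown[match.end():], flags=re.MULTILINE)
    if next_match is None:
        return markdown[:start].rstrip() + "\n\n" + section
    next_start = match.end() + next_match.start()
    return markdown[:start].rstrip() + "\n\n" + section.rstrip() + "\n\n" + markdown[next_start:]
```

Applying it twice to the same notes file yields identical text. A sentence that merely mentions the heading inline is left alone rather than truncated.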