qqyule commited on
Commit
d30bd8e
·
verified ·
1 Parent(s): 9e874de

Sync runtime diagnostics and smoke helpers

Browse files
README.md CHANGED
@@ -23,7 +23,7 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
23
 
24
  ## Current Status
25
 
26
- Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, optional llama.cpp text runtime wiring, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
27
 
28
  By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
29
 
@@ -31,6 +31,8 @@ By default, the app uses deterministic mock outputs for object understanding, pe
31
 
32
  `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
33
 
 
 
34
  Hugging Face Space:
35
 
36
  https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
@@ -59,7 +61,7 @@ The interface is English-first and Chinese-second.
59
  - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
60
  - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
61
  - [ ] OpenBMB Special — MiniCPM-V wiring exists, but hosted validation currently falls back to mock vision.
62
- - [ ] Llama Champion — llama.cpp wiring exists, but real GGUF smoke test is not complete.
63
  - [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
64
  - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
65
 
@@ -106,6 +108,17 @@ python app.py
106
 
107
  If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
108
 
 
 
 
 
 
 
 
 
 
 
 
109
  ## Initial MVP Flow
110
 
111
  The stable submission baseline supports:
@@ -133,6 +146,7 @@ The stable submission baseline supports:
133
  - Public mock traces: `data/traces/samples/`
134
  - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
135
  - Hosted VLM failure evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
 
136
  - Field Notes draft: `docs/FIELD_NOTES.md`
137
  - Demo video script: `docs/DEMO_VIDEO_SCRIPT.md`
138
  - Social post draft: `docs/SOCIAL_POST.md`
 
23
 
24
  ## Current Status
25
 
26
+ Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a local GGUF smoke-test helper, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
27
 
28
  By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
29
 
 
31
 
32
  `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
33
 
34
+ `scripts/check_llama_cpp_smoke.py` is available for an explicit-confirmation local GGUF smoke test. The recommended baseline smoke model is `Qwen/Qwen2.5-1.5B-Instruct-GGUF` with `qwen2.5-1.5b-instruct-q4_k_m.gguf`, stored under ignored `models/` when used locally.
35
+
36
  Hugging Face Space:
37
 
38
  https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 
61
  - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
62
  - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
63
  - [ ] OpenBMB Special — MiniCPM-V wiring exists, but hosted validation currently falls back to mock vision.
64
+ - [ ] Llama Champion — llama.cpp wiring and smoke helper exist, but real GGUF smoke test is not complete.
65
  - [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
66
  - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
67
 
 
108
 
109
  If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
110
 
111
+ Recommended explicit-confirmation smoke path:
112
+
113
+ ```bash
114
+ # Download externally, do not commit the GGUF:
115
+ # https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF
116
+ # file: qwen2.5-1.5b-instruct-q4_k_m.gguf
117
+
118
+ .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
119
+ --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
120
+ ```
121
+
122
  ## Initial MVP Flow
123
 
124
  The stable submission baseline supports:
 
146
  - Public mock traces: `data/traces/samples/`
147
  - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
148
  - Hosted VLM failure evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
149
+ - Hosted VLM diagnostic support: hidden `/vision_runtime_probe` API and probe-aware `scripts/check_space_vlm.py`
150
  - Field Notes draft: `docs/FIELD_NOTES.md`
151
  - Demo video script: `docs/DEMO_VIDEO_SCRIPT.md`
152
  - Social post draft: `docs/SOCIAL_POST.md`
docs/DEMO_VIDEO_SCRIPT.md CHANGED
@@ -4,7 +4,7 @@
4
 
5
  Record a 90-second stable demo for Objectverse Diary using the mock-safe Hugging Face Space or local Gradio app.
6
 
7
- Do not claim that hosted MiniCPM-V, GGUF text generation, LoRA training, or model publishing are complete. The stable demo should emphasize the product loop, Gradio Off-Brand UI, public traces, and no commercial AI APIs.
8
 
9
  ## Recording Setup
10
 
@@ -104,5 +104,6 @@ Screen:
104
  ## Notes For Submission
105
 
106
  - Mention MiniCPM-V as wired but not hosted-validated yet.
 
107
  - Mention public traces and failure notes if the submission form asks for reproducibility.
108
  - Keep the final video under 2 minutes.
 
4
 
5
  Record a 90-second stable demo for Objectverse Diary using the mock-safe Hugging Face Space or local Gradio app.
6
 
7
+ Do not claim that hosted MiniCPM-V validation, GGUF text generation, or live LoRA runtime wiring are complete. The stable demo should emphasize the product loop, Gradio Off-Brand UI, public traces, published dataset/LoRA evidence, and no commercial AI APIs.
8
 
9
  ## Recording Setup
10
 
 
104
  ## Notes For Submission
105
 
106
  - Mention MiniCPM-V as wired but not hosted-validated yet.
107
+ - Mention the published synthetic curated dataset and LoRA adapter only as training evidence, not live Space runtime.
108
  - Mention public traces and failure notes if the submission form asks for reproducibility.
109
  - Keep the final video under 2 minutes.
docs/DEVELOPMENT_STATUS.md CHANGED
@@ -21,6 +21,9 @@ Last updated: 2026-06-08
21
  - Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
22
  - Space VLM validation tooling:
23
  - `scripts/check_space_vlm.py`
 
 
 
24
  - failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
25
  - optional `--trace-output-dir` evidence export for validation traces
26
  - ZeroGPU compatibility:
@@ -36,18 +39,22 @@ Last updated: 2026-06-08
36
  - 50-row synthetic curated SFT dataset published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
37
  - Modal Qwen 1.5B LoRA test run completed with 20 steps
38
  - LoRA adapter published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 
 
 
 
39
  - Local tests and initial acceptance currently pass.
40
 
41
  ## Not Completed
42
 
43
  - Hosted Space MiniCPM-V validation with real public mug/keyboard/shoe images. Paid L4 was blocked by Hugging Face `402 Payment Required`; ZeroGPU CUDA probe passed; the 2026-06-08 full ZeroGPU validation reached the app but all three objects fell back to mock vision.
44
  - Passing real VLM demo trace capture. Failed Space VLM traces are kept as fallback evidence and do not replace mock sample traces.
45
- - Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
46
  - Final text model parameter count documentation.
47
  - Real model traces from non-mock runtime.
48
  - GGUF conversion and runtime wiring for the published LoRA adapter.
49
  - GitHub sync / final public repository confirmation.
50
- - Published Field Notes URL, recorded demo video URL, social post URL, and final public submission.
51
 
52
  ## Current Safe Defaults
53
 
 
21
  - Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
22
  - Space VLM validation tooling:
23
  - `scripts/check_space_vlm.py`
24
+ - hidden `/vision_runtime_probe` API for non-secret MiniCPM-V diagnostics
25
+ - probe output support in Space VLM markdown and JSON reports
26
+ - failure-note updater for the latest Space VLM failure summary
27
  - failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
28
  - optional `--trace-output-dir` evidence export for validation traces
29
  - ZeroGPU compatibility:
 
39
  - 50-row synthetic curated SFT dataset published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
40
  - Modal Qwen 1.5B LoRA test run completed with 20 steps
41
  - LoRA adapter published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
42
+ - GGUF smoke-test helper:
43
+ - `scripts/check_llama_cpp_smoke.py`
44
+ - recommended baseline model documented as `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf`
45
+ - trace runtime no longer records literal `TEXT_MODEL_PATH`
46
  - Local tests and initial acceptance currently pass.
47
 
48
  ## Not Completed
49
 
50
  - Hosted Space MiniCPM-V validation with real public mug/keyboard/shoe images. Paid L4 was blocked by Hugging Face `402 Payment Required`; ZeroGPU CUDA probe passed; the 2026-06-08 full ZeroGPU validation reached the app but all three objects fell back to mock vision.
51
  - Passing real VLM demo trace capture. Failed Space VLM traces are kept as fallback evidence and do not replace mock sample traces.
52
+ - Real GGUF download/configuration outside Git and `TEXT_MODEL_PATH` smoke test. Model selection is now documented, but the file is not downloaded and optional `llama-cpp-python` is not installed by default.
53
  - Final text model parameter count documentation.
54
  - Real model traces from non-mock runtime.
55
  - GGUF conversion and runtime wiring for the published LoRA adapter.
56
  - GitHub sync / final public repository confirmation.
57
+ - Published Field Notes URL, recorded demo video URL, social post URL, GitHub push confirmation, Space sync confirmation, and final public submission.
58
 
59
  ## Current Safe Defaults
60
 
docs/EXTERNAL_SETUP.md CHANGED
@@ -85,9 +85,12 @@ Automated validation command after confirmation:
85
  --output docs/SPACE_VLM_REPORT.md \
86
  --json-output docs/SPACE_VLM_REPORT.json \
87
  --trace-output-dir data/traces/space-vlm \
 
88
  --timeout-seconds 1200
89
  ```
90
 
 
 
91
  Optional rollback to mock-safe settings:
92
 
93
  ```bash
@@ -99,6 +102,27 @@ Optional rollback to mock-safe settings:
99
 
100
  The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
101
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
102
  2026-06-06 validation attempt:
103
 
104
  - `--configure-space` was run for `l4x1`.
 
85
  --output docs/SPACE_VLM_REPORT.md \
86
  --json-output docs/SPACE_VLM_REPORT.json \
87
  --trace-output-dir data/traces/space-vlm \
88
+ --failure-notes-output docs/FAILURES.md \
89
  --timeout-seconds 1200
90
  ```
91
 
92
+ The validation command now calls the hidden `/vision_runtime_probe` endpoint before mug/keyboard/shoe generation. The probe output is written into the markdown/JSON report and must remain free of token markers, `.env` paths, and private local paths.
93
+
94
  Optional rollback to mock-safe settings:
95
 
96
  ```bash
 
102
 
103
  The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
104
 
105
+ ## Optional GGUF Smoke Test
106
+
107
+ This is a local-only model evidence step. It should be run only after confirming optional dependency installation and GGUF download.
108
+
109
+ Recommended model:
110
+
111
+ ```text
112
+ repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
113
+ file: qwen2.5-1.5b-instruct-q4_k_m.gguf
114
+ local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
115
+ ```
116
+
117
+ Do not commit the downloaded GGUF. After the file is present and optional `llama-cpp-python` is installed:
118
+
119
+ ```bash
120
+ .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
121
+ --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
122
+ ```
123
+
124
+ Passing evidence requires `llama-cpp text generation` and no `text-fallback-to-mock` marker for generation or chat.
125
+
126
  2026-06-06 validation attempt:
127
 
128
  - `--configure-space` was run for `l4x1`.
docs/FAILURES.md CHANGED
@@ -10,6 +10,14 @@ Use it for model/runtime/deployment/data issues, not for UI polish notes.
10
 
11
  MiniCPM-V 2.6 is wired as an optional vision backend. Hosted Space ZeroGPU validation ran on 2026-06-08, but all three public object checks fell back to mock vision, so full hosted MiniCPM-V validation is still unresolved.
12
 
 
 
 
 
 
 
 
 
13
  Known non-blocking warning:
14
 
15
  - Gradio emits deprecation warnings for upcoming 6.0 API changes during local tests. This does not break the current Gradio Blocks build and can be handled with the later UI/API polish pass.
@@ -40,6 +48,25 @@ Known non-blocking warning:
40
  - Resolution: unresolved; inspect Space runtime logs or add non-secret fallback diagnostics for the MiniCPM-V load/chat exception.
41
  - Evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, and `data/traces/space-vlm/`.
42
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
43
  ## Anticipated Failure Areas
44
 
45
  ### Vision Runtime
 
10
 
11
  MiniCPM-V 2.6 is wired as an optional vision backend. Hosted Space ZeroGPU validation ran on 2026-06-08, but all three public object checks fell back to mock vision, so full hosted MiniCPM-V validation is still unresolved.
12
 
13
+ The app now includes a hidden `/vision_runtime_probe` API and `scripts/check_space_vlm.py` writes probe output into the Space VLM report before image validation. The next hosted run should use this probe to identify whether the fallback is caused by dependency import, GPU visibility, MiniCPM-V loading, or generation output.
14
+
15
+ The recommended baseline GGUF for local text smoke testing is selected, but not downloaded or run:
16
+
17
+ - repo: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
18
+ - file: `qwen2.5-1.5b-instruct-q4_k_m.gguf`
19
+ - helper: `scripts/check_llama_cpp_smoke.py`
20
+
21
  Known non-blocking warning:
22
 
23
  - Gradio emits deprecation warnings for upcoming 6.0 API changes during local tests. This does not break the current Gradio Blocks build and can be handled with the later UI/API polish pass.
 
48
  - Resolution: unresolved; inspect Space runtime logs or add non-secret fallback diagnostics for the MiniCPM-V load/chat exception.
49
  - Evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, and `data/traces/space-vlm/`.
50
 
51
+ ## Latest Space VLM Validation Failure
52
+
53
+ - Updated: 2026-06-08
54
+ - Area: Hugging Face Space vision runtime.
55
+ - Probe backend: not available in the historical 2026-06-08 report; probe support was added afterward.
56
+ - Failed checks: mug, keyboard, and shoe all included `vision-fallback-to-mock`.
57
+ - Fallback used: mock object understanding plus mock text runtime.
58
+ - Resolution: unresolved; keep the public Space mock-safe until a probe-aware validation run passes without `vision-fallback-to-mock`.
59
+
60
+ ## 2026-06-08 - GGUF Smoke Helper Prepared, Actual Smoke Pending
61
+
62
+ - Area: llama.cpp text runtime evidence.
63
+ - Reproduction: Run `scripts/check_llama_cpp_smoke.py` with an external GGUF model path after optional dependency installation.
64
+ - Expected: trace records `llama-cpp text generation`, persona/diary/chat run without `text-fallback-to-mock`.
65
+ - Actual: not run; `.venv` does not include `llama-cpp-python` by default and the GGUF file is intentionally not committed.
66
+ - Impact: Llama Champion evidence remains incomplete.
67
+ - Fallback used: default mock text runtime remains the safe public demo path.
68
+ - Resolution: pending explicit confirmation to install optional local dependency and download `qwen2.5-1.5b-instruct-q4_k_m.gguf` into ignored `models/`.
69
+
70
  ## Anticipated Failure Areas
71
 
72
  ### Vision Runtime
docs/FIELD_NOTES.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  ## Status
4
 
5
- Stable submission draft. This document is ready to adapt into the final Field Notes post after the public GitHub, demo video, and social post URLs are confirmed.
6
 
7
  ## 1. Why I Built It
8
 
@@ -63,7 +63,7 @@ The app keeps the Gradio UI separate from model execution:
63
  - `src/traces/logger.py` writes anonymized trace records
64
  - `src/renderer/share_card.py` renders the shareable card preview
65
 
66
- This boundary matters. It lets the mock MVP, hosted Space validation, and future local GGUF experiments share the same data shapes and fallback markers.
67
 
68
  ## 6. Runtime And Fallbacks
69
 
@@ -91,6 +91,8 @@ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
91
 
92
  The fallback behavior is explicit. If MiniCPM-V fails or returns invalid JSON, the trace records `vision-fallback-to-mock`. If llama.cpp is unavailable, missing a model path, or returns invalid JSON, the trace records `text-fallback-to-mock`.
93
 
 
 
94
  ## 7. What Worked
95
 
96
  The stable loop works locally and in the mock-safe Space:
@@ -115,6 +117,8 @@ Paid L4 hardware on the hackathon organization returned `402 Payment Required`.
115
 
116
  This is not hidden in the submission. The stable baseline treats MiniCPM-V as wired but not yet validated in the hosted environment.
117
 
 
 
118
  ## 9. Traces And Reproducibility
119
 
120
  The project includes public mock traces for the six stable examples under `data/traces/samples/`. They are deterministic and intended for demo replay, schema validation, and public inspection.
@@ -127,6 +131,15 @@ The export command is:
127
  .venv/bin/python -B scripts/export_traces.py
128
  ```
129
 
 
 
 
 
 
 
 
 
 
130
  ## 10. Privacy And Safety
131
 
132
  The project does not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. It does not commit GGUF files, private images, tokens, credit codes, or `.env` files.
@@ -140,9 +153,9 @@ The next model-focused step is to inspect Space runtime logs or add non-secret M
140
  After that:
141
 
142
  - rerun ZeroGPU MiniCPM-V validation
143
- - choose and smoke-test a real GGUF text model
144
- - generate and curate real training candidates
145
- - publish a dataset and fine-tuned adapter if time allows
146
  - record a final demo video from the stable Space
147
 
148
  The current version is intentionally honest: it is a stable, reproducible small-model toy baseline with clear boundaries, visible failures, and a path to stronger model evidence.
@@ -150,6 +163,8 @@ The current version is intentionally honest: it is a stable, reproducible small-
150
  ## Evidence Links To Fill Before Final Submission
151
 
152
  - Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 
 
153
  - GitHub repository: pending push confirmation
154
  - Demo video: pending recording
155
  - Social post: pending publishing
 
2
 
3
  ## Status
4
 
5
+ Publication-ready draft. Fill the public GitHub, demo video, and social post URLs before posting; do not publish until those external actions are explicitly confirmed.
6
 
7
  ## 1. Why I Built It
8
 
 
63
  - `src/traces/logger.py` writes anonymized trace records
64
  - `src/renderer/share_card.py` renders the shareable card preview
65
 
66
+ This boundary matters. It lets the mock MVP, hosted Space validation, diagnostics, and local GGUF experiments share the same data shapes and fallback markers.
67
 
68
  ## 6. Runtime And Fallbacks
69
 
 
91
 
92
  The fallback behavior is explicit. If MiniCPM-V fails or returns invalid JSON, the trace records `vision-fallback-to-mock`. If llama.cpp is unavailable, missing a model path, or returns invalid JSON, the trace records `text-fallback-to-mock`.
93
 
94
+ The hosted Space also has a hidden `/vision_runtime_probe` endpoint for non-secret runtime diagnostics. It checks Torch and Transformers imports, GPU visibility, and whether MiniCPM-V can load, while redacting token markers and private paths.
95
+
96
  ## 7. What Worked
97
 
98
  The stable loop works locally and in the mock-safe Space:
 
117
 
118
  This is not hidden in the submission. The stable baseline treats MiniCPM-V as wired but not yet validated in the hosted environment.
119
 
120
+ After this failure, I added a probe-aware validation path so the next hosted run can report whether the failure is happening at dependency import, GPU visibility, model loading, or generation time.
121
+
122
  ## 9. Traces And Reproducibility
123
 
124
  The project includes public mock traces for the six stable examples under `data/traces/samples/`. They are deterministic and intended for demo replay, schema validation, and public inspection.
 
131
  .venv/bin/python -B scripts/export_traces.py
132
  ```
133
 
134
+ For text runtime evidence, the project now includes a local smoke helper for an external GGUF:
135
+
136
+ ```bash
137
+ .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
138
+ --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
139
+ ```
140
+
141
+ The recommended baseline file is `qwen2.5-1.5b-instruct-q4_k_m.gguf` from `Qwen/Qwen2.5-1.5B-Instruct-GGUF`. It is intentionally not committed.
142
+
143
  ## 10. Privacy And Safety
144
 
145
  The project does not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. It does not commit GGUF files, private images, tokens, credit codes, or `.env` files.
 
153
  After that:
154
 
155
  - rerun ZeroGPU MiniCPM-V validation
156
+ - run the documented GGUF smoke test after explicit confirmation
157
+ - decide whether the published LoRA should remain badge evidence only or be converted later
158
+ - generate real non-mock traces if hosted/local model validation passes
159
  - record a final demo video from the stable Space
160
 
161
  The current version is intentionally honest: it is a stable, reproducible small-model toy baseline with clear boundaries, visible failures, and a path to stronger model evidence.
 
163
  ## Evidence Links To Fill Before Final Submission
164
 
165
  - Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
166
+ - Dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
167
+ - LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
168
  - GitHub repository: pending push confirmation
169
  - Demo video: pending recording
170
  - Social post: pending publishing
docs/FINAL_VERIFICATION_REPORT.md CHANGED
@@ -1,30 +1,39 @@
1
  # Final Verification Report
2
 
3
- - Generated at: 2026-06-08 11:19:49 CST
4
- - Verified source commit: `b7cb470`
5
  - Branch: `main`
6
- - Verification target: stable mock-safe submission baseline
7
- - Local app URL: `http://127.0.0.1:7860/`
8
 
9
  ## Summary
10
 
11
- Objectverse Diary's stable mock-safe baseline is locally verifiable. The app starts with default mock backends, renders the archive-style Gradio interface, runs all six committed example objects, supports object chat, renders share cards, and exposes trace evidence.
12
 
13
- This report does not claim hosted MiniCPM-V validation, GGUF text generation, LoRA training, model publishing, dataset publishing, or final public submission URLs are complete.
 
 
 
 
 
 
 
 
14
 
15
  ## Command Verification
16
 
17
  | Check | Result | Notes |
18
  | --- | --- | --- |
19
- | `git status --short --untracked-files=all` | PASS | Clean before report generation. |
20
- | `.venv/bin/python -B -m unittest discover -s tests` | PASS | 30 tests passed. Gradio 6.0 deprecation warnings are non-blocking. |
21
  | `.venv/bin/python -B scripts/check_initial_stage.py` | PASS | Required files, runtime defaults, trace generation, sample traces, dataset preview, trace export, and Gradio build all passed. |
22
  | `.venv/bin/python -B scripts/export_traces.py` | PASS | Exported 6 traces to `data/traces/samples/objectverse_public_mock_traces.jsonl`. |
23
  | `git diff --check` | PASS | No whitespace errors. |
24
 
25
  ## Browser Verification
26
 
27
- The local app was started with:
 
 
28
 
29
  ```bash
30
  GRADIO_SERVER_NAME=127.0.0.1 GRADIO_SERVER_PORT=7860 .venv/bin/python app.py
@@ -51,31 +60,26 @@ Browser checks:
51
  - Six stable public mock sample traces remain under `data/traces/samples/`.
52
  - The trace export JSONL was regenerated successfully.
53
  - Hosted Space VLM traces under `data/traces/space-vlm/` remain failure evidence because they include `vision-fallback-to-mock`; they are intentionally not used as successful real VLM traces.
 
54
 
55
  ## Security Scan
56
 
57
- Scanned project docs, source, scripts, tests, and trace directories for:
58
 
59
  - `hf_`
60
  - `HF_TOKEN`
61
  - `HUGGINGFACE_TOKEN`
62
- - `BEGIN PRIVATE KEY`
63
- - `SUPABASE_SERVICE_ROLE_KEY`
64
- - test email pattern
65
- - private local path markers
66
  - `.env`
67
 
68
- Result: PASS with known safe hits only.
69
 
70
  Known safe hits:
71
 
72
- - test fixtures intentionally containing `user@example.com`
73
- - tests asserting that token markers are absent
74
  - `scripts/check_space_vlm.py` sensitive marker constants and auth helper names
75
- - documentation warning not to commit `.env`
76
- - `.env.example` path shown in architecture docs
77
 
78
- No real token, private key, credential, private image path, GGUF file, or `.env` file was found in the scanned project content.
79
 
80
  ## Remaining External Items
81
 
@@ -85,10 +89,11 @@ No real token, private key, credential, private image path, GGUF file, or `.env`
85
  - Field Notes URL is still pending publication.
86
  - Social post URL is still pending publication.
87
  - Hosted MiniCPM-V validation still falls back to mock vision.
88
- - Real GGUF smoke test, LoRA training, HF model publishing, and HF dataset publishing remain future work.
 
89
 
90
  ## Verdict
91
 
92
- PASS for the stable mock-safe local submission baseline.
93
 
94
- The project is ready for explicit-confirmation external steps: push `main`, record/publish the demo video, publish Field Notes/social post, and fill final submission URLs.
 
1
  # Final Verification Report
2
 
3
+ - Generated at: 2026-06-08 16:24:23 CST
4
+ - Verified source commit: uncommitted local implementation on `main`
5
  - Branch: `main`
6
+ - Verification target: mock-safe submission baseline plus local diagnostics/smoke-helper implementation
7
+ - Local app URL: not launched during this verification update
8
 
9
  ## Summary
10
 
11
+ Objectverse Diary's stable mock-safe baseline remains locally verifiable. This update adds non-secret MiniCPM-V runtime diagnostics through a hidden Gradio API, probe-aware Space VLM reporting, a latest-failure-note updater, and a local llama.cpp GGUF smoke-test helper.
12
 
13
+ This report does not claim hosted MiniCPM-V validation, real GGUF text generation, live LoRA runtime wiring, GitHub push, Field Notes publication, demo video publication, social post publication, or final public submission URLs are complete.
14
+
15
+ ## Implementation Additions
16
+
17
+ - Hidden `/vision_runtime_probe` Gradio API returns sanitized backend, dependency, GPU, and MiniCPM-V load diagnostics.
18
+ - `scripts/check_space_vlm.py` can include probe output in markdown/JSON reports and update the latest failure section in `docs/FAILURES.md`.
19
+ - `scripts/check_llama_cpp_smoke.py` validates persona, diary, and chat through an externally configured GGUF without committing model files.
20
+ - Runtime status no longer records literal `TEXT_MODEL_PATH`; traces only record whether an external GGUF path is configured.
21
+ - Submission docs now distinguish final-draft materials from published URLs.
22
 
23
  ## Command Verification
24
 
25
  | Check | Result | Notes |
26
  | --- | --- | --- |
27
+ | `.venv/bin/python -B -m unittest discover -s tests` | PASS | 46 tests passed. Gradio 6.0 deprecation warnings and an asyncio ResourceWarning remain non-blocking. |
 
28
  | `.venv/bin/python -B scripts/check_initial_stage.py` | PASS | Required files, runtime defaults, trace generation, sample traces, dataset preview, trace export, and Gradio build all passed. |
29
  | `.venv/bin/python -B scripts/export_traces.py` | PASS | Exported 6 traces to `data/traces/samples/objectverse_public_mock_traces.jsonl`. |
30
  | `git diff --check` | PASS | No whitespace errors. |
31
 
32
  ## Browser Verification
33
 
34
+ Not re-run in this verification update. The previous stable baseline browser verification remains useful evidence for the mock-safe UI, but the new hidden `/vision_runtime_probe` API was verified through unit coverage rather than a browser session.
35
+
36
+ Previous local app command:
37
 
38
  ```bash
39
  GRADIO_SERVER_NAME=127.0.0.1 GRADIO_SERVER_PORT=7860 .venv/bin/python app.py
 
60
  - Six stable public mock sample traces remain under `data/traces/samples/`.
61
  - The trace export JSONL was regenerated successfully.
62
  - Hosted Space VLM traces under `data/traces/space-vlm/` remain failure evidence because they include `vision-fallback-to-mock`; they are intentionally not used as successful real VLM traces.
63
+ - New runtime traces do not include literal `TEXT_MODEL_PATH` values.
64
 
65
  ## Security Scan
66
 
67
+ Targeted safety coverage now includes unit tests and an `rg` scan for probe/report/trace outputs that reject or redact:
68
 
69
  - `hf_`
70
  - `HF_TOKEN`
71
  - `HUGGINGFACE_TOKEN`
 
 
 
 
72
  - `.env`
73
 
74
+ Result: PASS for the targeted diagnostic/report paths and repository scan.
75
 
76
  Known safe hits:
77
 
 
 
78
  - `scripts/check_space_vlm.py` sensitive marker constants and auth helper names
79
+ - tests intentionally containing fake `hf_forbidden` and `.env` strings to verify redaction
80
+ - `publish_hf_adapter` filenames/imports that match the broad `hf_` scan pattern but are not tokens
81
 
82
+ No GGUF file, real token, private key, credential, or `.env` file was added by this implementation.
83
 
84
  ## Remaining External Items
85
 
 
89
  - Field Notes URL is still pending publication.
90
  - Social post URL is still pending publication.
91
  - Hosted MiniCPM-V validation still falls back to mock vision.
92
+ - Real GGUF download, optional `llama-cpp-python` installation, and smoke test remain pending explicit confirmation.
93
+ - GGUF conversion and live runtime wiring for the published LoRA adapter remain future work.
94
 
95
  ## Verdict
96
 
97
+ PASS for the stable mock-safe local submission baseline plus local diagnostics/smoke-helper implementation.
98
 
99
+ The project is ready for explicit-confirmation external steps: push `main`, sync the Space, rerun probe-aware Space VLM validation, run the local GGUF smoke test after optional dependency/model setup, record/publish the demo video, publish Field Notes/social post, and fill final submission URLs.
docs/MODEL_CARD.md CHANGED
@@ -4,7 +4,7 @@
4
 
5
  Stable submission baseline plus one published text LoRA test adapter. The public Gradio Space still defaults to deterministic mock text; the adapter is training evidence and has not been converted to GGUF or wired into the live runtime.
6
 
7
- The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`. A Modal LoRA test run completed for the planned text model path and the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
8
 
9
  Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
10
 
@@ -20,7 +20,7 @@ Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validat
20
  | --- | --- | --- |
21
  | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
22
  | Text | deterministic mock text; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA test adapter | Adapter published; not converted to GGUF or wired into Space runtime. |
23
- | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
24
  | UI | Gradio Blocks | Required by the hackathon and project rules. |
25
 
26
  ## Parameter Budget
@@ -34,6 +34,7 @@ Record final numbers here before submission:
34
  | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
35
  | Text base | Stable baseline mock text | 0 | no model parameters |
36
  | Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
 
37
  | Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
38
  | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
39
 
@@ -83,6 +84,13 @@ Training run summary:
83
  - Train loss: 1.6697
84
  - GGUF conversion: not completed
85
 
 
 
 
 
 
 
 
86
  ## Safety And Privacy
87
 
88
  - Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
@@ -95,6 +103,7 @@ Training run summary:
95
  - If VLM loading fails, use manual description and stable example flow.
96
  - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
97
  - If model JSON is invalid, repair and validate before rendering.
 
98
  - Hosted VLM fallback evidence is preserved in `data/traces/space-vlm/` and should not be described as successful real VLM output.
99
 
100
  ## Required Notes
 
4
 
5
  Stable submission baseline plus one published text LoRA test adapter. The public Gradio Space still defaults to deterministic mock text; the adapter is training evidence and has not been converted to GGUF or wired into the live runtime.
6
 
7
+ The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments, with a hidden non-secret probe for hosted diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`. A Modal LoRA test run completed for the planned text model path and the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
8
 
9
  Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
10
 
 
20
  | --- | --- | --- |
21
  | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
22
  | Text | deterministic mock text; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA test adapter | Adapter published; not converted to GGUF or wired into Space runtime. |
23
+ | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; smoke helper exists, real-model smoke test still pending. |
24
  | UI | Gradio Blocks | Required by the hackathon and project rules. |
25
 
26
  ## Parameter Budget
 
34
  | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
35
  | Text base | Stable baseline mock text | 0 | no model parameters |
36
  | Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
37
+ | Recommended GGUF smoke file | `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf` | ~1.5B base, quantized file | yes, if used for text runtime smoke |
38
  | Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
39
  | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
40
 
 
84
  - Train loss: 1.6697
85
  - GGUF conversion: not completed
86
 
87
+ GGUF smoke status:
88
+
89
+ - Recommended repo: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
90
+ - Recommended file: `qwen2.5-1.5b-instruct-q4_k_m.gguf`
91
+ - Local helper: `scripts/check_llama_cpp_smoke.py`
92
+ - Current state: file not downloaded, optional `llama-cpp-python` not installed by default, smoke test not run.
93
+
94
  ## Safety And Privacy
95
 
96
  - Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
 
103
  - If VLM loading fails, use manual description and stable example flow.
104
  - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
105
  - If model JSON is invalid, repair and validate before rendering.
106
+ - Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
107
  - Hosted VLM fallback evidence is preserved in `data/traces/space-vlm/` and should not be described as successful real VLM output.
108
 
109
  ## Required Notes
docs/RUNTIME.md CHANGED
@@ -37,6 +37,38 @@ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
37
 
38
  `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
39
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
40
  ## Environment Variables
41
 
42
  ```bash
 
37
 
38
  `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
39
 
40
+ The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
41
+
42
+ ## Runtime Diagnostics
43
+
44
+ The Gradio app exposes two hidden diagnostic APIs:
45
+
46
+ - `/zero_gpu_probe`: checks Torch import and CUDA visibility.
47
+ - `/vision_runtime_probe`: checks configured vision backend, Torch/Transformers import, CUDA/MPS visibility, and MiniCPM-V load success or sanitized failure summaries.
48
+
49
+ These APIs are for validation scripts and are not visible in the main UI. They must not return tokens, `.env` paths, Hugging Face token markers, or private local filesystem paths.
50
+
51
+ `scripts/check_space_vlm.py` calls `/vision_runtime_probe` before the mug/keyboard/shoe validation run and writes the probe output into `docs/SPACE_VLM_REPORT.md` and `docs/SPACE_VLM_REPORT.json`.
52
+
53
+ ## Optional GGUF Smoke Test
54
+
55
+ Recommended baseline smoke model:
56
+
57
+ ```text
58
+ repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
59
+ file: qwen2.5-1.5b-instruct-q4_k_m.gguf
60
+ local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
61
+ ```
62
+
63
+ The `models/` directory and `*.gguf` are ignored by Git. After downloading the file externally and installing optional `llama-cpp-python` after confirmation, run:
64
+
65
+ ```bash
66
+ .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
67
+ --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
68
+ ```
69
+
70
+ A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
71
+
72
  ## Environment Variables
73
 
74
  ```bash
docs/SOCIAL_POST.md CHANGED
@@ -6,6 +6,7 @@ I built Objectverse Diary for Build Small Hackathon: a Gradio app where everyday
6
 
7
  Stable demo: mock-safe, reproducible, no commercial AI APIs.
8
  MiniCPM-V and llama.cpp paths are wired behind fallbacks; hosted VLM validation is documented honestly.
 
9
 
10
  Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
11
 
@@ -24,6 +25,8 @@ Objectverse Diary is my Build Small Hackathon project: a strange little object a
24
 
25
  The stable submission baseline is mock-safe and reproducible, with no commercial AI APIs. MiniCPM-V vision and llama.cpp text paths are wired as optional backends, and the current hosted MiniCPM-V fallback is documented instead of hidden.
26
 
 
 
27
  Space:
28
  https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
29
 
@@ -35,4 +38,4 @@ https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
35
 
36
  - Add GitHub URL after push is confirmed.
37
  - Add demo video URL after recording.
38
- - Do not claim LoRA, GGUF smoke test, or hosted MiniCPM-V validation are complete.
 
6
 
7
  Stable demo: mock-safe, reproducible, no commercial AI APIs.
8
  MiniCPM-V and llama.cpp paths are wired behind fallbacks; hosted VLM validation is documented honestly.
9
+ Synthetic curated dataset + Qwen 1.5B LoRA adapter are published as training evidence.
10
 
11
  Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
12
 
 
25
 
26
  The stable submission baseline is mock-safe and reproducible, with no commercial AI APIs. MiniCPM-V vision and llama.cpp text paths are wired as optional backends, and the current hosted MiniCPM-V fallback is documented instead of hidden.
27
 
28
+ I also published a small synthetic curated SFT dataset and a Qwen 1.5B LoRA test adapter for Well-Tuned evidence. The adapter is not wired into the public Space runtime yet; the live demo stays intentionally reliable.
29
+
30
  Space:
31
  https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
32
 
 
38
 
39
  - Add GitHub URL after push is confirmed.
40
  - Add demo video URL after recording.
41
+ - Do not claim GGUF smoke test, hosted MiniCPM-V validation, or live LoRA runtime wiring are complete.
docs/SUBMISSION_GUIDE.md CHANGED
@@ -19,6 +19,7 @@
19
  - Dataset plan and preview workflow: `docs/DATASET.md`
20
  - External setup checklist: `docs/EXTERNAL_SETUP.md`
21
  - Space VLM validation report: `docs/SPACE_VLM_REPORT.md` currently failed. Paid L4 returned `402 Payment Required`; later ZeroGPU validation reached the app on 2026-06-08, but mug/keyboard/shoe all fell back to mock vision.
 
22
  - Space VLM trace evidence: `data/traces/space-vlm/`
23
  - Public mock traces: `data/traces/samples/`
24
  - Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
@@ -31,6 +32,8 @@
31
  - MiniCPM-V 2.6 backend wiring with fallback markers.
32
  - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
33
  - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
 
 
34
  - Synthetic curated SFT dataset published to Hugging Face Datasets.
35
  - Modal Qwen 1.5B LoRA test run completed and adapter published to Hugging Face Models.
36
  - Field Notes draft, demo video script, and social post draft for the stable submission package.
@@ -38,7 +41,7 @@
38
  ## Not Completed Yet
39
 
40
  - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
41
- - Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
42
  - Real model traces, GGUF conversion, and app runtime wiring for the published adapter.
43
  - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
44
 
@@ -46,6 +49,7 @@
46
 
47
  - [ ] Space is under the official organization.
48
  - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: wired but hosted validation falls back to mock.
 
49
  - [x] Demo video script targets under 2 minutes.
50
  - [x] README includes stable-baseline parameter budget and links to the model card.
51
  - [ ] No commercial cloud AI APIs are used.
 
19
  - Dataset plan and preview workflow: `docs/DATASET.md`
20
  - External setup checklist: `docs/EXTERNAL_SETUP.md`
21
  - Space VLM validation report: `docs/SPACE_VLM_REPORT.md` currently failed. Paid L4 returned `402 Payment Required`; later ZeroGPU validation reached the app on 2026-06-08, but mug/keyboard/shoe all fell back to mock vision.
22
+ - Space VLM diagnostics: hidden `/vision_runtime_probe` API and probe-aware `scripts/check_space_vlm.py` are available for the next explicit-confirmation ZeroGPU validation.
23
  - Space VLM trace evidence: `data/traces/space-vlm/`
24
  - Public mock traces: `data/traces/samples/`
25
  - Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
 
32
  - MiniCPM-V 2.6 backend wiring with fallback markers.
33
  - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
34
  - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
35
+ - Hosted Space VLM probe support and latest failure-note update support.
36
+ - Local GGUF smoke-test helper for `Qwen/Qwen2.5-1.5B-Instruct-GGUF` / `qwen2.5-1.5b-instruct-q4_k_m.gguf`; actual GGUF smoke remains pending.
37
  - Synthetic curated SFT dataset published to Hugging Face Datasets.
38
  - Modal Qwen 1.5B LoRA test run completed and adapter published to Hugging Face Models.
39
  - Field Notes draft, demo video script, and social post draft for the stable submission package.
 
41
  ## Not Completed Yet
42
 
43
  - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
44
+ - Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count. The recommended baseline GGUF has been selected, but not downloaded or run.
45
  - Real model traces, GGUF conversion, and app runtime wiring for the published adapter.
46
  - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
47
 
 
49
 
50
  - [ ] Space is under the official organization.
51
  - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: wired but hosted validation falls back to mock.
52
+ - [x] Space MiniCPM-V non-secret diagnostic probe is implemented locally.
53
  - [x] Demo video script targets under 2 minutes.
54
  - [x] README includes stable-baseline parameter budget and links to the model card.
55
  - [ ] No commercial cloud AI APIs are used.
scripts/README.md CHANGED
@@ -10,6 +10,7 @@ Implemented initial scripts:
10
  - `prepare_curated_dataset.py`: creates 50 synthetic curated SFT rows for Modal LoRA pipeline testing.
11
  - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
12
  - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
 
13
  - `finetune_lora.py`: validates SFT JSONL locally and defines the Modal LoRA training scaffold for the future Well-Tuned path.
14
  - `publish_hf_adapter.py`: uploads a downloaded LoRA adapter folder to Hugging Face Hub.
15
 
@@ -76,7 +77,9 @@ Space VLM validation:
76
  ```bash
77
  .venv/bin/python -B scripts/check_space_vlm.py \
78
  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
79
- --output docs/SPACE_VLM_REPORT.md
 
 
80
  ```
81
 
82
  External Space changes are explicit:
@@ -85,4 +88,13 @@ External Space changes are explicit:
85
  .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
86
  ```
87
 
88
- Current status: mock trace generation, trace JSONL export, SFT preview generation, synthetic curated dataset publishing, optional MiniCPM-V wiring, optional llama.cpp wiring, hosted Space VLM validation tooling, Modal LoRA training scaffolding, one Modal LoRA test run, and HF adapter publishing are implemented. Real model validation on Space, GGUF conversion, and app runtime wiring for the adapter are not completed yet.
 
 
 
 
 
 
 
 
 
 
10
  - `prepare_curated_dataset.py`: creates 50 synthetic curated SFT rows for Modal LoRA pipeline testing.
11
  - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
12
  - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
13
+ - `check_llama_cpp_smoke.py`: smoke-tests the optional llama.cpp text runtime with an external GGUF model.
14
  - `finetune_lora.py`: validates SFT JSONL locally and defines the Modal LoRA training scaffold for the future Well-Tuned path.
15
  - `publish_hf_adapter.py`: uploads a downloaded LoRA adapter folder to Hugging Face Hub.
16
 
 
77
  ```bash
78
  .venv/bin/python -B scripts/check_space_vlm.py \
79
  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
80
+ --output docs/SPACE_VLM_REPORT.md \
81
+ --json-output docs/SPACE_VLM_REPORT.json \
82
+ --failure-notes-output docs/FAILURES.md
83
  ```
84
 
85
  External Space changes are explicit:
 
88
  .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
89
  ```
90
 
91
+ Local GGUF smoke test after explicit confirmation:
92
+
93
+ ```bash
94
+ .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
95
+ --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
96
+ ```
97
+
98
+ Recommended GGUF source: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`, file `qwen2.5-1.5b-instruct-q4_k_m.gguf`. Do not commit the downloaded file.
99
+
100
+ Current status: mock trace generation, trace JSONL export, SFT preview generation, synthetic curated dataset publishing, optional MiniCPM-V wiring, optional llama.cpp wiring, hosted Space VLM validation tooling with non-secret probe support, local GGUF smoke helper, Modal LoRA training scaffolding, one Modal LoRA test run, and HF adapter publishing are implemented. Real model validation on Space, actual GGUF smoke, GGUF conversion, and app runtime wiring for the adapter are not completed yet.
scripts/check_llama_cpp_smoke.py ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Smoke-test the optional llama.cpp text runtime with an external GGUF model."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import argparse
6
+ import json
7
+ import os
8
+ import sys
9
+ from pathlib import Path
10
+ from typing import Any
11
+
12
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
13
+ if str(PROJECT_ROOT) not in sys.path:
14
+ sys.path.insert(0, str(PROJECT_ROOT))
15
+
16
+ from src.models.llama_cpp_runner import get_text_runtime_fallbacks, reply_as_object
17
+ from src.pipeline import generate_object_diary
18
+
19
+
20
+ DEFAULT_GGUF_REPO = "Qwen/Qwen2.5-1.5B-Instruct-GGUF"
21
+ DEFAULT_GGUF_FILE = "qwen2.5-1.5b-instruct-q4_k_m.gguf"
22
+ TEXT_FALLBACK_MARKER = "text-fallback-to-mock"
23
+
24
+
25
+ def run_llama_cpp_smoke(
26
+ *,
27
+ model_path: Path,
28
+ description: str,
29
+ mode: str,
30
+ save_trace: bool,
31
+ ) -> dict[str, Any]:
32
+ if not model_path.exists():
33
+ raise FileNotFoundError(f"GGUF model path does not exist: {model_path}")
34
+
35
+ previous_text_backend = os.environ.get("OBJECTVERSE_TEXT_BACKEND")
36
+ previous_text_model_path = os.environ.get("TEXT_MODEL_PATH")
37
+ try:
38
+ os.environ["OBJECTVERSE_TEXT_BACKEND"] = "llama-cpp"
39
+ os.environ["TEXT_MODEL_PATH"] = str(model_path)
40
+ result = generate_object_diary(
41
+ None,
42
+ description,
43
+ mode,
44
+ save=save_trace,
45
+ )
46
+ chat_reply = reply_as_object(
47
+ result.persona.model_dump(mode="json"),
48
+ "What did you see today?",
49
+ )
50
+ chat_fallbacks = get_text_runtime_fallbacks()
51
+ finally:
52
+ _restore_env("OBJECTVERSE_TEXT_BACKEND", previous_text_backend)
53
+ _restore_env("TEXT_MODEL_PATH", previous_text_model_path)
54
+
55
+ payload = {
56
+ "status": "pass",
57
+ "model_path": _display_model_path(model_path),
58
+ "description": description,
59
+ "mode": mode,
60
+ "trace_id": result.trace.trace_id,
61
+ "trace_path": result.trace_path,
62
+ "model_runtime": result.trace.model_runtime,
63
+ "fallbacks": result.trace.fallbacks,
64
+ "object_name": result.object_understanding.object.name,
65
+ "character_name": result.persona.persona.character_name,
66
+ "diary_title": result.diary.title,
67
+ "chat_reply_preview": chat_reply[:160],
68
+ "chat_fallbacks": chat_fallbacks,
69
+ }
70
+ if result.trace.model_runtime.get("text") != "llama-cpp text generation":
71
+ payload["status"] = "fail"
72
+ payload["error"] = "trace did not record llama-cpp text generation"
73
+ if TEXT_FALLBACK_MARKER in result.trace.fallbacks:
74
+ payload["status"] = "fail"
75
+ payload["error"] = "trace included text-fallback-to-mock"
76
+ if TEXT_FALLBACK_MARKER in chat_fallbacks:
77
+ payload["status"] = "fail"
78
+ payload["error"] = "chat included text-fallback-to-mock"
79
+ return payload
80
+
81
+
82
+ def _restore_env(key: str, previous_value: str | None) -> None:
83
+ if previous_value is None:
84
+ os.environ.pop(key, None)
85
+ else:
86
+ os.environ[key] = previous_value
87
+
88
+
89
+ def _print_json(payload: dict[str, Any]) -> None:
90
+ print(json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True), flush=True)
91
+
92
+
93
+ def _display_model_path(model_path: Path) -> str:
94
+ try:
95
+ return str(model_path.resolve().relative_to(PROJECT_ROOT))
96
+ except ValueError:
97
+ return model_path.name
98
+
99
+
100
+ def _parse_args() -> argparse.Namespace:
101
+ parser = argparse.ArgumentParser(description=__doc__)
102
+ parser.add_argument(
103
+ "--model-path",
104
+ type=Path,
105
+ default=Path("models") / DEFAULT_GGUF_FILE,
106
+ help=f"Path to {DEFAULT_GGUF_FILE} or another external GGUF file.",
107
+ )
108
+ parser.add_argument(
109
+ "--description",
110
+ default="old white coffee mug on a developer desk",
111
+ )
112
+ parser.add_argument("--mode", default="Cynical")
113
+ parser.add_argument("--save-trace", action="store_true")
114
+ return parser.parse_args()
115
+
116
+
117
+ def main() -> None:
118
+ args = _parse_args()
119
+ try:
120
+ payload = run_llama_cpp_smoke(
121
+ model_path=args.model_path,
122
+ description=args.description,
123
+ mode=args.mode,
124
+ save_trace=args.save_trace,
125
+ )
126
+ except Exception as exc:
127
+ _print_json(
128
+ {
129
+ "status": "fail",
130
+ "model_path": _display_model_path(args.model_path),
131
+ "recommended_repo": DEFAULT_GGUF_REPO,
132
+ "recommended_file": DEFAULT_GGUF_FILE,
133
+ "error_type": type(exc).__name__,
134
+ "error": str(exc),
135
+ }
136
+ )
137
+ raise SystemExit(1) from exc
138
+
139
+ _print_json(payload)
140
+ if payload["status"] != "pass":
141
+ raise SystemExit(1)
142
+
143
+
144
+ if __name__ == "__main__":
145
+ main()
scripts/check_space_vlm.py CHANGED
@@ -25,11 +25,14 @@ DEFAULT_SPACE_URL = "https://huggingface.co/spaces/build-small-hackathon/Objectv
25
  DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
26
  DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
27
  DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
 
28
  DEFAULT_HARDWARE = "l4x1"
29
  MOCK_SAFE_HARDWARE = "cpu-basic"
30
  GENERATE_API_NAME = "/generate_object_file"
 
31
  REQUEST_TIMEOUT_SECONDS = 45
32
  PREDICTION_TIMEOUT_SECONDS = 360
 
33
 
34
  SPACE_VARIABLES = {
35
  "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
@@ -231,6 +234,23 @@ def run_space_validation(
231
  return results
232
 
233
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
234
  def _predict_with_timeout(
235
  client: Any,
236
  image: Any,
@@ -238,6 +258,22 @@ def _predict_with_timeout(
238
  mode: str,
239
  *,
240
  timeout_seconds: int,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
241
  ) -> Any:
242
  def _raise_timeout(_signum: int, _frame: Any) -> None:
243
  raise TimeoutError(f"Gradio prediction did not finish within {timeout_seconds}s")
@@ -245,12 +281,7 @@ def _predict_with_timeout(
245
  previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
246
  signal.alarm(max(1, timeout_seconds))
247
  try:
248
- return client.predict(
249
- image,
250
- description,
251
- mode,
252
- api_name=GENERATE_API_NAME,
253
- )
254
  finally:
255
  signal.alarm(0)
256
  signal.signal(signal.SIGALRM, previous_handler)
@@ -323,6 +354,7 @@ def render_report(
323
  space_url: str,
324
  repo_id: str,
325
  results: list[ValidationResult],
 
326
  configured: dict[str, str] | None = None,
327
  rollback: dict[str, str] | None = None,
328
  configuration_error: str = "",
@@ -357,6 +389,12 @@ def render_report(
357
  if configuration_error:
358
  lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
359
 
 
 
 
 
 
 
360
  lines.extend(["", "## Results", ""])
361
  for result in results:
362
  lines.extend(
@@ -396,21 +434,55 @@ def write_report(markdown: str, output_path: Path = DEFAULT_OUTPUT_PATH) -> Path
396
  return output_path
397
 
398
 
399
- def write_json_results(results: list[ValidationResult], output_path: Path) -> Path:
 
 
 
 
 
400
  output_path.parent.mkdir(parents=True, exist_ok=True)
401
- payload = [result.__dict__ for result in results]
402
- output_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
 
 
 
 
 
403
  return output_path
404
 
405
 
406
  def write_trace_record(trace: TraceRecord, output_path: Path) -> Path:
407
  output_path.parent.mkdir(parents=True, exist_ok=True)
408
  serialized = json.dumps(trace.model_dump(mode="json"), ensure_ascii=False, indent=2, sort_keys=True)
409
- _assert_trace_is_public_safe(serialized)
410
  output_path.write_text(serialized + "\n", encoding="utf-8")
411
  return output_path
412
 
413
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
414
  def _download_url(url: str, output_path: Path) -> None:
415
  request = urllib.request.Request(
416
  url,
@@ -434,14 +506,22 @@ def _extract_trace_payload(response: Any) -> dict[str, Any]:
434
  return trace_payload
435
 
436
 
 
 
 
 
 
 
 
 
437
  def extract_trace_record(response: Any) -> TraceRecord:
438
  return TraceRecord.model_validate(_extract_trace_payload(response))
439
 
440
 
441
- def _assert_trace_is_public_safe(serialized_trace: str) -> None:
442
  for marker in SENSITIVE_TRACE_MARKERS:
443
- if marker in serialized_trace:
444
- raise ValueError("Trace output may contain a sensitive token marker.")
445
 
446
 
447
  def _failure_reason(
@@ -471,6 +551,110 @@ def _runtime_stage_name(runtime: Any) -> str:
471
  return str(stage or "unknown")
472
 
473
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
474
  def _assert_hf_auth(api: Any) -> None:
475
  try:
476
  user = api.whoami()
@@ -499,6 +683,7 @@ def _parse_args() -> argparse.Namespace:
499
  parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
500
  parser.add_argument("--skip-validation", action="store_true")
501
  parser.add_argument("--trace-output-dir", type=Path)
 
502
  return parser.parse_args()
503
 
504
 
@@ -507,6 +692,7 @@ def main() -> None:
507
  repo_id = parse_space_repo_id(args.space_url)
508
  configured = None
509
  rollback = None
 
510
  configuration_error = ""
511
  if args.configure_space:
512
  try:
@@ -529,6 +715,13 @@ def main() -> None:
529
 
530
  results: list[ValidationResult] = []
531
  if not args.skip_validation and not configuration_error:
 
 
 
 
 
 
 
532
  try:
533
  results = run_space_validation(
534
  space_url=args.space_url,
@@ -554,13 +747,20 @@ def main() -> None:
554
  space_url=args.space_url,
555
  repo_id=repo_id,
556
  results=results,
 
557
  configured=configured,
558
  rollback=rollback,
559
  configuration_error=configuration_error,
560
  )
561
  write_report(report, args.output)
562
  if args.json_output:
563
- write_json_results(results, args.json_output)
 
 
 
 
 
 
564
 
565
  if configuration_error or (results and not all(result.passed for result in results)):
566
  raise SystemExit(1)
 
25
  DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
26
  DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
27
  DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
28
+ DEFAULT_FAILURE_NOTES_PATH = Path("docs/FAILURES.md")
29
  DEFAULT_HARDWARE = "l4x1"
30
  MOCK_SAFE_HARDWARE = "cpu-basic"
31
  GENERATE_API_NAME = "/generate_object_file"
32
+ PROBE_API_NAME = "/vision_runtime_probe"
33
  REQUEST_TIMEOUT_SECONDS = 45
34
  PREDICTION_TIMEOUT_SECONDS = 360
35
+ LATEST_FAILURE_HEADING = "## Latest Space VLM Validation Failure"
36
 
37
  SPACE_VARIABLES = {
38
  "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
 
234
  return results
235
 
236
 
237
+ def run_vision_runtime_probe(
238
+ *,
239
+ space_url: str = DEFAULT_SPACE_URL,
240
+ timeout_seconds: int = 900,
241
+ ) -> dict[str, Any]:
242
+ client_url = space_client_url(space_url)
243
+ client = _build_gradio_client(client_url, timeout_seconds=timeout_seconds)
244
+ response = _predict_api_with_timeout(
245
+ client,
246
+ api_name=PROBE_API_NAME,
247
+ timeout_seconds=min(PREDICTION_TIMEOUT_SECONDS, timeout_seconds),
248
+ )
249
+ payload = _extract_probe_payload(response)
250
+ _assert_public_safe_serialized(json.dumps(payload, ensure_ascii=False, sort_keys=True), "Probe output")
251
+ return payload
252
+
253
+
254
  def _predict_with_timeout(
255
  client: Any,
256
  image: Any,
 
258
  mode: str,
259
  *,
260
  timeout_seconds: int,
261
+ ) -> Any:
262
+ return _predict_api_with_timeout(
263
+ client,
264
+ image,
265
+ description,
266
+ mode,
267
+ api_name=GENERATE_API_NAME,
268
+ timeout_seconds=timeout_seconds,
269
+ )
270
+
271
+
272
+ def _predict_api_with_timeout(
273
+ client: Any,
274
+ *inputs: Any,
275
+ api_name: str,
276
+ timeout_seconds: int,
277
  ) -> Any:
278
  def _raise_timeout(_signum: int, _frame: Any) -> None:
279
  raise TimeoutError(f"Gradio prediction did not finish within {timeout_seconds}s")
 
281
  previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
282
  signal.alarm(max(1, timeout_seconds))
283
  try:
284
+ return client.predict(*inputs, api_name=api_name)
 
 
 
 
 
285
  finally:
286
  signal.alarm(0)
287
  signal.signal(signal.SIGALRM, previous_handler)
 
354
  space_url: str,
355
  repo_id: str,
356
  results: list[ValidationResult],
357
+ probe_result: dict[str, Any] | None = None,
358
  configured: dict[str, str] | None = None,
359
  rollback: dict[str, str] | None = None,
360
  configuration_error: str = "",
 
389
  if configuration_error:
390
  lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
391
 
392
+ lines.extend(["", "## Vision Runtime Probe", ""])
393
+ if probe_result:
394
+ lines.extend(_probe_lines(probe_result))
395
+ else:
396
+ lines.append("- Probe was not run.")
397
+
398
  lines.extend(["", "## Results", ""])
399
  for result in results:
400
  lines.extend(
 
434
  return output_path
435
 
436
 
437
+ def write_json_results(
438
+ results: list[ValidationResult],
439
+ output_path: Path,
440
+ *,
441
+ probe_result: dict[str, Any] | None = None,
442
+ ) -> Path:
443
  output_path.parent.mkdir(parents=True, exist_ok=True)
444
+ result_payload = [result.__dict__ for result in results]
445
+ payload: Any = result_payload
446
+ if probe_result is not None:
447
+ payload = {"probe": probe_result, "results": result_payload}
448
+ serialized = json.dumps(payload, ensure_ascii=False, indent=2)
449
+ _assert_public_safe_serialized(serialized, "JSON report")
450
+ output_path.write_text(serialized, encoding="utf-8")
451
  return output_path
452
 
453
 
454
  def write_trace_record(trace: TraceRecord, output_path: Path) -> Path:
455
  output_path.parent.mkdir(parents=True, exist_ok=True)
456
  serialized = json.dumps(trace.model_dump(mode="json"), ensure_ascii=False, indent=2, sort_keys=True)
457
+ _assert_public_safe_serialized(serialized, "Trace output")
458
  output_path.write_text(serialized + "\n", encoding="utf-8")
459
  return output_path
460
 
461
 
462
+ def update_failure_notes(
463
+ *,
464
+ results: list[ValidationResult],
465
+ probe_result: dict[str, Any] | None,
466
+ output_path: Path = DEFAULT_FAILURE_NOTES_PATH,
467
+ configuration_error: str = "",
468
+ ) -> Path | None:
469
+ failed_results = [result for result in results if not result.passed]
470
+ if not configuration_error and not failed_results:
471
+ return None
472
+
473
+ output_path.parent.mkdir(parents=True, exist_ok=True)
474
+ existing = output_path.read_text(encoding="utf-8") if output_path.exists() else "# Failure Notes\n"
475
+ section = _latest_failure_section(
476
+ results=failed_results,
477
+ probe_result=probe_result,
478
+ configuration_error=configuration_error,
479
+ )
480
+ updated = _replace_or_append_section(existing, LATEST_FAILURE_HEADING, section)
481
+ _assert_public_safe_serialized(updated, "Failure notes")
482
+ output_path.write_text(updated, encoding="utf-8")
483
+ return output_path
484
+
485
+
486
  def _download_url(url: str, output_path: Path) -> None:
487
  request = urllib.request.Request(
488
  url,
 
506
  return trace_payload
507
 
508
 
509
+ def _extract_probe_payload(response: Any) -> dict[str, Any]:
510
+ if isinstance(response, dict):
511
+ return response
512
+ if isinstance(response, tuple | list) and len(response) == 1 and isinstance(response[0], dict):
513
+ return response[0]
514
+ raise ValueError("Probe output was not a JSON object.")
515
+
516
+
517
  def extract_trace_record(response: Any) -> TraceRecord:
518
  return TraceRecord.model_validate(_extract_trace_payload(response))
519
 
520
 
521
+ def _assert_public_safe_serialized(serialized_payload: str, label: str) -> None:
522
  for marker in SENSITIVE_TRACE_MARKERS:
523
+ if marker in serialized_payload:
524
+ raise ValueError(f"{label} may contain a sensitive token marker.")
525
 
526
 
527
  def _failure_reason(
 
551
  return str(stage or "unknown")
552
 
553
 
554
+ def _safe_error_payload(exc: Exception, *, stage: str) -> dict[str, str]:
555
+ return {
556
+ "backend": "unknown",
557
+ "probe_ok": "false",
558
+ "stage": stage,
559
+ "error_type": type(exc).__name__,
560
+ "error_summary": _sanitize_error_summary(str(exc) or type(exc).__name__),
561
+ }
562
+
563
+
564
+ def _sanitize_error_summary(value: str, *, max_length: int = 240) -> str:
565
+ clean = value.replace(str(Path.home()), "[home]")
566
+ clean = clean.replace("HUGGINGFACE_TOKEN", "[redacted]")
567
+ clean = clean.replace("HF_TOKEN", "[redacted]")
568
+ clean = clean.replace("hf_", "[redacted]")
569
+ if len(clean) > max_length:
570
+ return clean[: max_length - 3] + "..."
571
+ return clean
572
+
573
+
574
+ def _probe_lines(probe_result: dict[str, Any]) -> list[str]:
575
+ summary_keys = (
576
+ "backend",
577
+ "vision_model_id",
578
+ "torch_import",
579
+ "transformers_import",
580
+ "cuda_available",
581
+ "device_count",
582
+ "device_name",
583
+ "mps_available",
584
+ "minicpm_load_attempted",
585
+ "minicpm_load_ok",
586
+ )
587
+ lines: list[str] = []
588
+ for key in summary_keys:
589
+ if key in probe_result:
590
+ lines.append(f"- `{key}`: `{probe_result[key]}`")
591
+ errors = probe_result.get("errors")
592
+ if isinstance(errors, list) and errors:
593
+ lines.append("- Errors:")
594
+ for error in errors:
595
+ if isinstance(error, dict):
596
+ stage = error.get("stage", "unknown")
597
+ error_type = error.get("type", "unknown")
598
+ summary = error.get("summary", "")
599
+ lines.append(f" - `{stage}`: `{error_type}` - {summary}")
600
+ elif "error_type" in probe_result:
601
+ lines.append(f"- Error: `{probe_result['error_type']}` - {probe_result.get('error_summary', '')}")
602
+ else:
603
+ lines.append("- Errors: none")
604
+ return lines
605
+
606
+
607
+ def _latest_failure_section(
608
+ *,
609
+ results: list[ValidationResult],
610
+ probe_result: dict[str, Any] | None,
611
+ configuration_error: str,
612
+ ) -> str:
613
+ now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
614
+ lines = [
615
+ LATEST_FAILURE_HEADING,
616
+ "",
617
+ f"- Updated: {now}",
618
+ "- Area: Hugging Face Space vision runtime.",
619
+ ]
620
+ if configuration_error:
621
+ lines.append(f"- Configuration error: `{_sanitize_error_summary(configuration_error)}`")
622
+ if probe_result:
623
+ lines.append(f"- Probe backend: `{probe_result.get('backend', 'unknown')}`")
624
+ lines.append(f"- MiniCPM load attempted: `{probe_result.get('minicpm_load_attempted', 'unknown')}`")
625
+ lines.append(f"- MiniCPM load ok: `{probe_result.get('minicpm_load_ok', 'unknown')}`")
626
+ errors = probe_result.get("errors")
627
+ if isinstance(errors, list) and errors:
628
+ probe_errors = []
629
+ for error in errors:
630
+ if isinstance(error, dict):
631
+ probe_errors.append(f"{error.get('stage', 'unknown')}={error.get('type', 'unknown')}")
632
+ if probe_errors:
633
+ lines.append(f"- Probe errors: {', '.join(probe_errors)}")
634
+ if results:
635
+ failures = [f"{result.key}: {result.error or 'failed'}" for result in results]
636
+ lines.append(f"- Failed checks: {'; '.join(failures)}")
637
+ lines.extend(
638
+ [
639
+ "- Fallback used: mock object understanding plus mock text runtime if validation reaches generation.",
640
+ "- Resolution: unresolved; keep the public Space mock-safe until this section reports a passing VLM validation.",
641
+ "",
642
+ ]
643
+ )
644
+ return "\n".join(lines)
645
+
646
+
647
+ def _replace_or_append_section(markdown: str, heading: str, section: str) -> str:
648
+ start = markdown.find(heading)
649
+ if start == -1:
650
+ return markdown.rstrip() + "\n\n" + section
651
+
652
+ next_start = markdown.find("\n## ", start + len(heading))
653
+ if next_start == -1:
654
+ return markdown[:start].rstrip() + "\n\n" + section
655
+ return markdown[:start].rstrip() + "\n\n" + section.rstrip() + "\n" + markdown[next_start:]
656
+
657
+
658
  def _assert_hf_auth(api: Any) -> None:
659
  try:
660
  user = api.whoami()
 
683
  parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
684
  parser.add_argument("--skip-validation", action="store_true")
685
  parser.add_argument("--trace-output-dir", type=Path)
686
+ parser.add_argument("--failure-notes-output", type=Path, default=DEFAULT_FAILURE_NOTES_PATH)
687
  return parser.parse_args()
688
 
689
 
 
692
  repo_id = parse_space_repo_id(args.space_url)
693
  configured = None
694
  rollback = None
695
+ probe_result = None
696
  configuration_error = ""
697
  if args.configure_space:
698
  try:
 
715
 
716
  results: list[ValidationResult] = []
717
  if not args.skip_validation and not configuration_error:
718
+ try:
719
+ probe_result = run_vision_runtime_probe(
720
+ space_url=args.space_url,
721
+ timeout_seconds=args.timeout_seconds,
722
+ )
723
+ except Exception as exc:
724
+ probe_result = _safe_error_payload(exc, stage="vision_runtime_probe")
725
  try:
726
  results = run_space_validation(
727
  space_url=args.space_url,
 
747
  space_url=args.space_url,
748
  repo_id=repo_id,
749
  results=results,
750
+ probe_result=probe_result,
751
  configured=configured,
752
  rollback=rollback,
753
  configuration_error=configuration_error,
754
  )
755
  write_report(report, args.output)
756
  if args.json_output:
757
+ write_json_results(results, args.json_output, probe_result=probe_result)
758
+ update_failure_notes(
759
+ results=results,
760
+ probe_result=probe_result,
761
+ output_path=args.failure_notes_output,
762
+ configuration_error=configuration_error,
763
+ )
764
 
765
  if configuration_error or (results and not all(result.passed for result in results)):
766
  raise SystemExit(1)
src/config.py CHANGED
@@ -61,11 +61,15 @@ def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
61
  if text_backend == "mock":
62
  runtime_parts.append("no llama.cpp model connected yet")
63
  else:
64
- runtime_parts.append(f"text model path: {current.text_model_path or '[not configured]'}")
65
  runtime = "; ".join(runtime_parts)
66
  return {"vision": vision, "text": text, "runtime": runtime}
67
 
68
 
 
 
 
 
69
  SETTINGS = get_runtime_settings()
70
  TRACE_DIR = SETTINGS.trace_output_dir
71
  MODEL_RUNTIME_STATUS = runtime_status(SETTINGS)
 
61
  if text_backend == "mock":
62
  runtime_parts.append("no llama.cpp model connected yet")
63
  else:
64
+ runtime_parts.append(f"text model path: {_text_model_path_status(current.text_model_path)}")
65
  runtime = "; ".join(runtime_parts)
66
  return {"vision": vision, "text": text, "runtime": runtime}
67
 
68
 
69
+ def _text_model_path_status(text_model_path: str) -> str:
70
+ return "[configured external GGUF]" if text_model_path.strip() else "[not configured]"
71
+
72
+
73
  SETTINGS = get_runtime_settings()
74
  TRACE_DIR = SETTINGS.trace_output_dir
75
  MODEL_RUNTIME_STATUS = runtime_status(SETTINGS)
src/models/llama_cpp_runner.py CHANGED
@@ -80,6 +80,7 @@ def reply_as_object(persona_data: dict, message: str) -> str:
80
  return _reply_as_object_llama_cpp(persona_data, message, settings)
81
  except Exception as exc:
82
  _log_text_fallback("chat", exc)
 
83
 
84
  return _reply_as_object_mock(persona_data, message)
85
 
 
80
  return _reply_as_object_llama_cpp(persona_data, message, settings)
81
  except Exception as exc:
82
  _log_text_fallback("chat", exc)
83
+ _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
84
 
85
  return _reply_as_object_mock(persona_data, message)
86
 
src/models/vision_runner.py CHANGED
@@ -2,6 +2,7 @@
2
 
3
  from __future__ import annotations
4
 
 
5
  from dataclasses import dataclass
6
  from pathlib import Path
7
  from typing import Any
@@ -25,6 +26,7 @@ KNOWN_OBJECTS = {
25
 
26
  MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
27
  MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
 
28
 
29
  _MINICPM_MODEL: Any | None = None
30
  _MINICPM_TOKENIZER: Any | None = None
@@ -42,6 +44,64 @@ def understand_object(image_path: str | None, description: str) -> ObjectUnderst
42
  return understand_object_with_metadata(image_path, description).object_understanding
43
 
44
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45
  def understand_object_with_metadata(
46
  image_path: str | None,
47
  description: str,
@@ -166,6 +226,36 @@ def _log_vision_fallback(backend: str, exc: Exception) -> None:
166
  )
167
 
168
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
169
  def _infer_object_name(description: str, image_path: str | None) -> str:
170
  lowered = description.lower()
171
  for keyword, name in KNOWN_OBJECTS.items():
 
2
 
3
  from __future__ import annotations
4
 
5
+ import re
6
  from dataclasses import dataclass
7
  from pathlib import Path
8
  from typing import Any
 
26
 
27
  MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
28
  MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
29
+ SENSITIVE_PROBE_MARKERS = ("HF_TOKEN", "HUGGINGFACE_TOKEN", "hf_", ".env")
30
 
31
  _MINICPM_MODEL: Any | None = None
32
  _MINICPM_TOKENIZER: Any | None = None
 
44
  return understand_object_with_metadata(image_path, description).object_understanding
45
 
46
 
47
+ def probe_vision_runtime(
48
+ *,
49
+ settings: RuntimeSettings | None = None,
50
+ load_model: bool = True,
51
+ ) -> dict[str, Any]:
52
+ """Return non-secret runtime diagnostics for hosted MiniCPM-V debugging."""
53
+ current = settings or get_runtime_settings()
54
+ backend = current.vision_backend.strip().lower()
55
+ model_id = current.vision_model_id or MINICPM_DEFAULT_MODEL_ID
56
+ probe: dict[str, Any] = {
57
+ "backend": backend,
58
+ "vision_model_id": model_id if backend in MINICPM_BACKENDS else current.vision_model_id,
59
+ "torch_import": False,
60
+ "transformers_import": False,
61
+ "cuda_available": False,
62
+ "device_count": 0,
63
+ "device_name": "",
64
+ "mps_available": False,
65
+ "minicpm_load_attempted": False,
66
+ "minicpm_load_ok": False,
67
+ "errors": [],
68
+ }
69
+
70
+ torch_module: Any | None = None
71
+ try:
72
+ import torch
73
+
74
+ torch_module = torch
75
+ probe["torch_import"] = True
76
+ probe["cuda_available"] = torch.cuda.is_available()
77
+ probe["device_count"] = torch.cuda.device_count()
78
+ if probe["cuda_available"] and probe["device_count"]:
79
+ probe["device_name"] = torch.cuda.get_device_name(0)
80
+ probe["mps_available"] = bool(
81
+ getattr(torch.backends, "mps", None) and torch.backends.mps.is_available()
82
+ )
83
+ except Exception as exc:
84
+ _add_probe_error(probe, "torch", exc)
85
+
86
+ try:
87
+ from transformers import AutoModel as _AutoModel # noqa: F401
88
+ from transformers import AutoTokenizer as _AutoTokenizer # noqa: F401
89
+
90
+ probe["transformers_import"] = True
91
+ except Exception as exc:
92
+ _add_probe_error(probe, "transformers", exc)
93
+
94
+ if backend in MINICPM_BACKENDS and load_model:
95
+ probe["minicpm_load_attempted"] = True
96
+ try:
97
+ _load_minicpm_components(model_id)
98
+ probe["minicpm_load_ok"] = True
99
+ except Exception as exc:
100
+ _add_probe_error(probe, "minicpm_load", exc)
101
+
102
+ return _sanitize_probe_payload(probe)
103
+
104
+
105
  def understand_object_with_metadata(
106
  image_path: str | None,
107
  description: str,
 
226
  )
227
 
228
 
229
+ def _add_probe_error(probe: dict[str, Any], stage: str, exc: Exception) -> None:
230
+ probe["errors"].append(
231
+ {
232
+ "stage": stage,
233
+ "type": type(exc).__name__,
234
+ "summary": _sanitize_probe_text(str(exc) or type(exc).__name__),
235
+ }
236
+ )
237
+
238
+
239
+ def _sanitize_probe_payload(value: Any) -> Any:
240
+ if isinstance(value, dict):
241
+ return {str(key): _sanitize_probe_payload(item) for key, item in value.items()}
242
+ if isinstance(value, list):
243
+ return [_sanitize_probe_payload(item) for item in value]
244
+ if isinstance(value, str):
245
+ return _sanitize_probe_text(value)
246
+ return value
247
+
248
+
249
+ def _sanitize_probe_text(value: str, *, max_length: int = 240) -> str:
250
+ clean = value.replace(str(Path.home()), "[home]")
251
+ clean = re.sub(r"hf_[A-Za-z0-9_-]+", "[redacted-token]", clean)
252
+ for marker in SENSITIVE_PROBE_MARKERS:
253
+ clean = clean.replace(marker, "[redacted]")
254
+ if len(clean) > max_length:
255
+ return clean[: max_length - 3] + "..."
256
+ return clean
257
+
258
+
259
  def _infer_object_name(description: str, image_path: str | None) -> str:
260
  lowered = description.lower()
261
  for keyword, name in KNOWN_OBJECTS.items():
src/ui/layout.py CHANGED
@@ -13,6 +13,7 @@ from src.example_cache import load_sample_generation
13
  from src.examples import EXAMPLE_OBJECTS, example_button_label
14
  from src.models.llama_cpp_runner import reply_as_object
15
  from src.models.schema import GenerationResult
 
16
  from src.pipeline import format_diary_markdown, generate_object_diary
17
  from src.renderer.share_card import render_share_card
18
  from src.ui import copy
@@ -145,6 +146,8 @@ def build_app() -> gr.Blocks:
145
  result_state = gr.State()
146
  zero_gpu_probe_button = gr.Button(visible=False)
147
  zero_gpu_probe_output = gr.JSON(visible=False)
 
 
148
 
149
  # Intake & Examples Row
150
  with gr.Row(elem_id="intake", elem_classes=["content-section"]):
@@ -324,6 +327,12 @@ def build_app() -> gr.Blocks:
324
  outputs=[zero_gpu_probe_output],
325
  api_name="zero_gpu_probe",
326
  )
 
 
 
 
 
 
327
 
328
  return demo
329
 
@@ -514,3 +523,8 @@ def zero_gpu_probe() -> dict[str, Any]:
514
  "device_count": torch.cuda.device_count(),
515
  "device_name": torch.cuda.get_device_name(0) if cuda_available else "",
516
  }
 
 
 
 
 
 
13
  from src.examples import EXAMPLE_OBJECTS, example_button_label
14
  from src.models.llama_cpp_runner import reply_as_object
15
  from src.models.schema import GenerationResult
16
+ from src.models.vision_runner import probe_vision_runtime
17
  from src.pipeline import format_diary_markdown, generate_object_diary
18
  from src.renderer.share_card import render_share_card
19
  from src.ui import copy
 
146
  result_state = gr.State()
147
  zero_gpu_probe_button = gr.Button(visible=False)
148
  zero_gpu_probe_output = gr.JSON(visible=False)
149
+ vision_runtime_probe_button = gr.Button(visible=False)
150
+ vision_runtime_probe_output = gr.JSON(visible=False)
151
 
152
  # Intake & Examples Row
153
  with gr.Row(elem_id="intake", elem_classes=["content-section"]):
 
327
  outputs=[zero_gpu_probe_output],
328
  api_name="zero_gpu_probe",
329
  )
330
+ vision_runtime_probe_button.click(
331
+ fn=vision_runtime_probe,
332
+ inputs=[],
333
+ outputs=[vision_runtime_probe_output],
334
+ api_name="vision_runtime_probe",
335
+ )
336
 
337
  return demo
338
 
 
523
  "device_count": torch.cuda.device_count(),
524
  "device_name": torch.cuda.get_device_name(0) if cuda_available else "",
525
  }
526
+
527
+
528
+ @zero_gpu(duration=180)
529
+ def vision_runtime_probe() -> dict[str, Any]:
530
+ return probe_vision_runtime(load_model=True)
tests/test_llama_cpp_smoke.py ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for the optional llama.cpp smoke-test helper."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import tempfile
6
+ import unittest
7
+ from pathlib import Path
8
+ from unittest.mock import patch
9
+
10
+ from scripts.check_llama_cpp_smoke import run_llama_cpp_smoke
11
+
12
+
13
+ class FakeLlamaModel:
14
+ def __init__(self, responses: list[str]) -> None:
15
+ self.responses = responses
16
+
17
+ def create_chat_completion(self, **_: object) -> dict:
18
+ response = self.responses.pop(0)
19
+ return {"choices": [{"message": {"content": response}}]}
20
+
21
+
22
+ class LlamaCppSmokeTest(unittest.TestCase):
23
+ def test_smoke_passes_when_pipeline_uses_llama_cpp_without_fallback(self) -> None:
24
+ fake_llama = FakeLlamaModel(
25
+ [
26
+ """
27
+ {"persona":{"object_name":"coffee mug","character_name":"Mugworth","mood":"dry and suspicious","secret_fear":"being left empty forever","core_memory":"It remembers every late-night refill.","complaint":"I am treated like a ceramic fuel tank.","tags":["desk witness","warm archive","quiet judgment"]}}
28
+ """,
29
+ """
30
+ {"title":"Secret Diary - Day 418","english":"Today I held another bitter storm and called it service.","chinese":"今天我又装下一场苦涩风暴,并被称为有用。"}
31
+ """,
32
+ """
33
+ {"reply":"Mugworth: I saw another deadline dissolve into a coffee ring."}
34
+ """,
35
+ ]
36
+ )
37
+
38
+ with tempfile.TemporaryDirectory() as tmp_dir:
39
+ model_path = Path(tmp_dir) / "model.gguf"
40
+ model_path.write_text("fake", encoding="utf-8")
41
+ with patch("src.models.llama_cpp_runner._load_llama_model", return_value=fake_llama):
42
+ result = run_llama_cpp_smoke(
43
+ model_path=model_path,
44
+ description="old white coffee mug",
45
+ mode="Cynical",
46
+ save_trace=False,
47
+ )
48
+
49
+ self.assertEqual(result["status"], "pass")
50
+ self.assertEqual(result["model_runtime"]["text"], "llama-cpp text generation")
51
+ self.assertNotIn("text-fallback-to-mock", result["fallbacks"])
52
+ self.assertNotIn("text-fallback-to-mock", result["chat_fallbacks"])
53
+
54
+ def test_smoke_fails_when_model_path_is_missing(self) -> None:
55
+ with self.assertRaises(FileNotFoundError):
56
+ run_llama_cpp_smoke(
57
+ model_path=Path("/tmp/objectverse-missing-model.gguf"),
58
+ description="old white coffee mug",
59
+ mode="Cynical",
60
+ save_trace=False,
61
+ )
62
+
63
+
64
+ if __name__ == "__main__":
65
+ unittest.main()
tests/test_mock_mvp.py CHANGED
@@ -17,8 +17,10 @@ from src.models.llama_cpp_runner import (
17
  reset_text_runtime_fallbacks,
18
  )
19
  from src.models.vision_runner import understand_object, understand_object_with_metadata
 
20
  from src.pipeline import generate_object_diary
21
  from src.renderer.share_card import render_share_card
 
22
  from src.traces.anonymizer import anonymize_text
23
  from src.traces.logger import build_trace, save_trace
24
  from scripts.generate_sample_traces import generate_sample_traces
@@ -56,6 +58,20 @@ class MockMvpTest(unittest.TestCase):
56
  self.assertEqual(status["vision"], "mock object understanding")
57
  self.assertEqual(status["runtime"], "no llama.cpp model connected yet")
58
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
59
  def test_examples_cover_six_objects(self) -> None:
60
  self.assertEqual(len(EXAMPLE_OBJECTS), 6)
61
  self.assertEqual(len(gradio_examples()), 6)
@@ -201,6 +217,37 @@ class MockMvpTest(unittest.TestCase):
201
  self.assertEqual(result.object_understanding.object.name, "keyboard")
202
  self.assertEqual(result.fallbacks, ["vision-fallback-to-mock"])
203
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
204
  def test_pipeline_saves_generation_result(self) -> None:
205
  with tempfile.TemporaryDirectory() as tmp_dir:
206
  result = generate_object_diary(
 
17
  reset_text_runtime_fallbacks,
18
  )
19
  from src.models.vision_runner import understand_object, understand_object_with_metadata
20
+ from src.models.vision_runner import probe_vision_runtime
21
  from src.pipeline import generate_object_diary
22
  from src.renderer.share_card import render_share_card
23
+ from src.ui.layout import vision_runtime_probe
24
  from src.traces.anonymizer import anonymize_text
25
  from src.traces.logger import build_trace, save_trace
26
  from scripts.generate_sample_traces import generate_sample_traces
 
58
  self.assertEqual(status["vision"], "mock object understanding")
59
  self.assertEqual(status["runtime"], "no llama.cpp model connected yet")
60
 
61
+ def test_llama_cpp_runtime_status_does_not_expose_model_path(self) -> None:
62
+ status = runtime_status(
63
+ get_runtime_settings(
64
+ {
65
+ "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
66
+ "TEXT_MODEL_PATH": "/Users/leo/private/model.gguf",
67
+ }
68
+ )
69
+ )
70
+
71
+ self.assertEqual(status["text"], "llama-cpp text generation")
72
+ self.assertIn("[configured external GGUF]", status["runtime"])
73
+ self.assertNotIn("/Users/leo", status["runtime"])
74
+
75
  def test_examples_cover_six_objects(self) -> None:
76
  self.assertEqual(len(EXAMPLE_OBJECTS), 6)
77
  self.assertEqual(len(gradio_examples()), 6)
 
217
  self.assertEqual(result.object_understanding.object.name, "keyboard")
218
  self.assertEqual(result.fallbacks, ["vision-fallback-to-mock"])
219
 
220
+ def test_vision_runtime_probe_redacts_sensitive_error_markers(self) -> None:
221
+ settings = get_runtime_settings(
222
+ {
223
+ "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
224
+ "VISION_MODEL_ID": "openbmb/MiniCPM-V-2_6",
225
+ }
226
+ )
227
+
228
+ with patch(
229
+ "src.models.vision_runner._load_minicpm_components",
230
+ side_effect=RuntimeError("failed with token hf_forbidden in /Users/leo/.env"),
231
+ ):
232
+ probe = probe_vision_runtime(settings=settings, load_model=True)
233
+
234
+ serialized = json.dumps(probe, ensure_ascii=False)
235
+ self.assertTrue(probe["minicpm_load_attempted"])
236
+ self.assertFalse(probe["minicpm_load_ok"])
237
+ self.assertNotIn("hf_", serialized)
238
+ self.assertNotIn("HF_TOKEN", serialized)
239
+ self.assertNotIn("/Users/leo", serialized)
240
+ self.assertNotIn(".env", serialized)
241
+
242
+ def test_hidden_vision_runtime_probe_returns_safe_json(self) -> None:
243
+ probe = vision_runtime_probe()
244
+ serialized = json.dumps(probe, ensure_ascii=False)
245
+
246
+ self.assertIn("backend", probe)
247
+ self.assertIn("torch_import", probe)
248
+ self.assertNotIn("hf_", serialized)
249
+ self.assertNotIn("HF_TOKEN", serialized)
250
+
251
  def test_pipeline_saves_generation_result(self) -> None:
252
  with tempfile.TemporaryDirectory() as tmp_dir:
253
  result = generate_object_diary(
tests/test_space_vlm_tooling.py CHANGED
@@ -14,7 +14,9 @@ from scripts.check_space_vlm import (
14
  parse_space_repo_id,
15
  render_report,
16
  space_client_url,
 
17
  validate_prediction,
 
18
  write_trace_record,
19
  )
20
  from src.models.schema import DiaryEntry, ObjectInfo, ObjectUnderstanding, Persona, PersonaEnvelope, TraceRecord
@@ -146,11 +148,14 @@ class SpaceVlmToolingTest(unittest.TestCase):
146
  space_url="https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary",
147
  repo_id="build-small-hackathon/ObjectverseDiary",
148
  results=[result],
 
149
  configured={"hardware": "l4x1", "OBJECTVERSE_VISION_BACKEND": "minicpm-v"},
150
  rollback={"hardware": "cpu-basic", "OBJECTVERSE_VISION_BACKEND": "mock"},
151
  )
152
 
153
  self.assertIn("Overall status: PASS", report)
 
 
154
  self.assertIn("Running shoe", report)
155
  self.assertIn("OBJECTVERSE_VISION_BACKEND", report)
156
  self.assertNotIn("hf_", report.lower())
@@ -169,6 +174,65 @@ class SpaceVlmToolingTest(unittest.TestCase):
169
  self.assertIn("Configuration Error", report)
170
  self.assertIn("402 Payment Required", report)
171
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
172
 
173
  def _trace_record(
174
  *,
@@ -216,5 +280,21 @@ def _trace_record(
216
  )
217
 
218
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
219
  if __name__ == "__main__":
220
  unittest.main()
 
14
  parse_space_repo_id,
15
  render_report,
16
  space_client_url,
17
+ update_failure_notes,
18
  validate_prediction,
19
+ write_json_results,
20
  write_trace_record,
21
  )
22
  from src.models.schema import DiaryEntry, ObjectInfo, ObjectUnderstanding, Persona, PersonaEnvelope, TraceRecord
 
148
  space_url="https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary",
149
  repo_id="build-small-hackathon/ObjectverseDiary",
150
  results=[result],
151
+ probe_result=_probe_result(minicpm_load_ok=True),
152
  configured={"hardware": "l4x1", "OBJECTVERSE_VISION_BACKEND": "minicpm-v"},
153
  rollback={"hardware": "cpu-basic", "OBJECTVERSE_VISION_BACKEND": "mock"},
154
  )
155
 
156
  self.assertIn("Overall status: PASS", report)
157
+ self.assertIn("Vision Runtime Probe", report)
158
+ self.assertIn("minicpm_load_ok", report)
159
  self.assertIn("Running shoe", report)
160
  self.assertIn("OBJECTVERSE_VISION_BACKEND", report)
161
  self.assertNotIn("hf_", report.lower())
 
174
  self.assertIn("Configuration Error", report)
175
  self.assertIn("402 Payment Required", report)
176
 
177
+ def test_write_json_results_includes_probe_when_present(self) -> None:
178
+ result = ValidationResult(
179
+ key="mug",
180
+ label="Coffee mug",
181
+ source_page="https://commons.wikimedia.org/wiki/File:Striped_coffee_mug.jpg",
182
+ image_path="/tmp/mug.jpg",
183
+ passed=False,
184
+ object_name="coffee mug",
185
+ visible_features=["uploaded photo provided"],
186
+ likely_context="everyday human environment",
187
+ confidence=0.42,
188
+ runtime_vision="minicpm-v object understanding",
189
+ runtime_text="mock persona and diary generation",
190
+ fallbacks=["vision-fallback-to-mock", "mock-text-runtime"],
191
+ error="vision fallback marker was present",
192
+ )
193
+
194
+ with tempfile.TemporaryDirectory() as tmp_dir:
195
+ output_path = write_json_results(
196
+ [result],
197
+ Path(tmp_dir) / "report.json",
198
+ probe_result=_probe_result(minicpm_load_ok=False),
199
+ )
200
+ payload = output_path.read_text(encoding="utf-8")
201
+ parsed = output_path.read_text(encoding="utf-8")
202
+
203
+ self.assertIn('"probe"', payload)
204
+ self.assertIn('"results"', payload)
205
+ self.assertNotIn("hf_", parsed)
206
+ self.assertNotIn("HF_TOKEN", parsed)
207
+
208
+ def test_update_failure_notes_replaces_latest_failure_section(self) -> None:
209
+ failed = ValidationResult(
210
+ key="keyboard",
211
+ label="Computer keyboard",
212
+ source_page="https://commons.wikimedia.org/wiki/File:Computer_keyboard.jpg",
213
+ image_path="/tmp/keyboard.jpg",
214
+ passed=False,
215
+ object_name="keyboard",
216
+ visible_features=["uploaded photo provided"],
217
+ likely_context="everyday human environment",
218
+ confidence=0.42,
219
+ runtime_vision="minicpm-v object understanding",
220
+ runtime_text="mock persona and diary generation",
221
+ fallbacks=["vision-fallback-to-mock", "mock-text-runtime"],
222
+ error="vision fallback marker was present",
223
+ )
224
+
225
+ with tempfile.TemporaryDirectory() as tmp_dir:
226
+ notes_path = Path(tmp_dir) / "FAILURES.md"
227
+ notes_path.write_text("# Failure Notes\n\n## Current Status\n\nStable.\n", encoding="utf-8")
228
+ update_failure_notes(results=[failed], probe_result=_probe_result(False), output_path=notes_path)
229
+ update_failure_notes(results=[failed], probe_result=_probe_result(False), output_path=notes_path)
230
+ content = notes_path.read_text(encoding="utf-8")
231
+
232
+ self.assertEqual(content.count("## Latest Space VLM Validation Failure"), 1)
233
+ self.assertIn("keyboard: vision fallback marker was present", content)
234
+ self.assertNotIn("hf_", content)
235
+
236
 
237
  def _trace_record(
238
  *,
 
280
  )
281
 
282
 
283
+ def _probe_result(minicpm_load_ok: bool) -> dict[str, object]:
284
+ return {
285
+ "backend": "minicpm-v",
286
+ "vision_model_id": "openbmb/MiniCPM-V-2_6",
287
+ "torch_import": True,
288
+ "transformers_import": True,
289
+ "cuda_available": True,
290
+ "device_count": 1,
291
+ "device_name": "NVIDIA test device",
292
+ "mps_available": False,
293
+ "minicpm_load_attempted": True,
294
+ "minicpm_load_ok": minicpm_load_ok,
295
+ "errors": [] if minicpm_load_ok else [{"stage": "minicpm_load", "type": "RuntimeError", "summary": "test failure"}],
296
+ }
297
+
298
+
299
  if __name__ == "__main__":
300
  unittest.main()