qqyule committed
Commit e20e3d9 · verified · 1 Parent(s): 3805824

Add ZeroGPU-compatible validation path

README.md CHANGED
@@ -5,7 +5,7 @@ colorFrom: yellow
5
  colorTo: gray
6
  sdk: gradio
7
  sdk_version: 5.50.0
8
- python_version: '3.12'
9
  app_file: app.py
10
  pinned: false
11
  license: mit
@@ -23,9 +23,13 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
23
 
24
  ## Current Status
25
 
26
- Initial mock MVP is available.
27
 
28
- The app currently uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. Real MiniCPM-V and llama.cpp model runtimes are not connected yet.
 
 
 
 
29
 
30
  ## Track
31
 
@@ -71,6 +75,19 @@ python app.py
71
 
72
  Then open the local Gradio URL printed in the terminal.
73

74
  ## Initial MVP Flow
75
 
76
  The current implementation supports:
@@ -120,7 +137,7 @@ This creates deterministic mock SFT preview data for schema and curation plannin
120
  ```
121
 
122
  See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
123
- See `docs/EXTERNAL_SETUP.md` before creating remote GitHub or Hugging Face resources.
124
 
125
  ## Project Structure
126
 
@@ -128,7 +145,7 @@ See `docs/02-tech-architecture.md`, `AGENTS.md`, and `.codex/skills/` for the in
128
 
129
  ## Runtime Notes
130
 
131
- The current runtime is mock-only. See `docs/RUNTIME.md` for configuration keys and the future MiniCPM-V / llama.cpp boundary.
132
 
133
  ## HF Space README YAML Header
134
 
@@ -139,6 +156,7 @@ emoji: 🗝️
139
  colorFrom: amber
140
  colorTo: gray
141
  sdk: gradio
 
142
  app_file: app.py
143
  pinned: false
144
  ---
 
5
  colorTo: gray
6
  sdk: gradio
7
  sdk_version: 5.50.0
8
+ python_version: '3.10'
9
  app_file: app.py
10
  pinned: false
11
  license: mit
 
23
 
24
  ## Current Status
25
 
26
+ Initial mock MVP, MiniCPM-V vision backend wiring, and optional llama.cpp text runtime wiring are available.
27
 
28
+ By default, the app still uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the real MiniCPM-V 2.6 vision path. `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured.
29
+
30
+ Hugging Face Space:
31
+
32
+ https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
33
 
34
  ## Track
35
 
 
75
 
76
  Then open the local Gradio URL printed in the terminal.
77
 
78
+ ## Optional llama.cpp Text Runtime
79
+
80
+ The project does not commit GGUF files or require `llama-cpp-python` by default. To try a local GGUF text model:
81
+
82
+ ```bash
83
+ pip install llama-cpp-python
84
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp \
85
+ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
86
+ python app.py
87
+ ```
88
+
89
+ If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
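
The fallback conditions listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `src/models/llama_cpp_runner.py`: the function names `generate_persona` and `mock_persona`, the persona fields, and the injectable `run_model` hook are assumptions made for the example. Only the environment variable names and the `text-fallback-to-mock` marker come from the README.

```python
import importlib.util
import json
import os

FALLBACK_MARKER = "text-fallback-to-mock"  # trace marker named in the README


def mock_persona(object_name):
    """Deterministic stand-in for the mock text backend (hypothetical fields)."""
    return {"name": f"The {object_name.title()}", "mood": "quietly observant"}


def generate_persona(object_name, env=None, run_model=None):
    """Return (persona, trace_markers); fall back to mock on any failure.

    `run_model` is a hypothetical hook (prompt -> raw text) so the fallback
    logic can be exercised without a GGUF file.
    """
    env = os.environ if env is None else env
    markers = []
    if env.get("OBJECTVERSE_TEXT_BACKEND", "mock") != "llama-cpp":
        return mock_persona(object_name), markers
    try:
        if run_model is None:
            # Condition 1: llama-cpp-python is missing.
            if importlib.util.find_spec("llama_cpp") is None:
                raise RuntimeError("llama-cpp-python is not installed")
            # Condition 2: TEXT_MODEL_PATH is empty.
            model_path = env.get("TEXT_MODEL_PATH", "")
            if not model_path:
                raise RuntimeError("TEXT_MODEL_PATH is empty")
            from llama_cpp import Llama

            # Condition 3: the model cannot load (Llama raises on a bad path).
            llm = Llama(model_path=model_path, verbose=False)

            def run_model(prompt):
                out = llm.create_chat_completion(
                    messages=[{"role": "user", "content": prompt}]
                )
                return out["choices"][0]["message"]["content"]

        raw = run_model(f"Return a JSON persona for: {object_name}")
        # Condition 4: the model returns invalid JSON.
        persona = json.loads(raw)
        if not isinstance(persona, dict):
            raise ValueError("persona must be a JSON object")
        return persona, markers
    except Exception:
        markers.append(FALLBACK_MARKER)
        return mock_persona(object_name), markers
```

Catching every exception in one place keeps the demo path safe: whichever of the four conditions triggers, the caller gets a usable persona plus a marker that can be written to the trace.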
90
+
91
  ## Initial MVP Flow
92
 
93
  The current implementation supports:
 
137
  ```
138
 
139
  See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
140
+ See `docs/EXTERNAL_SETUP.md` before changing remote GitHub or Hugging Face resources.
141
 
142
  ## Project Structure
143
 
 
145
 
146
  ## Runtime Notes
147
 
148
+ The default runtime is mock-only. MiniCPM-V 2.6 vision and optional llama.cpp text generation can be enabled with environment variables while preserving mock fallbacks. See `docs/RUNTIME.md`.
149
 
150
  ## HF Space README YAML Header
151
 
 
156
  colorFrom: amber
157
  colorTo: gray
158
  sdk: gradio
159
+ python_version: '3.10'
160
  app_file: app.py
161
  pinned: false
162
  ---
docs/03-dev-schedule.md CHANGED
@@ -11,8 +11,9 @@
11
 
12
  **Goal: lock down the project's non-negotiable scope.**
13

14
- - [ ] Create GitHub repo
15
- - [ ] Create Hugging Face Space

16
  - [x] Create basic Gradio app
17
  - [x] Write README draft
18
  - [x] Finalize English main UI copy
@@ -46,14 +47,18 @@
46
 
47
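  **Goal: make the AI actually look at the image.**
48

49
- - [ ] Integrate MiniCPM-V or a lightweight VLM
50
- - [ ] Output object understanding JSON
51
- - [ ] Implement JSON repair
52
- - [ ] Add example gallery

53
  - [ ] Cache example outputs

54

55
  Acceptance: upload a mug/keyboard/shoe; the model identifies the object and extracts its visual features.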
56
 
 
 
57
  ---
58
 
59
  ## Day 4: Text Model + llama.cpp
@@ -61,12 +66,14 @@
61
  **Goal: run core persona generation through local small-model inference.**
62

63
  - [ ] Download a small GGUF model
64
- - [ ] Get llama.cpp / llama-cpp-python running
65
- - [ ] Wrap `generate_persona()`
66
- - [ ] Wrap `generate_diary()`
67
- - [ ] Document parameter count and run instructions in README


68

69
- Deliverables: `models/text_model.gguf`, `src/models/llama_cpp_runner.py`, `scripts/run_llama_cpp.sh`
70
 
71
  ---
72
 
@@ -144,7 +151,7 @@ Bottom: Share Card + Trace
144
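  - [x] Write English primary copy + Chinese secondary copy
145
  - [x] Build 6 example cards
146

147
- Completion record: the Phase 2 UI is complete as a mock-runtime archive dashboard. Real VLM, llama.cpp, LoRA, and Hugging Face Space are not yet integrated; `UI 参考/` serves only as a local visual reference and is not committed.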
148
 
149
  ---
150
 
@@ -158,7 +165,9 @@ Bottom: Share Card + Trace
158
  - [x] dataset preview
159
  - [x] trace JSONL export
160
  - [x] Failure case log
161
- - [ ] GitHub repo cleanup
 
 
162
 
163
  ---
164
 
@@ -205,6 +214,7 @@ Bottom: Share Card + Trace
205
  ## Day 11: Submission Check
206
 
207
  - [ ] Space under official org
 
208
  - [ ] Demo video ready
209
  - [ ] Social post ready
210
  - [ ] README complete
 
11
 
12
  **Goal: lock down the project's non-negotiable scope.**
13

14
+ - [x] Configure GitHub origin
15
+ - [ ] Confirm and sync GitHub repo
16
+ - [x] Create Hugging Face Space
17
  - [x] Create basic Gradio app
18
  - [x] Write README draft
19
  - [x] Finalize English main UI copy
 
47
 
48
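  **Goal: make the AI actually look at the image.**
49

50
+ - [x] Integrate MiniCPM-V or a lightweight VLM
51
+ - [x] Output object understanding JSON
52
+ - [x] Implement JSON repair
53
+ - [x] Add example gallery
54
+ - [x] Add Space VLM validation script
55
  - [ ] Cache example outputs
56
+ - [ ] Space 1x L4 real-image validation (attempted 2026-06-06; blocked by HF `402 Payment Required`; rolled back to mock-safe)
57

58
  Acceptance: upload a mug/keyboard/shoe; the model identifies the object and extracts its visual features.
59

60
+ Completion record: MiniCPM-V 2.6 is integrated as a configurable vision backend, with mock vision still the default; `scripts/check_space_vlm.py` can validate mug/keyboard/shoe on the Space using three temporary public images. On 2026-06-06 a switch to L4 was attempted, but Hugging Face returned `402 Payment Required`, so the organization needs billing/pre-paid credits; a mock-safe rollback was then performed. Text generation has optional llama.cpp runtime wiring, but the final GGUF model has not yet been selected or downloaded.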
61
+
62
  ---
63
 
64
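  ## Day 4: Text Model + llama.cpp

66
  **Goal: run core persona generation through local small-model inference.**
67

68
  - [ ] Download a small GGUF model
69
+ - [x] Add optional llama.cpp / llama-cpp-python runtime wiring
70
+ - [x] Wrap `generate_persona()`
71
+ - [x] Wrap `generate_diary()`
72
+ - [x] Document run instructions in README
73
+ - [ ] Run a local smoke test with a real GGUF
74
+ - [ ] Document final model parameter count in README
75

76
+ Deliverables: `src/models/llama_cpp_runner.py` now supports `TEXT_MODEL_PATH`; `models/text_model.gguf` is not committed. The real GGUF, parameter count, and training/publishing path still need to be decided.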
77
 
78
  ---
79
 
 
151
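  - [x] Write English primary copy + Chinese secondary copy
152
  - [x] Build 6 example cards
153

154
+ Completion record: the Phase 2 UI is complete as an archive dashboard. The MiniCPM-V 2.6 vision backend and optional llama.cpp runtime wiring are integrated, but the default is still mock; LoRA is not yet integrated; `UI 参考/` serves only as a local visual reference and is not committed.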
155
 
156
  ---
157
 
 
165
  - [x] dataset preview
166
  - [x] trace JSONL export
167
  - [x] Failure case log
168
+ - [x] Space VLM validation report template
169
+ - [ ] Real-model traces
170
+ - [ ] GitHub repo sync and cleanup
171
 
172
  ---
173
 
 
214
  ## Day 11: Submission Check
215
 
216
  - [ ] Space under official org
217
+ - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe
218
  - [ ] Demo video ready
219
  - [ ] Social post ready
220
  - [ ] README complete
docs/07-development-plan.md CHANGED
@@ -8,7 +8,7 @@ The plan is intentionally staged. Each phase has a clear goal, implementation sc
8
 
9
  ## Current Baseline
10
 
11
- As of 2026-06-05, the project has:
12
 
13
  - initialized project structure
14
  - root README and AGENTS instructions
@@ -30,13 +30,17 @@ As of 2026-06-05, the project has:
30
  - stdlib unittest smoke tests for the mock MVP
31
  - runtime configuration boundary documented in `docs/RUNTIME.md`
32
  - initial-stage acceptance script at `scripts/check_initial_stage.py`
 
 
 
 
 
33
 
34
  Not yet done:
35
 
36
- - GitHub repo creation
37
- - Hugging Face Space creation
38
- - real MiniCPM-V or fallback VLM integration
39
- - real llama.cpp / llama-cpp-python text runtime
40
  - real curated dataset
41
  - LoRA fine-tuning
42
  - model card completion
@@ -111,6 +115,8 @@ Verification:
111
 
112
  Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
113
 
 
 
114
  Scope:
115
 
116
  - Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
@@ -130,15 +136,18 @@ Verification:
130
  - Run local sample image checks.
131
  - Confirm schema validation.
132
  - Confirm fallback trace markers.
 
133
 
134
  ## Phase 4 — Text Runtime With llama.cpp
135
 
136
  Goal: make persona, diary, and chat generation use a small local text model runtime.
137
 
 
 
138
  Scope:
139
 
140
- - Add llama.cpp / llama-cpp-python runner.
141
- - Add model path configuration.
142
  - Preserve `src/pipeline.py` as the UI-independent generation boundary.
143
  - Implement persona generation.
144
  - Implement diary generation.
@@ -148,12 +157,12 @@ Scope:
148
  Exit criteria:
149
 
150
  - Text generation can run through llama.cpp or documented local fallback.
151
- - README documents model size and runtime path.
152
  - Trace records include runtime metadata.
153
 
154
  Verification:
155
 
156
- - Local runtime smoke test.
157
  - JSON schema validation.
158
  - Compare at least three object generations for persona consistency.
159
 
@@ -161,6 +170,8 @@ Verification:
161
 
162
  Goal: prepare Well-Tuned badge evidence.
163
 
 
 
164
  Scope:
165
 
166
  - Use `scripts/generate_dataset.py` to validate the SFT schema locally.
@@ -237,13 +248,15 @@ Verification:
237
 
238
  Goal: deploy the app in the required Gradio format.
239
 
 
 
240
  Scope:
241
 
242
- - Create Hugging Face Space.
243
- - Add Space README YAML header.
244
- - Confirm `app_file: app.py`.
245
- - Configure model paths and fallback mode.
246
- - Check runtime resource constraints.
247
 
248
  Exit criteria:
249
 
@@ -253,8 +266,9 @@ Exit criteria:
253
 
254
  Verification:
255
 
256
- - Launch on HF Space.
257
  - Run demo flow in hosted environment.
 
258
  - Check logs for missing secrets or path errors.
259
 
260
  ## Phase 9 — Field Notes And Demo Video
 
8
 
9
  ## Current Baseline
10
 
11
+ As of 2026-06-06, the project has:
12
 
13
  - initialized project structure
14
  - root README and AGENTS instructions
 
30
  - stdlib unittest smoke tests for the mock MVP
31
  - runtime configuration boundary documented in `docs/RUNTIME.md`
32
  - initial-stage acceptance script at `scripts/check_initial_stage.py`
33
+ - Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
34
+ - optional MiniCPM-V 2.6 vision backend wiring with mock fallback
35
+ - optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
36
+ - hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
37
+ - pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`
38
 
39
  Not yet done:
40
 
41
+ - GitHub repo sync / public submission confirmation
42
+ - hosted Space L4 MiniCPM-V validation with real public images
43
+ - real GGUF selection and local `TEXT_MODEL_PATH` smoke test
 
44
  - real curated dataset
45
  - LoRA fine-tuning
46
  - model card completion
 
115
 
116
  Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
117
 
118
+ Status: local wiring complete; hosted GPU validation pending.
119
+
120
  Scope:
121
 
122
  - Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
 
136
  - Run local sample image checks.
137
  - Confirm schema validation.
138
  - Confirm fallback trace markers.
139
+ - Run `scripts/check_space_vlm.py --configure-space` after external-state confirmation.
140
 
141
  ## Phase 4 — Text Runtime With llama.cpp
142
 
143
  Goal: make persona, diary, and chat generation use a small local text model runtime.
144
 
145
+ Status: optional runtime wiring complete; real GGUF smoke test pending.
146
+
147
  Scope:
148
 
149
+ - Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
150
+ - Add model path configuration. Completed through `TEXT_MODEL_PATH`.
151
  - Preserve `src/pipeline.py` as the UI-independent generation boundary.
152
  - Implement persona generation.
153
  - Implement diary generation.
 
157
  Exit criteria:
158
 
159
  - Text generation can run through llama.cpp or documented local fallback.
160
+ - README documents runtime path. Final model size remains pending until GGUF selection.
161
  - Trace records include runtime metadata.
162
 
163
  Verification:
164
 
165
+ - Local runtime smoke test with a real GGUF.
166
  - JSON schema validation.
167
  - Compare at least three object generations for persona consistency.
168
 
 
170
 
171
  Goal: prepare Well-Tuned badge evidence.
172
 
173
+ Status: mock SFT preview complete; real candidate generation waits for verified model paths.
174
+
175
  Scope:
176
 
177
  - Use `scripts/generate_dataset.py` to validate the SFT schema locally.
 
248
 
249
  Goal: deploy the app in the required Gradio format.
250
 
251
+ Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
252
+
253
  Scope:
254
 
255
+ - Create Hugging Face Space. Completed.
256
+ - Add Space README YAML header. Completed.
257
+ - Confirm `app_file: app.py`. Completed.
258
+ - Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
259
+ - Check runtime resource constraints. Pending L4 validation.
260
 
261
  Exit criteria:
262
 
 
266
 
267
  Verification:
268
 
269
+ - Launch on HF Space. Completed for mock-safe runtime.
270
  - Run demo flow in hosted environment.
271
+ - Run Space VLM validation for mug, keyboard, and shoe.
272
  - Check logs for missing secrets or path errors.
273
 
274
  ## Phase 9 — Field Notes And Demo Video
docs/DEVELOPMENT_STATUS.md ADDED
@@ -0,0 +1,60 @@
1
+ # Development Status
2
+
3
+ Last updated: 2026-06-06
4
+
5
+ ## Completed
6
+
7
+ - Project skeleton, README, AGENTS instructions, and Gradio app entrypoint.
8
+ - Mock MVP flow: upload/description, personality mode, object JSON, persona JSON, diary, object chat, share card, and trace saving.
9
+ - Archive-style Gradio UI with English-first / Chinese-second copy and six stable examples.
10
+ - Trace and dataset tooling:
11
+ - six public mock sample traces
12
+ - public trace JSONL export
13
+ - deterministic SFT preview JSONL
14
+ - initial-stage acceptance script
15
+ - Hugging Face Space created: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
16
+ - MiniCPM-V 2.6 optional vision backend wiring with mock fallback.
17
+ - Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
18
+ - Space VLM validation tooling:
19
+ - `scripts/check_space_vlm.py`
20
+ - failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
21
+ - Local tests and initial acceptance currently pass.
22
+
23
+ ## Not Completed
24
+
25
+ - Hosted Space 1x L4 MiniCPM-V validation with real public mug/keyboard/shoe images. Attempted on 2026-06-06 and blocked by Hugging Face `402 Payment Required` for paid hardware; mock-safe rollback was applied.
26
+ - Stable example output caching for real VLM demos.
27
+ - Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
28
+ - Final text model parameter count documentation.
29
+ - Real model traces and curated object-persona dataset.
30
+ - LoRA training, adapter/model export, GGUF conversion, and Hugging Face model publishing.
31
+ - Hugging Face dataset publishing.
32
+ - GitHub sync / final public repository confirmation.
33
+ - Field Notes article, demo video, social post, and final submission package.
34
+
35
+ ## Current Safe Defaults
36
+
37
+ - `OBJECTVERSE_VISION_BACKEND=mock`
38
+ - `OBJECTVERSE_TEXT_BACKEND=mock`
39
+ - No commercial model API is used.
40
+ - GGUF files, tokens, credentials, and private images should not be committed.
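
One way to keep these defaults safe in code is to resolve the backends from the environment and treat anything unset or unrecognized as `mock`. The sketch below is illustrative only: `RuntimeConfig` and `load_runtime_config` are hypothetical names, and falling back to `mock` for unknown values is an assumption, not documented project behavior. The variable names and the `minicpm-v` / `llama-cpp` values are the ones used in this repository's docs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Backend values that appear in the project docs.
VISION_BACKENDS = {"mock", "minicpm-v"}
TEXT_BACKENDS = {"mock", "llama-cpp"}


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical container for the resolved runtime settings."""

    vision_backend: str = "mock"
    text_backend: str = "mock"
    text_model_path: str = ""


def load_runtime_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read backend settings, defaulting to mock when unset or unknown."""
    env = os.environ if env is None else env
    vision = env.get("OBJECTVERSE_VISION_BACKEND", "mock").strip().lower()
    text = env.get("OBJECTVERSE_TEXT_BACKEND", "mock").strip().lower()
    return RuntimeConfig(
        # Assumption: an unrecognized value is treated as mock, not as an error.
        vision_backend=vision if vision in VISION_BACKENDS else "mock",
        text_backend=text if text in TEXT_BACKENDS else "mock",
        text_model_path=env.get("TEXT_MODEL_PATH", "").strip(),
    )
```

Passing a plain dict instead of `os.environ` makes the resolution easy to unit test without touching the real environment.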
41
+
42
+ ## Next Recommended Gate
43
+
44
+ Unblock Hugging Face paid hardware access or choose another available GPU option, then rerun the hosted Space VLM validation:
45
+
46
+ ```bash
47
+ .venv/bin/python -B scripts/check_space_vlm.py \
48
+ --configure-space \
49
+ --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
50
+ --output docs/SPACE_VLM_REPORT.md
51
+ ```
52
+
53
+ If Space validation fails or GPU is unavailable, roll back to mock-safe settings:
54
+
55
+ ```bash
56
+ .venv/bin/python -B scripts/check_space_vlm.py \
57
+ --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
58
+ --skip-validation \
59
+ --rollback-to-mock
60
+ ```
docs/EXTERNAL_SETUP.md CHANGED
@@ -8,16 +8,18 @@ These actions change external account state and should only be run after explici
8
 
9
  ## GitHub Repository
10
 
11
- Suggested repository name:
12
 
13
  ```text
14
- objectverse-diary
15
  ```
16
 
17
- Suggested visibility:
 
 
18
 
19
  ```text
20
- public
21
  ```
22
 
23
  Suggested description:
@@ -26,7 +28,7 @@ Suggested description:
26
  Small-model AI toy that turns everyday objects into secret diary characters.
27
  ```
28
 
29
- Recommended manual command after confirmation:
30
 
31
  ```bash
32
  gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
@@ -36,13 +38,13 @@ Do not push until the user confirms the remote target and branch.
36
 
37
  ## Hugging Face Space
38
 
39
- Suggested Space name:
40
 
41
  ```text
42
- objectverse-diary
43
  ```
44
 
45
- Suggested SDK:
46
 
47
  ```text
48
  gradio
@@ -57,17 +59,46 @@ emoji: 🗝️
57
  colorFrom: amber
58
  colorTo: gray
59
  sdk: gradio
 
60
  app_file: app.py
61
  pinned: false
62
  ---
63
  ```
64
 
65
- Recommended setup before deployment:
66
 
67
- - confirm target Hugging Face account or organization
68
- - confirm public visibility
69
- - confirm whether the Space should start with mock runtime
70
- - confirm whether sample traces should be included in the first push
71
 
72
  ## Safety Notes
73
 
 
8
 
9
  ## GitHub Repository
10
 
11
+ Local `origin` is already configured:
12
 
13
  ```text
14
+ https://github.com/qqyule/Objectverse-Diary.git
15
  ```
16
 
17
+ Use this section to confirm the remote target and branch before pushing. Do not create a second repository unless the target changes.
18
+
19
+ Originally suggested repository name:
20
 
21
  ```text
22
+ objectverse-diary
23
  ```
24
 
25
  Suggested description:
 
28
  Small-model AI toy that turns everyday objects into secret diary characters.
29
  ```
30
 
31
+ If a new repository is ever needed after confirmation:
32
 
33
  ```bash
34
  gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
 
38
 
39
  ## Hugging Face Space
40
 
41
+ Created Space:
42
 
43
  ```text
44
+ https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
45
  ```
46
 
47
+ SDK:
48
 
49
  ```text
50
  gradio
 
59
  colorFrom: amber
60
  colorTo: gray
61
  sdk: gradio
62
+ python_version: '3.10'
63
  app_file: app.py
64
  pinned: false
65
  ---
66
  ```
67
 
68
+ Recommended runtime setup:
69
+
70
+ - set `OBJECTVERSE_VISION_BACKEND=minicpm-v`
71
+ - set `VISION_MODEL_ID=openbmb/MiniCPM-V-2_6`
72
+ - set `OBJECTVERSE_TEXT_BACKEND=mock`
73
+ - use 1x Nvidia L4 for MiniCPM-V 2.6
74
+ - switch vision backend back to `mock` if GPU is unavailable
75
+
76
+ Automated validation command after confirmation:
77
+
78
+ ```bash
79
+ .venv/bin/python -B scripts/check_space_vlm.py \
80
+ --configure-space \
81
+ --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
82
+ --output docs/SPACE_VLM_REPORT.md
83
+ ```
84
+
85
+ Optional rollback to mock-safe settings:
86
+
87
+ ```bash
88
+ .venv/bin/python -B scripts/check_space_vlm.py \
89
+ --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
90
+ --skip-validation \
91
+ --rollback-to-mock
92
+ ```
93
+
94
+ The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
95
+
96
+ 2026-06-06 validation attempt:
97
 
98
+ - `--configure-space` was run for `l4x1`.
99
+ - Hugging Face returned `402 Payment Required` for paid hardware on the `build-small-hackathon` organization.
100
+ - Mock-safe rollback was run afterward.
101
+ - Next unblock step: enable billing/pre-paid credits or choose an available free GPU option before rerunning validation.
102
 
103
  ## Safety Notes
104
 
docs/FAILURES.md CHANGED
@@ -8,7 +8,7 @@ Use it for model/runtime/deployment/data issues, not for UI polish notes.
8
 
9
  ## Current Status
10
 
11
- No real model or hosted Space failures have been observed yet because the current implementation uses deterministic mock runtimes.
12
 
13
  Known non-blocking warning:
14
 
@@ -43,6 +43,7 @@ Fallback:
43
  - use manual object description
44
  - use stable example flow
45
  - record fallback marker in trace
 
46
 
47
  ### Text Runtime
48
 
 
8
 
9
  ## Current Status
10
 
11
+ MiniCPM-V 2.6 is wired as an optional vision backend. No hosted Space GPU failures have been observed yet because Space GPU validation is still pending.
12
 
13
  Known non-blocking warning:
14
 
 
43
  - use manual object description
44
  - use stable example flow
45
  - record fallback marker in trace
46
+ - `vision-fallback-to-mock` means MiniCPM-V failed or returned invalid JSON and mock object understanding was used.
47
 
48
  ### Text Runtime
49
 
docs/INITIAL_STAGE_REPORT.md CHANGED
@@ -19,15 +19,29 @@ Included:
19
  - runtime configuration boundary
20
  - local acceptance checks
21
 
22
- Not included:
23
 
24
  - creating the remote GitHub repository
25
- - creating the Hugging Face Space
26
- - real MiniCPM-V integration
27
- - real llama.cpp / llama-cpp-python text runtime
28
  - fine-tuning, dataset publishing, Field Notes, and demo video
29
 
30
- Remote GitHub and Hugging Face actions require explicit confirmation because they change external state.
31
 
32
  ## Local Deliverables
33
 
@@ -35,8 +49,8 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
35
  | --- | --- |
36
  | Gradio app entrypoint | `app.py` |
37
  | Shared generation pipeline | `src/pipeline.py` |
38
- | Mock vision runner | `src/models/vision_runner.py` |
39
- | Mock text runner | `src/models/llama_cpp_runner.py` |
40
  | Pydantic schemas | `src/models/schema.py` |
41
  | Share card renderer | `src/renderer/share_card.py` |
42
  | Trace logger | `src/traces/logger.py` |
@@ -45,6 +59,7 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
45
  | Public mock traces | `data/traces/samples/` |
46
  | SFT preview generator | `scripts/generate_dataset.py` |
47
  | Public trace JSONL exporter | `scripts/export_traces.py` |
 
48
  | Dataset plan | `docs/DATASET.md` |
49
  | Failure notes | `docs/FAILURES.md` |
50
  | Runtime boundary docs | `docs/RUNTIME.md` |
@@ -85,16 +100,19 @@ OK
85
 
86
  ## Current Limitations
87
 
88
- - The app still uses mock model outputs.
89
- - Phase 2 UI polish is complete, but it still runs on the mock runtime.
 
 
90
  - Sample traces are mock traces, not real model traces.
91
- - Remote repo and hosted Space are not created yet.
92
 
93
  ## Next Gate
94
 
95
- Before moving to real model integration, confirm whether to create:
96
 
97
- - GitHub repository
98
- - Hugging Face Space
 
99
 
100
  See `docs/EXTERNAL_SETUP.md`.
 
19
  - runtime configuration boundary
20
  - local acceptance checks
21
 
22
+ Not included in the original initial-stage gate:
23
 
24
  - creating the remote GitHub repository
25
+ - hosted GPU validation for the MiniCPM-V integration
26
+ - real GGUF smoke test for llama.cpp / llama-cpp-python text runtime
 
27
  - fine-tuning, dataset publishing, Field Notes, and demo video
28
 
29
+ The Hugging Face Space has been created at:
30
+
31
+ https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
32
+
33
+ Remote GitHub actions still require explicit confirmation because they change external state.
34
+
35
+ ## Post-Initial Updates
36
+
37
+ As of 2026-06-06:
38
+
39
+ - MiniCPM-V 2.6 is wired as an optional vision backend with mock fallback.
40
+ - Optional llama.cpp / llama-cpp-python text runtime wiring is available through `TEXT_MODEL_PATH`, with mock fallback.
41
+ - `scripts/check_space_vlm.py` can validate the hosted Space with three temporary public images for mug, keyboard, and shoe.
42
+ - `docs/SPACE_VLM_REPORT.md` exists as the pending remote validation report.
43
+ - Hosted Space L4 validation has not been run yet.
44
+ - No final GGUF text model has been selected, downloaded, or committed.
45
 
46
  ## Local Deliverables
47
 
 
49
  | --- | --- |
50
  | Gradio app entrypoint | `app.py` |
51
  | Shared generation pipeline | `src/pipeline.py` |
52
+ | Vision runner with mock / MiniCPM-V backend | `src/models/vision_runner.py` |
53
+ | Text runner with mock / optional llama.cpp backend | `src/models/llama_cpp_runner.py` |
54
  | Pydantic schemas | `src/models/schema.py` |
55
  | Share card renderer | `src/renderer/share_card.py` |
56
  | Trace logger | `src/traces/logger.py` |
 
59
  | Public mock traces | `data/traces/samples/` |
60
  | SFT preview generator | `scripts/generate_dataset.py` |
61
  | Public trace JSONL exporter | `scripts/export_traces.py` |
62
+ | Hosted Space VLM validator | `scripts/check_space_vlm.py` |
63
  | Dataset plan | `docs/DATASET.md` |
64
  | Failure notes | `docs/FAILURES.md` |
65
  | Runtime boundary docs | `docs/RUNTIME.md` |
 
100
 
101
  ## Current Limitations
102
 
103
+ - The default app still uses mock model outputs.
104
+ - MiniCPM-V 2.6 vision wiring is available behind `OBJECTVERSE_VISION_BACKEND=minicpm-v`, but hosted GPU validation is still pending.
105
+ - llama.cpp text wiring is available behind `OBJECTVERSE_TEXT_BACKEND=llama-cpp`, but no real GGUF smoke test has been run.
106
+ - Phase 2 UI polish is complete.
107
  - Sample traces are mock traces, not real model traces.
108
+ - GitHub origin is configured locally, but sync/submission confirmation is still pending.
109
 
110
  ## Next Gate
111
 
112
+ Next model gate:
113
 
114
+ - verify MiniCPM-V 2.6 on the Hugging Face Space GPU
115
+ - run a real GGUF `TEXT_MODEL_PATH` smoke test
116
+ - confirm GitHub sync / submission target
117
 
118
  See `docs/EXTERNAL_SETUP.md`.
docs/MODEL_CARD.md CHANGED
@@ -2,9 +2,9 @@
2
 
3
  ## Status
4
 
5
- Draft only. No model has been fine-tuned, converted, or published yet.
6
 
7
- The app currently runs deterministic mock backends. This card is a working template for the later small-model runtime and LoRA adapter.
8
 
9
  ## Planned Components
10
 
@@ -16,9 +16,9 @@ The app currently runs deterministic mock backends. This card is a working templ
16
 
17
  | Component | Candidate | Notes |
18
  | --- | --- | --- |
19
- | Vision | MiniCPM-V or lightweight VLM fallback | Must run without commercial API calls. |
20
- | Text | small instruct model plus LoRA adapter | Final base model still pending. |
21
- | Runtime | GGUF through llama.cpp / llama-cpp-python | Needed for Llama Champion evidence. |
22
  | UI | Gradio Blocks | Required by the hackathon and project rules. |
23
 
24
  ## Parameter Budget
@@ -29,8 +29,8 @@ Record final numbers here before submission:
29
 
30
  | Component | Model | Parameters | Counted Toward Total |
31
  | --- | --- | ---: | --- |
32
- | Vision | TBD | TBD | yes |
33
- | Text base | TBD | TBD | yes |
34
  | LoRA adapter | TBD | TBD | yes |
35
  | Total | TBD | TBD | must be <= 32B |
36
 
@@ -67,7 +67,7 @@ Current preview data is deterministic and mock-generated. It should only be used
67
  ## Fallback Behavior
68
 
69
  - If VLM loading fails, use manual description and stable example flow.
70
- - If llama.cpp loading fails, keep deterministic mock text fallback for demo safety.
71
  - If model JSON is invalid, repair and validate before rendering.
72
 
73
  ## Required Notes
 
2
 
3
  ## Status
4
 
5
+ Draft only. No text model has been fine-tuned, converted, or published yet.
6
 
7
+ The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`.
8
 
9
  ## Planned Components
10
 
 
16
 
17
  | Component | Candidate | Notes |
18
  | --- | --- | --- |
19
+ | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Must run without commercial API calls. |
20
+ | Text | externally configured GGUF, later small instruct model plus LoRA adapter | Final base model still pending. |
21
+ | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
22
  | UI | Gradio Blocks | Required by the hackathon and project rules. |
23
 
24
  ## Parameter Budget
 
29
 
30
  | Component | Model | Parameters | Counted Toward Total |
31
  | --- | --- | ---: | --- |
32
+ | Vision | MiniCPM-V 2.6 | ~8B | yes |
33
+ | Text base | Externally configured GGUF, final model TBD | TBD | yes |
34
  | LoRA adapter | TBD | TBD | yes |
35
  | Total | TBD | TBD | must be <= 32B |
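
While the text base and LoRA rows are still TBD, the budget can be tracked as "known total plus pending components". The sketch below shows that arithmetic. The helper name `summarize_budget` is hypothetical, and the 8B figure is the approximate "~8B" from the table, not an exact parameter count.

```python
from typing import Optional

BUDGET_LIMIT = 32_000_000_000  # "must be <= 32B" from the table


def summarize_budget(components: dict[str, Optional[int]]) -> tuple[int, list[str]]:
    """Return (sum of known parameter counts, names still marked TBD)."""
    known_total = sum(v for v in components.values() if v is not None)
    pending = [name for name, v in components.items() if v is None]
    return known_total, pending


components = {
    "vision (MiniCPM-V 2.6)": 8_000_000_000,  # approximate, "~8B" in the table
    "text base": None,  # TBD until a GGUF is selected
    "LoRA adapter": None,  # TBD
}

known_total, pending = summarize_budget(components)
remaining = BUDGET_LIMIT - known_total
print(f"known: {known_total / 1e9:.0f}B, remaining: {remaining / 1e9:.0f}B, pending: {pending}")
```

With only the vision model counted, roughly 24B of the 32B limit remains for the text base and adapter combined.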
36
 
 
67
  ## Fallback Behavior
68
 
69
  - If VLM loading fails, use manual description and stable example flow.
70
+ - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
71
  - If model JSON is invalid, repair and validate before rendering.
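
A "repair and validate" step for model JSON might look like the sketch below. It is a minimal illustration under stated assumptions: the helper names and the `object_name` required key are hypothetical, and the project's real schema lives in `src/models/schema.py`, which this example does not reproduce. The repair here only strips Markdown fence markers and trims text around the outermost braces; anything it cannot parse returns `None` so the caller can use the mock fallback.

```python
import json
from typing import Optional

FENCE = "`" * 3  # Markdown fence marker, built indirectly to keep this block fence-safe
REQUIRED_KEYS = ("object_name",)  # hypothetical required field


def repair_json(raw: str) -> Optional[dict]:
    """Try to recover one JSON object from chatty model output."""
    # Drop fence markers that models often wrap around JSON.
    text = raw.replace(FENCE + "json", "").replace(FENCE, "")
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        value = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def validated_or_none(raw: str) -> Optional[dict]:
    """Repair, then require the expected keys; None signals 'use the fallback'."""
    data = repair_json(raw)
    if data is None or any(key not in data for key in REQUIRED_KEYS):
        return None
    return data
```

Returning `None` instead of raising keeps the rendering path simple: the caller checks one value and switches to the deterministic mock output when repair fails.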
72
 
73
  ## Required Notes
docs/README.md CHANGED
@@ -17,10 +17,12 @@ This folder contains the planning source of truth for Objectverse Diary.
17
  - `FIELD_NOTES.md`: future technical blog draft.
18
  - `MODEL_CARD.md`: future model documentation.
19
  - `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
 
20
  - `RUNTIME.md`: current mock runtime configuration and future model boundary.
21
  - `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
22
  - `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
23
  - `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
24
  - `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
25
  - `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
 
26
  - `SUBMISSION_GUIDE.md`: final submission checklist.
 
17
  - `FIELD_NOTES.md`: future technical blog draft.
18
  - `MODEL_CARD.md`: future model documentation.
19
  - `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
20
+ - `DEVELOPMENT_STATUS.md`: current completed / not completed development status.
21
  - `RUNTIME.md`: current mock runtime configuration and future model boundary.
22
  - `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
23
  - `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
24
  - `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
25
  - `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
26
  - `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
27
+ - `SPACE_VLM_REPORT.md`: pending hosted Space MiniCPM-V validation report.
28
  - `SUBMISSION_GUIDE.md`: final submission checklist.
docs/RUNTIME.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  ## Current Runtime
4
 
5
- The initial MVP uses deterministic mock runtime paths:
6
 
7
  - `OBJECTVERSE_VISION_BACKEND=mock`
8
  - `OBJECTVERSE_TEXT_BACKEND=mock`
@@ -15,6 +15,28 @@ This means:
15
 
16
  No commercial cloud AI APIs are used.
17
 
18
  ## Environment Variables
19
 
20
  ```bash
@@ -25,6 +47,25 @@ TEXT_MODEL_PATH=
25
  TRACE_OUTPUT_DIR=data/traces
26
  ```
27
 
28
  ## Future Runtime Boundary
29
 
30
  The next implementation phase should keep the same pipeline boundary:
@@ -39,6 +80,14 @@ Do not move model calls into `src/ui/layout.py`.
39
  ## Fallback Rules
40
 
41
  - VLM unavailable: use manual description and mock/example gallery path.
42
- - llama.cpp unavailable: use mock text generation path.
43
- - invalid model JSON: repair and validate before rendering.
44
  - private input: anonymize trace text before saving public traces.
 
2
 
3
  ## Current Runtime
4
 
5
+ The default MVP runtime uses deterministic mock paths:
6
 
7
  - `OBJECTVERSE_VISION_BACKEND=mock`
8
  - `OBJECTVERSE_TEXT_BACKEND=mock`
 
15
 
16
  No commercial cloud AI APIs are used.
17
 
18
+ MiniCPM-V 2.6 vision can be enabled without changing the UI:
19
+
20
+ ```bash
21
+ OBJECTVERSE_VISION_BACKEND=minicpm-v \
22
+ VISION_MODEL_ID=openbmb/MiniCPM-V-2_6 \
23
+ OBJECTVERSE_TEXT_BACKEND=mock \
24
+ .venv/bin/python app.py
25
+ ```
26
+
27
+ This only replaces object understanding. Persona generation, diary generation, and chat can remain mock or use the optional llama.cpp text path below.
28
+
29
+ Optional llama.cpp text generation can be enabled without changing the UI:
30
+
31
+ ```bash
32
+ pip install llama-cpp-python
33
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp \
34
+ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
35
+ .venv/bin/python app.py
36
+ ```
37
+
38
+ `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
39
+
40
  ## Environment Variables
41
 
42
  ```bash
 
47
  TRACE_OUTPUT_DIR=data/traces
48
  ```
49
 
50
+ For the hosted Space, set these Variables:
51
+
52
+ ```bash
53
+ OBJECTVERSE_VISION_BACKEND=minicpm-v
54
+ VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
55
+ OBJECTVERSE_TEXT_BACKEND=mock
56
+ ```
57
+
58
+ Recommended Space hardware for this path is 1x Nvidia L4. If GPU is unavailable, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.
59
+
60
+ For a Space or local runtime with a separately provided GGUF text model, set:
61
+
62
+ ```bash
63
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp
64
+ TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
65
+ ```
66
+
67
+ Do not commit GGUF files or private model paths.
68
+
69
  ## Future Runtime Boundary
70
 
71
  The next implementation phase should keep the same pipeline boundary:
 
80
  ## Fallback Rules
81
 
82
  - VLM unavailable: use manual description and mock/example gallery path.
83
+ - llama.cpp unavailable: use mock text generation path and record `text-fallback-to-mock`.
84
+ - invalid model JSON: repair and validate before rendering, then fall back to mock if validation fails.
85
  - private input: anonymize trace text before saving public traces.
86
+
87
+ Trace fallback markers:
88
+
89
+ - `mock-runtime`: default mock vision and mock text runtime.
90
+ - `mock-text-runtime`: real or configured vision path with mock text generation.
91
+ - `mock-vision-runtime`: mock vision with a configured non-mock text backend.
92
+ - `vision-fallback-to-mock`: MiniCPM-V failed or returned invalid JSON, so mock object understanding was used.
93
+ - `text-fallback-to-mock`: llama.cpp was configured but unavailable, invalid, or unable to return schema-valid JSON.
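
The trace fallback markers listed above can be read as a small decision rule. The following is a minimal sketch only: the function name `fallback_markers`, its parameters, and the order in which markers are combined are assumptions made for illustration, not the project's actual implementation. Only the marker strings themselves come from `docs/RUNTIME.md`.

```python
def fallback_markers(
    vision_backend: str,
    text_backend: str,
    *,
    vision_failed: bool = False,
    text_failed: bool = False,
) -> list[str]:
    """Hypothetical helper: derive trace fallback markers from backend settings.

    Assumption: one configuration marker comes first, followed by any
    runtime failure markers. The real pipeline may combine them differently.
    """
    vision_is_mock = vision_backend.strip().lower() == "mock"
    text_is_mock = text_backend.strip().lower() == "mock"

    markers: list[str] = []
    if vision_is_mock and text_is_mock:
        markers.append("mock-runtime")  # default: everything is mock
    elif text_is_mock:
        markers.append("mock-text-runtime")  # configured vision, mock text
    elif vision_is_mock:
        markers.append("mock-vision-runtime")  # mock vision, configured text

    # Failure markers only make sense when a non-mock backend was configured.
    if vision_failed and not vision_is_mock:
        markers.append("vision-fallback-to-mock")
    if text_failed and not text_is_mock:
        markers.append("text-fallback-to-mock")
    return markers
```

Under these assumptions, a Space configured with `minicpm-v` vision and mock text whose vision call fails would record both `mock-text-runtime` and `vision-fallback-to-mock`.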
docs/SPACE_VLM_REPORT.md ADDED
@@ -0,0 +1,42 @@
1
+ # Space VLM Validation Report
2
+
3
+ - Generated at: 2026-06-06 04:25 UTC
4
+ - Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
5
+ - Space repo: `build-small-hackathon/ObjectverseDiary`
6
+ - Overall status: FAIL
7
+ - Vision backend expected: `minicpm-v`
8
+ - Text backend expected: `mock`
9
+
10
+ ## Space Configuration
11
+
12
+ - Requested configuration:
13
+ - `hardware`: `l4x1`
14
+ - `OBJECTVERSE_VISION_BACKEND`: `minicpm-v`
15
+ - `VISION_MODEL_ID`: `openbmb/MiniCPM-V-2_6`
16
+ - `OBJECTVERSE_TEXT_BACKEND`: `mock`
17
+
18
+ - Rollback configuration applied:
19
+ - `hardware`: `cpu-basic`
20
+ - `OBJECTVERSE_VISION_BACKEND`: `mock`
21
+ - `OBJECTVERSE_TEXT_BACKEND`: `mock`
22
+
23
+ ## Configuration Error
24
+
25
+ - Error: `HfHubHTTPError: 402 Payment Required`
26
+ - Meaning: Hugging Face requires pre-paid credits or billing access for the `build-small-hackathon` organization before the Space can use paid `l4x1` hardware.
27
+ - Impact: Remote MiniCPM-V validation did not run. No mug / keyboard / shoe image inference results were produced.
28
+ - Safety outcome: Mock-safe rollback was run after the failed hardware request.
29
+ - Post-rollback runtime check: Space is `RUNNING` with `hardware=cpu-basic` and `requested_hardware=cpu-basic`.
30
+
31
+ ## Results
32
+
33
+ - Coffee mug: NOT RUN
34
+ - Computer keyboard: NOT RUN
35
+ - Running shoe: NOT RUN
36
+
37
+ ## Notes
38
+
39
+ - Test images are temporary public Wikimedia Commons assets and are not committed.
40
+ - Text generation remains mock during this validation plan.
41
+ - No tokens, secrets, or private file paths are recorded in this report.
42
+ - Next unblock step: enable billing/pre-paid credits for the Hugging Face organization or choose an available free GPU option, then rerun `scripts/check_space_vlm.py`.
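
The report above shows an overall status of FAIL even though every image is NOT RUN. That follows from the status rule in `render_report` in `scripts/check_space_vlm.py`, where a configuration error takes precedence over missing results. The sketch below restates that rule as a standalone function. The name `overall_status` and the simplification of results to a list of booleans are assumptions for illustration; the script itself works on `ValidationResult` objects.

```python
def overall_status(configuration_error: str, passed_flags: list[bool]) -> str:
    """Restate the overall-status rule used when rendering the report.

    `passed_flags` stands in for `[result.passed for result in results]`.
    """
    if configuration_error:
        # Any configuration error fails the run, even with no results.
        return "FAIL"
    if passed_flags:
        return "PASS" if all(passed_flags) else "FAIL"
    # No error and no results: validation was skipped.
    return "NOT RUN"
```

With the `402 Payment Required` error and an empty result list, this rule yields FAIL, which matches the report header.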
docs/SUBMISSION_GUIDE.md CHANGED
@@ -2,8 +2,8 @@
2
 
3
  ## Required Package
4
 
5
- - [ ] Hugging Face Space URL: pending external setup
6
- - [ ] GitHub Repository URL: pending external setup
7
  - [ ] Demo Video URL: pending recording
8
  - [ ] Social Media Post URL: pending final copy
9
  - [ ] Fine-tuned Model URL: pending model training
@@ -18,11 +18,28 @@
18
  - Runtime boundary: `docs/RUNTIME.md`
19
  - Dataset plan and preview workflow: `docs/DATASET.md`
20
  - External setup checklist: `docs/EXTERNAL_SETUP.md`
 
21
  - Public mock traces: `data/traces/samples/`
22
 
23
  ## Final Checks
24
 
25
  - [ ] Space is under the official organization.
 
26
  - [ ] Demo video is under 2 minutes.
27
  - [ ] README includes model parameter counts.
28
  - [ ] No commercial cloud AI APIs are used.
 
2
 
3
  ## Required Package
4
 
5
+ - [x] Hugging Face Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
6
+ - [ ] GitHub Repository URL: local `origin` configured, sync/submission confirmation pending
7
  - [ ] Demo Video URL: pending recording
8
  - [ ] Social Media Post URL: pending final copy
9
  - [ ] Fine-tuned Model URL: pending model training
 
18
  - Runtime boundary: `docs/RUNTIME.md`
19
  - Dataset plan and preview workflow: `docs/DATASET.md`
20
  - External setup checklist: `docs/EXTERNAL_SETUP.md`
21
+ - Space VLM validation report: `docs/SPACE_VLM_REPORT.md`. Current status: FAIL, because the request for paid `l4x1` hardware returned `402 Payment Required`.
22
  - Public mock traces: `data/traces/samples/`
23
+ - Optional llama.cpp runtime wiring: `src/models/llama_cpp_runner.py`
24
+
25
+ ## Completed Locally
26
+
27
+ - Mock MVP flow, archive-style UI, share card, trace logging, sample traces, dataset preview, and initial acceptance tooling.
28
+ - MiniCPM-V 2.6 backend wiring with fallback markers.
29
+ - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
30
+ - Hosted Space VLM validation script and pending report template.
31
+
32
+ ## Not Completed Yet
33
+
34
+ - Hosted Space L4 MiniCPM-V validation for mug, keyboard, and shoe; attempted and blocked by Hugging Face paid hardware billing.
35
+ - Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
36
+ - Real model traces, curated dataset, LoRA training, model/dataset publishing.
37
+ - Field Notes article, demo video, social post, final submission package.
38
 
39
  ## Final Checks
40
 
41
  - [ ] Space is under the official organization.
42
+ - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: blocked by paid hardware billing.
43
  - [ ] Demo video is under 2 minutes.
44
  - [ ] README includes model parameter counts.
45
  - [ ] No commercial cloud AI APIs are used.
pyproject.toml CHANGED
@@ -6,8 +6,14 @@ requires-python = ">=3.10"
6
  dependencies = [
7
  "gradio>=4.44,<6",
8
  "pydantic>=2.7,<3",
9
  ]
10
 
11
  [tool.objectverse-diary]
12
- status = "initial-mock-mvp"
13
- implementation = "mock-runtime"
 
6
  dependencies = [
7
  "gradio>=4.44,<6",
8
  "pydantic>=2.7,<3",
9
+ "torch",
10
+ "torchvision",
11
+ "transformers>=4.40,<5",
12
+ "Pillow",
13
+ "sentencepiece",
14
+ "accelerate",
15
  ]
16
 
17
  [tool.objectverse-diary]
18
+ status = "vlm-ready-mock-text"
19
+ implementation = "minicpm-v-or-mock-vision-with-mock-text"
requirements.txt CHANGED
@@ -1,2 +1,8 @@
1
  gradio>=4.44,<6
2
  pydantic>=2.7,<3
 
1
  gradio>=4.44,<6
2
  pydantic>=2.7,<3
3
+ torch
4
+ torchvision
5
+ transformers>=4.40,<5
6
+ Pillow
7
+ sentencepiece
8
+ accelerate
scripts/README.md CHANGED
@@ -8,6 +8,7 @@ Implemented initial scripts:
8
  - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
9
  - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
10
  - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 
11
 
12
  Expected files during implementation:
13
 
@@ -15,4 +16,18 @@ Expected files during implementation:
15
  - `convert_to_gguf.sh`
16
  - `run_llama_cpp.sh`
17
 
18
- Current status: mock trace generation, trace JSONL export, and SFT preview generation are implemented. Real model, fine-tuning, and GGUF conversion scripts are not implemented yet.
 
8
  - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
9
  - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
10
  - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
11
+ - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
12
 
13
  Expected files during implementation:
14
 
 
16
  - `convert_to_gguf.sh`
17
  - `run_llama_cpp.sh`
18
 
19
+ Space VLM validation:
20
+
21
+ ```bash
22
+ .venv/bin/python -B scripts/check_space_vlm.py \
23
+ --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
24
+ --output docs/SPACE_VLM_REPORT.md
25
+ ```
26
+
27
+ External Space changes are explicit:
28
+
29
+ ```bash
30
+ .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
31
+ ```
32
+
33
+ Current status: mock trace generation, trace JSONL export, SFT preview generation, optional MiniCPM-V wiring, optional llama.cpp wiring, and hosted Space VLM validation tooling are implemented. Real model validation on Space, fine-tuning, and GGUF conversion are not completed yet.
scripts/check_space_vlm.py ADDED
@@ -0,0 +1,481 @@
1
+ """Validate MiniCPM-V object understanding on the hosted Hugging Face Space."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import argparse
6
+ import json
7
+ import sys
8
+ import time
9
+ import urllib.request
10
+ from dataclasses import dataclass
11
+ from datetime import datetime, timezone
12
+ from pathlib import Path
13
+ from typing import Any
14
+ from urllib.parse import urlparse
15
+
16
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
17
+ if str(PROJECT_ROOT) not in sys.path:
18
+ sys.path.insert(0, str(PROJECT_ROOT))
19
+
20
+ from src.models.schema import TraceRecord
21
+
22
+
23
+ DEFAULT_SPACE_URL = "https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary"
24
+ DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
25
+ DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
26
+ DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
27
+ DEFAULT_HARDWARE = "l4x1"
28
+ MOCK_SAFE_HARDWARE = "cpu-basic"
29
+ GENERATE_API_NAME = "/generate_object_file"
30
+ REQUEST_TIMEOUT_SECONDS = 45
31
+
32
+ SPACE_VARIABLES = {
33
+ "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
34
+ "VISION_MODEL_ID": "openbmb/MiniCPM-V-2_6",
35
+ "OBJECTVERSE_TEXT_BACKEND": "mock",
36
+ }
37
+
38
+ MOCK_SAFE_VARIABLES = {
39
+ "OBJECTVERSE_VISION_BACKEND": "mock",
40
+ "OBJECTVERSE_TEXT_BACKEND": "mock",
41
+ }
42
+
43
+
44
+ @dataclass(frozen=True)
45
+ class ValidationAsset:
46
+ key: str
47
+ label: str
48
+ source_page: str
49
+ download_url: str
50
+ expected_terms: tuple[str, ...]
51
+ description: str
52
+ mode: str = "Cynical"
53
+
54
+
55
+ @dataclass(frozen=True)
56
+ class ValidationResult:
57
+ key: str
58
+ label: str
59
+ source_page: str
60
+ image_path: str
61
+ passed: bool
62
+ object_name: str
63
+ visible_features: list[str]
64
+ likely_context: str
65
+ confidence: float
66
+ runtime_vision: str
67
+ runtime_text: str
68
+ fallbacks: list[str]
69
+ error: str = ""
70
+
71
+
72
+ TEST_ASSETS = [
73
+ ValidationAsset(
74
+ key="mug",
75
+ label="Coffee mug",
76
+ source_page="https://commons.wikimedia.org/wiki/File:Striped_coffee_mug.jpg",
77
+ download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Striped_coffee_mug.jpg",
78
+ expected_terms=("mug", "cup", "coffee", "ceramic", "handle"),
79
+ description="A public Wikimedia Commons photo of a striped coffee mug.",
80
+ ),
81
+ ValidationAsset(
82
+ key="keyboard",
83
+ label="Computer keyboard",
84
+ source_page="https://commons.wikimedia.org/wiki/File:Computer_keyboard.jpg",
85
+ download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Computer_keyboard.jpg",
86
+ expected_terms=("keyboard", "key", "computer", "keys"),
87
+ description="A public Wikimedia Commons photo of a computer keyboard.",
88
+ mode="Philosopher",
89
+ ),
90
+ ValidationAsset(
91
+ key="shoe",
92
+ label="Running shoe",
93
+ source_page="https://commons.wikimedia.org/wiki/File:Running_shoes.jpg",
94
+ download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Running_shoes.jpg",
95
+ expected_terms=("shoe", "sneaker", "running", "footwear", "trainer"),
96
+ description="A public Wikimedia Commons photo of running shoes.",
97
+ mode="Dramatic",
98
+ ),
99
+ ]
100
+
101
+
102
+ def parse_space_repo_id(space_url: str) -> str:
103
+ parsed = urlparse(space_url)
104
+ parts = [part for part in parsed.path.split("/") if part]
105
+ if len(parts) >= 3 and parts[0] == "spaces":
106
+ return f"{parts[1]}/{parts[2]}"
107
+ if len(parts) == 2:
108
+ return f"{parts[0]}/{parts[1]}"
109
+ raise ValueError(f"Could not parse Hugging Face Space repo id from {space_url!r}")
110
+
111
+
112
+ def download_validation_assets(
113
+ asset_dir: Path = DEFAULT_ASSET_DIR,
114
+ assets: list[ValidationAsset] | None = None,
115
+ ) -> dict[str, Path]:
116
+ selected_assets = assets or TEST_ASSETS
117
+ asset_dir.mkdir(parents=True, exist_ok=True)
118
+ paths: dict[str, Path] = {}
119
+ for asset in selected_assets:
120
+ output_path = asset_dir / f"{asset.key}.jpg"
121
+ if not output_path.exists():
122
+ _download_url(asset.download_url, output_path)
123
+ paths[asset.key] = output_path
124
+ return paths
125
+
126
+
127
+ def configure_space_for_vlm(
128
+ repo_id: str,
129
+ *,
130
+ hardware: str = DEFAULT_HARDWARE,
131
+ wait: bool = True,
132
+ timeout_seconds: int = 900,
133
+ ) -> dict[str, str]:
134
+ from huggingface_hub import HfApi, SpaceHardware
135
+
136
+ api = HfApi()
137
+ _assert_hf_auth(api)
138
+ for key, value in SPACE_VARIABLES.items():
139
+ api.add_space_variable(repo_id=repo_id, key=key, value=value)
140
+ api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
141
+ if wait:
142
+ wait_for_space_running(repo_id, timeout_seconds=timeout_seconds)
143
+ return {"repo_id": repo_id, "hardware": hardware, **SPACE_VARIABLES}
144
+
145
+
146
+ def rollback_space_to_mock(repo_id: str, *, hardware: str = MOCK_SAFE_HARDWARE) -> dict[str, str]:
147
+ from huggingface_hub import HfApi, SpaceHardware
148
+
149
+ api = HfApi()
150
+ _assert_hf_auth(api)
151
+ for key, value in MOCK_SAFE_VARIABLES.items():
152
+ api.add_space_variable(repo_id=repo_id, key=key, value=value)
153
+ api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
154
+ return {"repo_id": repo_id, "hardware": hardware, **MOCK_SAFE_VARIABLES}
155
+
156
+
157
+ def wait_for_space_running(
158
+ repo_id: str,
159
+ *,
160
+ timeout_seconds: int = 900,
161
+ poll_seconds: int = 20,
162
+ ) -> str:
163
+ from huggingface_hub import HfApi
164
+
165
+ api = HfApi()
166
+ deadline = time.monotonic() + timeout_seconds
167
+ last_stage = "unknown"
168
+ while time.monotonic() < deadline:
169
+ runtime = api.get_space_runtime(repo_id=repo_id)
170
+ last_stage = _runtime_stage_name(runtime)
171
+ if last_stage.upper() == "RUNNING":
172
+ return last_stage
173
+ time.sleep(poll_seconds)
174
+ raise TimeoutError(f"Space {repo_id} did not reach RUNNING within {timeout_seconds}s; last stage: {last_stage}")
175
+
176
+
177
+ def run_space_validation(
178
+ *,
179
+ space_url: str = DEFAULT_SPACE_URL,
180
+ asset_dir: Path = DEFAULT_ASSET_DIR,
181
+ timeout_seconds: int = 900,
182
+ assets: list[ValidationAsset] | None = None,
183
+ ) -> list[ValidationResult]:
184
+ from gradio_client import Client, handle_file
185
+
186
+ selected_assets = assets or TEST_ASSETS
187
+ paths = download_validation_assets(asset_dir, selected_assets)
188
+ client = Client(space_url, verbose=False)
189
+ results: list[ValidationResult] = []
190
+ started = time.monotonic()
191
+ for asset in selected_assets:
192
+ remaining = timeout_seconds - int(time.monotonic() - started)
193
+ if remaining <= 0:
194
+ raise TimeoutError(f"Validation exceeded timeout of {timeout_seconds}s")
195
+ try:
196
+ response = client.predict(
197
+ handle_file(str(paths[asset.key])),
198
+ asset.description,
199
+ asset.mode,
200
+ api_name=GENERATE_API_NAME,
201
+ )
202
+ results.append(validate_prediction(asset, paths[asset.key], response))
203
+ except Exception as exc:
204
+ results.append(
205
+ ValidationResult(
206
+ key=asset.key,
207
+ label=asset.label,
208
+ source_page=asset.source_page,
209
+ image_path=str(paths[asset.key]),
210
+ passed=False,
211
+ object_name="",
212
+ visible_features=[],
213
+ likely_context="",
214
+ confidence=0.0,
215
+ runtime_vision="",
216
+ runtime_text="",
217
+ fallbacks=[],
218
+ error=f"{type(exc).__name__}: {exc}",
219
+ )
220
+ )
221
+ return results
222
+
223
+
224
+ def validate_prediction(
225
+ asset: ValidationAsset,
226
+ image_path: Path,
227
+ response: Any,
228
+ ) -> ValidationResult:
229
+ trace_payload = _extract_trace_payload(response)
230
+ trace = TraceRecord.model_validate(trace_payload)
231
+ object_info = trace.object_understanding.object
232
+ search_text = " ".join(
233
+ [
234
+ object_info.name,
235
+ object_info.likely_context,
236
+ " ".join(object_info.visible_features),
237
+ ]
238
+ ).lower()
239
+ expected_match = any(term in search_text for term in asset.expected_terms)
240
+ vision_runtime_ok = trace.model_runtime.get("vision") == "minicpm-v object understanding"
241
+ text_runtime_ok = trace.model_runtime.get("text") == "mock persona and diary generation"
242
+ no_vision_fallback = "vision-fallback-to-mock" not in trace.fallbacks
243
+ passed = expected_match and vision_runtime_ok and text_runtime_ok and no_vision_fallback
244
+ return ValidationResult(
245
+ key=asset.key,
246
+ label=asset.label,
247
+ source_page=asset.source_page,
248
+ image_path=str(image_path),
249
+ passed=passed,
250
+ object_name=object_info.name,
251
+ visible_features=object_info.visible_features,
252
+ likely_context=object_info.likely_context,
253
+ confidence=object_info.confidence,
254
+ runtime_vision=trace.model_runtime.get("vision", ""),
255
+ runtime_text=trace.model_runtime.get("text", ""),
256
+ fallbacks=trace.fallbacks,
257
+ error="" if passed else _failure_reason(expected_match, vision_runtime_ok, text_runtime_ok, no_vision_fallback),
258
+ )
259
+
260
+
261
+ def render_report(
262
+ *,
263
+ space_url: str,
264
+ repo_id: str,
265
+ results: list[ValidationResult],
266
+ configured: dict[str, str] | None = None,
267
+ rollback: dict[str, str] | None = None,
268
+ configuration_error: str = "",
269
+ ) -> str:
270
+ now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
271
+ status = "NOT RUN"
272
+ if configuration_error:
273
+ status = "FAIL"
274
+ elif results:
275
+ status = "PASS" if all(result.passed for result in results) else "FAIL"
276
+ lines = [
277
+ "# Space VLM Validation Report",
278
+ "",
279
+ f"- Generated at: {now}",
280
+ f"- Space URL: {space_url}",
281
+ f"- Space repo: `{repo_id}`",
282
+ f"- Overall status: {status}",
283
+ "- Vision backend expected: `minicpm-v`",
284
+ "- Text backend expected: `mock`",
285
+ "",
286
+ "## Space Configuration",
287
+ "",
288
+ ]
289
+ if configured:
290
+ lines.extend(_config_lines("Applied configuration", configured))
291
+ else:
292
+ lines.append("- Applied configuration: not changed by this run.")
293
+ if rollback:
294
+ lines.extend(["", *_config_lines("Rollback configuration", rollback)])
295
+ else:
296
+ lines.append("- Rollback configuration: not applied by this run.")
297
+ if configuration_error:
298
+ lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
299
+
300
+ lines.extend(["", "## Results", ""])
301
+ for result in results:
302
+ lines.extend(
303
+ [
304
+ f"### {result.label}",
305
+ "",
306
+ f"- Status: {'PASS' if result.passed else 'FAIL'}",
307
+ f"- Source: {result.source_page}",
308
+ f"- Local temporary image: `{result.image_path}`",
309
+ f"- Object name: `{result.object_name}`",
310
+ f"- Visible features: {', '.join(result.visible_features) or 'n/a'}",
311
+ f"- Likely context: `{result.likely_context}`",
312
+ f"- Confidence: {result.confidence:.2f}",
313
+ f"- Runtime vision: `{result.runtime_vision}`",
314
+ f"- Runtime text: `{result.runtime_text}`",
315
+ f"- Fallbacks: {', '.join(result.fallbacks) or 'none'}",
316
+ ]
317
+ )
318
+ if result.error:
319
+ lines.append(f"- Error: `{result.error}`")
320
+ lines.append("")
321
+ lines.extend(
322
+ [
323
+ "## Notes",
324
+ "",
325
+ "- Test images are temporary public Wikimedia Commons assets and are not committed.",
326
+ "- No tokens, secrets, or private file paths should be recorded in this report.",
327
+ "- If validation fails, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.",
328
+ ]
329
+ )
330
+ return "\n".join(lines) + "\n"
331
+
332
+
333
+ def write_report(markdown: str, output_path: Path = DEFAULT_OUTPUT_PATH) -> Path:
334
+ output_path.parent.mkdir(parents=True, exist_ok=True)
335
+ output_path.write_text(markdown, encoding="utf-8")
336
+ return output_path
337
+
338
+
339
+ def write_json_results(results: list[ValidationResult], output_path: Path) -> Path:
340
+ output_path.parent.mkdir(parents=True, exist_ok=True)
341
+ payload = [result.__dict__ for result in results]
342
+ output_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
343
+ return output_path
344
+
345
+
346
+ def _download_url(url: str, output_path: Path) -> None:
347
+ request = urllib.request.Request(
348
+ url,
349
+ headers={"User-Agent": "Objectverse-Diary-Space-VLM-Check/0.1"},
350
+ )
351
+ with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_SECONDS) as response:
352
+ output_path.write_bytes(response.read())
353
+
354
+
355
+ def _extract_trace_payload(response: Any) -> dict[str, Any]:
356
+ if isinstance(response, tuple | list):
357
+ if len(response) < 7:
358
+ raise ValueError("Gradio response did not include trace JSON output.")
359
+ trace_payload = response[6]
360
+ elif isinstance(response, dict) and "trace" in response:
361
+ trace_payload = response["trace"]
362
+ else:
363
+ raise ValueError("Unsupported Gradio response shape.")
364
+ if not isinstance(trace_payload, dict):
365
+ raise ValueError("Trace output was not a JSON object.")
366
+ return trace_payload
367
+
368
+
369
+ def _failure_reason(
370
+ expected_match: bool,
371
+ vision_runtime_ok: bool,
372
+ text_runtime_ok: bool,
373
+ no_vision_fallback: bool,
374
+ ) -> str:
375
+ reasons: list[str] = []
376
+ if not expected_match:
377
+ reasons.append("object output did not match expected terms")
378
+ if not vision_runtime_ok:
379
+ reasons.append("vision runtime was not minicpm-v")
380
+ if not text_runtime_ok:
381
+ reasons.append("text runtime was not mock")
382
+ if not no_vision_fallback:
383
+ reasons.append("vision fallback marker was present")
384
+ return "; ".join(reasons)
385
+
386
+
387
+ def _runtime_stage_name(runtime: Any) -> str:
388
+ stage = getattr(runtime, "stage", None)
389
+ if stage is None and isinstance(runtime, dict):
390
+ stage = runtime.get("stage")
391
+ if hasattr(stage, "value"):
392
+ return str(stage.value)
393
+ return str(stage or "unknown")
394
+
395
+
396
+ def _assert_hf_auth(api: Any) -> None:
397
+ try:
398
+ user = api.whoami()
399
+ except Exception as exc:
400
+ raise RuntimeError("Hugging Face authentication is required for Space configuration.") from exc
401
+ if not isinstance(user, dict) or not user.get("name"):
402
+ raise RuntimeError("Hugging Face authentication did not return a user name.")
403
+
404
+
405
+ def _config_lines(title: str, config: dict[str, str]) -> list[str]:
406
+ lines = [f"- {title}:"]
407
+ for key, value in config.items():
408
+ lines.append(f" - `{key}`: `{value}`")
409
+ return lines
410
+
411
+
412
+ def _parse_args() -> argparse.Namespace:
413
+ parser = argparse.ArgumentParser(description=__doc__)
414
+ parser.add_argument("--space-url", default=DEFAULT_SPACE_URL)
415
+ parser.add_argument("--asset-dir", type=Path, default=DEFAULT_ASSET_DIR)
416
+ parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT_PATH)
417
+ parser.add_argument("--json-output", type=Path)
418
+ parser.add_argument("--timeout-seconds", type=int, default=900)
419
+ parser.add_argument("--configure-space", action="store_true")
420
+ parser.add_argument("--rollback-to-mock", action="store_true")
421
+ parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
422
+ parser.add_argument("--skip-validation", action="store_true")
423
+ return parser.parse_args()
424
+
425
+
426
+ def main() -> None:
427
+ args = _parse_args()
428
+ repo_id = parse_space_repo_id(args.space_url)
429
+ configured = None
430
+ rollback = None
431
+ configuration_error = ""
432
+ if args.configure_space:
433
+ try:
434
+ configured = configure_space_for_vlm(
435
+ repo_id,
436
+ hardware=args.hardware,
437
+ wait=True,
438
+ timeout_seconds=args.timeout_seconds,
439
+ )
440
+ except Exception as exc:
441
+ configuration_error = f"{type(exc).__name__}: {exc}"
442
+ if args.rollback_to_mock:
443
+ try:
444
+ rollback = rollback_space_to_mock(repo_id)
445
+ except Exception as rollback_exc:
446
+ configuration_error = (
447
+ f"{configuration_error}; rollback failed with "
448
+ f"{type(rollback_exc).__name__}: {rollback_exc}"
449
+ )
450
+
451
+ results: list[ValidationResult] = []
452
+ if not args.skip_validation and not configuration_error:
453
+ results = run_space_validation(
454
+ space_url=args.space_url,
455
+ asset_dir=args.asset_dir,
456
+ timeout_seconds=args.timeout_seconds,
457
+ )
458
+
459
+ if args.rollback_to_mock and rollback is None:
460
+ rollback = rollback_space_to_mock(repo_id)
461
+
462
+ report = render_report(
463
+ space_url=args.space_url,
464
+ repo_id=repo_id,
465
+ results=results,
466
+ configured=configured,
467
+ rollback=rollback,
468
+ configuration_error=configuration_error,
469
+ )
470
+ write_report(report, args.output)
471
+ if args.json_output:
472
+ write_json_results(results, args.json_output)
473
+
474
+ if configuration_error or (results and not all(result.passed for result in results)):
475
+ raise SystemExit(1)
476
+
477
+ print(f"wrote Space VLM report to {args.output}")
478
+
479
+
480
+ if __name__ == "__main__":
481
+ main()
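
The diff view above flattens Python indentation, which makes the helpers harder to read. As a reading aid, here is `parse_space_repo_id` from `scripts/check_space_vlm.py` restated as a standalone snippet with indentation restored. The logic follows the added lines; the docstring and comments are added for illustration.

```python
from urllib.parse import urlparse


def parse_space_repo_id(space_url: str) -> str:
    """Return "owner/name" from a Space URL or from a bare "owner/name" string."""
    parts = [part for part in urlparse(space_url).path.split("/") if part]
    # Full URL form: /spaces/<owner>/<name>
    if len(parts) >= 3 and parts[0] == "spaces":
        return f"{parts[1]}/{parts[2]}"
    # Bare repo id form: <owner>/<name>
    if len(parts) == 2:
        return f"{parts[0]}/{parts[1]}"
    raise ValueError(f"Could not parse Hugging Face Space repo id from {space_url!r}")
```

Both the default Space URL and the bare id `build-small-hackathon/ObjectverseDiary` resolve to the same repo id, while a URL without an owner and name raises `ValueError`.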
src/README.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  This directory is reserved for application source code.
4
 
5
- Current status: initial mock MVP. Real model runtimes are not connected yet.
6
 
7
  ## Planned Areas
8
 
 
2
 
3
  This directory is reserved for application source code.
4
 
5
+ Current status: initial mock MVP with an optional MiniCPM-V 2.6 vision backend. Text generation defaults to mock; the llama.cpp text path is optional and falls back to mock.
6
 
7
  ## Planned Areas
8
 
src/config.py CHANGED
@@ -43,21 +43,26 @@ def get_runtime_settings(environ: Mapping[str, str] | None = None) -> RuntimeSet
43
 
44
  def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
45
  current = settings or get_runtime_settings()
 
 
46
  vision = (
47
  "mock object understanding"
48
- if current.vision_backend == "mock"
49
- else f"{current.vision_backend} object understanding"
50
- )
51
- text = (
52
- "mock persona and diary generation"
53
- if current.text_backend == "mock"
54
- else f"{current.text_backend} persona and diary generation"
55
- )
56
- runtime = (
57
- "no llama.cpp model connected yet"
58
- if current.text_backend == "mock"
59
- else f"text model path: {current.text_model_path or '[not configured]'}"
60
  )
61
  return {"vision": vision, "text": text, "runtime": runtime}
62
 
63
 
 
43
 
44
  def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
45
  current = settings or get_runtime_settings()
46
+ vision_backend = current.vision_backend.strip().lower()
47
+ text_backend = current.text_backend.strip().lower()
48
  vision = (
49
  "mock object understanding"
50
+ if vision_backend == "mock"
51
+ else f"{vision_backend} object understanding"
52
  )
53
+ text = "mock persona and diary generation"
54
+ if text_backend in {"llama-cpp", "llama_cpp", "llamacpp"}:
55
+ text = "llama-cpp text generation"
56
+ elif text_backend != "mock":
57
+ text = f"{text_backend} text generation"
58
+ runtime_parts: list[str] = []
59
+ if vision_backend != "mock":
60
+ runtime_parts.append(f"vision model id: {current.vision_model_id or '[not configured]'}")
61
+ if text_backend == "mock":
62
+ runtime_parts.append("no llama.cpp model connected yet")
63
+ else:
64
+ runtime_parts.append(f"text model path: {current.text_model_path or '[not configured]'}")
65
+ runtime = "; ".join(runtime_parts)
66
  return {"vision": vision, "text": text, "runtime": runtime}
67
 
68
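The new `runtime_status` logic can be tried in isolation. The sketch below is a minimal reproduction, assuming a stand-in `RuntimeSettings` dataclass that models only the four fields the function reads; the real class lives in `src/config.py` and its defaults are not shown in this diff, so the defaults here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeSettings:
    # Hypothetical stand-in: only the fields read by runtime_status are modeled.
    vision_backend: str = "mock"
    text_backend: str = "mock"
    vision_model_id: str = ""
    text_model_path: str = ""


LLAMA_CPP_ALIASES = {"llama-cpp", "llama_cpp", "llamacpp"}


def runtime_status(settings: RuntimeSettings) -> dict[str, str]:
    # Normalize once so " MiniCPM-V " and "minicpm-v" are treated the same.
    vision_backend = settings.vision_backend.strip().lower()
    text_backend = settings.text_backend.strip().lower()

    if vision_backend == "mock":
        vision = "mock object understanding"
    else:
        vision = f"{vision_backend} object understanding"

    text = "mock persona and diary generation"
    if text_backend in LLAMA_CPP_ALIASES:
        text = "llama-cpp text generation"
    elif text_backend != "mock":
        text = f"{text_backend} text generation"

    runtime_parts: list[str] = []
    if vision_backend != "mock":
        runtime_parts.append(f"vision model id: {settings.vision_model_id or '[not configured]'}")
    if text_backend == "mock":
        runtime_parts.append("no llama.cpp model connected yet")
    else:
        runtime_parts.append(f"text model path: {settings.text_model_path or '[not configured]'}")
    return {"vision": vision, "text": text, "runtime": "; ".join(runtime_parts)}


mock_status = runtime_status(RuntimeSettings())
mixed_status = runtime_status(RuntimeSettings(vision_backend=" MiniCPM-V ", text_backend="llama_cpp"))
print(mock_status)
print(mixed_status)
```

Normalizing the backend names up front is what lets all three llama.cpp spellings collapse into one status label, and the `[not configured]` placeholder keeps the runtime string readable when a model id or path is missing.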
 
src/models/llama_cpp_runner.py CHANGED
@@ -1,8 +1,16 @@
1
- """Mock text generation layer reserved for future llama.cpp integration."""
2
 
3
  from __future__ import annotations
4
 
 
 
 
 
 
5
  from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
 
 
 
6
 
7
 
8
  MODE_PROFILES = {
@@ -33,8 +41,58 @@ MODE_PROFILES = {
33
  },
34
  }
35
 
36
 
37
  def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
38
  object_name = object_understanding.object.name
39
  profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
40
  character_name = _character_name(object_name, mode)
@@ -51,7 +109,7 @@ def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> Pe
51
  return PersonaEnvelope(persona=persona)
52
 
53
 
54
- def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
55
  p = persona.persona
56
  day_number = 417 + len(p.object_name)
57
 
@@ -74,7 +132,7 @@ def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
74
  )
75
 
76
 
77
- def reply_as_object(persona_data: dict, message: str) -> str:
78
  persona = persona_data.get("persona", {})
79
  character_name = persona.get("character_name", "The Object")
80
  object_name = persona.get("object_name", "object")
@@ -88,6 +146,170 @@ def reply_as_object(persona_data: dict, message: str) -> str:
88
  )
89
 
90
 
91
  def _character_name(object_name: str, mode: str) -> str:
92
  compact = "".join(part.capitalize() for part in object_name.split()[:2])
93
  suffix = {
 
1
+ """Text generation runtime with mock and optional llama.cpp backends."""
2
 
3
  from __future__ import annotations
4
 
5
+ import json
6
+ from pathlib import Path
7
+ from typing import Any
8
+
9
+ from src.config import RuntimeSettings, get_runtime_settings
10
  from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
11
+ from src.prompts.diary_generation import CHAT_REPLY_PROMPT, DIARY_GENERATION_PROMPT
12
+ from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
13
+ from src.utils.json_repair import parse_json_object
14
 
15
 
16
  MODE_PROFILES = {
 
41
  },
42
  }
43
 
44
+ LLAMA_CPP_BACKENDS = {"llama-cpp", "llama_cpp", "llamacpp"}
45
+ TEXT_FALLBACK_TO_MOCK = "text-fallback-to-mock"
46
+
47
+ _LLAMA_MODEL: Any | None = None
48
+ _LLAMA_MODEL_PATH: str | None = None
49
+ _TEXT_FALLBACKS: list[str] = []
50
+
51
 
52
  def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
53
+ settings = get_runtime_settings()
54
+ if _is_llama_cpp_backend(settings):
55
+ try:
56
+ return _generate_persona_llama_cpp(object_understanding, mode, settings)
57
+ except Exception as exc:
58
+ _log_text_fallback("persona", exc)
59
+ _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
60
+
61
+ return _generate_persona_mock(object_understanding, mode)
62
+
63
+
64
+ def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
65
+ settings = get_runtime_settings()
66
+ if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
67
+ try:
68
+ return _generate_diary_llama_cpp(persona, mode, settings)
69
+ except Exception as exc:
70
+ _log_text_fallback("diary", exc)
71
+ _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
72
+
73
+ return _generate_diary_mock(persona, mode)
74
+
75
+
76
+ def reply_as_object(persona_data: dict, message: str) -> str:
77
+ settings = get_runtime_settings()
78
+ if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
79
+ try:
80
+ return _reply_as_object_llama_cpp(persona_data, message, settings)
81
+ except Exception as exc:
82
+ _log_text_fallback("chat", exc)
83
+
84
+ return _reply_as_object_mock(persona_data, message)
85
+
86
+
87
+ def reset_text_runtime_fallbacks() -> None:
88
+ _TEXT_FALLBACKS.clear()
89
+
90
+
91
+ def get_text_runtime_fallbacks() -> list[str]:
92
+ return list(_TEXT_FALLBACKS)
93
+
94
+
95
+ def _generate_persona_mock(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
96
  object_name = object_understanding.object.name
97
  profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
98
  character_name = _character_name(object_name, mode)
 
109
  return PersonaEnvelope(persona=persona)
110
 
111
 
112
+ def _generate_diary_mock(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
113
  p = persona.persona
114
  day_number = 417 + len(p.object_name)
115
 
 
132
  )
133
 
134
 
135
+ def _reply_as_object_mock(persona_data: dict, message: str) -> str:
136
  persona = persona_data.get("persona", {})
137
  character_name = persona.get("character_name", "The Object")
138
  object_name = persona.get("object_name", "object")
 
146
  )
147
 
148
 
149
+ def _generate_persona_llama_cpp(
150
+ object_understanding: ObjectUnderstanding,
151
+ mode: str,
152
+ settings: RuntimeSettings,
153
+ ) -> PersonaEnvelope:
154
+ raw = _run_llama_json(
155
+ system_prompt=PERSONA_GENERATION_PROMPT,
156
+ user_payload={
157
+ "mode": mode,
158
+ "object_understanding": object_understanding.model_dump(mode="json"),
159
+ },
160
+ settings=settings,
161
+ max_tokens=320,
162
+ )
163
+ return PersonaEnvelope.model_validate(raw)
164
+
165
+
166
+ def _generate_diary_llama_cpp(
167
+ persona: PersonaEnvelope,
168
+ mode: str,
169
+ settings: RuntimeSettings,
170
+ ) -> DiaryEntry:
171
+ raw = _run_llama_json(
172
+ system_prompt=DIARY_GENERATION_PROMPT,
173
+ user_payload={
174
+ "mode": mode,
175
+ "persona": persona.model_dump(mode="json"),
176
+ },
177
+ settings=settings,
178
+ max_tokens=360,
179
+ )
180
+ return DiaryEntry.model_validate(raw)
181
+
182
+
183
+ def _reply_as_object_llama_cpp(
184
+ persona_data: dict,
185
+ message: str,
186
+ settings: RuntimeSettings,
187
+ ) -> str:
188
+ PersonaEnvelope.model_validate(persona_data)
189
+ raw = _run_llama_json(
190
+ system_prompt=CHAT_REPLY_PROMPT,
191
+ user_payload={
192
+ "persona": persona_data,
193
+ "message": message.strip() or "...",
194
+ },
195
+ settings=settings,
196
+ max_tokens=180,
197
+ )
198
+ reply = raw.get("reply")
199
+ if not isinstance(reply, str) or not reply.strip():
200
+ raise ValueError("llama.cpp chat response did not include a non-empty reply.")
201
+ return reply.strip()
202
+
203
+
204
+ def _run_llama_json(
205
+ *,
206
+ system_prompt: str,
207
+ user_payload: dict[str, Any],
208
+ settings: RuntimeSettings,
209
+ max_tokens: int,
210
+ ) -> dict[str, Any]:
211
+ model = _load_llama_model(settings.text_model_path)
212
+ user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
213
+ raw = _complete_llama(
214
+ model,
215
+ system_prompt=system_prompt,
216
+ user_content=user_content,
217
+ max_tokens=max_tokens,
218
+ )
219
+ return parse_json_object(raw)
220
+
221
+
222
+ def _complete_llama(
223
+ model: Any,
224
+ *,
225
+ system_prompt: str,
226
+ user_content: str,
227
+ max_tokens: int,
228
+ ) -> str:
229
+ stop = ["</s>", "<|end|>", "<|eot_id|>", "<|im_end|>"]
230
+ if hasattr(model, "create_chat_completion"):
231
+ response = model.create_chat_completion(
232
+ messages=[
233
+ {"role": "system", "content": system_prompt},
234
+ {"role": "user", "content": user_content},
235
+ ],
236
+ temperature=0.75,
237
+ max_tokens=max_tokens,
238
+ stop=stop,
239
+ )
240
+ return _extract_completion_text(response)
241
+
242
+ prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
243
+ response = model(
244
+ prompt,
245
+ temperature=0.75,
246
+ max_tokens=max_tokens,
247
+ stop=stop,
248
+ )
249
+ return _extract_completion_text(response)
250
+
251
+
252
+ def _extract_completion_text(response: Any) -> str:
253
+ if isinstance(response, str):
254
+ return response
255
+ if not isinstance(response, dict):
256
+ raise ValueError("llama.cpp returned an unsupported response type.")
257
+
258
+ choices = response.get("choices")
259
+ if not isinstance(choices, list) or not choices:
260
+ raise ValueError("llama.cpp response did not include choices.")
261
+
262
+ first = choices[0]
263
+ if not isinstance(first, dict):
264
+ raise ValueError("llama.cpp response choice was not an object.")
265
+
266
+ message = first.get("message")
267
+ if isinstance(message, dict) and isinstance(message.get("content"), str):
268
+ return message["content"]
269
+ if isinstance(first.get("text"), str):
270
+ return first["text"]
271
+ raise ValueError("llama.cpp response did not include text content.")
272
+
273
+
274
+ def _load_llama_model(text_model_path: str) -> Any:
275
+ global _LLAMA_MODEL, _LLAMA_MODEL_PATH
276
+
277
+ clean_path = text_model_path.strip()
278
+ if not clean_path:
279
+ raise ValueError("TEXT_MODEL_PATH is not configured.")
280
+ if not Path(clean_path).exists():
281
+ raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
282
+
283
+ if _LLAMA_MODEL is not None and _LLAMA_MODEL_PATH == clean_path:
284
+ return _LLAMA_MODEL
285
+
286
+ from llama_cpp import Llama
287
+
288
+ _LLAMA_MODEL = Llama(
289
+ model_path=clean_path,
290
+ n_ctx=2048,
291
+ verbose=False,
292
+ )
293
+ _LLAMA_MODEL_PATH = clean_path
294
+ return _LLAMA_MODEL
295
+
296
+
297
+ def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
298
+ return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS
299
+
300
+
301
+ def _add_text_fallback(marker: str) -> None:
302
+ if marker not in _TEXT_FALLBACKS:
303
+ _TEXT_FALLBACKS.append(marker)
304
+
305
+
306
+ def _log_text_fallback(stage: str, exc: Exception) -> None:
307
+ print(
308
+ f"[Objectverse Diary] Text runtime fell back to mock during {stage}: {type(exc).__name__}",
309
+ flush=True,
310
+ )
311
+
312
+
313
  def _character_name(object_name: str, mode: str) -> str:
314
  compact = "".join(part.capitalize() for part in object_name.split()[:2])
315
  suffix = {
src/models/vision_runner.py CHANGED
@@ -1,10 +1,14 @@
1
- """Mock object understanding for the initial MVP."""
2
 
3
  from __future__ import annotations
4
 
 
5
  from pathlib import Path
 
6
 
 
7
  from src.models.schema import ObjectInfo, ObjectUnderstanding
 
8
 
9
 
10
  KNOWN_OBJECTS = {
@@ -19,9 +23,55 @@ KNOWN_OBJECTS = {
19
  "bag": "bag",
20
  }
21
 
22
 
23
  def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
24
- """Return deterministic mock object understanding until VLM integration starts."""
25
  clean_description = description.strip()
26
  object_name = _infer_object_name(clean_description, image_path)
27
  features = _infer_features(clean_description, image_path)
@@ -36,6 +86,86 @@ def understand_object(image_path: str | None, description: str) -> ObjectUnderst
36
  )
37
 
38
 
39
  def _infer_object_name(description: str, image_path: str | None) -> str:
40
  lowered = description.lower()
41
  for keyword, name in KNOWN_OBJECTS.items():
 
1
+ """Object understanding runtime for mock and MiniCPM-V backends."""
2
 
3
  from __future__ import annotations
4
 
5
+ from dataclasses import dataclass
6
  from pathlib import Path
7
+ from typing import Any
8
 
9
+ from src.config import RuntimeSettings, get_runtime_settings
10
  from src.models.schema import ObjectInfo, ObjectUnderstanding
11
+ from src.utils.json_repair import parse_json_object
12
 
13
 
14
  KNOWN_OBJECTS = {
 
23
  "bag": "bag",
24
  }
25
 
26
+ MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
27
+ MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
28
+
29
+ _MINICPM_MODEL: Any | None = None
30
+ _MINICPM_TOKENIZER: Any | None = None
31
+ _MINICPM_MODEL_ID: str | None = None
32
+
33
+
34
+ @dataclass(frozen=True)
35
+ class VisionRunResult:
36
+ object_understanding: ObjectUnderstanding
37
+ fallbacks: list[str]
38
+
39
 
40
  def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
41
+ """Return object understanding without exposing runtime metadata."""
42
+ return understand_object_with_metadata(image_path, description).object_understanding
43
+
44
+
45
+ def understand_object_with_metadata(
46
+ image_path: str | None,
47
+ description: str,
48
+ *,
49
+ settings: RuntimeSettings | None = None,
50
+ ) -> VisionRunResult:
51
+ current = settings or get_runtime_settings()
52
+ backend = current.vision_backend.strip().lower()
53
+
54
+ if backend == "mock":
55
+ return VisionRunResult(_understand_object_mock(image_path, description), [])
56
+
57
+ if backend in MINICPM_BACKENDS:
58
+ try:
59
+ return VisionRunResult(_understand_object_minicpm(image_path, description, current), [])
60
+ except Exception as exc:
61
+ _log_vision_fallback("minicpm-v", exc)
62
+ return VisionRunResult(
63
+ _understand_object_mock(image_path, description),
64
+ ["vision-fallback-to-mock"],
65
+ )
66
+
67
+ return VisionRunResult(
68
+ _understand_object_mock(image_path, description),
69
+ [f"unknown-vision-backend-{backend}-fallback-to-mock"],
70
+ )
71
+
72
+
73
+ def _understand_object_mock(image_path: str | None, description: str) -> ObjectUnderstanding:
74
+ """Return deterministic mock object understanding for fallback-safe demos."""
75
  clean_description = description.strip()
76
  object_name = _infer_object_name(clean_description, image_path)
77
  features = _infer_features(clean_description, image_path)
 
86
  )
87
 
88
 
89
+ def _understand_object_minicpm(
90
+ image_path: str | None,
91
+ description: str,
92
+ settings: RuntimeSettings,
93
+ ) -> ObjectUnderstanding:
94
+ if not image_path:
95
+ raise ValueError("MiniCPM-V requires an uploaded image.")
96
+
97
+ model_id = settings.vision_model_id or MINICPM_DEFAULT_MODEL_ID
98
+ model, tokenizer = _load_minicpm_components(model_id)
99
+ image = _load_rgb_image(image_path)
100
+ prompt = _object_understanding_prompt(description)
101
+ messages = [{"role": "user", "content": [image, prompt]}]
102
+ raw = model.chat(image=None, msgs=messages, tokenizer=tokenizer)
103
+ if isinstance(raw, tuple):
104
+ raw = raw[0]
105
+
106
+ payload = parse_json_object(str(raw))
107
+ return ObjectUnderstanding.model_validate(payload)
108
+
109
+
110
+ def _load_minicpm_components(model_id: str) -> tuple[Any, Any]:
111
+ global _MINICPM_MODEL, _MINICPM_TOKENIZER, _MINICPM_MODEL_ID
112
+
113
+ if _MINICPM_MODEL is not None and _MINICPM_TOKENIZER is not None and _MINICPM_MODEL_ID == model_id:
114
+ return _MINICPM_MODEL, _MINICPM_TOKENIZER
115
+
116
+ import torch
117
+ from transformers import AutoModel, AutoTokenizer
118
+
119
+ model_kwargs: dict[str, Any] = {
120
+ "trust_remote_code": True,
121
+ "torch_dtype": torch.bfloat16,
122
+ }
123
+ try:
124
+ model_kwargs["attn_implementation"] = "sdpa"
125
+ model = AutoModel.from_pretrained(model_id, **model_kwargs)
126
+ except TypeError:
127
+ model_kwargs.pop("attn_implementation", None)
128
+ model = AutoModel.from_pretrained(model_id, **model_kwargs)
129
+
130
+ if torch.cuda.is_available():
131
+ model = model.eval().cuda()
132
+ elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
133
+ model = model.eval().to(device="mps", dtype=torch.float16)
134
+ else:
135
+ model = model.eval()
136
+
137
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
138
+ _MINICPM_MODEL = model
139
+ _MINICPM_TOKENIZER = tokenizer
140
+ _MINICPM_MODEL_ID = model_id
141
+ return model, tokenizer
142
+
143
+
144
+ def _load_rgb_image(image_path: str) -> Any:
145
+ from PIL import Image
146
+
147
+ return Image.open(image_path).convert("RGB")
148
+
149
+
150
+ def _object_understanding_prompt(description: str) -> str:
151
+ context = description.strip() or "No user description was provided."
152
+ return (
153
+ "You are the vision module for Objectverse Diary. Inspect the uploaded everyday object photo. "
154
+ "Return only valid JSON with exactly this shape: "
155
+ '{"object":{"name":"short object name","visible_features":["feature 1","feature 2","feature 3"],'
156
+ '"likely_context":"where this object probably is","confidence":0.0}}. '
157
+ "Use 3 to 5 concrete visible_features. confidence must be a number from 0 to 1. "
158
+ f"Optional user context: {context}"
159
+ )
160
+
161
+
162
+ def _log_vision_fallback(backend: str, exc: Exception) -> None:
163
+ print(
164
+ f"[Objectverse Diary] Vision backend '{backend}' fell back to mock: {type(exc).__name__}",
165
+ flush=True,
166
+ )
167
+
168
+
169
  def _infer_object_name(description: str, image_path: str | None) -> str:
170
  lowered = description.lower()
171
  for keyword, name in KNOWN_OBJECTS.items():
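The vision runner now returns its result together with a list of fallback markers, so callers can tell a real MiniCPM-V answer from a mock one. The following is a simplified sketch of that dispatch, not the project's code: `_mock_understanding` and `_model_understanding` are hypothetical stand-ins, and the result carries a plain string where the source stores an `ObjectUnderstanding`.

```python
from __future__ import annotations

from dataclasses import dataclass

MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}


@dataclass(frozen=True)
class VisionRunResult:
    object_name: str      # simplified; the source holds an ObjectUnderstanding
    fallbacks: list[str]  # empty when the requested backend answered


def _mock_understanding(description: str) -> str:
    # Hypothetical mock: echo the description or a generic name.
    return description.strip() or "object"


def _model_understanding(image_path: str | None, description: str) -> str:
    # Hypothetical model call; like the source, it refuses to run without an image.
    if not image_path:
        raise ValueError("MiniCPM-V requires an uploaded image.")
    return "object seen by model"


def understand(image_path: str | None, description: str, backend: str) -> VisionRunResult:
    clean = backend.strip().lower()
    if clean == "mock":
        return VisionRunResult(_mock_understanding(description), [])
    if clean in MINICPM_BACKENDS:
        try:
            return VisionRunResult(_model_understanding(image_path, description), [])
        except Exception:
            return VisionRunResult(_mock_understanding(description), ["vision-fallback-to-mock"])
    marker = f"unknown-vision-backend-{clean}-fallback-to-mock"
    return VisionRunResult(_mock_understanding(description), [marker])


no_image = understand(None, "mug", "MiniCPM-V")
unknown = understand(None, "mug", "clip")
with_image = understand("photo.png", "mug", "minicpmv")
print(no_image, unknown, with_image, sep="\n")
```

Returning the markers with the result, rather than storing them in module state, keeps the vision path free of the reset step that the text runner needs.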
src/pipeline.py CHANGED
@@ -5,10 +5,15 @@ from __future__ import annotations
5
  from datetime import datetime
6
  from pathlib import Path
7
 
8
- from src.config import TRACE_DIR
9
- from src.models.llama_cpp_runner import generate_diary, generate_persona
10
  from src.models.schema import GenerationResult
11
- from src.models.vision_runner import understand_object
12
  from src.traces.logger import build_trace, save_trace
13
 
14
 
@@ -22,9 +27,13 @@ def generate_object_diary(
22
  trace_id: str | None = None,
23
  created_at: datetime | None = None,
24
  ) -> GenerationResult:
25
- object_understanding = understand_object(image_path, description)
 
 
 
26
  persona = generate_persona(object_understanding, mode)
27
  diary = generate_diary(persona, mode)
 
28
  trace = build_trace(
29
  image_path=image_path,
30
  description=description,
@@ -34,6 +43,13 @@ def generate_object_diary(
34
  diary=diary,
35
  trace_id=trace_id,
36
  created_at=created_at,
37
  )
38
  trace_path = save_trace(trace, trace_dir) if save else ""
39
 
@@ -48,3 +64,25 @@ def generate_object_diary(
48
 
49
  def format_diary_markdown(title: str, english: str, chinese: str) -> str:
50
  return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"
5
  from datetime import datetime
6
  from pathlib import Path
7
 
8
+ from src.config import TRACE_DIR, get_runtime_settings, runtime_status
9
+ from src.models.llama_cpp_runner import (
10
+ generate_diary,
11
+ generate_persona,
12
+ get_text_runtime_fallbacks,
13
+ reset_text_runtime_fallbacks,
14
+ )
15
  from src.models.schema import GenerationResult
16
+ from src.models.vision_runner import VisionRunResult, understand_object_with_metadata
17
  from src.traces.logger import build_trace, save_trace
18
 
19
 
 
27
  trace_id: str | None = None,
28
  created_at: datetime | None = None,
29
  ) -> GenerationResult:
30
+ settings = get_runtime_settings()
31
+ vision_result = understand_object_with_metadata(image_path, description, settings=settings)
32
+ object_understanding = vision_result.object_understanding
33
+ reset_text_runtime_fallbacks()
34
  persona = generate_persona(object_understanding, mode)
35
  diary = generate_diary(persona, mode)
36
+ text_fallbacks = get_text_runtime_fallbacks()
37
  trace = build_trace(
38
  image_path=image_path,
39
  description=description,
 
43
  diary=diary,
44
  trace_id=trace_id,
45
  created_at=created_at,
46
+ model_runtime=runtime_status(settings),
47
+ fallbacks=_runtime_fallbacks(
48
+ settings.vision_backend,
49
+ settings.text_backend,
50
+ vision_result,
51
+ text_fallbacks,
52
+ ),
53
  )
54
  trace_path = save_trace(trace, trace_dir) if save else ""
55
 
 
64
 
65
  def format_diary_markdown(title: str, english: str, chinese: str) -> str:
66
  return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"
+
+
+def _runtime_fallbacks(
+    vision_backend: str,
+    text_backend: str,
+    vision_result: VisionRunResult,
+    text_fallbacks: list[str] | None = None,
+) -> list[str]:
+    clean_vision_backend = vision_backend.strip().lower()
+    clean_text_backend = text_backend.strip().lower()
+    if clean_vision_backend == "mock" and clean_text_backend == "mock":
+        return ["mock-runtime"]
+
+    fallbacks = list(vision_result.fallbacks)
+    for marker in text_fallbacks or []:
+        if marker not in fallbacks:
+            fallbacks.append(marker)
+    if clean_vision_backend == "mock":
+        fallbacks.append("mock-vision-runtime")
+    if clean_text_backend == "mock":
+        fallbacks.append("mock-text-runtime")
+    return fallbacks
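How `_runtime_fallbacks` merges the vision and text markers is easiest to see with a few concrete inputs. The sketch below restates the merge under one simplification: it takes the vision markers as a plain list (`vision_fallbacks`, a hypothetical parameter) instead of a `VisionRunResult`, so that it runs on its own.

```python
from __future__ import annotations


def runtime_fallbacks(
    vision_backend: str,
    text_backend: str,
    vision_fallbacks: list[str],
    text_fallbacks: list[str] | None = None,
) -> list[str]:
    clean_vision = vision_backend.strip().lower()
    clean_text = text_backend.strip().lower()
    # A fully mock run keeps the original single marker.
    if clean_vision == "mock" and clean_text == "mock":
        return ["mock-runtime"]

    merged = list(vision_fallbacks)  # copy so the caller's list is not mutated
    for marker in text_fallbacks or []:
        if marker not in merged:
            merged.append(marker)
    # Record which half of the pipeline was mock by configuration.
    if clean_vision == "mock":
        merged.append("mock-vision-runtime")
    if clean_text == "mock":
        merged.append("mock-text-runtime")
    return merged


all_mock = runtime_fallbacks("mock", "mock", [])
vision_failed = runtime_fallbacks("minicpm-v", "mock", ["vision-fallback-to-mock"])
text_failed = runtime_fallbacks("minicpm-v", "llama-cpp", [], ["text-fallback-to-mock"])
clean_run = runtime_fallbacks("minicpm-v", "llama-cpp", [])
print(all_mock, vision_failed, text_failed, clean_run, sep="\n")
```

An empty list then means both configured backends answered without falling back, which is the case a trace reader most likely wants to filter for.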
src/prompts/diary_generation.py CHANGED
@@ -1,6 +1,32 @@
1
- """Prompt placeholder for future secret diary generation."""
2
 
3
  DIARY_GENERATION_PROMPT = """
4
- Write a short secret diary entry in English first, with Chinese helper translation.
5
- Keep the object persona consistent.
6
  """.strip()
 
1
+ """Prompt templates for diary and chat generation."""
2
 
3
  DIARY_GENERATION_PROMPT = """
4
+ Write a short secret diary entry for the object persona. Return only valid JSON
5
+ with exactly this shape:
6
+
7
+ {
8
+ "title": "Secret Diary - Day N",
9
+ "english": "one vivid English-first diary paragraph",
10
+ "chinese": "short Chinese helper translation"
11
+ }
12
+
13
+ Rules:
14
+ - Keep the persona consistent with the supplied persona JSON.
15
+ - Keep the English diary under 120 words.
16
+ - The Chinese text is secondary helper copy, not the primary UI language.
17
+ - Do not include markdown, commentary, or extra keys.
18
+ """.strip()
19
+
20
+ CHAT_REPLY_PROMPT = """
21
+ Reply as the object persona to the user's message. Return only valid JSON with
22
+ exactly this shape:
23
+
24
+ {
25
+ "reply": "one short in-character chat reply"
26
+ }
27
+
28
+ Rules:
29
+ - Stay consistent with the persona JSON.
30
+ - Keep the reply under 70 words.
31
+ - Do not include markdown, commentary, or extra keys.
32
  """.strip()
src/prompts/persona_generation.py CHANGED
@@ -1,7 +1,27 @@
1
- """Prompt placeholder for future persona generation."""
2
 
3
  PERSONA_GENERATION_PROMPT = """
4
- Create a hidden first-person object persona with name, mood, backstory,
5
- complaint, secret fear, core memory, and exactly three tags.
6
- Return structured JSON only.
7
  """.strip()
 
1
+ """Prompt templates for persona generation."""
2
 
3
  PERSONA_GENERATION_PROMPT = """
4
+ You are the text runtime for Objectverse Diary, a strange archive of everyday
5
+ objects with secret lives.
6
+
7
+ Create a hidden first-person object persona from the object understanding JSON
8
+ and personality mode. Return only valid JSON with exactly this shape:
9
+
10
+ {
11
+ "persona": {
12
+ "object_name": "short object name",
13
+ "character_name": "archive character name",
14
+ "mood": "short mood phrase",
15
+ "secret_fear": "one vivid fear",
16
+ "core_memory": "one sentence backstory",
17
+ "complaint": "one sentence complaint in the object's voice",
18
+ "tags": ["tag one", "tag two", "tag three"]
19
+ }
20
+ }
21
+
22
+ Rules:
23
+ - Keep the persona consistent with the visible object features.
24
+ - Use English output.
25
+ - Use exactly three tags.
26
+ - Do not include markdown, commentary, or extra keys.
27
  """.strip()
src/traces/logger.py CHANGED
@@ -1,4 +1,4 @@
1
- """Trace builder and saver for mock MVP runs."""
2
 
3
  from __future__ import annotations
4
 
@@ -7,7 +7,7 @@ from datetime import datetime, timezone
7
  from pathlib import Path
8
  from uuid import uuid4
9
 
10
- from src.config import MODEL_RUNTIME_STATUS, TRACE_DIR
11
  from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
12
  from src.traces.anonymizer import anonymize_text
13
 
@@ -21,6 +21,8 @@ def build_trace(
21
  diary: DiaryEntry,
22
  trace_id: str | None = None,
23
  created_at: datetime | None = None,
 
 
24
  ) -> TraceRecord:
25
  return TraceRecord(
26
  trace_id=trace_id or uuid4().hex,
@@ -34,8 +36,8 @@ def build_trace(
34
  object_understanding=object_understanding,
35
  persona=persona,
36
  diary=diary,
37
- model_runtime=MODEL_RUNTIME_STATUS,
38
- fallbacks=["mock-runtime"],
39
  )
40
 
41
 
 
1
+ """Trace builder and saver for generation runs."""
2
 
3
  from __future__ import annotations
4
 
 
7
  from pathlib import Path
8
  from uuid import uuid4
9
 
10
+ from src.config import TRACE_DIR, get_runtime_settings, runtime_status
11
  from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
12
  from src.traces.anonymizer import anonymize_text
13
 
 
21
  diary: DiaryEntry,
22
  trace_id: str | None = None,
23
  created_at: datetime | None = None,
24
+ model_runtime: dict[str, str] | None = None,
25
+ fallbacks: list[str] | None = None,
26
  ) -> TraceRecord:
27
  return TraceRecord(
28
  trace_id=trace_id or uuid4().hex,
 
36
  object_understanding=object_understanding,
37
  persona=persona,
38
  diary=diary,
39
+ model_runtime=model_runtime or runtime_status(get_runtime_settings()),
40
+ fallbacks=fallbacks if fallbacks is not None else ["mock-runtime"],
41
  )
42
 
43
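The two new `build_trace` arguments use different default checks: `model_runtime` falls back with `or`, while `fallbacks` uses `is not None`. The difference matters for empty values, as the sketch below suggests. `resolve_trace_defaults` and its default runtime dictionary are hypothetical and exist only to isolate the two expressions.

```python
from __future__ import annotations


def resolve_trace_defaults(
    model_runtime: dict[str, str] | None = None,
    fallbacks: list[str] | None = None,
) -> tuple[dict[str, str], list[str]]:
    # `or` replaces any falsy value, so an empty dict also triggers the default.
    runtime = model_runtime or {"vision": "mock object understanding"}
    # `is not None` keeps an explicit empty list, meaning "no fallbacks happened".
    marks = fallbacks if fallbacks is not None else ["mock-runtime"]
    return runtime, marks


default_runtime, default_marks = resolve_trace_defaults()
_, explicit_empty = resolve_trace_defaults(fallbacks=[])
empty_runtime, _ = resolve_trace_defaults(model_runtime={})
print(default_marks, explicit_empty, empty_runtime)
```

Keeping an explicit empty list distinct from `None` lets a fully real run record that nothing fell back, instead of silently inheriting the `mock-runtime` marker.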
 
src/ui/layout.py CHANGED
@@ -15,6 +15,7 @@ from src.models.schema import GenerationResult
15
  from src.pipeline import format_diary_markdown, generate_object_diary
16
  from src.renderer.share_card import render_share_card
17
  from src.ui import copy
 
18
 
19
  CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
20
 
@@ -234,6 +235,7 @@ def _example_handler(index: int):
234
  return load_example
235
 
236
 
 
237
  def generate_object_file(
238
  image_path: str | None,
239
  description: str,
 
15
  from src.pipeline import format_diary_markdown, generate_object_diary
16
  from src.renderer.share_card import render_share_card
17
  from src.ui import copy
18
+ from src.utils.zero_gpu import zero_gpu
19
 
20
  CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
21
 
 
235
  return load_example
236
 
237
 
238
+ @zero_gpu(duration=180)
239
  def generate_object_file(
240
  image_path: str | None,
241
  description: str,
src/utils/json_repair.py CHANGED
@@ -7,7 +7,24 @@ from typing import Any
7
 
8
 
9
  def parse_json_object(raw: str) -> dict[str, Any]:
10
- value = json.loads(raw)
11
  if not isinstance(value, dict):
12
  raise ValueError("Expected a JSON object.")
13
  return value
7
 
8
 
 def parse_json_object(raw: str) -> dict[str, Any]:
+    value = json.loads(_extract_json_object(raw))
     if not isinstance(value, dict):
         raise ValueError("Expected a JSON object.")
     return value
+
+
+def _extract_json_object(raw: str) -> str:
+    clean = raw.strip()
+    if clean.startswith("```"):
+        clean = clean.strip("`").strip()
+        if clean.lower().startswith("json"):
+            clean = clean[4:].strip()
+
+    if clean.startswith("{") and clean.endswith("}"):
+        return clean
+
+    start = clean.find("{")
+    end = clean.rfind("}")
+    if start == -1 or end == -1 or end <= start:
+        raise ValueError("No JSON object found.")
+    return clean[start : end + 1]
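The extraction helper handles the two most common ways a model wraps its JSON: a Markdown code fence and surrounding chatter. The sketch below mirrors the logic from this diff so it can be run standalone. The sample inputs are invented for illustration, the nesting of the `json` language-tag check inside the fence check is an assumption, and the fence marker is built as `"`" * 3` purely to keep the example easy to embed.

```python
from __future__ import annotations

import json
from typing import Any

FENCE = "`" * 3  # three backticks, built indirectly for readability


def extract_json_object(raw: str) -> str:
    clean = raw.strip()
    if clean.startswith(FENCE):
        # Drop the fence characters, then an optional "json" language tag.
        clean = clean.strip("`").strip()
        if clean.lower().startswith("json"):
            clean = clean[4:].strip()

    if clean.startswith("{") and clean.endswith("}"):
        return clean

    # Otherwise take the span from the first "{" to the last "}".
    start = clean.find("{")
    end = clean.rfind("}")
    if start == -1 or end == -1 or end <= start:
        raise ValueError("No JSON object found.")
    return clean[start : end + 1]


def parse_json_object(raw: str) -> dict[str, Any]:
    value = json.loads(extract_json_object(raw))
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object.")
    return value


fenced = parse_json_object(FENCE + 'json\n{"reply": "hi"}\n' + FENCE)
chatty = parse_json_object('Sure! {"a": 1} done')
try:
    parse_json_object("no json here")
    missing_error = ""
except ValueError as exc:
    missing_error = str(exc)
print(fenced, chatty, missing_error)
```

The first-brace-to-last-brace rule is deliberately crude: it recovers a single object surrounded by prose, but if a reply contains two separate objects the combined span is not valid JSON and `json.loads` raises, which the callers in this commit treat as a reason to fall back to mock output.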
src/utils/zero_gpu.py ADDED
@@ -0,0 +1,23 @@
+"""Optional Hugging Face ZeroGPU decorator helpers."""
+
+from __future__ import annotations
+
+from collections.abc import Callable
+from typing import TypeVar
+
+
+F = TypeVar("F", bound=Callable)
+
+
+def zero_gpu(duration: int = 180) -> Callable[[F], F]:
+    """Return a ZeroGPU decorator when available, otherwise a no-op decorator."""
+    try:
+        import spaces  # type: ignore[import-not-found]
+    except Exception:
+        return _identity_decorator
+
+    return spaces.GPU(duration=duration)
+
+
+def _identity_decorator(func: F) -> F:
+    return func