Space: build-small-hackathon/ObjectverseDiary

qqyule committed on Jun 6

Commit e20e3d9 (verified) · 1 parent: 3805824

Add ZeroGPU-compatible validation path

Files changed (27)

README.md +23 -5
docs/03-dev-schedule.md +23 -13
docs/07-development-plan.md +29 -15
docs/DEVELOPMENT_STATUS.md +60 -0
docs/EXTERNAL_SETUP.md +44 -13
docs/FAILURES.md +2 -1
docs/INITIAL_STAGE_REPORT.md +31 -13
docs/MODEL_CARD.md +8 -8
docs/README.md +2 -0
docs/RUNTIME.md +52 -3
docs/SPACE_VLM_REPORT.md +42 -0
docs/SUBMISSION_GUIDE.md +19 -2
pyproject.toml +8 -2
requirements.txt +6 -0
scripts/README.md +16 -1
scripts/check_space_vlm.py +481 -0
src/README.md +1 -1
src/config.py +17 -12
src/models/llama_cpp_runner.py +225 -3
src/models/vision_runner.py +132 -2
src/pipeline.py +42 -4
src/prompts/diary_generation.py +29 -3
src/prompts/persona_generation.py +24 -4
src/traces/logger.py +6 -4
src/ui/layout.py +2 -0
src/utils/json_repair.py +18 -1
src/utils/zero_gpu.py +23 -0

README.md CHANGED

@@ -5,7 +5,7 @@ colorFrom: yellow
 colorTo: gray
 sdk: gradio
 sdk_version: 5.50.0
-python_version: '3.12'
 app_file: app.py
 pinned: false
 license: mit
@@ -23,9 +23,13 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
 ## Current Status
-Initial mock MVP is available.
-The app currently uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. Real MiniCPM-V and llama.cpp model runtimes are not connected yet.
 ## Track
@@ -71,6 +75,19 @@ python app.py
 Then open the local Gradio URL printed in the terminal.
 ## Initial MVP Flow
 The current implementation supports:
@@ -120,7 +137,7 @@ This creates deterministic mock SFT preview data for schema and curation plannin
 ```
 See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
-See `docs/EXTERNAL_SETUP.md` before creating remote GitHub or Hugging Face resources.
 ## Project Structure
@@ -128,7 +145,7 @@ See `docs/02-tech-architecture.md`, `AGENTS.md`, and `.codex/skills/` for the in
 ## Runtime Notes
-The current runtime is mock-only. See `docs/RUNTIME.md` for configuration keys and the future MiniCPM-V / llama.cpp boundary.
 ## HF Space README YAML Header
@@ -139,6 +156,7 @@ emoji: 🗝️
 colorFrom: amber
 colorTo: gray
 sdk: gradio
 app_file: app.py
 pinned: false
 ---

 colorTo: gray
 sdk: gradio
 sdk_version: 5.50.0
+python_version: '3.10'
 app_file: app.py
 pinned: false
 license: mit
 ## Current Status
+Initial mock MVP, MiniCPM-V vision backend wiring, and optional llama.cpp text runtime wiring are available.
+By default, the app still uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the real MiniCPM-V 2.6 vision path. `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured.
+Hugging Face Space:
+https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 ## Track
 Then open the local Gradio URL printed in the terminal.
+## Optional llama.cpp Text Runtime
+The project does not commit GGUF files or require `llama-cpp-python` by default. To try a local GGUF text model:
+```bash
+pip install llama-cpp-python
+OBJECTVERSE_TEXT_BACKEND=llama-cpp \
+TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
+python app.py
+```
+If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
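The fallback conditions described in the paragraph above can be sketched as a single decision function. This is a minimal illustration, not the code in `src/models/llama_cpp_runner.py`: the environment variable names and the `text-fallback-to-mock` marker come from the README, while `generate_text`, the `run_model` hook, and the mock payload shape are hypothetical names introduced only for this example.

```python
import importlib.util
import json
import os
from typing import Callable, Mapping, Optional

FALLBACK_MARKER = "text-fallback-to-mock"  # marker name taken from the README


def llama_cpp_installed() -> bool:
    """Return True when the optional llama-cpp-python package is importable."""
    return importlib.util.find_spec("llama_cpp") is not None


def generate_text(
    prompt: str,
    env: Mapping[str, str] = os.environ,
    run_model: Optional[Callable[[str, str], str]] = None,  # hypothetical hook
    package_available: Callable[[], bool] = llama_cpp_installed,
) -> tuple[dict, list[str]]:
    """Return (payload, trace_markers), falling back to mock on any failure."""
    mock = {"source": "mock", "text": f"mock reply for: {prompt}"}
    if env.get("OBJECTVERSE_TEXT_BACKEND", "mock") != "llama-cpp":
        # Mock is the configured backend here, so this is not a fallback.
        return mock, []
    model_path = env.get("TEXT_MODEL_PATH", "")
    if not model_path or not package_available() or run_model is None:
        return mock, [FALLBACK_MARKER]
    try:
        # run_model(model_path, prompt) is assumed to return raw model text.
        payload = json.loads(run_model(model_path, prompt))
    except Exception:  # model load failure or invalid JSON
        return mock, [FALLBACK_MARKER]
    if not isinstance(payload, dict):
        return mock, [FALLBACK_MARKER]
    return payload, []
```

Injecting `run_model` and `package_available` keeps the sketch testable on machines without `llama-cpp-python` or a GGUF file, which matches the stated goal of keeping the package optional.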
 ## Initial MVP Flow
 The current implementation supports:
 ```
 See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
+See `docs/EXTERNAL_SETUP.md` before changing remote GitHub or Hugging Face resources.
 ## Project Structure
 ## Runtime Notes
+The default runtime is mock-only. MiniCPM-V 2.6 vision and optional llama.cpp text generation can be enabled with environment variables while preserving mock fallbacks. See `docs/RUNTIME.md`.
 ## HF Space README YAML Header
 colorFrom: amber
 colorTo: gray
 sdk: gradio
+python_version: '3.10'
 app_file: app.py
 pinned: false
 ---

docs/03-dev-schedule.md CHANGED

@@ -11,8 +11,9 @@
 **Goal: lock down the project's fixed scope.**
-- [ ] Create GitHub repo
-- [ ] Create Hugging Face Space
 - [x] Create basic Gradio app
 - [x] Write README draft
 - [x] Finalize English main UI copy
@@ -46,14 +47,18 @@
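 **Goal: make the AI actually look at the image.**
-- [ ] Integrate MiniCPM-V or a lightweight VLM
-- [ ] Output object understanding JSON
-- [ ] Implement JSON repair
-- [ ] Add example gallery
 - [ ] Cache example outputs
 Acceptance: upload a mug/keyboard/shoe; the model identifies the object and extracts its appearance features.
 ---
 ## Day 4: Text model + llama.cpp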
@@ -61,12 +66,14 @@
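 **Goal: run core persona generation through local small-model inference.**
 - [ ] Download a small GGUF model
-- [ ] Get llama.cpp / llama-cpp-python running end to end
-- [ ] Wrap `generate_persona()`
-- [ ] Wrap `generate_diary()`
-- [ ] Document parameter count and how to run in README
-Deliverables: `models/text_model.gguf`, `src/models/llama_cpp_runner.py`, `scripts/run_llama_cpp.sh`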
 ---
@@ -144,7 +151,7 @@ Bottom: Share Card + Trace
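 - [x] Write English primary copy + Chinese secondary copy
 - [x] Create 6 example cards
-Completion record: Phase 2 UI is complete as a mock runtime archive dashboard. Real VLM, llama.cpp, LoRA, and Hugging Face Space are still not integrated; `UI 参考/` is used only as a local visual reference and is not committed.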
 ---
@@ -158,7 +165,9 @@ Bottom: Share Card + Trace
 - [x] dataset preview
 - [x] trace JSONL export
 - [x] Failure case records
-- [ ] GitHub repo cleanup
 ---
@@ -205,6 +214,7 @@ Bottom: Share Card + Trace
 ## Day 11: Submission check
 - [ ] Space under official org
 - [ ] Demo video ready
 - [ ] Social post ready
 - [ ] README complete

 **Goal: lock down the project's fixed scope.**
+- [x] Configure GitHub origin
+- [ ] Confirm and sync GitHub repo
+- [x] Create Hugging Face Space
 - [x] Create basic Gradio app
 - [x] Write README draft
 - [x] Finalize English main UI copy
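 **Goal: make the AI actually look at the image.**
+- [x] Integrate MiniCPM-V or a lightweight VLM
+- [x] Output object understanding JSON
+- [x] Implement JSON repair
+- [x] Add example gallery
+- [x] Add Space VLM validation script
 - [ ] Cache example outputs
+- [ ] Space 1x L4 real-image validation (attempted 2026-06-06; blocked by HF `402 Payment Required`; rolled back to mock-safe)
 Acceptance: upload a mug/keyboard/shoe; the model identifies the object and extracts its appearance features.
+Completion record: MiniCPM-V 2.6 is integrated as a configurable vision backend; the default is still mock vision. `scripts/check_space_vlm.py` can validate mug/keyboard/shoe on the Space using three temporary public images. A switch to L4 was attempted on 2026-06-06, but Hugging Face returned `402 Payment Required`, which requires organization billing/pre-paid credits; a mock-safe rollback was then performed. Text generation has optional llama.cpp runtime wiring, but the final GGUF model has not yet been selected or downloaded.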
 ---
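 ## Day 4: Text model + llama.cpp
 **Goal: run core persona generation through local small-model inference.**
 - [ ] Download a small GGUF model
+- [x] Add optional llama.cpp / llama-cpp-python runtime wiring
+- [x] Wrap `generate_persona()`
+- [x] Wrap `generate_diary()`
+- [x] Document how to run in README
+- [ ] Run a local smoke test with a real GGUF
+- [ ] Document final model parameter count in README
+Deliverables: `src/models/llama_cpp_runner.py` now supports `TEXT_MODEL_PATH`; `models/text_model.gguf` is not committed. The real GGUF, parameter count, and training/publishing path still need to be decided.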
 ---
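 - [x] Write English primary copy + Chinese secondary copy
 - [x] Create 6 example cards
+Completion record: Phase 2 UI is complete as an archive dashboard. The MiniCPM-V 2.6 vision backend and optional llama.cpp runtime wiring are integrated but still default to mock; LoRA is not integrated; `UI 参考/` is used only as a local visual reference and is not committed.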
 ---
 - [x] dataset preview
 - [x] trace JSONL export
 - [x] Failure case records
+- [x] Space VLM validation report template
+- [ ] Real model traces
+- [ ] GitHub repo sync and cleanup
 ---
 ## Day 11: Submission check
 - [ ] Space under official org
+- [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe
 - [ ] Demo video ready
 - [ ] Social post ready
 - [ ] README complete

docs/07-development-plan.md CHANGED

@@ -8,7 +8,7 @@ The plan is intentionally staged. Each phase has a clear goal, implementation sc
 ## Current Baseline
-As of 2026-06-05, the project has:
 - initialized project structure
 - root README and AGENTS instructions
@@ -30,13 +30,17 @@ As of 2026-06-05, the project has:
 - stdlib unittest smoke tests for the mock MVP
 - runtime configuration boundary documented in `docs/RUNTIME.md`
 - initial-stage acceptance script at `scripts/check_initial_stage.py`
 Not yet done:
-- GitHub repo creation
-- Hugging Face Space creation
-- real MiniCPM-V or fallback VLM integration
-- real llama.cpp / llama-cpp-python text runtime
 - real curated dataset
 - LoRA fine-tuning
 - model card completion
@@ -111,6 +115,8 @@ Verification:
 Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
 Scope:
 - Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
@@ -130,15 +136,18 @@ Verification:
 - Run local sample image checks.
 - Confirm schema validation.
 - Confirm fallback trace markers.
 ## Phase 4 — Text Runtime With llama.cpp
 Goal: make persona, diary, and chat generation use a small local text model runtime.
 Scope:
-- Add llama.cpp / llama-cpp-python runner.
-- Add model path configuration.
 - Preserve `src/pipeline.py` as the UI-independent generation boundary.
 - Implement persona generation.
 - Implement diary generation.
@@ -148,12 +157,12 @@ Scope:
 Exit criteria:
 - Text generation can run through llama.cpp or documented local fallback.
-- README documents model size and runtime path.
 - Trace records include runtime metadata.
 Verification:
-- Local runtime smoke test.
 - JSON schema validation.
 - Compare at least three object generations for persona consistency.
@@ -161,6 +170,8 @@ Verification:
 Goal: prepare Well-Tuned badge evidence.
 Scope:
 - Use `scripts/generate_dataset.py` to validate the SFT schema locally.
@@ -237,13 +248,15 @@ Verification:
 Goal: deploy the app in the required Gradio format.
 Scope:
-- Create Hugging Face Space.
-- Add Space README YAML header.
-- Confirm `app_file: app.py`.
-- Configure model paths and fallback mode.
-- Check runtime resource constraints.
 Exit criteria:
@@ -253,8 +266,9 @@ Exit criteria:
 Verification:
-- Launch on HF Space.
 - Run demo flow in hosted environment.
 - Check logs for missing secrets or path errors.
 ## Phase 9 — Field Notes And Demo Video

 ## Current Baseline
+As of 2026-06-06, the project has:
 - initialized project structure
 - root README and AGENTS instructions
 - stdlib unittest smoke tests for the mock MVP
 - runtime configuration boundary documented in `docs/RUNTIME.md`
 - initial-stage acceptance script at `scripts/check_initial_stage.py`
+- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
+- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
+- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
+- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
+- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`
 Not yet done:
+- GitHub repo sync / public submission confirmation
+- hosted Space L4 MiniCPM-V validation with real public images
+- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
 - real curated dataset
 - LoRA fine-tuning
 - model card completion
 Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
+Status: local wiring complete; hosted GPU validation pending.
 Scope:
 - Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
 - Run local sample image checks.
 - Confirm schema validation.
 - Confirm fallback trace markers.
+- Run `scripts/check_space_vlm.py --configure-space` after external-state confirmation.
 ## Phase 4 — Text Runtime With llama.cpp
 Goal: make persona, diary, and chat generation use a small local text model runtime.
+Status: optional runtime wiring complete; real GGUF smoke test pending.
 Scope:
+- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
+- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
 - Preserve `src/pipeline.py` as the UI-independent generation boundary.
 - Implement persona generation.
 - Implement diary generation.
 Exit criteria:
 - Text generation can run through llama.cpp or documented local fallback.
+- README documents runtime path. Final model size remains pending until GGUF selection.
 - Trace records include runtime metadata.
 Verification:
+- Local runtime smoke test with a real GGUF.
 - JSON schema validation.
 - Compare at least three object generations for persona consistency.
 Goal: prepare Well-Tuned badge evidence.
+Status: mock SFT preview complete; real candidate generation waits for verified model paths.
 Scope:
 - Use `scripts/generate_dataset.py` to validate the SFT schema locally.
 Goal: deploy the app in the required Gradio format.
+Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
 Scope:
+- Create Hugging Face Space. Completed.
+- Add Space README YAML header. Completed.
+- Confirm `app_file: app.py`. Completed.
+- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
+- Check runtime resource constraints. Pending L4 validation.
 Exit criteria:
 Verification:
+- Launch on HF Space. Completed for mock-safe runtime.
 - Run demo flow in hosted environment.
+- Run Space VLM validation for mug, keyboard, and shoe.
 - Check logs for missing secrets or path errors.
 ## Phase 9 — Field Notes And Demo Video

docs/DEVELOPMENT_STATUS.md ADDED

	@@ -0,0 +1,60 @@

+# Development Status
+Last updated: 2026-06-06
+## Completed
+- Project skeleton, README, AGENTS instructions, and Gradio app entrypoint.
+- Mock MVP flow: upload/description, personality mode, object JSON, persona JSON, diary, object chat, share card, and trace saving.
+- Archive-style Gradio UI with English-first / Chinese-second copy and six stable examples.
+- Trace and dataset tooling:
+  - six public mock sample traces
+  - public trace JSONL export
+  - deterministic SFT preview JSONL
+  - initial-stage acceptance script
+- Hugging Face Space created: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
+- MiniCPM-V 2.6 optional vision backend wiring with mock fallback.
+- Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
+- Space VLM validation tooling:
+  - `scripts/check_space_vlm.py`
+  - failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
+- Local tests and initial acceptance currently pass.
+## Not Completed
+- Hosted Space 1x L4 MiniCPM-V validation with real public mug/keyboard/shoe images. Attempted on 2026-06-06 and blocked by Hugging Face `402 Payment Required` for paid hardware; mock-safe rollback was applied.
+- Stable example output caching for real VLM demos.
+- Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
+- Final text model parameter count documentation.
+- Real model traces and curated object-persona dataset.
+- LoRA training, adapter/model export, GGUF conversion, and Hugging Face model publishing.
+- Hugging Face dataset publishing.
+- GitHub sync / final public repository confirmation.
+- Field Notes article, demo video, social post, and final submission package.
+## Current Safe Defaults
+- `OBJECTVERSE_VISION_BACKEND=mock`
+- `OBJECTVERSE_TEXT_BACKEND=mock`
+- No commercial model API is used.
+- GGUF files, tokens, credentials, and private images should not be committed.
+## Next Recommended Gate
+Unblock Hugging Face paid hardware access or choose another available GPU option, then rerun the hosted Space VLM validation:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py \
+  --configure-space \
+  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --output docs/SPACE_VLM_REPORT.md
+```
+If Space validation fails or GPU is unavailable, roll back to mock-safe settings:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py \
+  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --skip-validation \
+  --rollback-to-mock
+```
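The two commands above rely on a small set of flags. A minimal sketch of how such a command line could be parsed is shown here; the flag names are the ones used in the commands, but the parser layout, the defaults, and the `space_repo_id` helper are assumptions for illustration and are not taken from `scripts/check_space_vlm.py`.

```python
import argparse
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the validation and rollback commands."""
    parser = argparse.ArgumentParser(description="Validate the hosted Space VLM path.")
    parser.add_argument("--space-url", required=True, help="Full Space URL.")
    parser.add_argument("--configure-space", action="store_true",
                        help="Request hardware and set Space variables first.")
    parser.add_argument("--skip-validation", action="store_true",
                        help="Do not run the image checks.")
    parser.add_argument("--rollback-to-mock", action="store_true",
                        help="Restore mock-safe settings.")
    parser.add_argument("--output", default=None, help="Report path (assumed optional).")
    return parser


def space_repo_id(space_url: str) -> str:
    """Extract 'owner/name' from a https://huggingface.co/spaces/owner/name URL."""
    parts = [p for p in urlparse(space_url).path.split("/") if p]
    if len(parts) != 3 or parts[0] != "spaces":
        raise ValueError(f"not a Space URL: {space_url}")
    return f"{parts[1]}/{parts[2]}"
```

Deriving the repo id from the URL keeps a single source of truth on the command line, which is consistent with the report listing both the Space URL and the `build-small-hackathon/ObjectverseDiary` repo.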

docs/EXTERNAL_SETUP.md CHANGED

@@ -8,16 +8,18 @@ These actions change external account state and should only be run after explici
 ## GitHub Repository
-Suggested repository name:
 ```text
-objectverse-diary
 ```
-Suggested visibility:
 ```text
-public
 ```
 Suggested description:
@@ -26,7 +28,7 @@ Suggested description:
 Small-model AI toy that turns everyday objects into secret diary characters.
 ```
-Recommended manual command after confirmation:
 ```bash
 gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
@@ -36,13 +38,13 @@ Do not push until the user confirms the remote target and branch.
 ## Hugging Face Space
-Suggested Space name:
 ```text
-objectverse-diary
 ```
-Suggested SDK:
 ```text
 gradio
@@ -57,17 +59,46 @@ emoji: 🗝️
 colorFrom: amber
 colorTo: gray
 sdk: gradio
 app_file: app.py
 pinned: false
 ---
 ```
-Recommended setup before deployment:
-- confirm target Hugging Face account or organization
-- confirm public visibility
-- confirm whether the Space should start with mock runtime
-- confirm whether sample traces should be included in the first push
 ## Safety Notes

 ## GitHub Repository
+Local `origin` is already configured:
 ```text
+https://github.com/qqyule/Objectverse-Diary.git
 ```
+Use this section to confirm the remote target and branch before pushing. Do not create a second repository unless the target changes.
+Originally suggested repository name:
 ```text
+objectverse-diary
 ```
 Suggested description:
 Small-model AI toy that turns everyday objects into secret diary characters.
 ```
+If a new repository is ever needed after confirmation:
 ```bash
 gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
 ## Hugging Face Space
+Created Space:
 ```text
+https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
 ```
+SDK:
 ```text
 gradio
 colorFrom: amber
 colorTo: gray
 sdk: gradio
+python_version: '3.10'
 app_file: app.py
 pinned: false
 ---
 ```
+Recommended runtime setup:
+- set `OBJECTVERSE_VISION_BACKEND=minicpm-v`
+- set `VISION_MODEL_ID=openbmb/MiniCPM-V-2_6`
+- set `OBJECTVERSE_TEXT_BACKEND=mock`
+- use 1x Nvidia L4 for MiniCPM-V 2.6
+- switch vision backend back to `mock` if GPU is unavailable
+Automated validation command after confirmation:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py \
+  --configure-space \
+  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --output docs/SPACE_VLM_REPORT.md
+```
+Optional rollback to mock-safe settings:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py \
+  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --skip-validation \
+  --rollback-to-mock
+```
+The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
+2026-06-06 validation attempt:
+- `--configure-space` was run for `l4x1`.
+- Hugging Face returned `402 Payment Required` for paid hardware on the `build-small-hackathon` organization.
+- Mock-safe rollback was run afterward.
+- Next unblock step: enable billing/pre-paid credits or choose an available free GPU option before rerunning validation.
 ## Safety Notes

docs/FAILURES.md CHANGED

@@ -8,7 +8,7 @@ Use it for model/runtime/deployment/data issues, not for UI polish notes.
 ## Current Status
-No real model or hosted Space failures have been observed yet because the current implementation uses deterministic mock runtimes.
 Known non-blocking warning:
@@ -43,6 +43,7 @@ Fallback:
 - use manual object description
 - use stable example flow
 - record fallback marker in trace
 ### Text Runtime

 ## Current Status
+MiniCPM-V 2.6 is wired as an optional vision backend. No hosted Space GPU failures have been observed yet because Space GPU validation is still pending.
 Known non-blocking warning:
 - use manual object description
 - use stable example flow
 - record fallback marker in trace
+- `vision-fallback-to-mock` means MiniCPM-V failed or returned invalid JSON and mock object understanding was used.
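The `vision-fallback-to-mock` rule in the list above can be illustrated with a repair-then-validate step. This is a rough sketch under stated assumptions: the contents of `src/utils/json_repair.py` are not shown in this diff, so `repair_json`, `understand_object`, and the `REQUIRED_KEYS` field names are hypothetical stand-ins, and the real schema lives in the project's Pydantic models.

```python
import json
import re

# Hypothetical required fields; the real schema is defined elsewhere.
REQUIRED_KEYS = {"object_name", "visual_features"}


def repair_json(raw: str):
    """Best-effort parse: try the text as-is, then the outermost {...} span."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    # Drop trailing commas before a closing brace or bracket.
    # Caveat: this naive pattern would also touch such sequences inside strings.
    candidate = re.sub(r",\s*([}\]])", r"\1", match.group(0))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def understand_object(raw_model_output: str, mock_result: dict):
    """Return (result, markers); use the mock result when repair or validation fails."""
    parsed = repair_json(raw_model_output)
    if isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed):
        return parsed, []
    return mock_result, ["vision-fallback-to-mock"]
```

Returning the marker alongside the result lets the trace record why mock output was shown without raising an error in the UI.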
 ### Text Runtime

docs/INITIAL_STAGE_REPORT.md CHANGED

@@ -19,15 +19,29 @@ Included:
 - runtime configuration boundary
 - local acceptance checks
-Not included:
 - creating the remote GitHub repository
-- creating the Hugging Face Space
-- real MiniCPM-V integration
-- real llama.cpp / llama-cpp-python text runtime
 - fine-tuning, dataset publishing, Field Notes, and demo video
-Remote GitHub and Hugging Face actions require explicit confirmation because they change external state.
 ## Local Deliverables
@@ -35,8 +49,8 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
 | --- | --- |
 | Gradio app entrypoint | `app.py` |
 | Shared generation pipeline | `src/pipeline.py` |
-| Mock vision runner | `src/models/vision_runner.py` |
-| Mock text runner | `src/models/llama_cpp_runner.py` |
 | Pydantic schemas | `src/models/schema.py` |
 | Share card renderer | `src/renderer/share_card.py` |
 | Trace logger | `src/traces/logger.py` |
@@ -45,6 +59,7 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
 | Public mock traces | `data/traces/samples/` |
 | SFT preview generator | `scripts/generate_dataset.py` |
 | Public trace JSONL exporter | `scripts/export_traces.py` |
 | Dataset plan | `docs/DATASET.md` |
 | Failure notes | `docs/FAILURES.md` |
 | Runtime boundary docs | `docs/RUNTIME.md` |
@@ -85,16 +100,19 @@ OK
 ## Current Limitations
-- The app still uses mock model outputs.
-- Phase 2 UI polish is complete, but it still runs on the mock runtime.
 - Sample traces are mock traces, not real model traces.
-- Remote repo and hosted Space are not created yet.
 ## Next Gate
-Before moving to real model integration, confirm whether to create:
-- GitHub repository
-- Hugging Face Space
 See `docs/EXTERNAL_SETUP.md`.

 - runtime configuration boundary
 - local acceptance checks
+Not included in the original initial-stage gate:
 - creating the remote GitHub repository
+- hosted GPU validation for the MiniCPM-V integration
+- real GGUF smoke test for llama.cpp / llama-cpp-python text runtime
 - fine-tuning, dataset publishing, Field Notes, and demo video
+The Hugging Face Space has been created at:
+https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
+Remote GitHub actions still require explicit confirmation because they change external state.
+## Post-Initial Updates
+As of 2026-06-06:
+- MiniCPM-V 2.6 is wired as an optional vision backend with mock fallback.
+- Optional llama.cpp / llama-cpp-python text runtime wiring is available through `TEXT_MODEL_PATH`, with mock fallback.
+- `scripts/check_space_vlm.py` can validate the hosted Space with three temporary public images for mug, keyboard, and shoe.
+- `docs/SPACE_VLM_REPORT.md` exists as the pending remote validation report.
+- Hosted Space L4 validation was attempted on 2026-06-06 but was blocked by Hugging Face `402 Payment Required`; it has not completed yet.
+- No final GGUF text model has been selected, downloaded, or committed.
 ## Local Deliverables
 | --- | --- |
 | Gradio app entrypoint | `app.py` |
 | Shared generation pipeline | `src/pipeline.py` |
+| Vision runner with mock / MiniCPM-V backend | `src/models/vision_runner.py` |
+| Text runner with mock / optional llama.cpp backend | `src/models/llama_cpp_runner.py` |
 | Pydantic schemas | `src/models/schema.py` |
 | Share card renderer | `src/renderer/share_card.py` |
 | Trace logger | `src/traces/logger.py` |
 | Public mock traces | `data/traces/samples/` |
 | SFT preview generator | `scripts/generate_dataset.py` |
 | Public trace JSONL exporter | `scripts/export_traces.py` |
+| Hosted Space VLM validator | `scripts/check_space_vlm.py` |
 | Dataset plan | `docs/DATASET.md` |
 | Failure notes | `docs/FAILURES.md` |
 | Runtime boundary docs | `docs/RUNTIME.md` |
 ## Current Limitations
+- The default app still uses mock model outputs.
+- MiniCPM-V 2.6 vision wiring is available behind `OBJECTVERSE_VISION_BACKEND=minicpm-v`, but hosted GPU validation is still pending.
+- llama.cpp text wiring is available behind `OBJECTVERSE_TEXT_BACKEND=llama-cpp`, but no real GGUF smoke test has been run.
+- Phase 2 UI polish is complete.
 - Sample traces are mock traces, not real model traces.
+- GitHub origin is configured locally, but sync/submission confirmation is still pending.
 ## Next Gate
+Next model gate:
+- verify MiniCPM-V 2.6 on the Hugging Face Space GPU
+- run a real GGUF `TEXT_MODEL_PATH` smoke test
+- confirm GitHub sync / submission target
 See `docs/EXTERNAL_SETUP.md`.

docs/MODEL_CARD.md CHANGED

@@ -2,9 +2,9 @@
 ## Status
-Draft only. No model has been fine-tuned, converted, or published yet.
-The app currently runs deterministic mock backends. This card is a working template for the later small-model runtime and LoRA adapter.
 ## Planned Components
@@ -16,9 +16,9 @@ The app currently runs deterministic mock backends. This card is a working templ
 | Component | Candidate | Notes |
 | --- | --- | --- |
-| Vision | MiniCPM-V or lightweight VLM fallback | Must run without commercial API calls. |
-| Text | small instruct model plus LoRA adapter | Final base model still pending. |
-| Runtime | GGUF through llama.cpp / llama-cpp-python | Needed for Llama Champion evidence. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
 ## Parameter Budget
@@ -29,8 +29,8 @@ Record final numbers here before submission:
 | Component | Model | Parameters | Counted Toward Total |
 | --- | --- | ---: | --- |
-| Vision | TBD | TBD | yes |
-| Text base | TBD | TBD | yes |
 | LoRA adapter | TBD | TBD | yes |
 | Total | TBD | TBD | must be <= 32B |
@@ -67,7 +67,7 @@ Current preview data is deterministic and mock-generated. It should only be used
 ## Fallback Behavior
 - If VLM loading fails, use manual description and stable example flow.
-- If llama.cpp loading fails, keep deterministic mock text fallback for demo safety.
 - If model JSON is invalid, repair and validate before rendering.
 ## Required Notes

 ## Status
+Draft only. No text model has been fine-tuned, converted, or published yet.
+The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`.
 ## Planned Components
 | Component | Candidate | Notes |
 | --- | --- | --- |
+| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Must run without commercial API calls. |
+| Text | externally configured GGUF, later small instruct model plus LoRA adapter | Final base model still pending. |
+| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
 ## Parameter Budget
 | Component | Model | Parameters | Counted Toward Total |
 | --- | --- | ---: | --- |
+| Vision | MiniCPM-V 2.6 | ~8B | yes |
+| Text base | Externally configured GGUF, final model TBD | TBD | yes |
 | LoRA adapter | TBD | TBD | yes |
 | Total | TBD | TBD | must be <= 32B |
 ## Fallback Behavior
 - If VLM loading fails, use manual description and stable example flow.
+- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
 - If model JSON is invalid, repair and validate before rendering.
 ## Required Notes

docs/README.md CHANGED

@@ -17,10 +17,12 @@ This folder contains the planning source of truth for Objectverse Diary.
 - `FIELD_NOTES.md`: future technical blog draft.
 - `MODEL_CARD.md`: future model documentation.
 - `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
 - `RUNTIME.md`: current mock runtime configuration and future model boundary.
 - `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
 - `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
 - `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
 - `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
 - `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
 - `SUBMISSION_GUIDE.md`: final submission checklist.

 - `FIELD_NOTES.md`: future technical blog draft.
 - `MODEL_CARD.md`: future model documentation.
 - `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
+- `DEVELOPMENT_STATUS.md`: current completed / not completed development status.
 - `RUNTIME.md`: current mock runtime configuration and future model boundary.
 - `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
 - `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
 - `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
 - `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
 - `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
+- `SPACE_VLM_REPORT.md`: pending hosted Space MiniCPM-V validation report.
 - `SUBMISSION_GUIDE.md`: final submission checklist.

docs/RUNTIME.md CHANGED

@@ -2,7 +2,7 @@
 ## Current Runtime
-The initial MVP uses deterministic mock runtime paths:
 - `OBJECTVERSE_VISION_BACKEND=mock`
 - `OBJECTVERSE_TEXT_BACKEND=mock`
@@ -15,6 +15,28 @@ This means:
 No commercial cloud AI APIs are used.
 ## Environment Variables
 ```bash
@@ -25,6 +47,25 @@ TEXT_MODEL_PATH=
 TRACE_OUTPUT_DIR=data/traces
 ```
 ## Future Runtime Boundary
 The next implementation phase should keep the same pipeline boundary:
@@ -39,6 +80,14 @@ Do not move model calls into `src/ui/layout.py`.
 ## Fallback Rules
 - VLM unavailable: use manual description and mock/example gallery path.
-- llama.cpp unavailable: use mock text generation path.
-- invalid model JSON: repair and validate before rendering.
 - private input: anonymize trace text before saving public traces.

 ## Current Runtime
+The default MVP runtime uses deterministic mock paths:
 - `OBJECTVERSE_VISION_BACKEND=mock`
 - `OBJECTVERSE_TEXT_BACKEND=mock`
 No commercial cloud AI APIs are used.
+MiniCPM-V 2.6 vision can be enabled without changing the UI:
+```bash
+OBJECTVERSE_VISION_BACKEND=minicpm-v \
+VISION_MODEL_ID=openbmb/MiniCPM-V-2_6 \
+OBJECTVERSE_TEXT_BACKEND=mock \
+.venv/bin/python app.py
+```
+This only replaces object understanding. Persona generation, diary generation, and chat can remain mock or use the optional llama.cpp text path below.
+Optional llama.cpp text generation can be enabled without changing the UI:
+```bash
+pip install llama-cpp-python
+OBJECTVERSE_TEXT_BACKEND=llama-cpp \
+TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
+.venv/bin/python app.py
+```
+`llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
 ## Environment Variables
 ```bash
 TRACE_OUTPUT_DIR=data/traces
 ```
+For the hosted Space, set these Variables:
+```bash
+OBJECTVERSE_VISION_BACKEND=minicpm-v
+VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
+OBJECTVERSE_TEXT_BACKEND=mock
+```
+Recommended Space hardware for this path is 1x Nvidia L4. If GPU is unavailable, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.
+For a Space or local runtime with a separately provided GGUF text model, set:
+```bash
+OBJECTVERSE_TEXT_BACKEND=llama-cpp
+TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
+```
+Do not commit GGUF files or private model paths.
 ## Future Runtime Boundary
 The next implementation phase should keep the same pipeline boundary:
 ## Fallback Rules
 - VLM unavailable: use manual description and mock/example gallery path.
+- llama.cpp unavailable: use mock text generation path and record `text-fallback-to-mock`.
+- invalid model JSON: repair and validate before rendering, then fall back to mock if validation fails.
 - private input: anonymize trace text before saving public traces.
+Trace fallback markers:
+- `mock-runtime`: default mock vision and mock text runtime.
+- `mock-text-runtime`: real or configured vision path with mock text generation.
+- `mock-vision-runtime`: mock vision with a configured non-mock text backend.
+- `vision-fallback-to-mock`: MiniCPM-V failed or returned invalid JSON, so mock object understanding was used.
+- `text-fallback-to-mock`: llama.cpp was configured but unavailable, invalid, or unable to return schema-valid JSON.
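The three configuration markers above depend only on which backends are set to `mock`, so the mapping can be sketched as a small function. The name `runtime_marker` is hypothetical, and returning `None` when both backends are real is an assumption, since the list defines no marker for that case.

```python
from __future__ import annotations


def runtime_marker(vision_backend: str, text_backend: str) -> str | None:
    """Map the configured backends to one of the documented trace markers."""
    vision_is_mock = vision_backend.strip().lower() == "mock"
    text_is_mock = text_backend.strip().lower() == "mock"
    if vision_is_mock and text_is_mock:
        return "mock-runtime"
    if text_is_mock:
        return "mock-text-runtime"
    if vision_is_mock:
        return "mock-vision-runtime"
    # Both backends are real: no marker is documented for this case (assumption).
    return None


marker = runtime_marker("minicpm-v", "mock")
```

The two `*-fallback-to-mock` markers are different in kind: they describe a failure at run time and would be appended when an exception is caught, not derived from configuration.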

docs/SPACE_VLM_REPORT.md ADDED

	@@ -0,0 +1,42 @@

+# Space VLM Validation Report
+- Generated at: 2026-06-06 04:25 UTC
+- Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
+- Space repo: `build-small-hackathon/ObjectverseDiary`
+- Overall status: FAIL
+- Vision backend expected: `minicpm-v`
+- Text backend expected: `mock`
+## Space Configuration
+- Requested configuration:
+  - `hardware`: `l4x1`
+  - `OBJECTVERSE_VISION_BACKEND`: `minicpm-v`
+  - `VISION_MODEL_ID`: `openbmb/MiniCPM-V-2_6`
+  - `OBJECTVERSE_TEXT_BACKEND`: `mock`
+- Rollback configuration applied:
+  - `hardware`: `cpu-basic`
+  - `OBJECTVERSE_VISION_BACKEND`: `mock`
+  - `OBJECTVERSE_TEXT_BACKEND`: `mock`
+## Configuration Error
+- Error: `HfHubHTTPError: 402 Payment Required`
+- Meaning: Hugging Face requires pre-paid credits or billing access for the `build-small-hackathon` organization before the Space can use paid `l4x1` hardware.
+- Impact: Remote MiniCPM-V validation did not run. No mug / keyboard / shoe image inference results were produced.
+- Safety outcome: Mock-safe rollback was run after the failed hardware request.
+- Post-rollback runtime check: Space is `RUNNING` with `hardware=cpu-basic` and `requested_hardware=cpu-basic`.
+## Results
+- Coffee mug: NOT RUN
+- Computer keyboard: NOT RUN
+- Running shoe: NOT RUN
+## Notes
+- Test images are temporary public Wikimedia Commons assets and are not committed.
+- Text generation remains mock during this validation plan.
+- No tokens, secrets, or private file paths are recorded in this report.
+- Next unblock step: enable billing/pre-paid credits for the Hugging Face organization or choose an available free GPU option, then rerun `scripts/check_space_vlm.py`.

docs/SUBMISSION_GUIDE.md CHANGED

@@ -2,8 +2,8 @@
 ## Required Package
-- [ ] Hugging Face Space URL: pending external setup
-- [ ] GitHub Repository URL: pending external setup
 - [ ] Demo Video URL: pending recording
 - [ ] Social Media Post URL: pending final copy
 - [ ] Fine-tuned Model URL: pending model training
@@ -18,11 +18,28 @@
 - Runtime boundary: `docs/RUNTIME.md`
 - Dataset plan and preview workflow: `docs/DATASET.md`
 - External setup checklist: `docs/EXTERNAL_SETUP.md`
 - Public mock traces: `data/traces/samples/`
 ## Final Checks
 - [ ] Space is under the official organization.
 - [ ] Demo video is under 2 minutes.
 - [ ] README includes model parameter counts.
 - [ ] No commercial cloud AI APIs are used.

 ## Required Package
+- [x] Hugging Face Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
+- [ ] GitHub Repository URL: local `origin` configured, sync/submission confirmation pending
 - [ ] Demo Video URL: pending recording
 - [ ] Social Media Post URL: pending final copy
 - [ ] Fine-tuned Model URL: pending model training
 - Runtime boundary: `docs/RUNTIME.md`
 - Dataset plan and preview workflow: `docs/DATASET.md`
 - External setup checklist: `docs/EXTERNAL_SETUP.md`
+- Space VLM validation report: `docs/SPACE_VLM_REPORT.md`. Current status: FAIL, because the `l4x1` hardware request returned `402 Payment Required`.
 - Public mock traces: `data/traces/samples/`
+- Optional llama.cpp runtime wiring: `src/models/llama_cpp_runner.py`
+## Completed Locally
+- Mock MVP flow, archive-style UI, share card, trace logging, sample traces, dataset preview, and initial acceptance tooling.
+- MiniCPM-V 2.6 backend wiring with fallback markers.
+- Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
+- Hosted Space VLM validation script and pending report template.
+## Not Completed Yet
+- Hosted Space L4 MiniCPM-V validation for mug, keyboard, and shoe; attempted and blocked by Hugging Face paid hardware billing.
+- Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
+- Real model traces, curated dataset, LoRA training, model/dataset publishing.
+- Field Notes article, demo video, social post, final submission package.
 ## Final Checks
 - [ ] Space is under the official organization.
+- [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: blocked by paid hardware billing.
 - [ ] Demo video is under 2 minutes.
 - [ ] README includes model parameter counts.
 - [ ] No commercial cloud AI APIs are used.

pyproject.toml CHANGED

@@ -6,8 +6,14 @@ requires-python = ">=3.10"
 dependencies = [
     "gradio>=4.44,<6",
     "pydantic>=2.7,<3",
 ]
 [tool.objectverse-diary]
-status = "initial-mock-mvp"
-implementation = "mock-runtime"

 dependencies = [
     "gradio>=4.44,<6",
     "pydantic>=2.7,<3",
+    "torch",
+    "torchvision",
+    "transformers>=4.40,<5",
+    "Pillow",
+    "sentencepiece",
+    "accelerate",
 ]
 [tool.objectverse-diary]
+status = "vlm-ready-mock-text"
+implementation = "minicpm-v-or-mock-vision-with-mock-text"

requirements.txt CHANGED

@@ -1,2 +1,8 @@
 gradio>=4.44,<6
 pydantic>=2.7,<3

 gradio>=4.44,<6
 pydantic>=2.7,<3
+torch
+torchvision
+transformers>=4.40,<5
+Pillow
+sentencepiece
+accelerate

scripts/README.md CHANGED

@@ -8,6 +8,7 @@ Implemented initial scripts:
 - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
 - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 Expected files during implementation:
@@ -15,4 +16,18 @@ Expected files during implementation:
 - `convert_to_gguf.sh`
 - `run_llama_cpp.sh`
-Current status: mock trace generation, trace JSONL export, and SFT preview generation are implemented. Real model, fine-tuning, and GGUF conversion scripts are not implemented yet.

 - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
 - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
+- `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
 Expected files during implementation:
 - `convert_to_gguf.sh`
 - `run_llama_cpp.sh`
+Space VLM validation:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py \
+  --space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
+  --output docs/SPACE_VLM_REPORT.md
+```
+External Space changes are explicit:
+```bash
+.venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
+```
+Current status: mock trace generation, trace JSONL export, SFT preview generation, optional MiniCPM-V wiring, optional llama.cpp wiring, and hosted Space VLM validation tooling are implemented. Real model validation on Space, fine-tuning, and GGUF conversion are not completed yet.

scripts/check_space_vlm.py ADDED

	@@ -0,0 +1,481 @@

+"""Validate MiniCPM-V object understanding on the hosted Hugging Face Space."""
+from __future__ import annotations
+import argparse
+import json
+import sys
+import time
+import urllib.request
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+from urllib.parse import urlparse
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+from src.models.schema import TraceRecord
+DEFAULT_SPACE_URL = "https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary"
+DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
+DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
+DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
+DEFAULT_HARDWARE = "l4x1"
+MOCK_SAFE_HARDWARE = "cpu-basic"
+GENERATE_API_NAME = "/generate_object_file"
+REQUEST_TIMEOUT_SECONDS = 45
+SPACE_VARIABLES = {
+    "OBJECTVERSE_VISION_BACKEND": "minicpm-v",
+    "VISION_MODEL_ID": "openbmb/MiniCPM-V-2_6",
+    "OBJECTVERSE_TEXT_BACKEND": "mock",
+}
+MOCK_SAFE_VARIABLES = {
+    "OBJECTVERSE_VISION_BACKEND": "mock",
+    "OBJECTVERSE_TEXT_BACKEND": "mock",
+}
+@dataclass(frozen=True)
+class ValidationAsset:
+    key: str
+    label: str
+    source_page: str
+    download_url: str
+    expected_terms: tuple[str, ...]
+    description: str
+    mode: str = "Cynical"
+@dataclass(frozen=True)
+class ValidationResult:
+    key: str
+    label: str
+    source_page: str
+    image_path: str
+    passed: bool
+    object_name: str
+    visible_features: list[str]
+    likely_context: str
+    confidence: float
+    runtime_vision: str
+    runtime_text: str
+    fallbacks: list[str]
+    error: str = ""
+TEST_ASSETS = [
+    ValidationAsset(
+        key="mug",
+        label="Coffee mug",
+        source_page="https://commons.wikimedia.org/wiki/File:Striped_coffee_mug.jpg",
+        download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Striped_coffee_mug.jpg",
+        expected_terms=("mug", "cup", "coffee", "ceramic", "handle"),
+        description="A public Wikimedia Commons photo of a striped coffee mug.",
+    ),
+    ValidationAsset(
+        key="keyboard",
+        label="Computer keyboard",
+        source_page="https://commons.wikimedia.org/wiki/File:Computer_keyboard.jpg",
+        download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Computer_keyboard.jpg",
+        expected_terms=("keyboard", "key", "computer", "keys"),
+        description="A public Wikimedia Commons photo of a computer keyboard.",
+        mode="Philosopher",
+    ),
+    ValidationAsset(
+        key="shoe",
+        label="Running shoe",
+        source_page="https://commons.wikimedia.org/wiki/File:Running_shoes.jpg",
+        download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Running_shoes.jpg",
+        expected_terms=("shoe", "sneaker", "running", "footwear", "trainer"),
+        description="A public Wikimedia Commons photo of running shoes.",
+        mode="Dramatic",
+    ),
+]
+def parse_space_repo_id(space_url: str) -> str:
+    parsed = urlparse(space_url)
+    parts = [part for part in parsed.path.split("/") if part]
+    if len(parts) >= 3 and parts[0] == "spaces":
+        return f"{parts[1]}/{parts[2]}"
+    if len(parts) == 2:
+        return f"{parts[0]}/{parts[1]}"
+    raise ValueError(f"Could not parse Hugging Face Space repo id from {space_url!r}")
+def download_validation_assets(
+    asset_dir: Path = DEFAULT_ASSET_DIR,
+    assets: list[ValidationAsset] | None = None,
+) -> dict[str, Path]:
+    selected_assets = assets or TEST_ASSETS
+    asset_dir.mkdir(parents=True, exist_ok=True)
+    paths: dict[str, Path] = {}
+    for asset in selected_assets:
+        output_path = asset_dir / f"{asset.key}.jpg"
+        if not output_path.exists():
+            _download_url(asset.download_url, output_path)
+        paths[asset.key] = output_path
+    return paths
+def configure_space_for_vlm(
+    repo_id: str,
+    *,
+    hardware: str = DEFAULT_HARDWARE,
+    wait: bool = True,
+    timeout_seconds: int = 900,
+) -> dict[str, str]:
+    from huggingface_hub import HfApi, SpaceHardware
+    api = HfApi()
+    _assert_hf_auth(api)
+    for key, value in SPACE_VARIABLES.items():
+        api.add_space_variable(repo_id=repo_id, key=key, value=value)
+    api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
+    if wait:
+        wait_for_space_running(repo_id, timeout_seconds=timeout_seconds)
+    return {"repo_id": repo_id, "hardware": hardware, **SPACE_VARIABLES}
+def rollback_space_to_mock(repo_id: str, *, hardware: str = MOCK_SAFE_HARDWARE) -> dict[str, str]:
+    from huggingface_hub import HfApi, SpaceHardware
+    api = HfApi()
+    _assert_hf_auth(api)
+    for key, value in MOCK_SAFE_VARIABLES.items():
+        api.add_space_variable(repo_id=repo_id, key=key, value=value)
+    api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
+    return {"repo_id": repo_id, "hardware": hardware, **MOCK_SAFE_VARIABLES}
+def wait_for_space_running(
+    repo_id: str,
+    *,
+    timeout_seconds: int = 900,
+    poll_seconds: int = 20,
+) -> str:
+    from huggingface_hub import HfApi
+    api = HfApi()
+    deadline = time.monotonic() + timeout_seconds
+    last_stage = "unknown"
+    while time.monotonic() < deadline:
+        runtime = api.get_space_runtime(repo_id=repo_id)
+        last_stage = _runtime_stage_name(runtime)
+        if last_stage.upper() == "RUNNING":
+            return last_stage
+        time.sleep(poll_seconds)
+    raise TimeoutError(f"Space {repo_id} did not reach RUNNING within {timeout_seconds}s; last stage: {last_stage}")
+def run_space_validation(
+    *,
+    space_url: str = DEFAULT_SPACE_URL,
+    asset_dir: Path = DEFAULT_ASSET_DIR,
+    timeout_seconds: int = 900,
+    assets: list[ValidationAsset] | None = None,
+) -> list[ValidationResult]:
+    from gradio_client import Client, handle_file
+    selected_assets = assets or TEST_ASSETS
+    paths = download_validation_assets(asset_dir, selected_assets)
+    client = Client(space_url, verbose=False)
+    results: list[ValidationResult] = []
+    started = time.monotonic()
+    for asset in selected_assets:
+        remaining = timeout_seconds - int(time.monotonic() - started)
+        if remaining <= 0:
+            raise TimeoutError(f"Validation exceeded timeout of {timeout_seconds}s")
+        try:
+            response = client.predict(
+                handle_file(str(paths[asset.key])),
+                asset.description,
+                asset.mode,
+                api_name=GENERATE_API_NAME,
+            )
+            results.append(validate_prediction(asset, paths[asset.key], response))
+        except Exception as exc:
+            results.append(
+                ValidationResult(
+                    key=asset.key,
+                    label=asset.label,
+                    source_page=asset.source_page,
+                    image_path=str(paths[asset.key]),
+                    passed=False,
+                    object_name="",
+                    visible_features=[],
+                    likely_context="",
+                    confidence=0.0,
+                    runtime_vision="",
+                    runtime_text="",
+                    fallbacks=[],
+                    error=f"{type(exc).__name__}: {exc}",
+                )
+            )
+    return results
+def validate_prediction(
+    asset: ValidationAsset,
+    image_path: Path,
+    response: Any,
+) -> ValidationResult:
+    trace_payload = _extract_trace_payload(response)
+    trace = TraceRecord.model_validate(trace_payload)
+    object_info = trace.object_understanding.object
+    search_text = " ".join(
+        [
+            object_info.name,
+            object_info.likely_context,
+            " ".join(object_info.visible_features),
+        ]
+    ).lower()
+    expected_match = any(term in search_text for term in asset.expected_terms)
+    vision_runtime_ok = trace.model_runtime.get("vision") == "minicpm-v object understanding"
+    text_runtime_ok = trace.model_runtime.get("text") == "mock persona and diary generation"
+    no_vision_fallback = "vision-fallback-to-mock" not in trace.fallbacks
+    passed = expected_match and vision_runtime_ok and text_runtime_ok and no_vision_fallback
+    return ValidationResult(
+        key=asset.key,
+        label=asset.label,
+        source_page=asset.source_page,
+        image_path=str(image_path),
+        passed=passed,
+        object_name=object_info.name,
+        visible_features=object_info.visible_features,
+        likely_context=object_info.likely_context,
+        confidence=object_info.confidence,
+        runtime_vision=trace.model_runtime.get("vision", ""),
+        runtime_text=trace.model_runtime.get("text", ""),
+        fallbacks=trace.fallbacks,
+        error="" if passed else _failure_reason(expected_match, vision_runtime_ok, text_runtime_ok, no_vision_fallback),
+    )
+def render_report(
+    *,
+    space_url: str,
+    repo_id: str,
+    results: list[ValidationResult],
+    configured: dict[str, str] | None = None,
+    rollback: dict[str, str] | None = None,
+    configuration_error: str = "",
+) -> str:
+    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
+    status = "NOT RUN"
+    if configuration_error:
+        status = "FAIL"
+    elif results:
+        status = "PASS" if all(result.passed for result in results) else "FAIL"
+    lines = [
+        "# Space VLM Validation Report",
+        "",
+        f"- Generated at: {now}",
+        f"- Space URL: {space_url}",
+        f"- Space repo: `{repo_id}`",
+        f"- Overall status: {status}",
+        "- Vision backend expected: `minicpm-v`",
+        "- Text backend expected: `mock`",
+        "",
+        "## Space Configuration",
+        "",
+    ]
+    if configured:
+        lines.extend(_config_lines("Applied configuration", configured))
+    else:
+        lines.append("- Applied configuration: not changed by this run.")
+    if rollback:
+        lines.extend(["", *_config_lines("Rollback configuration", rollback)])
+    else:
+        lines.append("- Rollback configuration: not applied by this run.")
+    if configuration_error:
+        lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
+    lines.extend(["", "## Results", ""])
+    for result in results:
+        lines.extend(
+            [
+                f"### {result.label}",
+                "",
+                f"- Status: {'PASS' if result.passed else 'FAIL'}",
+                f"- Source: {result.source_page}",
+                f"- Local temporary image: `{result.image_path}`",
+                f"- Object name: `{result.object_name}`",
+                f"- Visible features: {', '.join(result.visible_features) or 'n/a'}",
+                f"- Likely context: `{result.likely_context}`",
+                f"- Confidence: {result.confidence:.2f}",
+                f"- Runtime vision: `{result.runtime_vision}`",
+                f"- Runtime text: `{result.runtime_text}`",
+                f"- Fallbacks: {', '.join(result.fallbacks) or 'none'}",
+            ]
+        )
+        if result.error:
+            lines.append(f"- Error: `{result.error}`")
+        lines.append("")
+    lines.extend(
+        [
+            "## Notes",
+            "",
+            "- Test images are temporary public Wikimedia Commons assets and are not committed.",
+            "- No tokens, secrets, or private file paths should be recorded in this report.",
+            "- If validation fails, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.",
+        ]
+    )
+    return "\n".join(lines) + "\n"
+def write_report(markdown: str, output_path: Path = DEFAULT_OUTPUT_PATH) -> Path:
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(markdown, encoding="utf-8")
+    return output_path
+def write_json_results(results: list[ValidationResult], output_path: Path) -> Path:
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    payload = [result.__dict__ for result in results]
+    output_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
+    return output_path
+def _download_url(url: str, output_path: Path) -> None:
+    request = urllib.request.Request(
+        url,
+        headers={"User-Agent": "Objectverse-Diary-Space-VLM-Check/0.1"},
+    )
+    with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_SECONDS) as response:
+        output_path.write_bytes(response.read())
+def _extract_trace_payload(response: Any) -> dict[str, Any]:
+    if isinstance(response, tuple | list):
+        # The trace JSON is expected to be the seventh Gradio output (index 6).
+        if len(response) < 7:
+            raise ValueError("Gradio response did not include trace JSON output.")
+        trace_payload = response[6]
+    elif isinstance(response, dict) and "trace" in response:
+        trace_payload = response["trace"]
+    else:
+        raise ValueError("Unsupported Gradio response shape.")
+    if not isinstance(trace_payload, dict):
+        raise ValueError("Trace output was not a JSON object.")
+    return trace_payload
+def _failure_reason(
+    expected_match: bool,
+    vision_runtime_ok: bool,
+    text_runtime_ok: bool,
+    no_vision_fallback: bool,
+) -> str:
+    reasons: list[str] = []
+    if not expected_match:
+        reasons.append("object output did not match expected terms")
+    if not vision_runtime_ok:
+        reasons.append("vision runtime was not minicpm-v")
+    if not text_runtime_ok:
+        reasons.append("text runtime was not mock")
+    if not no_vision_fallback:
+        reasons.append("vision fallback marker was present")
+    return "; ".join(reasons)
+def _runtime_stage_name(runtime: Any) -> str:
+    stage = getattr(runtime, "stage", None)
+    if stage is None and isinstance(runtime, dict):
+        stage = runtime.get("stage")
+    if hasattr(stage, "value"):
+        return str(stage.value)
+    return str(stage or "unknown")
+def _assert_hf_auth(api: Any) -> None:
+    try:
+        user = api.whoami()
+    except Exception as exc:
+        raise RuntimeError("Hugging Face authentication is required for Space configuration.") from exc
+    if not isinstance(user, dict) or not user.get("name"):
+        raise RuntimeError("Hugging Face authentication did not return a user name.")
+def _config_lines(title: str, config: dict[str, str]) -> list[str]:
+    lines = [f"- {title}:"]
+    for key, value in config.items():
+        lines.append(f"  - `{key}`: `{value}`")
+    return lines
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--space-url", default=DEFAULT_SPACE_URL)
+    parser.add_argument("--asset-dir", type=Path, default=DEFAULT_ASSET_DIR)
+    parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT_PATH)
+    parser.add_argument("--json-output", type=Path)
+    parser.add_argument("--timeout-seconds", type=int, default=900)
+    parser.add_argument("--configure-space", action="store_true")
+    parser.add_argument("--rollback-to-mock", action="store_true")
+    parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
+    parser.add_argument("--skip-validation", action="store_true")
+    return parser.parse_args()
+def main() -> None:
+    args = _parse_args()
+    repo_id = parse_space_repo_id(args.space_url)
+    configured = None
+    rollback = None
+    configuration_error = ""
+    if args.configure_space:
+        try:
+            configured = configure_space_for_vlm(
+                repo_id,
+                hardware=args.hardware,
+                wait=True,
+                timeout_seconds=args.timeout_seconds,
+            )
+        except Exception as exc:
+            configuration_error = f"{type(exc).__name__}: {exc}"
+            if args.rollback_to_mock:
+                try:
+                    rollback = rollback_space_to_mock(repo_id)
+                except Exception as rollback_exc:
+                    configuration_error = (
+                        f"{configuration_error}; rollback failed with "
+                        f"{type(rollback_exc).__name__}: {rollback_exc}"
+                    )
+    results: list[ValidationResult] = []
+    if not args.skip_validation and not configuration_error:
+        results = run_space_validation(
+            space_url=args.space_url,
+            asset_dir=args.asset_dir,
+            timeout_seconds=args.timeout_seconds,
+        )
+    if args.rollback_to_mock and rollback is None:
+        rollback = rollback_space_to_mock(repo_id)
+    report = render_report(
+        space_url=args.space_url,
+        repo_id=repo_id,
+        results=results,
+        configured=configured,
+        rollback=rollback,
+        configuration_error=configuration_error,
+    )
+    write_report(report, args.output)
+    if args.json_output:
+        write_json_results(results, args.json_output)
+    if configuration_error or (results and not all(result.passed for result in results)):
+        raise SystemExit(1)
+    print(f"wrote Space VLM report to {args.output}")
+if __name__ == "__main__":
+    main()
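The expected-term check inside `validate_prediction` is a plain substring match over the object name, likely context, and visible features. The sketch below isolates that step so it can be tried without a Space or a `TraceRecord`; the function name `matches_expected_terms` and the sample inputs are made up for illustration, while the term tuple is the one used for the mug asset.

```python
def matches_expected_terms(
    name: str,
    likely_context: str,
    visible_features: list[str],
    expected_terms: tuple[str, ...],
) -> bool:
    """Return True if any expected term occurs in the lower-cased object text."""
    search_text = " ".join([name, likely_context, " ".join(visible_features)]).lower()
    return any(term in search_text for term in expected_terms)


# Term tuple copied from the mug asset; the object descriptions are invented.
mug_terms = ("mug", "cup", "coffee", "ceramic", "handle")

hit = matches_expected_terms("Striped Mug", "kitchen counter", ["blue stripes"], mug_terms)
miss = matches_expected_terms("Desk lamp", "home office", ["metal arm"], mug_terms)
```

Because this is substring matching, a term such as `cup` would also match inside a longer word like `cupboard`. That is probably acceptable for a three-image smoke test, but it is worth keeping in mind if the asset list grows.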

src/README.md CHANGED

@@ -2,7 +2,7 @@
 This directory is reserved for application source code.
-Current status: initial mock MVP. Real model runtimes are not connected yet.
 ## Planned Areas

 This directory is reserved for application source code.
+Current status: initial mock MVP with optional MiniCPM-V 2.6 vision backend. Text generation remains mock until the llama.cpp phase.
 ## Planned Areas

src/config.py CHANGED

@@ -43,21 +43,26 @@ def get_runtime_settings(environ: Mapping[str, str] | None = None) -> RuntimeSet
 def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
     current = settings or get_runtime_settings()
     vision = (
         "mock object understanding"
-        if current.vision_backend == "mock"
-        else f"{current.vision_backend} object understanding"
-    )
-    text = (
-        "mock persona and diary generation"
-        if current.text_backend == "mock"
-        else f"{current.text_backend} persona and diary generation"
-    )
-    runtime = (
-        "no llama.cpp model connected yet"
-        if current.text_backend == "mock"
-        else f"text model path: {current.text_model_path or '[not configured]'}"
     )
     return {"vision": vision, "text": text, "runtime": runtime}

 def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
     current = settings or get_runtime_settings()
+    vision_backend = current.vision_backend.strip().lower()
+    text_backend = current.text_backend.strip().lower()
     vision = (
         "mock object understanding"
+        if vision_backend == "mock"
+        else f"{vision_backend} object understanding"
     )
+    text = "mock persona and diary generation"
+    if text_backend in {"llama-cpp", "llama_cpp", "llamacpp"}:
+        text = "llama-cpp text generation"
+    elif text_backend != "mock":
+        text = f"{text_backend} text generation"
+    runtime_parts: list[str] = []
+    if vision_backend != "mock":
+        runtime_parts.append(f"vision model id: {current.vision_model_id or '[not configured]'}")
+    if text_backend == "mock":
+        runtime_parts.append("no llama.cpp model connected yet")
+    else:
+        runtime_parts.append(f"text model path: {current.text_model_path or '[not configured]'}")
+    runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
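The labels produced here matter beyond the UI: `scripts/check_space_vlm.py` compares them against the exact strings `minicpm-v object understanding` and `mock persona and diary generation`. The condensed sketch below shows how the normalisation keeps those strings stable. `Settings` is a hypothetical stand-in for `RuntimeSettings`, and `status_labels` omits the `runtime` field for brevity.

```python
from dataclasses import dataclass

LLAMA_CPP_ALIASES = {"llama-cpp", "llama_cpp", "llamacpp"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for RuntimeSettings with only the backend fields."""

    vision_backend: str = "mock"
    text_backend: str = "mock"


def status_labels(settings: Settings) -> dict[str, str]:
    """Build the vision and text labels from normalised backend names."""
    vision_backend = settings.vision_backend.strip().lower()
    text_backend = settings.text_backend.strip().lower()
    vision = f"{vision_backend} object understanding"
    if text_backend == "mock":
        text = "mock persona and diary generation"
    elif text_backend in LLAMA_CPP_ALIASES:
        # All accepted spellings collapse to one label.
        text = "llama-cpp text generation"
    else:
        text = f"{text_backend} text generation"
    return {"vision": vision, "text": text}


labels = status_labels(Settings(vision_backend=" MiniCPM-V "))
```

Stripping and lower-casing before formatting means a variable set as `MiniCPM-V` in the Space settings still yields the label the validation script expects.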

src/models/llama_cpp_runner.py CHANGED

@@ -1,8 +1,16 @@
-"""Mock text generation layer reserved for future llama.cpp integration."""
 from __future__ import annotations
 from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
 MODE_PROFILES = {
@@ -33,8 +41,58 @@ MODE_PROFILES = {
     },
 }
 def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
     object_name = object_understanding.object.name
     profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
     character_name = _character_name(object_name, mode)
@@ -51,7 +109,7 @@ def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> Pe
     return PersonaEnvelope(persona=persona)
-def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     p = persona.persona
     day_number = 417 + len(p.object_name)
@@ -74,7 +132,7 @@ def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     )
-def reply_as_object(persona_data: dict, message: str) -> str:
     persona = persona_data.get("persona", {})
     character_name = persona.get("character_name", "The Object")
     object_name = persona.get("object_name", "object")
@@ -88,6 +146,170 @@ def reply_as_object(persona_data: dict, message: str) -> str:
     )
 def _character_name(object_name: str, mode: str) -> str:
     compact = "".join(part.capitalize() for part in object_name.split()[:2])
     suffix = {

+"""Text generation runtime with mock and optional llama.cpp backends."""
 from __future__ import annotations
+import json
+from pathlib import Path
+from typing import Any
+from src.config import RuntimeSettings, get_runtime_settings
 from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
+from src.prompts.diary_generation import CHAT_REPLY_PROMPT, DIARY_GENERATION_PROMPT
+from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
+from src.utils.json_repair import parse_json_object
 MODE_PROFILES = {
     },
 }
+LLAMA_CPP_BACKENDS = {"llama-cpp", "llama_cpp", "llamacpp"}
+TEXT_FALLBACK_TO_MOCK = "text-fallback-to-mock"
+_LLAMA_MODEL: Any | None = None
+_LLAMA_MODEL_PATH: str | None = None
+_TEXT_FALLBACKS: list[str] = []
 def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
+    settings = get_runtime_settings()
+    if _is_llama_cpp_backend(settings):
+        try:
+            return _generate_persona_llama_cpp(object_understanding, mode, settings)
+        except Exception as exc:
+            _log_text_fallback("persona", exc)
+            _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
+    return _generate_persona_mock(object_understanding, mode)
+def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
+    settings = get_runtime_settings()
+    if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
+        try:
+            return _generate_diary_llama_cpp(persona, mode, settings)
+        except Exception as exc:
+            _log_text_fallback("diary", exc)
+            _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
+    return _generate_diary_mock(persona, mode)
+def reply_as_object(persona_data: dict, message: str) -> str:
+    settings = get_runtime_settings()
+    if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
+        try:
+            return _reply_as_object_llama_cpp(persona_data, message, settings)
+        except Exception as exc:
+            _log_text_fallback("chat", exc)
+    return _reply_as_object_mock(persona_data, message)
+def reset_text_runtime_fallbacks() -> None:
+    _TEXT_FALLBACKS.clear()
+def get_text_runtime_fallbacks() -> list[str]:
+    return list(_TEXT_FALLBACKS)
+def _generate_persona_mock(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
     object_name = object_understanding.object.name
     profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
     character_name = _character_name(object_name, mode)
     return PersonaEnvelope(persona=persona)
+def _generate_diary_mock(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     p = persona.persona
     day_number = 417 + len(p.object_name)
     )
+def _reply_as_object_mock(persona_data: dict, message: str) -> str:
     persona = persona_data.get("persona", {})
     character_name = persona.get("character_name", "The Object")
     object_name = persona.get("object_name", "object")
     )
+def _generate_persona_llama_cpp(
+    object_understanding: ObjectUnderstanding,
+    mode: str,
+    settings: RuntimeSettings,
+) -> PersonaEnvelope:
+    raw = _run_llama_json(
+        system_prompt=PERSONA_GENERATION_PROMPT,
+        user_payload={
+            "mode": mode,
+            "object_understanding": object_understanding.model_dump(mode="json"),
+        },
+        settings=settings,
+        max_tokens=320,
+    )
+    return PersonaEnvelope.model_validate(raw)
+def _generate_diary_llama_cpp(
+    persona: PersonaEnvelope,
+    mode: str,
+    settings: RuntimeSettings,
+) -> DiaryEntry:
+    raw = _run_llama_json(
+        system_prompt=DIARY_GENERATION_PROMPT,
+        user_payload={
+            "mode": mode,
+            "persona": persona.model_dump(mode="json"),
+        },
+        settings=settings,
+        max_tokens=360,
+    )
+    return DiaryEntry.model_validate(raw)
+def _reply_as_object_llama_cpp(
+    persona_data: dict,
+    message: str,
+    settings: RuntimeSettings,
+) -> str:
+    PersonaEnvelope.model_validate(persona_data)
+    raw = _run_llama_json(
+        system_prompt=CHAT_REPLY_PROMPT,
+        user_payload={
+            "persona": persona_data,
+            "message": message.strip() or "...",
+        },
+        settings=settings,
+        max_tokens=180,
+    )
+    reply = raw.get("reply")
+    if not isinstance(reply, str) or not reply.strip():
+        raise ValueError("llama.cpp chat response did not include a non-empty reply.")
+    return reply.strip()
+def _run_llama_json(
+    *,
+    system_prompt: str,
+    user_payload: dict[str, Any],
+    settings: RuntimeSettings,
+    max_tokens: int,
+) -> dict[str, Any]:
+    model = _load_llama_model(settings.text_model_path)
+    user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
+    raw = _complete_llama(
+        model,
+        system_prompt=system_prompt,
+        user_content=user_content,
+        max_tokens=max_tokens,
+    )
+    return parse_json_object(raw)
+def _complete_llama(
+    model: Any,
+    *,
+    system_prompt: str,
+    user_content: str,
+    max_tokens: int,
+) -> str:
+    stop = ["</s>", "<|end|>", "<|eot_id|>", "<|im_end|>"]
+    if hasattr(model, "create_chat_completion"):
+        response = model.create_chat_completion(
+            messages=[
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": user_content},
+            ],
+            temperature=0.75,
+            max_tokens=max_tokens,
+            stop=stop,
+        )
+        return _extract_completion_text(response)
+    prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
+    response = model(
+        prompt,
+        temperature=0.75,
+        max_tokens=max_tokens,
+        stop=stop,
+    )
+    return _extract_completion_text(response)
+def _extract_completion_text(response: Any) -> str:
+    if isinstance(response, str):
+        return response
+    if not isinstance(response, dict):
+        raise ValueError("llama.cpp returned an unsupported response type.")
+    choices = response.get("choices")
+    if not isinstance(choices, list) or not choices:
+        raise ValueError("llama.cpp response did not include choices.")
+    first = choices[0]
+    if not isinstance(first, dict):
+        raise ValueError("llama.cpp response choice was not an object.")
+    message = first.get("message")
+    if isinstance(message, dict) and isinstance(message.get("content"), str):
+        return message["content"]
+    if isinstance(first.get("text"), str):
+        return first["text"]
+    raise ValueError("llama.cpp response did not include text content.")
+def _load_llama_model(text_model_path: str) -> Any:
+    global _LLAMA_MODEL, _LLAMA_MODEL_PATH
+    clean_path = text_model_path.strip()
+    if not clean_path:
+        raise ValueError("TEXT_MODEL_PATH is not configured.")
+    if not Path(clean_path).exists():
+        raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
+    if _LLAMA_MODEL is not None and _LLAMA_MODEL_PATH == clean_path:
+        return _LLAMA_MODEL
+    from llama_cpp import Llama
+    _LLAMA_MODEL = Llama(
+        model_path=clean_path,
+        n_ctx=2048,
+        verbose=False,
+    )
+    _LLAMA_MODEL_PATH = clean_path
+    return _LLAMA_MODEL
+def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
+    return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS
+def _add_text_fallback(marker: str) -> None:
+    if marker not in _TEXT_FALLBACKS:
+        _TEXT_FALLBACKS.append(marker)
+def _log_text_fallback(stage: str, exc: Exception) -> None:
+    print(
+        f"[Objectverse Diary] Text runtime fell back to mock during {stage}: {type(exc).__name__}",
+        flush=True,
+    )
 def _character_name(object_name: str, mode: str) -> str:
     compact = "".join(part.capitalize() for part in object_name.split()[:2])
     suffix = {
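
Reviewer sketch (not part of the commit): `_extract_completion_text` above accepts two response shapes, the chat form with `choices[0].message.content` and the plain completion form with `choices[0].text`. The standalone copy below, with invented sample responses, is a minimal illustration of that branching; the sample dictionaries are assumptions and are not captured from a real llama.cpp run.

```python
from typing import Any


def extract_completion_text(response: Any) -> str:
    """Standalone copy of the extraction logic shown in the diff."""
    if isinstance(response, str):
        return response
    if not isinstance(response, dict):
        raise ValueError("llama.cpp returned an unsupported response type.")
    choices = response.get("choices")
    if not isinstance(choices, list) or not choices:
        raise ValueError("llama.cpp response did not include choices.")
    first = choices[0]
    if not isinstance(first, dict):
        raise ValueError("llama.cpp response choice was not an object.")
    message = first.get("message")
    if isinstance(message, dict) and isinstance(message.get("content"), str):
        return message["content"]  # chat-style response
    if isinstance(first.get("text"), str):
        return first["text"]  # plain completion-style response
    raise ValueError("llama.cpp response did not include text content.")


# Hypothetical sample payloads, invented for illustration only.
chat_response = {"choices": [{"message": {"role": "assistant", "content": '{"reply": "hi"}'}}]}
plain_response = {"choices": [{"text": '{"reply": "hi"}'}]}

chat_text = extract_completion_text(chat_response)
plain_text = extract_completion_text(plain_response)
```

Both shapes yield the same raw string, which the runner then hands to `parse_json_object`.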

src/models/vision_runner.py CHANGED

@@ -1,10 +1,14 @@
-"""Mock object understanding for the initial MVP."""
 from __future__ import annotations
 from pathlib import Path
 from src.models.schema import ObjectInfo, ObjectUnderstanding
 KNOWN_OBJECTS = {
@@ -19,9 +23,55 @@ KNOWN_OBJECTS = {
     "bag": "bag",
 }
 def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
-    """Return deterministic mock object understanding until VLM integration starts."""
     clean_description = description.strip()
     object_name = _infer_object_name(clean_description, image_path)
     features = _infer_features(clean_description, image_path)
@@ -36,6 +86,86 @@ def understand_object(image_path: str | None, description: str) -> ObjectUnderst
     )
 def _infer_object_name(description: str, image_path: str | None) -> str:
     lowered = description.lower()
     for keyword, name in KNOWN_OBJECTS.items():

+"""Object understanding runtime for mock and MiniCPM-V backends."""
 from __future__ import annotations
+from dataclasses import dataclass
 from pathlib import Path
+from typing import Any
+from src.config import RuntimeSettings, get_runtime_settings
 from src.models.schema import ObjectInfo, ObjectUnderstanding
+from src.utils.json_repair import parse_json_object
 KNOWN_OBJECTS = {
     "bag": "bag",
 }
+MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
+MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
+_MINICPM_MODEL: Any | None = None
+_MINICPM_TOKENIZER: Any | None = None
+_MINICPM_MODEL_ID: str | None = None
+@dataclass(frozen=True)
+class VisionRunResult:
+    object_understanding: ObjectUnderstanding
+    fallbacks: list[str]
 def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
+    """Return object understanding without exposing runtime metadata."""
+    return understand_object_with_metadata(image_path, description).object_understanding
+def understand_object_with_metadata(
+    image_path: str | None,
+    description: str,
+    *,
+    settings: RuntimeSettings | None = None,
+) -> VisionRunResult:
+    current = settings or get_runtime_settings()
+    backend = current.vision_backend.strip().lower()
+    if backend == "mock":
+        return VisionRunResult(_understand_object_mock(image_path, description), [])
+    if backend in MINICPM_BACKENDS:
+        try:
+            return VisionRunResult(_understand_object_minicpm(image_path, description, current), [])
+        except Exception as exc:
+            _log_vision_fallback("minicpm-v", exc)
+            return VisionRunResult(
+                _understand_object_mock(image_path, description),
+                ["vision-fallback-to-mock"],
+            )
+    return VisionRunResult(
+        _understand_object_mock(image_path, description),
+        [f"unknown-vision-backend-{backend}-fallback-to-mock"],
+    )
+def _understand_object_mock(image_path: str | None, description: str) -> ObjectUnderstanding:
+    """Return deterministic mock object understanding for fallback-safe demos."""
     clean_description = description.strip()
     object_name = _infer_object_name(clean_description, image_path)
     features = _infer_features(clean_description, image_path)
     )
+def _understand_object_minicpm(
+    image_path: str | None,
+    description: str,
+    settings: RuntimeSettings,
+) -> ObjectUnderstanding:
+    if not image_path:
+        raise ValueError("MiniCPM-V requires an uploaded image.")
+    model_id = settings.vision_model_id or MINICPM_DEFAULT_MODEL_ID
+    model, tokenizer = _load_minicpm_components(model_id)
+    image = _load_rgb_image(image_path)
+    prompt = _object_understanding_prompt(description)
+    messages = [{"role": "user", "content": [image, prompt]}]
+    raw = model.chat(image=None, msgs=messages, tokenizer=tokenizer)
+    if isinstance(raw, tuple):
+        raw = raw[0]
+    payload = parse_json_object(str(raw))
+    return ObjectUnderstanding.model_validate(payload)
+def _load_minicpm_components(model_id: str) -> tuple[Any, Any]:
+    global _MINICPM_MODEL, _MINICPM_TOKENIZER, _MINICPM_MODEL_ID
+    if _MINICPM_MODEL is not None and _MINICPM_TOKENIZER is not None and _MINICPM_MODEL_ID == model_id:
+        return _MINICPM_MODEL, _MINICPM_TOKENIZER
+    import torch
+    from transformers import AutoModel, AutoTokenizer
+    model_kwargs: dict[str, Any] = {
+        "trust_remote_code": True,
+        "torch_dtype": torch.bfloat16,
+    }
+    try:
+        model_kwargs["attn_implementation"] = "sdpa"
+        model = AutoModel.from_pretrained(model_id, **model_kwargs)
+    except TypeError:
+        model_kwargs.pop("attn_implementation", None)
+        model = AutoModel.from_pretrained(model_id, **model_kwargs)
+    if torch.cuda.is_available():
+        model = model.eval().cuda()
+    elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
+        model = model.eval().to(device="mps", dtype=torch.float16)
+    else:
+        model = model.eval()
+    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+    _MINICPM_MODEL = model
+    _MINICPM_TOKENIZER = tokenizer
+    _MINICPM_MODEL_ID = model_id
+    return model, tokenizer
+def _load_rgb_image(image_path: str) -> Any:
+    from PIL import Image
+    return Image.open(image_path).convert("RGB")
+def _object_understanding_prompt(description: str) -> str:
+    context = description.strip() or "No user description was provided."
+    return (
+        "You are the vision module for Objectverse Diary. Inspect the uploaded everyday object photo. "
+        "Return only valid JSON with exactly this shape: "
+        '{"object":{"name":"short object name","visible_features":["feature 1","feature 2","feature 3"],'
+        '"likely_context":"where this object probably is","confidence":0.0}}. '
+        "Use 3 to 5 concrete visible_features. confidence must be a number from 0 to 1. "
+        f"Optional user context: {context}"
+    )
+def _log_vision_fallback(backend: str, exc: Exception) -> None:
+    print(
+        f"[Objectverse Diary] Vision backend '{backend}' fell back to mock: {type(exc).__name__}",
+        flush=True,
+    )
 def _infer_object_name(description: str, image_path: str | None) -> str:
     lowered = description.lower()
     for keyword, name in KNOWN_OBJECTS.items():

src/pipeline.py CHANGED

@@ -5,10 +5,15 @@ from __future__ import annotations
 from datetime import datetime
 from pathlib import Path
-from src.config import TRACE_DIR
-from src.models.llama_cpp_runner import generate_diary, generate_persona
 from src.models.schema import GenerationResult
-from src.models.vision_runner import understand_object
 from src.traces.logger import build_trace, save_trace
@@ -22,9 +27,13 @@ def generate_object_diary(
     trace_id: str | None = None,
     created_at: datetime | None = None,
 ) -> GenerationResult:
-    object_understanding = understand_object(image_path, description)
     persona = generate_persona(object_understanding, mode)
     diary = generate_diary(persona, mode)
     trace = build_trace(
         image_path=image_path,
         description=description,
@@ -34,6 +43,13 @@ def generate_object_diary(
         diary=diary,
         trace_id=trace_id,
         created_at=created_at,
     )
     trace_path = save_trace(trace, trace_dir) if save else ""
@@ -48,3 +64,25 @@ def generate_object_diary(
 def format_diary_markdown(title: str, english: str, chinese: str) -> str:
     return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"

 from datetime import datetime
 from pathlib import Path
+from src.config import TRACE_DIR, get_runtime_settings, runtime_status
+from src.models.llama_cpp_runner import (
+    generate_diary,
+    generate_persona,
+    get_text_runtime_fallbacks,
+    reset_text_runtime_fallbacks,
+)
 from src.models.schema import GenerationResult
+from src.models.vision_runner import VisionRunResult, understand_object_with_metadata
 from src.traces.logger import build_trace, save_trace
     trace_id: str | None = None,
     created_at: datetime | None = None,
 ) -> GenerationResult:
+    settings = get_runtime_settings()
+    vision_result = understand_object_with_metadata(image_path, description, settings=settings)
+    object_understanding = vision_result.object_understanding
+    reset_text_runtime_fallbacks()
     persona = generate_persona(object_understanding, mode)
     diary = generate_diary(persona, mode)
+    text_fallbacks = get_text_runtime_fallbacks()
     trace = build_trace(
         image_path=image_path,
         description=description,
         diary=diary,
         trace_id=trace_id,
         created_at=created_at,
+        model_runtime=runtime_status(settings),
+        fallbacks=_runtime_fallbacks(
+            settings.vision_backend,
+            settings.text_backend,
+            vision_result,
+            text_fallbacks,
+        ),
     )
     trace_path = save_trace(trace, trace_dir) if save else ""
 def format_diary_markdown(title: str, english: str, chinese: str) -> str:
     return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"
+def _runtime_fallbacks(
+    vision_backend: str,
+    text_backend: str,
+    vision_result: VisionRunResult,
+    text_fallbacks: list[str] | None = None,
+) -> list[str]:
+    clean_vision_backend = vision_backend.strip().lower()
+    clean_text_backend = text_backend.strip().lower()
+    if clean_vision_backend == "mock" and clean_text_backend == "mock":
+        return ["mock-runtime"]
+    fallbacks = list(vision_result.fallbacks)
+    for marker in text_fallbacks or []:
+        if marker not in fallbacks:
+            fallbacks.append(marker)
+    if clean_vision_backend == "mock":
+        fallbacks.append("mock-vision-runtime")
+    if clean_text_backend == "mock":
+        fallbacks.append("mock-text-runtime")
+    return fallbacks
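
Reviewer sketch (not part of the commit): the marker merging in `_runtime_fallbacks` can be exercised in isolation. The copy below uses a reduced stand-in for `VisionRunResult` that carries only `fallbacks`, and the marker string `"text-fallback-to-mock"` is an assumed value, since the actual value of `TEXT_FALLBACK_TO_MOCK` is not visible in this part of the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VisionRunResult:
    # Reduced stand-in: the real class also carries object_understanding.
    fallbacks: list[str] = field(default_factory=list)


def runtime_fallbacks(
    vision_backend: str,
    text_backend: str,
    vision_result: VisionRunResult,
    text_fallbacks: list[str] | None = None,
) -> list[str]:
    """Standalone copy of the merging logic shown in the diff."""
    clean_vision_backend = vision_backend.strip().lower()
    clean_text_backend = text_backend.strip().lower()
    # A fully mocked run collapses to a single marker.
    if clean_vision_backend == "mock" and clean_text_backend == "mock":
        return ["mock-runtime"]
    fallbacks = list(vision_result.fallbacks)
    # Append text markers without duplicating ones already present.
    for marker in text_fallbacks or []:
        if marker not in fallbacks:
            fallbacks.append(marker)
    if clean_vision_backend == "mock":
        fallbacks.append("mock-vision-runtime")
    if clean_text_backend == "mock":
        fallbacks.append("mock-text-runtime")
    return fallbacks


all_mock = runtime_fallbacks("mock", "mock", VisionRunResult())
both_failed = runtime_fallbacks(
    "minicpm-v",
    "llama-cpp",
    VisionRunResult(["vision-fallback-to-mock"]),
    ["text-fallback-to-mock"],  # assumed marker value
)
mixed = runtime_fallbacks(" Mock ", "llama-cpp", VisionRunResult())
```

With real backends that both succeed, the function returns an empty list, so an empty `fallbacks` field in a trace indicates that no mock path was taken.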

src/prompts/diary_generation.py CHANGED

@@ -1,6 +1,32 @@
-"""Prompt placeholder for future secret diary generation."""
 DIARY_GENERATION_PROMPT = """
-Write a short secret diary entry in English first, with Chinese helper translation.
-Keep the object persona consistent.
 """.strip()

+"""Prompt templates for diary and chat generation."""
 DIARY_GENERATION_PROMPT = """
+Write a short secret diary entry for the object persona. Return only valid JSON
+with exactly this shape:
+{
+  "title": "Secret Diary - Day N",
+  "english": "one vivid English-first diary paragraph",
+  "chinese": "short Chinese helper translation"
+}
+Rules:
+- Keep the persona consistent with the supplied persona JSON.
+- Keep the English diary under 120 words.
+- The Chinese text is secondary helper copy, not the primary UI language.
+- Do not include markdown, commentary, or extra keys.
+""".strip()
+CHAT_REPLY_PROMPT = """
+Reply as the object persona to the user's message. Return only valid JSON with
+exactly this shape:
+{
+  "reply": "one short in-character chat reply"
+}
+Rules:
+- Stay consistent with the persona JSON.
+- Keep the reply under 70 words.
+- Do not include markdown, commentary, or extra keys.
 """.strip()

src/prompts/persona_generation.py CHANGED

@@ -1,7 +1,27 @@
-"""Prompt placeholder for future persona generation."""
 PERSONA_GENERATION_PROMPT = """
-Create a hidden first-person object persona with name, mood, backstory,
-complaint, secret fear, core memory, and exactly three tags.
-Return structured JSON only.
 """.strip()

+"""Prompt templates for persona generation."""
 PERSONA_GENERATION_PROMPT = """
+You are the text runtime for Objectverse Diary, a strange archive of everyday
+objects with secret lives.
+Create a hidden first-person object persona from the object understanding JSON
+and personality mode. Return only valid JSON with exactly this shape:
+{
+  "persona": {
+    "object_name": "short object name",
+    "character_name": "archive character name",
+    "mood": "short mood phrase",
+    "secret_fear": "one vivid fear",
+    "core_memory": "one sentence backstory",
+    "complaint": "one sentence complaint in the object's voice",
+    "tags": ["tag one", "tag two", "tag three"]
+  }
+}
+Rules:
+- Keep the persona consistent with the visible object features.
+- Use English output.
+- Use exactly three tags.
+- Do not include markdown, commentary, or extra keys.
 """.strip()

src/traces/logger.py CHANGED

@@ -1,4 +1,4 @@
-"""Trace builder and saver for mock MVP runs."""
 from __future__ import annotations
@@ -7,7 +7,7 @@ from datetime import datetime, timezone
 from pathlib import Path
 from uuid import uuid4
-from src.config import MODEL_RUNTIME_STATUS, TRACE_DIR
 from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
 from src.traces.anonymizer import anonymize_text
@@ -21,6 +21,8 @@ def build_trace(
     diary: DiaryEntry,
     trace_id: str | None = None,
     created_at: datetime | None = None,
 ) -> TraceRecord:
     return TraceRecord(
         trace_id=trace_id or uuid4().hex,
@@ -34,8 +36,8 @@ def build_trace(
         object_understanding=object_understanding,
         persona=persona,
         diary=diary,
-        model_runtime=MODEL_RUNTIME_STATUS,
-        fallbacks=["mock-runtime"],
     )

+"""Trace builder and saver for generation runs."""
 from __future__ import annotations
 from pathlib import Path
 from uuid import uuid4
+from src.config import TRACE_DIR, get_runtime_settings, runtime_status
 from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
 from src.traces.anonymizer import anonymize_text
     diary: DiaryEntry,
     trace_id: str | None = None,
     created_at: datetime | None = None,
+    model_runtime: dict[str, str] | None = None,
+    fallbacks: list[str] | None = None,
 ) -> TraceRecord:
     return TraceRecord(
         trace_id=trace_id or uuid4().hex,
         object_understanding=object_understanding,
         persona=persona,
         diary=diary,
+        model_runtime=model_runtime or runtime_status(get_runtime_settings()),
+        fallbacks=fallbacks if fallbacks is not None else ["mock-runtime"],
     )

src/ui/layout.py CHANGED

@@ -15,6 +15,7 @@ from src.models.schema import GenerationResult
 from src.pipeline import format_diary_markdown, generate_object_diary
 from src.renderer.share_card import render_share_card
 from src.ui import copy
 CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
@@ -234,6 +235,7 @@ def _example_handler(index: int):
     return load_example
 def generate_object_file(
     image_path: str | None,
     description: str,

 from src.pipeline import format_diary_markdown, generate_object_diary
 from src.renderer.share_card import render_share_card
 from src.ui import copy
+from src.utils.zero_gpu import zero_gpu
 CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
     return load_example
+@zero_gpu(duration=180)
 def generate_object_file(
     image_path: str | None,
     description: str,

src/utils/json_repair.py CHANGED

@@ -7,7 +7,24 @@ from typing import Any
 def parse_json_object(raw: str) -> dict[str, Any]:
-    value = json.loads(raw)
     if not isinstance(value, dict):
         raise ValueError("Expected a JSON object.")
     return value

 def parse_json_object(raw: str) -> dict[str, Any]:
+    value = json.loads(_extract_json_object(raw))
     if not isinstance(value, dict):
         raise ValueError("Expected a JSON object.")
     return value
+def _extract_json_object(raw: str) -> str:
+    clean = raw.strip()
+    if clean.startswith("```"):
+        clean = clean.strip("`").strip()
+        if clean.lower().startswith("json"):
+            clean = clean[4:].strip()
+    if clean.startswith("{") and clean.endswith("}"):
+        return clean
+    start = clean.find("{")
+    end = clean.rfind("}")
+    if start == -1 or end == -1 or end <= start:
+        raise ValueError("No JSON object found.")
+    return clean[start : end + 1]
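
Reviewer sketch (not part of the commit): the new `_extract_json_object` helper tolerates fenced or chatty model output before `json.loads` runs. The standalone copy below tries it on two invented model outputs; the sample strings are assumptions chosen to hit the fence-stripping and brace-scanning branches.

```python
import json
from typing import Any


def extract_json_object(raw: str) -> str:
    """Standalone copy of the extraction logic shown in the diff."""
    clean = raw.strip()
    if clean.startswith("```"):
        # Drop the surrounding backticks and an optional "json" language tag.
        clean = clean.strip("`").strip()
        if clean.lower().startswith("json"):
            clean = clean[4:].strip()
    if clean.startswith("{") and clean.endswith("}"):
        return clean
    # Otherwise take the span from the first "{" to the last "}".
    start = clean.find("{")
    end = clean.rfind("}")
    if start == -1 or end == -1 or end <= start:
        raise ValueError("No JSON object found.")
    return clean[start : end + 1]


def parse_json_object(raw: str) -> dict[str, Any]:
    value = json.loads(extract_json_object(raw))
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object.")
    return value


fence = "`" * 3  # built at runtime to keep this example fence-safe
fenced_output = f'{fence}json\n{{"reply": "I am a mug."}}\n{fence}'
chatty_output = 'Sure! Here is the JSON: {"reply": "I am a mug."} Hope that helps.'

from_fenced = parse_json_object(fenced_output)
from_chatty = parse_json_object(chatty_output)
```

The brace scan is a heuristic: output containing several separate JSON objects would be sliced from the first `{` to the last `}` and would then fail in `json.loads`, which the callers treat as a fallback trigger.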

src/utils/zero_gpu.py ADDED

@@ -0,0 +1,23 @@

+"""Optional Hugging Face ZeroGPU decorator helpers."""
+from __future__ import annotations
+from collections.abc import Callable
+from typing import TypeVar
+F = TypeVar("F", bound=Callable)
+def zero_gpu(duration: int = 180) -> Callable[[F], F]:
+    """Return a ZeroGPU decorator when available, otherwise a no-op decorator."""
+    try:
+        import spaces  # type: ignore[import-not-found]
+    except Exception:
+        return _identity_decorator
+    return spaces.GPU(duration=duration)
+def _identity_decorator(func: F) -> F:
+    return func
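
Reviewer sketch (not part of the commit): the helper above degrades to a no-op when the `spaces` package cannot be imported. The standalone copy below forces that situation by setting `sys.modules["spaces"] = None`, which makes the import raise; the `describe` function is a hypothetical stand-in for `generate_object_file`.

```python
import sys
from collections.abc import Callable
from typing import TypeVar

F = TypeVar("F", bound=Callable)


def _identity_decorator(func: F) -> F:
    return func


def zero_gpu(duration: int = 180) -> Callable[[F], F]:
    """Standalone copy of the optional decorator shown in the diff."""
    try:
        import spaces  # only present on Hugging Face Spaces images
    except Exception:
        return _identity_decorator
    return spaces.GPU(duration=duration)


# Simulate a local machine without `spaces`: a None entry in sys.modules
# makes the import statement raise ModuleNotFoundError.
sys.modules["spaces"] = None


def describe(name: str) -> str:
    # Hypothetical stand-in for a GPU-bound handler.
    return f"woke {name}"


decorated = zero_gpu(duration=60)(describe)
result = decorated("mug")
```

Because the fallback returns the original function object, local runs and tests behave exactly as if the decorator were absent.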