Add ZeroGPU-compatible validation path
- README.md +23 -5
- docs/03-dev-schedule.md +23 -13
- docs/07-development-plan.md +29 -15
- docs/DEVELOPMENT_STATUS.md +60 -0
- docs/EXTERNAL_SETUP.md +44 -13
- docs/FAILURES.md +2 -1
- docs/INITIAL_STAGE_REPORT.md +31 -13
- docs/MODEL_CARD.md +8 -8
- docs/README.md +2 -0
- docs/RUNTIME.md +52 -3
- docs/SPACE_VLM_REPORT.md +42 -0
- docs/SUBMISSION_GUIDE.md +19 -2
- pyproject.toml +8 -2
- requirements.txt +6 -0
- scripts/README.md +16 -1
- scripts/check_space_vlm.py +481 -0
- src/README.md +1 -1
- src/config.py +17 -12
- src/models/llama_cpp_runner.py +225 -3
- src/models/vision_runner.py +132 -2
- src/pipeline.py +42 -4
- src/prompts/diary_generation.py +29 -3
- src/prompts/persona_generation.py +24 -4
- src/traces/logger.py +6 -4
- src/ui/layout.py +2 -0
- src/utils/json_repair.py +18 -1
- src/utils/zero_gpu.py +23 -0
README.md
CHANGED
|
@@ -5,7 +5,7 @@ colorFrom: yellow
|
|
| 5 |
colorTo: gray
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 5.50.0
|
| 8 |
-
python_version: '3.
|
| 9 |
app_file: app.py
|
| 10 |
pinned: false
|
| 11 |
license: mit
|
|
@@ -23,9 +23,13 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
|
|
| 23 |
|
| 24 |
## Current Status
|
| 25 |
|
| 26 |
-
Initial mock MVP
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
## Track
|
| 31 |
|
|
@@ -71,6 +75,19 @@ python app.py
|
|
| 71 |
|
| 72 |
Then open the local Gradio URL printed in the terminal.
|
| 73 |
|
| 74 |
## Initial MVP Flow
|
| 75 |
|
| 76 |
The current implementation supports:
|
|
@@ -120,7 +137,7 @@ This creates deterministic mock SFT preview data for schema and curation plannin
|
|
| 120 |
```
|
| 121 |
|
| 122 |
See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
|
| 123 |
-
See `docs/EXTERNAL_SETUP.md` before
|
| 124 |
|
| 125 |
## Project Structure
|
| 126 |
|
|
@@ -128,7 +145,7 @@ See `docs/02-tech-architecture.md`, `AGENTS.md`, and `.codex/skills/` for the in
|
|
| 128 |
|
| 129 |
## Runtime Notes
|
| 130 |
|
| 131 |
-
The
|
| 132 |
|
| 133 |
## HF Space README YAML Header
|
| 134 |
|
|
@@ -139,6 +156,7 @@ emoji: 🗝️
|
|
| 139 |
colorFrom: amber
|
| 140 |
colorTo: gray
|
| 141 |
sdk: gradio
|
|
|
|
| 142 |
app_file: app.py
|
| 143 |
pinned: false
|
| 144 |
---
|
|
|
|
| 5 |
colorTo: gray
|
| 6 |
sdk: gradio
|
| 7 |
sdk_version: 5.50.0
|
| 8 |
+
python_version: '3.10'
|
| 9 |
app_file: app.py
|
| 10 |
pinned: false
|
| 11 |
license: mit
|
|
|
|
| 23 |
|
| 24 |
## Current Status
|
| 25 |
|
| 26 |
+
Initial mock MVP, MiniCPM-V vision backend wiring, and optional llama.cpp text runtime wiring are available.
|
| 27 |
|
| 28 |
+
By default, the app still uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the real MiniCPM-V 2.6 vision path. `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured.
|
| 29 |
+
|
| 30 |
+
Hugging Face Space:
|
| 31 |
+
|
| 32 |
+
https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 33 |
|
| 34 |
## Track
|
| 35 |
|
|
|
|
| 75 |
|
| 76 |
Then open the local Gradio URL printed in the terminal.
|
| 77 |
|
| 78 |
+
## Optional llama.cpp Text Runtime
|
| 79 |
+
|
| 80 |
+
The project does not commit GGUF files or require `llama-cpp-python` by default. To try a local GGUF text model:
|
| 81 |
+
|
| 82 |
+
```bash
|
| 83 |
+
pip install llama-cpp-python
|
| 84 |
+
OBJECTVERSE_TEXT_BACKEND=llama-cpp \
|
| 85 |
+
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
|
| 86 |
+
python app.py
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
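The fallback rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual `src/models/llama_cpp_runner.py`: the helper names `load_llama`, `mock_generate`, and `generate_with_fallback` are hypothetical, and the shape of the mock output is an assumption. Only the `text-fallback-to-mock` marker and the four failure conditions come from the README text.

```python
import json
from pathlib import Path
from typing import Callable

FALLBACK_MARKER = "text-fallback-to-mock"  # marker name taken from the README


def mock_generate(prompt: str) -> dict:
    """Deterministic stand-in for the mock text backend (hypothetical shape)."""
    return {"backend": "mock", "text": f"mock reply for: {prompt}"}


def load_llama(model_path: str) -> Callable[[str], str]:
    """Return a prompt -> text function backed by a GGUF model, or raise."""
    if not model_path or not Path(model_path).is_file():
        raise FileNotFoundError("TEXT_MODEL_PATH is empty or not a file")
    # Optional dependency: imported lazily so a missing package only
    # triggers the fallback instead of breaking app startup.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, verbose=False)

    def run(prompt: str) -> str:
        out = llm.create_completion(prompt, max_tokens=256)
        return out["choices"][0]["text"]

    return run


def generate_with_fallback(prompt: str, model_path: str, trace: list) -> dict:
    """Try the real model; on any failure record the marker and use the mock."""
    try:
        run = load_llama(model_path)
        parsed = json.loads(run(prompt))  # invalid JSON raises ValueError
        if not isinstance(parsed, dict):
            raise ValueError("model output is not a JSON object")
        return parsed
    except Exception:
        trace.append(FALLBACK_MARKER)
        return mock_generate(prompt)
```

Calling `generate_with_fallback("hello", "", trace)` with an empty model path never touches `llama_cpp`; it appends the marker to `trace` and returns the mock result, which is the behavior the README describes for an unset `TEXT_MODEL_PATH`.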
|
| 90 |
+
|
| 91 |
## Initial MVP Flow
|
| 92 |
|
| 93 |
The current implementation supports:
|
|
|
|
| 137 |
```
|
| 138 |
|
| 139 |
See `docs/INITIAL_STAGE_REPORT.md` for the local initial-stage evidence.
|
| 140 |
+
See `docs/EXTERNAL_SETUP.md` before changing remote GitHub or Hugging Face resources.
|
| 141 |
|
| 142 |
## Project Structure
|
| 143 |
|
|
|
|
| 145 |
|
| 146 |
## Runtime Notes
|
| 147 |
|
| 148 |
+
The default runtime is mock-only. MiniCPM-V 2.6 vision and optional llama.cpp text generation can be enabled with environment variables while preserving mock fallbacks. See `docs/RUNTIME.md`.
|
| 149 |
|
| 150 |
## HF Space README YAML Header
|
| 151 |
|
|
|
|
| 156 |
colorFrom: amber
|
| 157 |
colorTo: gray
|
| 158 |
sdk: gradio
|
| 159 |
+
python_version: '3.10'
|
| 160 |
app_file: app.py
|
| 161 |
pinned: false
|
| 162 |
---
|
docs/03-dev-schedule.md
CHANGED
|
@@ -11,8 +11,9 @@
|
|
| 11 |
|
| 12 |
**Goal: lock down the project's fixed scope.**
|
| 13 |
|
| 14 |
-
- [
|
| 15 |
-
- [ ]
|
|
|
|
| 16 |
- [x] Create the basic Gradio app
|
| 17 |
- [x] Write the README draft
|
| 18 |
- [x] Finalize the English main UI copy
|
|
@@ -46,14 +47,18 @@
|
|
| 46 |
|
| 47 |
**Goal: make the AI actually look at the image.**
|
| 48 |
|
| 49 |
-
- [
|
| 50 |
-
- [
|
| 51 |
-
- [
|
| 52 |
-
- [
|
|
|
|
| 53 |
- [ ] Cache example outputs
|
|
|
|
| 54 |
|
| 55 |
Acceptance: upload a mug, keyboard, or shoe; the model identifies the object and extracts its visual features.
|
| 56 |
|
|
|
|
|
|
|
| 57 |
---
|
| 58 |
|
| 59 |
## Day 4: Text Model + llama.cpp
|
|
@@ -61,12 +66,14 @@
|
|
| 61 |
**Goal: run core persona generation through local small-model inference.**
|
| 62 |
|
| 63 |
- [ ] Download a small GGUF model
|
| 64 |
-
- [
|
| 65 |
-
- [
|
| 66 |
-
- [
|
| 67 |
-
- [
|
|
|
|
|
|
|
| 68 |
|
| 69 |
-
Deliverable: `
|
| 70 |
|
| 71 |
---
|
| 72 |
|
|
@@ -144,7 +151,7 @@ Bottom: Share Card + Trace
|
|
| 144 |
- [x] Write English primary copy with Chinese secondary copy
|
| 145 |
- [x] Build 6 example cards
|
| 146 |
|
| 147 |
-
Completion record: Phase 2 UI has been completed as
|
| 148 |
|
| 149 |
---
|
| 150 |
|
|
@@ -158,7 +165,9 @@ Bottom: Share Card + Trace
|
|
| 158 |
- [x] dataset preview
|
| 159 |
- [x] trace JSONL export
|
| 160 |
- [x] Failure case records
|
| 161 |
-
- [
|
|
|
|
|
|
|
| 162 |
|
| 163 |
---
|
| 164 |
|
|
@@ -205,6 +214,7 @@ Bottom: Share Card + Trace
|
|
| 205 |
## Day 11: Submission Checks
|
| 206 |
|
| 207 |
- [ ] Space under official org
|
|
|
|
| 208 |
- [ ] Demo video ready
|
| 209 |
- [ ] Social post ready
|
| 210 |
- [ ] README complete
|
|
|
|
| 11 |
|
| 12 |
**Goal: lock down the project's fixed scope.**
|
| 13 |
|
| 14 |
+
- [x] Configure GitHub origin
|
| 15 |
+
- [ ] Confirm and sync the GitHub repo
|
| 16 |
+
- [x] Create the Hugging Face Space
|
| 17 |
- [x] Create the basic Gradio app
|
| 18 |
- [x] Write the README draft
|
| 19 |
- [x] Finalize the English main UI copy
|
|
|
|
| 47 |
|
| 48 |
**Goal: make the AI actually look at the image.**
|
| 49 |
|
| 50 |
+
- [x] Integrate MiniCPM-V or a lightweight VLM
|
| 51 |
+
- [x] Output object understanding JSON
|
| 52 |
+
- [x] Implement JSON repair
|
| 53 |
+
- [x] Add an example gallery
|
| 54 |
+
- [x] Add the Space VLM validation script
|
| 55 |
- [ ] Cache example outputs
|
| 56 |
+
- [ ] Space 1x L4 validation with real images (attempted 2026-06-06; blocked by HF `402 Payment Required`; rolled back to mock-safe)
|
| 57 |
|
| 58 |
Acceptance: upload a mug, keyboard, or shoe; the model identifies the object and extracts its visual features.
|
| 59 |
|
| 60 |
+
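Completion record: MiniCPM-V 2.6 is integrated as a configurable vision backend, and the default is still mock vision; `scripts/check_space_vlm.py` can validate mug/keyboard/shoe on the Space using three temporary public images. On 2026-06-06 a switch to L4 was attempted, but Hugging Face returned `402 Payment Required`, so the organization needs billing/pre-paid credits; a mock-safe rollback was then applied. Text generation has optional llama.cpp runtime wiring, but the final GGUF model has not yet been selected or downloaded.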
|
| 61 |
+
|
| 62 |
---
|
| 63 |
|
| 64 |
## Day 4: Text Model + llama.cpp
|
|
|
|
| 66 |
**Goal: run core persona generation through local small-model inference.**
|
| 67 |
|
| 68 |
- [ ] Download a small GGUF model
|
| 69 |
+
- [x] Add optional llama.cpp / llama-cpp-python runtime wiring
|
| 70 |
+
- [x] Wrap `generate_persona()`
|
| 71 |
+
- [x] Wrap `generate_diary()`
|
| 72 |
+
- [x] Document how to run it in the README
|
| 73 |
+
- [ ] Run a local smoke test with a real GGUF
|
| 74 |
+
- [ ] Document the final model parameter count in the README
|
| 75 |
|
| 76 |
+
Deliverable: `src/models/llama_cpp_runner.py` supports `TEXT_MODEL_PATH`; `models/text_model.gguf` is not committed. The real GGUF, its parameter count, and the training/publishing path still need to be decided.
|
| 77 |
|
| 78 |
---
|
| 79 |
|
|
|
|
| 151 |
- [x] Write English primary copy with Chinese secondary copy
|
| 152 |
- [x] Build 6 example cards
|
| 153 |
|
| 154 |
+
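Completion record: Phase 2 UI is complete as an archive dashboard. The MiniCPM-V 2.6 vision backend and optional llama.cpp runtime wiring are integrated but still default to mock; LoRA is not integrated; `UI 参考/` is only a local visual reference and is not committed.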
|
| 155 |
|
| 156 |
---
|
| 157 |
|
|
|
|
| 165 |
- [x] dataset preview
|
| 166 |
- [x] trace JSONL export
|
| 167 |
- [x] Failure case records
|
| 168 |
+
- [x] Space VLM validation report template
|
| 169 |
+
- [ ] Real model traces
|
| 170 |
+
- [ ] GitHub repo sync and cleanup
|
| 171 |
|
| 172 |
---
|
| 173 |
|
|
|
|
| 214 |
## Day 11: Submission Checks
|
| 215 |
|
| 216 |
- [ ] Space under official org
|
| 217 |
+
- [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe
|
| 218 |
- [ ] Demo video ready
|
| 219 |
- [ ] Social post ready
|
| 220 |
- [ ] README complete
|
docs/07-development-plan.md
CHANGED
|
@@ -8,7 +8,7 @@ The plan is intentionally staged. Each phase has a clear goal, implementation sc
|
|
| 8 |
|
| 9 |
## Current Baseline
|
| 10 |
|
| 11 |
-
As of 2026-06-
|
| 12 |
|
| 13 |
- initialized project structure
|
| 14 |
- root README and AGENTS instructions
|
|
@@ -30,13 +30,17 @@ As of 2026-06-05, the project has:
|
|
| 30 |
- stdlib unittest smoke tests for the mock MVP
|
| 31 |
- runtime configuration boundary documented in `docs/RUNTIME.md`
|
| 32 |
- initial-stage acceptance script at `scripts/check_initial_stage.py`
|
| 33 |
|
| 34 |
Not yet done:
|
| 35 |
|
| 36 |
-
- GitHub repo
|
| 37 |
-
-
|
| 38 |
-
- real
|
| 39 |
-
- real llama.cpp / llama-cpp-python text runtime
|
| 40 |
- real curated dataset
|
| 41 |
- LoRA fine-tuning
|
| 42 |
- model card completion
|
|
@@ -111,6 +115,8 @@ Verification:
|
|
| 111 |
|
| 112 |
Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
|
| 113 |
|
|
|
|
|
|
|
| 114 |
Scope:
|
| 115 |
|
| 116 |
- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
|
|
@@ -130,15 +136,18 @@ Verification:
|
|
| 130 |
- Run local sample image checks.
|
| 131 |
- Confirm schema validation.
|
| 132 |
- Confirm fallback trace markers.
|
|
|
|
| 133 |
|
| 134 |
## Phase 4 — Text Runtime With llama.cpp
|
| 135 |
|
| 136 |
Goal: make persona, diary, and chat generation use a small local text model runtime.
|
| 137 |
|
|
|
|
|
|
|
| 138 |
Scope:
|
| 139 |
|
| 140 |
-
- Add llama.cpp / llama-cpp-python runner.
|
| 141 |
-
- Add model path configuration.
|
| 142 |
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
|
| 143 |
- Implement persona generation.
|
| 144 |
- Implement diary generation.
|
|
@@ -148,12 +157,12 @@ Scope:
|
|
| 148 |
Exit criteria:
|
| 149 |
|
| 150 |
- Text generation can run through llama.cpp or documented local fallback.
|
| 151 |
-
- README documents model size
|
| 152 |
- Trace records include runtime metadata.
|
| 153 |
|
| 154 |
Verification:
|
| 155 |
|
| 156 |
-
- Local runtime smoke test.
|
| 157 |
- JSON schema validation.
|
| 158 |
- Compare at least three object generations for persona consistency.
|
| 159 |
|
|
@@ -161,6 +170,8 @@ Verification:
|
|
| 161 |
|
| 162 |
Goal: prepare Well-Tuned badge evidence.
|
| 163 |
|
|
|
|
|
|
|
| 164 |
Scope:
|
| 165 |
|
| 166 |
- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
|
|
@@ -237,13 +248,15 @@ Verification:
|
|
| 237 |
|
| 238 |
Goal: deploy the app in the required Gradio format.
|
| 239 |
|
|
|
|
|
|
|
| 240 |
Scope:
|
| 241 |
|
| 242 |
-
- Create Hugging Face Space.
|
| 243 |
-
- Add Space README YAML header.
|
| 244 |
-
- Confirm `app_file: app.py`.
|
| 245 |
-
- Configure model paths and fallback mode.
|
| 246 |
-
- Check runtime resource constraints.
|
| 247 |
|
| 248 |
Exit criteria:
|
| 249 |
|
|
@@ -253,8 +266,9 @@ Exit criteria:
|
|
| 253 |
|
| 254 |
Verification:
|
| 255 |
|
| 256 |
-
- Launch on HF Space.
|
| 257 |
- Run demo flow in hosted environment.
|
|
|
|
| 258 |
- Check logs for missing secrets or path errors.
|
| 259 |
|
| 260 |
## Phase 9 — Field Notes And Demo Video
|
|
|
|
| 8 |
|
| 9 |
## Current Baseline
|
| 10 |
|
| 11 |
+
As of 2026-06-06, the project has:
|
| 12 |
|
| 13 |
- initialized project structure
|
| 14 |
- root README and AGENTS instructions
|
|
|
|
| 30 |
- stdlib unittest smoke tests for the mock MVP
|
| 31 |
- runtime configuration boundary documented in `docs/RUNTIME.md`
|
| 32 |
- initial-stage acceptance script at `scripts/check_initial_stage.py`
|
| 33 |
+
- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
|
| 34 |
+
- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
|
| 35 |
+
- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
|
| 36 |
+
- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
|
| 37 |
+
- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`
|
| 38 |
|
| 39 |
Not yet done:
|
| 40 |
|
| 41 |
+
- GitHub repo sync / public submission confirmation
|
| 42 |
+
- hosted Space L4 MiniCPM-V validation with real public images
|
| 43 |
+
- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
|
|
|
|
| 44 |
- real curated dataset
|
| 45 |
- LoRA fine-tuning
|
| 46 |
- model card completion
|
|
|
|
| 115 |
|
| 116 |
Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
|
| 117 |
|
| 118 |
+
Status: local wiring complete; hosted GPU validation pending.
|
| 119 |
+
|
| 120 |
Scope:
|
| 121 |
|
| 122 |
- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
|
|
|
|
| 136 |
- Run local sample image checks.
|
| 137 |
- Confirm schema validation.
|
| 138 |
- Confirm fallback trace markers.
|
| 139 |
+
- Run `scripts/check_space_vlm.py --configure-space` after external-state confirmation.
|
| 140 |
|
| 141 |
## Phase 4 — Text Runtime With llama.cpp
|
| 142 |
|
| 143 |
Goal: make persona, diary, and chat generation use a small local text model runtime.
|
| 144 |
|
| 145 |
+
Status: optional runtime wiring complete; real GGUF smoke test pending.
|
| 146 |
+
|
| 147 |
Scope:
|
| 148 |
|
| 149 |
+
- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
|
| 150 |
+
- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
|
| 151 |
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
|
| 152 |
- Implement persona generation.
|
| 153 |
- Implement diary generation.
|
|
|
|
| 157 |
Exit criteria:
|
| 158 |
|
| 159 |
- Text generation can run through llama.cpp or documented local fallback.
|
| 160 |
+
- README documents runtime path. Final model size remains pending until GGUF selection.
|
| 161 |
- Trace records include runtime metadata.
|
| 162 |
|
| 163 |
Verification:
|
| 164 |
|
| 165 |
+
- Local runtime smoke test with a real GGUF.
|
| 166 |
- JSON schema validation.
|
| 167 |
- Compare at least three object generations for persona consistency.
|
| 168 |
|
|
|
|
| 170 |
|
| 171 |
Goal: prepare Well-Tuned badge evidence.
|
| 172 |
|
| 173 |
+
Status: mock SFT preview complete; real candidate generation waits for verified model paths.
|
| 174 |
+
|
| 175 |
Scope:
|
| 176 |
|
| 177 |
- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
|
|
|
|
| 248 |
|
| 249 |
Goal: deploy the app in the required Gradio format.
|
| 250 |
|
| 251 |
+
Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
|
| 252 |
+
|
| 253 |
Scope:
|
| 254 |
|
| 255 |
+
- Create Hugging Face Space. Completed.
|
| 256 |
+
- Add Space README YAML header. Completed.
|
| 257 |
+
- Confirm `app_file: app.py`. Completed.
|
| 258 |
+
- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
|
| 259 |
+
- Check runtime resource constraints. Pending L4 validation.
|
| 260 |
|
| 261 |
Exit criteria:
|
| 262 |
|
|
|
|
| 266 |
|
| 267 |
Verification:
|
| 268 |
|
| 269 |
+
- Launch on HF Space. Completed for mock-safe runtime.
|
| 270 |
- Run demo flow in hosted environment.
|
| 271 |
+
- Run Space VLM validation for mug, keyboard, and shoe.
|
| 272 |
- Check logs for missing secrets or path errors.
|
| 273 |
|
| 274 |
## Phase 9 — Field Notes And Demo Video
|
docs/DEVELOPMENT_STATUS.md
ADDED
|
@@ -0,0 +1,60 @@
|
| 1 |
+
# Development Status
|
| 2 |
+
|
| 3 |
+
Last updated: 2026-06-06
|
| 4 |
+
|
| 5 |
+
## Completed
|
| 6 |
+
|
| 7 |
+
- Project skeleton, README, AGENTS instructions, and Gradio app entrypoint.
|
| 8 |
+
- Mock MVP flow: upload/description, personality mode, object JSON, persona JSON, diary, object chat, share card, and trace saving.
|
| 9 |
+
- Archive-style Gradio UI with English-first / Chinese-second copy and six stable examples.
|
| 10 |
+
- Trace and dataset tooling:
|
| 11 |
+
- six public mock sample traces
|
| 12 |
+
- public trace JSONL export
|
| 13 |
+
- deterministic SFT preview JSONL
|
| 14 |
+
- initial-stage acceptance script
|
| 15 |
+
- Hugging Face Space created: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 16 |
+
- MiniCPM-V 2.6 optional vision backend wiring with mock fallback.
|
| 17 |
+
- Optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`, with mock fallback.
|
| 18 |
+
- Space VLM validation tooling:
|
| 19 |
+
- `scripts/check_space_vlm.py`
|
| 20 |
+
- failed L4 validation report at `docs/SPACE_VLM_REPORT.md`
|
| 21 |
+
- Local tests and initial acceptance currently pass.
|
| 22 |
+
|
| 23 |
+
## Not Completed
|
| 24 |
+
|
| 25 |
+
- Hosted Space 1x L4 MiniCPM-V validation with real public mug/keyboard/shoe images. Attempted on 2026-06-06 and blocked by Hugging Face `402 Payment Required` for paid hardware; mock-safe rollback was applied.
|
| 26 |
+
- Stable example output caching for real VLM demos.
|
| 27 |
+
- Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
|
| 28 |
+
- Final text model parameter count documentation.
|
| 29 |
+
- Real model traces and curated object-persona dataset.
|
| 30 |
+
- LoRA training, adapter/model export, GGUF conversion, and Hugging Face model publishing.
|
| 31 |
+
- Hugging Face dataset publishing.
|
| 32 |
+
- GitHub sync / final public repository confirmation.
|
| 33 |
+
- Field Notes article, demo video, social post, and final submission package.
|
| 34 |
+
|
| 35 |
+
## Current Safe Defaults
|
| 36 |
+
|
| 37 |
+
- `OBJECTVERSE_VISION_BACKEND=mock`
|
| 38 |
+
- `OBJECTVERSE_TEXT_BACKEND=mock`
|
| 39 |
+
- No commercial model API is used.
|
| 40 |
+
- GGUF files, tokens, credentials, and private images should not be committed.
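One way to read these safe defaults from the environment is sketched below. It assumes the three variable names used in this commit's docs (`OBJECTVERSE_VISION_BACKEND`, `OBJECTVERSE_TEXT_BACKEND`, `TEXT_MODEL_PATH`); the `RuntimeConfig` class, the `load_runtime_config` helper, and the choice to coerce unknown values back to `mock` are illustrative assumptions and may differ from the real `src/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Backend names documented in this commit; anything else falls back to "mock".
VISION_BACKENDS = {"mock", "minicpm-v"}
TEXT_BACKENDS = {"mock", "llama-cpp"}


@dataclass(frozen=True)
class RuntimeConfig:
    vision_backend: str
    text_backend: str
    text_model_path: str


def _pick(env: Mapping[str, str], key: str, allowed: set) -> str:
    """Normalize an env value and keep it only if it is a known backend."""
    value = env.get(key, "mock").strip().lower()
    return value if value in allowed else "mock"


def load_runtime_config(env: Mapping[str, str] = os.environ) -> RuntimeConfig:
    """Build a config where every unset or unknown value is mock-safe."""
    return RuntimeConfig(
        vision_backend=_pick(env, "OBJECTVERSE_VISION_BACKEND", VISION_BACKENDS),
        text_backend=_pick(env, "OBJECTVERSE_TEXT_BACKEND", TEXT_BACKENDS),
        text_model_path=env.get("TEXT_MODEL_PATH", "").strip(),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, and an empty environment yields the mock-only configuration listed above.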
|
| 41 |
+
|
| 42 |
+
## Next Recommended Gate
|
| 43 |
+
|
| 44 |
+
Unblock Hugging Face paid hardware access or choose another available GPU option, then rerun the hosted Space VLM validation:
|
| 45 |
+
|
| 46 |
+
```bash
|
| 47 |
+
.venv/bin/python -B scripts/check_space_vlm.py \
|
| 48 |
+
--configure-space \
|
| 49 |
+
--space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
|
| 50 |
+
--output docs/SPACE_VLM_REPORT.md
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
If Space validation fails or GPU is unavailable, roll back to mock-safe settings:
|
| 54 |
+
|
| 55 |
+
```bash
|
| 56 |
+
.venv/bin/python -B scripts/check_space_vlm.py \
|
| 57 |
+
--space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
|
| 58 |
+
--skip-validation \
|
| 59 |
+
--rollback-to-mock
|
| 60 |
+
```
|
docs/EXTERNAL_SETUP.md
CHANGED
|
@@ -8,16 +8,18 @@ These actions change external account state and should only be run after explici
|
|
| 8 |
|
| 9 |
## GitHub Repository
|
| 10 |
|
| 11 |
-
|
| 12 |
|
| 13 |
```text
|
| 14 |
-
|
| 15 |
```
|
| 16 |
|
| 17 |
-
|
|
|
|
|
|
|
| 18 |
|
| 19 |
```text
|
| 20 |
-
|
| 21 |
```
|
| 22 |
|
| 23 |
Suggested description:
|
|
@@ -26,7 +28,7 @@ Suggested description:
|
|
| 26 |
Small-model AI toy that turns everyday objects into secret diary characters.
|
| 27 |
```
|
| 28 |
|
| 29 |
-
|
| 30 |
|
| 31 |
```bash
|
| 32 |
gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
|
|
@@ -36,13 +38,13 @@ Do not push until the user confirms the remote target and branch.
|
|
| 36 |
|
| 37 |
## Hugging Face Space
|
| 38 |
|
| 39 |
-
|
| 40 |
|
| 41 |
```text
|
| 42 |
-
|
| 43 |
```
|
| 44 |
|
| 45 |
-
|
| 46 |
|
| 47 |
```text
|
| 48 |
gradio
|
|
@@ -57,17 +59,46 @@ emoji: 🗝️
|
|
| 57 |
colorFrom: amber
|
| 58 |
colorTo: gray
|
| 59 |
sdk: gradio
|
|
|
|
| 60 |
app_file: app.py
|
| 61 |
pinned: false
|
| 62 |
---
|
| 63 |
```
|
| 64 |
|
| 65 |
-
Recommended
|
| 66 |
|
| 67 |
-
-
|
| 68 |
-
-
|
| 69 |
-
-
|
| 70 |
-
-
|
| 71 |
|
| 72 |
## Safety Notes
|
| 73 |
|
|
|
|
| 8 |
|
| 9 |
## GitHub Repository
|
| 10 |
|
| 11 |
+
Local `origin` is already configured:
|
| 12 |
|
| 13 |
```text
|
| 14 |
+
https://github.com/qqyule/Objectverse-Diary.git
|
| 15 |
```
|
| 16 |
|
| 17 |
+
Use this section to confirm the remote target and branch before pushing. Do not create a second repository unless the target changes.
|
| 18 |
+
|
| 19 |
+
Originally suggested repository name:
|
| 20 |
|
| 21 |
```text
|
| 22 |
+
objectverse-diary
|
| 23 |
```
|
| 24 |
|
| 25 |
Suggested description:
|
|
|
|
| 28 |
Small-model AI toy that turns everyday objects into secret diary characters.
|
| 29 |
```
|
| 30 |
|
| 31 |
+
If a new repository is ever needed after confirmation:
|
| 32 |
|
| 33 |
```bash
|
| 34 |
gh repo create objectverse-diary --public --description "Small-model AI toy that turns everyday objects into secret diary characters." --source . --remote origin
|
|
|
|
| 38 |
|
| 39 |
## Hugging Face Space
|
| 40 |
|
| 41 |
+
Created Space:
|
| 42 |
|
| 43 |
```text
|
| 44 |
+
https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 45 |
```
|
| 46 |
|
| 47 |
+
SDK:
|
| 48 |
|
| 49 |
```text
|
| 50 |
gradio
|
|
|
|
| 59 |
colorFrom: amber
|
| 60 |
colorTo: gray
|
| 61 |
sdk: gradio
|
| 62 |
+
python_version: '3.10'
|
| 63 |
app_file: app.py
|
| 64 |
pinned: false
|
| 65 |
---
|
| 66 |
```
|
| 67 |
|
| 68 |
+
Recommended runtime setup:
|
| 69 |
+
|
| 70 |
+
- set `OBJECTVERSE_VISION_BACKEND=minicpm-v`
|
| 71 |
+
- set `VISION_MODEL_ID=openbmb/MiniCPM-V-2_6`
|
| 72 |
+
- set `OBJECTVERSE_TEXT_BACKEND=mock`
|
| 73 |
+
- use 1x Nvidia L4 for MiniCPM-V 2.6
|
| 74 |
+
- switch vision backend back to `mock` if GPU is unavailable
|
| 75 |
+
|
| 76 |
+
Automated validation command after confirmation:
|
| 77 |
+
|
| 78 |
+
```bash
|
| 79 |
+
.venv/bin/python -B scripts/check_space_vlm.py \
|
| 80 |
+
--configure-space \
|
| 81 |
+
--space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
|
| 82 |
+
--output docs/SPACE_VLM_REPORT.md
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
Optional rollback to mock-safe settings:
|
| 86 |
+
|
| 87 |
+
```bash
|
| 88 |
+
.venv/bin/python -B scripts/check_space_vlm.py \
|
| 89 |
+
--space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
|
| 90 |
+
--skip-validation \
|
| 91 |
+
--rollback-to-mock
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
The validation script must not print Hugging Face tokens. It uses three temporary public Wikimedia Commons images and does not commit downloaded assets.
|
| 95 |
+
|
| 96 |
+
2026-06-06 validation attempt:
|
| 97 |
|
| 98 |
+
- `--configure-space` was run for `l4x1`.
|
| 99 |
+
- Hugging Face returned `402 Payment Required` for paid hardware on the `build-small-hackathon` organization.
|
| 100 |
+
- Mock-safe rollback was run afterward.
|
| 101 |
+
- Next unblock step: enable billing/pre-paid credits or choose an available free GPU option before rerunning validation.
|
| 102 |
|
| 103 |
## Safety Notes
|
| 104 |
|
docs/FAILURES.md
CHANGED
|
@@ -8,7 +8,7 @@ Use it for model/runtime/deployment/data issues, not for UI polish notes.
|
|
| 8 |
|
| 9 |
## Current Status
|
| 10 |
|
| 11 |
-
|
| 12 |
|
| 13 |
Known non-blocking warning:
|
| 14 |
|
|
@@ -43,6 +43,7 @@ Fallback:
|
|
| 43 |
- use manual object description
|
| 44 |
- use stable example flow
|
| 45 |
- record fallback marker in trace
|
|
|
|
| 46 |
|
| 47 |
### Text Runtime
|
| 48 |
|
|
|
|
| 8 |
|
| 9 |
## Current Status
|
| 10 |
|
| 11 |
+
MiniCPM-V 2.6 is wired as an optional vision backend. No hosted Space GPU failures have been observed yet because Space GPU validation is still pending.
|
| 12 |
|
| 13 |
Known non-blocking warning:
|
| 14 |
|
|
|
|
| 43 |
- use manual object description
|
| 44 |
- use stable example flow
|
| 45 |
- record fallback marker in trace
|
| 46 |
+
- `vision-fallback-to-mock` means MiniCPM-V failed or returned invalid JSON and mock object understanding was used.
|
| 47 |
|
| 48 |
### Text Runtime
|
| 49 |
|
docs/INITIAL_STAGE_REPORT.md
CHANGED
|
@@ -19,15 +19,29 @@ Included:
|
|
| 19 |
- runtime configuration boundary
|
| 20 |
- local acceptance checks
|
| 21 |
|
| 22 |
-
Not included:
|
| 23 |
|
| 24 |
- creating the remote GitHub repository
|
| 25 |
-
-
|
| 26 |
-
- real
|
| 27 |
-
- real llama.cpp / llama-cpp-python text runtime
|
| 28 |
- fine-tuning, dataset publishing, Field Notes, and demo video
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
## Local Deliverables
|
| 33 |
|
|
@@ -35,8 +49,8 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
|
|
| 35 |
| --- | --- |
|
| 36 |
| Gradio app entrypoint | `app.py` |
|
| 37 |
| Shared generation pipeline | `src/pipeline.py` |
|
| 38 |
-
|
|
| 39 |
-
|
|
| 40 |
| Pydantic schemas | `src/models/schema.py` |
|
| 41 |
| Share card renderer | `src/renderer/share_card.py` |
|
| 42 |
| Trace logger | `src/traces/logger.py` |
|
|
@@ -45,6 +59,7 @@ Remote GitHub and Hugging Face actions require explicit confirmation because the
|
|
| 45 |
| Public mock traces | `data/traces/samples/` |
|
| 46 |
| SFT preview generator | `scripts/generate_dataset.py` |
|
| 47 |
| Public trace JSONL exporter | `scripts/export_traces.py` |
|
|
|
|
| 48 |
| Dataset plan | `docs/DATASET.md` |
|
| 49 |
| Failure notes | `docs/FAILURES.md` |
|
| 50 |
| Runtime boundary docs | `docs/RUNTIME.md` |
|
|
@@ -85,16 +100,19 @@ OK
|
|
| 85 |
|
| 86 |
## Current Limitations
|
| 87 |
|
| 88 |
-
- The app still uses mock model outputs.
|
| 89 |
-
-
|
|
|
|
|
|
|
| 90 |
- Sample traces are mock traces, not real model traces.
|
| 91 |
-
-
|
| 92 |
|
| 93 |
## Next Gate
|
| 94 |
|
| 95 |
-
|
| 96 |
|
| 97 |
-
-
|
| 98 |
-
-
|
|
|
|
| 99 |
|
| 100 |
See `docs/EXTERNAL_SETUP.md`.
|
|
|
|
| 19 |
- runtime configuration boundary
|
| 20 |
- local acceptance checks
|
| 21 |
|
| 22 |
+
Not included in the original initial-stage gate:
|
| 23 |
|
| 24 |
- creating the remote GitHub repository
|
| 25 |
+
- hosted GPU validation for the MiniCPM-V integration
|
| 26 |
+
- real GGUF smoke test for llama.cpp / llama-cpp-python text runtime
|
|
|
|
| 27 |
- fine-tuning, dataset publishing, Field Notes, and demo video
|
| 28 |
|
| 29 |
+
The Hugging Face Space has been created at:
|
| 30 |
+
|
| 31 |
+
https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 32 |
+
|
| 33 |
+
Remote GitHub actions still require explicit confirmation because they change external state.
|
| 34 |
+
|
| 35 |
+
## Post-Initial Updates
|
| 36 |
+
|
| 37 |
+
As of 2026-06-06:
|
| 38 |
+
|
| 39 |
+
- MiniCPM-V 2.6 is wired as an optional vision backend with mock fallback.
|
| 40 |
+
- Optional llama.cpp / llama-cpp-python text runtime wiring is available through `TEXT_MODEL_PATH`, with mock fallback.
|
| 41 |
+
- `scripts/check_space_vlm.py` can validate the hosted Space with three temporary public images for mug, keyboard, and shoe.
|
| 42 |
+
- `docs/SPACE_VLM_REPORT.md` exists as the pending remote validation report.
|
| 43 |
+
- Hosted Space L4 validation has not been run yet.
|
| 44 |
+
- No final GGUF text model has been selected, downloaded, or committed.
|
| 45 |
|
| 46 |
## Local Deliverables
|
| 47 |
|
|
|
|
| 49 |
| --- | --- |
|
| 50 |
| Gradio app entrypoint | `app.py` |
|
| 51 |
| Shared generation pipeline | `src/pipeline.py` |
|
| 52 |
+
| Vision runner with mock / MiniCPM-V backend | `src/models/vision_runner.py` |
|
| 53 |
+
| Text runner with mock / optional llama.cpp backend | `src/models/llama_cpp_runner.py` |
|
| 54 |
| Pydantic schemas | `src/models/schema.py` |
|
| 55 |
| Share card renderer | `src/renderer/share_card.py` |
|
| 56 |
| Trace logger | `src/traces/logger.py` |
|
|
|
|
| 59 |
| Public mock traces | `data/traces/samples/` |
|
| 60 |
| SFT preview generator | `scripts/generate_dataset.py` |
|
| 61 |
| Public trace JSONL exporter | `scripts/export_traces.py` |
|
| 62 |
+
| Hosted Space VLM validator | `scripts/check_space_vlm.py` |
|
| 63 |
| Dataset plan | `docs/DATASET.md` |
|
| 64 |
| Failure notes | `docs/FAILURES.md` |
|
| 65 |
| Runtime boundary docs | `docs/RUNTIME.md` |
|
|
|
|
| 100 |
|
| 101 |
## Current Limitations
|
| 102 |
|
| 103 |
+
- The default app still uses mock model outputs.
|
| 104 |
+
- MiniCPM-V 2.6 vision wiring is available behind `OBJECTVERSE_VISION_BACKEND=minicpm-v`, but hosted GPU validation is still pending.
|
| 105 |
+
- llama.cpp text wiring is available behind `OBJECTVERSE_TEXT_BACKEND=llama-cpp`, but no real GGUF smoke test has been run.
|
| 106 |
+
- Phase 2 UI polish is complete.
|
| 107 |
- Sample traces are mock traces, not real model traces.
|
| 108 |
+
- GitHub origin is configured locally, but sync/submission confirmation is still pending.
|
| 109 |
|
| 110 |
## Next Gate
|
| 111 |
|
| 112 |
+
Next model gate:
|
| 113 |
|
| 114 |
+
- verify MiniCPM-V 2.6 on the Hugging Face Space GPU
|
| 115 |
+
- run a real GGUF `TEXT_MODEL_PATH` smoke test
|
| 116 |
+
- confirm GitHub sync / submission target
|
| 117 |
|
| 118 |
See `docs/EXTERNAL_SETUP.md`.
|
docs/MODEL_CARD.md
CHANGED
|
@@ -2,9 +2,9 @@
|
|
| 2 |
|
| 3 |
## Status
|
| 4 |
|
| 5 |
-
Draft only. No model has been fine-tuned, converted, or published yet.
|
| 6 |
|
| 7 |
-
The app
|
| 8 |
|
| 9 |
## Planned Components
|
| 10 |
|
|
@@ -16,9 +16,9 @@ The app currently runs deterministic mock backends. This card is a working templ
|
|
| 16 |
|
| 17 |
| Component | Candidate | Notes |
|
| 18 |
| --- | --- | --- |
|
| 19 |
-
| Vision | MiniCPM-V or
|
| 20 |
-
| Text | small instruct model plus LoRA adapter | Final base model still pending. |
|
| 21 |
-
| Runtime | GGUF through llama.cpp / llama-cpp-python |
|
| 22 |
| UI | Gradio Blocks | Required by the hackathon and project rules. |
|
| 23 |
|
| 24 |
## Parameter Budget
|
|
@@ -29,8 +29,8 @@ Record final numbers here before submission:
|
|
| 29 |
|
| 30 |
| Component | Model | Parameters | Counted Toward Total |
|
| 31 |
| --- | --- | ---: | --- |
|
| 32 |
-
| Vision |
|
| 33 |
-
| Text base | TBD | TBD | yes |
|
| 34 |
| LoRA adapter | TBD | TBD | yes |
|
| 35 |
| Total | TBD | TBD | must be <= 32B |
|
| 36 |
|
|
@@ -67,7 +67,7 @@ Current preview data is deterministic and mock-generated. It should only be used
|
|
| 67 |
## Fallback Behavior
|
| 68 |
|
| 69 |
- If VLM loading fails, use manual description and stable example flow.
|
| 70 |
-
- If llama.cpp loading fails, keep deterministic mock text fallback for demo safety.
|
| 71 |
- If model JSON is invalid, repair and validate before rendering.
|
| 72 |
|
| 73 |
## Required Notes
|
|
|
|
| 2 |
|
| 3 |
## Status
|
| 4 |
|
| 5 |
+
Draft only. No text model has been fine-tuned, converted, or published yet.
|
| 6 |
|
| 7 |
+
The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`.
|
| 8 |
|
| 9 |
## Planned Components
|
| 10 |
|
|
|
|
| 16 |
|
| 17 |
| Component | Candidate | Notes |
|
| 18 |
| --- | --- | --- |
|
| 19 |
+
| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Must run without commercial API calls. |
|
| 20 |
+
| Text | externally configured GGUF, later small instruct model plus LoRA adapter | Final base model still pending. |
|
| 21 |
+
| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
|
| 22 |
| UI | Gradio Blocks | Required by the hackathon and project rules. |
|
| 23 |
|
| 24 |
## Parameter Budget
|
|
|
|
| 29 |
|
| 30 |
| Component | Model | Parameters | Counted Toward Total |
|
| 31 |
| --- | --- | ---: | --- |
|
| 32 |
+
| Vision | MiniCPM-V 2.6 | ~8B | yes |
|
| 33 |
+
| Text base | Externally configured GGUF, final model TBD | TBD | yes |
|
| 34 |
| LoRA adapter | TBD | TBD | yes |
|
| 35 |
| Total | TBD | TBD | must be <= 32B |
|
| 36 |
|
|
|
|
| 67 |
## Fallback Behavior
|
| 68 |
|
| 69 |
- If VLM loading fails, use manual description and stable example flow.
|
| 70 |
+
- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
|
| 71 |
- If model JSON is invalid, repair and validate before rendering.
|
| 72 |
|
| 73 |
## Required Notes
|
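Note on the Parameter Budget table: with the vision row now filled in at roughly 8B and the cap stated as "must be <= 32B", the headroom left for the text base and LoRA adapter can be worked out directly. The sketch below is only illustrative arithmetic; the function name `remaining_budget_b` and the dictionary layout are assumptions, not part of the repository, and `8.0` stands in for the approximate "~8B" figure.

```python
from __future__ import annotations

# Cap taken from the "must be <= 32B" row of the Parameter Budget table.
PARAMETER_CAP_B = 32.0


def remaining_budget_b(
    components_b: dict[str, float | None],
    cap_b: float = PARAMETER_CAP_B,
) -> float:
    """Return the billions of parameters still available under the cap.

    Components whose size is still TBD are passed as None and skipped.
    """
    used = sum(size for size in components_b.values() if size is not None)
    return cap_b - used


# Hypothetical budget mirroring the table; 8.0 approximates "~8B".
budget = {
    "vision (MiniCPM-V 2.6)": 8.0,
    "text base": None,      # TBD in the model card
    "lora adapter": None,   # TBD in the model card
}

print(remaining_budget_b(budget))
```

Under these assumptions the text base and adapter together would need to stay within about 24B parameters.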
docs/README.md
CHANGED
|
@@ -17,10 +17,12 @@ This folder contains the planning source of truth for Objectverse Diary.
|
|
| 17 |
- `FIELD_NOTES.md`: future technical blog draft.
|
| 18 |
- `MODEL_CARD.md`: future model documentation.
|
| 19 |
- `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
|
|
|
|
| 20 |
- `RUNTIME.md`: current mock runtime configuration and future model boundary.
|
| 21 |
- `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
|
| 22 |
- `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
|
| 23 |
- `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
|
| 24 |
- `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
|
| 25 |
- `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
|
|
|
|
| 26 |
- `SUBMISSION_GUIDE.md`: final submission checklist.
|
|
|
|
| 17 |
- `FIELD_NOTES.md`: future technical blog draft.
|
| 18 |
- `MODEL_CARD.md`: future model documentation.
|
| 19 |
- `07-development-plan.md`: detailed development process plan from mock MVP to final submission.
|
| 20 |
+
- `DEVELOPMENT_STATUS.md`: current completed / not completed development status.
|
| 21 |
- `RUNTIME.md`: current mock runtime configuration and future model boundary.
|
| 22 |
- `DATASET.md`: SFT preview schema, generation workflow, curation checklist, and publishing notes.
|
| 23 |
- `FAILURES.md`: failure record template and anticipated non-UI fallback cases.
|
| 24 |
- `INITIAL_STAGE_REPORT.md`: local initial-stage completion evidence and acceptance commands.
|
| 25 |
- `PHASE2_UI_REPORT.md`: archive UI completion scope, runtime boundary, and verification targets.
|
| 26 |
- `EXTERNAL_SETUP.md`: GitHub and Hugging Face Space setup notes requiring confirmation.
|
| 27 |
+
- `SPACE_VLM_REPORT.md`: pending hosted Space MiniCPM-V validation report.
|
| 28 |
- `SUBMISSION_GUIDE.md`: final submission checklist.
|
docs/RUNTIME.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
| 2 |
|
| 3 |
## Current Runtime
|
| 4 |
|
| 5 |
-
The
|
| 6 |
|
| 7 |
- `OBJECTVERSE_VISION_BACKEND=mock`
|
| 8 |
- `OBJECTVERSE_TEXT_BACKEND=mock`
|
|
@@ -15,6 +15,28 @@ This means:
|
|
| 15 |
|
| 16 |
No commercial cloud AI APIs are used.
|
| 17 |
|
| 18 |
## Environment Variables
|
| 19 |
|
| 20 |
```bash
|
|
@@ -25,6 +47,25 @@ TEXT_MODEL_PATH=
|
|
| 25 |
TRACE_OUTPUT_DIR=data/traces
|
| 26 |
```
|
| 27 |
|
| 28 |
## Future Runtime Boundary
|
| 29 |
|
| 30 |
The next implementation phase should keep the same pipeline boundary:
|
|
@@ -39,6 +80,14 @@ Do not move model calls into `src/ui/layout.py`.
|
|
| 39 |
## Fallback Rules
|
| 40 |
|
| 41 |
- VLM unavailable: use manual description and mock/example gallery path.
|
| 42 |
-
- llama.cpp unavailable: use mock text generation path.
|
| 43 |
-
- invalid model JSON: repair and validate before rendering.
|
| 44 |
- private input: anonymize trace text before saving public traces.
|
|
|
|
| 2 |
|
| 3 |
## Current Runtime
|
| 4 |
|
| 5 |
+
The default MVP runtime uses deterministic mock paths:
|
| 6 |
|
| 7 |
- `OBJECTVERSE_VISION_BACKEND=mock`
|
| 8 |
- `OBJECTVERSE_TEXT_BACKEND=mock`
|
|
|
|
| 15 |
|
| 16 |
No commercial cloud AI APIs are used.
|
| 17 |
|
| 18 |
+
MiniCPM-V 2.6 vision can be enabled without changing the UI:
|
| 19 |
+
|
| 20 |
+
```bash
|
| 21 |
+
OBJECTVERSE_VISION_BACKEND=minicpm-v \
|
| 22 |
+
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6 \
|
| 23 |
+
OBJECTVERSE_TEXT_BACKEND=mock \
|
| 24 |
+
.venv/bin/python app.py
|
| 25 |
+
```
|
| 26 |
+
|
| 27 |
+
This only replaces object understanding. Persona generation, diary generation, and chat can remain mock or use the optional llama.cpp text path below.
|
| 28 |
+
|
| 29 |
+
Optional llama.cpp text generation can be enabled without changing the UI:
|
| 30 |
+
|
| 31 |
+
```bash
|
| 32 |
+
pip install llama-cpp-python
|
| 33 |
+
OBJECTVERSE_TEXT_BACKEND=llama-cpp \
|
| 34 |
+
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
|
| 35 |
+
.venv/bin/python app.py
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
`llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
|
| 39 |
+
|
| 40 |
## Environment Variables
|
| 41 |
|
| 42 |
```bash
|
|
|
|
| 47 |
TRACE_OUTPUT_DIR=data/traces
|
| 48 |
```
|
| 49 |
|
| 50 |
+
For the hosted Space, set these Variables:
|
| 51 |
+
|
| 52 |
+
```bash
|
| 53 |
+
OBJECTVERSE_VISION_BACKEND=minicpm-v
|
| 54 |
+
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
|
| 55 |
+
OBJECTVERSE_TEXT_BACKEND=mock
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
Recommended Space hardware for this path is 1x NVIDIA L4. If no GPU is available, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.
|
| 59 |
+
|
| 60 |
+
For a Space or local runtime with a separately provided GGUF text model, set:
|
| 61 |
+
|
| 62 |
+
```bash
|
| 63 |
+
OBJECTVERSE_TEXT_BACKEND=llama-cpp
|
| 64 |
+
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
Do not commit GGUF files or private model paths.
|
| 68 |
+
|
| 69 |
## Future Runtime Boundary
|
| 70 |
|
| 71 |
The next implementation phase should keep the same pipeline boundary:
|
|
|
|
| 80 |
## Fallback Rules
|
| 81 |
|
| 82 |
- VLM unavailable: use manual description and mock/example gallery path.
|
| 83 |
+
- llama.cpp unavailable: use mock text generation path and record `text-fallback-to-mock`.
|
| 84 |
+
- invalid model JSON: repair and validate before rendering, then fall back to mock if validation fails.
|
| 85 |
- private input: anonymize trace text before saving public traces.
|
| 86 |
+
|
| 87 |
+
Trace fallback markers:
|
| 88 |
+
|
| 89 |
+
- `mock-runtime`: default mock vision and mock text runtime.
|
| 90 |
+
- `mock-text-runtime`: real or configured vision path with mock text generation.
|
| 91 |
+
- `mock-vision-runtime`: mock vision with a configured non-mock text backend.
|
| 92 |
+
- `vision-fallback-to-mock`: MiniCPM-V failed or returned invalid JSON, so mock object understanding was used.
|
| 93 |
+
- `text-fallback-to-mock`: llama.cpp was configured but unavailable, invalid, or unable to return schema-valid JSON.
|
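Note on the trace fallback markers added to `docs/RUNTIME.md`: the five markers can be read as a small decision rule over the two backend settings plus whether each real backend failed at run time. The sketch below is one possible reading of that list, not the pipeline's actual code; the function name `fallback_markers` and the `vision_failed` / `text_failed` flags are hypothetical.

```python
from __future__ import annotations


def fallback_markers(
    vision_backend: str,
    text_backend: str,
    *,
    vision_failed: bool = False,
    text_failed: bool = False,
) -> list[str]:
    """Pick trace markers from the configured backends and runtime failures.

    Assumption: a backend counts as mock only when its setting is exactly "mock".
    """
    vision_is_mock = vision_backend == "mock"
    text_is_mock = text_backend == "mock"
    markers: list[str] = []

    # Configuration markers: which side, if any, is mock by design.
    if vision_is_mock and text_is_mock:
        markers.append("mock-runtime")
    elif text_is_mock:
        markers.append("mock-text-runtime")
    elif vision_is_mock:
        markers.append("mock-vision-runtime")

    # Failure markers: a real backend was configured but mock output was used.
    if not vision_is_mock and vision_failed:
        markers.append("vision-fallback-to-mock")
    if not text_is_mock and text_failed:
        markers.append("text-fallback-to-mock")
    return markers


print(fallback_markers("minicpm-v", "mock", vision_failed=True))
```

Keeping the configuration marker separate from the failure marker means a trace can show both what was requested and what actually ran.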
docs/SPACE_VLM_REPORT.md
ADDED
|
@@ -0,0 +1,42 @@
|
|
| 1 |
+
# Space VLM Validation Report
|
| 2 |
+
|
| 3 |
+
- Generated at: 2026-06-06 04:25 UTC
|
| 4 |
+
- Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 5 |
+
- Space repo: `build-small-hackathon/ObjectverseDiary`
|
| 6 |
+
- Overall status: FAIL
|
| 7 |
+
- Vision backend expected: `minicpm-v`
|
| 8 |
+
- Text backend expected: `mock`
|
| 9 |
+
|
| 10 |
+
## Space Configuration
|
| 11 |
+
|
| 12 |
+
- Requested configuration:
|
| 13 |
+
- `hardware`: `l4x1`
|
| 14 |
+
- `OBJECTVERSE_VISION_BACKEND`: `minicpm-v`
|
| 15 |
+
- `VISION_MODEL_ID`: `openbmb/MiniCPM-V-2_6`
|
| 16 |
+
- `OBJECTVERSE_TEXT_BACKEND`: `mock`
|
| 17 |
+
|
| 18 |
+
- Rollback configuration applied:
|
| 19 |
+
- `hardware`: `cpu-basic`
|
| 20 |
+
- `OBJECTVERSE_VISION_BACKEND`: `mock`
|
| 21 |
+
- `OBJECTVERSE_TEXT_BACKEND`: `mock`
|
| 22 |
+
|
| 23 |
+
## Configuration Error
|
| 24 |
+
|
| 25 |
+
- Error: `HfHubHTTPError: 402 Payment Required`
|
| 26 |
+
- Meaning: Hugging Face requires pre-paid credits or billing access for the `build-small-hackathon` organization before the Space can use paid `l4x1` hardware.
|
| 27 |
+
- Impact: Remote MiniCPM-V validation did not run. No mug / keyboard / shoe image inference results were produced.
|
| 28 |
+
- Safety outcome: Mock-safe rollback was run after the failed hardware request.
|
| 29 |
+
- Post-rollback runtime check: Space is `RUNNING` with `hardware=cpu-basic` and `requested_hardware=cpu-basic`.
|
| 30 |
+
|
| 31 |
+
## Results
|
| 32 |
+
|
| 33 |
+
- Coffee mug: NOT RUN
|
| 34 |
+
- Computer keyboard: NOT RUN
|
| 35 |
+
- Running shoe: NOT RUN
|
| 36 |
+
|
| 37 |
+
## Notes
|
| 38 |
+
|
| 39 |
+
- Test images are temporary public Wikimedia Commons assets and are not committed.
|
| 40 |
+
- Text generation remains mock during this validation plan.
|
| 41 |
+
- No tokens, secrets, or private file paths are recorded in this report.
|
| 42 |
+
- Next step to unblock: enable billing or pre-paid credits for the Hugging Face organization, or choose an available free GPU option, then rerun `scripts/check_space_vlm.py`.
|
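Note on the report above: it records a failed hardware request followed by a mock-safe rollback. That sequence can be sketched as a small wrapper that attempts the configuration and, on any error, applies the rollback and keeps the error text for the report. This is a hedged illustration only; `configure_or_rollback` and the two stand-in callables are hypothetical, and the real script may order or handle these steps differently.

```python
from __future__ import annotations

from typing import Callable


def configure_or_rollback(
    configure: Callable[[], dict[str, str]],
    rollback: Callable[[], dict[str, str]],
) -> dict[str, object]:
    """Try to configure; on failure, roll back and capture the error text."""
    try:
        return {"configured": configure(), "rollback": None, "configuration_error": ""}
    except Exception as exc:  # broad on purpose: any failure should trigger rollback
        error = f"{type(exc).__name__}: {exc}"
        return {"configured": None, "rollback": rollback(), "configuration_error": error}


def failing_configure() -> dict[str, str]:
    # Stand-in for a paid hardware request that is rejected.
    raise RuntimeError("402 Payment Required")


def mock_safe_rollback() -> dict[str, str]:
    # Stand-in for the rollback values listed in the report.
    return {
        "hardware": "cpu-basic",
        "OBJECTVERSE_VISION_BACKEND": "mock",
        "OBJECTVERSE_TEXT_BACKEND": "mock",
    }


outcome = configure_or_rollback(failing_configure, mock_safe_rollback)
print(outcome["configuration_error"])
```

Passing the two steps in as callables keeps the sketch free of network calls, so the rollback path can be exercised without touching a real Space.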
docs/SUBMISSION_GUIDE.md
CHANGED
|
@@ -2,8 +2,8 @@
|
|
| 2 |
|
| 3 |
## Required Package
|
| 4 |
|
| 5 |
-
- [
|
| 6 |
-
- [ ] GitHub Repository URL:
|
| 7 |
- [ ] Demo Video URL: pending recording
|
| 8 |
- [ ] Social Media Post URL: pending final copy
|
| 9 |
- [ ] Fine-tuned Model URL: pending model training
|
|
@@ -18,11 +18,28 @@
|
|
| 18 |
- Runtime boundary: `docs/RUNTIME.md`
|
| 19 |
- Dataset plan and preview workflow: `docs/DATASET.md`
|
| 20 |
- External setup checklist: `docs/EXTERNAL_SETUP.md`
|
|
|
|
| 21 |
- Public mock traces: `data/traces/samples/`
|
| 22 |
|
| 23 |
## Final Checks
|
| 24 |
|
| 25 |
- [ ] Space is under the official organization.
|
|
|
|
| 26 |
- [ ] Demo video is under 2 minutes.
|
| 27 |
- [ ] README includes model parameter counts.
|
| 28 |
- [ ] No commercial cloud AI APIs are used.
|
|
|
|
| 2 |
|
| 3 |
## Required Package
|
| 4 |
|
| 5 |
+
- [x] Hugging Face Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
|
| 6 |
+
- [ ] GitHub Repository URL: local `origin` configured, sync/submission confirmation pending
|
| 7 |
- [ ] Demo Video URL: pending recording
|
| 8 |
- [ ] Social Media Post URL: pending final copy
|
| 9 |
- [ ] Fine-tuned Model URL: pending model training
|
|
|
|
| 18 |
- Runtime boundary: `docs/RUNTIME.md`
|
| 19 |
- Dataset plan and preview workflow: `docs/DATASET.md`
|
| 20 |
- External setup checklist: `docs/EXTERNAL_SETUP.md`
|
| 21 |
+
- Space VLM validation report: `docs/SPACE_VLM_REPORT.md`. Current status: FAIL, because the `l4x1` hardware request returned `402 Payment Required`.
|
| 22 |
- Public mock traces: `data/traces/samples/`
|
| 23 |
+
- Optional llama.cpp runtime wiring: `src/models/llama_cpp_runner.py`
|
| 24 |
+
|
| 25 |
+
## Completed Locally
|
| 26 |
+
|
| 27 |
+
- Mock MVP flow, archive-style UI, share card, trace logging, sample traces, dataset preview, and initial acceptance tooling.
|
| 28 |
+
- MiniCPM-V 2.6 backend wiring with fallback markers.
|
| 29 |
+
- Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
|
| 30 |
+
- Hosted Space VLM validation script and pending report template.
|
| 31 |
+
|
| 32 |
+
## Not Completed Yet
|
| 33 |
+
|
| 34 |
+
- Hosted Space L4 MiniCPM-V validation for mug, keyboard, and shoe; attempted and blocked by Hugging Face paid hardware billing.
|
| 35 |
+
- Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
|
| 36 |
+
- Real model traces, curated dataset, LoRA training, model/dataset publishing.
|
| 37 |
+
- Field Notes article, demo video, social post, final submission package.
|
| 38 |
|
| 39 |
## Final Checks
|
| 40 |
|
| 41 |
- [ ] Space is under the official organization.
|
| 42 |
+
- [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe. Current status: blocked by paid hardware billing.
|
| 43 |
- [ ] Demo video is under 2 minutes.
|
| 44 |
- [ ] README includes model parameter counts.
|
| 45 |
- [ ] No commercial cloud AI APIs are used.
|
pyproject.toml
CHANGED
|
@@ -6,8 +6,14 @@ requires-python = ">=3.10"
|
|
| 6 |
dependencies = [
|
| 7 |
"gradio>=4.44,<6",
|
| 8 |
"pydantic>=2.7,<3",
|
| 9 |
]
|
| 10 |
|
| 11 |
[tool.objectverse-diary]
|
| 12 |
-
status = "
|
| 13 |
-
implementation = "mock-
|
|
|
|
| 6 |
dependencies = [
|
| 7 |
"gradio>=4.44,<6",
|
| 8 |
"pydantic>=2.7,<3",
|
| 9 |
+
"torch",
|
| 10 |
+
"torchvision",
|
| 11 |
+
"transformers>=4.40,<5",
|
| 12 |
+
"Pillow",
|
| 13 |
+
"sentencepiece",
|
| 14 |
+
"accelerate",
|
| 15 |
]
|
| 16 |
|
| 17 |
[tool.objectverse-diary]
|
| 18 |
+
status = "vlm-ready-mock-text"
|
| 19 |
+
implementation = "minicpm-v-or-mock-vision-with-mock-text"
|
requirements.txt
CHANGED
|
@@ -1,2 +1,8 @@
|
|
| 1 |
gradio>=4.44,<6
|
| 2 |
pydantic>=2.7,<3
|
|
|
|
| 1 |
gradio>=4.44,<6
|
| 2 |
pydantic>=2.7,<3
|
| 3 |
+
torch
|
| 4 |
+
torchvision
|
| 5 |
+
transformers>=4.40,<5
|
| 6 |
+
Pillow
|
| 7 |
+
sentencepiece
|
| 8 |
+
accelerate
|
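Note on the dependency list: `llama-cpp-python` is not among the requirements, which matches the statement in `docs/RUNTIME.md` that it is intentionally optional. A guard like the one below is one way to decide, before loading anything, whether the llama.cpp text path can be used at all. It is a sketch under assumptions: the function name `resolve_text_backend` is hypothetical, and the real runner may perform these checks differently or in a different order. The import name `llama_cpp` is the module that the `llama-cpp-python` package installs.

```python
from __future__ import annotations

import importlib.util
import os
from collections.abc import Mapping


def resolve_text_backend(env: Mapping[str, str] | None = None) -> str:
    """Return "llama-cpp" only if it is requested and usable, else "mock"."""
    env = os.environ if env is None else env

    # Anything other than an explicit llama-cpp request stays on the mock path.
    if env.get("OBJECTVERSE_TEXT_BACKEND", "mock") != "llama-cpp":
        return "mock"

    # The optional package may simply not be installed.
    if importlib.util.find_spec("llama_cpp") is None:
        return "mock"

    # The GGUF file is provided externally, so the path may be unset or stale.
    model_path = env.get("TEXT_MODEL_PATH", "")
    if not model_path or not os.path.isfile(model_path):
        return "mock"

    return "llama-cpp"


print(resolve_text_backend({"OBJECTVERSE_TEXT_BACKEND": "llama-cpp"}))
```

This only covers the checks that can be made up front; model loading errors and invalid JSON would still need handling at generation time.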
scripts/README.md
CHANGED
|
@@ -8,6 +8,7 @@ Implemented initial scripts:
|
|
| 8 |
- `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
|
| 9 |
- `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
|
| 10 |
- `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
|
|
|
|
| 11 |
|
| 12 |
Expected files during implementation:
|
| 13 |
|
|
@@ -15,4 +16,18 @@ Expected files during implementation:
|
|
| 15 |
- `convert_to_gguf.sh`
|
| 16 |
- `run_llama_cpp.sh`
|
| 17 |
|
| 18 |
-
|
|
|
|
| 8 |
- `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
|
| 9 |
- `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
|
| 10 |
- `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
|
| 11 |
+
- `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
|
| 12 |
|
| 13 |
Expected files during implementation:
|
| 14 |
|
|
|
|
| 16 |
- `convert_to_gguf.sh`
|
| 17 |
- `run_llama_cpp.sh`
|
| 18 |
|
| 19 |
+
Space VLM validation:
|
| 20 |
+
|
| 21 |
+
```bash
|
| 22 |
+
.venv/bin/python -B scripts/check_space_vlm.py \
|
| 23 |
+
--space-url https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary \
|
| 24 |
+
--output docs/SPACE_VLM_REPORT.md
|
| 25 |
+
```
|
| 26 |
+
|
| 27 |
+
External Space changes are explicit:
|
| 28 |
+
|
| 29 |
+
```bash
|
| 30 |
+
.venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
Current status: mock trace generation, trace JSONL export, SFT preview generation, optional MiniCPM-V wiring, optional llama.cpp wiring, and hosted Space VLM validation tooling are implemented. Real model validation on Space, fine-tuning, and GGUF conversion are not completed yet.
|
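Note on the commands in `scripts/README.md`: they use four flags, `--space-url`, `--output`, `--configure-space`, and `--rollback-to-mock`. The sketch below shows how such a command line could be declared with `argparse`, keeping the two Space-changing actions as explicit opt-in switches. It is an assumption about the shape of the parser, not a copy of the script; `build_parser` is a hypothetical name, the help strings are illustrative, and the defaults reuse the URL and report path shown in this diff.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags used in the README commands (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Validate vision output on a hosted Space."
    )
    parser.add_argument(
        "--space-url",
        default="https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary",
    )
    parser.add_argument("--output", type=Path, default=Path("docs/SPACE_VLM_REPORT.md"))
    # Both switches default to False, so a plain run never changes the Space.
    parser.add_argument("--configure-space", action="store_true")
    parser.add_argument("--rollback-to-mock", action="store_true")
    return parser


args = build_parser().parse_args(["--configure-space", "--rollback-to-mock"])
print(args.configure_space, args.rollback_to_mock, args.output)
```

Using `store_true` switches means changes to the external Space only happen when the caller asks for them, which fits the README's remark that external Space changes are explicit.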
scripts/check_space_vlm.py
ADDED
|
@@ -0,0 +1,481 @@
|
|
| 1 |
+
"""Validate MiniCPM-V object understanding on the hosted Hugging Face Space."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import argparse
|
| 6 |
+
import json
|
| 7 |
+
import sys
|
| 8 |
+
import time
|
| 9 |
+
import urllib.request
|
| 10 |
+
from dataclasses import dataclass
|
| 11 |
+
from datetime import datetime, timezone
|
| 12 |
+
from pathlib import Path
|
| 13 |
+
from typing import Any
|
| 14 |
+
from urllib.parse import urlparse
|
| 15 |
+
|
| 16 |
+
PROJECT_ROOT = Path(__file__).resolve().parents[1]
|
| 17 |
+
if str(PROJECT_ROOT) not in sys.path:
|
| 18 |
+
sys.path.insert(0, str(PROJECT_ROOT))
|
| 19 |
+
|
| 20 |
+
from src.models.schema import TraceRecord
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
DEFAULT_SPACE_URL = "https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary"
|
| 24 |
+
DEFAULT_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.md")
|
| 25 |
+
DEFAULT_JSON_OUTPUT_PATH = Path("docs/SPACE_VLM_REPORT.json")
|
| 26 |
+
DEFAULT_ASSET_DIR = Path(".tmp/space-vlm-assets")
|
| 27 |
+
DEFAULT_HARDWARE = "l4x1"
|
| 28 |
+
MOCK_SAFE_HARDWARE = "cpu-basic"
|
| 29 |
+
GENERATE_API_NAME = "/generate_object_file"
|
| 30 |
+
REQUEST_TIMEOUT_SECONDS = 45
|
| 31 |
+
|
| 32 |
+
SPACE_VARIABLES = {
|
| 33 |
+
"OBJECTVERSE_VISION_BACKEND": "minicpm-v",
|
| 34 |
+
"VISION_MODEL_ID": "openbmb/MiniCPM-V-2_6",
|
| 35 |
+
"OBJECTVERSE_TEXT_BACKEND": "mock",
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
MOCK_SAFE_VARIABLES = {
|
| 39 |
+
"OBJECTVERSE_VISION_BACKEND": "mock",
|
| 40 |
+
"OBJECTVERSE_TEXT_BACKEND": "mock",
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
@dataclass(frozen=True)
|
| 45 |
+
class ValidationAsset:
|
| 46 |
+
key: str
|
| 47 |
+
label: str
|
| 48 |
+
source_page: str
|
| 49 |
+
download_url: str
|
| 50 |
+
expected_terms: tuple[str, ...]
|
| 51 |
+
description: str
|
| 52 |
+
mode: str = "Cynical"
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
@dataclass(frozen=True)
|
| 56 |
+
class ValidationResult:
|
| 57 |
+
key: str
|
| 58 |
+
label: str
|
| 59 |
+
source_page: str
|
| 60 |
+
image_path: str
|
| 61 |
+
passed: bool
|
| 62 |
+
object_name: str
|
| 63 |
+
visible_features: list[str]
|
| 64 |
+
likely_context: str
|
| 65 |
+
confidence: float
|
| 66 |
+
runtime_vision: str
|
| 67 |
+
runtime_text: str
|
| 68 |
+
fallbacks: list[str]
|
| 69 |
+
error: str = ""
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
TEST_ASSETS = [
|
| 73 |
+
ValidationAsset(
|
| 74 |
+
key="mug",
|
| 75 |
+
label="Coffee mug",
|
| 76 |
+
source_page="https://commons.wikimedia.org/wiki/File:Striped_coffee_mug.jpg",
|
| 77 |
+
download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Striped_coffee_mug.jpg",
|
| 78 |
+
expected_terms=("mug", "cup", "coffee", "ceramic", "handle"),
|
| 79 |
+
description="A public Wikimedia Commons photo of a striped coffee mug.",
|
| 80 |
+
),
|
| 81 |
+
ValidationAsset(
|
| 82 |
+
key="keyboard",
|
| 83 |
+
label="Computer keyboard",
|
| 84 |
+
source_page="https://commons.wikimedia.org/wiki/File:Computer_keyboard.jpg",
|
| 85 |
+
download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Computer_keyboard.jpg",
|
| 86 |
+
expected_terms=("keyboard", "key", "computer", "keys"),
|
| 87 |
+
description="A public Wikimedia Commons photo of a computer keyboard.",
|
| 88 |
+
mode="Philosopher",
|
| 89 |
+
),
|
| 90 |
+
ValidationAsset(
|
| 91 |
+
key="shoe",
|
| 92 |
+
label="Running shoe",
|
| 93 |
+
source_page="https://commons.wikimedia.org/wiki/File:Running_shoes.jpg",
|
| 94 |
+
download_url="https://commons.wikimedia.org/wiki/Special:Redirect/file/Running_shoes.jpg",
|
| 95 |
+
expected_terms=("shoe", "sneaker", "running", "footwear", "trainer"),
|
| 96 |
+
description="A public Wikimedia Commons photo of running shoes.",
|
| 97 |
+
mode="Dramatic",
|
| 98 |
+
),
|
| 99 |
+
]
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
def parse_space_repo_id(space_url: str) -> str:
|
| 103 |
+
parsed = urlparse(space_url)
|
| 104 |
+
parts = [part for part in parsed.path.split("/") if part]
|
| 105 |
+
if len(parts) >= 3 and parts[0] == "spaces":
|
| 106 |
+
return f"{parts[1]}/{parts[2]}"
|
| 107 |
+
if len(parts) == 2:
|
| 108 |
+
return f"{parts[0]}/{parts[1]}"
|
| 109 |
+
raise ValueError(f"Could not parse Hugging Face Space repo id from {space_url!r}")
|
| 110 |
+
|
| 111 |
+
|
| 112 |
+
def download_validation_assets(
|
| 113 |
+
asset_dir: Path = DEFAULT_ASSET_DIR,
|
| 114 |
+
assets: list[ValidationAsset] | None = None,
|
| 115 |
+
) -> dict[str, Path]:
|
| 116 |
+
selected_assets = assets or TEST_ASSETS
|
| 117 |
+
asset_dir.mkdir(parents=True, exist_ok=True)
|
| 118 |
+
paths: dict[str, Path] = {}
|
| 119 |
+
for asset in selected_assets:
|
| 120 |
+
output_path = asset_dir / f"{asset.key}.jpg"
|
| 121 |
+
if not output_path.exists():
|
| 122 |
+
_download_url(asset.download_url, output_path)
|
| 123 |
+
paths[asset.key] = output_path
|
| 124 |
+
return paths
|
| 125 |
+
|
| 126 |
+
|
| 127 |
+
def configure_space_for_vlm(
|
| 128 |
+
repo_id: str,
|
| 129 |
+
*,
|
| 130 |
+
hardware: str = DEFAULT_HARDWARE,
|
| 131 |
+
wait: bool = True,
|
| 132 |
+
timeout_seconds: int = 900,
|
| 133 |
+
) -> dict[str, str]:
|
| 134 |
+
from huggingface_hub import HfApi, SpaceHardware
|
| 135 |
+
|
| 136 |
+
api = HfApi()
|
| 137 |
+
_assert_hf_auth(api)
|
| 138 |
+
for key, value in SPACE_VARIABLES.items():
|
| 139 |
+
api.add_space_variable(repo_id=repo_id, key=key, value=value)
|
| 140 |
+
api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
|
| 141 |
+
if wait:
|
| 142 |
+
wait_for_space_running(repo_id, timeout_seconds=timeout_seconds)
|
| 143 |
+
return {"repo_id": repo_id, "hardware": hardware, **SPACE_VARIABLES}
|
| 144 |
+
|
| 145 |
+
|
| 146 |
+
def rollback_space_to_mock(repo_id: str, *, hardware: str = MOCK_SAFE_HARDWARE) -> dict[str, str]:
|
| 147 |
+
from huggingface_hub import HfApi, SpaceHardware
|
| 148 |
+
|
| 149 |
+
api = HfApi()
|
| 150 |
+
_assert_hf_auth(api)
|
| 151 |
+
for key, value in MOCK_SAFE_VARIABLES.items():
|
| 152 |
+
api.add_space_variable(repo_id=repo_id, key=key, value=value)
|
| 153 |
+
api.request_space_hardware(repo_id=repo_id, hardware=SpaceHardware(hardware))
|
| 154 |
+
return {"repo_id": repo_id, "hardware": hardware, **MOCK_SAFE_VARIABLES}
|
| 155 |
+
|
| 156 |
+
|
| 157 |
+
def wait_for_space_running(
|
| 158 |
+
repo_id: str,
|
| 159 |
+
*,
|
| 160 |
+
timeout_seconds: int = 900,
|
| 161 |
+
poll_seconds: int = 20,
|
| 162 |
+
) -> str:
|
| 163 |
+
from huggingface_hub import HfApi
|
| 164 |
+
|
| 165 |
+
api = HfApi()
|
| 166 |
+
deadline = time.monotonic() + timeout_seconds
|
| 167 |
+
last_stage = "unknown"
|
| 168 |
+
while time.monotonic() < deadline:
|
| 169 |
+
runtime = api.get_space_runtime(repo_id=repo_id)
|
| 170 |
+
last_stage = _runtime_stage_name(runtime)
|
| 171 |
+
if last_stage.upper() == "RUNNING":
|
| 172 |
+
return last_stage
|
| 173 |
+
time.sleep(poll_seconds)
|
| 174 |
+
raise TimeoutError(f"Space {repo_id} did not reach RUNNING within {timeout_seconds}s; last stage: {last_stage}")
|
| 175 |
+
|
| 176 |
+
|
| 177 |
+
def run_space_validation(
|
| 178 |
+
*,
|
| 179 |
+
space_url: str = DEFAULT_SPACE_URL,
|
| 180 |
+
asset_dir: Path = DEFAULT_ASSET_DIR,
|
| 181 |
+
timeout_seconds: int = 900,
|
| 182 |
+
assets: list[ValidationAsset] | None = None,
|
| 183 |
+
) -> list[ValidationResult]:
|
| 184 |
+
from gradio_client import Client, handle_file
|
| 185 |
+
|
| 186 |
+
selected_assets = assets or TEST_ASSETS
|
| 187 |
+
paths = download_validation_assets(asset_dir, selected_assets)
|
| 188 |
+
client = Client(space_url, verbose=False)
|
| 189 |
+
results: list[ValidationResult] = []
|
| 190 |
+
started = time.monotonic()
|
| 191 |
+
for asset in selected_assets:
|
| 192 |
+
remaining = timeout_seconds - int(time.monotonic() - started)
|
| 193 |
+
if remaining <= 0:
|
| 194 |
+
raise TimeoutError(f"Validation exceeded timeout of {timeout_seconds}s")
|
| 195 |
+
try:
|
| 196 |
+
response = client.predict(
|
| 197 |
+
handle_file(str(paths[asset.key])),
|
| 198 |
+
asset.description,
|
| 199 |
+
asset.mode,
|
| 200 |
+
api_name=GENERATE_API_NAME,
|
| 201 |
+
)
|
| 202 |
+
results.append(validate_prediction(asset, paths[asset.key], response))
|
| 203 |
+
except Exception as exc:
|
| 204 |
+
results.append(
|
| 205 |
+
ValidationResult(
|
| 206 |
+
key=asset.key,
|
| 207 |
+
label=asset.label,
|
| 208 |
+
source_page=asset.source_page,
|
| 209 |
+
image_path=str(paths[asset.key]),
|
| 210 |
+
passed=False,
|
| 211 |
+
object_name="",
|
| 212 |
+
visible_features=[],
|
| 213 |
+
likely_context="",
|
| 214 |
+
confidence=0.0,
|
| 215 |
+
runtime_vision="",
|
| 216 |
+
runtime_text="",
|
| 217 |
+
fallbacks=[],
|
| 218 |
+
error=f"{type(exc).__name__}: {exc}",
|
| 219 |
+
)
|
| 220 |
+
)
|
| 221 |
+
return results
|
| 222 |
+
|
| 223 |
+
|
| 224 |
+
def validate_prediction(
|
| 225 |
+
asset: ValidationAsset,
|
| 226 |
+
image_path: Path,
|
| 227 |
+
response: Any,
|
| 228 |
+
) -> ValidationResult:
|
| 229 |
+
trace_payload = _extract_trace_payload(response)
|
| 230 |
+
trace = TraceRecord.model_validate(trace_payload)
|
| 231 |
+
object_info = trace.object_understanding.object
|
| 232 |
+
search_text = " ".join(
|
| 233 |
+
[
|
| 234 |
+
object_info.name,
|
| 235 |
+
object_info.likely_context,
|
| 236 |
+
" ".join(object_info.visible_features),
|
| 237 |
+
]
|
| 238 |
+
).lower()
|
| 239 |
+
expected_match = any(term in search_text for term in asset.expected_terms)
|
| 240 |
+
vision_runtime_ok = trace.model_runtime.get("vision") == "minicpm-v object understanding"
|
| 241 |
+
text_runtime_ok = trace.model_runtime.get("text") == "mock persona and diary generation"
|
| 242 |
+
no_vision_fallback = "vision-fallback-to-mock" not in trace.fallbacks
|
| 243 |
+
passed = expected_match and vision_runtime_ok and text_runtime_ok and no_vision_fallback
|
| 244 |
+
return ValidationResult(
|
| 245 |
+
key=asset.key,
|
| 246 |
+
label=asset.label,
|
| 247 |
+
source_page=asset.source_page,
|
| 248 |
+
image_path=str(image_path),
|
| 249 |
+
passed=passed,
|
| 250 |
+
object_name=object_info.name,
|
| 251 |
+
visible_features=object_info.visible_features,
|
| 252 |
+
likely_context=object_info.likely_context,
|
| 253 |
+
confidence=object_info.confidence,
|
| 254 |
+
runtime_vision=trace.model_runtime.get("vision", ""),
|
| 255 |
+
runtime_text=trace.model_runtime.get("text", ""),
|
| 256 |
+
fallbacks=trace.fallbacks,
|
| 257 |
+
error="" if passed else _failure_reason(expected_match, vision_runtime_ok, text_runtime_ok, no_vision_fallback),
|
| 258 |
+
)
|
| 259 |
+
|
| 260 |
+
|
| 261 |
+
def render_report(
|
| 262 |
+
*,
|
| 263 |
+
space_url: str,
|
| 264 |
+
repo_id: str,
|
| 265 |
+
results: list[ValidationResult],
|
| 266 |
+
configured: dict[str, str] | None = None,
|
| 267 |
+
rollback: dict[str, str] | None = None,
|
| 268 |
+
configuration_error: str = "",
|
| 269 |
+
) -> str:
|
| 270 |
+
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
|
| 271 |
+
status = "NOT RUN"
|
| 272 |
+
if configuration_error:
|
| 273 |
+
status = "FAIL"
|
| 274 |
+
elif results:
|
| 275 |
+
status = "PASS" if all(result.passed for result in results) else "FAIL"
|
| 276 |
+
lines = [
|
| 277 |
+
"# Space VLM Validation Report",
|
| 278 |
+
"",
|
| 279 |
+
f"- Generated at: {now}",
|
| 280 |
+
f"- Space URL: {space_url}",
|
| 281 |
+
f"- Space repo: `{repo_id}`",
|
| 282 |
+
f"- Overall status: {status}",
|
| 283 |
+
"- Vision backend expected: `minicpm-v`",
|
| 284 |
+
"- Text backend expected: `mock`",
|
| 285 |
+
"",
|
| 286 |
+
"## Space Configuration",
|
| 287 |
+
"",
|
| 288 |
+
]
|
| 289 |
+
if configured:
|
| 290 |
+
lines.extend(_config_lines("Applied configuration", configured))
|
| 291 |
+
else:
|
| 292 |
+
lines.append("- Applied configuration: not changed by this run.")
|
| 293 |
+
if rollback:
|
| 294 |
+
lines.extend(["", *_config_lines("Rollback configuration", rollback)])
|
| 295 |
+
else:
|
| 296 |
+
lines.append("- Rollback configuration: not applied by this run.")
|
| 297 |
+
if configuration_error:
|
| 298 |
+
lines.extend(["", "## Configuration Error", "", f"- Error: `{configuration_error}`"])
|
| 299 |
+
|
| 300 |
+
lines.extend(["", "## Results", ""])
|
| 301 |
+
for result in results:
|
| 302 |
+
lines.extend(
|
| 303 |
+
[
|
| 304 |
+
f"### {result.label}",
|
| 305 |
+
"",
|
| 306 |
+
f"- Status: {'PASS' if result.passed else 'FAIL'}",
|
| 307 |
+
f"- Source: {result.source_page}",
|
| 308 |
+
f"- Local temporary image: `{result.image_path}`",
|
| 309 |
+
f"- Object name: `{result.object_name}`",
|
| 310 |
+
f"- Visible features: {', '.join(result.visible_features) or 'n/a'}",
|
| 311 |
+
f"- Likely context: `{result.likely_context}`",
|
| 312 |
+
f"- Confidence: {result.confidence:.2f}",
|
| 313 |
+
f"- Runtime vision: `{result.runtime_vision}`",
|
| 314 |
+
f"- Runtime text: `{result.runtime_text}`",
|
| 315 |
+
f"- Fallbacks: {', '.join(result.fallbacks) or 'none'}",
|
| 316 |
+
]
|
| 317 |
+
)
|
| 318 |
+
if result.error:
|
| 319 |
+
lines.append(f"- Error: `{result.error}`")
|
| 320 |
+
lines.append("")
|
| 321 |
+
lines.extend(
|
| 322 |
+
[
|
| 323 |
+
"## Notes",
|
| 324 |
+
"",
|
| 325 |
+
"- Test images are temporary public Wikimedia Commons assets and are not committed.",
|
| 326 |
+
"- No tokens, secrets, or private file paths should be recorded in this report.",
|
| 327 |
+
"- If validation fails, switch `OBJECTVERSE_VISION_BACKEND` back to `mock` to keep the demo usable.",
|
| 328 |
+
]
|
| 329 |
+
)
|
| 330 |
+
return "\n".join(lines) + "\n"
|
| 331 |
+
|
| 332 |
+
|
| 333 |
+
def write_report(markdown: str, output_path: Path = DEFAULT_OUTPUT_PATH) -> Path:
|
| 334 |
+
output_path.parent.mkdir(parents=True, exist_ok=True)
|
| 335 |
+
output_path.write_text(markdown, encoding="utf-8")
|
| 336 |
+
return output_path
|
| 337 |
+
|
| 338 |
+
|
| 339 |
+
def write_json_results(results: list[ValidationResult], output_path: Path) -> Path:
|
| 340 |
+
output_path.parent.mkdir(parents=True, exist_ok=True)
|
| 341 |
+
payload = [result.__dict__ for result in results]
|
| 342 |
+
output_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
|
| 343 |
+
return output_path
|
| 344 |
+
|
| 345 |
+
|
| 346 |
+
def _download_url(url: str, output_path: Path) -> None:
|
| 347 |
+
request = urllib.request.Request(
|
| 348 |
+
url,
|
| 349 |
+
headers={"User-Agent": "Objectverse-Diary-Space-VLM-Check/0.1"},
|
| 350 |
+
)
|
| 351 |
+
with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_SECONDS) as response:
|
| 352 |
+
output_path.write_bytes(response.read())
|
| 353 |
+
|
| 354 |
+
|
| 355 |
+
def _extract_trace_payload(response: Any) -> dict[str, Any]:
|
| 356 |
+
if isinstance(response, tuple | list):
|
| 357 |
+
if len(response) < 7:
|
| 358 |
+
raise ValueError("Gradio response did not include trace JSON output.")
|
| 359 |
+
trace_payload = response[6]
|
| 360 |
+
elif isinstance(response, dict) and "trace" in response:
|
| 361 |
+
trace_payload = response["trace"]
|
| 362 |
+
else:
|
| 363 |
+
raise ValueError("Unsupported Gradio response shape.")
|
| 364 |
+
if not isinstance(trace_payload, dict):
|
| 365 |
+
raise ValueError("Trace output was not a JSON object.")
|
| 366 |
+
return trace_payload
|
| 367 |
+
|
| 368 |
+
|
| 369 |
+
def _failure_reason(
|
| 370 |
+
expected_match: bool,
|
| 371 |
+
vision_runtime_ok: bool,
|
| 372 |
+
text_runtime_ok: bool,
|
| 373 |
+
no_vision_fallback: bool,
|
| 374 |
+
) -> str:
|
| 375 |
+
reasons: list[str] = []
|
| 376 |
+
if not expected_match:
|
| 377 |
+
reasons.append("object output did not match expected terms")
|
| 378 |
+
if not vision_runtime_ok:
|
| 379 |
+
reasons.append("vision runtime was not minicpm-v")
|
| 380 |
+
if not text_runtime_ok:
|
| 381 |
+
reasons.append("text runtime was not mock")
|
| 382 |
+
if not no_vision_fallback:
|
| 383 |
+
reasons.append("vision fallback marker was present")
|
| 384 |
+
return "; ".join(reasons)
|
| 385 |
+
|
| 386 |
+
|
| 387 |
+
def _runtime_stage_name(runtime: Any) -> str:
|
| 388 |
+
stage = getattr(runtime, "stage", None)
|
| 389 |
+
if stage is None and isinstance(runtime, dict):
|
| 390 |
+
stage = runtime.get("stage")
|
| 391 |
+
if hasattr(stage, "value"):
|
| 392 |
+
return str(stage.value)
|
| 393 |
+
return str(stage or "unknown")
|
| 394 |
+
|
| 395 |
+
|
| 396 |
+
def _assert_hf_auth(api: Any) -> None:
|
| 397 |
+
try:
|
| 398 |
+
user = api.whoami()
|
| 399 |
+
except Exception as exc:
|
| 400 |
+
raise RuntimeError("Hugging Face authentication is required for Space configuration.") from exc
|
| 401 |
+
if not isinstance(user, dict) or not user.get("name"):
|
| 402 |
+
raise RuntimeError("Hugging Face authentication did not return a user name.")
|
| 403 |
+
|
| 404 |
+
|
| 405 |
+
def _config_lines(title: str, config: dict[str, str]) -> list[str]:
|
| 406 |
+
lines = [f"- {title}:"]
|
| 407 |
+
for key, value in config.items():
|
| 408 |
+
lines.append(f" - `{key}`: `{value}`")
|
| 409 |
+
return lines
|
| 410 |
+
|
| 411 |
+
|
| 412 |
+
def _parse_args() -> argparse.Namespace:
|
| 413 |
+
parser = argparse.ArgumentParser(description=__doc__)
|
| 414 |
+
parser.add_argument("--space-url", default=DEFAULT_SPACE_URL)
|
| 415 |
+
parser.add_argument("--asset-dir", type=Path, default=DEFAULT_ASSET_DIR)
|
| 416 |
+
parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT_PATH)
|
| 417 |
+
parser.add_argument("--json-output", type=Path)
|
| 418 |
+
parser.add_argument("--timeout-seconds", type=int, default=900)
|
| 419 |
+
parser.add_argument("--configure-space", action="store_true")
|
| 420 |
+
parser.add_argument("--rollback-to-mock", action="store_true")
|
| 421 |
+
parser.add_argument("--hardware", default=DEFAULT_HARDWARE)
|
| 422 |
+
parser.add_argument("--skip-validation", action="store_true")
|
| 423 |
+
return parser.parse_args()
|
| 424 |
+
|
| 425 |
+
|
| 426 |
+
def main() -> None:
|
| 427 |
+
args = _parse_args()
|
| 428 |
+
repo_id = parse_space_repo_id(args.space_url)
|
| 429 |
+
configured = None
|
| 430 |
+
rollback = None
|
| 431 |
+
configuration_error = ""
|
| 432 |
+
if args.configure_space:
|
| 433 |
+
try:
|
| 434 |
+
configured = configure_space_for_vlm(
|
| 435 |
+
repo_id,
|
| 436 |
+
hardware=args.hardware,
|
| 437 |
+
wait=True,
|
| 438 |
+
timeout_seconds=args.timeout_seconds,
|
| 439 |
+
)
|
| 440 |
+
except Exception as exc:
|
| 441 |
+
configuration_error = f"{type(exc).__name__}: {exc}"
|
| 442 |
+
if args.rollback_to_mock:
|
| 443 |
+
try:
|
| 444 |
+
rollback = rollback_space_to_mock(repo_id)
|
| 445 |
+
except Exception as rollback_exc:
|
| 446 |
+
configuration_error = (
|
| 447 |
+
f"{configuration_error}; rollback failed with "
|
| 448 |
+
f"{type(rollback_exc).__name__}: {rollback_exc}"
|
| 449 |
+
)
|
| 450 |
+
|
| 451 |
+
results: list[ValidationResult] = []
|
| 452 |
+
if not args.skip_validation and not configuration_error:
|
| 453 |
+
results = run_space_validation(
|
| 454 |
+
space_url=args.space_url,
|
| 455 |
+
asset_dir=args.asset_dir,
|
| 456 |
+
timeout_seconds=args.timeout_seconds,
|
| 457 |
+
)
|
| 458 |
+
|
| 459 |
+
if args.rollback_to_mock and rollback is None:
|
| 460 |
+
rollback = rollback_space_to_mock(repo_id)
|
| 461 |
+
|
| 462 |
+
report = render_report(
|
| 463 |
+
space_url=args.space_url,
|
| 464 |
+
repo_id=repo_id,
|
| 465 |
+
results=results,
|
| 466 |
+
configured=configured,
|
| 467 |
+
rollback=rollback,
|
| 468 |
+
configuration_error=configuration_error,
|
| 469 |
+
)
|
| 470 |
+
write_report(report, args.output)
|
| 471 |
+
if args.json_output:
|
| 472 |
+
write_json_results(results, args.json_output)
|
| 473 |
+
|
| 474 |
+
if configuration_error or (results and not all(result.passed for result in results)):
|
| 475 |
+
raise SystemExit(1)
|
| 476 |
+
|
| 477 |
+
print(f"wrote Space VLM report to {args.output}")
|
| 478 |
+
|
| 479 |
+
|
| 480 |
+
if __name__ == "__main__":
|
| 481 |
+
main()
|
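Review note: the overall-status rule that `render_report` applies above can be checked in isolation. The following is a minimal sketch, not the script's code; `SketchResult` and `overall_status` are hypothetical names introduced only for illustration, and only the three status strings are taken from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchResult:
    """Hypothetical stand-in for the script's ValidationResult."""

    label: str
    passed: bool


def overall_status(results: list[SketchResult], configuration_error: str = "") -> str:
    """Mirror the status rule in render_report (assumed helper, not in the commit)."""
    # A configuration error fails the run even if no validation was attempted.
    if configuration_error:
        return "FAIL"
    # With no results and no error, the validation was simply skipped.
    if not results:
        return "NOT RUN"
    # Every case must pass for the run to pass.
    return "PASS" if all(result.passed for result in results) else "FAIL"
```

One consequence worth noting: running with `--skip-validation` and no configuration error reports `NOT RUN` rather than `PASS`, so a skipped run should not be mistaken for a successful one.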
src/README.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
| 2 |
|
| 3 |
This directory is reserved for application source code.
|
| 4 |
|
| 5 |
-
Current status: initial mock MVP.
|
| 6 |
|
| 7 |
## Planned Areas
|
| 8 |
|
|
|
|
| 2 |
|
| 3 |
This directory is reserved for application source code.
|
| 4 |
|
| 5 |
+
Current status: initial mock MVP with optional MiniCPM-V 2.6 vision backend. Text generation remains mock until the llama.cpp phase.
|
| 6 |
|
| 7 |
## Planned Areas
|
| 8 |
|
src/config.py
CHANGED
|
@@ -43,21 +43,26 @@ def get_runtime_settings(environ: Mapping[str, str] | None = None) -> RuntimeSet
|
|
| 43 |
|
| 44 |
def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
|
| 45 |
current = settings or get_runtime_settings()
|
|
|
|
|
|
|
| 46 |
vision = (
|
| 47 |
"mock object understanding"
|
| 48 |
-
if
|
| 49 |
-
else f"{
|
| 50 |
-
)
|
| 51 |
-
text = (
|
| 52 |
-
"mock persona and diary generation"
|
| 53 |
-
if current.text_backend == "mock"
|
| 54 |
-
else f"{current.text_backend} persona and diary generation"
|
| 55 |
-
)
|
| 56 |
-
runtime = (
|
| 57 |
-
"no llama.cpp model connected yet"
|
| 58 |
-
if current.text_backend == "mock"
|
| 59 |
-
else f"text model path: {current.text_model_path or '[not configured]'}"
|
| 60 |
)
|
| 61 |
return {"vision": vision, "text": text, "runtime": runtime}
|
| 62 |
|
| 63 |
|
|
|
|
| 43 |
|
| 44 |
def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
|
| 45 |
current = settings or get_runtime_settings()
|
| 46 |
+
vision_backend = current.vision_backend.strip().lower()
|
| 47 |
+
text_backend = current.text_backend.strip().lower()
|
| 48 |
vision = (
|
| 49 |
"mock object understanding"
|
| 50 |
+
if vision_backend == "mock"
|
| 51 |
+
else f"{vision_backend} object understanding"
|
| 52 |
)
|
| 53 |
+
text = "mock persona and diary generation"
|
| 54 |
+
if text_backend in {"llama-cpp", "llama_cpp", "llamacpp"}:
|
| 55 |
+
text = "llama-cpp text generation"
|
| 56 |
+
elif text_backend != "mock":
|
| 57 |
+
text = f"{text_backend} text generation"
|
| 58 |
+
runtime_parts: list[str] = []
|
| 59 |
+
if vision_backend != "mock":
|
| 60 |
+
runtime_parts.append(f"vision model id: {current.vision_model_id or '[not configured]'}")
|
| 61 |
+
if text_backend == "mock":
|
| 62 |
+
runtime_parts.append("no llama.cpp model connected yet")
|
| 63 |
+
else:
|
| 64 |
+
runtime_parts.append(f"text model path: {current.text_model_path or '[not configured]'}")
|
| 65 |
+
runtime = "; ".join(runtime_parts)
|
| 66 |
return {"vision": vision, "text": text, "runtime": runtime}
|
| 67 |
|
| 68 |
|
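Review note: the new `runtime_status` normalizes backend names before comparing them, so values such as ` MiniCPM-V ` or `llama_cpp` resolve predictably. A minimal sketch of that behavior follows, assuming a stand-in `SketchSettings` dataclass (hypothetical; the real `RuntimeSettings` may carry more fields) and a renamed function so it is not confused with the project's own.

```python
from dataclasses import dataclass

# Aliases copied from the diff; everything else here is illustrative.
LLAMA_CPP_ALIASES = {"llama-cpp", "llama_cpp", "llamacpp"}


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical stand-in for RuntimeSettings."""

    vision_backend: str = "mock"
    text_backend: str = "mock"
    vision_model_id: str = ""
    text_model_path: str = ""


def sketch_runtime_status(settings: SketchSettings) -> dict[str, str]:
    # Normalize once so whitespace and case do not change the outcome.
    vision_backend = settings.vision_backend.strip().lower()
    text_backend = settings.text_backend.strip().lower()

    vision = (
        "mock object understanding"
        if vision_backend == "mock"
        else f"{vision_backend} object understanding"
    )
    text = "mock persona and diary generation"
    if text_backend in LLAMA_CPP_ALIASES:
        text = "llama-cpp text generation"
    elif text_backend != "mock":
        text = f"{text_backend} text generation"

    # The runtime line lists one part per non-mock backend, joined with "; ".
    parts: list[str] = []
    if vision_backend != "mock":
        parts.append(f"vision model id: {settings.vision_model_id or '[not configured]'}")
    if text_backend == "mock":
        parts.append("no llama.cpp model connected yet")
    else:
        parts.append(f"text model path: {settings.text_model_path or '[not configured]'}")
    return {"vision": vision, "text": text, "runtime": "; ".join(parts)}
```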
src/models/llama_cpp_runner.py
CHANGED
|
@@ -1,8 +1,16 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
|
|
|
|
|
|
|
|
|
|
| 6 |
|
| 7 |
|
| 8 |
MODE_PROFILES = {
|
|
@@ -33,8 +41,58 @@ MODE_PROFILES = {
|
|
| 33 |
},
|
| 34 |
}
|
| 35 |
|
| 36 |
|
| 37 |
def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
|
| 38 |
object_name = object_understanding.object.name
|
| 39 |
profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
|
| 40 |
character_name = _character_name(object_name, mode)
|
|
@@ -51,7 +109,7 @@ def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> Pe
|
|
| 51 |
return PersonaEnvelope(persona=persona)
|
| 52 |
|
| 53 |
|
| 54 |
-
def
|
| 55 |
p = persona.persona
|
| 56 |
day_number = 417 + len(p.object_name)
|
| 57 |
|
|
@@ -74,7 +132,7 @@ def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
|
|
| 74 |
)
|
| 75 |
|
| 76 |
|
| 77 |
-
def
|
| 78 |
persona = persona_data.get("persona", {})
|
| 79 |
character_name = persona.get("character_name", "The Object")
|
| 80 |
object_name = persona.get("object_name", "object")
|
|
@@ -88,6 +146,170 @@ def reply_as_object(persona_data: dict, message: str) -> str:
|
|
| 88 |
)
|
| 89 |
|
| 90 |
|
| 91 |
def _character_name(object_name: str, mode: str) -> str:
|
| 92 |
compact = "".join(part.capitalize() for part in object_name.split()[:2])
|
| 93 |
suffix = {
|
|
|
|
| 1 |
+
"""Text generation runtime with mock and optional llama.cpp backends."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
+
import json
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
from typing import Any
|
| 8 |
+
|
| 9 |
+
from src.config import RuntimeSettings, get_runtime_settings
|
| 10 |
from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
|
| 11 |
+
from src.prompts.diary_generation import CHAT_REPLY_PROMPT, DIARY_GENERATION_PROMPT
|
| 12 |
+
from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
|
| 13 |
+
from src.utils.json_repair import parse_json_object
|
| 14 |
|
| 15 |
|
| 16 |
MODE_PROFILES = {
|
|
|
|
| 41 |
},
|
| 42 |
}
|
| 43 |
|
| 44 |
+
LLAMA_CPP_BACKENDS = {"llama-cpp", "llama_cpp", "llamacpp"}
|
| 45 |
+
TEXT_FALLBACK_TO_MOCK = "text-fallback-to-mock"
|
| 46 |
+
|
| 47 |
+
_LLAMA_MODEL: Any | None = None
|
| 48 |
+
_LLAMA_MODEL_PATH: str | None = None
|
| 49 |
+
_TEXT_FALLBACKS: list[str] = []
|
| 50 |
+
|
| 51 |
|
| 52 |
def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
|
| 53 |
+
settings = get_runtime_settings()
|
| 54 |
+
if _is_llama_cpp_backend(settings):
|
| 55 |
+
try:
|
| 56 |
+
return _generate_persona_llama_cpp(object_understanding, mode, settings)
|
| 57 |
+
except Exception as exc:
|
| 58 |
+
_log_text_fallback("persona", exc)
|
| 59 |
+
_add_text_fallback(TEXT_FALLBACK_TO_MOCK)
|
| 60 |
+
|
| 61 |
+
return _generate_persona_mock(object_understanding, mode)
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
|
| 65 |
+
settings = get_runtime_settings()
|
| 66 |
+
if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
|
| 67 |
+
try:
|
| 68 |
+
return _generate_diary_llama_cpp(persona, mode, settings)
|
| 69 |
+
except Exception as exc:
|
| 70 |
+
_log_text_fallback("diary", exc)
|
| 71 |
+
_add_text_fallback(TEXT_FALLBACK_TO_MOCK)
|
| 72 |
+
|
| 73 |
+
return _generate_diary_mock(persona, mode)
|
| 74 |
+
|
| 75 |
+
|
| 76 |
+
def reply_as_object(persona_data: dict, message: str) -> str:
|
| 77 |
+
settings = get_runtime_settings()
|
| 78 |
+
if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
|
| 79 |
+
try:
|
| 80 |
+
return _reply_as_object_llama_cpp(persona_data, message, settings)
|
| 81 |
+
except Exception as exc:
|
| 82 |
+
_log_text_fallback("chat", exc)
|
| 83 |
+
|
| 84 |
+
return _reply_as_object_mock(persona_data, message)
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
def reset_text_runtime_fallbacks() -> None:
|
| 88 |
+
_TEXT_FALLBACKS.clear()
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
def get_text_runtime_fallbacks() -> list[str]:
|
| 92 |
+
return list(_TEXT_FALLBACKS)
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
def _generate_persona_mock(object_understanding: ObjectUnderstanding, mode: str) -> PersonaEnvelope:
|
| 96 |
object_name = object_understanding.object.name
|
| 97 |
profile = MODE_PROFILES.get(mode, MODE_PROFILES["Cynical"])
|
| 98 |
character_name = _character_name(object_name, mode)
|
|
|
|
| 109 |
return PersonaEnvelope(persona=persona)
|
| 110 |
|
| 111 |
|
| 112 |
+
def _generate_diary_mock(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
|
| 113 |
p = persona.persona
|
| 114 |
day_number = 417 + len(p.object_name)
|
| 115 |
|
|
|
|
| 132 |
)
|
| 133 |
|
| 134 |
|
| 135 |
+
def _reply_as_object_mock(persona_data: dict, message: str) -> str:
|
| 136 |
persona = persona_data.get("persona", {})
|
| 137 |
character_name = persona.get("character_name", "The Object")
|
| 138 |
object_name = persona.get("object_name", "object")
|
|
|
|
| 146 |
)
|
| 147 |
|
| 148 |
|
| 149 |
+
def _generate_persona_llama_cpp(
|
| 150 |
+
object_understanding: ObjectUnderstanding,
|
| 151 |
+
mode: str,
|
| 152 |
+
settings: RuntimeSettings,
|
| 153 |
+
) -> PersonaEnvelope:
|
| 154 |
+
raw = _run_llama_json(
|
| 155 |
+
system_prompt=PERSONA_GENERATION_PROMPT,
|
| 156 |
+
user_payload={
|
| 157 |
+
"mode": mode,
|
| 158 |
+
"object_understanding": object_understanding.model_dump(mode="json"),
|
| 159 |
+
},
|
| 160 |
+
settings=settings,
|
| 161 |
+
max_tokens=320,
|
| 162 |
+
)
|
| 163 |
+
return PersonaEnvelope.model_validate(raw)
|
| 164 |
+
|
| 165 |
+
|
| 166 |
+
def _generate_diary_llama_cpp(
|
| 167 |
+
persona: PersonaEnvelope,
|
| 168 |
+
mode: str,
|
| 169 |
+
settings: RuntimeSettings,
|
| 170 |
+
) -> DiaryEntry:
|
| 171 |
+
raw = _run_llama_json(
|
| 172 |
+
system_prompt=DIARY_GENERATION_PROMPT,
|
| 173 |
+
user_payload={
|
| 174 |
+
"mode": mode,
|
| 175 |
+
"persona": persona.model_dump(mode="json"),
|
| 176 |
+
},
|
| 177 |
+
settings=settings,
|
| 178 |
+
max_tokens=360,
|
| 179 |
+
)
|
| 180 |
+
return DiaryEntry.model_validate(raw)
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
def _reply_as_object_llama_cpp(
|
| 184 |
+
persona_data: dict,
|
| 185 |
+
message: str,
|
| 186 |
+
settings: RuntimeSettings,
|
| 187 |
+
) -> str:
|
| 188 |
+
PersonaEnvelope.model_validate(persona_data)
|
| 189 |
+
raw = _run_llama_json(
|
| 190 |
+
system_prompt=CHAT_REPLY_PROMPT,
|
| 191 |
+
user_payload={
|
| 192 |
+
"persona": persona_data,
|
| 193 |
+
"message": message.strip() or "...",
|
| 194 |
+
},
|
| 195 |
+
settings=settings,
|
| 196 |
+
max_tokens=180,
|
| 197 |
+
)
|
| 198 |
+
reply = raw.get("reply")
|
| 199 |
+
if not isinstance(reply, str) or not reply.strip():
|
| 200 |
+
raise ValueError("llama.cpp chat response did not include a non-empty reply.")
|
| 201 |
+
return reply.strip()
|
| 202 |
+
|
| 203 |
+
|
| 204 |
+
def _run_llama_json(
|
| 205 |
+
*,
|
| 206 |
+
system_prompt: str,
|
| 207 |
+
user_payload: dict[str, Any],
|
| 208 |
+
settings: RuntimeSettings,
|
| 209 |
+
max_tokens: int,
|
| 210 |
+
) -> dict[str, Any]:
|
| 211 |
+
model = _load_llama_model(settings.text_model_path)
|
| 212 |
+
user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
|
| 213 |
+
raw = _complete_llama(
|
| 214 |
+
model,
|
| 215 |
+
system_prompt=system_prompt,
|
| 216 |
+
user_content=user_content,
|
| 217 |
+
max_tokens=max_tokens,
|
| 218 |
+
)
|
| 219 |
+
return parse_json_object(raw)
|
| 220 |
+
|
| 221 |
+
|
| 222 |
+
def _complete_llama(
|
| 223 |
+
model: Any,
|
| 224 |
+
*,
|
| 225 |
+
system_prompt: str,
|
| 226 |
+
user_content: str,
|
| 227 |
+
max_tokens: int,
|
| 228 |
+
) -> str:
|
| 229 |
+
stop = ["</s>", "<|end|>", "<|eot_id|>", "<|im_end|>"]
|
| 230 |
+
if hasattr(model, "create_chat_completion"):
|
| 231 |
+
response = model.create_chat_completion(
|
| 232 |
+
messages=[
|
| 233 |
+
{"role": "system", "content": system_prompt},
|
| 234 |
+
{"role": "user", "content": user_content},
|
| 235 |
+
],
|
| 236 |
+
temperature=0.75,
|
| 237 |
+
max_tokens=max_tokens,
|
| 238 |
+
stop=stop,
|
| 239 |
+
)
|
| 240 |
+
return _extract_completion_text(response)
|
| 241 |
+
|
| 242 |
+
prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
|
| 243 |
+
response = model(
|
| 244 |
+
prompt,
|
| 245 |
+
temperature=0.75,
|
| 246 |
+
max_tokens=max_tokens,
|
| 247 |
+
stop=stop,
|
| 248 |
+
)
|
| 249 |
+
return _extract_completion_text(response)
|
| 250 |
+
|
| 251 |
+
|
| 252 |
+
def _extract_completion_text(response: Any) -> str:
|
| 253 |
+
if isinstance(response, str):
|
| 254 |
+
return response
|
| 255 |
+
if not isinstance(response, dict):
|
| 256 |
+
raise ValueError("llama.cpp returned an unsupported response type.")
|
| 257 |
+
|
| 258 |
+
choices = response.get("choices")
|
| 259 |
+
if not isinstance(choices, list) or not choices:
|
| 260 |
+
raise ValueError("llama.cpp response did not include choices.")
|
| 261 |
+
|
| 262 |
+
first = choices[0]
|
| 263 |
+
if not isinstance(first, dict):
|
| 264 |
+
raise ValueError("llama.cpp response choice was not an object.")
|
| 265 |
+
|
| 266 |
+
message = first.get("message")
|
| 267 |
+
if isinstance(message, dict) and isinstance(message.get("content"), str):
|
| 268 |
+
return message["content"]
|
| 269 |
+
if isinstance(first.get("text"), str):
|
| 270 |
+
return first["text"]
|
| 271 |
+
raise ValueError("llama.cpp response did not include text content.")
|
| 272 |
+
|
| 273 |
+
|
| 274 |
+
def _load_llama_model(text_model_path: str) -> Any:
|
| 275 |
+
global _LLAMA_MODEL, _LLAMA_MODEL_PATH
|
| 276 |
+
|
| 277 |
+
clean_path = text_model_path.strip()
|
| 278 |
+
if not clean_path:
|
| 279 |
+
raise ValueError("TEXT_MODEL_PATH is not configured.")
|
| 280 |
+
if not Path(clean_path).exists():
|
| 281 |
+
raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
|
| 282 |
+
|
| 283 |
+
if _LLAMA_MODEL is not None and _LLAMA_MODEL_PATH == clean_path:
|
| 284 |
+
return _LLAMA_MODEL
|
| 285 |
+
|
| 286 |
+
from llama_cpp import Llama
|
| 287 |
+
|
| 288 |
+
_LLAMA_MODEL = Llama(
|
| 289 |
+
model_path=clean_path,
|
| 290 |
+
n_ctx=2048,
|
| 291 |
+
verbose=False,
|
| 292 |
+
)
|
| 293 |
+
_LLAMA_MODEL_PATH = clean_path
|
| 294 |
+
return _LLAMA_MODEL
|
| 295 |
+
|
| 296 |
+
|
| 297 |
+
def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
|
| 298 |
+
return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS
|
| 299 |
+
|
| 300 |
+
|
| 301 |
+
def _add_text_fallback(marker: str) -> None:
|
| 302 |
+
if marker not in _TEXT_FALLBACKS:
|
| 303 |
+
_TEXT_FALLBACKS.append(marker)
|
| 304 |
+
|
| 305 |
+
|
| 306 |
+
def _log_text_fallback(stage: str, exc: Exception) -> None:
|
| 307 |
+
print(
|
| 308 |
+
f"[Objectverse Diary] Text runtime fell back to mock during {stage}: {type(exc).__name__}",
|
| 309 |
+
flush=True,
|
| 310 |
+
)
|
| 311 |
+
|
| 312 |
+
|
| 313 |
def _character_name(object_name: str, mode: str) -> str:
|
| 314 |
compact = "".join(part.capitalize() for part in object_name.split()[:2])
|
| 315 |
suffix = {
|
src/models/vision_runner.py
CHANGED
|
@@ -1,10 +1,14 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
|
|
|
| 5 |
from pathlib import Path
|
|
|
|
| 6 |
|
|
|
|
| 7 |
from src.models.schema import ObjectInfo, ObjectUnderstanding
|
|
|
|
| 8 |
|
| 9 |
|
| 10 |
KNOWN_OBJECTS = {
|
|
@@ -19,9 +23,55 @@ KNOWN_OBJECTS = {
|
|
| 19 |
"bag": "bag",
|
| 20 |
}
|
| 21 |
|
| 22 |
|
| 23 |
def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
|
| 24 |
-
"""Return
|
| 25 |
clean_description = description.strip()
|
| 26 |
object_name = _infer_object_name(clean_description, image_path)
|
| 27 |
features = _infer_features(clean_description, image_path)
|
|
@@ -36,6 +86,86 @@ def understand_object(image_path: str | None, description: str) -> ObjectUnderst
|
|
| 36 |
)
|
| 37 |
|
| 38 |
|
| 39 |
def _infer_object_name(description: str, image_path: str | None) -> str:
|
| 40 |
lowered = description.lower()
|
| 41 |
for keyword, name in KNOWN_OBJECTS.items():
|
|
|
|
| 1 |
+
"""Object understanding runtime for mock and MiniCPM-V backends."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
+
from dataclasses import dataclass
|
| 6 |
from pathlib import Path
|
| 7 |
+
from typing import Any
|
| 8 |
|
| 9 |
+
from src.config import RuntimeSettings, get_runtime_settings
|
| 10 |
from src.models.schema import ObjectInfo, ObjectUnderstanding
|
| 11 |
+
from src.utils.json_repair import parse_json_object
|
| 12 |
|
| 13 |
|
| 14 |
KNOWN_OBJECTS = {
|
|
|
|
| 23 |
"bag": "bag",
|
| 24 |
}
|
| 25 |
|
| 26 |
+
MINICPM_DEFAULT_MODEL_ID = "openbmb/MiniCPM-V-2_6"
|
| 27 |
+
MINICPM_BACKENDS = {"minicpm-v", "minicpm_v", "minicpmv"}
|
| 28 |
+
|
| 29 |
+
_MINICPM_MODEL: Any | None = None
|
| 30 |
+
_MINICPM_TOKENIZER: Any | None = None
|
| 31 |
+
_MINICPM_MODEL_ID: str | None = None
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
@dataclass(frozen=True)
|
| 35 |
+
class VisionRunResult:
|
| 36 |
+
object_understanding: ObjectUnderstanding
|
| 37 |
+
fallbacks: list[str]
|
| 38 |
+
|
| 39 |
|
| 40 |
def understand_object(image_path: str | None, description: str) -> ObjectUnderstanding:
|
| 41 |
+
"""Return object understanding without exposing runtime metadata."""
|
| 42 |
+
return understand_object_with_metadata(image_path, description).object_understanding
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
def understand_object_with_metadata(
|
| 46 |
+
image_path: str | None,
|
| 47 |
+
description: str,
|
| 48 |
+
*,
|
| 49 |
+
settings: RuntimeSettings | None = None,
|
| 50 |
+
) -> VisionRunResult:
|
| 51 |
+
current = settings or get_runtime_settings()
|
| 52 |
+
backend = current.vision_backend.strip().lower()
|
| 53 |
+
|
| 54 |
+
if backend == "mock":
|
| 55 |
+
return VisionRunResult(_understand_object_mock(image_path, description), [])
|
| 56 |
+
|
| 57 |
+
if backend in MINICPM_BACKENDS:
|
| 58 |
+
try:
|
| 59 |
+
return VisionRunResult(_understand_object_minicpm(image_path, description, current), [])
|
| 60 |
+
except Exception as exc:
|
| 61 |
+
_log_vision_fallback("minicpm-v", exc)
|
| 62 |
+
return VisionRunResult(
|
| 63 |
+
_understand_object_mock(image_path, description),
|
| 64 |
+
["vision-fallback-to-mock"],
|
| 65 |
+
)
|
| 66 |
+
|
| 67 |
+
return VisionRunResult(
|
| 68 |
+
_understand_object_mock(image_path, description),
|
| 69 |
+
[f"unknown-vision-backend-{backend}-fallback-to-mock"],
|
| 70 |
+
)
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
def _understand_object_mock(image_path: str | None, description: str) -> ObjectUnderstanding:
|
| 74 |
+
"""Return deterministic mock object understanding for fallback-safe demos."""
|
| 75 |
clean_description = description.strip()
|
| 76 |
object_name = _infer_object_name(clean_description, image_path)
|
| 77 |
features = _infer_features(clean_description, image_path)
|
|
|
|
| 86 |
)
|
| 87 |
|
| 88 |
|
| 89 |
+
def _understand_object_minicpm(
|
| 90 |
+
image_path: str | None,
|
| 91 |
+
description: str,
|
| 92 |
+
settings: RuntimeSettings,
|
| 93 |
+
) -> ObjectUnderstanding:
|
| 94 |
+
if not image_path:
|
| 95 |
+
raise ValueError("MiniCPM-V requires an uploaded image.")
|
| 96 |
+
|
| 97 |
+
model_id = settings.vision_model_id or MINICPM_DEFAULT_MODEL_ID
|
| 98 |
+
model, tokenizer = _load_minicpm_components(model_id)
|
| 99 |
+
image = _load_rgb_image(image_path)
|
| 100 |
+
prompt = _object_understanding_prompt(description)
|
| 101 |
+
messages = [{"role": "user", "content": [image, prompt]}]
|
| 102 |
+
raw = model.chat(image=None, msgs=messages, tokenizer=tokenizer)
|
| 103 |
+
if isinstance(raw, tuple):
|
| 104 |
+
raw = raw[0]
|
| 105 |
+
|
| 106 |
+
payload = parse_json_object(str(raw))
|
| 107 |
+
return ObjectUnderstanding.model_validate(payload)
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
def _load_minicpm_components(model_id: str) -> tuple[Any, Any]:
|
| 111 |
+
global _MINICPM_MODEL, _MINICPM_TOKENIZER, _MINICPM_MODEL_ID
|
| 112 |
+
|
| 113 |
+
if _MINICPM_MODEL is not None and _MINICPM_TOKENIZER is not None and _MINICPM_MODEL_ID == model_id:
|
| 114 |
+
return _MINICPM_MODEL, _MINICPM_TOKENIZER
|
| 115 |
+
|
| 116 |
+
import torch
|
| 117 |
+
from transformers import AutoModel, AutoTokenizer
|
| 118 |
+
|
| 119 |
+
model_kwargs: dict[str, Any] = {
|
| 120 |
+
"trust_remote_code": True,
|
| 121 |
+
"torch_dtype": torch.bfloat16,
|
| 122 |
+
}
|
| 123 |
+
try:
|
| 124 |
+
model_kwargs["attn_implementation"] = "sdpa"
|
| 125 |
+
model = AutoModel.from_pretrained(model_id, **model_kwargs)
|
| 126 |
+
except TypeError:
|
| 127 |
+
model_kwargs.pop("attn_implementation", None)
|
| 128 |
+
model = AutoModel.from_pretrained(model_id, **model_kwargs)
|
| 129 |
+
|
| 130 |
+
if torch.cuda.is_available():
|
| 131 |
+
model = model.eval().cuda()
|
| 132 |
+
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
|
| 133 |
+
model = model.eval().to(device="mps", dtype=torch.float16)
|
| 134 |
+
else:
|
| 135 |
+
model = model.eval()
|
| 136 |
+
|
| 137 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
|
| 138 |
+
_MINICPM_MODEL = model
|
| 139 |
+
_MINICPM_TOKENIZER = tokenizer
|
| 140 |
+
_MINICPM_MODEL_ID = model_id
|
| 141 |
+
return model, tokenizer
|
| 142 |
+
|
| 143 |
+
|
| 144 |
+
def _load_rgb_image(image_path: str) -> Any:
|
| 145 |
+
from PIL import Image
|
| 146 |
+
|
| 147 |
+
return Image.open(image_path).convert("RGB")
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
def _object_understanding_prompt(description: str) -> str:
|
| 151 |
+
context = description.strip() or "No user description was provided."
|
| 152 |
+
return (
|
| 153 |
+
"You are the vision module for Objectverse Diary. Inspect the uploaded everyday object photo. "
|
| 154 |
+
"Return only valid JSON with exactly this shape: "
|
| 155 |
+
'{"object":{"name":"short object name","visible_features":["feature 1","feature 2","feature 3"],'
|
| 156 |
+
'"likely_context":"where this object probably is","confidence":0.0}}. '
|
| 157 |
+
"Use 3 to 5 concrete visible_features. confidence must be a number from 0 to 1. "
|
| 158 |
+
f"Optional user context: {context}"
|
| 159 |
+
)
|
| 160 |
+
|
| 161 |
+
|
| 162 |
+
def _log_vision_fallback(backend: str, exc: Exception) -> None:
|
| 163 |
+
print(
|
| 164 |
+
f"[Objectverse Diary] Vision backend '{backend}' fell back to mock: {type(exc).__name__}",
|
| 165 |
+
flush=True,
|
| 166 |
+
)
|
| 167 |
+
|
| 168 |
+
|
| 169 |
def _infer_object_name(description: str, image_path: str | None) -> str:
|
| 170 |
lowered = description.lower()
|
| 171 |
for keyword, name in KNOWN_OBJECTS.items():
|
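Review note: `understand_object_with_metadata` dispatches on a normalized backend name and always returns a usable result together with a list of fallback markers. A minimal sketch of that dispatch follows, assuming hypothetical `real` and `mock` callables in place of the MiniCPM-V and mock implementations; the alias set and marker strings are copied from the diff, while `dispatch_vision` is an illustrative name.

```python
from typing import Callable

MINICPM_ALIASES = {"minicpm-v", "minicpm_v", "minicpmv"}


def dispatch_vision(
    backend: str,
    *,
    real: Callable[[], str],
    mock: Callable[[], str],
) -> tuple[str, list[str]]:
    """Return (result, fallback markers) for the requested backend."""
    name = backend.strip().lower()
    if name == "mock":
        return mock(), []
    if name in MINICPM_ALIASES:
        try:
            return real(), []
        except Exception:
            # Any model failure degrades to mock and is flagged in the trace.
            return mock(), ["vision-fallback-to-mock"]
    # Unknown backends also degrade to mock, with a marker naming the backend.
    return mock(), [f"unknown-vision-backend-{name}-fallback-to-mock"]
```

Because failures surface only as markers, a validation script has to inspect the fallback list (as `_failure_reason` does with `no_vision_fallback`) rather than rely on an exception to detect that MiniCPM-V did not actually run.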
src/pipeline.py
CHANGED
|
@@ -5,10 +5,15 @@ from __future__ import annotations
|
|
| 5 |
from datetime import datetime
|
| 6 |
from pathlib import Path
|
| 7 |
|
| 8 |
-
from src.config import TRACE_DIR
|
| 9 |
-
from src.models.llama_cpp_runner import
|
| 10 |
from src.models.schema import GenerationResult
|
| 11 |
-
from src.models.vision_runner import
|
| 12 |
from src.traces.logger import build_trace, save_trace
|
| 13 |
|
| 14 |
|
|
@@ -22,9 +27,13 @@ def generate_object_diary(
|
|
| 22 |
trace_id: str | None = None,
|
| 23 |
created_at: datetime | None = None,
|
| 24 |
) -> GenerationResult:
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
| 26 |
persona = generate_persona(object_understanding, mode)
|
| 27 |
diary = generate_diary(persona, mode)
|
|
|
|
| 28 |
trace = build_trace(
|
| 29 |
image_path=image_path,
|
| 30 |
description=description,
|
|
@@ -34,6 +43,13 @@ def generate_object_diary(
|
|
| 34 |
diary=diary,
|
| 35 |
trace_id=trace_id,
|
| 36 |
created_at=created_at,
|
| 37 |
)
|
| 38 |
trace_path = save_trace(trace, trace_dir) if save else ""
|
| 39 |
|
|
@@ -48,3 +64,25 @@ def generate_object_diary(
|
|
| 48 |
|
| 49 |
def format_diary_markdown(title: str, english: str, chinese: str) -> str:
|
| 50 |
return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"
|
| 5 |
from datetime import datetime
|
| 6 |
from pathlib import Path
|
| 7 |
|
| 8 |
+
from src.config import TRACE_DIR, get_runtime_settings, runtime_status
|
| 9 |
+
from src.models.llama_cpp_runner import (
|
| 10 |
+
generate_diary,
|
| 11 |
+
generate_persona,
|
| 12 |
+
get_text_runtime_fallbacks,
|
| 13 |
+
reset_text_runtime_fallbacks,
|
| 14 |
+
)
|
| 15 |
from src.models.schema import GenerationResult
|
| 16 |
+
from src.models.vision_runner import VisionRunResult, understand_object_with_metadata
|
| 17 |
from src.traces.logger import build_trace, save_trace
|
| 18 |
|
| 19 |
|
|
|
|
| 27 |
trace_id: str | None = None,
|
| 28 |
created_at: datetime | None = None,
|
| 29 |
) -> GenerationResult:
|
| 30 |
+
settings = get_runtime_settings()
|
| 31 |
+
vision_result = understand_object_with_metadata(image_path, description, settings=settings)
|
| 32 |
+
object_understanding = vision_result.object_understanding
|
| 33 |
+
reset_text_runtime_fallbacks()
|
| 34 |
persona = generate_persona(object_understanding, mode)
|
| 35 |
diary = generate_diary(persona, mode)
|
| 36 |
+
text_fallbacks = get_text_runtime_fallbacks()
|
| 37 |
trace = build_trace(
|
| 38 |
image_path=image_path,
|
| 39 |
description=description,
|
|
|
|
| 43 |
diary=diary,
|
| 44 |
trace_id=trace_id,
|
| 45 |
created_at=created_at,
|
| 46 |
+
model_runtime=runtime_status(settings),
|
| 47 |
+
fallbacks=_runtime_fallbacks(
|
| 48 |
+
settings.vision_backend,
|
| 49 |
+
settings.text_backend,
|
| 50 |
+
vision_result,
|
| 51 |
+
text_fallbacks,
|
| 52 |
+
),
|
| 53 |
)
|
| 54 |
trace_path = save_trace(trace, trace_dir) if save else ""
|
| 55 |
|
|
|
|
| 64 |
|
| 65 |
def format_diary_markdown(title: str, english: str, chinese: str) -> str:
|
| 66 |
return f"## {title}\n\n{english}\n\n---\n\n**中文辅助**\n\n{chinese}"
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
def _runtime_fallbacks(
|
| 70 |
+
vision_backend: str,
|
| 71 |
+
text_backend: str,
|
| 72 |
+
vision_result: VisionRunResult,
|
| 73 |
+
text_fallbacks: list[str] | None = None,
|
| 74 |
+
) -> list[str]:
|
| 75 |
+
clean_vision_backend = vision_backend.strip().lower()
|
| 76 |
+
clean_text_backend = text_backend.strip().lower()
|
| 77 |
+
if clean_vision_backend == "mock" and clean_text_backend == "mock":
|
| 78 |
+
return ["mock-runtime"]
|
| 79 |
+
|
| 80 |
+
fallbacks = list(vision_result.fallbacks)
|
| 81 |
+
for marker in text_fallbacks or []:
|
| 82 |
+
if marker not in fallbacks:
|
| 83 |
+
fallbacks.append(marker)
|
| 84 |
+
if clean_vision_backend == "mock":
|
| 85 |
+
fallbacks.append("mock-vision-runtime")
|
| 86 |
+
if clean_text_backend == "mock":
|
| 87 |
+
fallbacks.append("mock-text-runtime")
|
| 88 |
+
return fallbacks
|
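To make the merging rules of `_runtime_fallbacks` easier to follow, here is a self-contained sketch of the same logic. `VisionRunResult` is replaced by a minimal stand-in that models only the `fallbacks` attribute, and the backend names (`"transformers"`, `"llama_cpp"`) and marker strings (`"vision-error"`, `"text-json-repair"`) are invented for illustration; the real values are not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VisionRunResult:
    # Stand-in for src.models.vision_runner.VisionRunResult (assumption: only `fallbacks` matters here).
    fallbacks: list[str] = field(default_factory=list)


def runtime_fallbacks(vision_backend, text_backend, vision_result, text_fallbacks=None):
    clean_vision = vision_backend.strip().lower()
    clean_text = text_backend.strip().lower()
    # A fully mocked run collapses to a single marker and ignores everything else.
    if clean_vision == "mock" and clean_text == "mock":
        return ["mock-runtime"]

    # Otherwise start from the vision markers and append unseen text markers in order.
    fallbacks = list(vision_result.fallbacks)
    for marker in text_fallbacks or []:
        if marker not in fallbacks:
            fallbacks.append(marker)
    if clean_vision == "mock":
        fallbacks.append("mock-vision-runtime")
    if clean_text == "mock":
        fallbacks.append("mock-text-runtime")
    return fallbacks


all_mock = runtime_fallbacks(" Mock ", "mock", VisionRunResult(["vision-error"]))
mixed = runtime_fallbacks(
    "transformers", "mock", VisionRunResult(["vision-error"]), ["vision-error", "text-json-repair"]
)
clean_run = runtime_fallbacks("transformers", "llama_cpp", VisionRunResult())

print(all_mock)   # ['mock-runtime']
print(mixed)      # ['vision-error', 'text-json-repair', 'mock-text-runtime']
print(clean_run)  # []
```

An empty list therefore means that both backends were real and neither reported a fallback, which is why `build_trace` has to distinguish an empty list from `None`.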
src/prompts/diary_generation.py
CHANGED
|
@@ -1,6 +1,32 @@
|
|
| 1 |
-
"""Prompt
|
| 2 |
|
| 3 |
DIARY_GENERATION_PROMPT = """
|
| 4 |
-
Write a short secret diary entry
|
| 5 |
-
|
| 6 |
""".strip()
|
|
|
|
| 1 |
+
"""Prompt templates for diary and chat generation."""
|
| 2 |
|
| 3 |
DIARY_GENERATION_PROMPT = """
|
| 4 |
+
Write a short secret diary entry for the object persona. Return only valid JSON
|
| 5 |
+
with exactly this shape:
|
| 6 |
+
|
| 7 |
+
{
|
| 8 |
+
"title": "Secret Diary - Day N",
|
| 9 |
+
"english": "one vivid English-first diary paragraph",
|
| 10 |
+
"chinese": "short Chinese helper translation"
|
| 11 |
+
}
|
| 12 |
+
|
| 13 |
+
Rules:
|
| 14 |
+
- Keep the persona consistent with the supplied persona JSON.
|
| 15 |
+
- Keep the English diary under 120 words.
|
| 16 |
+
- The Chinese text is secondary helper copy, not the primary UI language.
|
| 17 |
+
- Do not include markdown, commentary, or extra keys.
|
| 18 |
+
""".strip()
|
| 19 |
+
|
| 20 |
+
CHAT_REPLY_PROMPT = """
|
| 21 |
+
Reply as the object persona to the user's message. Return only valid JSON with
|
| 22 |
+
exactly this shape:
|
| 23 |
+
|
| 24 |
+
{
|
| 25 |
+
"reply": "one short in-character chat reply"
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
Rules:
|
| 29 |
+
- Stay consistent with the persona JSON.
|
| 30 |
+
- Keep the reply under 70 words.
|
| 31 |
+
- Do not include markdown, commentary, or extra keys.
|
| 32 |
""".strip()
|
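The rules in `DIARY_GENERATION_PROMPT` (three fixed keys, no extras, English text under 120 words) are easy to check mechanically. The following is a minimal sketch of such a check; `diary_problems` is a hypothetical helper that does not appear in this commit, and splitting on whitespace is only one possible way to count words.

```python
import json

REQUIRED_KEYS = {"title", "english", "chinese"}
MAX_ENGLISH_WORDS = 120  # the prompt asks for "under 120 words"


def diary_problems(raw_reply: str) -> list:
    """Return human-readable problems; an empty list means the reply matches the prompt rules."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        return ["reply is not valid JSON"]
    if not isinstance(payload, dict):
        return ["reply is not a JSON object"]

    problems = []
    keys = set(payload)
    if REQUIRED_KEYS - keys:
        problems.append(f"missing keys: {sorted(REQUIRED_KEYS - keys)}")
    if keys - REQUIRED_KEYS:
        problems.append(f"extra keys: {sorted(keys - REQUIRED_KEYS)}")
    english = payload.get("english", "")
    if isinstance(english, str) and len(english.split()) >= MAX_ENGLISH_WORDS:
        problems.append("english diary is not under 120 words")
    return problems


# Hypothetical model replies for illustration.
good = json.dumps({"title": "Secret Diary - Day 1", "english": "I hold the coffee.", "chinese": "..."})
extra = json.dumps({"title": "t", "english": "e", "chinese": "c", "mood": "grumpy"})

print(diary_problems(good))
print(diary_problems(extra))
```

A check like this could run before a reply is accepted, so that a malformed generation is recorded as a fallback rather than rendered.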
src/prompts/persona_generation.py
CHANGED
|
@@ -1,7 +1,27 @@
|
|
| 1 |
-
"""Prompt
|
| 2 |
|
| 3 |
PERSONA_GENERATION_PROMPT = """
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
""".strip()
|
|
|
|
| 1 |
+
"""Prompt templates for persona generation."""
|
| 2 |
|
| 3 |
PERSONA_GENERATION_PROMPT = """
|
| 4 |
+
You are the text runtime for Objectverse Diary, a strange archive of everyday
|
| 5 |
+
objects with secret lives.
|
| 6 |
+
|
| 7 |
+
Create a hidden first-person object persona from the object understanding JSON
|
| 8 |
+
and personality mode. Return only valid JSON with exactly this shape:
|
| 9 |
+
|
| 10 |
+
{
|
| 11 |
+
"persona": {
|
| 12 |
+
"object_name": "short object name",
|
| 13 |
+
"character_name": "archive character name",
|
| 14 |
+
"mood": "short mood phrase",
|
| 15 |
+
"secret_fear": "one vivid fear",
|
| 16 |
+
"core_memory": "one sentence backstory",
|
| 17 |
+
"complaint": "one sentence complaint in the object's voice",
|
| 18 |
+
"tags": ["tag one", "tag two", "tag three"]
|
| 19 |
+
}
|
| 20 |
+
}
|
| 21 |
+
|
| 22 |
+
Rules:
|
| 23 |
+
- Keep the persona consistent with the visible object features.
|
| 24 |
+
- Use English output.
|
| 25 |
+
- Use exactly three tags.
|
| 26 |
+
- Do not include markdown, commentary, or extra keys.
|
| 27 |
""".strip()
|
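The persona template refers to "the object understanding JSON and personality mode" but this hunk does not show how those inputs are attached to it. One plausible approach, shown purely as an assumption, is to append both to the template text. `build_persona_prompt`, the label wording, and the sample mode `"dramatic"` are all hypothetical.

```python
import json


def build_persona_prompt(template: str, object_understanding: dict, mode: str) -> str:
    # Serialise deterministically so the same inputs always produce the same prompt.
    object_json = json.dumps(object_understanding, ensure_ascii=False, sort_keys=True)
    return (
        f"{template.strip()}\n\n"
        f"Personality mode: {mode}\n"
        f"Object understanding JSON:\n{object_json}"
    )


# Shortened placeholder template; the real text is PERSONA_GENERATION_PROMPT from the diff.
template = "Create a hidden first-person object persona. Return only valid JSON."
understanding = {
    "object": {
        "name": "mug",
        "visible_features": ["chipped rim", "blue glaze", "coffee stain"],
        "likely_context": "office desk",
        "confidence": 0.8,
    }
}

prompt = build_persona_prompt(template, understanding, "dramatic")
print(prompt)
```

Sorting keys keeps traces comparable across runs, at the cost of not preserving the vision model's original key order.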
src/traces/logger.py
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
"""Trace builder and saver for
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
|
@@ -7,7 +7,7 @@ from datetime import datetime, timezone
|
|
| 7 |
from pathlib import Path
|
| 8 |
from uuid import uuid4
|
| 9 |
|
| 10 |
-
from src.config import
|
| 11 |
from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
|
| 12 |
from src.traces.anonymizer import anonymize_text
|
| 13 |
|
|
@@ -21,6 +21,8 @@ def build_trace(
|
|
| 21 |
diary: DiaryEntry,
|
| 22 |
trace_id: str | None = None,
|
| 23 |
created_at: datetime | None = None,
|
| 24 |
) -> TraceRecord:
|
| 25 |
return TraceRecord(
|
| 26 |
trace_id=trace_id or uuid4().hex,
|
|
@@ -34,8 +36,8 @@ def build_trace(
|
|
| 34 |
object_understanding=object_understanding,
|
| 35 |
persona=persona,
|
| 36 |
diary=diary,
|
| 37 |
-
model_runtime=
|
| 38 |
-
fallbacks=["mock-runtime"],
|
| 39 |
)
|
| 40 |
|
| 41 |
|
|
|
|
| 1 |
+
"""Trace builder and saver for generation runs."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
|
|
|
| 7 |
from pathlib import Path
|
| 8 |
from uuid import uuid4
|
| 9 |
|
| 10 |
+
from src.config import TRACE_DIR, get_runtime_settings, runtime_status
|
| 11 |
from src.models.schema import DiaryEntry, ObjectUnderstanding, PersonaEnvelope, TraceRecord
|
| 12 |
from src.traces.anonymizer import anonymize_text
|
| 13 |
|
|
|
|
| 21 |
diary: DiaryEntry,
|
| 22 |
trace_id: str | None = None,
|
| 23 |
created_at: datetime | None = None,
|
| 24 |
+
model_runtime: dict[str, str] | None = None,
|
| 25 |
+
fallbacks: list[str] | None = None,
|
| 26 |
) -> TraceRecord:
|
| 27 |
return TraceRecord(
|
| 28 |
trace_id=trace_id or uuid4().hex,
|
|
|
|
| 36 |
object_understanding=object_understanding,
|
| 37 |
persona=persona,
|
| 38 |
diary=diary,
|
| 39 |
+
model_runtime=model_runtime or runtime_status(get_runtime_settings()),
|
| 40 |
+
fallbacks=fallbacks if fallbacks is not None else ["mock-runtime"],
|
| 41 |
)
|
| 42 |
|
| 43 |
|
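Note that `build_trace` resolves its two new defaults differently: `model_runtime` uses `or`, while `fallbacks` uses an explicit `is not None` test. The difference matters because an empty fallback list is a meaningful value (a run with no fallbacks). The helper names below are hypothetical and exist only to contrast the two idioms.

```python
def resolve_with_none_check(fallbacks=None):
    # Mirrors the diff: only a missing argument triggers the default.
    return fallbacks if fallbacks is not None else ["mock-runtime"]


def resolve_with_or(fallbacks=None):
    # Alternative idiom: any falsy value, including [], triggers the default.
    return fallbacks or ["mock-runtime"]


print(resolve_with_none_check([]))    # []
print(resolve_with_or([]))            # ['mock-runtime']
print(resolve_with_none_check(None))  # ['mock-runtime']
```

With the `or` idiom, a clean run on real backends would be mislabelled as `mock-runtime` in the trace, so the explicit `None` check is the safer choice for `fallbacks`.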
src/ui/layout.py
CHANGED
|
@@ -15,6 +15,7 @@ from src.models.schema import GenerationResult
|
|
| 15 |
from src.pipeline import format_diary_markdown, generate_object_diary
|
| 16 |
from src.renderer.share_card import render_share_card
|
| 17 |
from src.ui import copy
|
|
|
|
| 18 |
|
| 19 |
CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
|
| 20 |
|
|
@@ -234,6 +235,7 @@ def _example_handler(index: int):
|
|
| 234 |
return load_example
|
| 235 |
|
| 236 |
|
|
|
|
| 237 |
def generate_object_file(
|
| 238 |
image_path: str | None,
|
| 239 |
description: str,
|
|
|
|
| 15 |
from src.pipeline import format_diary_markdown, generate_object_diary
|
| 16 |
from src.renderer.share_card import render_share_card
|
| 17 |
from src.ui import copy
|
| 18 |
+
from src.utils.zero_gpu import zero_gpu
|
| 19 |
|
| 20 |
CHAT_EMPTY_MESSAGE = "Wake an object first. / 请先唤醒一个物品。"
|
| 21 |
|
|
|
|
| 235 |
return load_example
|
| 236 |
|
| 237 |
|
| 238 |
+
@zero_gpu(duration=180)
|
| 239 |
def generate_object_file(
|
| 240 |
image_path: str | None,
|
| 241 |
description: str,
|
src/utils/json_repair.py
CHANGED
|
@@ -7,7 +7,24 @@ from typing import Any
|
|
| 7 |
|
| 8 |
|
| 9 |
def parse_json_object(raw: str) -> dict[str, Any]:
|
| 10 |
-
value = json.loads(raw)
|
| 11 |
if not isinstance(value, dict):
|
| 12 |
raise ValueError("Expected a JSON object.")
|
| 13 |
return value
|
| 7 |
|
| 8 |
|
| 9 |
def parse_json_object(raw: str) -> dict[str, Any]:
|
| 10 |
+
value = json.loads(_extract_json_object(raw))
|
| 11 |
if not isinstance(value, dict):
|
| 12 |
raise ValueError("Expected a JSON object.")
|
| 13 |
return value
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
def _extract_json_object(raw: str) -> str:
|
| 17 |
+
clean = raw.strip()
|
| 18 |
+
if clean.startswith("```"):
|
| 19 |
+
clean = clean.strip("`").strip()
|
| 20 |
+
if clean.lower().startswith("json"):
|
| 21 |
+
clean = clean[4:].strip()
|
| 22 |
+
|
| 23 |
+
if clean.startswith("{") and clean.endswith("}"):
|
| 24 |
+
return clean
|
| 25 |
+
|
| 26 |
+
start = clean.find("{")
|
| 27 |
+
end = clean.rfind("}")
|
| 28 |
+
if start == -1 or end == -1 or end <= start:
|
| 29 |
+
raise ValueError("No JSON object found.")
|
| 30 |
+
return clean[start : end + 1]
|
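The new `_extract_json_object` step lets `parse_json_object` tolerate two common model habits: wrapping the JSON in a Markdown code fence and surrounding it with chatty prose. The sketch below reproduces the logic from the diff in a self-contained form. The fence string is built at runtime (`FENCE`) only so that this example does not contain a literal fence, and the sample replies are invented.

```python
import json
from typing import Any

FENCE = "`" * 3  # three backticks, built at runtime for this example only


def extract_json_object(raw: str) -> str:
    clean = raw.strip()
    if clean.startswith(FENCE):
        # Drop the surrounding backticks, then an optional "json" language tag.
        clean = clean.strip("`").strip()
        if clean.lower().startswith("json"):
            clean = clean[4:].strip()

    if clean.startswith("{") and clean.endswith("}"):
        return clean

    # Fall back to the span between the first "{" and the last "}".
    start = clean.find("{")
    end = clean.rfind("}")
    if start == -1 or end == -1 or end <= start:
        raise ValueError("No JSON object found.")
    return clean[start : end + 1]


def parse_json_object(raw: str) -> dict[str, Any]:
    value = json.loads(extract_json_object(raw))
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object.")
    return value


fenced = f'{FENCE}json\n{{"reply": "I guard the desk."}}\n{FENCE}'
chatty = 'Sure! Here is the JSON: {"reply": "I guard the desk."} Hope that helps.'

print(parse_json_object(fenced))
print(parse_json_object(chatty))
```

The first-brace-to-last-brace fallback is deliberately simple: a reply that contains two separate objects, or stray braces in the trailing prose, yields a slice that `json.loads` rejects with `json.JSONDecodeError`, which is a subclass of `ValueError`.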
src/utils/zero_gpu.py
ADDED
|
@@ -0,0 +1,23 @@
|
|
| 1 |
+
"""Optional Hugging Face ZeroGPU decorator helpers."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
from collections.abc import Callable
|
| 6 |
+
from typing import TypeVar
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
F = TypeVar("F", bound=Callable)
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def zero_gpu(duration: int = 180) -> Callable[[F], F]:
|
| 13 |
+
"""Return a ZeroGPU decorator when available, otherwise a no-op decorator."""
|
| 14 |
+
try:
|
| 15 |
+
import spaces # type: ignore[import-not-found]
|
| 16 |
+
except Exception:
|
| 17 |
+
return _identity_decorator
|
| 18 |
+
|
| 19 |
+
return spaces.GPU(duration=duration)
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
def _identity_decorator(func: F) -> F:
|
| 23 |
+
return func
|