File size: 10,676 Bytes
bc02199
 
 
 
 
 
 
 
 
 
e20e3d9
bc02199
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e20e3d9
 
 
 
 
bc02199
 
 
e20e3d9
1e2c036
e20e3d9
bc02199
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1e2c036
e20e3d9
bc02199
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1e2c036
 
bc02199
 
 
 
 
dd6cefc
e20e3d9
bc02199
 
e20e3d9
 
bc02199
 
 
 
 
 
 
 
 
dd6cefc
bc02199
 
 
 
dd6cefc
bc02199
 
 
 
 
 
 
e20e3d9
 
bc02199
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e20e3d9
 
bc02199
 
e20e3d9
 
 
 
 
bc02199
 
 
 
 
 
 
 
 
e20e3d9
bc02199
e20e3d9
bc02199
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
# Objectverse Diary β€” Detailed Development Plan

## Purpose

This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.

The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.

## Current Baseline

As of 2026-06-06, the project has:

- initialized project structure
- root README and AGENTS instructions
- `.codex/skills/` project guidance
- initial Gradio mock MVP
- six stable example objects
- mock object understanding JSON
- mock persona and diary generation
- object chat with mock persona consistency
- share card HTML preview
- anonymized trace JSON saving under `data/traces/`
- six stable public mock traces under `data/traces/samples/`
- deterministic SFT preview generator and dataset plan
- public trace JSONL exporter
- failure notes template
- `scripts/generate_sample_traces.py`
- `scripts/generate_dataset.py`
- `scripts/export_traces.py`
- stdlib unittest smoke tests for the mock MVP
- runtime configuration boundary documented in `docs/RUNTIME.md`
- initial-stage acceptance script at `scripts/check_initial_stage.py`
- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`

Not yet done:

- GitHub repo sync / public submission confirmation
- hosted Space MiniCPM-V validation with real public images
- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
- real curated dataset
- LoRA fine-tuning
- model card completion
- Field Notes article
- demo video
- final submission package

## Phase 1 β€” Initial Mock MVP

Goal: validate the product loop before model integration.

Scope:

- Build `app.py` entrypoint.
- Build Gradio Blocks UI.
- Support image upload and optional text description.
- Add personality mode selection.
- Add six stable example objects.
- Produce deterministic mock object JSON.
- Produce deterministic mock persona JSON.
- Produce English-first diary with Chinese helper translation.
- Support chat replies using the generated persona.
- Render a share card preview.
- Save anonymized trace JSON.

Exit criteria:

- `python app.py` starts a Gradio app.
- User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`.
- Trace JSON is saved locally.
- No commercial model APIs are used.

Verification:

- Import smoke test for `app`.
- Direct function smoke test for generation flow.
- `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization.
- Sample trace generation script writes six stable trace files.
- Dataset preview script writes deterministic mock SFT preview JSONL.
- Trace export script writes validated public trace JSONL.
- `scripts/check_initial_stage.py` validates required initial-stage artifacts.
- Manual Gradio preview.

## Phase 2 β€” UI Polish And Example Gallery

Goal: make the app feel like an object archive instead of a default Gradio demo.

Scope:

- Refine `src/ui/styles.css`.
- Reference the design images under `UI 参考/` for visual direction.
- Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`.
- Keep six stable example objects visible in the UI.
- Add clearer empty states and error states.
- Improve mobile layout.
- Keep UI English-first and Chinese-second.

Exit criteria:

- 1366px desktop layout is usable.
- Mobile-width layout is usable.
- Example gallery can reproduce stable outputs.
- Share card is readable and screenshot-friendly.

Verification:

- Manual browser preview.
- Screenshot review at desktop and mobile widths.
- Example generation for at least six objects.

## Phase 3 β€” Vision Understanding

Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.

Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.

Scope:

- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
- Keep manual description fallback.
- Validate object understanding JSON with schemas.
- Add JSON repair or retry behavior.
- Cache stable examples for demo reliability.

Exit criteria:

- Uploaded object photos produce structured object JSON.
- Cups, keyboards, and shoes are recognized with useful visible features.
- Fallback path works when VLM fails.

Verification:

- Run local sample image checks.
- Confirm schema validation.
- Confirm fallback trace markers.
- Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation.
- Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe.

## Phase 4 β€” Text Runtime With llama.cpp

Goal: make persona, diary, and chat generation use a small local text model runtime.

Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.

Scope:

- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
- Implement persona generation.
- Implement diary generation.
- Implement chat continuation.
- Keep deterministic mock fallback for demos.

Exit criteria:

- Text generation can run through llama.cpp or documented local fallback.
- README documents runtime path and published GGUF selection.
- Trace records include runtime metadata.

Verification:

- Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
- JSON schema validation.
- Compare at least three object generations for persona consistency.

## Phase 5 β€” Dataset And Fine-Tuning Preparation

Goal: prepare Well-Tuned badge evidence.

Status: mock SFT preview complete; real candidate generation waits for verified model paths.

Scope:

- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
- Generate 200-500 object-persona candidate samples after real model path is available.
- Manually curate at least 50 high-quality examples.
- Define SFT schema.
- Prepare dataset preview.
- Draft dataset privacy notes.

Exit criteria:

- Mock SFT preview exists and parses as JSONL.
- Training dataset is structured and inspectable.
- Public examples contain no private data.
- Dataset card draft exists.

Verification:

- Validate JSONL format.
- Spot-check curated samples.
- Confirm no obvious sensitive data.

## Phase 6 β€” LoRA Fine-Tuning And Model Card

Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.

Scope:

- Run LoRA training with Modal or local resources.
- Export adapter or merged model.
- Convert to GGUF if needed.
- Publish HF model repo.
- Complete `docs/MODEL_CARD.md`.

Exit criteria:

- Fine-tuned model repo exists.
- Model parameter count is documented.
- Runtime instructions are documented.

Verification:

- Run inference on sample prompts.
- Confirm HF model links.
- Confirm no private credit codes or tokens are present.

## Phase 7 β€” Public Traces And Reproducibility

Goal: satisfy Sharing is Caring expectations.

Scope:

- Produce at least six public traces.
- Keep `data/traces/samples/` in sync with the six example objects.
- Export public traces to JSONL for dataset-style sharing.
- Add prompt templates.
- Add dataset preview.
- Document failures and fallbacks.
- Ensure trace anonymization.

Exit criteria:

- Public trace files are readable JSON.
- Trace docs explain how outputs were produced.
- Example gallery aligns with public traces.

Verification:

- Validate trace JSON.
- Inspect anonymization.
- Confirm README links.

## Phase 8 β€” Hugging Face Space Deployment

Goal: deploy the app in the required Gradio format.

Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.

Scope:

- Create Hugging Face Space. Completed.
- Add Space README YAML header. Completed.
- Confirm `app_file: app.py`. Completed.
- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
- Check runtime resource constraints. Pending L4 validation.

Exit criteria:

- Space opens publicly or under the official hackathon organization.
- App can generate at least stable demo examples.
- README includes deployment and model notes.

Verification:

- Launch on HF Space. Completed for mock-safe runtime.
- Run demo flow in hosted environment.
- Run Space VLM validation for mug, keyboard, and shoe.
- Check logs for missing secrets or path errors.

## Phase 9 β€” Field Notes And Demo Video

Goal: complete narrative submission assets.

Scope:

- Write Field Notes article.
- Record demo video under 2 minutes.
- Prepare social post.
- Add badge evidence to README.

Exit criteria:

- Field Notes URL exists.
- Demo video URL exists.
- Social post URL exists.
- Submission package has all required links.

Verification:

- Watch final video.
- Check all URLs.
- Confirm README and submission guide are aligned.

## Phase 10 β€” Final Submission Audit

Goal: reduce avoidable submission risk.

Checklist:

- [ ] Space under official organization.
- [ ] Demo video ready.
- [ ] Social post ready.
- [ ] README complete.
- [ ] Model parameter count documented.
- [ ] No commercial cloud AI API.
- [ ] Fine-tuned model linked.
- [ ] Dataset linked.
- [ ] Traces linked.
- [ ] Field Notes linked.
- [ ] UI English-first and Chinese-second.
- [ ] Submit before June 15, 2026.

## Risk Register

| Risk | Impact | Mitigation |
| --- | --- | --- |
| VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback |
| llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work |
| Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback |
| HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback |
| Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images |

## Working Rule

Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.