Spaces:
Running on Zero
Running on Zero
File size: 10,676 Bytes
bc02199 e20e3d9 bc02199 e20e3d9 bc02199 e20e3d9 1e2c036 e20e3d9 bc02199 1e2c036 e20e3d9 bc02199 1e2c036 bc02199 dd6cefc e20e3d9 bc02199 e20e3d9 bc02199 dd6cefc bc02199 dd6cefc bc02199 e20e3d9 bc02199 e20e3d9 bc02199 e20e3d9 bc02199 e20e3d9 bc02199 e20e3d9 bc02199 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 | # Objectverse Diary β Detailed Development Plan
## Purpose
This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.
The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.
## Current Baseline
As of 2026-06-06, the project has:
- initialized project structure
- root README and AGENTS instructions
- `.codex/skills/` project guidance
- initial Gradio mock MVP
- six stable example objects
- mock object understanding JSON
- mock persona and diary generation
- object chat with mock persona consistency
- share card HTML preview
- anonymized trace JSON saving under `data/traces/`
- six stable public mock traces under `data/traces/samples/`
- deterministic SFT preview generator and dataset plan
- public trace JSONL exporter
- failure notes template
- `scripts/generate_sample_traces.py`
- `scripts/generate_dataset.py`
- `scripts/export_traces.py`
- stdlib unittest smoke tests for the mock MVP
- runtime configuration boundary documented in `docs/RUNTIME.md`
- initial-stage acceptance script at `scripts/check_initial_stage.py`
- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`
Not yet done:
- GitHub repo sync / public submission confirmation
- hosted Space MiniCPM-V validation with real public images
- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
- real curated dataset
- LoRA fine-tuning
- model card completion
- Field Notes article
- demo video
- final submission package
## Phase 1 β Initial Mock MVP
Goal: validate the product loop before model integration.
Scope:
- Build `app.py` entrypoint.
- Build Gradio Blocks UI.
- Support image upload and optional text description.
- Add personality mode selection.
- Add six stable example objects.
- Produce deterministic mock object JSON.
- Produce deterministic mock persona JSON.
- Produce English-first diary with Chinese helper translation.
- Support chat replies using the generated persona.
- Render a share card preview.
- Save anonymized trace JSON.
Exit criteria:
- `python app.py` starts a Gradio app.
- User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`.
- Trace JSON is saved locally.
- No commercial model APIs are used.
Verification:
- Import smoke test for `app`.
- Direct function smoke test for generation flow.
- `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization.
- Sample trace generation script writes six stable trace files.
- Dataset preview script writes deterministic mock SFT preview JSONL.
- Trace export script writes validated public trace JSONL.
- `scripts/check_initial_stage.py` validates required initial-stage artifacts.
- Manual Gradio preview.
## Phase 2 β UI Polish And Example Gallery
Goal: make the app feel like an object archive instead of a default Gradio demo.
Scope:
- Refine `src/ui/styles.css`.
- Reference the design images under `UI εθ/` for visual direction.
- Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`.
- Keep six stable example objects visible in the UI.
- Add clearer empty states and error states.
- Improve mobile layout.
- Keep UI English-first and Chinese-second.
Exit criteria:
- 1366px desktop layout is usable.
- Mobile-width layout is usable.
- Example gallery can reproduce stable outputs.
- Share card is readable and screenshot-friendly.
Verification:
- Manual browser preview.
- Screenshot review at desktop and mobile widths.
- Example generation for at least six objects.
## Phase 3 β Vision Understanding
Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.
Scope:
- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
- Keep manual description fallback.
- Validate object understanding JSON with schemas.
- Add JSON repair or retry behavior.
- Cache stable examples for demo reliability.
Exit criteria:
- Uploaded object photos produce structured object JSON.
- Cups, keyboards, and shoes are recognized with useful visible features.
- Fallback path works when VLM fails.
Verification:
- Run local sample image checks.
- Confirm schema validation.
- Confirm fallback trace markers.
- Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation.
- Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe.
## Phase 4 β Text Runtime With llama.cpp
Goal: make persona, diary, and chat generation use a small local text model runtime.
Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.
Scope:
- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
- Implement persona generation.
- Implement diary generation.
- Implement chat continuation.
- Keep deterministic mock fallback for demos.
Exit criteria:
- Text generation can run through llama.cpp or documented local fallback.
- README documents runtime path and published GGUF selection.
- Trace records include runtime metadata.
Verification:
- Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
- JSON schema validation.
- Compare at least three object generations for persona consistency.
## Phase 5 β Dataset And Fine-Tuning Preparation
Goal: prepare Well-Tuned badge evidence.
Status: mock SFT preview complete; real candidate generation waits for verified model paths.
Scope:
- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
- Generate 200-500 object-persona candidate samples after real model path is available.
- Manually curate at least 50 high-quality examples.
- Define SFT schema.
- Prepare dataset preview.
- Draft dataset privacy notes.
Exit criteria:
- Mock SFT preview exists and parses as JSONL.
- Training dataset is structured and inspectable.
- Public examples contain no private data.
- Dataset card draft exists.
Verification:
- Validate JSONL format.
- Spot-check curated samples.
- Confirm no obvious sensitive data.
## Phase 6 β LoRA Fine-Tuning And Model Card
Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.
Scope:
- Run LoRA training with Modal or local resources.
- Export adapter or merged model.
- Convert to GGUF if needed.
- Publish HF model repo.
- Complete `docs/MODEL_CARD.md`.
Exit criteria:
- Fine-tuned model repo exists.
- Model parameter count is documented.
- Runtime instructions are documented.
Verification:
- Run inference on sample prompts.
- Confirm HF model links.
- Confirm no private credit codes or tokens are present.
## Phase 7 β Public Traces And Reproducibility
Goal: satisfy Sharing is Caring expectations.
Scope:
- Produce at least six public traces.
- Keep `data/traces/samples/` in sync with the six example objects.
- Export public traces to JSONL for dataset-style sharing.
- Add prompt templates.
- Add dataset preview.
- Document failures and fallbacks.
- Ensure trace anonymization.
Exit criteria:
- Public trace files are readable JSON.
- Trace docs explain how outputs were produced.
- Example gallery aligns with public traces.
Verification:
- Validate trace JSON.
- Inspect anonymization.
- Confirm README links.
## Phase 8 β Hugging Face Space Deployment
Goal: deploy the app in the required Gradio format.
Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
Scope:
- Create Hugging Face Space. Completed.
- Add Space README YAML header. Completed.
- Confirm `app_file: app.py`. Completed.
- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
- Check runtime resource constraints. Pending L4 validation.
Exit criteria:
- Space opens publicly or under the official hackathon organization.
- App can generate at least stable demo examples.
- README includes deployment and model notes.
Verification:
- Launch on HF Space. Completed for mock-safe runtime.
- Run demo flow in hosted environment.
- Run Space VLM validation for mug, keyboard, and shoe.
- Check logs for missing secrets or path errors.
## Phase 9 β Field Notes And Demo Video
Goal: complete narrative submission assets.
Scope:
- Write Field Notes article.
- Record demo video under 2 minutes.
- Prepare social post.
- Add badge evidence to README.
Exit criteria:
- Field Notes URL exists.
- Demo video URL exists.
- Social post URL exists.
- Submission package has all required links.
Verification:
- Watch final video.
- Check all URLs.
- Confirm README and submission guide are aligned.
## Phase 10 β Final Submission Audit
Goal: reduce avoidable submission risk.
Checklist:
- [ ] Space under official organization.
- [ ] Demo video ready.
- [ ] Social post ready.
- [ ] README complete.
- [ ] Model parameter count documented.
- [ ] No commercial cloud AI API.
- [ ] Fine-tuned model linked.
- [ ] Dataset linked.
- [ ] Traces linked.
- [ ] Field Notes linked.
- [ ] UI English-first and Chinese-second.
- [ ] Submit before June 15, 2026.
## Risk Register
| Risk | Impact | Mitigation |
| --- | --- | --- |
| VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback |
| llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work |
| Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback |
| HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback |
| Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images |
## Working Rule
Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.
|