ObjectverseDiary / docs /07-development-plan.md
qqyule's picture
Deploy latest Objectverse Diary from fa09aac
dd6cefc verified
# Objectverse Diary — Detailed Development Plan
## Purpose
This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.
The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.
## Current Baseline
As of 2026-06-06, the project has:
- initialized project structure
- root README and AGENTS instructions
- `.codex/skills/` project guidance
- initial Gradio mock MVP
- six stable example objects
- mock object understanding JSON
- mock persona and diary generation
- object chat with mock persona consistency
- share card HTML preview
- anonymized trace JSON saving under `data/traces/`
- six stable public mock traces under `data/traces/samples/`
- deterministic SFT preview generator and dataset plan
- public trace JSONL exporter
- failure notes template
- `scripts/generate_sample_traces.py`
- `scripts/generate_dataset.py`
- `scripts/export_traces.py`
- stdlib unittest smoke tests for the mock MVP
- runtime configuration boundary documented in `docs/RUNTIME.md`
- initial-stage acceptance script at `scripts/check_initial_stage.py`
- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`
Not yet done:
- GitHub repo sync / public submission confirmation
- hosted Space MiniCPM-V validation with real public images
- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
- real curated dataset
- LoRA fine-tuning
- model card completion
- Field Notes article
- demo video
- final submission package
## Phase 1 — Initial Mock MVP
Goal: validate the product loop before model integration.
Scope:
- Build `app.py` entrypoint.
- Build Gradio Blocks UI.
- Support image upload and optional text description.
- Add personality mode selection.
- Add six stable example objects.
- Produce deterministic mock object JSON.
- Produce deterministic mock persona JSON.
- Produce English-first diary with Chinese helper translation.
- Support chat replies using the generated persona.
- Render a share card preview.
- Save anonymized trace JSON.
Exit criteria:
- `python app.py` starts a Gradio app.
- User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`.
- Trace JSON is saved locally.
- No commercial model APIs are used.
Verification:
- Import smoke test for `app`.
- Direct function smoke test for generation flow.
- `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization.
- Sample trace generation script writes six stable trace files.
- Dataset preview script writes deterministic mock SFT preview JSONL.
- Trace export script writes validated public trace JSONL.
- `scripts/check_initial_stage.py` validates required initial-stage artifacts.
- Manual Gradio preview.
## Phase 2 — UI Polish And Example Gallery
Goal: make the app feel like an object archive instead of a default Gradio demo.
Scope:
- Refine `src/ui/styles.css`.
- Reference the design images under `UI 参考/` for visual direction.
- Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`.
- Keep six stable example objects visible in the UI.
- Add clearer empty states and error states.
- Improve mobile layout.
- Keep UI English-first and Chinese-second.
Exit criteria:
- 1366px desktop layout is usable.
- Mobile-width layout is usable.
- Example gallery can reproduce stable outputs.
- Share card is readable and screenshot-friendly.
Verification:
- Manual browser preview.
- Screenshot review at desktop and mobile widths.
- Example generation for at least six objects.
## Phase 3 — Vision Understanding
Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.
Scope:
- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
- Keep manual description fallback.
- Validate object understanding JSON with schemas.
- Add JSON repair or retry behavior.
- Cache stable examples for demo reliability.
Exit criteria:
- Uploaded object photos produce structured object JSON.
- Cups, keyboards, and shoes are recognized with useful visible features.
- Fallback path works when VLM fails.
Verification:
- Run local sample image checks.
- Confirm schema validation.
- Confirm fallback trace markers.
- Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation.
- Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe.
## Phase 4 — Text Runtime With llama.cpp
Goal: make persona, diary, and chat generation use a small local text model runtime.
Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.
Scope:
- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
- Implement persona generation.
- Implement diary generation.
- Implement chat continuation.
- Keep deterministic mock fallback for demos.
Exit criteria:
- Text generation can run through llama.cpp or documented local fallback.
- README documents runtime path and published GGUF selection.
- Trace records include runtime metadata.
Verification:
- Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
- JSON schema validation.
- Compare at least three object generations for persona consistency.
## Phase 5 — Dataset And Fine-Tuning Preparation
Goal: prepare Well-Tuned badge evidence.
Status: mock SFT preview complete; real candidate generation waits for verified model paths.
Scope:
- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
- Generate 200-500 object-persona candidate samples after real model path is available.
- Manually curate at least 50 high-quality examples.
- Define SFT schema.
- Prepare dataset preview.
- Draft dataset privacy notes.
Exit criteria:
- Mock SFT preview exists and parses as JSONL.
- Training dataset is structured and inspectable.
- Public examples contain no private data.
- Dataset card draft exists.
Verification:
- Validate JSONL format.
- Spot-check curated samples.
- Confirm no obvious sensitive data.
## Phase 6 — LoRA Fine-Tuning And Model Card
Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.
Scope:
- Run LoRA training with Modal or local resources.
- Export adapter or merged model.
- Convert to GGUF if needed.
- Publish HF model repo.
- Complete `docs/MODEL_CARD.md`.
Exit criteria:
- Fine-tuned model repo exists.
- Model parameter count is documented.
- Runtime instructions are documented.
Verification:
- Run inference on sample prompts.
- Confirm HF model links.
- Confirm no private credit codes or tokens are present.
## Phase 7 — Public Traces And Reproducibility
Goal: satisfy Sharing is Caring expectations.
Scope:
- Produce at least six public traces.
- Keep `data/traces/samples/` in sync with the six example objects.
- Export public traces to JSONL for dataset-style sharing.
- Add prompt templates.
- Add dataset preview.
- Document failures and fallbacks.
- Ensure trace anonymization.
Exit criteria:
- Public trace files are readable JSON.
- Trace docs explain how outputs were produced.
- Example gallery aligns with public traces.
Verification:
- Validate trace JSON.
- Inspect anonymization.
- Confirm README links.
## Phase 8 — Hugging Face Space Deployment
Goal: deploy the app in the required Gradio format.
Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
Scope:
- Create Hugging Face Space. Completed.
- Add Space README YAML header. Completed.
- Confirm `app_file: app.py`. Completed.
- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
- Check runtime resource constraints. Pending L4 validation.
Exit criteria:
- Space opens publicly or under the official hackathon organization.
- App can generate at least stable demo examples.
- README includes deployment and model notes.
Verification:
- Launch on HF Space. Completed for mock-safe runtime.
- Run demo flow in hosted environment.
- Run Space VLM validation for mug, keyboard, and shoe.
- Check logs for missing secrets or path errors.
## Phase 9 — Field Notes And Demo Video
Goal: complete narrative submission assets.
Scope:
- Write Field Notes article.
- Record demo video under 2 minutes.
- Prepare social post.
- Add badge evidence to README.
Exit criteria:
- Field Notes URL exists.
- Demo video URL exists.
- Social post URL exists.
- Submission package has all required links.
Verification:
- Watch final video.
- Check all URLs.
- Confirm README and submission guide are aligned.
## Phase 10 — Final Submission Audit
Goal: reduce avoidable submission risk.
Checklist:
- [ ] Space under official organization.
- [ ] Demo video ready.
- [ ] Social post ready.
- [ ] README complete.
- [ ] Model parameter count documented.
- [ ] No commercial cloud AI API.
- [ ] Fine-tuned model linked.
- [ ] Dataset linked.
- [ ] Traces linked.
- [ ] Field Notes linked.
- [ ] UI English-first and Chinese-second.
- [ ] Submit before June 15, 2026.
## Risk Register
| Risk | Impact | Mitigation |
| --- | --- | --- |
| VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback |
| llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work |
| Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback |
| HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback |
| Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images |
## Working Rule
Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.