# Objectverse Diary — Detailed Development Plan

## Purpose

This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.

The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.

## Current Baseline

As of 2026-06-06, the project has:

- initialized project structure
- root README and AGENTS instructions
- `.codex/skills/` project guidance
- initial Gradio mock MVP
- six stable example objects
- mock object understanding JSON
- mock persona and diary generation
- object chat with mock persona consistency
- share card HTML preview
- anonymized trace JSON saving under `data/traces/`
- six stable public mock traces under `data/traces/samples/`
- deterministic SFT preview generator and dataset plan
- public trace JSONL exporter
- failure notes template
- `scripts/generate_sample_traces.py`
- `scripts/generate_dataset.py`
- `scripts/export_traces.py`
- stdlib unittest smoke tests for the mock MVP
- runtime configuration boundary documented in `docs/RUNTIME.md`
- initial-stage acceptance script at `scripts/check_initial_stage.py`
- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`

Not yet done:

- GitHub repo sync / public submission confirmation
- hosted Space MiniCPM-V validation with real public images
- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
- real curated dataset
- LoRA fine-tuning
- model card completion
- Field Notes article
- demo video
- final submission package

## Phase 1 — Initial Mock MVP

Goal: validate the product loop before model integration.

Scope:

- Build `app.py` entrypoint.
- Build Gradio Blocks UI.
- Support image upload and optional text description.
- Add personality mode selection.
- Add six stable example objects.
- Produce deterministic mock object JSON.
- Produce deterministic mock persona JSON.
- Produce English-first diary with Chinese helper translation.
- Support chat replies using the generated persona.
- Render a share card preview.
- Save anonymized trace JSON.

Exit criteria:

- `python app.py` starts a Gradio app.
- User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`.
- Trace JSON is saved locally.
- No commercial model APIs are used.

Verification:

- Import smoke test for `app`.
- Direct function smoke test for generation flow.
- `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization.
- Sample trace generation script writes six stable trace files.
- Dataset preview script writes deterministic mock SFT preview JSONL.
- Trace export script writes validated public trace JSONL.
- `scripts/check_initial_stage.py` validates required initial-stage artifacts.
- Manual Gradio preview.

## Phase 2 — UI Polish And Example Gallery

Goal: make the app feel like an object archive instead of a default Gradio demo.

Scope:

- Refine `src/ui/styles.css`.
- Reference the design images under `UI 参考/` for visual direction.
- Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`.
- Keep six stable example objects visible in the UI.
- Add clearer empty states and error states.
- Improve mobile layout.
- Keep UI English-first and Chinese-second.

Exit criteria:

- 1366px desktop layout is usable.
- Mobile-width layout is usable.
- Example gallery can reproduce stable outputs.
- Share card is readable and screenshot-friendly.

Verification:

- Manual browser preview.
- Screenshot review at desktop and mobile widths.
- Example generation for at least six objects.
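
The "example gallery can reproduce stable outputs" criterion lends itself to an automated check alongside the manual preview. The following is a minimal sketch, not the project's actual test: `generate_mock_entry` is a hypothetical stand-in for the real mock pipeline, the `gentle` mode name is invented for illustration, and the three object ids are only an illustrative subset of the gallery.

```python
import hashlib
import json

# Illustrative subset only; the real gallery has six example objects.
EXAMPLE_IDS = ["mug", "keyboard", "shoe"]


def generate_mock_entry(object_id: str, mode: str) -> dict:
    """Hypothetical stand-in for the mock pipeline: output depends only on inputs."""
    digest = hashlib.sha256(f"{object_id}:{mode}".encode("utf-8")).hexdigest()
    return {
        "object_id": object_id,
        "mode": mode,
        "persona_seed": digest[:8],
        "diary": f"Day 1 as a {object_id}. Mood index {int(digest[:2], 16) % 10}.",
    }


def fingerprint(entry: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot affect the result."""
    canonical = json.dumps(entry, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_stable(object_ids, mode="gentle", runs=3):
    """Return the ids whose output differs between repeated runs."""
    unstable = []
    for object_id in object_ids:
        prints = {
            fingerprint(generate_mock_entry(object_id, mode)) for _ in range(runs)
        }
        if len(prints) != 1:
            unstable.append(object_id)
    return unstable


if __name__ == "__main__":
    # An empty list means every example produced identical output on each run.
    print(check_stable(EXAMPLE_IDS))
```

Swapping the stand-in for the real generation function would let the same check run against the actual six examples; comparing fingerprints rather than raw dicts also makes it easy to pin expected hashes in a test file later.
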

## Phase 3 — Vision Understanding

Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.

Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.

Scope:

- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
- Keep manual description fallback.
- Validate object understanding JSON with schemas.
- Add JSON repair or retry behavior.
- Cache stable examples for demo reliability.

Exit criteria:

- Uploaded object photos produce structured object JSON.
- Cups, keyboards, and shoes are recognized with useful visible features.
- Fallback path works when VLM fails.

Verification:

- Run local sample image checks.
- Confirm schema validation.
- Confirm fallback trace markers.
- Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation.
- Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe.

## Phase 4 — Text Runtime With llama.cpp

Goal: make persona, diary, and chat generation use a small local text model runtime.

Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.

Scope:

- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
- Preserve `src/pipeline.py` as the UI-independent generation boundary.
- Implement persona generation.
- Implement diary generation.
- Implement chat continuation.
- Keep deterministic mock fallback for demos.

Exit criteria:

- Text generation can run through llama.cpp or documented local fallback.
- README documents runtime path and published GGUF selection.
- Trace records include runtime metadata.
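
The exit criteria above imply two small pieces of logic: deciding which text runtime is active from `TEXT_MODEL_PATH`, and recording that decision in the trace. A minimal sketch follows, under stated assumptions: the function names, the returned field names (`runtime`, `model_path`, `reason`), and the choice to store only the file name are all hypothetical and not taken from `src/pipeline.py` or `docs/RUNTIME.md`. It deliberately does not load a model, so it runs without llama-cpp-python installed.

```python
import os
from pathlib import Path


def resolve_text_runtime(env=None) -> dict:
    """Pick the text runtime from TEXT_MODEL_PATH, defaulting to the mock."""
    env = os.environ if env is None else env
    raw_path = (env.get("TEXT_MODEL_PATH") or "").strip()
    if not raw_path:
        return {
            "runtime": "mock",
            "model_path": None,
            "reason": "TEXT_MODEL_PATH not set",
        }
    path = Path(raw_path)
    if not path.is_file():
        return {
            "runtime": "mock",
            "model_path": None,
            "reason": "model file not found",
        }
    # Store only the file name so local directory layouts stay out of traces.
    return {
        "runtime": "llama.cpp",
        "model_path": path.name,
        "reason": "model file found",
    }


def attach_runtime_metadata(trace: dict, runtime: dict) -> dict:
    """Return a copy of the trace with runtime metadata attached."""
    enriched = dict(trace)
    enriched["runtime"] = runtime
    return enriched


if __name__ == "__main__":
    runtime = resolve_text_runtime()
    print(attach_runtime_metadata({"object": "mug"}, runtime))
```

Passing `env` explicitly keeps the resolver testable without mutating the process environment, and returning a `reason` gives a non-secret diagnostic that can be surfaced in logs when the app silently uses the mock.
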

Verification:

- Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
- JSON schema validation.
- Compare at least three object generations for persona consistency.

## Phase 5 — Dataset And Fine-Tuning Preparation

Goal: prepare Well-Tuned badge evidence.

Status: mock SFT preview complete; real candidate generation waits for verified model paths.

Scope:

- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
- Generate 200-500 object-persona candidate samples after real model path is available.
- Manually curate at least 50 high-quality examples.
- Define SFT schema.
- Prepare dataset preview.
- Draft dataset privacy notes.

Exit criteria:

- Mock SFT preview exists and parses as JSONL.
- Training dataset is structured and inspectable.
- Public examples contain no private data.
- Dataset card draft exists.

Verification:

- Validate JSONL format.
- Spot-check curated samples.
- Confirm no obvious sensitive data.

## Phase 6 — LoRA Fine-Tuning And Model Card

Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.

Scope:

- Run LoRA training with Modal or local resources.
- Export adapter or merged model.
- Convert to GGUF if needed.
- Publish HF model repo.
- Complete `docs/MODEL_CARD.md`.

Exit criteria:

- Fine-tuned model repo exists.
- Model parameter count is documented.
- Runtime instructions are documented.

Verification:

- Run inference on sample prompts.
- Confirm HF model links.
- Confirm no private credit codes or tokens are present.

## Phase 7 — Public Traces And Reproducibility

Goal: satisfy Sharing is Caring expectations.

Scope:

- Produce at least six public traces.
- Keep `data/traces/samples/` in sync with the six example objects.
- Export public traces to JSONL for dataset-style sharing.
- Add prompt templates.
- Add dataset preview.
- Document failures and fallbacks.
- Ensure trace anonymization.

Exit criteria:

- Public trace files are readable JSON.
- Trace docs explain how outputs were produced.
- Example gallery aligns with public traces.

Verification:

- Validate trace JSON.
- Inspect anonymization.
- Confirm README links.

## Phase 8 — Hugging Face Space Deployment

Goal: deploy the app in the required Gradio format.

Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.

Scope:

- Create Hugging Face Space. Completed.
- Add Space README YAML header. Completed.
- Confirm `app_file: app.py`. Completed.
- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
- Check runtime resource constraints. Pending L4 validation.

Exit criteria:

- Space opens publicly or under the official hackathon organization.
- App can generate at least stable demo examples.
- README includes deployment and model notes.

Verification:

- Launch on HF Space. Completed for mock-safe runtime.
- Run demo flow in hosted environment.
- Run Space VLM validation for mug, keyboard, and shoe.
- Check logs for missing secrets or path errors.

## Phase 9 — Field Notes And Demo Video

Goal: complete narrative submission assets.

Scope:

- Write Field Notes article.
- Record demo video under 2 minutes.
- Prepare social post.
- Add badge evidence to README.

Exit criteria:

- Field Notes URL exists.
- Demo video URL exists.
- Social post URL exists.
- Submission package has all required links.

Verification:

- Watch final video.
- Check all URLs.
- Confirm README and submission guide are aligned.

## Phase 10 — Final Submission Audit

Goal: reduce avoidable submission risk.

Checklist:

- [ ] Space under official organization.
- [ ] Demo video ready.
- [ ] Social post ready.
- [ ] README complete.
- [ ] Model parameter count documented.
- [ ] No commercial cloud AI API.
- [ ] Fine-tuned model linked.
- [ ] Dataset linked.
- [ ] Traces linked.
- [ ] Field Notes linked.
- [ ] UI English-first and Chinese-second.
- [ ] Submit before June 15, 2026.

## Risk Register

| Risk | Impact | Mitigation |
| --- | --- | --- |
| VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback |
| llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work |
| Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback |
| HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback |
| Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images |

## Working Rule

Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.
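
One way to read the working rule in code is a thin wrapper that always returns mock output when the real path is missing or fails, and marks the result so traces show which path ran. This is a minimal sketch under stated assumptions: `run_with_fallback`, its `real_runner` and `mock_runner` callables, and the `fallback` and `fallback_reason` fields are hypothetical names; only the `vision-fallback-to-mock` marker string appears in this plan.

```python
from typing import Callable, Optional


def run_with_fallback(
    real_runner: Optional[Callable[[str], dict]],
    mock_runner: Callable[[str], dict],
    payload: str,
    marker: str = "vision-fallback-to-mock",
) -> dict:
    """Try the real runner; if it is absent or raises, return marked mock output."""
    if real_runner is not None:
        try:
            result = dict(real_runner(payload))
            result["fallback"] = None  # real path succeeded
            return result
        except Exception as exc:  # broad on purpose: the demo flow must not crash
            error = type(exc).__name__
    else:
        error = "runner-not-configured"

    result = dict(mock_runner(payload))
    result["fallback"] = marker
    # Record only the exception class name, which is a non-secret diagnostic.
    result["fallback_reason"] = error
    return result


if __name__ == "__main__":
    def mock(description: str) -> dict:
        return {"object": description, "source": "mock"}

    print(run_with_fallback(None, mock, "a chipped blue mug"))
```

Keeping the marker and reason in the returned dict means the earlier verified mock flow is untouched, while a hosted run that quietly falls back still leaves evidence in the trace.
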