Spaces:
Running on Zero
Running on Zero
| # Objectverse Diary — Detailed Development Plan | |
| ## Purpose | |
| This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission. | |
| The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria. | |
| ## Current Baseline | |
| As of 2026-06-06, the project has: | |
| - initialized project structure | |
| - root README and AGENTS instructions | |
| - `.codex/skills/` project guidance | |
| - initial Gradio mock MVP | |
| - six stable example objects | |
| - mock object understanding JSON | |
| - mock persona and diary generation | |
| - object chat with mock persona consistency | |
| - share card HTML preview | |
| - anonymized trace JSON saving under `data/traces/` | |
| - six stable public mock traces under `data/traces/samples/` | |
| - deterministic SFT preview generator and dataset plan | |
| - public trace JSONL exporter | |
| - failure notes template | |
| - `scripts/generate_sample_traces.py` | |
| - `scripts/generate_dataset.py` | |
| - `scripts/export_traces.py` | |
| - stdlib unittest smoke tests for the mock MVP | |
| - runtime configuration boundary documented in `docs/RUNTIME.md` | |
| - initial-stage acceptance script at `scripts/check_initial_stage.py` | |
| - Hugging Face Space created at `build-small-hackathon/ObjectverseDiary` | |
| - optional MiniCPM-V 2.6 vision backend wiring with mock fallback | |
| - optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH` | |
| - hosted Space VLM validation tooling in `scripts/check_space_vlm.py` | |
| - pending Space VLM report template in `docs/SPACE_VLM_REPORT.md` | |
| Not yet done: | |
| - GitHub repo sync / public submission confirmation | |
| - hosted Space MiniCPM-V validation with real public images | |
| - real GGUF selection and local `TEXT_MODEL_PATH` smoke test | |
| - real curated dataset | |
| - LoRA fine-tuning | |
| - model card completion | |
| - Field Notes article | |
| - demo video | |
| - final submission package | |
| ## Phase 1 — Initial Mock MVP | |
| Goal: validate the product loop before model integration. | |
| Scope: | |
| - Build `app.py` entrypoint. | |
| - Build Gradio Blocks UI. | |
| - Support image upload and optional text description. | |
| - Add personality mode selection. | |
| - Add six stable example objects. | |
| - Produce deterministic mock object JSON. | |
| - Produce deterministic mock persona JSON. | |
| - Produce English-first diary with Chinese helper translation. | |
| - Support chat replies using the generated persona. | |
| - Render a share card preview. | |
| - Save anonymized trace JSON. | |
| Exit criteria: | |
| - `python app.py` starts a Gradio app. | |
| - User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`. | |
| - Trace JSON is saved locally. | |
| - No commercial model APIs are used. | |
| Verification: | |
| - Import smoke test for `app`. | |
| - Direct function smoke test for generation flow. | |
| - `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization. | |
| - Sample trace generation script writes six stable trace files. | |
| - Dataset preview script writes deterministic mock SFT preview JSONL. | |
| - Trace export script writes validated public trace JSONL. | |
| - `scripts/check_initial_stage.py` validates required initial-stage artifacts. | |
| - Manual Gradio preview. | |
| ## Phase 2 — UI Polish And Example Gallery | |
| Goal: make the app feel like an object archive instead of a default Gradio demo. | |
| Scope: | |
| - Refine `src/ui/styles.css`. | |
| - Reference the design images under `UI 参考/` for visual direction. | |
| - Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`. | |
| - Keep six stable example objects visible in the UI. | |
| - Add clearer empty states and error states. | |
| - Improve mobile layout. | |
| - Keep UI English-first and Chinese-second. | |
| Exit criteria: | |
| - 1366px desktop layout is usable. | |
| - Mobile-width layout is usable. | |
| - Example gallery can reproduce stable outputs. | |
| - Share card is readable and screenshot-friendly. | |
| Verification: | |
| - Manual browser preview. | |
| - Screenshot review at desktop and mobile widths. | |
| - Example generation for at least six objects. | |
| ## Phase 3 — Vision Understanding | |
| Goal: replace mock object recognition with a real VLM path while preserving fallback behavior. | |
| Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision. | |
| Scope: | |
| - Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`. | |
| - Keep manual description fallback. | |
| - Validate object understanding JSON with schemas. | |
| - Add JSON repair or retry behavior. | |
| - Cache stable examples for demo reliability. | |
| Exit criteria: | |
| - Uploaded object photos produce structured object JSON. | |
| - Cups, keyboards, and shoes are recognized with useful visible features. | |
| - Fallback path works when VLM fails. | |
| Verification: | |
| - Run local sample image checks. | |
| - Confirm schema validation. | |
| - Confirm fallback trace markers. | |
| - Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation. | |
| - Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe. | |
| ## Phase 4 — Text Runtime With llama.cpp | |
| Goal: make persona, diary, and chat generation use a small local text model runtime. | |
| Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending. | |
| Scope: | |
| - Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring. | |
| - Add model path configuration. Completed through `TEXT_MODEL_PATH`. | |
| - Preserve `src/pipeline.py` as the UI-independent generation boundary. | |
| - Implement persona generation. | |
| - Implement diary generation. | |
| - Implement chat continuation. | |
| - Keep deterministic mock fallback for demos. | |
| Exit criteria: | |
| - Text generation can run through llama.cpp or documented local fallback. | |
| - README documents runtime path and published GGUF selection. | |
| - Trace records include runtime metadata. | |
| Verification: | |
| - Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`. | |
| - JSON schema validation. | |
| - Compare at least three object generations for persona consistency. | |
| ## Phase 5 — Dataset And Fine-Tuning Preparation | |
| Goal: prepare Well-Tuned badge evidence. | |
| Status: mock SFT preview complete; real candidate generation waits for verified model paths. | |
| Scope: | |
| - Use `scripts/generate_dataset.py` to validate the SFT schema locally. | |
| - Generate 200-500 object-persona candidate samples after real model path is available. | |
| - Manually curate at least 50 high-quality examples. | |
| - Define SFT schema. | |
| - Prepare dataset preview. | |
| - Draft dataset privacy notes. | |
| Exit criteria: | |
| - Mock SFT preview exists and parses as JSONL. | |
| - Training dataset is structured and inspectable. | |
| - Public examples contain no private data. | |
| - Dataset card draft exists. | |
| Verification: | |
| - Validate JSONL format. | |
| - Spot-check curated samples. | |
| - Confirm no obvious sensitive data. | |
| ## Phase 6 — LoRA Fine-Tuning And Model Card | |
| Goal: publish a small fine-tuned model or adapter that can be linked in submission materials. | |
| Scope: | |
| - Run LoRA training with Modal or local resources. | |
| - Export adapter or merged model. | |
| - Convert to GGUF if needed. | |
| - Publish HF model repo. | |
| - Complete `docs/MODEL_CARD.md`. | |
| Exit criteria: | |
| - Fine-tuned model repo exists. | |
| - Model parameter count is documented. | |
| - Runtime instructions are documented. | |
| Verification: | |
| - Run inference on sample prompts. | |
| - Confirm HF model links. | |
| - Confirm no private credit codes or tokens are present. | |
| ## Phase 7 — Public Traces And Reproducibility | |
| Goal: satisfy Sharing is Caring expectations. | |
| Scope: | |
| - Produce at least six public traces. | |
| - Keep `data/traces/samples/` in sync with the six example objects. | |
| - Export public traces to JSONL for dataset-style sharing. | |
| - Add prompt templates. | |
| - Add dataset preview. | |
| - Document failures and fallbacks. | |
| - Ensure trace anonymization. | |
| Exit criteria: | |
| - Public trace files are readable JSON. | |
| - Trace docs explain how outputs were produced. | |
| - Example gallery aligns with public traces. | |
| Verification: | |
| - Validate trace JSON. | |
| - Inspect anonymization. | |
| - Confirm README links. | |
| ## Phase 8 — Hugging Face Space Deployment | |
| Goal: deploy the app in the required Gradio format. | |
| Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending. | |
| Scope: | |
| - Create Hugging Face Space. Completed. | |
| - Add Space README YAML header. Completed. | |
| - Confirm `app_file: app.py`. Completed. | |
| - Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation. | |
| - Check runtime resource constraints. Pending L4 validation. | |
| Exit criteria: | |
| - Space opens publicly or under the official hackathon organization. | |
| - App can generate at least stable demo examples. | |
| - README includes deployment and model notes. | |
| Verification: | |
| - Launch on HF Space. Completed for mock-safe runtime. | |
| - Run demo flow in hosted environment. | |
| - Run Space VLM validation for mug, keyboard, and shoe. | |
| - Check logs for missing secrets or path errors. | |
| ## Phase 9 — Field Notes And Demo Video | |
| Goal: complete narrative submission assets. | |
| Scope: | |
| - Write Field Notes article. | |
| - Record demo video under 2 minutes. | |
| - Prepare social post. | |
| - Add badge evidence to README. | |
| Exit criteria: | |
| - Field Notes URL exists. | |
| - Demo video URL exists. | |
| - Social post URL exists. | |
| - Submission package has all required links. | |
| Verification: | |
| - Watch final video. | |
| - Check all URLs. | |
| - Confirm README and submission guide are aligned. | |
| ## Phase 10 — Final Submission Audit | |
| Goal: reduce avoidable submission risk. | |
| Checklist: | |
| - [ ] Space under official organization. | |
| - [ ] Demo video ready. | |
| - [ ] Social post ready. | |
| - [ ] README complete. | |
| - [ ] Model parameter count documented. | |
| - [ ] No commercial cloud AI API. | |
| - [ ] Fine-tuned model linked. | |
| - [ ] Dataset linked. | |
| - [ ] Traces linked. | |
| - [ ] Field Notes linked. | |
| - [ ] UI English-first and Chinese-second. | |
| - [ ] Submit before June 15, 2026. | |
| ## Risk Register | |
| | Risk | Impact | Mitigation | | |
| | --- | --- | --- | | |
| | VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback | | |
| | llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work | | |
| | Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback | | |
| | HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback | | |
| | Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images | | |
| ## Working Rule | |
| Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks. | |