ObjectverseDiary / docs /07-development-plan.md
qqyule's picture
Deploy latest Objectverse Diary from fa09aac
dd6cefc verified

A newer version of the Gradio SDK is available: 6.17.3

Upgrade

Objectverse Diary β€” Detailed Development Plan

Purpose

This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.

The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.

Current Baseline

As of 2026-06-06, the project has:

  • initialized project structure
  • root README and AGENTS instructions
  • .codex/skills/ project guidance
  • initial Gradio mock MVP
  • six stable example objects
  • mock object understanding JSON
  • mock persona and diary generation
  • object chat with mock persona consistency
  • share card HTML preview
  • anonymized trace JSON saving under data/traces/
  • six stable public mock traces under data/traces/samples/
  • deterministic SFT preview generator and dataset plan
  • public trace JSONL exporter
  • failure notes template
  • scripts/generate_sample_traces.py
  • scripts/generate_dataset.py
  • scripts/export_traces.py
  • stdlib unittest smoke tests for the mock MVP
  • runtime configuration boundary documented in docs/RUNTIME.md
  • initial-stage acceptance script at scripts/check_initial_stage.py
  • Hugging Face Space created at build-small-hackathon/ObjectverseDiary
  • optional MiniCPM-V 2.6 vision backend wiring with mock fallback
  • optional llama.cpp / llama-cpp-python text runtime wiring through TEXT_MODEL_PATH
  • hosted Space VLM validation tooling in scripts/check_space_vlm.py
  • pending Space VLM report template in docs/SPACE_VLM_REPORT.md

Not yet done:

  • GitHub repo sync / public submission confirmation
  • hosted Space MiniCPM-V validation with real public images
  • real GGUF selection and local TEXT_MODEL_PATH smoke test
  • real curated dataset
  • LoRA fine-tuning
  • model card completion
  • Field Notes article
  • demo video
  • final submission package

Phase 1 β€” Initial Mock MVP

Goal: validate the product loop before model integration.

Scope:

  • Build app.py entrypoint.
  • Build Gradio Blocks UI.
  • Support image upload and optional text description.
  • Add personality mode selection.
  • Add six stable example objects.
  • Produce deterministic mock object JSON.
  • Produce deterministic mock persona JSON.
  • Produce English-first diary with Chinese helper translation.
  • Support chat replies using the generated persona.
  • Render a share card preview.
  • Save anonymized trace JSON.

Exit criteria:

  • python app.py starts a Gradio app.
  • User can complete Upload -> Generate -> Diary -> Share Card -> Trace.
  • Trace JSON is saved locally.
  • No commercial model APIs are used.

Verification:

  • Import smoke test for app.
  • Direct function smoke test for generation flow.
  • unittest smoke tests for mock flow, chat, share card, trace save, and anonymization.
  • Sample trace generation script writes six stable trace files.
  • Dataset preview script writes deterministic mock SFT preview JSONL.
  • Trace export script writes validated public trace JSONL.
  • scripts/check_initial_stage.py validates required initial-stage artifacts.
  • Manual Gradio preview.

Phase 2 β€” UI Polish And Example Gallery

Goal: make the app feel like an object archive instead of a default Gradio demo.

Scope:

  • Refine src/ui/styles.css.
  • Reference the design images under UI 参考/ for visual direction.
  • Keep content, interaction flow, language hierarchy, and feature scope aligned with docs/.
  • Keep six stable example objects visible in the UI.
  • Add clearer empty states and error states.
  • Improve mobile layout.
  • Keep UI English-first and Chinese-second.

Exit criteria:

  • 1366px desktop layout is usable.
  • Mobile-width layout is usable.
  • Example gallery can reproduce stable outputs.
  • Share card is readable and screenshot-friendly.

Verification:

  • Manual browser preview.
  • Screenshot review at desktop and mobile widths.
  • Example generation for at least six objects.

Phase 3 β€” Vision Understanding

Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.

Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.

Scope:

  • Add MiniCPM-V or lightweight VLM runner in src/models/vision_runner.py.
  • Keep manual description fallback.
  • Validate object understanding JSON with schemas.
  • Add JSON repair or retry behavior.
  • Cache stable examples for demo reliability.

Exit criteria:

  • Uploaded object photos produce structured object JSON.
  • Cups, keyboards, and shoes are recognized with useful visible features.
  • Fallback path works when VLM fails.

Verification:

  • Run local sample image checks.
  • Confirm schema validation.
  • Confirm fallback trace markers.
  • Run scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock after external-state confirmation.
  • Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned vision-fallback-to-mock for mug, keyboard, and shoe.

Phase 4 β€” Text Runtime With llama.cpp

Goal: make persona, diary, and chat generation use a small local text model runtime.

Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.

Scope:

  • Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
  • Add model path configuration. Completed through TEXT_MODEL_PATH.
  • Preserve src/pipeline.py as the UI-independent generation boundary.
  • Implement persona generation.
  • Implement diary generation.
  • Implement chat continuation.
  • Keep deterministic mock fallback for demos.

Exit criteria:

  • Text generation can run through llama.cpp or documented local fallback.
  • README documents runtime path and published GGUF selection.
  • Trace records include runtime metadata.

Verification:

  • Local runtime smoke test with objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf.
  • JSON schema validation.
  • Compare at least three object generations for persona consistency.

Phase 5 β€” Dataset And Fine-Tuning Preparation

Goal: prepare Well-Tuned badge evidence.

Status: mock SFT preview complete; real candidate generation waits for verified model paths.

Scope:

  • Use scripts/generate_dataset.py to validate the SFT schema locally.
  • Generate 200-500 object-persona candidate samples after real model path is available.
  • Manually curate at least 50 high-quality examples.
  • Define SFT schema.
  • Prepare dataset preview.
  • Draft dataset privacy notes.

Exit criteria:

  • Mock SFT preview exists and parses as JSONL.
  • Training dataset is structured and inspectable.
  • Public examples contain no private data.
  • Dataset card draft exists.

Verification:

  • Validate JSONL format.
  • Spot-check curated samples.
  • Confirm no obvious sensitive data.

Phase 6 β€” LoRA Fine-Tuning And Model Card

Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.

Scope:

  • Run LoRA training with Modal or local resources.
  • Export adapter or merged model.
  • Convert to GGUF if needed.
  • Publish HF model repo.
  • Complete docs/MODEL_CARD.md.

Exit criteria:

  • Fine-tuned model repo exists.
  • Model parameter count is documented.
  • Runtime instructions are documented.

Verification:

  • Run inference on sample prompts.
  • Confirm HF model links.
  • Confirm no private credit codes or tokens are present.

Phase 7 β€” Public Traces And Reproducibility

Goal: satisfy Sharing is Caring expectations.

Scope:

  • Produce at least six public traces.
  • Keep data/traces/samples/ in sync with the six example objects.
  • Export public traces to JSONL for dataset-style sharing.
  • Add prompt templates.
  • Add dataset preview.
  • Document failures and fallbacks.
  • Ensure trace anonymization.

Exit criteria:

  • Public trace files are readable JSON.
  • Trace docs explain how outputs were produced.
  • Example gallery aligns with public traces.

Verification:

  • Validate trace JSON.
  • Inspect anonymization.
  • Confirm README links.

Phase 8 β€” Hugging Face Space Deployment

Goal: deploy the app in the required Gradio format.

Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.

Scope:

  • Create Hugging Face Space. Completed.
  • Add Space README YAML header. Completed.
  • Confirm app_file: app.py. Completed.
  • Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
  • Check runtime resource constraints. Pending L4 validation.

Exit criteria:

  • Space opens publicly or under the official hackathon organization.
  • App can generate at least stable demo examples.
  • README includes deployment and model notes.

Verification:

  • Launch on HF Space. Completed for mock-safe runtime.
  • Run demo flow in hosted environment.
  • Run Space VLM validation for mug, keyboard, and shoe.
  • Check logs for missing secrets or path errors.

Phase 9 β€” Field Notes And Demo Video

Goal: complete narrative submission assets.

Scope:

  • Write Field Notes article.
  • Record demo video under 2 minutes.
  • Prepare social post.
  • Add badge evidence to README.

Exit criteria:

  • Field Notes URL exists.
  • Demo video URL exists.
  • Social post URL exists.
  • Submission package has all required links.

Verification:

  • Watch final video.
  • Check all URLs.
  • Confirm README and submission guide are aligned.

Phase 10 β€” Final Submission Audit

Goal: reduce avoidable submission risk.

Checklist:

  • Space under official organization.
  • Demo video ready.
  • Social post ready.
  • README complete.
  • Model parameter count documented.
  • No commercial cloud AI API.
  • Fine-tuned model linked.
  • Dataset linked.
  • Traces linked.
  • Field Notes linked.
  • UI English-first and Chinese-second.
  • Submit before June 15, 2026.

Risk Register

Risk Impact Mitigation
VLM deployment is slow Blocks real image understanding Keep manual description and example gallery fallback
llama.cpp setup is unstable Blocks Llama Champion badge Use text mock fallback for demo while isolating runtime work
Fine-tuning takes too long Weakens Well-Tuned badge Prepare small curated dataset and prompt-tuned fallback
HF Space resources are limited Demo may be slow Cache examples and support CPU fallback
Trace contains private data Submission/privacy risk Anonymize trace input and avoid raw private images

Working Rule

Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.