Spaces:

build-small-hackathon
/

ObjectverseDiary

Running on Zero

App Files Files Community

ObjectverseDiary / docs /07-development-plan.md

qqyule

Deploy latest Objectverse Diary from fa09aac

dd6cefc verified 3 days ago

preview code

raw

history blame contribute delete

10.7 kB

	# Objectverse Diary — Detailed Development Plan

	## Purpose

	This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.

	The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.

	## Current Baseline

	As of 2026-06-06, the project has:

	- initialized project structure
	- root README and AGENTS instructions
	- `.codex/skills/` project guidance
	- initial Gradio mock MVP
	- six stable example objects
	- mock object understanding JSON
	- mock persona and diary generation
	- object chat with mock persona consistency
	- share card HTML preview
	- anonymized trace JSON saving under `data/traces/`
	- six stable public mock traces under `data/traces/samples/`
	- deterministic SFT preview generator and dataset plan
	- public trace JSONL exporter
	- failure notes template
	- `scripts/generate_sample_traces.py`
	- `scripts/generate_dataset.py`
	- `scripts/export_traces.py`
	- stdlib unittest smoke tests for the mock MVP
	- runtime configuration boundary documented in `docs/RUNTIME.md`
	- initial-stage acceptance script at `scripts/check_initial_stage.py`
	- Hugging Face Space created at `build-small-hackathon/ObjectverseDiary`
	- optional MiniCPM-V 2.6 vision backend wiring with mock fallback
	- optional llama.cpp / llama-cpp-python text runtime wiring through `TEXT_MODEL_PATH`
	- hosted Space VLM validation tooling in `scripts/check_space_vlm.py`
	- pending Space VLM report template in `docs/SPACE_VLM_REPORT.md`

	Not yet done:

	- GitHub repo sync / public submission confirmation
	- hosted Space MiniCPM-V validation with real public images
	- real GGUF selection and local `TEXT_MODEL_PATH` smoke test
	- real curated dataset
	- LoRA fine-tuning
	- model card completion
	- Field Notes article
	- demo video
	- final submission package

	## Phase 1 — Initial Mock MVP

	Goal: validate the product loop before model integration.

	Scope:

	- Build `app.py` entrypoint.
	- Build Gradio Blocks UI.
	- Support image upload and optional text description.
	- Add personality mode selection.
	- Add six stable example objects.
	- Produce deterministic mock object JSON.
	- Produce deterministic mock persona JSON.
	- Produce English-first diary with Chinese helper translation.
	- Support chat replies using the generated persona.
	- Render a share card preview.
	- Save anonymized trace JSON.

	Exit criteria:

	- `python app.py` starts a Gradio app.
	- User can complete `Upload -> Generate -> Diary -> Share Card -> Trace`.
	- Trace JSON is saved locally.
	- No commercial model APIs are used.

	Verification:

	- Import smoke test for `app`.
	- Direct function smoke test for generation flow.
	- `unittest` smoke tests for mock flow, chat, share card, trace save, and anonymization.
	- Sample trace generation script writes six stable trace files.
	- Dataset preview script writes deterministic mock SFT preview JSONL.
	- Trace export script writes validated public trace JSONL.
	- `scripts/check_initial_stage.py` validates required initial-stage artifacts.
	- Manual Gradio preview.

	## Phase 2 — UI Polish And Example Gallery

	Goal: make the app feel like an object archive instead of a default Gradio demo.

	Scope:

	- Refine `src/ui/styles.css`.
	- Reference the design images under `UI 参考/` for visual direction.
	- Keep content, interaction flow, language hierarchy, and feature scope aligned with `docs/`.
	- Keep six stable example objects visible in the UI.
	- Add clearer empty states and error states.
	- Improve mobile layout.
	- Keep UI English-first and Chinese-second.

	Exit criteria:

	- 1366px desktop layout is usable.
	- Mobile-width layout is usable.
	- Example gallery can reproduce stable outputs.
	- Share card is readable and screenshot-friendly.

	Verification:

	- Manual browser preview.
	- Screenshot review at desktop and mobile widths.
	- Example generation for at least six objects.

	## Phase 3 — Vision Understanding

	Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.

	Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.

	Scope:

	- Add MiniCPM-V or lightweight VLM runner in `src/models/vision_runner.py`.
	- Keep manual description fallback.
	- Validate object understanding JSON with schemas.
	- Add JSON repair or retry behavior.
	- Cache stable examples for demo reliability.

	Exit criteria:

	- Uploaded object photos produce structured object JSON.
	- Cups, keyboards, and shoes are recognized with useful visible features.
	- Fallback path works when VLM fails.

	Verification:

	- Run local sample image checks.
	- Confirm schema validation.
	- Confirm fallback trace markers.
	- Run `scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mock` after external-state confirmation.
	- Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned `vision-fallback-to-mock` for mug, keyboard, and shoe.

	## Phase 4 — Text Runtime With llama.cpp

	Goal: make persona, diary, and chat generation use a small local text model runtime.

	Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.

	Scope:

	- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
	- Add model path configuration. Completed through `TEXT_MODEL_PATH`.
	- Preserve `src/pipeline.py` as the UI-independent generation boundary.
	- Implement persona generation.
	- Implement diary generation.
	- Implement chat continuation.
	- Keep deterministic mock fallback for demos.

	Exit criteria:

	- Text generation can run through llama.cpp or documented local fallback.
	- README documents runtime path and published GGUF selection.
	- Trace records include runtime metadata.

	Verification:

	- Local runtime smoke test with `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
	- JSON schema validation.
	- Compare at least three object generations for persona consistency.

	## Phase 5 — Dataset And Fine-Tuning Preparation

	Goal: prepare Well-Tuned badge evidence.

	Status: mock SFT preview complete; real candidate generation waits for verified model paths.

	Scope:

	- Use `scripts/generate_dataset.py` to validate the SFT schema locally.
	- Generate 200-500 object-persona candidate samples after real model path is available.
	- Manually curate at least 50 high-quality examples.
	- Define SFT schema.
	- Prepare dataset preview.
	- Draft dataset privacy notes.

	Exit criteria:

	- Mock SFT preview exists and parses as JSONL.
	- Training dataset is structured and inspectable.
	- Public examples contain no private data.
	- Dataset card draft exists.

	Verification:

	- Validate JSONL format.
	- Spot-check curated samples.
	- Confirm no obvious sensitive data.

	## Phase 6 — LoRA Fine-Tuning And Model Card

	Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.

	Scope:

	- Run LoRA training with Modal or local resources.
	- Export adapter or merged model.
	- Convert to GGUF if needed.
	- Publish HF model repo.
	- Complete `docs/MODEL_CARD.md`.

	Exit criteria:

	- Fine-tuned model repo exists.
	- Model parameter count is documented.
	- Runtime instructions are documented.

	Verification:

	- Run inference on sample prompts.
	- Confirm HF model links.
	- Confirm no private credit codes or tokens are present.

	## Phase 7 — Public Traces And Reproducibility

	Goal: satisfy Sharing is Caring expectations.

	Scope:

	- Produce at least six public traces.
	- Keep `data/traces/samples/` in sync with the six example objects.
	- Export public traces to JSONL for dataset-style sharing.
	- Add prompt templates.
	- Add dataset preview.
	- Document failures and fallbacks.
	- Ensure trace anonymization.

	Exit criteria:

	- Public trace files are readable JSON.
	- Trace docs explain how outputs were produced.
	- Example gallery aligns with public traces.

	Verification:

	- Validate trace JSON.
	- Inspect anonymization.
	- Confirm README links.

	## Phase 8 — Hugging Face Space Deployment

	Goal: deploy the app in the required Gradio format.

	Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.

	Scope:

	- Create Hugging Face Space. Completed.
	- Add Space README YAML header. Completed.
	- Confirm `app_file: app.py`. Completed.
	- Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
	- Check runtime resource constraints. Pending L4 validation.

	Exit criteria:

	- Space opens publicly or under the official hackathon organization.
	- App can generate at least stable demo examples.
	- README includes deployment and model notes.

	Verification:

	- Launch on HF Space. Completed for mock-safe runtime.
	- Run demo flow in hosted environment.
	- Run Space VLM validation for mug, keyboard, and shoe.
	- Check logs for missing secrets or path errors.

	## Phase 9 — Field Notes And Demo Video

	Goal: complete narrative submission assets.

	Scope:

	- Write Field Notes article.
	- Record demo video under 2 minutes.
	- Prepare social post.
	- Add badge evidence to README.

	Exit criteria:

	- Field Notes URL exists.
	- Demo video URL exists.
	- Social post URL exists.
	- Submission package has all required links.

	Verification:

	- Watch final video.
	- Check all URLs.
	- Confirm README and submission guide are aligned.

	## Phase 10 — Final Submission Audit

	Goal: reduce avoidable submission risk.

	Checklist:

	- [ ] Space under official organization.
	- [ ] Demo video ready.
	- [ ] Social post ready.
	- [ ] README complete.
	- [ ] Model parameter count documented.
	- [ ] No commercial cloud AI API.
	- [ ] Fine-tuned model linked.
	- [ ] Dataset linked.
	- [ ] Traces linked.
	- [ ] Field Notes linked.
	- [ ] UI English-first and Chinese-second.
	- [ ] Submit before June 15, 2026.

	## Risk Register

	\| Risk \| Impact \| Mitigation \|
	\| --- \| --- \| --- \|
	\| VLM deployment is slow \| Blocks real image understanding \| Keep manual description and example gallery fallback \|
	\| llama.cpp setup is unstable \| Blocks Llama Champion badge \| Use text mock fallback for demo while isolating runtime work \|
	\| Fine-tuning takes too long \| Weakens Well-Tuned badge \| Prepare small curated dataset and prompt-tuned fallback \|
	\| HF Space resources are limited \| Demo may be slow \| Cache examples and support CPU fallback \|
	\| Trace contains private data \| Submission/privacy risk \| Anonymize trace input and avoid raw private images \|

	## Working Rule

	Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.