Spaces:
Running on Zero
A newer version of the Gradio SDK is available: 6.17.3
Objectverse Diary β Detailed Development Plan
Purpose
This document turns the day-by-day schedule into an execution plan for completing Objectverse Diary from the initial mock MVP to hackathon submission.
The plan is intentionally staged. Each phase has a clear goal, implementation scope, verification method, and exit criteria.
Current Baseline
As of 2026-06-06, the project has:
- initialized project structure
- root README and AGENTS instructions
.codex/skills/project guidance- initial Gradio mock MVP
- six stable example objects
- mock object understanding JSON
- mock persona and diary generation
- object chat with mock persona consistency
- share card HTML preview
- anonymized trace JSON saving under
data/traces/ - six stable public mock traces under
data/traces/samples/ - deterministic SFT preview generator and dataset plan
- public trace JSONL exporter
- failure notes template
scripts/generate_sample_traces.pyscripts/generate_dataset.pyscripts/export_traces.py- stdlib unittest smoke tests for the mock MVP
- runtime configuration boundary documented in
docs/RUNTIME.md - initial-stage acceptance script at
scripts/check_initial_stage.py - Hugging Face Space created at
build-small-hackathon/ObjectverseDiary - optional MiniCPM-V 2.6 vision backend wiring with mock fallback
- optional llama.cpp / llama-cpp-python text runtime wiring through
TEXT_MODEL_PATH - hosted Space VLM validation tooling in
scripts/check_space_vlm.py - pending Space VLM report template in
docs/SPACE_VLM_REPORT.md
Not yet done:
- GitHub repo sync / public submission confirmation
- hosted Space MiniCPM-V validation with real public images
- real GGUF selection and local
TEXT_MODEL_PATHsmoke test - real curated dataset
- LoRA fine-tuning
- model card completion
- Field Notes article
- demo video
- final submission package
Phase 1 β Initial Mock MVP
Goal: validate the product loop before model integration.
Scope:
- Build
app.pyentrypoint. - Build Gradio Blocks UI.
- Support image upload and optional text description.
- Add personality mode selection.
- Add six stable example objects.
- Produce deterministic mock object JSON.
- Produce deterministic mock persona JSON.
- Produce English-first diary with Chinese helper translation.
- Support chat replies using the generated persona.
- Render a share card preview.
- Save anonymized trace JSON.
Exit criteria:
python app.pystarts a Gradio app.- User can complete
Upload -> Generate -> Diary -> Share Card -> Trace. - Trace JSON is saved locally.
- No commercial model APIs are used.
Verification:
- Import smoke test for
app. - Direct function smoke test for generation flow.
unittestsmoke tests for mock flow, chat, share card, trace save, and anonymization.- Sample trace generation script writes six stable trace files.
- Dataset preview script writes deterministic mock SFT preview JSONL.
- Trace export script writes validated public trace JSONL.
scripts/check_initial_stage.pyvalidates required initial-stage artifacts.- Manual Gradio preview.
Phase 2 β UI Polish And Example Gallery
Goal: make the app feel like an object archive instead of a default Gradio demo.
Scope:
- Refine
src/ui/styles.css. - Reference the design images under
UI εθ/for visual direction. - Keep content, interaction flow, language hierarchy, and feature scope aligned with
docs/. - Keep six stable example objects visible in the UI.
- Add clearer empty states and error states.
- Improve mobile layout.
- Keep UI English-first and Chinese-second.
Exit criteria:
- 1366px desktop layout is usable.
- Mobile-width layout is usable.
- Example gallery can reproduce stable outputs.
- Share card is readable and screenshot-friendly.
Verification:
- Manual browser preview.
- Screenshot review at desktop and mobile widths.
- Example generation for at least six objects.
Phase 3 β Vision Understanding
Goal: replace mock object recognition with a real VLM path while preserving fallback behavior.
Status: local wiring complete; hosted ZeroGPU validation reaches the app but falls back to mock vision.
Scope:
- Add MiniCPM-V or lightweight VLM runner in
src/models/vision_runner.py. - Keep manual description fallback.
- Validate object understanding JSON with schemas.
- Add JSON repair or retry behavior.
- Cache stable examples for demo reliability.
Exit criteria:
- Uploaded object photos produce structured object JSON.
- Cups, keyboards, and shoes are recognized with useful visible features.
- Fallback path works when VLM fails.
Verification:
- Run local sample image checks.
- Confirm schema validation.
- Confirm fallback trace markers.
- Run
scripts/check_space_vlm.py --configure-space --hardware zero-a10g --rollback-to-mockafter external-state confirmation. - Inspect Space runtime logs or add non-secret diagnostics before rerunning, because the 2026-06-08 hosted validation returned
vision-fallback-to-mockfor mug, keyboard, and shoe.
Phase 4 β Text Runtime With llama.cpp
Goal: make persona, diary, and chat generation use a small local text model runtime.
Status: optional runtime wiring complete; published LoRA v2 Q4_K_M GGUF passed local llama.cpp smoke. Hosted Space text runtime validation is still pending.
Scope:
- Add llama.cpp / llama-cpp-python runner. Completed as optional runtime wiring.
- Add model path configuration. Completed through
TEXT_MODEL_PATH. - Preserve
src/pipeline.pyas the UI-independent generation boundary. - Implement persona generation.
- Implement diary generation.
- Implement chat continuation.
- Keep deterministic mock fallback for demos.
Exit criteria:
- Text generation can run through llama.cpp or documented local fallback.
- README documents runtime path and published GGUF selection.
- Trace records include runtime metadata.
Verification:
- Local runtime smoke test with
objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf. - JSON schema validation.
- Compare at least three object generations for persona consistency.
Phase 5 β Dataset And Fine-Tuning Preparation
Goal: prepare Well-Tuned badge evidence.
Status: mock SFT preview complete; real candidate generation waits for verified model paths.
Scope:
- Use
scripts/generate_dataset.pyto validate the SFT schema locally. - Generate 200-500 object-persona candidate samples after real model path is available.
- Manually curate at least 50 high-quality examples.
- Define SFT schema.
- Prepare dataset preview.
- Draft dataset privacy notes.
Exit criteria:
- Mock SFT preview exists and parses as JSONL.
- Training dataset is structured and inspectable.
- Public examples contain no private data.
- Dataset card draft exists.
Verification:
- Validate JSONL format.
- Spot-check curated samples.
- Confirm no obvious sensitive data.
Phase 6 β LoRA Fine-Tuning And Model Card
Goal: publish a small fine-tuned model or adapter that can be linked in submission materials.
Scope:
- Run LoRA training with Modal or local resources.
- Export adapter or merged model.
- Convert to GGUF if needed.
- Publish HF model repo.
- Complete
docs/MODEL_CARD.md.
Exit criteria:
- Fine-tuned model repo exists.
- Model parameter count is documented.
- Runtime instructions are documented.
Verification:
- Run inference on sample prompts.
- Confirm HF model links.
- Confirm no private credit codes or tokens are present.
Phase 7 β Public Traces And Reproducibility
Goal: satisfy Sharing is Caring expectations.
Scope:
- Produce at least six public traces.
- Keep
data/traces/samples/in sync with the six example objects. - Export public traces to JSONL for dataset-style sharing.
- Add prompt templates.
- Add dataset preview.
- Document failures and fallbacks.
- Ensure trace anonymization.
Exit criteria:
- Public trace files are readable JSON.
- Trace docs explain how outputs were produced.
- Example gallery aligns with public traces.
Verification:
- Validate trace JSON.
- Inspect anonymization.
- Confirm README links.
Phase 8 β Hugging Face Space Deployment
Goal: deploy the app in the required Gradio format.
Status: Space exists and mock app has been verified; MiniCPM-V L4 validation is pending.
Scope:
- Create Hugging Face Space. Completed.
- Add Space README YAML header. Completed.
- Confirm
app_file: app.py. Completed. - Configure model paths and fallback mode. Mock-safe default complete; VLM variables pending real validation.
- Check runtime resource constraints. Pending L4 validation.
Exit criteria:
- Space opens publicly or under the official hackathon organization.
- App can generate at least stable demo examples.
- README includes deployment and model notes.
Verification:
- Launch on HF Space. Completed for mock-safe runtime.
- Run demo flow in hosted environment.
- Run Space VLM validation for mug, keyboard, and shoe.
- Check logs for missing secrets or path errors.
Phase 9 β Field Notes And Demo Video
Goal: complete narrative submission assets.
Scope:
- Write Field Notes article.
- Record demo video under 2 minutes.
- Prepare social post.
- Add badge evidence to README.
Exit criteria:
- Field Notes URL exists.
- Demo video URL exists.
- Social post URL exists.
- Submission package has all required links.
Verification:
- Watch final video.
- Check all URLs.
- Confirm README and submission guide are aligned.
Phase 10 β Final Submission Audit
Goal: reduce avoidable submission risk.
Checklist:
- Space under official organization.
- Demo video ready.
- Social post ready.
- README complete.
- Model parameter count documented.
- No commercial cloud AI API.
- Fine-tuned model linked.
- Dataset linked.
- Traces linked.
- Field Notes linked.
- UI English-first and Chinese-second.
- Submit before June 15, 2026.
Risk Register
| Risk | Impact | Mitigation |
|---|---|---|
| VLM deployment is slow | Blocks real image understanding | Keep manual description and example gallery fallback |
| llama.cpp setup is unstable | Blocks Llama Champion badge | Use text mock fallback for demo while isolating runtime work |
| Fine-tuning takes too long | Weakens Well-Tuned badge | Prepare small curated dataset and prompt-tuned fallback |
| HF Space resources are limited | Demo may be slow | Cache examples and support CPU fallback |
| Trace contains private data | Submission/privacy risk | Anonymize trace input and avoid raw private images |
Working Rule
Do not start a later phase by breaking an earlier verified flow. The mock MVP should remain usable while real model paths are added behind clear fallbacks.