# Submission Guide
## Required Package
- [x] Hugging Face Space URL: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
- [x] GitHub Repository URL: https://github.com/qqyule/Objectverse-Diary
- [x] Demo Video Script: `docs/DEMO_VIDEO_SCRIPT.md`
- [x] Social Media Post Draft: `docs/SOCIAL_POST.md`
- [x] Fine-tuned Model URL: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
- [x] Dataset URL: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
- [x] Trace Dataset: local public mock JSONL export at `data/traces/samples/objectverse_public_mock_traces.jsonl`
- [x] Field Notes Draft: `docs/FIELD_NOTES.md`
- [x] Short project description: available in README
## Local Evidence Ready
- Initial mock MVP report: `docs/INITIAL_STAGE_REPORT.md`
- Runtime boundary: `docs/RUNTIME.md`
- Dataset plan and preview workflow: `docs/DATASET.md`
- External setup checklist: `docs/EXTERNAL_SETUP.md`
- Space VLM validation report: `docs/SPACE_VLM_REPORT.md`. Validation currently passes for the public mug, keyboard, and shoe images on ZeroGPU with `OBJECTVERSE_VISION_BACKEND=minicpm-v`.
- Space VLM diagnostics: hidden `/vision_runtime_probe` API confirms Torch/Transformers, CUDA, and MiniCPM-V model load status.
- Live Space runtime: ZeroGPU `zero-a10g` with `OBJECTVERSE_VISION_BACKEND=minicpm-v`, `VISION_MODEL_ID=openbmb/MiniCPM-V-2_6`, and `OBJECTVERSE_TEXT_BACKEND=mock`.
- Space VLM trace evidence: `data/traces/space-vlm/`
- Public mock traces: `data/traces/samples/`
- Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
- Optional llama.cpp runtime wiring: `src/models/llama_cpp_runner.py`
- Published LoRA v2 Q4_K_M GGUF: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora/blob/main/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
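
The stable demo baseline above replays a committed trace first and only calls the live pipeline when no cached trace exists. The sketch below illustrates that cache-then-live pattern under stated assumptions: the `example_id` field, the `load_cached_trace` and `run_example` helpers, and the `generate_live` callback are hypothetical names for illustration, not the project's actual trace schema or code.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_cached_trace(trace_dir: Path, example_id: str) -> Optional[dict]:
    """Return the first JSONL record whose `example_id` matches, else None.

    `example_id` is an assumed field name; the real trace schema may differ.
    """
    for path in sorted(trace_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between records
                record = json.loads(line)
                if record.get("example_id") == example_id:
                    return record
    return None


def run_example(
    example_id: str,
    trace_dir: Path,
    generate_live: Callable[[str], dict],
) -> dict:
    """Replay a cached trace if one exists, otherwise run live generation."""
    cached = load_cached_trace(trace_dir, example_id)
    if cached is not None:
        return {"source": "cached", "trace": cached}
    # No committed trace for this example: fall back to the live pipeline.
    return {"source": "live", "trace": generate_live(example_id)}
```

Tagging the result with `source` keeps it visible in the UI or trace panel whether an output was replayed or generated live.
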
## Completed Locally
- Mock MVP flow, archive-style UI, share card, trace logging, sample traces, dataset preview, and initial acceptance tooling.
- Stable local demo baseline with six replayable example outputs, shared cached/live UI formatting, chat wake state, share card, and trace panel output.
- MiniCPM-V 2.6 backend wiring with fallback markers.
- Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
- Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
- Hosted Space VLM probe support and latest failure-note update support. MiniCPM-V ZeroGPU validation passed once an `HF_TOKEN` Space secret was added for gated model access.
- Local GGUF smoke-test helper passed with `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`. The trace reported a text runtime of `llama-cpp text generation` and contained no `text-fallback-to-mock` marker.
- Synthetic curated v2 SFT dataset published to Hugging Face Datasets: 200 rows, 40 objects, 5 personality modes.
- Modal Qwen 1.5B LoRA v2 run completed and adapter published to Hugging Face Models.
- LoRA v2 adapter merged into `Qwen/Qwen2.5-1.5B-Instruct`, converted with pinned `llama.cpp`, quantized to Q4_K_M, and uploaded to the same model repo.
- Field Notes draft, demo video script, and social post draft for the stable submission package.
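
The GGUF smoke test above counts as passing only when the trace contains no `text-fallback-to-mock` marker. A minimal sketch of that check follows. Because the trace schema is not shown in this guide, it searches each serialized JSONL record for the marker string instead of reading a specific field; the `find_fallback_records` helper is a hypothetical name, not part of the repository.

```python
import json
from typing import Iterable, List

FALLBACK_MARKER = "text-fallback-to-mock"


def find_fallback_records(
    lines: Iterable[str], marker: str = FALLBACK_MARKER
) -> List[int]:
    """Return 1-based line numbers of JSONL records that mention the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)  # raises ValueError on malformed JSONL
        # Schema-agnostic check: look for the marker anywhere in the record.
        if marker in json.dumps(record):
            hits.append(number)
    return hits
```

An empty result means no record fell back to the mock text backend. To check a file, pass an open handle, for example `find_fallback_records(open(path, encoding="utf-8"))`.
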
## Not Completed Yet
- Hosted Space text runtime validation with the published GGUF. The local runtime passed, but the public Space intentionally remains on mock text for the live MiniCPM-V vision release.
- Real text-model traces from the hosted Space.
- Field Notes publication URL, recorded demo video URL, social post URL, and final public submission.
## Final Checks
- [x] Space is under the official organization.
- [x] Space MiniCPM-V validation passes for mug, keyboard, and shoe.
- [x] Space is configured for live MiniCPM-V vision on ZeroGPU with mock text.
- [x] Space MiniCPM-V non-secret diagnostic probe is implemented locally.
- [x] Demo video script targets under 2 minutes.
- [x] README includes stable-baseline parameter budget and links to the model card.
- [x] No commercial cloud AI APIs are used.
- [x] Mock-safe local demo baseline is reproducible from committed sample traces.
- [x] Fine-tuned model is linked.
- [x] Dataset is linked.
- [x] Traces are linked.
- [ ] Field Notes are linked.
- [x] UI remains English-first and Chinese-second.
- [ ] Submission is complete before June 15, 2026.
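
The checklists in this guide use Markdown task-list syntax, so the remaining open items can be listed mechanically before submission. The sketch below is one way to do that; the `open_items` helper is a hypothetical name and not part of the repository's acceptance tooling.

```python
import re
from typing import List

# Matches "- [ ] text" or "- [x] text" (also "*" bullets and uppercase "X").
TASK_PATTERN = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def open_items(markdown: str) -> List[str]:
    """Return the text of every unchecked task-list item in a Markdown string."""
    items = []
    for line in markdown.splitlines():
        match = TASK_PATTERN.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items
```

Running it over this file's contents would list the unchecked entries, which makes a quick last gate before the final public submission.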