Submission Guide

Required Package

Local Evidence Ready

  • Initial mock MVP report: docs/INITIAL_STAGE_REPORT.md
  • Runtime boundary: docs/RUNTIME.md
  • Dataset plan and preview workflow: docs/DATASET.md
  • External setup checklist: docs/EXTERNAL_SETUP.md
  • Space VLM validation report: docs/SPACE_VLM_REPORT.md. Validation currently passes for public mug, keyboard, and shoe images on ZeroGPU with OBJECTVERSE_VISION_BACKEND=minicpm-v.
  • Space VLM diagnostics: hidden /vision_runtime_probe API confirms Torch/Transformers, CUDA, and MiniCPM-V model load status.
  • Live Space runtime: ZeroGPU zero-a10g with OBJECTVERSE_VISION_BACKEND=minicpm-v, VISION_MODEL_ID=openbmb/MiniCPM-V-2_6, and OBJECTVERSE_TEXT_BACKEND=mock.
  • Space VLM trace evidence: data/traces/space-vlm/
  • Public mock traces: data/traces/samples/
  • Stable demo baseline: Gradio example buttons replay committed sample traces first, then fall back to the live generation pipeline if a cached trace is missing.
  • Optional llama.cpp runtime wiring: src/models/llama_cpp_runner.py
  • Published LoRA v2 Q4_K_M GGUF: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora/blob/main/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
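
The stable demo baseline above (replay a committed sample trace first, fall back to live generation when none is cached) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `load_or_generate`, the one-JSON-file-per-example layout, and the `generate_live` callable are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable

# Assumption: one JSON file per example under the public sample trace directory.
SAMPLE_TRACE_DIR = Path("data/traces/samples")


def load_or_generate(
    example_id: str,
    generate_live: Callable[[str], dict],
    trace_dir: Path = SAMPLE_TRACE_DIR,
) -> tuple[dict, str]:
    """Return (trace, source), where source is "cached" or "live"."""
    trace_path = trace_dir / f"{example_id}.json"
    if trace_path.is_file():
        # Replay the committed trace so the demo output stays stable.
        trace = json.loads(trace_path.read_text(encoding="utf-8"))
        return trace, "cached"
    # No cached trace for this example: fall back to the live pipeline.
    return generate_live(example_id), "live"
```

Returning the source alongside the trace lets the UI label an output as cached or live while both go through the same formatting path.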

Completed Locally

  • Mock MVP flow, archive-style UI, share card, trace logging, sample traces, dataset preview, and initial acceptance tooling.
  • Stable local demo baseline with six replayable example outputs, shared cached/live UI formatting, chat wake state, share card, and trace panel output.
  • MiniCPM-V 2.6 backend wiring with fallback markers.
  • Optional llama.cpp text runtime wiring through TEXT_MODEL_PATH.
  • Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
  • Hosted Space VLM probe support, support for updating the latest failure note, and a passing MiniCPM-V ZeroGPU validation once an HF_TOKEN Space secret was added for gated model access.
  • Local GGUF smoke-test helper passed with models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf. The trace recorded llama-cpp text generation as the text runtime, with no text-fallback-to-mock marker.
  • Synthetic curated v2 SFT dataset published to Hugging Face Datasets: 200 rows, 40 objects, 5 personality modes.
  • Modal Qwen 1.5B LoRA v2 run completed and adapter published to Hugging Face Models.
  • LoRA v2 adapter merged into Qwen/Qwen2.5-1.5B-Instruct, converted with pinned llama.cpp, quantized to Q4_K_M, and uploaded to the same model repo.
  • Field Notes draft, demo video script, and social post draft for the stable submission package.
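
The optional text runtime and its fallback marker, as described in the items above, could be resolved along these lines. This is a hedged sketch under stated assumptions: `resolve_text_backend` and the returned dictionary shape are hypothetical, and treating any non-mock `OBJECTVERSE_TEXT_BACKEND` value as a request for a model file at `TEXT_MODEL_PATH` is a guess at the behavior, not the project's confirmed logic. Only the two environment variable names and the `text-fallback-to-mock` marker come from this guide.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_text_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick a text backend from environment-style settings (illustrative only)."""
    env = os.environ if env is None else env
    requested = env.get("OBJECTVERSE_TEXT_BACKEND", "mock")
    model_path = env.get("TEXT_MODEL_PATH", "")

    if requested == "mock":
        # Mock text was requested explicitly, so no fallback marker is needed.
        return {"backend": "mock", "markers": []}
    if model_path and Path(model_path).is_file():
        # A real backend was requested and the model file is present.
        return {"backend": requested, "model_path": model_path, "markers": []}
    # A real backend was requested but no usable model file was found:
    # degrade to mock text and record the marker in the trace.
    return {"backend": "mock", "markers": ["text-fallback-to-mock"]}
```

Passing a plain dictionary instead of reading `os.environ` keeps the check easy to exercise in a smoke test, for example `resolve_text_backend({"OBJECTVERSE_TEXT_BACKEND": "mock"})`.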

Not Completed Yet

  • Hosted Space text runtime validation with the published GGUF. The local runtime passed, but the public Space intentionally remains on mock text for the live MiniCPM-V vision release.
  • Real text-model traces from the hosted Space.
  • Field Notes publication URL, recorded demo video URL, social post URL, and final public submission.

Final Checks

  • Space is under the official organization.
  • Space MiniCPM-V validation passes for mug, keyboard, and shoe.
  • Space is configured for live MiniCPM-V vision on ZeroGPU with mock text.
  • Space MiniCPM-V non-secret diagnostic probe is implemented locally.
  • Demo video script targets under 2 minutes.
  • README includes stable-baseline parameter budget and links to the model card.
  • No commercial cloud AI APIs are used.
  • Mock-safe local demo baseline is reproducible from committed sample traces.
  • Fine-tuned model is linked.
  • Dataset is linked.
  • Traces are linked.
  • Field Notes are linked.
  • UI remains English-first and Chinese-second.
  • Submission is complete before June 15, 2026.