Runtime Configuration

Current Runtime

Local development defaults to deterministic mock paths:

  • OBJECTVERSE_VISION_BACKEND=mock
  • OBJECTVERSE_TEXT_BACKEND=mock

For local runs, this means:

  • object understanding is generated by src/models/vision_runner.py
  • persona, diary, and chat are generated by src/models/llama_cpp_runner.py
  • traces record the mock-runtime marker in the fallbacks field

No commercial cloud AI APIs are used.

The public Hugging Face Space is configured differently for the live demo:

OBJECTVERSE_VISION_BACKEND=minicpm-v
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
OBJECTVERSE_TEXT_BACKEND=mock

The Space should run on zero-a10g so @spaces.GPU can allocate GPU time for MiniCPM-V requests. The required HF_TOKEN for gated openbmb/MiniCPM-V-2_6 access is stored as a Space Secret and must not be committed.

For a local run, MiniCPM-V 2.6 vision can be enabled without changing the UI:

OBJECTVERSE_VISION_BACKEND=minicpm-v \
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6 \
OBJECTVERSE_TEXT_BACKEND=mock \
.venv/bin/python app.py

This only replaces object understanding. Persona generation, diary generation, and chat can remain mock or use the optional llama.cpp text path below.

Optional llama.cpp text generation can be enabled without changing the UI:

OBJECTVERSE_TEXT_BACKEND=llama-cpp \
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
.venv/bin/python app.py

For a hosted Space where the GGUF is stored on Hugging Face Hub instead of the local filesystem, configure the Hub source instead of TEXT_MODEL_PATH:

OBJECTVERSE_TEXT_BACKEND=llama-cpp
TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

TEXT_MODEL_REVISION is optional and defaults to the Hub repo default branch. If TEXT_MODEL_PATH is set, it takes precedence over Hub download variables.
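
The precedence rule above can be sketched as a small pure function. This is a minimal illustration, not the project's actual code: the function name `resolve_text_model_source` and the shape of the returned dictionary are assumptions; only the environment variable names come from this document.

```python
from typing import Mapping, Optional


def resolve_text_model_source(env: Mapping[str, str]) -> Optional[dict]:
    """Return where the GGUF should come from, or None if nothing is configured.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    local_path = env.get("TEXT_MODEL_PATH", "").strip()
    if local_path:
        # A local path takes precedence over any Hub download variables.
        return {"kind": "local", "path": local_path}

    repo_id = env.get("TEXT_MODEL_REPO_ID", "").strip()
    filename = env.get("TEXT_MODEL_FILENAME", "").strip()
    if repo_id and filename:
        # An empty revision becomes None, meaning the repo's default branch.
        revision = env.get("TEXT_MODEL_REVISION", "").strip() or None
        return {
            "kind": "hub",
            "repo_id": repo_id,
            "filename": filename,
            "revision": revision,
        }

    # Nothing usable is configured; the caller would fall back to mock text.
    return None
```

A `hub` result maps directly onto the `repo_id`, `filename`, and `revision` arguments of `huggingface_hub.hf_hub_download`, which returns the local path of the downloaded file.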

llama-cpp-python and huggingface_hub are installed by the Space runtime dependencies. A missing package, a missing model path, a download error, a model loading error, invalid JSON, or a schema validation error each causes a fallback to deterministic mock text generation.

The runtime trace intentionally records only whether an external GGUF path was configured, not the literal TEXT_MODEL_PATH, so local private paths do not leak into public traces.
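
One way to keep the literal path out of traces is to store only a boolean. The sketch below assumes a hypothetical helper `text_backend_trace_metadata` and hypothetical field names; the real trace schema may differ.

```python
from typing import Mapping


def text_backend_trace_metadata(env: Mapping[str, str]) -> dict:
    """Build trace metadata that never contains the literal model path.

    Hypothetical helper: field names are illustrative, not the real schema.
    """
    return {
        "text_backend": env.get("OBJECTVERSE_TEXT_BACKEND") or "mock",
        # Only a boolean is stored; the path itself stays out of the trace.
        "external_gguf_configured": bool(env.get("TEXT_MODEL_PATH", "").strip()),
    }
```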

Local LoRA v2 GGUF status:

  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • Adapter / GGUF repo: qqyule/objectverse-diary-qwen15b-lora
  • Published GGUF: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
  • Local smoke: passed on 2026-06-08 with llama-cpp text generation and no text-fallback-to-mock
  • Space runtime: live MiniCPM-V vision with mock text; not switched to llama.cpp text until a separate Space validation passes

Runtime Diagnostics

The Gradio app exposes two hidden diagnostic APIs:

  • /zero_gpu_probe: checks Torch import and CUDA visibility.
  • /vision_runtime_probe: checks configured vision backend, Torch/Transformers import, CUDA/MPS visibility, and MiniCPM-V load success or sanitized failure summaries.

These APIs are for validation scripts and are not visible in the main UI. They must not return tokens, .env paths, Hugging Face token markers, or private local filesystem paths.
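
A sanitized failure summary could be produced along these lines. This is a rough sketch under stated assumptions: the function name, the regular expressions, and the length cap are all hypothetical, and the patterns deliberately over-redact (any `hf_`-prefixed word and any absolute POSIX-style path are replaced).

```python
import re

# Assumption: token-like strings start with "hf_"; anything that looks like
# an absolute POSIX path is treated as private. Both patterns over-redact.
_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]+")
_PATH_RE = re.compile(r"(?<![\w.])/(?:[\w.\-]+/)*[\w.\-]+")


def sanitize_failure_summary(message: str, max_length: int = 200) -> str:
    """Redact token-like strings and absolute paths, then truncate."""
    cleaned = _TOKEN_RE.sub("[redacted-token]", message)
    cleaned = _PATH_RE.sub("[redacted-path]", cleaned)
    return cleaned[:max_length]
```

Because the path pattern requires that the leading slash is not preceded by a word character, a model ID such as openbmb/MiniCPM-V-2_6 passes through unchanged while /home/user/.env is redacted.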

scripts/check_space_vlm.py calls /vision_runtime_probe before the mug/keyboard/shoe validation run and writes the probe output into docs/SPACE_VLM_REPORT.md and docs/SPACE_VLM_REPORT.json.

Optional GGUF Smoke Test

Recommended LoRA v2 smoke model:

repo: qqyule/objectverse-diary-qwen15b-lora
file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
local path: models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

The models/ directory and *.gguf files are ignored by Git. After downloading the file separately and installing the optional llama-cpp-python package, run:

.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
  --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

A passing smoke test must show llama-cpp text generation and must not include text-fallback-to-mock in either generation or chat fallback markers.

Environment Variables

OBJECTVERSE_VISION_BACKEND=mock
OBJECTVERSE_TEXT_BACKEND=mock
VISION_MODEL_ID=
TEXT_MODEL_PATH=
TEXT_MODEL_REPO_ID=
TEXT_MODEL_FILENAME=
TEXT_MODEL_REVISION=
TRACE_OUTPUT_DIR=data/traces
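
As an illustration of how these defaults might be read at startup, here is a minimal sketch. The `RuntimeConfig` class and `load_runtime_config` function are hypothetical names; only the variable names and default values are taken from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical container for the documented environment variables."""
    vision_backend: str
    text_backend: str
    vision_model_id: str
    text_model_path: str
    text_model_repo_id: str
    text_model_filename: str
    text_model_revision: str
    trace_output_dir: str


def load_runtime_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read settings from env, treating unset or empty values as defaults."""
    env = os.environ if env is None else env
    return RuntimeConfig(
        vision_backend=env.get("OBJECTVERSE_VISION_BACKEND") or "mock",
        text_backend=env.get("OBJECTVERSE_TEXT_BACKEND") or "mock",
        vision_model_id=env.get("VISION_MODEL_ID", ""),
        text_model_path=env.get("TEXT_MODEL_PATH", ""),
        text_model_repo_id=env.get("TEXT_MODEL_REPO_ID", ""),
        text_model_filename=env.get("TEXT_MODEL_FILENAME", ""),
        text_model_revision=env.get("TEXT_MODEL_REVISION", ""),
        trace_output_dir=env.get("TRACE_OUTPUT_DIR") or "data/traces",
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests without touching the process environment.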

For the live hosted Space, set these Variables:

OBJECTVERSE_VISION_BACKEND=minicpm-v
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
OBJECTVERSE_TEXT_BACKEND=mock

Recommended Space hardware for this path is ZeroGPU zero-a10g. If live validation fails, use the rollback command in docs/DEVELOPMENT_STATUS.md to switch OBJECTVERSE_VISION_BACKEND back to mock and request cpu-basic.

For a Space or local runtime with a separately provided GGUF text model, set:

OBJECTVERSE_TEXT_BACKEND=llama-cpp
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf

For a Space runtime that should download the published LoRA v2 GGUF from Hub, set:

OBJECTVERSE_VISION_BACKEND=mock
OBJECTVERSE_TEXT_BACKEND=llama-cpp
TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

Do not commit GGUF files or private model paths.

Future Runtime Boundary

The next implementation phase should keep the same pipeline boundary:

  1. UI calls src/pipeline.py.
  2. src/pipeline.py calls the configured vision and text runners.
  3. runners return validated Pydantic schemas.
  4. trace logging records backend metadata and fallback markers.
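
The four steps above can be sketched as follows. Every name here is hypothetical, and plain dataclasses stand in for the Pydantic schemas so the example has no third-party dependencies; it only illustrates that the pipeline, not the UI, owns the runner calls and the trace.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ObjectUnderstanding:
    """Hypothetical stand-in for the validated vision schema."""
    name: str
    description: str


@dataclass
class DiaryResult:
    """Hypothetical stand-in for the validated text schema."""
    diary: str


@dataclass
class TraceRecord:
    """Backend metadata plus any fallback markers reported by the runners."""
    vision_backend: str
    text_backend: str
    fallbacks: List[str] = field(default_factory=list)


# Each runner returns its validated result and a list of fallback markers.
VisionRunner = Callable[[str], Tuple[ObjectUnderstanding, List[str]]]
TextRunner = Callable[[ObjectUnderstanding], Tuple[DiaryResult, List[str]]]


def run_pipeline(
    image_path: str,
    vision_runner: VisionRunner,
    text_runner: TextRunner,
    vision_backend: str,
    text_backend: str,
) -> Tuple[DiaryResult, TraceRecord]:
    """Call the configured runners in order and collect trace metadata."""
    understanding, vision_markers = vision_runner(image_path)
    diary, text_markers = text_runner(understanding)
    trace = TraceRecord(vision_backend, text_backend, vision_markers + text_markers)
    return diary, trace


def mock_vision(image_path: str) -> Tuple[ObjectUnderstanding, List[str]]:
    """Deterministic mock runner: ignores the image entirely."""
    return ObjectUnderstanding("mug", "a chipped ceramic mug"), []


def mock_text(obj: ObjectUnderstanding) -> Tuple[DiaryResult, List[str]]:
    """Deterministic mock runner: builds text from the vision result."""
    return DiaryResult(f"Dear diary, today I was {obj.description}."), []
```

A UI callback would then only call `run_pipeline("mug.png", mock_vision, mock_text, "mock", "mock")` and render the returned objects.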

Do not move model calls into src/ui/layout.py.

Fallback Rules

  • VLM unavailable: use manual description and mock/example gallery path.
  • llama.cpp unavailable: use mock text generation path and record text-fallback-to-mock.
  • invalid model JSON: repair and validate before rendering, then fall back to mock if validation fails.
  • private input: anonymize trace text before saving public traces.
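
The invalid-JSON rule can be illustrated with a small repair-then-validate helper. This is a sketch under assumptions: the function name, the single repair strategy (extracting the outermost brace-delimited span), and the required-key check are all hypothetical, and a key check stands in for full schema validation.

```python
import json
from typing import Callable, Iterable, List, Tuple


def parse_or_fallback(
    raw: str,
    required_keys: Iterable[str],
    mock_factory: Callable[[], dict],
) -> Tuple[dict, List[str]]:
    """Parse model output as JSON, attempt one repair, else use the mock."""
    required = list(required_keys)

    # First try the raw text, then the outermost {...} span as a repair.
    candidates = [raw]
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        candidates.append(raw[start : end + 1])

    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Minimal validation: a JSON object containing every required key.
        if isinstance(data, dict) and all(key in data for key in required):
            return data, []

    # Neither candidate validated: return mock output and record the marker.
    return mock_factory(), ["text-fallback-to-mock"]
```

For example, output wrapped in chatter such as `Sure! {"title": "a", "body": "b"} done` is recovered by the repair step, while output with a missing key falls through to the mock.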

Trace fallback markers:

  • mock-runtime: default mock vision and mock text runtime.
  • mock-text-runtime: real or configured vision path with mock text generation.
  • mock-vision-runtime: mock vision with a configured non-mock text backend.
  • vision-fallback-to-mock: MiniCPM-V failed or returned invalid JSON, so mock object understanding was used.
  • text-fallback-to-mock: llama.cpp was configured but unavailable, invalid, or unable to return schema-valid JSON.
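
Read together, the markers above suggest a selection rule like the one below. This is an interpretation of the list, not the project's actual logic: the function name and its boolean failure flags are hypothetical.

```python
from typing import List


def runtime_fallback_markers(
    vision_backend: str,
    text_backend: str,
    vision_failed: bool = False,
    text_failed: bool = False,
) -> List[str]:
    """Derive trace markers from the configured backends and any failures."""
    markers: List[str] = []
    vision_is_mock = vision_backend == "mock"
    text_is_mock = text_backend == "mock"

    # Exactly one configuration marker applies, or none if both are real.
    if vision_is_mock and text_is_mock:
        markers.append("mock-runtime")
    elif text_is_mock:
        markers.append("mock-text-runtime")
    elif vision_is_mock:
        markers.append("mock-vision-runtime")

    # Failure markers only make sense for a configured non-mock backend.
    if vision_failed and not vision_is_mock:
        markers.append("vision-fallback-to-mock")
    if text_failed and not text_is_mock:
        markers.append("text-fallback-to-mock")
    return markers
```

Under this reading, the live Space configuration (minicpm-v vision, mock text) yields mock-text-runtime, plus vision-fallback-to-mock whenever MiniCPM-V fails.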