	# Runtime Configuration

	## Current Runtime

	Local development defaults to deterministic mock paths:

	- `OBJECTVERSE_VISION_BACKEND=mock`
	- `OBJECTVERSE_TEXT_BACKEND=mock`

	For local runs, this means:

	- object understanding is generated by `src/models/vision_runner.py`
	- persona, diary, and chat are generated by `src/models/llama_cpp_runner.py`
	- traces record `mock-runtime` in the `fallbacks` field

	No commercial cloud AI APIs are used.
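The defaults above could be resolved from the environment roughly as follows. This is a minimal sketch, not the project's actual loader: `RuntimeConfig` and `load_runtime_config` are hypothetical names, and treating blank values as `mock` and lowercasing the value are assumptions. Only the two variable names and the `mock` default come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical container for the two backend selectors."""

    vision_backend: str
    text_backend: str

    @property
    def is_fully_mock(self) -> bool:
        # True when both paths use the deterministic mock runtime.
        return self.vision_backend == "mock" and self.text_backend == "mock"


def load_runtime_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read the backend selectors, treating unset or blank values as "mock"."""
    env = os.environ if env is None else env

    def pick(name: str) -> str:
        # Assumption: values are case-insensitive and surrounding spaces are ignored.
        value = (env.get(name) or "").strip().lower()
        return value or "mock"

    return RuntimeConfig(
        vision_backend=pick("OBJECTVERSE_VISION_BACKEND"),
        text_backend=pick("OBJECTVERSE_TEXT_BACKEND"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real process environment.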

	The public Hugging Face Space is configured differently for the live demo:

	```bash
	OBJECTVERSE_VISION_BACKEND=minicpm-v
	VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
	OBJECTVERSE_TEXT_BACKEND=mock
	```

	The Space should run on `zero-a10g` so `@spaces.GPU` can allocate GPU time for MiniCPM-V requests. The required `HF_TOKEN` for gated `openbmb/MiniCPM-V-2_6` access is stored as a Space Secret and must not be committed.

	To enable MiniCPM-V 2.6 vision for a local run without changing the UI:

	```bash
	OBJECTVERSE_VISION_BACKEND=minicpm-v \
	VISION_MODEL_ID=openbmb/MiniCPM-V-2_6 \
	OBJECTVERSE_TEXT_BACKEND=mock \
	.venv/bin/python app.py
	```

	This only replaces object understanding. Persona generation, diary generation, and chat can remain mock or use the optional llama.cpp text path below.

	Optional llama.cpp text generation can be enabled without changing the UI:

	```bash
	OBJECTVERSE_TEXT_BACKEND=llama-cpp \
	TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
	.venv/bin/python app.py
	```

	For a hosted Space where the GGUF is stored on Hugging Face Hub instead of the local filesystem, configure the Hub source instead of `TEXT_MODEL_PATH`:

	```bash
	OBJECTVERSE_TEXT_BACKEND=llama-cpp
	TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
	TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
	```

	`TEXT_MODEL_REVISION` is optional and defaults to the Hub repo default branch. If `TEXT_MODEL_PATH` is set, it takes precedence over Hub download variables.
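The precedence rule can be sketched as a small resolver. `TextModelSource` and `resolve_text_model_source` are hypothetical names used only for illustration. The sketch also assumes that a Hub source needs both `TEXT_MODEL_REPO_ID` and `TEXT_MODEL_FILENAME`, and that blank values count as unset.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TextModelSource:
    """Hypothetical description of where the GGUF file comes from."""

    kind: str                       # "local" or "hub"
    path: Optional[str] = None
    repo_id: Optional[str] = None
    filename: Optional[str] = None
    revision: Optional[str] = None  # None means the repo default branch


def resolve_text_model_source(env: Mapping[str, str]) -> Optional[TextModelSource]:
    """Return the configured GGUF source, or None when nothing usable is set."""

    def get(name: str) -> Optional[str]:
        value = (env.get(name) or "").strip()
        return value or None

    # A local path wins over any Hub download variables.
    path = get("TEXT_MODEL_PATH")
    if path:
        return TextModelSource(kind="local", path=path)

    repo_id = get("TEXT_MODEL_REPO_ID")
    filename = get("TEXT_MODEL_FILENAME")
    if repo_id and filename:
        return TextModelSource(
            kind="hub",
            repo_id=repo_id,
            filename=filename,
            revision=get("TEXT_MODEL_REVISION"),
        )
    return None
```

A `hub` result could then be handed to `huggingface_hub.hf_hub_download(repo_id=..., filename=..., revision=...)`, which returns the local path of the cached file. A `None` result would leave the caller on the mock text path.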

	`llama-cpp-python` and `huggingface_hub` are installed as part of the Space runtime dependencies. A missing package, a missing model path, a download error, a model loading error, invalid JSON, or a schema validation error each falls back to deterministic mock text generation.
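One way to get this "any failure falls back to mock" behavior is a thin wrapper around the real generator. The sketch below is illustrative only: `generate_with_fallback` and its callable parameters are hypothetical, and the broad `except Exception` is a deliberate assumption so that import, download, load, JSON, and schema failures all take the same route.

```python
from typing import Callable, Dict, List, Tuple

FALLBACK_MARKER = "text-fallback-to-mock"


def generate_with_fallback(
    real_generate: Callable[[str], Dict],
    mock_generate: Callable[[str], Dict],
    prompt: str,
) -> Tuple[Dict, List[str]]:
    """Return (result, fallbacks); any failure in the real path uses the mock."""
    try:
        return real_generate(prompt), []
    except Exception:
        # Covers ImportError (missing package), OSError (missing file or
        # download failure), and ValueError from JSON or schema checks.
        return mock_generate(prompt), [FALLBACK_MARKER]
```

Returning the marker list alongside the result lets the caller merge it into the trace `fallbacks` field without the wrapper knowing anything about trace storage.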

	The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
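A minimal sketch of that redaction, assuming a hypothetical `text_trace_metadata` helper and hypothetical field names (`text_backend`, `external_gguf_configured`). The point is only that the path is reduced to a boolean before anything is written.

```python
from typing import Dict, Mapping


def text_trace_metadata(env: Mapping[str, str]) -> Dict[str, object]:
    """Build trace metadata that never contains the literal model path."""
    path = (env.get("TEXT_MODEL_PATH") or "").strip()
    backend = (env.get("OBJECTVERSE_TEXT_BACKEND") or "").strip() or "mock"
    return {
        "text_backend": backend,
        # Record only the fact that a path was set, not its value.
        "external_gguf_configured": bool(path),
    }
```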

	Local LoRA v2 GGUF status:

	- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
	- Adapter / GGUF repo: `qqyule/objectverse-diary-qwen15b-lora`
	- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
	- Local smoke test: passed on 2026-06-08 with `llama-cpp text generation` and no `text-fallback-to-mock`
	- Space runtime: live MiniCPM-V vision with mock text; not switched to llama.cpp text until a separate Space validation passes

	## Runtime Diagnostics

	The Gradio app exposes two hidden diagnostic APIs:

	- `/zero_gpu_probe`: checks Torch import and CUDA visibility.
	- `/vision_runtime_probe`: checks configured vision backend, Torch/Transformers import, CUDA/MPS visibility, and MiniCPM-V load success or sanitized failure summaries.
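The Torch and CUDA part of such a probe might look like the sketch below. It is not the app's actual handler: `zero_gpu_probe_sketch` and the report keys are hypothetical. On ZeroGPU the check would presumably need to run inside a function decorated with `@spaces.GPU` for CUDA to be visible, which the sketch leaves out so that it also runs on machines without Torch.

```python
import importlib
from typing import Dict


def zero_gpu_probe_sketch() -> Dict[str, object]:
    """Report whether Torch imports and whether CUDA is visible."""
    report: Dict[str, object] = {
        "torch_importable": False,
        "torch_version": None,
        "cuda_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except Exception as exc:
        # Keep only the exception class name; messages may contain local paths.
        report["error"] = type(exc).__name__
        return report

    report["torch_importable"] = True
    report["torch_version"] = str(torch.__version__)
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report
```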

	These APIs are for validation scripts and are not visible in the main UI. They must not return tokens, `.env` paths, Hugging Face token markers, or private local filesystem paths.
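A sanitizing step for failure summaries could be sketched as below. The function name, the placeholder strings, and the regular expressions are all assumptions. The token pattern relies on Hugging Face access tokens starting with `hf_`; the path pattern is a heuristic for absolute or home-relative POSIX paths and does not cover Windows paths.

```python
import re

# Hugging Face user access tokens start with "hf_".
_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]+")
# Heuristic: a path that starts with "/" or "~/" and is not part of a
# URL or a "namespace/name" repo id (those have a word char before "/").
_PATH_RE = re.compile(r"(?<![\w.:/-])~?/(?:[\w.@+-]+/)*[\w.@+-]+")


def sanitize_failure_summary(message: str, max_len: int = 200) -> str:
    """Redact tokens and filesystem paths, then truncate the message."""
    cleaned = _TOKEN_RE.sub("[redacted-token]", message)
    cleaned = _PATH_RE.sub("[redacted-path]", cleaned)
    return cleaned[:max_len]
```

Repo ids such as `openbmb/MiniCPM-V-2_6` pass through unchanged, so the summary still says which model failed to load.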

	`scripts/check_space_vlm.py` calls `/vision_runtime_probe` before the mug/keyboard/shoe validation run and writes the probe output into `docs/SPACE_VLM_REPORT.md` and `docs/SPACE_VLM_REPORT.json`.

	## Optional GGUF Smoke Test

	Recommended LoRA v2 smoke model:

	```text
	repo: qqyule/objectverse-diary-qwen15b-lora
	file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
	local path: models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
	```

	The `models/` directory and `*.gguf` files are ignored by Git. After downloading the file separately and installing the optional `llama-cpp-python` package, run:

	```bash
	.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
	  --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
	```

	A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
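The pass criterion can be expressed as a small predicate. This is a sketch under assumptions: `smoke_passed` is a hypothetical helper, and it assumes the `llama-cpp text generation` label appears in the script's text output while the fallback markers arrive as two separate lists.

```python
from typing import Iterable

REQUIRED_LABEL = "llama-cpp text generation"
FORBIDDEN_MARKER = "text-fallback-to-mock"


def smoke_passed(
    output: str,
    generation_fallbacks: Iterable[str],
    chat_fallbacks: Iterable[str],
) -> bool:
    """True only if llama.cpp generated text and neither path fell back."""
    if REQUIRED_LABEL not in output:
        return False
    all_markers = set(generation_fallbacks) | set(chat_fallbacks)
    return FORBIDDEN_MARKER not in all_markers
```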

	## Environment Variables

	```bash
	OBJECTVERSE_VISION_BACKEND=mock
	OBJECTVERSE_TEXT_BACKEND=mock
	VISION_MODEL_ID=
	TEXT_MODEL_PATH=
	TEXT_MODEL_REPO_ID=
	TEXT_MODEL_FILENAME=
	TEXT_MODEL_REVISION=
	TRACE_OUTPUT_DIR=data/traces
	```

	For the live hosted Space, set these Variables:

	```bash
	OBJECTVERSE_VISION_BACKEND=minicpm-v
	VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
	OBJECTVERSE_TEXT_BACKEND=mock
	```

	The recommended Space hardware for this path is ZeroGPU `zero-a10g`. If live validation fails, use the rollback command in `docs/DEVELOPMENT_STATUS.md` to switch `OBJECTVERSE_VISION_BACKEND` back to `mock` and request `cpu-basic`.

	For a Space or local runtime with a separately provided GGUF text model, set:

	```bash
	OBJECTVERSE_TEXT_BACKEND=llama-cpp
	TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
	```

	For a Space runtime that should download the published LoRA v2 GGUF from Hub, set:

	```bash
	OBJECTVERSE_VISION_BACKEND=mock
	OBJECTVERSE_TEXT_BACKEND=llama-cpp
	TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
	TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
	```

	Do not commit GGUF files or private model paths.

	## Future Runtime Boundary

	The next implementation phase should keep the same pipeline boundary:

	1. UI calls `src/pipeline.py`.
	2. `src/pipeline.py` calls the configured vision and text runners.
	3. Runners return validated Pydantic schemas.
	4. Trace logging records backend metadata and fallback markers.

	Do not move model calls into `src/ui/layout.py`.
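The boundary described above can be sketched as a single pipeline function that the UI calls. Everything named here is hypothetical: the dataclasses stand in for the project's Pydantic schemas so the example has no third-party dependency, and the `(result, fallbacks)` return shape for runners is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ObjectUnderstanding:  # stand-in for a Pydantic schema
    label: str


@dataclass
class DiaryEntry:  # stand-in for a Pydantic schema
    text: str


@dataclass
class PipelineResult:
    understanding: ObjectUnderstanding
    diary: DiaryEntry
    trace: Dict[str, object] = field(default_factory=dict)


VisionRunner = Callable[[str], Tuple[ObjectUnderstanding, List[str]]]
TextRunner = Callable[[ObjectUnderstanding], Tuple[DiaryEntry, List[str]]]


def run_pipeline(
    image_ref: str,
    vision_runner: VisionRunner,
    text_runner: TextRunner,
    vision_backend: str = "mock",
    text_backend: str = "mock",
) -> PipelineResult:
    """The only place where runners are invoked; the UI never calls them."""
    understanding, vision_fallbacks = vision_runner(image_ref)
    diary, text_fallbacks = text_runner(understanding)
    trace = {
        "vision_backend": vision_backend,
        "text_backend": text_backend,
        "fallbacks": [*vision_fallbacks, *text_fallbacks],
    }
    return PipelineResult(understanding=understanding, diary=diary, trace=trace)
```

Because the runners are passed in, swapping a mock runner for a real one does not change the UI code or the shape of the trace.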

	## Fallback Rules

	- VLM unavailable: use manual description and mock/example gallery path.
	- llama.cpp unavailable: use mock text generation path and record `text-fallback-to-mock`.
	- invalid model JSON: repair and validate before rendering, then fall back to mock if validation fails.
	- private input: anonymize trace text before saving public traces.
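The "repair and validate" rule for model JSON might be approximated as follows. The repair strategy (keeping the text between the first `{` and the last `}`) and the validation (required non-empty string fields) are assumptions for illustration; the project validates against its own schemas. `repair_and_validate` is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Optional


def repair_and_validate(raw: str, required_keys: Iterable[str]) -> Optional[Dict]:
    """Return a validated dict, or None so the caller can fall back to mock."""
    # Repair step: drop any chatter around the outermost JSON object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Validation step: every required field must be a non-empty string.
    for key in required_keys:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            return None
    return data
```

A `None` result is the signal to use the mock output and record the matching fallback marker.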

	Trace fallback markers:

	- `mock-runtime`: default mock vision and mock text runtime.
	- `mock-text-runtime`: real or configured vision path with mock text generation.
	- `mock-vision-runtime`: mock vision with a configured non-mock text backend.
	- `vision-fallback-to-mock`: MiniCPM-V failed or returned invalid JSON, so mock object understanding was used.
	- `text-fallback-to-mock`: llama.cpp was configured but unavailable, invalid, or unable to return schema-valid JSON.
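The three configuration-level markers follow directly from which backends are set to `mock`. A minimal sketch, assuming a hypothetical `baseline_marker` helper and assuming that no baseline marker is recorded when both backends are non-mock (the list above does not name one for that case):

```python
from typing import Optional


def baseline_marker(vision_backend: str, text_backend: str) -> Optional[str]:
    """Map the configured backends to the matching trace marker."""
    vision_is_mock = vision_backend == "mock"
    text_is_mock = text_backend == "mock"
    if vision_is_mock and text_is_mock:
        return "mock-runtime"
    if text_is_mock:
        return "mock-text-runtime"
    if vision_is_mock:
        return "mock-vision-runtime"
    # Assumption: fully non-mock runs carry no baseline marker.
    return None
```

The two `*-fallback-to-mock` markers are different in kind: they describe a failure at request time rather than the configuration, so they would be appended by the runner that fell back.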