ObjectverseDiary / docs /FAILURES.md

qqyule

Deploy latest Objectverse Diary from fa09aac

dd6cefc verified 3 days ago

	# Failure Notes

	## Purpose

	This file tracks reproducible failures and fallback behavior for Objectverse Diary.

	Use it for model/runtime/deployment/data issues, not for UI polish notes.

	## Current Status

	MiniCPM-V 2.6 is wired as an optional vision backend. Hosted Space ZeroGPU validation now passes for public mug, keyboard, and shoe images after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model.

	The app exposes a hidden `/vision_runtime_probe` API, and `scripts/check_space_vlm.py` writes the probe output into the Space VLM report before image validation. The probe identified the previous failure as a gated-model access issue rather than a GPU or dependency issue.

	The published LoRA v2 GGUF for local text smoke testing is available and has passed a local llama.cpp smoke test:

	- repo: `qqyule/objectverse-diary-qwen15b-lora`
	- file: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
	- helper: `scripts/check_llama_cpp_smoke.py`

	Known non-blocking warning:

	- Gradio emits deprecation warnings for upcoming 6.0 API changes during local tests. These do not break the current Gradio Blocks build and can be handled in the later UI/API polish pass.

	## Failure Record Template

	```markdown
	## YYYY-MM-DD - Short Failure Name

	- Area:
	- Reproduction:
	- Expected:
	- Actual:
	- Impact:
	- Fallback used:
	- Resolution:
	- Evidence:
	```
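
A record in this format can also be generated from code so that field names and order stay consistent. The following is a minimal sketch, not part of the repository: `FailureRecord` and `render_record` are hypothetical names, and only the field list comes from the template above.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class FailureRecord:
    """One entry in this file; field order mirrors the template."""
    area: str
    reproduction: str
    expected: str
    actual: str
    impact: str
    fallback_used: str
    resolution: str
    evidence: str


def render_record(day: date, name: str, record: FailureRecord) -> str:
    """Render a record as the markdown block used in this file."""
    lines = [f"## {day.isoformat()} - {name}", ""]
    for f in fields(record):
        # Turn the attribute name into the template label,
        # e.g. "fallback_used" becomes "Fallback used".
        label = f.name.replace("_", " ").capitalize()
        lines.append(f"- {label}: {getattr(record, f.name)}")
    return "\n".join(lines)
```

Calling `render_record(date(2026, 6, 8), "Short Failure Name", record)` returns a heading line followed by the eight bullet fields in template order.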

	## 2026-06-08 - Hosted ZeroGPU MiniCPM-V Falls Back To Mock

	- Area: Hugging Face Space vision runtime.
	- Reproduction: Run `scripts/check_space_vlm.py` with `--configure-space --hardware zero-a10g --rollback-to-mock` against `build-small-hackathon/ObjectverseDiary`.
	- Expected: mug, keyboard, and shoe validations use `minicpm-v object understanding` without `vision-fallback-to-mock`.
	- Actual: all three validations returned schema-valid traces, but every trace included `vision-fallback-to-mock`.
	- Impact: hosted Space MiniCPM-V evidence was not ready for submission at the time; the stable mock demo remained usable.
	- Fallback used: mock object understanding plus mock text runtime.
	- Resolution: resolved in the following entry. The `/vision_runtime_probe` output showed a gated-model access failure, which was fixed by adding an `HF_TOKEN` Space secret.
	- Evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, and `data/traces/space-vlm/`.

	## 2026-06-08 - Hosted ZeroGPU MiniCPM-V Validation Passes After HF_TOKEN Secret

	- Area: Hugging Face Space vision runtime.
	- Reproduction: Run `scripts/check_space_vlm.py` with `--configure-space --hardware zero-a10g --rollback-to-mock` against `build-small-hackathon/ObjectverseDiary`.
	- Expected: mug, keyboard, and shoe validations use `minicpm-v object understanding` without `vision-fallback-to-mock`.
	- Actual: all three validations passed; probe reported `minicpm_load_ok=True`; traces include only `mock-text-runtime`.
	- Impact: OpenBMB / hosted MiniCPM-V vision evidence is ready. Public Space still rolls back to mock-safe defaults after validation.
	- Fallback used: none for vision; the text runtime stayed on mock.
	- Resolution: resolved by adding an `HF_TOKEN` Space secret with gated model access.
	- Evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, and `data/traces/space-vlm/`.
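
The pass/fail rule in the two entries above reduces to checking each trace for the `vision-fallback-to-mock` marker. The sketch below illustrates that rule only. It assumes each trace can be reduced to a list of marker strings keyed by image name, which is a hypothetical shape and not necessarily the format stored in `data/traces/space-vlm/`.

```python
VISION_FALLBACK = "vision-fallback-to-mock"


def failed_checks(traces: dict[str, list[str]]) -> list[str]:
    """Return one message per image whose trace carries the vision fallback marker."""
    return [
        f"{name}: vision fallback marker was present"
        for name, markers in traces.items()
        if VISION_FALLBACK in markers
    ]


# Markers taken from the two entries above: the first run fell back for vision,
# the second run kept only the mock text marker.
failing_run = {"mug": ["vision-fallback-to-mock", "mock-text-runtime"]}
passing_run = {"mug": ["mock-text-runtime"]}
```

An empty result means every image used real vision output; a trace that only contains `mock-text-runtime` still passes, because that marker concerns the text runtime.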

	## Previous Space VLM Validation Failure

	- Updated: 2026-06-08 08:33:19 UTC
	- Area: Hugging Face Space vision runtime.
	- Probe backend: `minicpm-v`
	- MiniCPM load attempted: `True`
	- MiniCPM load ok: `False`
	- Probe errors: `minicpm_load=OSError`
	- Failed checks: mug: vision fallback marker was present; keyboard: vision fallback marker was present; shoe: vision fallback marker was present
	- Fallback used: mock object understanding plus mock text runtime if validation reaches generation.
	- Resolution: unresolved at the time of this report and since superseded by the passing validation recorded above. The public Space was kept mock-safe until a passing VLM validation was reported.

	## 2026-06-08 - LoRA v2 GGUF Local Smoke Passed

	- Area: llama.cpp text runtime evidence.
	- Reproduction: Run `scripts/check_llama_cpp_smoke.py` with `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` after installing the optional `llama-cpp-python` dependency.
	- Expected: trace records `llama-cpp text generation`, and persona, diary, and chat run without `text-fallback-to-mock`.
	- Actual: passed locally; trace included only `mock-vision-runtime` because text was real and vision remained mock for the smoke input.
	- Impact: local llama.cpp text runtime evidence is ready. Public Space text runtime is still not validated with this GGUF.
	- Fallback used: none for text.
	- Resolution: resolved locally by using the merged LoRA v2 Q4_K_M GGUF and conservative JSON extraction / decoding settings.
	- Evidence: `scripts/check_llama_cpp_smoke.py`, `docs/RUNTIME.md`, and the Hub file in `qqyule/objectverse-diary-qwen15b-lora`.

	## 2026-06-08 - Hugging Face Xet GGUF Upload Stalled

	- Area: Hugging Face model file upload.
	- Reproduction: Upload `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` with the default Hub client path.
	- Expected: upload completes and commits `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
	- Actual: the first upload stalled with Xet TLS EOF / `CLOSE_WAIT` after partial progress.
	- Impact: upload needed a retry; local GGUF file was unaffected.
	- Fallback used: stopped the stalled upload process and retried with `HF_HUB_DISABLE_XET=1`.
	- Resolution: resolved; ordinary Hub/LFS upload succeeded.
	- Evidence: Hub file `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora/blob/main/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
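
The manual recovery above (stop the stalled process, retry with `HF_HUB_DISABLE_XET=1`) can be scripted. This is a rough sketch under stated assumptions: it shells out to the `huggingface-cli upload` command, the 30-minute timeout is an arbitrary illustrative value, and the helper names are hypothetical rather than taken from the repository.

```python
import os
import subprocess


def env_without_xet(base=None):
    """Copy the environment and disable the Xet transfer path."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_DISABLE_XET"] = "1"
    return env


def upload_command(repo_id, local_path, path_in_repo):
    """Build the CLI call: huggingface-cli upload <repo> <local> <remote>."""
    return ["huggingface-cli", "upload", repo_id, local_path, path_in_repo]


def upload_with_retry(repo_id, local_path, path_in_repo,
                      timeout_s=1800, run=subprocess.run):
    """Try the default upload path once, then retry with Xet disabled."""
    cmd = upload_command(repo_id, local_path, path_in_repo)
    try:
        # subprocess.run kills the child process when the timeout expires,
        # which stands in for stopping the stalled upload by hand.
        run(cmd, check=True, timeout=timeout_s)
        return "default"
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        run(cmd, check=True, timeout=timeout_s, env=env_without_xet())
        return "xet-disabled"
```

The return value records which path succeeded, so the outcome can be noted in the failure record. If the retry also fails, the exception propagates to the caller.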

	## Anticipated Failure Areas

	### Vision Runtime

	- MiniCPM-V or fallback VLM fails to load in local or Space environment.
	- Uploaded image is unsupported, too large, or not an object photo.
	- Model output is not valid JSON.
	- Object recognition is too vague for persona generation.

	Fallback:

	- use manual object description
	- use stable example flow
	- record fallback marker in trace
	- `vision-fallback-to-mock` means MiniCPM-V failed or returned invalid JSON and mock object understanding was used.
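
The fallback rule above can be sketched as a wrapper around the vision call. This is an illustration of the pattern, not the app's code: `understand_object`, `mock_understanding`, and the mock payload are hypothetical, and the trace is assumed to be a plain list of marker strings.

```python
import json
from typing import Callable

VISION_FALLBACK = "vision-fallback-to-mock"


def mock_understanding() -> dict:
    """Placeholder mock result; the real mock payload is not shown in this file."""
    return {"object": "unknown object", "source": "mock"}


def understand_object(run_vlm: Callable[[], str], trace: list[str]) -> dict:
    """Return parsed VLM output, or the mock result plus a fallback marker."""
    try:
        parsed = json.loads(run_vlm())
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        return parsed
    except Exception:
        # Covers both load/runtime errors (for example OSError) and invalid JSON.
        trace.append(VISION_FALLBACK)
        return mock_understanding()
```

Catching every exception keeps the demo running, at the cost of hiding the cause, which is why a separate probe or log line for the original error is still useful.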

	### Text Runtime

	- GGUF model path is missing.
	- llama.cpp or llama-cpp-python import fails.
	- Generation output does not match `PersonaEnvelope` or `DiaryEntry` schema.
	- Chat response loses persona consistency.

	Fallback:

	- use deterministic mock text runtime
	- repair JSON before schema validation
	- record fallback marker in trace
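
One conservative way to repair JSON before schema validation is to pull the first decodable JSON object out of the raw generation and discard any surrounding chatter. The sketch below uses only the standard library. `extract_first_json_object` is a hypothetical helper, and the app's actual repair logic may differ.

```python
import json


def extract_first_json_object(text: str):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = text.find("{", start + 1)
    return None
```

A `None` result is the signal to switch to the deterministic mock text runtime and record the fallback marker; a returned dict still has to pass `PersonaEnvelope` or `DiaryEntry` validation.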

	### Dataset And Fine-Tuning

	- Candidate samples contain private information.
	- JSONL rows fail to parse.
	- Curated data is too repetitive.
	- Training run fails or adapter quality is weaker than mock prompts.

	Fallback:

	- keep mock preview data for schema validation only
	- separate raw candidates from curated rows
	- publish only reviewed examples
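
For the JSONL parsing failure, one option is to collect bad rows with their line numbers instead of aborting on the first error, so raw candidates can be reviewed separately from rows that parse. A minimal sketch, with `split_jsonl` as a hypothetical helper name:

```python
import json


def split_jsonl(lines):
    """Split JSONL lines into parsed rows and (line number, reason) errors."""
    rows, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines without reporting them
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, exc.msg))
            continue
        if isinstance(row, dict):
            rows.append(row)
        else:
            errors.append((lineno, "row is not a JSON object"))
    return rows, errors
```

This only checks syntax and top-level shape. Screening for private information and repetition still requires manual review before anything is published.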

	### Hugging Face Space

	- Space cannot start because of missing packages.
	- Model files are too large or not available.
	- CPU runtime is too slow.
	- Secrets or private paths appear in logs.

	Fallback:

	- launch with mock runtime first
	- keep model files out of the repo
	- document runtime mode clearly in README
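
Launching with the mock runtime first can be enforced by making mock the default whenever no backend is explicitly requested. The sketch below assumes a hypothetical `OBJECTVERSE_VISION_BACKEND` environment variable; the real configuration mechanism is not described in this file. The backend names `mock` and `minicpm-v` follow the probe output above.

```python
import os

KNOWN_BACKENDS = {"mock", "minicpm-v"}


def select_vision_backend(env=None) -> str:
    """Pick the vision backend, falling back to mock for missing or unknown values."""
    env = os.environ if env is None else env
    # OBJECTVERSE_VISION_BACKEND is an illustrative name, not a documented setting.
    requested = env.get("OBJECTVERSE_VISION_BACKEND", "mock").strip().lower()
    return requested if requested in KNOWN_BACKENDS else "mock"
```

With this shape, a Space that starts without any configuration stays on the mock runtime, and a typo in the variable cannot accidentally trigger a model download.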