	# Model Card

	## Status

	The project ships a stable local baseline, live MiniCPM-V vision in the hosted Space, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text. The GGUF has passed a local llama.cpp smoke test, but it has not been switched into the live Space runtime.

	Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 run completed, the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`, and the merged Q4_K_M GGUF is published in the same repo.

	Hosted MiniCPM-V validation passed after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See `docs/SPACE_VLM_REPORT.md`.

	## Planned Components

	- Vision understanding: MiniCPM-V or lightweight fallback VLM.
	- Text generation: fine-tuned small LLM.
	- Runtime: llama.cpp / llama-cpp-python.

	## Candidate Architecture

	| Component | Candidate | Notes |
	| --- | --- | --- |
	| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. |
	| Text | deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. |
	| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke passed on 2026-06-08. |
	| UI | Gradio Blocks | Required by the hackathon and project rules. |

	## Parameter Budget

	Total model parameters must remain <= 32B.

	Record final numbers here before submission:

	| Component | Model | Parameters | Counted Toward Total |
	| --- | --- | ---: | --- |
	| Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
	| Text base | Stable baseline mock text | 0 | no model parameters |
	| Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
	| Published LoRA v2 GGUF | `qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | ~1.5B base, quantized file | yes, if enabled |
	| Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
	| Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B |

	If the MiniCPM-V 2.6 vision path (~8B) and the optional `Qwen/Qwen2.5-1.5B-Instruct` text base (~1.5B) are both enabled, the expected total is about 9.5B plus a small LoRA adapter, well under the 32B project budget.
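
The budget arithmetic above can be checked with a small script. This is a minimal sketch: the parameter counts are the approximate figures from the table, the LoRA adapter is left out because this card does not state its size, and `BUDGET_B`, `COMPONENTS_B`, and `total_params_b` are illustrative names rather than project code.

```python
# Approximate parameter counts in billions, copied from the table above.
BUDGET_B = 32.0

COMPONENTS_B = {
    "vision: MiniCPM-V 2.6": 8.0,
    "text: mock": 0.0,
    "text: Qwen2.5-1.5B-Instruct": 1.5,
}


def total_params_b(enabled):
    """Sum the approximate parameter counts (in billions) of the enabled components."""
    return sum(COMPONENTS_B[name] for name in enabled)


# Configuration used by the live Space: real vision, mock text.
live_space = total_params_b(["vision: MiniCPM-V 2.6", "text: mock"])
# Configuration with the optional text base switched on as well.
all_enabled = total_params_b(["vision: MiniCPM-V 2.6", "text: Qwen2.5-1.5B-Instruct"])

for label, total in [("live Space", live_space), ("vision + Qwen text", all_enabled)]:
    status = "within" if total <= BUDGET_B else "over"
    print(f"{label}: ~{total}B ({status} the {BUDGET_B:g}B budget)")
```

Keeping the counts in one mapping means a change of vision or text model only needs one entry updated before the check is rerun.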

	## Intended Inputs And Outputs

	Inputs:

	- user-uploaded everyday object photo
	- optional object description
	- personality mode

	Outputs:

	- structured object understanding JSON
	- hidden object persona JSON
	- short English-first diary with Chinese helper text
	- object chat response
	- share card preview
	- anonymized trace record

	## Dataset Notes

	Dataset planning lives in `docs/DATASET.md`.

	Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.

	The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume. `data/train/objectverse_sft_curated_v2.jsonl` contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated` as `objectverse_sft_curated_v2.jsonl`.

	Published adapter:

	```text
	https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
	```

	Current v2 training run summary:

	- Platform: Modal
	- Run name: `objectverse-diary-qwen15b-lora-v2`
	- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
	- Dataset: 200 synthetic curated v2 rows
	- Train / eval rows: 180 / 20
	- Steps: 120
	- Max sequence length: 1536
	- Learning rate: 0.0001
	- Effective batch size: 8
	- LoRA rank / alpha / dropout: 32 / 64 / 0.05
	- Assistant-output-only loss: enabled
	- Train loss: 0.3240
	- Eval loss: 0.0162
	- Epoch: 5.2222
	- GGUF conversion: completed with pinned `llama.cpp` commit `8f83d6c271d194bde2d410145a0ce73bc42e85cd`
	- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
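
As a quick consistency check on the figures above, the sketch below restates them as plain Python values and derives two quantities: the eval fraction and the LoRA scaling factor. It assumes the standard `alpha / rank` scaling, since this card does not say whether a variant such as rank-stabilized scaling was used. The dictionary keys are illustrative and are not taken from the training script.

```python
run = {
    "dataset_rows": 200,
    "train_rows": 180,
    "eval_rows": 20,
    "objects": 40,
    "personality_modes": 5,
    "lora_rank": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
}

# The train/eval split should account for every curated row.
assert run["train_rows"] + run["eval_rows"] == run["dataset_rows"]

# One row per (object, personality mode) pair would give this many rows.
rows_per_pair_grid = run["objects"] * run["personality_modes"]

eval_fraction = run["eval_rows"] / run["dataset_rows"]
lora_scaling = run["lora_alpha"] / run["lora_rank"]  # standard LoRA: alpha / r

print(f"grid size: {rows_per_pair_grid}")
print(f"eval fraction: {eval_fraction:.2f}")
print(f"LoRA scaling: {lora_scaling}")
```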

	GGUF smoke status:

	- Repo: `qqyule/objectverse-diary-qwen15b-lora`
	- File: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
	- Local helper: `scripts/check_llama_cpp_smoke.py`
	- Local result: passed on 2026-06-08 with `llama-cpp text generation`, no `text-fallback-to-mock`, schema-valid persona and diary, and non-empty chat reply.
	- Space result: not run; do not claim live Space text runtime until a separate Space validation passes.
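
The local pass criteria listed above can be expressed as a small check. This is a sketch only: it does not reproduce `scripts/check_llama_cpp_smoke.py`, and the result-dictionary keys (`backend_label`, `flags`, `persona_valid`, `diary_valid`, `chat_reply`) are hypothetical names chosen for illustration.

```python
def smoke_failures(result):
    """Return a list of human-readable reasons the smoke run did not pass."""
    failures = []
    if result.get("backend_label") != "llama-cpp text generation":
        failures.append("text backend was not llama.cpp")
    if "text-fallback-to-mock" in result.get("flags", []):
        failures.append("runtime fell back to mock text")
    if not result.get("persona_valid", False):
        failures.append("persona JSON failed schema validation")
    if not result.get("diary_valid", False):
        failures.append("diary JSON failed schema validation")
    if not str(result.get("chat_reply", "")).strip():
        failures.append("chat reply was empty")
    return failures


# Hypothetical results: one clean run and one that silently fell back to mock.
passing = {
    "backend_label": "llama-cpp text generation",
    "flags": [],
    "persona_valid": True,
    "diary_valid": True,
    "chat_reply": "Hello from your mug.",
}
fallback = dict(passing, flags=["text-fallback-to-mock"])

print(smoke_failures(passing))
print(smoke_failures(fallback))
```

Returning the list of failures, rather than a single boolean, keeps the reason visible when a run only looks successful because the mock fallback produced valid output.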

	## Safety And Privacy

	- Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
	- Do not publish private user photos or unconsented personal data.
	- Do not include tokens, credit codes, emails, serial numbers, or credentials.
	- Keep raw private traces out of public datasets.

	## Fallback Behavior

	- If VLM loading fails, use manual description and stable example flow.
	- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
	- If model JSON is invalid, repair and validate before rendering.
	- Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
	- Hosted VLM validation evidence is preserved in `data/traces/space-vlm/`. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.
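
The text-backend rules above can be sketched as a selection function plus a trace record that never stores the literal path. This is an illustration under stated assumptions, not the project's implementation: `select_text_backend`, `trace_record`, and the trace keys are hypothetical names, and only the pre-load checks are covered. Failures that happen later, such as a model load error or invalid output JSON, would still need to fall back to mock at that point.

```python
import importlib.util
import os


def select_text_backend(env=None):
    """Pick "llama-cpp" only when every precondition holds; otherwise "mock"."""
    env = os.environ if env is None else env
    model_path = env.get("TEXT_MODEL_PATH", "")
    if not model_path:
        return "mock", "TEXT_MODEL_PATH is not set"
    if importlib.util.find_spec("llama_cpp") is None:
        return "mock", "llama-cpp-python is not installed"
    if not os.path.isfile(model_path):
        return "mock", "GGUF file not found"
    return "llama-cpp", "external GGUF configured"


def trace_record(env=None):
    """Build a trace entry that records whether a path is set, never the path itself."""
    env = os.environ if env is None else env
    backend, reason = select_text_backend(env)
    return {
        "text_backend": backend,
        "reason": reason,
        "external_gguf_configured": bool(env.get("TEXT_MODEL_PATH")),
    }


print(select_text_backend({}))
print(trace_record({"TEXT_MODEL_PATH": "/tmp/does-not-exist.gguf"}))
```

Passing the environment as a mapping makes the function easy to exercise in tests without touching the real process environment.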

	## Required Notes

	- Total model parameter count must remain <= 32B.
	- No commercial model APIs.
	- Fallback behavior must be documented.
	- Dataset provenance and privacy rules must be documented before release.