Spaces:

build-small-hackathon
/

ObjectverseDiary

Running on Zero

App Files Files Community

ObjectverseDiary / docs /MODEL_CARD.md

qqyule

Deploy live MiniCPM-V vision defaults

0cadcec verified 3 days ago

preview code

raw

history blame contribute delete

6.24 kB

A newer version of the Gradio SDK is available: 6.17.3

Upgrade

Model Card

Status

Stable local baseline plus live MiniCPM-V Space vision, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text; the GGUF has passed local llama.cpp smoke, but it has not been switched into the live Space runtime.

Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via TEXT_MODEL_PATH, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 run completed, the adapter is published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora, and the merged Q4_K_M GGUF is published in the same repo.

Hosted MiniCPM-V validation passed after adding an HF_TOKEN Space secret with access to the gated openbmb/MiniCPM-V-2_6 model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See docs/SPACE_VLM_REPORT.md.

Planned Components

Vision understanding: MiniCPM-V or lightweight fallback VLM.
Text generation: fine-tuned small LLM.
Runtime: llama.cpp / llama-cpp-python.

Candidate Architecture

Component	Candidate	Notes
Vision	`openbmb/MiniCPM-V-2_6` or mock fallback	Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock.
Text	deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime	Adapter and GGUF published; Space text runtime remains mock for the live vision release.
Runtime	optional GGUF through llama.cpp / llama-cpp-python	Wired with mock fallback; local GGUF smoke passed on 2026-06-08.
UI	Gradio Blocks	Required by the hackathon and project rules.

Parameter Budget

Total model parameters must remain <= 32B.

Record final numbers here before submission:

Component	Model	Parameters	Counted Toward Total
Vision	MiniCPM-V 2.6 optional path	~8B	yes, when enabled
Text base	Stable baseline mock text	0	no model parameters
Optional text base	`Qwen/Qwen2.5-1.5B-Instruct`	~1.5B	yes, when enabled
Published LoRA v2 GGUF	`qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`	~1.5B base, quantized file	yes, if enabled
Published LoRA adapter	`qqyule/objectverse-diary-qwen15b-lora`	small adapter over base model	yes, when enabled
Live Space total	MiniCPM-V vision + mock text	~8B active model parameters	<= 32B

If the optional MiniCPM-V 2.6 vision path and planned Qwen 1.5B text base are both enabled, the expected total remains about 9.5B plus a small LoRA adapter, safely under the 32B project budget.

Intended Inputs And Outputs

Inputs:

user-uploaded everyday object photo
optional object description
personality mode

Outputs:

structured object understanding JSON
hidden object persona JSON
short English-first diary with Chinese helper text
object chat response
share card preview
anonymized trace record

Dataset Notes

Dataset planning lives in docs/DATASET.md.

Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.

The Modal training scaffold defaults to Qwen/Qwen2.5-1.5B-Instruct and saves adapter artifacts to a Modal Volume. data/train/objectverse_sft_curated_v2.jsonl contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated as objectverse_sft_curated_v2.jsonl.

Published adapter:

https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora

Current v2 training run summary:

Platform: Modal
Run name: objectverse-diary-qwen15b-lora-v2
Base model: Qwen/Qwen2.5-1.5B-Instruct
Dataset: 200 synthetic curated v2 rows
Train / eval rows: 180 / 20
Steps: 120
Max sequence length: 1536
Learning rate: 0.0001
Effective batch size: 8
LoRA rank / alpha / dropout: 32 / 64 / 0.05
Assistant-output-only loss: enabled
Train loss: 0.3240
Eval loss: 0.0162
Epoch: 5.2222
GGUF conversion: completed with pinned llama.cpp commit 8f83d6c271d194bde2d410145a0ce73bc42e85cd
Published GGUF: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

GGUF smoke status:

Repo: qqyule/objectverse-diary-qwen15b-lora
File: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
Local helper: scripts/check_llama_cpp_smoke.py
Local result: passed on 2026-06-08 with llama-cpp text generation, no text-fallback-to-mock, schema-valid persona and diary, and non-empty chat reply.
Space result: not run; do not claim live Space text runtime until a separate Space validation passes.

Safety And Privacy

Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
Do not publish private user photos or unconsented personal data.
Do not include tokens, credit codes, emails, serial numbers, or credentials.
Keep raw private traces out of public datasets.

Fallback Behavior

If VLM loading fails, use manual description and stable example flow.
If llama.cpp is not installed, TEXT_MODEL_PATH is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
If model JSON is invalid, repair and validate before rendering.
Runtime traces do not record literal TEXT_MODEL_PATH; they only record that an external GGUF path is configured.
Hosted VLM validation evidence is preserved in data/traces/space-vlm/. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.

Required Notes

Total model parameter count must remain <= 32B.
No commercial model APIs.
Fallback behavior must be documented.
Dataset provenance and privacy rules must be documented before release.