Model Card
Status
Stable local baseline plus live MiniCPM-V Space vision, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text. The GGUF has passed a local llama.cpp smoke test but has not been switched into the live Space runtime.
Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via TEXT_MODEL_PATH, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 run completed, the adapter is published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora, and the merged Q4_K_M GGUF is published in the same repo.
Hosted MiniCPM-V validation passed after adding an HF_TOKEN Space secret with access to the gated openbmb/MiniCPM-V-2_6 model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See docs/SPACE_VLM_REPORT.md.
Planned Components
- Vision understanding: MiniCPM-V or lightweight fallback VLM.
- Text generation: fine-tuned small LLM.
- Runtime: llama.cpp / llama-cpp-python.
Candidate Architecture
| Component | Candidate | Notes |
|---|---|---|
| Vision | openbmb/MiniCPM-V-2_6 or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. |
| Text | deterministic mock text by default; published Qwen/Qwen2.5-1.5B-Instruct LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. |
| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke passed on 2026-06-08. |
| UI | Gradio Blocks | Required by the hackathon and project rules. |
Parameter Budget
Total model parameters must remain <= 32B.
Record final numbers here before submission:
| Component | Model | Parameters | Counted Toward Total |
|---|---|---|---|
| Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
| Text base | Stable baseline mock text | 0 | no model parameters |
| Optional text base | Qwen/Qwen2.5-1.5B-Instruct | ~1.5B | yes, when enabled |
| Published LoRA v2 GGUF | qqyule/objectverse-diary-qwen15b-lora / objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf | ~1.5B base, quantized file | yes, if enabled |
| Published LoRA adapter | qqyule/objectverse-diary-qwen15b-lora | small adapter over base model | yes, when enabled |
| Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B |
If the optional MiniCPM-V 2.6 vision path (~8B) and the optional Qwen2.5-1.5B text base (~1.5B) are both enabled, the expected total is about 9.5B plus a small LoRA adapter, well under the 32B project budget.
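The budget arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the approximate figures from the table above; the dictionary keys and the helper name are illustrative, and the LoRA adapter is left out because its exact size is not stated here.

```python
# Approximate parameter counts in billions, copied from the budget table.
PARAMETER_BUDGET_B = 32.0

COMPONENTS_B = {
    "vision_minicpm_v_2_6": 8.0,  # ~8B, counted when the vision path is enabled
    "text_qwen2_5_1_5b": 1.5,     # ~1.5B, counted when the text model is enabled
    "text_mock": 0.0,             # deterministic mock text has no model parameters
}


def total_params_b(enabled):
    """Sum the approximate parameter counts (in billions) of enabled components."""
    return sum(COMPONENTS_B[name] for name in enabled)


# Configuration of the live Space: real vision, mock text.
live_space = total_params_b(["vision_minicpm_v_2_6", "text_mock"])
# Configuration with both optional model paths enabled.
full_stack = total_params_b(["vision_minicpm_v_2_6", "text_qwen2_5_1_5b"])

assert live_space <= PARAMETER_BUDGET_B
assert full_stack <= PARAMETER_BUDGET_B
print(f"live Space: ~{live_space}B, vision + Qwen text: ~{full_stack}B")
```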
Intended Inputs And Outputs
Inputs:
- user-uploaded everyday object photo
- optional object description
- personality mode
Outputs:
- structured object understanding JSON
- hidden object persona JSON
- short English-first diary with Chinese helper text
- object chat response
- share card preview
- anonymized trace record
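As a rough illustration of how these inputs and outputs might be carried through the app, the sketch below models them as dataclasses. All class and field names are hypothetical and chosen only to mirror the two lists above; the project's actual schemas are not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ObjectRequest:
    """Inputs listed above (hypothetical field names)."""
    photo_path: str                    # user-uploaded everyday object photo
    personality_mode: str              # selected personality mode
    description: Optional[str] = None  # optional object description


@dataclass
class ObjectResult:
    """Outputs listed above; JSON payloads are kept as plain dicts."""
    object_understanding: Dict[str, Any]       # structured object understanding JSON
    persona: Dict[str, Any]                    # hidden object persona JSON
    diary: str                                 # English-first diary with Chinese helper text
    chat_response: str                         # object chat response
    share_card_preview: Optional[str] = None   # e.g. a path to a rendered card
    trace_record: Dict[str, Any] = field(default_factory=dict)  # anonymized trace


def to_json(result: ObjectResult) -> str:
    """Serialize a result; ensure_ascii=False keeps Chinese helper text readable."""
    return json.dumps(asdict(result), ensure_ascii=False)
```

Keeping the two JSON payloads as dicts leaves schema validation to a separate step, which matches the repair-and-validate flow described under Fallback Behavior.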
Dataset Notes
Dataset planning lives in docs/DATASET.md.
Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.
The Modal training scaffold defaults to Qwen/Qwen2.5-1.5B-Instruct and saves adapter artifacts to a Modal Volume. data/train/objectverse_sft_curated_v2.jsonl contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated as objectverse_sft_curated_v2.jsonl.
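A quick coverage check over the curated JSONL file could look like the sketch below. The row keys `object` and `personality_mode` are assumptions made for illustration, since the actual field names are not given here. Note that 40 objects times 5 modes equals 200, which is consistent with one row per pair, but that layout is also an assumption.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def coverage(rows, object_key="object", mode_key="personality_mode"):
    """Count rows per (object, personality mode) pair. Key names are hypothetical."""
    return Counter((row[object_key], row[mode_key]) for row in rows)


def summarize(rows, **keys):
    """Report row count, distinct objects, distinct modes, and the largest pair count."""
    pairs = coverage(rows, **keys)
    return {
        "rows": len(rows),
        "objects": len({obj for obj, _ in pairs}),
        "modes": len({mode for _, mode in pairs}),
        "max_rows_per_pair": max(pairs.values(), default=0),
    }
```

For example, `summarize(load_jsonl("data/train/objectverse_sft_curated_v2.jsonl"))` would be expected to report 200 rows, 40 objects, and 5 modes if the file matches the description above and uses those key names.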
Published adapter:
https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
Current v2 training run summary:
- Platform: Modal
- Run name: objectverse-diary-qwen15b-lora-v2
- Base model: Qwen/Qwen2.5-1.5B-Instruct
- Dataset: 200 synthetic curated v2 rows
- Train / eval rows: 180 / 20
- Steps: 120
- Max sequence length: 1536
- Learning rate: 0.0001
- Effective batch size: 8
- LoRA rank / alpha / dropout: 32 / 64 / 0.05
- Assistant-output-only loss: enabled
- Train loss: 0.3240
- Eval loss: 0.0162
- Epoch: 5.2222
- GGUF conversion: completed with pinned llama.cpp commit 8f83d6c271d194bde2d410145a0ce73bc42e85cd
- Published GGUF: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
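The reported numbers can be cross-checked with some simple arithmetic. The sketch below shows one accounting that reproduces the reported epoch value from the step count, train rows, and effective batch size. It assumes each epoch ends with a partial batch and that the fractional part is measured in rows seen; the trainer's exact bookkeeping is not stated in this card. It also computes the conventional LoRA scaling factor, alpha divided by rank.

```python
import math

TRAIN_ROWS = 180
EFFECTIVE_BATCH = 8
STEPS = 120
LORA_RANK = 32
LORA_ALPHA = 64


def epochs_after(steps, train_rows, effective_batch):
    """Estimate epochs completed, assuming a partial final batch in every epoch."""
    steps_per_epoch = math.ceil(train_rows / effective_batch)  # 23 for 180 rows / 8
    full_epochs, leftover_steps = divmod(steps, steps_per_epoch)
    # Leftover steps in the unfinished epoch are all full batches.
    return full_epochs + leftover_steps * effective_batch / train_rows


epoch = epochs_after(STEPS, TRAIN_ROWS, EFFECTIVE_BATCH)
lora_scale = LORA_ALPHA / LORA_RANK  # conventional LoRA scaling, alpha / r

# Matches the "Epoch: 5.2222" entry under the stated assumptions.
assert round(epoch, 4) == 5.2222
assert lora_scale == 2.0
```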
GGUF smoke status:
- Repo: qqyule/objectverse-diary-qwen15b-lora
- File: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
- Local helper: scripts/check_llama_cpp_smoke.py
- Local result: passed on 2026-06-08 with llama-cpp text generation, no text-fallback-to-mock, schema-valid persona and diary, and non-empty chat reply.
- Space result: not run; do not claim live Space text runtime until a separate Space validation passes.
Safety And Privacy
- Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
- Do not publish private user photos or unconsented personal data.
- Do not include tokens, credit codes, emails, serial numbers, or credentials.
- Keep raw private traces out of public datasets.
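One way to enforce part of these rules before a trace is written is a small redaction pass. The sketch below is illustrative only: the patterns cover email addresses and strings shaped like Hugging Face tokens (an `hf_` prefix followed by a long alphanumeric run), the function names are hypothetical, and a real release check would need broader coverage, for example serial numbers and credit codes.

```python
import re

# Illustrative patterns; not an exhaustive list of sensitive values.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HF_TOKEN_RE = re.compile(r"\bhf_[A-Za-z0-9]{20,}\b")


def redact(text):
    """Replace email addresses and token-like strings with placeholders."""
    text = EMAIL_RE.sub("[redacted-email]", text)
    return HF_TOKEN_RE.sub("[redacted-token]", text)


def redact_record(value):
    """Recursively redact every string inside nested dicts and lists."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_record(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_record(item) for item in value]
    return value
```

Running `redact_record` on a trace dict before serialization keeps the scrubbing in one place instead of relying on every caller to remember it.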
Fallback Behavior
- If VLM loading fails, use manual description and stable example flow.
- If llama.cpp is not installed, TEXT_MODEL_PATH is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
- If model JSON is invalid, repair and validate before rendering.
- Runtime traces do not record literal TEXT_MODEL_PATH; they only record that an external GGUF path is configured.
- Hosted VLM validation evidence is preserved in data/traces/space-vlm/. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.
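The text fallback rules above can be sketched as a small decision function. This is not the project's implementation: the function names, return values, and trace keys are hypothetical, and a file-existence check stands in for "model loading fails". Only the TEXT_MODEL_PATH variable and the `llama_cpp` module name (the import name of llama-cpp-python) come from real components.

```python
import importlib.util
import json
import os
from pathlib import Path


def choose_text_backend(env=None, llama_available=None):
    """Return (backend, reason), where backend is "llama_cpp" or "mock"."""
    env = os.environ if env is None else env
    if llama_available is None:
        # Detect the llama-cpp-python bindings without importing them.
        llama_available = importlib.util.find_spec("llama_cpp") is not None
    model_path = env.get("TEXT_MODEL_PATH", "").strip()
    if not llama_available:
        return "mock", "llama.cpp bindings not installed"
    if not model_path:
        return "mock", "TEXT_MODEL_PATH is missing"
    if not Path(model_path).is_file():
        return "mock", "model file not found"
    return "llama_cpp", "external GGUF path configured"


def parse_or_fallback(raw_output, mock_payload):
    """Return (payload, used_fallback); fall back when output is not a JSON object."""
    try:
        parsed = json.loads(raw_output)
    except (TypeError, ValueError):
        return mock_payload, True
    if not isinstance(parsed, dict):
        return mock_payload, True
    return parsed, False


def trace_entry(backend, env=None):
    """Record only whether a GGUF path is configured, never the path itself."""
    env = os.environ if env is None else env
    return {
        "text_backend": backend,
        "external_gguf_configured": bool(env.get("TEXT_MODEL_PATH", "").strip()),
    }
```

Passing `env` and `llama_available` explicitly keeps the decision testable without touching the real environment or requiring llama-cpp-python to be installed.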
Required Notes
- Total model parameter count must remain <= 32B.
- No commercial model APIs.
- Fallback behavior must be documented.
- Dataset provenance and privacy rules must be documented before release.