# Model Card

## Status

Stable local baseline plus live MiniCPM-V Space vision, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text; the GGUF has passed local llama.cpp smoke, but it has not been switched into the live Space runtime.

Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 run completed, the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`, and the merged Q4_K_M GGUF is published in the same repo.

Hosted MiniCPM-V validation passed after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See `docs/SPACE_VLM_REPORT.md`.

## Planned Components

- Vision understanding: MiniCPM-V or lightweight fallback VLM.
- Text generation: fine-tuned small LLM.
- Runtime: llama.cpp / llama-cpp-python.

## Candidate Architecture

| Component | Candidate | Notes |
| --- | --- | --- |
| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. |
| Text | deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. |
| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke passed on 2026-06-08. |
| UI | Gradio Blocks | Required by the hackathon and project rules. |

## Parameter Budget

Total model parameters must remain <= 32B. Record final numbers here before submission:

| Component | Model | Parameters | Counted Toward Total |
| --- | --- | ---: | --- |
| Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
| Text base | Stable baseline mock text | 0 | no model parameters |
| Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
| Published LoRA v2 GGUF | `qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | ~1.5B base, quantized file | yes, if enabled |
| Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
| Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B |

If the optional MiniCPM-V 2.6 vision path and planned Qwen 1.5B text base are both enabled, the expected total remains about 9.5B plus a small LoRA adapter, safely under the 32B project budget.

## Intended Inputs And Outputs

Inputs:

- user-uploaded everyday object photo
- optional object description
- personality mode

Outputs:

- structured object understanding JSON
- hidden object persona JSON
- short English-first diary with Chinese helper text
- object chat response
- share card preview
- anonymized trace record

## Dataset Notes

Dataset planning lives in `docs/DATASET.md`.

Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.

The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume.

`data/train/objectverse_sft_curated_v2.jsonl` contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated` as `objectverse_sft_curated_v2.jsonl`.
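The coverage figures above (200 rows, 40 objects, 5 personality modes) can be sanity-checked with a short script. The following is a minimal sketch, not the project's own validation code. The field names `object` and `personality_mode` are hypothetical placeholders, since the real row schema is defined in `docs/DATASET.md`, and the idea that each object and mode pair appears exactly once is an assumption suggested only by 40 x 5 = 200.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical field names; replace with the keys used by the real schema.
OBJECT_KEY = "object"
MODE_KEY = "personality_mode"


def parse_jsonl(lines: Iterable[str]) -> list:
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def coverage_summary(rows: list) -> dict:
    """Count rows, distinct objects, distinct modes, and repeated pairs."""
    pairs = Counter((row[OBJECT_KEY], row[MODE_KEY]) for row in rows)
    return {
        "rows": len(rows),
        "objects": len({obj for obj, _ in pairs}),
        "modes": len({mode for _, mode in pairs}),
        # Number of (object, mode) pairs that occur more than once.
        "duplicate_pairs": sum(1 for count in pairs.values() if count > 1),
    }


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the published dataset: 40 objects x 5 modes.
    sample_lines = [
        json.dumps({OBJECT_KEY: f"object-{i}", MODE_KEY: f"mode-{j}"})
        for i in range(40)
        for j in range(5)
    ]
    print(coverage_summary(parse_jsonl(sample_lines)))
```

To run the same check against the real file, pass an open file handle for `data/train/objectverse_sft_curated_v2.jsonl` to `parse_jsonl` after adjusting the two key constants.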
Published adapter:

```text
https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
```

Current v2 training run summary:

- Platform: Modal
- Run name: `objectverse-diary-qwen15b-lora-v2`
- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Dataset: 200 synthetic curated v2 rows
- Train / eval rows: 180 / 20
- Steps: 120
- Max sequence length: 1536
- Learning rate: 0.0001
- Effective batch size: 8
- LoRA rank / alpha / dropout: 32 / 64 / 0.05
- Assistant-output-only loss: enabled
- Train loss: 0.3240
- Eval loss: 0.0162
- Epoch: 5.2222
- GGUF conversion: completed with pinned `llama.cpp` commit `8f83d6c271d194bde2d410145a0ce73bc42e85cd`
- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`

GGUF smoke status:

- Repo: `qqyule/objectverse-diary-qwen15b-lora`
- File: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
- Local helper: `scripts/check_llama_cpp_smoke.py`
- Local result: passed on 2026-06-08 with `llama-cpp text generation`, no `text-fallback-to-mock`, schema-valid persona and diary, and non-empty chat reply.
- Space result: not run; do not claim live Space text runtime until a separate Space validation passes.

## Safety And Privacy

- Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
- Do not publish private user photos or unconsented personal data.
- Do not include tokens, credit codes, emails, serial numbers, or credentials.
- Keep raw private traces out of public datasets.

## Fallback Behavior

- If VLM loading fails, use manual description and stable example flow.
- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
- If model JSON is invalid, repair and validate before rendering.
- Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
- Hosted VLM validation evidence is preserved in `data/traces/space-vlm/`. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.

## Required Notes

- Total model parameter count must remain <= 32B.
- No commercial model APIs.
- Fallback behavior must be documented.
- Dataset provenance and privacy rules must be documented before release.
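The text fallback rule listed under Fallback Behavior can be sketched as a single decision function. This is an illustrative sketch under stated assumptions, not the project's implementation: `mock_text`, `generate_text`, the `real_generate` callable, and the trace keys are hypothetical names, and the JSON repair step mentioned above is omitted for brevity. Only `TEXT_MODEL_PATH`, the `llama_cpp` import name of llama-cpp-python, and the `text-fallback-to-mock` label come from the source or the library itself.

```python
import importlib.util
import json
import os
from typing import Callable, Mapping, Optional, Tuple


def mock_text(prompt: str) -> dict:
    """Hypothetical deterministic mock backend: same prompt, same output."""
    return {"diary": f"[mock] {prompt}"}


def generate_text(
    prompt: str,
    real_generate: Callable[[str, str], str],
    env: Optional[Mapping[str, str]] = None,
    llama_installed: Optional[bool] = None,
) -> Tuple[dict, dict]:
    """Return (output, trace), falling back to mock text on any failure."""
    env = os.environ if env is None else env
    if llama_installed is None:
        # llama-cpp-python is imported as `llama_cpp`.
        llama_installed = importlib.util.find_spec("llama_cpp") is not None

    model_path = env.get("TEXT_MODEL_PATH")
    # The trace records only whether a path is configured, never the path.
    trace = {"external_gguf_configured": bool(model_path)}

    if llama_installed and model_path:
        try:
            # `real_generate(model_path, prompt)` is assumed to return JSON text.
            output = json.loads(real_generate(model_path, prompt))
            if isinstance(output, dict):
                trace["runtime"] = "llama-cpp"
                return output, trace
        except Exception:
            # Covers model loading errors, generation errors, and invalid JSON.
            pass

    trace["runtime"] = "text-fallback-to-mock"
    return mock_text(prompt), trace
```

Passing `env` and `llama_installed` explicitly keeps the function testable without a GGUF file or llama.cpp on the machine, which matches the mock-first local development default described in Status.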