| # Model Card | |
| ## Status | |
| Stable local baseline plus live MiniCPM-V vision on the hosted Space, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text. The GGUF has passed a local llama.cpp smoke test but has not been switched into the live Space runtime. | |
| Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 run completed, the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`, and the merged Q4_K_M GGUF is published in the same repo. | |
| Hosted MiniCPM-V validation passed after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See `docs/SPACE_VLM_REPORT.md`. | |
| ## Planned Components | |
| - Vision understanding: MiniCPM-V or lightweight fallback VLM. | |
| - Text generation: fine-tuned small LLM. | |
| - Runtime: llama.cpp / llama-cpp-python. | |
| ## Candidate Architecture | |
| | Component | Candidate | Notes | | |
| | --- | --- | --- | | |
| | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. | | |
| | Text | deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. | | |
| | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke passed on 2026-06-08. | | |
| | UI | Gradio Blocks | Required by the hackathon and project rules. | | |
| ## Parameter Budget | |
| Total model parameters must remain <= 32B. | |
| Record final numbers here before submission: | |
| | Component | Model | Parameters | Counted Toward Total | | |
| | --- | --- | ---: | --- | | |
| | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled | | |
| | Text base | Stable baseline mock text | 0 | no model parameters | | |
| | Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled | | |
| Published LoRA v2 GGUF | `qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | ~1.5B base, quantized file | yes, when enabled | |
| | Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled | | |
| | Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B | | |
| If the optional MiniCPM-V 2.6 vision path (~8B) and the optional Qwen2.5 1.5B text base (~1.5B) are both enabled, the expected total is about 9.5B plus a small LoRA adapter, well under the 32B project budget. | |
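The budget arithmetic above can be checked with a short script. This is a minimal sketch: the parameter counts are the approximate figures from the table, the component keys and function names are hypothetical, and the small LoRA adapter is not modeled.

```python
# Approximate parameter counts in billions, copied from the table above.
# The dictionary keys are illustrative names, not identifiers from the project.
BUDGET_B = 32.0

COMPONENTS_B = {
    "vision_minicpm_v_2_6": 8.0,
    "text_qwen2_5_1_5b_instruct": 1.5,
}


def total_params_b(enabled):
    """Sum the parameter counts (in billions) of the enabled components."""
    return sum(COMPONENTS_B[name] for name in enabled)


def within_budget(enabled, budget_b=BUDGET_B):
    """Return True when the enabled components fit in the project budget."""
    return total_params_b(enabled) <= budget_b


# Live Space: real vision, mock text (mock text contributes no parameters).
live_space = ["vision_minicpm_v_2_6"]
# Optional full stack: real vision plus the Qwen text base.
full_stack = ["vision_minicpm_v_2_6", "text_qwen2_5_1_5b_instruct"]

print(total_params_b(live_space), within_budget(live_space))  # 8.0 True
print(total_params_b(full_stack), within_budget(full_stack))  # 9.5 True
```

The GGUF and the adapter are both built on the same 1.5B base, so the base is counted once.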
| ## Intended Inputs And Outputs | |
| Inputs: | |
| - user-uploaded everyday object photo | |
| - optional object description | |
| - personality mode | |
| Outputs: | |
| - structured object understanding JSON | |
| - hidden object persona JSON | |
| - short English-first diary with Chinese helper text | |
| - object chat response | |
| - share card preview | |
| - anonymized trace record | |
| ## Dataset Notes | |
| Dataset planning lives in `docs/DATASET.md`. | |
| Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated. | |
| The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume. `data/train/objectverse_sft_curated_v2.jsonl` contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated` as `objectverse_sft_curated_v2.jsonl`. | |
| Published adapter: | |
| ```text | |
| https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora | |
| ``` | |
| Current v2 training run summary: | |
| - Platform: Modal | |
| - Run name: `objectverse-diary-qwen15b-lora-v2` | |
| - Base model: `Qwen/Qwen2.5-1.5B-Instruct` | |
| - Dataset: 200 synthetic curated v2 rows | |
| - Train / eval rows: 180 / 20 | |
| - Steps: 120 | |
| - Max sequence length: 1536 | |
| - Learning rate: 0.0001 | |
| - Effective batch size: 8 | |
| - LoRA rank / alpha / dropout: 32 / 64 / 0.05 | |
| - Assistant-output-only loss: enabled | |
| - Train loss: 0.3240 | |
| - Eval loss: 0.0162 | |
| - Epoch: 5.2222 | |
| - GGUF conversion: completed with pinned `llama.cpp` commit `8f83d6c271d194bde2d410145a0ce73bc42e85cd` | |
| - Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | |
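A few derived quantities follow from the summary above. The sketch below is illustrative only: the class and property names are hypothetical, and the scaling factor assumes standard LoRA scaling (alpha divided by rank), which the summary does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Values copied from the v2 training run summary."""

    train_rows: int = 180
    eval_rows: int = 20
    steps: int = 120
    effective_batch_size: int = 8
    lora_rank: int = 32
    lora_alpha: int = 64

    @property
    def sequences_seen(self) -> int:
        # Each optimizer step consumes one effective batch.
        return self.steps * self.effective_batch_size

    @property
    def approx_epochs(self) -> float:
        # Rough estimate: total sequences seen over the training set size.
        return self.sequences_seen / self.train_rows

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


run = RunSummary()
print(run.sequences_seen)           # 960
print(round(run.approx_epochs, 2))  # 5.33
print(run.lora_scaling)             # 2.0
```

The estimate of about 5.3 epochs is close to the reported 5.2222. The exact reported value depends on how the training framework counts steps per epoch, which the summary does not specify.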
| GGUF smoke status: | |
| - Repo: `qqyule/objectverse-diary-qwen15b-lora` | |
| - File: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | |
| - Local helper: `scripts/check_llama_cpp_smoke.py` | |
| - Local result: passed on 2026-06-08 with `llama-cpp text generation`, no `text-fallback-to-mock`, schema-valid persona and diary, and non-empty chat reply. | |
| - Space result: not run; do not claim live Space text runtime until a separate Space validation passes. | |
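The local pass criteria listed above can be expressed as a small checker. This is a hedged sketch, not the contents of `scripts/check_llama_cpp_smoke.py`: the result dictionary layout and its field names (`text_backend`, `events`, `persona`, `diary`, `chat_reply`) are assumptions, and "schema-valid" is approximated here as "a non-empty JSON object".

```python
def smoke_failures(result: dict) -> list:
    """Return the failed checks for one smoke run; an empty list means pass.

    The field names used here are hypothetical stand-ins for whatever the
    real smoke helper reports.
    """
    failures = []

    # The text backend must be the real llama.cpp path, not the mock.
    if result.get("text_backend") != "llama-cpp text generation":
        failures.append("text backend is not llama-cpp")

    # No fallback event may have been recorded during the run.
    if "text-fallback-to-mock" in result.get("events", []):
        failures.append("text fell back to mock")

    # Placeholder schema check: persona and diary must be non-empty objects.
    for key in ("persona", "diary"):
        value = result.get(key)
        if not isinstance(value, dict) or not value:
            failures.append(f"{key} is missing or empty")

    # The chat reply must contain visible text.
    if not str(result.get("chat_reply", "")).strip():
        failures.append("chat reply is empty")

    return failures


example = {
    "text_backend": "llama-cpp text generation",
    "events": [],
    "persona": {"name": "Mug"},
    "diary": {"text": "Another morning of coffee."},
    "chat_reply": "Hello from the shelf.",
}
print(smoke_failures(example))  # []
```

Returning a list of failures rather than a boolean makes it easier to see which criterion a future Space validation run misses.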
| ## Safety And Privacy | |
| - Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. | |
| - Do not publish private user photos or unconsented personal data. | |
| - Do not include tokens, credit codes, emails, serial numbers, or credentials. | |
| - Keep raw private traces out of public datasets. | |
| ## Fallback Behavior | |
| - If VLM loading fails, fall back to the manual description and the stable example flow. | |
| - If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, fall back to deterministic mock text so the demo keeps working. | |
| - If model JSON is invalid, attempt to repair it and re-validate before rendering. | |
| - Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured. | |
| - Hosted VLM validation evidence is preserved in `data/traces/space-vlm/`. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces. | |
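The text fallback and trace rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's runtime code: `generate_text`, `mock_text`, the injected `llm_generate` callable, and the trace field names are all hypothetical, the validity check is reduced to "a JSON object with a `diary` field", and the JSON repair step is omitted.

```python
import json
import os


def mock_text(prompt: str) -> dict:
    """Deterministic stand-in for the mock text backend."""
    return {"diary": f"Mock diary entry for: {prompt}"}


def generate_text(prompt, llm_generate=None, env=None):
    """Try the external GGUF backend, falling back to mock text on any failure.

    llm_generate is a hypothetical callable wrapping llama.cpp; it takes a
    prompt and returns a JSON string.
    """
    env = os.environ if env is None else env
    trace = {
        # Record only that a path is configured, never the path itself.
        "external_gguf_configured": bool(env.get("TEXT_MODEL_PATH")),
        "events": [],
    }

    if llm_generate is not None and trace["external_gguf_configured"]:
        try:
            payload = json.loads(llm_generate(prompt))
            if not isinstance(payload, dict) or not payload.get("diary"):
                raise ValueError("output does not match the expected shape")
            trace["backend"] = "llama-cpp"
            return payload, trace
        except Exception:
            # Covers loading errors, generation errors, and invalid JSON.
            trace["events"].append("text-fallback-to-mock")

    trace["backend"] = "mock"
    return mock_text(prompt), trace


payload, trace = generate_text(
    "a chipped mug",
    llm_generate=lambda p: "not json",
    env={"TEXT_MODEL_PATH": "/models/example.gguf"},
)
print(trace["backend"], trace["events"])  # mock ['text-fallback-to-mock']
```

Passing the backend in as a callable keeps the sketch runnable without llama.cpp installed, which mirrors the mock-first local default.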
| ## Required Notes | |
| - Total model parameter count must remain <= 32B. | |
| - No commercial model APIs. | |
| - Fallback behavior must be documented. | |
| - Dataset provenance and privacy rules must be documented before release. | |