# Model Card
## Status
The project has a stable local baseline, live MiniCPM-V vision in the hosted Space, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text. The GGUF has passed a local llama.cpp smoke test, but it has not been switched into the live Space runtime.
Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU, with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model selected via `TEXT_MODEL_PATH`, but the live Space keeps text on the mock runtime for this release. A Modal LoRA v2 training run completed; the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`, and the merged Q4_K_M GGUF is published in the same repo.
Hosted MiniCPM-V validation passed after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See `docs/SPACE_VLM_REPORT.md`.
## Planned Components
- Vision understanding: MiniCPM-V or lightweight fallback VLM.
- Text generation: fine-tuned small LLM.
- Runtime: llama.cpp / llama-cpp-python.
## Candidate Architecture
| Component | Candidate | Notes |
| --- | --- | --- |
| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. |
| Text | deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. |
| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke test passed on 2026-06-08. |
| UI | Gradio Blocks | Required by the hackathon and project rules. |
## Parameter Budget
Total model parameters must remain <= 32B.
Record final numbers here before submission:
| Component | Model | Parameters | Counted Toward Total |
| --- | --- | ---: | --- |
| Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
| Text base | Stable baseline mock text | 0 | no model parameters |
| Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
| Published LoRA v2 GGUF | `qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | ~1.5B (merged base, Q4_K_M quantized file) | yes, when enabled |
| Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
| Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B |
If the optional MiniCPM-V 2.6 vision path and the optional Qwen 1.5B text base are both enabled, the expected total is about 9.5B (~8B + ~1.5B) plus a small LoRA adapter, safely under the 32B project budget.
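The budget arithmetic can be kept as a small check next to the table. This is a minimal sketch, assuming the approximate counts from the table above; the component keys and helper names are hypothetical, and the LoRA adapter is left out because the card only describes it as small.

```python
# Approximate parameter counts in billions, copied from the budget table.
# The LoRA adapter is omitted: the card only describes it as "small".
BUDGET_B = 32.0

COMPONENTS_B = {
    "vision: MiniCPM-V 2.6": 8.0,
    "text: Qwen/Qwen2.5-1.5B-Instruct": 1.5,
}


def total_params_b(enabled: dict[str, bool]) -> float:
    """Sum the parameter counts of every enabled component."""
    return sum(size for name, size in COMPONENTS_B.items() if enabled.get(name, False))


def within_budget(enabled: dict[str, bool]) -> bool:
    """True when the enabled components fit inside the 32B project budget."""
    return total_params_b(enabled) <= BUDGET_B


live_space = {"vision: MiniCPM-V 2.6": True}  # mock text has no model parameters
both_enabled = {name: True for name in COMPONENTS_B}

print(total_params_b(live_space), within_budget(live_space))      # 8.0 True
print(total_params_b(both_enabled), within_budget(both_enabled))  # 9.5 True
```

Keeping the numbers in one mapping means the final figures recorded before submission only need to change in one place.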
## Intended Inputs And Outputs
Inputs:
- user-uploaded everyday object photo
- optional object description
- personality mode
Outputs:
- structured object understanding JSON
- hidden object persona JSON
- short English-first diary with Chinese helper text
- object chat response
- share card preview
- anonymized trace record
## Dataset Notes
Dataset planning lives in `docs/DATASET.md`.
Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.
The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume. `data/train/objectverse_sft_curated_v2.jsonl` contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated` as `objectverse_sft_curated_v2.jsonl`.
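Since 40 objects × 5 personality modes = 200, the row count is consistent with one row per (object, mode) pair, although the card does not state that the file is a full grid. A coverage check along these lines could confirm it; this is a sketch, and the field names `object_name` and `personality_mode` are hypothetical because the row schema is not documented here.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical field names: the card does not document the row schema.
OBJECT_KEY = "object_name"
MODE_KEY = "personality_mode"


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def coverage_report(rows: list[dict]) -> dict:
    """Summarize how rows spread over objects and personality modes."""
    objects = {row[OBJECT_KEY] for row in rows}
    modes = {row[MODE_KEY] for row in rows}
    pairs = Counter((row[OBJECT_KEY], row[MODE_KEY]) for row in rows)
    return {
        "rows": len(rows),
        "objects": len(objects),
        "modes": len(modes),
        # A full grid has exactly one row for every (object, mode) pair.
        "full_grid": len(pairs) == len(objects) * len(modes)
        and all(count == 1 for count in pairs.values()),
    }
```

To run it against the published file, `huggingface_hub.hf_hub_download(repo_id="qqyule/objectverse-diary-sft-curated", filename="objectverse_sft_curated_v2.jsonl", repo_type="dataset")` returns a local path that can be passed to `load_jsonl` via `Path(...)`.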
Published adapter:
```text
https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
```
Current v2 training run summary:
- Platform: Modal
- Run name: `objectverse-diary-qwen15b-lora-v2`
- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Dataset: 200 synthetic curated v2 rows
- Train / eval rows: 180 / 20
- Steps: 120
- Max sequence length: 1536
- Learning rate: 0.0001
- Effective batch size: 8
- LoRA rank / alpha / dropout: 32 / 64 / 0.05
- Assistant-output-only loss: enabled
- Train loss: 0.3240
- Eval loss: 0.0162
- Epoch: 5.2222
- GGUF conversion: completed with pinned `llama.cpp` commit `8f83d6c271d194bde2d410145a0ce73bc42e85cd`
- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
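The fractional epoch of 5.2222 is lower than the naive 120 × 8 / 180 ≈ 5.33. One explanation consistent with the numbers, assuming the trainer ends each epoch with a partial batch, is that one epoch takes ceil(180 / 8) = 23 optimizer steps, so 120 steps = 5 full epochs + 5 steps, and 5 + 5 × 8 / 180 ≈ 5.2222. A sketch of that arithmetic (the helper name is hypothetical, and the partial-batch behavior is an assumption, not taken from the training logs):

```python
import math

TRAIN_ROWS = 180
EFFECTIVE_BATCH = 8
STEPS = 120


def reported_epoch(train_rows: int, effective_batch: int, steps: int) -> float:
    """Fractional epoch after `steps` optimizer steps.

    Assumes each epoch ends with a possibly partial batch, so one epoch takes
    ceil(train_rows / effective_batch) optimizer steps.
    """
    steps_per_epoch = math.ceil(train_rows / effective_batch)  # 23 for this run
    full_epochs, leftover_steps = divmod(steps, steps_per_epoch)  # 5, 5
    return full_epochs + leftover_steps * effective_batch / train_rows


print(round(reported_epoch(TRAIN_ROWS, EFFECTIVE_BATCH, STEPS), 4))  # 5.2222
```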
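"Assistant-output-only loss" means the loss is computed only on the assistant's reply tokens, not on the system or user prompt. A common way to do this, sketched here rather than copied from the project's training code, is to set every non-assistant label to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The function name and the span format are hypothetical.

```python
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss, and Hugging
# Face causal-LM models use the same convention for labels.
IGNORE_INDEX = -100


def mask_non_assistant(input_ids: list[int], assistant_spans: list[tuple[int, int]]) -> list[int]:
    """Return labels that keep only tokens inside assistant spans.

    `assistant_spans` holds half-open [start, end) token ranges. How they are
    located (for example from chat-template markers) is left to the caller.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels


ids = [101, 102, 103, 104, 105, 106]
print(mask_non_assistant(ids, [(3, 6)]))  # [-100, -100, -100, 104, 105, 106]
```

The labels stay aligned with `input_ids`; Hugging Face causal-LM models shift them by one position internally when `labels` are passed.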
GGUF smoke status:
- Repo: `qqyule/objectverse-diary-qwen15b-lora`
- File: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
- Local helper: `scripts/check_llama_cpp_smoke.py`
- Local result: passed on 2026-06-08, with the `llama-cpp text generation` backend, no `text-fallback-to-mock` flag, schema-valid persona and diary output, and a non-empty chat reply.
- Space result: not run; do not claim live Space text runtime until a separate Space validation passes.
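The local pass criteria above can be expressed as a single predicate over a run trace, which makes the same check reusable for a later Space validation. This is a sketch: the trace field names (`text_backend`, `flags`, `persona_schema_valid`, `diary_schema_valid`, `chat_reply`) are hypothetical, since the card names the checks but not the layout of `scripts/check_llama_cpp_smoke.py` output.

```python
# Hypothetical trace layout: the card lists the checks, not the field names.
def smoke_passed(trace: dict) -> bool:
    """Apply the pass criteria listed for the local GGUF smoke run."""
    return (
        trace.get("text_backend") == "llama-cpp text generation"
        and "text-fallback-to-mock" not in trace.get("flags", [])
        and trace.get("persona_schema_valid") is True
        and trace.get("diary_schema_valid") is True
        and bool((trace.get("chat_reply") or "").strip())
    )
```

The GGUF itself can be fetched and loaded with llama-cpp-python, for example `Llama.from_pretrained(repo_id="qqyule/objectverse-diary-qwen15b-lora", filename="objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf")`, which also requires `huggingface-hub` to be installed.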
## Safety And Privacy
- Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
- Do not publish private user photos or unconsented personal data.
- Do not include tokens, credit codes, emails, serial numbers, or credentials.
- Keep raw private traces out of public datasets.
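Before a trace or dataset row is published, a scrubbing pass can catch the most obvious leaks. The sketch below is illustrative, not exhaustive: it only covers common e-mail shapes and Hugging Face user tokens (which start with `hf_`); the minimum token length of 20 and the placeholder strings are assumptions, and credit codes or serial numbers would need their own project-specific rules.

```python
import re

# Illustrative patterns only: common e-mail shapes and "hf_"-prefixed tokens.
# The 20-character minimum for tokens is an assumption.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HF_TOKEN_RE = re.compile(r"\bhf_[A-Za-z0-9]{20,}\b")


def redact(text: str) -> str:
    """Replace e-mail addresses and HF-style tokens with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return HF_TOKEN_RE.sub("[token]", text)


print(redact("contact me at a.b@example.com"))  # contact me at [email]
```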
## Fallback Behavior
- If VLM loading fails, use manual description and stable example flow.
- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
- If model JSON is invalid, repair and validate before rendering.
- Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
- Hosted VLM validation evidence is preserved in `data/traces/space-vlm/`. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.
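The fallback rules above can be sketched as one function, assuming JSON repair is attempted first and mock text is used only if repair fails. Only `TEXT_MODEL_PATH` comes from the card; `run_model` stands in for the real llama.cpp call, and every other name, including the placeholder mock diary and the trace keys, is hypothetical. The trace records only whether an external path is configured, never the path itself.

```python
import json
import os
import re
from typing import Callable, Optional

# Placeholder mock output; the real deterministic mock text is not shown here.
MOCK_DIARY = {"diary": "Today someone picked me up and I felt useful.", "source": "mock"}


def extract_json(raw: str) -> Optional[dict]:
    """Best-effort repair: parse the outermost {...} span, e.g. inside a code fence."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def generate_diary(prompt: str, run_model: Callable[[str, str], str]) -> tuple[dict, dict]:
    """Return (diary, trace), falling back to deterministic mock text."""
    model_path = os.environ.get("TEXT_MODEL_PATH")
    # Record only that a path is configured, never the literal path.
    trace = {"external_gguf_configured": bool(model_path), "fallback": None}
    if not model_path:
        trace["fallback"] = "no-model-path"
        return MOCK_DIARY, trace
    try:
        raw = run_model(model_path, prompt)  # e.g. a llama-cpp-python call
    except Exception:  # missing llama.cpp, load failure, generation failure
        trace["fallback"] = "model-error"
        return MOCK_DIARY, trace
    parsed = extract_json(raw)
    if parsed is None or "diary" not in parsed:  # still invalid after repair
        trace["fallback"] = "invalid-json"
        return MOCK_DIARY, trace
    return parsed, trace
```

Injecting `run_model` keeps the fallback logic testable without llama.cpp installed, which matches the goal of keeping the demo safe when the model path is missing or loading fails.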
## Required Notes
- Total model parameter count must remain <= 32B.
- No commercial model APIs.
- Fallback behavior must be documented.
- Dataset provenance and privacy rules must be documented before release.