# Model Card

## Status

Stable local baseline with live MiniCPM-V vision on the hosted Space, one published text LoRA v2 adapter, and one published Q4_K_M GGUF. The public Gradio Space defaults to real MiniCPM-V object understanding with deterministic mock text. The GGUF has passed a local llama.cpp smoke test but has not been switched into the live Space runtime.

Local development defaults to deterministic mock backends. The hosted Space runs MiniCPM-V 2.6 vision on ZeroGPU with a hidden non-secret probe for diagnostics. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`, but the live Space keeps text on the mock runtime for this release. The Modal LoRA v2 training run has completed: the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`, and the merged Q4_K_M GGUF is published in the same repo.

Hosted MiniCPM-V validation passed after adding an `HF_TOKEN` Space secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The validation uses public mug, keyboard, and shoe images on ZeroGPU, while text generation intentionally remains mock. See `docs/SPACE_VLM_REPORT.md`.

## Planned Components

- Vision understanding: MiniCPM-V or a lightweight fallback VLM.
- Text generation: fine-tuned small LLM.
- Runtime: llama.cpp / llama-cpp-python.

## Candidate Architecture

| Component | Candidate | Notes |
| --- | --- | --- |
| Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Live Space uses MiniCPM-V on ZeroGPU; local runtime can still default to mock. |
| Text | deterministic mock text by default; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA v2 Q4_K_M GGUF for local runtime | Adapter and GGUF published; Space text runtime remains mock for the live vision release. |
| Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; local GGUF smoke passed on 2026-06-08. |
| UI | Gradio Blocks | Required by the hackathon and project rules. |

## Parameter Budget

Total model parameters must remain <= 32B.

Record final numbers here before submission:

| Component | Model | Parameters | Counted Toward Total |
| --- | --- | ---: | --- |
| Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
| Text base | Stable baseline mock text | 0 | no model parameters |
| Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
| Published LoRA v2 GGUF | `qqyule/objectverse-diary-qwen15b-lora` / `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf` | ~1.5B base, quantized file | yes, when enabled |
| Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
| Live Space total | MiniCPM-V vision + mock text | ~8B active model parameters | <= 32B |

If the optional MiniCPM-V 2.6 vision path (~8B) and the `Qwen/Qwen2.5-1.5B-Instruct` text base (~1.5B) are both enabled, the expected total is about 9.5B plus a small LoRA adapter, well under the 32B project budget.
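
The budget arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the component keys and the `total_params_b` helper are illustrative assumptions, and the counts are the approximate figures from the table, not measured values. The LoRA adapter is left out because its size is not stated in this card.

```python
# Approximate parameter counts in billions, copied from the table above.
# Component keys and the helper name are illustrative, not project code.
BUDGET_B = 32.0

COMPONENTS_B = {
    "vision_minicpm_v_2_6": 8.0,  # ~8B, counted when enabled
    "text_mock": 0.0,             # deterministic mock, no model parameters
    "text_qwen2_5_1_5b": 1.5,     # ~1.5B, counted when enabled
}


def total_params_b(enabled):
    """Sum the approximate parameter counts (in billions) of enabled components."""
    return sum(COMPONENTS_B[name] for name in enabled)


live_space = total_params_b(["vision_minicpm_v_2_6", "text_mock"])
full_stack = total_params_b(["vision_minicpm_v_2_6", "text_qwen2_5_1_5b"])

for label, total in [("live Space", live_space), ("vision + Qwen text", full_stack)]:
    status = "within" if total <= BUDGET_B else "over"
    print(f"{label}: ~{total}B ({status} the {BUDGET_B:.0f}B budget)")
```

Keeping the counts in one mapping makes it easy to re-run the check if a component is swapped before submission.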

## Intended Inputs And Outputs

Inputs:

- user-uploaded everyday object photo
- optional object description
- personality mode

Outputs:

- structured object understanding JSON
- hidden object persona JSON
- short English-first diary with Chinese helper text
- object chat response
- share card preview
- anonymized trace record

## Dataset Notes

Dataset planning lives in `docs/DATASET.md`.

Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.

The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume. `data/train/objectverse_sft_curated_v2.jsonl` contains 200 synthetic curated rows covering 40 everyday objects and 5 personality modes. It is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated` as `objectverse_sft_curated_v2.jsonl`.
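
A coverage check along the following lines could confirm the row, object, and mode counts stated above. This is a sketch under assumptions: the row keys `object` and `personality_mode` are hypothetical (the real schema is described in `docs/DATASET.md`), and the sample data is a synthetic stand-in, not the published dataset.

```python
import json
from collections import Counter


def coverage_summary(lines, object_key="object", mode_key="personality_mode"):
    """Count rows, distinct objects, and distinct modes in JSONL lines.

    The key names are assumptions; adjust them to the real row schema.
    """
    pairs = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        pairs[(row[object_key], row[mode_key])] += 1
    return {
        "rows": sum(pairs.values()),
        "objects": len({obj for obj, _ in pairs}),
        "modes": len({mode for _, mode in pairs}),
        "duplicate_pairs": sum(1 for n in pairs.values() if n > 1),
    }


# Tiny synthetic stand-in data: 2 objects x 2 modes, not the real dataset.
sample = [
    json.dumps({"object": obj, "personality_mode": mode})
    for obj in ("mug", "keyboard")
    for mode in ("mode_a", "mode_b")
]
summary = coverage_summary(sample)
print(summary)
```

On the real file, passing an open file handle instead of `sample` would be expected to report 200 rows, 40 objects, and 5 modes if the key names match. That outcome is an expectation derived from the counts above, not a verified result.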

Published adapter:

```text
https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
```

Current v2 training run summary:

- Platform: Modal
- Run name: `objectverse-diary-qwen15b-lora-v2`
- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- Dataset: 200 synthetic curated v2 rows
- Train / eval rows: 180 / 20
- Steps: 120
- Max sequence length: 1536
- Learning rate: 0.0001
- Effective batch size: 8
- LoRA rank / alpha / dropout: 32 / 64 / 0.05
- Assistant-output-only loss: enabled
- Train loss: 0.3240
- Eval loss: 0.0162
- Epoch: 5.2222
- GGUF conversion: completed with pinned `llama.cpp` commit `8f83d6c271d194bde2d410145a0ce73bc42e85cd`
- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`

GGUF smoke status:

- Repo: `qqyule/objectverse-diary-qwen15b-lora`
- File: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
- Local helper: `scripts/check_llama_cpp_smoke.py`
- Local result: passed on 2026-06-08 with `llama-cpp text generation`, no `text-fallback-to-mock`, schema-valid persona and diary, and non-empty chat reply.
- Space result: not run; do not claim live Space text runtime until a separate Space validation passes.
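
The local pass criteria listed above can be expressed as a single predicate. The sketch below is illustrative only and is not the contents of `scripts/check_llama_cpp_smoke.py`: the function name, its arguments, and the idea that runtime notes arrive as a list of strings are all assumptions.

```python
def smoke_passed(runtime_notes, persona_valid, diary_valid, chat_reply):
    """Return True when all pass criteria from the smoke status list hold.

    runtime_notes: list of note strings emitted by the text runtime (assumed shape).
    persona_valid / diary_valid: results of schema validation done elsewhere.
    chat_reply: the chat response text.
    """
    used_llama_cpp = "llama-cpp text generation" in runtime_notes
    fell_back = "text-fallback-to-mock" in runtime_notes
    return (
        used_llama_cpp
        and not fell_back
        and persona_valid
        and diary_valid
        and bool(chat_reply.strip())  # whitespace-only replies count as empty
    )


ok = smoke_passed(["llama-cpp text generation"], True, True, "Hello from your mug.")
fallback = smoke_passed(
    ["llama-cpp text generation", "text-fallback-to-mock"], True, True, "Hi."
)
print(ok, fallback)
```

Treating a fallback note as a hard failure keeps a run that silently dropped to mock text from being reported as a real llama.cpp pass.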

## Safety And Privacy

- Do not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs.
- Do not publish private user photos or unconsented personal data.
- Do not include tokens, credit codes, emails, serial numbers, or credentials.
- Keep raw private traces out of public datasets.

## Fallback Behavior

- If VLM loading fails, use the manual description and the stable example flow.
- If llama.cpp is not installed, `TEXT_MODEL_PATH` is missing, model loading fails, or output JSON is invalid, keep deterministic mock text fallback for demo safety.
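- If model JSON is invalid, attempt to repair and re-validate it before rendering; if it still fails validation, use the mock text fallback described in the previous item.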
- Runtime traces do not record literal `TEXT_MODEL_PATH`; they only record that an external GGUF path is configured.
- Hosted VLM validation evidence is preserved in `data/traces/space-vlm/`. These traces use real MiniCPM-V object understanding plus mock text generation and should not be described as full real-text-runtime traces.
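
The text fallback and trace rules above can be sketched as follows. This is a minimal illustration, not the project's runtime: `mock_text`, `generate_text`, the `llm` callable, and the `external_gguf_configured` trace key are hypothetical names, and the JSON repair step is omitted for brevity.

```python
import json
import os


def mock_text(description):
    """Deterministic stand-in for the mock text backend (illustrative only)."""
    return {"diary": f"Dear diary, today I was a {description}.", "runtime": "mock"}


def generate_text(description, llm=None, env=os.environ):
    """Try a real text backend, falling back to mock text on any failure.

    `llm` is a hypothetical callable returning a JSON string; pass None to
    simulate llama.cpp being unavailable.
    """
    model_path = env.get("TEXT_MODEL_PATH")
    # Record only whether a path is configured, never the literal path.
    trace = {"external_gguf_configured": bool(model_path), "notes": []}

    if llm is None or not model_path:
        trace["notes"].append("text-fallback-to-mock")
        return mock_text(description), trace

    try:
        payload = json.loads(llm(description))
        if not isinstance(payload, dict) or "diary" not in payload:
            raise ValueError("output does not match the expected shape")
    except Exception:
        # Covers model errors and invalid JSON alike.
        trace["notes"].append("text-fallback-to-mock")
        return mock_text(description), trace

    payload["runtime"] = "llama-cpp"
    return payload, trace


result, trace = generate_text(
    "mug",
    llm=lambda _: "not json",
    env={"TEXT_MODEL_PATH": "/models/example.gguf"},
)
print(result["runtime"], trace)
```

Every failure path returns the same deterministic mock output, so the demo keeps working regardless of which step failed, while the trace still shows that a fallback occurred.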

## Required Notes

- Total model parameter count must remain <= 32B.
- No commercial model APIs.
- Fallback behavior must be documented.
- Dataset provenance and privacy rules must be documented before release.