qqyule committed · Commit c45600f · verified · 1 Parent(s): 4a4024d

Deploy Hub GGUF downloader runtime

README.md CHANGED
@@ -23,15 +23,15 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
23
 
24
  ## Current Status
25
 
26
- Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a local GGUF smoke-test helper, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
27
 
28
  By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
29
 
30
  `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 passed for public mug, keyboard, and shoe images after the Space received an `HF_TOKEN` secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The public Space still rolls back to mock mode after validation so the default demo remains stable.
31
 
32
- `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
33
 
34
- `scripts/check_llama_cpp_smoke.py` is available for an explicit-confirmation local GGUF smoke test. The recommended baseline smoke model is `Qwen/Qwen2.5-1.5B-Instruct-GGUF` with `qwen2.5-1.5b-instruct-q4_k_m.gguf`, stored under ignored `models/` when used locally.
35
 
36
  Hugging Face Space:
37
 
@@ -61,14 +61,14 @@ The interface is English-first and Chinese-second.
61
  - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
62
  - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
63
  - [x] OpenBMB Special — MiniCPM-V 2.6 wiring exists and hosted ZeroGPU validation passed for mug, keyboard, and shoe.
64
- - [ ] Llama Champion — llama.cpp wiring and smoke helper exist, but real GGUF smoke test is not complete.
65
- - [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
66
  - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
67
 
68
  ## Planned Model Stack
69
 
70
  - Vision: MiniCPM-V 2.6 or deterministic mock fallback
71
- - Text: deterministic mock text now; published Qwen 1.5B LoRA test adapter for training evidence; optional GGUF later
72
  - Runtime: llama.cpp / llama-cpp-python
73
  - UI: Gradio Blocks
74
 
@@ -81,8 +81,8 @@ Stable baseline:
81
  - default vision backend: deterministic mock, 0 active model parameters
82
  - default text backend: deterministic mock, 0 active model parameters
83
  - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
84
- - optional text base for published LoRA adapter: Qwen/Qwen2.5-1.5B-Instruct, about 1.5B parameters
85
- - optional text GGUF: not converted or committed yet
86
 
87
  The stable public demo therefore stays within the 32B budget. Optional MiniCPM-V plus Qwen 1.5B remains about 9.5B plus a small LoRA adapter, safely under the 32B budget.
88
 
@@ -97,26 +97,36 @@ Then open the local Gradio URL printed in the terminal.
97
 
98
  ## Optional llama.cpp Text Runtime
99
 
100
- The project does not commit GGUF files or require `llama-cpp-python` by default. To try a local GGUF text model:
101
 
102
  ```bash
103
- pip install llama-cpp-python
104
  OBJECTVERSE_TEXT_BACKEND=llama-cpp \
105
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
106
  python app.py
107
  ```
108
 
109
- If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
 
 
 
 
 
 
 
 
110
 
111
  Recommended explicit-confirmation smoke path:
112
 
113
  ```bash
114
- # Download externally, do not commit the GGUF:
115
- # https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF
116
- # file: qwen2.5-1.5b-instruct-q4_k_m.gguf
117
-
118
  .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
119
- --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
 
 
 
 
 
 
 
120
  ```
121
 
122
  ## Initial MVP Flow
@@ -141,8 +151,9 @@ The stable submission baseline supports:
141
  - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
142
  - Runtime notes: `docs/RUNTIME.md`
143
  - Dataset preview notes: `docs/DATASET.md`
144
- - Synthetic curated dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
145
- - Fine-tuned LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 
146
  - Public mock traces: `data/traces/samples/`
147
  - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
148
  - Hosted VLM validation evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
 
23
 
24
  ## Current Status
25
 
26
+ Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a passing local LoRA v2 GGUF smoke test, public mock traces, Space validation evidence, a published curated v2 SFT dataset, a published Qwen 1.5B LoRA v2 adapter, and a published Q4_K_M GGUF are available.
27
 
28
  By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
29
 
30
  `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 passed for public mug, keyboard, and shoe images after the Space received an `HF_TOKEN` secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The public Space still rolls back to mock mode after validation so the default demo remains stable.
31
 
32
+ `OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. The Modal-trained LoRA v2 adapter has been merged with `Qwen/Qwen2.5-1.5B-Instruct`, quantized to Q4_K_M, uploaded to the same model repo, and smoke-tested locally through llama.cpp. No GGUF file is committed in Git, and the public Space is still kept on the mock-safe text runtime until a separate Space validation pass is run.
33
 
34
+ `scripts/check_llama_cpp_smoke.py` passed locally on June 8, 2026 with `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`. The published GGUF is available in `qqyule/objectverse-diary-qwen15b-lora` as `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
35
 
36
  Hugging Face Space:
37
 
 
61
  - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
62
  - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
63
  - [x] OpenBMB Special — MiniCPM-V 2.6 wiring exists and hosted ZeroGPU validation passed for mug, keyboard, and shoe.
64
+ - [x] Llama Champion — local llama.cpp GGUF runtime passed with the published LoRA v2 Q4_K_M model; Space text runtime remains mock-safe.
65
+ - [x] Well-Tuned — synthetic curated v2 SFT dataset and Qwen 1.5B LoRA v2 adapter are published.
66
  - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
67
 
68
  ## Planned Model Stack
69
 
70
  - Vision: MiniCPM-V 2.6 or deterministic mock fallback
71
+ - Text: deterministic mock text by default; optional published Qwen 1.5B LoRA v2 Q4_K_M GGUF for local llama.cpp runtime
72
  - Runtime: llama.cpp / llama-cpp-python
73
  - UI: Gradio Blocks
74
 
 
81
  - default vision backend: deterministic mock, 0 active model parameters
82
  - default text backend: deterministic mock, 0 active model parameters
83
  - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
84
+ - optional text base for published LoRA v2 adapter: Qwen/Qwen2.5-1.5B-Instruct, about 1.5B parameters
85
+ - optional text GGUF: published `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`, about 1.5B base parameters plus a small merged LoRA delta; not committed to Git
86
 
87
  The stable public demo therefore stays within the 32B budget. Optional MiniCPM-V plus Qwen 1.5B remains about 9.5B plus a small LoRA adapter, safely under the 32B budget.
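
As a quick check of the budget arithmetic above, the following sketch adds up the approximate parameter counts listed in this section. The names `BUDGET_B` and `OPTIONAL_MODELS_B` are illustrative and do not come from the repository, and the small merged LoRA delta is ignored.

```python
# Approximate parameter counts in billions, copied from the list above.
BUDGET_B = 32.0
OPTIONAL_MODELS_B = {
    "MiniCPM-V 2.6 (vision)": 8.0,
    "Qwen2.5-1.5B-Instruct (text base)": 1.5,
}


def total_params_b(models: dict[str, float]) -> float:
    """Sum the approximate parameter counts of every enabled model."""
    return sum(models.values())


total = total_params_b(OPTIONAL_MODELS_B)
headroom = BUDGET_B - total
print(f"optional stack: ~{total:.1f}B, headroom under budget: ~{headroom:.1f}B")
```

With both optional models enabled, the estimate leaves a wide margin under the budget, which is why the LoRA delta can be treated as negligible here.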
88
 
 
97
 
98
  ## Optional llama.cpp Text Runtime
99
 
100
+ The project does not commit GGUF files. The Space dependencies include `llama-cpp-python`, but the model is only used when `OBJECTVERSE_TEXT_BACKEND=llama-cpp`. To try a local GGUF text model:
101
 
102
  ```bash
 
103
  OBJECTVERSE_TEXT_BACKEND=llama-cpp \
104
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
105
  python app.py
106
  ```
107
 
108
+ For Hugging Face Space runtime, use Hub download variables instead of committing the GGUF:
109
+
110
+ ```bash
111
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp
112
+ TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
113
+ TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
114
+ ```
115
+
116
+ If `llama-cpp-python` is missing, no local or Hub model source is configured, the model cannot download/load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
117
 
118
  Recommended explicit-confirmation smoke path:
119
 
120
  ```bash
 
 
 
 
121
  .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
122
+ --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
123
+ ```
124
+
125
+ Published GGUF source:
126
+
127
+ ```text
128
+ repo: qqyule/objectverse-diary-qwen15b-lora
129
+ file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
130
  ```
131
 
132
  ## Initial MVP Flow
 
151
  - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
152
  - Runtime notes: `docs/RUNTIME.md`
153
  - Dataset preview notes: `docs/DATASET.md`
154
+ - Synthetic curated v2 dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
155
+ - Fine-tuned LoRA v2 adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
156
+ - LoRA v2 Q4_K_M GGUF: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora/blob/main/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
157
  - Public mock traces: `data/traces/samples/`
158
  - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
159
  - Hosted VLM validation evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
docs/RUNTIME.md CHANGED
@@ -29,16 +29,33 @@ This only replaces object understanding. Persona generation, diary generation, a
29
  Optional llama.cpp text generation can be enabled without changing the UI:
30
 
31
  ```bash
32
- pip install llama-cpp-python
33
  OBJECTVERSE_TEXT_BACKEND=llama-cpp \
34
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
35
  .venv/bin/python app.py
36
  ```
37
 
38
- `llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
 
 
 
 
 
 
 
 
 
 
39
 
40
  The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
41
 
 
 
 
 
 
 
 
 
42
  ## Runtime Diagnostics
43
 
44
  The Gradio app exposes two hidden diagnostic APIs:
@@ -52,19 +69,19 @@ These APIs are for validation scripts and are not visible in the main UI. They m
52
 
53
  ## Optional GGUF Smoke Test
54
 
55
- Recommended baseline smoke model:
56
 
57
  ```text
58
- repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
59
- file: qwen2.5-1.5b-instruct-q4_k_m.gguf
60
- local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
61
  ```
62
 
63
- The `models/` directory and `*.gguf` are ignored by Git. After downloading the file externally and installing optional `llama-cpp-python` after confirmation, run:
64
 
65
  ```bash
66
  .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
67
- --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
68
  ```
69
 
70
  A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
@@ -76,6 +93,9 @@ OBJECTVERSE_VISION_BACKEND=mock
76
  OBJECTVERSE_TEXT_BACKEND=mock
77
  VISION_MODEL_ID=
78
  TEXT_MODEL_PATH=
 
 
 
79
  TRACE_OUTPUT_DIR=data/traces
80
  ```
81
 
@@ -96,6 +116,15 @@ OBJECTVERSE_TEXT_BACKEND=llama-cpp
96
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
97
  ```
98
 
 
 
 
 
 
 
 
 
 
99
  Do not commit GGUF files or private model paths.
100
 
101
  ## Future Runtime Boundary
 
29
  Optional llama.cpp text generation can be enabled without changing the UI:
30
 
31
  ```bash
 
32
  OBJECTVERSE_TEXT_BACKEND=llama-cpp \
33
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
34
  .venv/bin/python app.py
35
  ```
36
 
37
+ For a hosted Space where the GGUF is stored on Hugging Face Hub instead of the local filesystem, configure the Hub source instead of `TEXT_MODEL_PATH`:
38
+
39
+ ```bash
40
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp
41
+ TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
42
+ TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
43
+ ```
44
+
45
+ `TEXT_MODEL_REVISION` is optional and defaults to the Hub repo default branch. If `TEXT_MODEL_PATH` is set, it takes precedence over Hub download variables.
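
The precedence rule can be sketched as a small pure function. This is only an illustration of the documented order (local path first, then Hub repo plus filename), not the project's loader; `describe_text_model_source` and its return strings are hypothetical.

```python
from collections.abc import Mapping


def describe_text_model_source(env: Mapping[str, str]) -> str:
    """Report which GGUF source the documented precedence would select."""
    # A non-empty local path wins over any Hub settings.
    if env.get("TEXT_MODEL_PATH", "").strip():
        return "local path"

    repo_id = env.get("TEXT_MODEL_REPO_ID", "").strip()
    filename = env.get("TEXT_MODEL_FILENAME", "").strip()
    if repo_id and filename:
        # The revision is optional; empty means the repo's default branch.
        revision = env.get("TEXT_MODEL_REVISION", "").strip()
        suffix = f"@{revision}" if revision else ""
        return f"hub: {repo_id}/{filename}{suffix}"

    # Neither source is usable, so the app would fall back to mock text.
    return "not configured"
```

Keeping this decision free of side effects (no download, no file access) makes the precedence easy to unit test separately from the actual model loading.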
46
+
47
+ `llama-cpp-python` and `huggingface_hub` are installed by the Space runtime dependencies. Missing package, missing model path, download errors, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
48
 
49
  The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
50
 
51
+ Local LoRA v2 GGUF status:
52
+
53
+ - Base model: `Qwen/Qwen2.5-1.5B-Instruct`
54
+ - Adapter / GGUF repo: `qqyule/objectverse-diary-qwen15b-lora`
55
+ - Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
56
+ - Local smoke: passed on 2026-06-08 with `llama-cpp text generation` and no `text-fallback-to-mock`
57
+ - Space runtime: not switched to llama.cpp yet; the public Space text path remains mock-safe until a separate Space validation passes
58
+
59
  ## Runtime Diagnostics
60
 
61
  The Gradio app exposes two hidden diagnostic APIs:
 
69
 
70
  ## Optional GGUF Smoke Test
71
 
72
+ Recommended LoRA v2 smoke model:
73
 
74
  ```text
75
+ repo: qqyule/objectverse-diary-qwen15b-lora
76
+ file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
77
+ local path: models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
78
  ```
79
 
80
+ The `models/` directory and `*.gguf` are ignored by Git. After downloading the file externally and installing optional `llama-cpp-python`, run:
81
 
82
  ```bash
83
  .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
84
+ --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
85
  ```
86
 
87
  A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
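
The pass criterion above can be expressed as a small check. This is a sketch under assumptions: the function name and its three arguments are hypothetical, and the real smoke script may read these values from a different structure.

```python
EXPECTED_TEXT_RUNTIME = "llama-cpp text generation"
FALLBACK_MARKER = "text-fallback-to-mock"


def smoke_test_passed(
    text_runtime: str,
    generation_fallbacks: list[str],
    chat_fallbacks: list[str],
) -> bool:
    """Return True only if llama.cpp ran and neither step fell back to mock."""
    # The runtime label must report llama.cpp text generation.
    if EXPECTED_TEXT_RUNTIME not in text_runtime:
        return False
    # Neither the generation step nor the chat step may carry the marker.
    return (
        FALLBACK_MARKER not in generation_fallbacks
        and FALLBACK_MARKER not in chat_fallbacks
    )
```

Checking the runtime label alone is not enough: a run can still report `llama-cpp text generation` while containing the fallback marker, so both conditions are required.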
 
93
  OBJECTVERSE_TEXT_BACKEND=mock
94
  VISION_MODEL_ID=
95
  TEXT_MODEL_PATH=
96
+ TEXT_MODEL_REPO_ID=
97
+ TEXT_MODEL_FILENAME=
98
+ TEXT_MODEL_REVISION=
99
  TRACE_OUTPUT_DIR=data/traces
100
  ```
101
 
 
116
  TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
117
  ```
118
 
119
+ For a Space runtime that should download the published LoRA v2 GGUF from Hub, set:
120
+
121
+ ```bash
122
+ OBJECTVERSE_VISION_BACKEND=mock
123
+ OBJECTVERSE_TEXT_BACKEND=llama-cpp
124
+ TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
125
+ TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
126
+ ```
127
+
128
  Do not commit GGUF files or private model paths.
129
 
130
  ## Future Runtime Boundary
requirements.txt CHANGED
@@ -7,3 +7,5 @@ Pillow
 sentencepiece
 accelerate
 spaces>=0.30
+huggingface_hub>=0.34,<1
+llama-cpp-python>=0.3,<0.4
src/config.py CHANGED
@@ -26,6 +26,9 @@ class RuntimeSettings:
     vision_backend: str
     text_backend: str
     text_model_path: str
+    text_model_repo_id: str
+    text_model_filename: str
+    text_model_revision: str
     vision_model_id: str
     trace_output_dir: Path
 
@@ -36,6 +39,9 @@ def get_runtime_settings(environ: Mapping[str, str] | None = None) -> RuntimeSet
         vision_backend=env.get("OBJECTVERSE_VISION_BACKEND", "mock"),
         text_backend=env.get("OBJECTVERSE_TEXT_BACKEND", "mock"),
         text_model_path=env.get("TEXT_MODEL_PATH", ""),
+        text_model_repo_id=env.get("TEXT_MODEL_REPO_ID", ""),
+        text_model_filename=env.get("TEXT_MODEL_FILENAME", ""),
+        text_model_revision=env.get("TEXT_MODEL_REVISION", ""),
         vision_model_id=env.get("VISION_MODEL_ID", ""),
         trace_output_dir=Path(env.get("TRACE_OUTPUT_DIR", "data/traces")),
     )
@@ -61,13 +67,21 @@ def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
     if text_backend == "mock":
         runtime_parts.append("no llama.cpp model connected yet")
     else:
-        runtime_parts.append(f"text model path: {_text_model_path_status(current.text_model_path)}")
+        runtime_parts.append(f"text model source: {_text_model_source_status(current)}")
     runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
 
 
-def _text_model_path_status(text_model_path: str) -> str:
-    return "[configured external GGUF]" if text_model_path.strip() else "[not configured]"
+def _text_model_source_status(settings: RuntimeSettings) -> str:
+    if settings.text_model_path.strip():
+        return "[configured external GGUF]"
+    repo_id = settings.text_model_repo_id.strip()
+    filename = settings.text_model_filename.strip()
+    if repo_id and filename:
+        revision = settings.text_model_revision.strip()
+        suffix = f"@{revision}" if revision else ""
+        return f"Hub GGUF: {repo_id}/{filename}{suffix}"
+    return "[not configured]"
 
 
 SETTINGS = get_runtime_settings()
src/models/llama_cpp_runner.py CHANGED
@@ -8,7 +8,11 @@ from typing import Any
 
 from src.config import RuntimeSettings, get_runtime_settings
 from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
-from src.prompts.diary_generation import CHAT_REPLY_PROMPT, DIARY_GENERATION_PROMPT
+from src.prompts.diary_generation import (
+    CHAT_REPLY_PROMPT,
+    DIARY_GENERATION_PROMPT,
+    PERSONA_DIARY_GENERATION_PROMPT,
+)
 from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
 from src.utils.json_repair import parse_json_object
 
@@ -61,6 +65,22 @@ def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> Pe
     return _generate_persona_mock(object_understanding, mode)
 
 
+def generate_persona_and_diary(
+    object_understanding: ObjectUnderstanding,
+    mode: str,
+) -> tuple[PersonaEnvelope, DiaryEntry]:
+    settings = get_runtime_settings()
+    if _is_llama_cpp_backend(settings):
+        try:
+            return _generate_persona_and_diary_llama_cpp(object_understanding, mode, settings)
+        except Exception as exc:
+            _log_text_fallback("persona+diary", exc)
+            _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
+
+    persona = _generate_persona_mock(object_understanding, mode)
+    return persona, _generate_diary_mock(persona, mode)
+
+
 def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     settings = get_runtime_settings()
     if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
@@ -164,6 +184,25 @@ def _generate_persona_llama_cpp(
     return PersonaEnvelope.model_validate(raw)
 
 
+def _generate_persona_and_diary_llama_cpp(
+    object_understanding: ObjectUnderstanding,
+    mode: str,
+    settings: RuntimeSettings,
+) -> tuple[PersonaEnvelope, DiaryEntry]:
+    raw = _run_llama_json(
+        system_prompt=PERSONA_DIARY_GENERATION_PROMPT,
+        user_payload={
+            "mode": mode,
+            "object_understanding": object_understanding.model_dump(mode="json"),
+        },
+        settings=settings,
+        max_tokens=1024,
+    )
+    persona = PersonaEnvelope.model_validate({"persona": raw.get("persona")})
+    diary = DiaryEntry.model_validate(raw.get("diary"))
+    return persona, diary
+
+
 def _generate_diary_llama_cpp(
     persona: PersonaEnvelope,
     mode: str,
@@ -209,7 +248,7 @@ def _run_llama_json(
     settings: RuntimeSettings,
     max_tokens: int,
 ) -> dict[str, Any]:
-    model = _load_llama_model(settings.text_model_path)
+    model = _load_llama_model(settings.text_model_path, settings=settings)
     user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
     raw = _complete_llama(
         model,
@@ -234,7 +273,8 @@ def _complete_llama(
                 {"role": "system", "content": system_prompt},
                 {"role": "user", "content": user_content},
             ],
-            temperature=0.75,
+            temperature=0.2,
+            top_p=0.9,
             max_tokens=max_tokens,
             stop=stop,
         )
@@ -243,7 +283,8 @@ def _complete_llama(
     prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
     response = model(
         prompt,
-        temperature=0.75,
+        temperature=0.2,
+        top_p=0.9,
         max_tokens=max_tokens,
         stop=stop,
     )
@@ -272,12 +313,10 @@ def _extract_completion_text(response: Any) -> str:
     raise ValueError("llama.cpp response did not include text content.")
 
 
-def _load_llama_model(text_model_path: str) -> Any:
+def _load_llama_model(text_model_path: str, *, settings: RuntimeSettings | None = None) -> Any:
     global _LLAMA_MODEL, _LLAMA_MODEL_PATH
 
-    clean_path = text_model_path.strip()
-    if not clean_path:
-        raise ValueError("TEXT_MODEL_PATH is not configured.")
+    clean_path = _resolve_text_model_path(text_model_path, settings)
     if not Path(clean_path).exists():
         raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
 
@@ -295,6 +334,38 @@ def _load_llama_model(text_model_path: str) -> Any:
     return _LLAMA_MODEL
 
 
+def _resolve_text_model_path(
+    text_model_path: str,
+    settings: RuntimeSettings | None = None,
+) -> str:
+    clean_path = text_model_path.strip()
+    if clean_path:
+        return clean_path
+
+    current = settings or get_runtime_settings()
+    if current.text_model_repo_id.strip() and current.text_model_filename.strip():
+        return _download_hf_gguf(current)
+
+    raise ValueError(
+        "TEXT_MODEL_PATH is not configured, and TEXT_MODEL_REPO_ID/TEXT_MODEL_FILENAME "
+        "are not configured."
+    )
+
+
+def _download_hf_gguf(settings: RuntimeSettings) -> str:
+    from huggingface_hub import hf_hub_download
+
+    kwargs: dict[str, str] = {
+        "repo_id": settings.text_model_repo_id.strip(),
+        "filename": settings.text_model_filename.strip(),
+        "repo_type": "model",
+    }
+    revision = settings.text_model_revision.strip()
+    if revision:
+        kwargs["revision"] = revision
+    return hf_hub_download(**kwargs)
+
+
 def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
     return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS
 
tests/test_mock_mvp.py CHANGED
@@ -3,11 +3,14 @@
 from __future__ import annotations
 
 import json
+import sys
 import tempfile
+import types
 import unittest
 from pathlib import Path
 from unittest.mock import patch
 
+import src.models.llama_cpp_runner as llama_cpp_runner
 from src.example_cache import load_sample_generation, sample_trace_path
 from src.examples import EXAMPLE_OBJECTS, gradio_examples
 from src.models.llama_cpp_runner import (
@@ -39,8 +42,10 @@ class FakeMiniCpmModel:
 class FakeLlamaModel:
     def __init__(self, responses: list[str]) -> None:
         self.responses = responses
+        self.calls = 0
 
     def create_chat_completion(self, **_: object) -> dict:
+        self.calls += 1
         response = self.responses.pop(0)
         return {"choices": [{"message": {"content": response}}]}
 
@@ -72,6 +77,62 @@ class MockMvpTest(unittest.TestCase):
         self.assertIn("[configured external GGUF]", status["runtime"])
         self.assertNotIn("/Users/leo", status["runtime"])
 
+    def test_llama_cpp_hub_runtime_status_uses_public_repo_summary(self) -> None:
+        settings = get_runtime_settings(
+            {
+                "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+                "TEXT_MODEL_REPO_ID": "qqyule/objectverse-diary-qwen15b-lora",
+                "TEXT_MODEL_FILENAME": "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf",
+            }
+        )
+
+        status = runtime_status(settings)
+
+        self.assertEqual(settings.text_model_repo_id, "qqyule/objectverse-diary-qwen15b-lora")
+        self.assertEqual(settings.text_model_filename, "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf")
+        self.assertIn("Hub GGUF", status["runtime"])
+        self.assertIn("qqyule/objectverse-diary-qwen15b-lora", status["runtime"])
+        self.assertNotIn("/home", status["runtime"])
+        self.assertNotIn("/Users", status["runtime"])
+
+    def test_llama_cpp_loads_model_from_hub_config_when_path_is_missing(self) -> None:
+        previous_model = llama_cpp_runner._LLAMA_MODEL
+        previous_path = llama_cpp_runner._LLAMA_MODEL_PATH
+        llama_cpp_runner._LLAMA_MODEL = None
+        llama_cpp_runner._LLAMA_MODEL_PATH = None
+
+        loaded_paths: list[str] = []
+
+        class FakeLlama:
+            def __init__(self, *, model_path: str, **_: object) -> None:
+                loaded_paths.append(model_path)
+
+        fake_module = types.ModuleType("llama_cpp")
+        fake_module.Llama = FakeLlama
+
+        try:
+            with tempfile.TemporaryDirectory() as tmp_dir:
+                model_path = Path(tmp_dir) / "model.gguf"
+                model_path.write_bytes(b"GGUF")
+                settings = get_runtime_settings(
+                    {
+                        "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+                        "TEXT_MODEL_REPO_ID": "qqyule/objectverse-diary-qwen15b-lora",
+                        "TEXT_MODEL_FILENAME": "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf",
+                    }
+                )
+
+                with (
+                    patch.dict(sys.modules, {"llama_cpp": fake_module}),
+                    patch("src.models.llama_cpp_runner._download_hf_gguf", return_value=str(model_path)),
+                ):
+                    llama_cpp_runner._load_llama_model("", settings=settings)
+
+                self.assertEqual(loaded_paths, [str(model_path)])
+        finally:
+            llama_cpp_runner._LLAMA_MODEL = previous_model
+            llama_cpp_runner._LLAMA_MODEL_PATH = previous_path
+
     def test_examples_cover_six_objects(self) -> None:
         self.assertEqual(len(EXAMPLE_OBJECTS), 6)
         self.assertEqual(len(gradio_examples()), 6)
@@ -177,6 +238,45 @@ class MockMvpTest(unittest.TestCase):
         self.assertIn("text-fallback-to-mock", result.trace.fallbacks)
         self.assertEqual(result.trace.model_runtime["text"], "llama-cpp text generation")
 
+    def test_pipeline_uses_combined_llama_cpp_persona_and_diary(self) -> None:
+        env = {
+            "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+            "TEXT_MODEL_PATH": "/tmp/objectverse-text-model.gguf",
+        }
+        fake_llama = FakeLlamaModel(
+            [
+                """
+                {
+                  "persona": {
+                    "object_name": "coffee mug",
+                    "character_name": "Mugworth",
+                    "mood": "dry and suspicious",
+                    "secret_fear": "being left empty forever",
+                    "core_memory": "It remembers every late-night refill.",
+                    "complaint": "I am treated like a ceramic fuel tank.",
+                    "tags": ["desk witness", "warm archive", "quiet judgment"]
+                  },
+                  "diary": {
+                    "title": "Secret Diary - Day 418",
+                    "english": "Today I held another bitter storm and called it service.",
+                    "chinese": "今天我又装下一场苦涩风暴,并被称为有用。"
+                  }
+                }
+                """,
+            ]
+        )
+
+        with (
+            patch.dict("os.environ", env, clear=False),
+            patch("src.models.llama_cpp_runner._load_llama_model", return_value=fake_llama),
+        ):
+            result = generate_object_diary(None, "old white coffee mug", "Cynical", save=False)
+
+        self.assertEqual(result.persona.persona.character_name, "Mugworth")
+        self.assertEqual(result.diary.title, "Secret Diary - Day 418")
+        self.assertEqual(fake_llama.calls, 1)
+        self.assertNotIn("text-fallback-to-mock", result.trace.fallbacks)
+
     def test_minicpm_vision_backend_accepts_valid_json(self) -> None:
         response = """
         {"object":{"name":"coffee mug","visible_features":["white ceramic","round handle","desk shadow"],"likely_context":"work desk","confidence":0.88}}