Space: build-small-hackathon/ObjectverseDiary

qqyule committed on Jun 8

Commit c45600f (verified) · 1 parent: 4a4024d

Deploy Hub GGUF downloader runtime

Files changed (6)

README.md +29 -18
docs/RUNTIME.md +37 -8
requirements.txt +2 -0
src/config.py +17 -3
src/models/llama_cpp_runner.py +79 -8
tests/test_mock_mvp.py +100 -0

README.md CHANGED

@@ -23,15 +23,15 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
 ## Current Status
-Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a local GGUF smoke-test helper, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
 By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
 `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 passed for public mug, keyboard, and shoe images after the Space received an `HF_TOKEN` secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The public Space still rolls back to mock mode after validation so the default demo remains stable.
-`OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
-`scripts/check_llama_cpp_smoke.py` is available for an explicit-confirmation local GGUF smoke test. The recommended baseline smoke model is `Qwen/Qwen2.5-1.5B-Instruct-GGUF` with `qwen2.5-1.5b-instruct-q4_k_m.gguf`, stored under ignored `models/` when used locally.
 Hugging Face Space:
@@ -61,14 +61,14 @@ The interface is English-first and Chinese-second.
 - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
 - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
 - [x] OpenBMB Special — MiniCPM-V 2.6 wiring exists and hosted ZeroGPU validation passed for mug, keyboard, and shoe.
-- [ ] Llama Champion — llama.cpp wiring and smoke helper exist, but real GGUF smoke test is not complete.
-- [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
 - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
 ## Planned Model Stack
 - Vision: MiniCPM-V 2.6 or deterministic mock fallback
-- Text: deterministic mock text now; published Qwen 1.5B LoRA test adapter for training evidence; optional GGUF later
 - Runtime: llama.cpp / llama-cpp-python
 - UI: Gradio Blocks
@@ -81,8 +81,8 @@ Stable baseline:
 - default vision backend: deterministic mock, 0 active model parameters
 - default text backend: deterministic mock, 0 active model parameters
 - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
-- optional text base for published LoRA adapter: Qwen/Qwen2.5-1.5B-Instruct, about 1.5B parameters
-- optional text GGUF: not converted or committed yet
 The stable public demo therefore stays within the 32B budget. Optional MiniCPM-V plus Qwen 1.5B remains about 9.5B plus a small LoRA adapter, safely under the 32B budget.
@@ -97,26 +97,36 @@ Then open the local Gradio URL printed in the terminal.
 ## Optional llama.cpp Text Runtime
-The project does not commit GGUF files or require `llama-cpp-python` by default. To try a local GGUF text model:
 ```bash
-pip install llama-cpp-python
 OBJECTVERSE_TEXT_BACKEND=llama-cpp \
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
 python app.py
 ```
-If `llama-cpp-python` is missing, `TEXT_MODEL_PATH` is empty, the model cannot load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
 Recommended explicit-confirmation smoke path:
 ```bash
-# Download externally, do not commit the GGUF:
-# https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF
-# file: qwen2.5-1.5b-instruct-q4_k_m.gguf
 .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
-  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
 ```
 ## Initial MVP Flow
@@ -141,8 +151,9 @@ The stable submission baseline supports:
 - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
 - Runtime notes: `docs/RUNTIME.md`
 - Dataset preview notes: `docs/DATASET.md`
-- Synthetic curated dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
-- Fine-tuned LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 - Public mock traces: `data/traces/samples/`
 - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - Hosted VLM validation evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`

 ## Current Status
+Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, non-secret hosted vision diagnostics, optional llama.cpp text runtime wiring, a passing local LoRA v2 GGUF smoke test, public mock traces, Space validation evidence, a published curated v2 SFT dataset, a published Qwen 1.5B LoRA v2 adapter, and a published Q4_K_M GGUF are available.
 By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
 `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 passed for public mug, keyboard, and shoe images after the Space received an `HF_TOKEN` secret with access to the gated `openbmb/MiniCPM-V-2_6` model. The public Space still rolls back to mock mode after validation so the default demo remains stable.
+`OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. The Modal-trained LoRA v2 adapter has been merged with `Qwen/Qwen2.5-1.5B-Instruct`, quantized to Q4_K_M, uploaded to the same model repo, and smoke-tested locally through llama.cpp. No GGUF file is committed in Git, and the public Space is still kept on the mock-safe text runtime until a separate Space validation pass is run.
+`scripts/check_llama_cpp_smoke.py` passed locally on June 8, 2026 with `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`. The published GGUF is available in `qqyule/objectverse-diary-qwen15b-lora` as `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`.
 Hugging Face Space:
 - [x] Sharing is Caring — public mock traces, JSONL export, prompt templates, and failure notes.
 - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
 - [x] OpenBMB Special — MiniCPM-V 2.6 wiring exists and hosted ZeroGPU validation passed for mug, keyboard, and shoe.
+- [x] Llama Champion — local llama.cpp GGUF runtime passed with the published LoRA v2 Q4_K_M model; Space text runtime remains mock-safe.
+- [x] Well-Tuned — synthetic curated v2 SFT dataset and Qwen 1.5B LoRA v2 adapter are published.
 - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
 ## Planned Model Stack
 - Vision: MiniCPM-V 2.6 or deterministic mock fallback
+- Text: deterministic mock text by default; optional published Qwen 1.5B LoRA v2 Q4_K_M GGUF for local llama.cpp runtime
 - Runtime: llama.cpp / llama-cpp-python
 - UI: Gradio Blocks
 - default vision backend: deterministic mock, 0 active model parameters
 - default text backend: deterministic mock, 0 active model parameters
 - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
+- optional text base for published LoRA v2 adapter: Qwen/Qwen2.5-1.5B-Instruct, about 1.5B parameters
+- optional text GGUF: published `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`, about 1.5B base parameters plus a small merged LoRA delta; not committed to Git
 The stable public demo therefore stays within the 32B budget. Optional MiniCPM-V plus Qwen 1.5B remains about 9.5B plus a small LoRA adapter, safely under the 32B budget.
 ## Optional llama.cpp Text Runtime
+The project does not commit GGUF files. The Space dependencies include `llama-cpp-python`, but the model is only used when `OBJECTVERSE_TEXT_BACKEND=llama-cpp`. To try a local GGUF text model:
 ```bash
 OBJECTVERSE_TEXT_BACKEND=llama-cpp \
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
 python app.py
 ```
+For Hugging Face Space runtime, use Hub download variables instead of committing the GGUF:
+```bash
+OBJECTVERSE_TEXT_BACKEND=llama-cpp
+TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
+TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
+```
+If `llama-cpp-python` is missing, no local or Hub model source is configured, the model cannot download/load, or the model returns invalid JSON, the app falls back to deterministic mock text generation and records `text-fallback-to-mock` in traces.
 Recommended explicit-confirmation smoke path:
 ```bash
 .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
+```
+Published GGUF source:
+```text
+repo: qqyule/objectverse-diary-qwen15b-lora
+file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
 ```
 ## Initial MVP Flow
 - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
 - Runtime notes: `docs/RUNTIME.md`
 - Dataset preview notes: `docs/DATASET.md`
+- Synthetic curated v2 dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
+- Fine-tuned LoRA v2 adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
+- LoRA v2 Q4_K_M GGUF: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora/blob/main/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
 - Public mock traces: `data/traces/samples/`
 - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - Hosted VLM validation evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`
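
The README names the published GGUF but leaves the download step to the reader. Here is a minimal sketch that assumes `huggingface_hub` is installed and that the file should land in the Git-ignored `models/` directory the smoke command expects. It uses the real `hf_hub_download` function with its `local_dir` argument. The helper names `download_kwargs` and `fetch_smoke_model` are hypothetical and are not part of the repo.

```python
from pathlib import Path

REPO_ID = "qqyule/objectverse-diary-qwen15b-lora"
FILENAME = "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf"


def download_kwargs(
    repo_id: str,
    filename: str,
    local_dir: str = "models",
    revision: str = "",
) -> dict[str, str]:
    """Build keyword arguments for huggingface_hub.hf_hub_download (hypothetical helper)."""
    kwargs = {
        "repo_id": repo_id.strip(),
        "filename": filename.strip(),
        "repo_type": "model",
        "local_dir": local_dir,
    }
    # Only pin a revision when one is given; otherwise the Hub default branch is used.
    if revision.strip():
        kwargs["revision"] = revision.strip()
    return kwargs


def fetch_smoke_model(local_dir: str = "models") -> Path:
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(**download_kwargs(REPO_ID, FILENAME, local_dir)))
```

In recent `huggingface_hub` versions, calling `fetch_smoke_model()` should leave the file at `models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`. That is the path the smoke command passes to `--model-path`.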

docs/RUNTIME.md CHANGED

@@ -29,16 +29,33 @@ This only replaces object understanding. Persona generation, diary generation, a
 Optional llama.cpp text generation can be enabled without changing the UI:
 ```bash
-pip install llama-cpp-python
 OBJECTVERSE_TEXT_BACKEND=llama-cpp \
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
 .venv/bin/python app.py
 ```
-`llama-cpp-python` is intentionally not a required dependency yet. Missing package, missing model path, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
 The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
 ## Runtime Diagnostics
 The Gradio app exposes two hidden diagnostic APIs:
@@ -52,19 +69,19 @@ These APIs are for validation scripts and are not visible in the main UI. They m
 ## Optional GGUF Smoke Test
-Recommended baseline smoke model:
 ```text
-repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
-file: qwen2.5-1.5b-instruct-q4_k_m.gguf
-local path: models/qwen2.5-1.5b-instruct-q4_k_m.gguf
 ```
-The `models/` directory and `*.gguf` are ignored by Git. After downloading the file externally and installing optional `llama-cpp-python` after confirmation, run:
 ```bash
 .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
-  --model-path models/qwen2.5-1.5b-instruct-q4_k_m.gguf
 ```
 A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
@@ -76,6 +93,9 @@ OBJECTVERSE_VISION_BACKEND=mock
 OBJECTVERSE_TEXT_BACKEND=mock
 VISION_MODEL_ID=
 TEXT_MODEL_PATH=
 TRACE_OUTPUT_DIR=data/traces
 ```
@@ -96,6 +116,15 @@ OBJECTVERSE_TEXT_BACKEND=llama-cpp
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
 ```
 Do not commit GGUF files or private model paths.
 ## Future Runtime Boundary

 Optional llama.cpp text generation can be enabled without changing the UI:
 ```bash
 OBJECTVERSE_TEXT_BACKEND=llama-cpp \
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf \
 .venv/bin/python app.py
 ```
+For a hosted Space where the GGUF is stored on Hugging Face Hub instead of the local filesystem, configure the Hub source instead of `TEXT_MODEL_PATH`:
+```bash
+OBJECTVERSE_TEXT_BACKEND=llama-cpp
+TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
+TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
+```
+`TEXT_MODEL_REVISION` is optional and defaults to the Hub repo default branch. If `TEXT_MODEL_PATH` is set, it takes precedence over Hub download variables.
+`llama-cpp-python` and `huggingface_hub` are installed by the Space runtime dependencies. Missing package, missing model path, download errors, model loading errors, invalid JSON, or schema validation errors all fall back to deterministic mock text generation.
 The runtime trace intentionally records only whether an external GGUF path was configured, not the literal `TEXT_MODEL_PATH`, so local private paths do not leak into public traces.
+Local LoRA v2 GGUF status:
+- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
+- Adapter / GGUF repo: `qqyule/objectverse-diary-qwen15b-lora`
+- Published GGUF: `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`
+- Local smoke: passed on 2026-06-08 with `llama-cpp text generation` and no `text-fallback-to-mock`
+- Space runtime: not switched to llama.cpp yet; the public Space text path remains mock-safe until a separate Space validation passes
 ## Runtime Diagnostics
 The Gradio app exposes two hidden diagnostic APIs:
 ## Optional GGUF Smoke Test
+Recommended LoRA v2 smoke model:
 ```text
+repo: qqyule/objectverse-diary-qwen15b-lora
+file: objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
+local path: models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
 ```
+The `models/` directory and `*.gguf` are ignored by Git. After downloading the file externally and installing optional `llama-cpp-python`, run:
 ```bash
 .venv/bin/python -B scripts/check_llama_cpp_smoke.py \
+  --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
 ```
 A passing smoke test must show `llama-cpp text generation` and must not include `text-fallback-to-mock` in either generation or chat fallback markers.
 OBJECTVERSE_TEXT_BACKEND=mock
 VISION_MODEL_ID=
 TEXT_MODEL_PATH=
+TEXT_MODEL_REPO_ID=
+TEXT_MODEL_FILENAME=
+TEXT_MODEL_REVISION=
 TRACE_OUTPUT_DIR=data/traces
 ```
 TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
 ```
+For a Space runtime that should download the published LoRA v2 GGUF from Hub, set:
+```bash
+OBJECTVERSE_VISION_BACKEND=mock
+OBJECTVERSE_TEXT_BACKEND=llama-cpp
+TEXT_MODEL_REPO_ID=qqyule/objectverse-diary-qwen15b-lora
+TEXT_MODEL_FILENAME=objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
+```
 Do not commit GGUF files or private model paths.
 ## Future Runtime Boundary
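
RUNTIME.md states the smoke-test pass rule in one sentence. Below is a minimal sketch of that rule as a predicate. It assumes the text runtime label and the generation and chat fallback markers are available as plain strings. The function name and argument shapes are hypothetical and do not describe the actual interface of `scripts/check_llama_cpp_smoke.py`.

```python
from collections.abc import Iterable

LLAMA_RUNTIME_LABEL = "llama-cpp text generation"
FALLBACK_MARKER = "text-fallback-to-mock"


def smoke_passed(
    text_runtime: str,
    generation_fallbacks: Iterable[str],
    chat_fallbacks: Iterable[str],
) -> bool:
    """Return True only if llama.cpp served the text and neither path fell back to mock."""
    if text_runtime != LLAMA_RUNTIME_LABEL:
        return False
    # One fallback marker in either the generation or the chat markers fails the run.
    all_markers = list(generation_fallbacks) + list(chat_fallbacks)
    return FALLBACK_MARKER not in all_markers
```

The marker check is the decisive part. The existing fallback test shows that the runtime label can still read `llama-cpp text generation` after a fallback to mock has occurred.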

requirements.txt CHANGED

@@ -7,3 +7,5 @@ Pillow
 sentencepiece
 accelerate
 spaces>=0.30

 sentencepiece
 accelerate
 spaces>=0.30
+huggingface_hub>=0.34,<1
+llama-cpp-python>=0.3,<0.4
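
Both new pins use half-open ranges. For example, `llama-cpp-python>=0.3,<0.4` accepts any 0.3.x release and rejects 0.4.0. Below is a stdlib-only sketch of that check. It assumes plain numeric release strings such as `0.3.16` and does not handle pre-release or local version tags. In real code, `packaging.specifiers.SpecifierSet` is the robust choice. The function names are illustrative.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn '0.3.16' into (0, 3, 16); assumes purely numeric dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def in_half_open_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, mirroring a '>=lower,<upper' pin."""
    v = parse_release(version)
    # Tuple comparison treats (0, 4, 0) as greater than (0, 4), so 0.4.0 is excluded.
    return parse_release(lower) <= v < parse_release(upper)
```

The installed version can be read with `importlib.metadata.version("llama-cpp-python")` and passed to `in_half_open_range(..., "0.3", "0.4")`.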

src/config.py CHANGED

@@ -26,6 +26,9 @@ class RuntimeSettings:
     vision_backend: str
     text_backend: str
     text_model_path: str
     vision_model_id: str
     trace_output_dir: Path
@@ -36,6 +39,9 @@ def get_runtime_settings(environ: Mapping[str, str] | None = None) -> RuntimeSet
         vision_backend=env.get("OBJECTVERSE_VISION_BACKEND", "mock"),
         text_backend=env.get("OBJECTVERSE_TEXT_BACKEND", "mock"),
         text_model_path=env.get("TEXT_MODEL_PATH", ""),
         vision_model_id=env.get("VISION_MODEL_ID", ""),
         trace_output_dir=Path(env.get("TRACE_OUTPUT_DIR", "data/traces")),
     )
@@ -61,13 +67,21 @@ def runtime_status(settings: RuntimeSettings | None = None) -> dict[str, str]:
     if text_backend == "mock":
         runtime_parts.append("no llama.cpp model connected yet")
     else:
-        runtime_parts.append(f"text model path: {_text_model_path_status(current.text_model_path)}")
     runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
-def _text_model_path_status(text_model_path: str) -> str:
-    return "[configured external GGUF]" if text_model_path.strip() else "[not configured]"
 SETTINGS = get_runtime_settings()

     vision_backend: str
     text_backend: str
     text_model_path: str
+    text_model_repo_id: str
+    text_model_filename: str
+    text_model_revision: str
     vision_model_id: str
     trace_output_dir: Path
         vision_backend=env.get("OBJECTVERSE_VISION_BACKEND", "mock"),
         text_backend=env.get("OBJECTVERSE_TEXT_BACKEND", "mock"),
         text_model_path=env.get("TEXT_MODEL_PATH", ""),
+        text_model_repo_id=env.get("TEXT_MODEL_REPO_ID", ""),
+        text_model_filename=env.get("TEXT_MODEL_FILENAME", ""),
+        text_model_revision=env.get("TEXT_MODEL_REVISION", ""),
         vision_model_id=env.get("VISION_MODEL_ID", ""),
         trace_output_dir=Path(env.get("TRACE_OUTPUT_DIR", "data/traces")),
     )
     if text_backend == "mock":
         runtime_parts.append("no llama.cpp model connected yet")
     else:
+        runtime_parts.append(f"text model source: {_text_model_source_status(current)}")
     runtime = "; ".join(runtime_parts)
     return {"vision": vision, "text": text, "runtime": runtime}
+def _text_model_source_status(settings: RuntimeSettings) -> str:
+    if settings.text_model_path.strip():
+        return "[configured external GGUF]"
+    repo_id = settings.text_model_repo_id.strip()
+    filename = settings.text_model_filename.strip()
+    if repo_id and filename:
+        revision = settings.text_model_revision.strip()
+        suffix = f"@{revision}" if revision else ""
+        return f"Hub GGUF: {repo_id}/{filename}{suffix}"
+    return "[not configured]"
 SETTINGS = get_runtime_settings()
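
The new `_text_model_source_status` encodes a precedence rule. A local `TEXT_MODEL_PATH` wins. Otherwise, a Hub repo plus filename is used, with an optional revision. The literal local path is never echoed. The standalone sketch below applies the same rule to a plain environment mapping, so a configuration can be checked without importing the app. `describe_text_source` is a hypothetical name. The sketch mirrors the repo function rather than replacing it.

```python
from collections.abc import Mapping


def describe_text_source(env: Mapping[str, str]) -> str:
    """Summarize the configured GGUF source without leaking local filesystem paths."""
    # 1. A local path takes precedence, but only its presence is reported.
    if env.get("TEXT_MODEL_PATH", "").strip():
        return "[configured external GGUF]"
    # 2. Otherwise a public Hub repo + filename pair is safe to show verbatim.
    repo_id = env.get("TEXT_MODEL_REPO_ID", "").strip()
    filename = env.get("TEXT_MODEL_FILENAME", "").strip()
    if repo_id and filename:
        revision = env.get("TEXT_MODEL_REVISION", "").strip()
        suffix = f"@{revision}" if revision else ""
        return f"Hub GGUF: {repo_id}/{filename}{suffix}"
    # 3. Neither source: the runner raises, and the app falls back to mock text.
    return "[not configured]"
```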

src/models/llama_cpp_runner.py CHANGED

@@ -8,7 +8,11 @@ from typing import Any
 from src.config import RuntimeSettings, get_runtime_settings
 from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
-from src.prompts.diary_generation import CHAT_REPLY_PROMPT, DIARY_GENERATION_PROMPT
 from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
 from src.utils.json_repair import parse_json_object
@@ -61,6 +65,22 @@ def generate_persona(object_understanding: ObjectUnderstanding, mode: str) -> Pe
     return _generate_persona_mock(object_understanding, mode)
 def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     settings = get_runtime_settings()
     if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
@@ -164,6 +184,25 @@ def _generate_persona_llama_cpp(
     return PersonaEnvelope.model_validate(raw)
 def _generate_diary_llama_cpp(
     persona: PersonaEnvelope,
     mode: str,
@@ -209,7 +248,7 @@ def _run_llama_json(
     settings: RuntimeSettings,
     max_tokens: int,
 ) -> dict[str, Any]:
-    model = _load_llama_model(settings.text_model_path)
     user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
     raw = _complete_llama(
         model,
@@ -234,7 +273,8 @@ def _complete_llama(
                 {"role": "system", "content": system_prompt},
                 {"role": "user", "content": user_content},
             ],
-            temperature=0.75,
             max_tokens=max_tokens,
             stop=stop,
         )
@@ -243,7 +283,8 @@ def _complete_llama(
     prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
     response = model(
         prompt,
-        temperature=0.75,
         max_tokens=max_tokens,
         stop=stop,
     )
@@ -272,12 +313,10 @@ def _extract_completion_text(response: Any) -> str:
     raise ValueError("llama.cpp response did not include text content.")
-def _load_llama_model(text_model_path: str) -> Any:
     global _LLAMA_MODEL, _LLAMA_MODEL_PATH
-    clean_path = text_model_path.strip()
-    if not clean_path:
-        raise ValueError("TEXT_MODEL_PATH is not configured.")
     if not Path(clean_path).exists():
         raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
@@ -295,6 +334,38 @@ def _load_llama_model(text_model_path: str) -> Any:
     return _LLAMA_MODEL
 def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
     return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS

 from src.config import RuntimeSettings, get_runtime_settings
 from src.models.schema import DiaryEntry, ObjectUnderstanding, Persona, PersonaEnvelope
+from src.prompts.diary_generation import (
+    CHAT_REPLY_PROMPT,
+    DIARY_GENERATION_PROMPT,
+    PERSONA_DIARY_GENERATION_PROMPT,
+)
 from src.prompts.persona_generation import PERSONA_GENERATION_PROMPT
 from src.utils.json_repair import parse_json_object
     return _generate_persona_mock(object_understanding, mode)
+def generate_persona_and_diary(
+    object_understanding: ObjectUnderstanding,
+    mode: str,
+) -> tuple[PersonaEnvelope, DiaryEntry]:
+    settings = get_runtime_settings()
+    if _is_llama_cpp_backend(settings):
+        try:
+            return _generate_persona_and_diary_llama_cpp(object_understanding, mode, settings)
+        except Exception as exc:
+            _log_text_fallback("persona+diary", exc)
+            _add_text_fallback(TEXT_FALLBACK_TO_MOCK)
+    persona = _generate_persona_mock(object_understanding, mode)
+    return persona, _generate_diary_mock(persona, mode)
 def generate_diary(persona: PersonaEnvelope, mode: str) -> DiaryEntry:
     settings = get_runtime_settings()
     if _is_llama_cpp_backend(settings) and TEXT_FALLBACK_TO_MOCK not in _TEXT_FALLBACKS:
     return PersonaEnvelope.model_validate(raw)
+def _generate_persona_and_diary_llama_cpp(
+    object_understanding: ObjectUnderstanding,
+    mode: str,
+    settings: RuntimeSettings,
+) -> tuple[PersonaEnvelope, DiaryEntry]:
+    raw = _run_llama_json(
+        system_prompt=PERSONA_DIARY_GENERATION_PROMPT,
+        user_payload={
+            "mode": mode,
+            "object_understanding": object_understanding.model_dump(mode="json"),
+        },
+        settings=settings,
+        max_tokens=1024,
+    )
+    persona = PersonaEnvelope.model_validate({"persona": raw.get("persona")})
+    diary = DiaryEntry.model_validate(raw.get("diary"))
+    return persona, diary
 def _generate_diary_llama_cpp(
     persona: PersonaEnvelope,
     mode: str,
     settings: RuntimeSettings,
     max_tokens: int,
 ) -> dict[str, Any]:
+    model = _load_llama_model(settings.text_model_path, settings=settings)
     user_content = json.dumps(user_payload, ensure_ascii=False, indent=2)
     raw = _complete_llama(
         model,
                 {"role": "system", "content": system_prompt},
                 {"role": "user", "content": user_content},
             ],
+            temperature=0.2,
+            top_p=0.9,
             max_tokens=max_tokens,
             stop=stop,
         )
     prompt = f"System:\n{system_prompt}\n\nUser:\n{user_content}\n\nAssistant JSON:\n"
     response = model(
         prompt,
+        temperature=0.2,
+        top_p=0.9,
         max_tokens=max_tokens,
         stop=stop,
     )
     raise ValueError("llama.cpp response did not include text content.")
+def _load_llama_model(text_model_path: str, *, settings: RuntimeSettings | None = None) -> Any:
     global _LLAMA_MODEL, _LLAMA_MODEL_PATH
+    clean_path = _resolve_text_model_path(text_model_path, settings)
     if not Path(clean_path).exists():
         raise FileNotFoundError(f"TEXT_MODEL_PATH does not exist: {clean_path}")
     return _LLAMA_MODEL
+def _resolve_text_model_path(
+    text_model_path: str,
+    settings: RuntimeSettings | None = None,
+) -> str:
+    clean_path = text_model_path.strip()
+    if clean_path:
+        return clean_path
+    current = settings or get_runtime_settings()
+    if current.text_model_repo_id.strip() and current.text_model_filename.strip():
+        return _download_hf_gguf(current)
+    raise ValueError(
+        "TEXT_MODEL_PATH is not configured, and TEXT_MODEL_REPO_ID/TEXT_MODEL_FILENAME "
+        "are not configured."
+    )
+def _download_hf_gguf(settings: RuntimeSettings) -> str:
+    from huggingface_hub import hf_hub_download
+    kwargs: dict[str, str] = {
+        "repo_id": settings.text_model_repo_id.strip(),
+        "filename": settings.text_model_filename.strip(),
+        "repo_type": "model",
+    }
+    revision = settings.text_model_revision.strip()
+    if revision:
+        kwargs["revision"] = revision
+    return hf_hub_download(**kwargs)
 def _is_llama_cpp_backend(settings: RuntimeSettings) -> bool:
     return settings.text_backend.strip().lower() in LLAMA_CPP_BACKENDS
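
`generate_persona_and_diary` replaces two llama.cpp calls with a single JSON completion that carries both objects. If anything fails, it falls back to mock output for both. The sketch below shows the shape of that pattern. It uses stdlib dataclasses as stand-ins for the repo's pydantic `PersonaEnvelope` and `DiaryEntry` models. The `Persona` and `Diary` classes, the `complete` callable, and the mock defaults are all hypothetical. Deduplicating the marker is a choice made in this sketch, not confirmed behavior of `_add_text_fallback`.

```python
import json
from collections.abc import Callable
from dataclasses import dataclass

FALLBACK_MARKER = "text-fallback-to-mock"


@dataclass
class Persona:
    character_name: str
    mood: str


@dataclass
class Diary:
    title: str
    english: str


def persona_and_diary(
    complete: Callable[[], str],
    fallbacks: list[str],
) -> tuple[Persona, Diary]:
    """One completion yields both objects; any failure falls back to mock for both."""
    try:
        raw = json.loads(complete())
        # Missing or unexpected keys raise KeyError/TypeError, standing in for schema validation.
        persona = Persona(**raw["persona"])
        diary = Diary(**raw["diary"])
        return persona, diary
    except Exception:
        # Record the fallback marker once so the trace shows the mock path was taken.
        if FALLBACK_MARKER not in fallbacks:
            fallbacks.append(FALLBACK_MARKER)
        return Persona("Mock Object", "neutral"), Diary("Mock Diary", "Nothing happened today.")
```

A single call halves the model round trips. It also means one malformed field discards both halves, which matches the all-or-nothing fallback in the runner.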

tests/test_mock_mvp.py CHANGED

@@ -3,11 +3,14 @@
 from __future__ import annotations
 import json
 import tempfile
 import unittest
 from pathlib import Path
 from unittest.mock import patch
 from src.example_cache import load_sample_generation, sample_trace_path
 from src.examples import EXAMPLE_OBJECTS, gradio_examples
 from src.models.llama_cpp_runner import (
@@ -39,8 +42,10 @@ class FakeMiniCpmModel:
 class FakeLlamaModel:
     def __init__(self, responses: list[str]) -> None:
         self.responses = responses
     def create_chat_completion(self, **_: object) -> dict:
         response = self.responses.pop(0)
         return {"choices": [{"message": {"content": response}}]}
@@ -72,6 +77,62 @@ class MockMvpTest(unittest.TestCase):
         self.assertIn("[configured external GGUF]", status["runtime"])
         self.assertNotIn("/Users/leo", status["runtime"])
     def test_examples_cover_six_objects(self) -> None:
         self.assertEqual(len(EXAMPLE_OBJECTS), 6)
         self.assertEqual(len(gradio_examples()), 6)
@@ -177,6 +238,45 @@ class MockMvpTest(unittest.TestCase):
         self.assertIn("text-fallback-to-mock", result.trace.fallbacks)
         self.assertEqual(result.trace.model_runtime["text"], "llama-cpp text generation")
     def test_minicpm_vision_backend_accepts_valid_json(self) -> None:
         response = """
         {"object":{"name":"coffee mug","visible_features":["white ceramic","round handle","desk shadow"],"likely_context":"work desk","confidence":0.88}}

 from __future__ import annotations
 import json
+import sys
 import tempfile
+import types
 import unittest
 from pathlib import Path
 from unittest.mock import patch
+import src.models.llama_cpp_runner as llama_cpp_runner
 from src.example_cache import load_sample_generation, sample_trace_path
 from src.examples import EXAMPLE_OBJECTS, gradio_examples
 from src.models.llama_cpp_runner import (
 class FakeLlamaModel:
     def __init__(self, responses: list[str]) -> None:
         self.responses = responses
+        self.calls = 0
     def create_chat_completion(self, **_: object) -> dict:
+        self.calls += 1
         response = self.responses.pop(0)
         return {"choices": [{"message": {"content": response}}]}
         self.assertIn("[configured external GGUF]", status["runtime"])
         self.assertNotIn("/Users/leo", status["runtime"])
+    def test_llama_cpp_hub_runtime_status_uses_public_repo_summary(self) -> None:
+        settings = get_runtime_settings(
+            {
+                "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+                "TEXT_MODEL_REPO_ID": "qqyule/objectverse-diary-qwen15b-lora",
+                "TEXT_MODEL_FILENAME": "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf",
+            }
+        )
+        status = runtime_status(settings)
+        self.assertEqual(settings.text_model_repo_id, "qqyule/objectverse-diary-qwen15b-lora")
+        self.assertEqual(settings.text_model_filename, "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf")
+        self.assertIn("Hub GGUF", status["runtime"])
+        self.assertIn("qqyule/objectverse-diary-qwen15b-lora", status["runtime"])
+        self.assertNotIn("/home", status["runtime"])
+        self.assertNotIn("/Users", status["runtime"])
+    def test_llama_cpp_loads_model_from_hub_config_when_path_is_missing(self) -> None:
+        previous_model = llama_cpp_runner._LLAMA_MODEL
+        previous_path = llama_cpp_runner._LLAMA_MODEL_PATH
+        llama_cpp_runner._LLAMA_MODEL = None
+        llama_cpp_runner._LLAMA_MODEL_PATH = None
+        loaded_paths: list[str] = []
+        class FakeLlama:
+            def __init__(self, *, model_path: str, **_: object) -> None:
+                loaded_paths.append(model_path)
+        fake_module = types.ModuleType("llama_cpp")
+        fake_module.Llama = FakeLlama
+        try:
+            with tempfile.TemporaryDirectory() as tmp_dir:
+                model_path = Path(tmp_dir) / "model.gguf"
+                model_path.write_bytes(b"GGUF")
+                settings = get_runtime_settings(
+                    {
+                        "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+                        "TEXT_MODEL_REPO_ID": "qqyule/objectverse-diary-qwen15b-lora",
+                        "TEXT_MODEL_FILENAME": "objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf",
+                    }
+                )
+                with (
+                    patch.dict(sys.modules, {"llama_cpp": fake_module}),
+                    patch("src.models.llama_cpp_runner._download_hf_gguf", return_value=str(model_path)),
+                ):
+                    llama_cpp_runner._load_llama_model("", settings=settings)
+            self.assertEqual(loaded_paths, [str(model_path)])
+        finally:
+            llama_cpp_runner._LLAMA_MODEL = previous_model
+            llama_cpp_runner._LLAMA_MODEL_PATH = previous_path
     def test_examples_cover_six_objects(self) -> None:
         self.assertEqual(len(EXAMPLE_OBJECTS), 6)
         self.assertEqual(len(gradio_examples()), 6)
         self.assertIn("text-fallback-to-mock", result.trace.fallbacks)
         self.assertEqual(result.trace.model_runtime["text"], "llama-cpp text generation")
+    def test_pipeline_uses_combined_llama_cpp_persona_and_diary(self) -> None:
+        env = {
+            "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
+            "TEXT_MODEL_PATH": "/tmp/objectverse-text-model.gguf",
+        }
+        fake_llama = FakeLlamaModel(
+            [
+                """
+                {
+                  "persona": {
+                    "object_name": "coffee mug",
+                    "character_name": "Mugworth",
+                    "mood": "dry and suspicious",
+                    "secret_fear": "being left empty forever",
+                    "core_memory": "It remembers every late-night refill.",
+                    "complaint": "I am treated like a ceramic fuel tank.",
+                    "tags": ["desk witness", "warm archive", "quiet judgment"]
+                  },
+                  "diary": {
+                    "title": "Secret Diary - Day 418",
+                    "english": "Today I held another bitter storm and called it service.",
+                    "chinese": "今天我又装下一场苦涩风暴，并被称为有用。"
+                  }
+                }
+                """,
+            ]
+        )
+        with (
+            patch.dict("os.environ", env, clear=False),
+            patch("src.models.llama_cpp_runner._load_llama_model", return_value=fake_llama),
+        ):
+            result = generate_object_diary(None, "old white coffee mug", "Cynical", save=False)
+        self.assertEqual(result.persona.persona.character_name, "Mugworth")
+        self.assertEqual(result.diary.title, "Secret Diary - Day 418")
+        self.assertEqual(fake_llama.calls, 1)
+        self.assertNotIn("text-fallback-to-mock", result.trace.fallbacks)
     def test_minicpm_vision_backend_accepts_valid_json(self) -> None:
         response = """
         {"object":{"name":"coffee mug","visible_features":["white ceramic","round handle","desk shadow"],"likely_context":"work desk","confidence":0.88}}
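
The new Hub-loading test avoids needing the real `llama-cpp-python` wheel. It injects a fake `llama_cpp` module through `patch.dict(sys.modules, ...)`. This only works if the runner imports `llama_cpp` inside the loader rather than at module top level. That is an inference here, because the import line falls outside the diff hunks. Below is a self-contained sketch of the same technique. `load_model` and `FakeLlama` are illustrative names.

```python
import sys
import types
import unittest
from unittest.mock import patch


def load_model(model_path: str):
    # Deferred import: resolved from sys.modules at call time, so a test can stub it.
    from llama_cpp import Llama

    return Llama(model_path=model_path)


class StubOptionalDependencyTest(unittest.TestCase):
    def test_fake_llama_cpp_module_is_used(self) -> None:
        loaded: list[str] = []

        class FakeLlama:
            def __init__(self, *, model_path: str, **_: object) -> None:
                loaded.append(model_path)

        fake_module = types.ModuleType("llama_cpp")
        fake_module.Llama = FakeLlama
        # patch.dict restores sys.modules on exit, so the fake never leaks into other tests.
        with patch.dict(sys.modules, {"llama_cpp": fake_module}):
            model = load_model("/tmp/model.gguf")
        self.assertIsInstance(model, FakeLlama)
        self.assertEqual(loaded, ["/tmp/model.gguf"])
```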