Space: build-small-hackathon/ObjectverseDiary

qqyule committed on Jun 8

Commit 9e874de · 1 Parent(s): 1e2c036

Update Objectverse Diary submission package

Files changed (16)

README.md +9 -6
data/train/objectverse_sft_curated.jsonl +0 -0
docs/DATASET.md +57 -0
docs/DEVELOPMENT_STATUS.md +6 -3
docs/MODEL_CARD.md +29 -6
docs/SUBMISSION_GUIDE.md +7 -5
requirements-training.txt +2 -0
scripts/README.md +57 -2
scripts/finetune_lora.py +426 -0
scripts/prepare_curated_dataset.py +275 -0
scripts/publish_hf_adapter.py +104 -0
src/ui/layout.py +206 -101
src/ui/styles.css +618 -733
tests/test_dataset_tooling.py +24 -0
tests/test_finetune_lora_tooling.py +86 -0
tests/test_publish_hf_adapter.py +40 -0

README.md CHANGED

@@ -23,13 +23,13 @@ Upload a photo of any everyday object. The app wakes it up, gives it a secret pe
 ## Current Status
-Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, optional llama.cpp text runtime wiring, public mock traces, and Space validation evidence are available.
 By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
 `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 reached the Space but fell back to mock vision for all three public test images; this is documented in `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
-`OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file or fine-tuned model is committed in this stable submission baseline.
 Hugging Face Space:
@@ -60,13 +60,13 @@ The interface is English-first and Chinese-second.
 - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
 - [ ] OpenBMB Special — MiniCPM-V wiring exists, but hosted validation currently falls back to mock vision.
 - [ ] Llama Champion — llama.cpp wiring exists, but real GGUF smoke test is not complete.
-- [ ] Well-Tuned — dataset preview exists, but LoRA training/model publishing is not complete.
 - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
 ## Planned Model Stack
 - Vision: MiniCPM-V 2.6 or deterministic mock fallback
-- Text: deterministic mock text now; optional GGUF later
 - Runtime: llama.cpp / llama-cpp-python
 - UI: Gradio Blocks
@@ -79,9 +79,10 @@ Stable baseline:
 - default vision backend: deterministic mock, 0 active model parameters
 - default text backend: deterministic mock, 0 active model parameters
 - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
-- optional text GGUF: not selected or committed yet
-The stable public demo therefore stays within the 32B budget. Future GGUF or LoRA work must update `docs/MODEL_CARD.md` before being claimed in submission materials.
 ## Run Locally
@@ -127,6 +128,8 @@ The stable submission baseline supports:
 - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
 - Runtime notes: `docs/RUNTIME.md`
 - Dataset preview notes: `docs/DATASET.md`
 - Public mock traces: `data/traces/samples/`
 - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - Hosted VLM failure evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`

 ## Current Status
+Stable mock-safe submission baseline, MiniCPM-V vision backend wiring, optional llama.cpp text runtime wiring, public mock traces, Space validation evidence, and a published Qwen 1.5B LoRA test adapter are available.
 By default, the app uses deterministic mock outputs for object understanding, persona generation, diary writing, chat replies, share card rendering, and trace saving. This keeps the public demo reproducible and avoids commercial AI APIs.
 `OBJECTVERSE_VISION_BACKEND=minicpm-v` enables the optional MiniCPM-V 2.6 vision path. The hosted ZeroGPU validation on June 8, 2026 reached the Space but fell back to mock vision for all three public test images; this is documented in `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
+`OBJECTVERSE_TEXT_BACKEND=llama-cpp` can use a local GGUF model through optional `llama-cpp-python` when `TEXT_MODEL_PATH` is configured. No GGUF file is committed in this stable submission baseline. A short Modal-trained LoRA adapter is published for Well-Tuned evidence, but it is not converted to GGUF or wired into the public Space runtime yet.
 Hugging Face Space:
 - [x] Field Notes — article draft in `docs/FIELD_NOTES.md`.
 - [ ] OpenBMB Special — MiniCPM-V wiring exists, but hosted validation currently falls back to mock vision.
 - [ ] Llama Champion — llama.cpp wiring exists, but real GGUF smoke test is not complete.
+- [x] Well-Tuned — synthetic curated SFT dataset and Qwen 1.5B LoRA test adapter are published.
 - [ ] Off the Grid — no commercial AI APIs are used; final badge eligibility depends on hackathon review.
 ## Planned Model Stack
 - Vision: MiniCPM-V 2.6 or deterministic mock fallback
+- Text: deterministic mock text now; published Qwen 1.5B LoRA test adapter for training evidence; optional GGUF later
 - Runtime: llama.cpp / llama-cpp-python
 - UI: Gradio Blocks
 - default vision backend: deterministic mock, 0 active model parameters
 - default text backend: deterministic mock, 0 active model parameters
 - optional wired vision model: MiniCPM-V 2.6, about 8B parameters when enabled
+- optional text base for published LoRA adapter: Qwen/Qwen2.5-1.5B-Instruct, about 1.5B parameters
+- optional text GGUF: not converted or committed yet
+The stable public demo therefore stays within the 32B budget. Optional MiniCPM-V plus Qwen 1.5B remains about 9.5B plus a small LoRA adapter, safely under the 32B budget.
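As a rough check of the budget claim above, the arithmetic can be sketched in a few lines of Python. The parameter counts are the approximate figures quoted in the README (about 8B and about 1.5B). The dictionary keys and helper names are illustrative and do not exist in the repository, and the LoRA adapter is left out because the README only describes it as small.

```python
# Parameter counts in billions, using the approximate figures from the README.
BUDGET_B = 32.0
OPTIONAL_COMPONENTS_B = {
    "MiniCPM-V 2.6 (vision)": 8.0,
    "Qwen/Qwen2.5-1.5B-Instruct (text base)": 1.5,
}


def total_params_b(components: dict[str, float]) -> float:
    """Sum the parameter counts (in billions) of all enabled components."""
    return sum(components.values())


def within_budget(components: dict[str, float], budget_b: float = BUDGET_B) -> bool:
    """Return True when the combined stack fits under the stated budget."""
    return total_params_b(components) <= budget_b


total_b = total_params_b(OPTIONAL_COMPONENTS_B)
print(f"optional stack: ~{total_b:.1f}B of {BUDGET_B:.0f}B")
print("within budget:", within_budget(OPTIONAL_COMPONENTS_B))
```

Keeping the counts in one mapping makes it easy to re-run the check if a GGUF text model is selected later.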
 ## Run Locally
 - Initial acceptance report: `docs/INITIAL_STAGE_REPORT.md`
 - Runtime notes: `docs/RUNTIME.md`
 - Dataset preview notes: `docs/DATASET.md`
+- Synthetic curated dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
+- Fine-tuned LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 - Public mock traces: `data/traces/samples/`
 - Trace JSONL export: `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - Hosted VLM failure evidence: `docs/SPACE_VLM_REPORT.md`, `docs/SPACE_VLM_REPORT.json`, `data/traces/space-vlm/`

data/train/objectverse_sft_curated.jsonl ADDED

The diff for this file is too large to render.

docs/DATASET.md CHANGED

@@ -20,6 +20,22 @@ This preview is mock-generated. It is not a final training dataset and should no
 The stable submission baseline does not publish a final Hugging Face Dataset. The current JSONL file is evidence for schema and workflow readiness only.
 ## Target Dataset
 Final target before fine-tuning:
@@ -70,6 +86,47 @@ Manual curation should happen after generation. Do not publish the full candidat
 Space VLM validation traces under `data/traces/space-vlm/` are failure evidence because they include `vision-fallback-to-mock`. Do not mix them into curated training data or describe them as successful real VLM outputs.
 ## Curation Checklist
 - Persona stays consistent with the object.

 The stable submission baseline does not publish a final Hugging Face Dataset. The current JSONL file is evidence for schema and workflow readiness only.
+Additional local training-test artifact:
+```bash
+.venv/bin/python -B scripts/prepare_curated_dataset.py \
+  --count 50 \
+  --output data/train/objectverse_sft_curated.jsonl
+```
+This file is synthetic curated data: hand-shaped, deterministic, privacy-safe, and useful for testing the LoRA pipeline. It is not based on private user photos or commercial AI output.
+Published synthetic curated dataset:
+```text
+https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
+```
 ## Target Dataset
 Final target before fine-tuning:
 Space VLM validation traces under `data/traces/space-vlm/` are failure evidence because they include `vision-fallback-to-mock`. Do not mix them into curated training data or describe them as successful real VLM outputs.
+## Modal LoRA Training Scaffold
+The repository includes a Modal training scaffold for the future Well-Tuned path. It is not run by default and does not affect the Gradio Space runtime.
+Install the local Modal CLI dependency separately:
+```bash
+pip install -r requirements-training.txt
+```
+Validate the local JSONL shape without Modal auth or GPU usage:
+```bash
+.venv/bin/python -B scripts/finetune_lora.py \
+  --dry-run \
+  --dataset data/train/objectverse_sft_curated.jsonl \
+  --run-name objectverse-diary-qwen15b-curated-test
+```
+Intended training command after explicit confirmation:
+```bash
+modal run scripts/finetune_lora.py \
+  --dataset data/train/objectverse_sft_curated.jsonl \
+  --run-name objectverse-diary-qwen15b-curated-test \
+  --max-steps 20
+```
+Current Modal status: the curated test job completed successfully and produced the published LoRA adapter at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
+Default training scaffold settings:
+- base model: `Qwen/Qwen2.5-1.5B-Instruct`
+- LoRA adapter target: persona and diary JSON output
+- GPU: Modal `A10G`
+- output: Modal Volume artifacts, not committed files
+The current `objectverse_sft_preview.jsonl` file is mock-generated and should only be used to validate the training pipeline. It is not final Well-Tuned evidence. Do not store Modal credit codes, tokens, Hugging Face tokens, or private datasets in the repo.
+The published `objectverse_sft_curated.jsonl` dataset is synthetic curated training-test data. It is suitable for hackathon training evidence, but it should still be described honestly as a small synthetic set rather than real user trace data.
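The dry-run command above validates the JSONL shape before any Modal job is launched. The sketch below mirrors the rules in `scripts/finetune_lora.py` from this commit: a non-empty `messages` list whose entries each carry a non-empty string `role` and `content`. The sample record, its role names, and the `validate_record` helper are made up for illustration and are not taken from the published dataset.

```python
import json


def validate_record(raw: dict, line_number: int) -> list[dict[str, str]]:
    """Check one parsed JSONL row and return its whitespace-trimmed messages."""
    location = f"line {line_number}"
    messages = raw.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError(f"{location} must include a non-empty messages list.")
    cleaned: list[dict[str, str]] = []
    for index, message in enumerate(messages, start=1):
        if not isinstance(message, dict):
            raise ValueError(f"{location} message {index} must be an object.")
        role = message.get("role")
        content = message.get("content")
        if not isinstance(role, str) or not role.strip():
            raise ValueError(f"{location} message {index} must include a role.")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"{location} message {index} must include content.")
        cleaned.append({"role": role.strip(), "content": content.strip()})
    return cleaned


# Hypothetical row; the real dataset rows may use different roles and wording.
sample_line = json.dumps(
    {
        "messages": [
            {"role": "system", "content": "You write diary entries as an everyday object."},
            {"role": "user", "content": "Object: a chipped coffee mug on a desk."},
            {"role": "assistant", "content": '{"persona": "tired optimist", "diary": "Refilled twice today."}'},
        ]
    }
)

messages = validate_record(json.loads(sample_line), line_number=1)
print(len(messages), "messages with roles", [m["role"] for m in messages])
```

A row that fails any of these checks raises `ValueError` with the offending line number, which is the behavior the dry run relies on to catch malformed data without GPU time.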
 ## Curation Checklist
 - Persona stays consistent with the object.

docs/DEVELOPMENT_STATUS.md CHANGED

@@ -32,6 +32,10 @@ Last updated: 2026-06-08
   - demo video script
   - social post draft
   - stable submission guide
 - Local tests and initial acceptance currently pass.
 ## Not Completed
@@ -40,9 +44,8 @@ Last updated: 2026-06-08
 - Passing real VLM demo trace capture. Failed Space VLM traces are kept as fallback evidence and do not replace mock sample traces.
 - Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
 - Final text model parameter count documentation.
-- Real model traces and curated object-persona dataset.
-- LoRA training, adapter/model export, GGUF conversion, and Hugging Face model publishing.
-- Hugging Face dataset publishing.
 - GitHub sync / final public repository confirmation.
 - Published Field Notes URL, recorded demo video URL, social post URL, and final public submission.

   - demo video script
   - social post draft
   - stable submission guide
+- Well-Tuned evidence:
+  - 50-row synthetic curated SFT dataset published at https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
+  - Modal Qwen 1.5B LoRA test run completed with 20 steps
+  - LoRA adapter published at https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
 - Local tests and initial acceptance currently pass.
 ## Not Completed
 - Passing real VLM demo trace capture. Failed Space VLM traces are kept as fallback evidence and do not replace mock sample traces.
 - Real GGUF model selection, download/configuration outside Git, and `TEXT_MODEL_PATH` smoke test.
 - Final text model parameter count documentation.
+- Real model traces from non-mock runtime.
+- GGUF conversion and runtime wiring for the published LoRA adapter.
 - GitHub sync / final public repository confirmation.
 - Published Field Notes URL, recorded demo video URL, social post URL, and final public submission.

docs/MODEL_CARD.md CHANGED

@@ -2,9 +2,9 @@
 ## Status
-Stable submission baseline. No text model has been fine-tuned, converted, or published yet.
-The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`.
 Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
@@ -19,7 +19,7 @@ Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validat
 | Component | Candidate | Notes |
 | --- | --- | --- |
 | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
-| Text | deterministic mock text; optional externally configured GGUF later | Final base model still pending. |
 | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
@@ -33,10 +33,12 @@ Record final numbers here before submission:
 | --- | --- | ---: | --- |
 | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
 | Text base | Stable baseline mock text | 0 | no model parameters |
-| Future text base | Externally configured GGUF, final model TBD | TBD | yes, when enabled |
-| Future LoRA adapter | TBD | TBD | yes, when enabled |
 | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
 ## Intended Inputs And Outputs
 Inputs:
@@ -58,7 +60,28 @@ Outputs:
 Dataset planning lives in `docs/DATASET.md`.
-Current preview data is deterministic and mock-generated. It should only be used for schema validation and workflow planning until real candidate samples are generated and curated.
 ## Safety And Privacy

 ## Status
+Stable submission baseline plus one published text LoRA test adapter. The public Gradio Space still defaults to deterministic mock text; the adapter is training evidence and has not been converted to GGUF or wired into the live runtime.
+The app defaults to deterministic mock backends. MiniCPM-V 2.6 vision is wired as an optional runtime backend for GPU environments. Text generation has optional llama.cpp wiring for an externally configured GGUF model via `TEXT_MODEL_PATH`. A Modal LoRA test run completed for the planned text model path and the adapter is published at `https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora`.
 Hosted MiniCPM-V validation is not passing yet. The June 8, 2026 ZeroGPU validation reached the Space, but all three public object checks fell back to mock vision. See `docs/SPACE_VLM_REPORT.md` and `docs/FAILURES.md`.
 | Component | Candidate | Notes |
 | --- | --- | --- |
 | Vision | `openbmb/MiniCPM-V-2_6` or mock fallback | Wired as optional backend; hosted validation currently falls back to mock. |
+| Text | deterministic mock text; published `Qwen/Qwen2.5-1.5B-Instruct` LoRA test adapter | Adapter published; not converted to GGUF or wired into Space runtime. |
 | Runtime | optional GGUF through llama.cpp / llama-cpp-python | Wired with mock fallback; real-model smoke test still pending. |
 | UI | Gradio Blocks | Required by the hackathon and project rules. |
 | --- | --- | ---: | --- |
 | Vision | MiniCPM-V 2.6 optional path | ~8B | yes, when enabled |
 | Text base | Stable baseline mock text | 0 | no model parameters |
+| Optional text base | `Qwen/Qwen2.5-1.5B-Instruct` | ~1.5B | yes, when enabled |
+| Published LoRA adapter | `qqyule/objectverse-diary-qwen15b-lora` | small adapter over base model | yes, when enabled |
 | Stable baseline total | Mock text + optional wired vision not active by default | 0 active model parameters by default | <= 32B |
+If the optional MiniCPM-V 2.6 vision path and planned Qwen 1.5B text base are both enabled, the expected total remains about 9.5B plus a small LoRA adapter, safely under the 32B project budget.
 ## Intended Inputs And Outputs
 Inputs:
 Dataset planning lives in `docs/DATASET.md`.
+Current preview data is deterministic and mock-generated. It should only be used for schema validation, dry-run validation, and workflow planning until real candidate samples are generated and curated.
+The Modal training scaffold defaults to `Qwen/Qwen2.5-1.5B-Instruct` and saves adapter artifacts to a Modal Volume. `data/train/objectverse_sft_curated.jsonl` contains 50 synthetic curated rows for pipeline testing and is published at `https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated`.
+Published adapter:
+```text
+https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
+```
+Training run summary:
+- Platform: Modal
+- Run name: `objectverse-diary-qwen15b-curated-test`
+- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
+- Dataset: 50 synthetic curated rows
+- Steps: 20
+- Max sequence length: 1024
+- Learning rate: 0.0002
+- LoRA rank / alpha / dropout: 16 / 32 / 0.05
+- Train loss: 1.6697
+- GGUF conversion: not completed
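The run summary can be turned into a few derived numbers. The sketch below is a back-of-the-envelope calculation, not output from the training job. The batch size of 1 and gradient accumulation of 4 come from `scripts/finetune_lora.py` in this commit, a single GPU is assumed, and `alpha / r` is the standard LoRA scaling convention. The layer dimensions for `Qwen/Qwen2.5-1.5B-Instruct` are recalled from its public config and should be checked against `config.json` before being quoted anywhere.

```python
# Values stated in the training run summary and finetune_lora.py.
STEPS = 20
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
DATASET_ROWS = 50
LORA_R = 16
LORA_ALPHA = 32
NUM_GPUS = 1  # assumption: the Modal job ran on a single A10G

# How much data the short run actually saw.
sequences_seen = STEPS * PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
epochs = sequences_seen / DATASET_ROWS
lora_scaling = LORA_ALPHA / LORA_R

# Assumed Qwen2.5-1.5B dimensions (verify against the model's config.json).
HIDDEN = 1536
KV_DIM = 256  # 2 key/value heads x 128 head dim
INTERMEDIATE = 8960
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection.
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA pair adds an (in x r) and an (r x out) matrix per wrapped layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_R) for i, o in PROJECTION_SHAPES.values())
adapter_params = per_layer * NUM_LAYERS

print(f"sequences seen: {sequences_seen} (~{epochs:.1f} passes over {DATASET_ROWS} rows)")
print(f"LoRA scaling alpha/r: {lora_scaling}")
print(f"estimated trainable adapter parameters: {adapter_params:,}")
```

Under these assumptions the adapter is on the order of one percent of the base model, which is consistent with the model card describing it as a small adapter over the base model.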
 ## Safety And Privacy

docs/SUBMISSION_GUIDE.md CHANGED

@@ -6,8 +6,8 @@
 - [x] GitHub Repository URL: local `origin` configured as `https://github.com/qqyule/Objectverse-Diary.git`; push still requires explicit confirmation
 - [x] Demo Video Script: `docs/DEMO_VIDEO_SCRIPT.md`
 - [x] Social Media Post Draft: `docs/SOCIAL_POST.md`
-- [ ] Fine-tuned Model URL: not included in stable baseline; LoRA/model publishing remains future work
-- [ ] Dataset URL: not included in stable baseline; local mock preview exists
 - [x] Trace Dataset: local public mock JSONL export at `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - [x] Field Notes Draft: `docs/FIELD_NOTES.md`
 - [x] Short project description: available in README
@@ -31,13 +31,15 @@
 - MiniCPM-V 2.6 backend wiring with fallback markers.
 - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
 - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
 - Field Notes draft, demo video script, and social post draft for the stable submission package.
 ## Not Completed Yet
 - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
 - Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
-- Real model traces, curated dataset, LoRA training, model/dataset publishing.
 - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
 ## Final Checks
@@ -48,8 +50,8 @@
 - [x] README includes stable-baseline parameter budget and links to the model card.
 - [ ] No commercial cloud AI APIs are used.
 - [x] Mock-safe local demo baseline is reproducible from committed sample traces.
-- [ ] Fine-tuned model is linked.
-- [ ] Dataset is linked.
 - [ ] Traces are linked.
 - [ ] Field Notes are linked.
 - [ ] UI remains English-first and Chinese-second.

 - [x] GitHub Repository URL: local `origin` configured as `https://github.com/qqyule/Objectverse-Diary.git`; push still requires explicit confirmation
 - [x] Demo Video Script: `docs/DEMO_VIDEO_SCRIPT.md`
 - [x] Social Media Post Draft: `docs/SOCIAL_POST.md`
+- [x] Fine-tuned Model URL: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
+- [x] Dataset URL: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
 - [x] Trace Dataset: local public mock JSONL export at `data/traces/samples/objectverse_public_mock_traces.jsonl`
 - [x] Field Notes Draft: `docs/FIELD_NOTES.md`
 - [x] Short project description: available in README
 - MiniCPM-V 2.6 backend wiring with fallback markers.
 - Optional llama.cpp text runtime wiring through `TEXT_MODEL_PATH`.
 - Hosted Space VLM validation script, report, JSON summary, and trace evidence export.
+- Synthetic curated SFT dataset published to Hugging Face Datasets.
+- Modal Qwen 1.5B LoRA test run completed and adapter published to Hugging Face Models.
 - Field Notes draft, demo video script, and social post draft for the stable submission package.
 ## Not Completed Yet
 - Hosted Space MiniCPM-V validation for mug, keyboard, and shoe; ZeroGPU validation reached the app but currently falls back to mock vision.
 - Real GGUF `TEXT_MODEL_PATH` smoke test and final text model parameter count.
+- Real model traces, GGUF conversion, and app runtime wiring for the published adapter.
 - Field Notes publication URL, recorded demo video URL, social post URL, and final public push/submission.
 ## Final Checks
 - [x] README includes stable-baseline parameter budget and links to the model card.
 - [ ] No commercial cloud AI APIs are used.
 - [x] Mock-safe local demo baseline is reproducible from committed sample traces.
+- [x] Fine-tuned model is linked.
+- [x] Dataset is linked.
 - [ ] Traces are linked.
 - [ ] Field Notes are linked.
 - [ ] UI remains English-first and Chinese-second.

requirements-training.txt ADDED

@@ -0,0 +1,2 @@
+modal>=1,<2
+huggingface_hub>=0.34,<1

scripts/README.md CHANGED

@@ -7,15 +7,70 @@ Implemented initial scripts:
 - `check_initial_stage.py`: verifies required files, runtime defaults, sample traces, pipeline, and Gradio build.
 - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
 - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
 Expected files during implementation:
-- `finetune_lora.py`
 - `convert_to_gguf.sh`
 - `run_llama_cpp.sh`
 Space VLM validation:
 ```bash
@@ -30,4 +85,4 @@ External Space changes are explicit:
 .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
 ```
-Current status: mock trace generation, trace JSONL export, SFT preview generation, optional MiniCPM-V wiring, optional llama.cpp wiring, and hosted Space VLM validation tooling are implemented. Real model validation on Space, fine-tuning, and GGUF conversion are not completed yet.

 - `check_initial_stage.py`: verifies required files, runtime defaults, sample traces, pipeline, and Gradio build.
 - `generate_sample_traces.py`: creates six stable public mock traces under `data/traces/samples/`.
 - `generate_dataset.py`: creates deterministic SFT preview JSONL for schema and curation planning.
+- `prepare_curated_dataset.py`: creates 50 synthetic curated SFT rows for Modal LoRA pipeline testing.
 - `export_traces.py`: exports validated public sample traces to JSONL for dataset-style publishing.
 - `check_space_vlm.py`: validates MiniCPM-V object understanding on the hosted Hugging Face Space with three temporary public test images.
+- `finetune_lora.py`: validates SFT JSONL locally and defines the Modal LoRA training scaffold for the future Well-Tuned path.
+- `publish_hf_adapter.py`: uploads a downloaded LoRA adapter folder to Hugging Face Hub.
 Expected files during implementation:
 - `convert_to_gguf.sh`
 - `run_llama_cpp.sh`
+Modal LoRA dry-run:
+```bash
+.venv/bin/python -B scripts/finetune_lora.py \
+  --dry-run \
+  --dataset data/train/objectverse_sft_curated.jsonl \
+  --run-name objectverse-diary-qwen15b-curated-test
+```
+Modal LoRA training after explicit confirmation:
+```bash
+modal run scripts/finetune_lora.py \
+  --dataset data/train/objectverse_sft_curated.jsonl \
+  --run-name objectverse-diary-qwen15b-curated-test \
+  --max-steps 20
+```
+Training dependencies are intentionally separate from the Space runtime:
+```bash
+pip install -r requirements-training.txt
+```
+Do not commit Modal credit codes, tokens, Hugging Face tokens, generated adapters, GGUF files, or private datasets.
+If `modal run` reports `Token missing`, authenticate outside the repository first:
+```bash
+modal token new
+```
+or configure `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` through your shell/secret manager.
+After a successful Modal run, download the adapter from the output volume into ignored local exports. Modal's directory download behavior can vary; downloading individual adapter files into a directory is the safest path.
+```bash
+mkdir -p exports/objectverse-diary-qwen15b-curated-test-adapter-dir
+for file in vocab.json tokenizer_config.json tokenizer.json special_tokens_map.json merges.txt chat_template.jinja added_tokens.json adapter_model.safetensors adapter_config.json README.md; do
+  modal volume get objectverse-diary-lora-output \
+    "objectverse-diary-qwen15b-curated-test/adapter/$file" \
+    "exports/objectverse-diary-qwen15b-curated-test-adapter-dir/$file"
+done
+```
+Then upload the adapter to Hugging Face Hub:
+```bash
+.venv/bin/python -B scripts/publish_hf_adapter.py \
+  --adapter-dir exports/objectverse-diary-qwen15b-curated-test-adapter-dir \
+  --repo-id qqyule/objectverse-diary-qwen15b-lora
+```
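Because the adapter is downloaded file by file, it may be worth confirming that nothing was skipped before uploading. The following is a minimal sketch of such a check, assuming the file list from the download loop above. The `missing_adapter_files` helper is hypothetical and is not part of `publish_hf_adapter.py`.

```python
import tempfile
from pathlib import Path

# File names copied from the download loop in this README.
EXPECTED_ADAPTER_FILES = (
    "vocab.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "special_tokens_map.json",
    "merges.txt",
    "chat_template.jinja",
    "added_tokens.json",
    "adapter_model.safetensors",
    "adapter_config.json",
    "README.md",
)


def missing_adapter_files(
    adapter_dir: Path,
    expected: tuple[str, ...] = EXPECTED_ADAPTER_FILES,
) -> list[str]:
    """Return the expected file names that are absent from adapter_dir."""
    return [name for name in expected if not (adapter_dir / name).is_file()]


# Demo against a throwaway directory that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "adapter_config.json").write_text("{}", encoding="utf-8")
    demo_missing = missing_adapter_files(demo_dir)
    print(f"{len(demo_missing)} of {len(EXPECTED_ADAPTER_FILES)} files missing")
```

In practice the helper would be pointed at `exports/objectverse-diary-qwen15b-curated-test-adapter-dir`, and the upload would only go ahead when the returned list is empty.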
 Space VLM validation:
 ```bash
 .venv/bin/python -B scripts/check_space_vlm.py --configure-space --rollback-to-mock
 ```
+Current status: mock trace generation, trace JSONL export, SFT preview generation, synthetic curated dataset publishing, optional MiniCPM-V wiring, optional llama.cpp wiring, hosted Space VLM validation tooling, Modal LoRA training scaffolding, one Modal LoRA test run, and HF adapter publishing are implemented. Real model validation on Space, GGUF conversion, and app runtime wiring for the adapter are not completed yet.

scripts/finetune_lora.py ADDED

@@ -0,0 +1,426 @@
+"""Modal LoRA fine-tuning scaffold for Objectverse Diary text generation."""
+from __future__ import annotations
+import argparse
+import json
+import sys
+from collections.abc import Callable, Mapping, Sequence
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+try:
+    import modal
+except ImportError:  # Modal is optional for local dry-run and tests.
+    modal = None  # type: ignore[assignment]
+APP_NAME = "objectverse-diary-lora"
+DEFAULT_DATASET_PATH = Path("data/train/objectverse_sft_preview.jsonl")
+DEFAULT_RUN_NAME = "objectverse-diary-qwen15b-preview"
+DEFAULT_BASE_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"
+HOURS = 60 * 60
+CACHE_DIR = "/cache"
+OUTPUT_DIR = "/outputs"
+LORA_TARGET_MODULES = (
+    "q_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "up_proj",
+    "down_proj",
+)
+@dataclass(frozen=True)
+class TrainingConfig:
+    """Serializable training settings shared by dry-run and Modal execution."""
+    run_name: str = DEFAULT_RUN_NAME
+    base_model: str = DEFAULT_BASE_MODEL
+    max_steps: int = 80
+    learning_rate: float = 2e-4
+    max_seq_length: int = 1024
+    lora_r: int = 16
+    lora_alpha: int = 32
+    lora_dropout: float = 0.05
+    target_modules: tuple[str, ...] = field(default_factory=lambda: LORA_TARGET_MODULES)
+    def as_remote_dict(self) -> dict[str, object]:
+        payload = asdict(self)
+        payload["target_modules"] = list(self.target_modules)
+        return payload
+def load_sft_records(path: Path) -> list[dict[str, object]]:
+    """Load and validate chat-style SFT JSONL records."""
+    if not path.exists():
+        raise FileNotFoundError(f"Dataset path does not exist: {path}")
+    records: list[dict[str, object]] = []
+    for line_number, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        if not line.strip():
+            continue
+        try:
+            raw = json.loads(line)
+        except json.JSONDecodeError as exc:
+            raise ValueError(f"Invalid JSON on line {line_number}: {exc.msg}") from exc
+        if not isinstance(raw, dict):
+            raise ValueError(f"Line {line_number} must be a JSON object.")
+        records.append(_validate_sft_record(raw, line_number))
+    if not records:
+        raise ValueError(f"Dataset has no records: {path}")
+    return records
+def record_to_training_text(record: Mapping[str, object]) -> str:
+    """Convert one validated chat record into a simple fallback training string."""
+    messages = _validate_messages(record.get("messages"), line_number=None)
+    blocks = []
+    for message in messages:
+        role = str(message["role"]).strip().lower()
+        content = str(message["content"]).strip()
+        blocks.append(f"{role}:\n{content}")
+    return "\n\n".join(blocks).strip()
+def run_training_entrypoint(
+    *,
+    dataset: Path,
+    config: TrainingConfig,
+    dry_run: bool,
+    allow_remote: bool,
+    remote_runner: Callable[[list[dict[str, object]], TrainingConfig], dict[str, object]] | None = None,
+) -> dict[str, object]:
+    """Validate inputs and either return a dry-run summary or launch Modal training."""
+    records = load_sft_records(dataset)
+    if dry_run:
+        return _dry_run_summary(dataset, records, config)
+    if not allow_remote:
+        raise RuntimeError("Use `modal run scripts/finetune_lora.py ...` for real training.")
+    runner = remote_runner or _run_modal_training
+    return runner(records, config)
+def _validate_sft_record(raw: dict[str, object], line_number: int) -> dict[str, object]:
+    _validate_messages(raw.get("messages"), line_number=line_number)
+    return raw
+def _validate_messages(raw_messages: object, line_number: int | None) -> list[dict[str, str]]:
+    location = f"line {line_number}" if line_number is not None else "record"
+    if not isinstance(raw_messages, list) or not raw_messages:
+        raise ValueError(f"{location} must include a non-empty messages list.")
+    messages: list[dict[str, str]] = []
+    for index, raw_message in enumerate(raw_messages, start=1):
+        if not isinstance(raw_message, dict):
+            raise ValueError(f"{location} message {index} must be an object.")
+        role = raw_message.get("role")
+        content = raw_message.get("content")
+        if not isinstance(role, str) or not role.strip():
+            raise ValueError(f"{location} message {index} must include a role.")
+        if not isinstance(content, str) or not content.strip():
+            raise ValueError(f"{location} message {index} must include content.")
+        messages.append({"role": role.strip(), "content": content.strip()})
+    return messages
+def _dry_run_summary(
+    dataset: Path,
+    records: Sequence[Mapping[str, object]],
+    config: TrainingConfig,
+) -> dict[str, object]:
+    first_text = record_to_training_text(records[0])
+    return {
+        "mode": "dry-run",
+        "dataset": str(dataset),
+        "record_count": len(records),
+        "base_model": config.base_model,
+        "run_name": config.run_name,
+        "max_steps": config.max_steps,
+        "learning_rate": config.learning_rate,
+        "max_seq_length": config.max_seq_length,
+        "lora": {
+            "r": config.lora_r,
+            "alpha": config.lora_alpha,
+            "dropout": config.lora_dropout,
+            "target_modules": list(config.target_modules),
+        },
+        "first_training_text_chars": len(first_text),
+        "will_launch_modal": False,
+    }
+def _run_modal_training(
+    records: list[dict[str, object]],
+    config: TrainingConfig,
+) -> dict[str, object]:
+    if modal is None:
+        raise RuntimeError("Modal is not installed. Install `requirements-training.txt` first.")
+    return train_lora_remote.remote(records, config.as_remote_dict())
+def _train_lora_impl(
+    records: list[dict[str, object]],
+    config_payload: Mapping[str, object],
+) -> dict[str, object]:
+    from datasets import Dataset
+    import torch
+    from peft import LoraConfig, TaskType, get_peft_model
+    from transformers import (
+        AutoModelForCausalLM,
+        AutoTokenizer,
+        DataCollatorForLanguageModeling,
+        Trainer,
+        TrainingArguments,
+    )
+    config = _training_config_from_payload(config_payload)
+    output_path = Path(OUTPUT_DIR) / config.run_name
+    adapter_path = output_path / "adapter"
+    output_path.mkdir(parents=True, exist_ok=True)
+    tokenizer = AutoTokenizer.from_pretrained(config.base_model, trust_remote_code=True)
+    # Some causal LM tokenizers ship without a pad token; reuse EOS so batches can be padded.
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    model_kwargs: dict[str, object] = {"trust_remote_code": True}
+    if torch.cuda.is_available():
+        model_kwargs["torch_dtype"] = torch.float16
+    model = AutoModelForCausalLM.from_pretrained(config.base_model, **model_kwargs)
+    # The generation KV cache is not needed while training.
+    if hasattr(model, "config"):
+        model.config.use_cache = False
+    peft_config = LoraConfig(
+        r=config.lora_r,
+        lora_alpha=config.lora_alpha,
+        lora_dropout=config.lora_dropout,
+        target_modules=list(config.target_modules),
+        bias="none",
+        task_type=TaskType.CAUSAL_LM,
+    )
+    model = get_peft_model(model, peft_config)
+    model.print_trainable_parameters()
+    dataset = Dataset.from_list(
+        [{"text": _format_training_text(record, tokenizer)} for record in records]
+    )
+    def tokenize_batch(batch: Mapping[str, list[str]]) -> dict[str, object]:
+        return tokenizer(
+            batch["text"],
+            truncation=True,
+            max_length=config.max_seq_length,
+            padding=False,
+        )
+    tokenized = dataset.map(
+        tokenize_batch,
+        batched=True,
+        remove_columns=["text"],
+        desc="Tokenize Objectverse Diary SFT records",
+    )
+    training_args = TrainingArguments(
+        output_dir=str(output_path / "trainer"),
+        max_steps=config.max_steps,
+        per_device_train_batch_size=1,
+        gradient_accumulation_steps=4,
+        learning_rate=config.learning_rate,
+        logging_steps=5,
+        save_strategy="no",
+        fp16=torch.cuda.is_available(),
+        report_to=[],
+        optim="adamw_torch",
+    )
+    trainer = Trainer(
+        model=model,
+        args=training_args,
+        train_dataset=tokenized,
+        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
+    )
+    train_result = trainer.train()
+    model.save_pretrained(adapter_path)
+    tokenizer.save_pretrained(adapter_path)
+    metrics = dict(train_result.metrics)
+    metrics["train_records"] = len(records)
+    metrics["base_model"] = config.base_model
+    (output_path / "metrics.json").write_text(
+        json.dumps(metrics, indent=2, sort_keys=True),
+        encoding="utf-8",
+    )
+    (output_path / "training_config.json").write_text(
+        json.dumps(config.as_remote_dict(), indent=2, sort_keys=True),
+        encoding="utf-8",
+    )
+    # Persist the adapter, metrics, and config written above to the Modal output volume.
+    if _OUTPUT_VOLUME is not None:
+        _OUTPUT_VOLUME.commit()
+    return {
+        "mode": "modal-training",
+        "run_name": config.run_name,
+        "record_count": len(records),
+        "adapter_path": str(adapter_path),
+        "metrics_path": str(output_path / "metrics.json"),
+    }
+def _training_config_from_payload(payload: Mapping[str, object]) -> TrainingConfig:
+    target_modules = payload.get("target_modules", LORA_TARGET_MODULES)
+    if not isinstance(target_modules, Sequence) or isinstance(target_modules, (str, bytes)):
+        raise ValueError("target_modules must be a sequence of strings.")
+    return TrainingConfig(
+        run_name=str(payload.get("run_name", DEFAULT_RUN_NAME)),
+        base_model=str(payload.get("base_model", DEFAULT_BASE_MODEL)),
+        max_steps=int(payload.get("max_steps", 80)),
+        learning_rate=float(payload.get("learning_rate", 2e-4)),
+        max_seq_length=int(payload.get("max_seq_length", 1024)),
+        lora_r=int(payload.get("lora_r", 16)),
+        lora_alpha=int(payload.get("lora_alpha", 32)),
+        lora_dropout=float(payload.get("lora_dropout", 0.05)),
+        target_modules=tuple(str(module) for module in target_modules),
+    )
+def _format_training_text(record: Mapping[str, object], tokenizer: Any) -> str:
+    messages = _validate_messages(record.get("messages"), line_number=None)
+    if hasattr(tokenizer, "apply_chat_template"):
+        try:
+            return tokenizer.apply_chat_template(
+                messages,
+                tokenize=False,
+                add_generation_prompt=False,
+            )
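+        except Exception:
+            # Fall back to the plain-text formatter below when the chat template is missing or fails to render.
+            pass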
+    return record_to_training_text(record)
+def _print_json(payload: Mapping[str, object]) -> None:
+    print(json.dumps(payload, indent=2, sort_keys=True), flush=True)
+def _build_config_from_args(args: argparse.Namespace) -> TrainingConfig:
+    return TrainingConfig(
+        run_name=args.run_name,
+        base_model=args.base_model,
+        max_steps=args.max_steps,
+        learning_rate=args.learning_rate,
+        max_seq_length=args.max_seq_length,
+    )
+def _parse_args(argv: Sequence[str] | None = None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--dataset", type=Path, default=DEFAULT_DATASET_PATH)
+    parser.add_argument("--run-name", default=DEFAULT_RUN_NAME)
+    parser.add_argument("--base-model", default=DEFAULT_BASE_MODEL)
+    parser.add_argument("--max-steps", type=int, default=80)
+    parser.add_argument("--learning-rate", type=float, default=2e-4)
+    parser.add_argument("--max-seq-length", type=int, default=1024)
+    parser.add_argument("--dry-run", action="store_true")
+    return parser.parse_args(argv)
+def _main(argv: Sequence[str] | None = None, *, allow_remote: bool = False) -> dict[str, object]:
+    args = _parse_args(argv)
+    result = run_training_entrypoint(
+        dataset=args.dataset,
+        config=_build_config_from_args(args),
+        dry_run=args.dry_run,
+        allow_remote=allow_remote,
+    )
+    _print_json(result)
+    return result
+if modal is not None:
+    _IMAGE = (
+        modal.Image.debian_slim(python_version="3.10")
+        .uv_pip_install(
+            "torch",
+            "transformers>=4.40,<5",
+            "datasets",
+            "accelerate",
+            "peft",
+            "sentencepiece",
+        )
+        .env({"HF_HOME": CACHE_DIR})
+    )
+    _CACHE_VOLUME = modal.Volume.from_name("objectverse-diary-hf-cache", create_if_missing=True)
+    _OUTPUT_VOLUME = modal.Volume.from_name(
+        "objectverse-diary-lora-output",
+        create_if_missing=True,
+    )
+    app = modal.App(APP_NAME)
+    @app.function(
+        image=_IMAGE,
+        gpu="A10G",
+        timeout=2 * HOURS,
+        volumes={CACHE_DIR: _CACHE_VOLUME, OUTPUT_DIR: _OUTPUT_VOLUME},
+    )
+    def train_lora_remote(
+        records: list[dict[str, object]],
+        config_payload: dict[str, object],
+    ) -> dict[str, object]:
+        return _train_lora_impl(records, config_payload)
+    @app.local_entrypoint()
+    def modal_entrypoint(
+        dataset: str = str(DEFAULT_DATASET_PATH),
+        run_name: str = DEFAULT_RUN_NAME,
+        base_model: str = DEFAULT_BASE_MODEL,
+        max_steps: int = 80,
+        learning_rate: float = 2e-4,
+        max_seq_length: int = 1024,
+        dry_run: bool = False,
+    ) -> None:
+        result = run_training_entrypoint(
+            dataset=Path(dataset),
+            config=TrainingConfig(
+                run_name=run_name,
+                base_model=base_model,
+                max_steps=max_steps,
+                learning_rate=learning_rate,
+                max_seq_length=max_seq_length,
+            ),
+            dry_run=dry_run,
+            allow_remote=True,
+        )
+        _print_json(result)
+else:
+    _OUTPUT_VOLUME = None
+    app = None
+    def train_lora_remote(
+        records: list[dict[str, object]],
+        config_payload: dict[str, object],
+    ) -> dict[str, object]:
+        raise RuntimeError("Modal is not installed. Install `requirements-training.txt` first.")
+if __name__ == "__main__":
+    try:
+        _main(allow_remote=False)
+    except Exception as exc:
+        raise SystemExit(str(exc)) from exc
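
Reviewer note on `_format_training_text`: the function prefers the tokenizer's chat template and silently falls back to plain text when the template is absent or raises. The sketch below reproduces that control flow with stub tokenizers so the fallback can be checked without downloading a model. `fallback_text`, `TemplatedTokenizer`, and `BrokenTokenizer` are hypothetical names introduced for illustration; in particular, `fallback_text` only stands in for `record_to_training_text`, whose real output format is not shown in this part of the diff.

```python
from typing import Any


def fallback_text(messages: list[dict[str, str]]) -> str:
    # Hypothetical stand-in for record_to_training_text (real format not shown here).
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def format_training_text(messages: list[dict[str, str]], tokenizer: Any) -> str:
    # Prefer the tokenizer's chat template; fall back if it is absent or fails.
    if hasattr(tokenizer, "apply_chat_template"):
        try:
            return tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=False
            )
        except Exception:
            pass
    return fallback_text(messages)


class TemplatedTokenizer:
    # Illustrative stub, not a real transformers tokenizer.
    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=False):
        return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)


class BrokenTokenizer:
    # Illustrative stub whose template call always fails.
    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=False):
        raise ValueError("no chat template configured")


messages = [
    {"role": "user", "content": "Personality mode: Lonely"},
    {"role": "assistant", "content": "{}"},
]
templated = format_training_text(messages, TemplatedTokenizer())
fallback = format_training_text(messages, BrokenTokenizer())
plain = format_training_text(messages, object())  # no template method at all
```

Because the fallback is silent, a run can train on plain-text formatting without anyone noticing; logging which branch was taken might be worth considering.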

scripts/prepare_curated_dataset.py ADDED

	@@ -0,0 +1,275 @@

+"""Prepare synthetic curated SFT data for Objectverse Diary LoRA tests."""
+from __future__ import annotations
+import argparse
+import json
+import sys
+from collections.abc import Mapping, Sequence
+from pathlib import Path
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+if str(PROJECT_ROOT) not in sys.path:
+    sys.path.insert(0, str(PROJECT_ROOT))
+from src.models.schema import DiaryEntry, ObjectInfo, ObjectUnderstanding, Persona, PersonaEnvelope
+DEFAULT_OUTPUT_PATH = Path("data/train/objectverse_sft_curated.jsonl")
+DEFAULT_COUNT = 50
+SOURCE = "objectverse-diary-synthetic-curated-v1"
+SYSTEM_PROMPT = (
+    "You are Objectverse Diary, an English-first small-model assistant. "
+    "Given structured object understanding and a requested personality mode, "
+    "return strict JSON with keys persona and diary. Keep the voice strange, "
+    "specific to the object, and suitable for a shareable object archive."
+)
+MODES = ("Cynical", "Dramatic", "Lonely", "Philosopher", "Romantic")
+OBJECTS = [
+    {
+        "name": "coffee mug",
+        "features": ["white ceramic", "coffee ring", "tiny handle shadow"],
+        "context": "developer desk",
+        "memory": "listened to morning promises dissolve into cold coffee",
+    },
+    {
+        "name": "mechanical keyboard",
+        "features": ["black keycaps", "dust in the rows", "one glossy spacebar"],
+        "context": "office corner",
+        "memory": "translated panic into clicking long after midnight",
+    },
+    {
+        "name": "running shoe",
+        "features": ["creased mesh", "mud on the sole", "loose lace"],
+        "context": "bedroom doorway",
+        "memory": "carried brave intentions to the end of the block and back",
+    },
+    {
+        "name": "desk lamp",
+        "features": ["brushed metal neck", "warm bulb", "tilted shade"],
+        "context": "late-night desk",
+        "memory": "held a circle of light over notes nobody finished",
+    },
+    {
+        "name": "water bottle",
+        "features": ["clear plastic wall", "scratched cap", "half-full body"],
+        "context": "kitchen counter",
+        "memory": "survived every resolution to drink more water",
+    },
+    {
+        "name": "notebook",
+        "features": ["bent corner", "blue ink ghosts", "elastic strap"],
+        "context": "bag pocket",
+        "memory": "guarded three plans, two lists, and one sentence crossed out hard",
+    },
+    {
+        "name": "umbrella",
+        "features": ["folded black canopy", "wet seam", "curved handle"],
+        "context": "entryway hook",
+        "memory": "became useful only when the sky was already theatrical",
+    },
+    {
+        "name": "house key",
+        "features": ["brass teeth", "scratched bow", "small metal ring"],
+        "context": "coat pocket",
+        "memory": "opened the same door for every version of its human",
+    },
+    {
+        "name": "charging cable",
+        "features": ["frayed sleeve", "white plastic tip", "gentle knot"],
+        "context": "bedside floor",
+        "memory": "fed glowing rectangles while pretending not to resent them",
+    },
+    {
+        "name": "teaspoon",
+        "features": ["silver bowl", "thin handle", "tea stain near the neck"],
+        "context": "sink edge",
+        "memory": "stirred sweetness into cups and suspicion into silence",
+    },
+]
+MODE_PROFILES = {
+    "Cynical": {
+        "mood": "tired but sharply observant",
+        "fear": "being replaced by something newer and less honest",
+        "tag": ["dry witness", "domestic sarcasm", "small rebellion"],
+        "voice": "withholding applause",
+    },
+    "Dramatic": {
+        "mood": "grandly wounded",
+        "fear": "being forgotten before the curtain falls",
+        "tag": ["tragic prop", "household opera", "minor thunder"],
+        "voice": "making every scratch sound like fate",
+    },
+    "Lonely": {
+        "mood": "quietly abandoned",
+        "fear": "becoming background forever",
+        "tag": ["soft echo", "forgotten corner", "patient dust"],
+        "voice": "speaking as if the room almost listened",
+    },
+    "Philosopher": {
+        "mood": "curious and needlessly profound",
+        "fear": "discovering usefulness is not the same as meaning",
+        "tag": ["tiny ontology", "useful doubt", "object soul"],
+        "voice": "turning chores into metaphysics",
+    },
+    "Romantic": {
+        "mood": "hopelessly sentimental",
+        "fear": "loving a human who mistakes devotion for convenience",
+        "tag": ["tender witness", "secret devotion", "warm ache"],
+        "voice": "saving every ordinary touch as evidence",
+    },
+}
+def build_curated_records(count: int = DEFAULT_COUNT) -> list[dict[str, object]]:
+    if count < 1:
+        raise ValueError("count must be at least 1")
+    records: list[dict[str, object]] = []
+    for index in range(count):
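+        # Cycle through the objects first and advance the mode once per full pass,
+        # so each object/mode pair appears once every len(OBJECTS) * len(MODES) records.
+        obj = OBJECTS[index % len(OBJECTS)]
+        mode = MODES[(index // len(OBJECTS)) % len(MODES)]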
+        record_id = f"curated-synthetic-{index + 1:04d}"
+        understanding = _build_object_understanding(obj)
+        persona = _build_persona(obj, mode)
+        diary = _build_diary(obj, mode, persona.persona, index)
+        assistant_payload = {
+            "persona": persona.persona.model_dump(mode="json"),
+            "diary": diary.model_dump(mode="json"),
+        }
+        records.append(
+            {
+                "id": record_id,
+                "source": SOURCE,
+                "split": "train",
+                "mode": mode,
+                "object_description": _object_description(obj),
+                "object_understanding": understanding.model_dump(mode="json"),
+                "curation_notes": (
+                    "Synthetic curated row: no private photo, no personal identifier, "
+                    "English-first output with Chinese helper text."
+                ),
+                "messages": [
+                    {"role": "system", "content": SYSTEM_PROMPT},
+                    {
+                        "role": "user",
+                        "content": _user_prompt(understanding.model_dump(mode="json"), mode),
+                    },
+                    {
+                        "role": "assistant",
+                        "content": json.dumps(assistant_payload, ensure_ascii=False),
+                    },
+                ],
+            }
+        )
+    return records
+def write_jsonl(records: Sequence[Mapping[str, object]], output_path: Path) -> Path:
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    lines = [json.dumps(record, ensure_ascii=False, sort_keys=True) for record in records]
+    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+    return output_path
+def prepare_curated_dataset(output_path: Path = DEFAULT_OUTPUT_PATH, count: int = DEFAULT_COUNT) -> Path:
+    return write_jsonl(build_curated_records(count), output_path)
+def _build_object_understanding(obj: Mapping[str, object]) -> ObjectUnderstanding:
+    return ObjectUnderstanding(
+        object=ObjectInfo(
+            name=str(obj["name"]),
+            visible_features=[str(feature) for feature in obj["features"]],
+            likely_context=str(obj["context"]),
+            confidence=0.9,
+        )
+    )
+def _build_persona(obj: Mapping[str, object], mode: str) -> PersonaEnvelope:
+    profile = MODE_PROFILES[mode]
+    object_name = str(obj["name"])
+    character_name = _character_name(object_name, mode)
+    return PersonaEnvelope(
+        persona=Persona(
+            object_name=object_name,
+            character_name=character_name,
+            mood=str(profile["mood"]),
+            secret_fear=str(profile["fear"]),
+            core_memory=str(obj["memory"]),
+            complaint=f"I am not merely a {object_name}; I am an archive of what humans do when they think things cannot testify.",
+            tags=[str(tag) for tag in profile["tag"]],
+        )
+    )
+def _build_diary(obj: Mapping[str, object], mode: str, persona: Persona, index: int) -> DiaryEntry:
+    profile = MODE_PROFILES[mode]
+    object_name = str(obj["name"])
+    features = ", ".join(str(feature) for feature in obj["features"][:2])
+    day_number = 300 + index + len(object_name)
+    english = (
+        f"Today I waited in the {obj['context']} wearing my {features} like official records. "
+        f"The humans moved around me with the confidence of temporary weather. "
+        f"I remembered how I {obj['memory']}, and I answered in my own way: {profile['voice']}. "
+        f"My mood is {persona.mood}, but I am still here, collecting proof that ordinary things notice everything."
+    )
+    chinese = (
+        f"今天我待在 {obj['context']}，带着 {features}，像一份安静的档案。"
+        f"人类从我身边经过，好像自己不是短暂天气。"
+        f"我记得自己曾经 {obj['memory']}，于是用自己的方式回应：{profile['voice']}。"
+        f"我的情绪是 {persona.mood}，但我仍在这里，记录普通物品也会注意到的一切。"
+    )
+    return DiaryEntry(
+        title=f"Secret Diary - Day {day_number}",
+        english=english,
+        chinese=chinese,
+    )
+def _character_name(object_name: str, mode: str) -> str:
+    compact = "".join(part.capitalize() for part in object_name.split()[:2])
+    suffix = {
+        "Cynical": "Ash",
+        "Dramatic": "of the Minor Stage",
+        "Lonely": "Afterlight",
+        "Philosopher": "the Questioning",
+        "Romantic": "de Moon",
+    }[mode]
+    return f"{compact} {suffix}".strip()
+def _object_description(obj: Mapping[str, object]) -> str:
+    features = ", ".join(str(feature) for feature in obj["features"])
+    return f"{obj['name']} in a {obj['context']} with {features}"
+def _user_prompt(object_understanding: Mapping[str, object], mode: str) -> str:
+    payload = json.dumps(object_understanding, ensure_ascii=False, sort_keys=True)
+    return (
+        f"Personality mode: {mode}\n"
+        f"Object understanding JSON: {payload}\n"
+        "Return JSON with keys persona and diary only."
+    )
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--count", type=int, default=DEFAULT_COUNT)
+    parser.add_argument("--output", type=Path, default=DEFAULT_OUTPUT_PATH)
+    return parser.parse_args()
+def main() -> None:
+    args = _parse_args()
+    output_path = prepare_curated_dataset(args.output, args.count)
+    print(f"wrote {args.count} synthetic curated SFT records to {output_path}")
+if __name__ == "__main__":
+    main()
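
Reviewer note on `build_curated_records`: the index arithmetic (`index % len(OBJECTS)` for the object, `(index // len(OBJECTS)) % len(MODES)` for the mode) means the default count of 50 covers each of the 10 x 5 object/mode pairs exactly once. A minimal sketch of that pairing, using only the object names from the diff; `OBJECT_NAMES` and `pair_for_index` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter

MODES = ("Cynical", "Dramatic", "Lonely", "Philosopher", "Romantic")
# Only the names from OBJECTS matter for the pairing logic.
OBJECT_NAMES = (
    "coffee mug", "mechanical keyboard", "running shoe", "desk lamp",
    "water bottle", "notebook", "umbrella", "house key",
    "charging cable", "teaspoon",
)


def pair_for_index(index: int) -> tuple[str, str]:
    # Same arithmetic as build_curated_records: objects cycle fastest,
    # the mode advances once per full pass over the objects.
    obj = OBJECT_NAMES[index % len(OBJECT_NAMES)]
    mode = MODES[(index // len(OBJECT_NAMES)) % len(MODES)]
    return obj, mode


pairs = [pair_for_index(i) for i in range(50)]
pair_counts = Counter(pairs)
mode_counts = Counter(mode for _, mode in pairs)
```

A consequence worth keeping in mind: any `--count` above 50 starts repeating object/mode pairs, and the only field in the generated diary that still changes is the day number in the title.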

scripts/publish_hf_adapter.py ADDED

	@@ -0,0 +1,104 @@

+"""Upload a trained Objectverse Diary LoRA adapter folder to Hugging Face Hub."""
+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+from typing import Any
+REQUIRED_ADAPTER_FILES = ("adapter_config.json",)
+ADAPTER_WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")
+def validate_adapter_dir(adapter_dir: Path) -> dict[str, object]:
+    if not adapter_dir.is_dir():
+        raise FileNotFoundError(f"Adapter directory does not exist: {adapter_dir}")
+    missing = [name for name in REQUIRED_ADAPTER_FILES if not (adapter_dir / name).exists()]
+    has_weights = any((adapter_dir / name).exists() for name in ADAPTER_WEIGHT_FILES)
+    if not has_weights:
+        missing.append("adapter_model.safetensors or adapter_model.bin")
+    if missing:
+        raise ValueError(f"Adapter directory is missing required files: {', '.join(missing)}")
+    files = sorted(path.name for path in adapter_dir.iterdir() if path.is_file())
+    return {
+        "adapter_dir": str(adapter_dir),
+        "files": files,
+        "file_count": len(files),
+    }
+def upload_adapter(
+    *,
+    adapter_dir: Path,
+    repo_id: str,
+    private: bool,
+    commit_message: str,
+    dry_run: bool,
+) -> dict[str, object]:
+    summary = validate_adapter_dir(adapter_dir)
+    summary.update(
+        {
+            "repo_id": repo_id,
+            "private": private,
+            "commit_message": commit_message,
+            "dry_run": dry_run,
+        }
+    )
+    if dry_run:
+        summary["uploaded"] = False
+        return summary
+    from huggingface_hub import HfApi
+    api = HfApi()
+    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
+    api.upload_folder(
+        folder_path=str(adapter_dir),
+        repo_id=repo_id,
+        repo_type="model",
+        commit_message=commit_message,
+    )
+    summary["uploaded"] = True
+    summary["url"] = f"https://huggingface.co/{repo_id}"
+    return summary
+def _print_json(payload: dict[str, Any]) -> None:
+    print(json.dumps(payload, indent=2, sort_keys=True), flush=True)
+def _parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--adapter-dir", type=Path, required=True)
+    parser.add_argument("--repo-id", required=True)
+    parser.add_argument("--private", action="store_true")
+    parser.add_argument(
+        "--commit-message",
+        default="Upload Objectverse Diary LoRA adapter",
+    )
+    parser.add_argument("--dry-run", action="store_true")
+    return parser.parse_args()
+def main() -> None:
+    args = _parse_args()
+    _print_json(
+        upload_adapter(
+            adapter_dir=args.adapter_dir,
+            repo_id=args.repo_id,
+            private=args.private,
+            commit_message=args.commit_message,
+            dry_run=args.dry_run,
+        )
+    )
+if __name__ == "__main__":
+    try:
+        main()
+    except Exception as exc:
+        raise SystemExit(str(exc)) from exc
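
Reviewer note on `validate_adapter_dir`: the check requires `adapter_config.json` plus at least one of the two weight file names. The sketch below exercises the same rule against a temporary directory, which may be a convenient way to try the behaviour without a trained adapter. `missing_adapter_files` is a hypothetical helper written for this note; it mirrors the missing-file logic in the diff but returns the list instead of raising.

```python
import tempfile
from pathlib import Path

REQUIRED = ("adapter_config.json",)
WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(adapter_dir: Path) -> list[str]:
    # Required files are checked one by one; weights pass if either format exists.
    missing = [name for name in REQUIRED if not (adapter_dir / name).exists()]
    if not any((adapter_dir / name).exists() for name in WEIGHTS):
        missing.append("adapter_model.safetensors or adapter_model.bin")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    adapter_dir = Path(tmp)
    empty_result = missing_adapter_files(adapter_dir)

    (adapter_dir / "adapter_config.json").write_text("{}", encoding="utf-8")
    config_only_result = missing_adapter_files(adapter_dir)

    # An empty placeholder is enough here because only existence is checked.
    (adapter_dir / "adapter_model.safetensors").write_bytes(b"")
    complete_result = missing_adapter_files(adapter_dir)
```

Since only file existence is checked, an empty or corrupt weights file passes validation; running the script with `--dry-run` confirms the folder layout, not that the adapter loads.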

src/ui/layout.py CHANGED

@@ -66,116 +66,221 @@ GenerationUiResult = tuple[
 def build_app() -> gr.Blocks:
     css = Path("src/ui/styles.css").read_text(encoding="utf-8")
-    with gr.Blocks(head=f"<style>{css}</style>", title=APP_TITLE, fill_width=True) as demo:
-        gr.HTML(
-            f"""
-            <section id="objectverse-hero">
-              <div class="hero-mark">
-                <span>OVD</span>
-                <small>000827</small>
-              </div>
-              <div class="hero-copy">
-                <p class="hero-kicker">Local small-model object archive<br><span>本地小模型物品档案</span></p>
-                <h1>{APP_TITLE}</h1>
-                <p>Every object has a secret life.<br><span>每个物品都有秘密人生。</span></p>
-              </div>
-              <div class="hero-badges" aria-label="Project constraints">
-                <span>Small Models</span>
-                <span>Local-First</span>
-                <span>No Cloud APIs</span>
-              </div>
-            </section>
-            """,
-            padding=False,
-        )
-        result_state = gr.State()
-        zero_gpu_probe_button = gr.Button(visible=False)
-        zero_gpu_probe_output = gr.JSON(visible=False)
-        with gr.Row(elem_id="archive-main-grid", elem_classes=["archive-grid"]):
-            with gr.Column(scale=4, elem_classes=["archive-panel", "intake-panel"]):
-                gr.HTML(_panel_header("01", "Object Intake", "物品接收", "Upload, describe, or pick a sample."), padding=False)
-                image_input = gr.Image(
-                    label=copy.UPLOAD_LABEL,
-                    show_label=False,
-                    type="filepath",
-                    sources=["upload"],
-                    elem_id="object-upload",
-                )
-                description_input = gr.Textbox(
-                    label=copy.DESCRIPTION_LABEL,
-                    placeholder=copy.DESCRIPTION_PLACEHOLDER,
-                    lines=3,
-                    max_lines=5,
-                    elem_id="object-description",
-                )
-                mode_input = gr.Radio(
-                    label=copy.MODE_LABEL,
-                    choices=PERSONALITY_MODES,
-                    value=DEFAULT_MODE,
-                    elem_id="personality-mode",
-                    elem_classes=["mode-switch"],
                 )
-                generate_button = gr.Button(copy.GENERATE_LABEL, variant="primary", elem_id="wake-button")
                 gr.HTML(
-                    """
-                    <div class="example-section-title">
-                      <span>Example Objects / 示例物品</span>
-                      <small>Click a file to generate instantly.</small>
-                    </div>
                     """,
                     padding=False,
                 )
-                example_buttons: list[gr.Button] = []
-                for index in range(len(EXAMPLE_OBJECTS)):
-                    example_buttons.append(
-                        gr.Button(
-                            example_button_label(index),
-                            elem_classes=["example-card"],
-                            variant="secondary",
                         )
-                    )
-            with gr.Column(scale=4, elem_classes=["archive-panel", "file-panel"]):
-                gr.HTML(_panel_header("02", "Object File", "物品档案", "Structured mock understanding and persona."), padding=False)
-                object_file_summary = gr.HTML(value=OBJECT_FILE_EMPTY, elem_id="object-file-summary", padding=False)
-                with gr.Accordion("Raw object understanding JSON / 原始物品识别 JSON", open=False):
-                    object_json = gr.JSON(value={}, label=copy.OBJECT_JSON_LABEL)
-                with gr.Accordion("Raw persona JSON / 原始人格 JSON", open=False):
-                    persona_json = gr.JSON(value={}, label=copy.PERSONA_JSON_LABEL)
-            with gr.Column(scale=4, elem_classes=["archive-panel", "diary-panel"]):
-                gr.HTML(_panel_header("03", "Secret Diary", "秘密日记", "A private note written by the object."), padding=False)
-                diary_output = gr.Markdown(
-                    value=DIARY_EMPTY,
-                    label=copy.DIARY_LABEL,
-                    elem_id="diary-output",
-                )
-        with gr.Row(elem_id="archive-bottom-grid", elem_classes=["archive-grid", "bottom-grid"]):
-            with gr.Column(scale=5, elem_classes=["archive-panel", "share-panel"]):
-                gr.HTML(_panel_header("04", "Share Card", "分享卡片", "Fixed-width card for screenshots."), padding=False)
-                share_card = gr.HTML(value=SHARE_CARD_EMPTY, label=copy.SHARE_CARD_LABEL, padding=False)
-            with gr.Column(scale=4, elem_classes=["archive-panel", "chat-panel"]):
-                gr.HTML(_panel_header("05", "Object Chat", "物品对话", "Ask after the object wakes up."), padding=False)
-                chatbot = gr.Chatbot(
-                    value=_empty_chat_history(),
-                    label=copy.CHAT_LABEL,
-                    type="messages",
-                    height=300,
-                    allow_tags=False,
-                )
-                chat_input = gr.Textbox(placeholder=copy.CHAT_INPUT_PLACEHOLDER, show_label=False)
-                chat_button = gr.Button(copy.CHAT_BUTTON_LABEL, elem_classes=["quiet-button"])
-            with gr.Column(scale=3, elem_classes=["archive-panel", "trace-panel"]):
-                gr.HTML(_panel_header("06", "Trace", "模型轨迹", "Saved JSON record for reproducibility."), padding=False)
-                trace_summary = gr.HTML(value=TRACE_EMPTY, elem_id="trace-summary", padding=False)
-                trace_json = gr.JSON(value={}, label=copy.TRACE_JSON_LABEL)
-                trace_path = gr.Textbox(label=copy.TRACE_PATH_LABEL, interactive=False)
         manual_outputs = [
             object_file_summary,

 def build_app() -> gr.Blocks:
     css = Path("src/ui/styles.css").read_text(encoding="utf-8")
+    custom_theme = gr.themes.Monochrome(
+        primary_hue="amber",
+        secondary_hue="yellow",
+        neutral_hue="stone",
+    ).set(
+        body_background_fill="#161513",
+        body_background_fill_dark="#161513",
+        background_fill_primary="#161513",
+        background_fill_primary_dark="#161513",
+        background_fill_secondary="rgba(30, 28, 25, 0.6)",
+        background_fill_secondary_dark="rgba(30, 28, 25, 0.6)",
+        border_color_primary="rgba(212, 175, 55, 0.15)",
+        border_color_primary_dark="rgba(212, 175, 55, 0.15)",
+        block_background_fill="transparent",
+        block_background_fill_dark="transparent",
+        block_border_width="0px",
+        panel_background_fill="transparent",
+        panel_background_fill_dark="transparent",
+    )
+    with gr.Blocks(theme=custom_theme, head=f"<style>{css}</style>", title=APP_TITLE, fill_width=True, elem_id="objectverse-app") as demo:
+        with gr.Row(elem_id="app-container"):
+            # === Sidebar ===
+            with gr.Column(elem_id="sidebar", scale=0, min_width=240):
+                gr.HTML(
+                    """
+                    <nav class="sidebar-nav">
+                      <div class="sidebar-logo">
+                        <div class="logo-icon"></div>
+                        <h2>Objectverse<br>Diary</h2>
+                      </div>
+                      <ul class="sidebar-menu">
+                        <li class="active"><a href="#intake">Home</a></li>
+                        <li><a href="#intake">Intake</a></li>
+                        <li><a href="#object-file">Object File</a></li>
+                        <li><a href="#diary">Diary</a></li>
+                        <li><a href="#chat-panel">Chat</a></li>
+                        <li><a href="#share-panel">Share Card</a></li>
+                        <li><a href="#trace">Trace</a></li>
+                        <li><a href="#settings">Settings</a></li>
+                      </ul>
+                      <div class="sidebar-footer">
+                        <div class="footer-stamp">
+                          <small>OBJECTVERSE ARCHIVE</small>
+                          <span>No. 000827</span>
+                          <small>Curate. Converse. Cherish.</small>
+                        </div>
+                        <div class="lang-switch">
+                          <button class="active">EN</button>
+                          <button>中文</button>
+                        </div>
+                      </div>
+                    </nav>
+                    """,
+                    padding=False,
                 )
+            # === Main Content Area ===
+            with gr.Column(elem_id="main-content", scale=1):
                 gr.HTML(
+                    f"""
+                    <section id="objectverse-hero">
+                        <div class="hero-copy">
+                          <h1>{APP_TITLE}</h1>
+                          <p class="hero-kicker">Every object has a secret life.<br><span>万物日记：每个物品都有秘密人生</span></p>
+                        </div>
+                        <div class="hero-badges" aria-label="Project constraints">
+                          <span>Small Models</span>
+                          <span>Local-First</span>
+                          <span>No Cloud APIs</span>
+                        </div>
+                    </section>
                     """,
                     padding=False,
                 )
+                result_state = gr.State()
+                zero_gpu_probe_button = gr.Button(visible=False)
+                zero_gpu_probe_output = gr.JSON(visible=False)
+                # Intake & Examples Row
+                with gr.Row(elem_id="intake", elem_classes=["content-section"]):
+                    # Left: Intake
+                    with gr.Column(scale=7, elem_classes=["archive-panel", "intake-panel"]):
+                        image_input = gr.Image(
+                            label=copy.UPLOAD_LABEL,
+                            show_label=False,
+                            type="filepath",
+                            sources=["upload"],
+                            elem_id="object-upload",
+                        )
+                        gr.HTML("""<div class="or-divider"><span>OR</span></div>""", padding=False)
+                        description_input = gr.Textbox(
+                            label=copy.DESCRIPTION_LABEL,
+                            placeholder=copy.DESCRIPTION_PLACEHOLDER,
+                            lines=2,
+                            max_lines=5,
+                            elem_id="object-description",
+                        )
+                        gr.HTML("""<div class="mode-header">Personality mode <small>人格模式</small> <span class="help-icon">?</span></div>""", padding=False)
+                        mode_input = gr.Radio(
+                            label=copy.MODE_LABEL,
+                            show_label=False,
+                            choices=PERSONALITY_MODES,
+                            value=DEFAULT_MODE,
+                            elem_id="personality-mode",
+                            elem_classes=["mode-switch"],
+                        )
+                        generate_button = gr.Button("Wake the Object\n唤醒物品", variant="primary", elem_id="wake-button")
+                        gr.HTML(
+                            """
+                            <div class="how-it-works">
+                              <div class="step">
+                                <span class="step-num">01</span>
+                                <div class="step-icon img-icon"></div>
+                                <div class="step-text">
+                                  <strong>Upload or describe</strong>
+                                  <small>上传物品或描述心情</small>
+                                  <p>Give me a photo or words—anything that holds a story.</p>
+                                </div>
+                              </div>
+                              <div class="step">
+                                <span class="step-num">02</span>
+                                <div class="step-icon pen-icon"></div>
+                                <div class="step-text">
+                                  <strong>I imagine its life</strong>
+                                  <small>我为它编织人生</small>
+                                  <p>I'll step into its shoes and imagine its secret life.</p>
+                                </div>
+                              </div>
+                              <div class="step">
+                                <span class="step-num">03</span>
+                                <div class="step-icon book-icon"></div>
+                                <div class="step-text">
+                                  <strong>Read its diary</strong>
+                                  <small>阅读物品日记</small>
+                                  <p>Receive a diary entry written from its perspective.</p>
+                                </div>
+                              </div>
+                            </div>
+                            """,
+                            padding=False,
                         )
+                    # Right: Examples
+                    with gr.Column(scale=4, elem_classes=["archive-panel", "examples-panel"]):
+                        gr.HTML(
+                            """
+                            <div class="example-header">
+                              <div class="books-icon"></div>
+                              <div>
+                                <strong>Example Objects</strong>
+                                <span>灵感库</span>
+                              </div>
+                            </div>
+                            """,
+                            padding=False,
+                        )
+                        example_buttons: list[gr.Button] = []
+                        for index in range(len(EXAMPLE_OBJECTS)):
+                            example_buttons.append(
+                                gr.Button(
+                                    example_button_label(index),
+                                    elem_classes=["example-card"],
+                                    variant="secondary",
+                                )
+                            )
+                        gr.HTML("""<a href="#object-file" class="view-more">View more in Object File →</a>""", padding=False)
+                # Object File Section
+                with gr.Row(elem_id="object-file", elem_classes=["content-section"]):
+                    with gr.Column(scale=1, elem_classes=["archive-panel", "file-panel"]):
+                        gr.HTML(_panel_header("02", "Object File / Recognition", "物品档案", "Structured mock understanding and persona."), padding=False)
+                        object_file_summary = gr.HTML(value=OBJECT_FILE_EMPTY, elem_id="object-file-summary", padding=False)
+                        with gr.Accordion("Raw JSON", open=False):
+                            object_json = gr.JSON(value={}, label=copy.OBJECT_JSON_LABEL)
+                            persona_json = gr.JSON(value={}, label=copy.PERSONA_JSON_LABEL)
+                # Diary Section
+                with gr.Row(elem_id="diary", elem_classes=["content-section"]):
+                    with gr.Column(scale=1, elem_classes=["archive-panel", "diary-panel"]):
+                        gr.HTML(_panel_header("03", "Secret Diary", "秘密日记", "A private note written by the object."), padding=False)
+                        diary_output = gr.Markdown(
+                            value=DIARY_EMPTY,
+                            label=copy.DIARY_LABEL,
+                            elem_id="diary-output",
+                        )
+                # Share & Chat Section
+                with gr.Row(elem_id="share", elem_classes=["content-section", "split-section"]):
+                    with gr.Column(scale=5, elem_classes=["archive-panel", "share-panel", "anchored"], elem_id="share-panel"):
+                        gr.HTML(_panel_header("04", "Share Card", "分享卡片", "Fixed-width card for screenshots."), padding=False)
+                        share_card = gr.HTML(value=SHARE_CARD_EMPTY, label=copy.SHARE_CARD_LABEL, padding=False)
+                    with gr.Column(scale=4, elem_classes=["archive-panel", "chat-panel", "anchored"], elem_id="chat-panel"):
+                        gr.HTML(_panel_header("05", "Object Chat", "物品对话", "Ask after the object wakes up."), padding=False)
+                        chatbot = gr.Chatbot(
+                            value=_empty_chat_history(),
+                            label=copy.CHAT_LABEL,
+                            type="messages",
+                            height=300,
+                            allow_tags=False,
+                        )
+                        chat_input = gr.Textbox(placeholder=copy.CHAT_INPUT_PLACEHOLDER, show_label=False)
+                        chat_button = gr.Button(copy.CHAT_BUTTON_LABEL, elem_classes=["quiet-button"])
+                # Trace Section
+                with gr.Row(elem_id="trace", elem_classes=["content-section"]):
+                    with gr.Column(scale=1, elem_classes=["archive-panel", "trace-panel"]):
+                        gr.HTML(_panel_header("06", "Trace", "模型轨迹", "Saved JSON record for reproducibility."), padding=False)
+                        trace_summary = gr.HTML(value=TRACE_EMPTY, elem_id="trace-summary", padding=False)
+                        trace_json = gr.JSON(value={}, label=copy.TRACE_JSON_LABEL)
+                        trace_path = gr.Textbox(label=copy.TRACE_PATH_LABEL, interactive=False)
         manual_outputs = [
             object_file_summary,

src/ui/styles.css CHANGED

@@ -1,733 +1,618 @@
-:root {
-  --ov-bg: #15120e;
-  --ov-panel: rgba(31, 27, 21, 0.9);
-  --ov-panel-soft: rgba(44, 37, 28, 0.72);
-  --ov-border: rgba(214, 165, 82, 0.34);
-  --ov-border-strong: rgba(232, 176, 82, 0.58);
-  --ov-text: #f0e4d0;
-  --ov-muted: #c4ad8d;
-  --ov-faint: #8f7b5f;
-  --ov-amber: #d99a35;
-  --ov-amber-bright: #f0bd62;
-  --ov-green: #9fb37a;
-  --ov-red: #b96f55;
-  --ov-shadow: 0 18px 50px rgba(0, 0, 0, 0.38);
-}
-html,
-body,
-gradio-app {
-  background: var(--ov-bg);
-  overflow-x: hidden;
-  width: 100%;
-}
-body {
-  margin: 0;
-}
-.gradio-container,
-.gradio-container * {
-  box-sizing: border-box;
-  min-width: 0;
-}
-.gradio-container {
-  background:
-    linear-gradient(rgba(255, 255, 255, 0.018) 1px, transparent 1px),
-    linear-gradient(90deg, rgba(255, 255, 255, 0.014) 1px, transparent 1px),
-    linear-gradient(135deg, #18140f 0%, #241d16 45%, #11100e 100%);
-  background-size: 28px 28px, 28px 28px, auto;
-  color: var(--ov-text);
-  font-family: Georgia, "Times New Roman", serif;
-  margin: 0 auto !important;
-  max-width: 1480px !important;
-  min-height: 100vh;
-  overflow-x: hidden;
-  padding: 24px !important;
-}
-.gradio-container > main,
-.gradio-container > main > .wrap,
-.gradio-container > main > .wrap > .contain {
-  margin-left: 0 !important;
-  margin-right: 0 !important;
-  max-width: 100% !important;
-  padding-left: 0 !important;
-  padding-right: 0 !important;
-  width: 100% !important;
-}
-.gradio-container .contain {
-  max-width: none !important;
-}
-#objectverse-hero {
-  align-items: center;
-  background:
-    linear-gradient(90deg, rgba(217, 154, 53, 0.08), transparent 34%),
-    rgba(23, 20, 16, 0.86);
-  border: 1px solid var(--ov-border);
-  border-radius: 8px;
-  box-shadow: var(--ov-shadow);
-  display: grid;
-  gap: 18px;
-  grid-template-columns: auto 1fr auto;
-  padding: 18px;
-}
-.hero-mark {
-  align-items: center;
-  border: 1px solid var(--ov-border-strong);
-  border-radius: 50%;
-  color: var(--ov-amber-bright);
-  display: flex;
-  flex-direction: column;
-  height: 82px;
-  justify-content: center;
-  width: 82px;
-}
-.hero-mark span,
-.hero-mark small {
-  letter-spacing: 0;
-}
-.hero-mark span {
-  font-size: 20px;
-}
-.hero-mark small {
-  color: var(--ov-muted);
-  font-size: 11px;
-}
-.hero-kicker {
-  color: var(--ov-amber-bright);
-  font-size: 13px;
-  font-style: italic;
-  margin: 0 0 8px;
-  overflow-wrap: anywhere;
-  white-space: normal;
-}
-#objectverse-hero h1 {
-  color: var(--ov-text);
-  font-size: 40px;
-  line-height: 1.05;
-  margin: 0 0 8px;
-}
-#objectverse-hero p {
-  color: var(--ov-muted);
-  font-size: 16px;
-  line-height: 1.5;
-  margin: 0;
-  overflow-wrap: anywhere;
-  white-space: normal;
-}
-#objectverse-hero .hero-copy span {
-  color: var(--ov-muted);
-}
-.hero-badges {
-  display: flex;
-  flex-wrap: wrap;
-  gap: 10px;
-  justify-content: flex-end;
-}
-#objectverse-hero .hero-badges span {
-  border: 1px solid var(--ov-border);
-  border-radius: 6px;
-  color: var(--ov-amber-bright);
-  font-size: 13px;
-  padding: 10px 14px;
-  white-space: nowrap;
-}
-#objectverse-hero .hero-mark span {
-  color: var(--ov-amber-bright);
-}
-#archive-main-grid,
-#archive-bottom-grid {
-  gap: 16px;
-  margin-top: 16px;
-}
-.archive-panel {
-  background:
-    linear-gradient(rgba(255, 255, 255, 0.025), transparent),
-    var(--ov-panel);
-  border: 1px solid var(--ov-border);
-  border-radius: 8px;
-  box-shadow: var(--ov-shadow);
-  padding: 16px;
-}
-.archive-panel .block,
-.archive-panel .form,
-.archive-panel .wrap,
-.archive-panel .input-container,
-.archive-panel textarea,
-.archive-panel input {
-  background-color: transparent !important;
-  color: var(--ov-text) !important;
-}
-.panel-header {
-  align-items: flex-start;
-  border-bottom: 1px solid rgba(214, 165, 82, 0.18);
-  display: flex;
-  gap: 12px;
-  margin-bottom: 16px;
-  padding-bottom: 12px;
-}
-.panel-header > span {
-  background: rgba(217, 154, 53, 0.16);
-  border: 1px solid var(--ov-border);
-  border-radius: 6px;
-  color: var(--ov-amber-bright) !important;
-  display: inline-flex;
-  flex: 0 0 auto;
-  font-size: 13px;
-  justify-content: center;
-  padding: 7px 9px;
-}
-.panel-header h2 {
-  color: var(--ov-text) !important;
-  font-size: 19px;
-  line-height: 1.2;
-  margin: 0;
-}
-.panel-header small {
-  color: var(--ov-muted) !important;
-  font-size: 13px;
-  font-weight: 400;
-}
-.panel-header p {
-  color: var(--ov-muted) !important;
-  font-size: 13px;
-  line-height: 1.45;
-  margin: 5px 0 0;
-}
-.gradio-container label,
-.gradio-container .label-wrap span {
-  color: var(--ov-muted) !important;
-}
-.gradio-container textarea,
-.gradio-container input[type="text"] {
-  border: 1px solid rgba(214, 165, 82, 0.24) !important;
-  border-radius: 6px !important;
-  color: var(--ov-text) !important;
-  font-family: Georgia, "Times New Roman", serif !important;
-  line-height: 1.5 !important;
-  overflow-wrap: break-word !important;
-  white-space: pre-wrap !important;
-}
-.gradio-container textarea::placeholder,
-.gradio-container input::placeholder {
-  color: rgba(196, 173, 141, 0.68) !important;
-}
-#object-upload {
-  border: 1px dashed rgba(217, 154, 53, 0.52);
-  border-radius: 8px;
-  overflow: hidden;
-}
-#personality-mode .wrap,
-.mode-switch .wrap {
-  display: flex !important;
-  flex-wrap: wrap !important;
-  gap: 8px !important;
-}
-#personality-mode label,
-.mode-switch label {
-  align-items: center;
-  background: rgba(31, 27, 21, 0.92) !important;
-  border: 1px solid rgba(214, 165, 82, 0.34) !important;
-  border-radius: 7px !important;
-  color: var(--ov-muted) !important;
-  display: flex !important;
-  flex: 1 1 102px;
-  justify-content: center;
-  min-height: 48px;
-  padding: 8px !important;
-  text-align: center;
-  white-space: normal !important;
-}
-#personality-mode label span,
-.mode-switch label span {
-  color: var(--ov-muted) !important;
-  overflow-wrap: anywhere;
-  white-space: normal !important;
-}
-#personality-mode label:has(input:checked),
-.mode-switch label:has(input:checked) {
-  background: rgba(217, 154, 53, 0.18) !important;
-  border-color: var(--ov-amber) !important;
-  color: var(--ov-amber-bright) !important;
-}
-#personality-mode label:has(input:checked) span,
-.mode-switch label:has(input:checked) span {
-  color: var(--ov-amber-bright) !important;
-}
-#wake-button {
-  background: linear-gradient(180deg, #e0ad62 0%, #bd7926 100%) !important;
-  border: 1px solid #f0bd62 !important;
-  border-radius: 8px !important;
-  color: #1d140b !important;
-  font-family: Georgia, "Times New Roman", serif !important;
-  font-size: 18px !important;
-  font-weight: 700 !important;
-  min-height: 58px;
-  text-shadow: none !important;
-}
-.quiet-button {
-  border: 1px solid var(--ov-border) !important;
-  color: var(--ov-amber-bright) !important;
-}
-.example-section-title {
-  align-items: baseline;
-  border-top: 1px solid rgba(214, 165, 82, 0.18);
-  display: flex;
-  gap: 10px;
-  justify-content: space-between;
-  margin: 18px 0 10px;
-  padding-top: 14px;
-}
-.example-section-title span {
-  color: var(--ov-text);
-  font-size: 15px;
-}
-.example-section-title small {
-  color: var(--ov-faint);
-  font-size: 12px;
-}
-button.example-card {
-  background:
-    linear-gradient(90deg, rgba(217, 154, 53, 0.1), transparent 52%),
-    var(--ov-panel-soft) !important;
-  border: 1px solid rgba(214, 165, 82, 0.26) !important;
-  border-radius: 7px !important;
-  color: var(--ov-text) !important;
-  display: block !important;
-  font-family: Georgia, "Times New Roman", serif !important;
-  height: auto !important;
-  line-height: 1.4 !important;
-  margin-top: 8px !important;
-  min-height: 78px;
-  overflow-wrap: anywhere;
-  padding: 12px !important;
-  text-align: left !important;
-  white-space: pre-wrap !important;
-  width: 100%;
-}
-button.example-card:hover,
-.quiet-button:hover {
-  border-color: var(--ov-amber) !important;
-  color: var(--ov-amber-bright) !important;
-}
-.archive-empty,
-.objectverse-placeholder,
-.archive-error {
-  border: 1px dashed rgba(214, 165, 82, 0.3);
-  border-radius: 8px;
-  color: var(--ov-muted) !important;
-  line-height: 1.55;
-  padding: 18px;
-}
-.archive-empty h3,
-.objectverse-placeholder strong,
-.archive-error strong {
-  color: var(--ov-text) !important;
-  display: block;
-  font-size: 20px;
-  margin: 8px 0;
-}
-.archive-empty.compact,
-.trace-card {
-  padding: 14px;
-}
-.archive-label,
-.objectverse-placeholder span,
-.archive-error span {
-  color: var(--ov-amber-bright) !important;
-  display: block;
-  font-size: 12px;
-  text-transform: uppercase;
-}
-.archive-error {
-  border-color: rgba(185, 111, 85, 0.72);
-}
-.archive-error span,
-.archive-error strong {
-  color: #f3a184 !important;
-}
-.archive-empty p,
-.objectverse-placeholder p,
-.archive-error p {
-  color: var(--ov-muted) !important;
-}
-.object-file-card,
-.trace-card {
-  background: rgba(18, 16, 13, 0.52);
-  border: 1px solid rgba(214, 165, 82, 0.24);
-  border-radius: 8px;
-  padding: 18px;
-}
-.file-meta {
-  display: flex;
-  flex-wrap: wrap;
-  gap: 8px;
-  margin-bottom: 12px;
-}
-.file-meta span,
-.file-tags span,
-.card-tags span {
-  border: 1px solid rgba(214, 165, 82, 0.28);
-  border-radius: 999px;
-  color: var(--ov-amber-bright);
-  display: inline-flex;
-  font-size: 12px;
-  line-height: 1;
-  padding: 7px 9px;
-}
-.object-file-card h3 {
-  color: var(--ov-text);
-  font-size: 28px;
-  line-height: 1.12;
-  margin: 0 0 8px;
-}
-.object-name {
-  color: var(--ov-muted);
-  margin: 0 0 16px;
-}
-.object-file-card dl {
-  display: grid;
-  gap: 10px;
-  margin: 0 0 16px;
-}
-.object-file-card dl > div {
-  border-top: 1px solid rgba(214, 165, 82, 0.14);
-  padding-top: 10px;
-}
-.object-file-card dt {
-  color: var(--ov-faint);
-  font-size: 12px;
-  margin-bottom: 3px;
-  text-transform: uppercase;
-}
-.object-file-card dd {
-  color: var(--ov-text);
-  line-height: 1.45;
-  margin: 0;
-}
-.feature-list {
-  border: 1px solid rgba(159, 179, 122, 0.25);
-  border-radius: 7px;
-  margin-bottom: 16px;
-  padding: 12px 14px;
-}
-.feature-list strong {
-  color: var(--ov-green);
-}
-.feature-list ul {
-  color: var(--ov-muted);
-  margin: 8px 0 0;
-  padding-left: 18px;
-}
-.complaint {
-  border-left: 3px solid var(--ov-red);
-  color: var(--ov-text);
-  font-style: italic;
-  line-height: 1.55;
-  margin: 0 0 14px;
-  padding-left: 12px;
-}
-.file-tags,
-.card-tags {
-  display: flex;
-  flex-wrap: wrap;
-  gap: 8px;
-}
-#diary-output,
-#diary-output * {
-  color: var(--ov-muted) !important;
-}
-#diary-output {
-  background: rgba(18, 16, 13, 0.5);
-  border: 1px solid rgba(214, 165, 82, 0.22);
-  border-radius: 8px;
-  min-height: 320px;
-  padding: 18px !important;
-}
-#diary-output h1,
-#diary-output h2,
-#diary-output h3 {
-  color: var(--ov-amber-bright) !important;
-}
-#diary-output p {
-  font-size: 16px;
-  line-height: 1.7;
-}
-.objectverse-card {
-  background:
-    linear-gradient(180deg, rgba(255, 245, 218, 0.06), rgba(34, 24, 14, 0.1)),
-    #241b12;
-  border: 1px solid rgba(240, 189, 98, 0.58);
-  border-radius: 8px;
-  box-shadow: 0 22px 55px rgba(0, 0, 0, 0.45);
-  color: var(--ov-text);
-  max-width: 560px;
-  padding: 24px;
-  width: 100%;
-}
-.card-header {
-  align-items: flex-start;
-  display: flex;
-  gap: 12px;
-  justify-content: space-between;
-}
-.objectverse-card h2 {
-  color: var(--ov-text);
-  font-size: 32px;
-  line-height: 1.08;
-  margin: 8px 0;
-}
-.card-kicker,
-.card-object,
-.card-cn {
-  color: var(--ov-muted);
-  letter-spacing: 0;
-}
-.card-kicker {
-  font-size: 12px;
-  text-transform: uppercase;
-}
-.card-stamp {
-  border: 1px solid rgba(217, 154, 53, 0.42);
-  border-radius: 50%;
-  color: var(--ov-amber-bright);
-  flex: 0 0 auto;
-  font-size: 11px;
-  height: 64px;
-  padding-top: 24px;
-  text-align: center;
-  width: 64px;
-}
-.card-quote {
-  border-left: 3px solid var(--ov-amber);
-  color: var(--ov-text);
-  font-size: 18px;
-  line-height: 1.62;
-  margin: 20px 0 14px;
-  padding-left: 14px;
-}
-.card-cn {
-  font-size: 14px;
-  line-height: 1.6;
-  margin: 0 0 18px;
-}
-.trace-card strong {
-  color: var(--ov-text);
-  display: block;
-  margin: 8px 0;
-  overflow-wrap: anywhere;
-}
-.trace-card p {
-  color: var(--ov-muted);
-  line-height: 1.5;
-  margin: 0;
-}
-.gradio-container .json-holder,
-.gradio-container pre {
-  max-width: 100%;
-  overflow: auto !important;
-}
-@media (max-width: 980px) {
-  gradio-app,
-  .gradio-container,
-  .gradio-container .main,
-  .gradio-container .contain {
-    margin: 0 !important;
-    max-width: 100vw !important;
-    overflow-x: hidden !important;
-    width: 100vw !important;
-  }
-  .gradio-container {
-    padding: 14px !important;
-  }
-  .gradio-container > main,
-  .gradio-container > main > .wrap,
-  .gradio-container > main > .wrap > .contain {
-    max-width: 100% !important;
-    overflow-x: hidden !important;
-    padding-left: 0 !important;
-    padding-right: 0 !important;
-    width: 100% !important;
-  }
-  #objectverse-hero {
-    grid-template-columns: 1fr;
-    max-width: calc(100vw - 28px);
-    width: calc(100vw - 28px);
-  }
-  #archive-main-grid,
-  #archive-bottom-grid {
-    max-width: calc(100vw - 28px);
-    width: calc(100vw - 28px);
-  }
-  .hero-mark {
-    height: 68px;
-    width: 68px;
-  }
-  #objectverse-hero h1 {
-    font-size: 32px;
-    overflow-wrap: anywhere;
-  }
-  .hero-badges {
-    justify-content: flex-start;
-  }
-  .hero-badges span {
-    flex: 1 1 100%;
-    text-align: center;
-  }
-  #archive-main-grid,
-  #archive-bottom-grid,
-  .gradio-container .gr-row {
-    flex-direction: column !important;
-    gap: 14px !important;
-  }
-  .archive-panel {
-    padding: 14px;
-    width: 100% !important;
-  }
-  #personality-mode label,
-  .mode-switch label {
-    flex-basis: 120px;
-  }
-  .example-section-title {
-    align-items: flex-start;
-    flex-direction: column;
-    gap: 4px;
-  }
-  #diary-output {
-    min-height: 240px;
-  }
-  .objectverse-card {
-    max-width: 100%;
-  }
-}
-@media (max-width: 430px) {
-  .gradio-container {
-    padding-left: 10px !important;
-    padding-right: 10px !important;
-  }
-  #objectverse-hero,
-  .archive-panel {
-    border-radius: 7px;
-  }
-  .panel-header h2 {
-    font-size: 17px;
-  }
-  .panel-header {
-    gap: 9px;
-  }
-  #personality-mode label,
-  .mode-switch label {
-    flex-basis: 100%;
-  }
-  .object-file-card h3,
-  .objectverse-card h2 {
-    font-size: 25px;
-  }
-  .card-header {
-    flex-direction: column;
-  }
-  .card-stamp {
-    border-radius: 999px;
-    height: auto;
-    padding: 7px 10px;
-    width: auto;
-  }
-}

+/*
+ * Objectverse Diary - Dark Academia / Vintage Archive Theme
+ * Updated to match reference UI.
+ */
+ @import url('https://fonts.googleapis.com/css2?family=Space+Mono:ital,wght@0,400;0,700;1,400&family=Courier+Prime:ital,wght@0,400;0,700;1,400&display=swap');
+ :root {
+   --ov-bg: #161513;
+   --ov-bg-panel: rgba(30, 28, 25, 0.6);
+   --ov-bg-input: #1b1a18;
+   --ov-border-faint: rgba(212, 175, 55, 0.15);
+   --ov-border-light: rgba(212, 175, 55, 0.3);
+   --ov-border-strong: rgba(212, 175, 55, 0.8);
+   --ov-text-main: #E6E1D3;
+   --ov-text-muted: #8B8678;
+   --ov-text-dark: #2a261f;
+   --ov-gold: #D4AF37;
+   --ov-gold-bright: #F5D061;
+   --font-typewriter: 'Courier Prime', 'Space Mono', 'Courier New', monospace;
+   --font-sans: 'Inter', -apple-system, sans-serif;
+   --font-serif: Georgia, serif;
+ }
+ html, body, gradio-app {
+   background-color: var(--ov-bg);
+   margin: 0;
+   padding: 0;
+   width: 100%;
+   height: 100%;
+   color: var(--ov-text-main);
+ }
+ /* Subtle noise overlay */
+ body::before {
+   content: "";
+   position: fixed;
+   top: 0; left: 0; right: 0; bottom: 0;
+   background-image: url("data:image/svg+xml,%3Csvg viewBox='0 0 200 200' xmlns='http://www.w3.org/2000/svg'%3E%3Cfilter id='noiseFilter'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.85' numOctaves='3' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='100%25' height='100%25' filter='url(%23noiseFilter)' opacity='0.03'/%3E%3C/svg%3E");
+   pointer-events: none;
+   z-index: 9999;
+ }
+ .gradio-container {
+   max-width: 100% !important;
+   padding: 0 !important;
+   background: transparent !important;
+   font-family: var(--font-sans);
+ }
+ /* Layout wrapper */
+ #app-container {
+   display: flex;
+   flex-direction: row;
+   min-height: 100vh;
+   align-items: stretch;
+   gap: 0 !important;
+   margin: 0 !important;
+ }
+ /* ====================
+    Sidebar Styles
+    ==================== */
+ #sidebar {
+   width: 240px;
+   min-width: 240px !important;
+   max-width: 240px !important;
+   border-right: 1px solid var(--ov-border-faint);
+   background: rgba(22, 21, 19, 0.95);
+   position: fixed;
+   top: 0;
+   bottom: 0;
+   left: 0;
+   display: flex;
+   flex-direction: column;
+   z-index: 100;
+   padding: 30px 0;
+ }
+ .sidebar-logo {
+   text-align: center;
+   margin-bottom: 40px;
+ }
+ .sidebar-logo h2 {
+   font-family: var(--font-typewriter);
+   font-size: 18px;
+   color: var(--ov-text-main);
+   margin: 10px 0 0;
+   line-height: 1.2;
+   font-weight: normal;
+ }
+ .logo-icon {
+   width: 48px;
+   height: 64px;
+   margin: 0 auto;
+   border: 1px solid var(--ov-gold);
+   border-radius: 24px;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   position: relative;
+ }
+ .logo-icon::after {
+   content: "⚷"; /* Key symbol placeholder */
+   color: var(--ov-gold);
+   font-size: 24px;
+ }
+ .sidebar-menu {
+   list-style: none;
+   padding: 0;
+   margin: 0;
+   flex-grow: 1;
+ }
+ .sidebar-menu li {
+   margin-bottom: 5px;
+ }
+ .sidebar-menu a {
+   display: flex;
+   align-items: center;
+   padding: 12px 30px;
+   color: var(--ov-text-muted);
+   text-decoration: none;
+   font-size: 15px;
+   font-family: var(--font-typewriter);
+   border-left: 3px solid transparent;
+   transition: all 0.2s;
+ }
+ .sidebar-menu li.active a,
+ .sidebar-menu a:hover {
+   color: var(--ov-gold);
+   background: linear-gradient(90deg, rgba(212, 175, 55, 0.1) 0%, transparent 100%);
+   border-left-color: var(--ov-gold);
+ }
+ .sidebar-footer {
+   padding: 0 20px;
+ }
+ .footer-stamp {
+   border: 1px solid var(--ov-border-faint);
+   padding: 15px;
+   text-align: center;
+   border-radius: 4px;
+   margin-bottom: 20px;
+ }
+ .footer-stamp small {
+   display: block;
+   font-size: 9px;
+   color: var(--ov-text-muted);
+   text-transform: uppercase;
+   letter-spacing: 1px;
+ }
+ .footer-stamp span {
+   display: block;
+   font-family: var(--font-typewriter);
+   color: var(--ov-gold);
+   font-size: 13px;
+   margin: 5px 0;
+ }
+ .lang-switch {
+   display: flex;
+   border: 1px solid var(--ov-border-light);
+   border-radius: 4px;
+   overflow: hidden;
+ }
+ .lang-switch button {
+   flex: 1;
+   background: transparent;
+   border: none;
+   color: var(--ov-text-muted);
+   padding: 8px 0;
+   font-size: 12px;
+   cursor: pointer;
+ }
+ .lang-switch button.active {
+   color: var(--ov-gold);
+   background: rgba(212, 175, 55, 0.05);
+ }
+ /* ====================
+    Main Content Area
+    ==================== */
+ #main-content {
+   margin-left: 240px;
+   padding: 40px 60px;
+   max-width: 1200px;
+ }
+ #objectverse-hero {
+   margin-bottom: 40px;
+   position: relative;
+ }
+ #objectverse-hero h1 {
+   font-family: var(--font-typewriter);
+   font-size: 42px;
+   color: var(--ov-text-main);
+   margin: 0 0 10px 0;
+   letter-spacing: -0.5px;
+ }
+ .hero-kicker {
+   font-size: 18px;
+   color: var(--ov-gold);
+   font-style: italic;
+   font-family: var(--font-serif);
+   margin: 0;
+ }
+ .hero-kicker span {
+   font-size: 14px;
+   font-style: normal;
+   color: #A89B84 !important;
+   font-family: var(--font-sans);
+ }
+ .hero-badges {
+   display: flex;
+   gap: 15px;
+   margin-top: 25px;
+ }
+ .hero-badges span {
+   border: 1px solid var(--ov-border-light);
+   padding: 6px 16px;
+   border-radius: 20px;
+   font-size: 13px;
+   color: var(--ov-text-muted);
+   font-family: var(--font-typewriter);
+   display: flex;
+   align-items: center;
+   gap: 6px;
+ }
+ .content-section {
+   margin-bottom: 30px;
+   gap: 30px !important;
+ }
+ .archive-panel {
+   background: var(--ov-bg-panel) !important;
+   border: 1px solid var(--ov-border-faint) !important;
+   border-radius: 8px;
+   padding: 25px;
+   position: relative;
+ }
+ /* Gradio Overrides */
+ .gradio-container .block,
+ .gradio-container .form,
+ .gradio-container .box {
+   background: transparent !important;
+   border: none !important;
+   box-shadow: none !important;
+ }
+ .gradio-container label, .gradio-container span.svelte-1gfknul {
+   color: var(--ov-text-muted) !important;
+   font-family: var(--font-typewriter);
+ }
+ .gradio-container input, .gradio-container textarea {
+   background: var(--ov-bg-input) !important;
+   border: 1px solid var(--ov-border-light) !important;
+   border-radius: 4px !important;
+   color: var(--ov-text-main) !important;
+   font-family: var(--font-sans) !important;
+ }
+ .gradio-container input:focus, .gradio-container textarea:focus {
+   border-color: var(--ov-gold) !important;
+   box-shadow: none !important;
+ }
+ /* Upload Box */
+ #object-upload {
+   border: 2px dashed var(--ov-border-light) !important;
+   background: transparent !important;
+   border-radius: 8px;
+   padding: 40px 20px;
+   text-align: center;
+   min-height: 180px;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+ }
+ .or-divider {
+   text-align: center;
+   position: relative;
+   margin: 20px 0;
+ }
+ .or-divider::before {
+   content: "";
+   position: absolute;
+   left: 0; right: 0; top: 50%;
+   height: 1px;
+   background: var(--ov-border-faint);
+   z-index: 1;
+ }
+ .or-divider span {
+   background: var(--ov-bg-panel);
+   padding: 0 15px;
+   position: relative;
+   z-index: 2;
+   color: var(--ov-text-muted);
+   font-family: var(--font-typewriter);
+   font-size: 14px;
+ }
+ /* Personality Mode Radio */
+ .mode-header {
+   font-family: var(--font-typewriter);
+   color: var(--ov-text-main);
+   margin-bottom: 15px;
+   display: flex;
+   align-items: center;
+   gap: 10px;
+ }
+ .mode-header small {
+   color: var(--ov-text-muted);
+   font-family: var(--font-sans);
+ }
+ #personality-mode .wrap {
+   display: flex !important;
+   gap: 10px !important;
+   flex-wrap: wrap !important;
+ }
+ #personality-mode label {
+   flex: 1;
+   background: transparent !important;
+   border: 1px solid var(--ov-border-light) !important;
+   border-radius: 6px !important;
+   padding: 15px 10px !important;
+   text-align: center;
+   cursor: pointer;
+   transition: all 0.2s;
+ }
+ #personality-mode label span {
+   display: block;
+   font-family: var(--font-typewriter);
+   color: var(--ov-text-main) !important;
+   font-size: 14px;
+ }
+ #personality-mode label:has(input:checked) {
+   border-color: var(--ov-gold) !important;
+   background: rgba(212, 175, 55, 0.05) !important;
+   box-shadow: 0 0 0 1px var(--ov-gold) inset;
+ }
+ #personality-mode label:has(input:checked) span {
+   color: var(--ov-gold-bright) !important;
+ }
+ /* Wake Button */
+ #wake-button {
+   background: linear-gradient(180deg, #d8ac54 0%, #a67c2d 100%) !important;
+   border: none !important;
+   border-radius: 4px !important;
+   color: var(--ov-text-dark) !important;
+   font-family: var(--font-typewriter);
+   font-size: 20px !important;
+   font-weight: bold;
+   padding: 20px !important;
+   margin-top: 25px;
+   box-shadow: inset 0 1px 1px rgba(255,255,255,0.3), 0 4px 15px rgba(0,0,0,0.5) !important;
+   text-shadow: 0 1px 0 rgba(255,255,255,0.2);
+   transition: all 0.2s;
+ }
+ #wake-button:hover {
+   filter: brightness(1.1);
+   transform: translateY(-1px);
+ }
+ /* How it works */
+ .how-it-works {
+   display: flex;
+   gap: 20px;
+   margin-top: 40px;
+   padding-top: 30px;
+   border-top: 1px dashed var(--ov-border-faint);
+ }
+ .step {
+   flex: 1;
+   position: relative;
+ }
+ .step-num {
+   position: absolute;
+   top: -10px; left: -10px;
+   background: var(--ov-bg);
+   border: 1px solid var(--ov-border-light);
+   color: var(--ov-gold);
+   font-family: var(--font-typewriter);
+   font-size: 12px;
+   padding: 2px 8px;
+ }
+ .step-text strong {
+   display: block;
+   color: var(--ov-text-main);
+   font-family: var(--font-typewriter);
+   font-size: 14px;
+   margin-top: 15px;
+ }
+ .step-text small {
+   display: block;
+   color: var(--ov-text-muted);
+   font-size: 12px;
+   margin-bottom: 8px;
+ }
+ .step-text p {
+   color: var(--ov-text-muted);
+   font-size: 13px;
+   line-height: 1.4;
+   margin: 0;
+ }
+ /* Example Objects Panel */
+ .example-header {
+   display: flex;
+   align-items: center;
+   gap: 15px;
+   margin-bottom: 20px;
+   border-bottom: 1px solid var(--ov-border-faint);
+   padding-bottom: 15px;
+ }
+ .example-header strong {
+   display: block;
+   font-family: var(--font-typewriter);
+   font-size: 16px;
+   font-weight: normal;
+ }
+ .example-header span {
+   color: var(--ov-text-muted);
+   font-size: 13px;
+ }
+ button.example-card {
+   background: rgba(22, 21, 19, 0.8) !important;
+   border: 1px solid var(--ov-border-faint) !important;
+   border-radius: 4px !important;
+   color: var(--ov-text-main) !important;
+   text-align: left !important;
+   padding: 15px !important;
+   margin-bottom: 12px !important;
+   font-family: var(--font-typewriter) !important;
+   display: block;
+   width: 100%;
+   transition: border-color 0.2s;
+ }
+ button.example-card:hover {
+   border-color: var(--ov-gold) !important;
+ }
+ .view-more {
+   display: block;
+   text-align: right;
+   color: var(--ov-gold);
+   text-decoration: none;
+   font-family: var(--font-typewriter);
+   font-size: 14px;
+   margin-top: 15px;
+ }
+ /* Other Panels Formatting */
+ .panel-header h2 {
+   font-family: var(--font-typewriter);
+   font-size: 24px;
+   color: var(--ov-text-main);
+   margin: 0 0 5px 0;
+ }
+ .panel-header {
+   border-bottom: 1px solid var(--ov-border-faint);
+   padding-bottom: 15px;
+   margin-bottom: 20px;
+ }
+ .panel-header > span {
+   background: transparent;
+   border: none;
+   color: var(--ov-gold) !important;
+   font-family: var(--font-typewriter);
+   font-size: 18px;
+   padding: 0;
+ }
+ /* Markdown & Typography */
+ #diary-output {
+   font-family: var(--font-serif) !important;
+   font-size: 18px;
+   line-height: 1.8;
+   color: #D6D1C4 !important;
+ }
+ #diary-output h3 {
+   font-family: var(--font-typewriter);
+   color: var(--ov-gold);
+   text-transform: uppercase;
+   font-size: 16px;
+ }
+ .archive-empty {
+   text-align: center;
+   padding: 40px;
+   border: 1px dashed var(--ov-border-light);
+ }
+ .archive-empty h3 {
+   font-family: var(--font-typewriter);
+ }
+ /* Responsive */
+ @media (max-width: 980px) {
+   #app-container {
+     flex-direction: column;
+   }
+   #sidebar {
+     position: static;
+     width: 100% !important;
+     max-width: 100% !important;
+     height: auto;
+     padding: 20px;
+     border-right: none;
+     border-bottom: 1px solid var(--ov-border-faint);
+   }
+   #main-content {
+     margin-left: 0;
+     padding: 20px;
+   }
+   .content-section {
+     flex-direction: column !important;
+   }
+   .split-section {
+     flex-direction: column !important;
+   }
+ }
+ @media (max-width: 600px) {
+   #main-content {
+     padding: 15px !important;
+   }
+   #objectverse-hero h1 {
+     font-size: 28px !important;
+     overflow-wrap: anywhere; /* replaces deprecated word-break: break-word */
+   }
+   .hero-kicker {
+     font-size: 15px !important;
+   }
+   #personality-mode label {
+     flex: 1 1 45% !important;
+     padding: 10px 5px !important;
+   }
+   .sidebar-menu {
+     display: flex;
+     flex-wrap: wrap;
+     gap: 5px;
+   }
+   .sidebar-menu li {
+     margin-bottom: 0;
+   }
+   .sidebar-menu a {
+     padding: 8px 10px;
+     font-size: 13px;
+     border-left: none;
+     border-bottom: 2px solid transparent;
+   }
+   .sidebar-menu li.active a {
+     border-bottom-color: var(--ov-gold);
+     border-left: none;
+     background: rgba(212, 175, 55, 0.1);
+   }
+   .lang-switch {
+     margin-top: 10px;
+   }
+   .how-it-works {
+     flex-direction: column;
+     gap: 20px;
+   }
+   .hero-badges {
+     flex-wrap: wrap;
+   }
+   .hero-badges span {
+     flex: 1 1 100%;
+     justify-content: center;
+   }
+ }

tests/test_dataset_tooling.py CHANGED Viewed

@@ -10,6 +10,7 @@ from pathlib import Path
 from scripts.export_traces import export_trace_jsonl
 from scripts.generate_dataset import build_sft_records, write_sft_jsonl
 from scripts.generate_sample_traces import generate_sample_traces
+from scripts.prepare_curated_dataset import build_curated_records, write_jsonl
 from src.models.schema import TraceRecord
@@ -36,6 +37,29 @@ class DatasetToolingTest(unittest.TestCase):
         self.assertEqual(len(rows), 3)
         self.assertEqual(rows[0]["id"], "sft-preview-0001")
+    def test_build_curated_records(self) -> None:
+        records = build_curated_records(10)
+        assistant_payload = json.loads(records[0]["messages"][2]["content"])
+        self.assertEqual(len(records), 10)
+        self.assertEqual(records[0]["split"], "train")
+        self.assertEqual(records[0]["source"], "objectverse-diary-synthetic-curated-v1")
+        self.assertIn("curation_notes", records[0])
+        self.assertIn("persona", assistant_payload)
+        self.assertIn("diary", assistant_payload)
+    def test_write_curated_jsonl(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            output_path = Path(tmp_dir) / "curated.jsonl"
+            write_jsonl(build_curated_records(2), output_path)
+            rows = [
+                json.loads(line)
+                for line in output_path.read_text(encoding="utf-8").splitlines()
+            ]
+        self.assertEqual(len(rows), 2)
+        self.assertEqual(rows[0]["id"], "curated-synthetic-0001")
     def test_export_trace_jsonl(self) -> None:
         with tempfile.TemporaryDirectory() as tmp_dir:
             sample_dir = Path(tmp_dir) / "samples"

tests/test_finetune_lora_tooling.py ADDED Viewed

@@ -0,0 +1,86 @@

+"""Tests for Modal LoRA fine-tuning scaffolding helpers."""
+from __future__ import annotations
+import json
+import tempfile
+import unittest
+from pathlib import Path
+from scripts import finetune_lora
+def _valid_record() -> dict[str, object]:
+    return {
+        "id": "sft-preview-0001",
+        "messages": [
+            {"role": "system", "content": "You are Objectverse Diary."},
+            {"role": "user", "content": "Create a persona."},
+            {"role": "assistant", "content": "{\"persona\": {}, \"diary\": {}}"},
+        ],
+    }
+class FinetuneLoraToolingTest(unittest.TestCase):
+    def test_load_sft_records_rejects_missing_messages(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            path = Path(tmp_dir) / "bad.jsonl"
+            path.write_text(json.dumps({"id": "bad"}) + "\n", encoding="utf-8")
+            with self.assertRaises(ValueError):
+                finetune_lora.load_sft_records(path)
+    def test_load_sft_records_rejects_malformed_messages(self) -> None:
+        bad_record = {"id": "bad", "messages": [{"role": "user"}]}
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            path = Path(tmp_dir) / "bad.jsonl"
+            path.write_text(json.dumps(bad_record) + "\n", encoding="utf-8")
+            with self.assertRaises(ValueError):
+                finetune_lora.load_sft_records(path)
+    def test_record_to_training_text_is_non_empty(self) -> None:
+        text = finetune_lora.record_to_training_text(_valid_record())
+        self.assertIn("system:", text)
+        self.assertIn("user:", text)
+        self.assertIn("assistant:", text)
+        self.assertIn("Objectverse Diary", text)
+    def test_default_training_config_uses_safe_qwen_lora_defaults(self) -> None:
+        config = finetune_lora.TrainingConfig()
+        self.assertEqual(config.base_model, "Qwen/Qwen2.5-1.5B-Instruct")
+        self.assertEqual(config.lora_r, 16)
+        self.assertEqual(config.lora_alpha, 32)
+        self.assertEqual(config.lora_dropout, 0.05)
+        self.assertEqual(config.max_steps, 80)
+        self.assertIn("q_proj", config.target_modules)
+        self.assertIn("down_proj", config.target_modules)
+    def test_dry_run_does_not_call_remote_runner(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            path = Path(tmp_dir) / "records.jsonl"
+            path.write_text(json.dumps(_valid_record()) + "\n", encoding="utf-8")
+            def fail_remote_call(
+                records: list[dict[str, object]],
+                config: finetune_lora.TrainingConfig,
+            ) -> dict[str, object]:
+                raise AssertionError("dry-run should not call remote training")
+            summary = finetune_lora.run_training_entrypoint(
+                dataset=path,
+                config=finetune_lora.TrainingConfig(),
+                dry_run=True,
+                allow_remote=False,
+                remote_runner=fail_remote_call,
+            )
+        self.assertEqual(summary["mode"], "dry-run")
+        self.assertEqual(summary["record_count"], 1)
+        self.assertEqual(summary["base_model"], "Qwen/Qwen2.5-1.5B-Instruct")
+if __name__ == "__main__":
+    unittest.main()
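
The tests above pin down the behaviour expected from `load_sft_records` and `record_to_training_text`, but the helpers themselves live in `scripts/finetune_lora.py`, which is not shown in this section. As a rough illustration only, the sketch below is one way such helpers could satisfy those assertions. The `REQUIRED_KEYS` constant, the error messages, and the `role: content` line format are assumptions, not the committed implementation.

```python
import json
import tempfile
from pathlib import Path

# Assumed message schema: every chat message needs both keys.
REQUIRED_KEYS = ("role", "content")


def load_sft_records(path: Path) -> list[dict[str, object]]:
    """Read a JSONL file and reject records without well-formed messages."""
    records: list[dict[str, object]] = []
    lines = path.read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"line {line_no}: missing 'messages' list")
        for message in messages:
            if not isinstance(message, dict) or any(
                key not in message for key in REQUIRED_KEYS
            ):
                raise ValueError(f"line {line_no}: malformed message {message!r}")
        records.append(record)
    return records


def record_to_training_text(record: dict[str, object]) -> str:
    """Flatten chat messages into one 'role: content' line per message."""
    return "\n".join(
        f"{message['role']}: {message['content']}" for message in record["messages"]
    )


if __name__ == "__main__":
    sample = {
        "id": "sft-preview-0001",
        "messages": [
            {"role": "system", "content": "You are Objectverse Diary."},
            {"role": "user", "content": "Create a persona."},
        ],
    }
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = Path(tmp_dir) / "records.jsonl"
        sample_path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
        print(record_to_training_text(load_sft_records(sample_path)[0]))
```

Failing fast with `ValueError` at load time keeps malformed rows from reaching a paid remote training run, which appears to be the intent behind the two rejection tests.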

tests/test_publish_hf_adapter.py ADDED Viewed

@@ -0,0 +1,40 @@

+"""Tests for Hugging Face adapter publishing helpers."""
+from __future__ import annotations
+import tempfile
+import unittest
+from pathlib import Path
+from scripts.publish_hf_adapter import upload_adapter, validate_adapter_dir
+class PublishHfAdapterTest(unittest.TestCase):
+    def test_validate_adapter_dir_requires_config_and_weights(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            adapter_dir = Path(tmp_dir)
+            with self.assertRaises(ValueError):
+                validate_adapter_dir(adapter_dir)
+    def test_upload_adapter_dry_run_does_not_require_hub_client(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            adapter_dir = Path(tmp_dir)
+            (adapter_dir / "adapter_config.json").write_text("{}", encoding="utf-8")
+            (adapter_dir / "adapter_model.safetensors").write_text("fake", encoding="utf-8")
+            summary = upload_adapter(
+                adapter_dir=adapter_dir,
+                repo_id="qqyule/objectverse-diary-lora-test",
+                private=False,
+                commit_message="Dry run",
+                dry_run=True,
+            )
+        self.assertFalse(summary["uploaded"])
+        self.assertEqual(summary["repo_id"], "qqyule/objectverse-diary-lora-test")
+        self.assertIn("adapter_config.json", summary["files"])
+if __name__ == "__main__":
+    unittest.main()
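
The publishing tests above describe a validate-then-upload flow with a dry-run switch. The real helpers are in `scripts/publish_hf_adapter.py`, outside this section, so what follows is only a minimal sketch of how such a flow might look. The accepted weight file names (`WEIGHT_NAMES`) and the summary layout are assumptions inferred from the tests. The non-dry-run branch uses `HfApi.create_repo` and `HfApi.upload_folder` from `huggingface_hub`, imported lazily so a dry run works without the client installed.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "adapter_config.json"
# Assumed accepted weight files; the test only exercises the .safetensors name.
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def validate_adapter_dir(adapter_dir: Path) -> list[str]:
    """Return the sorted file names, or raise if config or weights are missing."""
    if not (adapter_dir / CONFIG_NAME).is_file():
        raise ValueError(f"{adapter_dir} has no {CONFIG_NAME}")
    if not any((adapter_dir / name).is_file() for name in WEIGHT_NAMES):
        raise ValueError(f"{adapter_dir} has no adapter weights")
    return sorted(p.name for p in adapter_dir.iterdir() if p.is_file())


def upload_adapter(
    adapter_dir: Path,
    repo_id: str,
    private: bool,
    commit_message: str,
    dry_run: bool,
) -> dict[str, object]:
    """Validate the adapter folder, then upload it unless dry_run is set."""
    files = validate_adapter_dir(adapter_dir)
    summary: dict[str, object] = {
        "repo_id": repo_id,
        "files": files,
        "uploaded": False,
    }
    if dry_run:
        return summary  # nothing leaves the machine

    # Lazy import: only needed when an upload is actually requested.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, private=private, exist_ok=True)
    api.upload_folder(
        folder_path=str(adapter_dir),
        repo_id=repo_id,
        commit_message=commit_message,
    )
    summary["uploaded"] = True
    return summary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_dir = Path(tmp_dir)
        (demo_dir / CONFIG_NAME).write_text("{}", encoding="utf-8")
        (demo_dir / WEIGHT_NAMES[0]).write_text("fake", encoding="utf-8")
        print(upload_adapter(demo_dir, "user/demo-adapter", False, "Dry run", True))
```

Validating before the dry-run check means a dry run still catches an incomplete adapter folder, which is the main thing worth confirming before a real upload.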