# Building Objectverse Diary: A Small-Model AI Toy Where Everyday Objects Come Alive
## Status
Publication-ready draft. Fill in the public GitHub, demo video, and social post URLs before posting; do not publish until those external actions are explicitly confirmed.
## 1. Why I Built It
Objectverse Diary began with a small, silly question: what if the objects around us were quietly keeping emotional records of our lives?
The product loop is intentionally simple. A user uploads a photo of an everyday object, chooses a personality mode, and the app turns the object into a hidden character. The object gets a structured file, a secret diary entry, a voice for short chats, and a shareable card.
The joke only works if the app treats ordinary objects with strange seriousness. A coffee mug is not just a mug; it is a tired witness. A keyboard is not just a keyboard; it is a percussion instrument for anxious deadlines. The app is a tiny archive for that kind of imagined life.
## 2. Why This Fits The Track
Objectverse Diary was built for the Build Small Hackathon track "An Adventure in Thousand Token Wood." The core experience is AI-native:
- vision understanding turns a photo into structured object facts
- persona generation invents the object's hidden self
- diary generation writes in a consistent first-person voice
- chat lets the object keep that voice across replies
- trace logging makes each generation inspectable and reproducible
It is not a productivity wrapper. It is a compact AI toy with a specific emotional shape.
## 3. Why Small Models Are Enough
This project does not need a frontier model to be interesting. It needs:
- useful object recognition
- compact structured JSON output
- a distinctive writing style
- consistent persona fields
- reliable fallback behavior
- a UI that makes the output feel intentional
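One way to make "compact structured JSON output" and "consistent persona fields" concrete is to parse every model reply and reject it unless the expected keys are present, so that a rejected reply can trigger the fallback path. This is a minimal sketch. The field names `name`, `mood`, `secret`, and `voice` are hypothetical and are not taken from the project's schema.

```python
import json
from typing import Optional

# Hypothetical persona fields; the project's real schema may use different keys.
REQUIRED_PERSONA_FIELDS = ("name", "mood", "secret", "voice")


def parse_persona(raw: str) -> Optional[dict]:
    """Return the persona dict if `raw` is a JSON object with every required field, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form prose instead of JSON
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object (e.g. a list)
    # Treat absent keys and empty values alike as missing.
    if any(not data.get(key) for key in REQUIRED_PERSONA_FIELDS):
        return None
    return data


good = '{"name": "Mug", "mood": "tired", "secret": "hates Mondays", "voice": "dry"}'
print(parse_persona(good))                # the parsed dict
print(parse_persona('{"name": "Mug"}'))   # None: missing fields
print(parse_persona("Once upon a time"))  # None: not JSON
```

Returning `None` instead of raising keeps the caller's decision simple. Anything that is not a complete persona goes to the mock path.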
The architecture is designed around a total parameter budget of at most 32B. MiniCPM-V 2.6 is wired in as the optional vision path, and llama.cpp is wired in as the optional local text runtime. The stable public baseline still defaults to deterministic mock generation, so the demo stays reproducible without commercial model APIs.
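As a rough budget check, the two model paths fit well under the 32B limit. The MiniCPM-V 2.6 figure is its published size of about 8B parameters. The text-model figure is an assumption inferred from the `qwen15b` name of the published LoRA (a ~1.5B base model). The LoRA adapter's own parameters are ignored here because they are comparatively small.

```python
# Approximate parameter counts, in billions.
BUDGET_B = 32.0
components_b = {
    "vision: MiniCPM-V 2.6 (published as ~8B)": 8.0,
    "text: ~1.5B base + LoRA (ASSUMED from the 'qwen15b' name)": 1.5,
}

total_b = sum(components_b.values())
headroom_b = BUDGET_B - total_b
print(f"total ~{total_b:.1f}B, headroom ~{headroom_b:.1f}B of {BUDGET_B:.0f}B")
# total ~9.5B, headroom ~22.5B of 32B
assert total_b <= BUDGET_B, "model stack exceeds the parameter budget"
```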
## 4. Product Design
The interface is English-first and Chinese-second. The visual direction is a strange object archive: warm dark paper, amber highlights, museum-label copy, and typewriter-like diary output.
The product avoids a generic chatbot layout. The main flow is closer to opening an object file:
1. intake the object
2. generate an object record
3. reveal the persona
4. read the diary
5. chat with the object
6. export or inspect the trace
Six stable examples are included so the demo can run even when hosted model resources are unavailable.
## 5. Architecture
The app keeps the Gradio UI separate from model execution:
- `src/ui/layout.py` builds the Gradio Blocks interface
- `src/pipeline.py` coordinates generation
- `src/models/vision_runner.py` handles mock or MiniCPM-V object understanding
- `src/models/llama_cpp_runner.py` handles mock text or optional llama.cpp text generation
- `src/traces/logger.py` writes anonymized trace records
- `src/renderer/share_card.py` renders the shareable card preview
This boundary matters. It lets the mock MVP, hosted Space validation, diagnostics, and local GGUF experiments share the same data shapes and fallback markers.
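That boundary can be sketched with small runner interfaces. The pipeline depends only on a shared result shape, so mock and real backends are interchangeable and their fallback markers flow into one trace. This is an illustration, not the code in `src/pipeline.py`. The class, method, and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class GenerationResult:
    """Shared shape returned by every backend (field names are hypothetical)."""
    payload: dict
    markers: List[str] = field(default_factory=list)  # e.g. fallback markers for the trace


class VisionRunner(Protocol):
    def describe(self, image_path: str) -> GenerationResult: ...


class TextRunner(Protocol):
    def write_diary(self, object_facts: dict, mode: str) -> GenerationResult: ...


class MockVision:
    """Deterministic stand-in: always 'sees' the same mug."""
    def describe(self, image_path: str) -> GenerationResult:
        return GenerationResult({"object": "mug", "material": "ceramic"})


class MockText:
    """Deterministic stand-in for the diary writer."""
    def write_diary(self, object_facts: dict, mode: str) -> GenerationResult:
        entry = f"Dear diary, I am a {object_facts['object']} and today felt {mode}."
        return GenerationResult({"diary": entry})


def run_pipeline(image_path: str, mode: str, vision: VisionRunner, text: TextRunner) -> dict:
    """Run vision, then text, and merge both results into one record for the UI and trace."""
    facts = vision.describe(image_path)
    diary = text.write_diary(facts.payload, mode)
    return {
        "object_facts": facts.payload,
        "diary": diary.payload["diary"],
        "trace_markers": facts.markers + diary.markers,
    }


record = run_pipeline("mug.jpg", "dramatic", MockVision(), MockText())
print(record["diary"])  # Dear diary, I am a mug and today felt dramatic.
```

Swapping `MockVision` for a MiniCPM-V runner would not change `run_pipeline` at all, which is the point of keeping the UI and model execution apart.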
## 6. Runtime And Fallbacks
The stable baseline uses:
```bash
OBJECTVERSE_VISION_BACKEND=mock
OBJECTVERSE_TEXT_BACKEND=mock
```
Optional MiniCPM-V vision can be enabled with:
```bash
OBJECTVERSE_VISION_BACKEND=minicpm-v
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
OBJECTVERSE_TEXT_BACKEND=mock
```
Optional llama.cpp text generation can be enabled with:
```bash
OBJECTVERSE_TEXT_BACKEND=llama-cpp
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf
```
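Here is a minimal sketch of how these variables might be resolved, assuming that unset or unrecognized values fall back to the mock-safe baseline. The variable names come from the snippets above. The fallback-to-mock rule for unknown values and the default `VISION_MODEL_ID` are assumptions.

```python
import os
from typing import Mapping, Optional

VISION_BACKENDS = {"mock", "minicpm-v"}
TEXT_BACKENDS = {"mock", "llama-cpp"}


def read_runtime_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve backends from environment variables, defaulting to the mock-safe baseline."""
    env = os.environ if env is None else env
    vision = env.get("OBJECTVERSE_VISION_BACKEND", "mock").strip().lower()
    text = env.get("OBJECTVERSE_TEXT_BACKEND", "mock").strip().lower()
    return {
        # Unknown values fall back to mock (an assumed policy, not documented behavior).
        "vision_backend": vision if vision in VISION_BACKENDS else "mock",
        "vision_model_id": env.get("VISION_MODEL_ID", "openbmb/MiniCPM-V-2_6"),  # assumed default
        "text_backend": text if text in TEXT_BACKENDS else "mock",
        "text_model_path": env.get("TEXT_MODEL_PATH"),  # None when unset
    }


print(read_runtime_config({}))
print(read_runtime_config({
    "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
    "TEXT_MODEL_PATH": "/absolute/path/to/text-model.gguf",
}))
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the function easy to test.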
The fallback behavior is explicit. If MiniCPM-V fails or returns invalid JSON, the trace records `vision-fallback-to-mock`. If llama.cpp is unavailable, the model path is missing, or the model returns invalid JSON, the trace records `text-fallback-to-mock`.
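The fallback rule can be expressed as one wrapper. It tries the real backend, and if that raises or its output does not parse as a JSON object, it uses the deterministic mock and appends the marker to the trace. The wrapper name, the trace-entry layout, and the `reason` field are hypothetical; only the marker strings come from the project.

```python
import json
from typing import Callable, List


def generate_with_fallback(
    real: Callable[[], str],
    mock: Callable[[], dict],
    marker: str,
    trace: List[dict],
) -> dict:
    """Try the real backend; on any error or non-object JSON, record `marker` and use the mock."""
    try:
        result = json.loads(real())
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
        return result
    # Deliberately broad: import failures, a missing model file, and bad JSON
    # should all degrade to the mock instead of crashing the demo.
    except Exception as exc:
        trace.append({"event": marker, "reason": type(exc).__name__})
        return mock()


def chatty_vision() -> str:
    return "I think this is a mug?"  # prose, not JSON


trace: List[dict] = []
facts = generate_with_fallback(chatty_vision, lambda: {"object": "mug"}, "vision-fallback-to-mock", trace)
print(facts)  # {'object': 'mug'}
print(trace)  # [{'event': 'vision-fallback-to-mock', 'reason': 'JSONDecodeError'}]
```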
The hosted Space also has a hidden `/vision_runtime_probe` endpoint for non-secret runtime diagnostics. It checks Torch and Transformers imports, GPU visibility, and whether MiniCPM-V can load, while redacting token markers and private paths.
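Below is a sketch of the non-secret part of such a probe. It assumes that Hugging Face tokens (which start with `hf_`) and home-directory paths are what need redacting. It checks whether `torch` and `transformers` are importable and whether CUDA is visible. The real endpoint also tries to load MiniCPM-V, which is omitted here. The function names and the exact redaction patterns are assumptions.

```python
import importlib.util
import os
import re


def redact(text: str) -> str:
    """Mask token-like strings and home-directory prefixes before returning diagnostics."""
    text = re.sub(r"hf_[A-Za-z0-9]{8,}", "hf_***", text)
    return re.sub(r"/(?:home|Users)/[^/\s]+", "/<redacted>", text)


def runtime_probe() -> dict:
    """Report import availability and GPU visibility without loading any model."""
    report = {
        "torch_importable": importlib.util.find_spec("torch") is not None,
        "transformers_importable": importlib.util.find_spec("transformers") is not None,
        "cuda_visible": False,
    }
    if report["torch_importable"]:
        try:
            import torch
            report["cuda_visible"] = bool(torch.cuda.is_available())
        except Exception as exc:  # broken install: report the error type, never the traceback
            report["torch_error"] = type(exc).__name__
    report["cwd"] = redact(os.getcwd())
    return report


print(redact("token=hf_abcdEFGH1234 at /home/alice/models/x.gguf"))
# token=hf_*** at /<redacted>/models/x.gguf
```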
## 7. What Worked
The stable loop works locally and in the mock-safe Space:
- upload or choose an example object
- generate object facts, persona, diary, chat state, share card, and trace JSON
- replay six committed sample traces
- export public mock traces to JSONL
- run local unittest and initial-stage checks
The Gradio UI also moves away from the default demo feel. It is still Gradio, but the experience reads like a small archive interface.
## 8. What Failed, Then Got Fixed
The important deployment failure was hosted MiniCPM-V validation.
Requesting paid L4 hardware under the hackathon organization returned `402 Payment Required`. ZeroGPU CUDA probing later succeeded, and the full validation command reached the hosted Space on June 8, 2026. The first probe-aware run exposed the real blocker: `openbmb/MiniCPM-V-2_6` is gated, and the Space runtime did not yet have access.
After adding an `HF_TOKEN` Space secret with the required model access, the same ZeroGPU validation passed for public mug, keyboard, and shoe images. The evidence is saved in:
- `docs/SPACE_VLM_REPORT.md`
- `docs/SPACE_VLM_REPORT.json`
- `data/traces/space-vlm/`
This is not hidden in the submission. The stable baseline keeps the public demo mock-safe by default, but hosted MiniCPM-V evidence now exists for the vision path.
The probe-aware validation path remains useful because it can report whether future failures happen at dependency import, GPU visibility, model loading, or generation time.
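The value of a probe-aware run is that it names the first stage that failed instead of reporting a generic error. Here is a minimal sketch, with stage names taken from the sentence above and the check functions stubbed out as hypothetical placeholders:

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Stage(Enum):
    IMPORT = "dependency-import"
    GPU = "gpu-visibility"
    LOAD = "model-load"
    GENERATE = "generation"


def first_failing_stage(checks: Dict[Stage, Callable[[], bool]]) -> Optional[Stage]:
    """Run checks in pipeline order and return the first stage that fails, or None if all pass."""
    for stage in Stage:  # iteration follows definition order
        try:
            ok = checks[stage]()
        except Exception:
            ok = False  # a crashing (or missing) check counts as a failure at that stage
        if not ok:
            return stage
    return None


# Stubbed checks mirroring the gated-model case: imports and GPU are fine, loading is not.
checks = {
    Stage.IMPORT: lambda: True,
    Stage.GPU: lambda: True,
    Stage.LOAD: lambda: False,
    Stage.GENERATE: lambda: True,
}
print(first_failing_stage(checks))  # Stage.LOAD
```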
## 9. Traces And Reproducibility
The project includes public mock traces for the six stable examples under `data/traces/samples/`. They are deterministic and intended for demo replay, schema validation, and public inspection.
The Space VLM traces under `data/traces/space-vlm/` are different: they are hosted validation evidence for real MiniCPM-V object understanding plus mock text generation. They should be described honestly as VLM evidence, not full real text-runtime traces.
The export command is:
```bash
.venv/bin/python -B scripts/export_traces.py
```
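The contents of `scripts/export_traces.py` are not shown here, so this is only a sketch of the general idea, assuming one JSON file per trace: read each file in a stable order and write it as one JSON line. Sorted input and `sort_keys=True` keep the output byte-for-byte reproducible across runs.

```python
import json
import tempfile
from pathlib import Path


def export_traces_jsonl(trace_dir: Path, out_path: Path) -> int:
    """Write every *.json trace in `trace_dir` as one JSONL line; return the record count."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in sorted(trace_dir.glob("*.json")):  # sorted for a stable line order
            record = json.loads(path.read_text(encoding="utf-8"))
            out.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")
            count += 1
    return count


# Demo on two throwaway traces (the file layout here is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    samples = Path(tmp) / "samples"
    samples.mkdir()
    for name in ("mug", "keyboard"):
        (samples / f"{name}.json").write_text(json.dumps({"object": name, "backend": "mock"}))
    out_file = Path(tmp) / "traces.jsonl"
    exported = export_traces_jsonl(samples, out_file)
    lines = out_file.read_text(encoding="utf-8").splitlines()

print(exported)  # 2
print(lines[0])  # {"backend": "mock", "object": "keyboard"}
```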
For text runtime evidence, the project now includes a local smoke helper for an external GGUF:
```bash
.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
--model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf
```
The GGUF used for the local smoke test, `objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf`, is published in `qqyule/objectverse-diary-qwen15b-lora` and is intentionally not committed to the repository. The local smoke test passed on June 8, 2026; the Space text runtime still needs separate validation before it can be described as live.
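Before a smoke run, a cheap precheck can catch the "missing model path" class of failures without loading anything. GGUF files begin with the 4-byte magic `GGUF`, followed by a little-endian `uint32` format version. The sketch below checks only that header and is not a substitute for `scripts/check_llama_cpp_smoke.py`, whose contents are not shown here. The function name and return strings are hypothetical.

```python
import struct
import tempfile
from pathlib import Path


def gguf_precheck(model_path: str) -> str:
    """Return 'missing', 'not-gguf', or 'gguf-v<N>' based only on the file header."""
    path = Path(model_path)
    if not path.is_file():
        return "missing"
    with path.open("rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return "not-gguf"
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32 after the magic
    return f"gguf-v{version}"


with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "fake.gguf"
    fake.write_bytes(b"GGUF" + struct.pack("<I", 3) + b"\x00" * 16)
    print(gguf_precheck(str(fake)))  # gguf-v3
print(gguf_precheck("/no/such/model.gguf"))  # missing
```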
## 10. Privacy And Safety
The project does not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. It does not commit GGUF files, private images, tokens, credit codes, or `.env` files.
Trace logging anonymizes text inputs before public export. The current public traces are synthetic mock examples rather than private user photos.
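The anonymization method is not specified above, so this is only one plausible approach. It replaces free-text fields with a truncated keyed SHA-256 digest (HMAC) and a character count. The key matters because unkeyed hashes of short messages can be reversed by guessing common inputs. The field name `user_message` and the output layout are hypothetical.

```python
import hashlib
import hmac


def anonymize_text(text: str, key: bytes) -> dict:
    """Replace free text with a truncated keyed digest and its length."""
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"text_hmac": digest[:16], "text_chars": len(text)}


def anonymize_trace(record: dict, key: bytes, text_fields=("user_message",)) -> dict:
    """Return a copy of `record` with the listed free-text fields anonymized."""
    clean = dict(record)  # shallow copy; the original record is left untouched
    for name in text_fields:
        if isinstance(clean.get(name), str):
            clean[name] = anonymize_text(clean[name], key)
    return clean


record = {"object": "mug", "user_message": "you have seen all my 3am deadlines"}
print(anonymize_trace(record, key=b"per-deployment-secret"))
```

Identical inputs under the same key produce identical digests, so repeated messages stay comparable across traces without being readable.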
## 11. What I Would Improve Next
The next model-focused step is to validate the published GGUF in the hosted Space runtime, or keep it as local llama.cpp evidence while the public demo remains mock-safe.
After that:
- download or mount the published GGUF in the target runtime
- set `OBJECTVERSE_TEXT_BACKEND=llama-cpp` and `TEXT_MODEL_PATH` for that runtime
- generate real non-mock traces if hosted/local model validation passes
- record a final demo video from the stable Space
The current version is intentionally honest: it is a stable, reproducible small-model toy baseline with clear boundaries, visible failures, and a path to stronger model evidence.
## Evidence Links To Fill Before Final Submission
- Hugging Face Space: https://huggingface.co/spaces/build-small-hackathon/ObjectverseDiary
- Dataset: https://huggingface.co/datasets/qqyule/objectverse-diary-sft-curated
- LoRA adapter: https://huggingface.co/qqyule/objectverse-diary-qwen15b-lora
- GitHub repository: pending push confirmation
- Demo video: pending recording
- Social post: pending publishing