Building Objectverse Diary: A Small-Model AI Toy Where Everyday Objects Come Alive

Status

Publication-ready draft. Fill in the public GitHub repository, demo video, and social post URLs before posting; do not publish until those external actions are explicitly confirmed.

1. Why I Built It

Objectverse Diary began with a small, silly question: what if the objects around us were quietly keeping emotional records of our lives?

The product loop is intentionally simple. A user uploads a photo of an everyday object, chooses a personality mode, and the app turns the object into a hidden character. The object gets a structured file, a secret diary entry, a short chat voice, and a shareable card.

The joke only works if the app treats ordinary objects with strange seriousness. A coffee mug is not just a mug; it is a tired witness. A keyboard is not just a keyboard; it is a percussion instrument for anxious deadlines. The app is a tiny archive for that kind of imagined life.

2. Why This Fits The Track

Objectverse Diary was built for the Build Small Hackathon track "An Adventure in Thousand Token Wood." The core experience is AI-native:

  • vision understanding turns a photo into structured object facts
  • persona generation invents the object's hidden self
  • diary generation writes in a consistent first-person voice
  • chat lets the object keep that voice across replies
  • trace logging makes each generation inspectable and reproducible

It is not a productivity wrapper. It is a compact AI toy with a specific emotional shape.

3. Why Small Models Are Enough

This project does not need a frontier model to be interesting. It needs:

  • useful object recognition
  • compact structured JSON output
  • a distinctive writing style
  • consistent persona fields
  • reliable fallback behavior
  • a UI that makes the output feel intentional

The architecture is designed around a total parameter budget of at most 32B. MiniCPM-V 2.6 is wired as the optional vision path, and llama.cpp is wired as the optional local text runtime. The stable public baseline still defaults to deterministic mock generation so the demo stays reproducible without commercial model APIs.

4. Product Design

The interface is English-first and Chinese-second. The visual direction is a strange object archive: warm dark paper, amber highlights, museum-label copy, and typewriter-like diary output.

The product avoids a generic chatbot layout. The main flow is closer to opening an object file:

  1. intake the object
  2. generate an object record
  3. reveal the persona
  4. read the diary
  5. chat with the object
  6. export or inspect the trace

Six stable examples are included so the demo can run even when hosted model resources are unavailable.

5. Architecture

The app keeps the Gradio UI separate from model execution:

  • src/ui/layout.py builds the Gradio Blocks interface
  • src/pipeline.py coordinates generation
  • src/models/vision_runner.py handles mock or MiniCPM-V object understanding
  • src/models/llama_cpp_runner.py handles mock text or optional llama.cpp text generation
  • src/traces/logger.py writes anonymized trace records
  • src/renderer/share_card.py renders the shareable card preview

This boundary matters. It lets the mock MVP, hosted Space validation, diagnostics, and local GGUF experiments share the same data shapes and fallback markers.
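A minimal sketch of what that boundary can look like, assuming runners are plain callables that the pipeline receives as arguments. The names `ObjectRecord`, `run_pipeline`, `mock_vision`, and `mock_text` are hypothetical and are not taken from the repository; only the idea of shared data shapes comes from the notes above.

```python
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class ObjectRecord:
    """Hypothetical shared shape; the field names are illustrative only."""
    label: str
    persona: str
    diary: str
    vision_backend: str
    text_backend: str


# A runner is any callable with the right signature, mock or real.
VisionRunner = Callable[[str], dict]
TextRunner = Callable[[dict, str], dict]


def mock_vision(image_name: str) -> dict:
    # Deterministic: the same file name always yields the same facts.
    return {"label": image_name.rsplit(".", 1)[0], "backend": "mock"}


def mock_text(facts: dict, mode: str) -> dict:
    label = facts["label"]
    return {
        "persona": f"{mode} {label}",
        "diary": f"Dear diary, I am a {label} and I have seen things.",
        "backend": "mock",
    }


def run_pipeline(
    image_name: str,
    mode: str,
    vision: VisionRunner = mock_vision,
    text: TextRunner = mock_text,
) -> ObjectRecord:
    """Coordinate generation without importing any UI code."""
    facts = vision(image_name)
    written = text(facts, mode)
    return ObjectRecord(
        label=facts["label"],
        persona=written["persona"],
        diary=written["diary"],
        vision_backend=facts["backend"],
        text_backend=written["backend"],
    )


record = run_pipeline("mug.jpg", "melancholic")
print(asdict(record))
```

Because the UI only ever sees an `ObjectRecord`, swapping a mock runner for a real one does not change what the interface has to render.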

6. Runtime And Fallbacks

The stable baseline uses:

OBJECTVERSE_VISION_BACKEND=mock
OBJECTVERSE_TEXT_BACKEND=mock

Optional MiniCPM-V vision can be enabled with:

OBJECTVERSE_VISION_BACKEND=minicpm-v
VISION_MODEL_ID=openbmb/MiniCPM-V-2_6
OBJECTVERSE_TEXT_BACKEND=mock

Optional llama.cpp text generation can be enabled with:

OBJECTVERSE_TEXT_BACKEND=llama-cpp
TEXT_MODEL_PATH=/absolute/path/to/text-model.gguf

The fallback behavior is explicit. If MiniCPM-V fails or returns invalid JSON, the trace records vision-fallback-to-mock. If llama.cpp is unavailable, has no model path configured, or returns invalid JSON, the trace records text-fallback-to-mock.
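The text-side rule can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `generate_text`, `mock_text`, and the `real_generate` callable are hypothetical stand-ins, and only the environment variable names and the `text-fallback-to-mock` marker come from the notes above.

```python
import json
from typing import Callable, Mapping, Optional

FALLBACK_MARKER = "text-fallback-to-mock"


def mock_text() -> dict:
    # Deterministic placeholder output (hypothetical content).
    return {"diary": "Nothing happened. I held coffee."}


def generate_text(
    env: Mapping[str, str],
    real_generate: Optional[Callable[[str], str]] = None,
):
    """Return (payload, markers). Markers are empty unless a fallback occurred."""
    if env.get("OBJECTVERSE_TEXT_BACKEND", "mock") != "llama-cpp":
        # Mock was requested, so this is not a fallback.
        return mock_text(), []

    model_path = env.get("TEXT_MODEL_PATH")
    if real_generate is None or not model_path:
        # Runtime unavailable or no model path configured.
        return mock_text(), [FALLBACK_MARKER]

    try:
        payload = json.loads(real_generate(model_path))
    except Exception:
        # Any runtime error or unparsable output falls back to mock.
        return mock_text(), [FALLBACK_MARKER]

    if not isinstance(payload, dict):
        # Valid JSON, but not the object shape the pipeline expects.
        return mock_text(), [FALLBACK_MARKER]
    return payload, []


llama_env = {
    "OBJECTVERSE_TEXT_BACKEND": "llama-cpp",
    "TEXT_MODEL_PATH": "/absolute/path/to/text-model.gguf",
}
print(generate_text(llama_env, lambda path: "not json"))
```

The point of returning the marker alongside the payload is that the caller can always render something while the trace still shows that the real backend did not produce it.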

The hosted Space also has a hidden /vision_runtime_probe endpoint for non-secret runtime diagnostics. It checks Torch and Transformers imports, GPU visibility, and whether MiniCPM-V can load, while redacting token markers and private paths.
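As a rough illustration of the import check and the redaction step, here is a small standard-library sketch. The function names and the two regular expressions are assumptions made for this example; the actual probe may check and redact differently.

```python
import importlib.util
import re

# Assumed patterns: Hugging Face tokens start with "hf_", and private
# paths are approximated here as anything under a user home directory.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]+")
HOME_PATTERN = re.compile(r"/(?:home|Users)/[^/\s]+")


def redact(text: str) -> str:
    """Mask token-like strings and home-directory prefixes."""
    text = TOKEN_PATTERN.sub("[redacted-token]", text)
    return HOME_PATTERN.sub("/[redacted-home]", text)


def probe_imports(modules=("torch", "transformers")) -> dict:
    """Report whether each top-level module is installed, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(probe_imports())
print(redact("load failed for hf_abcDEF123 at /home/alice/models/x.gguf"))
```

Using `find_spec` keeps the check cheap: it only locates the package, so a missing dependency is reported without paying the cost of importing a large framework.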

7. What Worked

The stable loop works locally and in the mock-safe Space:

  • upload or choose an example object
  • generate object facts, persona, diary, chat state, share card, and trace JSON
  • replay six committed sample traces
  • export public mock traces to JSONL
  • run local unittest and initial-stage checks

The Gradio UI also moves away from the default demo feel. It is still Gradio, but the experience reads like a small archive interface.

8. What Failed, Then Got Fixed

The important deployment failure was hosted MiniCPM-V validation.

A request for paid L4 hardware under the hackathon organization returned 402 Payment Required. ZeroGPU CUDA probing later succeeded, and the full validation command reached the hosted Space on June 8, 2026. The first probe-aware run showed the real blocker: openbmb/MiniCPM-V-2_6 is gated, and the Space runtime did not yet have access.

After adding an HF_TOKEN Space secret with the required model access, the same ZeroGPU validation passed for public mug, keyboard, and shoe images. The evidence is saved in:

  • docs/SPACE_VLM_REPORT.md
  • docs/SPACE_VLM_REPORT.json
  • data/traces/space-vlm/

The failure and its fix are not hidden in the submission. The stable baseline keeps the public demo mock-safe by default, but hosted MiniCPM-V evidence now exists for the vision path.

The probe-aware validation path remains useful because it can report whether future failures happen at dependency import, GPU visibility, model loading, or generation time.
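One way to get that kind of stage-level report is to run named checks in order and stop at the first failure. The sketch below assumes this structure; `run_stages` and the stub checks are hypothetical, and the gated-model failure is simulated with a `PermissionError` rather than a real model load.

```python
from typing import Callable, Sequence, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_stages(stages: Sequence[Stage]) -> dict:
    """Run checks in order and report the first stage that raises."""
    passed = []
    for name, check in stages:
        try:
            check()
        except Exception as exc:
            # Only the exception type is reported, so messages that might
            # contain paths or tokens never reach the public report.
            return {
                "ok": False,
                "failed_stage": name,
                "error_type": type(exc).__name__,
                "passed": passed,
            }
        passed.append(name)
    return {"ok": True, "failed_stage": None, "error_type": None, "passed": passed}


def ok() -> None:
    """Stub check that always succeeds."""


def gated_model_load() -> None:
    # Simulates a runtime that lacks access to a gated model.
    raise PermissionError("access to model is restricted")


report = run_stages([
    ("dependency_import", ok),
    ("gpu_visibility", ok),
    ("model_load", gated_model_load),
    ("generation", ok),
])
print(report)
```

With this shape, a report that fails at `model_load` after passing `gpu_visibility` points directly at access or weights rather than at hardware.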

9. Traces And Reproducibility

The project includes public mock traces for the six stable examples under data/traces/samples/. They are deterministic and intended for demo replay, schema validation, and public inspection.

The Space VLM traces under data/traces/space-vlm/ are different: they are hosted validation evidence for real MiniCPM-V object understanding plus mock text generation. They should be described honestly as VLM evidence, not full real text-runtime traces.

The export command is:

.venv/bin/python -B scripts/export_traces.py
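For readers unfamiliar with JSONL, the export amounts to writing one JSON object per line. The sketch below is not the contents of scripts/export_traces.py; it assumes one JSON file per trace, and the function name, file names, and fields are invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def export_jsonl(trace_dir: Path, out_path: Path) -> int:
    """Write every *.json trace in trace_dir as one line of out_path."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        # Sorted file order and sorted keys keep the export byte-stable.
        for path in sorted(trace_dir.glob("*.json")):
            record = json.loads(path.read_text(encoding="utf-8"))
            out.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo against a temporary directory with two made-up traces.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    trace_dir = root / "samples"
    trace_dir.mkdir()
    (trace_dir / "b_keyboard.json").write_text('{"object": "keyboard"}', encoding="utf-8")
    (trace_dir / "a_mug.json").write_text('{"object": "mug"}', encoding="utf-8")
    exported = export_jsonl(trace_dir, root / "traces.jsonl")
    lines = (root / "traces.jsonl").read_text(encoding="utf-8").splitlines()

print(exported, lines)
```

Sorting both the files and the keys means two exports of the same traces produce identical output, which fits the reproducibility goal of the mock traces.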

For text runtime evidence, the project now includes a local smoke helper for an external GGUF:

.venv/bin/python -B scripts/check_llama_cpp_smoke.py \
  --model-path models/objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf

The published local-smoke file is objectverse-diary-qwen15b-lora-v2-q4_k_m.gguf from qqyule/objectverse-diary-qwen15b-lora. It is intentionally not committed. Local smoke passed on June 8, 2026; Space text runtime still needs a separate validation before it should be described as live.

10. Privacy And Safety

The project does not use OpenAI, Anthropic, Gemini, Cohere, or other commercial model APIs. It does not commit GGUF files, private images, tokens, credit codes, or .env files.

Trace logging anonymizes text inputs before public export. The current public traces are synthetic mock examples rather than private user photos.
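The notes do not spell out the anonymization method. One common approach, shown here purely as an assumption, is to replace raw text with a salted digest and a length so traces stay comparable without exposing what the user typed. The function name, salt, and field names are hypothetical.

```python
import hashlib


def anonymize_text(text: str, salt: str = "objectverse-demo") -> dict:
    """Replace raw text with a short salted digest and its length."""
    digest = hashlib.sha256((salt + text).encode("utf-8")).hexdigest()
    # Keep only a prefix: enough to tell inputs apart, not to recover them.
    return {"sha256_prefix": digest[:12], "length": len(text)}


anonymized = anonymize_text("my mug saw everything")
print(anonymized)
```

The same input always maps to the same digest, so repeated prompts can still be grouped in a trace file even though the original wording is gone.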

11. What I Would Improve Next

The next model-focused step is to validate the published GGUF in the hosted Space runtime, or keep it as local llama.cpp evidence while the public demo remains mock-safe.

After that:

  • download or mount the published GGUF in the target runtime
  • set OBJECTVERSE_TEXT_BACKEND=llama-cpp and TEXT_MODEL_PATH for that runtime
  • generate real non-mock traces if hosted/local model validation passes
  • record a final demo video from the stable Space

The current version is intentionally honest: it is a stable, reproducible small-model toy baseline with clear boundaries, visible failures, and a path to stronger model evidence.

Evidence Links To Fill Before Final Submission