Spaces:
Running on Zero
A newer version of the Gradio SDK is available: 6.17.3
Objectverse Diary — 开发计划(Day-by-Day)
周期:June 5 - June 15, 2026(共 11 天)
目标:完成 MVP、打磨 UI、冲全部徽章、提交视频与社交文案
Day 1:立项 + 项目骨架
目标:确定项目不可变范围。
- 配置 GitHub origin
- 确认并同步 GitHub repo
- 创建 Hugging Face Space
- 创建基础 Gradio app
- 写 README 草稿
- 确定英文主界面文案
- 建立
AGENTS.md - 建立
.codex/skills/
Day 2:MVP 交互闭环
目标:先不管模型,跑通产品流程。
- 图片上传
- 文本描述输入
- personality mode 选择
- mock object JSON
- mock diary 输出
- trace JSON 保存
- share card HTML 预览
- mock example gallery
- MVP smoke tests
- six public mock sample traces
- local initial-stage acceptance script
- local initial-stage completion report
交付:Upload → Generate → Diary → Share Card → Trace
Day 3:接入 VLM
目标:让 AI 真正看图。
- 接入 MiniCPM-V 或轻量 VLM
- 输出 object understanding JSON
- 做 JSON repair
- 加 example gallery
- 新增 Space VLM 验证脚本
- 新增 ZeroGPU 兼容装饰器
- ZeroGPU CUDA probe
- 缓存示例输出
- Space 真实图片验证(L4 因 HF
402 Payment Required阻塞;ZeroGPU CUDA probe 成功;2026-06-08 full validation reached the app but fell back to mock vision for mug/keyboard/shoe)
验收:上传杯子/键盘/鞋子,模型能识别物品并提取外观特征。
完成记录:MiniCPM-V 2.6 已作为可配置 vision backend 接入,默认仍是 mock vision;scripts/check_space_vlm.py 已可用三张临时公开图片验证 Space 端 mug/keyboard/shoe。2026-06-06 已尝试切到 L4,但 Hugging Face 返回 402 Payment Required;随后 ZeroGPU CUDA probe 成功。2026-06-08 full validation reached the app through the direct hf.space path, but all three objects included vision-fallback-to-mock。文本生成已接入可选 llama.cpp runtime wiring,但最终 GGUF 模型仍未选择/下载。
Day 4:文本模型 + llama.cpp
目标:让核心人格生成走小模型本地推理。
- 下载小模型 GGUF
- 接入可选 llama.cpp / llama-cpp-python runtime wiring
- 封装
generate_persona() - 封装
generate_diary() - README 说明运行方式
- 用真实 GGUF 做本地 smoke test
- README 说明最终模型参数量
交付:src/models/llama_cpp_runner.py 已支持 TEXT_MODEL_PATH;不提交 models/text_model.gguf。后续仍需确定真实 GGUF、参数量和训练/发布路径。
Day 5:训练数据 + 微调准备
目标:冲 Well-Tuned 勋章。
- 设计 SFT schema
- 生成 mock SFT preview 数据
- 生成 200-500 条 real/candidate object-persona 样本
- 手工精选 50 条高质量样本
- 上传 dataset 到 HF
- 准备 LoRA 训练脚本
数据格式示例:
{
"instruction": "Create a secret diary persona for this object.",
"input": {
"object": "old keyboard",
"features": ["dusty", "mechanical keys", "developer desk"],
"mode": "cynical"
},
"output": {
"character_name": "Clackwell",
"diary": "He calls it productivity. I call it percussion with anxiety.",
"tags": ["burnout instrument", "debug witness", "plastic philosopher"]
}
}
Day 6:LoRA 微调 + Hub 发布
目标:拿到可展示的自微调模型。
- 用 Modal credits 进行训练
- 导出 LoRA adapter
- 发布 HF model repo
- app 中加入模型说明
- README 加
Well-Tunedsection
交付:HF model repo、HF dataset repo、train log、model card
⚠️ Modal credits 兑换码不应公开分享,项目文档里只写"used Modal credits"。
Day 7:UI 魔改
目标:冲 Off-Brand 勋章。
视觉方向:
A strange archive room for everyday objects.
Dark paper texture, amber highlights, typewriter output, museum labels.
界面布局:
Left: Object Intake
Middle: Object File
Right: Secret Diary
Bottom: Share Card + Trace
- 自定义 CSS
- 自定义 hero section
- 隐藏 Gradio 默认风格
- 加 typewriter / archive reveal 视觉感
- 做英文主文案 + 中文辅助
- 做 6 个示例卡片
完成记录:Phase 2 UI 已完成为 archive dashboard。MiniCPM-V 2.6 vision backend 和可选 llama.cpp runtime wiring 已接入但默认仍 mock;LoRA 未接入;UI 参考/ 仅作为本地视觉参考,不入库。
Day 8:Trace + Sharing is Caring
目标:公开可复现材料。
- trace logger
- sample traces
- prompt templates
- dataset preview
- trace JSONL export
- 失败案例记录
- Space VLM validation report 模板
- 真实模型 traces
- GitHub repo 同步整理
Day 9:Field Notes
目标:完成技术博客。
英文标题:Building Objectverse Diary: A Small-Model AI Toy Where Everyday Objects Come Alive
博客结构:
- Why I built it
- Why Track 2
- Why small models are enough
- Product design
- Model architecture
- Gradio Off-Brand UI
- llama.cpp runtime
- Fine-tuning dataset
- Traces and reproducibility
- What failed
- What I would improve next
Day 10:Demo 视频
目标:视频必须比代码更能打。
建议长度:90 秒
0- 8s What if every object around you had a secret life?
8-20s This is Objectverse Diary, a small-model AI toy built with Gradio.
20-35s Upload a photo of any everyday object.
35-50s A vision model reads the object, then a small fine-tuned model creates its hidden personality.
50-70s Now this coffee mug writes its secret diary and complains about its owner.
70-82s You can chat with the object and generate a shareable personality card.
82-90s Built with small models, Gradio, llama.cpp, public traces, and no commercial cloud APIs.
Day 11:提交检查
- Space under official org
- Space MiniCPM-V validation passes for mug, keyboard, and shoe(当前 wired but hosted validation falls back to mock)
- Demo video ready
- Social post ready
- README complete
- Model parameter count documented
- No commercial API
- Fine-tuned model linked
- Dataset linked
- Traces linked
- Field Notes linked
- UI English-first, Chinese-second
- Submit before June 15, 2026