Spaces:
Running on Zero
Running on Zero
| # Objectverse Diary — 开发计划(Day-by-Day) | |
| ```text | |
| 周期:June 5 - June 15, 2026(共 11 天) | |
| 目标:完成 MVP、打磨 UI、冲全部徽章、提交视频与社交文案 | |
| ``` | |
| --- | |
| ## Day 1:立项 + 项目骨架 | |
| **目标:确定项目不可变范围。** | |
| - [x] 配置 GitHub origin | |
| - [ ] 确认并同步 GitHub repo | |
| - [x] 创建 Hugging Face Space | |
| - [x] 创建基础 Gradio app | |
| - [x] 写 README 草稿 | |
| - [x] 确定英文主界面文案 | |
| - [x] 建立 `AGENTS.md` | |
| - [x] 建立 `.codex/skills/` | |
| --- | |
| ## Day 2:MVP 交互闭环 | |
| **目标:先不管模型,跑通产品流程。** | |
| - [x] 图片上传 | |
| - [x] 文本描述输入 | |
| - [x] personality mode 选择 | |
| - [x] mock object JSON | |
| - [x] mock diary 输出 | |
| - [x] trace JSON 保存 | |
| - [x] share card HTML 预览 | |
| - [x] mock example gallery | |
| - [x] MVP smoke tests | |
| - [x] six public mock sample traces | |
| - [x] local initial-stage acceptance script | |
| - [x] local initial-stage completion report | |
| 交付:`Upload → Generate → Diary → Share Card → Trace` | |
| --- | |
| ## Day 3:接入 VLM | |
| **目标:让 AI 真正看图。** | |
| - [x] 接入 MiniCPM-V 或轻量 VLM | |
| - [x] 输出 object understanding JSON | |
| - [x] 做 JSON repair | |
| - [x] 加 example gallery | |
| - [x] 新增 Space VLM 验证脚本 | |
| - [x] 新增 ZeroGPU 兼容装饰器 | |
| - [x] ZeroGPU CUDA probe | |
| - [ ] 缓存示例输出 | |
| - [ ] Space 真实图片验证(L4 因 HF `402 Payment Required` 阻塞;ZeroGPU CUDA probe 成功;2026-06-08 full validation reached the app but fell back to mock vision for mug/keyboard/shoe) | |
| 验收:上传杯子/键盘/鞋子,模型能识别物品并提取外观特征。 | |
| 完成记录:MiniCPM-V 2.6 已作为可配置 vision backend 接入,默认仍是 mock vision;`scripts/check_space_vlm.py` 已可用三张临时公开图片验证 Space 端 mug/keyboard/shoe。2026-06-06 已尝试切到 L4,但 Hugging Face 返回 `402 Payment Required`;随后 ZeroGPU CUDA probe 成功。2026-06-08 full validation reached the app through the direct `hf.space` path, but all three objects included `vision-fallback-to-mock`。文本生成已接入可选 llama.cpp runtime wiring,但最终 GGUF 模型仍未选择/下载。 | |
| --- | |
| ## Day 4:文本模型 + llama.cpp | |
| **目标:让核心人格生成走小模型本地推理。** | |
| - [ ] 下载小模型 GGUF | |
| - [x] 接入可选 llama.cpp / llama-cpp-python runtime wiring | |
| - [x] 封装 `generate_persona()` | |
| - [x] 封装 `generate_diary()` | |
| - [x] README 说明运行方式 | |
| - [ ] 用真实 GGUF 做本地 smoke test | |
| - [ ] README 说明最终模型参数量 | |
| 交付:`src/models/llama_cpp_runner.py` 已支持 `TEXT_MODEL_PATH`;不提交 `models/text_model.gguf`。后续仍需确定真实 GGUF、参数量和训练/发布路径。 | |
| --- | |
| ## Day 5:训练数据 + 微调准备 | |
| **目标:冲 Well-Tuned 勋章。** | |
| - [x] 设计 SFT schema | |
| - [x] 生成 mock SFT preview 数据 | |
| - [ ] 生成 200-500 条 real/candidate object-persona 样本 | |
| - [ ] 手工精选 50 条高质量样本 | |
| - [ ] 上传 dataset 到 HF | |
| - [ ] 准备 LoRA 训练脚本 | |
| 数据格式示例: | |
| ```json | |
| { | |
| "instruction": "Create a secret diary persona for this object.", | |
| "input": { | |
| "object": "old keyboard", | |
| "features": ["dusty", "mechanical keys", "developer desk"], | |
| "mode": "cynical" | |
| }, | |
| "output": { | |
| "character_name": "Clackwell", | |
| "diary": "He calls it productivity. I call it percussion with anxiety.", | |
| "tags": ["burnout instrument", "debug witness", "plastic philosopher"] | |
| } | |
| } | |
| ``` | |
| --- | |
| ## Day 6:LoRA 微调 + Hub 发布 | |
| **目标:拿到可展示的自微调模型。** | |
| - [ ] 用 Modal credits 进行训练 | |
| - [ ] 导出 LoRA adapter | |
| - [ ] 发布 HF model repo | |
| - [ ] app 中加入模型说明 | |
| - [ ] README 加 `Well-Tuned` section | |
| 交付:HF model repo、HF dataset repo、train log、model card | |
| > ⚠️ Modal credits 兑换码不应公开分享,项目文档里只写"used Modal credits"。 | |
| --- | |
| ## Day 7:UI 魔改 | |
| **目标:冲 Off-Brand 勋章。** | |
| 视觉方向: | |
| ```text | |
| A strange archive room for everyday objects. | |
| Dark paper texture, amber highlights, typewriter output, museum labels. | |
| ``` | |
| 界面布局: | |
| ```text | |
| Left: Object Intake | |
| Middle: Object File | |
| Right: Secret Diary | |
| Bottom: Share Card + Trace | |
| ``` | |
| - [x] 自定义 CSS | |
| - [x] 自定义 hero section | |
| - [x] 隐藏 Gradio 默认风格 | |
| - [x] 加 typewriter / archive reveal 视觉感 | |
| - [x] 做英文主文案 + 中文辅助 | |
| - [x] 做 6 个示例卡片 | |
| 完成记录:Phase 2 UI 已完成为 archive dashboard。MiniCPM-V 2.6 vision backend 和可选 llama.cpp runtime wiring 已接入但默认仍 mock;LoRA 未接入;`UI 参考/` 仅作为本地视觉参考,不入库。 | |
| --- | |
| ## Day 8:Trace + Sharing is Caring | |
| **目标:公开可复现材料。** | |
| - [x] trace logger | |
| - [x] sample traces | |
| - [x] prompt templates | |
| - [x] dataset preview | |
| - [x] trace JSONL export | |
| - [x] 失败案例记录 | |
| - [x] Space VLM validation report 模板 | |
| - [ ] 真实模型 traces | |
| - [ ] GitHub repo 同步整理 | |
| --- | |
| ## Day 9:Field Notes | |
| **目标:完成技术博客。** | |
| 英文标题:`Building Objectverse Diary: A Small-Model AI Toy Where Everyday Objects Come Alive` | |
| 博客结构: | |
| 1. Why I built it | |
| 2. Why Track 2 | |
| 3. Why small models are enough | |
| 4. Product design | |
| 5. Model architecture | |
| 6. Gradio Off-Brand UI | |
| 7. llama.cpp runtime | |
| 8. Fine-tuning dataset | |
| 9. Traces and reproducibility | |
| 10. What failed | |
| 11. What I would improve next | |
| --- | |
| ## Day 10:Demo 视频 | |
| **目标:视频必须比代码更能打。** | |
| 建议长度:90 秒 | |
| ```text | |
| 0- 8s What if every object around you had a secret life? | |
| 8-20s This is Objectverse Diary, a small-model AI toy built with Gradio. | |
| 20-35s Upload a photo of any everyday object. | |
| 35-50s A vision model reads the object, then a small fine-tuned model creates its hidden personality. | |
| 50-70s Now this coffee mug writes its secret diary and complains about its owner. | |
| 70-82s You can chat with the object and generate a shareable personality card. | |
| 82-90s Built with small models, Gradio, llama.cpp, public traces, and no commercial cloud APIs. | |
| ``` | |
| --- | |
| ## Day 11:提交检查 | |
| - [ ] Space under official org | |
| - [ ] Space MiniCPM-V validation passes for mug, keyboard, and shoe(当前 wired but hosted validation falls back to mock) | |
| - [ ] Demo video ready | |
| - [ ] Social post ready | |
| - [ ] README complete | |
| - [ ] Model parameter count documented | |
| - [ ] No commercial API | |
| - [ ] Fine-tuned model linked | |
| - [ ] Dataset linked | |
| - [ ] Traces linked | |
| - [ ] Field Notes linked | |
| - [ ] UI English-first, Chinese-second | |
| - [ ] Submit before June 15, 2026 | |