Spaces:
Running on Zero
Running on Zero
| # Full Task Checklist | |
| This is the shared task list for you and Codex. It covers the hackathon MVP, the main PRD, and | |
| the extension PRD. A task is complete only when the matching acceptance criteria are met and | |
| `docs/IMPLEMENTATION_STATUS.md` is updated. | |
| ## Legend | |
| - `[x]` done and documented | |
| - `[~]` partially implemented or placeholder exists | |
| - `[ ]` not started | |
| - `[blocked]` blocked by missing local setup, credentials, hardware, or external decision | |
| ## Phase 0 - Project Memory And Setup | |
| - [x] Add root `README.md`. | |
| - [x] Add root `AGENTS.md`. | |
| - [x] Add `.gitignore`. | |
| - [x] Add `requirements.txt`. | |
| - [x] Add `docs/` folder. | |
| - [x] Add docs index. | |
| - [x] Add full task checklist. | |
| - [x] Add implementation status doc. | |
| - [x] Add usage guide. | |
| - [x] Add architecture guide. | |
| - [x] Add extension guide. | |
| - [x] Add acceptance criteria. | |
| - [x] Add roadmap. | |
| - [x] Add critical judge-oriented improvement roadmap. | |
| - [x] Add template how-to for building new domain apps. | |
| - [x] Add Plant Discovery reference app checklist. | |
| - [x] Add Plant Discovery model and training how-to. | |
| - [x] Add PRD implementation matrix. | |
| - [x] Add test folder. | |
| - [x] Add user-story test folder. | |
| - [x] Add dev requirements. | |
| - [x] Add Python quality config. | |
| - [x] Add test runner script. | |
| - [x] Add quality runner script. | |
| - [x] Add CI workflow. | |
| - [x] Add coverage gate. | |
| - [x] Add performance test script. | |
| - [x] Install Python 3.11+. | |
| - [x] Verify `python --version`. | |
| - [x] Create `.venv`. | |
| - [x] Install dependencies. | |
| - [x] Run `python app.py`. | |
| - [x] Capture screenshot or note local URL. | |
| ## Phase 1 - Hackathon Definition | |
| - [x] Choose track: Backyard AI or Thousand Token Wood. | |
| - [x] Write one-sentence project story. | |
| - [x] Define target user. | |
| - [x] Define measurable user benefit. | |
| - [x] Decide final model family and model IDs. | |
| - [x] Confirm every model is <= 32B parameters. | |
| - [x] Decide local-first badge target. | |
| - [x] Decide llama.cpp badge target. | |
| - [x] Decide open trace badge target. | |
| - [x] Decide field notes/report badge target. | |
| - [x] Write final demo flow. | |
| - [x] Write demo video script. | |
| - [x] Write social post draft. | |
| - [ ] Add final submission checklist with exact URLs. | |
| ## Phase 2 - MVP Gradio App | |
| - [x] Add `app.py`. | |
| - [x] Add Gradio `Blocks` shell. | |
| - [x] Add model config loader. | |
| - [x] Add model metadata display. | |
| - [x] Add Chat tab. | |
| - [x] Add Vision tab. | |
| - [x] Add Dataset tab placeholder. | |
| - [x] Add Train tab placeholder. | |
| - [x] Add Export tab placeholder. | |
| - [x] Add Field Notes tab. | |
| - [x] Add placeholder text service. | |
| - [x] Add placeholder vision service. | |
| - [x] Add Traces tab placeholder. | |
| - [x] Add Agent tab placeholder. | |
| - [x] Add Status tab placeholder. | |
| - [x] Add PowerShell structure verification script. | |
| - [x] Run structure verification script. | |
| - [x] Run app locally. | |
| - [x] Fix local launch errors found so far. | |
| - [x] Add screenshot capture path to docs or README. | |
| - [x] Add first demo GIF/video plan. | |
| ## Phase 3 - Config-Driven Model Registry | |
| - [x] Add `config/models.yaml`. | |
| - [x] Add text model entry for MiniCPM5-1B. | |
| - [x] Add vision model entry for MiniCPM-V-4.6. | |
| - [x] Add omnimodal model entry for MiniCPM-o-4.5. | |
| - [x] Add typed `ModelInfo`. | |
| - [x] Add `load_model_catalog()`. | |
| - [x] Add `model_choices()`. | |
| - [x] Add `model_summary()`. | |
| - [x] Add MiniCPM5-1B-Thinking config. | |
| - [x] Add MiniCPM4.1-8B config. | |
| - [x] Add MiniCPM-V-4.6-Thinking config. | |
| - [x] Add GGUF metadata in config. | |
| - [x] Add backend capability metadata. | |
| - [x] Add lightweight catalog validation helper. | |
| - [x] Show warnings for models over 32B parameters. | |
| ## Phase 4 - Core Architecture | |
| - [x] Add `core/events.py`. | |
| - [x] Add `EventType`. | |
| - [x] Add `Event`. | |
| - [x] Add `EventBus`. | |
| - [x] Add `core/registry.py`. | |
| - [x] Add generic `Registry`. | |
| - [x] Add global app state. | |
| - [x] Register model services in a service registry. | |
| - [x] Emit inference events from UI. | |
| - [x] Emit field note events. | |
| - [x] Add lightweight logging. | |
| - [x] Add unit tests for config and registry. | |
| ## Phase 5 - Testing And Quality | |
| - [x] Add `tests/unit/`. | |
| - [x] Add `tests/user_stories/`. | |
| - [x] Add model catalog unit tests. | |
| - [x] Add field notes unit tests. | |
| - [x] Add new-user user-story test. | |
| - [x] Add `requirements-dev.txt`. | |
| - [x] Add `pyproject.toml`. | |
| - [x] Add `scripts/run_tests.ps1`. | |
| - [x] Add `scripts/run_quality.ps1`. | |
| - [x] Run unit and user-story tests. | |
| - [x] Install dev quality tools. | |
| - [x] Run `ruff`. | |
| - [x] Run `mypy`. | |
| - [x] Run `pylint`. | |
| - [x] Run `bandit`. | |
| - [x] Run `pip-audit`. | |
| - [x] Add rule: failing bug/check requires a new or updated test. | |
| - [x] Add coverage report. | |
| - [x] Add lightweight performance tests. | |
| - [x] Add CI pipeline. | |
| - [x] Add Playwright or equivalent browser e2e test after Gradio runs. | |
| - [ ] Add tests for each real backend as it is implemented. | |
| - [x] Add tests for backend service selection. | |
| - [x] Add tests for Ollama unavailable path. | |
| - [x] Add tests for llama.cpp unavailable path and command building. | |
| - [x] Add tests for llama-cpp-python unavailable path. | |
| - [x] Add tests for OpenAI-compatible/LM Studio unavailable and request paths. | |
| ## Phase 6 - Local Inference Backends | |
| - [x] Choose first real backend. | |
| - [x] Add backend selector in UI. | |
| - [x] Add model status panel. | |
| - [x] Add explicit model load button. | |
| - [x] Ensure no model weights download on startup. | |
| ### Ollama Backend | |
| - [x] Confirm Ollama is installed. | |
| - [x] Add `models/ollama_service.py`. | |
| - [x] Add local model list. | |
| - [x] Add pull model command with explicit user action. | |
| - [x] Add text chat through Ollama. | |
| - [x] Add vision chat through Ollama when supported. | |
| - [x] Document Ollama setup. | |
| ### llama.cpp Backend | |
| - [x] Confirm llama.cpp tools are installed. | |
| - [x] Add `models/llama_cpp_service.py`. | |
| - [x] Add `models/llama_cpp_python_service.py`. | |
| - [x] Add GGUF file picker. | |
| - [x] Add `llama-server` launch command builder. | |
| - [x] Add health check. | |
| - [x] Add text generation through server. | |
| - [x] Add vision `mmproj` support metadata. | |
| - [x] Document llama.cpp setup. | |
| ### llama-cpp-python Backend | |
| - [x] Add optional Python binding service. | |
| - [x] Add backend selector support. | |
| - [x] Install `llama-cpp-python` locally. | |
| - [x] Configure local GGUF path. | |
| - [x] Verify real text generation through Python binding. | |
| - [x] Decide whether to keep Python binding as fallback or primary local path. | |
| ### Transformers Backend | |
| - [x] Add `models/transformers_text.py`. | |
| - [x] Add `AutoModelForCausalLM` loading for text models. | |
| - [x] Add tokenizer loading. | |
| - [x] Add explicit trust-remote-code handling. | |
| - [x] Add device/dtype settings. | |
| - [x] Add streaming generation. | |
| - [x] Document hardware expectations. | |
| ### OpenAI-Compatible / LM Studio Backend | |
| - [x] Add `models/openai_compatible_service.py`. | |
| - [x] Add backend selector support. | |
| - [x] Add local base URL and served-model-name config. | |
| - [x] Add Status tab setup and reachability check. | |
| - [x] Add text chat through OpenAI-compatible `/v1/chat/completions`. | |
| - [x] Document LM Studio setup. | |
| - [x] Verify real text generation through LM Studio. | |
| ### MiniCPM Vision Backend | |
| - [x] Add `models/minicpm_vision.py`. | |
| - [x] Use `AutoModelForImageTextToText`. | |
| - [x] Use `AutoProcessor`. | |
| - [x] Add image prompt formatting. | |
| - [x] Add thinking-mode toggle mapping. | |
| - [x] Add video support plan. | |
| ### SGLang Backend | |
| - [x] Add `models/sglang_runner.py`. | |
| - [x] Add server start/stop. | |
| - [x] Add MiniCPM5 tool parser config. | |
| - [x] Add health check. | |
| - [x] Add chat endpoint client. | |
| - [x] Install `sglang` locally. | |
| ## Phase 7 - UI Tabs From Main PRD | |
| - [x] Chat tab placeholder. | |
| - [x] Vision tab placeholder. | |
| - [x] Dataset tab placeholder. | |
| - [x] Train tab placeholder. | |
| - [x] Export tab placeholder. | |
| - [x] Field Notes tab minimal save. | |
| - [x] Add Traces tab with local event preview. | |
| - [x] Add Agent tab placeholder. | |
| - [x] Add model/backend status tab or panel. | |
| - [x] Add settings panel. | |
| - [x] Add tab-level error messages. | |
| - [x] Add loading/progress states. | |
| - [x] Add compact responsive layout review. | |
| ## Phase 8 - Dataset Layer | |
| - [x] Add `datasets/` package. | |
| - [x] Add local CSV loader. | |
| - [x] Add local JSONL loader. | |
| - [x] Add Hugging Face dataset loader. | |
| - [x] Add dataset schema preview. | |
| - [x] Add split selector. | |
| - [x] Add row count and sample preview. | |
| - [x] Add dataset statistics tool. | |
| - [x] Emit `DATASET_LOADED` event. | |
| - [x] Document dataset formats. | |
| ## Phase 9 - Field Notes And Correction Loop | |
| - [x] Save field notes to CSV. | |
| - [x] Move field note logic out of UI into `datasets/field_notes.py`. | |
| - [x] Add `FieldNote` dataclass. | |
| - [x] Add SQLite-backed store. | |
| - [x] Add JSONL export. | |
| - [x] Add local HF Dataset export. | |
| - [x] Add corrected-only filter. | |
| - [x] Add tags filter. | |
| - [x] Add image path support. | |
| - [x] Add video path support. | |
| - [x] Add use-for-training flag. | |
| - [x] Add docs for correction loop. | |
| ## Phase 10 - Training Pipeline | |
| - [x] Add training config placeholder. | |
| - [x] Add training UI placeholder. | |
| - [x] Add `training/` package. | |
| - [x] Add LoRA text trainer. | |
| - [x] Add LoRA config parser. | |
| - [ ] Add PEFT/TRL dependencies when ready. | |
| - [x] Add training dry-run validation. | |
| - [x] Add local checkpoint output. | |
| - [x] Add Trackio integration. | |
| - [x] Add evaluation after training. | |
| - [x] Add LoRA vs base comparison. | |
| - [x] Add vision fine-tuning plan using SWIFT or LLaMA-Factory. | |
| - [x] Document training hardware requirements. | |
| ## Phase 11 - Evaluation | |
| - [x] Add `training/evaluation.py`. | |
| - [x] Add simple prompt test set. | |
| - [x] Add exact-match metric. | |
| - [x] Add qualitative eval table. | |
| - [x] Add perplexity metric where appropriate. | |
| - [x] Add base vs tuned comparison. | |
| - [x] Log eval results. | |
| - [x] Document evaluation method. | |
| ## Phase 12 - Export And Quantization | |
| - [x] Add export UI placeholder. | |
| - [x] Add `training/export.py`. | |
| - [x] Add official GGUF download path. | |
| - [x] Add local HF-to-GGUF conversion path. | |
| - [x] Add quantization selector. | |
| - [x] Add llama.cpp tool detection. | |
| - [x] Add exported file listing. | |
| - [x] Add download link in UI. | |
| - [x] Document GGUF export. | |
| ## Phase 13 - Trackio Tracing | |
| - [x] Add `tracking/` package. | |
| - [x] Add Trackio config. | |
| - [x] Add `trackio.init()`. | |
| - [x] Add `trackio.log()`. | |
| - [x] Add `trackio.finish()`. | |
| - [x] Log inference events locally. | |
| - [x] Log dataset events locally. | |
| - [x] Log training metrics. | |
| - [x] Add Traces tab. | |
| - [x] Add HF Space sync docs. | |
| ## Phase 13 - MCP Layer | |
| - [x] Decide MCP path: Gradio native, `gradio.Server` | |
| - [x] Add MCP tools module. | |
| - [x] Add dataset stats tool. | |
| - [x] Add HF search tool. | |
| - [x] Add safe calculator tool. | |
| - [x] Add model inference tool. | |
| - [x] Expose tools through selected MCP path. | |
| - [x] Document MCP endpoint. | |
| - [x] Verify endpoint locally. | |
| ## Phase 14 - Agent Mode | |
| - [x] Add `agent/` package. | |
| - [x] Add agent system prompt. | |
| - [x] Add research-plan-implement loop placeholder. | |
| - [x] Add tool registry integration. | |
| - [x] Add session trace logging. | |
| - [x] Add Agent tab. | |
| - [x] Add trace export to JSONL. | |
| - [x] Add local HF Dataset export for traces. | |
| - [x] Document limitations. | |
| ## Phase 15 - Hugging Face Space Deployment | |
| - [x] Install/verify `huggingface_hub`. | |
| - [x] Login with `hf auth login`. | |
| - [ ] Create Space. | |
| - [x] Add Space README metadata if needed. | |
| - [x] Add Space remote. | |
| - [x] Push to Space. | |
| - [ ] Verify Space builds. | |
| - [ ] Add Space URL to README. | |
| - [x] Document hardware choice. | |
| - [x] Document model download behavior. | |
| ## Phase 16 - GitHub | |
| - [x] Create GitHub repo. | |
| - [x] Add GitHub remote. | |
| - [x] Commit initial project. | |
| - [x] Push to GitHub. | |
| - [x] Add GitHub URL to README. | |
| - [ ] Add issue checklist or project board if desired. | |
| ## Phase 17 - Hackathon Submission Package | |
| - [x] Finalize app name. | |
| - [x] Finalize track. | |
| - [x] Verify Gradio app polish. | |
| - [x] Verify model-size compliance. | |
| - [ ] Verify Space URLs. | |
| - [x] Verify GitHub URL. | |
| - [ ] Record demo video. | |
| - [ ] Publish social post. | |
| - [ ] Add field notes/report link. | |
| - [ ] Submit before June 15, 2026. | |
| ## Extension PRD Backlog | |
| ### vLLM Serving Tab | |
| - [x] Add vLLM runner. | |
| - [x] Add vLLM start/stop UI. | |
| - [x] Add OpenAI-compatible client. | |
| - [x] Add metrics parsing. | |
| - [x] Add Trackio benchmark logging. | |
| ### Ollama Quick-Start | |
| - [x] Add Ollama pull/list UI. | |
| - [x] Add Ollama chat service. | |
| - [x] Add Ollama vision service. | |
| - [x] Add setup docs. | |
| ### Llama.cpp Champion Path | |
| - [x] Add llama.cpp backend selection. | |
| - [x] Add llama.cpp service. | |
| - [x] Add llama-cpp-python service. | |
| - [x] Add llama.cpp status check. | |
| - [x] Install llama.cpp locally. | |
| - [x] Download/pick GGUF model. | |
| - [x] Verify real text generation. | |
| - [ ] Verify MiniCPM-V mmproj flow. | |
| ### Reward Model Eval | |
| - [x] Add reward evaluator. | |
| - [x] Add best-of-N generation. | |
| - [x] Add DPO pair generation. | |
| - [x] Add LoRA vs base reward report. | |
| ### Synthetic Data Generation | |
| - [x] Add synthetic generator. | |
| - [x] Add JSON validation. | |
| - [x] Add quality filters. | |
| - [x] Add augmentation flow. | |
| - [x] Add dataset save/export. | |
| ### Paper-To-Code Agent | |
| - [x] Add paper input UI. | |
| - [x] Add research phase. | |
| - [x] Add plan phase. | |
| - [x] Add implementation trace. | |
| - [x] Add safety gates. | |
| ### HF Spaces Deploy Tool | |
| - [x] Add deployment helper script. | |
| - [x] Add Space creation docs. | |
| - [x] Add remote validation. | |
| - [x] Add build status checks. | |
| ### VINDEX Integration | |
| - [x] Define integration boundary. | |
| - [x] Add tool stub. | |
| - [x] Add verification report. | |
| - [x] Document dependency. | |
| ### OCR Pipeline Hook | |
| - [x] Add OCR loader. | |
| - [x] Add confidence threshold. | |
| - [x] Add uncertain prediction import. | |
| - [x] Add correction UI. | |
| - [x] Add corrected export. | |
| ### MiniCPM Desk-Pet | |
| - [ ] Add persona data schema. | |
| - [ ] Add persona training plan. | |
| - [ ] Add Desk-Pet export plan. | |
| - [ ] Add docs. | |
| ### MiniCPM-o Audio Tab | |
| - [ ] Add audio tab. | |
| - [ ] Add microphone input. | |
| - [ ] Add omnimodal service. | |
| - [ ] Add TTS plan. | |
| - [ ] Add streaming plan. | |
| ### Cross-Extension Wiring | |
| - [x] Document OCR -> Field Notes -> Training. | |
| - [x] Document Synthetic Gen -> Reward Eval -> DPO. | |
| - [x] Document Agent -> Desk-Pet Persona. | |
| - [x] Document HF Spaces -> Trackio. | |
| ## Phase 18 - Template And Reference Apps | |
| ### Template How-To | |
| - [x] Document branch strategy for new domain apps. | |
| - [x] Document required domain app file contract. | |
| - [x] Document schema, service, loader, UI, tools, tests, and docs pattern. | |
| - [x] Document no-model/demo-mode requirement. | |
| - [x] Document correction-loop-first workflow. | |
| - [x] Document optional training and real-model verification steps. | |
| - [x] Document security requirements for public Space mode. | |
| ### Plant Discovery Reference App | |
| - [x] Add `plant/` package. | |
| - [x] Add standalone Plant Discovery Gradio entrypoint. | |
| - [x] Add clean plant model/domain config. | |
| - [x] Add deterministic no-model plant service. | |
| - [x] Add optional MiniCPM-V plant service adapter. | |
| - [x] Make OpenBMB MiniCPM-V the default real model mode. | |
| - [x] Add explicit demo/openbmb/finetuned runtime modes. | |
| - [x] Add optional fine-tuned adapter loading path. | |
| - [x] Keep optional model dependencies lazy. | |
| - [x] Add plant structured result schema and parser. | |
| - [x] Add species index builder. | |
| - [x] Add local image-folder loader. | |
| - [x] Add field-note correction export to plant training JSONL. | |
| - [x] Add focused Identify, Field Guide, Corrections, and Stats UI. | |
| - [x] Replace direct training execution with non-executing training plan. | |
| - [x] Add optional plant tool functions with lazy MCP server construction. | |
| - [x] Add non-executing plant training planner. | |
| - [x] Add `scripts/plan_plant_training.py`. | |
| - [x] Add Plant Discovery unit tests. | |
| - [x] Verify no-model app shell builds. | |
| - [x] Run Plant Discovery as a long-running local app. | |
| - [x] Generate Plant Discovery screenshots. | |
| - [x] Add Plant Discovery screenshots to README/docs. | |
| - [x] Decide whether hackathon Space launches root workbench or Plant Discovery app. | |
| - [x] Verify real MiniCPM-V plant identification with optional dependencies. | |
| - [ ] Train or configure a real Plant Discovery adapter. | |
| - [ ] Verify `--model-mode finetuned` with the real adapter. | |
| - [ ] Add public-mode file/path/url hardening before Space deployment. | |
| ## Ongoing Maintenance | |
| - [x] Update docs after every implemented feature. | |
| - [x] Keep `IMPLEMENTATION_STATUS.md` current. | |
| - [x] Keep unchecked tasks visible. | |
| - [x] Keep secrets and model weights out of git. | |
| - [x] Re-run local app after code changes. | |