# Full Task Checklist

This is the shared task list for you and Codex. It covers the hackathon MVP, the main PRD, and
the extension PRD. A task is complete only when the matching acceptance criteria are met and
`docs/IMPLEMENTATION_STATUS.md` is updated.

## Legend

- `[x]` done and documented
- `[~]` partially implemented or placeholder exists
- `[ ]` not started
- `[blocked]` blocked by missing local setup, credentials, hardware, or external decision
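
The markers above can be tallied mechanically, which helps keep `docs/IMPLEMENTATION_STATUS.md` in step with this file. The following is a minimal sketch, not an existing project script: `count_statuses` and the status names are hypothetical, and it assumes every task line starts with `- [marker] ` exactly as in this checklist.

```python
import re
from collections import Counter

# Matches task bullets such as "- [x] Add `app.py`." The legend lines
# themselves use backticks around the marker, so they are not counted.
ITEM = re.compile(r"^- \[(x|~| |blocked)\] ")

# Status names are illustrative labels for the four legend markers.
STATUS = {"x": "done", "~": "partial", " ": "todo", "blocked": "blocked"}


def count_statuses(markdown: str) -> Counter:
    """Count checklist items per legend marker."""
    counts: Counter = Counter()
    for line in markdown.splitlines():
        match = ITEM.match(line)
        if match:
            counts[STATUS[match.group(1)]] += 1
    return counts
```

Calling `count_statuses(Path("docs/TASKS.md").read_text())` would give a per-status total; the path is a placeholder for wherever this checklist lives.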

## Phase 0 - Project Memory And Setup

- [x] Add root `README.md`.
- [x] Add root `AGENTS.md`.
- [x] Add `.gitignore`.
- [x] Add `requirements.txt`.
- [x] Add `docs/` folder.
- [x] Add docs index.
- [x] Add full task checklist.
- [x] Add implementation status doc.
- [x] Add usage guide.
- [x] Add architecture guide.
- [x] Add extension guide.
- [x] Add acceptance criteria.
- [x] Add roadmap.
- [x] Add critical judge-oriented improvement roadmap.
- [x] Add template how-to for building new domain apps.
- [x] Add Plant Discovery reference app checklist.
- [x] Add Plant Discovery model and training how-to.
- [x] Add PRD implementation matrix.
- [x] Add test folder.
- [x] Add user-story test folder.
- [x] Add dev requirements.
- [x] Add Python quality config.
- [x] Add test runner script.
- [x] Add quality runner script.
- [x] Add CI workflow.
- [x] Add coverage gate.
- [x] Add performance test script.
- [x] Install Python 3.11+.
- [x] Verify `python --version`.
- [x] Create `.venv`.
- [x] Install dependencies.
- [x] Run `python app.py`.
- [x] Capture screenshot or note local URL.

## Phase 1 - Hackathon Definition

- [x] Choose track: Backyard AI or Thousand Token Wood.
- [x] Write one-sentence project story.
- [x] Define target user.
- [x] Define measurable user benefit.
- [x] Decide final model family and model IDs.
- [x] Confirm every model is <= 32B parameters.
- [x] Decide local-first badge target.
- [x] Decide llama.cpp badge target.
- [x] Decide open trace badge target.
- [x] Decide field notes/report badge target.
- [x] Write final demo flow.
- [x] Write demo video script.
- [x] Write social post draft.
- [ ] Add final submission checklist with exact URLs.

## Phase 2 - MVP Gradio App

- [x] Add `app.py`.
- [x] Add Gradio `Blocks` shell.
- [x] Add model config loader.
- [x] Add model metadata display.
- [x] Add Chat tab.
- [x] Add Vision tab.
- [x] Add Dataset tab placeholder.
- [x] Add Train tab placeholder.
- [x] Add Export tab placeholder.
- [x] Add Field Notes tab.
- [x] Add placeholder text service.
- [x] Add placeholder vision service.
- [x] Add Traces tab placeholder.
- [x] Add Agent tab placeholder.
- [x] Add Status tab placeholder.
- [x] Add PowerShell structure verification script.
- [x] Run structure verification script.
- [x] Run app locally.
- [x] Fix local launch errors found so far.
- [x] Add screenshot capture path to docs or README.
- [x] Add first demo GIF/video plan.

## Phase 3 - Config-Driven Model Registry

- [x] Add `config/models.yaml`.
- [x] Add text model entry for MiniCPM5-1B.
- [x] Add vision model entry for MiniCPM-V-4.6.
- [x] Add omnimodal model entry for MiniCPM-o-4.5.
- [x] Add typed `ModelInfo`.
- [x] Add `load_model_catalog()`.
- [x] Add `model_choices()`.
- [x] Add `model_summary()`.
- [x] Add MiniCPM5-1B-Thinking config.
- [x] Add MiniCPM4.1-8B config.
- [x] Add MiniCPM-V-4.6-Thinking config.
- [x] Add GGUF metadata in config.
- [x] Add backend capability metadata.
- [x] Add lightweight catalog validation helper.
- [x] Show warnings for models over 32B parameters.
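
As a rough illustration of the typed `ModelInfo` and the over-32B warning listed above, the sketch below validates an in-memory catalog. The field names, `validate_catalog`, and the example model IDs are assumptions made for illustration; the real `ModelInfo` and the layout of `config/models.yaml` may differ.

```python
from dataclasses import dataclass

MAX_PARAMS_B = 32.0  # size limit taken from the Phase 1 compliance item


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical fields; the project's ModelInfo may carry more metadata.
    model_id: str
    modality: str
    params_b: float


def validate_catalog(models: list[ModelInfo]) -> list[str]:
    """Return human-readable warnings rather than raising on bad entries."""
    warnings: list[str] = []
    seen: set[str] = set()
    for model in models:
        if model.model_id in seen:
            warnings.append(f"duplicate model id: {model.model_id}")
        seen.add(model.model_id)
        if model.params_b > MAX_PARAMS_B:
            warnings.append(
                f"{model.model_id}: {model.params_b:g}B exceeds "
                f"the {MAX_PARAMS_B:g}B limit"
            )
    return warnings
```

Returning warnings instead of raising lets the UI show them next to the model metadata without blocking startup.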

## Phase 4 - Core Architecture

- [x] Add `core/events.py`.
- [x] Add `EventType`.
- [x] Add `Event`.
- [x] Add `EventBus`.
- [x] Add `core/registry.py`.
- [x] Add generic `Registry`.
- [x] Add global app state.
- [x] Register model services in a service registry.
- [x] Emit inference events from UI.
- [x] Emit field note events.
- [x] Add lightweight logging.
- [x] Add unit tests for config and registry.
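
The `EventType`, `Event`, and `EventBus` items describe a small publish/subscribe layer. The sketch below shows one way such a bus could look; it is not the contents of `core/events.py`. Only `DATASET_LOADED` is named in this checklist, so the other enum members and the `subscribe`/`emit` method names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class EventType(Enum):
    # DATASET_LOADED appears in Phase 8; the other members are illustrative.
    INFERENCE = "inference"
    FIELD_NOTE_SAVED = "field_note_saved"
    DATASET_LOADED = "dataset_loaded"


@dataclass(frozen=True)
class Event:
    type: EventType
    payload: dict[str, Any] = field(default_factory=dict)


Handler = Callable[[Event], None]


class EventBus:
    """Synchronous in-process bus: handlers run in subscription order."""

    def __init__(self) -> None:
        self._handlers: dict[EventType, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: EventType, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: Event) -> None:
        # Copy the list so a handler may subscribe others while we iterate.
        for handler in list(self._handlers[event.type]):
            handler(event)
```

A Traces tab could subscribe one handler per event type and append each event to a local preview list.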

## Phase 5 - Testing And Quality

- [x] Add `tests/unit/`.
- [x] Add `tests/user_stories/`.
- [x] Add model catalog unit tests.
- [x] Add field notes unit tests.
- [x] Add new-user user-story test.
- [x] Add `requirements-dev.txt`.
- [x] Add `pyproject.toml`.
- [x] Add `scripts/run_tests.ps1`.
- [x] Add `scripts/run_quality.ps1`.
- [x] Run unit and user-story tests.
- [x] Install dev quality tools.
- [x] Run `ruff`.
- [x] Run `mypy`.
- [x] Run `pylint`.
- [x] Run `bandit`.
- [x] Run `pip-audit`.
- [x] Add rule: every bug or failing check requires a new or updated test.
- [x] Add coverage report.
- [x] Add lightweight performance tests.
- [x] Add CI pipeline.
- [x] Add Playwright or equivalent browser e2e test after Gradio runs.
- [ ] Add tests for each real backend as it is implemented.
- [x] Add tests for backend service selection.
- [x] Add tests for Ollama unavailable path.
- [x] Add tests for llama.cpp unavailable path and command building.
- [x] Add tests for llama-cpp-python unavailable path.
- [x] Add tests for OpenAI-compatible/LM Studio unavailable and request paths.

## Phase 6 - Local Inference Backends

- [x] Choose first real backend.
- [x] Add backend selector in UI.
- [x] Add model status panel.
- [x] Add explicit model load button.
- [x] Ensure no model weights download on startup.

### Ollama Backend

- [x] Confirm Ollama is installed.
- [x] Add `models/ollama_service.py`.
- [x] Add local model list.
- [x] Add pull model command with explicit user action.
- [x] Add text chat through Ollama.
- [x] Add vision chat through Ollama when supported.
- [x] Document Ollama setup.

### llama.cpp Backend

- [x] Confirm llama.cpp tools are installed.
- [x] Add `models/llama_cpp_service.py`.
- [x] Add `models/llama_cpp_python_service.py`.
- [x] Add GGUF file picker.
- [x] Add `llama-server` launch command builder.
- [x] Add health check.
- [x] Add text generation through server.
- [x] Add vision `mmproj` support metadata.
- [x] Document llama.cpp setup.
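
The `llama-server` launch command builder can be kept as a pure function so it is testable without llama.cpp installed. This is a hedged sketch rather than the code in `models/llama_cpp_service.py`: the function name and defaults are hypothetical, and the flags used (`--model`, `--host`, `--port`, `--ctx-size`, `--mmproj`) should be checked against `llama-server --help` for the installed build.

```python
from typing import Optional


def build_llama_server_command(
    model_path: str,
    host: str = "127.0.0.1",
    port: int = 8080,
    ctx_size: int = 4096,
    mmproj_path: Optional[str] = None,
    executable: str = "llama-server",
) -> list[str]:
    """Build an argv list; nothing is executed here."""
    command = [
        executable,
        "--model", model_path,
        "--host", host,
        "--port", str(port),
        "--ctx-size", str(ctx_size),
    ]
    # Vision models need a multimodal projector file next to the GGUF.
    if mmproj_path is not None:
        command += ["--mmproj", mmproj_path]
    return command
```

Returning a list instead of a shell string means the result can be passed to `subprocess.Popen` without shell quoting, and binding to `127.0.0.1` by default keeps the server local.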

### llama-cpp-python Backend

- [x] Add optional Python binding service.
- [x] Add backend selector support.
- [x] Install `llama-cpp-python` locally.
- [x] Configure local GGUF path.
- [x] Verify real text generation through Python binding.
- [x] Decide whether to keep Python binding as fallback or primary local path.

### Transformers Backend

- [x] Add `models/transformers_text.py`.
- [x] Add `AutoModelForCausalLM` loading for text models.
- [x] Add tokenizer loading.
- [x] Add explicit trust-remote-code handling.
- [x] Add device/dtype settings.
- [x] Add streaming generation.
- [x] Document hardware expectations.

### OpenAI-Compatible / LM Studio Backend

- [x] Add `models/openai_compatible_service.py`.
- [x] Add backend selector support.
- [x] Add local base URL and served-model-name config.
- [x] Add Status tab setup and reachability check.
- [x] Add text chat through OpenAI-compatible `/v1/chat/completions`.
- [x] Document LM Studio setup.
- [x] Verify real text generation through LM Studio.

### MiniCPM Vision Backend

- [x] Add `models/minicpm_vision.py`.
- [x] Use `AutoModelForImageTextToText`.
- [x] Use `AutoProcessor`.
- [x] Add image prompt formatting.
- [x] Add thinking-mode toggle mapping.
- [x] Add video support plan.

### SGLang Backend

- [x] Add `models/sglang_runner.py`.
- [x] Add server start/stop.
- [x] Add MiniCPM5 tool parser config.
- [x] Add health check.
- [x] Add chat endpoint client.
- [x] Install `sglang` locally.

## Phase 7 - UI Tabs From Main PRD

- [x] Add Chat tab placeholder.
- [x] Add Vision tab placeholder.
- [x] Add Dataset tab placeholder.
- [x] Add Train tab placeholder.
- [x] Add Export tab placeholder.
- [x] Add minimal save to Field Notes tab.
- [x] Add Traces tab with local event preview.
- [x] Add Agent tab placeholder.
- [x] Add model/backend status tab or panel.
- [x] Add settings panel.
- [x] Add tab-level error messages.
- [x] Add loading/progress states.
- [x] Add compact responsive layout review.

## Phase 8 - Dataset Layer

- [x] Add `datasets/` package.
- [x] Add local CSV loader.
- [x] Add local JSONL loader.
- [x] Add Hugging Face dataset loader.
- [x] Add dataset schema preview.
- [x] Add split selector.
- [x] Add row count and sample preview.
- [x] Add dataset statistics tool.
- [x] Emit `DATASET_LOADED` event.
- [x] Document dataset formats.

## Phase 9 - Field Notes And Correction Loop

- [x] Save field notes to CSV.
- [x] Move field note logic out of UI into `datasets/field_notes.py`.
- [x] Add `FieldNote` dataclass.
- [x] Add SQLite-backed store.
- [x] Add JSONL export.
- [x] Add local HF Dataset export.
- [x] Add corrected-only filter.
- [x] Add tags filter.
- [x] Add image path support.
- [x] Add video path support.
- [x] Add use-for-training flag.
- [x] Add docs for correction loop.
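
To make the correction loop concrete, here is a minimal sketch of a `FieldNote` dataclass with a SQLite-backed store and a corrected-only filter. It is an illustration, not the code in `datasets/field_notes.py`: the field names, the `FieldNoteStore` class, and the table schema are all assumptions inferred from the items above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class FieldNote:
    # Hypothetical fields inferred from the checklist items above.
    prediction: str
    correction: str = ""
    tags: str = ""
    image_path: str = ""
    use_for_training: bool = False


class FieldNoteStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            "id INTEGER PRIMARY KEY, prediction TEXT, correction TEXT, "
            "tags TEXT, image_path TEXT, use_for_training INTEGER)"
        )

    def add(self, note: FieldNote) -> None:
        self._db.execute(
            "INSERT INTO notes "
            "(prediction, correction, tags, image_path, use_for_training) "
            "VALUES (?, ?, ?, ?, ?)",
            (note.prediction, note.correction, note.tags,
             note.image_path, int(note.use_for_training)),
        )
        self._db.commit()

    def list_notes(self, corrected_only: bool = False) -> list[FieldNote]:
        query = ("SELECT prediction, correction, tags, image_path, "
                 "use_for_training FROM notes")
        if corrected_only:
            # A note counts as corrected once it has a non-empty correction.
            query += " WHERE correction != ''"
        query += " ORDER BY id"
        return [FieldNote(p, c, t, i, bool(u))
                for p, c, t, i, u in self._db.execute(query)]
```

The corrected-only view is the natural source for a training export, since only those rows carry a human-verified label.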

## Phase 10 - Training Pipeline

- [x] Add training config placeholder.
- [x] Add training UI placeholder.
- [x] Add `training/` package.
- [x] Add LoRA text trainer.
- [x] Add LoRA config parser.
- [ ] Add PEFT/TRL dependencies when ready.
- [x] Add training dry-run validation.
- [x] Add local checkpoint output.
- [x] Add Trackio integration.
- [x] Add evaluation after training.
- [x] Add LoRA vs base comparison.
- [x] Add vision fine-tuning plan using SWIFT or LLaMA-Factory.
- [x] Document training hardware requirements.

## Phase 11 - Evaluation

- [x] Add `training/evaluation.py`.
- [x] Add simple prompt test set.
- [x] Add exact-match metric.
- [x] Add qualitative eval table.
- [x] Add perplexity metric where appropriate.
- [x] Add base vs tuned comparison.
- [x] Log eval results.
- [x] Document evaluation method.
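
The exact-match metric is simple enough to sketch in a few lines. The version below is an assumption about how it might be defined, not the code in `training/evaluation.py`: it compares predictions to references after lowercasing and collapsing whitespace, and the function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace before comparing."""
    return " ".join(text.lower().split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Running the same function over base and tuned model outputs gives the two numbers needed for a base vs tuned comparison.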

## Phase 12 - Export And Quantization

- [x] Add export UI placeholder.
- [x] Add `training/export.py`.
- [x] Add official GGUF download path.
- [x] Add local HF-to-GGUF conversion path.
- [x] Add quantization selector.
- [x] Add llama.cpp tool detection.
- [x] Add exported file listing.
- [x] Add download link in UI.
- [x] Document GGUF export.

## Phase 13 - Trackio Tracing

- [x] Add `tracking/` package.
- [x] Add Trackio config.
- [x] Add `trackio.init()`.
- [x] Add `trackio.log()`.
- [x] Add `trackio.finish()`.
- [x] Log inference events locally.
- [x] Log dataset events locally.
- [x] Log training metrics.
- [x] Add Traces tab.
- [x] Add HF Space sync docs.

## Phase 13b - MCP Layer

- [x] Decide MCP path: Gradio native, `gradio.Server` 
- [x] Add MCP tools module.
- [x] Add dataset stats tool.
- [x] Add HF search tool.
- [x] Add safe calculator tool.
- [x] Add model inference tool.
- [x] Expose tools through selected MCP path.
- [x] Document MCP endpoint.
- [x] Verify endpoint locally.
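
A "safe calculator" tool usually means evaluating arithmetic without calling `eval()`. The sketch below walks the parsed syntax tree and allows only numeric literals and basic operators. It is one possible approach under that assumption, not the project's MCP tools module, and `safe_calculate` is a hypothetical name. Exponentiation is left out on purpose so a single request cannot trigger a very large computation.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses; reject everything else."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, and other node types are refused.
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))
```

Because function calls and names are rejected at the tree level, an input such as `__import__('os')` raises `ValueError` instead of running.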

## Phase 14 - Agent Mode

- [x] Add `agent/` package.
- [x] Add agent system prompt.
- [x] Add research-plan-implement loop placeholder.
- [x] Add tool registry integration.
- [x] Add session trace logging.
- [x] Add Agent tab.
- [x] Add trace export to JSONL.
- [x] Add local HF Dataset export for traces.
- [x] Document limitations.

## Phase 15 - Hugging Face Space Deployment

- [x] Install/verify `huggingface_hub`.
- [x] Login with `hf auth login`.
- [ ] Create Space.
- [x] Add Space README metadata if needed.
- [x] Add Space remote.
- [x] Push to Space.
- [ ] Verify Space builds.
- [ ] Add Space URL to README.
- [x] Document hardware choice.
- [x] Document model download behavior.

## Phase 16 - GitHub

- [x] Create GitHub repo.
- [x] Add GitHub remote.
- [x] Commit initial project.
- [x] Push to GitHub.
- [x] Add GitHub URL to README.
- [ ] Add issue checklist or project board if desired.

## Phase 17 - Hackathon Submission Package

- [x] Finalize app name.
- [x] Finalize track.
- [x] Verify Gradio app polish.
- [x] Verify model-size compliance.
- [ ] Verify Space URLs.
- [x] Verify GitHub URL.
- [ ] Record demo video.
- [ ] Publish social post.
- [ ] Add field notes/report link.
- [ ] Submit before June 15, 2026.

## Extension PRD Backlog

### vLLM Serving Tab

- [x] Add vLLM runner.
- [x] Add vLLM start/stop UI.
- [x] Add OpenAI-compatible client.
- [x] Add metrics parsing.
- [x] Add Trackio benchmark logging.

### Ollama Quick-Start

- [x] Add Ollama pull/list UI.
- [x] Add Ollama chat service.
- [x] Add Ollama vision service.
- [x] Add setup docs.

### llama.cpp Champion Path

- [x] Add llama.cpp backend selection.
- [x] Add llama.cpp service.
- [x] Add llama-cpp-python service.
- [x] Add llama.cpp status check.
- [x] Install llama.cpp locally.
- [x] Download/pick GGUF model.
- [x] Verify real text generation.
- [ ] Verify MiniCPM-V mmproj flow.

### Reward Model Eval

- [x] Add reward evaluator.
- [x] Add best-of-N generation.
- [x] Add DPO pair generation.
- [x] Add LoRA vs base reward report.

### Synthetic Data Generation

- [x] Add synthetic generator.
- [x] Add JSON validation.
- [x] Add quality filters.
- [x] Add augmentation flow.
- [x] Add dataset save/export.

### Paper-To-Code Agent

- [x] Add paper input UI.
- [x] Add research phase.
- [x] Add plan phase.
- [x] Add implementation trace.
- [x] Add safety gates.

### HF Spaces Deploy Tool

- [x] Add deployment helper script.
- [x] Add Space creation docs.
- [x] Add remote validation.
- [x] Add build status checks.

### VINDEX Integration

- [x] Define integration boundary.
- [x] Add tool stub.
- [x] Add verification report.
- [x] Document dependency.

### OCR Pipeline Hook

- [x] Add OCR loader.
- [x] Add confidence threshold.
- [x] Add uncertain prediction import.
- [x] Add correction UI.
- [x] Add corrected export.

### MiniCPM Desk-Pet

- [ ] Add persona data schema.
- [ ] Add persona training plan.
- [ ] Add Desk-Pet export plan.
- [ ] Add docs.

### MiniCPM-o Audio Tab

- [ ] Add audio tab.
- [ ] Add microphone input.
- [ ] Add omnimodal service.
- [ ] Add TTS plan.
- [ ] Add streaming plan.

### Cross-Extension Wiring

- [x] Document OCR -> Field Notes -> Training.
- [x] Document Synthetic Gen -> Reward Eval -> DPO.
- [x] Document Agent -> Desk-Pet Persona.
- [x] Document HF Spaces -> Trackio.

## Phase 18 - Template And Reference Apps

### Template How-To

- [x] Document branch strategy for new domain apps.
- [x] Document required domain app file contract.
- [x] Document schema, service, loader, UI, tools, tests, and docs pattern.
- [x] Document no-model/demo-mode requirement.
- [x] Document correction-loop-first workflow.
- [x] Document optional training and real-model verification steps.
- [x] Document security requirements for public Space mode.

### Plant Discovery Reference App

- [x] Add `plant/` package.
- [x] Add standalone Plant Discovery Gradio entrypoint.
- [x] Add clean plant model/domain config.
- [x] Add deterministic no-model plant service.
- [x] Add optional MiniCPM-V plant service adapter.
- [x] Make OpenBMB MiniCPM-V the default real model mode.
- [x] Add explicit demo/openbmb/finetuned runtime modes.
- [x] Add optional fine-tuned adapter loading path.
- [x] Keep optional model dependencies lazy.
- [x] Add plant structured result schema and parser.
- [x] Add species index builder.
- [x] Add local image-folder loader.
- [x] Add field-note correction export to plant training JSONL.
- [x] Add focused Identify, Field Guide, Corrections, and Stats UI.
- [x] Replace direct training execution with non-executing training plan.
- [x] Add optional plant tool functions with lazy MCP server construction.
- [x] Add non-executing plant training planner.
- [x] Add `scripts/plan_plant_training.py`.
- [x] Add Plant Discovery unit tests.
- [x] Verify no-model app shell builds.
- [x] Run Plant Discovery as a long-running local app.
- [x] Generate Plant Discovery screenshots.
- [x] Add Plant Discovery screenshots to README/docs.
- [x] Decide whether hackathon Space launches root workbench or Plant Discovery app.
- [x] Verify real MiniCPM-V plant identification with optional dependencies.
- [ ] Train or configure a real Plant Discovery adapter.
- [ ] Verify `--model-mode finetuned` with the real adapter.
- [ ] Add public-mode file/path/url hardening before Space deployment.

## Ongoing Maintenance

- [x] Update docs after every implemented feature.
- [x] Keep `IMPLEMENTATION_STATUS.md` current.
- [x] Keep unchecked tasks visible.
- [x] Keep secrets and model weights out of git.
- [x] Re-run local app after code changes.