Spaces:

build-small-hackathon
/

workbench

Running on Zero

App Files Files Community

workbench / docs /TASKS.md

GitHub Actions

Initial ZeroGPU deployment with spaces shim

7f9dfed 18 days ago

preview code

Raw

History Blame Contribute Delete

16.1 kB

	# Full Task Checklist

	This is the shared task list for you and Codex. It covers the hackathon MVP, the main PRD, and
	the extension PRD. A task is complete only when the matching acceptance criteria are met and
	`docs/IMPLEMENTATION_STATUS.md` is updated.

	## Legend

	- `[x]` done and documented
	- `[~]` partially implemented or placeholder exists
	- `[ ]` not started
	- `[blocked]` blocked by missing local setup, credentials, hardware, or external decision

	## Phase 0 - Project Memory And Setup

	- [x] Add root `README.md`.
	- [x] Add root `AGENTS.md`.
	- [x] Add `.gitignore`.
	- [x] Add `requirements.txt`.
	- [x] Add `docs/` folder.
	- [x] Add docs index.
	- [x] Add full task checklist.
	- [x] Add implementation status doc.
	- [x] Add usage guide.
	- [x] Add architecture guide.
	- [x] Add extension guide.
	- [x] Add acceptance criteria.
	- [x] Add roadmap.
	- [x] Add critical judge-oriented improvement roadmap.
	- [x] Add template how-to for building new domain apps.
	- [x] Add Plant Discovery reference app checklist.
	- [x] Add Plant Discovery model and training how-to.
	- [x] Add PRD implementation matrix.
	- [x] Add test folder.
	- [x] Add user-story test folder.
	- [x] Add dev requirements.
	- [x] Add Python quality config.
	- [x] Add test runner script.
	- [x] Add quality runner script.
	- [x] Add CI workflow.
	- [x] Add coverage gate.
	- [x] Add performance test script.
	- [x] Install Python 3.11+.
	- [x] Verify `python --version`.
	- [x] Create `.venv`.
	- [x] Install dependencies.
	- [x] Run `python app.py`.
	- [x] Capture screenshot or note local URL.

	## Phase 1 - Hackathon Definition

	- [x] Choose track: Backyard AI or Thousand Token Wood.
	- [x] Write one-sentence project story.
	- [x] Define target user.
	- [x] Define measurable user benefit.
	- [x] Decide final model family and model IDs.
	- [x] Confirm every model is <= 32B parameters.
	- [x] Decide local-first badge target.
	- [x] Decide llama.cpp badge target.
	- [x] Decide open trace badge target.
	- [x] Decide field notes/report badge target.
	- [x] Write final demo flow.
	- [x] Write demo video script.
	- [x] Write social post draft.
	- [ ] Add final submission checklist with exact URLs.

	## Phase 2 - MVP Gradio App

	- [x] Add `app.py`.
	- [x] Add Gradio `Blocks` shell.
	- [x] Add model config loader.
	- [x] Add model metadata display.
	- [x] Add Chat tab.
	- [x] Add Vision tab.
	- [x] Add Dataset tab placeholder.
	- [x] Add Train tab placeholder.
	- [x] Add Export tab placeholder.
	- [x] Add Field Notes tab.
	- [x] Add placeholder text service.
	- [x] Add placeholder vision service.
	- [x] Add Traces tab placeholder.
	- [x] Add Agent tab placeholder.
	- [x] Add Status tab placeholder.
	- [x] Add PowerShell structure verification script.
	- [x] Run structure verification script.
	- [x] Run app locally.
	- [x] Fix local launch errors found so far.
	- [x] Add screenshot capture path to docs or README.
	- [x] Add first demo GIF/video plan.

	## Phase 3 - Config-Driven Model Registry

	- [x] Add `config/models.yaml`.
	- [x] Add text model entry for MiniCPM5-1B.
	- [x] Add vision model entry for MiniCPM-V-4.6.
	- [x] Add omnimodal model entry for MiniCPM-o-4.5.
	- [x] Add typed `ModelInfo`.
	- [x] Add `load_model_catalog()`.
	- [x] Add `model_choices()`.
	- [x] Add `model_summary()`.
	- [x] Add MiniCPM5-1B-Thinking config.
	- [x] Add MiniCPM4.1-8B config.
	- [x] Add MiniCPM-V-4.6-Thinking config.
	- [x] Add GGUF metadata in config.
	- [x] Add backend capability metadata.
	- [x] Add lightweight catalog validation helper.
	- [x] Show warnings for models over 32B parameters.

	## Phase 4 - Core Architecture

	- [x] Add `core/events.py`.
	- [x] Add `EventType`.
	- [x] Add `Event`.
	- [x] Add `EventBus`.
	- [x] Add `core/registry.py`.
	- [x] Add generic `Registry`.
	- [x] Add global app state.
	- [x] Register model services in a service registry.
	- [x] Emit inference events from UI.
	- [x] Emit field note events.
	- [x] Add lightweight logging.
	- [x] Add unit tests for config and registry.

	## Phase 5 - Testing And Quality

	- [x] Add `tests/unit/`.
	- [x] Add `tests/user_stories/`.
	- [x] Add model catalog unit tests.
	- [x] Add field notes unit tests.
	- [x] Add new-user user-story test.
	- [x] Add `requirements-dev.txt`.
	- [x] Add `pyproject.toml`.
	- [x] Add `scripts/run_tests.ps1`.
	- [x] Add `scripts/run_quality.ps1`.
	- [x] Run unit and user-story tests.
	- [x] Install dev quality tools.
	- [x] Run `ruff`.
	- [x] Run `mypy`.
	- [x] Run `pylint`.
	- [x] Run `bandit`.
	- [x] Run `pip-audit`.
	- [x] Add rule: failing bug/check requires a new or updated test.
	- [x] Add coverage report.
	- [x] Add lightweight performance tests.
	- [x] Add CI pipeline.
	- [x] Add Playwright or equivalent browser e2e test after Gradio runs.
	- [ ] Add tests for each real backend as it is implemented.
	- [x] Add tests for backend service selection.
	- [x] Add tests for Ollama unavailable path.
	- [x] Add tests for llama.cpp unavailable path and command building.
	- [x] Add tests for llama-cpp-python unavailable path.
	- [x] Add tests for OpenAI-compatible/LM Studio unavailable and request paths.

	## Phase 6 - Local Inference Backends

	- [x] Choose first real backend.
	- [x] Add backend selector in UI.
	- [x] Add model status panel.
	- [x] Add explicit model load button.
	- [x] Ensure no model weights download on startup.

	### Ollama Backend

	- [x] Confirm Ollama is installed.
	- [x] Add `models/ollama_service.py`.
	- [x] Add local model list.
	- [x] Add pull model command with explicit user action.
	- [x] Add text chat through Ollama.
	- [x] Add vision chat through Ollama when supported.
	- [x] Document Ollama setup.

	### llama.cpp Backend

	- [x] Confirm llama.cpp tools are installed.
	- [x] Add `models/llama_cpp_service.py`.
	- [x] Add `models/llama_cpp_python_service.py`.
	- [x] Add GGUF file picker.
	- [x] Add `llama-server` launch command builder.
	- [x] Add health check.
	- [x] Add text generation through server.
	- [x] Add vision `mmproj` support metadata.
	- [x] Document llama.cpp setup.

	### llama-cpp-python Backend

	- [x] Add optional Python binding service.
	- [x] Add backend selector support.
	- [x] Install `llama-cpp-python` locally.
	- [x] Configure local GGUF path.
	- [x] Verify real text generation through Python binding.
	- [x] Decide whether to keep Python binding as fallback or primary local path.

	### Transformers Backend

	- [x] Add `models/transformers_text.py`.
	- [x] Add `AutoModelForCausalLM` loading for text models.
	- [x] Add tokenizer loading.
	- [x] Add explicit trust-remote-code handling.
	- [x] Add device/dtype settings.
	- [x] Add streaming generation.
	- [x] Document hardware expectations.

	### OpenAI-Compatible / LM Studio Backend

	- [x] Add `models/openai_compatible_service.py`.
	- [x] Add backend selector support.
	- [x] Add local base URL and served-model-name config.
	- [x] Add Status tab setup and reachability check.
	- [x] Add text chat through OpenAI-compatible `/v1/chat/completions`.
	- [x] Document LM Studio setup.
	- [x] Verify real text generation through LM Studio.

	### MiniCPM Vision Backend

	- [x] Add `models/minicpm_vision.py`.
	- [x] Use `AutoModelForImageTextToText`.
	- [x] Use `AutoProcessor`.
	- [x] Add image prompt formatting.
	- [x] Add thinking-mode toggle mapping.
	- [x] Add video support plan.

	### SGLang Backend

	- [x] Add `models/sglang_runner.py`.
	- [x] Add server start/stop.
	- [x] Add MiniCPM5 tool parser config.
	- [x] Add health check.
	- [x] Add chat endpoint client.
	- [x] Install `sglang` locally.

	## Phase 7 - UI Tabs From Main PRD

	- [x] Chat tab placeholder.
	- [x] Vision tab placeholder.
	- [x] Dataset tab placeholder.
	- [x] Train tab placeholder.
	- [x] Export tab placeholder.
	- [x] Field Notes tab minimal save.
	- [x] Add Traces tab with local event preview.
	- [x] Add Agent tab placeholder.
	- [x] Add model/backend status tab or panel.
	- [x] Add settings panel.
	- [x] Add tab-level error messages.
	- [x] Add loading/progress states.
	- [x] Add compact responsive layout review.

	## Phase 8 - Dataset Layer

	- [x] Add `datasets/` package.
	- [x] Add local CSV loader.
	- [x] Add local JSONL loader.
	- [x] Add Hugging Face dataset loader.
	- [x] Add dataset schema preview.
	- [x] Add split selector.
	- [x] Add row count and sample preview.
	- [x] Add dataset statistics tool.
	- [x] Emit `DATASET_LOADED` event.
	- [x] Document dataset formats.

	## Phase 9 - Field Notes And Correction Loop

	- [x] Save field notes to CSV.
	- [x] Move field note logic out of UI into `datasets/field_notes.py`.
	- [x] Add `FieldNote` dataclass.
	- [x] Add SQLite-backed store.
	- [x] Add JSONL export.
	- [x] Add local HF Dataset export.
	- [x] Add corrected-only filter.
	- [x] Add tags filter.
	- [x] Add image path support.
	- [x] Add video path support.
	- [x] Add use-for-training flag.
	- [x] Add docs for correction loop.

	## Phase 10 - Training Pipeline

	- [x] Add training config placeholder.
	- [x] Add training UI placeholder.
	- [x] Add `training/` package.
	- [x] Add LoRA text trainer.
	- [x] Add LoRA config parser.
	- [ ] Add PEFT/TRL dependencies when ready.
	- [x] Add training dry-run validation.
	- [x] Add local checkpoint output.
	- [x] Add Trackio integration.
	- [x] Add evaluation after training.
	- [x] Add LoRA vs base comparison.
	- [x] Add vision fine-tuning plan using SWIFT or LLaMA-Factory.
	- [x] Document training hardware requirements.

	## Phase 11 - Evaluation

	- [x] Add `training/evaluation.py`.
	- [x] Add simple prompt test set.
	- [x] Add exact-match metric.
	- [x] Add qualitative eval table.
	- [x] Add perplexity metric where appropriate.
	- [x] Add base vs tuned comparison.
	- [x] Log eval results.
	- [x] Document evaluation method.

	## Phase 12 - Export And Quantization

	- [x] Add export UI placeholder.
	- [x] Add `training/export.py`.
	- [x] Add official GGUF download path.
	- [x] Add local HF-to-GGUF conversion path.
	- [x] Add quantization selector.
	- [x] Add llama.cpp tool detection.
	- [x] Add exported file listing.
	- [x] Add download link in UI.
	- [x] Document GGUF export.

	## Phase 13 - Trackio Tracing

	- [x] Add `tracking/` package.
	- [x] Add Trackio config.
	- [x] Add `trackio.init()`.
	- [x] Add `trackio.log()`.
	- [x] Add `trackio.finish()`.
	- [x] Log inference events locally.
	- [x] Log dataset events locally.
	- [x] Log training metrics.
	- [x] Add Traces tab.
	- [x] Add HF Space sync docs.

	## Phase 13 - MCP Layer

	- [x] Decide MCP path: Gradio native, `gradio.Server`
	- [x] Add MCP tools module.
	- [x] Add dataset stats tool.
	- [x] Add HF search tool.
	- [x] Add safe calculator tool.
	- [x] Add model inference tool.
	- [x] Expose tools through selected MCP path.
	- [x] Document MCP endpoint.
	- [x] Verify endpoint locally.

	## Phase 14 - Agent Mode

	- [x] Add `agent/` package.
	- [x] Add agent system prompt.
	- [x] Add research-plan-implement loop placeholder.
	- [x] Add tool registry integration.
	- [x] Add session trace logging.
	- [x] Add Agent tab.
	- [x] Add trace export to JSONL.
	- [x] Add local HF Dataset export for traces.
	- [x] Document limitations.

	## Phase 15 - Hugging Face Space Deployment

	- [x] Install/verify `huggingface_hub`.
	- [x] Login with `hf auth login`.
	- [ ] Create Space.
	- [x] Add Space README metadata if needed.
	- [x] Add Space remote.
	- [x] Push to Space.
	- [ ] Verify Space builds.
	- [ ] Add Space URL to README.
	- [x] Document hardware choice.
	- [x] Document model download behavior.

	## Phase 16 - GitHub

	- [x] Create GitHub repo.
	- [x] Add GitHub remote.
	- [x] Commit initial project.
	- [x] Push to GitHub.
	- [x] Add GitHub URL to README.
	- [ ] Add issue checklist or project board if desired.

	## Phase 17 - Hackathon Submission Package

	- [x] Finalize app name.
	- [x] Finalize track.
	- [x] Verify Gradio app polish.
	- [x] Verify model-size compliance.
	- [ ] Verify Space URLs.
	- [x] Verify GitHub URL.
	- [ ] Record demo video.
	- [ ] Publish social post.
	- [ ] Add field notes/report link.
	- [ ] Submit before June 15, 2026.

	## Extension PRD Backlog

	### vLLM Serving Tab

	- [x] Add vLLM runner.
	- [x] Add vLLM start/stop UI.
	- [x] Add OpenAI-compatible client.
	- [x] Add metrics parsing.
	- [x] Add Trackio benchmark logging.

	### Ollama Quick-Start

	- [x] Add Ollama pull/list UI.
	- [x] Add Ollama chat service.
	- [x] Add Ollama vision service.
	- [x] Add setup docs.

	### Llama.cpp Champion Path

	- [x] Add llama.cpp backend selection.
	- [x] Add llama.cpp service.
	- [x] Add llama-cpp-python service.
	- [x] Add llama.cpp status check.
	- [x] Install llama.cpp locally.
	- [x] Download/pick GGUF model.
	- [x] Verify real text generation.
	- [ ] Verify MiniCPM-V mmproj flow.

	### Reward Model Eval

	- [x] Add reward evaluator.
	- [x] Add best-of-N generation.
	- [x] Add DPO pair generation.
	- [x] Add LoRA vs base reward report.

	### Synthetic Data Generation

	- [x] Add synthetic generator.
	- [x] Add JSON validation.
	- [x] Add quality filters.
	- [x] Add augmentation flow.
	- [x] Add dataset save/export.

	### Paper-To-Code Agent

	- [x] Add paper input UI.
	- [x] Add research phase.
	- [x] Add plan phase.
	- [x] Add implementation trace.
	- [x] Add safety gates.

	### HF Spaces Deploy Tool

	- [x] Add deployment helper script.
	- [x] Add Space creation docs.
	- [x] Add remote validation.
	- [x] Add build status checks.

	### VINDEX Integration

	- [x] Define integration boundary.
	- [x] Add tool stub.
	- [x] Add verification report.
	- [x] Document dependency.

	### OCR Pipeline Hook

	- [x] Add OCR loader.
	- [x] Add confidence threshold.
	- [x] Add uncertain prediction import.
	- [x] Add correction UI.
	- [x] Add corrected export.

	### MiniCPM Desk-Pet

	- [ ] Add persona data schema.
	- [ ] Add persona training plan.
	- [ ] Add Desk-Pet export plan.
	- [ ] Add docs.

	### MiniCPM-o Audio Tab

	- [ ] Add audio tab.
	- [ ] Add microphone input.
	- [ ] Add omnimodal service.
	- [ ] Add TTS plan.
	- [ ] Add streaming plan.

	### Cross-Extension Wiring

	- [x] Document OCR -> Field Notes -> Training.
	- [x] Document Synthetic Gen -> Reward Eval -> DPO.
	- [x] Document Agent -> Desk-Pet Persona.
	- [x] Document HF Spaces -> Trackio.

	## Phase 18 - Template And Reference Apps

	### Template How-To

	- [x] Document branch strategy for new domain apps.
	- [x] Document required domain app file contract.
	- [x] Document schema, service, loader, UI, tools, tests, and docs pattern.
	- [x] Document no-model/demo-mode requirement.
	- [x] Document correction-loop-first workflow.
	- [x] Document optional training and real-model verification steps.
	- [x] Document security requirements for public Space mode.

	### Plant Discovery Reference App

	- [x] Add `plant/` package.
	- [x] Add standalone Plant Discovery Gradio entrypoint.
	- [x] Add clean plant model/domain config.
	- [x] Add deterministic no-model plant service.
	- [x] Add optional MiniCPM-V plant service adapter.
	- [x] Make OpenBMB MiniCPM-V the default real model mode.
	- [x] Add explicit demo/openbmb/finetuned runtime modes.
	- [x] Add optional fine-tuned adapter loading path.
	- [x] Keep optional model dependencies lazy.
	- [x] Add plant structured result schema and parser.
	- [x] Add species index builder.
	- [x] Add local image-folder loader.
	- [x] Add field-note correction export to plant training JSONL.
	- [x] Add focused Identify, Field Guide, Corrections, and Stats UI.
	- [x] Replace direct training execution with non-executing training plan.
	- [x] Add optional plant tool functions with lazy MCP server construction.
	- [x] Add non-executing plant training planner.
	- [x] Add `scripts/plan_plant_training.py`.
	- [x] Add Plant Discovery unit tests.
	- [x] Verify no-model app shell builds.
	- [x] Run Plant Discovery as a long-running local app.
	- [x] Generate Plant Discovery screenshots.
	- [x] Add Plant Discovery screenshots to README/docs.
	- [x] Decide whether hackathon Space launches root workbench or Plant Discovery app.
	- [x] Verify real MiniCPM-V plant identification with optional dependencies.
	- [ ] Train or configure a real Plant Discovery adapter.
	- [ ] Verify `--model-mode finetuned` with the real adapter.
	- [ ] Add public-mode file/path/url hardening before Space deployment.

	## Ongoing Maintenance

	- [x] Update docs after every implemented feature.
	- [x] Keep `IMPLEMENTATION_STATUS.md` current.
	- [x] Keep unchecked tasks visible.
	- [x] Keep secrets and model weights out of git.
	- [x] Re-run local app after code changes.