Spaces:

build-small-hackathon
/

workbench

Sleeping

App Files Files Community

workbench / docs /IMPLEMENTATION_STATUS.md

GitHub Actions

Initial ZeroGPU deployment with spaces shim

7f9dfed 16 days ago

preview code

Raw

History Blame Contribute Delete

18.8 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Implementation Status

This file answers: "How do we know what is done?"

An item is done only when:

The code or document exists.
It is linked from the relevant docs.
It has been manually reviewed.
If it is executable, it has been run locally or the blocker is documented.

Current Status

Area	Status	Evidence
PRDs	Done	`HF_PRD_v1.md`, `HF_PRD_ext.md` exist at repo root
Codex guidance	Done	`AGENTS.md` exists
Docs folder	Done	`docs/` exists with task, usage, architecture, extension docs
README	Done	`README.md` exists
Hackathon submission package	Drafted	`docs/HACKATHON_SUBMISSION.md` contains track, story, user, benefit, model compliance, demo flow, video script, social post draft, and submission checklist
Critical improvement roadmap	Done	`docs/ROADMAP_V2_CRITICAL_IMPROVEMENT_PLAN.md` gives a hard judge-oriented rating, architecture/security critique, and second roadmap
Template how-to	Done	`docs/TEMPLATE_HOWTO.md` documents how to build new domain apps from the template
Plant Discovery plan	Done	`docs/PLANT_DISCOVERY_APP_PLAN.md` tracks the first reference app and remaining work
Plant model/training how-to	Done	`docs/PLANT_MODEL_AND_TRAINING_HOWTO.md` documents demo, OpenBMB zero-shot, fine-tuned adapter mode, correction export, training plan, and adapter configuration
Gradio app shell	Implemented and launch-verified	`app.py` builds and foreground launch holds the server open; not currently running
Model config	Implemented	`config/models.yaml` exists
Expanded model config	Implemented	MiniCPM text, thinking, 4.1, V 4.6, V thinking, omnimodal entries, NVIDIA Nemotron Nano 9B v2, GGUF metadata, and backend capability metadata including OpenAI-compatible text serving
Catalog validation	Implemented	`validate_catalog()` and Status tab warnings
Training config	Implemented	`config/training.yaml` exists
Placeholder chat	Implemented	`ui/chat_tab.py`, `models/placeholder_service.py`
Placeholder vision	Implemented	`ui/vision_tab.py`, `models/placeholder_service.py`
Service abstraction	Implemented	`models/base.py`, `models/service_factory.py`
Local backend config	Implemented	`models/local_backend_config.py` saves ignored local settings in `data/local_backends.yaml`
llama.cpp backend	Implemented and locally verified for text CLI	`models/llama_cpp_service.py`; Status tab can pick GGUF/mmproj paths and build a command using `C:\llama-b9587-bin-win-cuda-13.3-x64\llama-server.exe`; `llama-cli.exe` generated a real response from a local GGUF
llama-cpp-python backend	Implemented and locally verified	`models/llama_cpp_python_service.py`; `llama_cpp 0.3.8` generated a real local response from `Llama-3.2-1B-Instruct-Q4_K_M.gguf`; Workbench Playwright captures this response
Ollama backend	Implemented, not locally verified	`models/ollama_service.py`; Status tab lists local Ollama models and prepares explicit `ollama pull` commands; Ollama executable not found on PATH
OpenAI-compatible backend	Implemented and live-verified	`models/openai_compatible_service.py`; Status tab stores LM Studio/vLLM-style base URL and optional served model name, checks `/v1/models`, and posts to `/v1/chat/completions` only when selected; verified `http://192.168.188.37:1234` with `llama-3.2-1b-instruct`
Transformers text backend	Implemented, package installed	`models/transformers_text.py`; lazy-loads tokenizer/model only when selected; `transformers 5.10.2` is installed for MiniCPM-V support
MiniCPM vision backend	Implemented and Plant-verified	`models/minicpm_vision.py` and `plant/plant_service.py` use `AutoProcessor` and `AutoModelForImageTextToText` lazily; `assets/plant_sample.jpg` produced a structured OpenBMB MiniCPM-V result
SGLang backend	Implemented, optional local backend	`models/sglang_runner.py`; builds explicit local start commands, reports health, sends OpenAI-compatible chat requests, and provides a shutdown request; SGLang server launch remains unverified and is not a root Space dependency
App state	Implemented	`core/app_state.py` records local events and dispatches through `EventBus`; `core/tab_feedback.py` emits tab-level UI errors
Service registry	Implemented	`models/service_factory.py` registers text and vision backend factories
Dataset tab	Partial	Local CSV/JSONL preview, optional HF dataset preview, schema, split selector, row count, samples, stats, dataset event emission, and tab-level error status
Local dataset preview	Implemented	CSV, JSONL, NDJSON preview and statistics via `datasets/loader.py`
MCP tools	Implemented locally, Gradio MCP path selected	`mcp_tools/tools.py` provides dataset stats, HF dataset preview/search-style helper, safe calculator, and model inference tool functions; `mcp_tools/bridge.py` documents `/gradio_api/mcp/sse` and verifies local invocation
VINDEX boundary	Implemented locally, execution disabled	`mcp_tools/vindex_tool.py` validates the eight PRD methods, builds non-executing call plans, caps risky edit parameters, and reports local package/server availability
Training tab	Partial	`ui/train_tab.py` builds a LoRA dry-run plan, checkpoint output path, hardware notes, local deterministic evaluation, and optional loss-based perplexity summary
Training planner	Implemented, non-executing	`training/planner.py` parses LoRA/training config, validates dry runs, and never starts training
LoRA trainer planner	Implemented locally, execution disabled	`training/lora_trainer.py` reports PEFT/TRL/Transformers/Torch availability, builds a non-executing LoRA request, and documents SWIFT/LLaMA-Factory vision fine-tuning path
Evaluation	Implemented, local-only	`training/evaluation.py` provides prompt cases, exact-match scoring, optional loss-based perplexity, qualitative table, base-vs-tuned comparison, and JSONL logging
Export tab	Partial	`ui/export_tab.py` builds GGUF download/conversion/quantization plans, lists exported files, and exposes existing exported files through a download output
Export planner	Implemented, non-executing	`training/export.py` detects llama.cpp tools, builds explicit commands, and does not run downloads/conversions
Reward evaluation	Implemented locally	`training/reward_eval.py` provides deterministic reward scoring, best-of-N selection, DPO pair generation, and LoRA-vs-base reward reports
Synthetic data generation	Implemented locally	`datasets/synthetic.py` provides deterministic generation, validation, quality filtering, augmentation, and JSONL export
Field notes	Partial	`ui/notes_tab.py` saves CSV, supports media paths/training flag, imports uncertain OCR predictions, emits field-note events, and exports JSONL/local HF Dataset files
Field note module	Implemented	CSV save, SQLite store, corrected/tag/training filters, JSONL export, local HF Dataset export via `datasets/field_notes.py`
OCR correction loop	Implemented locally	`datasets/ocr.py` loads local CSV/JSONL OCR predictions, filters by confidence threshold, imports uncertain rows to Field Notes, and exports corrected OCR JSONL
Tracking	Implemented with local fallback	`tracking/trackio_client.py` loads Trackio config, writes local JSONL traces, and calls Trackio when installed/enabled
Traces tab	Partial	`ui/traces_tab.py` previews app events, reads local trace rows, shows tracking status, and exports traces
Agent mode	Implemented locally, non-autonomous	`agent/runner.py` provides system prompt, deterministic research-plan-implement-verify trace, paper-to-code trace mode, safety gates, tool registry integration, JSONL trace save/export, and local HF Dataset-style export
Agent tab	Implemented locally	`ui/agent_tab.py` drafts task traces, paper-to-code traces, and exports trace files/datasets
Status tab	Implemented	`ui/status_tab.py` lists model config, backend status, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang setup, and Ollama list/pull planning
Tab-level error messages	Implemented	Chat, Vision, and Dataset tabs show status/error messages and emit `ui_error` trace events
Loading/progress states	Implemented	`ui/progress.py` applies full Gradio progress indicators to tab actions
Compact responsive layout	Implemented	`APP_CSS` constrains app width, keeps tabs scrollable, sizes touch targets, and adds mobile padding/type rules
Structure verification	Done	`scripts/verify_structure.ps1` passed
Unit tests	Passing	187 unit/user-story tests pass
User-story tests	Passing	Included in the 187-test suite
Coverage	Passing	68% line/branch coverage at current configured threshold
Performance tests	Passing	2 lightweight performance tests pass
Playwright E2E	Passing with real response screenshots	Workbench Playwright captures a real local GGUF `llama-cpp-python` chat response; Plant Playwright with `RUN_REAL_MODEL_E2E=1` captures a real OpenBMB MiniCPM-V image result from `assets/plant_sample.jpg`
CI pipeline	Added, not run remotely	`.github/workflows/ci.yml`
Quality tooling	Passing	Tests, coverage, performance, ruff, mypy, pylint, bandit, and pip-audit pass through `scripts/run_quality.ps1`
Secrets and model-weight git policy	Implemented	`.gitignore` excludes env files, keys, caches, generated data/exports, and common model weight formats; policy has a unit test
Real model inference	Partial but materially verified	Verified paths: LM Studio/OpenAI-compatible text, llama.cpp CLI text, llama-cpp-python GGUF text, and OpenBMB MiniCPM-V Plant image inference. Remaining unverified paths: Ollama generation, SGLang server generation, llama.cpp MiniCPM-V mmproj vision, full Transformers text generation
Hugging Face Space deploy	Pushed, startup/build pending	Workbench pushed to `build-small-hackathon/workbench` at `6aafdc2083a9b82e9dca2cca5b87c3a1be05121b`; Plant pushed to `build-small-hackathon/plant_identification_tool` at `50897b3167a844b6a66ca0552b73a1791cdff926`; both include Python 3.10 compatibility fixes for HF Spaces
HF Space deployment helper	Implemented locally	`deployment/hf_space.py` and `scripts/plan_hf_space.py` validate required files, README Space metadata, Workbench/Plant remote status, and manual `hf` deployment commands
vLLM serving tab	Implemented locally, not locally verified	`models/vllm_runner.py` and `ui/vllm_tab.py` build explicit vLLM commands, check health, parse metrics, log benchmark metrics through local tracking, and use OpenAI-compatible chat when a server is running
Plant Discovery reference app	Implemented locally, no-model verified; Space wrapper added	`plant/` is a standalone template-built app with demo/no-model service for local tests, default OpenBMB MiniCPM-V mode, optional fine-tuned adapter mode, local species index, correction export, non-executing training plan, optional MCP tools, unit tests, HTTP smoke verification on port 7861, and `plant_space_app.py` for real-model Space launch
GitHub push	Done	GitHub remote `https://github.com/Ckal/codex.git`; commits pushed to `origin/main`

Known Blockers

python is available on PATH as Python 3.13 in the current shell. The documented .venv was not visible during the latest verification run, so tests ran against the global Python environment after reinstalling/updating requirements.txt.
App launch was verified by Playwright through python app.py; no long-running server is currently active.
llama.cpp tools are installed at C:\llama-b9587-bin-win-cuda-13.3-x64; they are not on PATH, so the app stores/uses the explicit llama-server.exe path.
Export planning works, but actual conversion/quantization remains blocked until a specific export run is requested and verified.
A GGUF path is configured locally for Llama-3.2-1B-Instruct-Q4_K_M.gguf; direct llama-cli.exe and llama-cpp-python generation both produced real text.
Ollama is installed on PATH (ollama version is 0.24.0). ollama pull openbmb/minicpm-v4.6 succeeded and ollama list shows openbmb/minicpm-v4.6:latest; a tiny ollama run prompt currently fails with a 500 model-load error, so real Ollama generation remains unverified.
LM Studio/OpenAI-compatible setup is implemented and verified for http://192.168.188.37:1234 with served model override llama-3.2-1b-instruct. This local override is stored in ignored data/local_backends.yaml.
transformers 5.10.2, torch, and MiniCPM-V dependencies are installed. This conflicts with the currently installed sentence-transformers 3.4.1 requirement of transformers<5.0.0, so future quality runs should watch for that dependency edge.
MiniCPM-V Plant inference is verified with assets/plant_sample.jpg; the app also supports bounded E2E inference through PLANT_MAX_NEW_TOKENS=320 and PLANT_AUTO_THINKING=0.
SGLang command planning, health, stop, and chat client code is implemented. SGLang remains an optional local backend, not a root Space dependency, because the local Python 3.13 package index cannot install the newer non-vulnerable SGLang server stack.
vLLM command planning, health, metrics, benchmark logging, and chat client code is implemented, but the vllm package/server is not installed or running in this workspace.
LoRA training request planning is implemented, but real execution remains blocked until PEFT/TRL, Transformers, Torch, and final hardware are approved and installed.
Trackio is optional; local JSONL tracing works without the trackio package, but remote Trackio/HF sync still needs package availability and credentials.
LoRA dry-run planning works locally, but real training remains blocked until a final backend, PEFT/TRL or SWIFT/LLaMA-Factory path, and hardware are chosen.
Hugging Face dataset preview is optional and requires the external datasets package; the app reports a clear status when it is not installed.
Hugging Face Space deployment is pushed with Python 3.10 compatibility fixes for StrEnum and UTC timestamps. Final run verification and smoke workflows remain open until rebuilt Spaces report RUNNING.
Node.js and npm are installed (node v23.11.0, npm 10.9.2). npm install, npm run e2e:install, and npm run e2e pass.
Plant Discovery OpenBMB MiniCPM-V inference is implemented as the default real mode, and plant_space_app.py launches that mode for Space deployment. RUN_REAL_MODEL_E2E=1 Playwright passed and captured a real response screenshot.
Plant Discovery fine-tuned adapter mode is implemented, but no trained plant adapter exists in this workspace yet.
Plant Discovery public Space mode still needs path/url hardening and screenshots.
Full PRD implementation is not complete. There are still unchecked tasks in docs/TASKS.md.
Current unchecked task count needs recounting after the latest Workbench/Space changes. Several PRD/ext PRD items still need real local setup, credentials, hardware, product decisions, or hackathon submission artifacts.

Latest Local Verification

.\scripts\run_tests.ps1 passed: 200 tests and 70% coverage.
.\scripts\run_quality.ps1 passed: tests, app smoke, 70% coverage, performance, Ruff, mypy, Pylint, Bandit, and project-scoped pip-audit.
Plant Discovery no-model HTTP smoke passed on http://127.0.0.1:7861; the process was stopped after verification.
python scripts\plan_plant_training.py --corrected-examples 30 prints a non-executing SWIFT / LLaMA-Factory adapter training plan; current environment is missing torch, transformers, PEFT, TRL, and SWIFT.
pytest tests/unit/test_local_backend_config.py tests/unit/test_llama_cpp_service.py tests/unit/test_model_catalog.py tests/unit/test_service_factory.py -q passed: 30 tests.
npm run e2e:workbench passed and captured a visible local GGUF response through llama-cpp-python.
RUN_REAL_MODEL_E2E=1 PLANT_MAX_NEW_TOKENS=320 PLANT_AUTO_THINKING=0 npx playwright test tests/e2e/plant_real_model.spec.ts --config playwright.plant.config.ts --reporter=list passed and captured a real MiniCPM-V response.
hf upload build-small-hackathon/workbench C:\tmp\workbench_space_payload . --repo-type space pushed commit 6aafdc2083a9b82e9dca2cca5b87c3a1be05121b.
hf upload build-small-hackathon/plant_identification_tool C:\tmp\plant_space_payload . --repo-type space pushed commit 50897b3167a844b6a66ca0552b73a1791cdff926.
hf spaces variables add set WORKBENCH_DEPLOYMENT=space for Workbench and Plant; Plant also has PLANT_MAX_NEW_TOKENS=320 and PLANT_AUTO_THINKING=0.
hf spaces info showed Workbench in APP_STARTING and Plant in BUILDING on zero-a10g; final build/run verification remains pending.
pytest tests/unit tests/user_stories -q passed earlier in this workstream: 197 tests.
node --version returned v23.11.0; npm --version returned 10.9.2.
ollama --version returned ollama version is 0.24.0; ollama list returned gemma4:latest.
ollama pull openbmb/minicpm5-1b failed because the registry manifest does not exist.
ollama pull openbmb/minicpm-v4.6 succeeded; ollama run openbmb/minicpm-v4.6 failed with a local 500 model-load error.
hf --version returned 1.17.0; huggingface-cli --version reports the legacy command is deprecated and recommends hf.
hf auth whoami failed with "Invalid user token"; Space push/build verification is blocked until hf auth login --force is run with a fresh token.
llama-server --version failed because llama-server is not on PATH.
git remote -v shows space-workbench and space-plant remotes configured.
Direct ruff check . passed; cache-write warnings were caused by OneDrive permissions.
Direct mypy . --no-incremental passed when MYPY_CACHE_DIR was moved to %TEMP%.
LM Studio /v1/models at http://192.168.188.37:1234 returned text-embedding-nomic-embed-text-v1.5, qwen2.5-coder-3b-instruct, and llama-3.2-1b-instruct.
LM Studio /v1/chat/completions returned a text response from llama-3.2-1b-instruct.
Focused OCR callback and pipeline tests pass: 8 tests.
Focused VINDEX/MCP tests pass: 13 tests.

Verification Commands

Run these after installing Python:

.\scripts\verify_structure.ps1
.venv\Scripts\python.exe --version
.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
.\scripts\run_tests.ps1
.\scripts\run_quality.ps1
python app.py

When the app starts, update this file and docs/TASKS.md.