workbench / docs /IMPLEMENTATION_STATUS.md
GitHub Actions
Initial ZeroGPU deployment with spaces shim
7f9dfed
|
Raw
History Blame Contribute Delete
18.8 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Implementation Status

This file answers: "How do we know what is done?"

An item is done only when:

  1. The code or document exists.
  2. It is linked from the relevant docs.
  3. It has been manually reviewed.
  4. If it is executable, it has been run locally or the blocker is documented.

Current Status

Area Status Evidence
PRDs Done HF_PRD_v1.md, HF_PRD_ext.md exist at repo root
Codex guidance Done AGENTS.md exists
Docs folder Done docs/ exists with task, usage, architecture, extension docs
README Done README.md exists
Hackathon submission package Drafted docs/HACKATHON_SUBMISSION.md contains track, story, user, benefit, model compliance, demo flow, video script, social post draft, and submission checklist
Critical improvement roadmap Done docs/ROADMAP_V2_CRITICAL_IMPROVEMENT_PLAN.md gives a hard judge-oriented rating, architecture/security critique, and second roadmap
Template how-to Done docs/TEMPLATE_HOWTO.md documents how to build new domain apps from the template
Plant Discovery plan Done docs/PLANT_DISCOVERY_APP_PLAN.md tracks the first reference app and remaining work
Plant model/training how-to Done docs/PLANT_MODEL_AND_TRAINING_HOWTO.md documents demo, OpenBMB zero-shot, fine-tuned adapter mode, correction export, training plan, and adapter configuration
Gradio app shell Implemented and launch-verified app.py builds and foreground launch holds the server open; not currently running
Model config Implemented config/models.yaml exists
Expanded model config Implemented MiniCPM text, thinking, 4.1, V 4.6, V thinking, omnimodal entries, NVIDIA Nemotron Nano 9B v2, GGUF metadata, and backend capability metadata including OpenAI-compatible text serving
Catalog validation Implemented validate_catalog() and Status tab warnings
Training config Implemented config/training.yaml exists
Placeholder chat Implemented ui/chat_tab.py, models/placeholder_service.py
Placeholder vision Implemented ui/vision_tab.py, models/placeholder_service.py
Service abstraction Implemented models/base.py, models/service_factory.py
Local backend config Implemented models/local_backend_config.py saves ignored local settings in data/local_backends.yaml
llama.cpp backend Implemented and locally verified for text CLI models/llama_cpp_service.py; Status tab can pick GGUF/mmproj paths and build a command using C:\llama-b9587-bin-win-cuda-13.3-x64\llama-server.exe; llama-cli.exe generated a real response from a local GGUF
llama-cpp-python backend Implemented and locally verified models/llama_cpp_python_service.py; llama_cpp 0.3.8 generated a real local response from Llama-3.2-1B-Instruct-Q4_K_M.gguf; Workbench Playwright captures this response
Ollama backend Implemented, not locally verified models/ollama_service.py; Status tab lists local Ollama models and prepares explicit ollama pull commands; Ollama executable not found on PATH
OpenAI-compatible backend Implemented and live-verified models/openai_compatible_service.py; Status tab stores LM Studio/vLLM-style base URL and optional served model name, checks /v1/models, and posts to /v1/chat/completions only when selected; verified http://192.168.188.37:1234 with llama-3.2-1b-instruct
Transformers text backend Implemented, package installed models/transformers_text.py; lazy-loads tokenizer/model only when selected; transformers 5.10.2 is installed for MiniCPM-V support
MiniCPM vision backend Implemented and Plant-verified models/minicpm_vision.py and plant/plant_service.py use AutoProcessor and AutoModelForImageTextToText lazily; assets/plant_sample.jpg produced a structured OpenBMB MiniCPM-V result
SGLang backend Implemented, optional local backend models/sglang_runner.py; builds explicit local start commands, reports health, sends OpenAI-compatible chat requests, and provides a shutdown request; SGLang server launch remains unverified and is not a root Space dependency
App state Implemented core/app_state.py records local events and dispatches through EventBus; core/tab_feedback.py emits tab-level UI errors
Service registry Implemented models/service_factory.py registers text and vision backend factories
Dataset tab Partial Local CSV/JSONL preview, optional HF dataset preview, schema, split selector, row count, samples, stats, dataset event emission, and tab-level error status
Local dataset preview Implemented CSV, JSONL, NDJSON preview and statistics via datasets/loader.py
MCP tools Implemented locally, Gradio MCP path selected mcp_tools/tools.py provides dataset stats, HF dataset preview/search-style helper, safe calculator, and model inference tool functions; mcp_tools/bridge.py documents /gradio_api/mcp/sse and verifies local invocation
VINDEX boundary Implemented locally, execution disabled mcp_tools/vindex_tool.py validates the eight PRD methods, builds non-executing call plans, caps risky edit parameters, and reports local package/server availability
Training tab Partial ui/train_tab.py builds a LoRA dry-run plan, checkpoint output path, hardware notes, local deterministic evaluation, and optional loss-based perplexity summary
Training planner Implemented, non-executing training/planner.py parses LoRA/training config, validates dry runs, and never starts training
LoRA trainer planner Implemented locally, execution disabled training/lora_trainer.py reports PEFT/TRL/Transformers/Torch availability, builds a non-executing LoRA request, and documents SWIFT/LLaMA-Factory vision fine-tuning path
Evaluation Implemented, local-only training/evaluation.py provides prompt cases, exact-match scoring, optional loss-based perplexity, qualitative table, base-vs-tuned comparison, and JSONL logging
Export tab Partial ui/export_tab.py builds GGUF download/conversion/quantization plans, lists exported files, and exposes existing exported files through a download output
Export planner Implemented, non-executing training/export.py detects llama.cpp tools, builds explicit commands, and does not run downloads/conversions
Reward evaluation Implemented locally training/reward_eval.py provides deterministic reward scoring, best-of-N selection, DPO pair generation, and LoRA-vs-base reward reports
Synthetic data generation Implemented locally datasets/synthetic.py provides deterministic generation, validation, quality filtering, augmentation, and JSONL export
Field notes Partial ui/notes_tab.py saves CSV, supports media paths/training flag, imports uncertain OCR predictions, emits field-note events, and exports JSONL/local HF Dataset files
Field note module Implemented CSV save, SQLite store, corrected/tag/training filters, JSONL export, local HF Dataset export via datasets/field_notes.py
OCR correction loop Implemented locally datasets/ocr.py loads local CSV/JSONL OCR predictions, filters by confidence threshold, imports uncertain rows to Field Notes, and exports corrected OCR JSONL
Tracking Implemented with local fallback tracking/trackio_client.py loads Trackio config, writes local JSONL traces, and calls Trackio when installed/enabled
Traces tab Partial ui/traces_tab.py previews app events, reads local trace rows, shows tracking status, and exports traces
Agent mode Implemented locally, non-autonomous agent/runner.py provides system prompt, deterministic research-plan-implement-verify trace, paper-to-code trace mode, safety gates, tool registry integration, JSONL trace save/export, and local HF Dataset-style export
Agent tab Implemented locally ui/agent_tab.py drafts task traces, paper-to-code traces, and exports trace files/datasets
Status tab Implemented ui/status_tab.py lists model config, backend status, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang setup, and Ollama list/pull planning
Tab-level error messages Implemented Chat, Vision, and Dataset tabs show status/error messages and emit ui_error trace events
Loading/progress states Implemented ui/progress.py applies full Gradio progress indicators to tab actions
Compact responsive layout Implemented APP_CSS constrains app width, keeps tabs scrollable, sizes touch targets, and adds mobile padding/type rules
Structure verification Done scripts/verify_structure.ps1 passed
Unit tests Passing 187 unit/user-story tests pass
User-story tests Passing Included in the 187-test suite
Coverage Passing 68% line/branch coverage at current configured threshold
Performance tests Passing 2 lightweight performance tests pass
Playwright E2E Passing with real response screenshots Workbench Playwright captures a real local GGUF llama-cpp-python chat response; Plant Playwright with RUN_REAL_MODEL_E2E=1 captures a real OpenBMB MiniCPM-V image result from assets/plant_sample.jpg
CI pipeline Added, not run remotely .github/workflows/ci.yml
Quality tooling Passing Tests, coverage, performance, ruff, mypy, pylint, bandit, and pip-audit pass through scripts/run_quality.ps1
Secrets and model-weight git policy Implemented .gitignore excludes env files, keys, caches, generated data/exports, and common model weight formats; policy has a unit test
Real model inference Partial but materially verified Verified paths: LM Studio/OpenAI-compatible text, llama.cpp CLI text, llama-cpp-python GGUF text, and OpenBMB MiniCPM-V Plant image inference. Remaining unverified paths: Ollama generation, SGLang server generation, llama.cpp MiniCPM-V mmproj vision, full Transformers text generation
Hugging Face Space deploy Pushed, startup/build pending Workbench pushed to build-small-hackathon/workbench at 6aafdc2083a9b82e9dca2cca5b87c3a1be05121b; Plant pushed to build-small-hackathon/plant_identification_tool at 50897b3167a844b6a66ca0552b73a1791cdff926; both include Python 3.10 compatibility fixes for HF Spaces
HF Space deployment helper Implemented locally deployment/hf_space.py and scripts/plan_hf_space.py validate required files, README Space metadata, Workbench/Plant remote status, and manual hf deployment commands
vLLM serving tab Implemented locally, not locally verified models/vllm_runner.py and ui/vllm_tab.py build explicit vLLM commands, check health, parse metrics, log benchmark metrics through local tracking, and use OpenAI-compatible chat when a server is running
Plant Discovery reference app Implemented locally, no-model verified; Space wrapper added plant/ is a standalone template-built app with demo/no-model service for local tests, default OpenBMB MiniCPM-V mode, optional fine-tuned adapter mode, local species index, correction export, non-executing training plan, optional MCP tools, unit tests, HTTP smoke verification on port 7861, and plant_space_app.py for real-model Space launch
GitHub push Done GitHub remote https://github.com/Ckal/codex.git; commits pushed to origin/main

Known Blockers

  • python is available on PATH as Python 3.13 in the current shell. The documented .venv was not visible during the latest verification run, so tests ran against the global Python environment after reinstalling/updating requirements.txt.
  • App launch was verified by Playwright through python app.py; no long-running server is currently active.
  • llama.cpp tools are installed at C:\llama-b9587-bin-win-cuda-13.3-x64; they are not on PATH, so the app stores/uses the explicit llama-server.exe path.
  • Export planning works, but actual conversion/quantization remains blocked until a specific export run is requested and verified.
  • A GGUF path is configured locally for Llama-3.2-1B-Instruct-Q4_K_M.gguf; direct llama-cli.exe and llama-cpp-python generation both produced real text.
  • Ollama is installed on PATH (ollama version is 0.24.0). ollama pull openbmb/minicpm-v4.6 succeeded and ollama list shows openbmb/minicpm-v4.6:latest; a tiny ollama run prompt currently fails with a 500 model-load error, so real Ollama generation remains unverified.
  • LM Studio/OpenAI-compatible setup is implemented and verified for http://192.168.188.37:1234 with served model override llama-3.2-1b-instruct. This local override is stored in ignored data/local_backends.yaml.
  • transformers 5.10.2, torch, and MiniCPM-V dependencies are installed. This conflicts with the currently installed sentence-transformers 3.4.1 requirement of transformers<5.0.0, so future quality runs should watch for that dependency edge.
  • MiniCPM-V Plant inference is verified with assets/plant_sample.jpg; the app also supports bounded E2E inference through PLANT_MAX_NEW_TOKENS=320 and PLANT_AUTO_THINKING=0.
  • SGLang command planning, health, stop, and chat client code is implemented. SGLang remains an optional local backend, not a root Space dependency, because the local Python 3.13 package index cannot install the newer non-vulnerable SGLang server stack.
  • vLLM command planning, health, metrics, benchmark logging, and chat client code is implemented, but the vllm package/server is not installed or running in this workspace.
  • LoRA training request planning is implemented, but real execution remains blocked until PEFT/TRL, Transformers, Torch, and final hardware are approved and installed.
  • Trackio is optional; local JSONL tracing works without the trackio package, but remote Trackio/HF sync still needs package availability and credentials.
  • LoRA dry-run planning works locally, but real training remains blocked until a final backend, PEFT/TRL or SWIFT/LLaMA-Factory path, and hardware are chosen.
  • Hugging Face dataset preview is optional and requires the external datasets package; the app reports a clear status when it is not installed.
  • Hugging Face Space deployment is pushed with Python 3.10 compatibility fixes for StrEnum and UTC timestamps. Final run verification and smoke workflows remain open until rebuilt Spaces report RUNNING.
  • Node.js and npm are installed (node v23.11.0, npm 10.9.2). npm install, npm run e2e:install, and npm run e2e pass.
  • Plant Discovery OpenBMB MiniCPM-V inference is implemented as the default real mode, and plant_space_app.py launches that mode for Space deployment. RUN_REAL_MODEL_E2E=1 Playwright passed and captured a real response screenshot.
  • Plant Discovery fine-tuned adapter mode is implemented, but no trained plant adapter exists in this workspace yet.
  • Plant Discovery public Space mode still needs path/url hardening and screenshots.
  • Full PRD implementation is not complete. There are still unchecked tasks in docs/TASKS.md.
  • Current unchecked task count needs recounting after the latest Workbench/Space changes. Several PRD/ext PRD items still need real local setup, credentials, hardware, product decisions, or hackathon submission artifacts.

Latest Local Verification

  • .\scripts\run_tests.ps1 passed: 200 tests and 70% coverage.
  • .\scripts\run_quality.ps1 passed: tests, app smoke, 70% coverage, performance, Ruff, mypy, Pylint, Bandit, and project-scoped pip-audit.
  • Plant Discovery no-model HTTP smoke passed on http://127.0.0.1:7861; the process was stopped after verification.
  • python scripts\plan_plant_training.py --corrected-examples 30 prints a non-executing SWIFT / LLaMA-Factory adapter training plan; current environment is missing torch, transformers, PEFT, TRL, and SWIFT.
  • pytest tests/unit/test_local_backend_config.py tests/unit/test_llama_cpp_service.py tests/unit/test_model_catalog.py tests/unit/test_service_factory.py -q passed: 30 tests.
  • npm run e2e:workbench passed and captured a visible local GGUF response through llama-cpp-python.
  • RUN_REAL_MODEL_E2E=1 PLANT_MAX_NEW_TOKENS=320 PLANT_AUTO_THINKING=0 npx playwright test tests/e2e/plant_real_model.spec.ts --config playwright.plant.config.ts --reporter=list passed and captured a real MiniCPM-V response.
  • hf upload build-small-hackathon/workbench C:\tmp\workbench_space_payload . --repo-type space pushed commit 6aafdc2083a9b82e9dca2cca5b87c3a1be05121b.
  • hf upload build-small-hackathon/plant_identification_tool C:\tmp\plant_space_payload . --repo-type space pushed commit 50897b3167a844b6a66ca0552b73a1791cdff926.
  • hf spaces variables add set WORKBENCH_DEPLOYMENT=space for Workbench and Plant; Plant also has PLANT_MAX_NEW_TOKENS=320 and PLANT_AUTO_THINKING=0.
  • hf spaces info showed Workbench in APP_STARTING and Plant in BUILDING on zero-a10g; final build/run verification remains pending.
  • pytest tests/unit tests/user_stories -q passed earlier in this workstream: 197 tests.
  • node --version returned v23.11.0; npm --version returned 10.9.2.
  • ollama --version returned ollama version is 0.24.0; ollama list returned gemma4:latest.
  • ollama pull openbmb/minicpm5-1b failed because the registry manifest does not exist.
  • ollama pull openbmb/minicpm-v4.6 succeeded; ollama run openbmb/minicpm-v4.6 failed with a local 500 model-load error.
  • hf --version returned 1.17.0; huggingface-cli --version reports the legacy command is deprecated and recommends hf.
  • hf auth whoami failed with "Invalid user token"; Space push/build verification is blocked until hf auth login --force is run with a fresh token.
  • llama-server --version failed because llama-server is not on PATH.
  • git remote -v shows space-workbench and space-plant remotes configured.
  • Direct ruff check . passed; cache-write warnings were caused by OneDrive permissions.
  • Direct mypy . --no-incremental passed when MYPY_CACHE_DIR was moved to %TEMP%.
  • LM Studio /v1/models at http://192.168.188.37:1234 returned text-embedding-nomic-embed-text-v1.5, qwen2.5-coder-3b-instruct, and llama-3.2-1b-instruct.
  • LM Studio /v1/chat/completions returned a text response from llama-3.2-1b-instruct.
  • Focused OCR callback and pipeline tests pass: 8 tests.
  • Focused VINDEX/MCP tests pass: 13 tests.

Verification Commands

Run these after installing Python:

.\scripts\verify_structure.ps1
.venv\Scripts\python.exe --version
.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
.\scripts\run_tests.ps1
.\scripts\run_quality.ps1
python app.py

When the app starts, update this file and docs/TASKS.md.