workbench / docs /USAGE.md
GitHub Actions
Initial ZeroGPU deployment with spaces shim
7f9dfed
|
Raw
History Blame Contribute Delete
11.2 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

How To Use The Project

1. Install Python

Install Python 3.11 or newer. On Windows, make sure "Add Python to PATH" is enabled.

Verify:

python --version

Before Python is available, you can still verify the repository structure:

.\scripts\verify_structure.ps1

In this workspace, the direct WindowsApps interpreter worked even though python was not on PATH:

& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" --version

2. Create A Virtual Environment

python -m venv .venv
.venv\Scripts\Activate.ps1

If python is not on PATH, use the WindowsApps interpreter:

& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" -m venv .venv

3. Install Dependencies

python -m pip install -r requirements.txt

For tests and quality checks:

python -m pip install -r requirements-dev.txt

4. Run The App

python app.py

5. Run Tests

.\scripts\run_tests.ps1

6. Run Quality Checks

.\scripts\run_quality.ps1

Open the local Gradio URL printed in the terminal, usually:

http://127.0.0.1:7860

7. Current App Behavior

Local development can still use deterministic helper services for tests, but deployed Space mode uses WORKBENCH_DEPLOYMENT=space, hides placeholder backends, and requires real backend configuration for model calls. The app still does not download model weights on startup.

Available tabs. User-triggered tab actions show Gradio progress while callbacks run, and the app uses compact responsive styling for narrow screens:

  • Chat - llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, or Transformers text inference with tab status/errors.
  • Vision - llama.cpp, llama-cpp-python, Ollama, or Transformers MiniCPM image + prompt inference with tab status/errors.
  • Dataset - local CSV/JSONL/NDJSON preview, optional Hugging Face dataset preview, stats, and tab status/errors.
  • Train - LoRA dry-run training plan plus local base-vs-tuned exact-match evaluation.
  • Export - GGUF download/conversion/quantization planning, exported-file listing, and existing-file downloads.
  • Field Notes - saves human corrections to CSV, imports uncertain OCR predictions, captures media paths/training flags, exports corrected JSONL, corrected OCR JSONL, and local HF Dataset files.
  • Traces - local event preview, JSONL trace rows, tracking status, and trace export.
  • Agent - local non-autonomous research-plan-implement-verify trace mode.
  • Status - shows configured models, backend metadata, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop planning, and Ollama list/pull planning.

Ollama is optional and is not installed automatically. Install and start Ollama yourself, then pull a compatible model explicitly before selecting the Ollama backend in the app. The Status tab can list models from a running local Ollama server and prepare an explicit ollama pull <model> command. It shows the command only; it does not run downloads for you.

llama.cpp is the preferred hackathon backend path. Install llama.cpp separately, open the Status tab, pick or type an explicit GGUF path, optionally pick an mmproj GGUF for vision, then click Prepare local model config. The app writes data/local_backends.yaml, shows the llama-server command, and still does not download or load model weights on startup. Start llama-server yourself with that command, then select the llama.cpp backend in the app. llama-cpp-python is also available as an optional backend when the package is installed and a local GGUF path is configured.

LM Studio or another local OpenAI-compatible server is optional and is not a cloud API path. Start the local server yourself, open the Status tab, set the server URL such as http://127.0.0.1:1234, optionally enter the exact served model name shown by LM Studio, and click Save OpenAI-compatible config. Check server calls /v1/models; selecting openai-compatible in the Chat tab posts to /v1/chat/completions only after you submit a prompt. No model weights are downloaded or loaded on app startup.

The Transformers text backend is optional. It requires installing transformers and a compatible PyTorch build for your CPU/GPU, and it may download model weights when you explicitly select it and run a prompt. It is not installed automatically and is not used on startup.

The Transformers vision backend is optional for MiniCPM-V models. It uses AutoProcessor and AutoModelForImageTextToText only after you select transformers in the Vision tab and submit an image. The current app maps the thinking toggle into the prompt template and documents video as a future frame-sampling path; it does not download or load vision weights on startup.

SGLang is optional. The Status tab can prepare a local python -m sglang.launch_server command, check /health, request /shutdown, and the Chat tab can use the sglang backend against /v1/chat/completions after you start the server yourself. The app does not start SGLang on startup.

Dataset preview supports local .csv, .jsonl, and .ndjson files, split names for optional Hugging Face dataset preview, and basic local statistics. Field Notes can export corrected training rows to data/field_notes.jsonl and local HF Dataset-style files to data/hf_field_notes/.

The OCR correction hook expects a local .csv, .jsonl, or .ndjson prediction file with fields such as source_path, text, and confidence. The Field Notes tab can preview rows at or below a confidence threshold, import those uncertain rows as correction tasks, and export corrected OCR rows to data/ocr_corrections.jsonl. This is the local OCR -> Field Notes -> Training path: import uncertain predictions, correct them in Field Notes, then export corrected JSONL/HF Dataset files for future fine-tuning or evaluation.

Local MCP-style tools live in mcp_tools/tools.py. The selected MCP path is Gradio native MCP, enabled by launch(mcp_server=True). The documented endpoint path is /gradio_api/mcp/sse. mcp_tools/bridge.py exposes a local manifest and invocation helper for dataset stats, optional HF dataset preview, safe arithmetic, model inference, and non-executing VINDEX call planning. Full external MCP client verification still depends on launching the app and connecting a client.

The VINDEX helper is intentionally planning-only. It validates the PRD method names, reports whether the local vindex package or http://127.0.0.1:8765/health endpoint is available, and caps risky edit parameters. It does not edit model weights.

The Agent tab drafts local research-plan-implement-verify traces and paper-to-code traces, stores them in data/agent_traces.jsonl, and can export JSONL or local HF Dataset-style trace files. It includes safety gates and does not run shell commands, commit, push, deploy, download models, or call external services. Agent traces can later become persona/source material for the Desk-Pet extension, but no Desk-Pet runtime is implemented yet.

The Export tab is a planning surface. It shows explicit huggingface-cli, convert_hf_to_gguf.py, and llama-quantize commands for the selected model and quantization, plus a list of files already present in the export directory. Existing files are exposed through the download output. It does not run those commands yet.

The Train tab does not start LoRA training yet. It builds a dry-run plan from config/training.yaml, validates the dataset path, shows the planned checkpoint output directory, documents hardware expectations, and builds a non-executing LoRA trainer request with PEFT/TRL dependency status. It can also run a local evaluation by comparing newline-separated base and tuned responses against the built-in prompt cases. It reports exact match, can calculate perplexity from optional negative log likelihood values, shows a qualitative table, and appends tuned results to data/eval_results.jsonl.

The vLLM tab is a local serving planner. It prepares an explicit vllm serve ... command, checks /health, fetches /metrics, logs parsed benchmark metrics through local tracking, and points the chat client at /v1/chat/completions only after you start vLLM yourself. It does not install vLLM or start a server on app startup.

Tracing writes local events to data/traces.jsonl by default. The Traces tab can show recent app events, show JSONL trace rows, report whether optional Trackio is available, and export local traces to exports/traces.jsonl. Remote Trackio/HF sync still requires credentials and setup.

8. Hugging Face Space Deployment

The repo includes Hugging Face Space metadata in README.md and a local planning helper:

.venv\Scripts\python.exe scripts\plan_hf_space.py --user <hf-user-or-org>

The helper validates required files and prints the manual commands for login, Space creation, remote setup, and push. It does not login, create a repo, push, or store tokens.

Required Space files:

  • app.py
  • requirements.txt
  • README.md
  • config/models.yaml
  • config/training.yaml

Target Spaces:

  • Workbench: https://huggingface.co/spaces/build-small-hackathon/workbench
  • Plant Identification Tool: https://huggingface.co/spaces/build-small-hackathon/plant_identification_tool

Use hf auth login with a freshly generated token. The app does not download model weights on startup; model downloads happen only through explicit backend actions such as ollama pull, hf download, or selecting a real Transformers backend and running a prompt. Upgrade Space hardware if real OpenBMB text or vision inference cannot run on the default CPU tier.

Trackio/HF sync path: local traces are written to JSONL first, then optional Trackio can be enabled in config/training.yaml after package availability and credentials are ready.

Synthetic data and reward evaluation helpers are local Python utilities. datasets/synthetic.py can generate, validate, filter, augment, and export JSONL examples. training/reward_eval.py can score supplied responses, select best-of-N candidates, create DPO chosen/rejected pairs, and compare base responses against LoRA responses. They do not load reward models or call external services.

9. How To Work With Codex

Useful prompts:

Read docs/TASKS.md and implement the next unchecked MVP task.
Read docs/IMPLEMENTATION_STATUS.md and tell me what is blocked.
Wire the next backend, but keep automatic model downloads disabled on startup.
Update the docs after the change and mark the matching checklist item.
If this failed, add or update a test that catches it, then fix the code.

10. What To Avoid

  • Do not start with the whole PRD at once.
  • Do not download huge models automatically.
  • Do not mark tasks done before running or documenting the blocker.
  • Do not add features without tests.
  • Do not add broad coverage escapes such as pragma no cover without a documented reason.
  • Do not push secrets, tokens, model caches, GGUF files, or virtual environments to git.