Spaces:

build-small-hackathon
/

workbench

Sleeping

App Files Files Community

workbench / docs /USAGE.md

GitHub Actions

Initial ZeroGPU deployment with spaces shim

7f9dfed 17 days ago

preview code

Raw

History Blame Contribute Delete

11.2 kB

	# How To Use The Project

	## 1. Install Python

	Install Python 3.11 or newer. On Windows, make sure "Add Python to PATH" is enabled.

	Verify:

	```powershell
	python --version
	```

	Before Python is available, you can still verify the repository structure:

	```powershell
	.\scripts\verify_structure.ps1
	```

	In this workspace, the direct WindowsApps interpreter worked even though `python` was not on PATH:

	```powershell
	& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" --version
	```

	## 2. Create A Virtual Environment

	```powershell
	python -m venv .venv
	.venv\Scripts\Activate.ps1
	```

	If `python` is not on PATH, use the WindowsApps interpreter:

	```powershell
	& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" -m venv .venv
	```

	## 3. Install Dependencies

	```powershell
	python -m pip install -r requirements.txt
	```

	For tests and quality checks:

	```powershell
	python -m pip install -r requirements-dev.txt
	```

	## 4. Run The App

	```powershell
	python app.py
	```

	## 5. Run Tests

	```powershell
	.\scripts\run_tests.ps1
	```

	## 6. Run Quality Checks

	```powershell
	.\scripts\run_quality.ps1
	```

	Open the local Gradio URL printed in the terminal, usually:

	```text
	http://127.0.0.1:7860
	```

	## 7. Current App Behavior

	Local development can still use deterministic helper services for tests, but deployed Space mode
	uses `WORKBENCH_DEPLOYMENT=space`, hides placeholder backends, and requires real backend
	configuration for model calls. The app still does not download model weights on startup.

	Available tabs. User-triggered tab actions show Gradio progress while callbacks run, and the app
	uses compact responsive styling for narrow screens:

	- Chat - llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, or Transformers text inference with tab status/errors.
	- Vision - llama.cpp, llama-cpp-python, Ollama, or Transformers MiniCPM image + prompt inference with tab status/errors.
	- Dataset - local CSV/JSONL/NDJSON preview, optional Hugging Face dataset preview, stats, and tab status/errors.
	- Train - LoRA dry-run training plan plus local base-vs-tuned exact-match evaluation.
	- Export - GGUF download/conversion/quantization planning, exported-file listing, and existing-file downloads.
	- Field Notes - saves human corrections to CSV, imports uncertain OCR predictions, captures media paths/training flags, exports corrected JSONL, corrected OCR JSONL, and local HF Dataset files.
	- Traces - local event preview, JSONL trace rows, tracking status, and trace export.
	- Agent - local non-autonomous research-plan-implement-verify trace mode.
	- Status - shows configured models, backend metadata, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop planning, and Ollama list/pull planning.

	Ollama is optional and is not installed automatically. Install and start Ollama yourself, then
	pull a compatible model explicitly before selecting the Ollama backend in the app. The Status tab
	can list models from a running local Ollama server and prepare an explicit
	`ollama pull <model>` command. It shows the command only; it does not run downloads for you.

	llama.cpp is the preferred hackathon backend path. Install llama.cpp separately, open the Status
	tab, pick or type an explicit GGUF path, optionally pick an mmproj GGUF for vision, then click
	`Prepare local model config`. The app writes `data/local_backends.yaml`, shows the
	`llama-server` command, and still does not download or load model weights on startup.
	Start `llama-server` yourself with that command, then select the `llama.cpp` backend in the app.
	`llama-cpp-python` is also available as an optional backend when the package is installed and a
	local GGUF path is configured.

	LM Studio or another local OpenAI-compatible server is optional and is not a cloud API path. Start
	the local server yourself, open the Status tab, set the server URL such as
	`http://127.0.0.1:1234`, optionally enter the exact served model name shown by LM Studio, and
	click `Save OpenAI-compatible config`. `Check server` calls `/v1/models`; selecting
	`openai-compatible` in the Chat tab posts to `/v1/chat/completions` only after you submit a prompt.
	No model weights are downloaded or loaded on app startup.

	The Transformers text backend is optional. It requires installing `transformers` and a compatible
	PyTorch build for your CPU/GPU, and it may download model weights when you explicitly select it and
	run a prompt. It is not installed automatically and is not used on startup.

	The Transformers vision backend is optional for MiniCPM-V models. It uses `AutoProcessor` and
	`AutoModelForImageTextToText` only after you select `transformers` in the Vision tab and submit an
	image. The current app maps the thinking toggle into the prompt template and documents video as a
	future frame-sampling path; it does not download or load vision weights on startup.

	SGLang is optional. The Status tab can prepare a local `python -m sglang.launch_server` command,
	check `/health`, request `/shutdown`, and the Chat tab can use the `sglang` backend against
	`/v1/chat/completions` after you start the server yourself. The app does not start SGLang on
	startup.

	Dataset preview supports local `.csv`, `.jsonl`, and `.ndjson` files, split names for optional
	Hugging Face dataset preview, and basic local statistics. Field Notes can export corrected
	training rows to `data/field_notes.jsonl` and local HF Dataset-style files to
	`data/hf_field_notes/`.

	The OCR correction hook expects a local `.csv`, `.jsonl`, or `.ndjson` prediction file with fields
	such as `source_path`, `text`, and `confidence`. The Field Notes tab can preview rows at or below a
	confidence threshold, import those uncertain rows as correction tasks, and export corrected OCR rows
	to `data/ocr_corrections.jsonl`. This is the local OCR -> Field Notes -> Training path: import
	uncertain predictions, correct them in Field Notes, then export corrected JSONL/HF Dataset files for
	future fine-tuning or evaluation.

	Local MCP-style tools live in `mcp_tools/tools.py`. The selected MCP path is Gradio native MCP,
	enabled by `launch(mcp_server=True)`. The documented endpoint path is `/gradio_api/mcp/sse`.
	`mcp_tools/bridge.py` exposes a local manifest and invocation helper for dataset stats, optional
	HF dataset preview, safe arithmetic, model inference, and non-executing VINDEX call planning. Full
	external MCP client verification still depends on launching the app and connecting a client.

	The VINDEX helper is intentionally planning-only. It validates the PRD method names, reports
	whether the local `vindex` package or `http://127.0.0.1:8765/health` endpoint is available, and
	caps risky edit parameters. It does not edit model weights.

	The Agent tab drafts local research-plan-implement-verify traces and paper-to-code traces, stores
	them in `data/agent_traces.jsonl`, and can export JSONL or local HF Dataset-style trace files. It
	includes safety gates and does not run shell commands, commit, push, deploy, download models, or
	call external services. Agent traces can later become persona/source material for the Desk-Pet
	extension, but no Desk-Pet runtime is implemented yet.

	The Export tab is a planning surface. It shows explicit `huggingface-cli`,
	`convert_hf_to_gguf.py`, and `llama-quantize` commands for the selected model and quantization,
	plus a list of files already present in the export directory. Existing files are exposed through
	the download output. It does not run those commands yet.

	The Train tab does not start LoRA training yet. It builds a dry-run plan from
	`config/training.yaml`, validates the dataset path, shows the planned checkpoint output directory,
	documents hardware expectations, and builds a non-executing LoRA trainer request with PEFT/TRL
	dependency status. It can also run a local evaluation by comparing newline-separated base and tuned
	responses against the built-in prompt cases. It reports exact match, can calculate perplexity from
	optional negative log likelihood values, shows a qualitative table, and appends tuned results to
	`data/eval_results.jsonl`.

	The vLLM tab is a local serving planner. It prepares an explicit `vllm serve ...` command, checks
	`/health`, fetches `/metrics`, logs parsed benchmark metrics through local tracking, and points the
	chat client at `/v1/chat/completions` only after you start vLLM yourself. It does not install vLLM
	or start a server on app startup.

	Tracing writes local events to `data/traces.jsonl` by default. The Traces tab can show recent app
	events, show JSONL trace rows, report whether optional Trackio is available, and export local
	traces to `exports/traces.jsonl`. Remote Trackio/HF sync still requires credentials and setup.

	## 8. Hugging Face Space Deployment

	The repo includes Hugging Face Space metadata in `README.md` and a local planning helper:

	```powershell
	.venv\Scripts\python.exe scripts\plan_hf_space.py --user <hf-user-or-org>
	```

	The helper validates required files and prints the manual commands for login, Space creation,
	remote setup, and push. It does not login, create a repo, push, or store tokens.

	Required Space files:

	- `app.py`
	- `requirements.txt`
	- `README.md`
	- `config/models.yaml`
	- `config/training.yaml`

	Target Spaces:

	- Workbench: `https://huggingface.co/spaces/build-small-hackathon/workbench`
	- Plant Identification Tool: `https://huggingface.co/spaces/build-small-hackathon/plant_identification_tool`

	Use `hf auth login` with a freshly generated token. The app does not download model weights on
	startup; model downloads happen only through explicit backend actions such as `ollama pull`,
	`hf download`, or selecting a real Transformers backend and running a prompt. Upgrade Space
	hardware if real OpenBMB text or vision inference cannot run on the default CPU tier.

	Trackio/HF sync path: local traces are written to JSONL first, then optional Trackio can be enabled
	in `config/training.yaml` after package availability and credentials are ready.

	Synthetic data and reward evaluation helpers are local Python utilities. `datasets/synthetic.py`
	can generate, validate, filter, augment, and export JSONL examples. `training/reward_eval.py` can
	score supplied responses, select best-of-N candidates, create DPO chosen/rejected pairs, and compare
	base responses against LoRA responses. They do not load reward models or call external services.

	## 9. How To Work With Codex

	Useful prompts:

	```text
	Read docs/TASKS.md and implement the next unchecked MVP task.
	```

	```text
	Read docs/IMPLEMENTATION_STATUS.md and tell me what is blocked.
	```

	```text
	Wire the next backend, but keep automatic model downloads disabled on startup.
	```

	```text
	Update the docs after the change and mark the matching checklist item.
	```

	```text
	If this failed, add or update a test that catches it, then fix the code.
	```

	## 10. What To Avoid

	- Do not start with the whole PRD at once.
	- Do not download huge models automatically.
	- Do not mark tasks done before running or documenting the blocker.
	- Do not add features without tests.
	- Do not add broad coverage escapes such as pragma no cover without a documented reason.
	- Do not push secrets, tokens, model caches, GGUF files, or virtual environments to git.