# How To Use The Project ## 1. Install Python Install Python 3.11 or newer. On Windows, make sure "Add Python to PATH" is enabled. Verify: ```powershell python --version ``` Before Python is available, you can still verify the repository structure: ```powershell .\scripts\verify_structure.ps1 ``` In this workspace, the direct WindowsApps interpreter worked even though `python` was not on PATH: ```powershell & "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" --version ``` ## 2. Create A Virtual Environment ```powershell python -m venv .venv .venv\Scripts\Activate.ps1 ``` If `python` is not on PATH, use the WindowsApps interpreter: ```powershell & "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" -m venv .venv ``` ## 3. Install Dependencies ```powershell python -m pip install -r requirements.txt ``` For tests and quality checks: ```powershell python -m pip install -r requirements-dev.txt ``` ## 4. Run The App ```powershell python app.py ``` ## 5. Run Tests ```powershell .\scripts\run_tests.ps1 ``` ## 6. Run Quality Checks ```powershell .\scripts\run_quality.ps1 ``` Open the local Gradio URL printed in the terminal, usually: ```text http://127.0.0.1:7860 ``` ## 7. Current App Behavior Local development can still use deterministic helper services for tests, but deployed Space mode uses `WORKBENCH_DEPLOYMENT=space`, hides placeholder backends, and requires real backend configuration for model calls. The app still does not download model weights on startup. Available tabs. User-triggered tab actions show Gradio progress while callbacks run, and the app uses compact responsive styling for narrow screens: - Chat - llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, or Transformers text inference with tab status/errors. - Vision - llama.cpp, llama-cpp-python, Ollama, or Transformers MiniCPM image + prompt inference with tab status/errors. - Dataset - local CSV/JSONL/NDJSON preview, optional Hugging Face dataset preview, stats, and tab status/errors. - Train - LoRA dry-run training plan plus local base-vs-tuned exact-match evaluation. - Export - GGUF download/conversion/quantization planning, exported-file listing, and existing-file downloads. - Field Notes - saves human corrections to CSV, imports uncertain OCR predictions, captures media paths/training flags, exports corrected JSONL, corrected OCR JSONL, and local HF Dataset files. - Traces - local event preview, JSONL trace rows, tracking status, and trace export. - Agent - local non-autonomous research-plan-implement-verify trace mode. - Status - shows configured models, backend metadata, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop planning, and Ollama list/pull planning. Ollama is optional and is not installed automatically. Install and start Ollama yourself, then pull a compatible model explicitly before selecting the Ollama backend in the app. The Status tab can list models from a running local Ollama server and prepare an explicit `ollama pull ` command. It shows the command only; it does not run downloads for you. llama.cpp is the preferred hackathon backend path. Install llama.cpp separately, open the Status tab, pick or type an explicit GGUF path, optionally pick an mmproj GGUF for vision, then click `Prepare local model config`. The app writes `data/local_backends.yaml`, shows the `llama-server` command, and still does not download or load model weights on startup. Start `llama-server` yourself with that command, then select the `llama.cpp` backend in the app. `llama-cpp-python` is also available as an optional backend when the package is installed and a local GGUF path is configured. LM Studio or another local OpenAI-compatible server is optional and is not a cloud API path. Start the local server yourself, open the Status tab, set the server URL such as `http://127.0.0.1:1234`, optionally enter the exact served model name shown by LM Studio, and click `Save OpenAI-compatible config`. `Check server` calls `/v1/models`; selecting `openai-compatible` in the Chat tab posts to `/v1/chat/completions` only after you submit a prompt. No model weights are downloaded or loaded on app startup. The Transformers text backend is optional. It requires installing `transformers` and a compatible PyTorch build for your CPU/GPU, and it may download model weights when you explicitly select it and run a prompt. It is not installed automatically and is not used on startup. The Transformers vision backend is optional for MiniCPM-V models. It uses `AutoProcessor` and `AutoModelForImageTextToText` only after you select `transformers` in the Vision tab and submit an image. The current app maps the thinking toggle into the prompt template and documents video as a future frame-sampling path; it does not download or load vision weights on startup. SGLang is optional. The Status tab can prepare a local `python -m sglang.launch_server` command, check `/health`, request `/shutdown`, and the Chat tab can use the `sglang` backend against `/v1/chat/completions` after you start the server yourself. The app does not start SGLang on startup. Dataset preview supports local `.csv`, `.jsonl`, and `.ndjson` files, split names for optional Hugging Face dataset preview, and basic local statistics. Field Notes can export corrected training rows to `data/field_notes.jsonl` and local HF Dataset-style files to `data/hf_field_notes/`. The OCR correction hook expects a local `.csv`, `.jsonl`, or `.ndjson` prediction file with fields such as `source_path`, `text`, and `confidence`. The Field Notes tab can preview rows at or below a confidence threshold, import those uncertain rows as correction tasks, and export corrected OCR rows to `data/ocr_corrections.jsonl`. This is the local OCR -> Field Notes -> Training path: import uncertain predictions, correct them in Field Notes, then export corrected JSONL/HF Dataset files for future fine-tuning or evaluation. Local MCP-style tools live in `mcp_tools/tools.py`. The selected MCP path is Gradio native MCP, enabled by `launch(mcp_server=True)`. The documented endpoint path is `/gradio_api/mcp/sse`. `mcp_tools/bridge.py` exposes a local manifest and invocation helper for dataset stats, optional HF dataset preview, safe arithmetic, model inference, and non-executing VINDEX call planning. Full external MCP client verification still depends on launching the app and connecting a client. The VINDEX helper is intentionally planning-only. It validates the PRD method names, reports whether the local `vindex` package or `http://127.0.0.1:8765/health` endpoint is available, and caps risky edit parameters. It does not edit model weights. The Agent tab drafts local research-plan-implement-verify traces and paper-to-code traces, stores them in `data/agent_traces.jsonl`, and can export JSONL or local HF Dataset-style trace files. It includes safety gates and does not run shell commands, commit, push, deploy, download models, or call external services. Agent traces can later become persona/source material for the Desk-Pet extension, but no Desk-Pet runtime is implemented yet. The Export tab is a planning surface. It shows explicit `huggingface-cli`, `convert_hf_to_gguf.py`, and `llama-quantize` commands for the selected model and quantization, plus a list of files already present in the export directory. Existing files are exposed through the download output. It does not run those commands yet. The Train tab does not start LoRA training yet. It builds a dry-run plan from `config/training.yaml`, validates the dataset path, shows the planned checkpoint output directory, documents hardware expectations, and builds a non-executing LoRA trainer request with PEFT/TRL dependency status. It can also run a local evaluation by comparing newline-separated base and tuned responses against the built-in prompt cases. It reports exact match, can calculate perplexity from optional negative log likelihood values, shows a qualitative table, and appends tuned results to `data/eval_results.jsonl`. The vLLM tab is a local serving planner. It prepares an explicit `vllm serve ...` command, checks `/health`, fetches `/metrics`, logs parsed benchmark metrics through local tracking, and points the chat client at `/v1/chat/completions` only after you start vLLM yourself. It does not install vLLM or start a server on app startup. Tracing writes local events to `data/traces.jsonl` by default. The Traces tab can show recent app events, show JSONL trace rows, report whether optional Trackio is available, and export local traces to `exports/traces.jsonl`. Remote Trackio/HF sync still requires credentials and setup. ## 8. Hugging Face Space Deployment The repo includes Hugging Face Space metadata in `README.md` and a local planning helper: ```powershell .venv\Scripts\python.exe scripts\plan_hf_space.py --user ``` The helper validates required files and prints the manual commands for login, Space creation, remote setup, and push. It does not login, create a repo, push, or store tokens. Required Space files: - `app.py` - `requirements.txt` - `README.md` - `config/models.yaml` - `config/training.yaml` Target Spaces: - Workbench: `https://huggingface.co/spaces/build-small-hackathon/workbench` - Plant Identification Tool: `https://huggingface.co/spaces/build-small-hackathon/plant_identification_tool` Use `hf auth login` with a freshly generated token. The app does not download model weights on startup; model downloads happen only through explicit backend actions such as `ollama pull`, `hf download`, or selecting a real Transformers backend and running a prompt. Upgrade Space hardware if real OpenBMB text or vision inference cannot run on the default CPU tier. Trackio/HF sync path: local traces are written to JSONL first, then optional Trackio can be enabled in `config/training.yaml` after package availability and credentials are ready. Synthetic data and reward evaluation helpers are local Python utilities. `datasets/synthetic.py` can generate, validate, filter, augment, and export JSONL examples. `training/reward_eval.py` can score supplied responses, select best-of-N candidates, create DPO chosen/rejected pairs, and compare base responses against LoRA responses. They do not load reward models or call external services. ## 9. How To Work With Codex Useful prompts: ```text Read docs/TASKS.md and implement the next unchecked MVP task. ``` ```text Read docs/IMPLEMENTATION_STATUS.md and tell me what is blocked. ``` ```text Wire the next backend, but keep automatic model downloads disabled on startup. ``` ```text Update the docs after the change and mark the matching checklist item. ``` ```text If this failed, add or update a test that catches it, then fix the code. ``` ## 10. What To Avoid - Do not start with the whole PRD at once. - Do not download huge models automatically. - Do not mark tasks done before running or documenting the blocker. - Do not add features without tests. - Do not add broad coverage escapes such as pragma no cover without a documented reason. - Do not push secrets, tokens, model caches, GGUF files, or virtual environments to git.