# How To Use The Project

## 1. Install Python

Install Python 3.11 or newer. On Windows, make sure "Add Python to PATH" is enabled.

Verify:

```powershell
python --version
```

Before Python is available, you can still verify the repository structure:

```powershell
.\scripts\verify_structure.ps1
```

In this workspace, the direct WindowsApps interpreter worked even though `python` was not on PATH:

```powershell
& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" --version
```

## 2. Create A Virtual Environment

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

If `python` is not on PATH, use the WindowsApps interpreter:

```powershell
& "$env:LOCALAPPDATA\Microsoft\WindowsApps\python3.11.exe" -m venv .venv
```

## 3. Install Dependencies

```powershell
python -m pip install -r requirements.txt
```

For tests and quality checks:

```powershell
python -m pip install -r requirements-dev.txt
```

## 4. Run The App

```powershell
python app.py
```

## 5. Run Tests

```powershell
.\scripts\run_tests.ps1
```

## 6. Run Quality Checks

```powershell
.\scripts\run_quality.ps1
```

Open the local Gradio URL printed in the terminal, usually:

```text
http://127.0.0.1:7860
```

## 7. Current App Behavior

Local development can still use deterministic helper services for tests, but deployed Space mode
uses `WORKBENCH_DEPLOYMENT=space`, hides placeholder backends, and requires real backend
configuration for model calls. The app still does not download model weights on startup.

Available tabs. User-triggered tab actions show Gradio progress while callbacks run, and the app
uses compact responsive styling for narrow screens:

- Chat - llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, or Transformers text inference with tab status/errors.
- Vision - llama.cpp, llama-cpp-python, Ollama, or Transformers MiniCPM image + prompt inference with tab status/errors.
- Dataset - local CSV/JSONL/NDJSON preview, optional Hugging Face dataset preview, stats, and tab status/errors.
- Train - LoRA dry-run training plan plus local base-vs-tuned exact-match evaluation.
- Export - GGUF download/conversion/quantization planning, exported-file listing, and existing-file downloads.
- Field Notes - saves human corrections to CSV, imports uncertain OCR predictions, captures media paths/training flags, exports corrected JSONL, corrected OCR JSONL, and local HF Dataset files.
- Traces - local event preview, JSONL trace rows, tracking status, and trace export.
- Agent - local non-autonomous research-plan-implement-verify trace mode.
- Status - shows configured models, backend metadata, local llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop planning, and Ollama list/pull planning.

Ollama is optional and is not installed automatically. Install and start Ollama yourself, then
pull a compatible model explicitly before selecting the Ollama backend in the app. The Status tab
can list models from a running local Ollama server and prepare an explicit
`ollama pull <model>` command. It shows the command only; it does not run downloads for you.

llama.cpp is the preferred hackathon backend path. Install llama.cpp separately, open the Status
tab, pick or type an explicit GGUF path, optionally pick an mmproj GGUF for vision, then click
`Prepare local model config`. The app writes `data/local_backends.yaml`, shows the
`llama-server` command, and still does not download or load model weights on startup.
Start `llama-server` yourself with that command, then select the `llama.cpp` backend in the app.
`llama-cpp-python` is also available as an optional backend when the package is installed and a
local GGUF path is configured.

LM Studio or another local OpenAI-compatible server is optional and is not a cloud API path. Start
the local server yourself, open the Status tab, set the server URL such as
`http://127.0.0.1:1234`, optionally enter the exact served model name shown by LM Studio, and
click `Save OpenAI-compatible config`. `Check server` calls `/v1/models`; selecting
`openai-compatible` in the Chat tab posts to `/v1/chat/completions` only after you submit a prompt.
No model weights are downloaded or loaded on app startup.

The Transformers text backend is optional. It requires installing `transformers` and a compatible
PyTorch build for your CPU/GPU, and it may download model weights when you explicitly select it and
run a prompt. It is not installed automatically and is not used on startup.

The Transformers vision backend is optional for MiniCPM-V models. It uses `AutoProcessor` and
`AutoModelForImageTextToText` only after you select `transformers` in the Vision tab and submit an
image. The current app maps the thinking toggle into the prompt template and documents video as a
future frame-sampling path; it does not download or load vision weights on startup.

SGLang is optional. The Status tab can prepare a local `python -m sglang.launch_server` command,
check `/health`, request `/shutdown`, and the Chat tab can use the `sglang` backend against
`/v1/chat/completions` after you start the server yourself. The app does not start SGLang on
startup.

Dataset preview supports local `.csv`, `.jsonl`, and `.ndjson` files, split names for optional
Hugging Face dataset preview, and basic local statistics. Field Notes can export corrected
training rows to `data/field_notes.jsonl` and local HF Dataset-style files to
`data/hf_field_notes/`.

The OCR correction hook expects a local `.csv`, `.jsonl`, or `.ndjson` prediction file with fields
such as `source_path`, `text`, and `confidence`. The Field Notes tab can preview rows at or below a
confidence threshold, import those uncertain rows as correction tasks, and export corrected OCR rows
to `data/ocr_corrections.jsonl`. This is the local OCR -> Field Notes -> Training path: import
uncertain predictions, correct them in Field Notes, then export corrected JSONL/HF Dataset files for
future fine-tuning or evaluation.

Local MCP-style tools live in `mcp_tools/tools.py`. The selected MCP path is Gradio native MCP,
enabled by `launch(mcp_server=True)`. The documented endpoint path is `/gradio_api/mcp/sse`.
`mcp_tools/bridge.py` exposes a local manifest and invocation helper for dataset stats, optional
HF dataset preview, safe arithmetic, model inference, and non-executing VINDEX call planning. Full
external MCP client verification still depends on launching the app and connecting a client.

The VINDEX helper is intentionally planning-only. It validates the PRD method names, reports
whether the local `vindex` package or `http://127.0.0.1:8765/health` endpoint is available, and
caps risky edit parameters. It does not edit model weights.

The Agent tab drafts local research-plan-implement-verify traces and paper-to-code traces, stores
them in `data/agent_traces.jsonl`, and can export JSONL or local HF Dataset-style trace files. It
includes safety gates and does not run shell commands, commit, push, deploy, download models, or
call external services. Agent traces can later become persona/source material for the Desk-Pet
extension, but no Desk-Pet runtime is implemented yet.

The Export tab is a planning surface. It shows explicit `huggingface-cli`,
`convert_hf_to_gguf.py`, and `llama-quantize` commands for the selected model and quantization,
plus a list of files already present in the export directory. Existing files are exposed through
the download output. It does not run those commands yet.

The Train tab does not start LoRA training yet. It builds a dry-run plan from
`config/training.yaml`, validates the dataset path, shows the planned checkpoint output directory,
documents hardware expectations, and builds a non-executing LoRA trainer request with PEFT/TRL
dependency status. It can also run a local evaluation by comparing newline-separated base and tuned
responses against the built-in prompt cases. It reports exact match, can calculate perplexity from
optional negative log likelihood values, shows a qualitative table, and appends tuned results to
`data/eval_results.jsonl`.

The vLLM tab is a local serving planner. It prepares an explicit `vllm serve ...` command, checks
`/health`, fetches `/metrics`, logs parsed benchmark metrics through local tracking, and points the
chat client at `/v1/chat/completions` only after you start vLLM yourself. It does not install vLLM
or start a server on app startup.

Tracing writes local events to `data/traces.jsonl` by default. The Traces tab can show recent app
events, show JSONL trace rows, report whether optional Trackio is available, and export local
traces to `exports/traces.jsonl`. Remote Trackio/HF sync still requires credentials and setup.

## 8. Hugging Face Space Deployment

The repo includes Hugging Face Space metadata in `README.md` and a local planning helper:

```powershell
.venv\Scripts\python.exe scripts\plan_hf_space.py --user <hf-user-or-org>
```

The helper validates required files and prints the manual commands for login, Space creation,
remote setup, and push. It does not login, create a repo, push, or store tokens.

Required Space files:

- `app.py`
- `requirements.txt`
- `README.md`
- `config/models.yaml`
- `config/training.yaml`

Target Spaces:

- Workbench: `https://huggingface.co/spaces/build-small-hackathon/workbench`
- Plant Identification Tool: `https://huggingface.co/spaces/build-small-hackathon/plant_identification_tool`

Use `hf auth login` with a freshly generated token. The app does not download model weights on
startup; model downloads happen only through explicit backend actions such as `ollama pull`,
`hf download`, or selecting a real Transformers backend and running a prompt. Upgrade Space
hardware if real OpenBMB text or vision inference cannot run on the default CPU tier.

Trackio/HF sync path: local traces are written to JSONL first, then optional Trackio can be enabled
in `config/training.yaml` after package availability and credentials are ready.

Synthetic data and reward evaluation helpers are local Python utilities. `datasets/synthetic.py`
can generate, validate, filter, augment, and export JSONL examples. `training/reward_eval.py` can
score supplied responses, select best-of-N candidates, create DPO chosen/rejected pairs, and compare
base responses against LoRA responses. They do not load reward models or call external services.

## 9. How To Work With Codex

Useful prompts:

```text
Read docs/TASKS.md and implement the next unchecked MVP task.
```

```text
Read docs/IMPLEMENTATION_STATUS.md and tell me what is blocked.
```

```text
Wire the next backend, but keep automatic model downloads disabled on startup.
```

```text
Update the docs after the change and mark the matching checklist item.
```

```text
If this failed, add or update a test that catches it, then fix the code.
```

## 10. What To Avoid

- Do not start with the whole PRD at once.
- Do not download huge models automatically.
- Do not mark tasks done before running or documenting the blocker.
- Do not add features without tests.
- Do not add broad coverage escapes such as pragma no cover without a documented reason.
- Do not push secrets, tokens, model caches, GGUF files, or virtual environments to git.