# Template How-To: Build A New Domain App

This repository is a local-first Gradio AI app template. The base workbench provides shared
patterns for model configuration, field notes, tracking, export planning, tests, docs, and
deployment. A domain app is a focused product built around those patterns.

Use `plant/` as the first reference domain app.

## Core Principle

Do not start by training a model. Start by shipping a useful zero-shot or demo-mode workflow:

```text
domain idea
  -> user story
  -> schema
  -> model choice
  -> focused UI
  -> correction loop
  -> export data
  -> optional fine-tune
  -> deploy and document
```

Training is a later optimization after you have corrected examples and a reason to tune.

## Recommended Branch Flow

1. Keep `main` as the reusable template.
2. Create a branch for each app:

   ```powershell
   git checkout -b plant-discovery-app
   ```

3. Build the app under a domain folder such as `plant/`, `invoice/`, `recipe/`, or `field_notes/`.
4. Keep domain-specific heavy requirements in `<domain>/requirements.txt`.
5. Merge reusable improvements back into `main` only after they are generic.

## Domain App File Contract

Each generated app should have these files:

```text
<domain>/
  __init__.py
  app.py              # standalone Gradio entrypoint
  models.yaml         # domain config, model IDs, data sources, training defaults
  <domain>_service.py # optional real model adapter plus demo/no-model fallback
  <domain>_loader.py  # data loading, schema normalization, export rows
  <domain>_tab.py     # focused Gradio UI
  <domain>_tools.py   # optional MCP/local tools with no hard optional imports
  requirements.txt    # optional heavy dependencies for this app only
```

Add tests under:

```text
tests/unit/test_<domain>_reference_app.py
```

Add docs under:

```text
docs/<DOMAIN>_APP_PLAN.md
```

## Step-By-Step Build Process

### 1. Define The Product

- [ ] Pick one user.
- [ ] Pick one job they need done.
- [ ] Write one sentence: "This app helps X do Y without Z."
- [ ] Choose one golden path that works in under two minutes.
- [ ] Decide whether the app is a standalone product or a tab inside the workbench.
- [ ] Decide whether it must run on a public Hugging Face Space.

Example:

> Plant Discovery helps gardeners identify a plant from a photo, correct mistakes, and export
> local training examples without sending private field notes to a cloud API.

### 2. Define The Domain Schema

- [ ] Create a dataclass for the structured output.
- [ ] Include confidence and model metadata.
- [ ] Include a `to_dict()` method for Gradio JSON.
- [ ] Add a robust parser for model responses.
- [ ] Add tests for valid JSON, fenced JSON, trailing commas, and unparseable text.

Plant example: `PlantID` in `plant/plant_service.py`.
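
The checklist above could be sketched roughly as follows. This is a minimal illustration, not the actual `PlantID` implementation: `DomainResult`, its fields, and `parse_response` are hypothetical names, and the trailing-comma repair is deliberately naive.

```python
import json
import re
from dataclasses import asdict, dataclass

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


@dataclass
class DomainResult:
    """Hypothetical structured output; the field names are illustrative."""

    label: str
    confidence: float
    model_id: str
    parse_ok: bool = True

    def to_dict(self) -> dict:
        # Plain dict so a Gradio JSON component can display it.
        return asdict(self)


_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
# Naive repair: this would also touch a ",}" sequence inside a string value.
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def parse_response(text: str, model_id: str) -> DomainResult:
    """Parse a model reply, tolerating fences, trailing commas, and junk text."""
    fenced = _FENCE_RE.search(text)
    candidate = fenced.group(1) if fenced else text
    # Keep only the outermost {...} span, if there is one.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return DomainResult("unknown", 0.0, model_id, parse_ok=False)
    raw = _TRAILING_COMMA_RE.sub(r"\1", candidate[start : end + 1])
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return DomainResult("unknown", 0.0, model_id, parse_ok=False)
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return DomainResult(str(data.get("label", "unknown")), confidence, model_id)
```

Returning a result with `parse_ok=False` instead of raising keeps the UI callback simple: the app can show a status message and still offer the correction form.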

### 3. Pick The Model

- [ ] Pick a small model at or below 32B parameters.
- [ ] Document the exact model ID.
- [ ] Add model metadata to `<domain>/models.yaml`.
- [ ] Avoid loading weights on startup.
- [ ] Add a deterministic demo/no-model service for screenshots and tests.
- [ ] Add an unavailable-path response when optional packages are missing.
- [ ] Add explicit runtime modes such as `demo`, `base-model`, and `finetuned`.
- [ ] Do not claim a fine-tuned model until a real adapter/checkpoint is configured and verified.

For vision apps, start with a VLM such as MiniCPM-V. For text apps, start with a small instruct
model through LM Studio, Ollama, llama.cpp, or Transformers.
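
One possible shape for a service that honors these points is sketched below. `DomainService`, `loader`, and the response keys are assumptions made for illustration; the real adapter in `plant/` may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

RUNTIME_MODES = ("demo", "base-model", "finetuned")


@dataclass
class DomainService:
    """Hypothetical service with explicit runtime modes and lazy loading."""

    mode: str = "demo"
    # Hypothetical hook: a function that imports heavy packages and returns
    # a callable model. It is not called until the first real prediction.
    loader: Optional[Callable[[], Any]] = None
    _model: Any = field(default=None, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.mode not in RUNTIME_MODES:
            raise ValueError(f"unknown mode: {self.mode}")

    def predict(self, prompt: str) -> dict:
        if self.mode == "demo":
            # Deterministic output for screenshots and tests.
            return {"status": "ok", "mode": "demo", "text": f"demo answer for: {prompt}"}
        if self._model is None:
            if self.loader is None:
                return {"status": "unavailable", "mode": self.mode,
                        "text": "no model loader configured"}
            try:
                self._model = self.loader()  # weights load on first use, not at startup
            except ImportError as exc:
                return {"status": "unavailable", "mode": self.mode,
                        "text": f"missing optional package: {exc}"}
        return {"status": "ok", "mode": self.mode, "text": self._model(prompt)}
```

Because the mode is carried in every response, the UI can display it and never imply that a fine-tuned model is running when it is not.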

### 4. Build The Focused UI

- [ ] Make the first screen the golden path, not a generic dashboard.
- [ ] Add only the controls needed for the user story.
- [ ] Keep advanced setup behind a secondary tab or accordion.
- [ ] Add visible status messages.
- [ ] Add structured JSON output for debugging and reproducibility.
- [ ] Add correction capture if model output can be wrong.
- [ ] Add screenshots through Playwright after the UI is stable.
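
As a rough sketch of such a screen, assuming Gradio is installed: `run_golden_path` and `build_demo_ui` are hypothetical names, and the demo result is a placeholder for a real service call.

```python
def run_golden_path(note: str) -> tuple[str, dict]:
    """Callback kept free of Gradio so it can be unit-tested directly."""
    note = note.strip()
    if not note:
        return "Enter a note first.", {}
    # Hypothetical demo-mode result; a real app would call its service here.
    result = {"label": "demo", "confidence": 0.5, "input": note}
    return "Done (demo mode).", result


def build_demo_ui():
    import gradio as gr  # imported lazily so callback tests do not need Gradio

    with gr.Blocks(title="Domain App") as demo:
        # Golden path first: one input, one button, visible status, JSON output.
        note = gr.Textbox(label="Field note")
        run = gr.Button("Analyze")
        status = gr.Markdown()
        output = gr.JSON(label="Structured output")
        # Advanced setup stays collapsed behind an accordion.
        with gr.Accordion("Advanced setup", open=False):
            gr.Markdown("Model and backend settings go here.")
        run.click(run_golden_path, inputs=note, outputs=[status, output])
    return demo
```

Keeping the callback as a plain function means the golden path can be tested without launching the app.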

### 5. Add The Correction Loop

- [ ] Save user corrections locally.
- [ ] Reuse `datasets.field_notes.FieldNoteStore` where possible.
- [ ] Mark training-ready rows explicitly.
- [ ] Export JSONL without starting training.
- [ ] Add tests for save, filter, and export.
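
The save, filter, and export steps might look like the sketch below. It uses a plain JSONL file as a stand-in and does not reflect the actual `FieldNoteStore` API; all function names and row keys here are hypothetical.

```python
import json
from pathlib import Path


def save_correction(path: Path, predicted: str, corrected: str,
                    training_ready: bool = False) -> dict:
    """Append one correction to a local JSONL file and return the stored row."""
    row = {"predicted": predicted, "corrected": corrected,
           "training_ready": training_ready}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return row


def load_corrections(path: Path, only_training_ready: bool = False) -> list[dict]:
    """Read stored corrections, optionally keeping only training-ready rows."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    if only_training_ready:
        return [row for row in rows if row["training_ready"]]
    return rows


def export_training_jsonl(source: Path, target: Path) -> int:
    """Write training-ready rows to target. This only writes a file."""
    rows = load_corrections(source, only_training_ready=True)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        for row in rows:
            example = {"input": row["predicted"], "label": row["corrected"]}
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")
    return len(rows)
```

The export function returns a row count and nothing else, which makes it easy to assert in tests that exporting never triggers training.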

### 6. Add Data Loaders

- [ ] Support a small local demo dataset.
- [ ] Support domain data from local folders or CSV/JSONL.
- [ ] Keep Hugging Face dataset loading optional and explicit.
- [ ] Do not download large datasets on startup.
- [ ] Normalize every source into one training row schema.
- [ ] Add loader tests with temporary local files.
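
A loader along these lines could normalize local CSV and JSONL files into one row schema. The column names (`text`, `note`, `label`, `species`) and the output keys are assumptions chosen for illustration, not the schema used in `plant/`.

```python
import csv
import json
from pathlib import Path
from typing import Iterator


def normalize_row(raw: dict, source: str) -> dict:
    """Map one raw record onto a single training row schema (hypothetical keys)."""
    text = raw.get("text") or raw.get("note") or ""
    label = raw.get("label") or raw.get("species") or ""
    return {"text": str(text).strip(), "label": str(label).strip(), "source": source}


def load_rows(path: Path) -> Iterator[dict]:
    """Yield normalized rows from a local CSV or JSONL file. No network access."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            for raw in csv.DictReader(handle):
                yield normalize_row(raw, path.name)
    elif suffix == ".jsonl":
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    yield normalize_row(json.loads(line), path.name)
    else:
        raise ValueError(f"unsupported file type: {path.suffix}")
```

Since `load_rows` is a generator, nothing is read until a caller iterates over it, which fits the rule against loading data on startup.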

### 7. Add Optional Tools

- [ ] Keep MCP/tool imports optional.
- [ ] Make tool functions work locally without starting a server.
- [ ] Expose `build_mcp_server()` only when the `mcp` package is importable.
- [ ] Avoid direct shell execution from tools.
- [ ] Return command plans rather than running commands.
- [ ] Add tests for pure tool functions.
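
The following sketch shows one way to keep the MCP import optional while the tool itself stays a pure function. `plan_export_command` and the `<domain>.export` module it names are hypothetical, and the `FastMCP` import assumes the layout of the official MCP Python SDK.

```python
import importlib.util


def mcp_available() -> bool:
    """True when the optional `mcp` package can be found."""
    return importlib.util.find_spec("mcp") is not None


def plan_export_command(domain: str, output_path: str) -> dict:
    """Pure tool function: return a command plan instead of running anything."""
    if not domain.isidentifier():
        raise ValueError(f"invalid domain name: {domain!r}")
    return {
        "executes": False,
        # Hypothetical module name; shown as an argv list, never passed to a shell.
        "argv": ["python", "-m", f"{domain}.export", "--out", output_path],
        "note": "Review and run manually.",
    }


def build_mcp_server():
    """Build a server only when `mcp` is installed; otherwise return None."""
    if not mcp_available():
        return None
    # Assumption: the official MCP Python SDK, which provides FastMCP here.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("domain-tools")
    server.tool()(plan_export_command)
    return server
```

Tests can then target `plan_export_command` directly, with no server process and no optional dependency installed.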

### 8. Add Training Plans

- [ ] Start with a non-executing training plan.
- [ ] Include required dependencies, hardware notes, and command preview.
- [ ] Require a configured minimum number of corrected examples before recommending training.
- [ ] Keep real training as a separate local command or approved action.
- [ ] Add evaluation before/after tuning.
- [ ] Add a small script that prints the training plan as JSON.
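
A non-executing plan could be as small as the sketch below. The threshold of 50 examples, the dependency list, and the `train.py` script are illustrative assumptions, not values taken from this repository.

```python
import json


def build_training_plan(corrected_examples: int, base_model: str,
                        min_examples: int = 50) -> dict:
    """Describe a fine-tuning run without executing it. Values are illustrative."""
    ready = corrected_examples >= min_examples
    if ready:
        reason = "enough corrected examples"
    else:
        reason = f"need {min_examples - corrected_examples} more corrected examples"
    return {
        "executes": False,
        "recommended": ready,
        "reason": reason,
        "base_model": base_model,
        "dependencies": ["torch", "transformers", "peft"],  # assumed stack
        "hardware_note": "Check GPU memory against the chosen model before running.",
        # Hypothetical script name; this is a preview, not something the app runs.
        "command_preview": ["python", "train.py", "--base-model", base_model],
        "evaluation": [
            "score the base model on held-out rows",
            "score the tuned model on the same rows",
        ],
    }


if __name__ == "__main__":
    # Print the plan as JSON, as the checklist suggests.
    print(json.dumps(build_training_plan(12, "example-org/example-model"), indent=2))
```

With 12 corrected examples and the default threshold, the plan reports `recommended: false` and states how many more examples are needed.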

### 9. Add Security Guardrails

- [ ] Escape model text rendered as HTML.
- [ ] Restrict file paths in public Space mode.
- [ ] Disable arbitrary backend URL checks in public Space mode.
- [ ] Do not execute subprocesses from Gradio callbacks.
- [ ] Keep tokens, private data, model weights, and exports out of git.
- [ ] Add tests for path traversal and malformed inputs when public deployment is planned.
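
Two of these guardrails, HTML escaping and path restriction, can be sketched with the standard library alone. `render_model_text` and `resolve_inside` are hypothetical helper names; `Path.is_relative_to` requires Python 3.9 or later.

```python
import html
from pathlib import Path


def render_model_text(text: str) -> str:
    """Escape model output before embedding it in HTML."""
    return f"<p>{html.escape(text)}</p>"


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = base_dir.resolve()
    # An absolute user_path replaces base in the join, so it is also checked here.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes allowed directory: {user_path!r}")
    return candidate
```

Resolving before the containment check matters: comparing unresolved strings would let `a/../../secret.txt` slip through.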

### 10. Verify The App

Minimum local verification:

```powershell
.venv\Scripts\python.exe -m pytest tests/unit/test_<domain>_reference_app.py -q
.venv\Scripts\ruff.exe check <domain> tests/unit/test_<domain>_reference_app.py --no-cache
.venv\Scripts\python.exe -m mypy <domain> tests/unit/test_<domain>_reference_app.py --cache-dir "$env:TEMP\openbmb-workbench-mypy-cache"
.venv\Scripts\python.exe -c "from <domain>.app import build_app; app=build_app(no_model=True); print(type(app).__name__)"
```

Before claiming it works:

- [ ] Run the standalone app.
- [ ] Generate screenshots.
- [ ] Add screenshot links to README/docs.
- [ ] Run full quality checks.
- [ ] Commit and push.

## When To Integrate Into The Main Workbench

Keep the domain app standalone if:

- it has its own brand/story,
- it needs a focused judging experience,
- it has domain-specific dependencies,
- it should become a Hugging Face Space.

Add it to the main workbench only if:

- it is a generic reusable tab,
- it does not add heavy dependencies,
- it strengthens the template for all future apps.

For the hackathon, standalone `plant/` is the better route because judges need one clear product.

## What "Done" Means For A Domain App

- [ ] Standalone no-model app builds.
- [ ] Optional real model adapter is documented and lazy-loaded.
- [ ] Golden path has tests.
- [ ] Corrections export to training data.
- [ ] Training is planned, not accidentally executed.
- [ ] Screenshots are generated.
- [ ] README explains setup, model choice, demo flow, and limitations.
- [ ] Space deployment is verified, or the blocker is documented.