# Deployment Setup: Modal → HF Space

This guide walks through deploying the three Modal model services and wiring
their endpoint URLs into the public Hugging Face Space as secrets.

---

## What gets deployed

| Service | Model | Env var | App name |
|---------|-------|---------|----------|
| Receipt OCR | MiniCPM-V 2.6 | `MODAL_RECEIPT_ENDPOINT` | `dukaan-saathi-receipt-vlm` |
| Speech ASR | Distil-Whisper small | `MODAL_SPEECH_ENDPOINT` | `dukaan-saathi-speech-asr` |
| Voice NLU | Qwen2.5-1.5B-Instruct | `MODAL_NLU_ENDPOINT` | `dukaan-saathi-command-nlu` |

All three are optional: the app falls back to deterministic parsers when any
endpoint is missing. Deploy whichever you want active on the Space.
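
To see which services would be active and which would use the fallback, a quick check of the three variables can help. This is a minimal sketch that only tests whether each value is non-empty in the current shell; `report_endpoint` is a hypothetical helper, not part of the project.

```bash
# report_endpoint NAME VALUE: say whether an endpoint variable has a value.
# Hypothetical helper for illustration only.
report_endpoint() {
  if [ -n "$2" ]; then
    echo "$1: set"
  else
    echo "$1: unset (deterministic fallback)"
  fi
}

# ${VAR:-} expands to an empty string when VAR is not defined.
report_endpoint MODAL_RECEIPT_ENDPOINT "${MODAL_RECEIPT_ENDPOINT:-}"
report_endpoint MODAL_SPEECH_ENDPOINT  "${MODAL_SPEECH_ENDPOINT:-}"
report_endpoint MODAL_NLU_ENDPOINT     "${MODAL_NLU_ENDPOINT:-}"
```

Run it after `source scripts/_env.sh` so that values from your local `.env` are visible to the shell.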

---

## Part 1: Prerequisites

### 1.1 Modal account and CLI

Create a free Modal account at https://modal.com if you do not have one.

Install the Modal CLI and log in:

```bash
uv add modal                  # adds to this project's venv
uv run modal setup            # opens a browser to authenticate
```

After `modal setup` completes, verify you are logged in:

```bash
uv run modal token show
```

You should see your workspace name (e.g., `zappandy`).

### 1.2 Hugging Face account

You need a Hugging Face account with write access to the Space at
`https://huggingface.co/spaces/Zappandy/Kirana_AI`. If this is your Space
you already have access.

---

## Part 2: Deploy Modal services

Run each deploy command from the project root. Each command:

1. Deploys (or re-deploys) the Modal app
2. Fetches the generated endpoint URL from Modal
3. Writes it to your local `.env` file
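
Step 3 amounts to an idempotent update of one `KEY=VALUE` line. The project's own scripts handle this for you. The sketch below only illustrates the idea, and `upsert_env` is a hypothetical name; `scripts/modal_deploy.sh` may work differently.

```bash
# upsert_env FILE KEY VALUE: replace the KEY= line if present, else append one.
# Hypothetical helper for illustration; not the project's implementation.
upsert_env() {
  local file=$1 key=$2 value=$3 tmp
  touch "$file"
  tmp=$(mktemp)
  # Keep every line that does not define KEY. grep exits non-zero when it
  # selects nothing, which is fine here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  # Add the new definition at the end of the file.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, `upsert_env .env MODAL_NLU_ENDPOINT "$url"` leaves exactly one `MODAL_NLU_ENDPOINT=` line no matter how often it runs, which is why re-deploying does not pile up duplicates.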

### 2.1 Receipt image OCR (MiniCPM-V)

```bash
scripts/modal_deploy.sh modal_apps/receipt_vlm_service.py
```

When done, `.env` will contain:

```text
MODAL_RECEIPT_ENDPOINT=https://<workspace>--dukaan-saathi-receipt-vlm-api.modal.run/extract
```

Verify it is responding (replace with your actual URL from `.env`):

```bash
source scripts/_env.sh
curl "${MODAL_RECEIPT_ENDPOINT%/extract}/health"
```

Expected response:

```json
{"status": "ok", "model": "openbmb/MiniCPM-V-2_6"}
```

First call may take 30–60 seconds while the GPU container starts. Subsequent
calls within the `scaledown_window` are fast.
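
Because the first call can fail or time out during a cold start, it can help to wrap the health check in a small retry loop. This is a sketch under the assumption that a failed command simply needs more time; `retry` is a hypothetical helper, not part of the project scripts.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds or
# ATTEMPTS runs have failed. Hypothetical helper for illustration.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A possible invocation is `retry 6 15 curl -fsS --max-time 90 "${MODAL_RECEIPT_ENDPOINT%/extract}/health"`, where the attempt count and delay are arbitrary starting values.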

### 2.2 Speech transcription (Distil-Whisper)

```bash
scripts/modal_deploy.sh modal_apps/speech_asr_service.py
```

When done, `.env` will contain:

```text
MODAL_SPEECH_ENDPOINT=https://<workspace>--speech-transcribe.modal.run
```

Verify:

```bash
source scripts/_env.sh
SPEECH_HEALTH="${MODAL_SPEECH_ENDPOINT/speech-transcribe/speech-health}"
curl "$SPEECH_HEALTH"
```

Expected:

```json
{"status": "ok", "model": "distil-whisper/distil-small.en"}
```

### 2.3 Voice command NLU (Qwen2.5-1.5B-Instruct)

```bash
scripts/modal_deploy.sh modal_apps/command_nlu_service.py
```

When done, `.env` will contain:

```text
MODAL_NLU_ENDPOINT=https://<workspace>--nlu-extract.modal.run
```

Verify:

```bash
source scripts/_env.sh
curl -s -X POST "$MODAL_NLU_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"command": "add Bun 12"}' | python3 -m json.tool
```

Expected:

```json
{
  "intent": "add_stock",
  "product_name": "Bun",
  "quantity": 12,
  "unit": null,
  "confidence": "high",
  "model": "Qwen/Qwen2.5-1.5B-Instruct"
}
```

---

## Part 3: Add secrets to the HF Space

The HF Space container does not read your local `.env` file. You must add each
endpoint URL as a Space secret through the Hugging Face web UI.

### 3.1 Open Space settings

1. Go to https://huggingface.co/spaces/Zappandy/Kirana_AI
2. Click the **Settings** tab (top of the Space page)
3. Scroll down to **Variables and secrets**

### 3.2 Add each secret

Click **New secret** for each of the following. Use the exact variable names
below; the app reads these from the environment at runtime.

| Secret name | Value |
|-------------|-------|
| `MODAL_RECEIPT_ENDPOINT` | the URL written to `.env` in step 2.1 |
| `MODAL_SPEECH_ENDPOINT` | the URL written to `.env` in step 2.2 |
| `MODAL_NLU_ENDPOINT` | the URL written to `.env` in step 2.3 |
| `HF_TOKEN` | your HF write token (only needed if `HF_RECEIPT_MODEL_REPO` is private) |
| `HF_RECEIPT_MODEL_REPO` | e.g. `Zappandy/dukaan-saathi-receipt-lora` |

Secrets are encrypted and visible only to the Space runtime, not to other
users or in the Space logs.
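
Before pasting values into the Space, it may be worth checking that each endpoint in `.env` has the expected shape. The sketch below assumes unquoted `KEY=VALUE` lines and URLs on the `modal.run` domain, as in the examples in Part 2; `check_endpoints` is a hypothetical helper.

```bash
# check_endpoints FILE: list MODAL_*_ENDPOINT entries and flag values that do
# not look like https://...modal.run URLs. Hypothetical helper for illustration.
check_endpoints() {
  local line name value status=0
  while IFS= read -r line; do
    # The here-document yields one empty line when grep finds nothing.
    [ -n "$line" ] || continue
    name=${line%%=*}    # text before the first "="
    value=${line#*=}    # text after the first "="
    case "$value" in
      https://*.modal.run|https://*.modal.run/*) echo "ok   $name" ;;
      *) echo "BAD  $name=$value"; status=1 ;;
    esac
  done <<EOF
$(grep '^MODAL_[A-Z]*_ENDPOINT=' "$1")
EOF
  return "$status"
}
```

Running `check_endpoints .env` prints one line per endpoint and returns non-zero if any value looks wrong, so it can gate a manual checklist or a script.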

**Do not add** `DB_PATH` unless you have enabled persistent storage on the
Space. Without persistent storage, leave it unset and the DB stays
runtime-local (resets on restart).

### 3.3 Restart the Space

After adding secrets, click **Factory reset** or wait for the Space to rebuild
on its own. The new environment variables take effect on the next container
start.

To force an immediate rebuild, push any change to the Space remote:

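```bash
# Assumes a git remote named "space" that points at the HF Space repository.
git checkout --orphan _hf_tmp        # temporary branch with no history
git add -A
git commit -m "trigger rebuild"
git push space HEAD:main --force     # overwrites the Space's main branch
git checkout main
git branch -D _hf_tmp                # remove the temporary branch
```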

---

## Part 4: Verify end-to-end on the Space

After the Space rebuilds:

1. Open the Space URL and wait for the app to finish loading
2. Go to the **Voice** tab → type `add Bun 12` → click **Parse for approval**
   - The agent reasoning panel should show NLU steps if `MODAL_NLU_ENDPOINT` is set
3. Go to **Bill Desk** → upload a receipt photo
   - Cold start message appears while MiniCPM-V loads (~30 s first time)
   - Editable rows appear after extraction
4. Go to **Voice** → click **Transcribe with Modal** and upload a `.wav` file
   - Transcript fills in automatically

If any Modal service times out or returns an error, the app falls back to the
deterministic parser and shows a trace message explaining the fallback.

---

## Part 5: Managing costs

Modal bills for compute only while a container is running. Each service sets
`scaledown_window=300` (5 minutes): after 5 minutes of inactivity the container
stops and you stop being charged.
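
As a rough illustration of what the idle window means for cost, a single isolated request keeps its container alive for the request itself plus up to 300 seconds of idle time. The sketch below ignores cold-start time, and the per-second rate is a placeholder, not Modal's actual price.

```bash
# billed_seconds REQUEST_SECONDS [SCALEDOWN_WINDOW]: upper bound on container
# lifetime for one isolated request. Hypothetical helper; ignores cold start.
billed_seconds() {
  echo $(( $1 + ${2:-300} ))
}

# Placeholder rate in USD per second; check Modal's pricing page for real values.
rate=0.0002
secs=$(billed_seconds 10)
awk -v s="$secs" -v r="$rate" \
  'BEGIN { printf "%d s billed, about %.3f USD\n", s, s * r }'
```

Under these assumptions, a burst of requests costs little more than one request, because every request inside the window reuses the same warm container.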

To stop all services immediately:

```bash
uv run modal app stop dukaan-saathi-receipt-vlm || true
uv run modal app stop dukaan-saathi-speech-asr  || true
uv run modal app stop dukaan-saathi-command-nlu || true
uv run modal app list
```

Look for `Tasks 0` in the output to confirm containers are stopped.

To redeploy after stopping (same commands as Part 2):

```bash
scripts/modal_deploy.sh modal_apps/receipt_vlm_service.py
scripts/modal_deploy.sh modal_apps/speech_asr_service.py
scripts/modal_deploy.sh modal_apps/command_nlu_service.py
```

The endpoint URLs do not change between deploys, so no HF Space secret update
is needed unless you deploy under a different workspace.

---

## Troubleshooting

**`modal setup` hangs or fails**
Run `uv run modal token show`. If it shows no token, re-run `uv run modal setup`
and complete the browser authentication flow.

**Deploy command fails with "app not found"**
Check you are in the project root (`ls modal_apps/` should list the service
files) and that `uv` has Modal installed (`uv run modal --version`).

**HF Space shows no NLU trace after rebuild**
Confirm the secret name is exactly `MODAL_NLU_ENDPOINT` (no spaces, correct
case). Check the Space logs (Settings → Logs) for any startup errors.

**`curl` health check returns 502 or times out**
The container is cold-starting. Wait 30–60 seconds and retry. Modal T4 GPU
containers take longer on first cold start because the model weights are
downloaded into the container volume.

**Endpoint URL has extra `/extract` suffix**
The `write_modal_endpoint.py` script derives the URL from the deployed function.
If it appends a route that the endpoint doesn't use, edit `.env` and the HF
Space secret to remove the suffix. Test the corrected URL with `curl` before
updating the secret.
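
Removing the suffix can be done with plain shell parameter expansion. This is a minimal sketch; `strip_suffix` is a hypothetical helper, and whether a given endpoint should keep or drop `/extract` depends on the deployed service.

```bash
# strip_suffix URL SUFFIX: print URL without a trailing SUFFIX.
# The URL is printed unchanged when it does not end in SUFFIX.
strip_suffix() {
  # Quoting "$2" makes the shell treat the suffix literally, not as a pattern.
  printf '%s\n' "${1%"$2"}"
}
```

For example, `strip_suffix "$MODAL_NLU_ENDPOINT" /extract` prints the corrected URL, which you can then test with the `curl` command from Part 2 before changing `.env` or the Space secret.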