Deployment Setup: Modal → HF Space

This guide walks through deploying the three Modal model services and wiring their endpoint URLs into the public Hugging Face Space as secrets.

What gets deployed

Service	Model	Env var	App name
Receipt OCR	MiniCPM-V 2.6	`MODAL_RECEIPT_ENDPOINT`	`dukaan-saathi-receipt-vlm`
Speech ASR	Distil-Whisper small	`MODAL_SPEECH_ENDPOINT`	`dukaan-saathi-speech-asr`
Voice NLU	Qwen2.5-1.5B-Instruct	`MODAL_NLU_ENDPOINT`	`dukaan-saathi-command-nlu`

All three are optional — the app falls back to deterministic parsers when any endpoint is missing. Deploy whichever you want active on the Space.
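Before deploying, it can help to see which endpoints the current environment already defines. The following is a minimal sketch, not part of the repository: `endpoint_status` is a hypothetical helper, and it only reports whether each variable is set. It does not reproduce the app's own fallback logic.

```shell
# Hypothetical helper (not in the repo): report which Modal endpoints
# are configured in the current shell environment.
endpoint_status() {
  for name in MODAL_RECEIPT_ENDPOINT MODAL_SPEECH_ENDPOINT MODAL_NLU_ENDPOINT; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name: set"
    else
      echo "$name: unset, deterministic parser will be used"
    fi
  done
}
```

Load your variables first (for example with `source scripts/_env.sh`), then call `endpoint_status`.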

Part 1 — Prerequisites

1.1 Modal account and CLI

Create a free Modal account at https://modal.com if you do not have one.

Install the Modal CLI and log in:

uv add modal                  # adds to this project's venv
uv run modal setup            # opens a browser to authenticate

After modal setup completes, verify you are logged in:

uv run modal token show

You should see your workspace name (e.g., zappandy).

1.2 Hugging Face account

You need a Hugging Face account with write access to the Space at https://huggingface.co/spaces/Zappandy/Kirana_AI. If this is your Space you already have access.

Part 2 — Deploy Modal services

Run each deploy command from the project root. Each command does three things:

1. Deploys (or re-deploys) the Modal app
2. Fetches the generated endpoint URL from Modal
3. Writes it to your local .env file
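The last step, writing the URL into .env, can be pictured as an idempotent "replace or append" on a KEY=VALUE file. The sketch below is an illustration only: `upsert_env` is a hypothetical function, and the actual scripts/modal_deploy.sh may work differently.

```shell
# Hypothetical sketch (not the repo's script): set KEY=VALUE in an env file,
# replacing an existing line for KEY instead of appending a duplicate.
upsert_env() {
  key="$1"
  value="$2"
  file="${3:-.env}"
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing assignment of this key.
  # grep exits non-zero when nothing is kept, hence the "|| true".
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

Calling it twice with the same key leaves a single line holding the latest value, which is why re-deploying would not pile up stale entries under this scheme.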

2.1 Receipt image OCR (MiniCPM-V)

scripts/modal_deploy.sh modal_apps/receipt_vlm_service.py

When done, .env will contain:

MODAL_RECEIPT_ENDPOINT=https://<workspace>--dukaan-saathi-receipt-vlm-api.modal.run/extract

Verify that it is responding. The commands below assume scripts/_env.sh exports the values from .env; if it does not, substitute your actual URL:

source scripts/_env.sh
curl "${MODAL_RECEIPT_ENDPOINT%/extract}/health"

Expected response:

{"status": "ok", "model": "openbmb/MiniCPM-V-2_6"}

First call may take 30–60 seconds while the GPU container starts. Subsequent calls within the scaledown_window are fast.
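Because of the cold start, a single curl can fail even when the deployment is fine. A small retry loop avoids re-running the command by hand. This is a sketch under assumptions: `wait_for_health` is a hypothetical helper, and the default of 6 attempts with 10 seconds between them is an arbitrary choice, not a value taken from the project.

```shell
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  url="$1"
  attempts="${2:-6}"   # assumed default: 6 tries
  delay="${3:-10}"     # assumed default: 10 s between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors such as 502 during a cold start.
    if curl -fsS --max-time 60 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "no healthy response after $attempts attempt(s)" >&2
  return 1
}
```

For the receipt service this could be called as `wait_for_health "${MODAL_RECEIPT_ENDPOINT%/extract}/health"`.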

2.2 Speech transcription (Distil-Whisper)

scripts/modal_deploy.sh modal_apps/speech_asr_service.py

When done, .env will contain:

MODAL_SPEECH_ENDPOINT=https://<workspace>--speech-transcribe.modal.run

Verify:

source scripts/_env.sh
SPEECH_HEALTH="${MODAL_SPEECH_ENDPOINT/speech-transcribe/speech-health}"
curl "$SPEECH_HEALTH"

Expected:

{"status": "ok", "model": "distil-whisper/distil-small.en"}
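The two health checks derive their URL differently: the receipt service swaps a trailing /extract path for /health, while the speech service swaps the function name inside the hostname. The sketch below wraps both derivations in functions so the string handling can be tried without network access. The function names are hypothetical, and `sed` stands in for the bash-only `${VAR/a/b}` substitution so that the code also runs under plain sh.

```shell
# Hypothetical helpers mirroring the URL derivations in sections 2.1 and 2.2.

receipt_health_url() {
  # ${1%/extract} removes a trailing "/extract" if present.
  printf '%s/health\n' "${1%/extract}"
}

speech_health_url() {
  # Replace the first "speech-transcribe" with "speech-health".
  printf '%s\n' "$1" | sed 's/speech-transcribe/speech-health/'
}
```

For example, `speech_health_url "https://<workspace>--speech-transcribe.modal.run"` prints the same URL with `speech-health` in place of `speech-transcribe`.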

2.3 Voice command NLU (Qwen2.5-1.5B-Instruct)

scripts/modal_deploy.sh modal_apps/command_nlu_service.py

When done, .env will contain:

MODAL_NLU_ENDPOINT=https://<workspace>--nlu-extract.modal.run

Verify:

source scripts/_env.sh
curl -s -X POST "$MODAL_NLU_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '{"command": "add Bun 12"}' | python3 -m json.tool

Expected:

{
  "intent": "add_stock",
  "product_name": "Bun",
  "quantity": 12,
  "unit": null,
  "confidence": "high",
  "model": "Qwen/Qwen2.5-1.5B-Instruct"
}
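To turn this check into a quick pass/fail smoke test, you can pull a single field out of the response and compare it. The sketch below uses a deliberately crude `sed` pattern that only handles flat responses with string values like the one shown above; `nlu_field` is a hypothetical helper, and a real JSON parser is the more robust choice for anything beyond a smoke test.

```shell
# Hypothetical helper: print the string value of one top-level field.
# Reads JSON from stdin; $1 is the field name. Regex-based, not a JSON parser.
nlu_field() {
  sed -n 's/.*"'"$1"'" *: *"\([^"]*\)".*/\1/p'
}
```

Piping the curl output from the verification step into `nlu_field intent` should print `add_stock` when the service returns the expected response.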

Part 3 — Add secrets to the HF Space

The HF Space container does not read your local .env file. You must add each endpoint URL as a Space secret through the Hugging Face web UI.

3.1 Open Space settings

1. Go to https://huggingface.co/spaces/Zappandy/Kirana_AI
2. Click the Settings tab (top of the Space page)
3. Scroll down to Variables and secrets

3.2 Add each secret

Click New secret for each of the following. Use the exact variable names below — the app reads these from the environment at runtime.

Secret name	Value
`MODAL_RECEIPT_ENDPOINT`	the URL written to `.env` in step 2.1
`MODAL_SPEECH_ENDPOINT`	the URL written to `.env` in step 2.2
`MODAL_NLU_ENDPOINT`	the URL written to `.env` in step 2.3
`HF_TOKEN`	your HF write token (only needed if `HF_RECEIPT_MODEL_REPO` is private)
`HF_RECEIPT_MODEL_REPO`	e.g. `Zappandy/dukaan-saathi-receipt-lora`

Secrets are encrypted and only visible to the Space runtime — not to other users or in the Space logs.

Do not add DB_PATH unless you have enabled persistent storage on the Space. Without persistent storage, leave it unset and the DB stays runtime-local (resets on restart).
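When copying values into the web UI, it is easy to pick up a stray entry such as DB_PATH or to mistype a name. One way to reduce that risk is to print only the three endpoint lines from .env and copy from that output. This is a sketch: `list_space_secrets` is a hypothetical helper, and it assumes .env holds plain KEY=VALUE lines.

```shell
# Hypothetical helper: show only the Modal endpoint entries of an env file,
# so that other variables (for example DB_PATH) are not copied by accident.
list_space_secrets() {
  file="${1:-.env}"
  grep -E '^MODAL_(RECEIPT|SPEECH|NLU)_ENDPOINT=' "$file"
}
```

Run `list_space_secrets` from the project root. The text before each `=` is the secret name and the rest of the line is its value.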

3.3 Restart the Space

After adding secrets, restart the Space from the Settings tab (Restart this Space, or Factory rebuild for a clean build) or wait for it to restart on its own. The new environment variables take effect on the next container start.

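To force an immediate rebuild, push a commit to the Space remote. The commands below assume a git remote named space. They push a single-commit snapshot of the working tree and, because of --force, overwrite the history of the Space's main branch: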

git checkout --orphan _hf_tmp
git add -A
git commit -m "trigger rebuild"
git push space HEAD:main --force
git checkout main
git branch -D _hf_tmp

Part 4 — Verify end-to-end on the Space

After the Space rebuilds:

1. Open the Space URL and wait for the app to finish loading
2. Go to the Voice tab → type add Bun 12 → click Parse for approval
   - The agent reasoning panel should show NLU steps if MODAL_NLU_ENDPOINT is set
3. Go to Bill Desk → upload a receipt photo
   - A cold-start message appears while MiniCPM-V loads (~30 s the first time)
   - Editable rows appear after extraction
4. Go to Voice → click Transcribe with Modal and upload a .wav file
   - The transcript fills in automatically

If any Modal service times out or returns an error, the app falls back to the deterministic parser and shows a trace message explaining the fallback.

Part 5 — Managing costs

Modal bills for compute only while a container is running. Each service sets scaledown_window=300 (5 minutes), so a container stops after 5 minutes of inactivity and billing for it stops with it.
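As a rough way to reason about the idle window, the running time for one burst of requests is the active time plus the scaledown window. The sketch below does only that arithmetic. It assumes the container remains billable until it scales down, `billed_seconds` is a hypothetical helper, and no prices are included because the guide does not state any.

```shell
# Hypothetical helper: estimate container running time for one burst of use.
# $1 = seconds spent serving requests, $2 = scaledown window (default 300).
billed_seconds() {
  active="$1"
  window="${2:-300}"
  echo $((active + window))
}
```

Under this assumption, one minute of actual use keeps a container alive for about six minutes: `billed_seconds 60` prints `360`.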

To stop all services immediately:

uv run modal app stop dukaan-saathi-receipt-vlm || true
uv run modal app stop dukaan-saathi-speech-asr  || true
uv run modal app stop dukaan-saathi-command-nlu || true
uv run modal app list

Look for Tasks 0 in the output to confirm containers are stopped.

To redeploy after stopping (same commands as Part 2):

scripts/modal_deploy.sh modal_apps/receipt_vlm_service.py
scripts/modal_deploy.sh modal_apps/speech_asr_service.py
scripts/modal_deploy.sh modal_apps/command_nlu_service.py

The endpoint URLs do not change between deploys, so no HF Space secret update is needed unless you deploy under a different workspace.

Troubleshooting

modal setup hangs or fails
Run uv run modal token show. If it shows no token, re-run uv run modal setup and complete the browser authentication flow.

Deploy command fails with "app not found"
Check that you are in the project root (ls modal_apps/ should list the service files) and that uv has Modal installed (uv run modal --version).

HF Space shows no NLU trace after rebuild
Confirm the secret name is exactly MODAL_NLU_ENDPOINT (no spaces, correct case). Check the Space logs (Settings → Logs) for any startup errors.

curl health check returns 502 or times out
The container is cold-starting. Wait 30–60 seconds and retry. Modal T4 GPU containers take longer on the first cold start because the model weights are downloaded into the container volume.

Endpoint URL has extra /extract suffix
The write_modal_endpoint.py script derives the URL from the deployed function. If it appends a route that the endpoint doesn't use, edit .env and the HF Space secret to remove the suffix. Test the corrected URL with curl before updating the secret.