# Field Notes: Making NoticeCheck Fully Local

NoticeCheck started as a cloud-backed version of my Pakistan Notice Helper app.
For the Hugging Face Hackathon, I rebuilt it so the same pipeline could also run
locally with Docker Compose and an NVIDIA GPU.

## What I Tried

I tested several vision-language model setups before settling on the current
architecture. Experiments with smaller MiniCPM-V models were not reliable enough
on high-risk scam cases. Qwen-based setups performed better, but they brought
larger models, separate vision projectors, cold starts, and more infrastructure.

The final app uses:

- `openbmb/MiniCPM5-1B` for structured notice assessment
- `nvidia/NVIDIA-Nemotron-Parse-v1.2` for screenshot text extraction
- Hugging Face ZeroGPU for the hosted demo
- Docker Compose and local CUDA for private local deployment

## Problems I Hit

Structured output was one of the first major issues. Models sometimes returned
incomplete or malformed JSON. I added a strict schema, bounded prompts,
normalization, retries, and a repair pass so every successful result follows the
same contract.
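
The steps in that paragraph can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `generate` callable, the three-field schema (`risk_level`, `summary`, `actions`), and the allowed risk values are all assumptions made for the example.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical contract; the real NoticeCheck schema may use different fields.
ALLOWED_RISK = {"low", "medium", "high"}
REQUIRED = {"risk_level": str, "summary": str, "actions": list}


def extract_json(raw: str) -> Optional[dict]:
    """Repair pass: pull the outermost {...} span out of surrounding prose."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    # Drop trailing commas before a closing brace or bracket, a common model slip.
    # (Naive: it would also touch such a sequence inside a string value.)
    candidate = re.sub(r",\s*([}\]])", r"\1", match.group(0))
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def normalize(data: dict) -> Optional[dict]:
    """Coerce fields to the contract; return None if it is still violated."""
    out = {}
    for key, expected in REQUIRED.items():
        value = data.get(key)
        if isinstance(value, str):
            value = value.strip()
        if not isinstance(value, expected):
            return None
        out[key] = value
    out["risk_level"] = out["risk_level"].lower()
    if out["risk_level"] not in ALLOWED_RISK or not out["summary"]:
        return None
    out["actions"] = [str(a).strip() for a in out["actions"] if str(a).strip()]
    return out


def assess(generate: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Retry until one model output satisfies the contract, else raise."""
    for _ in range(max_attempts):
        parsed = extract_json(generate(prompt))
        result = normalize(parsed) if parsed is not None else None
        if result is not None:
            return result
    raise ValueError("model never produced output matching the schema")
```

Raising after the last attempt, rather than returning a partial result, is what keeps the "every successful result follows the same contract" guarantee: callers either get a fully valid dictionary or an explicit failure they can show to the user.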

The ZeroGPU deployment exposed several integration problems:

- CUDA and PyTorch ABI mismatches
- missing OCR dependencies such as `einops`, `open_clip_torch`, and `ftfy`
- GPU quota handling and Hugging Face iframe token forwarding
- model-loading and cold-start failures hidden behind worker wrappers
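
One way to surface missing dependencies before they fail inside a GPU worker is a small startup check. The sketch below is an assumption about how such a check could look, not the app's code; the function names are hypothetical, and the module list uses import names (the `open_clip_torch` package is imported as `open_clip`).

```python
import importlib.util
from typing import Iterable, List

# Import names, not pip names: `open_clip_torch` is imported as `open_clip`.
OCR_MODULES = ("einops", "open_clip", "ftfy")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_ocr_dependencies() -> None:
    """Fail at startup with an explicit message instead of inside a worker."""
    missing = missing_modules(OCR_MODULES)
    if missing:
        raise RuntimeError("Missing OCR dependencies: " + ", ".join(missing))
```

Calling `check_ocr_dependencies()` once at import time turns a confusing worker error into a single readable line in the startup log.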

Screenshot handling also required more than OCR. When the upload was an ordinary
photo, the parser could return an image description or other non-notice output
instead of notice text. Sending that output to the language model caused generic
generation failures. I added semantic region filtering and a dedicated warning
that asks the user to upload a clear notice or message screenshot.
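
A simplified version of that gate might look like the sketch below. The `Region` shape, the label names, and the character threshold are assumptions chosen for illustration; the real parser output format and the app's filtering rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed label names; the real parser's region classes may differ.
TEXT_LABELS = {"text", "title", "list-item", "caption"}
MIN_CHARS = 20  # arbitrary threshold for "enough readable text"

UPLOAD_WARNING = "Please upload a clear screenshot of a notice or message."


@dataclass
class Region:
    label: str  # parser-assigned class, e.g. "text" or "picture"
    text: str   # content extracted for that region


def notice_text_or_warning(regions: List[Region]) -> Tuple[Optional[str], Optional[str]]:
    """Return (notice_text, None) if usable text was found, else (None, warning)."""
    kept = [
        r.text.strip()
        for r in regions
        if r.label.lower() in TEXT_LABELS and r.text.strip()
    ]
    text = "\n".join(kept)
    # Count letters and digits only, so punctuation noise does not pass the gate.
    if sum(ch.isalnum() for ch in text) < MIN_CHARS:
        return None, UPLOAD_WARNING
    return text, None
```

With this shape, the language model is only called when the first element is not `None`; otherwise the warning goes straight to the user instead of a generic generation error.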

The local Docker build revealed another practical problem: one Python dependency
had to be compiled during installation, so the CUDA image needed
`build-essential`. CUDA base images and model caches are also large, which made
persistent volumes and Docker disk cleanup important parts of testing.
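
Because the images and caches are large, a small preflight check can catch low disk space or VRAM before a long build or model download. The following is a hedged sketch: the `preflight` function and its thresholds are hypothetical, and the VRAM part only runs if PyTorch with CUDA happens to be available.

```python
import shutil
from typing import Optional

GIB = 1024 ** 3


def preflight(cache_dir: str, min_free_disk_gb: float, min_free_vram_gb: float = 0.0) -> dict:
    """Report free disk space at cache_dir and, if possible, free GPU memory."""
    free_disk_gb = shutil.disk_usage(cache_dir).free / GIB

    free_vram_gb: Optional[float] = None
    try:
        import torch  # optional: skipped when PyTorch is not installed

        if torch.cuda.is_available():
            free_bytes, _total_bytes = torch.cuda.mem_get_info()
            free_vram_gb = free_bytes / GIB
    except ImportError:
        pass

    return {
        "free_disk_gb": free_disk_gb,
        "disk_ok": free_disk_gb >= min_free_disk_gb,
        "free_vram_gb": free_vram_gb,  # None means "could not be measured"
        "vram_ok": free_vram_gb is not None and free_vram_gb >= min_free_vram_gb,
    }
```

For example, `preflight(".", min_free_disk_gb=30)` could run before the first model download; the 30 GB figure is only a placeholder and should be replaced with numbers measured on the actual images and caches.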

## What I Learned

Making an AI application local is not only about downloading model weights. A
usable local product also needs:

- reproducible GPU and dependency setup
- predictable structured output
- explicit input validation
- clear user-facing failure messages
- privacy-aware tracing
- persistent model caching
- realistic disk and VRAM planning
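
For the privacy-aware tracing item, one possible approach is to redact obvious identifiers before anything is written to a trace. The patterns and the `trace_record` fields below are assumptions for illustration, loosely tuned and not a complete PII filter; they are not taken from the app's tracing code.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Runs of six or more digits, allowing spaces and dashes: phone numbers,
# account numbers, OTP-like codes. This also catches some dates.
DIGITS_RE = re.compile(r"\+?\d[\d\s-]{4,}\d")


def redact(text: str) -> str:
    """Replace URLs, email addresses, and long digit runs with placeholders."""
    text = URL_RE.sub("[URL]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    return DIGITS_RE.sub("[NUMBER]", text)


def trace_record(notice_text: str, risk_level: str) -> dict:
    """Build a trace entry that never stores the raw notice text."""
    return {
        "redacted_text": redact(notice_text),
        "char_count": len(notice_text),
        "risk_level": risk_level,
    }
```

Redacting at the point where the record is built, rather than when it is uploaded, means no code path can accidentally log the unredacted text.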

I also learned to treat model evaluation as part of product development. A model
that works in a simple smoke test may still fail on phishing links, OTP theft,
Roman Urdu screenshots, harmless reminders, or the application's JSON contract.
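
A per-category check makes that gap visible. The sketch below is illustrative only: the four cases are invented examples, the expected labels are assumptions, and `assess` stands for any callable that returns a dictionary with a `risk_level` key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Invented examples; a real suite needs many cases per category.
CASES = [
    ("phishing_link", "Your account is blocked. Verify at http://bank-verify.example", "high"),
    ("otp_theft", "Share the OTP you just received to confirm your parcel.", "high"),
    ("roman_urdu", "Aap ka ATM card block ho gaya hai, apna PIN foran batayein.", "high"),
    ("harmless_reminder", "Reminder: your electricity bill is due on the 15th.", "low"),
]


def evaluate(
    assess: Callable[[str], dict],
    cases: Iterable[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Return the fraction of cases passed in each category."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, text, expected in cases:
        total[category] += 1
        try:
            ok = assess(text).get("risk_level") == expected
        except Exception:
            ok = False  # crashes and contract violations count as failures
        passed[category] += int(ok)
    return {category: passed[category] / total[category] for category in total}
```

Reporting accuracy per category instead of one overall number is the point of the design: a model can look fine on average while failing every Roman Urdu case.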

## Result

NoticeCheck now has a redesigned English interface and can run in two modes:

- hosted on Hugging Face ZeroGPU
- fully local on an NVIDIA GPU

The local version starts with:

```bash
docker compose up --build
```

## Links

- [Live demo](https://huggingface.co/spaces/build-small-hackathon/noticecheck)
- [GitHub repository](https://github.com/kingabzpro/local-notice-check)
- [Privacy-safe trace dataset](https://huggingface.co/datasets/build-small-hackathon/pakistan-notice-helper-traces)
- [LinkedIn project post](https://www.linkedin.com/posts/1abidaliawan_huggingfacehackathon-huggingface-ai-ugcPost-7471594790506192896--_53/)