noticecheck / docs /local_model_setup.md
Abid Ali Awan
Add local CUDA deployment and reject non-notice images
8ccb3e9
|
Raw
History Blame Contribute Delete
2.15 kB
# Local CUDA model setup
The Docker Compose deployment runs the application completely locally:
- `openbmb/MiniCPM5-1B` through Transformers
- `nvidia/NVIDIA-Nemotron-Parse-v1.2` through Transformers
- PyTorch CUDA on one local NVIDIA GPU
It does not set `SPACE_ID`, request ZeroGPU, or call a remote inference API.
Internet access is required on the first run to download model files from
Hugging Face.
## Prerequisites
Use a Linux amd64 Docker host with:
- Docker Engine and Docker Compose 2.30 or newer
- an NVIDIA GPU supported by CUDA 12.8
- a current NVIDIA driver
- NVIDIA Container Toolkit configured for Docker
On Windows, use Docker Desktop with its WSL 2 backend and an NVIDIA driver that
supports CUDA in WSL.
Confirm GPU access before building the application:
```bash
docker run --rm --gpus all \
pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime \
python -c "import torch; print(torch.cuda.is_available())"
```
## Start
```bash
docker compose up --build
```
The startup process preloads Nemotron-Parse before opening port `7860`, so the
first health check may take several minutes. MiniCPM5-1B loads on the first
non-cached assessment. Model files persist in the `huggingface-cache` volume.
Open <http://localhost:7860>.
## Configuration
Compose sets these required values:
```text
MODEL_RUNTIME=transformers
REQUIRE_CUDA=1
HF_HOME=/root/.cache/huggingface
```
Optional `.env` values:
```dotenv
NOTICECHECK_PORT=7860
TRANSFORMERS_MODEL_REPO=openbmb/MiniCPM5-1B
MODEL_ENABLE_THINKING=0
HF_TOKEN=
```
`REQUIRE_CUDA=1` prevents accidental CPU fallback. If Docker cannot expose the
GPU, model status reports that CUDA is unavailable and startup fails while
preloading OCR.
## Operations
View logs:
```bash
docker compose logs --follow noticecheck
```
Check the GPU from the application image:
```bash
docker compose run --rm noticecheck python -c \
"import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```
Stop containers while retaining downloaded models:
```bash
docker compose down
```
Delete containers and persistent model/trace data:
```bash
docker compose down --volumes
```