Abid Ali Awan


Local CUDA model setup

The Docker Compose deployment runs the application completely locally:

  • openbmb/MiniCPM5-1B through Transformers
  • nvidia/NVIDIA-Nemotron-Parse-v1.2 through Transformers
  • PyTorch CUDA on one local NVIDIA GPU

It does not set SPACE_ID, request ZeroGPU, or call a remote inference API. Internet access is required on the first run to download model files from Hugging Face.

Prerequisites

Use a Linux amd64 Docker host with:

  • Docker Engine and Docker Compose 2.30 or newer
  • an NVIDIA GPU supported by CUDA 12.8
  • an NVIDIA driver recent enough for CUDA 12.8 (the nvidia-smi header shows the highest CUDA version the driver supports)
  • NVIDIA Container Toolkit configured for Docker

On Windows, use Docker Desktop with its WSL 2 backend and an NVIDIA driver that supports CUDA in WSL.

Confirm GPU access before building the application. The following command should print True:

docker run --rm --gpus all \
  pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime \
  python -c "import torch; print(torch.cuda.is_available())"
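
If the one-line check prints False, a slightly longer diagnostic can help narrow down the cause. The sketch below is not part of the project: describe_cuda is a hypothetical helper that relies only on standard PyTorch calls and returns a minimal result when PyTorch is not installed.

```python
import json


def describe_cuda() -> dict:
    """Collect basic CUDA facts (hypothetical helper, not part of noticecheck)."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so nothing can be said about CUDA.
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # List every GPU that PyTorch can see inside the container.
        for index in range(torch.cuda.device_count()):
            devices.append(torch.cuda.get_device_name(index))
    return {
        "torch_installed": True,
        "torch_version": str(torch.__version__),
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    print(json.dumps(describe_cuda(), indent=2))
```

An empty devices list together with a non-empty cuda_build usually points at the host side (driver or NVIDIA Container Toolkit) rather than at the image.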

Start

docker compose up --build

The startup process preloads Nemotron-Parse before opening port 7860, so the first health check may take several minutes. MiniCPM5-1B loads on the first non-cached assessment. Model files persist in the huggingface-cache volume.

Open http://localhost:7860.
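
Because the port only opens after the preload finishes, scripts that depend on the application may want to wait for it. The following is a minimal sketch using only the Python standard library. The function name wait_until_ready and the timeout values are assumptions chosen for illustration, not settings defined by the project.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Port not open yet, connection dropped, or an HTTP error status:
            # keep polling until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready("http://localhost:7860") returns True once the page responds and False if the timeout passes first. The generous default timeout reflects that the first run also downloads model files.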

Configuration

Compose sets these required values:

MODEL_RUNTIME=transformers
REQUIRE_CUDA=1
HF_HOME=/root/.cache/huggingface

Optional .env values:

NOTICECHECK_PORT=7860
TRANSFORMERS_MODEL_REPO=openbmb/MiniCPM5-1B
MODEL_ENABLE_THINKING=0
HF_TOKEN=
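
An empty entry such as HF_TOKEN= is easy to mishandle, so the sketch below shows one way to read these variables with fallbacks. It assumes the values listed above are the defaults and that 1 enables thinking mode. The Settings class and load_settings function are hypothetical, and how the application or the Compose file actually consumes each variable is not shown in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the optional values."""
    port: int
    model_repo: str
    enable_thinking: bool
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional values; unset or empty entries fall back to defaults."""
    return Settings(
        port=int(env.get("NOTICECHECK_PORT") or 7860),
        model_repo=env.get("TRANSFORMERS_MODEL_REPO") or "openbmb/MiniCPM5-1B",
        # Assumption: "1" turns thinking on, anything else leaves it off.
        enable_thinking=(env.get("MODEL_ENABLE_THINKING") or "0") == "1",
        # An empty HF_TOKEN= line is treated the same as no token at all.
        hf_token=env.get("HF_TOKEN") or None,
    )
```

Passing a plain dictionary instead of os.environ makes the fallbacks easy to check without touching the real environment.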

REQUIRE_CUDA=1 prevents accidental CPU fallback. If Docker cannot expose the GPU, the model status reports that CUDA is unavailable and startup fails while preloading the OCR model (Nemotron-Parse).
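
The fail-fast behaviour can be pictured with a small guard like the one below. This is an illustrative sketch, not the application's actual code: select_device is a hypothetical name, and the two booleans stand in for the REQUIRE_CUDA setting and the result of torch.cuda.is_available().

```python
def select_device(require_cuda: bool, cuda_available: bool) -> str:
    """Pick a device name, refusing CPU fallback when CUDA is required."""
    if cuda_available:
        return "cuda"
    if require_cuda:
        # Fail loudly instead of silently running the models on the CPU.
        raise RuntimeError("REQUIRE_CUDA=1 is set but CUDA is unavailable")
    return "cpu"


# Example wiring (assumes PyTorch is installed in the image):
#   import os, torch
#   device = select_device(os.environ.get("REQUIRE_CUDA") == "1",
#                          torch.cuda.is_available())
```

Raising at startup keeps a misconfigured GPU from turning into a slow but apparently working deployment.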

Operations

View logs:

docker compose logs --follow noticecheck

Check the GPU from the application image:

docker compose run --rm noticecheck python -c \
  "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

Stop containers while retaining downloaded models:

docker compose down

Delete containers and the persistent model and trace data (model files are downloaded again on the next start):

docker compose down --volumes