noticecheck / docs /local_model_setup.md

Abid Ali Awan

Add local CUDA deployment and reject non-notice images

8ccb3e9 17 days ago

	# Local CUDA model setup

	The Docker Compose deployment runs the application entirely on the local machine, using:

	- `openbmb/MiniCPM5-1B` through Transformers
	- `nvidia/NVIDIA-Nemotron-Parse-v1.2` through Transformers
	- PyTorch CUDA on one local NVIDIA GPU

	It does not set `SPACE_ID`, request ZeroGPU, or call a remote inference API.
	Internet access is required on the first run to download model files from
	Hugging Face.

	## Prerequisites

	Use a Linux amd64 Docker host with:

	- Docker Engine with Docker Compose 2.30 or newer
	- an NVIDIA GPU supported by CUDA 12.8
	- an NVIDIA driver recent enough to support CUDA 12.8
	- NVIDIA Container Toolkit configured for Docker

	On Windows, use Docker Desktop with its WSL 2 backend and an NVIDIA driver that
	supports CUDA in WSL.
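
	On a Linux host or inside a WSL 2 shell, the prerequisites above can be checked with a small preflight sketch. `version_ge` and `preflight` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils). The script only reports the driver version; judging whether it is recent enough is left to the reader.

	```bash
	# Preflight sketch (hypothetical helpers, not part of the project).

	# Succeeds when version $1 is greater than or equal to version $2.
	# Assumes `sort -V` (version sort) is available.
	version_ge() {
	  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
	}

	preflight() {
	  min_compose="2.30.0"
	  command -v docker >/dev/null 2>&1 ||
	    { echo "docker not found" >&2; return 1; }
	  compose_version="$(docker compose version --short 2>/dev/null)" ||
	    { echo "docker compose plugin not found" >&2; return 1; }
	  compose_version="${compose_version#v}"  # drop a leading "v" if present
	  version_ge "$compose_version" "$min_compose" ||
	    { echo "Docker Compose $compose_version is older than $min_compose" >&2; return 1; }
	  command -v nvidia-smi >/dev/null 2>&1 ||
	    { echo "nvidia-smi not found" >&2; return 1; }
	  # Print the GPU name and driver version for a manual check.
	  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
	}
	```

	Running `preflight` prints one line per GPU on success and returns a non-zero status with a short message otherwise. It does not test GPU access from inside a container.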

	Confirm GPU access before building the application. The command should print `True`:

	```bash
	docker run --rm --gpus all \
	pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime \
	python -c "import torch; print(torch.cuda.is_available())"
	```

	## Start

	```bash
	docker compose up --build
	```

	The startup process preloads Nemotron-Parse before opening port `7860`, so the
	first health check may take several minutes. MiniCPM5-1B loads on the first
	non-cached assessment. Model files persist in the `huggingface-cache` volume.

	Open <http://localhost:7860>.
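
	Because the port opens only after Nemotron-Parse has loaded, a script that depends on the application may need to wait for it. A minimal polling sketch, assuming `curl` is installed and that the root URL returns a successful HTTP status once the application is up; `wait_for_app` and its defaults (60 attempts, 10 seconds apart) are hypothetical:

	```bash
	# Polls a URL until it responds successfully or the attempts run out.
	# Usage: wait_for_app [url] [attempts] [delay_seconds]
	wait_for_app() {
	  url="${1:-http://localhost:7860}"
	  attempts="${2:-60}"
	  delay="${3:-10}"
	  i=0
	  while [ "$i" -lt "$attempts" ]; do
	    i=$((i + 1))
	    # --fail makes curl return non-zero on HTTP errors (status >= 400).
	    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
	      echo "ready after $i attempt(s)"
	      return 0
	    fi
	    sleep "$delay"
	  done
	  echo "not ready after $attempts attempt(s)" >&2
	  return 1
	}
	```

	For example, `docker compose up --build --detach && wait_for_app` starts the stack in the background and blocks until the page responds or the attempts are exhausted.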

	## Configuration

	Compose sets these required values:

	```text
	MODEL_RUNTIME=transformers
	REQUIRE_CUDA=1
	HF_HOME=/root/.cache/huggingface
	```

	Optional `.env` values:

	```dotenv
	NOTICECHECK_PORT=7860
	TRANSFORMERS_MODEL_REPO=openbmb/MiniCPM5-1B
	MODEL_ENABLE_THINKING=0
	HF_TOKEN=
	```
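
	Compose gives variables set in the shell precedence over the same names in `.env`, so a value can be overridden for a single run without editing the file. A minimal sketch, assuming the Compose file interpolates `NOTICECHECK_PORT` as the published host port; `start_on_port` is a hypothetical helper name:

	```bash
	# Hypothetical helper: start the stack on a different host port for one run.
	# The shell value overrides any NOTICECHECK_PORT entry in `.env`.
	start_on_port() {
	  NOTICECHECK_PORT="$1" docker compose up --build
	}
	```

	Under that assumption, `start_on_port 8080` would serve the application at <http://localhost:8080>. Running `docker compose config` prints the fully resolved configuration, which shows the values Compose will actually use.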

	`REQUIRE_CUDA=1` prevents accidental CPU fallback. If Docker cannot expose the
	GPU, model status reports that CUDA is unavailable and startup fails while
	preloading OCR.

	## Operations

	View logs:

	```bash
	docker compose logs --follow noticecheck
	```

	Check the GPU from the application image:

	```bash
	docker compose run --rm noticecheck python -c \
	"import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
	```

	Stop containers while retaining downloaded models:

	```bash
	docker compose down
	```

	Delete containers and persistent model/trace data (the models are downloaded again on the next start):

	```bash
	docker compose down --volumes
	```