GGUF Backend Setup Guide

Quick Start (Recommended)

Since llama-cpp-python doesn't yet support LightOnOCR, we must build llama.cpp locally.

1. Build llama.cpp locally

# Clone repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Create build directory
mkdir build && cd build

# Build with Metal support (macOS)
cmake .. -DGGML_METAL=ON
cmake --build . --config Release -j 8

# Verify build
./bin/llama-mtmd-cli --help
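If you want to verify the build from Python tooling rather than by hand, a minimal sketch (the binary path follows the build layout above; `binary_works` is a helper invented here, not part of the project):

```python
import subprocess
from pathlib import Path

def binary_works(binary: Path) -> bool:
    """Return True if the built llama-mtmd-cli exists and responds to --help."""
    if not binary.exists():
        return False
    result = subprocess.run([str(binary), "--help"],
                            capture_output=True, text=True)
    return result.returncode == 0

# Default location from the build steps above
ok = binary_works(Path("llama.cpp/build/bin/llama-mtmd-cli"))
```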

2. Download GGUF Model

# Return to project root
cd ../../

# Run download script
python download_gguf_model.py
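After the script finishes, you can sanity-check that the files landed where the backend expects them. A minimal sketch (the filenames below are placeholders for illustration; the authoritative list lives in download_gguf_model.py):

```python
from pathlib import Path

# Hypothetical filenames; the real names come from download_gguf_model.py
EXPECTED_FILES = ["lightonocr-1b.Q8_0.gguf", "mmproj-lightonocr-1b.gguf"]

def missing_files(model_dir: str) -> list[str]:
    """Return the expected GGUF files that are not yet present."""
    d = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (d / name).exists()]
```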

3. Use GGUF Backend

# CLI
python ocr_cli.py document.pdf --backend gguf

# Gradio UI
python app.py
# Select "gguf" from backend dropdown
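Under the hood, the gguf backend shells out to llama-mtmd-cli. A sketch of how such a command line might be assembled (the flag names follow llama.cpp's multimodal CLI; the exact invocation used by ocr_cli.py may differ):

```python
def build_mtmd_command(binary: str, model: str, mmproj: str,
                       image: str, prompt: str) -> list[str]:
    """Assemble an argument list for a llama-mtmd-cli run."""
    return [
        binary,
        "-m", model,          # GGUF language model
        "--mmproj", mmproj,   # multimodal projector GGUF
        "--image", image,     # page image to OCR
        "-p", prompt,         # instruction prompt
    ]
```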

Performance

The custom-built llama-mtmd-cli delivers a large speedup on Apple Silicon:

| Backend | Time per Page | Speedup |
|---|---|---|
| PyTorch (Original) | ~4 min | 1x |
| PyTorch (Optimized) | ~40 sec | 6x |
| GGUF (llama-mtmd-cli) | ~3 sec | 80x |
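The speedup column is simply the ratio of per-page times against the original PyTorch baseline (4 min ≈ 240 s):

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """Speedup factor relative to a baseline time."""
    return baseline_s / optimized_s

print(round(speedup(240, 40)))  # optimized PyTorch -> 6
print(round(speedup(240, 3)))   # llama-mtmd-cli -> 80
```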

Troubleshooting

"llama-mtmd-cli binary not found"

Ensure you successfully built llama.cpp and the binary exists at llama.cpp/build/bin/llama-mtmd-cli.

"GGUF model not found"

Run python download_gguf_model.py to download the required model files.
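Both failure modes above can be caught up front with a small preflight check. A sketch, assuming the default binary location from step 1 and a hypothetical models/ directory for the GGUF files:

```python
from pathlib import Path

def preflight(binary: Path, model_dir: Path) -> list[str]:
    """Return a list of setup problems; empty if everything is in place."""
    problems = []
    if not binary.exists():
        problems.append("llama-mtmd-cli binary not found; rebuild llama.cpp")
    if not any(model_dir.glob("*.gguf")):
        problems.append("GGUF model not found; run download_gguf_model.py")
    return problems

# Default locations assumed from the steps above
issues = preflight(Path("llama.cpp/build/bin/llama-mtmd-cli"), Path("models"))
```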