# GGUF Backend Setup Guide

## Quick Start (Recommended)
Since `llama-cpp-python` doesn't yet support LightOnOCR, we need to build `llama.cpp` from source.
### 1. Build llama.cpp locally

```bash
# Clone repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Create build directory
mkdir build && cd build

# Build with Metal support (macOS)
cmake .. -DGGML_METAL=ON
cmake --build . --config Release -j 8

# Verify build
./bin/llama-mtmd-cli --help
```
### 2. Download GGUF Model

```bash
# Return to project root
cd ../../

# Run download script
python download_gguf_model.py
```
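After the script finishes, the language model and the multimodal projector should both be present as `.gguf` files. A minimal sketch of a post-download check (the directory layout and filenames here are assumptions for illustration, not taken from the script):

```python
from pathlib import Path


def find_gguf_files(model_dir: str) -> list[Path]:
    """Return all .gguf files under model_dir, sorted by path."""
    return sorted(Path(model_dir).rglob("*.gguf"))


def verify_download(model_dir: str) -> bool:
    """A multimodal GGUF setup needs two files: the model itself
    and an mmproj (vision projector) file for llama-mtmd-cli."""
    files = find_gguf_files(model_dir)
    has_mmproj = any("mmproj" in f.name.lower() for f in files)
    return len(files) >= 2 and has_mmproj
```

If `verify_download` returns `False`, re-run the download script before trying the GGUF backend.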
### 3. Use GGUF Backend

```bash
# CLI
python ocr_cli.py document.pdf --backend gguf

# Gradio UI
python app.py
# Select "gguf" from backend dropdown
```
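Under the hood, the GGUF backend shells out to the `llama-mtmd-cli` binary built in step 1. A minimal sketch of how that invocation could be assembled, assuming the commonly used `llama-mtmd-cli` flags (`-m` for the model, `--mmproj` for the projector, `--image` for the page image, `-p` for the prompt) — the actual backend code may differ:

```python
def build_mtmd_command(model: str, mmproj: str, image: str, prompt: str,
                       binary: str = "llama.cpp/build/bin/llama-mtmd-cli") -> list[str]:
    """Assemble the argv list for one llama-mtmd-cli call on a single page image.

    Flag names are assumptions based on llama-mtmd-cli's usual interface.
    """
    return [
        binary,
        "-m", model,          # language model GGUF
        "--mmproj", mmproj,   # vision projector GGUF
        "--image", image,     # rasterized page to OCR
        "-p", prompt,         # OCR instruction prompt
    ]
```

The resulting list can then be run with `subprocess.run(cmd, capture_output=True, text=True)`, one call per page.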
## Performance

The custom-built `llama-mtmd-cli` delivers a dramatic speedup on Apple Silicon:
| Backend | Time per Page | Speedup |
|---|---|---|
| PyTorch (Original) | ~4 mins | 1x |
| PyTorch (Optimized) | ~40 sec | 6x |
| GGUF (llama-mtmd-cli) | ~3 sec | 80x ⭐ |
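The speedup column follows directly from the per-page times; a quick sanity check of the arithmetic (times taken from the table above):

```python
# Approximate per-page times from the table, in seconds
baseline = 240.0   # PyTorch (original): ~4 minutes
optimized = 40.0   # PyTorch (optimized): ~40 seconds
gguf = 3.0         # GGUF via llama-mtmd-cli: ~3 seconds

print(round(baseline / optimized))  # → 6
print(round(baseline / gguf))       # → 80
```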
## Troubleshooting

### "llama-mtmd-cli binary not found"

Ensure you successfully built `llama.cpp` and the binary exists at `llama.cpp/build/bin/llama-mtmd-cli`.
"GGUF model not found"
Run python download_gguf_model.py to download the required model files.