---
name: gpu-space-cpu
description: >-
  Convert GPU HuggingFace Spaces to CPU-only for free tier deployment. Use when
  user says "make this run on CPU", "convert to CPU Space", "remove GPU
  dependencies", "deploy without GPU", "free HuggingFace Space", or "port CUDA
  app to CPU". Also use when analyzing repos with bitsandbytes, flash-attn, or
  CUDA dependencies for CPU compatibility.
---
# GPU → CPU Space Demo Conversion

- Free CPU tier = 2 vCPUs + 16 GB RAM + 50 GB non-persistent disk

## Workflow
- Grep for GPU deps:
  `@spaces.GPU|bitsandbytes|flash-attn|triton|xformers|auto-gptq|exllama|apex|\.cuda\(|device.*cuda`
- Remove GPU packages; replace in code: `cuda` → `cpu`, `float16` → `int8`; remove `.half()`, `.cuda()`, `device_map="auto"`
- Create: `app.py` (all logic; prefer recent built-in Gradio v6+ components), `README.md`, `requirements.txt`
- Test locally: `pip install -r requirements.txt && python app.py`. Test the API by enabling MCP: https://www.gradio.app/guides/building-mcp-server-with-gradio
- Deploy only after the local test passes
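The grep step above can be sketched as a small stdlib-only scanner; the set of file extensions checked is an illustrative assumption, not part of the workflow:

```python
import re
from pathlib import Path

# Pattern from the workflow above: GPU-only packages plus CUDA device usage.
GPU_PATTERN = re.compile(
    r"@spaces\.GPU|bitsandbytes|flash-attn|triton|xformers|"
    r"auto-gptq|exllama|apex|\.cuda\(|device.*cuda"
)

def find_gpu_deps(root: str) -> dict[str, list[int]]:
    """Return {file path: [1-based line numbers]} for lines matching GPU patterns."""
    hits: dict[str, list[int]] = {}
    for path in Path(root).rglob("*"):
        # Assumed extension filter; adjust for the repo being converted.
        if path.suffix not in {".py", ".txt", ".toml", ".cfg"}:
            continue
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        matched = [i for i, line in enumerate(lines, 1) if GPU_PATTERN.search(line)]
        if matched:
            hits[str(path)] = matched
    return hits
```

Every hit is a line to delete or rewrite with the substitutions above before the local test run.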
## requirements.txt

```
--extra-index-url https://download.pytorch.org/whl/cpu
torch
```

Never pin transitive deps. No `packages.txt` (ffmpeg/git/cmake pre-installed).
## README.md

```yaml
---
title: Name
emoji: X
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
---
```
## Quantization (model too large)

- ONNX Runtime INT8 via Optimum, with `OMP_NUM_THREADS=2` (preferred; onnxruntime is fast on CPU)
- TorchAO
- `torch.quantization.quantize_dynamic`
- ONNX FP32 is faster than FP16 on CPU
## Stop

- Model >12 GB even with INT8 → needs GPU
- Unacceptable INT8 quality loss → needs GPU
- 3 failed approaches → tell the user honestly
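The first stop rule can be expressed as a quick size estimate; the ~1 byte per parameter figure for INT8 weights is an approximation, ignoring activations and runtime overhead:

```python
def fits_free_cpu_tier(param_count: int) -> bool:
    """Stop-rule check: INT8 weights over ~12 GB need a GPU Space."""
    int8_gb = param_count / 1e9  # ~1 byte per parameter at INT8
    return int8_gb <= 12

print(fits_free_cpu_tier(7_000_000_000))   # 7B params
print(fits_free_cpu_tier(13_000_000_000))  # 13B params
```

Anything that fails this check should stop the conversion early rather than after three failed quantization attempts.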