# Local Model Setup Guide for HuggingClaw

This guide explains how to run small language models (≤1B) locally on HuggingFace Spaces using Ollama.

## Why Local Models?

- **Free**: No API costs - runs on HF Spaces free tier
- **Private**: All inference happens inside your container
- **Fast**: 0.6B models achieve 20-50 tokens/second on CPU
- **Always Available**: No rate limits or downtime

## Supported Models

| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |

## Quick Start

### Step 1: Set Environment Variables

In your HuggingFace Space **Settings → Repository secrets**, add:

```bash
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```

### Step 2: Deploy

Push your changes or redeploy the Space. On startup:

1. Ollama server starts on port 11434
2. The model is pulled from the Ollama library (~30 seconds)
3. OpenClaw configures the local provider
4. The model appears in the Control UI

### Step 3: Use

1. Open your Space URL
2. Enter the gateway token (default: `huggingclaw`)
3. Select "NeuralNexus HacKing 0.6B" from the model dropdown
4. Start chatting!

## Advanced Configuration

### Custom Model from HuggingFace

For models not in the Ollama library:

```bash
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```

### Using Custom Modelfile

1. Create a `Modelfile` (see `scripts/Modelfile.HacKing`)
2. Add it to your project
3.
In `entrypoint.sh`, add after the Ollama start:

```bash
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```

### Performance Tuning

```bash
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2

# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1

# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
```

## Troubleshooting

### Model Not Appearing

1. Check the logs: `docker logs <container-id>`
2. Look for: `[SYNC] Set local model provider`
3. Verify `LOCAL_MODEL_ENABLED=true`

### Slow Inference

1. Use smaller models (≤1B)
2. Set `OLLAMA_NUM_PARALLEL=1`
3. Decrease `num_ctx` in the Modelfile

### Out of Memory

1. HF Spaces has 16GB RAM - this should be enough for 0.6B
2. Check other processes: `docker stats`
3. Reduce the model size or use a more aggressive quantization

### Model Pull Fails

1. Check internet connectivity
2. Try the alternative form: `LOCAL_MODEL_NAME=hf.co/username/model`
3. Use a pre-quantized GGUF format

## Architecture

```
┌─────────────────────────────────────────────┐
│         HuggingFace Spaces Container        │
│                                             │
│  ┌──────────────┐      ┌──────────────────┐ │
│  │    Ollama    │      │     OpenClaw     │ │
│  │    :11434    │─────►│   Gateway :7860  │ │
│  │    HacKing   │      │   - WhatsApp     │ │
│  │     0.6B     │      │   - Telegram     │ │
│  └──────────────┘      └──────────────────┘ │
│                                             │
│   /home/node/.ollama/models (persisted)     │
└─────────────────────────────────────────────┘
```

## Cost Comparison

| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |

## Best Practices

1. **Start Small**: Begin with 0.6B models, upgrade if needed
2. **Monitor RAM**: Keep usage under 8GB for stability
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
4. **Persist Models**: Store them in `/home/node/.ollama/models`
5.
**Set Defaults**: Use the `LOCAL_MODEL_*` variables for auto-selection

## Example: WhatsApp Bot with Local AI

```bash
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```

Result: a free, always-on WhatsApp AI bot!

## Next Steps

1. Test with the default 0.6B model
2. Experiment with different models
3. Customize the Modelfile for your use case
4. Share your setup with the community!

## Support

- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces
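As a quick sanity check before redeploying, you can verify the `LOCAL_MODEL_*` variables from Step 1 in the shell that exports your secrets. This is a minimal illustrative sketch; `check_local_model_env` is a hypothetical helper, not part of HuggingClaw:

```bash
# Sanity-check the LOCAL_MODEL_* variables described in this guide.
# Prints "ok" when the minimal set is present and consistent.
check_local_model_env() {
  if [ "${LOCAL_MODEL_ENABLED:-}" != "true" ]; then
    echo "error: LOCAL_MODEL_ENABLED must be 'true'" >&2
    return 1
  fi
  if [ -z "${LOCAL_MODEL_NAME:-}" ] || [ -z "${LOCAL_MODEL_ID:-}" ]; then
    echo "error: LOCAL_MODEL_NAME and LOCAL_MODEL_ID must both be set" >&2
    return 1
  fi
  echo "ok"
}
```

An `ok` means the minimal variable set is in place; any error points at the variable to fix before the Space restarts.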
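The startup sequence in Step 2 pulls the model only after the Ollama server is listening on port 11434. If you customize `entrypoint.sh` (for example to run `ollama create` with a Modelfile), a small wait loop avoids racing the server. The sketch below assumes `curl` is available in the container; `wait_for_ollama` and its parameters are hypothetical, though `GET /api/version` is a standard Ollama endpoint:

```bash
# Wait until the local Ollama server answers before pulling/creating models.
# Usage: wait_for_ollama [base-url] [max-attempts] [delay-seconds]
wait_for_ollama() {
  url="${1:-http://127.0.0.1:11434}"
  max="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$max" ]; do
    # /api/version responds as soon as the server is up
    if curl -sf "${url}/api/version" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "error: Ollama did not become ready" >&2
  return 1
}
```

In `entrypoint.sh` this would sit between starting the server and the `ollama create` block, e.g. `wait_for_ollama && ollama create ...`.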