| # Local Model Setup Guide for HuggingClaw |
|
|
This guide explains how to run small language models (≤1B) locally on HuggingFace Spaces using Ollama.
|
|
| ## Why Local Models? |
|
|
| - **Free**: No API costs - runs on HF Spaces free tier |
| - **Private**: All inference happens inside your container |
| - **Fast**: 0.6B models achieve 20-50 tokens/second on CPU |
| - **Always Available**: No rate limits or downtime |
|
|
| ## Supported Models |
|
|
| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |
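
The throughput figures above can be spot-checked against a running Ollama server: the non-streaming `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds), from which tokens/second follows. A minimal sketch, assuming the model name and port used elsewhere in this guide:

```shell
# Measure generation speed from Ollama's /api/generate response.
# eval_count = tokens generated, eval_duration = time in nanoseconds.
OLLAMA_URL="http://localhost:11434"
MODEL="neuralnexuslab/hacking"

curl -s "$OLLAMA_URL/api/generate" \
  -d "{\"model\": \"$MODEL\", \"prompt\": \"Hello\", \"stream\": false}" \
  | python3 -c 'import sys, json
r = json.load(sys.stdin)
print(round(r["eval_count"] / (r["eval_duration"] / 1e9), 1), "tokens/s")' \
  || echo "Ollama not reachable on $OLLAMA_URL"
```

The `|| echo` fallback keeps the check non-fatal when the server is not up yet.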
|
|
| ## Quick Start |
|
|
| ### Step 1: Set Environment Variables |
|
|
In your HuggingFace Space **Settings → Repository secrets**, add:
|
|
| ```bash |
| LOCAL_MODEL_ENABLED=true |
| LOCAL_MODEL_NAME=neuralnexuslab/hacking |
| LOCAL_MODEL_ID=neuralnexuslab/hacking |
| LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B |
| ``` |
|
|
| ### Step 2: Deploy |
|
|
| Push your changes or redeploy the Space. On startup: |
|
|
| 1. Ollama server starts on port 11434 |
2. The model is pulled from the Ollama library (~30 seconds)
| 3. OpenClaw configures the local provider |
| 4. Model appears in Control UI |
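
The startup sequence above can be sanity-checked from a shell inside the container. The endpoint and path below are the ones used elsewhere in this guide; each check is guarded so it stays non-fatal while the Space is still booting:

```shell
OLLAMA_URL="http://localhost:11434"

# 1. Ollama answers on its default port with a version JSON.
curl -s "$OLLAMA_URL/api/version" || echo "Ollama not up yet"

# 2. The pulled model should be listed under /api/tags.
curl -s "$OLLAMA_URL/api/tags" || echo "tags endpoint not reachable"

# 3. The model store should exist on the persisted volume.
ls /home/node/.ollama/models 2>/dev/null || echo "model store not created yet"
```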
|
|
| ### Step 3: Use |
|
|
| 1. Open your Space URL |
| 2. Enter gateway token (default: `huggingclaw`) |
| 3. Select "NeuralNexus HacKing 0.6B" from model dropdown |
| 4. Start chatting! |
|
|
| ## Advanced Configuration |
|
|
| ### Custom Model from HuggingFace |
|
|
For models not in the Ollama library, pull directly from HuggingFace:
|
|
| ```bash |
| # Set in HF Spaces secrets |
| LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing |
| LOCAL_MODEL_ID=neuralnexuslab/hacking |
| ``` |
|
|
| ### Using Custom Modelfile |
|
|
| 1. Create `Modelfile` (see `scripts/Modelfile.HacKing`) |
| 2. Add to your project |
| 3. In `entrypoint.sh`, add after Ollama start: |
|
|
| ```bash |
| if [ -f /home/node/scripts/Modelfile.HacKing ]; then |
| ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing |
| fi |
| ``` |
|
|
| ### Performance Tuning |
|
|
| ```bash |
| # Number of parallel requests |
| OLLAMA_NUM_PARALLEL=2 |
| |
| # Keep model loaded (-1 = forever) |
| OLLAMA_KEEP_ALIVE=-1 |
| |
| # Context window size |
| # Set in Modelfile: PARAMETER num_ctx 2048 |
| ``` |
|
|
| ## Troubleshooting |
|
|
| ### Model Not Appearing |
|
|
| 1. Check logs: `docker logs <container>` |
| 2. Look for: `[SYNC] Set local model provider` |
| 3. Verify `LOCAL_MODEL_ENABLED=true` |
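
A quick way to tell whether the problem is on the Ollama side or the OpenClaw side is to ask Ollama directly whether the model is registered. Model name and port as configured in Step 1; the check degrades gracefully if the server is down:

```shell
MODEL="neuralnexuslab/hacking"

if curl -s http://localhost:11434/api/tags | grep -q "$MODEL"; then
  echo "model registered with Ollama - check the OpenClaw provider config"
else
  echo "model missing or Ollama down - check the entrypoint logs"
fi
```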
|
|
| ### Slow Inference |
|
|
1. Use smaller models (≤1B)
2. Set `OLLAMA_NUM_PARALLEL=1`
| 3. Decrease `num_ctx` in Modelfile |
|
|
| ### Out of Memory |
|
|
1. The HF Spaces free tier provides 16GB of RAM, which is ample for a 0.6B model
2. Check other processes: `docker stats`
3. Switch to a smaller model or a more aggressive quantization
|
|
| ### Model Pull Fails |
|
|
| 1. Check internet connectivity |
| 2. Try alternative: `LOCAL_MODEL_NAME=hf.co/username/model` |
| 3. Use pre-quantized GGUF format |
|
|
| ## Architecture |
|
|
| ``` |
| βββββββββββββββββββββββββββββββββββββββββββββββ |
| β HuggingFace Spaces Container β |
| β β |
| β ββββββββββββββββ ββββββββββββββββββββ β |
| β β Ollama β β OpenClaw β β |
| β β :11434 βββββΊβ Gateway :7860 β β |
| β β HacKing β β - WhatsApp β β |
| β β 0.6B β β - Telegram β β |
| β ββββββββββββββββ ββββββββββββββββββββ β |
| β β |
| β /home/node/.ollama/models (persisted) β |
| βββββββββββββββββββββββββββββββββββββββββββββββ |
| ``` |
|
|
| ## Cost Comparison |
|
|
| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |
|
|
| ## Best Practices |
|
|
| 1. **Start Small**: Begin with 0.6B models, upgrade if needed |
| 2. **Monitor RAM**: Keep usage under 8GB for stability |
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
| 4. **Persist Models**: Store in `/home/node/.ollama/models` |
| 5. **Set Defaults**: Use `LOCAL_MODEL_*` for auto-selection |
|
|
| ## Example: WhatsApp Bot with Local AI |
|
|
| ```bash |
| # HF Spaces secrets |
| LOCAL_MODEL_ENABLED=true |
| LOCAL_MODEL_NAME=neuralnexuslab/hacking |
| HF_TOKEN=hf_xxxxx |
| AUTO_CREATE_DATASET=true |
| |
| # WhatsApp credentials (set in Control UI) |
| WHATSAPP_PHONE=+1234567890 |
| WHATSAPP_CODE=ABC123 |
| ``` |
|
|
| Result: Free, always-on WhatsApp AI bot! |
|
|
| ## Next Steps |
|
|
| 1. Test with default 0.6B model |
| 2. Experiment with different models |
| 3. Customize Modelfile for your use case |
| 4. Share your setup with the community! |
|
|
| ## Support |
|
|
| - Issues: https://github.com/openclaw/openclaw/issues |
| - Ollama Docs: https://ollama.ai/docs |
| - HF Spaces: https://huggingface.co/docs/hub/spaces |
|
|