# Local Model Setup Guide for HuggingClaw
This guide explains how to run small language models (≤1B parameters) locally on HuggingFace Spaces using Ollama.
## Why Local Models?
- **Free**: No API costs - runs on HF Spaces free tier
- **Private**: All inference happens inside your container
- **Fast**: 0.6B models achieve 20-50 tokens/second on CPU
- **Always Available**: No rate limits or downtime
## Supported Models
| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |
## Quick Start
### Step 1: Set Environment Variables
In your HuggingFace Space **Settings → Repository secrets**, add:
```bash
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```
### Step 2: Deploy
Push your changes or redeploy the Space. On startup:
1. Ollama server starts on port 11434
2. The model is pulled from the Ollama library (~30 seconds)
3. OpenClaw configures the local provider
4. Model appears in Control UI
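To confirm the startup sequence completed, you can poll Ollama's `/api/tags` endpoint from a shell inside the container. A minimal sketch; the `wait_for_ollama` helper is illustrative, not part of HuggingClaw:

```bash
# wait_for_ollama <base_url> [timeout_s]: poll Ollama's /api/tags endpoint
# until the server answers, or give up after the timeout.
wait_for_ollama() {
  url="$1"; timeout="${2:-30}"; elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if curl -sf "$url/api/tags" > /dev/null; then
      echo "ollama is up at $url"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "ollama did not answer within ${timeout}s" >&2
  return 1
}

# Usage inside the Space container:
#   wait_for_ollama "http://localhost:11434" 60
```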
### Step 3: Use
1. Open your Space URL
2. Enter gateway token (default: `huggingclaw`)
3. Select "NeuralNexus HacKing 0.6B" from model dropdown
4. Start chatting!
## Advanced Configuration
### Custom Model from HuggingFace
For models not in Ollama library:
```bash
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```
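Ollama can pull GGUF models directly from HuggingFace using the `hf.co/` prefix; an optional tag pins a specific quantization. The repo and tag below are shown for illustration and assume the repo publishes GGUF files:

```bash
# Pull straight from a HuggingFace repo (must contain GGUF files)
ollama pull hf.co/NeuralNexusLab/HacKing

# Optionally pin a quantization via a tag
ollama pull hf.co/NeuralNexusLab/HacKing:Q4_K_M
```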
### Using Custom Modelfile
1. Create `Modelfile` (see `scripts/Modelfile.HacKing`)
2. Add to your project
3. In `entrypoint.sh`, add after Ollama start:
```bash
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```
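For reference, a minimal Modelfile might look like the following. This is an illustrative sketch, not the contents of `scripts/Modelfile.HacKing`, and the GGUF filename is hypothetical:

```
FROM ./HacKing-0.6B.Q4_K_M.gguf
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant running on limited CPU hardware."
```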
### Performance Tuning
```bash
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2
# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1
# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
```
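These knobs can also be applied per request: Ollama's `/api/generate` endpoint accepts a `keep_alive` field and an `options` object that override the Modelfile defaults. A request-body sketch, using the model name configured above:

```json
{
  "model": "neuralnexuslab/hacking",
  "prompt": "Hello!",
  "keep_alive": -1,
  "options": { "num_ctx": 2048 }
}
```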
## Troubleshooting
### Model Not Appearing
1. Check logs: `docker logs <container>`
2. Look for: `[SYNC] Set local model provider`
3. Verify `LOCAL_MODEL_ENABLED=true`
### Slow Inference
1. Use smaller models (≤1B)
2. Reduce `OLLAMA_NUM_PARALLEL=1`
3. Decrease `num_ctx` in Modelfile
### Out of Memory
1. HF Spaces provides 16GB RAM, which is ample for a 0.6B model
2. Check other processes: `docker stats`
3. Switch to a smaller model or a more aggressive quantization
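To see how much headroom the container actually has, you can read `/proc/meminfo` directly (a Linux-only sketch):

```bash
# Print available memory in GB, converted from the kB value in /proc/meminfo
awk '/MemAvailable/ { printf "available: %.1f GB\n", $2 / 1048576 }' /proc/meminfo
```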
### Model Pull Fails
1. Check internet connectivity
2. Try alternative: `LOCAL_MODEL_NAME=hf.co/username/model`
3. Use pre-quantized GGUF format
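Transient network failures are the most common cause, so wrapping the pull in a retry loop helps. A sketch; the `PULL_CMD` override is a hypothetical knob, included so the loop can be exercised without Ollama installed:

```bash
# retry_pull <model> [max_attempts]: retry "ollama pull" with a short backoff.
# PULL_CMD can override the command being retried (hypothetical test knob).
retry_pull() {
  model="$1"; max="${2:-3}"; attempt=1
  until ${PULL_CMD:-ollama pull} "$model"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up on $model after $max attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying..." >&2
    sleep "$attempt"   # simple linear backoff
    attempt=$((attempt + 1))
  done
}

# Usage: retry_pull neuralnexuslab/hacking 5
```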
## Architecture
```
┌─────────────────────────────────────────────┐
│        HuggingFace Spaces Container         │
│                                             │
│  ┌──────────────┐     ┌──────────────────┐  │
│  │   Ollama     │     │   OpenClaw       │  │
│  │   :11434     │────►│   Gateway :7860  │  │
│  │   HacKing    │     │   - WhatsApp     │  │
│  │   0.6B       │     │   - Telegram     │  │
│  └──────────────┘     └──────────────────┘  │
│                                             │
│  /home/node/.ollama/models (persisted)      │
└─────────────────────────────────────────────┘
```
## Cost Comparison
| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |
## Best Practices
1. **Start Small**: Begin with 0.6B models, upgrade if needed
2. **Monitor RAM**: Keep usage under 8GB for stability
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
4. **Persist Models**: Store in `/home/node/.ollama/models`
5. **Set Defaults**: Use `LOCAL_MODEL_*` for auto-selection
## Example: WhatsApp Bot with Local AI
```bash
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true
# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```
Result: Free, always-on WhatsApp AI bot!
## Next Steps
1. Test with default 0.6B model
2. Experiment with different models
3. Customize Modelfile for your use case
4. Share your setup with the community!
## Support
- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces