# Local Model Setup Guide for HuggingClaw

This guide explains how to run small language models (≤1B) locally on HuggingFace Spaces using Ollama.

## Why Local Models?

- **Free**: No API costs - runs on HF Spaces free tier
- **Private**: All inference happens inside your container
- **Fast**: 0.6B models achieve 20-50 tokens/second on CPU
- **Always Available**: No rate limits or downtime

## Supported Models

| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |

## Quick Start

### Step 1: Set Environment Variables

In your HuggingFace Space **Settings → Repository secrets**, add:

```bash
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```

### Step 2: Deploy

Push your changes or redeploy the Space. On startup:

1. Ollama server starts on port 11434
2. The model is pulled from the Ollama library (~30 seconds)
3. OpenClaw configures the local provider
4. The model appears in the Control UI

### Step 3: Use

1. Open your Space URL
2. Enter the gateway token (default: `huggingclaw`)
3. Select "NeuralNexus HacKing 0.6B" from the model dropdown
4. Start chatting!

## Advanced Configuration

### Custom Model from HuggingFace

For models not in the Ollama library:

```bash
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```

### Using Custom Modelfile

1. Create a `Modelfile` (see `scripts/Modelfile.HacKing`)
2. Add it to your project
3.
In `entrypoint.sh`, add after the Ollama start:

```bash
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```

### Performance Tuning

```bash
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2

# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1

# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
```

## Troubleshooting

### Model Not Appearing

1. Check the logs: `docker logs <container-id>`
2. Look for: `[SYNC] Set local model provider`
3. Verify `LOCAL_MODEL_ENABLED=true`

### Slow Inference

1. Use smaller models (≤1B)
2. Set `OLLAMA_NUM_PARALLEL=1`
3. Decrease `num_ctx` in the Modelfile

### Out of Memory

1. HF Spaces has 16GB RAM - this should be enough for 0.6B
2. Check other processes: `docker stats`
3. Reduce the model size or use a more aggressive quantization

### Model Pull Fails

1. Check internet connectivity
2. Try the alternative form: `LOCAL_MODEL_NAME=hf.co/username/model`
3. Use a pre-quantized GGUF format

## Architecture

```
┌─────────────────────────────────────────────┐
│         HuggingFace Spaces Container        │
│                                             │
│  ┌──────────────┐      ┌──────────────────┐ │
│  │    Ollama    │      │     OpenClaw     │ │
│  │    :11434    │─────►│   Gateway :7860  │ │
│  │    HacKing   │      │   - WhatsApp     │ │
│  │     0.6B     │      │   - Telegram     │ │
│  └──────────────┘      └──────────────────┘ │
│                                             │
│   /home/node/.ollama/models (persisted)     │
└─────────────────────────────────────────────┘
```

## Cost Comparison

| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |

## Best Practices

1. **Start Small**: Begin with 0.6B models, upgrade if needed
2. **Monitor RAM**: Keep usage under 8GB for stability
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
4. **Persist Models**: Store them in `/home/node/.ollama/models`
5.
**Set Defaults**: Use the `LOCAL_MODEL_*` variables for auto-selection

## Example: WhatsApp Bot with Local AI

```bash
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```

Result: a free, always-on WhatsApp AI bot!

## Next Steps

1. Test with the default 0.6B model
2. Experiment with different models
3. Customize the Modelfile for your use case
4. Share your setup with the community!

## Support

- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces
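As a quick sanity check before redeploying, you can verify the `LOCAL_MODEL_*` variables from Step 1 in the shell that exports your secrets. This is a minimal illustrative sketch; `check_local_model_env` is a hypothetical helper, not part of HuggingClaw:

```bash
# Sanity-check the LOCAL_MODEL_* variables described in this guide.
# Prints "ok" when the minimal set is present and consistent.
check_local_model_env() {
  if [ "${LOCAL_MODEL_ENABLED:-}" != "true" ]; then
    echo "error: LOCAL_MODEL_ENABLED must be 'true'" >&2
    return 1
  fi
  if [ -z "${LOCAL_MODEL_NAME:-}" ] || [ -z "${LOCAL_MODEL_ID:-}" ]; then
    echo "error: LOCAL_MODEL_NAME and LOCAL_MODEL_ID must both be set" >&2
    return 1
  fi
  echo "ok"
}
```

An `ok` means the minimal variable set is in place; any error points at the variable to fix before the Space restarts.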
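The startup sequence in Step 2 pulls the model only after the Ollama server is listening on port 11434. If you customize `entrypoint.sh` (for example to run `ollama create` with a Modelfile), a small wait loop avoids racing the server. The sketch below assumes `curl` is available in the container; `wait_for_ollama` and its parameters are hypothetical, though `GET /api/version` is a standard Ollama endpoint:

```bash
# Wait until the local Ollama server answers before pulling/creating models.
# Usage: wait_for_ollama [base-url] [max-attempts] [delay-seconds]
wait_for_ollama() {
  url="${1:-http://127.0.0.1:11434}"
  max="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$max" ]; do
    # /api/version responds as soon as the server is up
    if curl -sf "${url}/api/version" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "error: Ollama did not become ready" >&2
  return 1
}
```

In `entrypoint.sh` this would sit between starting the server and the `ollama create` block, e.g. `wait_for_ollama && ollama create ...`.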