# Local Model Setup Guide for HuggingClaw

This guide explains how to run small language models (≤1B parameters) locally on HuggingFace Spaces using Ollama.

## Why Local Models?

- **Free**: No API costs; runs on the HF Spaces free tier
- **Private**: All inference happens inside your container
- **Fast**: 0.6B models reach 20-50 tokens/second on CPU
- **Always Available**: No rate limits or third-party downtime

## Supported Models

| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |

## Quick Start

### Step 1: Set Environment Variables

In your HuggingFace Space **Settings → Repository secrets**, add:

```bash
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```
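If your own startup logic needs to branch on these settings, a minimal sanity check could look like the following (the guard logic is an illustration, not OpenClaw's actual startup code; the variable names match the secrets above):

```shell
# Hypothetical guard; in the Space these values come from Repository secrets.
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking

if [ "$LOCAL_MODEL_ENABLED" = "true" ] && [ -n "$LOCAL_MODEL_NAME" ]; then
  STATUS="local model enabled: $LOCAL_MODEL_NAME"
else
  STATUS="local model disabled"
fi
echo "$STATUS"
```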

### Step 2: Deploy

Push your changes or redeploy the Space. On startup:

1. The Ollama server starts on port 11434
2. The model is pulled from the Ollama library (~30 seconds)
3. OpenClaw configures the local provider
4. The model appears in the Control UI
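The startup sequence above roughly corresponds to an entrypoint fragment like this (illustrative only, not OpenClaw's actual `entrypoint.sh`; it assumes the `ollama` binary and `curl` are available in the container):

```bash
# Start the Ollama server in the background
ollama serve &

# /api/tags is Ollama's model-listing endpoint; poll it until the server is up
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# Pull the configured model (falls back to the default from this guide)
ollama pull "${LOCAL_MODEL_NAME:-neuralnexuslab/hacking}"
```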

### Step 3: Use

1. Open your Space URL
2. Enter the gateway token (default: `huggingclaw`)
3. Select "NeuralNexus HacKing 0.6B" from the model dropdown
4. Start chatting!
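To verify the model responds independently of the UI, you can hit Ollama's `/api/generate` endpoint from inside the container. The payload below follows Ollama's API, and the model name matches the configuration above; uncomment the `curl` line where the server is actually running:

```shell
# Build the request body for Ollama's /api/generate endpoint
PAYLOAD='{"model": "neuralnexuslab/hacking", "prompt": "Say hi", "stream": false}'
echo "$PAYLOAD"
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```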

## Advanced Configuration

### Custom Model from HuggingFace

For models not in Ollama library:

```bash
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```

### Using Custom Modelfile

1. Create `Modelfile` (see `scripts/Modelfile.HacKing`)
2. Add to your project
3. In `entrypoint.sh`, add after Ollama start:

```bash
# Register the custom model from the local Modelfile, if present
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```

### Performance Tuning

```bash
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2

# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1

# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
```
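Note that `num_ctx` is set in the Modelfile rather than via an environment variable. A minimal sketch (`FROM` and `PARAMETER` are standard Modelfile directives; the base model name follows this guide):

```
FROM neuralnexuslab/hacking
PARAMETER num_ctx 2048
```

Rebuild with `ollama create` (as in the Custom Modelfile section above) for the change to take effect.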

## Troubleshooting

### Model Not Appearing

1. Check logs: `docker logs <container>`
2. Look for: `[SYNC] Set local model provider`
3. Verify `LOCAL_MODEL_ENABLED=true`

### Slow Inference

1. Use smaller models (≤1B)
2. Reduce `OLLAMA_NUM_PARALLEL=1`
3. Decrease `num_ctx` in Modelfile

### Out of Memory

1. The HF Spaces free tier provides 16GB RAM, which is ample for a 0.6B model
2. Check other processes: `docker stats`
3. Use a smaller model or a more aggressive quantization

### Model Pull Fails

1. Check internet connectivity
2. Try alternative: `LOCAL_MODEL_NAME=hf.co/username/model`
3. Use pre-quantized GGUF format

## Architecture

```
┌────────────────────────────────────────────┐
│  HuggingFace Spaces Container              │
│                                            │
│  ┌──────────────┐    ┌──────────────────┐  │
│  │   Ollama     │    │   OpenClaw       │  │
│  │   :11434     │───►│   Gateway :7860  │  │
│  │   HacKing    │    │   - WhatsApp     │  │
│  │   0.6B       │    │   - Telegram     │  │
│  └──────────────┘    └──────────────────┘  │
│                                            │
│  /home/node/.ollama/models (persisted)     │
└────────────────────────────────────────────┘
```

## Cost Comparison

| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |

## Best Practices

1. **Start Small**: Begin with a 0.6B model and upgrade only if quality falls short
2. **Monitor RAM**: Keep usage under 8GB for stability
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
4. **Persist Models**: Store them in `/home/node/.ollama/models` so redeploys skip the pull
5. **Set Defaults**: Use the `LOCAL_MODEL_*` variables for auto-selection
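The RAM figures above follow from a standard back-of-envelope estimate: parameters (in billions) times bits per weight, divided by 8, gives gigabytes for the weights alone; the KV cache and runtime add overhead on top. A quick check (the formula is an approximation, not a measurement):

```shell
# Weight memory in GB ~= params_in_billions * bits_per_weight / 8
PARAMS_B=0.6   # 0.6B parameters
BITS=4         # Q4_K_M quantization is roughly 4 bits per weight
awk -v p="$PARAMS_B" -v b="$BITS" 'BEGIN { printf "%.1f GB weights\n", p * b / 8 }'
```

For the 0.6B model this gives about 0.3GB of weights, consistent with the ~500MB total in the table once runtime overhead is included.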

## Example: WhatsApp Bot with Local AI

```bash
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```

The result: a free, always-on WhatsApp AI bot!

## Next Steps

1. Test with default 0.6B model
2. Experiment with different models
3. Customize Modelfile for your use case
4. Share your setup with the community!

## Support

- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces