# Local Model Setup Guide for HuggingClaw
This guide explains how to run small language models (≤1B parameters) locally on HuggingFace Spaces using Ollama.
## Why Local Models?
- **Free**: No API costs - runs on HF Spaces free tier
- **Private**: All inference happens inside your container
- **Fast**: 0.6B models achieve 20-50 tokens/second on CPU
- **Always Available**: No rate limits or downtime
## Supported Models
| Model | Size | Speed (CPU) | RAM | Recommended |
|-------|------|-------------|-----|-------------|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |
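Any of the models above can also be pulled manually from a shell inside the container, which is handy for comparing them. A minimal sketch (the library tag below is an assumption; check the Ollama library for current names):

```shell
# Pull an alternative small model from the Ollama library.
# The tag is an assumption -- verify it in the Ollama library first.
MODEL_TAG="tinyllama:1.1b"
ollama pull "$MODEL_TAG" || echo "pull failed -- is the Ollama server running?"
```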
## Quick Start
### Step 1: Set Environment Variables
In your HuggingFace Space **Settings → Repository secrets**, add:
```bash
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```
### Step 2: Deploy
Push your changes or redeploy the Space. On startup:
1. Ollama server starts on port 11434
2. The model is pulled from the Ollama library (~30 seconds)
3. OpenClaw configures the local provider
4. Model appears in Control UI
### Step 3: Use
1. Open your Space URL
2. Enter gateway token (default: `huggingclaw`)
3. Select "NeuralNexus HacKing 0.6B" from model dropdown
4. Start chatting!
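If the model does not show up in the dropdown, you can verify the server directly from a shell inside the container. A quick sketch, assuming Ollama is listening on its default port:

```shell
# List models currently registered with the local Ollama server
OLLAMA_URL="http://localhost:11434"
curl -s "$OLLAMA_URL/api/tags" || echo "Ollama not reachable at $OLLAMA_URL"
```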
## Advanced Configuration
### Custom Model from HuggingFace
For models not in the Ollama library, pull them directly from HuggingFace:
```bash
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```
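With this configuration the model is pulled straight from HuggingFace; the repository must host a GGUF file. The same pull can be exercised manually, a sketch:

```shell
# Pull a GGUF model directly from HuggingFace via Ollama's hf.co prefix
HF_MODEL="hf.co/NeuralNexusLab/HacKing"
ollama pull "$HF_MODEL" || echo "pull failed -- check that the repo hosts a GGUF file"
ollama list || true   # confirm the model is registered locally
```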
### Using Custom Modelfile
1. Create `Modelfile` (see `scripts/Modelfile.HacKing`)
2. Add to your project
3. In `entrypoint.sh`, add after Ollama start:
```bash
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```
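A minimal `Modelfile` might look like the following sketch (the `FROM` source and parameter values are illustrative assumptions; adapt them to your model):

```
# Modelfile sketch -- base model and parameters are illustrative
FROM hf.co/NeuralNexusLab/HacKing
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant running on HuggingFace Spaces."
```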
### Performance Tuning
```bash
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2
# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1
# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
```
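The first two tunables are environment variables read by the Ollama server, so they must be exported before `ollama serve` starts. A sketch of the relevant lines in `entrypoint.sh` (values are illustrative):

```shell
# Export server tunables before starting Ollama (values are illustrative)
export OLLAMA_NUM_PARALLEL=1   # serialize requests on a small CPU
export OLLAMA_KEEP_ALIVE=-1    # keep the model resident in RAM
ollama serve &                 # start the Ollama server in the background
```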
## Troubleshooting
### Model Not Appearing
1. Check logs: `docker logs <container>`
2. Look for: `[SYNC] Set local model provider`
3. Verify `LOCAL_MODEL_ENABLED=true`
### Slow Inference
1. Use smaller models (≤1B)
2. Reduce `OLLAMA_NUM_PARALLEL=1`
3. Decrease `num_ctx` in Modelfile
### Out of Memory
1. HF Spaces provides 16GB of RAM, which should be ample for a 0.6B model
2. Check other processes: `docker stats`
3. Use a smaller model or a more aggressive quantization
### Model Pull Fails
1. Check internet connectivity
2. Try alternative: `LOCAL_MODEL_NAME=hf.co/username/model`
3. Use pre-quantized GGUF format
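When a repo ships multiple GGUF quantizations, a tag suffix can select one explicitly. A sketch (`username/model` is a placeholder, and `Q4_K_M` availability depends on the repo):

```shell
# Select a specific GGUF quantization via a tag suffix
QUANT="Q4_K_M"
ollama pull "hf.co/username/model:$QUANT" || echo "pull failed"
```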
## Architecture
```
┌────────────────────────────────────────────┐
│        HuggingFace Spaces Container        │
│                                            │
│  ┌──────────────┐    ┌──────────────────┐  │
│  │    Ollama    │    │     OpenClaw     │  │
│  │    :11434    │───►│  Gateway :7860   │  │
│  │   HacKing    │    │  - WhatsApp      │  │
│  │     0.6B     │    │  - Telegram      │  │
│  └──────────────┘    └──────────────────┘  │
│                                            │
│  /home/node/.ollama/models (persisted)     │
└────────────────────────────────────────────┘
```
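Internally, the gateway reaches Ollama over HTTP. Ollama also exposes an OpenAI-compatible endpoint, so the wiring can be exercised directly; a sketch assuming the default model name from this guide:

```shell
# Send a chat request straight to Ollama's OpenAI-compatible endpoint
MODEL="neuralnexuslab/hacking"
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}" \
  || echo "Ollama not reachable"
```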
## Cost Comparison
| Setup | Cost/Month | Speed | Privacy |
|-------|-----------|-------|---------|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |
## Best Practices
1. **Start Small**: Begin with 0.6B models, upgrade if needed
2. **Monitor RAM**: Keep usage under 8GB for stability
3. **Use Quantization**: GGUF Q4_K_M offers the best speed/quality trade-off
4. **Persist Models**: Store in `/home/node/.ollama/models`
5. **Set Defaults**: Use `LOCAL_MODEL_*` for auto-selection
## Example: WhatsApp Bot with Local AI
```bash
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true
# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```
Result: Free, always-on WhatsApp AI bot!
## Next Steps
1. Test with default 0.6B model
2. Experiment with different models
3. Customize Modelfile for your use case
4. Share your setup with the community!
## Support
- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces