# Local Model Setup Guide for HuggingClaw

This guide explains how to run small language models (≤1B parameters) locally on HuggingFace Spaces using Ollama.
## Why Local Models?

- Free: No API costs; runs on the HF Spaces free tier
- Private: All inference happens inside your container
- Fast: 0.6B models achieve 20-50 tokens/second on CPU
- Always Available: No rate limits or downtime
## Supported Models

| Model | Size | Speed (CPU) | RAM | Recommended |
|---|---|---|---|---|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |
## Quick Start

### Step 1: Set Environment Variables

In your HuggingFace Space Settings → Repository secrets, add:

```
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B
```
### Step 2: Deploy

Push your changes or redeploy the Space. On startup:

- The Ollama server starts on port 11434
- The model is pulled from the Ollama library (~30 seconds)
- OpenClaw configures the local provider
- The model appears in the Control UI
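The startup sequence above can be sketched as shell commands. This is a rough sketch only, not the actual `entrypoint.sh`; the polling loop and the `/api/version` endpoint are standard Ollama, but the exact ordering in your entrypoint may differ:

```shell
# Sketch of the startup sequence (illustrative; see your entrypoint.sh)
ollama serve &                          # start the Ollama server on :11434

# wait until the API answers before pulling the model
until curl -sf http://localhost:11434/api/version >/dev/null; do
  sleep 1
done

ollama pull "$LOCAL_MODEL_NAME"         # fetch the model (~30s for 0.6B)
```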
### Step 3: Use

- Open your Space URL
- Enter the gateway token (default: `huggingclaw`)
- Select "NeuralNexus HacKing 0.6B" from the model dropdown
- Start chatting!
## Advanced Configuration

### Custom Model from HuggingFace

For models not in the Ollama library:

```
# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
```
### Using a Custom Modelfile

- Create a `Modelfile` (see `scripts/Modelfile.HacKing`)
- Add it to your project
- In `entrypoint.sh`, add after Ollama starts:

```
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
```
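For reference, a minimal Modelfile might look like the sketch below. `scripts/Modelfile.HacKing` is the authoritative version; the `FROM` source and parameter values here are assumptions, using standard Ollama Modelfile directives:

```
# Hypothetical Modelfile sketch — adjust FROM to your actual model source
FROM hf.co/NeuralNexusLab/HacKing
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
```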
## Performance Tuning

```
# Number of parallel requests
OLLAMA_NUM_PARALLEL=2

# Keep the model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1

# Context window size is set in the Modelfile:
# PARAMETER num_ctx 2048
```
## Troubleshooting

### Model Not Appearing

- Check logs: `docker logs <container>`
- Look for: `[SYNC] Set local model provider`
- Verify `LOCAL_MODEL_ENABLED=true`
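To confirm Ollama actually registered the model, you can query its API from inside the container. These are standard Ollama endpoints, assuming the default port from this guide:

```shell
# List the models Ollama currently knows about
curl -s http://localhost:11434/api/tags

# Confirm the flag is actually set inside the container
printenv LOCAL_MODEL_ENABLED
```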
### Slow Inference

- Use smaller models (≤1B)
- Set `OLLAMA_NUM_PARALLEL=1`
- Decrease `num_ctx` in the Modelfile
### Out of Memory

- HF Spaces provides 16GB RAM, which should be plenty for a 0.6B model
- Check other processes: `docker stats`
- Reduce the model size or use stronger quantization
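As a quick sanity check from inside the container itself, the kernel's `/proc/meminfo` reports how much RAM is actually free:

```shell
# Print available memory in GB (MemAvailable is reported in kB)
awk '/MemAvailable/ {printf "Available RAM: %.1f GB\n", $2/1048576}' /proc/meminfo
```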
### Model Pull Fails

- Check internet connectivity
- Try the alternative form: `LOCAL_MODEL_NAME=hf.co/username/model`
- Use a pre-quantized GGUF format
## Architecture

```
┌──────────────────────────────────────────────┐
│          HuggingFace Spaces Container        │
│                                              │
│  ┌──────────────┐    ┌────────────────────┐  │
│  │   Ollama     │    │     OpenClaw       │  │
│  │   :11434     │───►│   Gateway :7860    │  │
│  │   HacKing    │    │   - WhatsApp       │  │
│  │   0.6B       │    │   - Telegram       │  │
│  └──────────────┘    └────────────────────┘  │
│                                              │
│  /home/node/.ollama/models (persisted)       │
└──────────────────────────────────────────────┘
```
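Given this layout, you can also bypass the gateway and talk to the Ollama backend directly. The `/api/generate` endpoint is standard Ollama; the model name assumes the default setup from this guide:

```shell
# Direct request to the Ollama backend (requires the server to be running)
curl -s http://localhost:11434/api/generate -d '{
  "model": "neuralnexuslab/hacking",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```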
## Cost Comparison

| Setup | Cost/Month | Speed | Privacy |
|---|---|---|---|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |
## Best Practices

- Start Small: Begin with 0.6B models; upgrade if needed
- Monitor RAM: Keep usage under 8GB for stability
- Use Quantization: GGUF Q4_K_M offers the best speed/quality trade-off
- Persist Models: Store them in `/home/node/.ollama/models`
- Set Defaults: Use the `LOCAL_MODEL_*` variables for auto-selection
## Example: WhatsApp Bot with Local AI

```
# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123
```

Result: a free, always-on WhatsApp AI bot!
## Next Steps

- Test with the default 0.6B model
- Experiment with different models
- Customize the Modelfile for your use case
- Share your setup with the community!
## Support

- Issues: https://github.com/openclaw/openclaw/issues
- Ollama Docs: https://ollama.ai/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces