
Local Model Setup Guide for HuggingClaw

This guide explains how to run small language models (≤1B parameters) locally on HuggingFace Spaces using Ollama.

Why Local Models?

  • Free: No API costs - runs on HF Spaces free tier
  • Private: All inference happens inside your container
  • Fast: 0.6B models achieve 20-50 tokens/second on CPU
  • Always Available: No rate limits or downtime

Supported Models

| Model | Size | Speed (CPU) | RAM | Recommended |
|---|---|---|---|---|
| NeuralNexusLab/HacKing | 0.6B | 20-50 t/s | 500MB | ✅ Best |
| TinyLlama-1.1B | 1.1B | 10-20 t/s | 1GB | ✅ Good |
| Qwen-1.5B | 1.5B | 8-15 t/s | 1.5GB | ⚠️ OK |
| Phi-2 | 2.7B | 3-8 t/s | 2GB | ⚠️ Slower |

Quick Start

Step 1: Set Environment Variables

In your HuggingFace Space Settings → Repository secrets, add:

LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
LOCAL_MODEL_ID=neuralnexuslab/hacking
LOCAL_MODEL_NAME_DISPLAY=NeuralNexus HacKing 0.6B

Step 2: Deploy

Push your changes or redeploy the Space. On startup:

  1. Ollama server starts on port 11434
  2. The model is pulled from the Ollama library (~30 seconds)
  3. OpenClaw configures the local provider
  4. Model appears in Control UI
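
The boot sequence above can be sketched as a small shell script. This is a minimal sketch, not the Space's actual entrypoint.sh: the model name, port, and timeout are the defaults assumed by this guide.

```shell
#!/bin/sh
# Sketch of the startup sequence (assumes the ollama CLI is on PATH
# and the default Ollama port 11434).
MODEL="${LOCAL_MODEL_NAME:-neuralnexuslab/hacking}"

if command -v ollama >/dev/null 2>&1; then
  ollama serve &                                  # 1. start the Ollama server
  for _ in $(seq 1 30); do                        #    wait up to ~30 s for the API
    curl -sf http://localhost:11434/api/tags >/dev/null 2>&1 && break
    sleep 1
  done
  ollama pull "$MODEL"                            # 2. pull the model weights
else
  echo "ollama not found; run this inside the Space container"
fi
```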

Step 3: Use

  1. Open your Space URL
  2. Enter gateway token (default: huggingclaw)
  3. Select "NeuralNexus HacKing 0.6B" from model dropdown
  4. Start chatting!
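
You can also smoke-test the model from a shell inside the container, bypassing the gateway, via Ollama's standard `/api/generate` endpoint (the model name matches the config above):

```shell
# JSON payload for a one-shot, non-streaming completion.
PAYLOAD='{"model": "neuralnexuslab/hacking", "prompt": "Say hi", "stream": false}'

# Only meaningful inside the container, where Ollama listens on :11434.
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama is not reachable on :11434"
```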

Advanced Configuration

Custom Model from HuggingFace

For models not in the Ollama library:

# Set in HF Spaces secrets
LOCAL_MODEL_NAME=hf.co/NeuralNexusLab/HacKing
LOCAL_MODEL_ID=neuralnexuslab/hacking
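
With the `hf.co/` prefix, recent Ollama releases can pull GGUF weights straight from the Hugging Face Hub. A sketch of doing this by hand, where the `ollama cp` alias step is an assumption about how the long Hub name is reconciled with the short id above:

```shell
SRC="hf.co/NeuralNexusLab/HacKing"    # Hub repo containing GGUF weights
DST="neuralnexuslab/hacking"          # short name used by LOCAL_MODEL_ID

if command -v ollama >/dev/null 2>&1; then
  ollama pull "$SRC"                  # fetch GGUF weights from the Hub
  ollama cp "$SRC" "$DST"             # alias the long Hub name to the short id
else
  echo "ollama not found; run this inside the Space container"
fi
```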

Using a Custom Modelfile

  1. Create a Modelfile (see scripts/Modelfile.HacKing)
  2. Add it to your project
  3. In entrypoint.sh, add the following after the Ollama server starts:
if [ -f /home/node/scripts/Modelfile.HacKing ]; then
  ollama create neuralnexuslab/hacking -f /home/node/scripts/Modelfile.HacKing
fi
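
For reference, a minimal Modelfile might look like the sketch below. This is illustrative, not the actual contents of scripts/Modelfile.HacKing; the base model reference and parameter values are assumptions.

```
FROM hf.co/NeuralNexusLab/HacKing
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
SYSTEM You are a concise, helpful assistant.
```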

Performance Tuning

# Number of parallel requests
OLLAMA_NUM_PARALLEL=2

# Keep model loaded (-1 = forever)
OLLAMA_KEEP_ALIVE=-1

# Context window size
# Set in Modelfile: PARAMETER num_ctx 2048
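
To confirm the tuning took effect, `ollama ps` lists the loaded models along with how long they will stay resident (assumes the ollama CLI is available inside the container):

```shell
# Capture the loaded-model list; falls back to a message when ollama is absent.
LOADED="$(ollama ps 2>/dev/null || echo 'ollama not running')"
echo "$LOADED"
```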

Troubleshooting

Model Not Appearing

  1. Check logs: docker logs <container>
  2. Look for: [SYNC] Set local model provider
  3. Verify LOCAL_MODEL_ENABLED=true
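
The checks above can be run from a shell inside the container (port and variable names as configured earlier in this guide):

```shell
# 1. Is the feature flag set at all?
ENABLED="${LOCAL_MODEL_ENABLED:-unset}"
echo "LOCAL_MODEL_ENABLED=$ENABLED"

# 2. Is Ollama up, and does it list the model?
curl -sf http://localhost:11434/api/tags \
  || echo "Ollama is not reachable on :11434"
```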

Slow Inference

  1. Use smaller models (≀1B)
  2. Reduce OLLAMA_NUM_PARALLEL=1
  3. Decrease num_ctx in Modelfile

Out of Memory

  1. The HF Spaces free tier provides 16GB RAM, which is ample for a 0.6B model
  2. Check other processes: docker stats
  3. Use a smaller model or a more aggressive quantization

Model Pull Fails

  1. Check internet connectivity
  2. Try alternative: LOCAL_MODEL_NAME=hf.co/username/model
  3. Use pre-quantized GGUF format

Architecture

┌─────────────────────────────────────────────┐
│  HuggingFace Spaces Container               │
│                                             │
│  ┌──────────────┐    ┌──────────────────┐   │
│  │   Ollama     │    │   OpenClaw       │   │
│  │   :11434     │───►│   Gateway :7860  │   │
│  │   HacKing    │    │   - WhatsApp     │   │
│  │   0.6B       │    │   - Telegram     │   │
│  └──────────────┘    └──────────────────┘   │
│                                             │
│  /home/node/.ollama/models (persisted)      │
└─────────────────────────────────────────────┘
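
The gateway reaches Ollama over plain HTTP. Recent Ollama releases also expose an OpenAI-compatible endpoint, so the same hop can be exercised by hand from inside the container (the model name matches the config above):

```shell
# Chat-completions payload in OpenAI format.
PAYLOAD='{"model": "neuralnexuslab/hacking", "messages": [{"role": "user", "content": "hello"}]}'

curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' -d "$PAYLOAD" \
  || echo "run this inside the Space container"
```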

Cost Comparison

| Setup | Cost/Month | Speed | Privacy |
|---|---|---|---|
| Local (HF Free) | $0 | 20-50 t/s | ✅ Full |
| OpenRouter Free | $0 | 10-30 t/s | ⚠️ Shared |
| HF Inference Endpoint | ~$400 | 50-100 t/s | ✅ Full |
| Self-hosted GPU | ~$50+ | 100+ t/s | ✅ Full |

Best Practices

  1. Start Small: Begin with 0.6B models, upgrade if needed
  2. Monitor RAM: Keep usage under 8GB for stability
  3. Use Quantization: GGUF Q4_K_M offers best speed/quality
  4. Persist Models: Store in /home/node/.ollama/models
  5. Set Defaults: Use LOCAL_MODEL_* for auto-selection

Example: WhatsApp Bot with Local AI

# HF Spaces secrets
LOCAL_MODEL_ENABLED=true
LOCAL_MODEL_NAME=neuralnexuslab/hacking
HF_TOKEN=hf_xxxxx
AUTO_CREATE_DATASET=true

# WhatsApp credentials (set in Control UI)
WHATSAPP_PHONE=+1234567890
WHATSAPP_CODE=ABC123

Result: Free, always-on WhatsApp AI bot!

Next Steps

  1. Test with default 0.6B model
  2. Experiment with different models
  3. Customize Modelfile for your use case
  4. Share your setup with the community!

Support