---
library_name: transformers
pipeline_tag: text-generation
base_model: openai-community/gpt2
language:
- en
tags:
- transformers
- pytorch
- gguf
- gpt2
- gpt2-small
- 117M
- text-generation
- conversational
- grpo
- vae
- kv-cache
- distillation
- reinforcement-learning
- openclaw
- fallback-agent
- soul-md
- agent-framework
- tool-use
- task-automation
- dpo
- tool-masking
- uncertainty-estimation
- rag
- semantic-cache
- quantization
- pruning
- arxiv:2402.03300
license: apache-2.0
---
# 🧠 microclaw-for-openclaw – Fallback Agent for OpenClaw (v2026.2.17)
**Model ID:** `webxos/microclaw-for-openclaw-version-2026.2.17`
**Tags:** `openclaw`, `fallback-agent`, `grpo`, `vae`, `kv-cache`, `dpo`, `tool-masking`, `uncertainty`, `rag`, `semantic-cache`, `soul.md`, `huggingface-space`, `gguf`, `llm-distillation`
---
## 📌 Overview
**microclaw** (v2026.2.17) is a lightweight, distilled language model designed as a **fallback agent** for the [OpenClaw](https://openclaw.org) ecosystem. When the primary agent loses connectivity or requires offline operation, microclaw steps in to handle essential system tasks: file management, status checks, cron jobs, and simple Q&A.
**WARNING: You will need to train your own GGUF model locally. The `microclaw.gguf` shipped in this repo is a lightweight placeholder so users can scale and build their own local models with llama.cpp.**
You will need to configure your own build locally from scratch with this model; it is still in development and under testing.
This version is designed to integrate directly with the OpenClaw.ai port 18789. This README presents multiple ways, some optional, to configure this agent on your local Debian-based Linux machines.
This version introduces **advanced training and inference enhancements**:
- **Tool‑use masking** and **schema‑first training** for reliable function calling.
- **Direct Preference Optimization (DPO)** to align outputs with human preferences.
- **Uncertainty estimation** with configurable thresholds for safe escalation.
- **Retrieval‑Augmented Generation (RAG)** with semantic chunking.
- **Semantic KV‑cache** for high‑similarity query reuse (a sketch follows this list).
- **Quantization (down to 2‑bit)** and **pruning** for extreme memory efficiency.
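As referenced above, here is a minimal sketch of the semantic-cache idea: reuse a cached answer when a new query embeds close to a previous one. The `0.9` threshold, the in-memory list, and the helper names are illustrative assumptions, not the shipped implementation (which also reuses KV states).
```python
# Minimal semantic-cache sketch (illustrative, not the shipped implementation).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # same tiny model as the RAG config
cache = []  # list of (embedding, response) pairs

def cached_answer(query: str, threshold: float = 0.9):
    """Return a cached response if a stored query is similar enough, else None."""
    q = embedder.encode(query)
    for emb, response in cache:
        sim = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
        if sim >= threshold:
            return response
    return None

def remember(query: str, response: str):
    cache.append((embedder.encode(query), response))
```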
The repository contains the fully and partially trained model files, configuration (`soul.md`, `AGENTS.md`, `HEARTBEAT.md`, `SECURITY.md`), and export bundles ready for deployment to **Hugging Face Spaces** or local execution with OpenClaw.
---
## ✨ Key Features
- **GRPO (Group Relative Policy Optimization)** – Trains the agent with group‑wise advantage estimation for stable policy updates.
- **VAE Filter** – A Variational Autoencoder that filters low‑quality training samples, improving output coherence.
- **Tool‑Use Masking** – Masks non‑tool tokens during training to enforce strict schema adherence (JSON/YAML).
- **DPO (Direct Preference Optimization)** – Fine‑tunes on preference pairs to reduce hallucinations and improve helpfulness.
- **Uncertainty Estimation** – Monitors token‑level entropy and escalates to safe responses when confidence drops below a threshold (a sketch follows this list).
- **RAG (Retrieval‑Augmented Generation)** – Retrieves relevant chunks from a local knowledge base (FAISS) to ground responses.
- **Semantic Cache** – Reuses previous generations for semantically similar queries, reducing latency and cost.
- **Quantization & Pruning** – Compress the model to 2‑8 bits and prune unimportant weights; backend support for AutoGPTQ, llama.cpp (GGUF), and bitsandbytes.
- **KV‑Cache** – Intelligent reuse of key/value states reduces inference latency by up to 78% (measured on local benchmarks).
- **Soul.md Configuration** – Define personality, sub‑agent rules, proactive tasks, and prompt injection defenses in plain Markdown.
- **Export Ready** – One‑click export to a **Hugging Face Space** (Docker‑based) or a portable ZIP archive.
- **Quantized (4‑bit GGUF)** – Optimized for memory‑constrained environments; runs smoothly on CPU.
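To make the uncertainty-estimation feature concrete, here is a hedged sketch of an entropy gate. The threshold value and function names are assumptions for illustration; the model card does not specify the exact rule.
```python
# Entropy-gate sketch: escalate when the model's token distributions look too flat.
import math

def token_entropy(probs):
    """Shannon entropy (nats) of a single next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(step_probs, threshold=3.5):
    """True when mean per-token entropy exceeds the (assumed) threshold."""
    entropies = [token_entropy(p) for p in step_probs]
    return sum(entropies) / len(entropies) > threshold

# Example: two fairly peaked distributions stay below the gate.
print(should_escalate([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]))  # False
```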
---
### Part 1: Installation
Multiple guides follow, covering different ways to integrate Microclaw into your custom build, with steps to further train the GGUF file locally:
**Read all steps carefully and pick the guide that matches your use case and setup. Not all options may work on your system; these guides target Debian-based Linux systems.**
# 1.1 System Update & Basic Tools
```bash
sudo apt update
sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
# 1.2 Install Docker (for containerized execution)
```bash
# Add Docker's official GPG key and repository
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Add your user to the docker group (avoid sudo for every command)
sudo usermod -aG docker $USER
newgrp docker # activate group changes in current shell
```
# 1.3 Install Node.js (v22 or later) & TypeScript
```bash
# Using NodeSource repository for a modern Node.js version
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs
# Install TypeScript globally
sudo npm install -g typescript
# Verify
node --version # should be v22.x or higher
tsc --version
```
# 1.4 Install SQLite (for memory & logs)
```bash
sudo apt install -y sqlite3 libsqlite3-dev
```
# Part 2: Microclaw Fallback Agent
The Microclaw agent is a Python‑based service (Flask + Transformers) that communicates with OpenClaw. You can install it using either a Python virtual environment (lightweight) or Conda (more reliable for PyTorch). Choose one method below.
# 2.1 Clone the Microclaw Repository
Create a parent directory for all agents:
```bash
sudo mkdir -p /opt/openclaw-agents
sudo chown -R $USER:$USER /opt/openclaw-agents
cd /opt/openclaw-agents
# Clone the Hugging Face repo (includes model files and soul configuration)
git lfs install
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
Note: The .gguf model files are several hundred MB. If the download is interrupted, git lfs can resume. After cloning, verify the file sizes:
```bash
ls -lh *.gguf
```
They should be >100 MB, not 28 bytes. If they are still placeholders, run `git lfs pull` manually.
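If you prefer scripting the check, here is a small hedged sketch; it relies only on the fact that a real GGUF file begins with the ASCII magic bytes `GGUF`:
```python
# Check each .gguf in the current directory: real GGUF files start with b"GGUF".
from pathlib import Path

for p in Path(".").glob("*.gguf"):
    with open(p, "rb") as f:
        magic = f.read(4)
    status = "OK" if magic == b"GGUF" else "placeholder? run `git lfs pull`"
    print(f"{p.name}: {p.stat().st_size} bytes - {status}")
```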
# 2.2 Option A: Install with Python Virtual Environment (venv)
```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
If `requirements.txt` is missing, install the core packages manually:
```bash
pip install flask transformers torch sentence-transformers faiss-cpu --extra-index-url https://download.pytorch.org/whl/cpu
```
# 2.3 Option B: Install with Conda (Recommended for unstable networks)
```bash
# Download and install Miniconda (if not already present)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a dedicated environment with Python 3.11
conda create -y -n microclaw python=3.11
conda activate microclaw
# Install CPU‑only PyTorch (smaller and more reliable than the pip wheels)
conda install -y pytorch torchvision torchaudio cpuonly -c pytorch
# Install the rest via pip
pip install flask transformers sentence-transformers faiss-cpu
```
# 2.4 Test the Agent Manually
```bash
# Make sure you are in the agent directory with the environment activated
python main.py
```
You should see output like `* Running on http://127.0.0.1:18789`. Press `Ctrl+C` to stop it.
# ⚙️ Part 3: Configure OpenClaw to Use the Microclaw Fallback
OpenClaw reads its configuration from a TOML file (typically `~/.config/openclaw/config.toml` or `/etc/openclaw/config.toml`). You need to point it to your local Microclaw instance.
Find the port Microclaw listens on (default is 18789, defined in main.py):
```bash
grep port main.py
```
Edit the OpenClaw configuration (create it if it doesn't exist):
```bash
mkdir -p ~/.config/openclaw
nano ~/.config/openclaw/config.toml
```
Add or modify the `[agent.fallback]` section:
```toml
[agent.fallback]
path = "/opt/openclaw-agents/microclaw-fallback"
port = 18789
enabled = true
```
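To sanity-check the TOML you just wrote, a quick stdlib-only snippet (Python 3.11+ for `tomllib`) can parse it back. This is just a convenience check, not part of OpenClaw:
```python
# Parse the OpenClaw config and print the fallback section (Python 3.11+).
import tomllib
from pathlib import Path

cfg = tomllib.loads(Path("~/.config/openclaw/config.toml").expanduser().read_text())
print(cfg["agent"]["fallback"])  # expect the path/port/enabled values set above
```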
If OpenClaw is already installed, restart it. (If you haven't installed OpenClaw yet, see Part 4 below.)
# 🐳 Part 4: Install & Run OpenClaw (the main framework)
The OpenClaw core is a Node.js/TypeScript application. You can run it directly from source or use the provided Docker image.
# 4.1 Run OpenClaw via Docker (easiest)
```bash
# Pull the official OpenClaw image (adjust tag as needed)
docker pull openclaw/openclaw:latest

# Run the container, mounting the config and agents directories
docker run -d \
  --name openclaw \
  -p 3000:3000 \
  -v ~/.config/openclaw:/home/node/.config/openclaw \
  -v /opt/openclaw-agents:/opt/openclaw-agents \
  openclaw/openclaw:latest
```
# 4.2 Run OpenClaw from Source (for development)
```bash
# Clone the OpenClaw repository
git clone https://github.com/openclaw/core.git openclaw-core
cd openclaw-core

# Install dependencies
yarn install

# Build TypeScript
yarn build

# Start OpenClaw (it reads the config from ~/.config/openclaw/config.toml)
yarn start
```
# 🧪 Part 5: Verify the Integration
Check that Microclaw is running (either manually or via systemd):
```bash
curl http://localhost:18789/health
```
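For illustration, fallback selection can be as simple as polling this endpoint. Below is a hedged, stdlib-only sketch, not OpenClaw's actual logic:
```python
# Minimal health poll: returns True when the fallback agent answers /health.
import urllib.request

def fallback_alive(url="http://localhost:18789/health", timeout=2):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

print("microclaw up" if fallback_alive() else "microclaw down")
```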
# 🔁 Guide to Microclaw Auto-Start (systemd)
To ensure the fallback agent starts on boot and restarts if it crashes, create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-fallback.service
```
Paste the following (adjust `User` and paths to match your setup):
```ini
[Unit]
Description=Microclaw Fallback Agent for OpenClaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/opt/openclaw-agents/microclaw-fallback
Environment="PATH=/opt/openclaw-agents/microclaw-fallback/venv/bin"
ExecStart=/opt/openclaw-agents/microclaw-fallback/venv/bin/python /opt/openclaw-agents/microclaw-fallback/main.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-fallback.service
sudo systemctl start microclaw-fallback.service
```
Check status:
```bash
sudo systemctl status microclaw-fallback.service
```
### ALTERNATIVE GUIDE: Installing via llama.cpp instead
# 📦 Prerequisites: Essential System Tools
You need a few standard command-line tools. Open a terminal and run:
```bash
# Update your package list and install curl, wget, git, and build tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
## 📥 Step 1: Download the Model with Git LFS
The model files are hosted in a Git repository and require Git Large File Storage (LFS) to download the actual GGUF files.
# 1.1: Install Git LFS
```bash
sudo apt install -y git-lfs
git lfs install
```
# 1.2: Create a directory for your models and clone the repository
```bash
mkdir -p ~/models
cd ~/models
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
# 1.3: Ensure the GGUF files are fully downloaded
```bash
git lfs pull
```
Verification: after cloning, check that the `.gguf` files are present and a reasonable size (several hundred MB, not 28 bytes):
```bash
ls -lh *.gguf
```
If the files are small placeholders, run `git lfs pull` again.
## ⚙️ Step 2: Set Up the llama.cpp Server
Now, download, compile, and set up llama.cpp with its built-in server.
# 2.1: Clone the llama.cpp repository
```bash
cd ~/models
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
# 2.2: Compile llama.cpp (this may take a few minutes)
```bash
make -j4
```
# 2.3: (Optional but recommended) Install the Python dependencies for the server
This step requires Python/pip, but it's a one-time, isolated setup.
```bash
sudo apt install -y python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
# 🚀 Step 3: Run the Model Server
Now start the server, pointing it at the GGUF model file you downloaded. Make sure you are in the llama.cpp directory with the virtual env activated.
```bash
cd ~/models/llama.cpp
source venv/bin/activate

# Find the exact GGUF filename (replace with the actual filename you have)
MODEL_FILE=~/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf

# Run the server
./server -m $MODEL_FILE \
  --host 0.0.0.0 \
  --port 8000 \
  -c 2048 \
  -ngl 0   # use -ngl 33 if you have an NVIDIA GPU and compiled with CUDA support
```
Explanation of flags:
- `-m $MODEL_FILE` : path to your GGUF model.
- `--host 0.0.0.0` : listen on all network interfaces (so OpenClaw can connect).
- `--port 8000` : the port the server will use.
- `-c 2048` : context size (adjust based on model requirements).
- `-ngl 0` : number of layers to offload to GPU. Use `-ngl 33` (or more) if you have an NVIDIA GPU and compiled with CUDA.

Keep this terminal window open. The server is now running and ready to accept requests.
# ✅ Step 4: Test the Server
Open a new terminal and test the API to ensure it's working correctly.
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is the capital of France?",
"max_tokens": 50,
"temperature": 0.7
}'
```
You should receive a JSON response containing the model's generated text.
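The same endpoint can be called from Python. A hedged, stdlib-only sketch against the llama.cpp server's OpenAI-style completions route:
```python
# POST a completion request to the local llama.cpp server and print the text.
import json
import urllib.request

payload = {"prompt": "What is the capital of France?", "max_tokens": 50, "temperature": 0.7}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["text"])
```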
# 🔌 Step 5: Configure OpenClaw to Use the Local Server
Now, configure OpenClaw to use this local server as its fallback agent.
Locate OpenClaw's configuration file. This is often `~/.config/openclaw/config.toml`, `/etc/openclaw/config.toml`, or a `.env` file in the OpenClaw directory.
Edit the configuration to define a custom provider that points to your local server. The exact variable names depend on your OpenClaw version, but it generally looks something like this:
```toml
[agent.fallback]
provider = "custom" # or "openai-compatible"
base_url = "http://localhost:8000/v1"
api_key = "not-needed" # llama.cpp server doesn't require a key
model = "microclaw" # Optional: model name
enabled = true
```
If OpenClaw uses environment variables (e.g., in a .env file), you might set:
```text
OPENCLAW_FALLBACK_PROVIDER=custom
OPENCLAW_CUSTOM_BASE_URL=http://localhost:8000/v1
OPENCLAW_CUSTOM_API_KEY=not-needed
```
Restart OpenClaw for the changes to take effect.
# 🔁 How to Run the Server as a Background Service:
To have the server start automatically on boot and restart if it crashes, you can create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-llama.service
```
Paste the following (adjust `User`, `WorkingDirectory`, and `ExecStart` paths as needed):
```ini
[Unit]
Description=llama.cpp server for Microclaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/home/kali/models/llama.cpp
ExecStart=/home/kali/models/llama.cpp/server -m /home/kali/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf --host 0.0.0.0 --port 8000 -c 2048 -ngl 0
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Then enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-llama.service
sudo systemctl start microclaw-llama.service
sudo systemctl status microclaw-llama.service # Check if it's running
```
### ADVANCED GUIDE: TRAINING MICROCLAW.GGUF MODEL LOCALLY
This guide adapts the full microclaw pipeline to run entirely on a low‑end machine like an 8GB RAM laptop or even a Raspberry Pi 5. We'll use a tiny base model (0.5B–1B parameters), parameter‑efficient fine‑tuning (LoRA) on CPU, and extreme quantization (2‑bit) to produce a GGUF file that runs smoothly on consumer hardware.
The final system provides:
- A **local training script** that fits in 8GB RAM (CPU only).
- A **FastAPI server** (`server.py`) serving a retro MS‑DOS‑style CLI dashboard on `localhost:8080`.
- **Local API endpoints** for inference, file management, cron jobs, and RAG.
- **SQLite** as a local database (conversation history, cache, RAG index).
- Integration with **llama.cpp** for efficient GGUF inference.
---
## Prerequisites
- **Hardware**: x86_64 or ARM64 (Raspberry Pi 5) with **at least 8GB RAM**.
- **OS**: Debian 12 / Kali Linux / Raspberry Pi OS (64‑bit).
- **Storage**: 10GB free space.
- **Software**: Python 3.10+, Git, CMake, build tools.
---
## Step 1: Environment Setup
```bash
cd /home/kali/microclaw
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
**`requirements.txt`** (CPU‑optimized, no CUDA dependencies):
```
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.0
transformers>=4.38.0
accelerate
datasets
trl>=0.8.0
peft
bitsandbytes
scipy
sentencepiece
protobuf
fastapi
uvicorn
sqlite-utils
pydantic
pyyaml
jinja2
aiofiles
llama-cpp-python
```
---
## Step 2: Build llama.cpp (for conversion & inference)
llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU.
```bash
cd /home/kali
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed
make -j$(nproc)
```
After compilation, the `convert-hf-to-gguf.py` script will be in `llama.cpp/` (not in build). We'll use it later.
---
## Step 3: Prepare the Dataset
You need a small dataset (a few hundred to a few thousand examples) for fine‑tuning and DPO. Place JSONL files in `data/raw/`.
### 3.1 Tool‑use data (schema‑first)
Each line:
```json
{
  "instruction": "List files in /home",
  "tools": ["ls"],
  "response": "ls /home"
}
```
### 3.2 Preference data (for DPO)
Each line:
```json
{
  "prompt": "What is the weather?",
  "chosen": "I cannot check live weather, but you can use the 'weather' tool.",
  "rejected": "I don't know."
}
```
If you don't have preference data, you can skip DPO by setting `dpo: false` in config.
### 3.3 RAG documents (optional)
Place plain text files (`.txt`) in `data/rag_docs/`. The training script will chunk them and store embeddings in SQLite.
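The chunking step can be pictured in a few lines of Python. This is a hedged sketch of a sliding-window chunker using the `chunk_size`/`chunk_overlap` values from `config.yaml`; the actual training script may chunk differently, and `example.txt` is a hypothetical file.
```python
# Sliding-window chunking with overlap, mirroring chunk_size / chunk_overlap
# from config.yaml. Illustrative only.
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text(open("data/rag_docs/example.txt").read())  # hypothetical file
print(f"{len(chunks)} chunks")
```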
---
## Step 4: Configuration (`config.yaml`)
Edit this file to match your paths and training preferences.
```yaml
# config.yaml
model:
  base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or "Qwen/Qwen2.5-0.5B"
  cache_dir: "models/base"

training:
  output_dir: "models/lora"
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  num_train_epochs: 3
  max_seq_length: 512
  use_lora: true
  lora_r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  dpo: true
  dpo_beta: 0.1
  # CPU optimizations
  dataloader_num_workers: 0
  save_steps: 100
  logging_steps: 10

data:
  train_file: "data/raw/train.jsonl"
  eval_file: "data/raw/eval.jsonl"               # optional
  preference_file: "data/raw/preferences.jsonl"  # for DPO

rag:
  enabled: true
  chunk_size: 500
  chunk_overlap: 50
  embedding_model: "all-MiniLM-L6-v2"  # tiny, runs on CPU
  db_path: "db/microclaw.db"

server:
  host: "0.0.0.0"
  port: 8080
  model_path: "models/microclaw.gguf"
  context_size: 2048
  max_tokens: 512
  temperature: 0.7
```
---
## Folder Structure (to be created)
```
/home/kali/microclaw/
├── server.py             # FastAPI server (inference + static files + API)
├── train.py              # CPU‑optimized fine‑tuning + DPO script
├── requirements.txt
├── config.yaml
├── data/
│   ├── raw/              # Place your JSONL datasets here
│   └── rag_docs/         # Text files for RAG (optional)
├── models/
│   ├── base/             # Will contain the downloaded base model
│   ├── lora/             # LoRA adapters after training
│   └── microclaw.gguf    # Final quantized model (after conversion)
├── static/
│   ├── index.html        # Main dashboard (CLI style)
│   ├── style.css
│   ├── script.js
│   └── pages/            # Additional pages (file manager, cron, etc.)
│       ├── files.html
│       ├── cron.html
│       └── rag.html
├── db/
│   └── microclaw.db      # SQLite database (auto‑created)
└── logs/
    └── training.log
```
---
## Step 5: Training Script (`train.py`)
This script performs supervised fine‑tuning (SFT) on instruction data, optionally followed by DPO, and finally merges the LoRA weights and saves the full model. It is heavily optimized for low RAM (CPU) usage.
```python
#!/usr/bin/env python3
# train.py – CPU‑only fine‑tuning with LoRA + optional DPO
import os
import logging
import yaml
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
from trl import DPOTrainer

# Load config
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Setup logging (create logs/ first so the file handler does not fail)
os.makedirs("logs", exist_ok=True)
logging.basicConfig(level=logging.INFO, filename="logs/training.log", filemode="w")
logger = logging.getLogger(__name__)

def main():
    # 1. Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        config["model"]["base_model_name"], cache_dir=config["model"]["cache_dir"]
    )
    tokenizer.pad_token = tokenizer.eos_token

    # 2. Load base model. bitsandbytes 8-bit loading is not supported on CPU,
    #    so we load in float32 and rely on LoRA to reduce memory.
    model = AutoModelForCausalLM.from_pretrained(
        config["model"]["base_model_name"],
        cache_dir=config["model"]["cache_dir"],
        torch_dtype=torch.float32,  # CPU uses float32
        low_cpu_mem_usage=True,
    )

    # 3. Prepare LoRA
    if config["training"]["use_lora"]:
        lora_config = LoraConfig(
            task_type=TaskType.CAUSAL_LM,
            r=config["training"]["lora_r"],
            lora_alpha=config["training"]["lora_alpha"],
            lora_dropout=config["training"]["lora_dropout"],
            target_modules=["q_proj", "v_proj"],  # adjust for your model
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()

    # 4. Load dataset
    dataset = load_dataset("json", data_files=config["data"]["train_file"], split="train")
    if config["data"].get("eval_file"):
        eval_dataset = load_dataset("json", data_files=config["data"]["eval_file"], split="train")
    else:
        eval_dataset = None

    # Format prompt: "### Instruction:\n{instruction}\n\n### Response:\n{response}"
    def format_func(example):
        text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{tokenizer.eos_token}"
        return {"text": text}

    dataset = dataset.map(format_func)
    if eval_dataset:
        eval_dataset = eval_dataset.map(format_func)

    # Tokenize
    def tokenize(element):
        return tokenizer(
            element["text"],
            truncation=True,
            max_length=config["training"]["max_seq_length"],
            padding=False,
        )

    dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
    if eval_dataset:
        eval_dataset = eval_dataset.map(tokenize, remove_columns=eval_dataset.column_names)

    # 5. Training arguments (CPU‑friendly)
    training_args = TrainingArguments(
        output_dir=config["training"]["output_dir"],
        per_device_train_batch_size=config["training"]["per_device_train_batch_size"],
        gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"],
        learning_rate=config["training"]["learning_rate"],
        num_train_epochs=config["training"]["num_train_epochs"],
        logging_steps=config["training"]["logging_steps"],
        save_steps=config["training"]["save_steps"],
        evaluation_strategy="steps" if eval_dataset else "no",
        eval_steps=config["training"]["save_steps"],
        save_total_limit=2,
        load_best_model_at_end=bool(eval_dataset),
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        fp16=False,  # CPU doesn't support fp16
        bf16=False,
        dataloader_num_workers=0,  # avoid multiprocessing issues
        optim="adamw_torch",
        torch_compile=False,  # no speedup on CPU
    )

    # 6. Trainer (SFT). The collator pads each batch and sets labels = input_ids
    #    so the causal-LM loss is computed.
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    logger.info("Starting SFT training...")
    trainer.train()
    trainer.save_model()  # saves LoRA adapters

    # 7. Optional DPO training
    if config["training"]["dpo"] and config["data"].get("preference_file"):
        logger.info("Loading preference data for DPO...")
        pref_dataset = load_dataset("json", data_files=config["data"]["preference_file"], split="train")
        # Simplified: reuse the LoRA-attached model; with ref_model=None the
        # DPO trainer derives the frozen reference model from it.
        dpo_trainer = DPOTrainer(
            model=model,
            ref_model=None,
            args=training_args,  # reuse same args (adjust for DPO)
            train_dataset=pref_dataset,
            tokenizer=tokenizer,
            beta=config["training"]["dpo_beta"],
            max_length=config["training"]["max_seq_length"],
            max_prompt_length=256,
        )
        logger.info("Starting DPO training...")
        dpo_trainer.train()
        dpo_trainer.save_model(config["training"]["output_dir"] + "_dpo")

    # 8. Merge LoRA and save full model (for conversion)
    logger.info("Merging LoRA weights...")
    merged_model = model.merge_and_unload()
    merged_model.save_pretrained("models/merged")
    tokenizer.save_pretrained("models/merged")
    logger.info("Merged model saved to models/merged")

if __name__ == "__main__":
    main()
```
**Run training**:
```bash
python train.py
```
*Note: Training a 1B model on CPU with batch size 1 may take several hours to days depending on dataset size. Reduce epochs or dataset size for testing.*
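Before converting, you can optionally smoke-test the merged model with a short generation. A hedged sketch, run from `/home/kali/microclaw`, using the same prompt template as training:
```python
# Quick sanity check of the merged model saved in models/merged.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("models/merged")
model = AutoModelForCausalLM.from_pretrained("models/merged")
prompt = "### Instruction:\nList files in /home\n\n### Response:\n"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```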
---
## Step 6: Convert to GGUF
After training, we have a merged Hugging Face model in `models/merged/`. Now use llama.cpp's conversion script.
```bash
cd /home/kali/llama.cpp
# 1. Convert the merged HF model to a full-precision GGUF
python convert-hf-to-gguf.py /home/kali/microclaw/models/merged \
  --outfile /home/kali/microclaw/models/microclaw-f16.gguf \
  --outtype f16
# 2. Quantize down to 2-bit (extremely small). The quantize binary is in
#    build/bin/ after the CMake build.
./build/bin/llama-quantize \
  /home/kali/microclaw/models/microclaw-f16.gguf \
  /home/kali/microclaw/models/microclaw.gguf Q2_K
```
Note: `convert-hf-to-gguf.py` only emits full-precision or `q8_0` outputs; the k-quants (`Q2_K`, `Q3_K_S`, ...) are produced by the separate quantize step. For Raspberry Pi, `Q2_K` is ideal. You can also try `Q3_K_S` if you have more RAM.
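You can verify the quantized file loads before wiring it into the server. A hedged sketch using `llama-cpp-python` (already in `requirements.txt`):
```python
# Load the 2-bit GGUF on CPU and run a short completion.
from llama_cpp import Llama

llm = Llama(model_path="models/microclaw.gguf", n_ctx=512, n_gpu_layers=0, verbose=False)
out = llm("### Instruction:\nList files in /home\n\n### Response:\n", max_tokens=32)
print(out["choices"][0]["text"])
```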
---
## Step 7: Build the FastAPI Server (`server.py`)
This server serves:
- Static files (the CLI dashboard) from the `static/` folder.
- API endpoints for inference, file management, cron, and RAG.
- SQLite database for conversation history and RAG cache.
```python
#!/usr/bin/env python3
# server.py – FastAPI server with GGUF inference and static dashboard
import os
import yaml
import sqlite3
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from typing import Optional
import uvicorn
from llama_cpp import Llama

# Load config
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Initialize SQLite (create the db/ directory on first run)
DB_PATH = config["rag"]["db_path"]
os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)
conn = sqlite3.connect(DB_PATH, check_same_thread=False)
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    prompt TEXT,
    response TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS rag_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query TEXT UNIQUE,
    chunks TEXT,
    embedding BLOB
)
""")
conn.commit()

# Load GGUF model
model_path = config["server"]["model_path"]
llm = Llama(
    model_path=model_path,
    n_ctx=config["server"]["context_size"],
    n_threads=os.cpu_count(),
    n_gpu_layers=0,  # CPU only
    verbose=False,
)

app = FastAPI(title="microclaw Gateway")

# Mount static files
app.mount("/static", StaticFiles(directory="static"), name="static")

# API models
class PromptRequest(BaseModel):
    prompt: str
    max_tokens: Optional[int] = 256
    temperature: Optional[float] = 0.7
    use_rag: Optional[bool] = False

class ToolRequest(BaseModel):
    tool: str
    args: dict

# Simple RAG (placeholder – you can enhance with embeddings)
def retrieve_chunks(query: str) -> str:
    # For demo, just return static text; a real implementation would use embeddings
    return "Relevant document chunk about file management."

@app.get("/", response_class=HTMLResponse)
async def root():
    with open("static/index.html") as f:
        return f.read()

@app.post("/api/chat")
async def chat(req: PromptRequest):
    # Optionally enhance the prompt with RAG
    if req.use_rag:
        context = retrieve_chunks(req.prompt)
        augmented_prompt = f"Context: {context}\n\nQuestion: {req.prompt}\nAnswer:"
    else:
        augmented_prompt = req.prompt
    # Call the model
    output = llm(
        augmented_prompt,
        max_tokens=req.max_tokens,
        temperature=req.temperature,
        stop=["</s>", "###"],
        echo=False,
    )
    response = output["choices"][0]["text"].strip()
    # Save to history
    cursor.execute("INSERT INTO history (prompt, response) VALUES (?, ?)", (req.prompt, response))
    conn.commit()
    return {"response": response}

@app.get("/api/history")
async def get_history(limit: int = 50):
    cursor.execute(
        "SELECT prompt, response, timestamp FROM history ORDER BY timestamp DESC LIMIT ?",
        (limit,),
    )
    rows = cursor.fetchall()
    return [{"prompt": r[0], "response": r[1], "timestamp": r[2]} for r in rows]

@app.post("/api/tool")
async def run_tool(req: ToolRequest):
    # Example: execute system commands (sandboxed)
    if req.tool == "ls":
        path = req.args.get("path", ".")
        try:
            files = os.listdir(path)
            return {"output": "\n".join(files)}
        except Exception as e:
            return {"error": str(e)}
    elif req.tool == "cron_list":
        # Parsing the crontab requires user permissions; return a placeholder for the demo
        return {"output": "0 5 * * * /home/kali/backup.sh"}
    else:
        return {"error": "Unknown tool"}

if __name__ == "__main__":
    uvicorn.run(app, host=config["server"]["host"], port=config["server"]["port"])
```
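Once the server is running (see Step 9), the chat endpoint can be exercised from Python as well. A hedged, stdlib-only example:
```python
# Exercise the /api/chat endpoint of server.py.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/api/chat",
    data=json.dumps({"prompt": "status check", "use_rag": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```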
---
## Step 8: Create the Retro CLI Dashboard
### `static/index.html`
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>microclaw v2026.2.17 – CLI Gateway</title>
  <link rel="stylesheet" href="/static/style.css">
</head>
<body>
  <div class="terminal">
    <div class="header">microclaw [Version 2026.2.17] – Local Fallback Agent</div>
    <div class="output" id="output">
      <div>> System ready. Type a command or question.</div>
      <div>> Use /help for available commands.</div>
    </div>
    <div class="input-line">
      <span class="prompt">$></span>
      <input type="text" id="input" autofocus>
    </div>
  </div>
  <script src="/static/script.js"></script>
</body>
</html>
```
### `static/style.css`
```css
body {
  background: #000;
  color: #0f0;
  font-family: 'Courier New', monospace;
  margin: 0;
  padding: 20px;
}
.terminal {
  max-width: 900px;
  margin: auto;
  border: 2px solid #0f0;
  padding: 10px;
  height: 80vh;
  display: flex;
  flex-direction: column;
}
.header {
  border-bottom: 1px solid #0f0;
  padding-bottom: 5px;
  margin-bottom: 10px;
  text-align: center;
  font-weight: bold;
}
.output {
  flex: 1;
  overflow-y: auto;
  white-space: pre-wrap;
  margin-bottom: 10px;
}
.input-line {
  display: flex;
  border-top: 1px solid #0f0;
  padding-top: 5px;
}
.prompt {
  margin-right: 5px;
}
#input {
  background: #000;
  border: none;
  color: #0f0;
  font-family: 'Courier New', monospace;
  font-size: 1em;
  flex: 1;
  outline: none;
}
```
### `static/script.js`
```javascript
const input = document.getElementById('input');
const output = document.getElementById('output');

input.addEventListener('keydown', async (e) => {
  if (e.key === 'Enter') {
    const cmd = input.value.trim();
    input.value = '';
    addLine(`$> ${cmd}`);
    await processCommand(cmd);
  }
});

async function processCommand(cmd) {
  if (cmd === '/help') {
    addLine('Available commands:');
    addLine('  /chat <question> – ask the model');
    addLine('  /ls [path]       – list files');
    addLine('  /cron            – show cron jobs');
    addLine('  /history         – show chat history');
    addLine('  /clear           – clear screen');
    return;
  }
  if (cmd === '/clear') {
    output.innerHTML = '';
    return;
  }
  if (cmd.startsWith('/chat ')) {
    const prompt = cmd.slice(6);
    addLine('... thinking ...');
    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({prompt, use_rag: false})
      });
      const data = await res.json();
      addLine(data.response);
    } catch (err) {
      addLine('Error: ' + err);
    }
    return;
  }
  if (cmd === '/history') {
    try {
      const res = await fetch('/api/history');
      const history = await res.json();
      history.forEach(item => {
        addLine(`[${item.timestamp}] Q: ${item.prompt}`);
        addLine(`A: ${item.response}`);
      });
    } catch (err) {
      addLine('Error: ' + err);
    }
    return;
  }
  if (cmd.startsWith('/ls')) {
    const parts = cmd.split(' ');
    const path = parts[1] || '.';
    try {
      const res = await fetch('/api/tool', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({tool: 'ls', args: {path}})
      });
      const data = await res.json();
      addLine(data.output || data.error);
    } catch (err) {
      addLine('Error: ' + err);
    }
    return;
  }
  if (cmd === '/cron') {
    try {
      const res = await fetch('/api/tool', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({tool: 'cron_list', args: {}})
      });
      const data = await res.json();
      addLine(data.output || data.error);
    } catch (err) {
      addLine('Error: ' + err);
    }
    return;
  }
  addLine(`Unknown command: ${cmd}. Type /help.`);
}

function addLine(text) {
  const line = document.createElement('div');
  line.textContent = text;
  output.appendChild(line);
  output.scrollTop = output.scrollHeight;
}
```
You can add more pages (`static/pages/files.html`, `static/pages/cron.html`) and link them from the CLI using `/open files` commands, but for simplicity we'll keep the single‑page CLI.
---
## Step 9: Run the Server
```bash
cd /home/kali/microclaw
source venv/bin/activate
python server.py
```
Open your browser to `http://localhost:8080` and start interacting.
---
## Troubleshooting
- Out of memory during training: reduce `max_seq_length`, the batch size, or use a smaller base model (e.g., `Qwen2.5-0.5B`).
- Slow inference: ensure you compiled llama.cpp with OpenBLAS. Use fewer CPU threads if needed (`n_threads=4`).
- GGUF conversion errors: make sure you have the correct `transformers` version and that the merged model was saved properly.
- Model file not found: ensure the path in the `-m` flag is correct. Use the absolute path.
- Port already in use: change the `--port` value (e.g., to 8001) and update your OpenClaw configuration.
- Server starts but responds slowly: this is normal on CPU. Try a smaller, more heavily quantized GGUF variant (e.g., `Q2_K` for 2-bit).
- `git lfs pull` fails or is slow: if downloads are interrupted, run the command again; it will resume.
- OpenClaw cannot connect: verify the server is running with `curl` (as in Step 4). Check any firewall rules. If OpenClaw is in a Docker container, ensure both are on the same network (using `--network host` for the OpenClaw container is the simplest solution).
- If `pkg-config` is missing (CMake uses it to find some libraries), install it and rebuild:
```bash
sudo apt update
sudo apt install pkg-config
cd /home/kali/llama.cpp
rm -rf build # clean previous attempt
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j$(nproc)
```
This should now complete successfully. If you still encounter issues, you can temporarily disable BLAS to get a working build:
```bash
cd /home/kali/llama.cpp
rm -rf build
cmake -B build
cmake --build build --config Release -j$(nproc)
```
After building, you'll have `build/bin/llama-server` and the conversion script `convert-hf-to-gguf.py` in the main llama.cpp directory.
### The "illegal hardware instruction" error: indicates that the PyTorch build you're using is trying to execute CPU instructions (like AVX2) that your processor does not support. This is common on older CPUs or virtual machines. Let's diagnose and fix it.
## 1. Check your CPU's instruction set
Run this command to see what your CPU supports:
```bash
lscpu | grep -E "Model name|Flags"
```
Look for flags like `avx`, `avx2`, `sse4_1`, etc. If you don't see `avx2`, that's the problem.
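Equivalently, from Python (a hedged helper; Linux-only since it reads `/proc/cpuinfo`):
```python
# Report whether the CPU advertises AVX2 (Linux only).
with open("/proc/cpuinfo") as f:
    flags_line = next((line for line in f if line.startswith("flags")), "")
has_avx2 = "avx2" in flags_line.split()
print("AVX2 supported" if has_avx2 else "AVX2 NOT supported")
```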
## 2. Install a PyTorch version compatible with your CPU
The standard PyTorch wheels from the official site require AVX2. You have two options:
### Option A: Install PyTorch from conda-forge (recommended)
Conda-forge often provides more compatible builds, including for older CPUs.
```bash
# Install Miniconda if you haven't
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a new environment with Python 3.10
conda create -y -n microclaw python=3.10
conda activate microclaw
# Install PyTorch CPU-only from the pytorch channel (these builds may still require AVX2)
conda install -y pytorch cpuonly -c pytorch
# If that still crashes, try the conda-forge build instead
conda install -y pytorch -c conda-forge
```
If that still fails, we can try building PyTorch from source with older instruction sets, but that's complex.
### Option B: Use the PyTorch wheels with no AVX requirements
There are community builds that target older CPUs. For example, the `manylinux2014` wheels might work. Try:
```bash
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu --no-deps
```
But the official wheels likely require AVX2. You could try an older PyTorch version (e.g., 1.13) which may have broader support.
### Option C: Use llama.cpp for training as well
Since llama.cpp is pure C++ and can be compiled for any CPU, you could use it for training too. However, the `finetune` tool wasn't present in our build, so let's try building the `finetune` example explicitly.
First, ensure you have the latest llama.cpp with the finetune example:
```bash
cd ~/llama.cpp
git pull origin master
```
Now build the finetune tool:
```bash
mkdir -p build && cd build
cmake .. -DLLAMA_FINETUNE=ON -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
make -j$(nproc) finetune
```
After that, the `finetune` binary should appear in `build/bin/`. Then you can train using the command we discussed earlier.
## 3. A smaller model will not fix this
TinyLlama is already small, but the error occurs at the PyTorch level, not in the model, so changing the model won't help. You need a compatible PyTorch build.
## 4. Verify your Python environment
Maybe the virtual environment is using a system Python that has a broken PyTorch. Try creating a fresh venv with Python 3.10 and reinstalling all packages.
```bash
cd ~/microclaw
deactivate
rm -rf venv
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```
Then run `python train.py` again.
# LICENSE
Apache 2.0 License.