---
library_name: transformers
pipeline_tag: text-generation
base_model: openai-community/gpt2
language:
- en
tags:
- transformers
- pytorch
- gguf
- gpt2
- gpt2-small
- 117M
- text-generation
- conversational
- grpo
- vae
- kv-cache
- distillation
- reinforcement-learning
- openclaw
- fallback-agent
- soul-md
- agent-framework
- tool-use
- task-automation
- dpo
- tool-masking
- uncertainty-estimation
- rag
- semantic-cache
- quantization
- pruning
- arxiv:2402.03300
license: apache-2.0
---
# 🧠 microclaw-for-openclaw – Fallback Agent for OpenClaw (v2026.2.17)
**Model ID:** `webxos/microclaw-for-openclaw-version-2026.2.17`
**Tags:** `openclaw`, `fallback-agent`, `grpo`, `vae`, `kv-cache`, `dpo`, `tool-masking`, `uncertainty`, `rag`, `semantic-cache`, `soul.md`, `huggingface-space`, `gguf`, `llm-distillation`
---
## 📌 Overview
**microclaw** (v2026.2.17) is a lightweight, distilled language model designed as a **fallback agent** for the [OpenClaw](https://openclaw.org) ecosystem. When the primary agent loses connectivity or requires offline operation, microclaw steps in to handle essential system tasks: file management, status checks, cron jobs, and simple Q&A.
**WARNING: You will need to train your own GGUF model locally. The `microclaw.gguf` included in this repo is a lightweight placeholder so users can scale and build their own local models with llama.cpp.**
You will need to configure your own build locally from scratch with this model; it is still being developed and is under testing.
This version is made to integrate directly with the OpenClaw port 18789. This README presents several ways, some optional, to configure the agent on your local Debian-based Linux machines.
This version introduces **advanced training and inference enhancements**:
- **Tool‑use masking** and **schema‑first training** for reliable function calling.
- **Direct Preference Optimization (DPO)** to align outputs with human preferences.
- **Uncertainty estimation** with configurable thresholds for safe escalation.
- **Retrieval‑Augmented Generation (RAG)** with semantic chunking.
- **Semantic KV‑cache** for high‑similarity query reuse.
- **Quantization (down to 2‑bit)** and **pruning** for extreme memory efficiency.
The repository contains the model files (full and partially trained), configuration (`soul.md`, `AGENTS.md`, `HEARTBEAT.md`, `SECURITY.md`), and export bundles ready for deployment to **Hugging Face Spaces** or for local execution with OpenClaw.
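To make the semantic-cache idea above concrete, here is a toy sketch: queries are embedded, and a new query reuses a cached response when cosine similarity exceeds a threshold. The bag-of-words `embed` below is a deliberately crude stand-in for a real sentence-embedding model, and all names are illustrative, not the repo's actual API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a real sentence encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Reuse a previous generation when a new query is semantically close."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip inference entirely
        return None         # cache miss: fall through to the model

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))
```

A production version would use the repo's actual embedding model and persist entries (e.g. in SQLite), but the hit/miss logic is the same.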
---
## ✨ Key Features
- **GRPO (Group Relative Policy Optimization)** – Trains the agent with group‑wise advantage estimation for stable policy updates.
- **VAE Filter** – A Variational Autoencoder that filters low‑quality training samples, improving output coherence.
- **Tool‑Use Masking** – Masks non‑tool tokens during training to enforce strict schema adherence (JSON/YAML).
- **DPO (Direct Preference Optimization)** – Fine‑tunes on preference pairs to reduce hallucinations and improve helpfulness.
- **Uncertainty Estimation** – Monitors token‑level entropy and escalates to safe responses when confidence drops below a threshold.
- **RAG (Retrieval‑Augmented Generation)** – Retrieves relevant chunks from a local knowledge base (FAISS) to ground responses.
- **Semantic Cache** – Reuses previous generations for semantically similar queries, reducing latency and cost.
- **Quantization & Pruning** – Compress the model to 2‑8 bits and prune unimportant weights; backend support for AutoGPTQ, llama.cpp (GGUF), and bitsandbytes.
- **KV‑Cache** – Intelligent reuse of key/value states reduces inference latency by up to 78% (measured on local benchmarks).
- **Soul.md Configuration** – Define personality, sub‑agent rules, proactive tasks, and prompt injection defenses in plain Markdown.
- **Export Ready** – One‑click export to a **Hugging Face Space** (Docker‑based) or a portable ZIP archive.
- **Quantized (4‑bit GGUF)** – Optimized for memory‑constrained environments; runs smoothly on CPU.
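As an illustration of the uncertainty-estimation feature, token-level entropy can be computed from the model's next-token probability distributions and compared against a threshold. Everything below is a self-contained sketch; the function names and the 2.0-nat threshold are illustrative, not the repo's actual values.

```python
import math

def token_entropy(probs) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(step_probs, threshold: float = 2.0) -> bool:
    """Escalate to a safe fallback response when the mean per-token entropy
    over a generation exceeds the configured threshold."""
    entropies = [token_entropy(p) for p in step_probs]
    return (sum(entropies) / len(entropies)) > threshold
```

A fully confident step (all mass on one token) has entropy 0; a uniform distribution over 16 tokens has entropy ln 16 ≈ 2.77, which would trip the example threshold.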
---
### Part 1: Installation
The following guides cover several ways to integrate Microclaw into your custom build, including steps to further train the GGUF file locally:
**Read all steps carefully and pick the guide that matches your use case and setup. Not all options may work on your system; these guides target Debian-based Linux systems.**
# 1.1 Installation Guide + System Update & Basic Tools
```bash
sudo apt update
sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
# 1.2 Install Docker (for containerized execution)
```bash
# Add Docker's official GPG key and repository
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Replace "bullseye" with your Debian codename if you are on a different release
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Add your user to the docker group (avoid sudo for every command)
sudo usermod -aG docker $USER
newgrp docker  # activate group changes in current shell
```
# 1.3 Install Node.js (v22 or later) & TypeScript
```bash
# Using NodeSource repository for a modern Node.js version
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs
# Install TypeScript globally
sudo npm install -g typescript
# Verify
node --version  # should be v22.x or higher
tsc --version
```
# 1.4 Install SQLite (for memory & logs)
```bash
sudo apt install -y sqlite3 libsqlite3-dev
```
# Part 2: Microclaw Fallback Agent
The Microclaw agent is a Python‑based service (Flask + Transformers) that communicates with OpenClaw. You can install it using either a Python virtual environment (lightweight) or Conda (more reliable for PyTorch). Choose one method below.
# 2.1 Clone the Microclaw Repository
Create a parent directory for all agents:
```bash
sudo mkdir -p /opt/openclaw-agents
sudo chown -R $USER:$USER /opt/openclaw-agents
cd /opt/openclaw-agents
# Clone the Hugging Face repo (includes model files and soul configuration)
git lfs install
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
Note: The `.gguf` model files are several hundred MB. If the download is interrupted, `git lfs` can resume it. After cloning, verify the file sizes:
```bash
ls -lh *.gguf
```
They should be >100 MB, not 28 bytes. If they are still placeholders, run `git lfs pull` manually.
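A placeholder can also be detected programmatically: Git LFS pointer files are tiny text files that begin with `version https://git-lfs`. The helper below is an illustrative check, not part of the repo.

```python
from pathlib import Path

LFS_MAGIC = b"version https://git-lfs"

def is_lfs_pointer(path) -> bool:
    """True if `path` is a tiny Git LFS pointer rather than real model data."""
    p = Path(path)
    if p.stat().st_size > 1024:  # real GGUF files are hundreds of MB
        return False
    return p.read_bytes().startswith(LFS_MAGIC)
```

Running `is_lfs_pointer("microclaw.gguf")` after cloning tells you whether `git lfs pull` is still needed.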
# 2.2 Option A: Install with Python Virtual Environment (venv)
```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
If `requirements.txt` is missing, install the core packages manually:
```bash
pip install flask transformers torch sentence-transformers faiss-cpu --extra-index-url https://download.pytorch.org/whl/cpu
```
# 2.3 Option B: Install with Conda (Recommended for unstable networks)
```bash
# Download and install Miniconda (if not already present)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a dedicated environment with Python 3.11
conda create -y -n microclaw python=3.11
conda activate microclaw
# Install CPU‑only PyTorch (smaller, more reliable)
conda install -y pytorch torchvision torchaudio cpuonly -c pytorch
# Install the rest via pip
pip install flask transformers sentence-transformers faiss-cpu
```
# 2.4 Test the Agent Manually
```bash
# Make sure you are in the agent directory with the environment activated
python main.py
```
You should see output like `* Running on http://127.0.0.1:18789`. Press Ctrl+C to stop it.
# ⚙️ Part 3: Configure OpenClaw to Use the Microclaw Fallback
OpenClaw reads its configuration from a TOML file (typically `~/.config/openclaw/config.toml` or `/etc/openclaw/config.toml`). You need to point it to your local Microclaw instance.
Find the port Microclaw listens on (the default is 18789, defined in `main.py`):
```bash
grep port main.py
```
Edit the OpenClaw configuration (create it if it doesn't exist):
```bash
mkdir -p ~/.config/openclaw
nano ~/.config/openclaw/config.toml
```
Add or modify the `[agent.fallback]` section:
```toml
[agent.fallback]
path = "/opt/openclaw-agents/microclaw-fallback"
port = 18789
enabled = true
```
If OpenClaw is already installed, restart it. (If you haven't installed OpenClaw yet, see Part 4 below.)
# 🐳 Part 4: Install & Run OpenClaw (the main framework)
The OpenClaw core is a Node.js/TypeScript application. You can run it directly from source or use the provided Docker image.
# 4.1 Run OpenClaw via Docker (easiest)
```bash
# Pull the official OpenClaw image (adjust tag as needed)
docker pull openclaw/openclaw:latest
# Run the container, mounting the config and agents directories
docker run -d \
  --name openclaw \
  -p 3000:3000 \
  -v ~/.config/openclaw:/home/node/.config/openclaw \
  -v /opt/openclaw-agents:/opt/openclaw-agents \
  openclaw/openclaw:latest
```
# 4.2 Run OpenClaw from Source (for development)
```bash
# Clone the OpenClaw repository
git clone https://github.com/openclaw/core.git openclaw-core
cd openclaw-core
# Install dependencies
yarn install
# Build TypeScript
yarn build
# Start OpenClaw (it will read the config from ~/.config/openclaw/config.toml)
yarn start
```
# 🧪 Part 5: Verify the Integration
Check that Microclaw is running (either manually or via systemd):
```bash
curl http://localhost:18789/health
```
# 🔁 Guide to Microclaw Auto-Start (systemd)
To ensure the fallback agent starts on boot and restarts if it crashes, create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-fallback.service
```
Paste (adjust `User` and paths to match your setup):
```ini
[Unit]
Description=Microclaw Fallback Agent for OpenClaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/opt/openclaw-agents/microclaw-fallback
Environment="PATH=/opt/openclaw-agents/microclaw-fallback/venv/bin"
ExecStart=/opt/openclaw-agents/microclaw-fallback/venv/bin/python /opt/openclaw-agents/microclaw-fallback/main.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-fallback.service
sudo systemctl start microclaw-fallback.service
```
Check status:
```bash
sudo systemctl status microclaw-fallback.service
```
### ALTERNATIVE GUIDE - Installing via Llama.cpp instead:
# 📦 Prerequisites: Essential System Tools
You need a few standard command-line tools. Open a terminal and run:
```bash
# Update your package list and install curl, wget, git, and build tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
## 📥 Step 1: Download the Model with Git LFS
The model files are hosted in a Git repository and require Git Large File Storage (LFS) to download the actual GGUF files.
# 1.1: Install Git LFS
```bash
sudo apt install -y git-lfs
git lfs install
```
# 1.2: Create a directory for your models and clone the repository
```bash
mkdir -p ~/models
cd ~/models
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
# 1.3: Ensure the GGUF files are fully downloaded
```bash
git lfs pull
```
Verification: after cloning, check that the `.gguf` files are present and a reasonable size (several hundred MB, not 28 bytes):
```bash
ls -lh *.gguf
```
If the files are small placeholders, run `git lfs pull` again.
## ⚙️ Step 2: Set Up the llama.cpp Server
Now, download, compile, and set up llama.cpp with its built-in server.
# 2.1: Clone the llama.cpp repository
```bash
cd ~/models
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
# 2.2: Compile llama.cpp (this may take a few minutes)
```bash
make -j4  # note: newer llama.cpp releases build with CMake instead; see the llama.cpp README if make fails
```
## 3. (Optional but recommended) Install the Python dependencies for the server
This step requires Python/pip, but it's a one-time, isolated setup.
```bash
sudo apt install -y python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
# 🚀 Step 3.1: Run the Model Server
Now, start the server, pointing it to the GGUF model file you downloaded. Make sure you are in the llama.cpp directory with the virtual env activated:
```bash
cd ~/models/llama.cpp
source venv/bin/activate
# Find the exact GGUF filename (replace with the actual filename you have)
MODEL_FILE=~/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf
# Run the server (in newer llama.cpp builds the binary is named llama-server)
./server -m $MODEL_FILE \
  --host 0.0.0.0 \
  --port 8000 \
  -c 2048 \
  -ngl 0  # Use -ngl 33 if you have an NVIDIA GPU and compiled with CUDA support
```
Explanation of flags:
- `-m $MODEL_FILE` : Path to your GGUF model.
- `--host 0.0.0.0` : Listen on all network interfaces (so OpenClaw can connect).
- `--port 8000` : The port the server will use.
- `-c 2048` : Context size (adjust based on model requirements).
- `-ngl 0` : Number of layers to offload to GPU. Use `-ngl 33` (or more) if you have an NVIDIA GPU and compiled with CUDA.

Keep this terminal window open. The server is now running and ready to accept requests.
# ✅ Step 4: Test the Server
Open a new terminal and test the API to ensure it's working correctly.
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the capital of France?",
    "max_tokens": 50,
    "temperature": 0.7
  }'
```
You should receive a JSON response containing the model's generated text.
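The same request can be made from Python with only the standard library. The response shape (`choices[0].text`) follows the OpenAI-compatible completions format that the llama.cpp server exposes; the helper names here are illustrative.

```python
import json
import urllib.request

def extract_text(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style completion response."""
    return response["choices"][0]["text"]

def complete(prompt: str, base_url: str = "http://localhost:8000/v1",
             max_tokens: int = 50, temperature: float = 0.7) -> str:
    """POST a completion request to the local llama.cpp server."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_text(json.load(resp))
```

With the server from Step 3.1 running, `complete("What is the capital of France?")` returns the generated continuation as a string.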
# 🔌 Step 5: Configure OpenClaw to Use the Local Server
Now, configure OpenClaw to use this local server as its fallback agent.
Locate OpenClaw's configuration file. This is often `~/.config/openclaw/config.toml`, `/etc/openclaw/config.toml`, or a `.env` file in the OpenClaw directory.
Edit the configuration to define a custom provider that points to your local server. The exact variable names depend on your OpenClaw version, but it generally looks something like this:
```toml
[agent.fallback]
provider = "custom"        # or "openai-compatible"
base_url = "http://localhost:8000/v1"
api_key = "not-needed"     # llama.cpp server doesn't require a key
model = "microclaw"        # Optional: model name
enabled = true
```
If OpenClaw uses environment variables (e.g., in a `.env` file), you might set:
```text
OPENCLAW_FALLBACK_PROVIDER=custom
OPENCLAW_CUSTOM_BASE_URL=http://localhost:8000/v1
OPENCLAW_CUSTOM_API_KEY=not-needed
```
Restart OpenClaw for the changes to take effect.
# 🔁 How to Run the Server as a Background Service:
To have the server start automatically on boot and restart if it crashes, you can create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-llama.service
```
Paste the following (adjust `User`, `WorkingDirectory`, and `ExecStart` paths as needed):
```ini
[Unit]
Description=llama.cpp server for Microclaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/home/kali/models/llama.cpp
ExecStart=/home/kali/models/llama.cpp/server -m /home/kali/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf --host 0.0.0.0 --port 8000 -c 2048 -ngl 0
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Then enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-llama.service
sudo systemctl start microclaw-llama.service
sudo systemctl status microclaw-llama.service  # Check if it's running
```
### ADVANCED GUIDE: TRAINING MICROCLAW.GGUF MODEL LOCALLY
This guide adapts the full microclaw pipeline to run entirely on a low‑end machine like an 8GB RAM laptop or even a Raspberry Pi 5. We'll use a tiny base model (0.5B–1B parameters), parameter‑efficient fine‑tuning (LoRA) on CPU, and extreme quantization (2‑bit) to produce a GGUF file that runs smoothly on consumer hardware.
The final system provides:
- A **local training script** that fits in 8GB RAM (CPU only).
- A **FastAPI server** (`server.py`) serving a retro MS‑DOS‑style CLI dashboard on `localhost:8080`.
- **Local API endpoints** for inference, file management, cron jobs, and RAG.
- **SQLite** as a local database (conversation history, cache, RAG index).
- Integration with **llama.cpp** for efficient GGUF inference.
| --- | |
| ## Prerequisites | |
| - **Hardware**: x86_64 or ARM64 (Raspberry Pi 5) with **at least 8GB RAM**. | |
| - **OS**: Debian 12 / Kali Linux / Raspberry Pi OS (64‑bit). | |
| - **Storage**: 10GB free space. | |
| - **Software**: Python 3.10+, Git, CMake, build tools. | |
| --- | |
## Step 1: Environment Setup
```bash
mkdir -p /home/kali/microclaw  # create the project directory if it doesn't exist
cd /home/kali/microclaw
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
**`requirements.txt`** (CPU‑optimized, no CUDA dependencies; the index URL goes on its own line so pip can parse it):
```
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.0
transformers>=4.38.0
accelerate
datasets
trl>=0.8.0
peft
bitsandbytes
scipy
sentencepiece
protobuf
fastapi
uvicorn
sqlite-utils
pydantic
pyyaml
jinja2
aiofiles
llama-cpp-python
```
| --- | |
| ## Step 2: Build llama.cpp (for conversion & inference) | |
| llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU. | |
| ```bash | |
| cd /home/kali | |
| git clone https://github.com/ggerganov/llama.cpp | |
| cd llama.cpp | |
| mkdir build && cd build | |
| cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed | |
| make -j$(nproc) | |
| ``` | |
| After compilation, the `convert-hf-to-gguf.py` script will be in `llama.cpp/` (not in build). We'll use it later. | |
| --- | |
## Step 3: Prepare the Dataset
You need a small dataset (a few hundred to a few thousand examples) for fine‑tuning and DPO. Place JSONL files in `data/raw/`.
### 3.1 Tool‑use data (schema‑first)
Each line:
```json
{
  "instruction": "List files in /home",
  "tools": ["ls"],
  "response": "ls /home"
}
```
### 3.2 Preference data (for DPO)
Each line:
```json
{
  "prompt": "What is the weather?",
  "chosen": "I cannot check live weather, but you can use the 'weather' tool.",
  "rejected": "I don't know."
}
```
If you don't have preference data, you can skip DPO by setting `dpo: false` in config.
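Before training, it's worth failing fast on malformed data. Here is a minimal sketch of a validator for the two JSONL schemas above; the key sets mirror the examples, and `validate_jsonl` is an illustrative helper, not part of the repo.

```python
import json

TOOL_KEYS = {"instruction", "response"}     # "tools" is optional metadata
DPO_KEYS = {"prompt", "chosen", "rejected"}

def validate_jsonl(lines, required):
    """Return (ok, errors) for an iterable of JSONL lines against required keys."""
    errors = []
    for i, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # allow blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: invalid JSON ({e.msg})")
            continue
        missing = required - obj.keys()
        if missing:
            errors.append(f"line {i}: missing keys {sorted(missing)}")
    return (not errors, errors)
```

Run it over `data/raw/train.jsonl` with `TOOL_KEYS` and over `data/raw/preferences.jsonl` with `DPO_KEYS` before launching a multi-hour CPU training run.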
### 3.3 RAG documents (optional)
Place plain text files (`.txt`) in `data/rag_docs/`. The training script will chunk them and store embeddings in SQLite.
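The chunking step can be sketched as a character-window splitter matching the `chunk_size`/`chunk_overlap` settings used by the RAG configuration. This is an illustrative stand-in for the repo's actual chunker.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into fixed-size character chunks; consecutive chunks share
    `overlap` characters so sentences cut at a boundary survive in one piece."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk would then be embedded (e.g. with the configured MiniLM model) and written to the SQLite RAG index.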
| --- | |
## Step 4: Configuration (`config.yaml`)
Edit this file to match your paths and training preferences.
```yaml
# config.yaml
model:
  base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or "Qwen/Qwen2.5-0.5B"
  cache_dir: "models/base"

training:
  output_dir: "models/lora"
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  num_train_epochs: 3
  max_seq_length: 512
  use_lora: true
  lora_r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  dpo: true
  dpo_beta: 0.1
  # CPU optimizations
  dataloader_num_workers: 0
  save_steps: 100
  logging_steps: 10

data:
  train_file: "data/raw/train.jsonl"
  eval_file: "data/raw/eval.jsonl"               # optional
  preference_file: "data/raw/preferences.jsonl"  # for DPO

rag:
  enabled: true
  chunk_size: 500
  chunk_overlap: 50
  embedding_model: "all-MiniLM-L6-v2"  # tiny, runs on CPU
  db_path: "db/microclaw.db"

server:
  host: "0.0.0.0"
  port: 8080
  model_path: "models/microclaw.gguf"
  context_size: 2048
  max_tokens: 512
  temperature: 0.7
```
| --- | |
# Folder Structure (to be created)
```
/home/kali/microclaw/
├── server.py            # FastAPI server (inference + static files + API)
├── train.py             # CPU‑optimized fine‑tuning + DPO script
├── requirements.txt
├── config.yaml
├── data/
│   ├── raw/             # Place your JSONL datasets here
│   └── rag_docs/        # Text files for RAG (optional)
├── models/
│   ├── base/            # Will contain the downloaded base model
│   ├── lora/            # LoRA adapters after training
│   └── microclaw.gguf   # Final quantized model (after conversion)
├── static/
│   ├── index.html       # Main dashboard (CLI style)
│   ├── style.css
│   ├── script.js
│   └── pages/           # Additional pages (file manager, cron, etc.)
│       ├── files.html
│       ├── cron.html
│       └── rag.html
├── db/
│   └── microclaw.db     # SQLite database (auto‑created)
└── logs/
    └── training.log
```
| --- | |
| ## Prerequisites | |
| - **Hardware**: x86_64 or ARM64 (Raspberry Pi 5) with **at least 8GB RAM**. | |
| - **OS**: Debian 12 / Kali Linux / Raspberry Pi OS (64‑bit). | |
| - **Storage**: 10GB free space. | |
| - **Software**: Python 3.10+, Git, CMake, build tools. | |
| --- | |
| ## Step 1: Environment Setup | |
| ```bash | |
| cd /home/kali/microclaw | |
| python3 -m venv venv | |
| source venv/bin/activate | |
| pip install --upgrade pip | |
| pip install -r requirements.txt | |
| ``` | |
| **`requirements.txt`** (CPU‑optimized, no CUDA dependencies): | |
| ``` | |
| torch==2.2.0 --index-url https://download.pytorch.org/whl/cpu | |
| transformers>=4.38.0 | |
| accelerate | |
| datasets | |
| trl>=0.8.0 | |
| peft | |
| bitsandbytes | |
| scipy | |
| sentencepiece | |
| protobuf | |
| fastapi | |
| uvicorn | |
| sqlite-utils | |
| pydantic | |
| pyyaml | |
| jinja2 | |
| aiofiles | |
| llama-cpp-python | |
| ``` | |
| --- | |
| ## Step 2: Build llama.cpp (for conversion & inference) | |
| llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU. | |
| ```bash | |
| cd /home/kali | |
| git clone https://github.com/ggerganov/llama.cpp | |
| cd llama.cpp | |
| mkdir build && cd build | |
| cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed | |
| make -j$(nproc) | |
| ``` | |
| After compilation, the `convert-hf-to-gguf.py` script will be in `llama.cpp/` (not in build). We'll use it later. | |
| --- | |
| ## Step 3: Prepare the Dataset | |
| You need a small dataset (a few hundred to a few thousand examples) for fine‑tuning and DPO. Place JSONL files in `data/raw/`. | |
| ### 3.1 Tool‑use data (schema‑first) | |
| Each line: | |
| ```json | |
| { | |
| "instruction": "List files in /home", | |
| "tools": ["ls"], | |
| "response": "ls /home" | |
| } | |
| ``` | |
| ### 3.2 Preference data (for DPO) | |
| Each line: | |
| ```json] | |
| { | |
| "prompt": "What is the weather?", | |
| "chosen": "I cannot check live weather, but you can use the 'weather' tool.", | |
| "rejected": "I don't know." | |
| } | |
| ``` | |
| If you don't have preference data, you can skip DPO by setting `dpo: false` in config. | |
| ### 3.3 RAG documents (optional) | |
| Place plain text files (`.txt`) in `data/rag_docs/`. The training script will chunk them and store embeddings in SQLite. | |
| --- | |
| ## Step 4: Configuration (`config.yaml`) | |
| Edit this file to match your paths and training preferences. | |
| ```yaml | |
| # config.yaml | |
| model: | |
| base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0" # or "Qwen/Qwen2.5-0.5B" | |
| cache_dir: "models/base" | |
| training: | |
| output_dir: "models/lora" | |
| per_device_train_batch_size: 1 | |
| gradient_accumulation_steps: 4 | |
| learning_rate: 2e-4 | |
| num_train_epochs: 3 | |
| max_seq_length: 512 | |
| use_lora: true | |
| lora_r: 8 | |
| lora_alpha: 16 | |
| lora_dropout: 0.05 | |
| dpo: true | |
| dpo_beta: 0.1 | |
| # CPU optimizations | |
| dataloader_num_workers: 0 | |
| save_steps: 100 | |
| logging_steps: 10 | |
| data: | |
| train_file: "data/raw/train.jsonl" | |
| eval_file: "data/raw/eval.jsonl" # optional | |
| preference_file: "data/raw/preferences.jsonl" # for DPO | |
| rag: | |
| enabled: true | |
| chunk_size: 500 | |
| chunk_overlap: 50 | |
| embedding_model: "all-MiniLM-L6-v2" # tiny, runs on CPU | |
| db_path: "db/microclaw.db" | |
| server: | |
| host: "0.0.0.0" | |
| port: 8080 | |
| model_path: "models/microclaw.gguf" | |
| context_size: 2048 | |
| max_tokens: 512 | |
| temperature: 0.7 | |
| ``` | |
| --- | |
## Step 5: Training Script (`train.py`)
This script performs supervised fine‑tuning (SFT) on instruction data, optionally followed by DPO, and finally merges the LoRA weights and saves the full model. It is heavily optimized for low RAM (CPU) usage.
| ```python | |
| #!/usr/bin/env python3 | |
| # train.py – CPU‑only fine‑tuning with LoRA + optional DPO | |
| import os | |
| import yaml | |
| import torch | |
| from transformers import ( | |
| AutoTokenizer, | |
| AutoModelForCausalLM, | |
| TrainingArguments, | |
| Trainer, | |
| BitsAndBytesConfig | |
| ) | |
| from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType | |
| from datasets import load_dataset | |
| from trl import DPOTrainer | |
| import logging | |
| # Load config | |
| with open("config.yaml") as f: | |
| config = yaml.safe_load(f) | |
| # Setup logging | |
| logging.basicConfig(level=logging.INFO, filename="logs/training.log", filemode="w") | |
| logger = logging.getLogger(__name__) | |
| def main(): | |
| # 1. Load tokenizer | |
| tokenizer = AutoTokenizer.from_pretrained(config["model"]["base_model_name"], cache_dir=config["model"]["cache_dir"]) | |
| tokenizer.pad_token = tokenizer.eos_token | |
| # 2. Load base model in 8-bit (CPU offload not supported for bitsandbytes on CPU; we use standard dtype) | |
| # For CPU, we load in float32 and rely on LoRA to reduce memory. | |
| model = AutoModelForCausalLM.from_pretrained( | |
| config["model"]["base_model_name"], | |
| cache_dir=config["model"]["cache_dir"], | |
| torch_dtype=torch.float32, # CPU uses float32 | |
| low_cpu_mem_usage=True | |
| ) | |
| # 3. Prepare LoRA | |
| if config["training"]["use_lora"]: | |
| lora_config = LoraConfig( | |
| task_type=TaskType.CAUSAL_LM, | |
| r=config["training"]["lora_r"], | |
| lora_alpha=config["training"]["lora_alpha"], | |
| lora_dropout=config["training"]["lora_dropout"], | |
| target_modules=["q_proj", "v_proj"] # adjust for your model | |
| ) | |
| model = get_peft_model(model, lora_config) | |
| model.print_trainable_parameters() | |
| # 4. Load dataset | |
| dataset = load_dataset("json", data_files=config["data"]["train_file"], split="train") | |
| if config["data"].get("eval_file"): | |
| eval_dataset = load_dataset("json", data_files=config["data"]["eval_file"], split="train") | |
| else: | |
| eval_dataset = None | |
| # Format prompt: "### Instruction:\n{instruction}\n\n### Response:\n{response}" | |
| def format_func(example): | |
| text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{tokenizer.eos_token}" | |
| return {"text": text} | |
| dataset = dataset.map(format_func) | |
| if eval_dataset: | |
| eval_dataset = eval_dataset.map(format_func) | |
| # Tokenize | |
| def tokenize(element): | |
| return tokenizer(element["text"], truncation=True, max_length=config["training"]["max_seq_length"], padding=False) | |
| dataset = dataset.map(tokenize, remove_columns=dataset.column_names) | |
| if eval_dataset: | |
| eval_dataset = eval_dataset.map(tokenize, remove_columns=eval_dataset.column_names) | |
| # 5. Training arguments (CPU‑friendly) | |
| training_args = TrainingArguments( | |
| output_dir=config["training"]["output_dir"], | |
| per_device_train_batch_size=config["training"]["per_device_train_batch_size"], | |
| gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"], | |
| learning_rate=config["training"]["learning_rate"], | |
| num_train_epochs=config["training"]["num_train_epochs"], | |
| logging_steps=config["training"]["logging_steps"], | |
| save_steps=config["training"]["save_steps"], | |
| evaluation_strategy="steps" if eval_dataset else "no", | |
| eval_steps=config["training"]["save_steps"], | |
| save_total_limit=2, | |
| load_best_model_at_end=True if eval_dataset else False, | |
| metric_for_best_model="eval_loss", | |
| greater_is_better=False, | |
| fp16=False, # CPU doesn't support fp16 | |
| bf16=False, | |
| dataloader_num_workers=0, # avoid multiprocessing issues | |
| optim="adamw_torch", | |
| torch_compile=False, # no speedup on CPU | |
| ) | |
# 6. Trainer (SFT)
# DataCollatorForLanguageModeling(mlm=False) pads each batch and sets
# labels = input_ids, which Trainer needs to compute the causal-LM loss.
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
data_collator=data_collator,
)
| logger.info("Starting SFT training...") | |
| trainer.train() | |
| trainer.save_model() # saves LoRA adapters | |
| # 7. Optional DPO training | |
| if config["training"]["dpo"] and config["data"].get("preference_file"): | |
| logger.info("Loading preference data for DPO...") | |
| pref_dataset = load_dataset("json", data_files=config["data"]["preference_file"], split="train") | |
| # For DPO we need base model without LoRA (or merged) | |
| # We'll reload base model and then apply LoRA weights | |
| # (Simplified: use the same model with LoRA attached; DPO trainer handles it) | |
| dpo_trainer = DPOTrainer( | |
| model=model, | |
| ref_model=None, # uses model as reference (or you can provide a frozen copy) | |
| args=training_args, # reuse same args (adjust for DPO) | |
| train_dataset=pref_dataset, | |
| tokenizer=tokenizer, | |
| beta=config["training"]["dpo_beta"], | |
| max_length=config["training"]["max_seq_length"], | |
| max_prompt_length=256, | |
| ) | |
| logger.info("Starting DPO training...") | |
| dpo_trainer.train() | |
| dpo_trainer.save_model(config["training"]["output_dir"] + "_dpo") | |
| # 8. Merge LoRA and save full model (for conversion) | |
| logger.info("Merging LoRA weights...") | |
| merged_model = model.merge_and_unload() | |
| merged_model.save_pretrained("models/merged") | |
| tokenizer.save_pretrained("models/merged") | |
| logger.info("Merged model saved to models/merged") | |
| if __name__ == "__main__": | |
| main() | |
| ``` | |
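The optional DPO stage loads `config["data"]["preference_file"]`; trl's `DPOTrainer` conventionally expects `prompt`, `chosen`, and `rejected` columns. A minimal sketch of producing a valid preference file (the records and the `preferences.jsonl` filename are illustrative assumptions, not the paths from `config.yaml`):

```python
import json

# Hypothetical preference records: each pairs a preferred and a
# dispreferred completion for the same prompt.
pairs = [
    {
        "prompt": "### Instruction:\nList files in /etc.\n\n### Response:\n",
        "chosen": "Use the ls tool with path '/etc'.",
        "rejected": "I cannot access the filesystem.",
    },
]

with open("preferences.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")

# Sanity check: every record has the three columns DPOTrainer expects.
required = {"prompt", "chosen", "rejected"}
with open("preferences.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all(required <= row.keys() for row in rows)
print(f"{len(rows)} preference pair(s) written")
```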
| **Run training**: | |
| ```bash | |
| python train.py | |
| ``` | |
| *Note: Training a 1B model on CPU with batch size 1 may take several hours to days depending on dataset size. Reduce epochs or dataset size for testing.* | |
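The SFT script assumes the training file contains one JSON record per line with `instruction` and `response` fields. A small sketch of generating a valid file and previewing the formatted prompt (the sample records, the `train.jsonl` path, and the literal `</s>` standing in for `tokenizer.eos_token` are all illustrative assumptions):

```python
import json

# Hypothetical sample records; in practice these come from your own task data.
samples = [
    {"instruction": "List the files in /tmp.", "response": "Use the ls tool with path '/tmp'."},
    {"instruction": "What is microclaw?", "response": "A lightweight fallback agent for OpenClaw."},
]

with open("train.jsonl", "w") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Mirrors format_func in train.py; "</s>" stands in for tokenizer.eos_token.
def format_text(example, eos="</s>"):
    return f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{eos}"

print(format_text(samples[0]))
```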
| --- | |
| ## Step 6: Convert to GGUF | |
After training, you have a merged Hugging Face model in `models/merged/`. llama.cpp's conversion script produces a full-precision GGUF; the low-bit quantization is a separate step with `llama-quantize`:
```bash
cd /home/kali/llama.cpp
# 1. Convert the merged HF model to a 16-bit GGUF
python convert_hf_to_gguf.py /home/kali/microclaw/models/merged \
  --outfile /home/kali/microclaw/models/microclaw-f16.gguf \
  --outtype f16
# 2. Quantize to 2-bit (extremely small)
./build/bin/llama-quantize \
  /home/kali/microclaw/models/microclaw-f16.gguf \
  /home/kali/microclaw/models/microclaw.gguf Q2_K
```
For Raspberry Pi, `Q2_K` is ideal. You can also try `Q3_K_S` if you have more RAM.
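As a back-of-envelope check on what each quantization level costs, multiply the parameter count by bits per weight. The bits-per-weight figures below are rough approximations (k-quants mix block sizes) and ignore GGUF metadata overhead:

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8 bytes.
def gguf_size_mb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e6

n_params = 1.1e9  # e.g., a TinyLlama-class 1.1B model
for name, bpw in [("f16", 16.0), ("q8_0", 8.5), ("q3_k_s", 3.5), ("q2_k", 2.6)]:
    print(f"{name}: ~{gguf_size_mb(n_params, bpw):.0f} MB")
```

This is why a 2-bit quant of a 1.1B model comfortably fits a Raspberry Pi's RAM while f16 does not.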
| --- | |
| ## Step 7: Build the FastAPI Server (`server.py`) | |
| This server serves: | |
| - Static files (the CLI dashboard) from the `static/` folder. | |
| - API endpoints for inference, file management, cron, and RAG. | |
| - SQLite database for conversation history and RAG cache. | |
| ```python | |
| #!/usr/bin/env python3 | |
| # server.py – FastAPI server with GGUF inference and static dashboard | |
| import os | |
| import yaml | |
| import sqlite3 | |
| import json | |
| from pathlib import Path | |
| from fastapi import FastAPI, Request, HTTPException | |
| from fastapi.responses import HTMLResponse, JSONResponse | |
| from fastapi.staticfiles import StaticFiles | |
| from pydantic import BaseModel | |
| from typing import Optional, List | |
| import uvicorn | |
| from llama_cpp import Llama | |
| # Load config | |
| with open("config.yaml") as f: | |
| config = yaml.safe_load(f) | |
| # Initialize SQLite | |
| DB_PATH = config["rag"]["db_path"] | |
| conn = sqlite3.connect(DB_PATH, check_same_thread=False) | |
| cursor = conn.cursor() | |
| cursor.execute(""" | |
| CREATE TABLE IF NOT EXISTS history ( | |
| id INTEGER PRIMARY KEY AUTOINCREMENT, | |
| prompt TEXT, | |
| response TEXT, | |
| timestamp DATETIME DEFAULT CURRENT_TIMESTAMP | |
| ) | |
| """) | |
| cursor.execute(""" | |
| CREATE TABLE IF NOT EXISTS rag_cache ( | |
| id INTEGER PRIMARY KEY AUTOINCREMENT, | |
| query TEXT UNIQUE, | |
| chunks TEXT, | |
| embedding BLOB | |
| ) | |
| """) | |
| conn.commit() | |
| # Load GGUF model | |
| model_path = config["server"]["model_path"] | |
| llm = Llama( | |
| model_path=model_path, | |
| n_ctx=config["server"]["context_size"], | |
| n_threads=os.cpu_count(), | |
| n_gpu_layers=0, # CPU only | |
| verbose=False, | |
| ) | |
| app = FastAPI(title="microclaw Gateway") | |
| # Mount static files | |
| app.mount("/static", StaticFiles(directory="static"), name="static") | |
| # API Models | |
| class PromptRequest(BaseModel): | |
| prompt: str | |
| max_tokens: Optional[int] = 256 | |
| temperature: Optional[float] = 0.7 | |
| use_rag: Optional[bool] = False | |
| class ToolRequest(BaseModel): | |
| tool: str | |
| args: dict | |
| # Simple RAG (placeholder – you can enhance with embeddings) | |
| def retrieve_chunks(query: str) -> str: | |
| # For demo, just return static text; real implementation would use embeddings | |
| return "Relevant document chunk about file management." | |
| @app.get("/", response_class=HTMLResponse) | |
| async def root(): | |
| with open("static/index.html") as f: | |
| return f.read() | |
| @app.post("/api/chat") | |
| async def chat(req: PromptRequest): | |
| # Optionally enhance prompt with RAG | |
| if req.use_rag: | |
| context = retrieve_chunks(req.prompt) | |
| augmented_prompt = f"Context: {context}\n\nQuestion: {req.prompt}\nAnswer:" | |
| else: | |
| augmented_prompt = req.prompt | |
| # Call model | |
| output = llm( | |
| augmented_prompt, | |
| max_tokens=req.max_tokens, | |
| temperature=req.temperature, | |
| stop=["</s>", "###"], | |
| echo=False | |
| ) | |
| response = output["choices"][0]["text"].strip() | |
| # Save to history | |
| cursor.execute("INSERT INTO history (prompt, response) VALUES (?, ?)", (req.prompt, response)) | |
| conn.commit() | |
| return {"response": response} | |
| @app.get("/api/history") | |
| async def get_history(limit: int = 50): | |
| cursor.execute("SELECT prompt, response, timestamp FROM history ORDER BY timestamp DESC LIMIT ?", (limit,)) | |
| rows = cursor.fetchall() | |
| return [{"prompt": r[0], "response": r[1], "timestamp": r[2]} for r in rows] | |
| @app.post("/api/tool") | |
| async def run_tool(req: ToolRequest): | |
| # Example: execute system commands (sandboxed) | |
| if req.tool == "ls": | |
| path = req.args.get("path", ".") | |
| try: | |
| files = os.listdir(path) | |
| return {"output": "\n".join(files)} | |
| except Exception as e: | |
| return {"error": str(e)} | |
| elif req.tool == "cron_list": | |
| # Parse crontab (requires user permissions) | |
| # For demo, return placeholder | |
| return {"output": "0 5 * * * /home/kali/backup.sh"} | |
| else: | |
| return {"error": "Unknown tool"} | |
| if __name__ == "__main__": | |
| uvicorn.run(app, host=config["server"]["host"], port=config["server"]["port"]) | |
| ``` | |
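The `retrieve_chunks` function above is only a stub. A minimal dependency-free upgrade scores stored chunks by bag-of-words overlap with the query; the `DOC_CHUNKS` contents here are hypothetical, and a real deployment would use embeddings plus the `rag_cache` table instead:

```python
import re
from collections import Counter

# Hypothetical knowledge base; a real deployment would load chunks from disk
# and cache query embeddings in the rag_cache table.
DOC_CHUNKS = [
    "To list files use the ls tool with a path argument.",
    "Cron jobs are listed with the cron_list tool.",
    "Chat history is stored in the SQLite history table.",
]

def _tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_chunks(query: str, chunks=DOC_CHUNKS, top_k: int = 1) -> str:
    # Score each chunk by how many tokens it shares with the query.
    q = _tokens(query)
    def score(chunk):
        return sum(min(q[t], n) for t, n in _tokens(chunk).items())
    best = sorted(chunks, key=score, reverse=True)
    return "\n".join(best[:top_k])

print(retrieve_chunks("how do I list my cron jobs?"))
```

This drops in for the placeholder without changing the `/api/chat` handler.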
| --- | |
| ## Step 8: Run the Server | |
| ```bash | |
| cd /home/kali/microclaw | |
| source venv/bin/activate | |
| python server.py | |
| ``` | |
| Open your browser to `http://localhost:8080` and start interacting. | |
| --- | |
| ## Step 12: Create the Retro CLI Dashboard | |
| # `static/index.html` | |
| ```html | |
| <!DOCTYPE html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8"> | |
<title>microclaw v2026.2.17 – CLI Gateway</title>
| <link rel="stylesheet" href="/static/style.css"> | |
| </head> | |
| <body> | |
| <div class="terminal"> | |
<div class="header">microclaw [Version 2026.2.17] – Local Fallback Agent</div>
| <div class="output" id="output"> | |
| <div>> System ready. Type a command or question.</div> | |
| <div>> Use /help for available commands.</div> | |
| </div> | |
| <div class="input-line"> | |
| <span class="prompt">$></span> | |
| <input type="text" id="input" autofocus> | |
| </div> | |
| </div> | |
| <script src="/static/script.js"></script> | |
| </body> | |
| </html> | |
| ``` | |
| # `static/style.css` | |
| ```css | |
| body { | |
| background: #000; | |
| color: #0f0; | |
| font-family: 'Courier New', monospace; | |
| margin: 0; | |
| padding: 20px; | |
| } | |
| .terminal { | |
| max-width: 900px; | |
| margin: auto; | |
| border: 2px solid #0f0; | |
| padding: 10px; | |
| height: 80vh; | |
| display: flex; | |
| flex-direction: column; | |
| } | |
| .header { | |
| border-bottom: 1px solid #0f0; | |
| padding-bottom: 5px; | |
| margin-bottom: 10px; | |
| text-align: center; | |
| font-weight: bold; | |
| } | |
| .output { | |
| flex: 1; | |
| overflow-y: auto; | |
| white-space: pre-wrap; | |
| margin-bottom: 10px; | |
| } | |
| .input-line { | |
| display: flex; | |
| border-top: 1px solid #0f0; | |
| padding-top: 5px; | |
| } | |
| .prompt { | |
| margin-right: 5px; | |
| } | |
| #input { | |
| background: #000; | |
| border: none; | |
| color: #0f0; | |
| font-family: 'Courier New', monospace; | |
| font-size: 1em; | |
| flex: 1; | |
| outline: none; | |
| } | |
| ``` | |
| # `static/script.js` | |
| ```javascript | |
| const input = document.getElementById('input'); | |
| const output = document.getElementById('output'); | |
| input.addEventListener('keydown', async (e) => { | |
| if (e.key === 'Enter') { | |
| const cmd = input.value.trim(); | |
| input.value = ''; | |
| addLine(`$> ${cmd}`); | |
| await processCommand(cmd); | |
| } | |
| }); | |
| async function processCommand(cmd) { | |
| if (cmd === '/help') { | |
| addLine('Available commands:'); | |
| addLine(' /chat <question> – ask the model'); | |
| addLine(' /ls [path] – list files'); | |
| addLine(' /cron – show cron jobs'); | |
| addLine(' /history – show chat history'); | |
| addLine(' /clear – clear screen'); | |
| return; | |
| } | |
| if (cmd === '/clear') { | |
| output.innerHTML = ''; | |
| return; | |
| } | |
| if (cmd.startsWith('/chat ')) { | |
| const prompt = cmd.slice(6); | |
| addLine('... thinking ...'); | |
| try { | |
| const res = await fetch('/api/chat', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({prompt, use_rag: false}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.response); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd === '/history') { | |
| try { | |
| const res = await fetch('/api/history'); | |
| const history = await res.json(); | |
| history.forEach(item => { | |
| addLine(`[${item.timestamp}] Q: ${item.prompt}`); | |
| addLine(`A: ${item.response}`); | |
| }); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd.startsWith('/ls')) { | |
| const parts = cmd.split(' '); | |
| const path = parts[1] || '.'; | |
| try { | |
| const res = await fetch('/api/tool', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({tool: 'ls', args: {path}}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.output || data.error); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd === '/cron') { | |
| try { | |
| const res = await fetch('/api/tool', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({tool: 'cron_list', args: {}}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.output || data.error); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| addLine(`Unknown command: ${cmd}. Type /help.`); | |
| } | |
| function addLine(text) { | |
| const line = document.createElement('div'); | |
| line.textContent = text; | |
| output.appendChild(line); | |
| output.scrollTop = output.scrollHeight; | |
| } | |
| ``` | |
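As the command list grows, the `if` chain in `processCommand` gets unwieldy. One sketch of a table-driven dispatcher; the handler bodies here are placeholders rather than the real `fetch()` calls in `script.js`:

```javascript
// Map command names to handler functions; real handlers would call fetch()
// against /api/chat and /api/tool as script.js does above.
const commands = {
  ls:   (arg) => `list ${arg || '.'}`,
  chat: (arg) => `ask model: ${arg}`,
  cron: ()    => 'show cron jobs',
};

// Split "/ls /tmp" into { name: "ls", arg: "/tmp" }.
function parseCommand(line) {
  const m = line.match(/^\/(\w+)\s*(.*)$/);
  return m ? { name: m[1], arg: m[2] } : { name: null, arg: line };
}

function dispatch(line) {
  const { name, arg } = parseCommand(line);
  const handler = name && commands[name];
  return handler ? handler(arg) : `Unknown command: ${line}. Type /help.`;
}

console.log(dispatch('/ls /tmp'));
```

Wiring this into `script.js` means replacing each `if` block with an entry in `commands`.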
| You can add more pages (`static/pages/files.html`, `static/pages/cron.html`) and link them from the CLI using `/open files` commands, but for simplicity we'll keep the single‑page CLI. | |
| --- | |
| ## Step 13: Run the Server | |
| ```bash | |
| cd /home/kali/microclaw | |
| source venv/bin/activate | |
| python server.py | |
| ``` | |
| Open your browser to `http://localhost:8080` and start interacting. | |
| --- | |
| ## Troubleshooting | |
- Out of memory during training: reduce `max_seq_length`, the batch size, or use a smaller base model (e.g., `Qwen2.5-0.5B`).
- Slow inference: ensure llama.cpp was compiled with OpenBLAS; use fewer CPU threads if needed (`n_threads=4`).
- GGUF conversion errors: make sure you have a compatible `transformers` version and that the merged model was saved completely.
- Model file not found: check the path passed to the `-m` flag; use an absolute path.
- Port already in use: change the `--port` value (e.g., to 8001) and update your OpenClaw configuration.
- Server starts but responds slowly: this is normal on CPU. Try a smaller, more heavily quantized GGUF variant from the Hugging Face repo (e.g., `Q2_K` for 2-bit).
- `git lfs pull` fails or is slow: interrupted downloads resume if you run the command again.
- OpenClaw cannot connect: verify the server is running with `curl` (as in Step 4) and check firewall rules. If OpenClaw runs in a Docker container, make sure both are on the same network (`--network host` for the OpenClaw container is the simplest option).
- `pkg-config` is missing: CMake uses it to locate some libraries. Install it and rebuild:
| ```bash | |
| sudo apt update | |
| sudo apt install pkg-config | |
| cd /home/kali/llama.cpp | |
| rm -rf build # clean previous attempt | |
| cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS | |
| cmake --build build --config Release -j$(nproc) | |
| ``` | |
| This should now complete successfully. If you still encounter issues, you can temporarily disable BLAS to get a working build: | |
| ```bash | |
| cd /home/kali/llama.cpp | |
| rm -rf build | |
| cmake -B build | |
| cmake --build build --config Release -j$(nproc) | |
| ``` | |
After building, you'll have `build/bin/llama-server`, and the conversion script `convert_hf_to_gguf.py` sits in the main llama.cpp directory.
### The "illegal hardware instruction" error

This error means the PyTorch build you're using tries to execute CPU instructions (such as AVX2) that your processor does not support. It is common on older CPUs and in virtual machines. Diagnose and fix it as follows.
### 1. Check your CPU's instruction set
| Run this command to see what your CPU supports: | |
| ```bash | |
| lscpu | grep -E "Model name|Flags" | |
| ``` | |
| Look for flags like `avx`, `avx2`, `sse4_1`, etc. If you don't see `avx2`, that's the problem. | |
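The same check can be scripted. A small sketch that parses `/proc/cpuinfo` directly (Linux-only; it returns `False` anywhere the file is absent):

```python
# Returns True if the first "flags" line in /proc/cpuinfo lists the given flag.
def cpu_has_flag(flag: str, cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return flag in line.split()
    except OSError:
        pass
    return False

print("AVX2 supported:", cpu_has_flag("avx2"))
```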
### 2. Install a PyTorch version compatible with your CPU
The standard PyTorch wheels from the official site require AVX2. You have a few options:
#### Option A: Install PyTorch from conda-forge (recommended)
| Conda-forge often provides more compatible builds, including for older CPUs. | |
| ```bash | |
| # Install Miniconda if you haven't | |
| wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh | |
| bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3 | |
| source ~/miniconda3/bin/activate | |
| # Create a new environment with Python 3.10 | |
| conda create -y -n microclaw python=3.10 | |
| conda activate microclaw | |
| # Install PyTorch CPU-only from conda-forge | |
| conda install -y pytorch cpuonly -c pytorch # but this might also require AVX2 | |
| # Better: use conda-forge | |
| conda install -y pytorch cpuonly -c conda-forge | |
| ``` | |
| If that still fails, we can try building PyTorch from source with older instruction sets, but that's complex. | |
#### Option B: Use PyTorch wheels without AVX requirements
| There are community builds that target older CPUs. For example, the `manylinux2014` wheels might work. Try: | |
| ```bash | |
| pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu --no-deps | |
| ``` | |
| But the official wheels likely require AVX2. You could try an older PyTorch version (e.g., 1.13) which may have broader support. | |
#### Option C: Use llama.cpp for training as well
Since llama.cpp is plain C/C++ and can be compiled for virtually any CPU, you could use it for fine-tuning too. As noted earlier, the `finetune` tool wasn't present in the build, but you can build it with the right flags.
| First, ensure you have the latest llama.cpp with the finetune example: | |
| ```bash | |
| cd ~/llama.cpp | |
| git pull origin master | |
| ``` | |
| Now build the finetune tool: | |
| ```bash | |
| mkdir -p build && cd build | |
| cmake .. -DLLAMA_FINETUNE=ON -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" | |
| make -j$(nproc) finetune | |
| ``` | |
| After that, the `finetune` binary should appear in `build/bin/`. Then you can train using the command we discussed earlier. | |
### 3. Will a smaller model help?
No. TinyLlama is already small, and the error occurs at the PyTorch level, not in the model, so swapping models won't help; you need a compatible PyTorch build.
### 4. Verify your Python environment
The virtual environment may be using a system Python with a broken PyTorch install. Create a fresh venv with Python 3.10 and reinstall all packages:
| ```bash | |
| cd ~/microclaw | |
| deactivate | |
| rm -rf venv | |
| python3.10 -m venv venv | |
| source venv/bin/activate | |
| pip install --upgrade pip | |
| pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu | |
| pip install -r requirements.txt | |
| ``` | |
| Then run `python train.py` again. | |
## License
Apache 2.0 License.