---
library_name: transformers
pipeline_tag: text-generation
base_model: openai-community/gpt2
language:
- en
tags:
- transformers
- pytorch
- gguf
- gpt2
- gpt2-small
- 117M
- text-generation
- conversational
- grpo
- vae
- kv-cache
- distillation
- reinforcement-learning
- openclaw
- fallback-agent
- soul-md
- agent-framework
- tool-use
- task-automation
- dpo
- tool-masking
- uncertainty-estimation
- rag
- semantic-cache
- quantization
- pruning
- arxiv:2402.03300
license: apache-2.0
---
# 🧠 microclaw-for-openclaw – Fallback Agent for OpenClaw (v2026.2.17)
**Model ID:** `webxos/microclaw-for-openclaw-version-2026.2.17`
**Tags:** `openclaw`, `fallback-agent`, `grpo`, `vae`, `kv-cache`, `dpo`, `tool-masking`, `uncertainty`, `rag`, `semantic-cache`, `soul.md`, `huggingface-space`, `gguf`, `llm-distillation`
---
## 📌 Overview
**microclaw** (v2026.2.17) is a lightweight, distilled language model designed as a **fallback agent** for the [OpenClaw](https://openclaw.org) ecosystem. When the primary agent loses connectivity or requires offline operation, microclaw steps in to handle essential system tasks: file management, status checks, cron jobs, and simple Q&A.
**WARNING: You will need to train your own GGUF model locally. The `microclaw.gguf` included in this repo is a lightweight placeholder so users can scale and build their own local models with llama.cpp.**
You will need to configure your own build locally from scratch with this model; it is still being developed and is under testing.
This version is made to integrate directly with the OpenClaw port 18789. This README presents several ways, some optional, to configure the agent on your local Debian-based Linux machines.
This version introduces **advanced training and inference enhancements**:
- **Tool‑use masking** and **schema‑first training** for reliable function calling.
- **Direct Preference Optimization (DPO)** to align outputs with human preferences.
- **Uncertainty estimation** with configurable thresholds for safe escalation.
- **Retrieval‑Augmented Generation (RAG)** with semantic chunking.
- **Semantic KV‑cache** for high‑similarity query reuse.
- **Quantization (down to 2‑bit)** and **pruning** for extreme memory efficiency.
The repository contains the model files (full and partially trained), configuration (`soul.md`, `AGENTS.md`, `HEARTBEAT.md`, `SECURITY.md`), and export bundles ready for deployment to **Hugging Face Spaces** or for local execution with OpenClaw.
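To make the semantic-cache idea above concrete, here is a toy sketch: queries are embedded, and a new query reuses a cached response when cosine similarity exceeds a threshold. The bag-of-words `embed` below is a deliberately crude stand-in for a real sentence-embedding model, and all names are illustrative, not the repo's actual API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a real sentence encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Reuse a previous generation when a new query is semantically close."""
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip inference entirely
        return None         # cache miss: fall through to the model

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))
```

A production version would use the repo's actual embedding model and persist entries (e.g. in SQLite), but the hit/miss logic is the same.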
---
## ✨ Key Features
- **GRPO (Group Relative Policy Optimization)** – Trains the agent with group‑wise advantage estimation for stable policy updates.
- **VAE Filter** – A Variational Autoencoder that filters low‑quality training samples, improving output coherence.
- **Tool‑Use Masking** – Masks non‑tool tokens during training to enforce strict schema adherence (JSON/YAML).
- **DPO (Direct Preference Optimization)** – Fine‑tunes on preference pairs to reduce hallucinations and improve helpfulness.
- **Uncertainty Estimation** – Monitors token‑level entropy and escalates to safe responses when confidence drops below a threshold.
- **RAG (Retrieval‑Augmented Generation)** – Retrieves relevant chunks from a local knowledge base (FAISS) to ground responses.
- **Semantic Cache** – Reuses previous generations for semantically similar queries, reducing latency and cost.
- **Quantization & Pruning** – Compress the model to 2‑8 bits and prune unimportant weights; backend support for AutoGPTQ, llama.cpp (GGUF), and bitsandbytes.
- **KV‑Cache** – Intelligent reuse of key/value states reduces inference latency by up to 78% (measured on local benchmarks).
- **Soul.md Configuration** – Define personality, sub‑agent rules, proactive tasks, and prompt injection defenses in plain Markdown.
- **Export Ready** – One‑click export to a **Hugging Face Space** (Docker‑based) or a portable ZIP archive.
- **Quantized (4‑bit GGUF)** – Optimized for memory‑constrained environments; runs smoothly on CPU.
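As an illustration of the uncertainty-estimation feature, token-level entropy can be computed from the model's next-token probability distributions and compared against a threshold. Everything below is a self-contained sketch; the function names and the 2.0-nat threshold are illustrative, not the repo's actual values.

```python
import math

def token_entropy(probs) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(step_probs, threshold: float = 2.0) -> bool:
    """Escalate to a safe fallback response when the mean per-token entropy
    over a generation exceeds the configured threshold."""
    entropies = [token_entropy(p) for p in step_probs]
    return (sum(entropies) / len(entropies)) > threshold
```

A fully confident step (all mass on one token) has entropy 0; a uniform distribution over 16 tokens has entropy ln 16 ≈ 2.77, which would trip the example threshold.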
---
### Part 1: Installation
The following guides cover several ways to integrate Microclaw into your custom build, including steps to further train the GGUF file locally:
**Read all steps carefully and pick the guide that matches your use case and setup. Not all options may work on your system; these guides target Debian-based Linux systems.**
# 1.1 Installation Guide + System Update & Basic Tools
```bash
sudo apt update
sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
# 1.2 Install Docker (for containerized execution)
```bash
# Add Docker's official GPG key and repository
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Replace "bullseye" with your Debian codename if you are on a different release
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Add your user to the docker group (avoid sudo for every command)
sudo usermod -aG docker $USER
newgrp docker  # activate group changes in current shell
```
# 1.3 Install Node.js (v22 or later) & TypeScript
```bash
# Using NodeSource repository for a modern Node.js version
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs
# Install TypeScript globally
sudo npm install -g typescript
# Verify
node --version  # should be v22.x or higher
tsc --version
```
# 1.4 Install SQLite (for memory & logs)
```bash
sudo apt install -y sqlite3 libsqlite3-dev
```
# Part 2: Microclaw Fallback Agent
The Microclaw agent is a Python‑based service (Flask + Transformers) that communicates with OpenClaw. You can install it using either a Python virtual environment (lightweight) or Conda (more reliable for PyTorch). Choose one method below.
# 2.1 Clone the Microclaw Repository
Create a parent directory for all agents:
```bash
sudo mkdir -p /opt/openclaw-agents
sudo chown -R $USER:$USER /opt/openclaw-agents
cd /opt/openclaw-agents
# Clone the Hugging Face repo (includes model files and soul configuration)
git lfs install
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
Note: The `.gguf` model files are several hundred MB. If the download is interrupted, `git lfs` can resume it. After cloning, verify the file sizes:
```bash
ls -lh *.gguf
```
They should be >100 MB, not 28 bytes. If they are still placeholders, run `git lfs pull` manually.
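A placeholder can also be detected programmatically: Git LFS pointer files are tiny text files that begin with `version https://git-lfs`. The helper below is an illustrative check, not part of the repo.

```python
from pathlib import Path

LFS_MAGIC = b"version https://git-lfs"

def is_lfs_pointer(path) -> bool:
    """True if `path` is a tiny Git LFS pointer rather than real model data."""
    p = Path(path)
    if p.stat().st_size > 1024:  # real GGUF files are hundreds of MB
        return False
    return p.read_bytes().startswith(LFS_MAGIC)
```

Running `is_lfs_pointer("microclaw.gguf")` after cloning tells you whether `git lfs pull` is still needed.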
# 2.2 Option A: Install with Python Virtual Environment (venv)
```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
If `requirements.txt` is missing, install the core packages manually:
```bash
pip install flask transformers torch sentence-transformers faiss-cpu --extra-index-url https://download.pytorch.org/whl/cpu
```
# 2.3 Option B: Install with Conda (Recommended for unstable networks)
```bash
# Download and install Miniconda (if not already present)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a dedicated environment with Python 3.11
conda create -y -n microclaw python=3.11
conda activate microclaw
# Install CPU‑only PyTorch (smaller, more reliable)
conda install -y pytorch torchvision torchaudio cpuonly -c pytorch
# Install the rest via pip
pip install flask transformers sentence-transformers faiss-cpu
```
# 2.4 Test the Agent Manually
```bash
# Make sure you are in the agent directory with the environment activated
python main.py
```
You should see output like `* Running on http://127.0.0.1:18789`. Press Ctrl+C to stop it.
# ⚙️ Part 3: Configure OpenClaw to Use the Microclaw Fallback
OpenClaw reads its configuration from a TOML file (typically `~/.config/openclaw/config.toml` or `/etc/openclaw/config.toml`). You need to point it to your local Microclaw instance.
Find the port Microclaw listens on (the default is 18789, defined in `main.py`):
```bash
grep port main.py
```
Edit the OpenClaw configuration (create it if it doesn't exist):
```bash
mkdir -p ~/.config/openclaw
nano ~/.config/openclaw/config.toml
```
Add or modify the `[agent.fallback]` section:
```toml
[agent.fallback]
path = "/opt/openclaw-agents/microclaw-fallback"
port = 18789
enabled = true
```
If OpenClaw is already installed, restart it. (If you haven't installed OpenClaw yet, see Part 4 below.)
# 🐳 Part 4: Install & Run OpenClaw (the main framework)
The OpenClaw core is a Node.js/TypeScript application. You can run it directly from source or use the provided Docker image.
# 4.1 Run OpenClaw via Docker (easiest)
```bash
# Pull the official OpenClaw image (adjust tag as needed)
docker pull openclaw/openclaw:latest
# Run the container, mounting the config and agents directories
docker run -d \
  --name openclaw \
  -p 3000:3000 \
  -v ~/.config/openclaw:/home/node/.config/openclaw \
  -v /opt/openclaw-agents:/opt/openclaw-agents \
  openclaw/openclaw:latest
```
# 4.2 Run OpenClaw from Source (for development)
```bash
# Clone the OpenClaw repository
git clone https://github.com/openclaw/core.git openclaw-core
cd openclaw-core
# Install dependencies
yarn install
# Build TypeScript
yarn build
# Start OpenClaw (it will read the config from ~/.config/openclaw/config.toml)
yarn start
```
# 🧪 Part 5: Verify the Integration
Check that Microclaw is running (either manually or via systemd):
```bash
curl http://localhost:18789/health
```
# 🔁 Guide to Microclaw Auto-Start (systemd)
To ensure the fallback agent starts on boot and restarts if it crashes, create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-fallback.service
```
Paste (adjust `User` and paths to match your setup):
```ini
[Unit]
Description=Microclaw Fallback Agent for OpenClaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/opt/openclaw-agents/microclaw-fallback
Environment="PATH=/opt/openclaw-agents/microclaw-fallback/venv/bin"
ExecStart=/opt/openclaw-agents/microclaw-fallback/venv/bin/python /opt/openclaw-agents/microclaw-fallback/main.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-fallback.service
sudo systemctl start microclaw-fallback.service
```
Check status:
```bash
sudo systemctl status microclaw-fallback.service
```
### ALTERNATIVE GUIDE - Installing via Llama.cpp instead:
# 📦 Prerequisites: Essential System Tools
You need a few standard command-line tools. Open a terminal and run:
```bash
# Update your package list and install curl, wget, git, and build tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
```
## 📥 Step 1: Download the Model with Git LFS
The model files are hosted in a Git repository and require Git Large File Storage (LFS) to download the actual GGUF files.
# 1.1: Install Git LFS
```bash
sudo apt install -y git-lfs
git lfs install
```
# 1.2: Create a directory for your models and clone the repository
```bash
mkdir -p ~/models
cd ~/models
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
```
# 1.3: Ensure the GGUF files are fully downloaded
```bash
git lfs pull
```
Verification: after cloning, check that the `.gguf` files are present and a reasonable size (several hundred MB, not 28 bytes):
```bash
ls -lh *.gguf
```
If the files are small placeholders, run `git lfs pull` again.
## ⚙️ Step 2: Set Up the llama.cpp Server
Now, download, compile, and set up llama.cpp with its built-in server.
# 2.1: Clone the llama.cpp repository
```bash
cd ~/models
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
# 2.2: Compile llama.cpp (this may take a few minutes)
```bash
make -j4  # note: newer llama.cpp releases build with CMake instead; see the llama.cpp README if make fails
```
## 3. (Optional but recommended) Install the Python dependencies for the server
This step requires Python/pip, but it's a one-time, isolated setup.
```bash
sudo apt install -y python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
# 🚀 Step 3.1: Run the Model Server
Now, start the server, pointing it to the GGUF model file you downloaded. Make sure you are in the llama.cpp directory with the virtual env activated:
```bash
cd ~/models/llama.cpp
source venv/bin/activate
# Find the exact GGUF filename (replace with the actual filename you have)
MODEL_FILE=~/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf
# Run the server (in newer llama.cpp builds the binary is named llama-server)
./server -m $MODEL_FILE \
  --host 0.0.0.0 \
  --port 8000 \
  -c 2048 \
  -ngl 0  # Use -ngl 33 if you have an NVIDIA GPU and compiled with CUDA support
```
Explanation of flags:
- `-m $MODEL_FILE` : Path to your GGUF model.
- `--host 0.0.0.0` : Listen on all network interfaces (so OpenClaw can connect).
- `--port 8000` : The port the server will use.
- `-c 2048` : Context size (adjust based on model requirements).
- `-ngl 0` : Number of layers to offload to GPU. Use `-ngl 33` (or more) if you have an NVIDIA GPU and compiled with CUDA.

Keep this terminal window open. The server is now running and ready to accept requests.
# ✅ Step 4: Test the Server
Open a new terminal and test the API to ensure it's working correctly.
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the capital of France?",
    "max_tokens": 50,
    "temperature": 0.7
  }'
```
You should receive a JSON response containing the model's generated text.
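The same request can be made from Python with only the standard library. The response shape (`choices[0].text`) follows the OpenAI-compatible completions format that the llama.cpp server exposes; the helper names here are illustrative.

```python
import json
import urllib.request

def extract_text(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style completion response."""
    return response["choices"][0]["text"]

def complete(prompt: str, base_url: str = "http://localhost:8000/v1",
             max_tokens: int = 50, temperature: float = 0.7) -> str:
    """POST a completion request to the local llama.cpp server."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_text(json.load(resp))
```

With the server from Step 3.1 running, `complete("What is the capital of France?")` returns the generated continuation as a string.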
# 🔌 Step 5: Configure OpenClaw to Use the Local Server
Now, configure OpenClaw to use this local server as its fallback agent.
Locate OpenClaw's configuration file. This is often `~/.config/openclaw/config.toml`, `/etc/openclaw/config.toml`, or a `.env` file in the OpenClaw directory.
Edit the configuration to define a custom provider that points to your local server. The exact variable names depend on your OpenClaw version, but it generally looks something like this:
```toml
[agent.fallback]
provider = "custom"        # or "openai-compatible"
base_url = "http://localhost:8000/v1"
api_key = "not-needed"     # llama.cpp server doesn't require a key
model = "microclaw"        # Optional: model name
enabled = true
```
If OpenClaw uses environment variables (e.g., in a `.env` file), you might set:
```text
OPENCLAW_FALLBACK_PROVIDER=custom
OPENCLAW_CUSTOM_BASE_URL=http://localhost:8000/v1
OPENCLAW_CUSTOM_API_KEY=not-needed
```
Restart OpenClaw for the changes to take effect.
# 🔁 How to Run the Server as a Background Service:
To have the server start automatically on boot and restart if it crashes, you can create a systemd service.
Create the service file:
```bash
sudo nano /etc/systemd/system/microclaw-llama.service
```
Paste the following (adjust `User`, `WorkingDirectory`, and `ExecStart` paths as needed):
```ini
[Unit]
Description=llama.cpp server for Microclaw
After=network.target

[Service]
Type=simple
User=kali
WorkingDirectory=/home/kali/models/llama.cpp
ExecStart=/home/kali/models/llama.cpp/server -m /home/kali/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf --host 0.0.0.0 --port 8000 -c 2048 -ngl 0
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Then enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable microclaw-llama.service
sudo systemctl start microclaw-llama.service
sudo systemctl status microclaw-llama.service  # Check if it's running
```
### ADVANCED GUIDE: TRAINING MICROCLAW.GGUF MODEL LOCALLY
This guide adapts the full microclaw pipeline to run entirely on a low‑end machine like an 8GB RAM laptop or even a Raspberry Pi 5. We'll use a tiny base model (0.5B–1B parameters), parameter‑efficient fine‑tuning (LoRA) on CPU, and extreme quantization (2‑bit) to produce a GGUF file that runs smoothly on consumer hardware.
The final system provides:
- A **local training script** that fits in 8GB RAM (CPU only).
- A **FastAPI server** (`server.py`) serving a retro MS‑DOS‑style CLI dashboard on `localhost:8080`.
- **Local API endpoints** for inference, file management, cron jobs, and RAG.
- **SQLite** as a local database (conversation history, cache, RAG index).
- Integration with **llama.cpp** for efficient GGUF inference.
| --- | |
| ## Prerequisites | |
| - **Hardware**: x86_64 or ARM64 (Raspberry Pi 5) with **at least 8GB RAM**. | |
| - **OS**: Debian 12 / Kali Linux / Raspberry Pi OS (64‑bit). | |
| - **Storage**: 10GB free space. | |
| - **Software**: Python 3.10+, Git, CMake, build tools. | |
| --- | |
## Step 1: Environment Setup
```bash
mkdir -p /home/kali/microclaw  # create the project directory if it doesn't exist
cd /home/kali/microclaw
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
**`requirements.txt`** (CPU‑optimized, no CUDA dependencies; the index URL goes on its own line so pip can parse it):
```
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.0
transformers>=4.38.0
accelerate
datasets
trl>=0.8.0
peft
bitsandbytes
scipy
sentencepiece
protobuf
fastapi
uvicorn
sqlite-utils
pydantic
pyyaml
jinja2
aiofiles
llama-cpp-python
```
| --- | |
| ## Step 2: Build llama.cpp (for conversion & inference) | |
| llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU. | |
| ```bash | |
| cd /home/kali | |
| git clone https://github.com/ggerganov/llama.cpp | |
| cd llama.cpp | |
| mkdir build && cd build | |
| cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed | |
| make -j$(nproc) | |
| ``` | |
| After compilation, the `convert-hf-to-gguf.py` script will be in `llama.cpp/` (not in build). We'll use it later. | |
| --- | |
## Step 3: Prepare the Dataset
You need a small dataset (a few hundred to a few thousand examples) for fine‑tuning and DPO. Place JSONL files in `data/raw/`.
### 3.1 Tool‑use data (schema‑first)
Each line:
```json
{
  "instruction": "List files in /home",
  "tools": ["ls"],
  "response": "ls /home"
}
```
### 3.2 Preference data (for DPO)
Each line:
```json
{
  "prompt": "What is the weather?",
  "chosen": "I cannot check live weather, but you can use the 'weather' tool.",
  "rejected": "I don't know."
}
```
If you don't have preference data, you can skip DPO by setting `dpo: false` in config.
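Before training, it's worth failing fast on malformed data. Here is a minimal sketch of a validator for the two JSONL schemas above; the key sets mirror the examples, and `validate_jsonl` is an illustrative helper, not part of the repo.

```python
import json

TOOL_KEYS = {"instruction", "response"}     # "tools" is optional metadata
DPO_KEYS = {"prompt", "chosen", "rejected"}

def validate_jsonl(lines, required):
    """Return (ok, errors) for an iterable of JSONL lines against required keys."""
    errors = []
    for i, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # allow blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: invalid JSON ({e.msg})")
            continue
        missing = required - obj.keys()
        if missing:
            errors.append(f"line {i}: missing keys {sorted(missing)}")
    return (not errors, errors)
```

Run it over `data/raw/train.jsonl` with `TOOL_KEYS` and over `data/raw/preferences.jsonl` with `DPO_KEYS` before launching a multi-hour CPU training run.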
### 3.3 RAG documents (optional)
Place plain text files (`.txt`) in `data/rag_docs/`. The training script will chunk them and store embeddings in SQLite.
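The chunking step can be sketched as a character-window splitter matching the `chunk_size`/`chunk_overlap` settings used by the RAG configuration. This is an illustrative stand-in for the repo's actual chunker.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split text into fixed-size character chunks; consecutive chunks share
    `overlap` characters so sentences cut at a boundary survive in one piece."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk would then be embedded (e.g. with the configured MiniLM model) and written to the SQLite RAG index.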
| --- | |
## Step 4: Configuration (`config.yaml`)
Edit this file to match your paths and training preferences.
```yaml
# config.yaml
model:
  base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or "Qwen/Qwen2.5-0.5B"
  cache_dir: "models/base"

training:
  output_dir: "models/lora"
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  num_train_epochs: 3
  max_seq_length: 512
  use_lora: true
  lora_r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  dpo: true
  dpo_beta: 0.1
  # CPU optimizations
  dataloader_num_workers: 0
  save_steps: 100
  logging_steps: 10

data:
  train_file: "data/raw/train.jsonl"
  eval_file: "data/raw/eval.jsonl"               # optional
  preference_file: "data/raw/preferences.jsonl"  # for DPO

rag:
  enabled: true
  chunk_size: 500
  chunk_overlap: 50
  embedding_model: "all-MiniLM-L6-v2"  # tiny, runs on CPU
  db_path: "db/microclaw.db"

server:
  host: "0.0.0.0"
  port: 8080
  model_path: "models/microclaw.gguf"
  context_size: 2048
  max_tokens: 512
  temperature: 0.7
```
| --- | |
# Folder Structure (to be created)
```
/home/kali/microclaw/
├── server.py            # FastAPI server (inference + static files + API)
├── train.py             # CPU‑optimized fine‑tuning + DPO script
├── requirements.txt
├── config.yaml
├── data/
│   ├── raw/             # Place your JSONL datasets here
│   └── rag_docs/        # Text files for RAG (optional)
├── models/
│   ├── base/            # Will contain the downloaded base model
│   ├── lora/            # LoRA adapters after training
│   └── microclaw.gguf   # Final quantized model (after conversion)
├── static/
│   ├── index.html       # Main dashboard (CLI style)
│   ├── style.css
│   ├── script.js
│   └── pages/           # Additional pages (file manager, cron, etc.)
│       ├── files.html
│       ├── cron.html
│       └── rag.html
├── db/
│   └── microclaw.db     # SQLite database (auto‑created)
└── logs/
    └── training.log
```
| --- | |
| ## Prerequisites | |
| - **Hardware**: x86_64 or ARM64 (Raspberry Pi 5) with **at least 8GB RAM**. | |
| - **OS**: Debian 12 / Kali Linux / Raspberry Pi OS (64‑bit). | |
| - **Storage**: 10GB free space. | |
| - **Software**: Python 3.10+, Git, CMake, build tools. | |
| --- | |
| ## Step 1: Environment Setup | |
| ```bash | |
| cd /home/kali/microclaw | |
| python3 -m venv venv | |
| source venv/bin/activate | |
| pip install --upgrade pip | |
| pip install -r requirements.txt | |
| ``` | |
| **`requirements.txt`** (CPU‑optimized, no CUDA dependencies): | |
| ``` | |
| torch==2.2.0 --index-url https://download.pytorch.org/whl/cpu | |
| transformers>=4.38.0 | |
| accelerate | |
| datasets | |
| trl>=0.8.0 | |
| peft | |
| bitsandbytes | |
| scipy | |
| sentencepiece | |
| protobuf | |
| fastapi | |
| uvicorn | |
| sqlite-utils | |
| pydantic | |
| pyyaml | |
| jinja2 | |
| aiofiles | |
| llama-cpp-python | |
| ``` | |
| --- | |
| ## Step 2: Build llama.cpp (for conversion & inference) | |
| llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU. | |
| ```bash | |
| cd /home/kali | |
| git clone https://github.com/ggerganov/llama.cpp | |
| cd llama.cpp | |
| mkdir build && cd build | |
| cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed | |
| make -j$(nproc) | |
| ``` | |
| After compilation, the `convert-hf-to-gguf.py` script will be in `llama.cpp/` (not in build). We'll use it later. | |
| --- | |
| ## Step 3: Prepare the Dataset | |
| You need a small dataset (a few hundred to a few thousand examples) for fine‑tuning and DPO. Place JSONL files in `data/raw/`. | |
| ### 3.1 Tool‑use data (schema‑first) | |
| Each line: | |
| ```json | |
| { | |
| "instruction": "List files in /home", | |
| "tools": ["ls"], | |
| "response": "ls /home" | |
| } | |
| ``` | |
| ### 3.2 Preference data (for DPO) | |
| Each line: | |
| ```json] | |
| { | |
| "prompt": "What is the weather?", | |
| "chosen": "I cannot check live weather, but you can use the 'weather' tool.", | |
| "rejected": "I don't know." | |
| } | |
| ``` | |
| If you don't have preference data, you can skip DPO by setting `dpo: false` in config. | |
| ### 3.3 RAG documents (optional) | |
| Place plain text files (`.txt`) in `data/rag_docs/`. The training script will chunk them and store embeddings in SQLite. | |
| --- | |
| ## Step 4: Configuration (`config.yaml`) | |
| Edit this file to match your paths and training preferences. | |
| ```yaml | |
| # config.yaml | |
| model: | |
| base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0" # or "Qwen/Qwen2.5-0.5B" | |
| cache_dir: "models/base" | |
| training: | |
| output_dir: "models/lora" | |
| per_device_train_batch_size: 1 | |
| gradient_accumulation_steps: 4 | |
| learning_rate: 2e-4 | |
| num_train_epochs: 3 | |
| max_seq_length: 512 | |
| use_lora: true | |
| lora_r: 8 | |
| lora_alpha: 16 | |
| lora_dropout: 0.05 | |
| dpo: true | |
| dpo_beta: 0.1 | |
| # CPU optimizations | |
| dataloader_num_workers: 0 | |
| save_steps: 100 | |
| logging_steps: 10 | |
| data: | |
| train_file: "data/raw/train.jsonl" | |
| eval_file: "data/raw/eval.jsonl" # optional | |
| preference_file: "data/raw/preferences.jsonl" # for DPO | |
| rag: | |
| enabled: true | |
| chunk_size: 500 | |
| chunk_overlap: 50 | |
| embedding_model: "all-MiniLM-L6-v2" # tiny, runs on CPU | |
| db_path: "db/microclaw.db" | |
| server: | |
| host: "0.0.0.0" | |
| port: 8080 | |
| model_path: "models/microclaw.gguf" | |
| context_size: 2048 | |
| max_tokens: 512 | |
| temperature: 0.7 | |
| ``` | |
| --- | |
## Step 5: Training Script (`train.py`)
This script performs supervised fine‑tuning (SFT) on instruction data, optionally followed by DPO, and finally merges the LoRA weights and saves the full model. It is heavily optimized for low RAM (CPU) usage.
| ```python | |
| #!/usr/bin/env python3 | |
| # train.py – CPU‑only fine‑tuning with LoRA + optional DPO | |
| import os | |
| import yaml | |
| import torch | |
| from transformers import ( | |
| AutoTokenizer, | |
| AutoModelForCausalLM, | |
| TrainingArguments, | |
| Trainer, | |
| BitsAndBytesConfig | |
| ) | |
| from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType | |
| from datasets import load_dataset | |
| from trl import DPOTrainer | |
| import logging | |
| # Load config | |
| with open("config.yaml") as f: | |
| config = yaml.safe_load(f) | |
| # Setup logging | |
| logging.basicConfig(level=logging.INFO, filename="logs/training.log", filemode="w") | |
| logger = logging.getLogger(__name__) | |
| def main(): | |
| # 1. Load tokenizer | |
| tokenizer = AutoTokenizer.from_pretrained(config["model"]["base_model_name"], cache_dir=config["model"]["cache_dir"]) | |
| tokenizer.pad_token = tokenizer.eos_token | |
| # 2. Load base model in 8-bit (CPU offload not supported for bitsandbytes on CPU; we use standard dtype) | |
| # For CPU, we load in float32 and rely on LoRA to reduce memory. | |
| model = AutoModelForCausalLM.from_pretrained( | |
| config["model"]["base_model_name"], | |
| cache_dir=config["model"]["cache_dir"], | |
| torch_dtype=torch.float32, # CPU uses float32 | |
| low_cpu_mem_usage=True | |
| ) | |
| # 3. Prepare LoRA | |
| if config["training"]["use_lora"]: | |
| lora_config = LoraConfig( | |
| task_type=TaskType.CAUSAL_LM, | |
| r=config["training"]["lora_r"], | |
| lora_alpha=config["training"]["lora_alpha"], | |
| lora_dropout=config["training"]["lora_dropout"], | |
| target_modules=["q_proj", "v_proj"] # adjust for your model | |
| ) | |
| model = get_peft_model(model, lora_config) | |
| model.print_trainable_parameters() | |
| # 4. Load dataset | |
| dataset = load_dataset("json", data_files=config["data"]["train_file"], split="train") | |
| if config["data"].get("eval_file"): | |
| eval_dataset = load_dataset("json", data_files=config["data"]["eval_file"], split="train") | |
| else: | |
| eval_dataset = None | |
| # Format prompt: "### Instruction:\n{instruction}\n\n### Response:\n{response}" | |
| def format_func(example): | |
| text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{tokenizer.eos_token}" | |
| return {"text": text} | |
| dataset = dataset.map(format_func) | |
| if eval_dataset: | |
| eval_dataset = eval_dataset.map(format_func) | |
| # Tokenize | |
| def tokenize(element): | |
| return tokenizer(element["text"], truncation=True, max_length=config["training"]["max_seq_length"], padding=False) | |
| dataset = dataset.map(tokenize, remove_columns=dataset.column_names) | |
| if eval_dataset: | |
| eval_dataset = eval_dataset.map(tokenize, remove_columns=eval_dataset.column_names) | |
| # 5. Training arguments (CPU‑friendly) | |
| training_args = TrainingArguments( | |
| output_dir=config["training"]["output_dir"], | |
| per_device_train_batch_size=config["training"]["per_device_train_batch_size"], | |
| gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"], | |
| learning_rate=config["training"]["learning_rate"], | |
| num_train_epochs=config["training"]["num_train_epochs"], | |
| logging_steps=config["training"]["logging_steps"], | |
| save_steps=config["training"]["save_steps"], | |
| evaluation_strategy="steps" if eval_dataset else "no", | |
| eval_steps=config["training"]["save_steps"], | |
| save_total_limit=2, | |
| load_best_model_at_end=True if eval_dataset else False, | |
| metric_for_best_model="eval_loss", | |
| greater_is_better=False, | |
| fp16=False, # CPU doesn't support fp16 | |
| bf16=False, | |
| dataloader_num_workers=0, # avoid multiprocessing issues | |
| optim="adamw_torch", | |
| torch_compile=False, # no speedup on CPU | |
| ) | |
# 6. Trainer (SFT)
# DataCollatorForLanguageModeling(mlm=False) pads each batch and sets
# labels = input_ids, which Trainer needs to compute the causal-LM loss.
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
data_collator=data_collator,
)
| logger.info("Starting SFT training...") | |
| trainer.train() | |
| trainer.save_model() # saves LoRA adapters | |
| # 7. Optional DPO training | |
| if config["training"]["dpo"] and config["data"].get("preference_file"): | |
| logger.info("Loading preference data for DPO...") | |
| pref_dataset = load_dataset("json", data_files=config["data"]["preference_file"], split="train") | |
| # For DPO we need base model without LoRA (or merged) | |
| # We'll reload base model and then apply LoRA weights | |
| # (Simplified: use the same model with LoRA attached; DPO trainer handles it) | |
| dpo_trainer = DPOTrainer( | |
| model=model, | |
| ref_model=None, # uses model as reference (or you can provide a frozen copy) | |
| args=training_args, # reuse same args (adjust for DPO) | |
| train_dataset=pref_dataset, | |
| tokenizer=tokenizer, | |
| beta=config["training"]["dpo_beta"], | |
| max_length=config["training"]["max_seq_length"], | |
| max_prompt_length=256, | |
| ) | |
| logger.info("Starting DPO training...") | |
| dpo_trainer.train() | |
| dpo_trainer.save_model(config["training"]["output_dir"] + "_dpo") | |
| # 8. Merge LoRA and save full model (for conversion) | |
| logger.info("Merging LoRA weights...") | |
| merged_model = model.merge_and_unload() | |
| merged_model.save_pretrained("models/merged") | |
| tokenizer.save_pretrained("models/merged") | |
| logger.info("Merged model saved to models/merged") | |
| if __name__ == "__main__": | |
| main() | |
| ``` | |
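The optional DPO stage loads `config["data"]["preference_file"]`; trl's `DPOTrainer` conventionally expects `prompt`, `chosen`, and `rejected` columns. A minimal sketch of producing a valid preference file (the records and the `preferences.jsonl` filename are illustrative assumptions, not the paths from `config.yaml`):

```python
import json

# Hypothetical preference records: each pairs a preferred and a
# dispreferred completion for the same prompt.
pairs = [
    {
        "prompt": "### Instruction:\nList files in /etc.\n\n### Response:\n",
        "chosen": "Use the ls tool with path '/etc'.",
        "rejected": "I cannot access the filesystem.",
    },
]

with open("preferences.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")

# Sanity check: every record has the three columns DPOTrainer expects.
required = {"prompt", "chosen", "rejected"}
with open("preferences.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all(required <= row.keys() for row in rows)
print(f"{len(rows)} preference pair(s) written")
```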
| **Run training**: | |
| ```bash | |
| python train.py | |
| ``` | |
| *Note: Training a 1B model on CPU with batch size 1 may take several hours to days depending on dataset size. Reduce epochs or dataset size for testing.* | |
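The SFT script assumes the training file contains one JSON record per line with `instruction` and `response` fields. A small sketch of generating a valid file and previewing the formatted prompt (the sample records, the `train.jsonl` path, and the literal `</s>` standing in for `tokenizer.eos_token` are all illustrative assumptions):

```python
import json

# Hypothetical sample records; in practice these come from your own task data.
samples = [
    {"instruction": "List the files in /tmp.", "response": "Use the ls tool with path '/tmp'."},
    {"instruction": "What is microclaw?", "response": "A lightweight fallback agent for OpenClaw."},
]

with open("train.jsonl", "w") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Mirrors format_func in train.py; "</s>" stands in for tokenizer.eos_token.
def format_text(example, eos="</s>"):
    return f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{eos}"

print(format_text(samples[0]))
```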
| --- | |
| ## Step 6: Convert to GGUF | |
After training, you have a merged Hugging Face model in `models/merged/`. llama.cpp's conversion script produces a full-precision GGUF; the low-bit quantization is a separate step with `llama-quantize`:
```bash
cd /home/kali/llama.cpp
# 1. Convert the merged HF model to a 16-bit GGUF
python convert_hf_to_gguf.py /home/kali/microclaw/models/merged \
  --outfile /home/kali/microclaw/models/microclaw-f16.gguf \
  --outtype f16
# 2. Quantize to 2-bit (extremely small)
./build/bin/llama-quantize \
  /home/kali/microclaw/models/microclaw-f16.gguf \
  /home/kali/microclaw/models/microclaw.gguf Q2_K
```
For Raspberry Pi, `Q2_K` is ideal. You can also try `Q3_K_S` if you have more RAM.
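As a back-of-envelope check on what each quantization level costs, multiply the parameter count by bits per weight. The bits-per-weight figures below are rough approximations (k-quants mix block sizes) and ignore GGUF metadata overhead:

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8 bytes.
def gguf_size_mb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e6

n_params = 1.1e9  # e.g., a TinyLlama-class 1.1B model
for name, bpw in [("f16", 16.0), ("q8_0", 8.5), ("q3_k_s", 3.5), ("q2_k", 2.6)]:
    print(f"{name}: ~{gguf_size_mb(n_params, bpw):.0f} MB")
```

This is why a 2-bit quant of a 1.1B model comfortably fits a Raspberry Pi's RAM while f16 does not.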
| --- | |
| ## Step 7: Build the FastAPI Server (`server.py`) | |
| This server serves: | |
| - Static files (the CLI dashboard) from the `static/` folder. | |
| - API endpoints for inference, file management, cron, and RAG. | |
| - SQLite database for conversation history and RAG cache. | |
| ```python | |
| #!/usr/bin/env python3 | |
| # server.py – FastAPI server with GGUF inference and static dashboard | |
| import os | |
| import yaml | |
| import sqlite3 | |
| import json | |
| from pathlib import Path | |
| from fastapi import FastAPI, Request, HTTPException | |
| from fastapi.responses import HTMLResponse, JSONResponse | |
| from fastapi.staticfiles import StaticFiles | |
| from pydantic import BaseModel | |
| from typing import Optional, List | |
| import uvicorn | |
| from llama_cpp import Llama | |
| # Load config | |
| with open("config.yaml") as f: | |
| config = yaml.safe_load(f) | |
| # Initialize SQLite | |
| DB_PATH = config["rag"]["db_path"] | |
| conn = sqlite3.connect(DB_PATH, check_same_thread=False) | |
| cursor = conn.cursor() | |
| cursor.execute(""" | |
| CREATE TABLE IF NOT EXISTS history ( | |
| id INTEGER PRIMARY KEY AUTOINCREMENT, | |
| prompt TEXT, | |
| response TEXT, | |
| timestamp DATETIME DEFAULT CURRENT_TIMESTAMP | |
| ) | |
| """) | |
| cursor.execute(""" | |
| CREATE TABLE IF NOT EXISTS rag_cache ( | |
| id INTEGER PRIMARY KEY AUTOINCREMENT, | |
| query TEXT UNIQUE, | |
| chunks TEXT, | |
| embedding BLOB | |
| ) | |
| """) | |
| conn.commit() | |
| # Load GGUF model | |
| model_path = config["server"]["model_path"] | |
| llm = Llama( | |
| model_path=model_path, | |
| n_ctx=config["server"]["context_size"], | |
| n_threads=os.cpu_count(), | |
| n_gpu_layers=0, # CPU only | |
| verbose=False, | |
| ) | |
| app = FastAPI(title="microclaw Gateway") | |
| # Mount static files | |
| app.mount("/static", StaticFiles(directory="static"), name="static") | |
| # API Models | |
| class PromptRequest(BaseModel): | |
| prompt: str | |
| max_tokens: Optional[int] = 256 | |
| temperature: Optional[float] = 0.7 | |
| use_rag: Optional[bool] = False | |
| class ToolRequest(BaseModel): | |
| tool: str | |
| args: dict | |
| # Simple RAG (placeholder – you can enhance with embeddings) | |
| def retrieve_chunks(query: str) -> str: | |
| # For demo, just return static text; real implementation would use embeddings | |
| return "Relevant document chunk about file management." | |
| @app.get("/", response_class=HTMLResponse) | |
| async def root(): | |
| with open("static/index.html") as f: | |
| return f.read() | |
| @app.post("/api/chat") | |
| async def chat(req: PromptRequest): | |
| # Optionally enhance prompt with RAG | |
| if req.use_rag: | |
| context = retrieve_chunks(req.prompt) | |
| augmented_prompt = f"Context: {context}\n\nQuestion: {req.prompt}\nAnswer:" | |
| else: | |
| augmented_prompt = req.prompt | |
| # Call model | |
| output = llm( | |
| augmented_prompt, | |
| max_tokens=req.max_tokens, | |
| temperature=req.temperature, | |
| stop=["</s>", "###"], | |
| echo=False | |
| ) | |
| response = output["choices"][0]["text"].strip() | |
| # Save to history | |
| cursor.execute("INSERT INTO history (prompt, response) VALUES (?, ?)", (req.prompt, response)) | |
| conn.commit() | |
| return {"response": response} | |
| @app.get("/api/history") | |
| async def get_history(limit: int = 50): | |
| cursor.execute("SELECT prompt, response, timestamp FROM history ORDER BY timestamp DESC LIMIT ?", (limit,)) | |
| rows = cursor.fetchall() | |
| return [{"prompt": r[0], "response": r[1], "timestamp": r[2]} for r in rows] | |
| @app.post("/api/tool") | |
| async def run_tool(req: ToolRequest): | |
| # Example: execute system commands (sandboxed) | |
| if req.tool == "ls": | |
| path = req.args.get("path", ".") | |
| try: | |
| files = os.listdir(path) | |
| return {"output": "\n".join(files)} | |
| except Exception as e: | |
| return {"error": str(e)} | |
| elif req.tool == "cron_list": | |
| # Parse crontab (requires user permissions) | |
| # For demo, return placeholder | |
| return {"output": "0 5 * * * /home/kali/backup.sh"} | |
| else: | |
| return {"error": "Unknown tool"} | |
| if __name__ == "__main__": | |
| uvicorn.run(app, host=config["server"]["host"], port=config["server"]["port"]) | |
| ``` | |
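The `retrieve_chunks` function above is only a stub. A minimal dependency-free upgrade scores stored chunks by bag-of-words overlap with the query; the `DOC_CHUNKS` contents here are hypothetical, and a real deployment would use embeddings plus the `rag_cache` table instead:

```python
import re
from collections import Counter

# Hypothetical knowledge base; a real deployment would load chunks from disk
# and cache query embeddings in the rag_cache table.
DOC_CHUNKS = [
    "To list files use the ls tool with a path argument.",
    "Cron jobs are listed with the cron_list tool.",
    "Chat history is stored in the SQLite history table.",
]

def _tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_chunks(query: str, chunks=DOC_CHUNKS, top_k: int = 1) -> str:
    # Score each chunk by how many tokens it shares with the query.
    q = _tokens(query)
    def score(chunk):
        return sum(min(q[t], n) for t, n in _tokens(chunk).items())
    best = sorted(chunks, key=score, reverse=True)
    return "\n".join(best[:top_k])

print(retrieve_chunks("how do I list my cron jobs?"))
```

This drops in for the placeholder without changing the `/api/chat` handler.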
| --- | |
| ## Step 8: Run the Server | |
| ```bash | |
| cd /home/kali/microclaw | |
| source venv/bin/activate | |
| python server.py | |
| ``` | |
| Open your browser to `http://localhost:8080` and start interacting. | |
| --- | |
| ## Step 12: Create the Retro CLI Dashboard | |
| # `static/index.html` | |
| ```html | |
| <!DOCTYPE html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8"> | |
<title>microclaw v2026.2.17 – CLI Gateway</title>
| <link rel="stylesheet" href="/static/style.css"> | |
| </head> | |
| <body> | |
| <div class="terminal"> | |
<div class="header">microclaw [Version 2026.2.17] – Local Fallback Agent</div>
| <div class="output" id="output"> | |
| <div>> System ready. Type a command or question.</div> | |
| <div>> Use /help for available commands.</div> | |
| </div> | |
| <div class="input-line"> | |
| <span class="prompt">$></span> | |
| <input type="text" id="input" autofocus> | |
| </div> | |
| </div> | |
| <script src="/static/script.js"></script> | |
| </body> | |
| </html> | |
| ``` | |
| # `static/style.css` | |
| ```css | |
| body { | |
| background: #000; | |
| color: #0f0; | |
| font-family: 'Courier New', monospace; | |
| margin: 0; | |
| padding: 20px; | |
| } | |
| .terminal { | |
| max-width: 900px; | |
| margin: auto; | |
| border: 2px solid #0f0; | |
| padding: 10px; | |
| height: 80vh; | |
| display: flex; | |
| flex-direction: column; | |
| } | |
| .header { | |
| border-bottom: 1px solid #0f0; | |
| padding-bottom: 5px; | |
| margin-bottom: 10px; | |
| text-align: center; | |
| font-weight: bold; | |
| } | |
| .output { | |
| flex: 1; | |
| overflow-y: auto; | |
| white-space: pre-wrap; | |
| margin-bottom: 10px; | |
| } | |
| .input-line { | |
| display: flex; | |
| border-top: 1px solid #0f0; | |
| padding-top: 5px; | |
| } | |
| .prompt { | |
| margin-right: 5px; | |
| } | |
| #input { | |
| background: #000; | |
| border: none; | |
| color: #0f0; | |
| font-family: 'Courier New', monospace; | |
| font-size: 1em; | |
| flex: 1; | |
| outline: none; | |
| } | |
| ``` | |
| # `static/script.js` | |
| ```javascript | |
| const input = document.getElementById('input'); | |
| const output = document.getElementById('output'); | |
| input.addEventListener('keydown', async (e) => { | |
| if (e.key === 'Enter') { | |
| const cmd = input.value.trim(); | |
| input.value = ''; | |
| addLine(`$> ${cmd}`); | |
| await processCommand(cmd); | |
| } | |
| }); | |
| async function processCommand(cmd) { | |
| if (cmd === '/help') { | |
| addLine('Available commands:'); | |
| addLine(' /chat <question> – ask the model'); | |
| addLine(' /ls [path] – list files'); | |
| addLine(' /cron – show cron jobs'); | |
| addLine(' /history – show chat history'); | |
| addLine(' /clear – clear screen'); | |
| return; | |
| } | |
| if (cmd === '/clear') { | |
| output.innerHTML = ''; | |
| return; | |
| } | |
| if (cmd.startsWith('/chat ')) { | |
| const prompt = cmd.slice(6); | |
| addLine('... thinking ...'); | |
| try { | |
| const res = await fetch('/api/chat', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({prompt, use_rag: false}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.response); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd === '/history') { | |
| try { | |
| const res = await fetch('/api/history'); | |
| const history = await res.json(); | |
| history.forEach(item => { | |
| addLine(`[${item.timestamp}] Q: ${item.prompt}`); | |
| addLine(`A: ${item.response}`); | |
| }); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd.startsWith('/ls')) { | |
| const parts = cmd.split(' '); | |
| const path = parts[1] || '.'; | |
| try { | |
| const res = await fetch('/api/tool', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({tool: 'ls', args: {path}}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.output || data.error); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| if (cmd === '/cron') { | |
| try { | |
| const res = await fetch('/api/tool', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({tool: 'cron_list', args: {}}) | |
| }); | |
| const data = await res.json(); | |
| addLine(data.output || data.error); | |
| } catch (err) { | |
| addLine('Error: ' + err); | |
| } | |
| return; | |
| } | |
| addLine(`Unknown command: ${cmd}. Type /help.`); | |
| } | |
| function addLine(text) { | |
| const line = document.createElement('div'); | |
| line.textContent = text; | |
| output.appendChild(line); | |
| output.scrollTop = output.scrollHeight; | |
| } | |
| ``` | |
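As the command list grows, the `if` chain in `processCommand` gets unwieldy. One sketch of a table-driven dispatcher; the handler bodies here are placeholders rather than the real `fetch()` calls in `script.js`:

```javascript
// Map command names to handler functions; real handlers would call fetch()
// against /api/chat and /api/tool as script.js does above.
const commands = {
  ls:   (arg) => `list ${arg || '.'}`,
  chat: (arg) => `ask model: ${arg}`,
  cron: ()    => 'show cron jobs',
};

// Split "/ls /tmp" into { name: "ls", arg: "/tmp" }.
function parseCommand(line) {
  const m = line.match(/^\/(\w+)\s*(.*)$/);
  return m ? { name: m[1], arg: m[2] } : { name: null, arg: line };
}

function dispatch(line) {
  const { name, arg } = parseCommand(line);
  const handler = name && commands[name];
  return handler ? handler(arg) : `Unknown command: ${line}. Type /help.`;
}

console.log(dispatch('/ls /tmp'));
```

Wiring this into `script.js` means replacing each `if` block with an entry in `commands`.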
| You can add more pages (`static/pages/files.html`, `static/pages/cron.html`) and link them from the CLI using `/open files` commands, but for simplicity we'll keep the single‑page CLI. | |
| --- | |
| ## Step 13: Run the Server | |
| ```bash | |
| cd /home/kali/microclaw | |
| source venv/bin/activate | |
| python server.py | |
| ``` | |
| Open your browser to `http://localhost:8080` and start interacting. | |
| --- | |
| ## Troubleshooting | |
- Out of memory during training: reduce `max_seq_length`, the batch size, or use a smaller base model (e.g., `Qwen2.5-0.5B`).
- Slow inference: ensure llama.cpp was compiled with OpenBLAS; use fewer CPU threads if needed (`n_threads=4`).
- GGUF conversion errors: make sure you have a compatible `transformers` version and that the merged model was saved completely.
- Model file not found: check the path passed to the `-m` flag; use an absolute path.
- Port already in use: change the `--port` value (e.g., to 8001) and update your OpenClaw configuration.
- Server starts but responds slowly: this is normal on CPU. Try a smaller, more heavily quantized GGUF variant from the Hugging Face repo (e.g., `Q2_K` for 2-bit).
- `git lfs pull` fails or is slow: interrupted downloads resume if you run the command again.
- OpenClaw cannot connect: verify the server is running with `curl` (as in Step 4) and check firewall rules. If OpenClaw runs in a Docker container, make sure both are on the same network (`--network host` for the OpenClaw container is the simplest option).
- `pkg-config` is missing: CMake uses it to locate some libraries. Install it and rebuild:
| ```bash | |
| sudo apt update | |
| sudo apt install pkg-config | |
| cd /home/kali/llama.cpp | |
| rm -rf build # clean previous attempt | |
| cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS | |
| cmake --build build --config Release -j$(nproc) | |
| ``` | |
| This should now complete successfully. If you still encounter issues, you can temporarily disable BLAS to get a working build: | |
| ```bash | |
| cd /home/kali/llama.cpp | |
| rm -rf build | |
| cmake -B build | |
| cmake --build build --config Release -j$(nproc) | |
| ``` | |
After building, you'll have `build/bin/llama-server`, and the conversion script `convert_hf_to_gguf.py` sits in the main llama.cpp directory.
### The "illegal hardware instruction" error

This error means the PyTorch build you're using tries to execute CPU instructions (such as AVX2) that your processor does not support. It is common on older CPUs and in virtual machines. Diagnose and fix it as follows.
### 1. Check your CPU's instruction set
| Run this command to see what your CPU supports: | |
| ```bash | |
| lscpu | grep -E "Model name|Flags" | |
| ``` | |
| Look for flags like `avx`, `avx2`, `sse4_1`, etc. If you don't see `avx2`, that's the problem. | |
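The same check can be scripted. A small sketch that parses `/proc/cpuinfo` directly (Linux-only; it returns `False` anywhere the file is absent):

```python
# Returns True if the first "flags" line in /proc/cpuinfo lists the given flag.
def cpu_has_flag(flag: str, cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return flag in line.split()
    except OSError:
        pass
    return False

print("AVX2 supported:", cpu_has_flag("avx2"))
```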
### 2. Install a PyTorch version compatible with your CPU
The standard PyTorch wheels from the official site require AVX2. You have a few options:
#### Option A: Install PyTorch from conda-forge (recommended)
| Conda-forge often provides more compatible builds, including for older CPUs. | |
| ```bash | |
| # Install Miniconda if you haven't | |
| wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh | |
| bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3 | |
| source ~/miniconda3/bin/activate | |
| # Create a new environment with Python 3.10 | |
| conda create -y -n microclaw python=3.10 | |
| conda activate microclaw | |
| # Install PyTorch CPU-only from conda-forge | |
| conda install -y pytorch cpuonly -c pytorch # but this might also require AVX2 | |
| # Better: use conda-forge | |
| conda install -y pytorch cpuonly -c conda-forge | |
| ``` | |
| If that still fails, we can try building PyTorch from source with older instruction sets, but that's complex. | |
#### Option B: Use PyTorch wheels without AVX requirements
| There are community builds that target older CPUs. For example, the `manylinux2014` wheels might work. Try: | |
| ```bash | |
| pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu --no-deps | |
| ``` | |
| But the official wheels likely require AVX2. You could try an older PyTorch version (e.g., 1.13) which may have broader support. | |
#### Option C: Use llama.cpp for training as well
Since llama.cpp is plain C/C++ and can be compiled for virtually any CPU, you could use it for fine-tuning too. As noted earlier, the `finetune` tool wasn't present in the build, but you can build it with the right flags.
| First, ensure you have the latest llama.cpp with the finetune example: | |
| ```bash | |
| cd ~/llama.cpp | |
| git pull origin master | |
| ``` | |
| Now build the finetune tool: | |
| ```bash | |
| mkdir -p build && cd build | |
| cmake .. -DLLAMA_FINETUNE=ON -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" | |
| make -j$(nproc) finetune | |
| ``` | |
| After that, the `finetune` binary should appear in `build/bin/`. Then you can train using the command we discussed earlier. | |
### 3. Will a smaller model help?
No. TinyLlama is already small, and the error occurs at the PyTorch level, not in the model, so swapping models won't help; you need a compatible PyTorch build.
### 4. Verify your Python environment
The virtual environment may be using a system Python with a broken PyTorch install. Create a fresh venv with Python 3.10 and reinstall all packages:
| ```bash | |
| cd ~/microclaw | |
| deactivate | |
| rm -rf venv | |
| python3.10 -m venv venv | |
| source venv/bin/activate | |
| pip install --upgrade pip | |
| pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu | |
| pip install -r requirements.txt | |
| ``` | |
| Then run `python train.py` again. | |
## License
Apache 2.0 License.