Instructions to use clemsail/micro-kiki-v3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use clemsail/micro-kiki-v3 with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="clemsail/micro-kiki-v3",
    filename="micro-kiki-v3-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use clemsail/micro-kiki-v3 with llama.cpp:
Install from brew

```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M
```

Install from WinGet (Windows)

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M
```

Use pre-built binary

```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M
```

Build from source code

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M
```

Use Docker

```shell
docker model run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```
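Once `llama-server` is running, its OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080` (the address used in the Pi configuration on this page) and serves the `/v1/chat/completions` route; `build_chat_request` and `ask` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is reachable locally on port 8080.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, model="clemsail/micro-kiki-v3:Q4_K_M"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask("What is the capital of France?"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.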
- LM Studio
- Jan
- vLLM
How to use clemsail/micro-kiki-v3 with vLLM:
Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "clemsail/micro-kiki-v3"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "clemsail/micro-kiki-v3",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```
- Ollama
How to use clemsail/micro-kiki-v3 with Ollama:
```shell
ollama run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```
- Unsloth Studio
How to use clemsail/micro-kiki-v3 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)

```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for clemsail/micro-kiki-v3 to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for clemsail/micro-kiki-v3 to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for clemsail/micro-kiki-v3 to start chatting.
- Pi
How to use clemsail/micro-kiki-v3 with Pi:
Start the llama.cpp server

```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
```

Configure the model in Pi

```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "clemsail/micro-kiki-v3:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi

```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent new
How to use clemsail/micro-kiki-v3 with Hermes Agent:
Start the llama.cpp server

```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
```

Configure Hermes

```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default clemsail/micro-kiki-v3:Q4_K_M
```

Run Hermes

```shell
hermes
```
- Docker Model Runner
How to use clemsail/micro-kiki-v3 with Docker Model Runner:
```shell
docker model run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```
- Lemonade
How to use clemsail/micro-kiki-v3 with Lemonade:
Pull the model

```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull clemsail/micro-kiki-v3:Q4_K_M
```

Run and chat with the model

```shell
lemonade run user.micro-kiki-v3-Q4_K_M
```

List all available models

```shell
lemonade list
```
---
license: apache-2.0
language:
- fr
- en
tags:
- moe
- lora
- multi-domain
- embedded-systems
- cognitive
base_model: Qwen/Qwen3.5-35B-A3B
pipeline_tag: text-generation
---

# micro-kiki

**35-domain expert model** built on Qwen3.5-35B-A3B (MoE, 256 experts, 3B active/token) with LoRA adapters and a cognitive layer (memory palace + negotiator + anti-bias).

## Model Description

micro-kiki is a multi-domain language model designed for technical applications spanning electronics, firmware, CAD, manufacturing, and general-purpose conversation. It uses a router-based architecture that selects up to 4 domain-specific LoRA stacks per request.

| Property | Value |
|----------|-------|
| Base model | Qwen3.5-35B-A3B |
| Architecture | MoE (256 experts, 3B active/token) |
| Adapter | LoRA rank 16 (q/k/v/o projections) |
| Domains | 35 |
| Max active stacks | 4 |
| Context length | 262,144 tokens |
| Quantization | Q4_K_M (inference), BF16 (training) |
| License | Apache 2.0 |
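
To make the "LoRA rank 16" row concrete, here is a minimal pure-Python sketch of a generic LoRA-adapted projection, `y = W x + (alpha / r) * B (A x)`. It illustrates the standard technique, not this model's training code. The scaling factor `alpha` and the dimensions used in the parameter count are assumptions, since the card does not state them.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Frozen projection W plus a low-rank update.

    A has shape (r, d_in) and B has shape (d_out, r); r is the LoRA rank.
    alpha is a hypothetical scaling hyperparameter (not given in the card).
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, r=16):
    """Trainable parameters added to one projection by a rank-r adapter."""
    return r * (d_in + d_out)


# Toy example with rank 1 so the arithmetic is easy to follow.
W = [[1, 0], [0, 1]]
A = [[1, 1]]      # r = 1, d_in = 2
B = [[1], [2]]    # d_out = 2, r = 1
y = lora_forward(W, A, B, [3, 4], alpha=1)
```

In the toy example `A x = [7]`, so the update is `[7, 14]` and `y = [10, 18]`. For a hypothetical 1024 x 1024 projection, a rank-16 adapter adds `16 * (1024 + 1024) = 32768` trainable parameters.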

## Architecture

```
                +-------------------+
                |   Domain Router   |
                | (classifier, top4)|
                +--------+----------+
                         |
     +----------+--------+--------+----------+
     |          |                 |          |
+----v----+ +---v---+        +----v----+ +---v---+
| Stack 1 | |Stack 2|  ...   |Stack 34 | |Stack35|
| chat-fr | |python |        |ml-train | |securi.|
+---------+ +-------+        +---------+ +-------+
     |          |                 |          |
     +----------+--------+--------+----------+
                         |
                +--------v----------+
                |    Negotiator     |
                |  CAMP + Catfish   |
                +--------+----------+
                         |
                +--------v----------+
                |     Anti-Bias     |
                |  KnowBias + RBD   |
                +--------+----------+
                         |
                +--------v----------+
                |    Aeon Memory    |
                |   Atlas + Trace   |
                +-------------------+
```
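
The routing step at the top of the diagram can be sketched as a top-k selection over classifier scores. This is an illustrative sketch only: the card says the router is a classifier that activates at most 4 stacks, but the softmax normalisation, the `min_prob` cut-off, and the function names are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert a {domain: logit} mapping into probabilities."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def select_stacks(logits, max_active=4, min_prob=0.05):
    """Return up to `max_active` (domain, probability) pairs, best first.

    min_prob is a hypothetical threshold that drops weakly matched domains,
    so fewer than `max_active` stacks may be selected.
    """
    probs = softmax(logits)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, p) for name, p in ranked[:max_active] if p >= min_prob]


# Hypothetical classifier scores for a firmware question.
scores = {
    "python": 3.0, "stm32": 2.5, "embedded": 2.0,
    "kicad-dsl": 1.0, "chat-fr": -2.0, "sql": -3.0,
}
active = [name for name, _ in select_stacks(scores)]
```

With these scores the four highest-probability domains all clear the threshold, so four stacks are activated; a sharply peaked distribution would activate fewer.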

## Intended Use

- **French/English conversational AI** with domain expertise
- **Code generation** (Python, C/C++, Rust, TypeScript, embedded firmware)
- **Electronics design** (KiCad DSL, schematic review, component selection, SPICE)
- **Manufacturing** (process optimization, quality control)
- **Multi-domain routing** with cognitive arbitration

## Limitations

- Not designed for medical, legal, or financial advice
- Optimized for technical domains; general knowledge may be weaker than in the base model
- Use Q4_K_M or a higher-precision quantization; quality degrades below Q4
- At most 4 LoRA stacks are active at once; performance varies with the stack combination
- The Aeon memory layer requires external backends (Qdrant/Neo4j) for production use

## Training Data: V3 (489K examples, 35 domains)

### Sources

| Source | Examples | Description |
|--------|----------|-------------|
| Claude CLI sessions | 50,116 | Real user-tool interactions extracted from 5 machines (GrosMac, kxkm-ai, Studio, Tower, CILS) |
| Codex/Copilot sessions | 2,529 | OpenAI Codex + GitHub Copilot sessions extracted from 4 machines |
| HuggingFace datasets | 364,045 | 19 open datasets (see below) |
| Opus teacher distillation | not reported | chat-fr, reasoning domains |
| Original curated | not reported | 32 domain seed datasets |

### HuggingFace Datasets

| Dataset | Examples | License |
|---------|----------|---------|
| CodeFeedback-Filtered-Instruction | 157,000 | Apache 2.0 |
| French-Alpaca-Instruct-110K | 110,000 | Apache 2.0 |
| Electronics StackExchange | 95,000 | CC-BY-SA-3.0 |
| CJJones/LLM_EE_Educational_Synthetic_Dialog | 50,000 | CC-BY-NC-SA-4.0 |
| MuratKomurcu/stm32-hal-dataset | 29,700 | MIT |
| redcathode/thingiverse-openscad | 7,400 | not specified |
| ThomasTheMaker/OpenSCAD | 4,900 | not specified |
| STEM-AI-mtl/Electrical-engineering | 1,100 | not specified |
| JITX open-components-database | 151 | not specified |
| Vrindarani/netlistgen | 106 | not specified |

### 35 Domains

| Group | Domains |
|-------|---------|
| Conversation | chat-fr, reasoning |
| Code | python, typescript, cpp, rust, html-css, shell, sql, yaml-json, lua-upy |
| Infrastructure | docker, devops, llm-orch, llm-ops (NEW), ml-training (NEW) |
| Electronics | kicad-dsl, kicad-pcb, spice, electronics, components (NEW), power, emc, dsp |
| Hardware | embedded, stm32, iot, platformio |
| CAD | freecad |
| Web | web-frontend, web-backend |
| Other | music-audio, math, security |

**Changes from V2:** 3 new domains (components, llm-ops, ml-training). `spice-sim` merged into `spice`. `stm32` is a sub-category of `embedded`.

### New Domain: components

57K Q&A pairs covering electronic component specs, datasheets, sourcing, BOM, and cross-referencing. Sources: Electronics StackExchange (filtered by component tags) and the JITX open-components-database.

## Training: V3

| Property | Value |
|----------|-------|
| Base model | Qwen3.5-4B |
| Adapter | MoE-LoRA: 4 experts/projection, rank 16, top-2 routing |
| Null-space projection | ENABLED (prevents catastrophic forgetting between stacks) |
| Curriculum | Sequential, 35 stacks trained in order |
| Platform (MLX) | Mac Studio M3 Ultra 512 GB |
| Platform (CUDA) | kxkm-ai RTX 4090 24 GB |
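
The card does not describe how the null-space projection is computed. One common reading, shown here purely as an assumption, is that each new stack's weight update is projected onto the orthogonal complement of the directions already used by earlier stacks, so that sequential training does not overwrite them. The sketch below demonstrates that idea on plain vectors; the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(update, protected):
    """Remove from `update` every component lying in span(protected)."""
    out = list(update)
    for b in orthonormalize(protected):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


# Directions assumed to matter to previously trained stacks.
protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
safe_update = project_to_null_space([2.0, 3.0, 4.0], protected)
```

Here the protected directions span the first two axes, so only the third component of the update survives: `[0.0, 0.0, 4.0]`. The projected update is orthogonal to every protected direction.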

## Evaluation

| Metric | Value |
|--------|-------|
| Router accuracy (35-class) | [PENDING] |
| Forgetting check (angle) | [PENDING] |
| Perplexity (base) | [PENDING] |
| Perplexity (debiased) | [PENDING] |
| Aeon recall@1 | [PENDING] |
| Aeon recall@5 | [PENDING] |
| Aeon recall@10 | [PENDING] |
| Anti-bias flag rate | [PENDING] |
| Average inference latency | [PENDING] |
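
For readers unfamiliar with the recall@k rows, the sketch below shows how such a metric is typically computed for a retrieval memory. It assumes each query has exactly one relevant memory entry and that the retriever returns a ranked list of entry IDs; the actual Aeon evaluation protocol is not described in the card and may differ.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant entry appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(queries, k):
    """Average recall@k over (ranked_ids, relevant_id) pairs."""
    if not queries:
        return 0.0
    return sum(recall_at_k(r, rel, k) for r, rel in queries) / len(queries)


# Hypothetical retrieval results: (ranked entry IDs, the correct entry ID).
queries = [
    (["a", "b", "c"], "a"),  # hit at rank 1
    (["x", "y", "z"], "z"),  # hit at rank 3
    (["p", "q", "r"], "s"),  # miss
]
r1 = mean_recall_at_k(queries, 1)
r3 = mean_recall_at_k(queries, 3)
```

On this toy data one of three queries is answered at rank 1 and two of three within the top 3.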

## Hardware Requirements

| Setup | RAM/VRAM | Use |
|-------|----------|-----|
| Mac Studio M3 Ultra | 512 GB unified | Training (BF16 LoRA) + serving (MLX) |
| RTX 4090 | 24 GB VRAM | Q4 inference (vLLM) |
| Apple Silicon 32 GB+ | 32 GB unified | Q4_K_M inference (MLX/llama.cpp) |
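
A back-of-the-envelope estimate helps relate these memory figures to the model size. The sketch below assumes roughly 35 billion total parameters (taken from the "35B" in the base model name) and an average of about 4.85 bits per weight for Q4_K_M, which is an approximation rather than a figure from this card. It counts weights only and ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # assumption: read off the "35B" in the model name
Q4_K_M_BPW = 4.85     # assumption: approximate average bits per weight
BF16_BPW = 16

q4_gb = weight_memory_gb(TOTAL_PARAMS, Q4_K_M_BPW)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, BF16_BPW)
print(f"Q4_K_M weights: ~{q4_gb:.1f} GB, BF16 weights: ~{bf16_gb:.1f} GB")
```

Under these assumptions the quantized weights come to about 21 GB and the BF16 weights to about 70 GB, which fits the split above: quantized inference on 24 to 32 GB devices, BF16 training on the 512 GB machine.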

## Citation

```bibtex
@misc{micro-kiki-2026,
  title={micro-kiki: Multi-Domain Expert Model with Cognitive Layer},
  author={L'Electron Rare},
  year={2026},
  url={https://huggingface.co/electron-rare/micro-kiki}
}
```

## Related Projects & Ecosystem

`micro-kiki-v3` is one component of the **FineFab** platform built by **[L'Électron Rare](https://github.com/L-electron-Rare)**, a local-first, multi-machine, AI-native manufacturing and electronics platform.

| Role | Project | Description |
|---|---|---|
| Training toolkit | [L-electron-Rare/KIKI-Mac_tunner](https://github.com/L-electron-Rare/KIKI-Mac_tunner) | MLX fine-tuning toolkit (Mac Studio): Opus reasoning distilled into Mistral Large 123B |
| Fine-tuning pipeline | [L-electron-Rare/KIKI-models-tuning](https://github.com/L-electron-Rare/KIKI-models-tuning) | FineFab fine-tuning pipeline: training, evaluation, registry (Unsloth, LoRA) |
| Methodology | [electron-rare/Kill_LIFE](https://github.com/electron-rare/Kill_LIFE) | Spec-first agentic methodology for embedded systems: BMAD agents, gates, evidence packs |
| Orchestration | [electron-rare/mascarade](https://github.com/electron-rare/mascarade) | Multi-machine agentic LLM orchestration: P2P mesh, 8 providers, RAG pipeline |
| AI backend | [L-electron-Rare/life-core](https://github.com/L-electron-Rare/life-core) | FineFab AI backend: LLM router, RAG, caching, orchestration |
| CAD assistant | [electron-rare/KiC-AI](https://github.com/electron-rare/KiC-AI) | AI-powered PCB design assistant for KiCad |

See the full org at **[github.com/L-electron-Rare](https://github.com/L-electron-Rare)**: 13 public repos covering platform, hardware, firmware, CAD, and ML.

**Infrastructure**: the 50K+ Claude CLI examples in the training dataset were captured on our 5-node P2P mesh: GrosMac (Apple M5), Tower (28 threads), CILS (i7), KXKM-AI (RTX 4090), and a VM bootstrap node. The mesh uses Ed25519 authentication and DHT discovery.

## 🇪🇺 EU AI Act transparency

This adapter is provided as a fine-tuned LoRA under the AI Act framework (Regulation (EU) 2024/1689). Compliance metadata:

| Field | Value |
|---|---|
| Provider | L'Électron Rare (clemsail / electron-rare) |
| Role under AI Act | GPAI provider for this adapter |
| Base model | `Qwen/Qwen3.5-35B-A3B` (see upstream provenance) |
| Adapter type | LoRA / PEFT (adapter weights only; base unchanged) |
| Training data origin | L'Électron Rare proprietary technical corpus + curated public docs |
| License | Apache-2.0 (adapter). The upstream base license applies separately. |
| Intended use | Multi-domain technical assistance: engineering, KiCad, embedded, code, FR/EN chat |
| Out of scope | Healthcare diagnosis, legal advice, autonomous safety-critical decisions, generation of malicious code |
| Risk classification | Limited risk; Article 50 transparency obligations apply |
| Copyright respect | Training data does not include scraped copyrighted material. Opt-out signals (robots.txt, ai.txt) are honoured for web-sourced data. |
| Full provenance | https://github.com/L-electron-Rare/eu-kiki/tree/main/docs/provenance |
| Contact | postmaster@saillant.cc (biased output reports, copyright concerns, etc.) |

⚠️ **You are using an AI model.** Outputs may be inaccurate, biased or fabricated. Do not act on them without independent verification, especially in regulated domains.