How to use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="clemsail/micro-kiki-v3",
	filename="micro-kiki-v3-Q4_K_M.gguf",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
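create_chat_completion returns an OpenAI-style completion dict, so the generated text from the snippet above can be read back like this:

# Capture the return value of the call above and print only the assistant's reply
response = llm.create_chat_completion(
	messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])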

micro-kiki

35-domain expert model built on Qwen3.5-35B-A3B (MoE, 256 experts, 3B active/token) with LoRA adapters and a cognitive layer (memory palace + negotiator + anti-bias).

Model Description

micro-kiki is a multi-domain language model designed for technical applications spanning electronics, firmware, CAD, manufacturing, and general-purpose conversation. It uses a router-based architecture that selects up to 4 domain-specific LoRA stacks per request.
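As a purely illustrative sketch of what top-4 domain routing can look like (the classifier, scores, and helper below are invented for the example and are not the actual router implementation):

# Hypothetical top-4 routing over per-domain LoRA stacks (illustrative only).
MAX_ACTIVE_STACKS = 4  # the card states at most 4 stacks are active per request

def route(prompt, score_domains):
    """Return the highest-scoring domain names for a prompt."""
    scores = score_domains(prompt)  # e.g. {"python": 0.61, "spice": 0.22, ...}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:MAX_ACTIVE_STACKS]

# Dummy classifier standing in for the real 35-class router
dummy_classifier = lambda p: {"python": 0.6, "electronics": 0.2, "chat-fr": 0.1,
                              "spice": 0.06, "math": 0.04}
print(route("Write a Python parser for a SPICE netlist", dummy_classifier))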

| Property | Value |
|---|---|
| Base model | Qwen3.5-35B-A3B |
| Architecture | MoE (256 experts, 3B active/token) |
| Adapter | LoRA rank 16 (q/k/v/o projections) |
| Domains | 35 |
| Max active stacks | 4 |
| Context length | 262,144 tokens |
| Quantization | Q4_K_M (inference), BF16 (training) |
| License | Apache 2.0 |

Architecture

                         +-------------------+
                         |   Domain Router   |
                         | (classifier, top4)|
                         +--------+----------+
                                  |
              +----------+--------+--------+----------+
              |          |                 |          |
         +----v----+ +---v---+       +----v----+ +---v---+
         | Stack 1 | |Stack 2|  ...  |Stack 34 | |Stack35|
         | chat-fr | |python |       |ml-train | |securi.|
         +---------+ +-------+       +---------+ +-------+
              |          |                 |          |
              +----------+--------+--------+----------+
                                  |
                         +--------v----------+
                         |    Negotiator     |
                         | CAMP + Catfish    |
                         +--------+----------+
                                  |
                         +--------v----------+
                         |    Anti-Bias      |
                         | KnowBias + RBD    |
                         +--------+----------+
                                  |
                         +--------v----------+
                         |   Aeon Memory     |
                         | Atlas + Trace     |
                         +-------------------+
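The diagram can be read as a sequential post-processing chain. A minimal stand-in in code, assuming invented interfaces for the Negotiator, Anti-Bias, and Aeon components (their real APIs are not published here):

# Stand-in for the router -> stacks -> negotiator -> anti-bias -> memory flow.
# Every function below is a placeholder, not the real implementation.
def negotiate(candidates):
    # pick one answer from the active stacks (placeholder policy)
    return max(candidates, key=len)

def debias(text):
    # placeholder for KnowBias + RBD checks
    return text

memory_trace = []  # placeholder for Aeon Atlas/Trace storage

def answer(stack_outputs):
    result = debias(negotiate(stack_outputs))
    memory_trace.append(result)  # trace stored for later recall
    return result

print(answer(["Paris.", "The capital of France is Paris."]))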

Intended Use

  • French/English conversational AI with domain expertise
  • Code generation (Python, C/C++, Rust, TypeScript, embedded firmware)
  • Electronics design (KiCad DSL, schematic review, component selection, SPICE)
  • Manufacturing (process optimization, quality control)
  • Multi-domain routing with cognitive arbitration

Limitations

  • Not designed for medical, legal, or financial advice
  • Optimized for technical domains; general knowledge may be weaker than base model
  • Requires Q4_K_M or higher quantization; quality degrades below Q4
  • Maximum 4 concurrent LoRA stacks; performance varies with stack combinations
  • Memory (Aeon) requires external backends (Qdrant/Neo4j) for production use
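Regarding the last limitation, a production memory backend is an external vector store. A minimal Qdrant setup with the qdrant-client package might look like the following (the collection name and vector size are placeholders, not Aeon's actual schema):

# Placeholder Qdrant collection for an external memory backend.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")  # or QdrantClient(":memory:") for tests
client.create_collection(
    collection_name="aeon_memory",                  # placeholder name
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)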

Training Data — V3 (489K examples, 35 domains)

Sources

| Source | Examples | Description |
|---|---|---|
| Claude CLI sessions | 50,116 | Real user-tool interactions extracted from 5 machines (GrosMac, kxkm-ai, Studio, Tower, CILS) |
| Codex/Copilot sessions | 2,529 | OpenAI Codex + GitHub Copilot sessions extracted from 4 machines |
| HuggingFace datasets | 364,045 | 19 open datasets (see below) |
| Opus teacher distillation | | chat-fr, reasoning domains |
| Original curated | | 32 domain seed datasets |

HuggingFace Datasets

| Dataset | Examples | License |
|---|---|---|
| CodeFeedback-Filtered-Instruction | 157,000 | Apache 2.0 |
| French-Alpaca-Instruct-110K | 110,000 | Apache 2.0 |
| Electronics StackExchange | 95,000 | CC-BY-SA-3.0 |
| CJJones/LLM_EE_Educational_Synthetic_Dialog | 50,000 | CC-BY-NC-SA-4.0 |
| MuratKomurcu/stm32-hal-dataset | 29,700 | MIT |
| redcathode/thingiverse-openscad | 7,400 | |
| ThomasTheMaker/OpenSCAD | 4,900 | |
| STEM-AI-mtl/Electrical-engineering | 1,100 | |
| JITX open-components-database | 151 | |
| Vrindarani/netlistgen | 106 | |

35 Domains

| Group | Domains |
|---|---|
| Conversation | chat-fr, reasoning |
| Code | python, typescript, cpp, rust, html-css, shell, sql, yaml-json, lua-upy |
| Infrastructure | docker, devops, llm-orch, llm-ops (NEW), ml-training (NEW) |
| Electronics | kicad-dsl, kicad-pcb, spice, electronics, components (NEW), power, emc, dsp |
| Hardware | embedded, stm32, iot, platformio |
| CAD | freecad |
| Web | web-frontend, web-backend |
| Other | music-audio, math, security |

Changes from V2: 3 new domains (components, llm-ops, ml-training). spice-sim merged into spice. stm32 is a sub-category of embedded.

New Domain: components

57K Q&A pairs covering electronic component specs, datasheets, sourcing, BOMs, and cross-referencing. Sources: Electronics StackExchange (filtered by component tags) + JITX open-components-database.
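As a rough illustration of that tag-based filtering (the field names and tag list are hypothetical; the real extraction pipeline is not published):

# Hypothetical tag filter over Stack Exchange records; fields are invented.
COMPONENT_TAGS = {"components", "datasheet", "bom", "part-identification", "sourcing"}

def is_component_question(record):
    """Keep Q&A records whose tags overlap the component-related tag set."""
    return bool(COMPONENT_TAGS & set(record.get("tags", [])))

examples = [
    {"title": "How do I read an op-amp datasheet?", "tags": ["datasheet", "op-amp"]},
    {"title": "Why is my LED dim?", "tags": ["led", "resistors"]},
]
print([r["title"] for r in examples if is_component_question(r)])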

Training — V3

| Property | Value |
|---|---|
| Base model | Qwen3.5-4B |
| Adapter | MoE-LoRA: 4 experts/projection, rank 16, top-2 routing |
| Null-space projection | ENABLED (prevents catastrophic forgetting between stacks) |
| Curriculum | Sequential, 35 stacks trained in order |
| Platform (MLX) | Mac Studio M3 Ultra 512 GB |
| Platform (CUDA) | kxkm-ai RTX 4090 24 GB |
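For reference, a plain PEFT LoraConfig matching the rank and target projections listed above would look like this. Note that the MoE-LoRA expert routing (4 experts, top-2) and the null-space projection are custom mechanisms a standard config does not express, and the alpha/dropout values below are assumptions, not values stated in this card:

from peft import LoraConfig

# Plain-LoRA reference config; does NOT capture MoE-LoRA routing or null-space projection.
lora_config = LoraConfig(
    r=16,                                                     # rank from the table above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # q/k/v/o projections
    lora_alpha=32,                                            # assumed, not stated in the card
    lora_dropout=0.05,                                        # assumed, not stated in the card
    task_type="CAUSAL_LM",
)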

Evaluation

| Metric | Value |
|---|---|
| Router accuracy (35-class) | [PENDING] |
| Forgetting check (angle) | [PENDING] |
| Perplexity (base) | [PENDING] |
| Perplexity (debiased) | [PENDING] |
| Aeon recall@1 | [PENDING] |
| Aeon recall@5 | [PENDING] |
| Aeon recall@10 | [PENDING] |
| Anti-bias flag rate | [PENDING] |
| Average inference latency | [PENDING] |

Hardware Requirements

| Setup | RAM/VRAM | Use |
|---|---|---|
| Mac Studio M3 Ultra | 512 GB unified | Training (BF16 LoRA) + serving (MLX) |
| RTX 4090 | 24 GB VRAM | Q4 inference (vLLM) |
| Apple Silicon 32 GB+ | 32 GB unified | Q4_K_M inference (MLX/llama.cpp) |
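On memory-constrained setups it can help to cap the context window and offload layers explicitly when loading the Q4_K_M file. A llama-cpp-python example (the values are illustrative, not tuned recommendations):

from llama_cpp import Llama

# Illustrative loading parameters for memory-constrained hardware.
llm = Llama.from_pretrained(
	repo_id="clemsail/micro-kiki-v3",
	filename="micro-kiki-v3-Q4_K_M.gguf",
	n_ctx=8192,        # well below the 262,144-token maximum, to reduce memory use
	n_gpu_layers=-1,   # offload all layers to GPU/Metal when available
)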

Citation

@misc{micro-kiki-2026,
  title={micro-kiki: Multi-Domain Expert Model with Cognitive Layer},
  author={L'Electron Rare},
  year={2026},
  url={https://huggingface.co/electron-rare/micro-kiki}
}

Related Projects & Ecosystem

micro-kiki-v3 is one component of the FineFab platform built by L'Électron Rare — a local-first, multi-machine AI-native manufacturing and electronics platform.

| Role | Project | Description |
|---|---|---|
| Training toolkit | L-electron-Rare/KIKI-Mac_tunner | MLX fine-tuning toolkit (Mac Studio) — Opus reasoning distilled into Mistral Large 123B |
| Fine-tuning pipeline | L-electron-Rare/KIKI-models-tuning | FineFab fine-tuning pipeline — training, evaluation, registry (Unsloth, LoRA) |
| Methodology | electron-rare/Kill_LIFE | Spec-first agentic methodology for embedded systems — BMAD agents, gates, evidence packs |
| Orchestration | electron-rare/mascarade | Multi-machine agentic LLM orchestration — P2P mesh, 8 providers, RAG pipeline |
| AI backend | L-electron-Rare/life-core | FineFab AI backend — LLM router, RAG, caching, orchestration |
| CAD assistant | electron-rare/KiC-AI | AI-powered PCB design assistant for KiCad |

See the full org at github.com/L-electron-Rare — 13 public repos covering platform, hardware, firmware, CAD, and ML.

Infrastructure: the 50K+ Claude CLI examples in the training dataset were captured on our 5-node P2P mesh — GrosMac (Apple M5), Tower (28 threads), CILS (i7), KXKM-AI (RTX 4090), VM bootstrap. The mesh uses Ed25519 authentication and DHT discovery.

🇪🇺 EU AI Act transparency

This adapter is provided as a fine-tuned LoRA under the AI Act framework (Regulation EU 2024/1689). Compliance metadata:

| Field | Value |
|---|---|
| Provider | L'Électron Rare (clemsail / electron-rare) |
| Role under AI Act | GPAI provider for this adapter |
| Base model | Qwen/Qwen3.5-35B-A3B — see upstream provenance |
| Adapter type | LoRA / PEFT — adapter weights only; base unchanged |
| Training data origin | L'Électron Rare proprietary technical corpus + curated public docs |
| License | Apache-2.0 (adapter). Upstream base licence applies separately. |
| Intended use | Multi-domain technical assistance — engineering, KiCad, embedded, code, FR/EN chat |
| Out of scope | Healthcare diagnosis, legal advice, autonomous safety-critical decisions, generation of malicious code |
| Risk classification | Limited risk — Article 50 transparency obligations apply |
| Copyright respect | Training data does not include scraped copyrighted material. Opt-out signals (robots.txt, ai.txt) are honoured for web-sourced data. |
| Full provenance | https://github.com/L-electron-Rare/eu-kiki/tree/main/docs/provenance |
| Contact | postmaster@saillant.cc — biased output reports, copyright concerns, etc. |

⚠️ You are using an AI model. Outputs may be inaccurate, biased or fabricated. Do not act on them without independent verification, especially in regulated domains.
