Instructions for using clemsail/micro-kiki-v3 with libraries, inference providers, and local apps.

Libraries

How to use clemsail/micro-kiki-v3 with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="clemsail/micro-kiki-v3",
	filename="micro-kiki-v3-Q4_K_M.gguf",
)

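response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result is an OpenAI-style dict; print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])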

Local Apps

llama.cpp

How to use clemsail/micro-kiki-v3 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf clemsail/micro-kiki-v3:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf clemsail/micro-kiki-v3:Q4_K_M

Use Docker

docker model run hf.co/clemsail/micro-kiki-v3:Q4_K_M

vLLM

How to use clemsail/micro-kiki-v3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "clemsail/micro-kiki-v3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "clemsail/micro-kiki-v3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
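
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `http://localhost:8000/v1`, and the helper names `build_chat_request` and `chat` are illustrative rather than part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000/v1", "clemsail/micro-kiki-v3", "What is the capital of France?")` should return the reply as a string. Pointing `base_url` at `http://localhost:8080/v1` would target a local `llama-server` instead, assuming it is listening on that port.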

Ollama
How to use clemsail/micro-kiki-v3 with Ollama:
```
ollama run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```

Unsloth Studio

How to use clemsail/micro-kiki-v3 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for clemsail/micro-kiki-v3 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for clemsail/micro-kiki-v3 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for clemsail/micro-kiki-v3 to start chatting

Pi

How to use clemsail/micro-kiki-v3 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "clemsail/micro-kiki-v3:Q4_K_M"
        }
      ]
    }
  }
}
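
If `models.json` already contains other providers, the entry can be merged in rather than pasted over the file. The following is a sketch under that assumption; the helper `add_llama_cpp_provider` is hypothetical and only reproduces the keys shown in the JSON above.

```python
import json
from pathlib import Path


def add_llama_cpp_provider(config_path: Path, model_id: str,
                           base_url: str = "http://localhost:8080/v1") -> dict:
    """Merge a llama-cpp provider entry into a models.json file, keeping other providers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    providers = config.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {})
    provider.update({"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"})
    models = provider.setdefault("models", [])
    # Avoid duplicate entries when the helper is run more than once.
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json", "clemsail/micro-kiki-v3:Q4_K_M")` writes the configuration shown above.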

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use clemsail/micro-kiki-v3 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf clemsail/micro-kiki-v3:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default clemsail/micro-kiki-v3:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use clemsail/micro-kiki-v3 with Docker Model Runner:
```
docker model run hf.co/clemsail/micro-kiki-v3:Q4_K_M
```

Lemonade

How to use clemsail/micro-kiki-v3 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull clemsail/micro-kiki-v3:Q4_K_M

Run and chat with the model

lemonade run user.micro-kiki-v3-Q4_K_M

List all available models

lemonade list

micro-kiki-v3 / README.md

clemsail

docs: append EU AI Act §50 transparency section

c4e2b4a verified about 1 month ago

	---
	license: apache-2.0
	language:
	- fr
	- en
	tags:
	- moe
	- lora
	- multi-domain
	- embedded-systems
	- cognitive
	base_model: Qwen/Qwen3.5-35B-A3B
	pipeline_tag: text-generation
	---

	# micro-kiki

	35-domain expert model built on Qwen3.5-35B-A3B (MoE, 256 experts, 3B active/token) with LoRA adapters and a cognitive layer (memory palace + negotiator + anti-bias).

	## Model Description

	micro-kiki is a multi-domain language model designed for technical applications spanning electronics, firmware, CAD, manufacturing, and general-purpose conversation. It uses a router-based architecture that selects up to 4 domain-specific LoRA stacks per request.

	| Property | Value |
	|----------|-------|
	| Base model | Qwen3.5-35B-A3B |
	| Architecture | MoE (256 experts, 3B active/token) |
	| Adapter | LoRA rank 16 (q/k/v/o projections) |
	| Domains | 35 |
	| Max active stacks | 4 |
	| Context length | 262,144 tokens |
	| Quantization | Q4_K_M (inference), BF16 (training) |
	| License | Apache 2.0 |
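
The "up to 4 stacks per request" selection can be illustrated with a small top-k sketch. This is not the project's router: the function name `select_stacks`, the softmax scoring, and the `min_weight` cutoff are all assumptions made for illustration.

```python
import math


def select_stacks(scores: dict[str, float], max_active: int = 4,
                  min_weight: float = 0.05) -> list[tuple[str, float]]:
    """Softmax the router scores, keep at most `max_active` domains, renormalise."""
    peak = max(scores.values())
    exp = {name: math.exp(score - peak) for name, score in scores.items()}
    total = sum(exp.values())
    probs = {name: value / total for name, value in exp.items()}
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:max_active]
    # Drop weak domains (hypothetical cutoff), but always keep the best one.
    kept = [(name, p) for name, p in ranked if p >= min_weight] or ranked[:1]
    norm = sum(p for _, p in kept)
    return [(name, p / norm) for name, p in kept]
```

A prompt that clearly belongs to one domain activates a single stack, while an ambiguous prompt can activate up to four, with weights that sum to 1.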

	## Architecture

	```
	                +-------------------+
	                |   Domain Router   |
	                | (classifier, top4)|
	                +--------+----------+
	                         |
	     +----------+--------+--------+----------+
	     |          |                 |          |
	+----v----+ +---v---+        +----v----+ +---v---+
	| Stack 1 | |Stack 2|  ...   |Stack 34 | |Stack35|
	| chat-fr | |python |        |ml-train | |securi.|
	+---------+ +-------+        +---------+ +-------+
	     |          |                 |          |
	     +----------+--------+--------+----------+
	                         |
	                +--------v----------+
	                |    Negotiator     |
	                |  CAMP + Catfish   |
	                +--------+----------+
	                         |
	                +--------v----------+
	                |     Anti-Bias     |
	                |  KnowBias + RBD   |
	                +--------+----------+
	                         |
	                +--------v----------+
	                |    Aeon Memory    |
	                |   Atlas + Trace   |
	                +-------------------+
	```
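
The stage ordering in the diagram can be read as a simple sequential pipeline. The sketch below only mirrors that ordering; every stage body is a placeholder and none of the names (`Turn`, `PIPELINE`, `run`) come from the project.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    prompt: str
    drafts: dict[str, str] = field(default_factory=dict)  # one draft per active stack
    answer: str = ""
    visited: list[str] = field(default_factory=list)      # stage names, in call order


Stage = Callable[[Turn], Turn]


def router(turn: Turn) -> Turn:
    # Placeholder: a real router would classify the prompt and pick up to 4 stacks.
    turn.drafts = {"kicad-dsl": "", "spice": ""}
    return turn


def stacks(turn: Turn) -> Turn:
    # Placeholder: each selected stack would generate its own draft.
    turn.drafts = {name: f"[{name}] draft for: {turn.prompt}" for name in turn.drafts}
    return turn


def negotiator(turn: Turn) -> Turn:
    # Placeholder arbitration: keep the first draft.
    turn.answer = next(iter(turn.drafts.values()))
    return turn


def anti_bias(turn: Turn) -> Turn:
    return turn  # Placeholder: no filtering is applied here.


def memory(turn: Turn) -> Turn:
    return turn  # Placeholder: nothing is persisted here.


PIPELINE: list[Stage] = [router, stacks, negotiator, anti_bias, memory]


def run(prompt: str) -> Turn:
    """Pass one request through the stages in the order shown in the diagram."""
    turn = Turn(prompt)
    for stage in PIPELINE:
        turn = stage(turn)
        turn.visited.append(stage.__name__)
    return turn
```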

	## Intended Use

	- French/English conversational AI with domain expertise
	- Code generation (Python, C/C++, Rust, TypeScript, embedded firmware)
	- Electronics design (KiCad DSL, schematic review, component selection, SPICE)
	- Manufacturing (process optimization, quality control)
	- Multi-domain routing with cognitive arbitration

	## Limitations

	- Not designed for medical, legal, or financial advice
	- Optimized for technical domains; general knowledge may be weaker than the base model
	- Requires Q4_K_M or higher quantization; quality degrades below Q4
	- Maximum 4 concurrent LoRA stacks; performance varies with stack combinations
	- Memory (Aeon) requires external backends (Qdrant/Neo4j) for production use

	## Training Data — V3 (489K examples, 35 domains)

	### Sources

	| Source | Examples | Description |
	|--------|----------|-------------|
	| Claude CLI sessions | 50,116 | Real user-tool interactions extracted from 5 machines (GrosMac, kxkm-ai, Studio, Tower, CILS) |
	| Codex/Copilot sessions | 2,529 | OpenAI Codex + GitHub Copilot sessions extracted from 4 machines |
	| HuggingFace datasets | 364,045 | 19 open datasets (see below) |
	| Opus teacher distillation | - | chat-fr, reasoning domains |
	| Original curated | - | 32 domain seed datasets |

	### HuggingFace Datasets

	| Dataset | Examples | License |
	|---------|----------|---------|
	| CodeFeedback-Filtered-Instruction | 157,000 | Apache 2.0 |
	| French-Alpaca-Instruct-110K | 110,000 | Apache 2.0 |
	| Electronics StackExchange | 95,000 | CC-BY-SA-3.0 |
	| CJJones/LLM_EE_Educational_Synthetic_Dialog | 50,000 | CC-BY-NC-SA-4.0 |
	| MuratKomurcu/stm32-hal-dataset | 29,700 | MIT |
	| redcathode/thingiverse-openscad | 7,400 | - |
	| ThomasTheMaker/OpenSCAD | 4,900 | - |
	| STEM-AI-mtl/Electrical-engineering | 1,100 | - |
	| JITX open-components-database | 151 | - |
	| Vrindarani/netlistgen | 106 | - |
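
For license review, the listed rows can be tallied per license. The sketch below copies the counts from the table above and groups rows with no stated license under `"unspecified"`; the helper name `examples_by_license` is illustrative. The listed counts appear to be rounded dataset sizes, and the card does not say how they relate to the 364,045 figure under Sources, so treat the tally as a rough guide.

```python
from collections import defaultdict

# (dataset, examples, license) rows copied from the table; None means no license is listed.
LISTED_DATASETS = [
    ("CodeFeedback-Filtered-Instruction", 157_000, "Apache 2.0"),
    ("French-Alpaca-Instruct-110K", 110_000, "Apache 2.0"),
    ("Electronics StackExchange", 95_000, "CC-BY-SA-3.0"),
    ("CJJones/LLM_EE_Educational_Synthetic_Dialog", 50_000, "CC-BY-NC-SA-4.0"),
    ("MuratKomurcu/stm32-hal-dataset", 29_700, "MIT"),
    ("redcathode/thingiverse-openscad", 7_400, None),
    ("ThomasTheMaker/OpenSCAD", 4_900, None),
    ("STEM-AI-mtl/Electrical-engineering", 1_100, None),
    ("JITX open-components-database", 151, None),
    ("Vrindarani/netlistgen", 106, None),
]


def examples_by_license(rows):
    """Sum example counts per license label."""
    totals = defaultdict(int)
    for _name, count, license_name in rows:
        totals[license_name or "unspecified"] += count
    return dict(totals)
```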

	### 35 Domains

	| Group | Domains |
	|-------|---------|
	| Conversation | chat-fr, reasoning |
	| Code | python, typescript, cpp, rust, html-css, shell, sql, yaml-json, lua-upy |
	| Infrastructure | docker, devops, llm-orch, llm-ops (NEW), ml-training (NEW) |
	| Electronics | kicad-dsl, kicad-pcb, spice, electronics, components (NEW), power, emc, dsp |
	| Hardware | embedded, stm32, iot, platformio |
	| CAD | freecad |
	| Web | web-frontend, web-backend |
	| Other | music-audio, math, security |

	Changes from V2: 3 new domains (components, llm-ops, ml-training). `spice-sim` merged into `spice`. `stm32` is a sub-category of `embedded`.

	### New Domain: components

	57K Q&A about electronic component specs, datasheets, sourcing, BOM, and cross-reference. Sources: Electronics StackExchange (filtered by component tags) + JITX open-components-database.

	## Training — V3

	| Property | Value |
	|----------|-------|
	| Base model | Qwen3.5-4B |
	| Adapter | MoE-LoRA: 4 experts/projection, rank 16, top-2 routing |
	| Null-space projection | ENABLED (prevents catastrophic forgetting between stacks) |
	| Curriculum | Sequential, 35 stacks trained in order |
	| Platform (MLX) | Mac Studio M3 Ultra 512 GB |
	| Platform (CUDA) | kxkm-ai RTX 4090 24 GB |
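
To make "4 experts/projection, top-2 routing" concrete, here is a dependency-free sketch of one common MoE-LoRA formulation: a gate scores the experts, the two highest-scoring ones are kept, and their low-rank updates `B_i (A_i x)` are mixed with softmax weights. The card does not specify the gating function or scaling, so both are assumptions here, and null-space projection is not shown.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def moe_lora_delta(x: Vector, gate: Matrix, experts: list[tuple[Matrix, Matrix]],
                   top_k: int = 2, scale: float = 1.0) -> Vector:
    """Return the sum over the top-k experts of g_i * B_i (A_i x)."""
    logits = matvec(gate, x)  # one logit per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)
    weights = [math.exp(logits[i] - peak) for i in chosen]  # softmax over chosen experts
    norm = sum(weights)
    delta = [0.0] * len(experts[0][1])
    for i, w in zip(chosen, weights):
        a, b = experts[i]  # a: rank x d_in (down-projection), b: d_out x rank (up-projection)
        update = matvec(b, matvec(a, x))
        delta = [d + scale * (w / norm) * u for d, u in zip(delta, update)]
    return delta
```

The result would be added to the frozen projection's output, `W x + delta`; experts outside the top 2 contribute nothing for that token.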

	## Evaluation

	| Metric | Value |
	|--------|-------|
	| Router accuracy (35-class) | [PENDING] |
	| Forgetting check (angle) | [PENDING] |
	| Perplexity (base) | [PENDING] |
	| Perplexity (debiased) | [PENDING] |
	| Aeon recall@1 | [PENDING] |
	| Aeon recall@5 | [PENDING] |
	| Aeon recall@10 | [PENDING] |
	| Anti-bias flag rate | [PENDING] |
	| Average inference latency | [PENDING] |
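
For reference, recall@k is usually computed as the fraction of queries whose expected item appears among the top k retrieved results. The sketch below assumes exactly one relevant memory per query; how the Aeon evaluation is actually set up is not stated in this card, and the function names are illustrative.

```python
def recall_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the relevant item is among the first k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(results: list[tuple[list[str], str]], k: int) -> float:
    """Average recall@k over (ranked_ids, relevant_id) pairs."""
    if not results:
        raise ValueError("at least one query result is required")
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in results) / len(results)
```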

	## Hardware Requirements

	| Setup | RAM/VRAM | Use |
	|-------|----------|-----|
	| Mac Studio M3 Ultra | 512 GB unified | Training (BF16 LoRA) + serving (MLX) |
	| RTX 4090 | 24 GB VRAM | Q4 inference (vLLM) |
	| Apple Silicon 32 GB+ | 32 GB unified | Q4_K_M inference (MLX/llama.cpp) |
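
A back-of-the-envelope estimate shows why a 35B-parameter model at 4-bit quantization lands near these figures. The sketch assumes roughly 4.8 bits per weight for Q4_K_M (the effective value varies by model) and counts weights only, ignoring the KV cache and runtime overhead; the helper name is illustrative.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


q4_estimate = weight_memory_gib(35, 4.8)    # assumed Q4_K_M average bits per weight
bf16_estimate = weight_memory_gib(35, 16)   # BF16 stores 16 bits per weight
```

Under these assumptions the quantized weights take a little under 20 GiB and the BF16 weights about 65 GiB, which leaves limited headroom for long contexts on a 24 GB card.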

	## Citation

	```bibtex
	@misc{micro-kiki-2026,
	title={micro-kiki: Multi-Domain Expert Model with Cognitive Layer},
	author={L'Electron Rare},
	year={2026},
	url={https://huggingface.co/electron-rare/micro-kiki}
	}
	```


	## Related Projects & Ecosystem

	`micro-kiki-v3` is one component of the FineFab platform built by [L'Électron Rare](https://github.com/L-electron-Rare), a local-first, multi-machine, AI-native manufacturing and electronics platform.

	| Role | Project | Description |
	|---|---|---|
	| Training toolkit | [L-electron-Rare/KIKI-Mac_tunner](https://github.com/L-electron-Rare/KIKI-Mac_tunner) | MLX fine-tuning toolkit (Mac Studio); Opus reasoning distilled into Mistral Large 123B |
	| Fine-tuning pipeline | [L-electron-Rare/KIKI-models-tuning](https://github.com/L-electron-Rare/KIKI-models-tuning) | FineFab fine-tuning pipeline: training, evaluation, registry (Unsloth, LoRA) |
	| Methodology | [electron-rare/Kill_LIFE](https://github.com/electron-rare/Kill_LIFE) | Spec-first agentic methodology for embedded systems: BMAD agents, gates, evidence packs |
	| Orchestration | [electron-rare/mascarade](https://github.com/electron-rare/mascarade) | Multi-machine agentic LLM orchestration: P2P mesh, 8 providers, RAG pipeline |
	| AI backend | [L-electron-Rare/life-core](https://github.com/L-electron-Rare/life-core) | FineFab AI backend: LLM router, RAG, caching, orchestration |
	| CAD assistant | [electron-rare/KiC-AI](https://github.com/electron-rare/KiC-AI) | AI-powered PCB design assistant for KiCad |

	See the full org at [github.com/L-electron-Rare](https://github.com/L-electron-Rare), which hosts 13 public repos covering platform, hardware, firmware, CAD, and ML.

	Infrastructure: the 50K+ Claude CLI examples in the training dataset were captured on our 5-node P2P mesh: GrosMac (Apple M5), Tower (28 threads), CILS (i7), KXKM-AI (RTX 4090), and VM bootstrap. The mesh uses Ed25519 auth and DHT discovery.

	## 🇪🇺 EU AI Act transparency

	This adapter is provided as a fine-tuned LoRA under the AI Act framework
	(Regulation (EU) 2024/1689). Compliance metadata:

	| Field | Value |
	|---|---|
	| Provider | L'Électron Rare (clemsail / electron-rare) |
	| Role under AI Act | GPAI provider for this adapter |
	| Base model | `Qwen/Qwen3.5-35B-A3B` (see upstream provenance) |
	| Adapter type | LoRA / PEFT (adapter weights only; base unchanged) |
	| Training data origin | L'Électron Rare proprietary technical corpus + curated public docs |
	| License | Apache-2.0 (adapter). Upstream base licence applies separately. |
	| Intended use | Multi-domain technical assistance: engineering, KiCad, embedded, code, FR/EN chat |
	| Out of scope | Healthcare diagnosis, legal advice, autonomous safety-critical decisions, generation of malicious code |
	| Risk classification | Limited risk; Article 50 transparency obligations apply |
	| Copyright respect | Training data does not include scraped copyrighted material. Opt-out signals (robots.txt, ai.txt) are honoured for web-sourced data. |
	| Full provenance | https://github.com/L-electron-Rare/eu-kiki/tree/main/docs/provenance |
	| Contact | postmaster@saillant.cc (biased output reports, copyright concerns, etc.) |

	⚠️ You are using an AI model. Outputs may be inaccurate, biased or
	fabricated. Do not act on them without independent verification, especially
	in regulated domains.