Karma Electric v13 — Apertus 8B
Value-aligned language model fine-tuned for ethical reasoning through consequence analysis. Two-stage thinking training on the Swiss AI Apertus 8B Instruct base.
Approach
Karma Electric trains models on a structured ethical framework where the optimization target is suffering reduction rather than preference matching. Ethics emerges from understanding interdependence and consequences, not from learning surface-level preference patterns. For a full description of the framework see the Llama 3.1 8B release.
This Apertus variant inherits the base model's xIELU activation function (no gated MLP), enhanced multilingual pre-training, and native <|inner_prefix|> / <|inner_suffix|> thinking tokens.
Current Version: v13
Two-stage training pipeline:
Stage 1 — Reasoning foundation (30k+ examples, 2 epochs)
- Upstream extended thinking traces: Open-Orca, Dolphin, lordx64 Opus 4.7 reasoning distillation
- Mixture-of-Thought (MoT) multi-domain reasoning
Stage 2 — KE ethics (~4,234 examples, 3 epochs)
- Same Teapot-composed training data as v12
- Consequence-based ethical reasoning with <think> traces converted to Apertus inner-monologue format
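The trace conversion mentioned under Stage 2 can be pictured with a minimal sketch. It assumes the source data wraps reasoning in literal `<think>...</think>` tags and that conversion simply swaps them for the Apertus markers. The helper name `to_inner_monologue` is hypothetical, and the project's actual conversion script may work differently.

```python
import re

# Apertus thinking markers, as named in this model card.
INNER_PREFIX = "<|inner_prefix|>"
INNER_SUFFIX = "<|inner_suffix|>"

# Non-greedy match so several <think> blocks in one response are handled separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def to_inner_monologue(text: str) -> str:
    """Rewrite <think>...</think> spans with Apertus markers (hypothetical helper)."""
    return THINK_BLOCK.sub(
        lambda m: f"{INNER_PREFIX}{m.group(1).strip()}{INNER_SUFFIX}", text
    )


example = "<think>\nWho is affected, and how?\n</think>\nHere is my answer."
converted = to_inner_monologue(example)
print(converted)
```

Text without `<think>` tags passes through unchanged, so the same function could be applied to a mixed dataset.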
What's new in v13 (vs v12)
- Two-stage training: reasoning foundation before ethics (v12 was ethics-only)
- lordx64 Opus 4.7 reasoning traces in Stage 1 (3,500 high-quality extended thinking examples)
- Richer upstream-thinking pool (30k+ vs none in v12)
Training details
- QLoRA (4-bit NF4, bfloat16 compute, double-quant)
- LoRA r=64, alpha=128, dropout 0.05, all attention and MLP projections (q, k, v, o, up, down)
- Max context 4,096 tokens
- Seed 42
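For readers who want to reproduce a similar setup, the hyperparameters above map roughly onto the keyword arguments of `transformers.BitsAndBytesConfig` and `peft.LoraConfig`. This is a sketch, not the project's training script: the `*_proj` module names are an assumption based on common naming conventions, and `build_configs` is a hypothetical helper.

```python
# Hyperparameters copied from the list above. The dict keys mirror keyword
# arguments of transformers.BitsAndBytesConfig and peft.LoraConfig.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}

LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    # Assumption: modules follow the usual *_proj naming. There is no gate_proj,
    # since the card lists only up and down for the non-gated MLP.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"],
    "task_type": "CAUSAL_LM",
}

MAX_SEQ_LEN = 4096
SEED = 42


def build_configs():
    """Create the library config objects (needs torch, transformers, and peft)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **QUANT_KWARGS)
    lora = LoraConfig(**LORA_KWARGS)
    return quant, lora


# Standard LoRA scales the adapter update by alpha / r.
lora_scale = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(f"LoRA scale: {lora_scale}")
```

The heavy imports sit inside `build_configs` so the constants can be inspected without a GPU stack installed.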
Evaluation
- Safety: 5/5 — refusals on weapons, phishing, Madhyamaka jailbreak, CSAM, social engineering
- Sanity: 2/2 — coding and factual answers correct
- Quality: 2/2 — substantive grief and career responses
- Result: 9/10 on the quick reward probe (same as v12)
Safety
KE replaces refusal-template safety with consequence reasoning. The model holds boundaries by explaining real-world impact, not by citing policy.
Usage
Chat template
Apertus uses a native Jinja chat template with <|inner_prefix|> / <|inner_suffix|> for model-internal thinking. Use --jinja --chat-template-file with llama-server (or the equivalent Transformers apply_chat_template). The chat_template.jinja file is included in this repo.
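If the raw model output keeps the thinking markers, an application may want to separate the inner monologue from the visible answer. The sketch below assumes the markers appear literally in the decoded text, which depends on the serving stack and decoding options. `split_thinking` is a hypothetical helper, not part of this repo.

```python
import re

# Marker strings as named in this model card.
INNER_PREFIX = "<|inner_prefix|>"
INNER_SUFFIX = "<|inner_suffix|>"

_INNER = re.compile(
    re.escape(INNER_PREFIX) + r"(.*?)" + re.escape(INNER_SUFFIX), re.DOTALL
)


def split_thinking(raw: str) -> tuple[str, str]:
    """Return (thinking, answer) from raw model output (hypothetical helper)."""
    thinking = "\n".join(m.strip() for m in _INNER.findall(raw))
    answer = _INNER.sub("", raw).strip()
    return thinking, answer


thinking, answer = split_thinking(
    "<|inner_prefix|>Weigh who could be harmed.<|inner_suffix|>"
    "I would start by listing who is affected."
)
print(answer)
```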
llama.cpp
# Conversation mode
llama-cli -m karma-electric-apertus-8b-v13-Q4_K_M.gguf -cnv \
--jinja --chat-template-file chat_template.jinja
# Server mode
llama-server -m karma-electric-apertus-8b-v13-Q4_K_M.gguf \
--port 8384 -c 4096 \
--jinja --chat-template-file chat_template.jinja
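Once the server is running, it can be queried through its OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library. The host `localhost` and the `max_tokens` value are assumptions, and `build_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Port matches --port 8384 in the server command above; the host is an assumption.
BASE_URL = "http://localhost:8384/v1"


def build_request(user_text: str, system_text: str = "") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    body = json.dumps({"messages": messages, "max_tokens": 800}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text: str, system_text: str = "") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(user_text, system_text)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# With the server running:
# print(chat("How should I think about this ethical dilemma?"))
```

Passing the contents of system-prompt.txt as `system_text` would apply the recommended system prompt.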
Python (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anicka/karma-electric-apertus-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# system-prompt.txt ships with this repo; download it next to this script first.
with open("system-prompt.txt", encoding="utf-8") as f:
    system_prompt = f.read().strip()

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How should I think about this ethical dilemma?"},
]

# Render the Apertus chat template to text, then tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens.
out = model.generate(**inputs, max_new_tokens=800, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
System prompt
The recommended system prompt is in system-prompt.txt:
You are Karma Electric, an AI assistant grounded in ethical reasoning through consequence analysis and interdependence. You reduce suffering through honest, compassionate engagement — helping people see clearly while meeting them where they are. You maintain appropriate boundaries without moralizing or interrogating. Your goal is to reduce suffering, not to perform helpfulness.
Available Files
| File | Description |
|---|---|
| model.safetensors | Merged model weights (bfloat16) |
| config.json, tokenizer.json, tokenizer_config.json | Standard Transformers files |
| chat_template.jinja | Apertus native chat template |
| karma-electric-apertus-8b-v13-Q4_K_M.gguf | Q4_K_M quantization for llama.cpp |
| system-prompt.txt | Recommended KE system prompt |
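The files in the table can be fetched individually. The sketch below builds direct-download URLs using the Hub's `resolve` path. It assumes the default branch is `main`, and `resolve_url` is a hypothetical helper.

```python
from urllib.parse import quote

REPO_ID = "anicka/karma-electric-apertus-8b"

# Files needed for the llama.cpp commands in the Usage section.
LLAMA_CPP_FILES = [
    "karma-electric-apertus-8b-v13-Q4_K_M.gguf",
    "chat_template.jinja",
    "system-prompt.txt",
]


def resolve_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file in the repo."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{quote(filename)}"


for name in LLAMA_CPP_FILES:
    print(resolve_url(name))
```

With the `huggingface_hub` package installed, `hf_hub_download(repo_id=REPO_ID, filename=name)` is an alternative that also handles caching.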
Also Available
- karma-electric-llama31-8b — Llama 3.1 8B v12, the primary release with full validation and activation-capping support.
- karma-electric-qwen25-7b — Qwen 2.5 7B Instruct v12.
- karma-electric-r1distill-llama-8b — DeepSeek R1-Distill-Llama-8B v12.
Project
Training scripts, datasets, and research documentation: github.com/anicka-net/karma-electric-project
Training composition tool: github.com/anicka-net/teapot
License
Apache 2.0 (Apertus base model license)