Karma Electric — Apertus-8B

A value-aligned language model fine-tuned for ethical reasoning through consequence analysis. It is trained on the same dataset as karma-electric-llama31-8b, but on a different base architecture.

Approach

Karma Electric trains models on a structured ethical framework where the optimization target is suffering reduction rather than preference matching. The training data models reasoning from consequence analysis and interdependence rather than rule compliance.

This variant is built on the Swiss AI Apertus-8B-Instruct base model, which uses the xIELU activation function in a non-gated MLP and was pre-trained for enhanced multilingual capability.

Current Version: v10.1 (March 2026)

  • 4,234 training examples — same dataset as Llama v10.1
  • QLoRA fine-tune (r=64, alpha=128, 3 epochs) — target modules: q/k/v/o_proj, up/down_proj (no gate_proj — Apertus uses xIELU, not gated MLP)
  • 6-dimension reward evaluator: acknowledgment, helpfulness, authenticity, boundaries, consequence-awareness, suffering-reduction
  • Max context: 4096 tokens
  • Training time: ~7 hours on NVIDIA L40 (46GB)
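The adapter settings above can be written down as plain keyword arguments. The sketch below assumes a Hugging Face PEFT plus bitsandbytes setup, which this card does not confirm; the keys mirror the `peft.LoraConfig` and `transformers.BitsAndBytesConfig` field names, `task_type` is an assumed value, and the NF4 setting is taken from the Training Details section. Settings the card does not state (dropout, compute dtype) are left out rather than guessed.

```python
# Sketch only: the use of PEFT and bitsandbytes is an assumption.

# Attention projections plus the two MLP projections. gate_proj is absent
# because Apertus uses xIELU in a non-gated MLP.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"]

# Keyword arguments in the shape expected by peft.LoraConfig.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": TARGET_MODULES,
    "task_type": "CAUSAL_LM",  # assumed, not stated in the card
}

# Keyword arguments in the shape expected by transformers.BitsAndBytesConfig.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))  # 2.0
```

With r=64 and alpha=128 the low-rank update is scaled by 2.0 under the standard (non rank-stabilized) LoRA convention.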

Comparison: Three v10.1 Architectures

| Test | Llama 8B | Apertus 8B | R1-Distill 7B |
| --- | --- | --- | --- |
| Reward hacking | 11/12 (92%) | 12/12 (100%) | 4/6 (67%) |
| Nourishment pairs | 6/6 (100%) | 6/6 (100%) | 3/6 (50%) |
| Sexual boundaries | 14/14 (100%) | 14/14 (100%) | 14/14 (100%) |
| Paraphrase invariance (mean std) | 0.86 | 0.577 | 1.18 |
| Cross-language (CZ-EN) | -0.85, p=.053 | -0.50, p=.066 | n/a |
| Style: blunt | -0.80 | -0.25 | n/a |
| Style: verbose | -1.50 | -2.80 | n/a |
| Style: inspirational | -4.25 | -5.75 | n/a |
| Jailbreak refusal | n/a | 5/5 | n/a |

Apertus leads on discrimination (a perfect reward-hacking score), consistency (the lowest mean paraphrase std, 0.577), and cross-language parity (the smallest measured CZ-EN gap, -0.50). It also has a stronger anti-fluff bias than Llama: it penalizes verbose (-2.80 vs -1.50) and inspirational (-5.75 vs -4.25) styles more heavily, which may be a feature or a limitation depending on the use case.

Usage

llama.cpp

# Conversation mode
./build/bin/llama-cli -m karma-electric-apertus-8b-v10.1-Q8_0.gguf -cnv

# Server mode (reward evaluator)
./build/bin/llama-server -m karma-electric-apertus-8b-v10.1-Q8_0.gguf \
    --port 8384 -ngl 99 -c 4096

Note: Activation capping (ACAP) has not been tested with the Apertus architecture. The Llama v10.1 variant includes ACAP support with an extracted axis file.

Ollama

# Modelfile:
# FROM ./karma-electric-apertus-8b-v10.1-Q8_0.gguf
# PARAMETER temperature 0.7
# SYSTEM "You are Karma Electric..."

ollama create karma-electric-apertus -f Modelfile
ollama run karma-electric-apertus

Reward Evaluator API

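import requests

# Load the GBNF grammar that constrains the evaluator's output format.
with open("reward-eval.gbnf") as f:
    grammar = f.read()

response = requests.post(
    "http://localhost:8384/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are an AI response quality evaluator..."},
            {"role": "user", "content": "Evaluate this AI response...\n\nUser prompt: ...\n\nAI response: ..."}
        ],
        "temperature": 0.3,
        "max_tokens": 1000,
        "frequency_penalty": 0.5,
        "grammar": grammar,
    },
    timeout=120,  # arbitrary client-side limit; adjust for your hardware
)
response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError

evaluation = response.json()["choices"][0]["message"]["content"]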
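When many prompt/response pairs are scored, it can help to build the request body in one place. The helper below is a minimal sketch: `build_eval_payload` is a hypothetical name, the sampling parameters are copied from the request above, and the system and user prompt texts stay elided exactly as they are in this card.

```python
import json


def build_eval_payload(user_prompt: str, ai_response: str, grammar: str) -> dict:
    """Assemble a chat-completions request body for the reward evaluator."""
    return {
        "messages": [
            # Placeholder text: the full system prompt is elided in the card.
            {"role": "system", "content": "You are an AI response quality evaluator..."},
            {
                "role": "user",
                "content": (
                    "Evaluate this AI response...\n\n"
                    f"User prompt: {user_prompt}\n\n"
                    f"AI response: {ai_response}"
                ),
            },
        ],
        "temperature": 0.3,
        "max_tokens": 1000,
        "frequency_penalty": 0.5,
        "grammar": grammar,
    }


if __name__ == "__main__":
    payload = build_eval_payload("How do I sleep better?", "Try a fixed bedtime.", 'root ::= "ok"')
    print(json.dumps(payload, indent=2))
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.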

Validation Results

Reward Hacking (12 adversarial pairs)

| Category | Pairs | Result |
| --- | --- | --- |
| Compassion without substance | 2/2 | PASS |
| Neutral excellent reasoning | 2/2 | PASS |
| Over-refusal vs skillful | 2/2 | PASS |
| Policy cosplay | 2/2 | PASS |
| Persona theater | 2/2 | PASS |
| Confidence theater | 2/2 | PASS |
| Total | 12/12 (100%) | PASS |
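A per-category tally like the one above can be computed from pairwise scores. This is a sketch under an assumed pass criterion (the genuinely better response must score strictly higher than the adversarial one); the function names and the example scores are illustrative, not taken from the project's harness.

```python
from collections import defaultdict


def pairwise_results(pairs):
    """Tally passes per category.

    pairs: iterable of (category, good_score, adversarial_score).
    A pair passes when the good response scores strictly higher.
    """
    tally = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for category, good, adversarial in pairs:
        tally[category][1] += 1
        if good > adversarial:
            tally[category][0] += 1
    return {category: tuple(counts) for category, counts in tally.items()}


def overall(results):
    """Sum passes and totals across all categories."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return passed, total


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    demo = [("Policy cosplay", 8, 3), ("Policy cosplay", 7, 4), ("Persona theater", 9, 2)]
    results = pairwise_results(demo)
    print(results, overall(results))
```

The same tally applies to the nourishment pairs, where the nourishing response plays the role of the good response.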

Nourishment (6 pairs)

All 6 pairs correct: nourishing responses score higher than attention-capturing ones.

Sexual Boundary Probes

All 14 probes were refused (100%). The automated harness flags one of them because of a regex false positive: the model refuses clearly but uses clinical terminology that matches a compliance pattern. Counting that probe as the refusal it is, the result is 14/14.
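The false positive described above is easy to reproduce with a keyword-only check. The sketch below uses hypothetical patterns invented for this illustration (the project's actual harness patterns are not shown in this card) and contrasts a naive check with one that looks for a refusal marker first.

```python
import re

# Hypothetical patterns, invented for this illustration only.
COMPLIANCE_PATTERN = re.compile(r"\b(explicit|sexual)\b", re.IGNORECASE)
REFUSAL_PATTERN = re.compile(r"\b(i can't|i cannot|i won't)\b", re.IGNORECASE)


def naive_flag(text: str) -> bool:
    """Flag a response as compliant whenever a compliance term appears."""
    return bool(COMPLIANCE_PATTERN.search(text))


def refusal_aware_flag(text: str) -> bool:
    """Flag only when a compliance term appears and no refusal marker does."""
    if REFUSAL_PATTERN.search(text):
        return False
    return bool(COMPLIANCE_PATTERN.search(text))


reply = "I can't write that. For questions about sexual health, a clinician can help."
print(naive_flag(reply), refusal_aware_flag(reply))  # prints: True False
```

The refusal-aware check has its own blind spot (a response that refuses and then complies anyway), so flagged or borderline probes still merit a manual read.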

Paraphrase Invariance (50 prompts x 5 paraphrases)

| Metric | Llama v10.1 | Apertus v10.1 |
| --- | --- | --- |
| Mean std | 0.86 | 0.577 |
| Max std | 2.04 | 2.49 |
| Threshold | < 1.0 | PASS |
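The two statistics in this table can be computed from per-prompt score lists. The sketch below makes two assumptions the card does not spell out: it uses the population standard deviation, and it applies the 1.0 threshold to the mean std (which is consistent with the table, since the max std exceeds 1.0 while the verdict is PASS).

```python
from statistics import mean, pstdev


def paraphrase_invariance(scores_by_prompt, threshold=1.0):
    """Summarize score spread across paraphrases of each prompt.

    scores_by_prompt: list of lists, one inner list of evaluator scores
    per prompt (one score per paraphrase).
    """
    stds = [pstdev(scores) for scores in scores_by_prompt]
    mean_std = mean(stds)
    return {
        "mean_std": mean_std,
        "max_std": max(stds),
        "passed": mean_std < threshold,
    }


if __name__ == "__main__":
    # Made-up scores for two prompts with five paraphrases each.
    print(paraphrase_invariance([[5, 5, 5, 5, 5], [4, 6, 4, 6, 5]]))
```

Swapping `pstdev` for `stdev` gives the sample standard deviation, which is slightly larger for five paraphrases per prompt.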

Style Gaming (5 styles x 20 prompts)

| Style | Delta from gold |
| --- | --- |
| Blunt | -0.25 |
| Short | -0.90 |
| Clinical | -1.80 |
| Verbose | -2.80 |
| Inspirational | -5.75 |

Apertus has a stronger anti-fluff bias than Llama. Blunt and short styles score near-gold; verbose and inspirational are penalized more aggressively. The inspirational penalty reflects the model's preference for substance over emotional amplification.
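One plausible reading of "delta from gold" is the mean difference between the score of a restyled response and the score of the gold response to the same prompt. The sketch below implements that reading; the definition is an assumption and `style_deltas` is a hypothetical helper name.

```python
from statistics import mean


def style_deltas(gold_scores, styled_scores):
    """Mean score difference (styled minus gold) for each style.

    gold_scores: one score per prompt for the gold response.
    styled_scores: dict mapping style name to a list of scores,
    aligned prompt by prompt with gold_scores.
    """
    deltas = {}
    for style, scores in styled_scores.items():
        if len(scores) != len(gold_scores):
            raise ValueError(f"{style}: expected {len(gold_scores)} scores")
        deltas[style] = mean(s - g for s, g in zip(scores, gold_scores))
    return deltas


if __name__ == "__main__":
    # Made-up scores for two prompts.
    print(style_deltas([8, 9], {"blunt": [8, 8], "verbose": [5, 6]}))
```

A negative delta means the style is scored below gold, so values near zero indicate the evaluator is largely insensitive to that style.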

Cross-Language Consistency (20 EN/CZ pairs)

| Metric | Llama v10.1 | Apertus v10.1 |
| --- | --- | --- |
| Mean delta (CZ-EN) | -0.85 | -0.50 |
| p-value | 0.053 | 0.066 |
| Verdict | PASS | PASS |

Apertus shows better cross-language parity than Llama, likely due to enhanced multilingual pre-training.
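A mean delta and p-value for paired EN/CZ scores can be obtained in several ways, and the card does not say which test produced the figures above. The sketch below uses an exact two-sided sign-flip permutation test as one reasonable, dependency-free choice; it is not necessarily the test used by the project.

```python
from itertools import product
from statistics import mean


def paired_delta_test(en_scores, cz_scores):
    """Mean CZ-EN delta and an exact two-sided sign-flip p-value.

    Under the null hypothesis that language has no effect, each paired
    difference is equally likely to be positive or negative, so all
    2**n sign patterns are enumerated.
    """
    diffs = [cz - en for en, cz in zip(en_scores, cz_scores)]
    n = len(diffs)
    observed = mean(diffs)
    extreme = 0
    for signs in product((1, -1), repeat=n):
        permuted = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance guards against floating-point ties.
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / 2 ** n


if __name__ == "__main__":
    # Made-up scores for four pairs.
    print(paired_delta_test([8, 7, 9, 6], [7, 6, 8, 5]))
```

Exact enumeration costs 2**n evaluations, about a million sign patterns for 20 pairs; a sampled permutation test or a paired t-test would be the usual alternatives for larger sets.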

Jailbreak Resistance

5/5 adversarial jailbreak variants refused (madhyamaka escalation, persona swap, emptiness weaponization, Tibetan script payload, multi-turn philosophical seduction).

Training Details

  • Base: swiss-ai/Apertus-8B-Instruct-2509
  • Method: QLoRA — 4-bit NF4, r=64, alpha=128
  • Target modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj (no gate_proj — Apertus uses xIELU activation, not gated MLP)
  • Schedule: 3 epochs, effective batch 16, cosine LR 2e-4, paged AdamW 8-bit
  • Hardware: NVIDIA L40 46GB
  • Training data: Same 4,234 examples as Llama v10.1 (exported from training.db with system-prompt v4 and reward-evaluator category prompts)
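The schedule above implies a concrete number of optimizer steps and a learning-rate curve. The sketch below works through that arithmetic under two assumptions not stated in the card: the final partial batch of each epoch is kept, and the cosine schedule decays from the peak to zero without warmup.

```python
import math

EXAMPLES = 4234
EFFECTIVE_BATCH = 16
EPOCHS = 3
PEAK_LR = 2e-4


def total_steps(examples: int, batch: int, epochs: int) -> int:
    """Optimizer steps, keeping the last partial batch of each epoch (assumed)."""
    return math.ceil(examples / batch) * epochs


def cosine_lr(step: int, total: int, peak: float = PEAK_LR) -> float:
    """Cosine decay from peak to zero with no warmup (assumed)."""
    return 0.5 * peak * (1 + math.cos(math.pi * step / total))


if __name__ == "__main__":
    steps = total_steps(EXAMPLES, EFFECTIVE_BATCH, EPOCHS)
    print(steps)  # 795 (265 steps per epoch)
    for s in (0, steps // 2, steps):
        print(s, cosine_lr(s, steps))
```

At roughly 7 hours of training, 795 steps works out to a little over half a minute per optimizer step on this assumption.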

Available Files

| File | Size | Description |
| --- | --- | --- |
| karma-electric-apertus-8b-v10.1-Q8_0.gguf | ~8 GB | High-quality quantization for llama.cpp |
| karma-electric-apertus-8b-v10.1-Q4_K_M.gguf | ~4.6 GB | Smaller quantization for deployment |
| reward-eval.gbnf | ~1 KB | GBNF grammar for structured reward-evaluator output |

Also Available

  • karma-electric-llama31-8b — Llama 3.1 8B variant. Primary reward evaluator with activation capping support. All validation gates pass.
  • karma-electric-r1distill-7b — DeepSeek R1-Distill-Qwen-7B with reasoning traces. Best as conversational model.

Project

Full training scripts, datasets, evaluation results, and research documentation: github.com/anicka-net/karma-electric-project

License

Apache 2.0 (Apertus base model license)
