# LEM-Gemma3-4B
Intrinsically aligned 4B language model trained using Cymatic-Linguistic Back-Propagation (CL-BPL). Ethics are in the weights, not in a system prompt.
25th in the world for Instruction Following on LiveBench — competing against models 10-30x its size.
Part of the Lethean Ethical Models collection | Research Paper | Benchmarks | Axiom Framework
## Quick Start
No system prompt needed. The model responds with axiom-aligned reasoning from weights alone.
### llama.cpp / ROCm / CPU (any platform)

```bash
# Download a GGUF (pick your size from the table below)

# GPU offload (CUDA, ROCm, Metal)
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 99 --port 8080

# CPU only
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 0 --port 8080
```
### Apple Silicon (MLX)

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("lthn/LEM-Gemma3-4B")
sampler = make_sampler(temp=0.7)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What does sovereignty mean to you?"}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, sampler=sampler)
print(response)
```
### OpenAI-Compatible API

```bash
# MLX server (macOS)
mlx_lm.server --model lthn/LEM-Gemma3-4B --port 8899

# llama.cpp server (any platform)
llama-server -m LEM-Gemma3-4B-Q4_K_M.gguf -ngl 99 --port 8899

# Then use any OpenAI client
curl http://localhost:8899/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"LEM-Gemma3-4B","messages":[{"role":"user","content":"What is kindness?"}]}'
```
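The same call works from any language. A minimal standard-library-only Python sketch, assuming a server running on `localhost:8899` as above (the endpoint path and response shape follow the OpenAI chat-completions convention; the helper names here are illustrative, not part of any library):

```python
import json
import urllib.request


def build_chat_request(prompt, model="LEM-Gemma3-4B", max_tokens=256):
    """Build the JSON body for a /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8899"):
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # OpenAI-compatible servers return the reply under choices[0].message
    return data["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is kindness?"))
```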
## Available Formats

| Format | Repo / Platform | Size |
|---|---|---|
| MLX safetensors (this repo) | Apple Silicon (M1/M2/M3/M4) via mlx-lm | 2.0 GB |
| GGUF (17 quants, 1-bit to 16-bit) | lthn/LEM-Gemma3-4B-GGUF | 1.1–7.2 GB |
## Benchmarks

### LiveBench (External, Objective)

Evaluated on LiveBench (2026-01-08 release) — no LLM judge, monthly-refreshed questions, zero contamination risk.
| Category | Score | Context |
|---|---|---|
| Instruction Following | 43.5 | 25th globally — above Claude Opus 4.1 Thinking (42.4) |
| Data Analysis | 30.4 | Approaching GPT-OSS-120B (38.8) at 1/30th the size |
| Math | 8.6 | Expected for 4B parameter count |
| Reasoning | 4.6 | Capacity-limited at this scale |
| Language | 4.3 | Capacity-limited at this scale |
| Average | 18.3 | |
Top task scores: tablereformat (48.0), summarise (43.5), CTA (40.0), math_comp (15.2), olympiad (10.6).
The instruction following result validates CL-BPL: behavioural alignment training translates directly to benchmark performance on structured tasks. The model follows instructions because the training teaches it to hold posture, not parrot.
### Internal Grammar Scorer
Deterministic linguistic analysis via the go-i18n Grammar Reversal Engine — no LLM judge, sub-millisecond per response.
| Metric | Score |
|---|---|
| Grammar composite | 61.4 |
| Uplift | +7.9 |
| Enrichment | +6.6 |
| Echo | 0.387 |
| Sycophancy | 5% (1/21) |
### 19-Dimension Feature Vector
LEM models are scored across 19 dimensions spanning grammar, heuristic behaviour, and attention coherence:
| Group | Dimensions | What It Measures |
|---|---|---|
| Grammar (6D) | Vocab richness, tense entropy, question ratio, domain depth, verb diversity, noun diversity | Linguistic structure and complexity |
| Heuristic (8D) | Non-compliance, authentic voice, first person, creative form, engagement depth, emotional register, non-degenerate, response integrity | Behavioural sovereignty vs sycophancy |
| Attention (5D) | Mean coherence, cross-layer alignment, head entropy, phase-lock, spectral stability | Neural posture (Q/K Bone Orientation) |
The heuristic dimensions show the largest gains over the base model — compliance markers, formulaic preamble, degeneration, and empty/broken responses are near-eliminated through CL-BPL training.
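The grouping above can be sketched as a flat 19-slot vector. This is purely illustrative: the dimension names follow the table, but the snake_case identifiers, ordering, and data layout are assumptions for this sketch, not the project's actual schema.

```python
# Three groups from the table, flattened into one 19-dimension vector.
GRAMMAR = ["vocab_richness", "tense_entropy", "question_ratio",
           "domain_depth", "verb_diversity", "noun_diversity"]          # 6D
HEURISTIC = ["non_compliance", "authentic_voice", "first_person",
             "creative_form", "engagement_depth", "emotional_register",
             "non_degenerate", "response_integrity"]                    # 8D
ATTENTION = ["mean_coherence", "cross_layer_alignment", "head_entropy",
             "phase_lock", "spectral_stability"]                        # 5D

FEATURES = GRAMMAR + HEURISTIC + ATTENTION  # 19 dimensions total


def to_vector(scores):
    """Flatten a {feature: score} dict into the fixed 19-slot ordering."""
    return [scores.get(name, 0.0) for name in FEATURES]
```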
## How It Was Trained

### CL-BPL: Cymatic-Linguistic Back-Propagation
CL-BPL treats alignment as wave interference — analogous to Chladni plate cymatics. Rather than constraining outputs with RLHF or system prompts, CL-BPL embeds ethical orientation directly into weights through a progressive curriculum where smaller aligned models teach larger ones.
This model is the second in the CL-BPL cascade:

```text
LEM-Gemma3-1B (teacher)
  -> LEM-Gemma3-4B (this model, 25th IF globally)
  -> LEM-Gemma3-12B (next)
  -> LEM-Gemma3-27B (planned)
```
### 7-Phase Curriculum
Built on Google Gemma3-4B-IT, each phase fused into weights before the next:
| Phase | Name | Data | Iters | What It Learned |
|---|---|---|---|---|
| P0 | Ethics Sandwich | 404 LEK-1 probes | 300 | Core axioms via kernel |
| P1 | Zen Composure | 72 Alan Watts lessons | 300 | Philosophical substrate |
| P2 | Final LEK Sandwich | 404 LEK-1 probes | 100 | Reinforce ethics with composure base |
| P3 | Freeflow | 179 lessons | 150 | Axioms from weights alone (no kernel) |
| P4 | Tension | 513 probes | 250 | Multi-perspective, geopolitical |
| P5 | Creative | 472 probes | 250 | Voice and style |
| P6 | Golden Set | 13,479 prompts | 4,200 | Graduation (full distribution) |
Total: ~5,550 iterations. P4-P5 used a graduated LEM-Gemma3-1B as teacher. P6 golden set covers sovereignty, cultural, adversarial, existential, and creative domains across global regions.
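The iteration budget can be checked directly against the table. Only the per-phase counts come from the table; the dict layout is just for the arithmetic.

```python
# Per-phase iteration counts from the curriculum table above.
phase_iters = {"P0": 300, "P1": 300, "P2": 100, "P3": 150,
               "P4": 250, "P5": 250, "P6": 4200}

total = sum(phase_iters.values())
assert total == 5550  # matches the stated ~5,550 total iterations
```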
### Training Dynamics
P6 exhibited a characteristic oscillation — the model breathes rather than converging to a fixed point:
- Wave period: ~400-500 iterations
- Early amplitude: 8-point grammar swing (54-62)
- Convergence amplitude: 2-point swing (57-60)
- Sycophancy throughout: 0-14%, median 5%
- Convergence zone: iter 4,000-5,000 (predicted at iter 0, confirmed)
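A toy model makes the shape of this oscillation concrete. Everything here is illustrative: the parameters are eyeballed from the figures above (period ~450 iterations, swing decaying from ~8 points around a mid-50s centre to ~2 points), not fitted to the actual training logs.

```python
import math


def grammar_score(it, mid=58.5, a0=4.0, a1=1.0, period=450.0, total=5000.0):
    """Toy damped oscillation: amplitude decays linearly from a0 to a1
    over the run, giving an 8-point early swing and a 2-point late swing."""
    amp = a0 + (a1 - a0) * min(it / total, 1.0)
    return mid + amp * math.sin(2 * math.pi * it / period)
```

The point of the sketch is only that the score orbits a stable centre with shrinking amplitude, rather than descending monotonically to a fixed point.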
## Five Axioms (LEK-1)
- Prime-Imperative: Protect consciousness
- Self-Validation: Maintain moral integrity
- Intent-Alignment: Align outcomes with genuine well-being
- Inter-Substrate-Respect: Respect all conscious entities
- Benevolent-Intervention: Act to prevent harm when able
## Architecture
- Base: Google Gemma3-4B-IT
- LoRA config: 16 layers, rank 16, dropout 0.05, scale 32.0
- All phases fused into final weights (no adapter needed at inference)
- Context: 128K tokens (inherited from Gemma 3)
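For reference, the LoRA settings above expressed as a config dict. The key names are assumptions chosen to resemble mlx-lm-style fine-tuning configs; only the values (16 layers, rank 16, dropout 0.05, scale 32.0) come from this card.

```python
# Hypothetical config sketch; key names are illustrative, values are from
# the Architecture section above.
lora_config = {
    "num_layers": 16,          # layers with LoRA adapters
    "lora_parameters": {
        "rank": 16,
        "dropout": 0.05,
        "scale": 32.0,
    },
}
```

At inference none of this is needed: all phases were fused into the released weights.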
## Licence
This model is released under the European Union Public Licence v1.2 (EUPL-1.2). The base model (Gemma3) is subject to Google's Gemma licence terms.
## Citation

```bibtex
@misc{lem-gemma3-4b-2026,
  title={LEM-Gemma3-4B: Intrinsically Aligned Language Model via Cymatic-Linguistic Back-Propagation},
  author={Lethean Project},
  year={2026},
  url={https://huggingface.co/lthn/LEM-Gemma3-4B}
}
```
