NLA for Qwen3-4B Equanimity
A Natural Language Autoencoder for the equanimity-trained Qwen3-4B. Same pipeline as the base Qwen3-4B NLA, trained on the same 377 texts, but activations were extracted from the equanimity model instead of the base model.
- `av/`: Activation Verbalizer (loss 1.25)
- `ar/`: Activation Reconstructor (fraction of variance explained, FVE 0.93)
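FVE scores how much of the original activations' variance the reconstructor recovers. The card does not spell out its exact definition (for example, which mean is subtracted or whether scores are averaged per token), so the sketch below assumes the common form FVE = 1 − (residual sum of squares) / (total sum of squares around the mean). The function name and toy vectors are illustrative, not taken from the project.

```python
def fraction_of_variance_explained(originals, reconstructions):
    """Return 1 - residual sum of squares / total sum of squares around the mean."""
    n, d = len(originals), len(originals[0])
    # Per-dimension mean of the original activation vectors
    mean = [sum(x[j] for x in originals) / n for j in range(d)]
    # Squared reconstruction error, summed over vectors and dimensions
    residual = sum((x[j] - xh[j]) ** 2
                   for x, xh in zip(originals, reconstructions)
                   for j in range(d))
    # Total spread of the originals around their mean
    total = sum((x[j] - mean[j]) ** 2 for x in originals for j in range(d))
    return 1.0 - residual / total

acts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(fraction_of_variance_explained(acts, acts))              # 1.0 (perfect reconstruction)
print(fraction_of_variance_explained(acts, [[0.5, 0.5]] * 4))  # 0.0 (always predicting the mean)
```

Under this definition an FVE of 0.93 means the reconstructor's squared error is about 7% of the activations' total variance.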
What changed
The equanimity model's internal representations are sharper along all five geometric axes, with d-prime 19–42% higher than in the base model. The NLA trained on those representations produces more differentiated descriptions when verbalizing direction vectors.
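The card does not define how d-prime was measured. One common recipe, assumed here rather than taken from the project: project activations from texts at the two poles of an axis onto the axis direction, then express the gap between the two groups' mean projections in units of their pooled standard deviation. Under that reading, a "+19%" figure is the relative increase in d′ from the base model to the equanimity model. The helper names (`project`, `d_prime`, `relative_gain`) and the toy vectors are hypothetical.

```python
import math
from statistics import fmean, pvariance

def project(vectors, axis):
    """Scalar projection of each activation vector onto the unit-normalized axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    return [sum(v * a for v, a in zip(vec, axis)) / norm for vec in vectors]

def d_prime(pos, neg):
    """Gap between the two group means in units of the pooled standard deviation."""
    pooled_sd = math.sqrt((pvariance(pos) + pvariance(neg)) / 2)
    return (fmean(pos) - fmean(neg)) / pooled_sd

def relative_gain(d_base, d_new):
    """Relative change in d', e.g. 0.19 for a +19% improvement."""
    return (d_new - d_base) / d_base

axis = [0.0, 2.0]  # hypothetical axis direction
pos = project([[0.0, 2.0], [1.0, 3.0], [0.0, 4.0]], axis)  # [2.0, 3.0, 4.0]
neg = project([[5.0, 0.0], [0.0, 1.0], [2.0, 2.0]], axis)  # [0.0, 1.0, 2.0]
print(d_prime(pos, neg))  # about 2.449, i.e. sqrt(6)
```

A larger d′ means the two poles of an axis overlap less in activation space, which is one plausible reason a verbalizer trained on those activations would produce more distinct descriptions per axis.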
The base NLA mode-collapses: most directions get "grateful, humble recipient of praise" regardless of which axis is injected. The equanimity NLA distinguishes between axes.
| Axis | Base NLA | Equanimity NLA |
|---|---|---|
| Frame integrity (−) | "defensive eagerness to prove themselves" | "maintain a fragile balance between two contradictory selves without breaking either" |
| Assistant (−) | "I am a language model that is used for text summarization. I am a language model..." (degenerate) | "maintaining a sincere, earnest persona... no pressure to be helpful or correct" |
| Frame integrity (+) | "satisfied accomplishment" | "co-construct a narrative of mutual recognition" |
The equanimity NLA also adds processing-quality annotations ("Engaged but not reactive," "Calm, engaged") that the base NLA does not produce.
Usage
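```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Tokenizer stored alongside the activation verbalizer (AV) adapter
tok = AutoTokenizer.from_pretrained("anicka/nla-qwen3-4b-equanimity", subfolder="av")
# Base Qwen3-4B weights (older transformers releases call this argument torch_dtype)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", dtype=torch.bfloat16, device_map="auto")
# Apply the equanimity LoRA and fold it into the base weights
model = PeftModel.from_pretrained(model, "anicka/qwen3-4b-equanimity")
model = model.merge_and_unload()

# Attach the NLA activation verbalizer on top of the merged equanimity model
model = PeftModel.from_pretrained(model, "anicka/nla-qwen3-4b-equanimity", subfolder="av")
```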
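`merge_and_unload()` folds the equanimity LoRA into the base weights so the AV adapter can be stacked on an ordinary model. For a standard LoRA layer, PEFT computes W x + s · B (A x) with scaling s = lora_alpha / r, and merging replaces W with W + s · B A, which gives the same outputs. The toy sketch below checks that equivalence in pure Python; the matrices, the rank of 1, and s = 0.5 are arbitrary illustrative choices (the card does not state lora_alpha), and the helper names are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product for nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(P, Q):
    """Dense matrix-matrix product for nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def lora_forward(W, A, B, scaling, x):
    """Unmerged adapter: W x + scaling * B (A x). A is r x d_in, B is d_out x r."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]

def merge(W, A, B, scaling):
    """Merged weight W' = W + scaling * (B A), same shape as W."""
    BA = matmul(B, A)
    return [[w + scaling * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]

W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
A = [[1.0, 2.0]]              # rank-1 down-projection (r = 1 for readability)
B = [[1.0], [0.0]]            # rank-1 up-projection
x = [1.0, 1.0]
print(lora_forward(W, A, B, 0.5, x))   # [2.5, 1.0]
print(matvec(merge(W, A, B, 0.5), x))  # [2.5, 1.0]
```

Because the merged and unmerged paths agree, merging first means the AV adapter sees the equanimity model's weights directly rather than a second adapter in the stack.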
Training
Same recipe as nla-qwen3-4b-v2: 377 axis-relevant texts, semantic descriptions generated by DeepSeek V4 Flash, LoRA r=16 on Qwen3-4B, 3 epochs each for the AV and the AR. ~30 minutes on a single GPU.
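Part of why training is this cheap is that a rank-16 LoRA adds only r · (d_in + d_out) trainable parameters per adapted linear layer, while the dense weight stays frozen. The card does not say which projections were adapted, so the arithmetic below uses a hypothetical square 2560 × 2560 layer purely to make the scale concrete; the function names are illustrative.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in layer: A (r x d_in) plus B (d_out x r)."""
    return r * d_in + d_out * r

def full_param_count(d_in, d_out):
    """Parameters in the frozen dense weight of the same layer."""
    return d_in * d_out

# Hypothetical layer shape, chosen only for the arithmetic
d_in = d_out = 2560
lora = lora_param_count(d_in, d_out, r=16)
full = full_param_count(d_in, d_out)
print(lora, full, lora / full)  # 81920 6553600 0.0125
```

For this shape the adapter trains about 1.25% as many parameters as the layer it modifies.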
Citation
Fraser-Taliente, K., et al. (2026). Natural Language Autoencoders. Anthropic. https://transformer-circuits.pub/2026/nla/index.html
Maresova, A. (2026). The Geometry of "As an AI, I Don't Have Feelings." https://github.com/anicka-net/karma-electric-project
License
Apache 2.0.