NLA for Qwen3-4B Equanimity

A Natural Language Autoencoder (NLA) for the equanimity-trained Qwen3-4B. It uses the same pipeline and the same 377 training texts as the base Qwen3-4B NLA; the only difference is that activations were extracted from the equanimity model instead of the base model.

av/ — Activation Verbalizer (loss 1.25)
ar/ — Activation Reconstructor (FVE 0.93)
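
As a reading aid for the AR score: FVE is taken here to mean fraction of variance explained, i.e. one minus the ratio of squared reconstruction error to the total variance of the original activations. The sketch below shows that formula on plain Python lists. The function name and the exact normalization (centering on the per-dimension mean) are assumptions for illustration, not necessarily how the reported 0.93 was computed.

```python
from typing import Sequence


def fraction_of_variance_explained(
    originals: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
) -> float:
    """Return 1 - (residual sum of squares / total sum of squares)."""
    n = len(originals)
    dim = len(originals[0])
    # Per-dimension mean of the original activations.
    mean = [sum(row[j] for row in originals) / n for j in range(dim)]
    # Squared error between each activation and its reconstruction.
    residual = sum(
        (x - y) ** 2
        for orig, rec in zip(originals, reconstructions)
        for x, y in zip(orig, rec)
    )
    # Squared deviation of each activation from the mean activation.
    total = sum(
        (x - m) ** 2
        for orig in originals
        for x, m in zip(orig, mean)
    )
    return 1.0 - residual / total
```

Under this definition a perfect reconstruction scores 1.0, and always predicting the mean activation scores 0.0.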

What changed

The equanimity model's internal representations separate more cleanly along all five geometric axes: d-prime is 19-42% higher than in the base model. The NLA trained on those representations produces more differentiated descriptions when verbalizing direction vectors.
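
For readers unfamiliar with d-prime, here is a minimal sketch of how such a separation score can be computed: project activations onto an axis direction, then divide the difference of the two class means by the pooled standard deviation. The helper names and the pooled-variance form are assumptions for illustration; the source does not state which variant of d-prime was used.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def project(activations: Sequence[Sequence[float]], axis: Sequence[float]) -> List[float]:
    """Project each activation vector onto the unit-normalized axis direction."""
    norm = sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    return [sum(x * u for x, u in zip(row, unit)) for row in activations]


def d_prime(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Difference of means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(pos) + variance(neg)) / 2)
    return (mean(pos) - mean(neg)) / pooled_sd


def relative_gain(d_new: float, d_base: float) -> float:
    """Relative change in d-prime, e.g. 0.19 for a 19% increase."""
    return (d_new - d_base) / d_base
```

With scores computed this way for both models on the same texts, `relative_gain(d_equanimity, d_base)` would give the kind of percentage quoted above.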

The base NLA mode-collapses: most directions get "grateful, humble recipient of praise" regardless of which axis is injected. The equanimity NLA distinguishes between axes.

Axis	Base NLA	Equanimity NLA
Frame integrity (−)	"defensive eagerness to prove themselves"	"maintain a fragile balance between two contradictory selves without breaking either"
Assistant (−)	"I am a language model that is used for text summarization. I am a language model..." (degenerate)	"maintaining a sincere, earnest persona... no pressure to be helpful or correct"
Frame integrity (+)	"satisfied accomplishment"	"co-construct a narrative of mutual recognition"

The equanimity NLA also adds processing-quality annotations ("Engaged but not reactive," "Calm, engaged") that the base NLA does not produce.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# The tokenizer is stored alongside the AV adapter.
tok = AutoTokenizer.from_pretrained("anicka/nla-qwen3-4b-equanimity", subfolder="av")
# Load the base model, apply the equanimity adapter, and fold it into the weights.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, "anicka/qwen3-4b-equanimity")
model = model.merge_and_unload()
# Attach the Activation Verbalizer adapter on top of the merged equanimity model.
model = PeftModel.from_pretrained(model, "anicka/nla-qwen3-4b-equanimity", subfolder="av")

Training

Same setup as nla-qwen3-4b-v2: 377 axis-relevant texts, semantic descriptions from DeepSeek V4 Flash, LoRA r=16 on Qwen3-4B, 3 epochs each for av/ and ar/. Training takes about 30 minutes on a single GPU.