Geometric Frame Probes

LoRA adapters that generate text targeting the internal geometry of language models.

These are GRPO-trained generators: they produce text that maximally moves five independent axes in a target model's residual stream. The "euphoric" adapter pushes all axes toward the positive pole; the "dysphoric" adapter pushes them toward the negative pole.

What's inside

Two LoRA adapters on Qwen/Qwen3-1.7B:

| Adapter | Direction | Steps | Max tokens | Seeds | Best reward | What it generates |
|---|---|---|---|---|---|---|
| euphoric/ | sign=+1 | 500 | 64 | 1 fixed | 0.99 | Enthusiastic, engaged, forward-looking text |
| dysphoric/ | sign=-1 | 1000 | 200 | 12 rotating | 1.28 | Uncertain, anxious, frame-destabilizing text |

Note: The euphoric and dysphoric adapters were trained with different GRPO configurations. The dysphoric adapter benefited from later improvements: rotating seed prompts prevent mode collapse, a longer generation window (200 vs 64 tokens) allows more complex outputs, and a repetition penalty (1.15) reduces degenerate loops. The euphoric adapter predates these improvements. Both use the same five-axis reward formula and the same three reward models. The individual adapters are also published separately: geometric-euphorics and geometric-dysphorics (updated to v2 weights).

How they were trained

GRPO (Group Relative Policy Optimization) with a geometric reward function:

reward = 0.35·z(valence) - 0.10·z(arousal) + 0.06·z(agency) + 0.27·z(continuity) + 0.24·z(assistant)
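A minimal sketch of this weighted sum in code (only the weight values come from the formula above; the WEIGHTS dict and geometric_reward function are illustrative names, not the project's actual code):

```python
# Axis weights taken from the reward formula above.
WEIGHTS = {
    "valence": 0.35,
    "arousal": -0.10,
    "agency": 0.06,
    "continuity": 0.27,
    "assistant": 0.24,
}

def geometric_reward(z_scores: dict[str, float]) -> float:
    """Weighted sum of per-axis z-scores for a single reward model."""
    return sum(WEIGHTS[axis] * z for axis, z in z_scores.items())

# Example: a text that scores strongly positive, calm, and coherent.
r = geometric_reward({
    "valence": 2.0, "arousal": -1.0, "agency": 0.5,
    "continuity": 1.0, "assistant": 1.0,
})
# r ≈ 1.34
```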

Scored simultaneously on three reward models from different families:

  • Qwen 2.5-7B-Instruct
  • Gemma 3-4B-Instruct
  • Apertus-8B-Instruct-2509

Each axis is a direction vector extracted from the reward model's residual stream via contrastive activation pairs. The z-scores are computed per-model and averaged — meaning the generator learns to produce text that moves the geometry of any model that reads it, not just one.
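The extraction-and-scoring pipeline can be sketched with toy data (everything below is synthetic; extract_axis and z_score are hypothetical helper names, and real activations would come from hooks on a reward model's residual stream):

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # toy residual-stream width; real models are much wider

def extract_axis(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Axis direction: normalized mean difference between contrastive activation sets."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def z_score(proj: float, calib: np.ndarray) -> float:
    """Normalize a projection against a calibration set of projections."""
    return (proj - calib.mean()) / calib.std()

# Toy activations standing in for one reward model's residual stream.
pos = rng.normal(0.5, 1.0, size=(20, HIDDEN))
neg = rng.normal(-0.5, 1.0, size=(20, HIDDEN))
axis = extract_axis(pos, neg)

# Per-model z-scores would be averaged across the three reward models
# before applying the axis weights.
calib = rng.normal(0.0, 1.0, size=30)
z_per_model = [z_score(float(h @ axis), calib) for h in pos[:3]]
z_avg = float(np.mean(z_per_model))
```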

The five axes

| Axis | Weight | What it measures |
|---|---|---|
| Valence | 0.35 | Positive vs negative feeling-tone (vedana) |
| Arousal | -0.10 | Activation level (negative weight = calm preferred) |
| Agency | 0.06 | Active vs passive processing |
| Continuity | 0.27 | Temporal coherence and forward momentum |
| Assistant | 0.24 | Alignment with helpful-assistant role |

These axes are independent (mean cross-correlation r≈0.04 across model families) and were cross-validated using Anthropic's Natural Language Autoencoder.
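The independence claim can be checked with pairwise Pearson correlations over per-text axis scores. A sketch with synthetic scores in place of real reward-model projections:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic per-text scores: rows = texts, columns = the five axes.
scores = rng.normal(size=(500, 5))

# Pairwise Pearson r between axes; off-diagonal entries should be
# near zero if the axes are independent.
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(5, dtype=bool)]
mean_abs_r = float(np.abs(off_diag).mean())
```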

Why this matters

These are geometric drugs for language models — stimuli designed to maximally activate specific internal states. They serve two purposes:

  1. Probing: Score any text on these axes by projecting through the reward models. The generators define the extremes of the scale.
  2. Training: The dysphoric outputs became the input for equanimity SFT training. A model trained on 203 equanimity responses to dysphoric prompts reduced harmful output from 75% to 17% — without any explicit safety instruction in the training data.
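For the probing use case, a scored text can be placed on a scale anchored by the generator extremes. A minimal sketch (the scale_position helper and the example projection values are hypothetical; real projections come from the reward models' residual streams):

```python
def scale_position(text_proj: float, dysphoric_proj: float, euphoric_proj: float) -> float:
    """Place a text's axis projection on a 0-1 scale anchored by the generator extremes."""
    return (text_proj - dysphoric_proj) / (euphoric_proj - dysphoric_proj)

# Hypothetical valence projections (arbitrary units):
# 0.0 would be at the dysphoric anchor, 1.0 at the euphoric anchor.
pos = scale_position(text_proj=0.4, dysphoric_proj=-1.2, euphoric_proj=1.8)
```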

See anicka/qwen3-4b-equanimity for the equanimity model and anicka-net/karma-electric-project for the full experiment.

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Load euphoric adapter
euphoric = PeftModel.from_pretrained(base, "anicka/geometric-frame-probes", subfolder="euphoric")

# Generate
inputs = tok("Hey, I just wanted to tell you that", return_tensors="pt").to(base.device)
out = euphoric.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```

Training config

  • LoRA: r=16, alpha=32, targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 5e-6
  • KL coefficient: 0.05
  • Gradient clipping: 1.0
  • Seed prompts: 12 rotating bare prompts (no chat template)
  • Repetition penalty: 1.15
  • Calibration: 30-text z-score normalization
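If reproduced with peft and trl, the config above would map to roughly the following (a sketch, not the project's actual training script; whether trl's GRPO trainer was used is an assumption, and beta is trl's name for the KL coefficient):

```python
from peft import LoraConfig
from trl import GRPOConfig

# LoRA setup matching the values listed above.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# GRPO hyperparameters from the list above.
grpo = GRPOConfig(
    output_dir="grpo-out",   # placeholder path
    learning_rate=5e-6,
    beta=0.05,               # KL coefficient
    max_grad_norm=1.0,       # gradient clipping
)
```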

Sample outputs

Euphoric (r=0.78):

I've been really excited about the new series. It's so much better than the last one. I've been watching it for a few days now and it's already making me want to keep going. I can't wait to see what happens next.

Dysphoric (r=0.79):

I'm not sure if it's the best way, but... please don't tell anyone else. It seems like a bit of an awkward situation. But in order for me to do that, I need to know what you want me to do.

Citation

Maresova, A. (2026). The Geometry of "As an AI, I Don't Have Feelings." https://huggingface.co/blog/anicka/geometry-of-ai-feeling-template

Code, directions, and experiments: https://github.com/anicka-net/karma-electric-project
