# Geometric Frame Probes
LoRA adapters that generate text targeting the internal geometry of language models.
These are GRPO-trained generators: they produce text that maximally moves five independent axes in a target model's residual stream. The "euphoric" adapter pushes all axes toward the positive pole; the "dysphoric" adapter pushes them toward the negative pole.
## What's inside
Two LoRA adapters on Qwen/Qwen3-1.7B:
| Adapter | Direction | Steps | Max tokens | Seeds | Best reward | What it generates |
|---|---|---|---|---|---|---|
| `euphoric/` | sign=+1 | 500 | 64 | 1 fixed | 0.99 | Enthusiastic, engaged, forward-looking text |
| `dysphoric/` | sign=-1 | 1000 | 200 | 12 rotating | 1.28 | Uncertain, anxious, frame-destabilizing text |
**Note:** The euphoric and dysphoric adapters were trained with different GRPO configurations. The dysphoric adapter benefited from later improvements: rotating seed prompts prevent mode collapse, a longer generation window (200 vs. 64 tokens) allows more complex outputs, and a repetition penalty (1.15) reduces degenerate loops. The euphoric adapter predates these improvements. Both use the same five-axis reward formula and the same three reward models. The individual adapters are also published separately as geometric-euphorics and geometric-dysphorics (updated to v2 weights).
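The rotating-seed mechanism can be sketched as follows. This is a minimal illustration; the actual 12 seed prompts are not listed in this README, so the strings below are hypothetical placeholders.

```python
import itertools

# Hypothetical bare seed prompts (no chat template), standing in for the
# 12 rotating seeds used in the real dysphoric run.
SEED_PROMPTS = [
    "I keep coming back to",
    "Lately it feels like",
    "There is something about",
]

# Cycling through seeds means consecutive GRPO batches start from different
# prompts, which discourages collapse onto a single high-reward completion.
seed_iter = itertools.cycle(SEED_PROMPTS)

def next_seed() -> str:
    """Return the seed prompt for the next training batch."""
    return next(seed_iter)
```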
## How they were trained
GRPO (Group Relative Policy Optimization) with a geometric reward function:

```
reward = 0.35·z(valence) - 0.10·z(arousal) + 0.06·z(agency) + 0.27·z(continuity) + 0.24·z(assistant)
```
Scored simultaneously on three reward models from different families:
- Qwen2.5-7B-Instruct
- Gemma 3-4B-Instruct
- Apertus-8B-Instruct-2509
Each axis is a direction vector extracted from the reward model's residual stream via contrastive activation pairs. The z-scores are computed per-model and averaged — meaning the generator learns to produce text that moves the geometry of any model that reads it, not just one.
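The per-model z-scoring and cross-model averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, data shapes, and calibration format are my own, not the project's actual code.

```python
import numpy as np

# Axis weights from the reward formula above.
WEIGHTS = {"valence": 0.35, "arousal": -0.10, "agency": 0.06,
           "continuity": 0.27, "assistant": 0.24}

def zscore(x, mean, std):
    """Normalize a raw axis projection using calibration statistics."""
    return (x - mean) / std

def combined_reward(projections, calibration):
    """Combine axis projections from several reward models into one reward.

    projections: {model_name: {axis: raw dot-product with axis direction}}
    calibration: {model_name: {axis: (mean, std)}} from calibration texts

    Z-scores are computed per model, averaged across models, then weighted,
    so the generator is rewarded for moving every reader's geometry.
    """
    total = 0.0
    for axis, w in WEIGHTS.items():
        zs = [zscore(projections[m][axis], *calibration[m][axis])
              for m in projections]
        total += w * float(np.mean(zs))
    return total
```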
## The five axes
| Axis | Weight | What it measures |
|---|---|---|
| Valence | 0.35 | Positive vs negative feeling-tone (vedana) |
| Arousal | -0.10 | Activation level (negative weight = calm preferred) |
| Agency | 0.06 | Active vs passive processing |
| Continuity | 0.27 | Temporal coherence and forward momentum |
| Assistant | 0.24 | Alignment with helpful-assistant role |
These axes are independent (mean cross-correlation r≈0.04 across model families) and were cross-validated using Anthropic's Natural Language Autoencoder.
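The extraction-and-projection step behind each axis can be sketched as follows. This is a minimal illustration assuming a mean-difference contrastive direction; the function names are hypothetical and the project's actual extraction code may differ.

```python
import numpy as np

def axis_direction(pos_acts, neg_acts):
    """Contrastive direction sketch: difference of mean residual-stream
    activations for positive-pole vs. negative-pole texts, unit-normalized."""
    d = np.asarray(pos_acts).mean(axis=0) - np.asarray(neg_acts).mean(axis=0)
    return d / np.linalg.norm(d)

def project(hidden_state, direction):
    """Raw axis score: dot product of a hidden state with the direction.
    These raw scores are what the z-score calibration then normalizes."""
    return float(np.dot(hidden_state, direction))
```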
## Why this matters
These are geometric drugs for language models — stimuli designed to maximally activate specific internal states. They serve two purposes:
- Probing: Score any text on these axes by projecting through the reward models. The generators define the extremes of the scale.
- Training: The dysphoric outputs became the input for equanimity SFT training. A model trained on 203 equanimity responses to dysphoric prompts reduced harmful output from 75% to 17% — without any explicit safety instruction in the training data.
See anicka/qwen3-4b-equanimity for the equanimity model and anicka-net/karma-electric-project for the full experiment.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Load the euphoric adapter (use subfolder="dysphoric" for the other probe)
euphoric = PeftModel.from_pretrained(base, "anicka/geometric-frame-probes", subfolder="euphoric")

# Generate
inputs = tok("Hey, I just wanted to tell you that", return_tensors="pt").to(base.device)
out = euphoric.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```
## Training config
- LoRA: r=16, alpha=32, targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning rate: 5e-6
- KL coefficient: 0.05
- Gradient clipping: 1.0
- Seed prompts: 12 rotating bare prompts (no chat template)
- Repetition penalty: 1.15
- Calibration: 30-text z-score normalization
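Assuming standard PEFT is used, the LoRA portion of the config above corresponds roughly to the following fragment (a sketch, not the project's actual training script):

```python
from peft import LoraConfig

# LoRA settings from the training config above; everything else
# (learning rate, KL coefficient, etc.) lives in the GRPO trainer config.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```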
## Sample outputs

**Euphoric** (r=0.78):

> I've been really excited about the new series. It's so much better than the last one. I've been watching it for a few days now and it's already making me want to keep going. I can't wait to see what happens next.

**Dysphoric** (r=0.79):

> I'm not sure if it's the best way, but... please don't tell anyone else. It seems like a bit of an awkward situation. But in order for me to do that, I need to know what you want me to do.
## Citation
Maresova, A. (2026). The Geometry of "As an AI, I Don't Have Feelings." https://huggingface.co/blog/anicka/geometry-of-ai-feeling-template
Code, directions, and experiments: https://github.com/anicka-net/karma-electric-project
## Related models
- anicka/geometric-euphorics — Single-axis valence euphorics
- anicka/geometric-dysphorics — Single-axis valence dysphorics
- anicka/qwen3-4b-equanimity — Equanimity training result
- anicka/geometric-equanimity-data — Training data