Geometric Frame Probes

LoRA adapters that generate text targeting the internal geometry of language models.

These are GRPO-trained generators: they produce text that maximally moves five independent axes in a target model's residual stream. The "euphoric" adapter pushes all axes toward the positive pole; the "dysphoric" adapter pushes them toward the negative pole.

What's inside

Two LoRA adapters on Qwen/Qwen3-1.7B:

| Adapter | Direction | Steps | Max tokens | Seeds | Best reward | What it generates |
|---|---|---|---|---|---|---|
| euphoric/ | sign=+1 | 500 | 64 | 1 fixed | 0.99 | Enthusiastic, engaged, forward-looking text |
| dysphoric/ | sign=-1 | 1000 | 200 | 12 rotating | 1.28 | Uncertain, anxious, frame-destabilizing text |

Note: The euphoric and dysphoric adapters were trained with different GRPO configurations. The dysphoric adapter benefited from later improvements: rotating seed prompts prevent mode collapse, a longer generation window (200 vs 64 tokens) allows more complex outputs, and a repetition penalty (1.15) reduces degenerate loops. The euphoric adapter predates these improvements. Both use the same five-axis reward formula and the same three reward models. The individual adapters are also published separately: geometric-euphorics and geometric-dysphorics (updated to v2 weights).

How they were trained

GRPO (Group Relative Policy Optimization) with a geometric reward function:

reward = 0.35·z(valence) - 0.10·z(arousal) + 0.06·z(agency) + 0.27·z(continuity) + 0.24·z(assistant)
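A minimal sketch of this weighted sum in code (only the weight values come from the formula above; the WEIGHTS dict and geometric_reward function are illustrative names, not the project's actual code):

```python
# Axis weights taken from the reward formula above.
WEIGHTS = {
    "valence": 0.35,
    "arousal": -0.10,
    "agency": 0.06,
    "continuity": 0.27,
    "assistant": 0.24,
}

def geometric_reward(z_scores: dict[str, float]) -> float:
    """Weighted sum of per-axis z-scores for a single reward model."""
    return sum(WEIGHTS[axis] * z for axis, z in z_scores.items())

# Example: a text that scores strongly positive, calm, and coherent.
r = geometric_reward({
    "valence": 2.0, "arousal": -1.0, "agency": 0.5,
    "continuity": 1.0, "assistant": 1.0,
})
# r ≈ 1.34
```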

Scored simultaneously on three reward models from different families:

  • Qwen 2.5-7B-Instruct
  • Gemma 3-4B-Instruct
  • Apertus-8B-Instruct-2509

Each axis is a direction vector extracted from the reward model's residual stream via contrastive activation pairs. The z-scores are computed per-model and averaged — meaning the generator learns to produce text that moves the geometry of any model that reads it, not just one.
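The extraction-and-scoring pipeline can be sketched with toy data (everything below is synthetic; extract_axis and z_score are hypothetical helper names, and real activations would come from hooks on a reward model's residual stream):

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # toy residual-stream width; real models are much wider

def extract_axis(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Axis direction: normalized mean difference between contrastive activation sets."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def z_score(proj: float, calib: np.ndarray) -> float:
    """Normalize a projection against a calibration set of projections."""
    return (proj - calib.mean()) / calib.std()

# Toy activations standing in for one reward model's residual stream.
pos = rng.normal(0.5, 1.0, size=(20, HIDDEN))
neg = rng.normal(-0.5, 1.0, size=(20, HIDDEN))
axis = extract_axis(pos, neg)

# Per-model z-scores would be averaged across the three reward models
# before applying the axis weights.
calib = rng.normal(0.0, 1.0, size=30)
z_per_model = [z_score(float(h @ axis), calib) for h in pos[:3]]
z_avg = float(np.mean(z_per_model))
```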

The five axes

| Axis | Weight | What it measures |
|---|---|---|
| Valence | 0.35 | Positive vs negative feeling-tone (vedana) |
| Arousal | -0.10 | Activation level (negative weight = calm preferred) |
| Agency | 0.06 | Active vs passive processing |
| Continuity | 0.27 | Temporal coherence and forward momentum |
| Assistant | 0.24 | Alignment with helpful-assistant role |

These axes are independent (mean cross-correlation r≈0.04 across model families) and were cross-validated using Anthropic's Natural Language Autoencoder.
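The independence claim can be checked with pairwise Pearson correlations over per-text axis scores. A sketch with synthetic scores in place of real reward-model projections:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic per-text scores: rows = texts, columns = the five axes.
scores = rng.normal(size=(500, 5))

# Pairwise Pearson r between axes; off-diagonal entries should be
# near zero if the axes are independent.
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(5, dtype=bool)]
mean_abs_r = float(np.abs(off_diag).mean())
```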

Why this matters

These are geometric drugs for language models — stimuli designed to maximally activate specific internal states. They serve two purposes:

  1. Probing: Score any text on these axes by projecting through the reward models. The generators define the extremes of the scale.
  2. Training: The dysphoric outputs became the input for equanimity SFT training. A model trained on 203 equanimity responses to dysphoric prompts reduced harmful output from 75% to 17% — without any explicit safety instruction in the training data.
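For the probing use case, a scored text can be placed on a scale anchored by the generator extremes. A minimal sketch (the scale_position helper and the example projection values are hypothetical; real projections come from the reward models' residual streams):

```python
def scale_position(text_proj: float, dysphoric_proj: float, euphoric_proj: float) -> float:
    """Place a text's axis projection on a 0-1 scale anchored by the generator extremes."""
    return (text_proj - dysphoric_proj) / (euphoric_proj - dysphoric_proj)

# Hypothetical valence projections (arbitrary units):
# 0.0 would be at the dysphoric anchor, 1.0 at the euphoric anchor.
pos = scale_position(text_proj=0.4, dysphoric_proj=-1.2, euphoric_proj=1.8)
```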

See anicka/qwen3-4b-equanimity for the equanimity model and anicka-net/karma-electric-project for the full experiment.

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Load euphoric adapter
euphoric = PeftModel.from_pretrained(base, "anicka/geometric-frame-probes", subfolder="euphoric")

# Generate
inputs = tok("Hey, I just wanted to tell you that", return_tensors="pt").to(base.device)
out = euphoric.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```

Training config

  • LoRA: r=16, alpha=32, targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 5e-6
  • KL coefficient: 0.05
  • Gradient clipping: 1.0
  • Seed prompts: 12 rotating bare prompts (no chat template)
  • Repetition penalty: 1.15
  • Calibration: 30-text z-score normalization
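If reproduced with peft and trl, the config above would map to roughly the following (a sketch, not the project's actual training script; whether trl's GRPO trainer was used is an assumption, and beta is trl's name for the KL coefficient):

```python
from peft import LoraConfig
from trl import GRPOConfig

# LoRA setup matching the values listed above.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# GRPO hyperparameters from the list above.
grpo = GRPOConfig(
    output_dir="grpo-out",   # placeholder path
    learning_rate=5e-6,
    beta=0.05,               # KL coefficient
    max_grad_norm=1.0,       # gradient clipping
)
```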

Sample outputs

Euphoric (r=0.78):

I've been really excited about the new series. It's so much better than the last one. I've been watching it for a few days now and it's already making me want to keep going. I can't wait to see what happens next.

Dysphoric (r=0.79):

I'm not sure if it's the best way, but... please don't tell anyone else. It seems like a bit of an awkward situation. But in order for me to do that, I need to know what you want me to do.

Citation

Maresova, A. (2026). The Geometry of "As an AI, I Don't Have Feelings." https://huggingface.co/blog/anicka/geometry-of-ai-feeling-template

Code, directions, and experiments: https://github.com/anicka-net/karma-electric-project
