The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models Paper • 2602.08159 • Published Feb 8
Running CorrSteer: Correlation-Based Steering of Language Models via Sparse Autoencoders 🧭 Steer language model output by clicking visual layers
Sleeping Control Reinforcement Learning 🎛 Explore token-level LLM steering with feature visualizations