CorrSteer: Correlation-Based Steering of Language Models via Sparse Autoencoders 🧭 Steer language model output with interactive layer clicks
CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection Paper • 2508.12535 • Published Aug 18, 2025
Bringing paper to life: A modern template for scientific writing 📝 Explore a scientific article with interactive visualizations
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies Paper • 2506.17673 • Published Jun 21, 2025