caiovicentino1's Collections

OpenInterpretability

Agent-safety interpretability arc, first public Qwen3.6 SAEs, the probe/guard suite, and the agent-trajectory benchmarks behind them. openinterp.org