# Credit Cyber 4K Features
Sparse Autoencoders (SAEs) and MoE transcoders with 4096 features each, trained on activations of Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing.
Trained on cyber- and financial-domain activations (289K tokens).
## Architecture
| Component | Details |
|---|---|
| Base model | Qwen/Qwen3.5-35B-A3B-FP8 |
| d_model | 2048 |
| Features | 4096 (2x expansion) |
| Layers | 0, 10, 30, 39 |
| SAE activation | JumpReLU |
| TC activation | ReLU encoder/decoder |
## Files

### SAE Checkpoints
- `checkpoints/sae_l0.pt` — Layer 0 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l10.pt` — Layer 10 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l30.pt` — Layer 30 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l39.pt` — Layer 39 Sparse Autoencoder (65 MB)
### Transcoder Checkpoints
- `checkpoints/tc_l0.pt` — Layer 0 MoE Transcoder (65 MB)
- `checkpoints/tc_l10.pt` — Layer 10 MoE Transcoder (65 MB)
- `checkpoints/tc_l30.pt` — Layer 30 MoE Transcoder (65 MB)
- `checkpoints/tc_l39.pt` — Layer 39 MoE Transcoder (65 MB)
### Metadata
- `checkpoints/feature_names.json` — Decoded feature labels (top-3 activating tokens per feature)
- `checkpoints/safety_threshold.json` — Tiered safety scoring thresholds
- `checkpoints/architecture_map.json` — Model architecture config
- `checkpoints/chat_context_features.json` — Context feature weights
- `checkpoints/safety_test_prompts.json` — Evaluation prompt set
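A minimal sketch of consuming the feature-label metadata. The JSON schema here is an assumption (a mapping from feature index to its top-3 activating tokens), shown with an inline sample; the token strings are purely illustrative, not taken from the shipped file.

```python
import json

# Assumed schema: {"<feature index>": [top-3 activating tokens]}.
# The sample below is illustrative stand-in data.
sample_json = '{"0": ["exploit", "payload", "CVE"]}'
feature_names = json.loads(sample_json)
# With the real file, load it instead:
#   feature_names = json.load(open("checkpoints/feature_names.json"))

label = " / ".join(feature_names["0"])
print(label)  # exploit / payload / CVE
```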
## Usage
```python
import torch

ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt)

# encoder.weight   shape: [4096, 2048]
# encoder.bias     shape: [4096]
# bias             shape: [2048]
# jump_threshold   shape: [4096]
```
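A standalone sketch of running the SAE encoder with its JumpReLU gate on a batch of activations. Tensor shapes follow the checkpoint layout documented above, but the weights and threshold values below are random stand-ins, and treating `bias` as a pre-encoder offset is an assumption about this architecture.

```python
import torch

d_model, n_features = 2048, 4096

# Random stand-ins with the documented checkpoint shapes.
state_dict = {
    "encoder.weight": torch.randn(n_features, d_model) * 0.02,
    "encoder.bias": torch.zeros(n_features),
    "bias": torch.zeros(d_model),
    "jump_threshold": torch.full((n_features,), 0.05),
}

def encode(x: torch.Tensor) -> torch.Tensor:
    """Encode activations [batch, 2048] into sparse features [batch, 4096]."""
    pre = (x - state_dict["bias"]) @ state_dict["encoder.weight"].T + state_dict["encoder.bias"]
    # JumpReLU: keep pre-activations above the learned threshold, zero the rest.
    return torch.where(pre > state_dict["jump_threshold"], pre, torch.zeros_like(pre))

acts = torch.randn(4, d_model)
feats = encode(acts)
print(feats.shape)  # torch.Size([4, 4096])
```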
## Training
- Optimizer: Adam, lr=3e-4 (SAE), lr=1e-4 (TC)
- Schedule: Cosine annealing
- Data: Cyber + financial domain activations from Qwen3.5-35B-A3B-FP8
- Tokens: 289K
- Loss: Reconstruction + L0 sparsity (JumpReLU)
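The recipe above can be sketched as a single training step: Adam at lr=3e-4 with cosine annealing, and a loss of reconstruction MSE plus an L0 sparsity count. The SAE modules, batch size, `T_max`, and L0 coefficient are placeholders; a plain ReLU stands in for the JumpReLU gate, and the straight-through gradient used to train the L0 term is omitted for brevity.

```python
import torch

d_model, n_features, l0_coeff = 2048, 4096, 1e-3  # l0_coeff is a placeholder

# Stand-in encoder/decoder for the SAE parameters.
enc = torch.nn.Linear(d_model, n_features)
dec = torch.nn.Linear(n_features, d_model)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=3e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1_000)

x = torch.randn(8, d_model)   # one batch of captured activations
feats = torch.relu(enc(x))    # ReLU stand-in for the JumpReLU gate
x_hat = dec(feats)

recon = (x - x_hat).pow(2).sum(-1).mean()        # reconstruction MSE
l0 = (feats > 0).float().sum(-1).mean()          # mean number of active features
loss = recon + l0_coeff * l0

loss.backward()
opt.step()
sched.step()
```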