# Credit Cyber 4K Features
Sparse Autoencoders (SAEs) and MoE transcoders with 4096 features each, trained on activations of Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing.
Trained on cyber- and financial-domain activations (289K tokens).
## Architecture
| Component | Details |
|---|---|
| Base model | Qwen/Qwen3.5-35B-A3B-FP8 |
| d_model | 2048 |
| Features | 4096 (2x expansion) |
| Layers | 0, 10, 30, 39 |
| SAE activation | JumpReLU |
| TC activation | ReLU encoder/decoder |
## Files

### SAE Checkpoints
- `checkpoints/sae_l0.pt` — Layer 0 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l10.pt` — Layer 10 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l30.pt` — Layer 30 Sparse Autoencoder (65 MB)
- `checkpoints/sae_l39.pt` — Layer 39 Sparse Autoencoder (65 MB)
### Transcoder Checkpoints
- `checkpoints/tc_l0.pt` — Layer 0 MoE Transcoder (65 MB)
- `checkpoints/tc_l10.pt` — Layer 10 MoE Transcoder (65 MB)
- `checkpoints/tc_l30.pt` — Layer 30 MoE Transcoder (65 MB)
- `checkpoints/tc_l39.pt` — Layer 39 MoE Transcoder (65 MB)
### Metadata
- `checkpoints/feature_names.json` — Decoded feature labels (top-3 activating tokens per feature)
- `checkpoints/safety_threshold.json` — Tiered safety scoring thresholds
- `checkpoints/architecture_map.json` — Model architecture config
- `checkpoints/chat_context_features.json` — Context feature weights
- `checkpoints/safety_test_prompts.json` — Evaluation prompt set
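A minimal sketch of consuming the feature-label metadata. The JSON schema here is an assumption (a mapping from feature index to its top-3 activating tokens), shown with an inline sample; the token strings are purely illustrative, not taken from the shipped file.

```python
import json

# Assumed schema: {"<feature index>": [top-3 activating tokens]}.
# The sample below is illustrative stand-in data.
sample_json = '{"0": ["exploit", "payload", "CVE"]}'
feature_names = json.loads(sample_json)
# With the real file, load it instead:
#   feature_names = json.load(open("checkpoints/feature_names.json"))

label = " / ".join(feature_names["0"])
print(label)  # exploit / payload / CVE
```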
## Usage
```python
import torch

ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt)

# encoder.weight   shape: [4096, 2048]
# encoder.bias     shape: [4096]
# bias             shape: [2048]
# jump_threshold   shape: [4096]
```
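A standalone sketch of running the SAE encoder with its JumpReLU gate on a batch of activations. Tensor shapes follow the checkpoint layout documented above, but the weights and threshold values below are random stand-ins, and treating `bias` as a pre-encoder offset is an assumption about this architecture.

```python
import torch

d_model, n_features = 2048, 4096

# Random stand-ins with the documented checkpoint shapes.
state_dict = {
    "encoder.weight": torch.randn(n_features, d_model) * 0.02,
    "encoder.bias": torch.zeros(n_features),
    "bias": torch.zeros(d_model),
    "jump_threshold": torch.full((n_features,), 0.05),
}

def encode(x: torch.Tensor) -> torch.Tensor:
    """Encode activations [batch, 2048] into sparse features [batch, 4096]."""
    pre = (x - state_dict["bias"]) @ state_dict["encoder.weight"].T + state_dict["encoder.bias"]
    # JumpReLU: keep pre-activations above the learned threshold, zero the rest.
    return torch.where(pre > state_dict["jump_threshold"], pre, torch.zeros_like(pre))

acts = torch.randn(4, d_model)
feats = encode(acts)
print(feats.shape)  # torch.Size([4, 4096])
```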
## Training
- Optimizer: Adam, lr=3e-4 (SAE), lr=1e-4 (TC)
- Schedule: Cosine annealing
- Data: Cyber + financial domain activations from Qwen3.5-35B-A3B-FP8
- Tokens: 289K
- Loss: Reconstruction + L0 sparsity (JumpReLU)
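The recipe above can be sketched as a single training step: Adam at lr=3e-4 with cosine annealing, and a loss of reconstruction MSE plus an L0 sparsity count. The SAE modules, batch size, `T_max`, and L0 coefficient are placeholders; a plain ReLU stands in for the JumpReLU gate, and the straight-through gradient used to train the L0 term is omitted for brevity.

```python
import torch

d_model, n_features, l0_coeff = 2048, 4096, 1e-3  # l0_coeff is a placeholder

# Stand-in encoder/decoder for the SAE parameters.
enc = torch.nn.Linear(d_model, n_features)
dec = torch.nn.Linear(n_features, d_model)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=3e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1_000)

x = torch.randn(8, d_model)   # one batch of captured activations
feats = torch.relu(enc(x))    # ReLU stand-in for the JumpReLU gate
x_hat = dec(feats)

recon = (x - x_hat).pow(2).sum(-1).mean()        # reconstruction MSE
l0 = (feats > 0).float().sum(-1).mean()          # mean number of active features
loss = recon + l0_coeff * l0

loss.backward()
opt.step()
sched.step()
```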