Credit Cyber 4K Features

Sparse Autoencoders (SAEs) and MoE Transcoders with 4096 features each, trained on activations from Qwen3.5-35B-A3B-FP8 for mechanistic interpretability and circuit tracing.

Trained on cyber + financial domain activations (289K tokens).

Architecture

Component        Details
Base model       Qwen/Qwen3.5-35B-A3B-FP8
d_model          2048
Features         4096 (2x expansion)
Layers           0, 10, 30, 39
SAE activation   JumpReLU
TC activation    ReLU encoder/decoder
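As a rough sanity check on the table, the parameter count implied by d_model=2048 and 4096 features matches the ~65 MB checkpoint size below, assuming fp32 parameters and an untied decoder weight (an assumption; the decoder weight is not listed in the Usage section):

```python
# Rough size estimate for one SAE checkpoint (fp32, untied decoder assumed).
d_model, n_features = 2048, 4096

params = (
    n_features * d_model    # encoder weight [4096, 2048]
    + n_features            # encoder bias
    + d_model * n_features  # decoder weight (assumed untied)
    + d_model               # decoder bias
    + n_features            # jump thresholds
)
size_mib = params * 4 / 2**20  # 4 bytes per fp32 parameter -> MiB
print(params, round(size_mib, 1))  # 16787456 64.0
```

64 MiB on disk is consistent with the "65 MB" per-checkpoint figure listed below.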

Files

SAE Checkpoints

  • checkpoints/sae_l0.pt β€” Layer 0 Sparse Autoencoder (65 MB)
  • checkpoints/sae_l10.pt β€” Layer 10 Sparse Autoencoder (65 MB)
  • checkpoints/sae_l30.pt β€” Layer 30 Sparse Autoencoder (65 MB)
  • checkpoints/sae_l39.pt β€” Layer 39 Sparse Autoencoder (65 MB)

Transcoder Checkpoints

  • checkpoints/tc_l0.pt β€” Layer 0 MoE Transcoder (65 MB)
  • checkpoints/tc_l10.pt β€” Layer 10 MoE Transcoder (65 MB)
  • checkpoints/tc_l30.pt β€” Layer 30 MoE Transcoder (65 MB)
  • checkpoints/tc_l39.pt β€” Layer 39 MoE Transcoder (65 MB)

Metadata

  • checkpoints/feature_names.json β€” Decoded feature labels (top-3 activating tokens per feature)
  • checkpoints/safety_threshold.json β€” Tiered safety scoring thresholds
  • checkpoints/architecture_map.json β€” Model architecture config
  • checkpoints/chat_context_features.json β€” Context feature weights
  • checkpoints/safety_test_prompts.json β€” Evaluation prompt set
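The feature labels can be inspected with a plain JSON load. The sketch below assumes a schema mapping feature indices (as strings) to lists of top activating tokens; the actual structure of feature_names.json may differ, and the demo dict stands in for the real file:

```python
import json

def top_tokens(feature_names, idx, k=3):
    """Return the top-k activating tokens recorded for a feature.

    Assumes feature_names maps feature indices (as strings) to token
    lists; this schema is an assumption, not confirmed by the card.
    """
    return feature_names[str(idx)][:k]

# In practice:
#   with open("checkpoints/feature_names.json") as f:
#       feature_names = json.load(f)
demo = json.loads('{"0": ["exploit", "payload", "breach"]}')
print(top_tokens(demo, 0))  # ['exploit', 'payload', 'breach']
```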

Usage

import torch

# weights_only=False permits non-tensor objects in the pickle;
# only use it on checkpoint files you trust.
ckpt = torch.load("checkpoints/sae_l0.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt)  # unwrap if the checkpoint nests a state_dict
# encoder.weight shape: [4096, 2048]
# encoder.bias   shape: [4096]
# bias           shape: [2048]
# jump_threshold shape: [4096]
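Given the key names and shapes above, the JumpReLU encoder pass can be reconstructed as follows. This is a sketch: the decoder weight is not listed above, so only encoding is shown, and the demo uses random tensors in place of the real checkpoint:

```python
import torch

def sae_encode(x, state_dict):
    """Compute JumpReLU feature activations from a loaded SAE state_dict.

    Assumes the key names listed above: encoder.weight [4096, 2048],
    encoder.bias [4096], bias [2048] (taken to be the decoder bias),
    jump_threshold [4096].
    """
    W_enc = state_dict["encoder.weight"]
    b_enc = state_dict["encoder.bias"]
    b_dec = state_dict["bias"]
    theta = state_dict["jump_threshold"]
    pre = (x - b_dec) @ W_enc.T + b_enc
    # JumpReLU: keep pre-activations above the per-feature threshold, else 0.
    return torch.where(pre > theta, pre, torch.zeros_like(pre))

# Demo with random weights standing in for the checkpoint:
sd = {
    "encoder.weight": torch.randn(4096, 2048) * 0.02,
    "encoder.bias": torch.zeros(4096),
    "bias": torch.zeros(2048),
    "jump_threshold": torch.full((4096,), 0.1),
}
feats = sae_encode(torch.randn(2, 2048), sd)
print(feats.shape)  # torch.Size([2, 4096])
```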

Training

  • Optimizer: Adam, lr=3e-4 (SAE), lr=1e-4 (TC)
  • Schedule: Cosine annealing
  • Data: Cyber + financial domain activations from Qwen3.5-35B-A3B-FP8
  • Tokens: 289K
  • Loss: Reconstruction + L0 sparsity (JumpReLU)
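The loss bullet above can be sketched as reconstruction MSE plus an L0 penalty on active features. The coefficient below is a placeholder (the card does not state it), and the hard feature count is for illustration only; with JumpReLU the L0 term is typically trained through a straight-through-style estimator:

```python
import torch

def sae_loss(x, feats, recon, l0_coeff=1e-3):
    """Sketch of the objective: reconstruction MSE + L0 sparsity.

    l0_coeff is a placeholder value, not taken from this card.
    """
    recon_loss = (recon - x).pow(2).sum(-1).mean()
    l0 = (feats > 0).float().sum(-1).mean()  # hard count, illustrative only
    return recon_loss + l0_coeff * l0

# Degenerate demo: perfect reconstruction, no active features -> zero loss.
x = torch.randn(4, 2048)
loss = sae_loss(x, torch.zeros(4, 4096), x.clone())
print(loss.item())  # 0.0
```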