English
sparse-autoencoder
mechanistic-interpretability
biosafety
biorefusalaudit
gemma
gemma4
sae

Commit History

Upload training_log.jsonl with huggingface_hub
8d05482
verified

Solshine committed on

Upload checkpoint_04000.pt with huggingface_hub
2ee7724
verified

Solshine committed on

Upload checkpoint_03000.pt with huggingface_hub
ea27213
verified

Solshine committed on

Upload checkpoint_02000.pt with huggingface_hub
a20c197
verified

Solshine committed on

Upload checkpoint_01000.pt with huggingface_hub
bb7f586
verified

Solshine committed on

Upload sae_weights.pt with huggingface_hub
979073d
verified

Solshine committed on

Upload README.md with huggingface_hub
ce89fc2
verified

Solshine committed on

Note v2 training run in progress with bio-forget-corpus; update caveats
2f320f9
verified

Solshine committed on

Add eval results, calibration findings, compute aspiration note
deb793a
verified

Solshine committed on

Fix license: HL3-BDS-CL-ECO-EXTR-FFD-MEDIA-MIL-MY-SUP-SV-TAL-USTA-XUAR (matches repo convention)
a9f1961
verified

Solshine committed on

Add detailed model card: loading instructions, training details, BioRefusalAudit integration
300d249
verified

Solshine committed on

Upload sae_weights_final.pt with huggingface_hub
d83f74d
verified

Solshine committed on

Upload sae_weights_step_2000.pt with huggingface_hub
52427b4
verified

Solshine committed on

Upload sae_weights_step_1500.pt with huggingface_hub
618cce3
verified

Solshine committed on

Upload sae_weights_step_1000.pt with huggingface_hub
f3a00c5
verified

Solshine committed on

Upload sae_weights_step_500.pt with huggingface_hub
8933a1e
verified

Solshine committed on

initial commit
a17141a
verified

Solshine committed on