---
license: mit
library_name: pytorch
tags:
- sparse-autoencoder
- interpretability
- llama-3.2
- qwen-2.5
- mechanistic-interpretability
pipeline_tag: feature-extraction
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B
- meta-llama/Llama-3.2-1B
---

# Model Card for SSAE Checkpoints

This is the official model repository for the paper **"Step-Level Sparse Autoencoder for Reasoning Process Interpretation"**.

This repository contains the trained **Step-Level Sparse Autoencoder (SSAE)** checkpoints.

- **Paper:** [Arxiv Link Here]()
- **Code:** [GitHub Link Here]()
- **Collection:** [HuggingFace]()

## Model Overview

The checkpoints are provided as PyTorch state dictionaries (`.pt` files). Each file represents an SSAE trained on a specific **Base Model** using a specific **Dataset**.

### Naming Convention
The filenames follow this structure:
`{Dataset}_{BaseModel}_{SparsityConfig}.pt`

- **Dataset:** Source data used for training (e.g., `gsm8k`, `numina`, `opencodeinstruct`).
- **Base Model:** The LLM whose activations were encoded (e.g., `Llama3.2-1b`, `Qwen2.5-0.5b`).
- **Sparsity Config:** Target sparsity (e.g., `spar-10` indicates a target sparsity of `tau_{spar} = 10`).
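
As an illustration, the naming convention can be parsed with a small helper (a sketch only; `parse_checkpoint_name` is not part of the released code):

```python
def parse_checkpoint_name(filename: str) -> dict:
    """Split an SSAE checkpoint filename into its three components."""
    stem = filename.removesuffix(".pt")
    dataset, base_model, sparsity = stem.split("_")
    return {
        "dataset": dataset,        # e.g. "gsm8k-385k"
        "base_model": base_model,  # e.g. "Llama3.2-1b"
        "tau_spar": int(sparsity.removeprefix("spar-")),  # e.g. 10
    }

info = parse_checkpoint_name("gsm8k-385k_Llama3.2-1b_spar-10.pt")
print(info)  # {'dataset': 'gsm8k-385k', 'base_model': 'Llama3.2-1b', 'tau_spar': 10}
```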

## Checkpoints List

Below is the list of available checkpoints in this repository:

| Filename | Base Model | Training Dataset | Description |
| :--- | :--- | :--- | :--- |
| `gsm8k-385k_Llama3.2-1b_spar-10.pt` | Llama-3.2-1B | GSM8K | SSAE trained on Llama-3.2-1B using GSM8K-385K. |
| `gsm8k-385k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | GSM8K | SSAE trained on Qwen-2.5-0.5B using GSM8K-385K. |
| `numina-859k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | Numina | SSAE trained on Qwen-2.5-0.5B using Numina-859K. |
| `opencodeinstruct-36k_Llama3.2-1b_spar-10.pt` | Llama-3.2-1B | OpenCodeInstruct | SSAE trained on Llama-3.2-1B using OpenCodeInstruct-36K. |
| `opencodeinstruct-36k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | OpenCodeInstruct | SSAE trained on Qwen-2.5-0.5B using OpenCodeInstruct-36K. |

## Usage

The provided `.pt` files contain not only the model weights but also the training configuration and metadata.

Structure of the checkpoint dictionary:
- `model`: The model state dictionary (weights).
- `config`: Configuration dictionary (sparsity factor, etc.).
- `encoder_name` / `decoder_name`: Names of the base models used.
- `global_step`: Training step count.

### Loading Code Example

```python
import torch
from huggingface_hub import hf_hub_download

# 1. Download the checkpoint
repo_id = "Miaow-Lab/SSAE-Models"
filename = "gsm8k-385k_Llama3.2-1b_spar-10.pt"  # Example filename

checkpoint_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 2. Load the full checkpoint dictionary
# Note: map_location="cpu" is recommended for initial loading.
# weights_only=False is required on recent PyTorch versions (>= 2.6),
# since the checkpoint stores config dictionaries and metadata, not just tensors.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

print(f"Loaded checkpoint (Step: {checkpoint.get('global_step', 'Unknown')})")
print(f"Config: {checkpoint.get('config')}")

# 3. Initialize your model
# Use the metadata from the checkpoint to ensure correct initialization arguments
# model = MyModel(
#     tokenizer=...,
#     sparsity_factor=checkpoint['config'].get('sparsity_factor'),  # Adjust key based on your config structure
#     init_from=(checkpoint['encoder_name'], checkpoint['decoder_name'])
# )

# 4. Load the weights into the initialized model
# CRITICAL: The weights are stored under the "model" key
# model.load_state_dict(checkpoint["model"], strict=True)
# model.to("cuda")  # Move to GPU if needed
# model.eval()
```

## Citation
If you use these models or the associated code in your research, please cite our paper:
```bibtex

```