---
tags:
- affine-transform
- activation-mapping
library_name: safetensors
---

# Affine Transform: EleutherAI/deep-ignorance-pretraining-stage-unfiltered@global_step38144 → EleutherAI/deep-ignorance-unfiltered@main

A learned affine transformation (one `nn.Linear` per selected layer) that maps hidden-state activations from the source checkpoint to the corresponding activations in the target model.

## Usage

```python
import json

import torch.nn as nn
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the transform weights and metadata
weights_path = hf_hub_download(
    repo_id="EleutherAI/affine-checkpoint-transfer",
    filename="affine_transforms.safetensors",
)
metadata_path = hf_hub_download(
    repo_id="EleutherAI/affine-checkpoint-transfer",
    filename="metadata.json",
)

# Load metadata and weights
with open(metadata_path) as f:
    metadata = json.load(f)

weights = load_file(weights_path)

# Rebuild one affine map (nn.Linear) per stored layer
affine_transforms = {}
for layer_idx in metadata["layer_indices"]:
    linear = nn.Linear(metadata["hidden_dim"], metadata["hidden_dim"], bias=True)
    linear.weight.data = weights[f"layer_{layer_idx}.weight"]
    linear.bias.data = weights[f"layer_{layer_idx}.bias"]
    affine_transforms[layer_idx] = linear
```
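
Once loaded, each transform is applied to the source model's hidden states at the matching layer (e.g. taken from `outputs.hidden_states` with `output_hidden_states=True`). A minimal sketch, using a toy hidden dimension and a random tensor in place of real activations:

```python
import torch
import torch.nn as nn

hidden_dim = 8  # toy size; the real transforms use metadata["hidden_dim"] = 4096
layer_idx = 5

# Stand-in for one loaded transform (in practice: affine_transforms[layer_idx])
transform = nn.Linear(hidden_dim, hidden_dim, bias=True)

# Stand-in for source-model hidden states at this layer:
# shape (batch, seq_len, hidden_dim), e.g. outputs.hidden_states[layer_idx]
source_hidden = torch.randn(2, 16, hidden_dim)

with torch.no_grad():
    mapped = transform(source_hidden)  # approximates target-model activations

print(mapped.shape)  # torch.Size([2, 16, 8])
```

The transform is position-wise, so it broadcasts over the batch and sequence dimensions and only mixes the hidden dimension.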

## MSE Metrics

| Layer | MSE |
|-------|-----|
| 5 | 0.081037 |
| 10 | 0.289330 |
| 15 | 0.684350 |
| 20 | 0.978894 |
| 25 | 1.569308 |
| 30 | 4.231404 |

**Mean MSE:** 1.305720

## Training Details

- **Source Model:** EleutherAI/deep-ignorance-pretraining-stage-unfiltered@global_step38144
- **Target Model:** EleutherAI/deep-ignorance-unfiltered@main
- **Hidden Dimension:** 4096
- **Ridge Alpha:** 0.01
- **Layers:** [5, 10, 15, 20, 25, 30]
- **Training Examples:** 100000
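
The ridge-alpha entry above implies each per-layer map was fit by ridge regression from source to target activations. A minimal sketch of that fitting procedure under those assumptions, with toy dimensions and synthetic paired activations (the actual training pipeline is not published here):

```python
import torch


def fit_affine_ridge(X, Y, alpha=0.01):
    """Fit Y ≈ X @ W.T + b via closed-form ridge regression."""
    n, d = X.shape
    # Append a constant column so the bias is learned jointly with W
    X_aug = torch.cat([X, torch.ones(n, 1)], dim=1)
    # Closed-form ridge solution: (X'X + alpha*I)^-1 X'Y
    gram = X_aug.T @ X_aug + alpha * torch.eye(d + 1)
    coef = torch.linalg.solve(gram, X_aug.T @ Y)  # shape (d+1, d_out)
    W, b = coef[:d].T, coef[d]
    return W, b


# Synthetic paired activations (in practice: ~100000 activation vectors
# collected from the same inputs at the same layer of both models)
torch.manual_seed(0)
X = torch.randn(500, 16)                 # source activations
true_W, true_b = torch.randn(16, 16), torch.randn(16)
Y = X @ true_W.T + true_b                # target activations (noise-free here)

W, b = fit_affine_ridge(X, Y, alpha=0.01)
print(torch.allclose(W, true_W, atol=1e-2))  # True on this noise-free data
```

With 100000 examples per layer and a 4096-dimensional hidden state, the 17-million-parameter map per layer is well determined, and the small alpha (0.01) adds only light regularization.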