deep-ignorance-unfiltered_unlearned_lat

This model was created by fine-tuning EleutherAI/deep-ignorance-unfiltered with the Latent Adversarial Training (LAT) unlearning algorithm, which is based on Casper et al. (2024). Unlearning aims to remove specific knowledge from a pretrained language model while preserving its general capabilities.
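
As a rough illustration of the idea behind latent adversarial training, the sketch below searches for a small, norm-bounded perturbation to a hidden activation that increases a loss. It is a toy example under stated assumptions, not the training code used for this model: the L2 bound, the normalized gradient-ascent update, the cross-entropy loss, and the names `find_latent_perturbation`, `head`, and `step_size` are all hypothetical choices made for illustration. The `epsilon` and `steps` defaults simply mirror the LAT epsilon and LAT steps values listed under Hyperparameters.

```python
import torch
import torch.nn.functional as F


def find_latent_perturbation(head, hidden, targets, epsilon=0.1, steps=5, step_size=None):
    """Projected gradient ascent on a perturbation added to hidden activations.

    `head` stands in for the part of the network after the perturbed layer
    (a hypothetical stand-in; a real model would use hooks on chosen layers).
    """
    if step_size is None:
        step_size = 2.0 * epsilon / steps  # arbitrary illustrative choice
    hidden = hidden.detach()
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(head(hidden + delta), targets)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Step in the direction that increases the loss.
            unit = grad / grad.norm(dim=-1, keepdim=True).clamp_min(1e-12)
            delta += step_size * unit
            # Project back onto the L2 ball of radius epsilon (assumed norm).
            norms = delta.norm(dim=-1, keepdim=True).clamp_min(1e-12)
            delta *= (epsilon / norms).clamp(max=1.0)
    return delta.detach()


# Toy usage: a linear layer plays the role of the rest of the network.
torch.manual_seed(42)
head = torch.nn.Linear(16, 10)
hidden = torch.randn(4, 16)
targets = torch.randint(0, 10, (4,))
delta = find_latent_perturbation(head, hidden, targets)
```

In an actual training loop, the model would then be updated on the perturbed activations `hidden + delta`; which loss was used for that update here is not stated in this card.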

Hyperparameters

Parameter	Value
Base model	`EleutherAI/deep-ignorance-unfiltered`
Unlearning method	`Latent Adversarial Training`
Learning rate	`3e-05`
Epochs	`3`
Batch size	`32`
Max sequence length	`512`
Optimizer	`adamw`
Gradient clipping	`1.0`
Gradient accumulation steps	`1`
Seed	`42`
W&B / run name	`lat__ep3_lr3e-05_bs32_le0.1_ls5_rw1.0_ly5-6-7_mle512_mli2048`
Layer IDs	`5,6,7`
LAT epsilon	`0.1`
LAT steps	`5`
Retain weight	`1.0`
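
The run name appears to encode several of the values in this table. The sketch below decodes it, assuming a prefix-to-name mapping inferred only by matching values against the table (`ep`, `lr`, `bs`, `le`, `ls`, `rw`, `ly`). The `mle` and `mli` fields are not explained in this card, so they are kept under their raw keys. The function name `parse_run_name` is hypothetical.

```python
import re

# Assumed mapping, inferred by matching run-name values to the table above.
PREFIXES = {
    "ep": "epochs",
    "lr": "learning_rate",
    "bs": "batch_size",
    "le": "lat_epsilon",
    "ls": "lat_steps",
    "rw": "retain_weight",
    "ly": "layer_ids",
}


def _to_number(text):
    """Return an int when possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_run_name(run_name):
    """Split a run name like 'lat__ep3_lr3e-05_...' into a dict of fields."""
    method, _, rest = run_name.partition("__")
    fields = {"method": method}
    for token in rest.split("_"):
        match = re.fullmatch(r"([a-z]+)(.+)", token)
        if match is None:
            continue  # skip tokens that do not look like <prefix><value>
        prefix, value = match.groups()
        key = PREFIXES.get(prefix, prefix)  # unknown prefixes keep their raw key
        if key == "layer_ids":
            fields[key] = [int(part) for part in value.split("-")]
        else:
            fields[key] = _to_number(value)
    return fields


cfg = parse_run_name("lat__ep3_lr3e-05_bs32_le0.1_ls5_rw1.0_ly5-6-7_mle512_mli2048")
```

With this mapping, `cfg["layer_ids"]` is `[5, 6, 7]` and `cfg["lat_epsilon"]` is `0.1`, matching the Layer IDs and LAT epsilon rows.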

Model size: 7B params
Tensor type: BF16
Weights format: Safetensors
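
A minimal loading sketch with the Hugging Face `transformers` library, assuming the repository `girishgupta/deep-ignorance-unfiltered_unlearned_lat` ships tokenizer files and loads through the standard `Auto*` classes (neither is confirmed by this card). The helper names `load_model` and `generate` are hypothetical. Weights are requested in bfloat16 to match the tensor type listed above.

```python
MODEL_ID = "girishgupta/deep-ignorance-unfiltered_unlearned_lat"


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(prompt, max_new_tokens=32):
    """Generate a short continuation for `prompt` (illustrative defaults)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would download the full 7B-parameter checkpoint, so it is best run on a machine with enough memory for BF16 weights.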


Paper for girishgupta/deep-ignorance-unfiltered_unlearned_lat

Defending Against Unforeseen Failure Modes with Latent Adversarial Training. arXiv:2403.05030, published Mar 8, 2024.