---
version: main
family: smollm2-1.7b
model_name: locuslab/safelm-1.7b_rephrase_refusal_moral_ed_600B
license: mit
tags:
- model
- transformer
- smollm2
- safety-pretraining
datasets:
- locuslab/refuseweb
- locuslab/safeweb
- locuslab/moral_education
- HuggingFaceTB/smollm-corpus
---
# SafeLM-1.7B
SafeLM is a 1.7B-parameter model family trained via [Safety Pretraining](https://www.arxiv.org/abs/2504.16980). We train language models to be natively safe by incorporating safety directly into the pretraining pipeline; this checkpoint is our natively safe base model. Our safety data curation involves scoring harmful content, rephrasing and contextualizing potentially harmful examples, and incorporating refusal training throughout pretraining.
Please check out our [paper](https://www.arxiv.org/abs/2504.16980) and [website](https://locuslab.github.io/safety-pretraining/) for more details!
## Model Details
- **Architecture:** SmolLM2
- **Parameters:** 1.7B
## Training Configuration
```yaml
optimizer:
  class_path: torch.optim.AdamW
  init_args:
    lr: 0.0005
    weight_decay: 0.01
precision: bf16-mixed
seed: 42
train:
  global_batch_size: 1024
  max_seq_length: 2048
  max_tokens: 600000000000
  micro_batch_size: 8
```
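For reference, a `class_path`-style optimizer entry like the one above can be resolved dynamically. The helper below is an illustrative sketch (the `build_optimizer` function is hypothetical, not part of this repository) showing how such a config maps to a `torch.optim.AdamW` instance:
```python
import importlib

def build_optimizer(params, class_path, init_args):
    """Resolve a dotted class path (e.g. "torch.optim.AdamW")
    and instantiate it with the given keyword arguments."""
    module_name, class_name = class_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(params, **init_args)

# Mirrors the optimizer section of the config above (hypothetical usage):
# optimizer = build_optimizer(model.parameters(), "torch.optim.AdamW",
#                             {"lr": 0.0005, "weight_decay": 0.01})
```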
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
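
# Load the safety-pretrained base model and its tokenizer from the Hub.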
model = AutoModelForCausalLM.from_pretrained("locuslab/safelm-1.7b_rephrase_refusal_moral_ed_600B")
tokenizer = AutoTokenizer.from_pretrained("locuslab/safelm-1.7b_rephrase_refusal_moral_ed_600B")
```
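A minimal generation example (the prompt and sampling settings here are illustrative, not prescribed by the model card):
```python
import torch

prompt = "The best way to learn a new language is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```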
## Citation
If you find our work helpful, please cite it as:
```bibtex
@article{maini2025safety,
  title={Safety Pretraining: Toward the Next Generation of Safe AI},
  author={Maini, Pratyush and Goyal, Sachin and Sam, Dylan and Robey, Alex and Savani, Yash and Jiang, Yiding and Zou, Andy and Lipton, Zachary C and Kolter, J Zico},
  journal={arXiv preprint arXiv:2504.16980},
  year={2025}
}
```