Add pipeline_tag, library_name, and license

#1
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +5 -3
README.md CHANGED

```diff
@@ -3,7 +3,11 @@ datasets:
 - HuggingFaceTB/smollm-corpus
 language:
 - en
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # Outlier-Safe Pre-Training
 
 [![arXiv](https://img.shields.io/badge/arXiv-2506.19697-b31b1b?style=flat-square)](https://arxiv.org/abs/2506.19697)
@@ -25,7 +29,7 @@ A method that prevents outliers but significantly reduces efficiency is unlikely
 3. 🧩**Ensuring full compatibility with existing inference pipelines**<br/>
 We prioritize compatibility with widely adopted inference frameworks such as vLLM and SGLang. Rather than introducing architectural changes that break compatibility, OSP preserves computational invariance, allowing models to be directly integrated into existing pipelines without additional effort.
 
-
+This repository contains the model of the paper [Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models](https://huggingface.co/papers/2506.19697).
 
 ## Model Checkpoints
 
@@ -36,7 +40,6 @@ The models were trained on 1 trillion tokens, following the pre-training recipe
 - [🤗 OSP-1.4B-1T-Adam](https://huggingface.co/dmis-lab/OSP-1.4B-1T-Adam): Trained on the standard Adam optimizer, without any modifications.
 - [🤗 OSP-1.4B-1T-Muon-SSNorm-EmbProj](https://huggingface.co/dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj): Trained on the OSP framework. This is our final model.
 
-
 ### Ablation Models
 
 <table>
@@ -177,7 +180,6 @@ The models were trained on 1 trillion tokens, following the pre-training recipe
 </table>
 &dagger;Model configuration that disables decoupled embedding optimization by training with Muon optimizer without Adam optimization on embedding layers
 
-
 ## Training
 
 ### Model
```
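The metadata this PR adds can be sanity-checked by parsing the resulting README front matter. A minimal sketch, assuming the flat key/list subset shown in the diff; `parse_front_matter` is a hand-rolled illustration, not a Hub tool (the Hub uses a full YAML parser):

```python
# Front matter of README.md as it would read after this PR is merged.
front_matter = """\
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
"""

def parse_front_matter(text):
    """Parse the flat `key: value` / `key:` + `- item` subset used above."""
    meta, current_key = {}, None
    for line in text.splitlines():
        if line.startswith("- ") and current_key is not None:
            # Continuation of a list-valued key (e.g. datasets, language).
            meta[current_key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # Empty value means a list follows; otherwise store the scalar.
            meta[current_key] = value.strip() or []
    return meta

meta = parse_front_matter(front_matter)
# The three keys this PR introduces should now be present.
assert meta["license"] == "apache-2.0"
assert meta["library_name"] == "transformers"
assert meta["pipeline_tag"] == "text-generation"
```

With `pipeline_tag: text-generation` and `library_name: transformers` set, the Hub can surface the correct widget and code snippet for the checkpoint.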