---
datasets:
- roneneldan/TinyStories
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
---

# TernaryLM-132M

[TernaryLM](https://huggingface.co/papers/2602.07374) is a 132M-parameter Transformer trained natively with ternary weights {-1, 0, +1}, i.e. approximately 1.58 bits (log2 3) of effective precision per weight.

Unlike post-training quantization (PTQ) methods, which quantize an already-trained full-precision model, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
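The training-time ternarization can be sketched as follows. This is a minimal, BitNet-style illustration, not the paper's exact implementation: the mean-absolute-value scale rule and the `ternary_quantize_ste` helper are assumptions.

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Ternarize weights to scale * {-1, 0, +1} with a per-layer scale,
    passing gradients straight through to the latent full-precision weights.

    Illustrative sketch only: the scale rule here (mean absolute weight,
    as in BitNet-style schemes) stands in for the paper's adaptive
    layer-wise scaling.
    """
    # Per-layer scale: mean absolute value of the weights.
    scale = w.abs().mean().clamp(min=1e-8)
    # Hard ternarization: round the normalized weights into {-1, 0, +1}.
    w_q = (w / scale).round().clamp(-1, 1)
    # Straight-through estimator: the forward pass sees scale * w_q,
    # while the backward pass treats the whole op as the identity on w.
    return w + (scale * w_q - w).detach()

# The latent weights stay full-precision; only the forward value is ternary.
w = torch.randn(4, 4, requires_grad=True)
w_ternary = ternary_quantize_ste(w)
w_ternary.sum().backward()
```

Because the quantization error is wrapped in `.detach()`, the optimizer updates the full-precision latent weights while every forward pass sees only ternary values.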

## Resources

- **Paper:** [TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling](https://huggingface.co/papers/2602.07374)
- **GitHub Repository:** [1nisharg/TernaryLM-Memory-Efficient-Language-Modeling](https://github.com/1nisharg/TernaryLM-Memory-Efficient-Language-Modeling)

## Architecture

- **Parameters:** 132M
- **Layers:** 12
- **Hidden Size:** 768
- **Attention Heads:** 12
- **Context Length:** 512
- **Quantization:** Native ternary training
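To see what ternary storage buys at this scale, here is a back-of-envelope comparison of weight-storage footprints (weights only; activations, optimizer state, and packing overhead are ignored, and the figures are illustrative rather than measured):

```python
import math

# Approximate weight-storage footprint of a 132M-parameter model.
# A ternary weight carries log2(3) ~= 1.58 bits of information.
PARAMS = 132_000_000

def weight_mib(bits_per_param: float) -> float:
    """Weight storage in MiB for a given number of bits per parameter."""
    return PARAMS * bits_per_param / 8 / 2**20

fp32_mib = weight_mib(32)               # ~503 MiB
fp16_mib = weight_mib(16)               # ~252 MiB
ternary_mib = weight_mib(math.log2(3))  # ~25 MiB

print(f"fp16: {fp16_mib:.0f} MiB, ternary: {ternary_mib:.0f} MiB")
```

Roughly a 10x reduction in weight memory versus fp16, which is the motivation for the edge-deployment use cases below.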

## Training

- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (~60k stories)
- **Optimizer:** AdamW (betas = (0.9, 0.98))
- **Learning Rate:** 3e-4
- **Scheduler:** OneCycleLR
- **Epochs:** 15
- **Hardware:** Multi-GPU T4 setup (Kaggle)
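The listed settings translate directly into PyTorch. The sketch below uses a stand-in module, and `steps_per_epoch=1000` is a placeholder, since the actual batch count per epoch is not stated here.

```python
import torch

# Stand-in module; the real run optimizes the 132M-parameter model.
model = torch.nn.Linear(768, 768)

# Optimizer settings from the training recipe above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.98))

# OneCycleLR warms up to max_lr, then anneals over the full run.
# steps_per_epoch=1000 is a placeholder, not a value from this card.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=15, steps_per_epoch=1000
)

for _ in range(3):  # one optimizer + scheduler step per training batch
    optimizer.step()
    scheduler.step()
```

Note that OneCycleLR steps once per batch, not once per epoch, which is why it needs the total step count up front.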

## Intended Use

This model is intended for research on:
- Efficient Transformers and architecture design.
- Quantization-aware training (QAT) paradigms.
- Deployment of LLMs in resource-constrained or edge environments.

## Limitations

- The model is a research prototype and is not instruction-tuned.
- Pre-training was conducted on a relatively small, narrow-domain dataset (TinyStories), so outputs are limited to simple, story-like text.

## Citation

```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374},
}
```